Analyzing ReAct: Synergizing Reasoning and Acting in Language Models Paper
Re-asoning and Act-ing Synergistically
Language models are getting better at reasoning. Or, more precisely, we are getting better at eliciting reasoning from them, through methods like Chain of Thought and Tree of Thoughts.
Language models are also getting better at acting, which means performing tasks, whether in the real world or in digital environments. Things like browsing the web and grabbing information from it.
This paper tries to combine the two.
Abstract
While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics.
In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information.
We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components.
Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces.
On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.
Summary of Abstract
- It outperforms methods like imitation learning and reinforcement learning on interactive decision-making benchmarks.
- It hallucinates less than chain-of-thought baselines.
Important Concepts
- Reasoning traces
- Task-specific actions
- Error propagation
- Task-solving trajectories
Benchmarks
- Question Answering (HotpotQA)
- Fact Verification (Fever)
- Interactive Decision Making (ALFWorld)
- Interactive Decision Making (WebShop)
Introduction
Criticisms of Previous Methods
- Hallucinations
- Error propagation in multi-step task-solving processes
- Limited ability to combine reasoning and acting
Important Concepts
- Systematic ablations
How Tho?
ReAct prompts the LLM to run a continuous loop of reasoning, acting, and observing. The looping lets it adjust its plan as it goes.
The reasoning and acting are bundled into the same step, while the observation step could be anything from asking the user to querying an API. The point of the observation step is to gather information about the current state and check it against external sources of data.
In the next iteration of reasoning and acting, this observation is integrated to adjust the plan.
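To make the loop concrete, here is a minimal sketch of the control flow. The llm() callable, the tools dict, and the string format are my own stand-ins, not the paper's code:

```python
import re

def react_loop(question, llm, tools, max_steps=7):
    """Minimal ReAct loop: a sketch, not the paper's implementation."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Ask the model for its next Thought / Action / Action Input block.
        step = llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        # Parse out which tool to call and with what input.
        action = re.search(r"Action: (.+)", step).group(1).strip()
        action_input = re.search(r"Action Input: (.+)", step).group(1).strip()
        # Acting: run the tool; observing: feed the result back in.
        observation = tools[action](action_input)
        transcript += f"Observation: {observation}\n"
    return None  # gave up within the step budget
```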
Example
I ran a simple LangChain ZeroShot ReAct agent in Colab to show how exactly it runs.
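For reference, the setup was roughly the following. This is a sketch assuming the classic (pre-0.1) LangChain API, with OPENAI_API_KEY and SERPAPI_API_KEY set in the environment:

```python
from langchain.llms import OpenAI
from langchain.agents import AgentType, initialize_agent, load_tools

llm = OpenAI(temperature=0)
# "serpapi" provides the Search tool, "llm-math" the Calculator tool.
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(question)  # question shown below
```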
My input was:
Who is the current royalty of England? When did they start their reign?
What's the date today? How many days have they been in power?
The output was:
Thought: I need to find out who the current royalty is, when they started their reign,
and how many days they have been in power.
Action: Search
Action Input: "Current royalty of England"
Observation: King Charles III
Thought: I need to find out when he started his reign.
Action: Search
Action Input: "When did King Charles III start his reign?"
Observation: His Majesty King Charles III succeeded to the throne of the United Kingdom of Great Britain and Northern Ireland – as well as 14 other Commonwealth Realms – at 15:10 on Thursday 8 September 2022. This occurred immediately upon the death of his predecessor and mother, Her Majesty Queen Elizabeth II.
Thought: I need to find out what the date is today.
Action: Search
Action Input: "What is the date today?"
Observation: ["The date today is Monday, November 20, 2023. What is the date today in numbers? Depending where you reside and who's asking the ...", "Today's Date ; November 2023 ; Sun, Mon, Tue ; 5, 6, 7 ; 12, 13, 14 ...", "Today's Date · Saturday · November 18, 2023 · 11/18/2023 ...", "Today's date in numbers · What is today's date? · November 19, 2023 · 11/19/2023 · 11-19-2023 · 11/19/2023.", "This page includes the following information: Today's date: day of week, month, day, year. Current time: hours, minutes, seconds.", "Details about today's date with count of days, weeks, and months, Sun and Moon cycles, Zodiac signs and date formats.", "- Today's is: Sunday, November 19, 2023 | 04:57:46pm. - It is the 323th day in the 46th week of the year. - There are 30 days in Nov, 2023. - There are ...", 'Day of the year is a number between 1 and 365 (in 2023), January 1 is day 1. After today 42 days are remaining in this year. This page uses the ISO-8601 ...', 'Date to Date Calculator. Calculate how many days there are between two dates. A vector ... Weather forecast for the next hour, today, tomorrow, and 14 days ahead.', "Today's Date in Long Date Format: Sunday, November 19, 2023. The long date format includes the day of the week, the month, the day, and the year, all written ..."]
Thought: I need to calculate how many days have passed since the start of the reign.
Action: Calculator
Action Input: 8 September 2022 - 19 November 2023
Observation: Answer: 444
Thought: I now know the final answer.
Final Answer: King Charles III has been in power for 444 days.
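As a quick sanity check on that last step, plain standard-library date arithmetic gives a slightly different number:

```python
from datetime import date

# Days from the start of the reign to the "today" the search surfaced.
print((date(2023, 11, 19) - date(2022, 9, 8)).days)  # 437, not 444
```

So the final answer is a little off, which is worth keeping in mind for the personal take below.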
You will notice that the action has two components:
- Action
- Action input
So you can equip the agent with different tools. In this case, we equipped it with a web search tool and a calculator tool.
It knows when to use which tool and how to format the input for it.
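That "knowing when" comes from each tool's name and description, which the framework injects into the prompt. Here is a sketch of adding a custom tool with the classic LangChain Tool wrapper; the word-counting tool itself is just a made-up example:

```python
from langchain.agents import Tool

def count_words(text: str) -> str:
    # Toy tool: the agent sees only the name and description below.
    return str(len(text.split()))

word_counter = Tool(
    name="WordCounter",
    func=count_words,
    description="Useful for counting how many words are in a piece of text.",
)

# Then pass it along with the built-ins:
# agent = initialize_agent(tools + [word_counter], llm, ...)
```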
Interesting Tips
One of the methods that the paper compared ReAct prompting against was COT, or Chain of Thought.
Chain of Thought
The main idea behind COT is to get the LLM to reason well through a multi-step process. The simplest way to do this is to prompt it with "Let's think step by step...".
That is zero-shot prompting.
Whereas if you tell it to think step by step and also show it several example tasks, where each example gives the right answer along with the right sequence of reasoning, that is called few-shot prompting.
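For instance, the two flavors look roughly like this; the riddle and the worked example are mine, not from the paper:

```python
# Zero-shot COT: just append the magic phrase.
zero_shot = (
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?\n"
    "A: Let's think step by step."
)

# Few-shot COT: prepend a worked example showing the reasoning, not just the answer.
few_shot = (
    "Q: I had 5 apples and bought 2 bags of 3 more. How many apples do I have?\n"
    "A: I started with 5. Two bags of 3 is 6. 5 + 6 = 11. The answer is 11.\n\n"
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?\n"
    "A:"
)
```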
COT Weaknesses
The main issue is that it can hallucinate an answer. This is especially problematic if it makes an error in one of the earlier steps, because that error will then propagate through all the subsequent steps as well.
Quick Fix
Chain of Thought with Self-Consistency (COT-SC) is a method that tries to solve the issues of regular COT.
To do this, we sample n different results (the paper uses n = 21) from the same COT prompt; selecting the majority answer then gives significantly fewer hallucinated answers than regular COT.
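A minimal sketch of the voting step, assuming a hypothetical llm() sampler and an extract_answer() parser that pulls the final answer out of a reasoning chain:

```python
from collections import Counter

def cot_sc(llm, prompt, n=21):
    # Sample n chains with temperature > 0 so they actually differ,
    # then keep only each chain's final answer.
    answers = [extract_answer(llm(prompt, temperature=0.7)) for _ in range(n)]
    # Majority vote across the sampled answers.
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes
```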
ReAct or COT-SC
Pros of ReAct:
- More factual
- Grounded
Pros of COT-SC:
- More accurate at formulating reasoning structure
Personal Take
In the few simple tasks I gave ReAct, it kept providing close-but-not-quite answers, which was mildly annoying.
The paper suggests a way of dealing with this: combining ReAct and COT-SC, and deciding when to use each. But under exactly what conditions?
- Always start with ReAct.
- Set the max number of steps somewhere between 5 and 7; past that, performance doesn't improve, though you can tune the exact number for your own tasks.
- If ReAct fails to find the right answer within those steps, fall back to COT-SC.
- The reason for this ordering is that COT-SC is fairly expensive to run, in both time and compute.
Something else useful to note: if the majority answer of a COT-SC run occurs fewer than n/2 times, we conclude that the LLM doesn't have enough internalized knowledge about the topic. The sketch below wires these rules together.
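Here is how the whole fallback policy might look, reusing the react_loop and cot_sc sketches from above (cot_prompt() is a hypothetical prompt builder):

```python
def answer(question, llm, tools, n=21):
    # Rule 1: try ReAct first, capped at a small step budget.
    result = react_loop(question, llm, tools, max_steps=7)
    if result is not None:
        return result
    # Rule 2: ReAct gave up, so fall back to COT-SC.
    majority, votes = cot_sc(llm, cot_prompt(question), n=n)
    # Rule 3: a weak majority (< n/2 votes) suggests the model lacks
    # the internal knowledge to answer reliably.
    return majority if votes >= n / 2 else None
```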
Next I will build a simple COT-SC Google Colab that I can automate some reasoning tasks with.