Figure 1.

Figure 2.

Figure 3.

Figure 4.

Experimental results on ALFWorld, HotpotQA, and KAgentBench
| Dataset | Model | NoAgent | ReAct | InterAct | VIMBank |
|---|---|---|---|---|---|
| ALFWorld | Qwen2-7b | 48.8 | 54.7 | 60.1 | 72.3 |
| ALFWorld | ChatGLM3 | 46.2 | 49.2 | 55.8 | 64.9 |
| HotpotQA | Qwen2-7b | 51.6 | 57.3 | 63.4 | 76.3 |
| HotpotQA | ChatGLM3 | 45.9 | 51.8 | 59.7 | 71.5 |
| KAgentBench | Qwen2-7b | 34.2 | 48.5 | 52.6 | 58.7 |
| KAgentBench | ChatGLM3 | 32.6 | 44.7 | 46.3 | 54.2 |
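The margin of VIMBank over the strongest baseline in each row can be read off the table directly. The sketch below only restates the reported numbers; the scoring metric is not named in the table, so the values are treated as unitless scores, and the helper name `gain_over_best_baseline` is hypothetical.

```python
# Scores copied from the results table; the metric is not stated in the source,
# so they are treated as plain unitless numbers.
RESULTS = {
    ("ALFWorld", "Qwen2-7b"): {"NoAgent": 48.8, "ReAct": 54.7, "InterAct": 60.1, "VIMBank": 72.3},
    ("ALFWorld", "ChatGLM3"): {"NoAgent": 46.2, "ReAct": 49.2, "InterAct": 55.8, "VIMBank": 64.9},
    ("HotpotQA", "Qwen2-7b"): {"NoAgent": 51.6, "ReAct": 57.3, "InterAct": 63.4, "VIMBank": 76.3},
    ("HotpotQA", "ChatGLM3"): {"NoAgent": 45.9, "ReAct": 51.8, "InterAct": 59.7, "VIMBank": 71.5},
    ("KAgentBench", "Qwen2-7b"): {"NoAgent": 34.2, "ReAct": 48.5, "InterAct": 52.6, "VIMBank": 58.7},
    ("KAgentBench", "ChatGLM3"): {"NoAgent": 32.6, "ReAct": 44.7, "InterAct": 46.3, "VIMBank": 54.2},
}


def gain_over_best_baseline(scores):
    """Hypothetical helper: return (best baseline name, VIMBank minus that baseline)."""
    baselines = {name: value for name, value in scores.items() if name != "VIMBank"}
    best_name = max(baselines, key=baselines.get)
    # Round to one decimal, matching the precision of the table.
    return best_name, round(scores["VIMBank"] - baselines[best_name], 1)


for (dataset, model), scores in RESULTS.items():
    best, gain = gain_over_best_baseline(scores)
    print(f"{dataset:12s} {model:9s} best baseline={best:8s} gain={gain:+.1f}")
```

In every row the strongest baseline is InterAct, and the gap ranges from about 6.1 points (KAgentBench, Qwen2-7b) to about 12.9 points (HotpotQA, Qwen2-7b).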
Reasoning cost in the ALFWorld environment
| Method | 200 | 600 | 1000 |
|---|---|---|---|
| NoAgent | 63.2K | 164.7K | 334.7K |
| VIMBank | 56.8K | 142.6K | 258.3K |
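The relative saving of VIMBank over NoAgent can be computed per column. This is a sketch on the reported figures only: the "K" suffix is read as thousands, the meaning of the column labels 200, 600, and 1000 is not defined in the table and is kept as-is, and `relative_reduction` is a hypothetical helper.

```python
# Reasoning cost from the ALFWorld table, in thousands ("K" in the source).
# Column keys 200 / 600 / 1000 are copied verbatim; their meaning is not stated.
COST_K = {
    "NoAgent": {200: 63.2, 600: 164.7, 1000: 334.7},
    "VIMBank": {200: 56.8, 600: 142.6, 1000: 258.3},
}


def relative_reduction(baseline, method):
    """Hypothetical helper: fraction by which `method` lowers cost versus `baseline`."""
    return (baseline - method) / baseline


for setting in sorted(COST_K["NoAgent"]):
    r = relative_reduction(COST_K["NoAgent"][setting], COST_K["VIMBank"][setting])
    print(f"{setting:>5}: {r:.1%} lower cost than NoAgent")
```

The saving grows with the column value, from roughly 10% at 200 to roughly 13% at 600 and about 23% at 1000.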
Experimental Environment
| Component | Specification |
|---|---|
| CPU | Intel Core i9-10900K |
| GPU | NVIDIA Tesla V100 PCIe 32GB |
| Language | Python 3.9 |
| Framework | LangChain |