Figure 1.

Figure 2.

Figure 3.

Figure 4.

Comparison of ICTO and Baseline Performance
| Method | WebShop | ScienceWorld (%) | ALFWorld |
|---|---|---|---|
| SFT | 63.1 | 70.0 | 12.5 |
| ETO | 67.4 | 72.3 | 11.2 |
| IPR | 68.3 | 73.8 | 10.8 |
| RLCD | 65.8 | 71.5 | 11.5 |
| NAT | 66.5 | 72.0 | 11.0 |
| ICTO (ours) | 70.2 | 75.6 | 9.7 |
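
As a quick check on the size of these gains, the sketch below recomputes ICTO's margin over the strongest baseline from the numbers in the table above. It is only arithmetic on the reported scores. It assumes that higher is better on WebShop and ScienceWorld and leaves ALFWorld out, because the table does not state which direction its metric runs. The helper `margin_over_best` is a hypothetical name used for illustration.

```python
# Scores copied from the in-distribution comparison table.
# Assumption: higher is better on WebShop and ScienceWorld. ALFWorld is
# omitted because the table does not say which direction its metric runs.
BASELINES = {
    "SFT": {"WebShop": 63.1, "ScienceWorld": 70.0},
    "ETO": {"WebShop": 67.4, "ScienceWorld": 72.3},
    "IPR": {"WebShop": 68.3, "ScienceWorld": 73.8},
    "RLCD": {"WebShop": 65.8, "ScienceWorld": 71.5},
    "NAT": {"WebShop": 66.5, "ScienceWorld": 72.0},
}
ICTO = {"WebShop": 70.2, "ScienceWorld": 75.6}


def margin_over_best(ours, baselines, benchmark):
    """Hypothetical helper: (best baseline, absolute gain, relative gain in %)."""
    best_name = max(baselines, key=lambda m: baselines[m][benchmark])
    best = baselines[best_name][benchmark]
    gain = ours[benchmark] - best
    # Round to one decimal place to match the precision of the table.
    return best_name, round(gain, 1), round(100 * gain / best, 1)


for bench in ("WebShop", "ScienceWorld"):
    name, abs_gain, rel_gain = margin_over_best(ICTO, BASELINES, bench)
    print(f"{bench}: +{abs_gain} over {name} ({rel_gain:+}% relative)")
```

On both benchmarks the strongest baseline is IPR, so the script reports ICTO's lead over IPR: 1.9 points on WebShop and 1.8 points on ScienceWorld.
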

Experimental Environment
| Component | Details |
|---|---|
| CPU | Intel Core i9-10900K |
| GPU | NVIDIA Tesla V100 PCIe 32GB |
| LLM Agent Model | Llama-2-7B-Chat |
| Optimizer | AdamW |
| Training Framework | DeepSpeed |

Generalization Performance of ICTO on Out-of-Distribution (OOD) Tasks
| Method | WebShop | ScienceWorld (%) | ALFWorld |
|---|---|---|---|
| SFT | 52.3 | 60.0 | 15.0 |
| ETO | 55.8 | 62.0 | 14.2 |
| IPR | 57.1 | 63.5 | 13.8 |
| RLCD | 54.2 | 61.0 | 14.5 |
| NAT | 56.0 | 62.5 | 14.0 |
| ICTO (ours) | 59.5 | 66.0 | 12.5 |
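
One simple way to read this table alongside the in-distribution results is to compute how much of each method's in-distribution score is retained on OOD tasks. The retention ratio below is an assumed, illustrative metric rather than one reported in the tables; like the in-distribution comparison, it covers only WebShop and ScienceWorld (assumed higher-is-better), and `retention` is a hypothetical helper name.

```python
# (WebShop, ScienceWorld) scores copied from the in-distribution and OOD tables.
# Assumption: both benchmarks are higher-is-better; ALFWorld is left out.
IN_DIST = {
    "SFT": (63.1, 70.0), "ETO": (67.4, 72.3), "IPR": (68.3, 73.8),
    "RLCD": (65.8, 71.5), "NAT": (66.5, 72.0), "ICTO": (70.2, 75.6),
}
OOD = {
    "SFT": (52.3, 60.0), "ETO": (55.8, 62.0), "IPR": (57.1, 63.5),
    "RLCD": (54.2, 61.0), "NAT": (56.0, 62.5), "ICTO": (59.5, 66.0),
}
BENCHMARKS = ("WebShop", "ScienceWorld")


def retention(method):
    """Hypothetical metric: % of the in-distribution score kept on OOD tasks."""
    return {
        bench: round(100 * ood / ind, 1)
        for bench, ind, ood in zip(BENCHMARKS, IN_DIST[method], OOD[method])
    }


for method in IN_DIST:
    print(method, retention(method))
```

With these numbers ICTO retains the largest share of its in-distribution score on both benchmarks (about 84.8% on WebShop and 87.3% on ScienceWorld), with NAT close behind.
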

Ablation Study of ICTO Modules
| Training Scheme | WebShop | ScienceWorld (%) | ALFWorld |
|---|---|---|---|
| w/o Contrastive Learning | 64.2 | 67.8 | 11.6 |
| w/o Behavioral Cloning | 60.7 | 62.5 | 13.1 |
| Iteration=1 | 66.1 | 69.2 | 12.8 |
| Iteration=2 | 68.5 | 70.6 | 12.3 |
| Iteration=3 | 70.9 | 72.3 | 11.7 |
| Iteration=4 | 72.3 | 73.1 | 11.0 |
| Iteration=5 | 72.0 | 72.8 | 10.5 |