Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
### 1. Data collection and pipeline construction
- Collect the operation histories of real users in domains such as coding, document creation, and daily life, and obtain logs (events) of the environment and user behavior from them.
- Using a large language model (such as GPT-4), build a virtual "Environment Gym" based on these logs.
- In the gym, generate a large amount of training data by simulating user actions (via a User Agent), the agent's proactive proposals, and the user's acceptance or rejection of them.
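The simulation loop described above can be sketched roughly as follows. This is a toy illustration, not the paper's code: `llm_propose_task` and `user_agent_judges` are hypothetical stand-ins for the LLM proposer and the simulated User Agent.

```python
# Hypothetical sketch of the data-generation loop on the "Environment Gym".
# All function names and the event format are illustrative assumptions.
import random

def llm_propose_task(events):
    """Stand-in for an LLM call that proposes a task (or None) given the event history."""
    return "summarize recent edits" if len(events) >= 3 else None

def user_agent_judges(task):
    """Stand-in for the simulated User Agent's accept/reject judgment."""
    return random.random() < 0.5

def generate_training_data(event_stream):
    """Replay events, let the agent propose tasks, and record accept/reject labels."""
    history, samples = [], []
    for event in event_stream:
        history.append(event)
        proposal = llm_propose_task(history)
        if proposal is not None:
            label = "accept" if user_agent_judges(proposal) else "reject"
            samples.append({"events": list(history), "proposal": proposal, "label": label})
    return samples

events = ["open file", "edit function", "run tests", "tests fail", "edit again"]
data = generate_training_data(events)
```

Each recorded sample pairs an event history with a proposal and its simulated user verdict, which is the kind of (history, proposal, label) triple the training set needs.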
### 2. Building ProactiveBench
- A dataset called "ProactiveBench" was created with the above pipeline (6,790 events in total).
- It includes a variety of event histories and proactive task suggestions (accepted or rejected) for training.
- A test set (233 events) built from actual user behavior logs is also provided.
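A ProactiveBench-style record would plausibly look like the sketch below. The field names are my assumptions for illustration; the paper's actual schema may differ.

```python
# Illustrative shape of one ProactiveBench-style training record.
# Field names are assumptions, not the dataset's actual schema.
from dataclasses import dataclass

@dataclass
class ProactiveExample:
    events: list          # ordered history of observed user/environment events
    proposed_task: str    # the agent's proactive suggestion
    label: str            # "accept" or "reject" from the (simulated) user

train_example = ProactiveExample(
    events=["user opens report.docx", "user types an outline"],
    proposed_task="offer to expand the outline into sections",
    label="accept",
)
```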
### 3. Reward model
- Human annotators label each task proposed by the agent as "accept (it was needed)" or "reject (it was unnecessary)".
- A reward model is trained to mimic this human judgment and automatically assess the appropriateness (acceptability) of tasks the agent proposes.
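The interface of such a reward model can be shown with a deliberately minimal sketch. The real reward model is a fine-tuned LLM; here a toy keyword scorer stands in purely to show the train-on-labels / score-new-proposals flow (all names are illustrative).

```python
# Toy stand-in for a reward model trained on human accept/reject labels.
# A real implementation would fine-tune an LLM; this keyword scorer only
# illustrates the interface.
def train_reward_model(labeled):
    """Score each word by how often it appears in accepted vs. rejected proposals."""
    scores = {}
    for proposal, label in labeled:
        for word in proposal.lower().split():
            scores[word] = scores.get(word, 0) + (1 if label == "accept" else -1)
    return scores

def reward(model, proposal):
    """Higher scores mean the proposal looks more like previously accepted ones."""
    return sum(model.get(w, 0) for w in proposal.lower().split())

labeled = [
    ("fix the failing test", "accept"),
    ("reformat the whole repo", "reject"),
    ("fix the typo", "accept"),
]
model = train_reward_model(labeled)
```

A proposal can then be auto-judged by thresholding `reward(model, proposal)` at zero, mirroring the accept/reject decision the human annotators made.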
### 4. Experiments and evaluation
- Open-source LLMs such as LLaMA and Qwen were fine-tuned on the generated data.
- The F1 score for appropriate proactive task proposals improved from about 55% to around 66%.
- Closed-source LLMs such as GPT-4 and Claude were also evaluated, and in some cases the fine-tuned open-source models matched or exceeded their results.
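For reference, an F1 score over accept/reject predictions of this kind can be computed as below, treating "the user would accept this proposal" as the positive class (how the paper aggregates its metric is an assumption here).

```python
# Standard F1 over binary accept/reject predictions, with "accept" as the
# positive class. Treating acceptance as positive is an assumption about
# how the benchmark scores proposals.
def f1_score(y_true, y_pred, positive="accept"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = ["accept", "reject", "accept", "accept", "reject"]
y_pred = ["accept", "accept", "accept", "reject", "reject"]
score = f1_score(y_true, y_pred)  # precision = recall = 2/3, so F1 = 2/3
```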
--
This paper presents a framework for evolving large language models from "reactive" responders into "proactive" assistants that anticipate and suggest tasks. Its main components are a mechanism for generating large amounts of simulation data, and the use of that data to fine-tune agents and train reward models, so that the agent can read user behavior and offer task assistance at useful moments. While the experiments show significant performance improvements, challenges remain for real-world deployment, such as getting the timing of suggestions right and curbing frequent false positives.
This page is auto-translated from [/nishio/Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance](https://scrapbox.io/nishio/Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.