ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
Problem Statement
Most web agents today:
- Use proprietary, general-purpose LLMs (e.g., GPT-4)
- Rely on prompt engineering for task completion
- Lack deep understanding of HTML/DOM structure
- Are inefficient and expensive to serve
ScribeAgent proposes a new paradigm:
- Fine-tune open-source LLMs using real-world web workflow data (6B tokens)
- Use direct, single-stage generation instead of multi-stage pipelines
- Outperform GPT-4-based agents in both static and dynamic environments
Method Overview
Data Collection
Data is collected from the Scribe plugin, which captures the following for each step:
- Webpage URL
- HTML-DOM
- Action type: click, type, hotkey
- Natural language description of the action
- CSS selector of the target element
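To make the captured fields concrete, here is a minimal sketch of what one recorded step might look like as a data structure. The class name, field names, and sample values are hypothetical and chosen for illustration only; the actual Scribe schema is not described in this post.

```python
from dataclasses import dataclass

# The three action types listed above.
ACTION_TYPES = ("click", "type", "hotkey")


@dataclass(frozen=True)
class WorkflowStep:
    """One recorded step. Field names are illustrative, not Scribe's schema."""

    url: str           # webpage URL
    dom_html: str      # HTML-DOM snapshot at the time of the action
    action_type: str   # one of ACTION_TYPES
    description: str   # natural language description of the action
    css_selector: str  # CSS selector of the target element

    def __post_init__(self) -> None:
        # Reject action types that the list above does not mention.
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {self.action_type!r}")


# Made-up example values.
step = WorkflowStep(
    url="https://example.com/menu",
    dom_html='<button id="menu">Menu</button>',
    action_type="click",
    description='Click the "Menu" button',
    css_selector="#menu",
)
```

A workflow would then be an ordered list of such steps, about 11 on average according to the statistics in this section.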
Covers:
- 250+ websites
- 10,000+ subdomains
- Avg. 11 steps per task
- ~6B training tokens
Preprocessing
- Clean HTML using BeautifulSoup
- Remove irrelevant metadata/scripts
- Use heuristics (char-to-token ratio) to filter out noisy attributes
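The char-to-token heuristic can be sketched as follows. The idea is that machine-generated values such as hashes break into many short tokens, so their characters-per-token ratio is low. This sketch covers only the attribute filter, not the BeautifulSoup cleaning. The tokenizer here is a crude regex stand-in and the threshold of 2.0 is an illustrative assumption; a real pipeline would presumably use the model's own tokenizer and a tuned cutoff.

```python
from __future__ import annotations

import re

# Stand-in tokenizer (an assumption): letter runs stay whole, while every
# digit and punctuation mark becomes its own token.
_TOKEN_RE = re.compile(r"[A-Za-z]+|\d|[^A-Za-z\d\s]")


def char_to_token_ratio(text: str) -> float:
    """Average number of characters per token; 0.0 for empty text."""
    tokens = _TOKEN_RE.findall(text)
    if not tokens:
        return 0.0
    return len(text) / len(tokens)


def filter_attributes(attrs: dict[str, str], min_ratio: float = 2.0) -> dict[str, str]:
    """Keep attributes whose value has at least `min_ratio` chars per token.

    Note that empty values score 0.0 and are therefore dropped as well.
    """
    return {
        name: value
        for name, value in attrs.items()
        if char_to_token_ratio(value) >= min_ratio
    }


# Made-up attributes: the hash-like "data-id" fragments into single characters.
attrs = {"class": "btn-primary", "data-id": "a1b2c3d4e5", "aria-label": "Open menu"}
kept = filter_attributes(attrs)
print(kept)  # {'class': 'btn-primary', 'aria-label': 'Open menu'}
```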
Encode each action in a 5-line format:
Description: Click the "Menu" button
Action: mouse click action
Node: 832
Target: <svg class="..." node="832">
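A small sketch of how such an action could be serialized and parsed back, one field per line. It handles only the four labelled fields visible in the example above; the function names are hypothetical and this is not the authors' code.

```python
from __future__ import annotations

# Field labels taken from the example above, in the order they appear.
FIELDS = ("Description", "Action", "Node", "Target")


def format_action(description: str, action: str, node: int, target: str) -> str:
    """Serialize one action with one 'Label: value' pair per line."""
    values = (description, action, str(node), target)
    return "\n".join(f"{name}: {value}" for name, value in zip(FIELDS, values))


def parse_action(text: str) -> dict[str, str]:
    """Parse the serialized form back into a dict keyed by field label."""
    parsed: dict[str, str] = {}
    for line in text.strip().splitlines():
        # Split at the first ': ' only, so values may contain colons.
        name, sep, value = line.partition(": ")
        if sep and name in FIELDS:
            parsed[name] = value
    missing = [name for name in FIELDS if name not in parsed]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return parsed


text = format_action(
    'Click the "Menu" button', "mouse click action", 832, '<svg class="..." node="832">'
)
action = parse_action(text)
print(action["Node"])  # 832
```

Keeping the node id as its own field means a prediction can be checked against the target element without parsing the HTML snippet in the last line.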
Model Training
- Fine-tune using LoRA (parameter-efficient)
- Models evaluated: Mistral 7B, Qwen2.5 32B, Codestral 22B
- Best performing: Qwen2 and Qwen2.5 series
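To see why LoRA counts as parameter-efficient: it freezes a weight matrix W of shape d_out x d_in and trains only a low-rank update B·A, with B of shape d_out x r and A of shape r x d_in. The sketch below compares parameter counts for a single matrix. The dimensions and rank are illustrative assumptions, not the settings used in the paper.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in one dense weight matrix W (d_out x d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the LoRA factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not taken from the paper).
d_out, d_in, rank = 4096, 4096, 16

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
fraction = lora / full

print(full)      # 16777216
print(lora)      # 131072
print(fraction)  # 0.0078125
```

With these assumed sizes, the adapter trains under 1% of the parameters of the matrix it modifies.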
Experimental Results
Proprietary Dataset
- ScribeAgent significantly outperforms GPT-4 and GPT-4o
- 6× improvement in exact match accuracy after fine-tuning
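For reference, a step-level exact match metric could be computed roughly as below. This sketch assumes a prediction counts as correct only when both the action and the target node agree with the ground truth; the paper's precise definition may differ, and the sample data is made up.

```python
from __future__ import annotations


def exact_match_accuracy(
    predictions: list[dict[str, str]], references: list[dict[str, str]]
) -> float:
    """Fraction of steps where the predicted Action and Node both match."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred["Action"] == ref["Action"] and pred["Node"] == ref["Node"]
    )
    return hits / len(references)


# Made-up data: the second prediction picks the wrong node.
references = [
    {"Action": "mouse click action", "Node": "832"},
    {"Action": "mouse click action", "Node": "17"},
    {"Action": "keyboard type action", "Node": "45"},
]
predictions = [
    {"Action": "mouse click action", "Node": "832"},
    {"Action": "mouse click action", "Node": "18"},
    {"Action": "keyboard type action", "Node": "45"},
]

accuracy = exact_match_accuracy(predictions, references)
print(round(accuracy, 3))  # 0.667
```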
Mind2Web Benchmark (Static)
- Zero-shot ScribeAgent beats all existing baselines
- Good generalization across unseen tasks and domains
WebArena Benchmark (Dynamic)
- ScribeAgent + GPT-4o forms a multi-agent system for end-to-end task execution
- Outperforms GPT-4-based agents in all categories
- Improves task success rate by 14.1% over the previous SOTA (WebPilot)
Key Contributions
- First LLM agent fine-tuned on large-scale, real-world workflow data
- Outperforms prompt-based GPT-4 agents in navigation and planning
- Full ablation studies on model selection, context length, and dataset size
- Reduced inference cost using smaller open-source models
Limitations and Future Work
- Long DOMs still strain the context window; the authors plan to add a memory module
- No planning module included yet
- Aim to extend to multi-modal and multilingual settings