Abstract
Data lakehouses run sensitive workloads, where AI-driven automation raises
concerns about trust, correctness, and governance. We argue that API-first,
programmable lakehouses provide the right abstractions for safe-by-design,
agentic workflows. Using Bauplan as a case study, we show how data branching
and declarative environments extend naturally to agents, enabling
reproducibility and observability while reducing the attack surface. We present
a proof-of-concept in which agents repair data pipelines using correctness
checks inspired by proof-carrying code. Our prototype demonstrates that
untrusted AI agents can operate safely on production data and outlines a path
toward a fully agentic lakehouse.
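The safety argument above can be made concrete as a branch-isolate-verify-merge loop. The sketch below is a toy in-memory stand-in, not Bauplan's actual API: the agent's repair runs on a disposable branch, and the fix is merged into `main` only if an explicit correctness check passes, so untrusted edits never touch production data.

```python
# Toy sketch of branch-isolate-verify-merge for an untrusted agent.
# Lakehouse here is an in-memory stand-in (hypothetical, not Bauplan's SDK).
import copy

class Lakehouse:
    def __init__(self, data):
        self.refs = {"main": data}

    def create_branch(self, from_ref):
        name = f"branch-{len(self.refs)}"
        # Real lakehouses do zero-copy snapshots; we deep-copy for the toy.
        self.refs[name] = copy.deepcopy(self.refs[from_ref])
        return name

    def merge(self, branch, into):
        self.refs[into] = self.refs[branch]

    def delete_branch(self, branch):
        self.refs.pop(branch, None)

def repair_with_agent(lake, fix, check):
    """Apply `fix` on an isolated branch; merge only if `check` passes."""
    branch = lake.create_branch(from_ref="main")
    try:
        fix(lake.refs[branch])            # agent mutates only the branch
        if check(lake.refs[branch]):      # proof-carrying-code-style gate
            lake.merge(branch, into="main")
            return True
        return False                      # rejected: main is untouched
    finally:
        lake.delete_branch(branch)        # the branch is disposable

# A broken table with a null that the "agent" repairs:
lake = Lakehouse({"rows": [1, None, 3]})
ok = repair_with_agent(
    lake,
    fix=lambda d: d.update(rows=[r if r is not None else 0 for r in d["rows"]]),
    check=lambda d: all(r is not None for r in d["rows"]),
)
print(ok, lake.refs["main"])  # True {'rows': [1, 0, 3]}
```

The key design point is that the check, not the agent, is trusted: a failed check leaves `main` byte-for-byte unchanged.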
Meta Superintelligence Labs, FAIR
Abstract
A long-term goal of language agents is to learn and improve through their own
experience, ultimately outperforming humans in complex, real-world tasks.
However, training agents from experience data with reinforcement learning
remains difficult in many environments, which either lack verifiable rewards
(e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn
tool use). As a result, most current agents rely on supervised fine-tuning on
expert data, which is challenging to scale and generalizes poorly. This
limitation stems from the nature of expert demonstrations: they capture only a
narrow range of scenarios and expose the agent to limited environment
diversity. We address this limitation with a middle-ground paradigm we call
early experience: interaction data generated by the agent's own actions, where
the resulting future states serve as supervision without reward signals. Within
this paradigm we study two strategies of using such data: (1) Implicit world
modeling, which uses collected states to ground the policy in environment
dynamics; and (2) Self-reflection, where the agent learns from its suboptimal
actions to improve reasoning and decision-making. We evaluate across eight
diverse environments and multiple model families. Our approaches consistently
improve effectiveness and out-of-domain generalization, highlighting the value
of early experience. Moreover, in environments with verifiable rewards, our
results provide promising signals that early experience offers a strong
foundation for subsequent reinforcement learning, positioning it as a practical
bridge between imitation learning and fully experience-driven agents.
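The early-experience data-collection step can be sketched as follows. Names and the environment are illustrative, not the paper's code: the agent takes its own actions, and the observed future states become (state, action) → next-state supervision for implicit world modeling, with no reward signal required.

```python
# Illustrative sketch of early-experience collection: self-generated rollouts
# where future states serve as supervision (no rewards). All names assumed.
import random

def collect_early_experience(env_step, start_state, actions, horizon, seed=0):
    rng = random.Random(seed)
    state, dataset = start_state, []
    for _ in range(horizon):
        action = rng.choice(actions)          # the agent's own, possibly bad, action
        next_state = env_step(state, action)  # the future state is free supervision
        dataset.append({"state": state, "action": action, "target": next_state})
        state = next_state
    return dataset

# A trivial counter environment: actions shift the state.
step = lambda s, a: s + a
data = collect_early_experience(step, start_state=0, actions=[-1, +1], horizon=5)
# Each (state, action, target) pair grounds the policy in environment
# dynamics; an implicit world model is trained to predict `target`.
```

Because supervision comes from the environment's own transitions, this scales to settings like websites where verifiable rewards are unavailable.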
AI Insights - WebArena and its lightweight subset WebArena-Lite benchmark realistic web-interaction tasks.
- Llama-3.2-3B-Instruct, Qwen-2.5-7B-Instruct, and GPT-4-Turbo all benefit from early-experience methods.
- Implicit world-modeling trains a latent dynamics predictor from self-generated states, enabling reward-free anticipation.
- Self-reflection replays suboptimal actions, letting the agent learn corrective policies from its own traces.
- Experiment code is hosted at https://github.com/your-repo-name/webarena-benchmark for quick replication.
- Key papers include "WebArena: A Benchmark for Evaluating Web Interaction Capabilities" and "Implicit World Modeling and Self-Reflection for Improving AI Performance on Web Tasks."
- The study shows early experience bridges imitation learning and full-experience RL, boosting out-of-domain generalization.
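The self-reflection strategy mentioned above can be sketched as a data-construction step. The structure is assumed for illustration, not taken from the paper's code: wherever the agent's action differs from a better alternative, a corrective rationale plus the better action becomes a fine-tuning target.

```python
# Illustrative sketch of building self-reflection training pairs from
# suboptimal actions. All field names and the reflector are assumptions.

def build_reflection_examples(traces, reflect):
    examples = []
    for t in traces:
        if t["agent_action"] == t["better_action"]:
            continue  # only mistakes yield corrective signal
        reflection = reflect(t["state"], t["agent_action"], t["better_action"])
        examples.append({
            "prompt": f"State: {t['state']}\nWhat should you do?",
            "target": f"{reflection}\nAction: {t['better_action']}",
        })
    return examples

# A stub "reflector"; a real system would prompt an LLM here.
reflect = lambda s, bad, good: f"Taking {bad} in state {s} fails; {good} is better."
traces = [
    {"state": "login-page", "agent_action": "click_ad", "better_action": "fill_form"},
    {"state": "cart", "agent_action": "checkout", "better_action": "checkout"},
]
examples = build_reflection_examples(traces, reflect)
# Only the first trace (a mistake) becomes a (prompt, reflection + action) pair.
```

Fine-tuning on such pairs teaches the policy to reason about why its own action was suboptimal before emitting the corrected one.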