Abstract
Modern auto-bidding systems are required to balance overall performance with
diverse advertiser goals and real-world constraints, reflecting the dynamic and
evolving needs of the industry. Recent advances in conditional generative
models, such as transformers and diffusion models, have enabled direct trajectory
generation tailored to advertiser preferences, offering a promising alternative
to traditional Markov Decision Process-based methods. However, these generative
methods face significant challenges, such as the distribution shift between
offline and online environments, limited exploration of the action space, and
the necessity to meet constraints like marginal Cost-per-Mille (CPM) and Return
on Investment (ROI). To tackle these challenges, we propose GRAD (Generative
Reward-driven Ad-bidding with Mixture-of-Experts), a scalable foundation model
for auto-bidding that combines an Action-Mixture-of-Experts module for diverse
bidding action exploration with the Value Estimator of Causal Transformer for
constraint-aware optimization. Extensive offline and online experiments
demonstrate that GRAD significantly enhances platform revenue, highlighting its
effectiveness in addressing the evolving and diverse requirements of modern
advertisers. Furthermore, GRAD has been deployed in multiple marketing
scenarios at Meituan, one of the world's largest online food delivery
platforms, leading to a 2.18% increase in Gross Merchandise Value (GMV) and a
10.68% increase in ROI.
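To make the constraint terminology above concrete, the following minimal sketch computes CPM and ROI for a finished bidding trajectory and checks them against thresholds. It is an illustration only, not part of GRAD: the `CampaignStats` type, the function names, and the threshold values are hypothetical, ROI is assumed to mean attributed value divided by cost, and the marginal (per-increment) form of CPM mentioned in the abstract is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    """Hypothetical aggregate of one bidding trajectory."""
    cost: float        # total spend
    impressions: int   # impressions won
    value: float       # attributed value, e.g. GMV


def cpm(stats: CampaignStats) -> float:
    # Cost-per-Mille: spend per 1000 impressions.
    if stats.impressions == 0:
        return 0.0
    return 1000.0 * stats.cost / stats.impressions


def roi(stats: CampaignStats) -> float:
    # Assumption: ROI is value / cost (other definitions subtract cost first).
    if stats.cost == 0:
        return float("inf")
    return stats.value / stats.cost


def satisfies_constraints(stats: CampaignStats, max_cpm: float, min_roi: float) -> bool:
    # A trajectory is feasible when CPM stays under its cap and ROI over its floor.
    return cpm(stats) <= max_cpm and roi(stats) >= min_roi


example = CampaignStats(cost=50.0, impressions=10_000, value=200.0)
print(cpm(example), roi(example), satisfies_constraints(example, max_cpm=6.0, min_roi=3.0))
```

A constraint-aware generator would need such a check, or a learned estimate of it, to score candidate trajectories before committing to a bid.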
Abstract
Bid shading plays a crucial role in Real-Time Bidding~(RTB) by adaptively
adjusting bids so that advertisers do not overspend. Existing mainstream
two-stage methods, which first model bid landscapes and then optimize surplus
using operations research techniques, are constrained by unimodal assumptions
that fail to adapt to non-convex surplus curves and are vulnerable to
cascading errors in sequential workflows. Additionally, existing discretization
models of continuous values ignore the dependence between discrete intervals,
reducing the model's error correction ability, while sample selection bias in
bidding scenarios presents further challenges for prediction. To address these
issues, this paper introduces Generative Bid Shading~(GBS), which comprises two
primary components: (1) an end-to-end generative model that utilizes an
autoregressive approach to generate shading ratios by stepwise residuals,
capturing complex value dependencies without relying on predefined priors; and
(2) a reward preference alignment system, which incorporates a channel-aware
hierarchical dynamic network~(CHNet) as the reward model to extract
fine-grained features, along with modules for surplus optimization and
exploration utility reward alignment, ultimately optimizing both short-term and
long-term surplus using group relative policy optimization~(GRPO). Extensive
offline experiments and online A/B tests validate GBS's effectiveness.
Moreover, GBS has been deployed on the Meituan DSP platform, serving billions
of bid requests daily.
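The two-stage baseline criticized above can be sketched in a few lines, which also shows why a unimodal assumption can fail. This is not the GBS method. It assumes a first-price auction, where expected surplus is (value - bid) times the probability of winning at that bid, and it uses a hypothetical bimodal bid landscape (an equal mixture of two normal distributions over the highest competing bid). All names and parameters are illustrative.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def win_prob(bid: float) -> float:
    # Stage 1 (assumed already fitted): bid landscape P(win | bid).
    # Hypothetical mixture: competitors cluster around 2.0 and around 6.0.
    return 0.5 * normal_cdf(bid, 2.0, 0.3) + 0.5 * normal_cdf(bid, 6.0, 0.3)


def expected_surplus(bid: float, value: float) -> float:
    # First-price auction: the winner pays its own bid.
    return (value - bid) * win_prob(bid)


def best_shaded_bid(value: float, steps: int = 1000) -> float:
    # Stage 2: exhaustive grid search over [0, value], which does not
    # rely on the surplus curve having a single peak.
    grid = [value * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda b: expected_surplus(b, value))


value = 10.0
bid = best_shaded_bid(value)
print("shaded bid:", bid, "shading ratio:", bid / value)
```

With this landscape the surplus curve has two peaks, one just above each competitor cluster, separated by a valley. An optimizer that assumes a single peak and starts near the higher cluster can settle on the lower-surplus peak, and any error in the fitted landscape carries straight into the second stage.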