Abstract
Large Language Models (LLMs) are increasingly applied to domains that require
reasoning about other agents' behavior, such as negotiation, policy design, and
market simulation, yet existing research has mostly evaluated their adherence
to equilibrium play or their exhibited depth of reasoning. Whether they display
genuine strategic thinking, understood as the coherent formation of beliefs
about other agents, evaluation of possible actions, and choice based on those
beliefs, remains unexplored. We develop a framework to identify this ability by
disentangling beliefs, evaluation, and choice in static, complete-information
games, and apply it across a series of non-cooperative environments. By jointly
analyzing models' revealed choices and reasoning traces, and introducing a new
context-free game to rule out imitation from memorization, we show that current
frontier models exhibit belief-coherent best-response behavior at targeted
reasoning depths. When unconstrained, they self-limit their depth of reasoning
and form differentiated conjectures about human and synthetic opponents,
revealing an emergent form of meta-reasoning. Under increasing complexity,
explicit recursion gives way to internally generated heuristic rules of choice
that are stable, model-specific, and distinct from known human biases. These
findings indicate that belief coherence, meta-reasoning, and novel heuristic
formation can emerge jointly from language modeling objectives, providing a
structured basis for the study of strategic cognition in artificial agents.
Abstract
Unlike Business-to-Consumer e-commerce platforms (e.g., Amazon),
inexperienced individual sellers on Consumer-to-Consumer platforms (e.g., eBay)
often face significant challenges in setting prices for their second-hand
products efficiently. Therefore, numerous studies have been proposed for
automating price prediction. However, most of them are based on static
regression models, which suffer from poor generalization performance and fail
to capture market dynamics (e.g., the price of a used iPhone decreases over
time). Inspired by recent breakthroughs in Large Language Models (LLMs), we
introduce LLP, the first LLM-based generative framework for second-hand product
pricing. LLP first retrieves similar products to better align with the dynamic
market change. Afterwards, it leverages the LLMs' nuanced understanding of key
pricing information in free-form text to generate accurate price suggestions.
To strengthen the LLMs' domain reasoning over retrieved products, we apply a
two-stage optimization, supervised fine-tuning (SFT) followed by group relative
policy optimization (GRPO), on a dataset built via bidirectional reasoning.
Moreover, LLP employs a confidence-based filtering mechanism to reject
unreliable price suggestions. Extensive experiments demonstrate that LLP
substantially surpasses existing methods while generalizing well to unseen
categories. We have successfully deployed LLP on Xianyu\footnote\{Xianyu is
China's largest second-hand e-commerce platform.\}, significantly outperforming
the previous pricing method. Under the same 30\% product coverage, it raises
the static adoption rate (SAR) from 40\% to 72\%, and maintains a strong SAR of
47\% even at 90\% recall.