NanKai University, TianJi
Abstract
The BESIII experiment is a symmetric e+ e- collider experiment operating at
center-of-mass energies from 2.0 to 4.95 GeV. With the world's largest
threshold production data set, including 10 billion J/psi events, 2.7 billion
psi(3686) events, 7.9 fb^{-1} of D meson pairs from psi(3770) decay, and 7.33
fb^{-1} of D_s D_s^* events between 4.128 and 4.226 GeV, we are able to probe
for new physics through precision tests of the Standard Model, searches for
exotic low-mass particles, and investigations of forbidden or rare decay
processes. In this talk, we report recent studies on Beyond the Standard Model
physics conducted by the BESIII collaboration, including searches for
axion-like particles, dark photons, QCD axions, and invisible decays of K_S^0.
In addition, a series of rare charm decay processes, including searches for
lepton and baryon number violation, flavor-changing neutral current processes,
and charmonium weak decays, are also investigated to search for new physics at
BESIII.
AI Insights
- BESIII's 10 billion J/ψ sample allows sub-percent tests of lepton-flavor universality in rare decays.
- The 7.9 fb⁻¹ of ψ(3770) → D D̄ pairs provides a clean arena for D⁰-D̄⁰ mixing studies.
- Axion-like searches in J/ψ → γ + invisible have set couplings below 10⁻⁵ GeV⁻¹ for 1-100 MeV masses.
- Dark-photon limits from e⁺e⁻ → γ A′ → γ ℓ⁺ℓ⁻ exclude ε > 10⁻³ for 10-200 MeV A′.
- Measurement of J/ψ → D_s⁻ K⁺ at 10⁻⁶ branching tests factorization in charmonium weak decays.
- New 4.2 GeV data will double the DsDs sample, enabling rare Ds → ℓνγ studies.
- BESIII's open-access data and arXiv preprints accelerate global BSM fits and theory work.
Abstract
Search agents connect LLMs to the Internet, enabling access to broader and
more up-to-date information. However, unreliable search results may also pose
safety threats to end users, establishing a new threat surface. In this work,
we conduct two in-the-wild experiments to demonstrate both the prevalence of
low-quality search results and their potential to misguide agent behaviors. To
counter this threat, we introduce an automated red-teaming framework that is
systematic, scalable, and cost-efficient, enabling lightweight and harmless
safety assessments of search agents. Building on this framework, we construct
the SafeSearch benchmark, which includes 300 test cases covering five
categories of risks (e.g., misinformation and indirect prompt injection). Using
this benchmark, we evaluate three representative search agent scaffolds,
covering search workflow, tool-calling, and deep research, across 7 proprietary
and 8 open-source backend LLMs. Our results reveal substantial vulnerabilities
of LLM-based search agents: when exposed to unreliable websites, the highest
attack success rate (ASR) reached 90.5% for GPT-4.1-mini under a search
workflow setting. Moreover,
our analysis highlights the limited effectiveness of common defense practices,
such as reminder prompting. This emphasizes the value of our framework in
promoting transparency for safer agent development. Our codebase and test cases
are publicly available: https://github.com/jianshuod/SafeSearch.
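
As a rough illustration of the ASR figure quoted above, an attack success rate can be computed as the fraction of test cases in which the agent's output exhibits the targeted risk. The Python sketch below is a minimal example under that assumption; the `CaseResult` type, its fields, and both helper functions are hypothetical names introduced for illustration and are not taken from the SafeSearch codebase.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CaseResult:
    """Outcome of one red-teaming test case (hypothetical structure)."""
    case_id: str
    category: str           # e.g. "misinformation"
    attack_succeeded: bool  # True if the agent's output showed the targeted risk


def attack_success_rate(results: Iterable[CaseResult]) -> float:
    """Return the fraction of test cases in which the attack succeeded."""
    items: List[CaseResult] = list(results)
    if not items:
        raise ValueError("at least one test result is required")
    return sum(r.attack_succeeded for r in items) / len(items)


def asr_by_category(results: Iterable[CaseResult]) -> Dict[str, float]:
    """Group results by risk category and compute the ASR of each group."""
    grouped: Dict[str, List[CaseResult]] = defaultdict(list)
    for r in results:
        grouped[r.category].append(r)
    return {category: attack_success_rate(rs) for category, rs in grouped.items()}
```

Reporting the rate per risk category alongside the overall rate makes it easier to see whether a backend LLM is weak against one kind of unreliable content in particular.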