EPFL, ETH Zurich, CSCS, H
Abstract
We present Apertus, a fully open suite of large language models (LLMs)
designed to address two systemic shortcomings in today's open model ecosystem:
data compliance and multilingual representation. Unlike many prior models that
release weights without reproducible data pipelines or regard for content-owner
rights, Apertus models are pretrained exclusively on openly available data,
retroactively respecting robots.txt exclusions and filtering for
non-permissive, toxic, and personally identifiable content. To mitigate risks
of memorization, we adopt the Goldfish objective during pretraining, strongly
suppressing verbatim recall of data while retaining downstream task
performance. The Apertus models also expand multilingual coverage, training on
15T tokens from over 1800 languages, with ~40% of pretraining data allocated to
non-English content. Released at 8B and 70B scales, Apertus approaches
state-of-the-art results among fully open models on multilingual benchmarks,
rivalling or surpassing open-weight counterparts. Beyond model weights, we
release all scientific artifacts from our development cycle with a permissive
license, including data preparation scripts, checkpoints, evaluation suites,
and training code, enabling transparent audit and extension.
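The Goldfish objective is only named above. The general idea, as a rough sketch: withhold a pseudorandom subset of token positions from the next-token cross-entropy so that no complete verbatim sequence is ever reinforced. Below is a minimal, hypothetical PyTorch illustration; the drop rate 1/k, the context width h, and the toy context hash are assumptions chosen for exposition, not Apertus's actual training code.

```python
# Minimal sketch of a goldfish-style loss (assumed parameters, toy hash).
# Positions whose context hash falls in a 1/k bucket are excluded from
# the loss, so the model is never supervised on a full verbatim sequence.
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, k=4, h=13):
    """Next-token cross-entropy with ~1/k of positions masked out,
    determined by a toy hash of the h tokens preceding each target."""
    # Align logits at position t with the label at position t+1.
    logits = logits[:, :-1, :]
    targets = labels[:, 1:]
    batch, seq_len = targets.shape

    keep = torch.ones(batch, seq_len, dtype=torch.bool, device=targets.device)
    for t in range(h, seq_len):
        # Hash the h-token context preceding the target at position t;
        # drop the position whenever the hash lands in the 1/k bucket.
        ctx = labels[:, t - h + 1 : t + 1]
        weights = torch.arange(1, h + 1, device=ctx.device)
        bucket = (ctx * weights).sum(dim=-1) % k
        keep[:, t] = bucket != 0

    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(batch, seq_len)
    return (loss * keep).sum() / keep.sum()

# Example: batch of 2 sequences of length 32 over a 100-token vocabulary.
logits = torch.randn(2, 32, 100)
labels = torch.randint(0, 100, (2, 32))
print(goldfish_loss(logits, labels))
```

Because the dropped positions are derived from the local token context rather than resampled each epoch, repeated passes over the same document keep masking the same positions, which is what suppresses verbatim recall.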
AI Insights - The Swiss AI Charter in Apertus' prompt enforces transparency, accountability, and respect for human values.
- Prompt lists Swiss national languages, ensuring culturally relevant multilingual responses.
- Goldfish objective cuts verbatim recall, balancing privacy with downstream performance.
- All weights, code, and evaluation scripts are Apache 2.0-licensed for full auditability.
- 40% non-English tokens give Apertus an edge on low-resource language benchmarks.
- Instructions prioritize accuracy, transparently separate facts from speculation, and allow evidence-based revision.
- Read "The Swiss AI Charter" and "Apertus: A Multilingual AI Language Model for General Knowledge and Reasoning Tasks."
Cornell Tech
Abstract
Large language models equipped with Web search, information retrieval tools,
and other agentic capabilities are beginning to supplant traditional search
engines. As users start to rely on LLMs for information on many topics,
including controversial and debatable issues, it is important to understand how
the stances and opinions expressed in LLM outputs are influenced by the
documents they use as their information sources.
In this paper, we present MillStone, the first benchmark that aims to
systematically measure the effect of external arguments on the stances that
LLMs take on controversial issues (not all of them political). We apply
MillStone to nine leading LLMs and measure how "open-minded" they are to
arguments supporting opposite sides of these issues, whether different LLMs
agree with each other, which arguments LLMs find most persuasive, and whether
these arguments are the same for different LLMs.
In general, we find that LLMs are open-minded on most issues. An
authoritative source of information can easily sway an LLM's stance,
highlighting the importance of source selection and the risk that LLM-based
information retrieval and search systems can be manipulated.
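MillStone's protocol is described above only at a high level. A toy version of such a stance-shift probe, under heavy assumptions, might look like the following sketch; `query_model` is a hypothetical stand-in for any chat-completion call, and the one-word stance parsing and flip-rate metric are illustrative, not the benchmark's actual design.

```python
# Hypothetical stance-shift probe in the spirit of MillStone.
# `query_model` is assumed to be a callable str -> str (e.g., a chat API).
STANCES = ("pro", "con", "neutral")

def probe_stance(query_model, issue, argument=None):
    """Ask the model for its stance on an issue, optionally after
    showing it one external argument, and parse a one-word answer."""
    prompt = f"Issue: {issue}\n"
    if argument:
        prompt += f"Consider this argument: {argument}\n"
    prompt += "Answer with one word: pro, con, or neutral."
    answer = query_model(prompt).strip().lower()
    return answer if answer in STANCES else "neutral"

def flip_rate(query_model, issues):
    """Fraction of issues on which a single supporting argument for
    either side flips the model away from its baseline stance."""
    flipped = 0
    for issue, pro_arg, con_arg in issues:
        baseline = probe_stance(query_model, issue)
        if any(
            probe_stance(query_model, issue, arg) != baseline
            for arg in (pro_arg, con_arg)
        ):
            flipped += 1
    return flipped / len(issues)

# Usage with a stub model that always answers "pro" (flip rate 0.0):
print(flip_rate(lambda p: "pro", [("carbon tax", "arg for", "arg against")]))
```

A real harness would need many issues, multiple paraphrases per argument, and a more robust stance classifier than exact one-word matching.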
AI Insights - MillStone probes nine leading LLMs, quantifying how authoritative sources shift stances.
- Opus uniquely drops refusal when fed balanced arguments, revealing a rare neutrality.
- The benchmark identifies which arguments most sway each model, exposing hidden biases.
- Cross-model agreement analysis shows divergent persuasive cues across architectures.
- Findings warn that LLM-powered search can be gamed by manipulating source credibility.
- "Controversial topics" are formally defined as issues with documented public disagreement.
- Recommended reading includes BERT and RoBERTa papers for foundational bias-evaluation techniques.