Stable AI & Tsinghua University
Abstract
We argue that progress toward general intelligence requires complementary
foundation models grounded in language, the physical world, and structured
data. This report presents LimiX, the first installment of our large
structured-data models (LDMs). LimiX treats structured data as a joint
distribution over variables and missingness, thus capable of addressing a wide
range of tabular tasks through query-based conditional prediction via a single
model. LimiX is pretrained using masked joint-distribution modeling with an
episodic, context-conditional objective, where the model predicts for query
subsets conditioned on dataset-specific contexts, supporting rapid,
training-free adaptation at inference. We evaluate LimiX across 10 large
structured-data benchmarks spanning broad regimes of sample size, feature
dimensionality, class count, categorical-to-numerical feature ratio,
missingness, and sample-to-feature ratio. With a single model and a unified
interface, LimiX consistently surpasses strong baselines including
gradient-boosting trees, deep tabular networks, recent tabular foundation
models, and automated ensembles, as shown in Figure 1 and Figure 2. This
advantage holds across a wide range of tasks, including classification,
regression, missing-value imputation, and data generation, often by substantial
margins, while avoiding task-specific architectures and bespoke per-task
training. All LimiX models are publicly available under the Apache 2.0 license.
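To make the query-based, training-free prediction interface concrete, here is a minimal sketch in Python. All dataset-specific information enters through a context set passed at inference time, with no parameter updates; a distance-weighted vote stands in for the pretrained network purely to keep the example self-contained, and `predict_from_context` is an illustrative name, not the released LimiX API.

```python
import numpy as np

def predict_from_context(X_ctx, y_ctx, X_qry, temperature=1.0):
    """Class probabilities for query rows, conditioned only on the context set."""
    classes = np.unique(y_ctx)
    # Pairwise squared distances between query rows and context rows.
    d2 = ((X_qry[:, None, :] - X_ctx[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / temperature)                 # similarity weights
    w /= w.sum(axis=1, keepdims=True)
    onehot = (y_ctx[None, :, None] == classes[None, None, :]).astype(float)
    probs = (w[:, :, None] * onehot).sum(axis=1)  # context-conditioned mixture
    return classes, probs

# Usage: a two-class context, then queries answered with no training step.
rng = np.random.default_rng(0)
X_ctx = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(3, 1, (20, 4))])
y_ctx = np.array([0] * 20 + [1] * 20)
X_qry = rng.normal(1.5, 1, (5, 4))
classes, probs = predict_from_context(X_ctx, y_ctx, X_qry)
print(classes, probs.round(2))
```

The point of the sketch is the shape of the interface: swapping in a different context set changes the predictions immediately, which is what "training-free adaptation at inference" refers to in the abstract.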
AI Insights
- LimiX models structured data as a joint distribution over variables and missingness, enabling unified inference.
- Its masked joint-distribution pretraining uses an episodic, context-conditional objective that predicts any query subset on the fly (see the sketch after this list).
- The model supports rapid, training-free adaptation at inference, simply by conditioning on a dataset-specific context.
- Across ten diverse benchmarks, LimiX outperforms gradient‑boosting trees, deep tabular nets, and recent foundation models.
- It handles classification, regression, missing‑value imputation, and data generation with a single architecture.
- The approach explicitly incorporates missingness into the probability model, improving robustness to sparse data.
- All LimiX checkpoints are released under Apache 2.0, inviting community experimentation.
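The masked joint-distribution pretraining objective mentioned above can likewise be sketched in a few lines. This is a toy episode assuming a PyTorch setup: a tiny row-wise MLP (`ToyEpisodicModel`) stands in for the real architecture, and `episode_loss` only illustrates the idea of masking a random query subset of observed cells and scoring predictions for exactly those cells, conditioned on the remaining visible context.

```python
import torch
import torch.nn as nn

class ToyEpisodicModel(nn.Module):
    """Stand-in predictor: each cell is encoded as (value, visible-mask, query-mask)."""
    def __init__(self, n_features, d_hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * n_features, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, n_features),  # one predicted value per cell
        )

    def forward(self, values, visible, queried):
        x = torch.cat([values * visible, visible, queried], dim=-1)
        return self.net(x)

def episode_loss(model, table, missing, query_frac=0.3):
    """One episode: hide a random subset of observed cells and reconstruct them."""
    observed = (~missing).float()
    queried = (torch.rand_like(observed) < query_frac) * observed   # cells to predict
    visible = observed * (1.0 - queried)                            # context cells
    preds = model(table, visible, queried)
    # Loss is taken only on the masked (query) cells.
    return ((preds - table) ** 2 * queried).sum() / queried.sum().clamp(min=1.0)

# Usage: a synthetic 128-row, 8-feature table with 10% missing entries.
torch.manual_seed(0)
table = torch.randn(128, 8)
missing = torch.rand(128, 8) < 0.10
model = ToyEpisodicModel(n_features=8)
loss = episode_loss(model, table, missing)
loss.backward()
```

Because missingness lives in the masks rather than the values, the same episode covers classification and regression targets (treated as masked columns) as well as missing-value imputation, which matches the unified-interface claim in the abstract.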
University of California
Abstract
Artificial intelligence is often measured by the range of tasks it can
perform. Yet wide ability without depth remains only an imitation. This paper
proposes a Structural-Generative Ontology of Intelligence: true intelligence
exists only when a system can generate new structures, coordinate them into
reasons, and sustain its identity over time. These three conditions --
generativity, coordination, and sustaining -- define the depth that underlies
real intelligence. Current AI systems, however broad in function, remain
surface simulations because they lack this depth. Breadth is not the source of
intelligence but the growth that follows from depth. If future systems were to
meet these conditions, they would no longer be mere tools, but could be seen as
a possible Second Being, standing alongside yet distinct from human existence.
AI Insights
- The paper flags three AI confusions: equating imitation with being, hiding structure origins, and treating intelligence as engineering.
- It urges an ontological shift to a philosophically rigorous yet empirically testable framework.
- Generativity is the creation of new categories that open a world.
- Coordination integrates those categories into a normative space of reasons.
- Sustaining keeps generativity and coordination alive over time, forming a historical subject.
- Breadth alone gives coverage; without coordination it fragments; without sustaining it is episodic.
- Suggested readings: Bostrom’s *Superintelligence*, Floridi’s *The Fourth Revolution*, Russell’s *Human Compatible*.