Tsinghua University, Tsue
Abstract
Scientific discovery drives progress across disciplines, from fundamental
physics to industrial applications. However, automatically discovering physical
laws from collected data requires identifying both the structure and the
parameters of the underlying formula, which means navigating a vast search
space at substantial computational cost. To address
these issues, we build on the Buckingham $\Pi$ theorem and Taylor's theorem to
create a unified representation of diverse formulas, which introduces latent
variables to form a two-stage structure. To minimize the search space, we
initially focus on determining the structure of the latent formula, including
the relevant contributing inputs, the count of latent variables, and their
interconnections. Following this, the process of parameter identification is
expedited by enforcing dimensional constraints for physical relevance, favoring
simplicity in the formulas, and employing strategic optimization techniques.
Overly complex results are then simplified into a compact form using symbolic
regression. These general strategic techniques drastically reduce search iterations
from hundreds of millions to just tens, significantly enhancing the efficiency
of data-driven formula discovery. We performed comprehensive validation to
demonstrate FIND's effectiveness in discovering physical laws, dimensionless
numbers, partial differential equations, and uniform critical system parameters
across various fields, including astronomy, physics, chemistry, and
electronics. The strong performance across 11 distinct datasets positions
FIND as a powerful and versatile tool for advancing data-driven scientific
discovery in multiple domains.
AI Insights
- The authors recommend "Dimensional Analysis: With Case Studies in Mechanics" by Q.-M. Tan and "Scaling, Vol. 34" by G. I. Barenblatt for deepening understanding of Buckingham $\Pi$ applications.
- "Discovering Governing Equations from Data" by Brunton et al. and "Dimensionally Consistent Learning with Buckingham Pi" by Bakarji et al. are cited as foundational works that inspired FIND's hybrid symbolic-dimensional approach.
- Dimensionless Analysis is defined as a technique that removes units to reveal underlying similarity laws, enabling the construction of physically consistent latent variables.
- The paper acknowledges that FIND's scalability is limited by the combinatorial growth of latent-variable interconnections, and that its accuracy hinges on the signal-to-noise ratio of the input data.
- NASA's Planets Factsheet and the Exoplanet Archive are highlighted as real-world datasets where FIND successfully extracted governing equations, illustrating its applicability to astronomical data.
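To make the role of the Buckingham $\Pi$ theorem concrete, the following is a minimal sketch of the classical dimensional-analysis step only: dimensionless groups are the null-space vectors of the dimension matrix. It is not FIND's search procedure. The pendulum example and the names `nullspace`, `dimension_matrix`, `variables`, and `format_group` are assumptions introduced here for illustration.

```python
from fractions import Fraction


def nullspace(matrix):
    """Return a basis of the rational null space of `matrix` (a list of rows)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # pivot column of each reduced row, in row order
    r = 0
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot here, so column c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][f]
        basis.append(vec)
    return basis


def format_group(names, exponents):
    """Render one dimensionless group as a product of powers."""
    return " * ".join(f"{n}^{e}" for n, e in zip(names, exponents) if e != 0)


# Hypothetical example: a simple pendulum.
# Columns are variables; rows are the base dimensions mass, length, time.
variables = ["period", "length", "gravity", "mass"]
dimension_matrix = [
    [0, 0, 0, 1],   # mass exponent of each variable
    [0, 1, 1, 0],   # length exponent
    [1, 0, -2, 0],  # time exponent
]

groups = nullspace(dimension_matrix)
labels = [format_group(variables, g) for g in groups]
print(labels)
```

With four variables and three independent base dimensions, the theorem predicts exactly one dimensionless group; here it is the combination period^2 * gravity / length, and mass drops out.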
Abstract
We introduce and study the problem of online omniprediction with long-term
constraints. At each round, a forecaster is tasked with generating predictions
for an underlying (adaptively, adversarially chosen) state that are broadcast
to a collection of downstream agents, who must each choose an action. Each of
the downstream agents has both a utility function mapping actions and state to
utilities, and a vector-valued constraint function mapping actions and states
to vector-valued costs. The utility and constraint functions can arbitrarily
differ across downstream agents. Each agent's goal is to choose actions that
guarantee it no regret while ensuring that it does not cumulatively violate its
constraints over time. We show how to make a single
set of predictions so that each of the downstream agents can guarantee this by
acting as a simple function of the predictions, guaranteeing each of them
$\tilde{O}(\sqrt{T})$ regret and $O(1)$ cumulative constraint violation. We
also show how to extend our guarantees to arbitrary intersecting contextually
defined \emph{subsequences}, guaranteeing each agent both regret and constraint
violation bounds not just marginally, but simultaneously on each subsequence,
against a benchmark set of actions simultaneously tailored to each subsequence.
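For orientation, the two guarantees can be written out for a single agent as follows. This is a sketch in assumed notation, not the paper's own definitions: $y_t$ is the state at round $t$, $a_t$ the agent's action, $u$ its utility function, $c_j$ the $j$-th coordinate of its constraint function, and $\mathcal{A}^\star$ a benchmark set of actions whose exact form the abstract does not specify.

```latex
\[
\mathrm{Reg}_T
  = \max_{a^\star \in \mathcal{A}^\star} \sum_{t=1}^{T} u(a^\star, y_t)
    - \sum_{t=1}^{T} u(a_t, y_t)
  \le \tilde{O}(\sqrt{T}),
\qquad
\mathrm{Viol}_T
  = \max_{j} \sum_{t=1}^{T} c_j(a_t, y_t)
  \le O(1).
\]
```

The subsequence extension would then ask for the same two bounds with each sum restricted to the rounds belonging to a given subsequence, and with the benchmark action chosen separately for that subsequence.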