Abstract
AI safety research has emphasized interpretability, control, and robustness,
yet without an ethical substrate these approaches may remain fragile under
competitive and open-ended pressures. This paper explores ethics not as an
external add-on, but as a possible structural lens for alignment, introducing a
\emph{moral problem space} $M$: a high-dimensional domain in which moral
distinctions could, in principle, be represented in AI systems. Human moral
reasoning is treated as a compressed, survival-biased projection $\tilde{M}$ of
this space, a treatment that helps explain why moral judgment is inconsistent
while suggesting tentative methods -- such as sparse autoencoders, causal
mediation, and cross-cultural corpora -- that might help probe for disentangled
moral features. Within this
framing, metaethical positions are interpreted as research directions: realism
as the search for stable invariants, relativism as context-dependent
distortions, constructivism as institutional shaping of persistence, and virtue
ethics as dispositional safeguards under distributional shift. Evolutionary
dynamics and institutional design are considered as forces that may determine
whether ethical-symbiotic lineages remain competitively viable against more
autarkic trajectories. Rather than offering solutions, the paper sketches a
research agenda in which embedding ethics directly into representational
substrates could make philosophical claims more empirically
approachable, positioning moral theory as a potential source of hypotheses for
alignment work.
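As a concrete illustration of the probing methods mentioned above, the sketch below shows a minimal sparse autoencoder of the kind that might be trained on model activations to search for disentangled moral features. It is an assumption-laden sketch rather than the paper's method: the layer dimensions, the placeholder activation data, and the hyperparameters are all hypothetical.

\begin{verbatim}
# Minimal sparse-autoencoder sketch (illustrative; all shapes and data are
# placeholder assumptions, not the paper's setup).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 penalty, intended to recover
    sparse, potentially disentangled features from model activations."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)          # reconstruction of the input
        return recon, features

# Hypothetical activations, e.g. residual-stream vectors collected while a
# model answers morally salient prompts (random placeholders here).
acts = torch.randn(4096, 512)

sae = SparseAutoencoder(d_model=512, d_hidden=2048)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # sparsity pressure; a key hyperparameter in practice

for step in range(200):
    recon, features = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# A probe for moral features would then correlate individual feature
# activations with prompt labels (e.g. harm vs. no-harm) and test, via
# causal interventions, whether ablating a feature changes the judgment.
\end{verbatim}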
Abstract
AI companies increasingly develop and deploy privacy-enhancing technologies,
bias-constraining measures, evaluation frameworks, and alignment techniques --
framing them as addressing concerns related to data privacy, algorithmic
fairness, and AI safety. This paper examines the ulterior function of these
technologies as mechanisms of legal influence. First, we examine how
encryption, federated learning, and synthetic data -- presented as enhancing
privacy and reducing bias -- can operate as mechanisms for avoiding existing
regulations, in attempts to place data operations outside the scope of
traditional regulatory frameworks. Second, we investigate how emerging AI
safety practices, including open-source model releases, evaluations, and
alignment techniques, can be used as mechanisms of change that direct regulatory
focus towards industry-controlled voluntary standards and self-governance. We
term this phenomenon \emph{anti-regulatory AI} -- the deployment of ostensibly
protective technologies in ways that simultaneously shape the terms of
regulatory oversight. Our analysis additionally reveals how these technologies'
anti-regulatory
functions are enabled through framing that legitimizes their deployment while
obscuring their use as regulatory workarounds. This paper closes with a
discussion of policy implications that centers on the business incentives
driving AI development and on the role of technical expertise in assessing
whether these technologies deliver their purported protections.