Abstract
AI safety research has emphasized interpretability, control, and robustness,
yet without an ethical substrate, these approaches may remain fragile under
competitive and open-ended pressures. This paper explores ethics not as an
external add-on, but as a possible structural lens for alignment, introducing a
\emph{moral problem space} $M$: a high-dimensional domain in which moral
distinctions could, in principle, be represented in AI systems. Human moral
reasoning is treated as a compressed and survival-biased projection
$\tilde{M}$ of this space, which helps explain why moral judgment is
inconsistent and suggests tentative methods -- such as sparse autoencoders,
causal mediation, and cross-cultural corpora -- that might help probe for
disentangled moral features. Within this
framing, metaethical positions are interpreted as research directions: realism
as the search for stable invariants, relativism as the study of
context-dependent distortions, constructivism as the institutional shaping of
persistence, and virtue ethics as the cultivation of dispositional safeguards
under distributional shift. Evolutionary dynamics and institutional design are
treated as forces that may determine
whether ethical-symbiotic lineages remain competitively viable against more
autarkic trajectories. Rather than offering solutions, the paper sketches a
research agenda in which embedding ethics directly into representational
substrates could make philosophical claims more empirically
approachable, positioning moral theory as a potential source of hypotheses for
alignment work.