Amazon Web Services
Abstract
Error attribution in Large Language Model (LLM) multi-agent systems presents
a significant challenge in debugging and improving collaborative AI systems.
Current approaches to pinpointing agent- and step-level failures in interaction
traces, whether using all-at-once evaluation, step-by-step analysis, or binary
search, fall short when analyzing complex patterns, struggling with both
accuracy and consistency. We present ECHO (Error attribution through Contextual
Hierarchy and Objective consensus analysis), a novel algorithm that combines
hierarchical context representation, objective analysis-based evaluation, and
consensus voting to improve error attribution accuracy. Our approach leverages
a position-based layering of contextual understanding while maintaining
objective evaluation criteria, ultimately reaching conclusions through a
consensus mechanism. Experimental results demonstrate that ECHO outperforms
existing methods across various multi-agent interaction scenarios, showing
particular strength in cases involving subtle reasoning errors and complex
interdependencies. Our findings suggest that combining structured, hierarchical
context representation with consensus-based objective decision-making provides
a more robust framework for error attribution in multi-agent systems.
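As a rough illustration of the consensus step described above, the Python sketch below aggregates per-level attributions by majority vote. The helper `evaluate_trace`, the `context_levels` parameter, and the (agent, step) output format are hypothetical stand-ins, not ECHO's actual interface.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch of consensus-based error attribution.
# `evaluate_trace` stands in for an LLM-backed evaluator that inspects the
# interaction trace at one level of hierarchical context and names the
# (agent, step) it blames; neither the name nor the signature comes from the paper.

def consensus_attribution(
    trace: List[Dict],
    evaluate_trace: Callable[[List[Dict], int], Tuple[str, int]],
    context_levels: List[int],
) -> Tuple[str, int]:
    """Run one evaluation per context level and majority-vote the results."""
    votes = Counter(evaluate_trace(trace, level) for level in context_levels)
    # The (agent, step) pair that most evaluations agreed on wins.
    return votes.most_common(1)[0][0]
```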
AI Insights
- The appendix details ECHO's algorithm, prompts, and code, ensuring full reproducibility.
- It follows the NeurIPS Code of Ethics, transparently noting limits and societal impacts.
- ECHO is benchmarked on Who&When with Anthropic LLMs, and its GitHub repo is public.
- No human subjects were used, so IRB approval and risk disclosure were unnecessary.
- The paper references Ethics Guidelines for NeurIPS, Paperswithcode datasets, and Coursera's Ethics course.
- A noted weakness is the lack of safeguards for responsible release of high-risk models.
- ECHO's consensus voting is formally described, offering a new objective decision framework for hierarchical AI debugging.
Abstract
In this paper, we present the first large-scale study exploring whether
JavaScript code generated by Large Language Models (LLMs) can reveal which
model produced it, enabling reliable authorship attribution and model
fingerprinting. With the rapid rise of AI-generated code, attribution is
playing a critical role in detecting vulnerabilities, flagging malicious
content, and ensuring accountability. While AI-vs-human detection usually
treats AI as a single category, we show that individual LLMs leave unique
stylistic signatures, even among models belonging to the same family or
parameter size. To this end, we introduce LLM-NodeJS, a dataset of 50,000
Node.js back-end programs from 20 large language models. Each has four
transformed variants, yielding 250,000 unique JavaScript samples and two
additional representations (JSIR and AST) for diverse research applications.
Using this dataset, we benchmark traditional machine learning classifiers
against fine-tuned Transformer encoders and introduce CodeT5-JSA, a custom
architecture derived from the 770M-parameter CodeT5 model with its decoder
removed and a modified classification head. It achieves 95.8% accuracy on
five-class attribution, 94.6% on ten-class, and 88.5% on twenty-class tasks,
surpassing other tested models such as BERT, CodeBERT, and Longformer. We
demonstrate that classifiers capture deeper stylistic regularities in program
dataflow and structure, rather than relying on surface-level features. As a
result, attribution remains effective even after mangling, comment removal, and
heavy code transformations. To support open science and reproducibility, we
release the LLM-NodeJS dataset, Google Colab training scripts, and all related
materials on GitHub: https://github.com/LLM-NodeJS-dataset.
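As a rough sketch of the encoder-only setup described above (CodeT5 with its decoder removed and a classification head on top), the snippet below uses Hugging Face's T5EncoderModel. The mean-pooling strategy, dropout rate, and the `Salesforce/codet5-large` checkpoint name are assumptions for illustration, not the paper's exact CodeT5-JSA configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

# Assumed sketch in the spirit of CodeT5-JSA: load only CodeT5's encoder
# (the decoder is never instantiated) and attach a small classification head.
class CodeT5Classifier(nn.Module):
    def __init__(self, num_labels: int, model_name: str = "Salesforce/codet5-large"):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        hidden = self.encoder.config.d_model
        self.head = nn.Sequential(nn.Dropout(0.1), nn.Linear(hidden, num_labels))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token states over non-padding positions (an assumed pooling choice).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-large")
model = CodeT5Classifier(num_labels=20)  # e.g., the twenty-class attribution task
batch = tokenizer(["const express = require('express');"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 20)
```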