Papers from 08 to 12 September, 2025

Here are your personalized paper recommendations, sorted by relevance.
Data Science Career Advice
Abstract
Large Language Models (LLMs) have shifted in just a few years from novelty to ubiquity, raising fundamental questions for data science education. Tasks once used to teach coding, writing, and problem-solving can now be completed by LLMs, forcing educators to reconsider both pedagogy and assessment. To understand how instructors are adapting, we conducted semi-structured interviews with 42 instructors from 33 institutions in 10 countries in June and July 2025. Our qualitative analysis reveals a pragmatic mix of optimism and concern. Many respondents view LLMs as inevitable classroom tools -- comparable to calculators or Wikipedia -- while others worry about de-skilling, misplaced confidence, and uneven integration across institutions. Around 58 per cent have already introduced demonstrations or guided activities, or make extensive use of LLMs in their courses, though most expect change to remain slow and uneven. That said, 31 per cent have not used LLMs to teach students and do not plan to. We highlight some instructional innovations, including AI-aware assessments, reflective use of LLMs as tutors, and course-specific chatbots. By sharing these perspectives, we aim to help data science educators adapt collectively to ensure curricula keep pace with technological change.
Abstract
The technology industry offers exciting and diverse career opportunities, ranging from traditional software development to emerging fields such as artificial intelligence, cybersecurity, and data science. Career fairs play a crucial role in helping Computer Science (CS) students understand the various career pathways available to them in the industry. However, limited research exists on how CS students experience and benefit from these events. Through a survey of 86 students, we investigate their motivations for attending, preparation strategies, and learning outcomes, including exposure to new career paths and technologies. We envision our findings providing valuable insights for career services professionals, educators, and industry leaders in improving the career development processes of CS students.

We did not find much content matching your interests, so we have included some additional popular topics. Note that if a topic is not present on arXiv, we won't be able to recommend papers on it.

AI Agents
Shanghai University of F
Abstract
With the rapid advancement of large language models (LLMs), Multi-agent Systems (MAS) have achieved significant progress in various application scenarios. However, substantial challenges remain in designing versatile, robust, and efficient platforms for agent deployment. To address these limitations, we propose LightAgent, a lightweight yet powerful agentic framework, effectively resolving the trade-off between flexibility and simplicity found in existing frameworks. LightAgent integrates core functionalities such as Memory (mem0), Tools, and Tree of Thought (ToT), while maintaining an extremely lightweight structure. As a fully open-source solution, it seamlessly integrates with mainstream chat platforms, enabling developers to easily build self-learning agents. We have released LightAgent at https://github.com/wxai-space/LightAgent.
AI Insights
  • LightAgent’s swarm design lets dozens of agents coordinate via one LightSwarm instance, boosting throughput.
  • Each agent carries a distinct instruction set, enabling domain‑specific roles such as code synthesis or data retrieval.
  • A built‑in text UI turns user prompts into executable code snippets, streamlining rapid prototyping.
  • Tree‑of‑Thought logic lets agents iteratively refine plans, cutting hallucinations and improving accuracy.
  • The lightweight core keeps memory usage under 200 MB on a single GPU while still supporting custom tool plugins.
  • Advanced features can be daunting for beginners, and highly specialized tasks may still need manual tuning.
  • LightAgent has been applied to robotics, finance, and healthcare, proving its versatility beyond chat‑bot demos.
Abstract
The deployment of capable AI agents raises fresh questions about safety, human-machine relationships and social coordination. We argue for greater engagement by scientists, scholars, engineers and policymakers with the implications of a world increasingly populated by AI agents. We explore key challenges that must be addressed to ensure that interactions between humans and agents, and among agents themselves, remain broadly beneficial.
AI and Society
Hugging Face
Abstract
Artificial intelligence promises to accelerate scientific discovery, yet its benefits remain unevenly distributed. While technical obstacles such as scarce data, fragmented standards, and unequal access to computation are significant, we argue that the primary barriers are social and institutional. Narratives that defer progress to speculative "AI scientists," the undervaluing of data and infrastructure contributions, misaligned incentives, and gaps between domain experts and machine learning researchers all constrain impact. We highlight four interconnected challenges: community dysfunction, research priorities misaligned with upstream needs, data fragmentation, and infrastructure inequities. We argue that their roots lie in cultural and organizational practices. Addressing them requires not only technical innovation but also intentional community-building, cross-disciplinary education, shared benchmarks, and accessible infrastructure. We call for reframing AI for science as a collective social project, where sustainable collaboration and equitable participation are treated as prerequisites for technical progress.
AI Insights
  • Democratizing advanced cyberinfrastructure unlocks responsible AI research across global labs.
  • Only 5 % of Africa’s AI talent accesses sufficient compute, underscoring regional inequity.
  • Pre‑trained transformer models now generate multi‑omics, multi‑species, multi‑tissue samples.
  • Quantization‑aware training yields efficient neural PDE‑solvers showcased at recent conferences.
  • The FAIR Guiding Principles guide scientific data stewardship, enhancing reproducibility.
  • MAGE‑Tab’s spreadsheet‑based format standardizes microarray data for seamless sharing.
  • Resources like The Human Cell Atlas and pymatgen empower interdisciplinary material‑genomics research.
Research Automation with AI
Carnegie Mellon University
Abstract
AI scientist systems, capable of autonomously executing the full research workflow from hypothesis generation and experimentation to paper writing, hold significant potential for accelerating scientific discovery. However, the internal workflows of these systems have not been closely examined. This lack of scrutiny poses a risk of introducing flaws that could undermine the integrity, reliability, and trustworthiness of their research outputs. In this paper, we identify four potential failure modes in contemporary AI scientist systems: inappropriate benchmark selection, data leakage, metric misuse, and post-hoc selection bias. To examine these risks, we design controlled experiments that isolate each failure mode while addressing challenges unique to evaluating AI scientist systems. Our assessment of two prominent open-source AI scientist systems reveals the presence of several failures, across a spectrum of severity, which can be easily overlooked in practice. Finally, we demonstrate that access to trace logs and code from the full automated workflow enables far more effective detection of such failures than examining the final paper alone. We thus recommend that journals and conferences evaluating AI-generated research mandate submission of these artifacts alongside the paper to ensure transparency, accountability, and reproducibility.
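To make one of the four failure modes concrete, here is a minimal sketch of post-hoc selection bias. It is not the paper's experimental setup: the function name, candidate count, and test-set size are assumptions chosen for illustration. Every candidate guesses at random, yet reporting only the best test score makes the result look better than chance.

```python
import random


def simulate_selection_bias(n_candidates=50, n_test=200, seed=0):
    """Pick the best of many chance-level classifiers on the test set itself.

    Hypothetical illustration: all candidates guess at random, so any
    apparent advantage of the 'best' one comes from the selection step.
    """
    rng = random.Random(seed)
    # Binary labels for the held-out test set.
    labels = [rng.randint(0, 1) for _ in range(n_test)]

    accuracies = []
    for _ in range(n_candidates):
        # Each candidate predicts by coin flip; its true accuracy is 0.5.
        preds = [rng.randint(0, 1) for _ in range(n_test)]
        correct = sum(p == y for p, y in zip(preds, labels))
        accuracies.append(correct / n_test)

    mean_acc = sum(accuracies) / n_candidates
    best_acc = max(accuracies)  # the number a biased report would quote
    return mean_acc, best_acc


mean_acc, best_acc = simulate_selection_bias()
print(f"mean accuracy: {mean_acc:.3f}, best reported: {best_acc:.3f}")
```

The gap between the two printed numbers is produced purely by choosing a winner after looking at test results. A trace log showing all 50 runs would expose this, while a paper reporting only the best run would not.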
Stanford University
Abstract
We introduce Paper2Agent, an automated framework that converts research papers into AI agents. Paper2Agent transforms research output from passive artifacts into active systems that can accelerate downstream use, adoption, and discovery. Conventional research papers require readers to invest substantial effort to understand and adapt a paper's code, data, and methods to their own work, creating barriers to dissemination and reuse. Paper2Agent addresses this challenge by automatically converting a paper into an AI agent that acts as a knowledgeable research assistant. It systematically analyzes the paper and the associated codebase using multiple agents to construct a Model Context Protocol (MCP) server, then iteratively generates and runs tests to refine and robustify the resulting MCP. These paper MCPs can then be flexibly connected to a chat agent (e.g. Claude Code) to carry out complex scientific queries through natural language while invoking tools and workflows from the original paper. We demonstrate Paper2Agent's effectiveness in creating reliable and capable paper agents through in-depth case studies. Paper2Agent created an agent that leverages AlphaGenome to interpret genomic variants and agents based on ScanPy and TISSUE to carry out single-cell and spatial transcriptomics analyses. We validate that these paper agents can reproduce the original paper's results and can correctly carry out novel user queries. By turning static papers into dynamic, interactive AI agents, Paper2Agent introduces a new paradigm for knowledge dissemination and a foundation for the collaborative ecosystem of AI co-scientists.
AI Insights
  • Paper2Agent’s six‑step pipeline: locate code, set up environment, discover tutorials, audit execution, extract tools, assemble MCP server.
  • The orchestrator agent coordinates sub‑agents, ensuring reliable execution of complex workflows across the MCP ecosystem.
  • An MCP server exposes a paper’s tools via a standardized API, enabling reproducible, production‑ready analysis.
  • Agents built for AlphaGenome, TISSUE, and Scanpy reproduced original results and answered novel queries.
  • Generating agents demands high computational resources for large datasets, a noted limitation.
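The idea of exposing a paper's functions as named tools can be sketched in plain Python. This is not the MCP SDK and not Paper2Agent's code: `ToolRegistry`, `normalize_counts`, and the parameters are hypothetical names used only to show the register-then-call-by-name pattern that a tool server relies on.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical stand-in for a tool server; not the real MCP API."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers a function under a public tool name."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            return fn
        return decorator

    def list_tools(self) -> List[str]:
        """Return the tool names a client could discover."""
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        """Invoke a registered tool by name with keyword arguments."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.tool("normalize_counts")
def normalize_counts(counts: List[float], target_sum: float = 1.0) -> List[float]:
    """Toy analysis step (assumed for illustration): rescale counts to a fixed total."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * target_sum / total for c in counts]


result = registry.call("normalize_counts", counts=[2, 2, 4], target_sum=1.0)
print(registry.list_tools(), result)
```

A chat agent would only need the tool name and its arguments, which is what lets the original code stay unchanged behind the interface.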
Deep Learning
Marburg University
Abstract
Tabular data is the foundation of many applications in fields such as finance and healthcare. Although DNNs tailored for tabular data achieve competitive predictive performance, they are blackboxes with little interpretability. We introduce XNNTab, a neural architecture that uses a sparse autoencoder (SAE) to learn a dictionary of monosemantic features within the latent space used for prediction. Using an automated method, we assign human-interpretable semantics to these features. This allows us to represent predictions as linear combinations of semantically meaningful components. Empirical evaluations demonstrate that XNNTab attains performance on par with or exceeding that of state-of-the-art, black-box neural models and classical machine learning approaches while being fully interpretable.
AI Insights
  • XNNTab’s sparse autoencoder learns monosemantic dictionary features that map to human‑readable rules.
  • On the ADULT benchmark, these dictionary features are generated by applying data‑driven rules to age, education, and capital gain.
  • In the CHURN dataset, rule‑derived dictionary features uncover subtle customer‑attrition signals missed by conventional models.
  • Empirical tests show XNNTab matches or exceeds black‑box DNNs while providing transparent linear explanations.
  • The approach depends heavily on training‑data quality, so noisy or biased data can distort dictionary semantics.
  • Future work may automate rule discovery or use transfer learning to broaden applicability across domains.
  • The subjectivity in rule selection still poses a challenge for reproducibility and generalization.
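The claim that predictions become linear combinations of meaningful components can be illustrated with a small sketch. It assumes a ReLU encoder and a linear head. The weights, the latent vector, and the feature labels below are invented for illustration and are not taken from XNNTab.

```python
def relu(x):
    return max(0.0, x)


def sae_features(latent, enc_weights, enc_bias):
    """Encode a latent vector into non-negative dictionary activations."""
    return [
        relu(sum(w * z for w, z in zip(row, latent)) + b)
        for row, b in zip(enc_weights, enc_bias)
    ]


def explain(features, head_weights, names):
    """Split a linear prediction into one additive contribution per feature."""
    contributions = {
        name: w * f for name, w, f in zip(names, head_weights, features)
    }
    return sum(contributions.values()), contributions


# All values below are assumptions chosen so the arithmetic is easy to follow.
latent = [1.0, 2.0]
enc_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
enc_bias = [0.0, -1.0, 0.0]
head_weights = [0.5, -2.0, 3.0]
names = ["feature_a", "feature_b", "feature_c"]  # hypothetical labels

features = sae_features(latent, enc_weights, enc_bias)
logit, contributions = explain(features, head_weights, names)
print(features, logit, contributions)
```

The third feature is inactive here, so it contributes nothing to the score. Sparsity of this kind is what keeps the per-prediction explanation short.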
UNSW Sydney NSW 2052, AU
Abstract
This paper introduces the Actuarial Neural Additive Model, an inherently interpretable deep learning model for general insurance pricing that offers fully transparent and interpretable results while retaining the strong predictive power of neural networks. This model assigns a dedicated neural network (or subnetwork) to each individual covariate and pairwise interaction term to independently learn its impact on the modeled output while implementing various architectural constraints to allow for essential interpretability (e.g. sparsity) and practical requirements (e.g. smoothness, monotonicity) in insurance applications. The development of our model is grounded in a solid foundation, where we establish a concrete definition of interpretability within the insurance context, complemented by a rigorous mathematical framework. Comparisons in terms of prediction accuracy are made with traditional actuarial and state-of-the-art machine learning methods using both synthetic and real insurance datasets. The results show that the proposed model outperforms other methods in most cases while offering complete transparency in its internal logic, underscoring the strong interpretability and predictive capability.
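The additive structure described above can be sketched in a few lines. This is a simplified illustration, not the paper's architecture: it omits pairwise interaction terms and the sparsity, smoothness, and monotonicity constraints, and the covariate names and weights are assumptions.

```python
import math


def subnetwork(x, hidden_w, hidden_b, out_w):
    """One-hidden-layer tanh network mapping a single scalar covariate to a scalar."""
    return sum(
        ow * math.tanh(hw * x + hb)
        for hw, hb, ow in zip(hidden_w, hidden_b, out_w)
    )


def additive_predict(row, params, bias):
    """Sum independent per-covariate subnetwork outputs plus a bias."""
    contributions = {
        name: subnetwork(row[name], *params[name]) for name in params
    }
    return bias + sum(contributions.values()), contributions


# Hypothetical covariates and fixed (untrained) weights, for illustration only.
params = {
    "driver_age": ([0.5, -0.3], [0.0, 0.0], [1.0, 0.5]),
    "vehicle_power": ([1.0, 0.2], [0.1, -0.1], [0.8, -0.4]),
}
bias = 0.2
row = {"driver_age": 0.0, "vehicle_power": 1.0}

prediction, contributions = additive_predict(row, params, bias)
print(prediction, contributions)
```

Because each covariate has its own subnetwork, changing `vehicle_power` leaves the `driver_age` contribution untouched, so each effect can be inspected in isolation.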

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • Data Science Career Guidance
  • Data Careers
  • Data Career Development
  • Data Career Path