Hi!

Your personalized paper recommendations for 1–5 December 2025.
Data Careers
International Hellenic University
Abstract
This study investigates the potential of language models to improve the classification of labor market information by linking job vacancy texts to two major European frameworks: the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy and the European Qualifications Framework (EQF). We examine and compare two prominent methodologies from the literature: Sentence Linking and Entity Linking. In support of ongoing research, we release an open-source tool, incorporating these two methodologies, designed to facilitate further work on labor classification and employment discourse. To move beyond surface-level skill extraction, we introduce two annotated datasets specifically aimed at evaluating how occupations and qualifications are represented within job vacancy texts. Additionally, we examine different ways to utilize generative large language models for this task. Our findings contribute to advancing the state of the art in job entity extraction and offer computational infrastructure for examining work, skills, and labor market narratives in a digitally mediated economy. Our code is made publicly available: https://github.com/tabiya-tech/tabiya-livelihoods-classifier
AI Summary
  • The paper presents a study on entity recognition, similarity, and extraction in the job market domain using large language models. [2]
  • Entity Recognition (ER): The task of identifying entities such as names, locations, and organizations in a given text. [1]
Waseda University
Abstract
Organizations struggle to share data across departments that have adopted different data analytics platforms. If n datasets must serve m environments, up to n*m replicas can emerge, increasing inconsistency and cost. Traditional warehouses copy data into vendor-specific stores; cross-platform access is hard. This study proposes the Enterprise Data Science Platform (EDSP), which builds on data lakehouse architecture and follows a Write-Once, Read-Anywhere principle. EDSP enables federated data access for multi-query engine environments, targeting data science workloads with periodic data updates and query response times ranging from seconds to minutes. By providing centralized data management with federated access from multiple query engines to the same data sources, EDSP eliminates data duplication and vendor lock-in inherent in traditional data warehouses. The platform employs a four-layer architecture: Data Preparation, Data Store, Access Interface, and Query Engines. This design enforces separation of concerns and reduces the need for data migration when integrating additional analytical environments. Experimental results demonstrate that major cloud data warehouses and programming environments can directly query EDSP-managed datasets. We implemented and deployed EDSP in production, confirming interoperability across multiple query engines. For data sharing across different analytical environments, EDSP achieves a 33-44% reduction in operational steps compared with conventional approaches requiring data migration. Although query latency may increase by up to a factor of 2.6 compared with native tables, end-to-end completion times remain on the order of seconds, maintaining practical performance for analytical use cases. Based on our production experience, EDSP provides practical design guidelines for addressing the data-silo problem in multi-query engine environments.
AI Summary
  • Enterprise Data Science Platform (EDSP): a unified data management architecture that addresses data management challenges in multi-query engine environments. [2]
  • Write-Once, Read-Anywhere: a principle that enables data to be written once and read from multiple query engines without replication or duplication.
  • EDSP demonstrates that the Write-Once, Read-Anywhere principle can be realized in production environments, offering a practical solution to the long-standing problem of data silos in multi-query engine enterprises.
  • Limitation: future work includes performance validation on TB-scale datasets.
  • The paper references a study on data lake architectures and metadata management, highlighting the importance of metadata in data sharing across heterogeneous query engines.
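The up-to-n*m replication problem the abstract opens with is easy to make concrete. A minimal sketch (the dataset and engine counts are made-up numbers, not figures from the paper):

```python
# Illustrative sketch of the data-duplication problem EDSP targets.
# With per-engine copies, every dataset is replicated into every
# engine's native store; under Write-Once, Read-Anywhere federation,
# each dataset is stored exactly once.

def replicas_copy_based(n_datasets: int, m_engines: int) -> int:
    """Worst case for the traditional copy-per-engine approach."""
    return n_datasets * m_engines

def replicas_federated(n_datasets: int, m_engines: int) -> int:
    """Write-Once, Read-Anywhere: one authoritative copy per dataset."""
    return n_datasets

# Hypothetical numbers: 50 datasets shared across 4 analytics platforms.
print(replicas_copy_based(50, 4))  # 200 copies to keep consistent
print(replicas_federated(50, 4))   # 50 copies, regardless of engine count
```

The federated count is independent of m, which is why adding an analytical environment under EDSP requires no data migration.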
Data Science Career Advice
UnB
Abstract
Data science initiatives frequently exhibit high failure rates, driven by technical constraints, organizational limitations and insufficient risk management practices. Challenges such as low data maturity, lack of governance, misalignment between technical and business teams, and the absence of structured mechanisms to address ethical and sociotechnical risks have been widely identified in the literature. In this context, the purpose of this study is to conduct a comparative analysis of the main risk management methodologies applied to data science projects, aiming to identify, classify, and synthesize their similarities, differences and existing gaps. An integrative literature review was performed using indexed databases and a structured protocol for selection and content analysis. The study examines widely adopted risk management standards (ISO 31000, PMBOK Risk Management, and the NIST RMF) as well as frameworks specific to data science workflows, such as CRISP-DM and the recently proposed DS EthiCo RMF, which incorporates ethical and sociotechnical dimensions into the project life cycle. The findings reveal that traditional approaches provide limited coverage of emerging risks, whereas contemporary models propose multidimensional structures capable of integrating ethical oversight, governance and continuous monitoring. As a contribution, this work offers theoretical support for the development of hybrid frameworks that balance technical efficiency, organizational alignment and responsible data practices, while highlighting research gaps that can guide future investigations.
AI Summary
  • Data Science: The study of extracting insights from large datasets using various techniques, including machine learning, statistics, and visualization. [3]
  • A systematic literature review was conducted to analyze the application of risk management frameworks in data science projects. [2]

We did not find much content matching your interests, so we've included some additional topics that are popular. Also be aware that if a topic is not present on arXiv, we won't be able to recommend it.

AI Agents
IBM
Abstract
The rapid shift from stateless large language models (LLMs) to autonomous, goal-driven agents raises a central question: When is agentic AI truly necessary? While agents enable multi-step reasoning, persistent memory, and tool orchestration, deploying them indiscriminately leads to higher cost, complexity, and risk. We present STRIDE (Systematic Task Reasoning Intelligence Deployment Evaluator), a framework that provides principled recommendations for selecting between three modalities: (i) direct LLM calls, (ii) guided AI assistants, and (iii) fully autonomous agentic AI. STRIDE integrates structured task decomposition, dynamism attribution, and self-reflection requirement analysis to produce an Agentic Suitability Score, ensuring that full agentic autonomy is reserved for tasks with inherent dynamism or evolving context. Evaluated across 30 real-world tasks spanning SRE, compliance, and enterprise automation, STRIDE achieved 92% accuracy in modality selection, reduced unnecessary agent deployments by 45%, and cut resource costs by 37%. Expert validation over six months in SRE and compliance domains confirmed its practical utility, with domain specialists agreeing that STRIDE effectively distinguishes between tasks requiring simple LLM calls, guided assistants, or full agentic autonomy. This work reframes agent adoption as a necessity-driven design decision, ensuring autonomy is applied only when its benefits justify the costs.
AI Summary
  • The framework can be used in conjunction with existing benchmarks to evaluate the performance of agentic AI systems. [3]
  • Future extensions to STRIDE will include multimodal tasks, reinforcement learning for weight tuning, and validation at enterprise scale. [3]
  • STRIDE's scoring functions are heuristic by design, striking a balance between interpretability and generality. [3]
  • STRIDE (Systematic Task Reasoning Intelligence Deployment Evaluator) is a framework that determines when tasks require agentic AI, AI assistants, or simple LLM calls. [2]
  • STRIDE integrates five analytical dimensions: structured task decomposition, dynamic reasoning and tool-interaction scoring, dynamism attribution analysis, self-reflection requirement assessment, and agentic suitability inference. [1]
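The abstract does not spell out how the Agentic Suitability Score is computed, but the idea can be sketched as a weighted combination of the analytical dimensions listed above. Everything below (dimension names, weights, thresholds, and the 0-1 scale) is an illustrative assumption, not the paper's formulation:

```python
# Hypothetical sketch of a STRIDE-style Agentic Suitability Score.
# Four input dimensions feed a weighted sum; the final mapping to a
# modality plays the role of the "agentic suitability inference" step.
# Weights and thresholds are illustrative, not from the paper.

DIMENSIONS = {
    "task_decomposition": 0.3,  # how much multi-step structure the task has
    "tool_interaction":   0.2,  # reliance on external tools/APIs
    "dynamism":           0.3,  # inherent dynamism / evolving context
    "self_reflection":    0.2,  # need for self-correction loops
}

def agentic_suitability(scores: dict) -> float:
    """Weighted sum of per-dimension scores, each in [0, 1]."""
    return sum(DIMENSIONS[d] * scores[d] for d in DIMENSIONS)

def recommend_modality(score: float) -> str:
    """Map the score to one of STRIDE's three modalities."""
    if score < 0.35:
        return "direct LLM call"
    if score < 0.70:
        return "guided AI assistant"
    return "autonomous agentic AI"

# A mostly static summarization task: low dynamism, little tool use.
task = {"task_decomposition": 0.2, "tool_interaction": 0.1,
        "dynamism": 0.1, "self_reflection": 0.2}
print(recommend_modality(agentic_suitability(task)))  # direct LLM call
```

The point of the thresholds is the paper's thesis in miniature: full autonomy is recommended only when the scored dynamism and reflection needs justify its cost.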
ulamai
Abstract
We extend the moduli-theoretic framework of psychometric batteries to the domain of dynamical systems. While previous work established the AAI capability score as a static functional on the space of agent representations, this paper formalizes the agent as a flow $ν_r$ parameterized by computational resource $r$, governed by a recursive Generator-Verifier-Updater (GVU) operator. We prove that this operator generates a vector field on the parameter manifold $Θ$, and we identify the coefficient of self-improvement $κ$ as the Lie derivative of the capability functional along this flow. The central contribution of this work is the derivation of the Variance Inequality, a spectral condition that is sufficient (under mild regularity) for the stability of self-improvement. We show that a sufficient condition for $κ> 0$ is that, up to curvature and step-size effects, the combined noise of generation and verification must be small enough. We then apply this formalism to unify the recent literature on Language Self-Play (LSP), Self-Correction, and Synthetic Data bootstrapping. We demonstrate that architectures such as STaR, SPIN, Reflexion, GANs and AlphaZero are specific topological realizations of the GVU operator that satisfy the Variance Inequality through filtration, adversarial discrimination, or grounding in formal systems.
AI Summary
  • The GVU framework is used to analyze the stability of self-improvement in AI systems. [3]
  • The Variance Inequality (Theorem 4.1) provides a sufficient condition for stable self-improvement, requiring a high Signal-to-Noise Ratio (SNR) for both the generator and the verifier. [3]
  • The paper provides a framework for understanding the stability of self-improvement in AI systems, highlighting the importance of high SNR for both generators and verifiers. [3]
  • The paper defines AI slop as a region where the internal Verifier ranks outputs among its top fraction, but they actually lie in the bottom fraction of the true battery score. [2]
  • The paper introduces the Generator-Verifier-Updater (GVU) framework, which models the interaction between a generator and its verifier. [1]
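The self-improvement coefficient in this abstract has a compact mathematical reading. A plausible rendering (notation assumed, not copied from the paper), with capability functional C on the parameter manifold Θ and the GVU operator inducing a vector field V:

```latex
% GVU operator induces a flow \nu_r on the parameter manifold \Theta
\frac{d\theta}{dr} = V(\theta), \qquad \theta(r) = \nu_r(\theta_0)

% Self-improvement coefficient: Lie derivative of the capability
% functional C along the GVU-induced flow
\kappa = \mathcal{L}_{V}\, C = \nabla C(\theta) \cdot V(\theta)

% The Variance Inequality gives a sufficient condition for \kappa > 0:
% up to curvature and step-size effects, the combined noise of
% generation and verification must be small enough.
```

Read this way, stable self-improvement is simply the statement that the capability functional increases along the flow the GVU operator generates.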
AI and Society
École normale supérieure
Abstract
Using the example of the film 2001: A Space Odyssey, this chapter illustrates the challenges posed by an AI capable of making decisions that go against human interests. But are human decisions always rational and ethical? In reality, the cognitive decision-making process is influenced by cognitive biases that affect our behavior and choices. AI not only reproduces these biases, but can also exploit them, with the potential to shape our decisions and judgments. Behind AI algorithms, there are sometimes individuals who show little concern for fundamental rights and impose their own rules. To address the ethical and societal challenges raised by AI and its governance, the regulation of digital platforms and education are key levers. Regulation must reflect ethical, legal, and political choices, while education must strengthen digital literacy and teach people to make informed and critical choices when facing digital technologies.
Polytechnic Institute of
Abstract
This article introduces the concept of the 'dual footprint' as a heuristic device to capture the commonalities and interdependencies between the different impacts of artificial intelligence (AI) on the natural and social surroundings that supply resources for its production and use. Two in-depth case studies, each illustrating international flows of raw materials and of data work services, portray the AI industry as a value chain that spans national boundaries and perpetuates inherited global inequalities. The countries that drive AI development generate a massive demand for inputs and trigger social costs that, through the value chain, largely fall on more peripheral actors. The arrangements in place distribute the costs and benefits of AI unequally, resulting in unsustainable practices and preventing the upward mobility of more disadvantaged countries. The dual footprint captures how its environmental and social dimensions emanate from similar underlying socioeconomic processes and geographical trajectories.
AI Summary
  • The carbon (and water) footprints of data centre functioning, model training, and inference mainly occur in countries that lead AI development, such as the United States and France. [3]
  • The supply of data work for countries like the United States and France comes from areas with lower labour costs, including middle- and lower-income countries like Argentina and Madagascar. [3]
  • The 'dual' nature of the footprint is illuminated by the fact that the same country exports both mining products and data work services, with imports flowing towards countries leading the worldwide AI race. [3]
  • AI value chain: The series of activities involved in developing and deploying artificial intelligence systems, from raw materials extraction to software development and deployment. [3]
  • Carbon footprint: The amount of greenhouse gas emissions associated with a particular activity or product. [3]
  • The analysis takes a step back from stricter interpretations of the footprint concept as an accounting method and instead focuses on a bird's eye view, revealing who is impacted by pressure on resources and related effects spread along the AI value chain. [2]
Research Automation with AI
Abstract
The advancement in Large Language Models has driven the creation of complex agentic systems, such as Deep Research Agents (DRAs), to overcome the limitations of static Retrieval Augmented Generation (RAG) pipelines in handling complex, multi-turn research tasks. This paper introduces the Static Deep Research Agent (Static-DRA), a novel solution built upon a configurable and hierarchical Tree-based static workflow. The core contribution is the integration of two user-tunable parameters, Depth and Breadth, which provide granular control over the research intensity. This design allows end-users to consciously balance the desired quality and comprehensiveness of the research report against the associated computational cost of Large Language Model (LLM) interactions. The agent's architecture, comprising Supervisor, Independent, and Worker agents, facilitates effective multi-hop information retrieval and parallel sub-topic investigation. We evaluate the Static-DRA against the established DeepResearch Bench using the RACE (Reference-based Adaptive Criteria-driven Evaluation) framework. Configured with a depth of 2 and a breadth of 5, and powered by the gemini-2.5-pro model, the agent achieved an overall score of 34.72. Our experiments validate that increasing the configured Depth and Breadth parameters results in a more in-depth research process and a correspondingly higher evaluation score. The Static-DRA offers a pragmatic and resource-aware solution, empowering users with transparent control over the deep research process. The entire source code, outputs and benchmark results are open-sourced at https://github.com/SauravP97/Static-Deep-Research/
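The Depth and Breadth knobs imply a simple cost model for the tree-based workflow: a full tree with breadth b and depth d has b + b^2 + ... + b^d sub-topic nodes. A sketch under that assumption (the calls-per-node figure is hypothetical, not from the paper):

```python
# Back-of-the-envelope cost model for a Static-DRA-style tree workflow.
# Breadth b children per node, depth d levels: b + b^2 + ... + b^d
# sub-topic nodes in total. The per-node LLM-call count is an assumed
# parameter for illustration only.

def subtopic_nodes(depth: int, breadth: int) -> int:
    """Total sub-topic nodes in a full tree of the given shape."""
    return sum(breadth ** level for level in range(1, depth + 1))

def estimated_llm_calls(depth: int, breadth: int, calls_per_node: int = 2) -> int:
    """Illustrative estimate: each node triggers a fixed number of calls."""
    return subtopic_nodes(depth, breadth) * calls_per_node

# The paper's evaluated configuration: depth 2, breadth 5.
print(subtopic_nodes(2, 5))       # 30 sub-topics to investigate
print(estimated_llm_calls(2, 5))  # 60 calls under the assumed model
```

Because the node count grows geometrically in depth, the two parameters give exactly the quality-versus-cost trade-off the abstract describes.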
Stanford University
Abstract
The energy transition through increased electrification has put the world's attention on critical mineral exploration. Even with increased investments, a decrease in new discoveries has taken place over the last two decades. Here I propose a solution to this problem where AI is implemented as the enabler of a rigorous scientific method for mineral exploration that aims to reduce cognitive bias and false positives and drive down the cost of exploration. I propose a new scientific method based on a philosophical approach founded on the principles of Bayesianism and falsification. In this approach, data acquisition is in the first place seen as a means to falsify human-generated hypotheses. The decision of what data to acquire next is quantified with verifiable metrics and based on rational decision making. A practical protocol is provided that can be used as a template in any exploration campaign. However, in order to make this protocol practical, various forms of artificial intelligence are needed. I will argue that the most important forms are, one, novel unsupervised learning methods that collaborate with domain experts to better understand data and generate multiple competing geological hypotheses, and, two, human-in-the-loop AI algorithms that can optimally plan various geological, geophysical, geochemical and drilling data acquisition, where uncertainty reduction on geological hypotheses precedes uncertainty reduction on grade and tonnage.
AI Summary
  • Efficacy of information (EI): a metric that quantifies how much future data will reduce uncertainty on average on some quantity of interest. [3]
  • The author advocates for a new scientific method for mineral exploration, focusing on decision-making rather than traditional geophysical inversion. [2]
  • Epistemic uncertainty: the lack of understanding we still have about the nature of orebodies. [1]
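The EI metric summarized above can be sketched as expected entropy reduction over competing hypotheses after a candidate survey. The sketch below is a standard Bayesian information-gain computation with toy probabilities; it is not the author's protocol:

```python
# Illustrative efficacy-of-information (EI) style metric: expected
# reduction in entropy over competing geological hypotheses after
# observing the outcome of a candidate survey. Toy numbers throughout.
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_information(prior, likelihoods):
    """EI = H(prior) - E_outcome[ H(posterior | outcome) ]."""
    n_out = len(likelihoods[0])
    # Marginal probability of each survey outcome.
    p_out = [sum(prior[h] * likelihoods[h][o] for h in range(len(prior)))
             for o in range(n_out)]
    expected_posterior_h = 0.0
    for o in range(n_out):
        if p_out[o] == 0:
            continue
        posterior = [prior[h] * likelihoods[h][o] / p_out[o]
                     for h in range(len(prior))]
        expected_posterior_h += p_out[o] * entropy(posterior)
    return entropy(prior) - expected_posterior_h

# Two competing hypotheses; a binary survey outcome that discriminates
# between them fairly well.
prior = [0.5, 0.5]
likelihoods = [[0.9, 0.1],   # P(outcome | hypothesis 1)
               [0.2, 0.8]]   # P(outcome | hypothesis 2)
print(round(expected_information(prior, likelihoods), 3))
```

A survey whose likelihoods are identical under every hypothesis has EI of zero, which captures the paper's point that data should be acquired to falsify hypotheses, not merely accumulated.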
AGI: Artificial General Intelligence
ulamai
Abstract
Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance on batteries spanning families of tasks (for example reasoning, planning, tool use and long-horizon control). Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. This geometry yields determinacy results: dense families of batteries suffice to certify performance on entire regions of task space. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning as special cases, and we define a self-improvement coefficient $κ$ as the Lie derivative of a capability functional along the induced flow. A variance inequality on the combined noise of generation and verification provides sufficient conditions for $κ > 0$. Our results suggest that progress toward artificial general intelligence (AGI) is best understood as a flow on moduli of benchmarks, driven by GVU dynamics rather than by scores on individual leaderboards.
AI Summary
  • GVU Dynamics: a formalism that connects static geometry to learning, showing that many contemporary training procedures are special cases of reinforcement learning on the moduli space. [3]
  • Self-Improvement Coefficient κ: a measure of the rate of change of an agent's capability trajectory over time. [3]
  • Autonomous AI Scale: a framework for evaluating autonomous AI systems based on performance thresholds on families of batteries. [2]
Deep Learning
National Technical Univer
Abstract
The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands also carry significant computational, energy and environmental footprints. Sparsity stands out as a critical mechanism for drastically reducing these resource demands. However, its potential remains largely untapped and is not yet fully incorporated in production AI systems. To bridge this gap, this work provides the necessary knowledge and insights for performance engineers keen to get involved in deep learning inference optimization. In particular, in this work we: a) discuss the various forms of sparsity that can be utilized in DNN inference, b) explain how the original dense computations translate to sparse kernels, c) provide an extensive bibliographic review of the state-of-the-art in the implementation of these kernels for CPUs and GPUs, d) discuss the availability of sparse datasets in support of sparsity-related research and development, e) explore the current software tools and frameworks that provide robust sparsity support, and f) present evaluation results of different implementations of the key SpMM and SDDMM kernels on CPU and GPU platforms. Ultimately, this paper aims to serve as a resource for performance engineers seeking to develop and deploy highly efficient sparse deep learning models in production.
AI Summary
  • The text discusses various aspects of deep learning, including model architecture, training, optimization, and inference. [3]
  • Model Training: The process that makes a DNN learn to perform a specific task, much like a student learns from practice and correction. [3]
  • Batch Training: Instead of feeding individual data points one by one, models are trained on small groups of samples called batches. [3]
  • Training often requires many epochs to fully learn the data’s patterns. [3]
  • The text concludes that deep learning involves various steps from model architecture to inference, and optimization is crucial for efficient deployment of DNNs. [3]
  • The text mentions several deep learning frameworks such as PyTorch, TensorFlow, JAX, and Hugging Face Hub. [3]
  • Deep learning involves various steps from model architecture to inference, and optimization is crucial for efficient deployment of DNNs. [3]
  • Just as a person improves a skill through practice, a trained model must also be optimized so that it performs well in real-world situations. [3]
  • Epochs: A single pass through the entire dataset is called an epoch. [2]
  • The text does not provide a clear explanation of the differences between various model representations such as ONNX, TorchScript, TensorFlow SavedModel / GraphDef, etc. [1]
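For readers new to the kernels this survey benchmarks, SpMM over the CSR format can be sketched in a few lines. Pure Python for clarity; the surveyed implementations vectorize and parallelize this loop nest:

```python
# Minimal sketch of the SpMM kernel (sparse matrix x dense matrix)
# over the CSR format: A is stored as (indptr, indices, data), so each
# row's products touch only its stored nonzeros and skip zeros entirely.

def spmm_csr(indptr, indices, data, B, n_cols_B):
    """Compute C = A @ B, with A in CSR form and B a dense row-major matrix."""
    n_rows = len(indptr) - 1
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for i in range(n_rows):
        # indptr[i]..indptr[i+1] delimits the nonzeros of row i.
        for k in range(indptr[i], indptr[i + 1]):
            j, a_ij = indices[k], data[k]
            for c in range(n_cols_B):
                C[i][c] += a_ij * B[j][c]
    return C

# A = [[2, 0], [0, 3]] in CSR form; B is a 2x2 dense matrix.
indptr, indices, data = [0, 1, 2], [0, 1], [2.0, 3.0]
B = [[1.0, 2.0], [4.0, 5.0]]
print(spmm_csr(indptr, indices, data, B, 2))  # [[2.0, 4.0], [12.0, 15.0]]
```

The work done is proportional to the number of nonzeros times the width of B, which is exactly the FLOP saving that makes sparsity attractive for inference.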
EPFL
Abstract
In this work, we investigate the potential of weights to serve as effective representations, focusing on neural fields. Our key insight is that constraining the optimization space through a pre-trained base model and low-rank adaptation (LoRA) can induce structure in weight space. Across reconstruction, generation, and analysis tasks on 2D and 3D data, we find that multiplicative LoRA weights achieve high representation quality while exhibiting distinctiveness and semantic structure. When used with latent diffusion models, multiplicative LoRA weights enable higher-quality generation than existing weight-space methods.
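The "multiplicative LoRA weights" in this abstract can be illustrated with a toy sketch. The parameterization below, W = W0 (I + BA), is one common multiplicative variant and an assumption here; the paper's exact formulation may differ:

```python
# Toy sketch of additive vs multiplicative low-rank adaptation (LoRA)
# of a base weight matrix W0. Pure Python on tiny matrices; the
# multiplicative form W = W0 @ (I + B @ A) is an assumed variant.

def matmul(X, Y):
    """Plain dense matrix product for small list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_additive(W0, A, B):
    """W = W0 + B @ A: the classic additive LoRA update."""
    BA = matmul(B, A)
    return [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W0, BA)]

def lora_multiplicative(W0, A, B):
    """W = W0 @ (I + B @ A): the low-rank factors rescale the base model."""
    n = len(A[0])
    BA = matmul(B, A)
    i_plus_ba = [[(1.0 if i == j else 0.0) + BA[i][j] for j in range(n)]
                 for i in range(n)]
    return matmul(W0, i_plus_ba)

# Rank-1 adapters on a 2x2 base matrix (toy numbers).
W0 = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.0]]    # 1x2
B = [[1.0], [0.0]]  # 2x1
print(lora_multiplicative(W0, A, B))  # [[1.5, 0.0], [0.0, 1.0]]
```

Either way, only the small factors A and B are optimized, which is what constrains the optimization space and, per the abstract, induces structure in weight space.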

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • Data Career Development
  • Data Science Career Guidance
  • Data Career Path
You can edit or add more interests any time.