Hi!
Your personalized paper recommendations for 19–23 January 2026.
University of California, Berkeley
AI Insights
- The labor share of income is a crucial indicator of economic performance and social welfare. (ML: 0.96)👍👎
- Median Wage: The middle value of wages earned by workers in a given period. (ML: 0.93)👍👎
- Labor Share of Income: The percentage of national income earned by workers in a given period. (ML: 0.91)👍👎
- Monetary policy can significantly impact the labor share of income, but its effects are often complex and nuanced. (ML: 0.91)👍👎
- The proposed look-through calculation method provides a more accurate and transparent way to measure the labor share of income relative to GDP. (ML: 0.90)👍👎
- A new look-through calculation method is proposed to measure the labor share of income relative to GDP, using median wage, broad money supply (M2), and labor participation as parameters. (ML: 0.90)👍👎
- Look-through Calculation Method: A new approach to measuring the labor share of income relative to GDP, using median wage, broad money supply (M2), and labor participation as parameters. (ML: 0.90)👍👎
- The Central Bank's setting mechanism for distribution ratios needs to be made more transparent and subject to public oversight to ensure that monetary policy is effective and fair. (ML: 0.85)👍👎
- The Central Bank's setting mechanism for distribution ratios has become increasingly opaque, leading to concerns about information asymmetry and lack of transparency. (ML: 0.84)👍👎
- Broad Money Supply (M2): The total amount of money circulating in an economy, including currency and deposits. (ML: 0.73)👍👎
Abstract
Modern macroeconomic monetary theory suggests that the labor share of income has effectively become a core macroeconomic parameter anchored by top policymakers through Open Market Operations (OMO). However, the setting of this parameter remains a subject of intense economic debate. This paper provides a detailed summary of these controversies, analyzes the scope of influence exerted by market agents other than the top policymakers on the labor share, and explores the rationality of its setting mechanism.
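The look-through calculation itself is not spelled out in the abstract, but the insights name its inputs (median wage, M2, labor participation). A minimal sketch of how such a calculation might look, with an entirely hypothetical functional form and made-up numbers:

```python
# Hypothetical sketch only: the paper's exact formula is not given in this
# summary. This assumes the labor share is approximated as the total wage
# bill (median wage x employed workers) over nominal GDP; M2 would enter
# as a separate policy variable in the paper's full method.

def labor_share_look_through(median_wage, labor_participation,
                             working_age_pop, gdp):
    """Estimate the labor share of income as total wage bill / GDP.

    median_wage         -- annual median wage (currency units)
    labor_participation -- fraction of working-age population employed
    working_age_pop     -- number of working-age people
    gdp                 -- nominal GDP (same currency units)
    """
    wage_bill = median_wage * labor_participation * working_age_pop
    return wage_bill / gdp

# Illustrative (invented) numbers:
share = labor_share_look_through(
    median_wage=45_000, labor_participation=0.62,
    working_age_pop=200e6, gdp=9.3e12)
```

With these toy inputs the estimate comes out at a labor share of 0.6, in the range typically reported for advanced economies.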
Why are we recommending this paper?
Due to your Interest in Economic Inequality
This paper directly addresses economic inequality by examining how monetary policy influences income distribution, a core interest for the user. The focus on policy mechanisms offers a valuable perspective on the drivers of inequality.
Universidade Federal do Rio Grande do Sul
AI Insights
- The model assumes fixed wages, which may not accurately reflect real-world market conditions. (ML: 0.95)👍👎
- Modifications to the Market revenue rule do not change GDP or market value but only alter how it is distributed, leading to a more subtle effect on overall system inequality. (ML: 0.95)👍👎
- The results suggest that economic growth alone does not necessarily lead to greater equality, and that other factors such as labor income and monopoly power play a crucial role in shaping wealth distribution. (ML: 0.94)👍👎
- The model of capitalist economy studied in this paper shows that economic growth does not necessarily lead to a more equitable distribution of wealth. (ML: 0.88)👍👎
- Increasing GDP through modifications to the Expenditure rule leads to higher inequality and greater monopoly power, as capitalists' income increases and the labor share declines. (ML: 0.85)👍👎
- When the bargaining intensity parameter α is increased, unemployment also increases, and for the range 1% ≤ α ≤ 5%, the same effect observed for fixed wages is reproduced: as the average wage increases, inequality declines. (ML: 0.84)👍👎
- Expenditure rule: a set of rules governing how agents spend their income. Market revenue rule: a set of rules governing how market value is distributed among capitalists. Bargaining intensity parameter α: a measure of the strength of bargaining between capitalists and workers. The model highlights the importance of considering the distribution of wealth within the capitalist class, as well as overall system inequality. (ML: 0.84)👍👎
Abstract
The impact of rising consumption on wealth inequality remains an open question. Here we revisit and extend the Social Architecture of Capitalism agent-based model proposed by Ian Wright, which reproduces stylized facts of wealth and income distributions. In a previous study, we demonstrated that the macroscopic behavior of the model is predominantly governed by a single dimensionless parameter, the ratio between average wealth per capita and mean salary, denoted by R. The shape of the wealth distribution, the emergence of a two-class structure, and the level of inequality -- summarized by the Gini index -- were found to depend mainly on R, with inequality increasing as R increases. In the present work, we examine the robustness of this result by relaxing some simplifying assumptions of the model. We first allow transactions such as purchases, salary payments, and revenue collections to occur with different frequencies, reflecting the heterogeneous temporal dynamics of real economies. We then impose limits on the maximum fractions of wealth that agents can spend or collect at each step, constraining the amplitude of individual transactions. We find that the dependence of the inequality on R remains qualitatively robust, although the detailed distribution patterns are affected by relative frequencies and transaction limits. Finally, we analyze a further variant of the model with adaptive wages emerging endogenously from the dynamics, showing that self-organized labor-market feedback can either stabilize or amplify inequality depending on macroeconomic conditions.
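The inequality level the abstract summarizes via the Gini index can be computed directly from a wealth sample. A minimal sketch using the standard pairwise-difference formula (not the paper's code):

```python
def gini(wealth):
    """Gini index of a wealth sample: mean absolute difference over all
    ordered pairs, normalized by twice the mean. 0 = perfect equality;
    (n-1)/n when a single agent holds everything."""
    n = len(wealth)
    mean = sum(wealth) / n
    diff_sum = sum(abs(x - y) for x in wealth for y in wealth)
    return diff_sum / (2 * n * n * mean)

# Equal wealth gives zero inequality:
assert gini([1, 1, 1, 1]) == 0.0
```

In the model's terms, one would track how this index responds as the dimensionless ratio R (average wealth per capita over mean salary) is varied.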
Why are we recommending this paper?
Due to your Interest in Social Inequality
This research investigates the impact of consumption patterns on wealth inequality, aligning directly with the user’s interest in understanding the mechanisms driving economic disparities. The use of an agent-based model provides a robust framework for analyzing this complex relationship.
Mercor
AI Insights
- McNemar's exact test: A statistical test used to compare the performance of two related samples. (ML: 0.97)👍👎
- Pass@1: The proportion of tasks completed correctly by an agent. (ML: 0.95)👍👎
- Significance tests using McNemar's exact test with Benjamini-Hochberg correction show that Kimi-K2-Thinking differs significantly from Gemini-3-flash-preview (p=5.68e-23) and GPT-5.2 (p=7.29e-10), but not from GPT-OSS-120B (p=1.0000). (ML: 0.95)👍👎
- The APEX–Agents benchmark highlights the importance of developing AI models that can perform complex tasks in various professional domains, with a focus on toolbelt approaches, context window management, and intentional termination. (ML: 0.94)👍👎
- Benjamini-Hochberg correction: A method for controlling false discovery rate in multiple testing. (ML: 0.94)👍👎
- The APEX–Agents benchmark is a comprehensive evaluation of AI models' ability to perform complex tasks in various professional domains. (ML: 0.93)👍👎
- The most frequently used tools by agents are code execution (256,000), add tool to the toolbelt (200,000), list files in the file system (163,874), read spreadsheet tab (127,000), and search the PDF (86,000). (ML: 0.93)👍👎
- The benchmark consists of 227 tasks covering finance, law, and management consulting, each requiring the model to complete a specific objective using a set of provided tools. (ML: 0.89)👍👎
- The top-performing models on the APEX–Agents benchmark are Gemini 3 Flash, GPT-5.2, and Kimi K2 Thinking, with Pass@1 scores of 0.555, 0.497, and 0.391 respectively. (ML: 0.88)👍👎
- ReAct paradigm: A toolbelt approach where reasoning and acting are interleaved in a single loop. (ML: 0.79)👍👎
Abstract
We introduce the AI Productivity Index for Agents (APEX-Agents), a benchmark for assessing whether AI agents can execute long-horizon, cross-application tasks created by investment banking analysts, management consultants, and corporate lawyers. APEX-Agents requires agents to navigate realistic work environments with files and tools. We test eight agents for the leaderboard using Pass@1. Gemini 3 Flash (Thinking=High) achieves the highest score of 24.0%, followed by GPT-5.2 (Thinking=High), Claude Opus 4.5 (Thinking=High), and Gemini 3 Pro (Thinking=High). We open source the APEX-Agents benchmark (n=480) with all prompts, rubrics, gold outputs, files, and metadata. We also open-source Archipelago, our infrastructure for agent execution and evaluation.
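The McNemar exact test and Benjamini-Hochberg correction named in the insights are standard procedures. A self-contained sketch using only the Python standard library; the discordant counts `b` and `c` would come from per-task pass/fail records, which are assumptions here:

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from discordant-pair counts:
    b = tasks only agent A passed, c = tasks only agent B passed.
    Under H0 the split of the b + c discordant tasks is Binomial(b+c, 0.5).
    (The doubled tail is clipped at 1.0, a common convention.)"""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false
    discovery rate across multiple pairwise comparisons)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    prev = 1.0
    for rank in range(m, 0, -1):      # walk from largest rank down,
        i = order[rank - 1]           # enforcing monotonicity
        prev = min(prev, pvals[i] * m / rank)
        adj[i] = prev
    return adj
```

For example, `mcnemar_exact_p(9, 1)` gives roughly 0.0215: a 9-to-1 split of discordant tasks is unlikely under the null of equal agent performance.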
Why are we recommending this paper?
Because AI agents are a popular topic and you have fewer than three interests with available recommendations
The development of AI agents designed to tackle complex tasks, as presented in this Mercor paper, is highly relevant to understanding how automation and intelligent systems might exacerbate or mitigate inequality. It offers a novel approach to analyzing labor market dynamics.
Renmin University of China
AI Insights
- Agentic capabilities: Fundamental skills like exploration, tool use, and self-verification. (ML: 0.96)👍👎
- Current results have limitations, such as generated videos being limited to simple animations and composed music lacking expressiveness and creativity. (ML: 0.95)👍👎
- The agentic capability benchmark provided by LLM-in-Sandbox can be used to evaluate models' ability to leverage computational environments. (ML: 0.94)👍👎
- Strong LLMs exhibit emergent capabilities to leverage the sandbox environment for general tasks. (ML: 0.92)👍👎
- LLM-in-Sandbox can be used as an agentic capability benchmark, measuring fundamental skills like exploration, tool use, and self-verification. (ML: 0.91)👍👎
- The metric ∆=LLM-in-Sandbox−LLM offers a meaningful indicator of a model's ability to leverage computational environments. (ML: 0.90)👍👎
- LLM-in-Sandbox has the potential to become the default paradigm for serving LLMs, enabling them to perform general tasks and produce actual outputs rather than text descriptions. (ML: 0.88)👍👎
- LLM-in-Sandbox: A paradigm that grants LLMs access to a virtual computer and enables them to leverage this environment for general tasks. (ML: 0.85)👍👎
- Sandbox-native model training: Training models to interact with the sandbox environment as a first-class objective. (ML: 0.82)👍👎
Abstract
We introduce LLM-in-Sandbox, enabling LLMs to explore within a code sandbox (i.e., a virtual computer), to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit generalization capabilities to leverage the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.
Why are we recommending this paper?
Because AI agents are a popular topic and you have fewer than three interests with available recommendations
This Renmin University paper explores the potential of LLMs to exhibit general intelligence, which could have significant implications for understanding and addressing systemic inequalities. The focus on agentic capabilities is particularly pertinent.
Florida Institute of Technology
AI Insights
- AI also has its limitations and challenges, including the issue of impostor bias, where a system may mistakenly identify a legitimate file or activity as malicious. (ML: 0.96)👍👎
- Limited accuracy: AI systems may not always accurately identify malicious files or activities. (ML: 0.96)👍👎
- Impostor bias: AI systems may mistakenly identify a legitimate file or activity as malicious. (ML: 0.94)👍👎
- To address these challenges, researchers are working on developing more accurate and reliable AI systems for digital forensics. (ML: 0.93)👍👎
- Artificial Intelligence (AI): A type of computer system that can perform tasks that would typically require human intelligence, such as learning, problem-solving, and decision-making. (ML: 0.92)👍👎
- Digital Forensics: The process of collecting, analyzing, and preserving evidence related to cybercrime and other digital crimes. (ML: 0.92)👍👎
- The use of artificial intelligence (AI) in digital forensics is becoming increasingly important as cybercrime continues to grow. (ML: 0.92)👍👎
- The use of AI in digital forensics is becoming increasingly important, but it also has its limitations and challenges. (ML: 0.90)👍👎
Abstract
In an era where cyber threats are rapidly evolving, the reliability of cyber forensic analysis has become increasingly critical for effective digital investigations and cybersecurity responses. AI agents are being adopted across digital forensic practices due to their ability to automate processes such as anomaly detection, evidence classification, and behavioral pattern recognition, significantly enhancing scalability and reducing investigation timelines. However, the characteristics that make AI indispensable also introduce notable risks. AI systems, often trained on biased or incomplete datasets, can produce misleading results, including false positives and false negatives, thereby jeopardizing the integrity of forensic investigations. This study presents a meticulous comparative analysis of the effectiveness of the most used AI agent, ChatGPT, and human forensic investigators in the realm of cyber forensic analysis. Our research reveals critical limitations within AI-driven approaches, demonstrating scenarios in which sophisticated or novel cyber threats remain undetected due to the rigid pattern-based nature of AI systems. Conversely, our analysis highlights the crucial role that human forensic investigators play in mitigating these risks. Through adaptive decision-making, ethical reasoning, and contextual understanding, human investigators effectively identify subtle anomalies and threats that may evade automated detection systems. To reinforce our findings, we conducted comprehensive reliability testing of forensic techniques using multiple cyber threat scenarios. These tests confirmed that while AI agents significantly improve the efficiency of routine analyses, human oversight remains crucial in ensuring accuracy and comprehensiveness of the results.
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations
The application of AI agents in cyber forensic analysis, as detailed in this Florida Institute of Technology paper, offers a lens through which to examine how technology might impact the detection and response to inequality-related crimes and vulnerabilities.
University of Sussex
AI Insights
- Higher RCP values indicate greater climate change impacts, while lower values suggest less severe effects. (ML: 0.93)👍👎
- Some countries have higher RCP values, such as Mexico (0.14955) and Nigeria (0.05318), suggesting they are more vulnerable to climate change due to their high greenhouse gas emissions. (ML: 0.92)👍👎
- The data also includes some small island nations, such as the Marshall Islands (0.00003) and Maldives (0.00029), which may be more susceptible to climate-related issues due to their geographical location. (ML: 0.92)👍👎
- Country codes: Two-letter country codes, such as 'US' for the United States or 'CN' for China. (ML: 0.90)👍👎
- The data provides a snapshot of countries' vulnerability to climate change based on their greenhouse gas emissions. (ML: 0.90)👍👎
- Other countries have lower RCP values, like Liechtenstein (0.00033) and Monaco (0.00042), indicating they are less affected by climate change. (ML: 0.89)👍👎
- Values: RCP values represent the concentration of carbon dioxide in the atmosphere and are used to estimate climate change impacts. (ML: 0.88)👍👎
- The RCP values range from 0.03 to 8.55, indicating varying levels of carbon dioxide emissions and climate change impacts across different regions. (ML: 0.85)👍👎
- RCP: Representative Concentration Pathway - a measure of greenhouse gas emissions used to estimate climate change impacts. (ML: 0.84)👍👎
- The data appears to be a list of countries with their corresponding RCP (Representative Concentration Pathway) values, which are used to estimate the concentration of greenhouse gases in the atmosphere. (ML: 0.80)👍👎
Abstract
We estimate the national social cost of carbon using a recent meta-analysis of the total impact of climate change and a standard integrated assessment model. The average social cost of carbon closely follows per capita income, the national social cost of carbon the size of the population. The national social cost of carbon measures self-harm. Net liability is defined as the harm done by a country's emissions on other countries minus the harm done to a country by other countries' emissions. Net liability is positive in middle-income, carbon-intensive countries. Poor and rich countries would be compensated because their current emissions are relatively low, poor countries additionally because they are vulnerable.
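The abstract's definition of net liability can be sketched with a hypothetical country-by-country harm matrix; all names and numbers below are invented for illustration:

```python
# Sketch of the abstract's "net liability": harm a country's emissions
# impose on other countries minus harm other countries' emissions impose
# on it. harm[i][j] is a made-up matrix entry: the damage country i's
# emissions cause in country j (the diagonal is self-harm, i.e. the
# national social cost of carbon).

def net_liability(harm, i):
    n = len(harm)
    exported = sum(harm[i][j] for j in range(n) if j != i)
    imported = sum(harm[j][i] for j in range(n) if j != i)
    return exported - imported

# Toy 3-country example (invented numbers):
harm = [
    [5.0, 2.0, 1.0],   # country 0: self-harm 5, exports 3 in damages
    [0.5, 1.0, 0.5],
    [0.2, 0.3, 4.0],
]
```

By construction, net liabilities sum to zero across all countries, which is what makes the paper's compensation framing possible: positive-liability countries (middle-income, carbon-intensive) would compensate negative-liability ones.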
Why are we recommending this paper?
Due to your Interest in Social Inequality
International Institute of Information Technology, Hyderabad
AI Insights
- This could lead to difficulties in applying the result to more general cases. (ML: 0.98)👍👎
- Let G be a graph with vertex set V(G) and edge set E(G). (ML: 0.93)👍👎
- The problem statement involves deriving a bound on the number of edges in a graph G, given certain properties of the graph. (ML: 0.90)👍👎
- A d-set is defined as a subset S ⊆ V(G) such that |S| = d. (ML: 0.89)👍👎
- The final answer is a bound on the number of edges in G, expressed in terms of |V(G)| and other parameters. (ML: 0.88)👍👎
- The solution relies heavily on the properties of the graph G, which may not be explicitly stated. (ML: 0.88)👍👎
- This leads to a bound on the number of edges in G. (ML: 0.85)👍👎
- The parameter t_d is defined as the number of d-sets in G, i.e., t_d = |F_d|. (ML: 0.84)👍👎
- The goal is to find an upper limit for |E(G)| based on the size of the vertex set V(G). (ML: 0.81)👍👎
- The solution involves using Shearer's lemma and the concept of d-sets to derive an inequality involving the entropy of the random variable Xn. (ML: 0.81)👍👎
- The final answer is derived by manipulating the inequality obtained from Shearer's lemma, which ultimately yields an expression for |E(G)| in terms of |V(G)| and other parameters. (ML: 0.79)👍👎
- The solution involves using Shearer's lemma to derive an inequality involving the entropy of Xn, which ultimately leads to the desired bound on |E(G)|. (ML: 0.74)👍👎
- The parameters m_d and ℓ_d are defined as follows: m_d = 2^d and ℓ_d = n - d + 1. (ML: 0.72)👍👎
Abstract
It is well known that there is a strong connection between entropy inequalities and submodularity, since the entropy of a collection of random variables is a submodular function. Unifying frameworks for information inequalities arising from submodularity were developed by Madiman and Tetali (2010) and Sason (2022). Madiman and Tetali (2010) established strong and weak fractional inequalities that subsume classical results such as Han's inequality and Shearer's lemma. Sason (2022) introduced a convex-functional framework for generalizing Han's inequality, and derived unified inequalities for submodular and supermodular functions. In this work, we build on these frameworks and make three contributions. First, we establish convex-functional generalizations of the strong and weak Madiman and Tetali inequalities for submodular functions. Second, using a special case of the strong Madiman-Tetali inequality, we derive a new Loomis-Whitney-type projection inequality for finite point sets in $\mathbb{R}^d$, which improves upon the classical Loomis-Whitney bound by incorporating slice-level structural information. Finally, we study an extremal graph theory problem that recovers and extends the previously known results of Sason (2022) and Boucheron et al., employing Shearer's lemma in contrast to the use of Han's inequality in those works.
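For reference, the two classical results the abstract builds on can be stated in their standard textbook forms (the paper's contributions are refinements of these, not reproduced here):

```latex
% Shearer's lemma: if every index i \in [n] lies in at least k members
% of a family \mathcal{F} of subsets of [n], and X_S = (X_i)_{i \in S}, then
H(X_1,\dots,X_n) \;\le\; \frac{1}{k}\sum_{S\in\mathcal{F}} H(X_S).

% Classical Loomis--Whitney inequality for a finite set A \subset \mathbb{R}^d,
% where \pi_i deletes the i-th coordinate (it follows from Shearer's lemma
% with \mathcal{F} the (d-1)-element subsets of [d], so k = d-1):
|A|^{\,d-1} \;\le\; \prod_{i=1}^{d} |\pi_i(A)|.
```

The paper's new projection inequality sharpens the second bound by incorporating slice-level structural information.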
Why are we recommending this paper?
Due to your Interest in Inequality
University of Amsterdam
AI Insights
- The consistency requirement proposed by the authors is not just statistical frequency but having context-relative grounds for expecting further outputs of comparable novelty and value. (ML: 0.97)👍👎
- The concept of creativity should remain flexible across different domains of creativity, and the indeterminacy of the consistency requirement allows for this flexibility. (ML: 0.96)👍👎
- The consistency requirement proposed by the authors is a more inclusive and functional approach to defining creativity, allowing for non-human natural processes to be labelled 'creative'. (ML: 0.96)👍👎
- The consistency requirement proposed by the authors may not be applicable in all contexts, especially where authenticity conditions the value of the products being generated or examined. (ML: 0.94)👍👎
- The IAC has functional value in specific local contexts, such as cognitive science, jurisprudence, and certain domains of creative practice where authenticity conditions the value of the products being generated or examined. (ML: 0.94)👍👎
- New Standard Definition (NSD) of Creativity: An object is creative if it is novel, valuable, and the product of a system that can consistently generate novel and valuable objects. (ML: 0.92)👍👎
- The article proposes a new standard definition (NSD) of creativity, which drops the intentional agency condition (IAC) as a necessary condition of creativity. (ML: 0.89)👍👎
- The article does not provide a comprehensive account of where the IAC ought to be applied. (ML: 0.89)👍👎
- The IAC should be excluded from our definition of the genus of creativity but retained as a means of distinguishing between certain species of creativity. (ML: 0.89)👍👎
- Intentional Agency Condition (IAC): A necessary condition of creativity that requires an agent to intentionally endeavor to express themselves. (ML: 0.82)👍👎
Abstract
Many theorists of creativity maintain that intentional agency is a necessary condition of creativity. We argue that this requirement, which we call the Intentional Agency Condition (IAC), should be rejected as a general condition of creativity, while retaining its relevance in specific contexts. We show that recent advances in generative AI have rendered the IAC increasingly problematic, both descriptively and functionally. We offer two reasons for abandoning it at the general level. First, we present corpus evidence indicating that authors and journalists are increasingly comfortable ascribing creativity to generative AI, despite its lack of intentional agency. This development places pressure on the linguistic intuitions that have traditionally been taken to support the IAC. Second, drawing on the method of conceptual engineering, we argue that the IAC no longer fulfils its core social function. Rather than facilitating the identification and encouragement of reliable sources of novel and valuable products, it now feeds into biases that distort our assessments of AI-generated outputs. We therefore propose replacing the IAC with a consistency requirement, according to which creativity tracks the reliable generation of novel and valuable products. Nonetheless, we explain why the IAC should be retained in specific local domains.
Why are we recommending this paper?
Because AI and society is a popular topic and you have fewer than three interests with available recommendations
Sony
AI Insights
- The paper concludes that current XAI methods are based on flawed assumptions and lack a clear understanding of the relationship between humans and machines. (ML: 0.98)👍👎
- Apparatuses: The technical tools, methods, and narratives that constitute what is made intelligible and what is excluded from intelligibility in XAI practices. (ML: 0.97)👍👎
- The paper critiques the current state of Explainable AI (XAI) methods, arguing that they are based on flawed assumptions and lack a clear understanding of the relationship between humans and machines. (ML: 0.97)👍👎
- The paper highlights the limitations of current XAI methods, including their reliance on simplifications and abstractions that erase the original system, and their failure to account for human-machine incommensurability. (ML: 0.96)👍👎
- The authors propose an agential realist approach to XAI, which views interpretation as a relational co-production of interpretable phenomena through intra-actions between human and non-human agencies. (ML: 0.96)👍👎
- Agential cut: The moment at which an interpretive apparatus enacts a relational co-production of interpretable phenomena through intra-actions between human and non-human agencies. (ML: 0.96)👍👎
- Agential realism: A philosophical framework that views knowledge as an intra-action between human and non-human agencies. (ML: 0.94)👍👎
- Intra-action: The process by which human and non-human agencies co-produce interpretable phenomena through their entanglements. (ML: 0.92)👍👎
- The authors suggest that a diffractive optic offers a more philosophically robust reading of XAI practices, one that acknowledges the emergent nature of interpretation and the importance of situated contexts. (ML: 0.90)👍👎
- This approach challenges the dominant reflectivity and refractivity optics in XAI, which assume that meaning pre-exists the practices and beings that produce it. (ML: 0.75)👍👎
Abstract
Explainable AI (XAI) is frequently positioned as a technical problem of revealing the inner workings of an AI model. This position is affected by unexamined onto-epistemological assumptions: meaning is treated as immanent to the model, the explainer is positioned outside the system, and a causal structure is presumed recoverable through computational techniques. In this paper, we draw on Barad's agential realism to develop an alternative onto-epistemology of XAI. We propose that interpretations are material-discursive performances that emerge from situated entanglements of the AI model with humans, context, and the interpretative apparatus. To develop this position, we read a comprehensive set of XAI methods through agential realism and reveal the assumptions and limitations that underpin several of these methods. We then articulate the framework's ethical dimension and propose design directions for XAI interfaces that support emergent interpretation, using a speculative text-to-music interface as a case study.
Why are we recommending this paper?
Because AI and society is a popular topic and you have fewer than three interests with available recommendations
Stanford University
AI Insights
- LLM: Large Language Model. RL: Reinforcement Learning. ML engineering tasks: machine learning tasks that heavily depend on feature engineering and hyper-parameter tuning rather than algorithm development. (ML: 0.97)👍👎
- The paper demonstrates the feasibility and potential of automated execution feedback loops in LLM research problems, but highlights remaining limitations that need to be addressed. (ML: 0.96)👍👎
- Execution grounding for code: The idea of learning from execution feedback in the code generation domain. (ML: 0.96)👍👎
- Future work should focus on improving generalizability testing, exploring richer learning signals from execution trajectories, developing more capable execution agents, and incorporating alternative metrics such as idea novelty and interestingness. (ML: 0.95)👍👎
- They find that models tend to converge on simple ideas to improve the average reward but lose diversity and do not improve the upper-bound. (ML: 0.95)👍👎
- The paper presents a large-scale parallel executor for automatically executing model-generated ideas to verify their effectiveness on open-ended LLM research problems. (ML: 0.92)👍👎
- The authors analyze the effectiveness of execution-guided evolutionary search and reinforcement learning with execution rewards. (ML: 0.86)👍👎
- The paper highlights the limitations of current experiments, including a lack of generalizability testing, limited exploration incentives in RL objectives, and noise in the reward signal due to the execution agent's capabilities. (ML: 0.84)👍👎
Abstract
Automated AI research holds great potential to accelerate scientific discovery. However, current LLMs often generate plausible-looking but ineffective ideas. Execution grounding may help, but it is unclear whether automated execution is feasible and whether LLMs can learn from the execution feedback. To investigate these, we first build an automated executor to implement ideas and launch large-scale parallel GPU experiments to verify their effectiveness. We then convert two realistic research problems - LLM pre-training and post-training - into execution environments and demonstrate that our automated executor can implement a large fraction of the ideas sampled from frontier LLMs. We analyze two methods to learn from the execution feedback: evolutionary search and reinforcement learning. Execution-guided evolutionary search is sample-efficient: it finds a method that significantly outperforms the GRPO baseline (69.4% vs 48.0%) on post-training, and finds a pre-training recipe that outperforms the nanoGPT baseline (19.7 minutes vs 35.9 minutes) on pre-training, all within just ten search epochs. Frontier LLMs often generate meaningful algorithmic ideas during search, but they tend to saturate early and only occasionally exhibit scaling trends. Reinforcement learning from execution reward, on the other hand, suffers from mode collapse. It successfully improves the average reward of the ideator model but not the upper-bound, due to models converging on simple ideas. We thoroughly analyze the executed ideas and training dynamics to facilitate future efforts towards execution-grounded automated AI research.
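The execution-guided evolutionary search the abstract describes can be caricatured as a score-keep-mutate loop. This is a schematic sketch only, not the paper's executor; `execute` and `mutate` are placeholders for running a real experiment and varying a candidate idea:

```python
import random

# Schematic sketch (assumed structure, not the paper's implementation):
# keep a population of candidate "ideas", score each by executing it,
# retain the top performers, and refill the population with mutations.

def evolutionary_search(seed_ideas, execute, mutate, epochs=10, keep=4):
    """execute(idea) -> scalar reward from running the idea's experiment;
    mutate(idea)  -> a varied candidate derived from a surviving idea."""
    population = list(seed_ideas)
    best = None
    for _ in range(epochs):
        scored = sorted(population, key=execute, reverse=True)
        survivors = scored[:keep]
        if best is None or execute(survivors[0]) > execute(best):
            best = survivors[0]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(len(population) - keep)]
    return best

# Toy usage: "ideas" are numbers, reward is closeness to a target value.
random.seed(0)
target = 42.0
best = evolutionary_search(
    seed_ideas=[0.0, 10.0, 90.0, 50.0, 25.0, 70.0],
    execute=lambda x: -abs(x - target),
    mutate=lambda x: x + random.uniform(-5, 5),
    epochs=30)
```

The paper's contrast with RL is visible even in this caricature: the search keeps a diverse survivor pool, whereas an RL objective that raises average reward can collapse onto one simple idea.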
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations
Purdue University
AI Insights
- The authors acknowledge that their study is limited by its reliance on a small number of datasets. (ML: 0.99)👍👎
- Domain shift: a phenomenon where the distribution of data in the training set differs from that of the testing set. (ML: 0.98)👍👎
- The development of DSCF highlights the need for large-scale and diverse training data. (ML: 0.97)👍👎
- Recent works have proposed unsupervised domain adaptation frameworks, but their effectiveness beyond the originally reported datasets is yet to be independently evaluated. (ML: 0.95)👍👎
- The results of this benchmarking experiment have shown that classifying test samples that are in-distribution to the training dataset is significantly easier than test samples suffering from distribution shift due to changes in instruments and acquisition conditions, and additional contaminants. (ML: 0.94)👍👎
- Foundation model: a pre-trained model that can be fine-tuned for specific tasks, often using transfer learning. (ML: 0.92)👍👎
- SANet demonstrated the best overall performance across the datasets. (ML: 0.84)👍👎
- The study benchmarks only five architectures and relies on minimal spectral pre-processing. (ML: 0.77)👍👎
- Existing open-source Raman datasets are often restricted in size, chemical diversity or experimental variability. (ML: 0.67)👍👎
- Creating large, curated experimental Raman spectral datasets that span multiple instruments, materials and measurement settings is key to developing a Raman-specific foundation model. (ML: 0.61)👍👎
- Raman spectroscopy: a technique used to analyze the vibrational modes of molecules. (ML: 0.52)👍👎
Abstract
Deep learning classifiers for Raman spectroscopy are increasingly reported to outperform classical chemometric approaches. However, their evaluations are often conducted in isolation or compared against traditional machine learning methods or trivially adapted vision-based architectures that were not originally proposed for Raman spectroscopy. As a result, direct comparisons between existing deep learning models developed specifically for Raman spectral analysis on shared open-source datasets remain scarce. To the best of our knowledge, this study presents one of the first systematic benchmarks comparing three or more published Raman-specific deep learning classifiers across multiple open-source Raman datasets. We evaluate five representative deep learning architectures under a unified training and hyperparameter tuning protocol across three open-source Raman datasets selected to support standard evaluation, fine-tuning, and explicit distribution-shift testing. We report classification accuracies and macro-averaged F1 scores to provide a fair and reproducible comparison of deep learning models for Raman spectra-based classification.
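The macro-averaged F1 reported in this abstract is a standard metric; a minimal generic sketch (not the paper's own evaluation code) makes clear why it treats rare and common classes equally:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores,
    so every class contributes equally regardless of its frequency."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Under distribution shift, macro F1 often drops faster than plain accuracy, because errors concentrated in a few classes drag down their per-class F1 directly.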
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than three interests with available recommendations.
Quantinuum Ltd
AI Insights - L1 Relative Change (L1RC): A measure of the difference between two probability distributions. (ML: 0.98)👍👎
- Signal-to-Noise Ratio (SNR): The ratio of the signal power to the noise power in a system. (ML: 0.93)👍👎
- However, on Real Pauli data the advantage clearly shifts toward the ML-based models, which outperform all baselines in both median L1 relative change and fraction of improved circuits. (ML: 0.93)👍👎
- Deep learning models can learn corrections directly from data gathered during circuit runs, more easily capturing correlations. (ML: 0.88)👍👎
- The best performing models are comparable to the best baseline methods on Simulated data (both Pauli and Random). (ML: 0.87)👍👎
- It is defined as the L1 norm of the difference between the two distributions. (ML: 0.87)👍👎
- The learned mapping from P noisy and circuit features to P ideal captures a richer structure that goes beyond coarse depolarization or measurement-error mitigation. (ML: 0.81)👍👎
- The PERCEIVER model consistently achieves as good or greater median performance than the baseline mitigation techniques for Pauli circuits. (ML: 0.80)👍👎
- The deep learning approaches can generalize across noise regimes, device generations, and circuit families without relying on a predefined noise model. (ML: 0.79)👍👎
- The baseline methods retain value as lightweight, interpretable mitigation techniques, particularly for structured, low-depth circuits. (ML: 0.61)👍👎
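The L1 distance underlying the L1RC metric above is straightforward to sketch. Note the "relative change" normalization below is our assumption for illustration, not taken from the paper:

```python
def l1_distance(p, q):
    """L1 norm of the difference between two probability distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def l1_relative_change(p_noisy, p_mitigated, p_ideal):
    """Hypothetical form of L1RC: the change in L1 error to the ideal
    distribution after mitigation, relative to the unmitigated error.
    Negative values mean the mitigated output is closer to the ideal."""
    return (l1_distance(p_mitigated, p_ideal)
            - l1_distance(p_noisy, p_ideal)) / l1_distance(p_noisy, p_ideal)
```

For example, mitigating a noisy two-outcome distribution [0.5, 0.5] to [0.9, 0.1] against an ideal [1.0, 0.0] gives a relative change of -0.8, i.e. an 80% reduction in L1 error.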
Abstract
We present a systematic investigation of deep learning methods applied to quantum error mitigation of noisy output probability distributions from measured quantum circuits. We compare different architectures, from fully connected neural networks to transformers, and we test different design/training modalities, identifying sequence-to-sequence, attention-based models as the most effective on our datasets. These models consistently produce mitigated distributions that are closer to the ideal outputs when tested on both simulated and real device data obtained from IBM superconducting quantum processing units (QPU) up to five qubits. Across several different circuit depths, our approach outperforms other baseline error mitigation techniques. We perform a series of ablation studies to examine: how different input features (circuit, device properties, noisy output statistics) affect performance; cross-dataset generalization across circuit families; and transfer learning to a different IBM QPU. We observe that generalization performance across similar devices with the same architecture works effectively, without needing to fully retrain models.
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than three interests with available recommendations.
UC Santa Cruz
AI Insights - However, they often require large amounts of data and computational resources to train, which can be a limitation. (ML: 0.98)👍👎
- The use of text-to-image diffusion models for image editing has been explored by several researchers, including those who have developed datasets such as Qwen-Image and Omnigen2; these models require large amounts of data and computational resources to train. (ML: 0.94)👍👎
- Text-to-image diffusion models have become increasingly popular in recent years, with many researchers exploring their potential applications. (ML: 0.93)👍👎
- Text-to-image diffusion models are a type of artificial intelligence that can generate images from text descriptions. (ML: 0.91)👍👎
- They have many potential applications, but require large amounts of data and computational resources to train. (ML: 0.91)👍👎
- These models can be used for various tasks such as image editing, object removal, text-to-image synthesis, and instruction-guided image editing. (ML: 0.89)👍👎
Abstract
Recent image generation models have shown impressive progress, yet they often struggle to yield controllable and consistent results when users attempt to edit specific elements within an existing image. Layered representations enable flexible, user-driven content creation, but existing approaches often fail to produce layers with coherent compositing relationships, and their object layers typically lack realistic visual effects such as shadows and reflections. To overcome these limitations, we propose LASAGNA, a novel, unified framework that generates an image jointly with its composing layers--a photorealistic background and a high-quality transparent foreground with compelling visual effects. Unlike prior work, LASAGNA efficiently learns correct image composition from a wide range of conditioning inputs--text prompts, foreground, background, and location masks--offering greater controllability for real-world applications. To enable this, we introduce LASAGNA-48K, a new dataset composed of clean backgrounds and RGBA foregrounds with physically grounded visual effects. We also propose LASAGNABENCH, the first benchmark for layer editing. We demonstrate that LASAGNA excels in generating highly consistent and coherent results across multiple image layers simultaneously, enabling diverse post-editing applications that accurately preserve identity and visual effects. LASAGNA-48K and LASAGNABENCH will be publicly released to foster open research in the community. The project page is https://rayjryang.github.io/LASAGNA-Page/.
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than three interests with available recommendations.
Nanjing University NJU
AI Insights - StableWorld also alleviates error accumulation in autoregressive video generation, resulting in more stable, consistent, and higher-quality long videos. (ML: 0.87)👍👎
- Autoregressive video generation: A technique where each frame is generated based on the previous one(s), often leading to error accumulation. (ML: 0.85)👍👎
- Geometric similarity: A measure of how similar two frames are based on their geometric structure. (ML: 0.82)👍👎
- Sliding window approach: A method where a fixed-size window is moved over the sequence, and the most recent frames are used to generate new ones. (ML: 0.82)👍👎
- The paper proposes a method called StableWorld for long-horizon interactive video generation, which aims to prevent error accumulation and maintain temporal consistency. (ML: 0.81)👍👎
- StableWorld effectively prevents cumulative errors by continuously filtering out degraded frames while maintaining coherent motion, resulting in more stable and temporally consistent interactive video sequences. (ML: 0.77)👍👎
- The method's ability to identify and discard a large number of drifted frames during generation has the potential to reduce training cost and aligns naturally with future extensions toward memory-augmented world models. (ML: 0.76)👍👎
- ORB (Oriented FAST and Rotated BRIEF): A feature detector that extracts keypoints with their descriptors for matching purposes. (ML: 0.63)👍👎
- StableWorld uses a sliding window approach with dynamic frame eviction based on geometric similarity computed using ORB features. (ML: 0.63)👍👎
- The method is evaluated on several benchmarks, including Matrix-Game 2.0, Hunyuan-Gamecraft 1.0, Open-Oasis, and Self-Forcing, showing improved stability and consistency in long-horizon generation. (ML: 0.59)👍👎
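The sliding-window eviction idea described above can be sketched in a few lines. This is a hypothetical simplification with a pluggable similarity function (StableWorld uses ORB-based geometric similarity; the threshold and window size here are illustrative assumptions):

```python
from collections import deque

def evict_drifted_frames(frames, similarity, threshold=0.5, window=4):
    """Hypothetical sketch of dynamic frame eviction: maintain a sliding
    window of recent frames, and drop any new frame whose geometric
    similarity to the last retained frame falls below a threshold, so
    drifted frames never propagate their errors to later generations."""
    kept = deque(maxlen=window)  # only the most recent frames condition generation
    for frame in frames:
        if not kept or similarity(kept[-1], frame) >= threshold:
            kept.append(frame)
        # otherwise the frame is treated as drifted and evicted
    return list(kept)
```

With a toy similarity of 1 minus absolute difference, a drifted frame 0.9 in the sequence [0.0, 0.1, 0.9, 0.2] is evicted while the geometrically consistent frames are retained.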
Abstract
In this paper, we explore the overlooked challenge of stability and temporal consistency in interactive video generation, which synthesizes dynamic and controllable video worlds through interactive behaviors such as camera movements and text prompts. Despite remarkable progress in world modeling, current methods still suffer from severe instability and temporal degradation, often leading to spatial drift and scene collapse during long-horizon interactions. To better understand this issue, we initially investigate the underlying causes of instability and identify that the major source of error accumulation originates from the same scene, where generated frames gradually deviate from the initial clean state and propagate errors to subsequent frames. Building upon this observation, we propose a simple yet effective method, StableWorld, a Dynamic Frame Eviction Mechanism. By continuously filtering out degraded frames while retaining geometrically consistent ones, StableWorld effectively prevents cumulative drift at its source, leading to more stable and temporally consistent interactive generation. Promising results on multiple interactive video models, e.g., Matrix-Game, Open-Oasis, and Hunyuan-GameCraft, demonstrate that StableWorld is model-agnostic and can be applied to different interactive video generation frameworks to substantially improve stability, temporal consistency, and generalization across diverse interactive scenarios.
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than three interests with available recommendations.
We did not find much content matching your interests, so we've included some additional topics that are popular.
Also be aware that if a topic is not present on arXiv, we won't be able to recommend it.
Mercor
AI Insights - McNemar's exact test: A statistical test used to compare the performance of two related samples. (ML: 0.97)👍👎
- Pass@1: The proportion of tasks completed correctly by an agent. (ML: 0.95)👍👎
- Significance tests using McNemar's exact test with Benjamini-Hochberg correction show that Kimi-K2-Thinking differs significantly from Gemini-3-flash-preview (p=5.68e-23) and GPT-5.2 (p=7.29e-10), but not from GPT-OSS-120B (p=1.0000). (ML: 0.95)👍👎
- The APEX–Agents benchmark highlights the importance of developing AI models that can perform complex tasks in various professional domains, with a focus on toolbelt approaches, context window management, and intentional termination. (ML: 0.94)👍👎
- Benjamini-Hochberg correction: A method for controlling false discovery rate in multiple testing. (ML: 0.94)👍👎
- The APEX–Agents benchmark is a comprehensive evaluation of AI models' ability to perform complex tasks in various professional domains. (ML: 0.93)👍👎
- The most frequently used tools by agents are code execution (256,000), add tool to the toolbelt (200,000), list files in the file system (163,874), read spreadsheet tab (127,000), and search the PDF (86,000). (ML: 0.93)👍👎
- The benchmark consists of 227 tasks covering finance, law, and management consulting, each requiring the model to complete a specific objective using a set of provided tools. (ML: 0.89)👍👎
- The top-performing models on the APEX–Agents benchmark are Gemini 3 Flash, GPT-5.2, and Kimi K2 Thinking, with Pass@1 scores of 0.555, 0.497, and 0.391 respectively. (ML: 0.88)👍👎
- ReAct paradigm: A toolbelt approach where reasoning and acting are interleaved in a single loop. (ML: 0.79)👍👎
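The Pass@1 metric and McNemar's exact test mentioned above can be sketched concisely. This is a generic illustration, not the benchmark's evaluation code; the two-sided exact binomial form used below is one common convention:

```python
from math import comb

def pass_at_1(results):
    """Pass@1: the fraction of tasks the agent completes correctly
    on its first (and only) attempt. `results` is a list of 0/1 outcomes."""
    return sum(results) / len(results)

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts:
    b = tasks only agent A solved, c = tasks only agent B solved.
    Under the null, discordant outcomes are Binomial(b + c, 0.5)."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Equal discordant counts yield p = 1.0 (no evidence of a difference), which matches how a comparison such as the GPT-OSS-120B one above can be non-significant despite differing Pass@1 scores.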
Abstract
We introduce the AI Productivity Index for Agents (APEX-Agents), a benchmark for assessing whether AI agents can execute long-horizon, cross-application tasks created by investment banking analysts, management consultants, and corporate lawyers. APEX-Agents requires agents to navigate realistic work environments with files and tools. We test eight agents for the leaderboard using Pass@1. Gemini 3 Flash (Thinking=High) achieves the highest score of 24.0%, followed by GPT-5.2 (Thinking=High), Claude Opus 4.5 (Thinking=High), and Gemini 3 Pro (Thinking=High). We open source the APEX-Agents benchmark (n=480) with all prompts, rubrics, gold outputs, files, and metadata. We also open-source Archipelago, our infrastructure for agent execution and evaluation.
Why are we recommending this paper?
Because AI agents is a popular topic and you have fewer than three interests with available recommendations.
Renmin University of China
AI Insights - Agentic capabilities: Fundamental skills like exploration, tool use, and self-verification. (ML: 0.96)👍👎
- Current results have limitations, such as generated videos being limited to simple animations and composed music lacking expressiveness and creativity. (ML: 0.95)👍👎
- The agentic capability benchmark provided by LLM-in-Sandbox can be used to evaluate models' ability to leverage computational environments. (ML: 0.94)👍👎
- Strong LLMs exhibit emergent capabilities to leverage the sandbox environment for general tasks. (ML: 0.92)👍👎
- LLM-in-Sandbox can be used as an agentic capability benchmark, measuring fundamental skills like exploration, tool use, and self-verification. (ML: 0.91)👍👎
- The metric ∆=LLM-in-Sandbox−LLM offers a meaningful indicator of a model's ability to leverage computational environments. (ML: 0.90)👍👎
- LLM-in-Sandbox has the potential to become the default paradigm for serving LLMs, enabling them to perform general tasks and produce actual outputs rather than text descriptions. (ML: 0.88)👍👎
- LLM-in-Sandbox: A paradigm that grants LLMs access to a virtual computer and enables them to leverage this environment for general tasks. (ML: 0.85)👍👎
- Sandbox-native model training: Training models to interact with the sandbox environment as a first-class objective. (ML: 0.82)👍👎
- LLM-in-Sandbox is a paradigm that grants LLMs access to a virtual computer and enables them to leverage this environment for general tasks. (ML: 0.80)👍👎
Abstract
We introduce LLM-in-Sandbox, enabling LLMs to explore within a code sandbox (i.e., a virtual computer), to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit generalization capabilities to leverage the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.
Why are we recommending this paper?
Because AI agents is a popular topic and you have fewer than three interests with available recommendations.
University of Amsterdam
AI Insights - The consistency requirement proposed by the authors is not mere statistical frequency but the possession of context-relative grounds for expecting further outputs of comparable novelty and value. (ML: 0.97)👍👎
- The concept of creativity should remain flexible across different domains of creativity, and the indeterminacy of the consistency requirement allows for this flexibility. (ML: 0.96)👍👎
- The consistency requirement proposed by the authors is a more inclusive and functional approach to defining creativity, allowing for non-human natural processes to be labelled 'creative'. (ML: 0.96)👍👎
- The consistency requirement proposed by the authors may not be applicable in all contexts, especially where authenticity conditions the value of the products being generated or examined. (ML: 0.94)👍👎
- The IAC has functional value in specific local contexts, such as cognitive science, jurisprudence, and certain domains of creative practice where authenticity conditions the value of the products being generated or examined. (ML: 0.94)👍👎
- New Standard Definition (NSD) of Creativity: An object is creative if it is novel, valuable, and the product of a system that can consistently generate novel and valuable objects. (ML: 0.92)👍👎
- The NSD states that an object is creative if it is novel, valuable, and the product of a system that can consistently generate novel and valuable objects. (ML: 0.91)👍👎
- The article proposes a new standard definition (NSD) of creativity, which drops the intentional agency condition (IAC) as a necessary condition of creativity. (ML: 0.89)👍👎
- The article does not provide a comprehensive account of where the IAC ought to be applied. (ML: 0.89)👍👎
- The IAC should be excluded from our definition of the genus of creativity but retained as a means of distinguishing between certain species of creativity. (ML: 0.89)👍👎
- Intentional Agency Condition (IAC): A necessary condition of creativity that requires an agent to intentionally endeavor to express themselves. (ML: 0.82)👍👎
Abstract
Many theorists of creativity maintain that intentional agency is a necessary condition of creativity. We argue that this requirement, which we call the Intentional Agency Condition (IAC), should be rejected as a general condition of creativity, while retaining its relevance in specific contexts. We show that recent advances in generative AI have rendered the IAC increasingly problematic, both descriptively and functionally. We offer two reasons for abandoning it at the general level. First, we present corpus evidence indicating that authors and journalists are increasingly comfortable ascribing creativity to generative AI, despite its lack of intentional agency. This development places pressure on the linguistic intuitions that have traditionally been taken to support the IAC. Second, drawing on the method of conceptual engineering, we argue that the IAC no longer fulfils its core social function. Rather than facilitating the identification and encouragement of reliable sources of novel and valuable products, it now feeds into biases that distort our assessments of AI-generated outputs. We therefore propose replacing the IAC with a consistency requirement, according to which creativity tracks the reliable generation of novel and valuable products. Nonetheless, we explain why the IAC should be retained in specific local domains.
Why are we recommending this paper?
Because AI and society is a popular topic and you have fewer than three interests with available recommendations.
Sony
AI Insights - The paper concludes that current XAI methods are based on flawed assumptions and lack a clear understanding of the relationship between humans and machines. (ML: 0.98)👍👎
- Apparatuses: The technical tools, methods, and narratives that constitute what is made intelligible and what is excluded from intelligibility in XAI practices. (ML: 0.97)👍👎
- The paper critiques the current state of Explainable AI (XAI) methods, arguing that they are based on flawed assumptions and lack a clear understanding of the relationship between humans and machines. (ML: 0.97)👍👎
- The paper highlights the limitations of current XAI methods, including their reliance on simplifications and abstractions that erase the original system, and their failure to account for human-machine incommensurability. (ML: 0.96)👍👎
- The authors propose an agential realist approach to XAI, which views interpretation as a relational co-production of interpretable phenomena through intra-actions between human and non-human agencies. (ML: 0.96)👍👎
- Agential cut: The moment at which an interpretive apparatus enacts a relational co-production of interpretable phenomena through intra-actions between human and non-human agencies. (ML: 0.96)👍👎
- Agential realism: A philosophical framework that views knowledge as an intra-action between human and non-human agencies. (ML: 0.94)👍👎
- Intra-action: The process by which human and non-human agencies co-produce interpretable phenomena through their entanglements. (ML: 0.92)👍👎
- The authors suggest that a diffractive optic offers a more philosophically robust reading of XAI practices, one that acknowledges the emergent nature of interpretation and the importance of situated contexts. (ML: 0.90)👍👎
- This approach challenges the dominant reflectivity and refractivity optics in XAI, which assume that meaning pre-exists the practices and beings that produce it. (ML: 0.75)👍👎
Abstract
Explainable AI (XAI) is frequently positioned as a technical problem of revealing the inner workings of an AI model. This position is affected by unexamined onto-epistemological assumptions: meaning is treated as immanent to the model, the explainer is positioned outside the system, and a causal structure is presumed recoverable through computational techniques. In this paper, we draw on Barad's agential realism to develop an alternative onto-epistemology of XAI. We propose that interpretations are material-discursive performances that emerge from situated entanglements of the AI model with humans, context, and the interpretative apparatus. To develop this position, we read a comprehensive set of XAI methods through agential realism and reveal the assumptions and limitations that underpin several of these methods. We then articulate the framework's ethical dimension and propose design directions for XAI interfaces that support emergent interpretation, using a speculative text-to-music interface as a case study.
Why are we recommending this paper?
Because AI and society is a popular topic and you have fewer than three interests with available recommendations.
Stanford University
AI Insights - LLM: Large Language Model; RL: Reinforcement Learning; ML engineering tasks: machine learning tasks that depend heavily on feature engineering and hyper-parameter tuning rather than algorithm development. (ML: 0.97)👍👎
- The paper demonstrates the feasibility and potential of automated execution feedback loops in LLM research problems, but highlights remaining limitations that need to be addressed. (ML: 0.96)👍👎
- Execution grounding for code: The idea of learning from execution feedback in the code generation domain. (ML: 0.96)👍👎
- Future work should focus on improving generalizability testing, exploring richer learning signals from execution trajectories, developing more capable execution agents, and incorporating alternative metrics such as idea novelty and interestingness. (ML: 0.95)👍👎
- They find that models tend to converge on simple ideas to improve the average reward but lose diversity and do not improve the upper-bound. (ML: 0.95)👍👎
- The paper presents a large-scale parallel executor for automatically executing model-generated ideas to verify their effectiveness on open-ended LLM research problems. (ML: 0.92)👍👎
- The authors analyze the effectiveness of execution-guided evolutionary search and reinforcement learning with execution rewards. (ML: 0.86)👍👎
- The paper highlights the limitations of current experiments, including a lack of generalizability testing, limited exploration incentives in RL objectives, and noise in the reward signal due to the execution agent's capabilities. (ML: 0.84)👍👎
Abstract
Automated AI research holds great potential to accelerate scientific discovery. However, current LLMs often generate plausible-looking but ineffective ideas. Execution grounding may help, but it is unclear whether automated execution is feasible and whether LLMs can learn from the execution feedback. To investigate these, we first build an automated executor to implement ideas and launch large-scale parallel GPU experiments to verify their effectiveness. We then convert two realistic research problems - LLM pre-training and post-training - into execution environments and demonstrate that our automated executor can implement a large fraction of the ideas sampled from frontier LLMs. We analyze two methods to learn from the execution feedback: evolutionary search and reinforcement learning. Execution-guided evolutionary search is sample-efficient: it finds a method that significantly outperforms the GRPO baseline (69.4% vs 48.0%) on post-training, and finds a pre-training recipe that outperforms the nanoGPT baseline (19.7 minutes vs 35.9 minutes) on pre-training, all within just ten search epochs. Frontier LLMs often generate meaningful algorithmic ideas during search, but they tend to saturate early and only occasionally exhibit scaling trends. Reinforcement learning from execution reward, on the other hand, suffers from mode collapse. It successfully improves the average reward of the ideator model but not the upper-bound, due to models converging on simple ideas. We thoroughly analyze the executed ideas and training dynamics to facilitate future efforts towards execution-grounded automated AI research.
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations.
Florida Institute of Technology
AI Insights - However, AI also has its limitations and challenges, including the issue of impostor bias, where AI systems may mistakenly identify a legitimate file or activity as malicious. (ML: 0.96)👍👎
- Limited accuracy: AI systems may not always accurately identify malicious files or activities. (ML: 0.96)👍👎
- Impostor bias: AI systems may mistakenly identify a legitimate file or activity as malicious. (ML: 0.94)👍👎
- To address these challenges, researchers are working on developing more accurate and reliable AI systems for digital forensics. (ML: 0.93)👍👎
- Artificial Intelligence (AI): A type of computer system that can perform tasks that would typically require human intelligence, such as learning, problem-solving, and decision-making. (ML: 0.92)👍👎
- Digital Forensics: The process of collecting, analyzing, and preserving evidence related to cybercrime and other digital crimes. (ML: 0.92)👍👎
- The use of artificial intelligence (AI) in digital forensics is becoming increasingly important as cybercrime continues to grow. (ML: 0.92)👍👎
- The use of AI in digital forensics is becoming increasingly important, but it also has its limitations and challenges. (ML: 0.90)👍👎
Abstract
In an era where cyber threats are rapidly evolving, the reliability of cyber forensic analysis has become increasingly critical for effective digital investigations and cybersecurity responses. AI agents are being adopted across digital forensic practices due to their ability to automate processes such as anomaly detection, evidence classification, and behavioral pattern recognition, significantly enhancing scalability and reducing investigation timelines. However, the characteristics that make AI indispensable also introduce notable risks. AI systems, often trained on biased or incomplete datasets, can produce misleading results, including false positives and false negatives, thereby jeopardizing the integrity of forensic investigations. This study presents a meticulous comparative analysis of the effectiveness of the most used AI agent, ChatGPT, and human forensic investigators in the realm of cyber forensic analysis. Our research reveals critical limitations within AI-driven approaches, demonstrating scenarios in which sophisticated or novel cyber threats remain undetected due to the rigid pattern-based nature of AI systems. Conversely, our analysis highlights the crucial role that human forensic investigators play in mitigating these risks. Through adaptive decision-making, ethical reasoning, and contextual understanding, human investigators effectively identify subtle anomalies and threats that may evade automated detection systems. To reinforce our findings, we conducted comprehensive reliability testing of forensic techniques using multiple cyber threat scenarios. These tests confirmed that while AI agents significantly improve the efficiency of routine analyses, human oversight remains crucial in ensuring accuracy and comprehensiveness of the results.
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations.
Purdue University
AI Insights - The authors acknowledge that their study is limited by its reliance on a small number of datasets. (ML: 0.99)👍👎
- Domain shift: a phenomenon where the distribution of data in the training set differs from that of the testing set. (ML: 0.98)👍👎
- The development of DSCF highlights the need for large-scale and diverse training data. (ML: 0.97)👍👎
- Recent works have proposed unsupervised domain adaptation frameworks, but their effectiveness beyond the originally reported datasets are yet to be independently evaluated. (ML: 0.95)👍👎
- The results of this benchmarking experiment have shown that classifying test samples that are in-distribution to the training dataset is significantly easier than test samples suffering from distribution shift due to changes in instruments and acquisition conditions, and additional contaminants. (ML: 0.94)👍👎
- Foundation model: a pre-trained model that can be fine-tuned for specific tasks, often using transfer learning. (ML: 0.92)👍👎
- SANet demonstrated the best overall performance across the datasets. (ML: 0.84)👍👎
- The study benchmarks only five architectures and relies on minimal spectral pre-processing. (ML: 0.77)👍👎
- Existing open-source Raman datasets are often restricted in size, chemical diversity or experimental variability. (ML: 0.67)👍👎
- Creating large, curated experimental Raman spectral datasets that span multiple instruments, materials and measurement settings is key to developing a Raman-specific foundation model. (ML: 0.61)👍👎
- Raman spectroscopy: a technique used to analyze the vibrational modes of molecules. (ML: 0.52)👍👎
Abstract
Deep learning classifiers for Raman spectroscopy are increasingly reported to outperform classical chemometric approaches. However their evaluations are often conducted in isolation or compared against traditional machine learning methods or trivially adapted vision-based architectures that were not originally proposed for Raman spectroscopy. As a result, direct comparisons between existing deep learning models developed specifically for Raman spectral analysis on shared open-source datasets remain scarce. To the best of our knowledge, this study presents one of the first systematic benchmarks comparing three or more published Raman-specific deep learning classifiers across multiple open-source Raman datasets. We evaluate five representative deep learning architectures under a unified training and hyperparameter tuning protocol across three open-source Raman datasets selected to support standard evaluation, fine-tuning, and explicit distribution-shift testing. We report classification accuracies and macro-averaged F1 scores to provide a fair and reproducible comparison of deep learning models for Raman spectra based classification.
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations
Quantinuum Ltd
AI Insights - L1 Relative Change (L1RC): A measure of the difference between two probability distributions. (ML: 0.98)👍👎
- Signal-to-Noise Ratio (SNR): The ratio of the signal power to the noise power in a system. (ML: 0.93)👍👎
- However, on Real Pauli data the advantage clearly shifts toward the ML-based models, which outperform all baselines in both median L1 relative change and fraction of improved circuits. (ML: 0.93)👍👎
- Deep learning models can learn corrections directly from data gathered during circuit runs, more easily capturing correlations. (ML: 0.88)👍👎
- The best performing models are comparable to the best baseline methods on Simulated data (both Pauli and Random). (ML: 0.87)👍👎
- L1RC is defined as the L1 norm of the difference between the two distributions. (ML: 0.87)👍👎
- The learned mapping from P noisy and circuit features to P ideal captures a richer structure that goes beyond coarse depolarization or measurement-error mitigation. (ML: 0.81)👍👎
- The PERCEIVER model consistently achieves as good or greater median performance than the baseline mitigation techniques for Pauli circuits. (ML: 0.80)👍👎
- The deep learning approaches can generalize across noise regimes, device generations, and circuit families without relying on a predefined noise model. (ML: 0.79)👍👎
- The baseline methods retain value as lightweight, interpretable mitigation techniques, particularly for structured, low-depth circuits. (ML: 0.61)👍👎
Abstract
We present a systematic investigation of deep learning methods applied to quantum error mitigation of noisy output probability distributions from measured quantum circuits. We compare different architectures, from fully connected neural networks to transformers, and we test different design/training modalities, identifying sequence-to-sequence, attention-based models as the most effective on our datasets. These models consistently produce mitigated distributions that are closer to the ideal outputs when tested on both simulated and real device data obtained from IBM superconducting quantum processing units (QPUs) with up to five qubits. Across several different circuit depths, our approach outperforms other baseline error mitigation techniques. We perform a series of ablation studies to examine: how different input features (circuit, device properties, noisy output statistics) affect performance; cross-dataset generalization across circuit families; and transfer learning to a different IBM QPU. We observe that generalization performance across similar devices with the same architecture works effectively, without needing to fully retrain models.
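The L1 relative change metric from the insights above can be sketched in a few lines. The source defines only the L1 distance itself; reading L1RC as the fractional reduction in L1 distance to the ideal distribution achieved by mitigation is our assumption, and the toy two-qubit distributions are invented:

```python
def l1_dist(p, q):
    """L1 norm of the difference between two probability distributions,
    given as dicts mapping bitstrings to probabilities."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def l1_relative_change(p_noisy, p_mitigated, p_ideal):
    """Fractional reduction in L1 distance to the ideal output achieved
    by mitigation (our reading of L1RC; the paper's exact normalisation
    may differ)."""
    before = l1_dist(p_noisy, p_ideal)
    after = l1_dist(p_mitigated, p_ideal)
    return (before - after) / before

# Toy two-qubit Bell-like example (values are illustrative only).
p_ideal = {"00": 0.5, "11": 0.5}
p_noisy = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}
p_mitig = {"00": 0.47, "01": 0.03, "10": 0.03, "11": 0.47}
print(round(l1_relative_change(p_noisy, p_mitig, p_ideal), 2))  # 0.7
```

A value near 1 means mitigation recovered almost all of the lost distribution mass; a negative value would mean mitigation made things worse.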
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations
UC Santa Cruz
AI Insights - However, these models often require large amounts of data and computational resources to train, which can be a limitation. (ML: 0.98)👍👎
- The use of text-to-image diffusion models for image editing has been explored by several researchers, including those who have developed datasets such as Qwen-Image and Omnigen2. (ML: 0.94)👍👎
- Text-to-image diffusion models have become increasingly popular in recent years, with many researchers exploring their potential applications. (ML: 0.93)👍👎
- Text-to-image diffusion models are a type of artificial intelligence that can generate images from text descriptions. (ML: 0.91)👍👎
- They have many potential applications, but require large amounts of data and computational resources to train. (ML: 0.91)👍👎
- These models can be used for various tasks such as image editing, object removal, and text-to-image synthesis. (ML: 0.89)👍👎
- These models can be used for various tasks such as object removal, text-to-image synthesis, and instruction-guided image editing. (ML: 0.88)👍👎
Abstract
Recent image generation models have shown impressive progress, yet they often struggle to yield controllable and consistent results when users attempt to edit specific elements within an existing image. Layered representations enable flexible, user-driven content creation, but existing approaches often fail to produce layers with coherent compositing relationships, and their object layers typically lack realistic visual effects such as shadows and reflections. To overcome these limitations, we propose LASAGNA, a novel, unified framework that generates an image jointly with its composing layers--a photorealistic background and a high-quality transparent foreground with compelling visual effects. Unlike prior work, LASAGNA efficiently learns correct image composition from a wide range of conditioning inputs--text prompts, foreground, background, and location masks--offering greater controllability for real-world applications. To enable this, we introduce LASAGNA-48K, a new dataset composed of clean backgrounds and RGBA foregrounds with physically grounded visual effects. We also propose LASAGNABENCH, the first benchmark for layer editing. We demonstrate that LASAGNA excels in generating highly consistent and coherent results across multiple image layers simultaneously, enabling diverse post-editing applications that accurately preserve identity and visual effects. LASAGNA-48K and LASAGNABENCH will be publicly released to foster open research in the community. The project page is https://rayjryang.github.io/LASAGNA-Page/.
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations
Nanjing University NJU
AI Insights - StableWorld also alleviates error accumulation in autoregressive video generation, resulting in more stable, consistent, and higher-quality long videos. (ML: 0.87)👍👎
- Autoregressive video generation: A technique where each frame is generated based on the previous one(s), often leading to error accumulation. (ML: 0.85)👍👎
- Geometric similarity: A measure of how similar two frames are based on their geometric structure. (ML: 0.82)👍👎
- Sliding window approach: A method where a fixed-size window is moved over the sequence, and the most recent frames are used to generate new ones. (ML: 0.82)👍👎
- The paper proposes a method called StableWorld for long-horizon interactive video generation, which aims to prevent error accumulation and maintain temporal consistency. (ML: 0.81)👍👎
- StableWorld effectively prevents cumulative errors by continuously filtering out degraded frames while maintaining coherent motion, resulting in more stable and temporally consistent interactive video sequences. (ML: 0.77)👍👎
- The method's ability to identify and discard a large number of drifted frames during generation has the potential to reduce training cost and aligns naturally with future extensions toward memory-augmented world models. (ML: 0.76)👍👎
- ORB (Oriented FAST and Rotated BRIEF): A feature detector that extracts keypoints with their descriptors for matching purposes. (ML: 0.63)👍👎
- StableWorld uses a sliding window approach with dynamic frame eviction based on geometric similarity computed using ORB features. (ML: 0.63)👍👎
- The method is evaluated on several benchmarks, including Matrix-Game 2.0, Hunyuan-Gamecraft 1.0, Open-Oasis, and Self-Forcing, showing improved stability and consistency in long-horizon generation. (ML: 0.59)👍👎
Abstract
In this paper, we explore the overlooked challenge of stability and temporal consistency in interactive video generation, which synthesizes dynamic and controllable video worlds through interactive behaviors such as camera movements and text prompts. Despite remarkable progress in world modeling, current methods still suffer from severe instability and temporal degradation, often leading to spatial drift and scene collapse during long-horizon interactions. To better understand this issue, we first investigate the underlying causes of instability and identify that the major source of error accumulation originates within the same scene, where generated frames gradually deviate from the initial clean state and propagate errors to subsequent frames. Building upon this observation, we propose a simple yet effective method, StableWorld, a Dynamic Frame Eviction Mechanism. By continuously filtering out degraded frames while retaining geometrically consistent ones, StableWorld effectively prevents cumulative drift at its source, leading to more stable and temporally consistent interactive generation. Promising results on multiple interactive video models, e.g., Matrix-Game, Open-Oasis, and Hunyuan-GameCraft, demonstrate that StableWorld is model-agnostic and can be applied to different interactive video generation frameworks to substantially improve stability, temporal consistency, and generalization across diverse interactive scenarios.
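The insights above describe StableWorld's core loop: sliding-window autoregressive generation with dynamic eviction of frames whose geometric similarity to the clean initial frame drops below a threshold. A toy sketch of that control flow, with the paper's ORB-based similarity replaced by a pluggable function and scalar "frames" standing in for images; all names and numbers here are illustrative, not the authors' implementation:

```python
def evict_drifted_frames(window, reference, similarity, threshold):
    """Keep only frames whose similarity to the clean reference frame
    stays above the threshold (the paper scores this with ORB features)."""
    return [f for f in window if similarity(f, reference) >= threshold]

def generate_long_video(init_frame, step, similarity, n_frames,
                        window_size=8, threshold=0.5):
    """Sliding-window autoregressive generation with frame eviction:
    `step` produces the next frame from the retained context window."""
    window, out = [init_frame], [init_frame]
    for _ in range(n_frames - 1):
        # Drop frames that drifted too far from the initial clean state.
        window = evict_drifted_frames(window, init_frame, similarity, threshold)
        if not window:
            window = [init_frame]  # always retain a clean anchor
        nxt = step(window)
        out.append(nxt)
        window = (window + [nxt])[-window_size:]
    return out

# Toy demo: frames are scalars, each generation step drifts upward by 0.2,
# and similarity decays linearly with distance from the reference.
sim = lambda a, b: 1.0 - abs(a - b)
step = lambda win: sum(win) / len(win) + 0.2
frames = generate_long_video(0.0, step, sim, n_frames=10, threshold=0.7)
```

In this toy setup the drift saturates once eviction kicks in; lowering the threshold so nothing is ever evicted lets the accumulated error keep growing, which is the failure mode the paper targets.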
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations
Interests not found
We did not find any papers that match the interests below.
Try other terms, and consider whether the content exists on arxiv.org.
💬 Help Shape Our Pricing
We're exploring pricing options to make this project sustainable. Take 3 minutes to share what you'd be willing to pay (if anything). Your input guides our future investment.
Share Your Feedback
Help us improve your experience!
This project is in its early stages, and your feedback can be pivotal to its future.
Let us know what you think about this week's papers and suggestions!
Give Feedback