Bar-Ilan University
AI Insights
- Reasoning ability: The capacity of an AI system to draw conclusions and make decisions based on evidence. (ML: 0.99)
- Future research should focus on improving the reasoning ability of AI systems and developing more effective evidence retrieval methods. (ML: 0.98)
- The paper also explores the use of large language models (LLMs) in claim verification, highlighting their strengths and limitations. (ML: 0.98)
- LLMs can be useful tools for claim verification, but their limitations should not be overlooked. (ML: 0.97)
- The paper concludes that ReasonRank outperforms other methods in claim verification, especially when combined with LLMs. (ML: 0.97)
- Evidence retrieval: The process of finding relevant information to support or refute a claim. (ML: 0.95)
- Passage ranking: A method for ranking the relevance of passages in relation to a query or claim. (ML: 0.95)
- Claim verification: The process of determining whether a statement or claim is true or false based on evidence. (ML: 0.95)
- The paper discusses a method for claim verification using evidence retrieval and ranking. (ML: 0.95)
- The authors propose a new approach called ReasonRank, which combines passage ranking with strong reasoning ability. (ML: 0.88)
Abstract
Attribution and fact verification are critical challenges in natural language processing for assessing information reliability. While automated systems and Large Language Models (LLMs) aim to retrieve and select concise evidence to support or refute claims, they often present users with either insufficient or overly redundant information, leading to inefficient and error-prone verification. To address this, we propose Evidence Ranking, a novel task that prioritizes presenting sufficient information as early as possible in a ranked list. This minimizes user reading effort while still making all available evidence accessible for sequential verification. We compare two approaches for the new ranking task: one-shot ranking and incremental ranking. We introduce a new evaluation framework, inspired by information retrieval metrics, and construct a unified benchmark by aggregating existing fact verification datasets. Extensive experiments with diverse models show that incremental ranking strategies better capture complementary evidence and that LLM-based methods outperform shallower baselines, while still facing challenges in balancing sufficiency and redundancy. In a controlled user study comparing evidence ranking with evidence selection, we demonstrate that ranking both reduces reading effort and improves verification. This work provides a foundational step toward more interpretable, efficient, and user-aligned information verification systems.
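The two strategies compared in the abstract can be illustrated with a toy sketch: one-shot ranking scores each passage independently against the claim, while incremental ranking greedily rewards passages that cover parts of the claim not yet supported by earlier picks. The token-overlap scorer and function names below are illustrative assumptions, not the paper's actual models:

```python
def score(passage, claim):
    """Naive relevance: fraction of claim tokens found in the passage."""
    c, p = set(claim.lower().split()), set(passage.lower().split())
    return len(c & p) / len(c)

def one_shot_rank(passages, claim):
    """One-shot ranking: score every passage once, independently."""
    return sorted(passages, key=lambda p: score(p, claim), reverse=True)

def incremental_rank(passages, claim):
    """Incremental ranking: greedily pick the passage adding the most
    *new* claim coverage, so complementary evidence surfaces early
    instead of near-duplicates of what is already ranked."""
    remaining, ranked = list(passages), []
    covered, c = set(), set(claim.lower().split())
    while remaining:
        def gain(p):
            return len((set(p.lower().split()) & c) - covered)
        best = max(remaining, key=gain)
        covered |= set(best.lower().split()) & c
        remaining.remove(best)
        ranked.append(best)
    return ranked
```

On redundant evidence pools the two orderings diverge: incremental ranking promotes a complementary passage over a higher-scoring near-duplicate of the top result.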
Why are we recommending this paper?
Due to your Interest in Attribution
This paper directly addresses attribution, a core interest, by exploring methods for ranking evidence presented to users. The focus on evidence ranking aligns with the need for reliable information assessment within your domain.
Peking University
AI Insights
- Mechanistic data attribution can be used to identify which specific data points contribute to the model's performance and interpretability. (ML: 0.98)
- The paper provides a theoretical framework for understanding the influence of individual training examples on LLMs. (ML: 0.98)
- The paper discusses mechanistic data attribution for tracing the training origins of interpretable LLM units. (ML: 0.98)
- Influence function: A measure of how a single training example affects the model's parameters and performance. (ML: 0.98)
- Limited scope: The paper focuses on a specific type of machine learning model (LLMs) and may not be applicable to other models. (ML: 0.98)
- Influence functions are used as a tool for understanding how individual training examples contribute to the model's performance. (ML: 0.97)
- A theoretical framework is provided for deriving influence functions, which can be applied to various machine learning models. (ML: 0.96)
- This approach has implications for improving the explainability and transparency of large language models. (ML: 0.96)
- Empirical risk minimizer: The model parameter that minimizes the average loss over the training dataset. (ML: 0.93)
- Hessian matrix: A square matrix of second partial derivatives of a scalar-valued function, used to describe the curvature of the function. (ML: 0.78)
Abstract
While Mechanistic Interpretability has identified interpretable circuits in LLMs, their causal origins in training data remain elusive. We introduce Mechanistic Data Attribution (MDA), a scalable framework that employs Influence Functions to trace interpretable units back to specific training samples. Through extensive experiments on the Pythia family, we causally validate that targeted intervention--removing or augmenting a small fraction of high-influence samples--significantly modulates the emergence of interpretable heads, whereas random interventions show no effect. Our analysis reveals that repetitive structural data (e.g., LaTeX, XML) acts as a mechanistic catalyst. Furthermore, we observe that interventions targeting induction head formation induce a concurrent change in the model's in-context learning (ICL) capability. This provides direct causal evidence for the long-standing hypothesis regarding the functional link between induction heads and ICL. Finally, we propose a mechanistic data augmentation pipeline that consistently accelerates circuit convergence across model scales, providing a principled methodology for steering the developmental trajectories of LLMs.
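The influence-function machinery underlying MDA is easiest to see on a model where the empirical risk minimizer and Hessian are available in closed form. The sketch below is the generic textbook recipe, I(z_i, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z_i), applied to ridge regression; it is not the authors' LLM-scale implementation:

```python
import numpy as np

def influence(X, y, x_test, y_test, lam=1e-2):
    """Influence of each training example on the test-point loss for
    ridge regression with squared loss. A negative value means that
    upweighting that example would *lower* the test loss. Sketch only."""
    n, d = X.shape
    # Empirical risk minimizer of (1/n) sum 1/2 (x th - y)^2 + lam/2 ||th||^2
    theta = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    # Hessian of the average regularized loss (closed form for this model)
    H = X.T @ X / n + lam * np.eye(d)
    g_test = (x_test @ theta - y_test) * x_test   # gradient of test loss
    g_train = (X @ theta - y)[:, None] * X        # per-example gradients
    return -g_train @ np.linalg.solve(H, g_test)
```

A sanity check consistent with the framework: a training point identical to the test point gets non-positive influence, since upweighting a copy of the test point cannot increase its loss.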
Why are we recommending this paper?
Due to your Interest in Attribution
Understanding how training data shapes LLM behavior is crucial for attribution and personalization efforts. This work provides a framework for tracing the impact of training data, which is highly relevant to your interests in data science management.
Shanghai Jiao Tong University
AI Insights
- Random: A naive baseline that selects 50 samples uniformly at random from Dcandidate. (ML: 0.96)
- Log Loss: Measures the model's goodness-of-fit; lower values are better. (ML: 0.96)
- Surrogate objective U(S): A proxy for the true model uncertainty G(S), used to select samples in the Greedy-Surrogate method. (ML: 0.94)
- FIM (Oracle): A strong but computationally expensive baseline that greedily selects samples to maximize the reduction in the true model uncertainty G(S). (ML: 0.92)
- Area Under the ROC Curve (AUC): Measures the model's ability to discriminate between the positive and negative classes; higher values are better. (ML: 0.92)
- The algorithm's total expected expenditure is bounded by the budget B plus an additional term controlled by the algorithm's learning parameters, providing a formal upper bound on expected spending. (ML: 0.91)
- The Greedy-Surrogate method achieves a significant reduction in Log Loss and an increase in AUC, closely approaching the performance of the FIM oracle and substantially outperforming the random baseline. (ML: 0.86)
- The proposed method, Greedy-Surrogate, iteratively selects the samples that yield the maximum marginal gain in the surrogate objective U(S). (ML: 0.85)
- Greedy-FIM (Oracle): The FIM oracle selection strategy, which serves as a practical upper bound on performance. (ML: 0.83)
- The proposed method provides a principled online optimization algorithm with provable near-optimal performance, guaranteeing that a creator's budget is utilized efficiently and effectively over time. (ML: 0.82)
Abstract
Modern content platforms offer paid promotion to mitigate cold start by allocating exposure via auctions. Our empirical analysis reveals a counterintuitive flaw in this paradigm: while promotion rescues low-to-medium quality content, it can harm high-quality content by forcing exposure to suboptimal audiences, polluting engagement signals and downgrading future recommendation. We recast content promotion as a dual-objective optimization that balances short-term value acquisition with long-term model improvement. To make this tractable at bid time in content promotion, we introduce a decomposable surrogate objective, gradient coverage, and establish its formal connection to Fisher Information and optimal experimental design. We design a two-stage auto-bidding algorithm based on Lagrange duality that dynamically paces budget through a shadow price and optimizes impression-level bids using per-impression marginal utilities. To address missing labels at bid time, we propose a confidence-gated gradient heuristic, paired with a zeroth-order variant for black-box models that reliably estimates learning signals in real time. We provide theoretical guarantees, proving monotone submodularity of the composite objective, sublinear regret in online auction, and budget feasibility. Extensive offline experiments on synthetic and real-world datasets validate the framework: it outperforms baselines, achieves superior final AUC/LogLoss, adheres closely to budget targets, and remains effective when gradients are approximated zeroth-order. These results show that strategic, information-aware promotion can improve long-term model performance and organic outcomes beyond naive impression-maximization strategies.
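The greedy, marginal-gain selection described above can be sketched with a log-determinant objective, a standard Fisher-information-style proxy from optimal experimental design. The objective U(S) = logdet(I + Σ g gᵀ) and the function names here are assumptions for illustration, not the paper's exact surrogate:

```python
import numpy as np

def greedy_select(G, k):
    """Greedily pick k rows of gradient matrix G (one row per candidate
    impression), maximizing U(S) = logdet(I + sum_{i in S} g_i g_i^T).
    Each step uses the matrix-determinant lemma:
    logdet(M + g g^T) - logdet(M) = log(1 + g^T M^{-1} g)."""
    n, d = G.shape
    S, M = [], np.eye(d)
    for _ in range(k):
        gains = []
        for i in range(n):
            if i in S:
                gains.append(-np.inf)   # already selected
            else:
                g = G[i]
                gains.append(np.log1p(g @ np.linalg.solve(M, g)))
        best = int(np.argmax(gains))
        S.append(best)
        M += np.outer(G[best], G[best])  # update the information matrix
    return S
```

Because this objective is monotone submodular, greedy selection under a cardinality budget carries the usual (1 - 1/e) approximation guarantee, which is the kind of structure the paper's regret and submodularity results exploit. Note how the second pick favors a new gradient direction over a near-duplicate of the first.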
Why are we recommending this paper?
Due to your Interest in Bidding
The paper's focus on auto-bidding and content promotion directly relates to your interest in marketing channels and optimization strategies. It offers insights into how exposure allocation can be strategically managed.
Kuaishou Technology
AI Insights
- The proposed method may not be applicable to all types of decision-making tasks. (ML: 0.98)
- It uses a combination of reinforcement learning and attention mechanisms to make decisions. (ML: 0.96)
- Decision Transformer: A type of transformer-based model used for decision-making tasks, such as real-time bidding in online advertising. (ML: 0.94)
- AuctionNet: A benchmark dataset for decision-making in large-scale games, used here for offline evaluation of auto-bidding methods. (ML: 0.86)
- The paper builds on existing methods for decision-making in large-scale games, including the Decision Transformer. (ML: 0.89)
- The authors compare their proposed method, C2, with existing methods and show that it outperforms them in terms of accuracy and efficiency. (ML: 0.87)
- The use of AuctionNet as a common benchmark can help improve and compare decision-making models in large-scale games. (ML: 0.87)
- Evaluating on AuctionNet may require significant computational resources. (ML: 0.83)
Abstract
Decision Transformer (DT) shows promise for generative auto-bidding by capturing temporal dependencies, but suffers from two critical limitations: insufficient cross-correlation modeling among state, action, and return-to-go (RTG) sequences, and indiscriminate learning of optimal/suboptimal behaviors. To address these, we propose C2, a novel framework enhancing DT with two core innovations: (1) a Cross Learning Block (CLB) via cross-attention to strengthen inter-sequence correlation modeling; (2) a Constraint-aware Loss (CL) incorporating budget and Cost-Per-Acquisition (CPA) constraints for selective learning of optimal trajectories. Extensive offline evaluations on the AuctionNet dataset demonstrate consistent performance gains (up to 3.2% over the state-of-the-art method) across diverse budget settings; ablation studies verify the complementary synergy of CLB and CL, confirming C2's superiority in auto-bidding. The code for reproducing our results is available at: https://github.com/Dingjinren/C2.
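The Cross Learning Block's key ingredient, cross-attention between the state, action, and RTG token sequences, reduces to scaled dot-product attention where queries come from one sequence and keys/values from another. A minimal single-head numpy sketch follows; the paper's actual block presumably adds learned projections, multiple heads, and residual connections:

```python
import numpy as np

def cross_attention(Q_seq, KV_seq, d_k=None):
    """Scaled dot-product cross-attention: queries from one modality
    (e.g. action tokens), keys/values from another (e.g. state or RTG
    tokens). Returns one context vector per query token."""
    d_k = d_k or Q_seq.shape[-1]
    scores = Q_seq @ KV_seq.T / np.sqrt(d_k)
    # numerically stable row-wise softmax over the key positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ KV_seq
```

In contrast, a vanilla DT interleaves the three token types into one self-attention stream; explicit cross-attention lets each action token attend directly over the full state or RTG sequence.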
Why are we recommending this paper?
Due to your Interest in Bidding
This paper tackles auto-bidding, a key area of interest, using a Decision Transformer framework. The constraint-aware loss function is particularly relevant to optimizing bidding strategies for personalized campaigns.
The University of Tennessee
AI Insights
- AI and Data Science: The use of artificial intelligence and data science techniques to analyze and interpret complex data sets. (ML: 0.98)
- The publication trends across LIS research themes from 2014 to 2023 reveal substantial variation in growth patterns across the field. (ML: 0.97)
- The publication trends indicate that LIS research is becoming more interdisciplinary, with a focus on emerging technologies such as artificial intelligence, data science, and extended reality. (ML: 0.95)
- Biomedical Informatics: The application of computer science and information technology to medical research and healthcare. (ML: 0.95)
- The three overarching LIS research dimensions are: 1) Libraries, Librarianship, and Information Services; 2) Scholarly Communication; and 3) Information Access and Equity. (ML: 0.93)
- The field of Library and Information Science (LIS) has undergone significant changes in recent years, with a shift towards more interdisciplinary research. (ML: 0.93)
- Library and Information Science (LIS): A field of study that focuses on the collection, organization, preservation, and dissemination of information. (ML: 0.92)
- The top 5 most published themes are: 1) Biomedical Informatics; 2) AI and Data Science; 3) Metadata and Archives; 4) Health Informatics and Technology; and 5) Social Media. (ML: 0.91)
- Interdisciplinary research: Research that combines multiple disciplines or fields to address a particular problem or question. (ML: 0.87)
- Metadata and Archives: The creation, management, and preservation of metadata and archives for digital libraries and other information systems. (ML: 0.86)
Abstract
This study provides the first comprehensive empirical mapping of how organizational structures and research portfolios co-occur across U.S. Library and Information Science (LIS) schools. Analyzing 14,705 publications from 1,264 faculty members across 44 institutions (2013--2024), we employ computational methods including word embeddings and topic modeling to identify 16 distinct research themes organized into three foundational dimensions: Library and Knowledge Organization (LKO), Human-Centered Technology (HCT), and Computing Systems (CS). Our mixed-method analysis reveals significant differences in research composition across organizational types: Computer-affiliated schools cluster tightly in computationally-intensive research and differ significantly from all other school types, while independent Information schools demonstrate the greatest research diversity. Temporal analysis of LIS schools reveals complex evolutionary dynamics: 51.4% are moving toward HCT, 37.8% toward CS, and 37.8% toward LKO, with many schools simultaneously shifting along multiple dimensions. Contrary to narratives of computational dominance, HCT emerged as LIS's primary growth vector. These patterns challenge assumptions about field fragmentation, revealing structured diversification shaped by but not determined by organizational positioning. The study provides empirical foundations for institutional strategic planning, accreditation policy, and understanding LIS's evolving disciplinary identity amid computational transformation.
Why are we recommending this paper?
Due to your Interest in Direction on Data Science Organizations
This study provides a broader perspective on research trends within the LIS field, which is pertinent to your interest in data science management and organizational direction. Understanding research patterns can inform strategic decisions about data science organizations.
York University
AI Insights
- AUPRC: Area Under the Precision-Recall Curve. (ML: 0.97)
- Lack of spread: A measure of model performance that is not well-defined in this context. (ML: 0.97)
- Calibration slope: A measure of model calibration. (ML: 0.96)
- When using the second loss function (L₂), which includes the AUPRC as a measure of discrimination, the optimal Mis proportion is much higher than under L₁. (ML: 0.96)
- CITL: Calibration-In-The-Large, a measure of model calibration. (ML: 0.96)
- The best performing model under L₁ has an AUPRC value of 0.475 and a lack-of-spread value of 0.063 when α = 0.1. (ML: 0.95)
- The results suggest that the choice of loss function and the value of α have a significant impact on the performance of the model. (ML: 0.95)
- The proposed algorithm for tuning the size of subpopulation (Mis) is applied to a real-world dataset from the eICU cardiac database. (ML: 0.93)
- When using the first loss function (L₁), the optimal Mis proportion is 0.29 under all values of α except α = 0.9, where it is 0.20. (ML: 0.90)
- The results show that the optimal Mis value varies depending on the choice of alpha (α), which controls the emphasis on discrimination and calibration in the loss function. (ML: 0.89)
Abstract
Advances in precision medicine increasingly drive methodological innovation in health research. A key development is the use of personalized prediction models (PPMs), which are fit using a similar subpopulation tailored to a specific index patient, and have been shown to outperform one-size-fits-all models, particularly in terms of model discrimination performance. We propose a generalized loss function that enables tuning of the subpopulation size used to fit a PPM. This loss function allows joint optimization of discrimination and calibration, allowing both the performance measures and their relative weights to be specified by the user. To reduce computational burden, we conducted extensive simulation studies to identify practical bounds for the grid of subpopulation sizes. Based on these results, we recommend using a lower bound of 20% and an upper bound of 70% of the entire training dataset. We apply the proposed method to both simulated and real-world datasets and demonstrate that previously observed relationships between subpopulation size and model performance are robust. Furthermore, we show that the choice of performance measures in the loss function influences the optimal subpopulation size selected. These findings support the flexible and computationally efficient implementation of PPMs in precision health research.
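The generalized loss and the bounded grid search can be sketched as follows. The exact functional form is an assumption (here a convex combination of a discrimination term and a calibration term weighted by α), as is the `perf` interface mapping candidate subpopulation proportions to (AUC, CITL) estimates; both stand in for whatever the paper actually specifies:

```python
def generalized_loss(auc, citl, alpha):
    """Weighted trade-off: alpha emphasizes discrimination (AUC, higher
    is better, so we penalize 1 - AUC), 1 - alpha emphasizes calibration
    (CITL, ideally 0, so we penalize its magnitude). Hypothetical form."""
    return alpha * (1.0 - auc) + (1.0 - alpha) * abs(citl)

def tune_subpopulation(perf, alpha, lo=0.20, hi=0.70):
    """perf: dict {subpopulation proportion -> (AUC, CITL)} estimated by
    refitting the PPM at each candidate size. Searches only the
    recommended 20%-70% window and returns the loss-minimizing size."""
    grid = {m: v for m, v in perf.items() if lo <= m <= hi}
    return min(grid, key=lambda m: generalized_loss(*grid[m], alpha))
```

Note how the chosen α moves the optimum: a discrimination-heavy α favors the small, sharply tailored subpopulation, while a calibration-heavy α favors a larger one, mirroring the paper's finding that the performance measures in the loss influence the selected size.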
Why are we recommending this paper?
Due to your Interest in Personalization
Plaksha University
AI Insights
- Further research is needed to refine the granularity of real-time difficulty calibration and explore the long-term effects of using such a system. (ML: 0.98)
- The study suggests that GuideAI's use of physiological data to inform adaptive interventions can lead to better learning outcomes and increased user engagement. (ML: 0.98)
- Previous studies have shown that personalized learning systems can improve learning outcomes, but few have explored the use of physiological data to inform adaptive interventions. (ML: 0.97)
- The study found that GuideAI, a real-time personalized learning solution with adaptive interventions, significantly improved learning outcomes and user experience compared to a control group. (ML: 0.96)
- The study demonstrates the potential of GuideAI's biosensor-driven approach to improve learning outcomes and user experience. (ML: 0.96)
- The study's sample size was relatively small. (ML: 0.95)
- GuideAI: A real-time personalized learning solution with adaptive interventions. (ML: 0.95)
- GuideAI's biosensor-driven interventions were rated positively by participants, who appreciated the system's ability to detect and respond to cognitive-affective shifts in real time. (ML: 0.94)
- Biosensor-driven interventions: Adaptive interventions informed by physiological data, such as heart rate or skin conductance, to adjust the learning experience in real time. (ML: 0.94)
Abstract
Large Language Models (LLMs) have emerged as powerful learning tools, but they lack awareness of learners' cognitive and physiological states, limiting their adaptability to the user's learning style. Contemporary learning techniques primarily focus on structured learning paths, knowledge tracing, and generic adaptive testing but fail to address real-time learning challenges driven by cognitive load, attention fluctuations, and engagement levels. Building on findings from a formative user study (N=66), we introduce GuideAI, a multi-modal framework that enhances LLM-driven learning by integrating real-time biosensory feedback including eye gaze tracking, heart rate variability, posture detection, and digital note-taking behavior. GuideAI dynamically adapts learning content and pacing through cognitive optimizations (adjusting complexity based on learning progress markers), physiological interventions (breathing guidance and posture correction), and attention-aware strategies (redirecting focus using gaze analysis). Additionally, GuideAI supports diverse learning modalities, including text-based, image-based, audio-based, and video-based instruction, across varied knowledge domains. A preliminary study (N = 25) assessed GuideAI's impact on knowledge retention and cognitive load through standardized assessments. The results show statistically significant improvements in both problem-solving capability and recall-based knowledge assessments. Participants also experienced notable reductions in key NASA-TLX measures including mental demand, frustration levels, and effort, while simultaneously reporting enhanced perceived performance. These findings demonstrate GuideAI's potential to bridge the gap between current LLM-based learning systems and individualized learner needs, paving the way for adaptive, cognition-aware education at scale.
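GuideAI's sense-then-adapt loop can be caricatured as a small rule-based controller. Every threshold, signal name, and intervention label below is hypothetical, chosen only to illustrate how physiological and attention signals might gate interventions and pace difficulty, not how the actual system is implemented:

```python
def adapt(difficulty, hrv_stress, gaze_on_content, posture_ok):
    """One tick of a toy adaptation loop: normalized stress (from heart
    rate variability), fraction of gaze time on content, and a posture
    flag drive interventions and a difficulty adjustment."""
    interventions = []
    if hrv_stress > 0.7:                 # high stress: intervene, ease off
        interventions.append("breathing_guidance")
        difficulty = max(1, difficulty - 1)
    if gaze_on_content < 0.5:            # attention has wandered
        interventions.append("attention_redirect")
    if not posture_ok:
        interventions.append("posture_correction")
    if hrv_stress < 0.3 and gaze_on_content > 0.8:
        difficulty += 1                  # learner comfortable: raise complexity
    return difficulty, interventions
```

The point of the sketch is the coupling the paper argues for: content pacing is driven by the learner's measured state rather than by answer correctness alone.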
Why are we recommending this paper?
Due to your Interest in Personalization