Papers from 29 to 03 October, 2025

Here are the personalized paper recommendations sorted by most relevant

Tech for Social Good

Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research

University of Oxford

Abstract
Large Language Models (LLMs) are being increasingly used as synthetic agents in social science, in applications ranging from augmenting survey responses to powering multi-agent simulations. Because strong prediction plus conditioning prompts, token log-probs, and repeated sampling mimic Bayesian workflows, their outputs can be misinterpreted as posterior-like evidence from a coherent model. However, prediction does not equate to probabilism, and accurate points do not imply calibrated uncertainty. This paper outlines cautions that should be taken when interpreting LLM outputs and proposes a pragmatic reframing for the social sciences in which LLMs are used as high-capacity pattern matchers for quasi-predictive interpolation under explicit scope conditions and not as substitutes for probabilistic inference. Practical guardrails such as independent draws, preregistered human baselines, reliability-aware validation, and subgroup calibration, are introduced so that researchers may engage in useful prototyping and forecasting while avoiding category errors.

Econometrics for Social Good

👍 👎 ♥ Save

Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare

Tencent, Shandong Univer

Abstract
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that guide these models when distributing scarce societal resources remain largely unexamined. To address this, we introduce the Social Welfare Function (SWF) Benchmark, a dynamic simulation environment where an LLM acts as a sovereign allocator, distributing tasks to a heterogeneous community of recipients. The benchmark is designed to create a persistent trade-off between maximizing collective efficiency (measured by Return on Investment) and ensuring distributive fairness (measured by the Gini coefficient). We evaluate 20 state-of-the-art LLMs and present the first leaderboard for social welfare allocation. Our findings reveal three key insights: (i) A model's general conversational ability, as measured by popular leaderboards, is a poor predictor of its allocation skill. (ii) Most LLMs exhibit a strong default utilitarian orientation, prioritizing group productivity at the expense of severe inequality. (iii) Allocation strategies are highly vulnerable, easily perturbed by output-length constraints and social-influence framing. These results highlight the risks of deploying current LLMs as societal decision-makers and underscore the need for specialized benchmarks and targeted alignment for AI governance.

AI for Social Good

👍 👎 ♥ Save

Know Thyself? On the Incapability and Implications of AI Self-Recognition

Rate this image: 😍 👍 👎

Abstract
Self-recognition is a crucial metacognitive capability for AI systems, relevant not only for psychological analysis but also for safety, particularly in evaluative scenarios. Motivated by contradictory interpretations of whether models possess self-recognition (Panickssery et al., 2024; Davidson et al., 2024), we introduce a systematic evaluation framework that can be easily applied and updated. Specifically, we measure how well 10 contemporary larger language models (LLMs) can identify their own generated text versus text from other models through two tasks: binary self-recognition and exact model prediction. Different from prior claims, our results reveal a consistent failure in self-recognition. Only 4 out of 10 models predict themselves as generators, and the performance is rarely above random chance. Additionally, models exhibit a strong bias toward predicting GPT and Claude families. We also provide the first evaluation of model awareness of their own and others' existence, as well as the reasoning behind their choices in self-recognition. We find that the model demonstrates some knowledge of its own existence and other models, but their reasoning reveals a hierarchical bias. They appear to assume that GPT, Claude, and occasionally Gemini are the top-tier models, often associating high-quality text with them. We conclude by discussing the implications of our findings on AI safety and future directions to develop appropriate AI self-awareness.

👍 👎 ♥ Save

NeurIPS should lead scientific consensus on AI policy

Stanford Institute for HU

Abstract
Designing wise AI policy is a grand challenge for society. To design such policy, policymakers should place a premium on rigorous evidence and scientific consensus. While several mechanisms exist for evidence generation, and nascent mechanisms tackle evidence synthesis, we identify a complete void on consensus formation. In this position paper, we argue NeurIPS should actively catalyze scientific consensus on AI policy. Beyond identifying the current deficit in consensus formation mechanisms, we argue that NeurIPS is the best option due its strengths and the paucity of compelling alternatives. To make progress, we recommend initial pilots for NeurIPS by distilling lessons from the IPCC's leadership to build scientific consensus on climate policy. We dispel predictable counters that AI researchers disagree too much to achieve consensus and that policy engagement is not the business of NeurIPS. NeurIPS leads AI on many fronts, and it should champion scientific consensus to create higher quality AI policy.

AI Insights

Authors propose a unified cross‑domain framework for safety, fairness, and robustness metrics.
They stress transparent model documentation to trace societal impact from data to deployment.
A pilot NeurIPS consensus workshop, modeled after the IPCC’s review cycle, is recommended.
Cybench is highlighted as a tool for quantifying cybersecurity risks in large language models.
The paper urges studies on open‑foundation‑model societal effects, citing recent impact research.
Authors flag the lack of a concrete implementation roadmap as a key weakness.
They caution that new metrics alone may not solve governance issues, calling for deeper normative work.

Healthy Society

👍 👎 ♥ Save

Healthy Lifestyles and Self-Improvement Videos on YouTube: A Thematic Analysis of Teen-Targeted Social Media Content

University of California

Abstract
As teenagers increasingly turn to social media for health-related information, understanding the values of teen-targeted content has become important. Although videos on healthy lifestyles and self-improvement are gaining popularity on social media platforms like YouTube, little is known about how these videos benefit and engage with teenage viewers. To address this, we conducted a thematic analysis of 44 YouTube videos and 66,901 comments. We found that these videos provide various advice on teenagers' common challenges, use engaging narratives for authenticity, and foster teen-centered communities through comments. However, a few videos also gave misleading advice to adolescents that can be potentially harmful. Based on our findings, we discuss design implications for creating relatable and intriguing social media content for adolescents. Additionally, we suggest ways for social media platforms to promote healthier and safer experiences for teenagers.

Inequality

👍 👎 ♥ Save

Magnitude and matrix inequalities

arXiv

Abstract
We present several applications of matrix-theoretic inequalities to the magnitude of metric spaces. We first resolve an open problem by showing that the magnitude of any finite metric space of negative type is less than or equal to its cardinality. This is a direct consequence of Styan's matrix inequality involving the Hadamard product of matrices. By related methods we also show a subadditivity property for the magnitude function of negative type compact metric spaces, and prove a convexity property for the magnitude for metrics interpolating in a natural way between two given, comparable metrics on a given set.

👍 👎 ♥ Save

Certain Inequalities for the generalized polar derivative of a polynomial

Abstract
Recently Rather et al. \cite{NT} considered the generalized derivative and the generalized polar derivative and studied the relative position of zeros of generalized derivative and generalized polar derivative with respect to the zeros of polynomial.\\ \indent In this paper, we establish some inequalities that estimate the maximum modulus of generalized derivative and the generalized polar derivative of the polynomial $P(z)$, which is also the extension of recently known results.

Female Empowerment

👍 👎 ♥ Save

Decoding the Gender Gap: Addressing Gender Stereotypes and Psychological Barriers to Empower Women in Technology

Dsseldorf, Germany and

Abstract
Recently, the unequal presence of women compared to men in technology has attracted the attention of researchers and practitioners across multiple fields. It is time to regard this problem as a global crisis that not only limits access to talent but also reduces the diversity of perspectives that shape technological innovation. This article examines the psychological and social barriers that influence this gap, as well as the interventions designed to reduce it. Using a structured review, the findings assemble evidence on the role of early gender stereotypes in the family and school and the continuation of this crisis in educational and career choices, through to the psychological challenges women face in professional settings, such as feelings of self-undervaluation, occupational anxiety, a heightened fear of technology, and structural limitations in educational environments. Special attention is paid to Germany, where the technology gap is particularly evident and where multiple national programs have been implemented to address it. The present review shows that effective solutions require more than anti-discrimination policies: they should include educational practices, organizational reforms, mentoring, and psychological support. The article concludes by outlining practical and research implications and introduces the NEURON project as a pilot interdisciplinary initiative aimed at accelerating current empowerment efforts and developing new programs for women in technology occupations.

Interests not found

We did not find any papers that match the below interests. Try other terms also consider if the content exists in arxiv.org.

Racism
Measureable ways to end poverty
Animal Welfare
Casual ML for Social Good
Poverty

You can edit or add more interests any time.

Help us improve your experience!

This project is on its early stages your feedback can be pivotal on the future of the project. Let us know what you think about this week's papers and suggestions!

Give Feedback

Unsubscribe from these updates