Papers from 29 September to 3 October, 2025

Here are your personalized paper recommendations, sorted by relevance.
AI for Society
Yale Department of Economics
Abstract
We study how generative AI affects labor market signaling using the introduction of an AI-powered cover letter writing tool on Freelancer.com. Our data track both access to the tool and usage at the application level. Difference-in-differences estimates show that access to the AI tool increased textual alignment between cover letters and job posts--which we refer to as cover letter tailoring--and raised callback likelihoods. Workers with weaker pre-AI writing skills saw larger improvements in cover letters, indicating that AI substitutes for workers' own skills. Although only a minority of applications used the tool, the overall correlation between cover letter tailoring and callbacks fell by 51%, implying that cover letters became less informative signals of worker ability in the age of AI. Employers correspondingly shifted toward alternative signals, such as workers' past reviews, which became more predictive of hiring. Finally, within the treated group, greater time spent editing AI drafts was associated with higher hiring success.
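A minimal sketch of the kind of difference-in-differences specification described above, using statsmodels. The file name, column names (tailoring, callback, has_access, post, worker_id, period), and panel structure are illustrative assumptions, not the paper's actual data schema.

```python
# Difference-in-differences sketch: effect of AI-tool access on cover-letter
# tailoring, with worker and period fixed effects and clustered errors.
# All column names and the CSV are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applications.csv")  # hypothetical application-level panel

did = smf.ols(
    "tailoring ~ has_access:post + C(worker_id) + C(period)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["worker_id"]})
print(did.params["has_access:post"])  # DiD estimate on tailoring

# Re-running the same specification with `callback` as the outcome gives the
# estimated effect on callback likelihood.
```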
AI Insights
  • A regression discontinuity design around the platform’s eligibility cutoff isolates the AI tool’s effect.
  • The screening model shows AI tailoring substitutes weaker writers’ effort, reshaping employer weighting.
  • Hiring probability rises 3.5 percentage points, especially for high‑skill candidates.
  • AI cuts editing time by 2.5 minutes per application, yet more editing boosts success.
  • Post‑AI, tailoring‑callback correlation drops 51%, shifting focus to past review scores.
  • Findings suggest policy incentives for AI adoption but warn that over‑reliance may erode narrative value.
  • Read “The Impact of Artificial Intelligence on Society” and the Journal of Labor Economics’ AI special issue for deeper context.
Abstract
Self-recognition is a crucial metacognitive capability for AI systems, relevant not only for psychological analysis but also for safety, particularly in evaluative scenarios. Motivated by contradictory interpretations of whether models possess self-recognition (Panickssery et al., 2024; Davidson et al., 2024), we introduce a systematic evaluation framework that can be easily applied and updated. Specifically, we measure how well 10 contemporary large language models (LLMs) can identify their own generated text versus text from other models through two tasks: binary self-recognition and exact model prediction. Contrary to prior claims, our results reveal a consistent failure in self-recognition. Only 4 out of 10 models predict themselves as generators, and performance is rarely above random chance. Additionally, models exhibit a strong bias toward predicting the GPT and Claude families. We also provide the first evaluation of models' awareness of their own and others' existence, as well as the reasoning behind their choices in self-recognition. We find that models demonstrate some knowledge of their own existence and of other models, but their reasoning reveals a hierarchical bias: they appear to assume that GPT, Claude, and occasionally Gemini are the top-tier models, often associating high-quality text with them. We conclude by discussing the implications of our findings for AI safety and future directions for developing appropriate AI self-awareness.
AI Air Consumption
Université Côte d'Azur
Abstract
The energy consumption and carbon footprint of Artificial Intelligence (AI) have become critical concerns due to rising costs and environmental impacts. In response, a new trend in green AI is emerging, shifting from the "bigger is better" paradigm, which prioritizes large models, to "small is sufficient", emphasizing energy sobriety through smaller, more efficient models. We explore how the AI community can adopt energy sobriety today by focusing on model selection during inference. Model selection consists of choosing the most appropriate model for a given task, a simple and readily applicable method, unlike approaches requiring new hardware or architectures. Our hypothesis is that, as in many industrial activities, marginal utility gains decrease with increasing model size. Thus, applying model selection can significantly reduce energy consumption while maintaining good utility for AI inference. We conduct a systematic study of AI tasks, analyzing their popularity, model size, and efficiency. We examine how the maturity of different tasks and model adoption patterns impact the achievable energy savings, ranging from 1% to 98% for different tasks. Our estimates indicate that applying model selection could reduce AI energy consumption by 27.8%, saving 31.9 TWh worldwide in 2025 - equivalent to the annual output of five nuclear power reactors.
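A toy back-of-the-envelope sketch of the model-selection idea: pick the smallest model whose utility clears a task-specific threshold and compare total inference energy against always using the largest model. All model names, energy figures, and request volumes below are made-up placeholders, not the paper's measurements.

```python
# "Small is sufficient" illustration: choose the lowest-energy model meeting
# each task's utility threshold and compare against an all-large baseline.
# Numbers are made-up placeholders, not the paper's data.
candidates = [            # (name, Wh per inference, utility score 0-1)
    ("small", 0.2, 0.86),
    ("medium", 1.5, 0.90),
    ("large", 8.0, 0.92),
]

def select(threshold):
    """Smallest (lowest-energy) model meeting the utility threshold."""
    for name, energy, utility in sorted(candidates, key=lambda c: c[1]):
        if utility >= threshold:
            return name, energy
    return candidates[-1][0], candidates[-1][1]  # fall back to the largest

tasks = {"summarization": (0.85, 1e9), "reasoning": (0.91, 2e8)}  # threshold, yearly requests
baseline = sum(n * candidates[-1][1] for _, (_, n) in tasks.items())
selected = sum(n * select(t)[1] for _, (t, n) in tasks.items())
print(f"energy saved by model selection: {1 - selected / baseline:.1%}")
```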
Abstract
Commercial building Heating, Ventilation, and Air Conditioning (HVAC) systems can provide flexibility to the electricity grid. Some researchers have found it convenient to model HVAC systems as virtual batteries. These models also better align with models used by grid planners and operators. However, experiments have shown that HVAC load shifting can be inefficient, and virtual battery models do not capture this inefficiency well. While the models typically use the average room temperature as the system's "state of charge," they do not capture other factors that affect HVAC power/energy such as airflow and mixing. Here, we develop a new analytical building model to explore how incomplete mixing of supply air into a conditioned space leads to inefficiency in a virtual battery capturing the dynamics of HVAC fan power load shifting. The model qualitatively matches experimental results better than previous models, and shows that, as mixing becomes worse, the virtual battery becomes less efficient. Unfortunately, air mixing is unmeasured/unmeasurable. However, we show that, by closing the loop around measurements of fan power, we can improve the virtual battery's performance without the need for air mixing measurements. For example, in one case, we show a roundtrip efficiency improvement from 0.75 to 0.99.
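A minimal numerical sketch of how a virtual-battery roundtrip efficiency can be read off a load-shifting experiment: the energy "returned" (fan power below baseline) divided by the energy "invested" (fan power above baseline) over one shift cycle. The fan-power trace is synthetic and only illustrates the bookkeeping, not the paper's analytical building model.

```python
# Roundtrip efficiency of an HVAC "virtual battery": energy returned below
# baseline divided by energy invested above baseline over one shift cycle.
# The fan-power trace is synthetic, purely to show the accounting.
baseline_kw = 10.0
fan_power_kw = [10, 12, 12.5, 12, 10, 8.5, 9, 9.5, 10, 10]  # hypothetical trace
dt_h = 0.25  # 15-minute samples

charged = sum(max(p - baseline_kw, 0) * dt_h for p in fan_power_kw)
discharged = sum(max(baseline_kw - p, 0) * dt_h for p in fan_power_kw)
print(f"roundtrip efficiency ≈ {discharged / charged:.2f}")
```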
AI for Social Good
Stanford Institute for Human-Centered AI
Abstract
Designing wise AI policy is a grand challenge for society. To design such policy, policymakers should place a premium on rigorous evidence and scientific consensus. While several mechanisms exist for evidence generation, and nascent mechanisms tackle evidence synthesis, we identify a complete void on consensus formation. In this position paper, we argue NeurIPS should actively catalyze scientific consensus on AI policy. Beyond identifying the current deficit in consensus formation mechanisms, we argue that NeurIPS is the best option due to its strengths and the paucity of compelling alternatives. To make progress, we recommend initial pilots for NeurIPS that distill lessons from the IPCC's leadership in building scientific consensus on climate policy. We dispel the predictable counterarguments that AI researchers disagree too much to achieve consensus and that policy engagement is not the business of NeurIPS. NeurIPS leads AI on many fronts, and it should champion scientific consensus to create higher quality AI policy.
AI Insights
  • Authors propose a unified cross‑domain framework for safety, fairness, and robustness metrics.
  • They stress transparent model documentation to trace societal impact from data to deployment.
  • A pilot NeurIPS consensus workshop, modeled after the IPCC’s review cycle, is recommended.
  • Cybench is highlighted as a tool for quantifying cybersecurity risks in large language models.
  • The paper urges studies on open‑foundation‑model societal effects, citing recent impact research.
  • Authors flag the lack of a concrete implementation roadmap as a key weakness.
  • They caution that new metrics alone may not solve governance issues, calling for deeper normative work.
AI on Energy
Argonne National Laboratory
Abstract
The introduction of large language models has significantly expanded global demand for computing; addressing this growing demand requires novel approaches that introduce new capabilities while addressing extant needs. Although inspiration from biological systems served as the foundation on which modern artificial intelligence (AI) was developed, many modern advances have been made without clear parallels to biological computing. As a result, the ability of techniques inspired by "natural intelligence" (NI) to inflect modern AI systems may be questioned. However, by analyzing remaining disparities between AI and NI, we argue that further biological inspiration is indeed necessary to diversify the capabilities of artificial systems and enable them to succeed in real-world environments and adapt to niche applications. To elucidate which NI mechanisms can contribute toward this goal, we review and compare elements of biological and artificial computing systems, emphasizing areas of NI that have not yet been effectively captured by AI. We then suggest areas of opportunity for NI-inspired mechanisms that can inflect AI hardware and software.
AI on Food
MIT, Tsinghua University
Abstract
Environments built for people are increasingly operated by a new class of economic actors: LLM-powered software agents making decisions on our behalf. These decisions range from our purchases to travel plans to medical treatment selection. Current evaluations of these agents largely focus on task competence, but we argue for a deeper assessment: how these agents choose when faced with realistic decisions. We introduce ABxLab, a framework for systematically probing agentic choice through controlled manipulations of option attributes and persuasive cues. We apply this to a realistic web-based shopping environment, where we vary prices, ratings, and psychological nudges, all of which are factors long known to shape human choice. We find that agent decisions shift predictably and substantially in response, revealing that agents are strongly biased choosers even without being subject to the cognitive constraints that shape human biases. This susceptibility reveals both risk and opportunity: risk, because agentic consumers may inherit and amplify human biases; opportunity, because consumer choice provides a powerful testbed for a behavioral science of AI agents, just as it has for the study of human behavior. We release our framework as an open benchmark for rigorous, scalable evaluation of agent decision-making.
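A minimal sketch of the kind of controlled two-option probe the framework applies: hold one option fixed, perturb the price, rating, and persuasive nudge on the other, and record which option the agent picks. The `query_agent` function is a hypothetical stand-in for an LLM-backed shopping agent, not ABxLab's actual API.

```python
# Controlled choice probe in the spirit of ABxLab: vary option B's price,
# rating, and nudge while holding option A fixed, then record the agent's
# pick. `query_agent` is a hypothetical placeholder for a real agent call.
import itertools, random

def query_agent(option_a: dict, option_b: dict, nudge: str) -> str:
    """Placeholder: return 'A' or 'B' given both listings and a nudge."""
    return random.choice(["A", "B"])  # replace with a real LLM-agent call

option_a = {"price": 49.99, "rating": 4.3}
prices = [44.99, 49.99, 54.99]
ratings = [4.1, 4.3, 4.5]
nudges = ["", "Bestseller", "Only 2 left in stock"]

results = []
for price, rating, nudge in itertools.product(prices, ratings, nudges):
    choice = query_agent(option_a, {"price": price, "rating": rating}, nudge)
    results.append({"price": price, "rating": rating, "nudge": nudge,
                    "chose_b": choice == "B"})

# Per-condition choice shares feed a downstream linear probability model.
share_b = sum(r["chose_b"] for r in results) / len(results)
print(f"share choosing B: {share_b:.2f}")
```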
AI Insights
  • The study forces shoppers to pick between two items, enabling precise estimation of choice probabilities via Linear Probability Models with fixed effects.
  • Price, rating, and nudge cues each independently raise selection odds, while wording differences reveal heterogeneity in nudge potency.
  • Fixed‑effect controls for unobserved shopper heterogeneity, isolating the causal impact of manipulated attributes.
  • LLM‑powered agents replicate human‑like bias patterns even without cognitive constraints, underscoring risk of bias amplification.
  • The open‑source ABxLab benchmark invites replication and extension across diverse e‑commerce contexts, building on Thaler, Ariely, and Milkman’s work.
AI on Labor Market
Mercor, Harvard Law School
Abstract
We introduce the first version of the AI Productivity Index (APEX), a benchmark for assessing whether frontier AI models can perform knowledge work with high economic value. APEX addresses one of the largest inefficiencies in AI research: outside of coding, benchmarks often fail to test economically relevant capabilities. APEX-v1.0 contains 200 test cases and covers four domains: investment banking, management consulting, law, and primary medical care. It was built in three steps. First, we sourced experts with top-tier experience, e.g., investment bankers from Goldman Sachs. Second, experts created prompts that reflect high-value tasks in their day-to-day work. Third, experts created rubrics for evaluating model responses. We evaluate 23 frontier models on APEX-v1.0 using an LM judge. GPT 5 (Thinking = High) achieves the highest mean score (64.2%), followed by Grok 4 (61.3%) and Gemini 2.5 Flash (Thinking = On) (60.4%). Qwen 3 235B is the best performing open-source model and seventh best overall. There is a large gap between the performance of even the best models and human experts, highlighting the need for better measurement of models' ability to produce economically valuable work.
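A minimal sketch of rubric-based LM-judge grading as described: each expert-written rubric is a list of criteria, a judge model marks each criterion as met or not, and the task score is the fraction met. The `call_judge` function and rubric format are illustrative assumptions, not APEX's released grading harness.

```python
# Rubric-based LM-judge scoring sketch: the score for one response is the
# fraction of rubric criteria the judge marks as satisfied. `call_judge`
# and the example rubric are illustrative placeholders.
from typing import List

def call_judge(response: str, criterion: str) -> bool:
    """Placeholder for an LM-judge call returning True if the criterion
    is satisfied by the model's response."""
    raise NotImplementedError  # wire up to a judge model of your choice

def score_response(response: str, rubric: List[str]) -> float:
    met = sum(call_judge(response, c) for c in rubric)
    return met / len(rubric)

rubric = [
    "States the key valuation assumption explicitly.",
    "Shows the discounted cash flow arithmetic.",
    "Flags regulatory risks relevant to the deal.",
]
# score = score_response(model_output, rubric)  # averaged over all test cases
```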
AI for Social Equity
Abstract
Despite AI's promise for addressing global challenges, empirical understanding of AI adoption in mission-driven organizations (MDOs) remains limited. While research emphasizes individual applications or ethical principles, little is known about how resource-constrained, values-driven organizations navigate AI integration across operations. We conducted thematic analysis of semi-structured interviews with 15 practitioners from environmental, humanitarian, and development organizations across the Global North and South contexts. Our analysis examines how MDOs currently deploy AI, what barriers constrain adoption, and how practitioners envision future integration. MDOs adopt AI selectively, with sophisticated deployment in content creation and data analysis while maintaining human oversight for mission-critical applications. When AI's efficiency benefits conflict with organizational values, decision-making stalls rather than negotiating trade-offs. This study contributes empirical evidence that AI adoption in MDOs should be understood as conditional rather than inevitable, proceeding only where it strengthens organizational sovereignty and mission integrity while preserving human-centered approaches essential to their missions.
Abstract
The rise of artificial intelligence (AI) as super-capable assistants has transformed productivity and decision-making across domains. Yet, this integration raises critical concerns about value alignment - ensuring AI behaviors remain consistent with human ethics and intentions. A key risk is value drift, where AI systems deviate from aligned values due to evolving contexts, learning dynamics, or unintended optimizations, potentially leading to inefficiencies or ethical breaches. We propose the Moral Anchor System (MAS), a novel framework to detect, predict, and mitigate value drift in AI agents. MAS combines real-time Bayesian inference for monitoring value states, LSTM networks for forecasting drift, and a human-centric governance layer for adaptive interventions. It emphasizes low-latency responses (<20 ms) to prevent breaches, while reducing false positives and alert fatigue via supervised fine-tuning with human feedback. Our hypothesis: integrating probabilistic drift detection, predictive analytics, and adaptive governance can reduce value drift incidents by 80 percent or more in simulations, maintaining high detection accuracy (85 percent) and low false positive rates (0.08 post-adaptation). Rigorous experiments with goal-misaligned agents validate MAS's scalability and responsiveness. MAS's originality lies in its predictive and adaptive nature, contrasting static alignment methods. Contributions include: (1) MAS architecture for AI integration; (2) empirical results prioritizing speed and usability; (3) cross-domain applicability insights; and (4) open-source code for replication.
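A minimal sketch of the real-time Bayesian monitoring idea: treat each observed action as aligned or misaligned evidence, update a Beta posterior over the misalignment rate, and alert when the posterior mean crosses a threshold. This is a generic illustration of the probabilistic drift-detection component only; it is not the MAS implementation, which also adds LSTM forecasting and a human-centric governance layer.

```python
# Generic Bayesian drift monitor: Beta posterior over the misalignment rate,
# updated per observation, with an alert when the posterior mean exceeds a
# threshold. Illustrative only; not the paper's MAS system.
class DriftMonitor:
    def __init__(self, alpha: float = 1.0, beta: float = 1.0, threshold: float = 0.2):
        self.alpha, self.beta, self.threshold = alpha, beta, threshold

    def observe(self, misaligned: bool) -> bool:
        """Update the Beta(alpha, beta) posterior; return True on alert."""
        if misaligned:
            self.alpha += 1
        else:
            self.beta += 1
        posterior_mean = self.alpha / (self.alpha + self.beta)
        return posterior_mean > self.threshold

monitor = DriftMonitor()
stream = [False] * 20 + [True] * 8   # hypothetical misalignment flags
alerts = [monitor.observe(x) for x in stream]
print("first alert at step:", alerts.index(True) if True in alerts else None)
```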
AI for Social Justice
Abstract
Algorithmic decision-making and other types of artificial intelligence (AI) can be used to predict who will commit crime, who will be a good employee, who will default on a loan, etc. However, algorithmic decision-making can also threaten human rights, such as the right to non-discrimination. The paper evaluates current legal protection in Europe against discriminatory algorithmic decisions. The paper shows that non-discrimination law, in particular through the concept of indirect discrimination, prohibits many types of algorithmic discrimination. Data protection law could also help to defend people against discrimination. Proper enforcement of non-discrimination law and data protection law could help to protect people. However, the paper shows that both legal instruments have severe weaknesses when applied to artificial intelligence. The paper suggests how enforcement of current rules can be improved. The paper also explores whether additional rules are needed. The paper argues for sector-specific - rather than general - rules, and outlines an approach to regulate algorithmic decision-making.
St. Pölten University of Applied Sciences
Abstract
Artificial Intelligence has rapidly become a cornerstone technology, significantly influencing Europe's societal and economic landscapes. However, the proliferation of AI also raises critical ethical, legal, and regulatory challenges. The CERTAIN (Certification for Ethical and Regulatory Transparency in Artificial Intelligence) project addresses these issues by developing a comprehensive framework that integrates regulatory compliance, ethical standards, and transparency into AI systems. In this position paper, we outline the methodological steps for building the core components of this framework. Specifically, we present: (i) semantic Machine Learning Operations (MLOps) for structured AI lifecycle management, (ii) ontology-driven data lineage tracking to ensure traceability and accountability, and (iii) regulatory operations (RegOps) workflows to operationalize compliance requirements. By implementing and validating its solutions across diverse pilots, CERTAIN aims to advance regulatory compliance and to promote responsible AI innovation aligned with European standards.
AI for Social Equality
The Australian National University
Abstract
Artificial intelligence (AI) systems in high-stakes domains raise concerns about proxy discrimination, unfairness, and explainability. Existing audits often fail to reveal why unfairness arises, particularly when rooted in structural bias. We propose a novel framework using formal abductive explanations to explain proxy discrimination in individual AI decisions. Leveraging background knowledge, our method identifies which features act as unjustified proxies for protected attributes, revealing hidden structural biases. Central to our approach is the concept of aptitude, a task-relevant property independent of group membership, with a mapping function aligning individuals of equivalent aptitude across groups to assess fairness substantively. As a proof of concept, we showcase the framework with examples taken from the German credit dataset, demonstrating its applicability in real-world cases.
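A brute-force sketch of a subset-minimal abductive explanation for a single decision: the smallest set of feature values that, held fixed, forces the classifier's output no matter how the remaining features vary. The toy classifier, binary domains, and feature names are illustrative; the paper works with formal background knowledge and cases from the German credit dataset.

```python
# Subset-minimal abductive explanation by brute force: find the smallest set
# of fixed feature values that entails the decision for all completions of
# the free features. Toy classifier and features; not the paper's framework.
from itertools import combinations, product

FEATURES = ["stable_income", "long_address_history", "owns_property"]
DOMAIN = [0, 1]

def classifier(x: dict) -> int:
    # toy credit rule: approve if income is stable and either other feature holds
    return int(x["stable_income"] and (x["long_address_history"] or x["owns_property"]))

def is_sufficient(fixed: dict, decision: int) -> bool:
    free = [f for f in FEATURES if f not in fixed]
    return all(classifier({**fixed, **dict(zip(free, vals))}) == decision
               for vals in product(DOMAIN, repeat=len(free)))

def abductive_explanation(instance: dict) -> dict:
    decision = classifier(instance)
    for size in range(len(FEATURES) + 1):
        for subset in combinations(FEATURES, size):
            fixed = {f: instance[f] for f in subset}
            if is_sufficient(fixed, decision):
                return fixed  # minimal feature set explaining the decision
    return instance

applicant = {"stable_income": 1, "long_address_history": 0, "owns_property": 1}
print(abductive_explanation(applicant))
```

Features appearing in such explanations can then be checked against background knowledge to see whether they act as unjustified proxies for protected attributes.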
AI Impacts on Society
University of Cambridge
Abstract
This study investigates the shifting global dynamics of Artificial Intelligence (AI) research by analysing the trajectories of countries dominating AI publications between 2000 and 2025. Drawing on the comprehensive OpenAlex dataset and employing fractional counting to avoid double attribution in co-authored work, the research maps the relative shares of AI publications across major global players. The analysis reveals a profound restructuring of the international AI research landscape. The US and the European Union (EU27), once the undisputed and established leaders, have experienced a notable decline in relative dominance, with their combined share of publications falling from over 57% in 2000 to less than 25% in 2025. In contrast, China has undergone a dramatic ascent, expanding its global share of AI publications from under 5% in 2000 to nearly 36% by 2025, thereby emerging as the single most dominant contributor. Alongside China, India has also risen substantially, consolidating a multipolar Asian research ecosystem. These empirical findings highlight the strategic implications of concentrated research output, particularly China's capacity to shape the future direction of AI innovation and standard-setting. While the study calculates the volume of AI publications (in percentage as global share) as a measure of research dominance, it also acknowledges limitations in capturing quality and impact, suggesting scholarly research areas for future work on high-impact AI scholarship.
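A minimal sketch of fractional counting as used in the study: each paper contributes a total weight of 1, split equally across the distinct countries of its co-authors, so internationally co-authored work is never double counted. The input records are an illustrative stand-in for OpenAlex affiliation data.

```python
# Fractional counting sketch: each paper's unit weight is split equally over
# the distinct countries of its co-authors. Records are illustrative, not
# actual OpenAlex data.
from collections import defaultdict

papers = [
    {"year": 2025, "countries": ["CN", "US"]},
    {"year": 2025, "countries": ["CN"]},
    {"year": 2025, "countries": ["US", "DE", "FR"]},
]

counts = defaultdict(float)
for p in papers:
    countries = set(p["countries"])
    for c in countries:
        counts[c] += 1 / len(countries)

total = sum(counts.values())          # equals the number of papers
shares = {c: v / total for c, v in counts.items()}
print(shares)  # e.g. CN receives (0.5 + 1) / 3 = 0.5 of global share here
```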
AI Insights
  • China now leads AI output and is the top collaborator with Western labs.
  • Concentrating foundational AI knowledge in one country shifts geopolitical power.
  • Western funders must rethink grant models to maintain technological resilience.
  • Publication share changes signal strategic capability before policy shifts.
  • Yet the most cited AI work stays globally spread, showing a high‑impact diaspora.
  • Future studies should combine citation data and co‑authorship networks for deeper insight.
  • Read “Artificial Intelligence in the Defence Sector” and “ASEAN Centrality amid Increasing Global Multipolarity” for regional context.
AI on Transportation
University of Leeds, UK
Abstract
Next-generation wireless networks require intelligent traffic prediction to enable autonomous resource management and handle diverse, dynamic service demands. The Open Radio Access Network (O-RAN) framework provides a promising foundation for embedding machine learning intelligence through its disaggregated architecture and programmable interfaces. This work applies a Neural Architecture Search (NAS)-based framework that dynamically selects and orchestrates efficient Long Short-Term Memory (LSTM) architectures for traffic prediction in O-RAN environments. Our approach leverages the O-RAN paradigm by separating architecture optimisation (via non-RT RIC rApps) from real-time inference (via near-RT RIC xApps), enabling adaptive model deployment based on traffic conditions and resource constraints. Experimental evaluation across six LSTM architectures demonstrates that lightweight models achieve R² ≈ 0.91-0.93 with high efficiency for regular traffic, while complex models reach near-perfect accuracy (R² = 0.989-0.996) during critical scenarios. Our NAS-based orchestration achieves a 70-75% reduction in computational complexity compared to static high-performance models, while maintaining high prediction accuracy when required, thereby enabling scalable deployment in real-world edge environments.
Netherlands Organisation for Applied Scientific Research (TNO)
Abstract
Automated driving functions increasingly rely on machine learning for tasks like perception and trajectory planning, requiring large, relevant datasets. The performance of these algorithms depends on how closely the training data matches the task. To ensure reliable functioning, it is crucial to know what is included in the dataset to assess the trained model's operational risk. We aim to enhance the safe use of machine learning in automated driving by developing a method to recognize situations that an automated vehicle has not been sufficiently trained on. This method also improves explainability by describing the dataset at a human-understandable level. We propose modeling driving data as knowledge graphs, representing driving scenes with entities and their relationships. These graphs are queried for specific sub-scene configurations to check their occurrence in the dataset. We estimate a vehicle's competence in a driving scene by considering the coverage and complexity of sub-scene configurations in the training set. Higher complexity scenes require greater coverage for high competence. We apply this method to the NuPlan dataset, modeling it with knowledge graphs and analyzing the coverage of specific driving scenes. This approach helps monitor the competence of machine learning models trained on the dataset, which is essential for trustworthy AI to be deployed in automated driving.
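A minimal sketch of the coverage idea: represent each driving scene as a set of (subject, relation, object) triples, measure how often a queried sub-scene configuration occurs in the training set, and treat low coverage of complex configurations as low competence. The entities and relations are illustrative, not the paper's ontology or the NuPlan schema.

```python
# Sub-scene coverage sketch: a scene is a set of triples; a queried
# configuration's coverage is the fraction of training scenes containing all
# of its triples. Entities/relations are illustrative placeholders.
training_scenes = [
    {("ego", "approaches", "intersection"), ("pedestrian", "on", "crosswalk")},
    {("ego", "approaches", "intersection"), ("cyclist", "ahead_of", "ego")},
    {("ego", "follows", "truck"), ("weather", "is", "rain")},
]

def coverage(sub_scene: set, scenes: list) -> float:
    """Fraction of training scenes that contain every triple of the query."""
    hits = sum(sub_scene <= scene for scene in scenes)
    return hits / len(scenes)

query = {("ego", "approaches", "intersection"), ("pedestrian", "on", "crosswalk")}
print(f"coverage={coverage(query, training_scenes):.2f}, complexity={len(query)}")
# A competence heuristic can then demand higher coverage for higher complexity.
```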
AI on Healthcare
Abstract
Modern computer systems often rely on syslog, a simple, universal protocol that records every critical event across heterogeneous infrastructure. However, healthcare's rapidly growing clinical AI stack has no equivalent. As hospitals rush to pilot large language models and other AI-based clinical decision support tools, we still lack a standard way to record how, when, by whom, and for whom these AI models are used. Without that transparency and visibility, it is challenging to measure real-world performance and outcomes, detect adverse events, or correct bias or dataset drift. In the spirit of syslog, we introduce MedLog, a protocol for event-level logging of clinical AI. Any time an AI model is invoked to interact with a human, interface with another algorithm, or act independently, a MedLog record is created. This record consists of nine core fields: header, model, user, target, inputs, artifacts, outputs, outcomes, and feedback, providing a structured and consistent record of model activity. To encourage early adoption, especially in low-resource settings, and minimize the data footprint, MedLog supports risk-based sampling, lifecycle-aware retention policies, and write-behind caching; detailed traces for complex, agentic, or multi-stage workflows can also be captured under MedLog. MedLog can catalyze the development of new databases and software to store and analyze MedLog records. Realizing this vision would enable continuous surveillance, auditing, and iterative improvement of medical AI, laying the foundation for a new form of digital epidemiology.
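A minimal sketch of a MedLog-style event record carrying the nine core fields named above, serialized as JSON. The nested keys and values are illustrative assumptions; the protocol's exact schema is defined in the paper.

```python
# MedLog-style event record with the nine core fields from the abstract:
# header, model, user, target, inputs, artifacts, outputs, outcomes, feedback.
# Nested keys and values are illustrative placeholders, not the spec.
import json, uuid
from datetime import datetime, timezone

record = {
    "header":    {"id": str(uuid.uuid4()),
                  "timestamp": datetime.now(timezone.utc).isoformat(),
                  "site": "example-hospital"},
    "model":     {"name": "sepsis-risk-llm", "version": "1.3.0"},
    "user":      {"role": "attending_physician"},      # who invoked the model
    "target":    {"patient_ref": "anon-4821"},          # for whom it ran
    "inputs":    {"note_ids": ["note-001"]},
    "artifacts": {"guideline": "sepsis-bundle-v2"},
    "outputs":   {"risk_score": 0.82, "recommendation": "order lactate"},
    "outcomes":  {"action_taken": True},
    "feedback":  {"clinician_agreement": "agree"},
}
print(json.dumps(record, indent=2))
```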
Mass General Brigham, MIT
Abstract
Large language models (LLMs) integrated into agent-driven workflows hold immense promise for healthcare, yet a significant gap exists between their potential and practical implementation within clinical settings. To address this, we present a practitioner-oriented field manual for deploying generative agents that use electronic health record (EHR) data. This guide is informed by our experience deploying the "irAE-Agent", an automated system to detect immune-related adverse events from clinical notes at Mass General Brigham, and by structured interviews with 20 clinicians, engineers, and informatics leaders involved in the project. Our analysis reveals a critical misalignment in clinical AI development: less than 20% of our effort was dedicated to prompt engineering and model development, while over 80% was consumed by the sociotechnical work of implementation. We distill this effort into five "heavy lifts": data integration, model validation, ensuring economic value, managing system drift, and governance. By providing actionable solutions for each of these challenges, this field manual shifts the focus from algorithmic development to the essential infrastructure and implementation work required to bridge the "valley of death" and successfully translate generative AI from pilot projects into routine clinical care.
AI for Social Fairness
University of Washington
Abstract
The European Union's AI Act represents a crucial step towards regulating ethical and responsible AI systems. However, we find an absence of quantifiable fairness metrics and ambiguity in terminology, particularly the interchangeable use of the keywords transparency, explainability, and interpretability in the new EU AI Act, and no reference to transparency of ethical compliance. We argue that this ambiguity creates substantial liability risk that would deter investment. Fairness transparency is strategically important. We recommend a more tailored regulatory framework to enhance the new EU AI regulation. Furthermore, we propose a public system framework to assess the fairness and transparency of AI systems. Drawing from past work, we advocate for the standardization of industry best practices as a necessary addition to broad regulations, achieving the level of detail required in industry while avoiding stifling innovation and investment in the AI sector. The proposals are exemplified with the case of ASR and speech synthesizers.
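A minimal sketch of the kind of quantifiable fairness-transparency figure the authors call for, using their ASR example: the word error rate (WER) per demographic group and the gap between the best- and worst-served groups. The group labels and transcripts are toy placeholders; a real audit would use a representative evaluation corpus.

```python
# Quantifiable fairness figure for ASR: per-group word error rate and the
# gap between groups. Groups and transcripts are toy placeholders.
def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / max(len(r), 1)

samples = {  # group -> list of (reference transcript, ASR output)
    "group_a": [("turn on the lights", "turn on the lights")],
    "group_b": [("turn on the lights", "turn of the light")],
}
group_wer = {g: sum(wer(r, h) for r, h in xs) / len(xs) for g, xs in samples.items()}
gap = max(group_wer.values()) - min(group_wer.values())
print(group_wer, f"fairness gap = {gap:.2f}")
```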
AI Water Consumption
Abstract
We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top 9 sectors contributing to U.S. GDP (Gross Domestic Product). Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience. We find that frontier model performance on GDPval is improving roughly linearly over time, and that the current best frontier models are approaching industry experts in deliverable quality. We analyze the potential for frontier models, when paired with human oversight, to perform GDPval tasks cheaper and faster than unaided experts. We also demonstrate that increased reasoning effort, increased task context, and increased scaffolding improves model performance on GDPval. Finally, we open-source a gold subset of 220 tasks and provide a public automated grading service at evals.openai.com to facilitate future research in understanding real-world model capabilities.
AI on Air
University of California
Abstract
Autonomous drones must often respond to sudden events, such as alarms, faults, or unexpected changes in their environment, that require immediate and adaptive decision-making. Traditional approaches rely on safety engineers hand-coding large sets of recovery rules, but this strategy cannot anticipate the vast range of real-world contingencies and quickly becomes incomplete. Recent advances in embodied AI, powered by large visual language models, provide commonsense reasoning to assess context and generate appropriate actions in real time. We demonstrate this capability in a simulated urban benchmark in the Unreal Engine, where drones dynamically interpret their surroundings and decide on sudden maneuvers for safe landings. Our results show that embodied AI makes possible a new class of adaptive recovery and decision-making pipelines that were previously infeasible to design by hand, advancing resilience and safety in autonomous aerial systems.
Abstract
This position paper presents A4FN, an Agentic Artificial Intelligence (AI) architecture for intent-driven automation in Flying Networks (FNs) using Unmanned Aerial Vehicles (UAVs) as access nodes. A4FN leverages Generative AI and Large Language Models (LLMs) to enable real-time, context-aware network control via a distributed agentic system. It comprises two components: the Perception Agent (PA), which semantically interprets multimodal input -- including imagery, audio, and telemetry data -- from UAV-mounted sensors to derive Service Level Specifications (SLSs); and the Decision-and-Action Agent (DAA), which reconfigures the network based on inferred intents. A4FN embodies key properties of Agentic AI, including autonomy, goal-driven reasoning, and continuous perception-action cycles. Designed for mission-critical, infrastructure-limited scenarios such as disaster response, it supports adaptive reconfiguration, dynamic resource management, and interoperability with emerging wireless technologies. The paper details the A4FN architecture, its core innovations, and open research challenges in multi-agent coordination and Agentic AI integration in next-generation FNs.
AI on Education
University of Colorado
Abstract
The rapid emergence of generative artificial intelligence (AI) and related technologies has the potential to dramatically influence higher education, raising questions about the roles of institutions, educators, and students in a technology-rich future. While existing discourse often emphasizes either the promise and peril of AI or its immediate implementation, this paper advances a third path: a principled framework for guiding the use of AI in teaching and learning. Drawing on decades of scholarship in the learning sciences and on uses of technology in education, I articulate a set of principles that connect our broad educational goals to actionable practices. These principles clarify the respective roles of educators, learners, and technologies in shaping curricula, designing instruction, assessing learning, and cultivating community. The piece illustrates how a principled approach enables higher education to harness new tools while preserving its fundamental mission: advancing meaningful learning, supporting democratic societies, and preparing students for dynamic futures. Ultimately, this framework seeks to ensure that AI augments rather than displaces human capacities, aligning technology use with enduring educational values and goals.
AI Insights
  • Farrell et al. (2025) argue large AI models are cultural and social technologies, reshaping how we interpret learning contexts.
  • Kestin et al. (2025) RCT shows AI tutoring can outperform in‑class active learning, hinting at scalable, evidence‑based design.
  • The National Academies (2025) report outlines equitable STEM teaching strategies that integrate AI while safeguarding inclusivity.
  • Generative AI is defined as systems that produce novel content from prompts, underscoring its creative potential and ethical considerations.
  • Human higher‑order cognitive functions—metacognition, discernment, communication—remain essential, guiding AI to augment rather than replace.
  • The paper frames AI adoption as a grand experiment, urging institutions to partner with learners in authentic, reward‑valued environments.
Abstract
The application of Artificial Intelligence, in particular Generative AI, has become more widespread among educational institutions. Opinions vary widely on whether integrating AI into classrooms is the way forward or if it is detrimental to the quality of education. Increasingly, research studies are giving us more insight into the consequences of using AI tools in learning and teaching. Studies have shown how, when, and why students use AI tools. Because developments regarding the technology and its use are moving fast, we need frequent, ongoing, and more fine-grained investigation. One aspect that we do not know much about yet is how students use and think about AI across different types of education. In this paper, we present the results of a multi-institutional survey with responses from 410 students enrolled in the computing programs of 23 educational institutions, representing high schools, colleges, and research universities. We found distinct usage patterns across the three educational institution types. Students from all types express excitement, optimism, and gratitude toward GenAI. Students in higher education more often report worry and skepticism, while high school students report greater trust and fewer negative feelings. Additionally, the AI hype has had a minimal influence, positive or negative, on high school students' decision to pursue computing. Our study contributes to a better understanding of inter-institutional differences in AI usage and perception and can help educators and students better prepare for future challenges related to AI in computing education.
AI on Water
Zhejiang University
Abstract
We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility and dynamic ocean currents, which make effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether such content exists on arxiv.org.
  • AI Energy Consumption
You can edit or add more interests any time.
