Papers from 6 to 10 October 2025

Here are the personalized paper recommendations, sorted by relevance.
AGI
New York University Abu Dhabi
Abstract
We present a systematic reassessment of 5,062 high-Galactic latitude gamma-ray sources from the Fermi-LAT 4FGL-DR4 catalog using Firmamento, a web-based platform for multi-frequency source discovery and analysis. Our goal is to provide an independent evaluation of LAT gamma-ray source associations through alternative spectral and spatial methods that combine recent and legacy survey data, supplemented by human supervision of spectral energy distributions (SEDs), source morphology, flux variability, and template-based comparisons. Firmamento confirms the 4FGL-DR4 and 4LAC-DR3 counterparts or unassociated sources in 4,493 cases (88.8%), demonstrating the robustness of both approaches. Beyond this general agreement, we identify 421 new blazar counterparts among previously unassociated sources, thereby reducing the fraction of unidentified extragalactic Fermi-LAT sources from 25% to 17%. In addition, in 64 cases we find alternative blazar associations, while in 49 instances we do not confirm the 4FGL-DR4 association. For all confirmed blazar counterparts we provide homogeneous estimates of synchrotron peak frequency and peak flux using machine-learning and template-based methods; these agree with 4LAC-DR3 values in most cases, though significant discrepancies appear for a few dozen sources, often due to improved X-ray coverage. The primary outcome of this work is the 1st Firmamento LAT AGN table (1FLAT), made publicly available through the Firmamento platform (https://firmamento.nyuad.nyu.edu), where all related multi-wavelength data and images are available. The project involved extensive manual validation and benefited from the active participation of graduate and undergraduate students, highlighting the platform's value for both research and education.
AI Insights
  • ERCi, a novel probabilistic cross‑matching algorithm, underpins 1FLAT’s association pipeline.
  • All 1FLAT entries are distributed as FITS binary tables, ready for Astropy ingestion (see the sketch after this list).
  • The catalog’s machine‑learning module delivers synchrotron peak estimates with <10% scatter versus 4LAC.
  • Discrepancies in a handful of sources trace back to newly available X‑ray data from eROSITA.
  • Firmamento’s web interface lets users overlay multi‑band images and SEDs in real time.
  • Graduate and undergraduate teams performed the 64‑hour visual vetting that validated 4,493 associations.
  • 1FLAT’s open‑access design invites cross‑matching with upcoming surveys like LSST and SKA.
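Since the 1FLAT entries ship as FITS binary tables, a minimal Astropy loading sketch might look like the following. The local file name and the column name used in the filter are illustrative assumptions, not the published schema; consult the Firmamento platform for the actual table layout.

```python
# Minimal sketch: loading a 1FLAT-style FITS binary table with Astropy.
# "1FLAT.fits" and "nu_peak" are hypothetical placeholders, not the real schema.
from astropy.table import Table

table = Table.read("1FLAT.fits", format="fits")  # hypothetical local copy
print(table.colnames)                            # inspect available columns

# Example filter: keep rows whose (assumed) synchrotron peak frequency
# exceeds 10^15 Hz, a rough high-synchrotron-peaked selection.
if "nu_peak" in table.colnames:
    hsp = table[table["nu_peak"] > 1e15]
    print(len(hsp), "high-synchrotron-peaked candidates")
```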

We did not find much content matching your interests, so we have included some additional topics that are popular. Also be aware that if a topic is not present on arXiv, we won't be able to recommend it.

AI Agents
Abstract
Data lakehouses run sensitive workloads, where AI-driven automation raises concerns about trust, correctness, and governance. We argue that API-first, programmable lakehouses provide the right abstractions for safe-by-design, agentic workflows. Using Bauplan as a case study, we show how data branching and declarative environments extend naturally to agents, enabling reproducibility and observability while reducing the attack surface. We present a proof-of-concept in which agents repair data pipelines using correctness checks inspired by proof-carrying code. Our prototype demonstrates that untrusted AI agents can operate safely on production data and outlines a path toward a fully agentic lakehouse.
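As a rough illustration of the "correctness checks inspired by proof-carrying code" idea, the toy sketch below has an untrusted agent's repair applied to an isolated copy ("branch") of the data and accepted only if a machine-checkable postcondition holds. The function names, data shapes, and repair are hypothetical and are not Bauplan's API.

```python
# Toy sketch (not Bauplan's API) of a "proof-carrying" pipeline repair: the
# agent proposes a fix together with a checkable postcondition; the fix runs
# on a branch of the data and is accepted only if the postcondition holds.
def apply_on_branch(transform, postcondition, rows):
    branch = [dict(r) for r in rows]       # cheap stand-in for a data branch
    repaired = transform(branch)
    if not postcondition(repaired):        # the agent's proof obligation
        raise ValueError("repair rejected: postcondition failed")
    return repaired                        # only now would the branch be merged

# Hypothetical agent-proposed repair: fill missing prices with 0.0, with the
# postcondition that no nulls remain and all prices are non-negative.
rows = [{"sku": "a", "price": 3.5}, {"sku": "b", "price": None}]
fixed = apply_on_branch(
    lambda rs: [{**r, "price": 0.0 if r["price"] is None else r["price"]} for r in rs],
    lambda rs: all(r["price"] is not None and r["price"] >= 0 for r in rs),
    rows,
)
print(fixed)  # [{'sku': 'a', 'price': 3.5}, {'sku': 'b', 'price': 0.0}]
```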
Aitomatic, Inc
Abstract
Large language models (LLMs) have empowered AI agents to tackle increasingly complex tasks. However, most existing agents remain limited to static planning and brittle interactions, falling short of true collaboration or adaptive reasoning. We introduce ProSEA, a modular, general-purpose multi-agent framework designed for iterative problem solving through exploration and plan evolution. ProSEA features a hierarchical architecture in which a Manager Agent orchestrates domain-specialized Expert Agents, decomposes tasks, and adaptively replans based on structured feedback from failed attempts. Unlike prior systems, ProSEA agents report not only success or failure but also detailed reasons for failure and newly discovered constraints, enabling dynamic plan refinement informed by exploratory traces. The framework operates autonomously but supports seamless integration with human collaborators when needed. Experiments on the challenging FinanceBench benchmark demonstrate that ProSEA, even without human feedback, outperforms state-of-the-art baselines and achieves robust performance across reasoning-heavy tasks. These results underscore ProSEA's potential as a foundation for more transparent, adaptive, and human-aligned AI agents.
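A hypothetical sketch of the structured-feedback loop described above: an expert agent reports not just pass/fail but also the reason for failure and newly discovered constraints, which the manager folds into a refined plan. The class and field names are illustrative, not taken from ProSEA's implementation.

```python
# Illustrative (not ProSEA's actual code) structured feedback and replanning.
from dataclasses import dataclass, field

@dataclass
class ExpertFeedback:
    success: bool
    reason: str = ""
    new_constraints: list = field(default_factory=list)

def replan(plan, feedback, constraints):
    constraints.extend(feedback.new_constraints)
    if feedback.success:
        return plan[1:]                    # step done, move on
    # naive refinement: retry the failed step with the accumulated constraints
    return [plan[0] + " (respecting: " + ", ".join(constraints) + ")"] + plan[1:]

plan = ["extract revenue table", "compute year-over-year growth"]
fb = ExpertFeedback(False, "table spans two pages", ["merge multi-page tables"])
print(replan(plan, fb, []))
```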
AI and Society
Université de Montréal
Abstract
Artificial intelligence systems increasingly mediate knowledge, communication, and decision making. Development and governance remain concentrated within a small set of firms and states, raising concerns that technologies may encode narrow interests and limit public agency. Capability benchmarks for language, vision, and coding are common, yet public, auditable measures of pluralistic governance are rare. We define AI pluralism as the degree to which affected stakeholders can shape objectives, data practices, safeguards, and deployment. We present the AI Pluralism Index (AIPI), a transparent, evidence-based instrument that evaluates producers and system families across four pillars: participatory governance, inclusivity and diversity, transparency, and accountability. AIPI codes verifiable practices from public artifacts and independent evaluations, explicitly handling "Unknown" evidence to report both lower-bound ("evidence") and known-only scores with coverage. We formalize the measurement model; implement a reproducible pipeline that integrates structured web and repository analysis, external assessments, and expert interviews; and assess reliability with inter-rater agreement, coverage reporting, cross-index correlations, and sensitivity analysis. The protocol, codebook, scoring scripts, and evidence graph are maintained openly with versioned releases and a public adjudication process. We report pilot provider results and situate AIPI relative to adjacent transparency, safety, and governance frameworks. The index aims to steer incentives toward pluralistic practice and to equip policymakers, procurers, and the public with comparable evidence.
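The dual "evidence" versus "known-only" scoring with coverage could look roughly like the sketch below, where Unknown items count as zero in the lower-bound score. The item names and equal weighting are assumptions for illustration, not the AIPI codebook.

```python
# Minimal sketch of the dual scoring described above. Each coded practice is
# 1 (present), 0 (absent), or None ("Unknown"). The lower-bound "evidence"
# score treats Unknown as 0, the "known-only" score averages over coded items
# only, and coverage is the fraction of items with known evidence.
def score(items):
    total = len(items)
    known = [v for v in items.values() if v is not None]
    return {
        "evidence": sum(v or 0 for v in items.values()) / total,  # Unknown -> 0
        "known_only": (sum(known) / len(known)) if known else 0.0,
        "coverage": len(known) / total,
    }

provider = {"public_model_card": 1, "external_audit": None, "incident_process": 0}
print(score(provider))  # evidence ~0.33, known_only 0.5, coverage ~0.67
```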
AI Insights
  • Imagine model cards closing the AI accountability gap by transparently reporting model behavior.
  • OECD AI Recommendation pushes for human‑centered, explainable, and fair AI.
  • UNESCO Ethics Recommendation embeds human values to turn AI into societal good.
  • HELM from Stanford’s CRFM holistically benchmarks language models on safety and impact.
  • NIST AI RMF offers a risk‑management cycle for responsible AI governance.
  • WCAG 2.2 ensures AI interfaces are accessible to users with disabilities.
  • Krippendorff’s content‑analysis method quantifies stakeholder participation in AI governance.
Abstract
This is a skeptical overview of the literature on AI consciousness. We will soon create AI systems that are conscious according to some influential, mainstream theories of consciousness but are not conscious according to other influential, mainstream theories of consciousness. We will not be in a position to know which theories are correct and whether we are surrounded by AI systems as richly and meaningfully conscious as human beings or instead only by systems as experientially blank as toasters. None of the standard arguments either for or against AI consciousness takes us far.
Table of Contents
  • Chapter One: Hills and Fog
  • Chapter Two: What Is Consciousness? What Is AI?
  • Chapter Three: Ten Possibly Essential Features of Consciousness
  • Chapter Four: Against Introspective and Conceptual Arguments for Essential Features
  • Chapter Five: Materialism and Functionalism
  • Chapter Six: The Turing Test and the Chinese Room
  • Chapter Seven: The Mimicry Argument Against AI Consciousness
  • Chapter Eight: Global Workspace Theories and Higher Order Theories
  • Chapter Nine: Integrated Information, Local Recurrence, Associative Learning, and Iterative Natural Kinds
  • Chapter Ten: Does Biological Substrate Matter?
  • Chapter Eleven: The Problem of Strange Intelligence
  • Chapter Twelve: The Leapfrog Hypothesis and the Social Semi-Solution
Research Automation with AI
York University, North York
Abstract
In the digital era, the exponential growth of scientific publications has made it increasingly difficult for researchers to efficiently identify and access relevant work. This paper presents an automated framework for research article classification and recommendation that leverages Natural Language Processing (NLP) techniques and machine learning. Using a large-scale arXiv.org dataset spanning more than three decades, we evaluate multiple feature extraction approaches (TF-IDF, Count Vectorizer, Sentence-BERT, USE, Mirror-BERT) in combination with diverse machine learning classifiers (Logistic Regression, SVM, Naïve Bayes, Random Forest, Gradient Boosted Trees, and k-Nearest Neighbour). Our experiments show that Logistic Regression with TF-IDF consistently yields the best classification performance, achieving an accuracy of 69%. To complement classification, we incorporate a recommendation module based on the cosine similarity of vectorized articles, enabling efficient retrieval of related research papers. The proposed system directly addresses the challenge of information overload in digital libraries and demonstrates a scalable, data-driven solution to support literature discovery.
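A minimal scikit-learn sketch of the pipeline described above, with TF-IDF features, a Logistic Regression classifier, and cosine-similarity recommendations. The toy abstracts and labels are placeholders, not the arXiv dataset or the authors' code.

```python
# Sketch of the described pipeline: TF-IDF + Logistic Regression classification,
# plus cosine-similarity retrieval of related articles. Data is a toy placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "deep learning for image classification",
    "graph algorithms and combinatorial optimization",
    "transformer models for natural language processing",
    "shortest path algorithms on weighted graphs",
]
labels = ["cs.LG", "cs.DS", "cs.CL", "cs.DS"]

vec = TfidfVectorizer()
X = vec.fit_transform(abstracts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
query = vec.transform(["attention based language models"])
print("predicted category:", clf.predict(query)[0])

# Recommendation: rank stored articles by cosine similarity to the query.
sims = cosine_similarity(query, X).ravel()
print("most similar:", abstracts[sims.argmax()])
```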
AI Insights
  • Hybrid ensemble of Logistic Regression, SVM, and Random Forest boosts accuracy beyond single models!
  • Cross‑dataset validation on arXiv, PubMed, and CiteSeer demonstrates robust generalizability.
  • User‑feedback loops enable adaptive re‑ranking, refining recommendations over time!
  • Word2Vec and GloVe embeddings enrich semantic vectors, improving classification precision.
  • Deep‑learning extraction of patent semantics showcases the framework’s extensibility!
  • The study omits bias analysis and detailed preprocessing, highlighting future research gaps.
  • Recommended reading: LDA for topic modeling and the WebFind tool for global paper discovery.
University of Illinois at Urbana-Champaign
Abstract
Automatic research with Large Language Models (LLMs) is rapidly gaining importance, driving the development of increasingly complex workflows involving multi-agent systems, planning, tool usage, code execution, and human-agent interaction to accelerate research processes. However, as more researchers and developers begin to use and build upon these tools and platforms, the complexity and difficulty of extending and maintaining such agentic workflows have become a significant challenge, particularly as algorithms and architectures continue to advance. To address this growing complexity, TinyScientist identifies the essential components of the automatic research workflow and proposes an interactive, extensible, and controllable framework that easily adapts to new tools and supports iterative growth. We provide an open-source codebase, an interactive web demonstration, and a PyPI Python package to make state-of-the-art auto-research pipelines broadly accessible to every researcher and developer.
AI Insights
  • TinyScientist’s “checker” can automatically assess task risk, flagging potential safety issues before execution.
  • Its “drawer” component produces ML‑centric diagrams on the fly, easing visual communication in papers.
  • The framework ships with evaluation rubrics that score content richness, reference quality, clarity, depth, and completeness on a 1‑5 scale.
  • A full ML pipeline—data collection, cleaning, feature engineering, training, evaluation, deployment—is built into the system for end‑to‑end reproducibility (a generic sketch follows this list).
  • The paper cites meta‑learning advances such as Neural Tangent Kernel methods and memory‑augmented networks, highlighting their cross‑domain success.
  • Users should note that the checker’s risk scores can be imperfect and the generated diagrams may need manual tweaking.
  • TinyScientist’s open‑source Python package and interactive web demo make state‑of‑the‑art auto‑research pipelines accessible to all.
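As a generic illustration of those pipeline stages, and explicitly not TinyScientist's actual API, a minimal scikit-learn pipeline covering cleaning, scaling, training, and evaluation on synthetic data might look like this:

```python
# Generic staged-pipeline illustration (not TinyScientist's API): cleaning,
# feature scaling, training, and evaluation, with a fixed seed for reproducibility.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic labels
X[rng.random(X.shape) < 0.05] = np.nan          # simulate missing values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # cleaning
    ("scale", StandardScaler()),                 # minimal feature engineering
    ("model", LogisticRegression()),             # training
])
pipe.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, pipe.predict(X_te)))  # evaluation
```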
AGI: Artificial General Intelligence
Princeton University
Abstract
Today's AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills for exploring and learning through experience. Finding a scalable learning mechanism for developing agents that learn through interaction remains a major open problem. In this work, we introduce BuilderBench, a benchmark to accelerate research into agent pre-training that centers open-ended exploration. BuilderBench requires agents to learn how to build any structure using blocks. BuilderBench is equipped with (1) a hardware accelerated simulator of a robotic agent interacting with various physical blocks, and (2) a task suite with over 42 diverse target structures that are carefully curated to test an understanding of physics, mathematics, and long-horizon planning. During training, agents have to explore and learn general principles about the environment without any external supervision. During evaluation, agents have to build the unseen target structures from the task suite. Solving these tasks requires a sort of embodied reasoning that is not reflected in words but rather in actions, experimenting with different strategies and piecing them together. Our experiments show that many of these tasks challenge the current iteration of algorithms. Hence, we also provide a "training wheels" protocol, in which agents are trained and evaluated to build a single target structure from the task suite. Finally, we provide single-file implementations of six different algorithms as a reference point for researchers.
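The interaction pattern the benchmark implies, explore with a policy and then check whether an unseen target structure was built, can be sketched with a toy environment like the one below. The environment, policy, and success check are placeholders for illustration, not BuilderBench's actual simulator or API.

```python
# Toy stand-in for the benchmark's interaction loop (hypothetical, not the real API):
# state is a vector of block-column heights; success means matching a target structure.
import numpy as np

class ToyBlockEnv:
    def __init__(self, n_blocks=4):
        self.n = n_blocks
        self.state = np.zeros(n_blocks)

    def reset(self):
        self.state = np.zeros(self.n)
        return self.state.copy()

    def step(self, action):
        self.state[action] += 1.0            # "place a block" on column `action`
        return self.state.copy()

def evaluate(env, policy, target, horizon=20):
    obs = env.reset()
    for _ in range(horizon):
        obs = env.step(policy(obs, target))
        if np.array_equal(obs, target):      # success = built the target structure
            return True
    return False

# Greedy toy policy: add a block to the column furthest below its target height.
policy = lambda obs, target: int(np.argmax(target - obs))
print(evaluate(ToyBlockEnv(), policy, target=np.array([2.0, 0.0, 1.0, 3.0])))
```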
AI Insights
  • BuilderBench tasks explicitly probe a gripper’s pick‑and‑place precision, sequential logic, and packing‑problem solving in a physics‑rich simulation.
  • The benchmark includes scaffolding challenges that force agents to build temporary support structures for stability.
  • Adaptive decision‑making is tested by varying block configurations, compelling agents to react to changing environments.
  • The platform supplies a full toolchain for task creation, simulation, and performance analysis, enabling rapid prototyping.
  • Recommended reading: “Robotics: Modelling, Planning and Control” and surveys on robot learning from demonstration for foundational theory.
  • Key literature: “Learning to Grasp and Manipulate Objects with a Robotic Hand” and “Building Support Structures with a Robotic Gripper” provide state‑of‑the‑art methods.
Deep Learning
Abstract
Recent advances in machine learning such as Long Short-Term Memory (LSTM) models and Transformers have been widely adopted in hydrological applications, demonstrating impressive performance amongst deep learning models and outperforming physical models in various tasks. However, their superiority in predicting land surface states such as terrestrial water storage (TWS) that are dominated by many factors such as natural variability and human driven modifications remains unclear. Here, using the open-access, globally representative HydroGlobe dataset - comprising a baseline version derived solely from a land surface model simulation and an advanced version incorporating multi-source remote sensing data assimilation - we show that linear regression is a robust benchmark, outperforming the more complex LSTM and Temporal Fusion Transformer for TWS prediction. Our findings highlight the importance of including traditional statistical models as benchmarks when developing and evaluating deep learning models. Additionally, we emphasize the critical need to establish globally representative benchmark datasets that capture the combined impact of natural variability and human interventions.
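A minimal sketch of the benchmarking practice the paper argues for: always fit and report a simple linear-regression baseline next to a more complex model on the same held-out split. Synthetic data stands in for HydroGlobe, gradient boosting stands in for the deep models (LSTM, Temporal Fusion Transformer) to keep the sketch dependency-light, and RMSE is an assumed metric.

```python
# Baseline-first benchmarking sketch: linear regression vs. a more complex model
# on the same train/test split. Data and models here are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                   # e.g. forcing / storage predictors
y = X @ np.array([0.8, -0.3, 0.1, 0.0, 0.5, -0.2]) + 0.1 * rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for name, model in [("linear baseline", LinearRegression()),
                    ("gradient boosting", GradientBoostingRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: RMSE = {rmse:.3f}")
```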
University of Hamburg
Abstract
In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward and ReLU-activated networks of width w forms a dense subset of the space of continuous functions on R^n, with respect to the topology of uniform convergence on compact sets, if and only if w>n holds. To show the necessity, a concrete counterexample function f:R^n->R was used. In this note we actually approximate this very f by neural networks in the two cases w=n and w=n+1 around the aforementioned threshold. We study how the approximation quality behaves as we vary the depth and what effect (spoiler alert: dying neurons) causes that behavior.
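The dying-neuron effect can be illustrated with a tiny hand-constructed example: once some hidden layer of a width-n ReLU network maps every input into the non-positive orthant, all deeper layers see zeros and the network output is constant. The numpy sketch below uses hand-picked (untrained) weights to force this; it is an illustration of the phenomenon, not the paper's experimental setup.

```python
# Hand-built demonstration of dying ReLU in a narrow network (width = input dim 2).
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def forward(x, layers):
    h = x
    for W, b in layers:
        h = relu(h @ W + b)
    return h.sum(axis=1)                     # simple scalar readout

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 2))          # a few random inputs in [-1, 1]^2

healthy = [(np.eye(2), np.zeros(2))] * 3                   # benign identity-like layers
dead = [(np.eye(2), np.zeros(2)),
        (np.eye(2), np.array([-10.0, -10.0])),             # pushes all activations <= 0
        (np.eye(2), np.zeros(2))]

print("healthy net outputs:", forward(X, healthy))         # varies with the input
print("dead net outputs:   ", forward(X, dead))            # constant (all zeros)
```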
AI Insights
  • Depth lowers error until dying ReLU forces a constant output, even when width equals input dimension.
  • With width n+1, deeper nets keep improving, showing w>n is not a hard limit.
  • Minimal‑width ReLU nets can approximate any continuous function, confirming Hanin & Sellke’s theorem.
  • The constant N0≡1/8 is the best uniform approximator for the counterexample, achieving error 1/8 for all depths.
  • Experiments show the depth‑benefit plateau occurs earlier in higher dimensions due to dying neurons.
  • Beise et al.’s decision‑region analysis explains constant outputs in narrow deep nets.
  • Bresler & Nagaraj’s sharp representation theorems give a depth‑dependence framework matching the results.

Interests not found

We did not find any papers that match the interests below. Try other terms, and also consider whether the content exists on arXiv.org.
  • Job Displacement
  • AGI Applications
  • AGI Research
  • AGI Development
  • Changes in the Labor Market
You can edit or add more interests any time.

Unsubscribe from these updates