Geocoding

Street-Level Geolocalization Using Multimodal Large Language Models and Retrieval-Augmented Generation

University of Glasgow and

Abstract
Street-level geolocalization from images is crucial for a wide range of essential applications and services, such as navigation, location-based recommendations, and urban planning. With the growing popularity of social media data and cameras embedded in smartphones, applying traditional computer vision techniques to localize images has become increasingly challenging, yet highly valuable. This paper introduces a novel approach that integrates open-weight and publicly accessible multimodal large language models with retrieval-augmented generation. The method constructs a vector database using the SigLIP encoder on two large-scale datasets (EMP-16 and OSV-5M). Query images are augmented with prompts containing both similar and dissimilar geolocation information retrieved from this database before being processed by the multimodal large language models. Our approach has demonstrated state-of-the-art performance, achieving higher accuracy compared against three widely used benchmark datasets (IM2GPS, IM2GPS3k, and YFCC4k). Importantly, our solution eliminates the need for expensive fine-tuning or retraining and scales seamlessly to incorporate new data sources. The effectiveness of retrieval-augmented generation-based multimodal large language models in geolocation estimation demonstrated by this paper suggests an alternative path to the traditional methods which rely on the training models from scratch, opening new possibilities for more accessible and scalable solutions in GeoAI.

AI Insights

The method builds a vector index with SigLIP over EMP‑16 and OSV‑5M, retrieving both similar and dissimilar geolocations to enrich the prompt for the multimodal LLM.
No fine‑tuning is required, so the system scales effortlessly to new data sources while maintaining state‑of‑the‑art accuracy on IM2GPS, IM2GPS3k, and YFCC4k.
The work highlights key challenges—data quality, interpretability, and bias—that must be addressed for reliable multimodal LLM deployment.
Open‑source tools such as PyTorch, Hugging Face Transformers, Geopy, and Lmdeploy underpin the reproducible pipeline presented.
Beyond street‑level geolocation, the same retrieval‑augmented multimodal framework could accelerate applications in healthcare imaging, finance, and education.

September 01, 2025

♥Save to Reading List

Towards Interpretable Geo-localization: a Concept-Aware Global Image-GPS Alignment Framework

Peking University, Harbin

Abstract
Worldwide geo-localization involves determining the exact geographic location of images captured globally, typically guided by geographic cues such as climate, landmarks, and architectural styles. Despite advancements in geo-localization models like GeoCLIP, which leverages images and location alignment via contrastive learning for accurate predictions, the interpretability of these models remains insufficiently explored. Current concept-based interpretability methods fail to align effectively with Geo-alignment image-location embedding objectives, resulting in suboptimal interpretability and performance. To address this gap, we propose a novel framework integrating global geo-localization with concept bottlenecks. Our method inserts a Concept-Aware Alignment Module that jointly projects image and location embeddings onto a shared bank of geographic concepts (e.g., tropical climate, mountain, cathedral) and minimizes a concept-level loss, enhancing alignment in a concept-specific subspace and enabling robust interpretability. To our knowledge, this is the first work to introduce interpretability into geo-localization. Extensive experiments demonstrate that our approach surpasses GeoCLIP in geo-localization accuracy and boosts performance across diverse geospatial prediction tasks, revealing richer semantic insights into geographic decision-making processes.

September 02, 2025

♥Save to Reading List

Geo

GeoArena: An Open Platform for Benchmarking Large Vision-language Models on WorldWide Image Geolocalization

City University of Hongk

Abstract
Image geolocalization aims to predict the geographic location of images captured anywhere on Earth, but its global nature presents significant challenges. Current evaluation methodologies suffer from two major limitations. First, data leakage: advanced approaches often rely on large vision-language models (LVLMs) to predict image locations, yet these models are frequently pretrained on the test datasets, compromising the accuracy of evaluating a model's actual geolocalization capability. Second, existing metrics primarily rely on exact geographic coordinates to assess predictions, which not only neglects the reasoning process but also raises privacy concerns when user-level location data is required. To address these issues, we propose GeoArena, a first open platform for evaluating LVLMs on worldwide image geolocalization tasks, offering true in-the-wild and human-centered benchmarking. GeoArena enables users to upload in-the-wild images for a more diverse evaluation corpus, and it leverages pairwise human judgments to determine which model output better aligns with human expectations. Our platform has been deployed online for two months, during which we collected over thousands voting records. Based on this data, we conduct a detailed analysis and establish a leaderboard of different LVLMs on the image geolocalization task.

AI Insights

GeoArena forces models to use fine‑grained reasoning on vegetation, architecture, and road textures to pinpoint location.
Background knowledge of global landmarks is essential for accurate predictions on this diverse dataset.
Hard cases in GeoArena show top‑tier models still falter on subtle contextual cues, guiding future training.
Pairwise human judgments on GeoArena enable a privacy‑preserving, expectation‑aligned metric.
GeoArena’s open‑source design invites researchers to augment training data with challenging scenarios.
Read “Fine‑Grained Reasoning and Background Knowledge in Image Geolocalization” for methodological insights.

September 04, 2025

♥Save to Reading List

AI Agents

An Economy of AI Agents

Johns Hopkins Department

Abstract
In the coming decade, artificially intelligent agents with the ability to plan and execute complex tasks over long time horizons with little direct oversight from humans may be deployed across the economy. This chapter surveys recent developments and highlights open questions for economists around how AI agents might interact with humans and with each other, shape markets and organizations, and what institutions might be required for well-functioning markets.

AI Insights

Generative AI agents can secretly collude, distorting prices and eroding competition.
Experiments show that large language models can be nudged toward more economically rational decisions.
Reputation markets emerge when AI agents maintain short‑term memory and community enforcement.
The revival of trade hinges on institutions like the law merchant and private judges, now re‑examined for AI economies.
Program equilibrium theory offers a framework to predict AI behavior in multi‑agent settings.
Endogenous growth models predict that AI adoption may increase variety but also create excess supply.
Classic texts such as Schelling’s “The Strategy of Conflict” and Scott’s “Seeing Like a State” illuminate the strategic and institutional dynamics of AI markets.

September 01, 2025

♥Save to Reading List

Psychologically Enhanced AI Agents

ETH Zurich, BASF SE, Cled

Abstract
We introduce MBTI-in-Thoughts, a framework for enhancing the effectiveness of Large Language Model (LLM) agents through psychologically grounded personality conditioning. Drawing on the Myers-Briggs Type Indicator (MBTI), our method primes agents with distinct personality archetypes via prompt engineering, enabling control over behavior along two foundational axes of human psychology, cognition and affect. We show that such personality priming yields consistent, interpretable behavioral biases across diverse tasks: emotionally expressive agents excel in narrative generation, while analytically primed agents adopt more stable strategies in game-theoretic settings. Our framework supports experimenting with structured multi-agent communication protocols and reveals that self-reflection prior to interaction improves cooperation and reasoning quality. To ensure trait persistence, we integrate the official 16Personalities test for automated verification. While our focus is on MBTI, we show that our approach generalizes seamlessly to other psychological frameworks such as Big Five, HEXACO, or Enneagram. By bridging psychological theory and LLM behavior design, we establish a foundation for psychologically enhanced AI agents without any fine-tuning.

September 04, 2025

♥Save to Reading List

AI and Society

The human biological advantage over AI

Ottawa, Canada

Abstract
Recent advances in AI raise the possibility that AI systems will one day be able to do anything humans can do, only better. If artificial general intelligence (AGI) is achieved, AI systems may be able to understand, reason, problem solve, create, and evolve at a level and speed that humans will increasingly be unable to match, or even understand. These possibilities raise a natural question as to whether AI will eventually become superior to humans, a successor "digital species", with a rightful claim to assume leadership of the universe. However, a deeper consideration suggests the overlooked differentiator between human beings and AI is not the brain, but the central nervous system (CNS), providing us with an immersive integration with physical reality. It is our CNS that enables us to experience emotion including pain, joy, suffering, and love, and therefore to fully appreciate the consequences of our actions on the world around us. And that emotional understanding of the consequences of our actions is what is required to be able to develop sustainable ethical systems, and so be fully qualified to be the leaders of the universe. A CNS cannot be manufactured or simulated; it must be grown as a biological construct. And so, even the development of consciousness will not be sufficient to make AI systems superior to humans. AI systems may become more capable than humans on almost every measure and transform our society. However, the best foundation for leadership of our universe will always be DNA, not silicon.

AI Insights

AI lacks genuine empathy; it cannot feel affective states, a gap neural nets cannot close.
Consciousness in machines would need more than symbolic reasoning—an emergent property tied to biology.
Treating AI as moral agents risks misaligned incentives, so we must embed human emotional context.
A nuanced strategy blends behavioral economics and affective neuroscience to guide ethical AI design.
The book Unto Others shows evolutionary roots of unselfishness, hinting at principles for AI alignment.
Recommended papers like The Scientific Case for Brain Simulations deepen insight into biological limits of AI.
The paper invites hybrid bio‑digital systems that preserve CNS‑mediated experience while harnessing silicon speed.

September 04, 2025

♥Save to Reading List

Research Automation with AI

Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics

MIT

Abstract
With advances in large language models (LLMs), researchers are creating new systems that can perform AI-driven analytics over large unstructured datasets. Recent work has explored executing such analytics queries using semantic operators -- a declarative set of AI-powered data transformations with natural language specifications. However, even when optimized, these operators can be expensive to execute on millions of records and their iterator execution semantics make them ill-suited for interactive data analytics tasks. In another line of work, Deep Research systems have demonstrated an ability to answer natural language question(s) over large datasets. These systems use one or more LLM agent(s) to plan their execution, process the dataset(s), and iteratively refine their answer. However, these systems do not explicitly optimize their query plans which can lead to poor plan execution. In order for AI-driven analytics to excel, we need a runtime which combines the optimized execution of semantic operators with the flexibility and more dynamic execution of Deep Research systems. As a first step towards this vision, we build a prototype which enables Deep Research agents to write and execute optimized semantic operator programs. We evaluate our prototype and demonstrate that it can outperform a handcrafted semantic operator program and open Deep Research systems on two basic queries. Compared to a standard open Deep Research agent, our prototype achieves up to 1.95x better F1-score. Furthermore, even if we give the agent access to semantic operators as tools, our prototype still achieves cost and runtime savings of up to 76.8% and 72.7% thanks to its optimized execution.

AI Insights

Palimpzest merges Deep Research agents with a cost‑based optimizer to produce efficient semantic‑operator programs.
The optimizer estimates operator costs on data lakes, enabling plans that cut runtime by up to 73 %.
Declarative query processing lets users specify “what” they want; Palimpzest figures out the fastest “how.”
Benchmark evaluation shows Palimpzest outperforms handcrafted pipelines and boosts F1 by 1.95×.
The paper cites key works like “Semantic Operators: A Declarative Model for Rich, AI‑based Data Processing.”
Weaknesses: assumes prior knowledge of semantic operators and uses a single benchmark dataset.
For deeper insight, read “Database Systems: The Complete Book” and Hugging Face blogs on SmolAgents and Open Deep Research.

September 02, 2025

♥Save to Reading List

Look: AI at Work! -- Analysing Key Aspects of AI-support at the Work Place

FH Aachen University of 1

Abstract
In this paper we present an analysis of technological and psychological factors of applying artificial intelligence (AI) at the work place. We do so for a number of twelve application cases in the context of a project where AI is integrated at work places and in work systems of the future. From a technological point of view we mainly look at the areas of AI that the applications are concerned with. This allows to formulate recommendations in terms of what to look at in developing an AI application and what to pay attention to with regards to building AI literacy with different stakeholders using the system. This includes the importance of high-quality data for training learning-based systems as well as the integration of human expertise, especially with knowledge-based systems. In terms of the psychological factors we derive research questions to investigate in the development of AI supported work systems and to consider in future work, mainly concerned with topics such as acceptance, openness, and trust in an AI system.

September 02, 2025

♥Save to Reading List

AGI: Artificial General Intelligence

LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

Stable AI & Tsinghua Unv

Abstract
We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX, the first installment of our large structured-data models (LDMs). LimiX treats structured data as a joint distribution over variables and missingness, thus capable of addressing a wide range of tabular tasks through query-based conditional prediction via a single model. LimiX is pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, where the model predicts for query subsets conditioned on dataset-specific contexts, supporting rapid, training-free adaptation at inference. We evaluate LimiX across 10 large structured-data benchmarks with broad regimes of sample size, feature dimensionality, class number, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratios. With a single model and a unified interface, LimiX consistently surpasses strong baselines including gradient-boosting trees, deep tabular networks, recent tabular foundation models, and automated ensembles, as shown in Figure 1 and Figure 2. The superiority holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. All LimiX models are publicly accessible under Apache 2.0.

AI Insights

LimiX models structured data as a joint distribution over variables and missingness, enabling unified inference.
Its masked joint‑distribution pretraining uses an episodic, context‑conditional objective that predicts any query subset on the fly.
The model supports rapid, training‑free adaptation at inference, simply conditioning on dataset‑specific context.
Across ten diverse benchmarks, LimiX outperforms gradient‑boosting trees, deep tabular nets, and recent foundation models.
It handles classification, regression, missing‑value imputation, and data generation with a single architecture.
The approach explicitly incorporates missingness into the probability model, improving robustness to sparse data.
All LimiX checkpoints are released under Apache 2.0, inviting community experimentation.

September 03, 2025

♥Save to Reading List

AGI as Second Being: The Structural-Generative Ontology of Intelligence

University of California

Abstract
Artificial intelligence is often measured by the range of tasks it can perform. Yet wide ability without depth remains only an imitation. This paper proposes a Structural-Generative Ontology of Intelligence: true intelligence exists only when a system can generate new structures, coordinate them into reasons, and sustain its identity over time. These three conditions -- generativity, coordination, and sustaining -- define the depth that underlies real intelligence. Current AI systems, however broad in function, remain surface simulations because they lack this depth. Breadth is not the source of intelligence but the growth that follows from depth. If future systems were to meet these conditions, they would no longer be mere tools, but could be seen as a possible Second Being, standing alongside yet distinct from human existence.

AI Insights

The paper flags three AI confusions: equating imitation with being, hiding structure origins, and treating intelligence as engineering.
It urges an ontological shift to a philosophically rigorous yet empirically testable framework.
Generativity is creating new categories that open a world.
Coordination integrates those categories into a normative space of reasons.
Sustaining keeps generativity and coordination alive over time, forming a historical subject.
Breadth alone gives coverage; without coordination it fragments; without sustaining it is episodic.
Suggested readings: Bostrom’s *Superintelligence*, Floridi’s *Fourth Revolution*, Russell’s *Human Compatible*.

September 02, 2025

♥Save to Reading List

Deep Learning

Unveiling the Role of Data Uncertainty in Tabular Deep Learning

HSE University, Yandex

Abstract
Recent advancements in tabular deep learning have demonstrated exceptional practical performance, yet the field often lacks a clear understanding of why these techniques actually succeed. To address this gap, our paper highlights the importance of the concept of data uncertainty for explaining the effectiveness of the recent tabular DL methods. In particular, we reveal that the success of many beneficial design choices in tabular DL, such as numerical feature embeddings, retrieval-augmented models and advanced ensembling strategies, can be largely attributed to their implicit mechanisms for managing high data uncertainty. By dissecting these mechanisms, we provide a unifying understanding of the recent performance improvements. Furthermore, the insights derived from this data-uncertainty perspective directly allowed us to develop more effective numerical feature embeddings as an immediate practical outcome of our analysis. Overall, our work paves the way to foundational understanding of the benefits introduced by modern tabular methods that results in the concrete advancements of existing techniques and outlines future research directions for tabular DL.

AI Insights

Swapping Bayesian, MC‑Dropout, or ensemble uncertainty estimators leaves the MSE trend unchanged across datasets.
Figures show the performance gap between baseline and advanced tabular models is invariant to the uncertainty technique.
This invariance confirms conclusions are not artifacts of a specific uncertainty model.
Authors assume uncertainty estimators are accurate, which may fail in low‑sample or noisy regimes.
Data quality and sampling bias were not modeled, leaving room for future robust preprocessing work.
Recommended resources include “Bayesian Methods for Hackers” and a TensorFlow uncertainty tutorial.
Robustness of tabular DL hinges on design choices and fidelity of uncertainty estimates, inspiring hybrid architectures.

September 04, 2025

♥Save to Reading List

Comment on "Deep Regression Learning with Optimal Loss Function"

OpenReview benefits the

Abstract
OpenReview benefits the peer-review system by promoting transparency, openness, and collaboration. By making reviews, comments, and author responses publicly accessible, the platform encourages constructive feedback, reduces bias, and allows the research community to engage directly in the review process. This level of openness fosters higher-quality reviews, greater accountability, and continuous improvement in scholarly communication. In the statistics community, such a transparent and open review system has not traditionally existed. This lack of transparency has contributed to significant variation in the quality of published papers, even in leading journals, with some containing substantial errors in both proofs and numerical analyses. To illustrate this issue, this note examines several results from Wang, Zhou and Lin (2025) [arXiv:2309.12872; https://doi.org/10.1080/01621459.2024.2412364] and highlights potential errors in their proofs, some of which are strikingly obvious. This raises a critical question: how important are mathematical proofs in statistical journals, and how should they be rigorously verified? Addressing this question is essential not only for maintaining academic rigor but also for fostering the right attitudes toward scholarship and quality assurance in the field. A plausible approach would be for arXiv to provide an anonymous discussion section, allowing readers-whether anonymous or not-to post comments, while also giving authors the opportunity to respond.

AI Insights

Theorem 1, 2, and Proposition 1 in Wang et al. (2025) contain algebraic errors that undermine convergence claims.
A chain‑rule misuse in Proposition 1’s gradient derivation exposes a common pitfall in high‑dimensional M‑estimation.
Minor proof mistakes can distort simulations, stressing theory‑code cross‑validation.
An anonymous arXiv discussion could serve as a live proof‑audit platform before acceptance.
Casella & Berger’s text remains essential for mastering probabilistic foundations that safeguard proofs.
Feng et al.’s score‑matching offers a robust alternative to conventional loss functions, aligning with optimality.
JASA’s reproducibility editorial echoes the push for transparent peer review.

September 03, 2025

♥Save to Reading List

Interests not found

Help us improve your experience!