Hi!

Your personalized paper recommendations for 19 to 23 January, 2026.

TransMode-LLM: Feature-Informed Natural Language Modeling with Domain-Enhanced Prompting for Travel Behavior Modeling

Singapore University of Technology and Design

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

The study relies on a limited dataset, which may not be representative of real-world scenarios. (ML: 0.98)👍👎
Few-Shot Learning: A type of machine learning where a model is trained on a small number of examples, often with the goal of adapting to new tasks or domains. (ML: 0.95)👍👎
Few-shot learning can improve the accuracy of LLMs by providing in-context examples that guide predictions through demonstration rather than parameter updates. (ML: 0.94)👍👎
Few-shot learning can improve the accuracy of LLMs by providing in-context examples that guide predictions through demonstration rather than parameter updates. (ML: 0.94)👍👎
LLM: Large Language Model Zero-Shot: A type of few-shot learning where a model is trained on a single example or no examples at all. (ML: 0.93)👍👎
The study demonstrates that LLM-based approaches can be effective for travel mode prediction tasks, even without task-specific training. (ML: 0.92)👍👎
The results suggest a paradigm shift in predictive modeling for transportation applications, where LLMs can achieve competitive or superior performance compared to traditional classifiers. (ML: 0.89)👍👎
LLM-based approaches can achieve competitive or superior performance compared to established baseline classifiers for travel mode prediction tasks. (ML: 0.88)👍👎
Zero-shot LLMs can outperform traditional classifiers without requiring task-specific training, demonstrating a paradigm shift in predictive modeling for transportation applications. (ML: 0.85)👍👎

Abstract
Understanding traveler behavior and accurately predicting travel mode choice are at the heart of transportation planning and policy-making. This study proposes TransMode-LLM, an innovative framework that integrates statistical methods with LLM-based techniques to predict travel modes from travel survey data. The framework operates through three phases: (1) statistical analysis identifies key behavioral features, (2) natural language encoding transforms structured data into contextual descriptions, and (3) LLM adaptation predicts travel mode through multiple learning paradigms including zero-shot and one/few-shot learning and domain-enhanced prompting. We evaluate TransMode-LLM using both general-purpose models (GPT-4o, GPT-4o-mini) and reasoning-focused models (o3-mini, o4-mini) with varying sample sizes on real-world travel survey data. Extensive experiment results demonstrate that the LLM-based approach achieves competitive accuracy compared to state-of-the-art baseline classifiers models. Moreover, few-shot learning significantly improves prediction accuracy, with models like o3-mini showing consistent improvements of up to 42.9\% with 5 provided examples. However, domain-enhanced prompting shows divergent effects across LLM architectures. In detail, it is helpful to improve performance for general-purpose models with GPT-4o achieving improvements of 2.27% to 12.50%. However, for reasoning-oriented models (o3-mini, o4-mini), domain knowledge enhancement does not universally improve performance. This study advances the application of LLMs in travel behavior modeling, providing promising and valuable insights for both academic research and transportation policy-making in the future.

Why we are recommending this paper?
Due to your Interest in Travel Personalization

This paper directly addresses travel behavior modeling, a core interest for the user. The use of LLMs for predicting travel modes aligns with the user's interest in travel recommendations and search.

Confident Rankings with Fewer Items: Adaptive LLM Evaluation with Continuous Scores

Trismik and University of Cambridge

Rate paper: 👍 👎 ♥ Save

AI Insights

The paper presents a novel approach to evaluating language models using Item Response Theory (IRT) and continuous IRT extension. (ML: 0.98)👍👎
The proposed approach provides a more comprehensive understanding of model performance, highlighting areas where models excel or struggle. (ML: 0.97)👍👎
The results show that the proposed approach can provide a more nuanced understanding of model performance, highlighting strengths and weaknesses in different areas. (ML: 0.97)👍👎
The method is applied to five benchmark datasets: BioLaySumm2025-PLOS, FLORES-Turkish-English, GovReport-Summarization, Nemotron-PII, and TruthfulQA. (ML: 0.97)👍👎
Continuous IRT extension: An adaptation of IRT that allows for continuous scores rather than binary ones, enabling more fine-grained analysis of model performance. (ML: 0.96)👍👎
The use of AI assistants in generating text and experimental details demonstrates their potential for automating tasks and improving research efficiency. (ML: 0.95)👍👎
Item Response Theory (IRT): A statistical framework used to analyze the relationship between items on a test or questionnaire and the latent traits or abilities they are intended to measure. (ML: 0.94)👍👎
The paper also explores the use of AI assistants to generate text and experimental details, demonstrating their potential for automating tasks. (ML: 0.92)👍👎
BERTScore: A metric used to evaluate the similarity between generated text and reference text, based on BERT embeddings. (ML: 0.91)👍👎
Heteroskedastic normal distribution: A probability distribution where the variance is not constant but depends on the mean. (ML: 0.87)👍👎

Abstract
Computerized Adaptive Testing (CAT) has proven effective for efficient LLM evaluation on multiple-choice benchmarks, but modern LLM evaluation increasingly relies on generation tasks where outputs are scored continuously rather than marked correct/incorrect. We present a principled extension of IRT-based adaptive testing to continuous bounded scores (ROUGE, BLEU, LLM-as-a-Judge) by replacing the Bernoulli response distribution with a heteroskedastic normal distribution. Building on this, we introduce an uncertainty aware ranker with adaptive stopping criteria that achieves reliable model ranking while testing as few items and as cheaply as possible. We validate our method on five benchmarks spanning n-gram-based, embedding-based, and LLM-as-judge metrics. Our method uses 2% of the items while improving ranking correlation by 0.12 τ over random sampling, with 95% accuracy on confident predictions.

Why we are recommending this paper?
Due to your Interest in Travel Ranking

Given the user's interest in travel ranking and personalization, this paper's focus on adaptive LLM evaluation for continuous scoring is highly relevant. The work explores methods for improved ranking, a key aspect of the user’s interests.

Measuring the State of Open Science in Transportation Using Large Language Models

Massachusetts Institute of Technology

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

Data-driven methods were the most likely topic to have available code, while automated driving, COVID-19, policy, traffic flow, transport emissions, or travel mode topics were less likely. (ML: 0.98)👍👎
The study analyzed the factors influencing code and data availability in research papers. (ML: 0.96)👍👎
The study found that certain characteristics of the paper had a significant effect on the availability of data, such as the number of figures. (ML: 0.96)👍👎
Regions like Africa, North America, Europe, and Oceania were more likely to have a data citation and no data repository compared to Asia. (ML: 0.96)👍👎
Papers with a high number of figures were more likely to have available code, and older papers were less likely to have available code. (ML: 0.96)👍👎
Code availability was found to be significantly higher in certain journals, such as TR-B and TR-C. (ML: 0.95)👍👎
The study also found that certain journals had a significant effect on data availability, such as TR-B, TR-E, and TR-F. (ML: 0.94)👍👎
Code availability: The presence of code in research papers. (ML: 0.94)👍👎
Papers from Europe, South America, Africa, North America, and Oceania were more likely to have available code compared to Asia. (ML: 0.94)👍👎
Papers based on data-driven methods were the most likely to provide a data citation and/or a data repository. (ML: 0.91)👍👎

Abstract
Open science initiatives have strengthened scientific integrity and accelerated research progress across many fields, but the state of their practice within transportation research remains under-investigated. Key features of open science, defined here as data and code availability, are difficult to extract due to the inherent complexity of the field. Previous work has either been limited to small-scale studies due to the labor-intensive nature of manual analysis or has relied on large-scale bibliometric approaches that sacrifice contextual richness. This paper introduces an automatic and scalable feature-extraction pipeline to measure data and code availability in transportation research. We employ Large Language Models (LLMs) for this task and validate their performance against a manually curated dataset and through an inter-rater agreement analysis. We applied this pipeline to examine 10,724 research articles published in the Transportation Research Part series of journals between 2019 and 2024. Our analysis found that only 5% of quantitative papers shared a code repository, 4% of quantitative papers shared a data repository, and about 3% of papers shared both, with trends differing across journals, topics, and geographic regions. We found no significant difference in citation counts or review duration between papers that provided data and code and those that did not, suggesting a misalignment between open science efforts and traditional academic metrics. Consequently, encouraging these practices will likely require structural interventions from journals and funding agencies to supplement the lack of direct author incentives. The pipeline developed in this study can be readily scaled to other journals, representing a critical step toward the automated measurement and monitoring of open science practices in transportation research.

Why we are recommending this paper?
Due to your Interest in Travel Industry

Coming from MIT, this paper investigates open science initiatives, which could be valuable for understanding data availability and research trends within the travel industry. This aligns with the user’s interest in travel and travel recommendations.

FARE: Fast-Slow Agentic Robotic Exploration

National University of Singapore

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

The comprehensive evaluation presented in the paper highlights the strengths and limitations of Fael compared to other approaches. (ML: 0.96)👍👎
Large Language Models (LLMs): A type of artificial intelligence model that is trained on vast amounts of text data to generate human-like language. (ML: 0.94)👍👎
Graph-Based Methods: A way of representing complex relationships between objects or entities using graph structures. (ML: 0.92)👍👎
Fael's performance is shown to be superior to existing state-of-the-art methods in terms of efficiency and effectiveness. (ML: 0.90)👍👎
The proposed method, called Fael, uses an LLM to generate a graph representation of the environment and then performs exploration based on this graph. (ML: 0.88)👍👎
The paper presents a novel approach to autonomous exploration using large language models (LLMs) and graph-based methods. (ML: 0.86)👍👎
The paper also presents a comprehensive evaluation of the performance of Fael compared to other methods, including frontier-based and deep reinforcement learning-based approaches. (ML: 0.86)👍👎
Autonomous Exploration: The ability of a robot or system to navigate and explore its environment without external guidance. (ML: 0.85)👍👎
Fael is shown to outperform existing state-of-the-art methods in terms of efficiency and effectiveness in exploring large-scale environments. (ML: 0.84)👍👎
The proposed method, Fael, demonstrates the potential of combining LLMs with graph-based methods for autonomous exploration. (ML: 0.75)👍👎

Abstract
This work advances autonomous robot exploration by integrating agent-level semantic reasoning with fast local control. We introduce FARE, a hierarchical autonomous exploration framework that integrates a large language model (LLM) for global reasoning with a reinforcement learning (RL) policy for local decision making. FARE follows a fast-slow thinking paradigm. The slow-thinking LLM module interprets a concise textual description of the unknown environment and synthesizes an agent-level exploration strategy, which is then grounded into a sequence of global waypoints through a topological graph. To further improve reasoning efficiency, this module employs a modularity-based pruning mechanism that reduces redundant graph structures. The fast-thinking RL module executes exploration by reacting to local observations while being guided by the LLM-generated global waypoints. The RL policy is additionally shaped by a reward term that encourages adherence to the global waypoints, enabling coherent and robust closed-loop behavior. This architecture decouples semantic reasoning from geometric decision, allowing each module to operate in its appropriate temporal and spatial scale. In challenging simulated environments, our results show that FARE achieves substantial improvements in exploration efficiency over state-of-the-art baselines. We further deploy FARE on hardware and validate it in complex, large scale $200m\times130m$ building environment.

Why we are recommending this paper?
Due to your Interest in Travel Planning

This paper's focus on autonomous robot exploration with LLM-based reasoning is pertinent to travel planning and search, particularly in the context of efficient route discovery. The exploration aspect directly relates to the user's interest in travel itinerary creation.

HumanDiffusion: A Vision-Based Diffusion Trajectory Planner with Human-Conditioned Goals for Search and Rescue UAV

Skolkovo Institute of Science and Technology

Rate paper: 👍 👎 ♥ Save

AI Insights

DTG uses a combination of local and global planners to generate trajectories that are both efficient and safe. (ML: 0.89)👍👎
Further research is needed to adapt DTG to real-world scenarios and to develop more robust and adaptive systems. (ML: 0.74)👍👎
DTG: Diffusion-based Trajectory Generation UAV: Unmanned Aerial Vehicle The proposed DTG method is a promising approach to UAV navigation, offering improved efficiency and safety in complex environments. (ML: 0.71)👍👎
The proposed method may not be suitable for all types of UAVs or environments. (ML: 0.70)👍👎
The proposed method, called DTG, is designed for mapless global navigation and can handle complex environments with obstacles. (ML: 0.68)👍👎
The authors discuss the challenges and future directions in UAV navigation, including the need for more robust and adaptive systems. (ML: 0.68)👍👎
The paper presents a novel approach to unmanned aerial vehicle (UAV) navigation, utilizing a diffusion-based trajectory generation method. (ML: 0.68)👍👎
The paper also presents a comprehensive review of existing UAV path planning methods, highlighting their strengths and weaknesses. (ML: 0.63)👍👎

Abstract
Reliable human--robot collaboration in emergency scenarios requires autonomous systems that can detect humans, infer navigation goals, and operate safely in dynamic environments. This paper presents HumanDiffusion, a lightweight image-conditioned diffusion planner that generates human-aware navigation trajectories directly from RGB imagery. The system combines YOLO-11--based human detection with diffusion-driven trajectory generation, enabling a quadrotor to approach a target person and deliver medical assistance without relying on prior maps or computationally intensive planning pipelines. Trajectories are predicted in pixel space, ensuring smooth motion and a consistent safety margin around humans. We evaluate HumanDiffusion in simulation and real-world indoor mock-disaster scenarios. On a 300-sample test set, the model achieves a mean squared error of 0.02 in pixel-space trajectory reconstruction. Real-world experiments demonstrate an overall mission success rate of 80% across accident-response and search-and-locate tasks with partial occlusions. These results indicate that human-conditioned diffusion planning offers a practical and robust solution for human-aware UAV navigation in time-critical assistance settings.

Why we are recommending this paper?
Due to your Interest in Travel Planning

This paper's exploration of human-robot collaboration and navigation aligns with the user's interest in travel recommendations and search, particularly concerning safety and efficient travel planning.

Generalization and Completeness of Stochastic Local Search Algorithms

Universidad Complutense de Madrid

Rate paper: 👍 👎 ♥ Save

AI Insights

Turing-completeness: A computation model is said to be Turing-complete if it can simulate any given Turing Machine for any given input. (ML: 0.94)👍👎
Turing Machine: A Turing Machine is defined as (Q,Σ,Γ, δ, q 0, B, F), where Q is the set of states; Σ is the input alphabet; Γ is the tape alphabet; δ is the transition function; q 0∈Q is the initial state; B∈Γ is the blank symbol; and F⊆Q is the set of final states (accepting states). (ML: 0.92)👍👎
This means that the final solution of the GA provides the output of that Turing machine for that input. (ML: 0.87)👍👎
It consists in, given a finite set W of pairs(a, b) of finite strings, find out if, for some finite sequence of pairs (where there may be several occurrences of the same pair), the string obtained by consecutively reading the first components of the pairs and the string read on the second ones coincide. (ML: 0.87)👍👎
The Turing-completeness of genetic algorithms (GAs) is proven by constructing a GA capable of simulating any given Turing Machine for any given input. (ML: 0.86)👍👎
The proof shows that GAs are Turing-complete even when all their components are defined in remarkably simple ways, such as the fitness function checking for substring inclusion. (ML: 0.86)👍👎
If PCP were decidable, so would be the halting problem (which is undecidable). (ML: 0.85)👍👎
The construction of the GA involves designing a genetic algorithm that tries to solve the Modified Post Correspondence Problem (MPCP) for the input T by sequentially finding the next pair until MPCP is solved. (ML: 0.85)👍👎
The halting problem for Turing machines is reduced to MPCP, and MPCP is reduced to PCP. (ML: 0.83)👍👎
The proof of the Turing-completeness of GAs shows that they are capable of simulating any given Turing Machine for any given input, making them Turing-complete. (ML: 0.78)👍👎
Post Correspondence Problem (PCP): PCP consists in finding a solution to MPCP. (ML: 0.78)👍👎
The GA will only stop when the tile for the closing pair has been placed correctly, and it uses a blacklist of tiles via Extrato filter the ones already discarded in the current step of the MPCP to guarantee regular convergence. (ML: 0.74)👍👎
Modified Post Correspondence Problem (MPCP): MPCP adds the constraint that the first pair of the sequence is fixed. (ML: 0.71)👍👎
The idea is to mimic the way genetic programming builds structured individuals in order to design an individual that arranges the pairs from T into incremental partial solutions of MPCP until a complete solution is found. (ML: 0.71)👍👎
This result has significant implications for the analysis and prediction of GAs, as it means that their behavior cannot be easily predicted or analyzed using traditional methods. (ML: 0.70)👍👎

Abstract
We generalize Stochastic Local Search (SLS) heuristics into a unique formal model. This model has two key components: a common structure designed to be as large as possible and a parametric structure intended to be as small as possible. Each heuristic is obtained by instantiating the parametric part in a different way. Particular instances for Genetic Algorithms (GA), Ant Colony Optimization (ACO), and Particle Swarm Optimization (PSO) are presented. Then, we use our model to prove the Turing-completeness of SLS algorithms in general. The proof uses our framework to construct a GA able to simulate any Turing machine. This Turing-completeness implies that determining any non-trivial property concerning the relationship between the inputs and the computed outputs is undecidable for GA and, by extension, for the general set of SLS methods (although not necessarily for each particular method). Similar proofs are more informally presented for PSO and ACO.

Why we are recommending this paper?
Due to your Interest in Travel Search

Interests not found

We did not find any papers that match the below interests. Try other terms also consider if the content exists in arxiv.org.

Travel Itinerary Creation
Travel
Travel Recommendations

You can edit or add more interests any time.

💬 Help Shape Our Pricing

We're exploring pricing options to make this project sustainable. Take 3 minutes to share what you'd be willing to pay (if anything). Your input guides our future investment.

Share Your Feedback

Help us improve your experience!

This project is on its early stages your feedback can be pivotal on the future of the project. Let us know what you think about this week's papers and suggestions!

Give Feedback