Human in the Loop

Human-in-the-loop Optimisation in Robot-assisted Gait Training

University of Edinburgh

Rate this image: 😍 👍 👎

Abstract
Wearable robots offer a promising solution for quantitatively monitoring gait and providing systematic, adaptive assistance to promote patient independence and improve gait. However, due to significant interpersonal and intrapersonal variability in walking patterns, it is important to design robot controllers that can adapt to the unique characteristics of each individual. This paper investigates the potential of human-in-the-loop optimisation (HILO) to deliver personalised assistance in gait training. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was employed to continuously optimise an assist-as-needed controller of a lower-limb exoskeleton. Six healthy individuals participated over a two-day experiment. Our results suggest that while the CMA-ES appears to converge to a unique set of stiffnesses for each individual, no measurable impact on the subjects' performance was observed during the validation trials. These findings highlight the impact of human-robot co-adaptation and human behaviour variability, whose effect may be greater than potential benefits of personalising rule-based assistive controllers. Our work contributes to understanding the limitations of current personalisation approaches in exoskeleton-assisted gait rehabilitation and identifies key challenges for effective implementation of human-in-the-loop optimisation in this domain.

AI Insights

HILO can be enhanced by Bayesian optimization to navigate high‑dimensional stiffness spaces efficiently.
Hybrid HILO frameworks that fuse therapist guidance with model‑based constraints mitigate human variability.
Rule‑based controllers often fail to capture subtle gait adaptations, underscoring the need for data‑driven tuning.
CMA‑ES convergence per subject suggests a unique stiffness signature, yet clinical relevance remains unproven.
Future protocols should personalize task selection, aligning rehab exercises with individual biomechanical profiles.
Key literature includes Hansen’s CMA Evolution Strategy tutorial and Hansen & Hutter’s Bayesian Optimization for Machine Learning.
Online metabolic cost estimation, as in Gordon et al., could provide real‑time feedback for responsive HILO.

👍 👎 ♥ Save

Human-in-the-Loop Optimization with Model-Informed Priors

Saarland University, Saar

Abstract
Human-in-the-loop optimization identifies optimal interface designs by iteratively observing user performance. However, it often requires numerous iterations due to the lack of prior information. While recent approaches have accelerated this process by leveraging previous optimization data, collecting user data remains costly and often impractical. We present a conceptual framework, Human-in-the-Loop Optimization with Model-Informed Priors (HOMI), which augments human-in-the-loop optimization with a training phase where the optimizer learns adaptation strategies from diverse, synthetic user data generated with predictive models before deployment. To realize HOMI, we introduce Neural Acquisition Function+ (NAF+), a Bayesian optimization method featuring a neural acquisition function trained with reinforcement learning. NAF+ learns optimization strategies from large-scale synthetic data, improving efficiency in real-time optimization with users. We evaluate HOMI and NAF+ with mid-air keyboard optimization, a representative VR input task. Our work presents a new approach for more efficient interface adaptation by bridging in situ and in silico optimization processes.

AI Insights

Two synthetic benchmarks—typing user model and softkeyboard model—demonstrate NAF+’s superior sample efficiency over baselines.
NAF+ leverages explicit objective weights and novelty‑aware Expected Improvement to accelerate convergence.
The method remains robust across a wide spectrum of simulated user profiles, outperforming traditional Bayesian optimizers.
Conditioning the acquisition strategy on objective weights yields higher sample efficiency, a key insight for adaptive UI design.
The reliance on synthetic data highlights a potential gap between simulation and real‑world user behavior.
For deeper context, see “Human‑Computer Interaction: An Empirical Research Perspective” and “Designing for User Experience.”
The paper’s definition of Expected Improvement clarifies its role as the expected gain from a new sample in Bayesian optimization.

Human in the loop platforms

👍 👎 ♥ Save

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

Beijing Jiaotong Universt

$Paper visualization$

Rate this image: 😍 👍 👎

Abstract
Mobile GUI agents exhibit substantial potential to facilitate and automate the execution of user tasks on mobile phones. However, exist mobile GUI agents predominantly privilege autonomous operation and neglect the necessity of active user engagement during task execution. This omission undermines their adaptability to information dilemmas including ambiguous, dynamically evolving, and conflicting task scenarios, leading to execution outcomes that deviate from genuine user requirements and preferences. To address these shortcomings, we propose ReInAgent, a context-aware multi-agent framework that leverages dynamic information management to enable human-in-the-loop mobile task navigation. ReInAgent integrates three specialized agents around a shared memory module: an information-managing agent for slot-based information management and proactive interaction with the user, a decision-making agent for conflict-aware planning, and a reflecting agent for task reflection and information consistency validation. Through continuous contextual information analysis and sustained user-agent collaboration, ReInAgent overcomes the limitation of existing approaches that rely on clear and static task assumptions. Consequently, it enables more adaptive and reliable mobile task navigation in complex, real-world scenarios. Experimental results demonstrate that ReInAgent effectively resolves information dilemmas and produces outcomes that are more closely aligned with genuine user preferences. Notably, on complex tasks involving information dilemmas, ReInAgent achieves a 25% higher success rate than Mobile-Agent-v2.

AI Insights

ReInAgent’s decision agent plans by analyzing screen state against slot requirements before each action.
If an action repeats three times without effect, the agent auto‑modifies the strategy or asks the user for help.
Decision outputs omit user confirmation requests, keeping the flow autonomous yet aligned with user intent.
The task‑decomposition agent breaks complex tasks into a structured sub‑plan using app‑specific knowledge.
A reflecting agent validates consistency across shared memory, catching conflicts before execution stalls.
Suggested books: “Decision‑Making: A Guide to Making Better Decisions” and “Task Decomposition: A Step‑by‑Step Approach.”
Key papers: “A Study on Decision‑Making in Mobile Operations” and “Task Decomposition for Efficient Mobile Operations.”

AI Agents

👍 👎 ♥ Save

Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse

Abstract
Data lakehouses run sensitive workloads, where AI-driven automation raises concerns about trust, correctness, and governance. We argue that API-first, programmable lakehouses provide the right abstractions for safe-by-design, agentic workflows. Using Bauplan as a case study, we show how data branching and declarative environments extend naturally to agents, enabling reproducibility and observability while reducing the attack surface. We present a proof-of-concept in which agents repair data pipelines using correctness checks inspired by proof-carrying code. Our prototype demonstrates that untrusted AI agents can operate safely on production data and outlines a path toward a fully agentic lakehouse.

👍 👎 ♥ Save

ProSEA: Problem Solving via Exploration Agents

Aitomatic, Inc

Abstract
Large language models (LLMs) have empowered AI agents to tackle increasingly complex tasks. However, most existing agents remain limited to static planning and brittle interactions, falling short of true collaboration or adaptive reasoning. We introduce ProSEA, a modular, general-purpose multi-agent framework designed for iterative problem solving through exploration and plan evolution. ProSEA features a hierarchical architecture in which a Manager Agent orchestrates domain-specialized Expert Agents, decomposes tasks, and adaptively replans based on structured feedback from failed attempts. Unlike prior systems, ProSEA agents report not only success or failure but also detailed reasons for failure and newly discovered constraints, enabling dynamic plan refinement informed by exploratory traces. The framework operates autonomously but supports seamless integration with human collaborators when needed. Experiments on the challenging FinanceBench benchmark demonstrate that ProSEA, even without human feedback, outperforms state-of-the-art baselines and achieves robust performance across reasoning-heavy tasks. These results underscore ProSEA's potential as a foundation for more transparent, adaptive, and human-aligned AI agents.

AI and Society

👍 👎 ♥ Save

Measuring What Matters: The AI Pluralism Index

Universit de Montral

Rate this image: 😍 👍 👎

Abstract
Artificial intelligence systems increasingly mediate knowledge, communication, and decision making. Development and governance remain concentrated within a small set of firms and states, raising concerns that technologies may encode narrow interests and limit public agency. Capability benchmarks for language, vision, and coding are common, yet public, auditable measures of pluralistic governance are rare. We define AI pluralism as the degree to which affected stakeholders can shape objectives, data practices, safeguards, and deployment. We present the AI Pluralism Index (AIPI), a transparent, evidence-based instrument that evaluates producers and system families across four pillars: participatory governance, inclusivity and diversity, transparency, and accountability. AIPI codes verifiable practices from public artifacts and independent evaluations, explicitly handling "Unknown" evidence to report both lower-bound ("evidence") and known-only scores with coverage. We formalize the measurement model; implement a reproducible pipeline that integrates structured web and repository analysis, external assessments, and expert interviews; and assess reliability with inter-rater agreement, coverage reporting, cross-index correlations, and sensitivity analysis. The protocol, codebook, scoring scripts, and evidence graph are maintained openly with versioned releases and a public adjudication process. We report pilot provider results and situate AIPI relative to adjacent transparency, safety, and governance frameworks. The index aims to steer incentives toward pluralistic practice and to equip policymakers, procurers, and the public with comparable evidence.

AI Insights

Imagine model cards closing the AI accountability gap by transparently reporting model behavior.
OECD AI Recommendation pushes for human‑centered, explainable, and fair AI.
UNESCO Ethics Recommendation embeds human values to turn AI into societal good.
HELM from Stanford’s CRFM holistically benchmarks language models on safety and impact.
NIST AI RMF offers a risk‑management cycle for responsible AI governance.
WCAG 2.2 ensures AI interfaces are accessible to users with disabilities.
Krippendorff’s content‑analysis method quantifies stakeholder participation in AI governance.

👍 👎 ♥ Save

AI and Consciousness

Abstract
This is a skeptical overview of the literature on AI consciousness. We will soon create AI systems that are conscious according to some influential, mainstream theories of consciousness but are not conscious according to other influential, mainstream theories of consciousness. We will not be in a position to know which theories are correct and whether we are surrounded by AI systems as richly and meaningfully conscious as human beings or instead only by systems as experientially blank as toasters. None of the standard arguments either for or against AI consciousness takes us far. Table of Contents Chapter One: Hills and Fog Chapter Two: What Is Consciousness? What Is AI? Chapter Three: Ten Possibly Essential Features of Consciousness Chapter Four: Against Introspective and Conceptual Arguments for Essential Features Chapter Five: Materialism and Functionalism Chapter Six: The Turing Test and the Chinese Room Chapter Seven: The Mimicry Argument Against AI Consciousness Chapter Eight: Global Workspace Theories and Higher Order Theories Chapter Nine: Integrated Information, Local Recurrence, Associative Learning, and Iterative Natural Kinds Chapter Ten: Does Biological Substrate Matter? Chapter Eleven: The Problem of Strange Intelligence Chapter Twelve: The Leapfrog Hypothesis and the Social Semi-Solution

Research Automation with AI

👍 👎 ♥ Save

Automated Research Article Classification and Recommendation Using NLP and ML

York University, North YO

Rate this image: 😍 👍 👎

Abstract
In the digital era, the exponential growth of scientific publications has made it increasingly difficult for researchers to efficiently identify and access relevant work. This paper presents an automated framework for research article classification and recommendation that leverages Natural Language Processing (NLP) techniques and machine learning. Using a large-scale arXiv.org dataset spanning more than three decades, we evaluate multiple feature extraction approaches (TF--IDF, Count Vectorizer, Sentence-BERT, USE, Mirror-BERT) in combination with diverse machine learning classifiers (Logistic Regression, SVM, Na\"ive Bayes, Random Forest, Gradient Boosted Trees, and k-Nearest Neighbour). Our experiments show that Logistic Regression with TF--IDF consistently yields the best classification performance, achieving an accuracy of 69\%. To complement classification, we incorporate a recommendation module based on the cosine similarity of vectorized articles, enabling efficient retrieval of related research papers. The proposed system directly addresses the challenge of information overload in digital libraries and demonstrates a scalable, data-driven solution to support literature discovery.

AI Insights

Hybrid ensemble of Logistic Regression, SVM, and Random Forest boosts accuracy beyond single models!
Cross‑dataset validation on arXiv, PubMed, and CiteSeer demonstrates robust generalizability.
User‑feedback loops enable adaptive re‑ranking, refining recommendations over time!
Word2Vec and GloVe embeddings enrich semantic vectors, improving classification precision.
Deep‑learning extraction of patent semantics showcases the framework’s extensibility!
The study omits bias analysis and detailed preprocessing, highlighting future research gaps.
Recommended reading: LDA for topic modeling and the WebFind tool for global paper discovery.

👍 👎 ♥ Save

TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents

University of Illinois at

Abstract
Automatic research with Large Language Models (LLMs) is rapidly gaining importance, driving the development of increasingly complex workflows involving multi-agent systems, planning, tool usage, code execution, and human-agent interaction to accelerate research processes. However, as more researchers and developers begin to use and build upon these tools and platforms, the complexity and difficulty of extending and maintaining such agentic workflows have become a significant challenge, particularly as algorithms and architectures continue to advance. To address this growing complexity, TinyScientist identifies the essential components of the automatic research workflow and proposes an interactive, extensible, and controllable framework that easily adapts to new tools and supports iterative growth. We provide an open-source codebase, an interactive web demonstration, and a PyPI Python package to make state-of-the-art auto-research pipelines broadly accessible to every researcher and developer.

AI Insights

TinyScientist’s “checker” can automatically assess task risk, flagging potential safety issues before execution.
Its “drawer” component produces ML‑centric diagrams on the fly, easing visual communication in papers.
The framework ships with evaluation rubrics that score content richness, reference quality, clarity, depth, and completeness on a 1‑5 scale.
A full ML pipeline—data collection, cleaning, feature engineering, training, evaluation, deployment—is built into the system for end‑to‑end reproducibility.
The paper cites meta‑learning advances such as Neural Tangent Kernel methods and memory‑augmented networks, highlighting their cross‑domain success.
Users should note that the checker’s risk scores can be imperfect and the generated diagrams may need manual tweaking.
TinyScientist’s open‑source Python package and interactive web demo make state‑of‑the‑art auto‑research pipelines accessible to all.

AGI: Artificial General Intelligence

👍 👎 ♥ Save

BuilderBench -- A benchmark for generalist agents

Princeton University

Abstract
Today's AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills for exploring and learning through experience. Finding a scalable learning mechanism for developing agents that learn through interaction remains a major open problem. In this work, we introduce BuilderBench, a benchmark to accelerate research into agent pre-training that centers open-ended exploration. BuilderBench requires agents to learn how to build any structure using blocks. BuilderBench is equipped with $(1)$ a hardware accelerated simulator of a robotic agent interacting with various physical blocks, and $(2)$ a task-suite with over 42 diverse target structures that are carefully curated to test an understanding of physics, mathematics, and long-horizon planning. During training, agents have to explore and learn general principles about the environment without any external supervision. During evaluation, agents have to build the unseen target structures from the task suite. Solving these tasks requires a sort of \emph{embodied reasoning} that is not reflected in words but rather in actions, experimenting with different strategies and piecing them together. Our experiments show that many of these tasks challenge the current iteration of algorithms. Hence, we also provide a ``training wheels'' protocol, in which agents are trained and evaluated to build a single target structure from the task suite. Finally, we provide single-file implementations of six different algorithms as a reference point for researchers.

AI Insights

BuilderBench tasks explicitly probe a gripper’s pick‑and‑place precision, sequential logic, and packing‑problem solving in a physics‑rich simulation.
The benchmark includes scaffolding challenges that force agents to build temporary support structures for stability.
Adaptive decision‑making is tested by varying block configurations, compelling agents to react to changing environments.
The platform supplies a full toolchain for task creation, simulation, and performance analysis, enabling rapid prototyping.
Recommended reading: “Robotics: Modelling, Planning and Control” and surveys on robot learning from demonstration for foundational theory.
Key literature: “Learning to Grasp and Manipulate Objects with a Robotic Hand” and “Building Support Structures with a Robotic Gripper” provide state‑of‑the‑art methods.

Deep Learning

👍 👎 ♥ Save

Rethinking deep learning: linear regression remains a key benchmark in predicting terrestrial water storage

Rate this image: 😍 👍 👎

Abstract
Recent advances in machine learning such as Long Short-Term Memory (LSTM) models and Transformers have been widely adopted in hydrological applications, demonstrating impressive performance amongst deep learning models and outperforming physical models in various tasks. However, their superiority in predicting land surface states such as terrestrial water storage (TWS) that are dominated by many factors such as natural variability and human driven modifications remains unclear. Here, using the open-access, globally representative HydroGlobe dataset - comprising a baseline version derived solely from a land surface model simulation and an advanced version incorporating multi-source remote sensing data assimilation - we show that linear regression is a robust benchmark, outperforming the more complex LSTM and Temporal Fusion Transformer for TWS prediction. Our findings highlight the importance of including traditional statistical models as benchmarks when developing and evaluating deep learning models. Additionally, we emphasize the critical need to establish globally representative benchmark datasets that capture the combined impact of natural variability and human interventions.

👍 👎 ♥ Save

An in-depth look at approximation via deep and narrow neural networks

University of Hamburg

Abstract
In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward and ReLU-activated networks of width w forms a dense subset of the space of continuous functions on R^n, with respect to the topology of uniform convergence on compact sets, if and only if w>n holds. To show the necessity, a concrete counterexample function f:R^n->R was used. In this note we actually approximate this very f by neural networks in the two cases w=n and w=n+1 around the aforementioned threshold. We study how the approximation quality behaves if we vary the depth and what effect (spoiler alert: dying neurons) cause that behavior.

AI Insights

Depth lowers error until dying ReLU forces a constant output, even when width equals input dimension.
With width n+1, deeper nets keep improving, showing w>n is not a hard limit.
Minimal‑width ReLU nets can approximate any continuous function, confirming Hanin & Sellke’s theorem.
The constant N0≡1/8 is the best uniform approximator for the counterexample, achieving error 1/8 for all depths.
Experiments show the depth‑benefit plateau occurs earlier in higher dimensions due to dying neurons.
Beise et al.’s decision‑region analysis explains constant outputs in narrow deep nets.
Bresler & Nagaraj’s sharp representation theorems give a depth‑dependence framework matching the results.

Interests not found

Help us improve your experience!