Papers from 29 September to 03 October, 2025

Here are your personalized paper recommendations, sorted by relevance
Functional Programming
University of Aveiro, CID
Abstract
Representations are essential to mathematically model phenomena, but there are many options available. While each of these options provides useful properties with which to solve problems related to the phenomena under study, comparing results across representations can be non-trivial, as different frameworks are used in different contexts. We present a general structure based on set-theoretic concepts that accommodates many situations related to logical and semantic frameworks. We show the versatility of this approach by presenting alternative constructions of modal logic; in particular, all modal logics can be represented within the framework.
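For intuition, the standard set-theoretic construction of modal logic is Kripke semantics: a set of worlds, an accessibility relation between them, and a valuation of atoms. The minimal evaluator below illustrates that textbook construction only; it is a sketch for orientation, not the paper's generalized framework.

```python
# Minimal Kripke-semantics evaluator for propositional modal logic.
# A textbook set-theoretic construction for intuition; NOT the
# generalized framework proposed in the paper.

def evaluate(formula, world, access, valuation):
    """Evaluate a modal formula (nested tuples) at `world`.

    access:    dict mapping each world to the set of accessible worlds
    valuation: dict mapping each atom to the set of worlds where it holds
    """
    op = formula[0]
    if op == "atom":
        return world in valuation[formula[1]]
    if op == "not":
        return not evaluate(formula[1], world, access, valuation)
    if op == "and":
        return (evaluate(formula[1], world, access, valuation)
                and evaluate(formula[2], world, access, valuation))
    if op == "box":   # necessity: holds in every accessible world
        return all(evaluate(formula[1], w, access, valuation)
                   for w in access[world])
    if op == "dia":   # possibility: holds in some accessible world
        return any(evaluate(formula[1], w, access, valuation)
                   for w in access[world])
    raise ValueError(f"unknown operator: {op}")

# Two-world model: p holds only in w2, and w1 sees only w2.
access = {"w1": {"w2"}, "w2": set()}
valuation = {"p": {"w2"}}
print(evaluate(("box", ("atom", "p")), "w1", access, valuation))  # True
print(evaluate(("dia", ("atom", "p")), "w2", access, valuation))  # False: w2 has no successors
```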
Tsinghua University
Abstract
Functional data play a pivotal role across science and engineering, yet their infinite-dimensional nature makes representation learning challenging. Conventional statistical models depend on pre-chosen basis expansions or kernels, limiting the flexibility of data-driven discovery, while many deep-learning pipelines treat functions as fixed-grid vectors, ignoring inherent continuity. In this paper, we introduce Functional Attention with a Mixture-of-Experts (FAME), an end-to-end, fully data-driven framework for function-on-function regression. FAME forms continuous attention by coupling a bidirectional neural controlled differential equation with MoE-driven vector fields to capture intra-functional continuity, and further captures inter-functional dependencies via multi-head cross attention. Extensive experiments on synthetic and real-world functional-regression benchmarks show that FAME achieves state-of-the-art accuracy and strong robustness to arbitrarily sampled discrete observations of functions.
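The core mechanism, a neural controlled differential equation, advances a hidden state z along the increments of the observed path X via dz = f(z) dX, which is what lets the model respect continuity under irregular sampling. The NumPy sketch below shows an Euler discretization of that update; the dimensions, the vector field, and the absence of MoE gating and cross attention are all simplifying assumptions, not FAME's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative stand-ins, not FAME's architecture):
# hidden state z in R^h, observed path X in R^d at irregular times t.
h, d, n = 8, 3, 50
W = rng.normal(scale=0.1, size=(h * d, h))  # vector-field parameters

def vector_field(z):
    # f(z) returns an (h x d) matrix so that dz = f(z) @ dX,
    # the defining form of a controlled differential equation.
    return np.tanh(W @ z).reshape(h, d)

# An irregularly sampled input function X(t).
t = np.sort(rng.uniform(0.0, 1.0, size=n))
X = np.stack([np.sin(2 * np.pi * (k + 1) * t) for k in range(d)], axis=1)

# Euler discretization: the increments of X, not a fixed grid,
# drive the hidden state, so continuity is built in.
z = np.zeros(h)
for i in range(1, n):
    z = z + vector_field(z) @ (X[i] - X[i - 1])

print("final hidden state:", np.round(z, 3))
```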
Object Oriented Programming
Wuhan University
Abstract
Establishing fair and robust benchmarks is essential for evaluating intelligent code generation by large language models (LLMs). Our survey of 35 existing benchmarks uncovers three major imbalances: 85.7% focus on a single programming language; 94.3% target only function-level or statement-level tasks; and over 80% include fewer than ten test cases on average. To address these gaps, we propose MultiOOP, a multi-language object-oriented programming benchmark covering six popular languages (Python, PHP, C++, C#, Java, JavaScript) with 267 tasks per language. We design a translator that extends an existing single-language OOP benchmark and the pass@o metric to a multilingual setting. Moreover, we propose an automated framework for augmenting test cases to ensure the reliability of the evaluation results. We evaluate 14 mainstream LLMs under zero-shot prompting and report three key findings: 1) Substantial performance degradation: pass@1 scores on MultiOOP drop by up to 65.6 percentage points compared to function-level tasks (e.g., HumanEval). 2) Cross-language variability: GPT-4o mini achieves pass@1 of 48.06% in Python but only 0.12%-15.26% in other languages, indicating limited multilingual generalization. 3) Conceptual gaps: pass@o scores are consistently 1.1-19.2 points lower than pass@k, demonstrating that LLMs often generate executable code without fully capturing core OOP concepts. Our benchmark, metric extensions, and evaluation scripts will be publicly released to foster a more balanced and comprehensive assessment of LLMs in object-oriented code generation. Our code and data will be released at https://github.com/alphadl/OOP-eval and https://huggingface.co/datasets/codeai-dteam/MultiOOP respectively.
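For context on the metrics: pass@k is conventionally computed with the unbiased estimator of Chen et al. (2021), pass@k = 1 - C(n-c, k) / C(n, k), for n samples of which c pass all tests. The sketch below implements that estimator; pass@o additionally requires the code to exercise the intended OOP concepts, and since its exact definition lives in the MultiOOP paper, the comment about it here is only a hypothetical gloss.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k draws from n generated samples, of which
    c are correct, passes all test cases."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 samples per task, 5 pass. (For pass@o, c would count only
# samples that also satisfy an OOP-concept check -- a hypothetical gloss;
# see the paper for the actual definition.)
print(pass_at_k(20, 5, 1))          # 0.25
print(round(pass_at_k(20, 5, 10), 4))
```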
University of Toronto
Abstract
Novice programmers often struggle to understand how code executes and to form the abstract mental models necessary for effective problem-solving, challenges that are amplified in large, diverse introductory courses where students' backgrounds, language proficiencies, and prior experiences vary widely. This study examines whether interactive, multi-representational visualizations, combining synchronized code views, memory diagrams, and conceptual analogies, can help manage cognitive load and foster engagement more effectively than single-visual or text-only approaches. Over a 12-week deployment in a high-enrolment introductory Python course (N = 829), students who relied solely on text-based explanations reported significantly higher immediate mental effort than those using visual aids, although overall cognitive load did not differ significantly among conditions. The multi-representational approach consistently yielded higher engagement than both single-visual and text-only methods. Usage logs indicated that learners' interaction patterns varied with topic complexity, and predictive modelling suggested that early experiences of high cognitive load were associated with lower longer-term perceptions of clarity and helpfulness. Individual differences, including language proficiency and prior programming experience, moderated these patterns. By integrating multiple external representations with scaffolded support adapted to diverse learner profiles, our findings highlight design considerations for creating visualization tools that more effectively support novices learning to program.
Programming Language Design
Cornell University
Abstract
We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy and domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M parameter RLM initialized from T5Gemma, obtains > 0.9 Spearman-rank on competitive programming submissions from APPS, and a single unified model achieves > 0.5 average Spearman-rank across 17 separate languages from CodeNet. Furthermore, the RLM can obtain the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predict architecture latencies on numerous hardware platforms.
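The headline numbers are rank correlations between predicted and measured metrics, for which scipy provides the standard implementations. A minimal sketch of that evaluation, with made-up numbers standing in for model predictions and ground-truth measurements, is below.

```python
from scipy.stats import spearmanr, kendalltau

# Hypothetical example: predicted vs. measured peak memory (MB) for
# five code submissions; the values are made up for illustration.
predicted = [105.0, 310.2,  98.7, 240.1, 512.9]
measured  = [110.0, 295.0, 101.5, 310.0, 280.0]

rho, _ = spearmanr(predicted, measured)   # rank metric reported for CodeNet/APPS
tau, _ = kendalltau(predicted, measured)  # rank metric reported for the NAS spaces
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```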
Abstract
Code generation has shown great promise in assisting software development. A fundamental yet underexplored question is how the choice of code representation affects model performance. While existing studies employ various representations, such as treating code as plain text, grammar rule sequences, or syntax tree sequences, they lack a principled understanding of the relationship between parsing difficulty and model effectiveness. This paper proposes a conjecture: the easier a representation is to parse, the better performance the model achieves. We formalize this idea using grammar classes, where representations in simpler classes (e.g., LL(1)) are easier to parse. Through a controlled experiment on a Python-based DSL, we show that parsing difficulty strongly correlates with model performance. Motivated by this finding, we present GramTrans, a general approach that automatically transforms a context-free language into a representation within the LL(1) class. GramTrans introduces a novel hierarchical conflict elimination algorithm, enabling a flexible trade-off between syntactic simplicity and token efficiency. We evaluate GramTrans on both Python and Java using three code generation models: StarCoder 1B, DeepSeek-Coder 1.3B, and Qwen2.5 1.5B. Across multiple benchmarks, GramTrans consistently delivers significant improvements over baseline representations. Furthermore, our analysis of existing representations reconfirms the strong alignment between parsing difficulty and model performance, providing additional support for the conjecture.
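The grammar-class conjecture can be made concrete with the textbook LL(1) condition: for each nonterminal, one token of lookahead must select a unique alternative. The sketch below computes FIRST sets for a toy epsilon-free grammar and flags LL(1) conflicts; it illustrates the property the paper measures, not GramTrans's hierarchical conflict-elimination algorithm.

```python
# FIRST-set computation and LL(1) conflict detection for a toy,
# epsilon-free context-free grammar. Illustrates the textbook LL(1)
# condition; NOT GramTrans's conflict-elimination algorithm.

GRAMMAR = {
    # E -> T "+" E | T   (classic conflict: both alternatives start with T)
    "E": [["T", "+", "E"], ["T"]],
    "T": [["id"], ["(", "E", ")"]],
}

def first_of(sym, first):
    return first[sym] if sym in GRAMMAR else {sym}  # a terminal yields itself

def first_sets():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # fixed-point iteration
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of(alt[0], first)  # epsilon-free: head symbol decides
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def ll1_conflicts(first):
    for nt, alts in GRAMMAR.items():
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                overlap = first_of(alts[i][0], first) & first_of(alts[j][0], first)
                if overlap:  # one-token lookahead cannot choose an alternative
                    yield nt, alts[i], alts[j], overlap

first = first_sets()
print("FIRST:", first)
for nt, a, b, overlap in ll1_conflicts(first):
    print(f"LL(1) conflict in {nt}: {a} vs {b} share lookahead {overlap}")
```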
Design Patterns
University of Porto
Abstract
Due to the susceptibility of Artificial Intelligence (AI) to data perturbations and adversarial examples, it is crucial to perform a thorough robustness evaluation before any Machine Learning (ML) model is deployed. However, examining a model's decision boundaries and identifying potential vulnerabilities typically requires access to the training and testing datasets, which may pose risks to data privacy and confidentiality. To improve transparency in organizations that handle confidential data or manage critical infrastructure, it is essential to allow external verification and validation of AI without the disclosure of private datasets. This paper presents Systematic Pattern Analysis (SPATA), a deterministic method that converts any tabular dataset to a domain-independent representation of its statistical patterns, to provide more detailed and transparent data cards. SPATA computes the projection of each data instance into a discrete space where they can be analyzed and compared, without risking data leakage. These projected datasets can be reliably used for the evaluation of how different features affect ML model robustness and for the generation of interpretable explanations of their behavior, contributing to more trustworthy AI.
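The abstract does not spell out SPATA's projection, so the sketch below is only a plausible analogue: deterministic per-feature quantile binning, which maps each instance to a discrete pattern code that can be analyzed and compared without exposing raw values.

```python
import numpy as np

def discretize(X, n_bins=4):
    """Map each column of a tabular dataset to quantile-bin indices.
    A deterministic, domain-independent discretization in the spirit
    of (but not identical to) SPATA's projection."""
    X = np.asarray(X, dtype=float)
    codes = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        # Interior quantiles define per-feature bin edges.
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        codes[:, j] = np.searchsorted(edges, X[:, j], side="right")
    return codes

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))   # stand-in for a confidential dataset
print(discretize(X))          # discrete patterns, no raw values leaked
```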
Massachusetts Institute of Technology
Abstract
During development, highly ordered structures emerge as cells collectively coordinate with each other. While recent advances have clarified how individual cells process and respond to external signals, understanding collective cellular decision making remains a major challenge. Here, we introduce a minimal, analytically tractable model of cell patterning via local cell-cell communication. Using this framework, we identify a trade-off between the speed and accuracy of collective pattern formation and, by adapting techniques from stochastic chemical kinetics, quantify how information flows between cells during patterning. Our analysis reveals counterintuitive features of collective patterning: globally optimized solutions do not necessarily maximize intercellular information transfer, and individual cells may appear suboptimal in isolation. Moreover, the model predicts that the instantaneous information shared between cells can be non-monotonic in time as patterning occurs. An analysis of recent experimental data from lateral inhibition in the Drosophila pupal abdomen finds a qualitatively similar effect.
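For intuition about patterning via local cell-cell communication, the classic deterministic lateral-inhibition model of Collier et al. (1996) already produces alternating fates on a ring of cells: each cell's Notch is activated by its neighbours' Delta, and high Notch represses the cell's own Delta. The simulation below is that textbook model with illustrative parameters, not the stochastic, information-theoretic model analyzed in the paper.

```python
import numpy as np

# Collier-style lateral inhibition on a ring of N cells. Parameters and
# Hill functions are illustrative choices, not fitted to the paper.
rng = np.random.default_rng(2)
N, dt, steps = 30, 0.1, 3000
d = rng.uniform(0.9, 1.1, N)   # Delta levels, small random initial bias
n = rng.uniform(0.9, 1.1, N)   # Notch levels

f = lambda x: x**2 / (0.01 + x**2)      # Notch activation by neighbours' Delta
g = lambda x: 1.0 / (1.0 + 100 * x**2)  # repression of own Delta by Notch

for _ in range(steps):
    nb = 0.5 * (np.roll(d, 1) + np.roll(d, -1))  # mean Delta of ring neighbours
    n += dt * (f(nb) - n)
    d += dt * (g(n) - d)

# High-Delta cells adopt the primary fate; typically a salt-and-pepper
# (alternating) pattern emerges from near-uniform initial conditions.
print("".join("D" if x > 0.5 else "." for x in d))
```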
Programming Paradigms
Georgia Institute of Technology
Abstract
While LLM-based specification generation is gaining traction, existing tools primarily focus on mainstream programming languages like C, Java, and even Solidity, leaving emerging yet verification-oriented languages like Move underexplored. In this paper, we introduce MSG, an automated specification generation tool designed for Move smart contracts. MSG aims to highlight key insights that arise uniquely when applying LLM-based specification generation to a new ecosystem. Specifically, MSG demonstrates that LLMs exhibit robust code comprehension and generation capabilities even for non-mainstream languages. MSG successfully generates verifiable specifications for 84% of tested Move functions and even identifies clauses previously overlooked by experts. Additionally, MSG shows that explicitly leveraging specification language features through an agentic, modular design improves specification quality substantially (generating 57% more verifiable clauses than conventional designs). Incorporating feedback from the verification toolchain further enhances the effectiveness of MSG, leading to a 30% increase in generated verifiable specifications.
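The verification-feedback loop described can be sketched as generate, verify, refine. Everything below is a hedged mock-up: `llm_generate_spec` and `run_move_prover` are hypothetical stubs standing in for the LLM call and the Move Prover invocation, not MSG's actual interfaces or its agentic, modular design.

```python
# Hedged sketch of a generate-verify-refine loop for specification
# generation. Both helpers are hypothetical stubs, not MSG's interfaces.

def llm_generate_spec(src: str, feedback: str | None) -> str:
    """Stub for an LLM call; a real system would prompt with the Move
    source plus any verifier errors from the previous round."""
    return "spec transfer { ensures balance_of(to) == old(balance_of(to)) + amount; }"

def run_move_prover(src: str, spec: str) -> tuple[bool, str]:
    """Stub for the verification toolchain; returns (verified?, errors)."""
    return True, ""

def generate_verified_spec(src: str, max_rounds: int = 3) -> str | None:
    feedback = None
    for _ in range(max_rounds):
        spec = llm_generate_spec(src, feedback)
        ok, errors = run_move_prover(src, spec)
        if ok:
            return spec        # verifiable specification found
        feedback = errors      # feed toolchain errors back to the LLM
    return None                # no verifiable spec within the budget

print(generate_verified_spec("public fun transfer(to: address, amount: u64) { /* ... */ }"))
```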