Papers from 29 September to 03 October, 2025
Here are the personalized paper recommendations, sorted by relevance (most relevant first)
Functional Programming
University of Aveiro, CID
Abstract
Representations are essential to mathematically model phenomena, but there
are many options available. While each of those options provides useful
properties with which to solve problems related to the phenomena under study,
comparing results between these representations can be non-trivial, as
different frameworks are used for different contexts. We present a general
structure based on set-theoretic concepts that accommodates many situations
related to logical and semantic frameworks. We show the versatility of this
approach by presenting alternative constructions of modal logic; in particular,
all modal logics can be represented within the framework.
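
For readers less familiar with the standard relational setting this abstract builds on, here is a minimal Python sketch of Kripke semantics for the modal box and diamond operators; the worlds, relation, and valuation are illustrative assumptions, and this is not the paper's set-theoretic framework:

# Minimal sketch of standard Kripke (relational) semantics for modal logic.
# The model below is a made-up example, not the paper's construction.

Worlds = set[str]
Relation = set[tuple[str, str]]
Valuation = dict[str, set[str]]  # proposition -> set of worlds where it holds

def box_holds(w: str, prop: str, R: Relation, V: Valuation) -> bool:
    """[]p holds at w iff p holds in every world accessible from w."""
    return all(v in V.get(prop, set()) for (u, v) in R if u == w)

def diamond_holds(w: str, prop: str, R: Relation, V: Valuation) -> bool:
    """<>p holds at w iff p holds in some world accessible from w."""
    return any(v in V.get(prop, set()) for (u, v) in R if u == w)

# Tiny example model with three worlds.
W: Worlds = {"w1", "w2", "w3"}
R: Relation = {("w1", "w2"), ("w1", "w3"), ("w2", "w2")}
V: Valuation = {"p": {"w2", "w3"}}

print(box_holds("w1", "p", R, V))      # True: p holds at both successors of w1
print(diamond_holds("w3", "p", R, V))  # False: w3 has no successors
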
Tsinghua University
Abstract
Functional data play a pivotal role across science and engineering, yet their
infinite-dimensional nature makes representation learning challenging.
Conventional statistical models depend on pre-chosen basis expansions or
kernels, limiting the flexibility of data-driven discovery, while many
deep-learning pipelines treat functions as fixed-grid vectors, ignoring
inherent continuity. In this paper, we introduce Functional Attention with a
Mixture-of-Experts (FAME), an end-to-end, fully data-driven framework for
function-on-function regression. FAME forms continuous attention by coupling a
bidirectional neural controlled differential equation with MoE-driven vector
fields to capture intra-functional continuity, and further models
inter-functional dependencies via multi-head cross-attention. Extensive
experiments on synthetic and real-world functional-regression benchmarks show
that FAME achieves state-of-the-art accuracy and strong robustness to
arbitrarily sampled discrete observations of functions.
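
As a rough illustration of the cross-attention ingredient mentioned above (not the FAME architecture itself), the following PyTorch sketch applies multi-head cross-attention between tokens of two irregularly sampled functions; all dimensions, embeddings, and shapes are made-up assumptions:

# Illustrative only: cross-attention between (time, value) tokens of an input
# function and query locations of an output function. Not the FAME model.
import torch
import torch.nn as nn

d_model, n_heads = 64, 4
embed_x = nn.Linear(2, d_model)   # (t, x(t)) tokens for the input function
embed_y = nn.Linear(2, d_model)   # (s, y(s)) tokens for the output-side queries
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

x_obs = torch.randn(1, 50, 2)     # 50 irregular observations of the input function
y_qry = torch.randn(1, 30, 2)     # 30 query locations for the output function

q = embed_y(y_qry)                # queries from the output side
k = v = embed_x(x_obs)            # keys/values from the input side
fused, attn = cross_attn(q, k, v)
print(fused.shape)                # torch.Size([1, 30, 64])
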
Object Oriented Programming
Wuhan University, The Unv
Abstract
Establishing fair and robust benchmarks is essential for evaluating
intelligent code generation by large language models (LLMs). Our survey of 35
existing benchmarks uncovers three major imbalances: 85.7% focus on a single
programming language; 94.3% target only function-level or statement-level
tasks; and over 80% include fewer than ten test cases on average. To address
these gaps, we propose MultiOOP, a multi-language object-oriented programming
benchmark covering six popular languages (Python, PHP, C++, C#, Java,
JavaScript) with 267 tasks per language. We design a translator that extends an
existing single-language OOP benchmark and the pass@o metric to a multilingual
setting. Moreover, we propose an automated framework for augmenting test cases
to ensure the reliability of the evaluation results. We evaluate 14 mainstream
LLMs under zero-shot prompting and report three key findings: 1) Substantial
performance degradation: pass@1 scores on MultiOOP drop by up to 65.6
percentage points compared to function-level tasks (e.g., HumanEval). 2)
Cross-language variability: GPT-4o mini achieves pass@1 of 48.06% in Python but
only 0.12%-15.26% in other languages, indicating limited multilingual
generalization. 3) Conceptual gaps: pass@o scores are consistently 1.1-19.2
points lower than pass@k, demonstrating that LLMs often generate executable
code without fully capturing core OOP concepts. Our benchmark, metric
extensions, and evaluation scripts will be publicly released to foster a more
balanced and comprehensive assessment of LLMs in object-oriented code
generation. Our code and data will be released at
https://github.com/alphadl/OOP-eval and
https://huggingface.co/datasets/codeai-dteam/MultiOOP respectively.
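
For context on the metrics discussed above, below is the standard unbiased pass@k estimator in Python; the paper's pass@o metric, which additionally checks object-oriented concepts, is not reproduced here:

# Standard unbiased pass@k estimator (as popularized by the HumanEval work),
# shown for background only; pass@o is the paper's own OOP-aware metric.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated, c = samples passing all tests, k = sample budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 20 samples per task, 5 of them pass, estimate pass@1:
print(round(pass_at_k(20, 5, 1), 4))  # 0.25
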
University of Toronto
Abstract
Novice programmers often struggle to understand how code executes and to form
the abstract mental models necessary for effective problem-solving, challenges
that are amplified in large, diverse introductory courses where students'
backgrounds, language proficiencies, and prior experiences vary widely. This
study examines whether interactive, multi-representational visualizations,
combining synchronized code views, memory diagrams, and conceptual analogies,
can help manage cognitive load and foster engagement more effectively than
single-visual or text-only approaches. Over a 12-week deployment in a
high-enrolment introductory Python course (N = 829), students who relied solely
on text-based explanations reported significantly higher immediate mental
effort than those using visual aids, although overall cognitive load did not
differ significantly among conditions. The multi-representational approach
consistently yielded higher engagement than both single-visual and text-only
methods. Usage logs indicated that learners' interaction patterns varied with
topic complexity, and predictive modelling suggested that early experiences of
high cognitive load were associated with lower longer-term perceptions of
clarity and helpfulness. Individual differences, including language proficiency
and prior programming experience, moderated these patterns. Our findings
highlight design considerations for visualization tools that integrate multiple
external representations with scaffolded support adapted to diverse learner
profiles, and thereby more effectively support novices learning to program.
Programming Language Design
Cornell University
Abstract
We study code-to-metric regression: predicting numeric outcomes of code
executions, a challenging task due to the open-ended nature of programming
languages. While prior methods have resorted to heavy and domain-specific
feature engineering, we show that a single unified Regression Language Model
(RLM) can simultaneously predict, directly from text, (i) the memory footprint
of code across multiple high-level languages such as Python and C++, (ii) the
latency of Triton GPU kernels, and (iii) the accuracy and speed of trained
neural networks represented in ONNX. In particular, a relatively small
300M-parameter RLM initialized from T5Gemma obtains > 0.9 Spearman rank
correlation on
competitive programming submissions from APPS, and a single unified model
achieves > 0.5 average Spearman rank correlation across 17 separate languages from CodeNet.
Furthermore, the RLM can obtain the highest average Kendall-Tau of 0.46 on five
classic NAS design spaces previously dominated by graph neural networks, and
simultaneously predict architecture latencies on numerous hardware platforms.
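
A minimal sketch of how rank-correlation scores like those reported above could be computed for predicted versus measured code metrics, using SciPy; the numbers are made up for illustration and are unrelated to the paper's results:

# Illustrative rank-correlation evaluation of predicted vs. measured metrics.
from scipy.stats import spearmanr, kendalltau

measured  = [120.0, 340.0,  95.0, 410.0, 250.0]   # e.g. measured memory footprints
predicted = [130.0, 300.0, 100.0, 450.0, 310.0]   # model predictions (made up)

rho, _ = spearmanr(measured, predicted)
tau, _ = kendalltau(measured, predicted)
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
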
Abstract
Code generation has shown great promise in assisting software development. A
fundamental yet underexplored question is how the choice of code representation
affects model performance. While existing studies employ various
representations, such as treating code as plain text, grammar rule sequences,
or syntax tree sequences, they lack a principled understanding of the
relationship between parsing difficulty and model effectiveness. This paper
proposes a conjecture: the easier a representation is to parse, the better
performance the model achieves. We formalize this idea using grammar classes,
where representations in simpler classes (e.g., LL(1)) are easier to parse.
Through a controlled experiment on a Python-based DSL, we show that parsing
difficulty strongly correlates with model performance. Motivated by this
finding, we present GramTrans, a general approach that automatically transforms
a context-free language into a representation within the LL(1) class. GramTrans
introduces a novel hierarchical conflict elimination algorithm, enabling a
flexible trade-off between syntactic simplicity and token efficiency. We
evaluate GramTrans on both Python and Java using three code generation models:
StarCoder 1B, DeepSeek-Coder 1.3B, and Qwen2.5 1.5B. Across multiple
benchmarks, GramTrans consistently delivers significant improvements over
baseline representations. Furthermore, our analysis of existing representations
reconfirms the strong alignment between parsing difficulty and model
performance, providing additional support for the conjecture.
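
To make the notion of parsing difficulty concrete, the sketch below checks one necessary LL(1) condition, pairwise-disjoint FIRST sets among a nonterminal's alternatives, on a toy grammar; epsilon/FOLLOW handling is omitted, and this is not the GramTrans conflict-elimination algorithm:

# Toy FIRST/FIRST conflict check: a necessary (not sufficient) LL(1) condition.
GRAMMAR = {
    # Nonterminals map to lists of alternatives (sequences of symbols).
    "Stmt": [["if", "Expr", "then", "Stmt"], ["id", "=", "Expr"]],
    "Expr": [["id", "Expr'"], ["num", "Expr'"]],
    "Expr'": [["+", "Expr"], []],   # [] denotes the empty alternative
}

def first(symbols, grammar, seen=frozenset()):
    """FIRST set of a symbol sequence (empty alternatives simply yield set())."""
    if not symbols:
        return set()
    head = symbols[0]
    if head not in grammar:            # terminal symbol
        return {head}
    out = set()
    if head not in seen:               # avoid infinite recursion on cycles
        for alt in grammar[head]:
            out |= first(alt, grammar, seen | {head})
    return out

def ll1_conflicts(grammar):
    conflicts = []
    for nt, alts in grammar.items():
        firsts = [first(alt, grammar) for alt in alts]
        for i in range(len(firsts)):
            for j in range(i + 1, len(firsts)):
                overlap = firsts[i] & firsts[j]
                if overlap:
                    conflicts.append((nt, i, j, overlap))
    return conflicts

print(ll1_conflicts(GRAMMAR))   # [] -- the toy grammar has no FIRST/FIRST conflicts
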
Design Patterns
University of Porto
Abstract
Due to the susceptibility of Artificial Intelligence (AI) to data
perturbations and adversarial examples, it is crucial to perform a thorough
robustness evaluation before any Machine Learning (ML) model is deployed.
However, examining a model's decision boundaries and identifying potential
vulnerabilities typically requires access to the training and testing datasets,
which may pose risks to data privacy and confidentiality. To improve
transparency in organizations that handle confidential data or manage critical
infrastructure, it is essential to allow external verification and validation
of AI without the disclosure of private datasets. This paper presents
Systematic Pattern Analysis (SPATA), a deterministic method that converts any
tabular dataset to a domain-independent representation of its statistical
patterns, to provide more detailed and transparent data cards. SPATA computes
the projection of each data instance into a discrete space where instances can
be analyzed and compared without risking data leakage. These projected datasets
can be reliably used for the evaluation of how different features affect ML
model robustness and for the generation of interpretable explanations of their
behavior, contributing to more trustworthy AI.
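
As a loose illustration of projecting tabular instances into a discrete, domain-independent space (not the actual SPATA procedure), the sketch below bins each feature by its own quantiles, so that only rank-based patterns rather than raw values are retained:

# Illustrative only: per-feature quantile binning as a simple discrete,
# domain-independent projection of a tabular dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # stand-in tabular dataset

n_bins = 4
# Per-feature interior cut points at the 25th, 50th, and 75th percentiles.
edges = np.quantile(X, np.linspace(0, 1, n_bins + 1)[1:-1], axis=0)

# Digitize each column against its own quantile edges -> codes in {0, ..., n_bins-1}.
X_discrete = np.column_stack(
    [np.digitize(X[:, j], edges[:, j]) for j in range(X.shape[1])]
)
print(X_discrete[:5])
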
Massachusetts Institute of Technology
Abstract
During development, highly ordered structures emerge as cells collectively
coordinate with each other. While recent advances have clarified how individual
cells process and respond to external signals, understanding collective
cellular decision making remains a major challenge. Here, we introduce a
minimal, analytically tractable model of cell patterning via local cell-cell
communication. Using this framework, we identify a trade-off between the speed
and accuracy of collective pattern formation and, by adapting techniques from
stochastic chemical kinetics, quantify how information flows between cells
during patterning. Our analysis reveals counterintuitive features of collective
patterning: globally optimized solutions do not necessarily maximize
intercellular information transfer and individual cells may appear suboptimal
in isolation. Moreover, the model predicts that instantaneous information
shared between cells can be non-monotonic in time as patterning occurs. An
analysis of recent experimental data from lateral inhibition in the Drosophila
pupal abdomen finds a qualitatively similar effect.
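
For intuition about lateral inhibition (a generic illustration, not the paper's analytically tractable model), the sketch below simulates a ring of cells in which each cell is repressed by its neighbours' mean activity; an alternating high/low pattern tends to emerge:

# Generic lateral-inhibition toy model on a 1D ring; parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n_cells, steps, dt = 20, 2000, 0.05
x = rng.uniform(0.4, 0.6, n_cells)        # initial activities near the midpoint

def hill_repression(u, k=0.5, n=4):
    """Decreasing Hill function: strong neighbour activity suppresses a cell."""
    return k**n / (k**n + u**n)

for _ in range(steps):
    neighbours = 0.5 * (np.roll(x, 1) + np.roll(x, -1))
    x += dt * (hill_repression(neighbours) - x)   # relax toward the repressed target

print(np.round(x, 2))   # tends toward an alternating high/low "salt-and-pepper" pattern
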
Programming Paradigms
Georgia Institute of Technology
Abstract
While LLM-based specification generation is gaining traction, existing tools
primarily focus on mainstream programming languages like C, Java, and even
Solidity, leaving emerging yet verification-oriented languages like Move
underexplored. In this paper, we introduce MSG, an automated specification
generation tool designed for Move smart contracts. MSG aims to highlight key
insights that uniquely arise when applying LLM-based specification generation
to a new ecosystem. Specifically, MSG demonstrates that LLMs exhibit robust
code comprehension and generation capabilities even for non-mainstream
languages. MSG successfully generates verifiable specifications for 84% of
tested Move functions and even identifies clauses previously overlooked by
experts. Additionally, MSG shows that explicitly leveraging specification
language features through an agentic, modular design improves specification
quality substantially (generating 57% more verifiable clauses than conventional
designs). Incorporating feedback from the verification toolchain further
enhances the effectiveness of MSG, leading to a 30% increase in generated
verifiable specifications.
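
The verify-and-refine loop described above can be pictured with the following hypothetical sketch; draft, verify, and refine are placeholder callables standing in for an LLM call and the Move Prover, and none of this is MSG's actual interface:

# Hypothetical generate-verify-refine loop; all callables are toy stand-ins.
from typing import Callable, Tuple

def generate_verified_spec(
    source: str,
    draft: Callable[[str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 3,
):
    spec = draft(source)                          # initial clauses proposed by the LLM
    for _ in range(max_rounds):
        ok, diagnostics = verify(source, spec)    # run the verification toolchain
        if ok:
            return spec                           # every clause verified
        spec = refine(source, spec, diagnostics)  # feed verifier errors back
    return None

# Toy stand-ins so the sketch runs end to end.
demo = generate_verified_spec(
    "fun add(a: u64, b: u64): u64 { a + b }",
    draft=lambda src: "ensures result == a + b;",
    verify=lambda src, spec: (True, ""),
    refine=lambda src, spec, diag: spec,
)
print(demo)
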