Papers from 22 to 26 September, 2025

Here are your personalized paper recommendations, sorted by relevance.
Functional Programming
Abstract
In this paper, we establish a distributed functional optimization (DFO) theory over time-varying multi-agent networks. The vast majority of existing distributed optimization theories are developed for Euclidean decision variables. However, in many machine learning and statistical learning scenarios, such as reproducing kernel spaces or probability measure spaces where functions or probability measures serve as the fundamental variables, existing distributed optimization theories exhibit obvious theoretical and technical deficiencies. This paper addresses these issues by developing a novel general DFO theory on Banach spaces, allowing functional learning problems in the aforementioned scenarios to be incorporated into our framework. We study both convex and nonconvex DFO problems and rigorously establish a comprehensive convergence theory for the distributed functional mirror descent and distributed functional gradient descent algorithms that solve them. Satisfactory convergence rates are fully derived. The work provides a generic analytical framework for distributed optimization, and the established theory is shown to have crucial application value in kernel-based distributed learning theory.
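In the Euclidean special case that the paper generalizes, distributed gradient descent alternates a consensus (gossip) step over the network with a local gradient step at each agent. A minimal sketch, with an illustrative fixed mixing matrix standing in for the time-varying network:

```python
import numpy as np

def distributed_gradient_descent(grads, mixing, x0, steps=200, eta=0.05):
    """Distributed gradient descent over a multi-agent network.

    grads  : list of local gradient oracles, one per agent
    mixing : callable t -> doubly stochastic mixing matrix W(t)
    x0     : initial iterate shared by all agents
    """
    n = len(grads)
    x = np.tile(x0, (n, 1)).astype(float)    # one row per agent
    for t in range(steps):
        W = mixing(t)
        x = W @ x                             # consensus (gossip) step
        for i in range(n):
            x[i] -= eta * grads[i](x[i])      # local gradient step
    return x.mean(axis=0)

# Toy problem: agents jointly minimise sum_i ||x - c_i||^2,
# whose global minimiser is the mean of the local targets c_i.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [lambda x, c=c: 2.0 * (x - c) for c in centers]
mixing = lambda t: np.array([[0.50, 0.25, 0.25],
                             [0.25, 0.50, 0.25],
                             [0.25, 0.25, 0.50]])
x_star = distributed_gradient_descent(grads, mixing, np.zeros(2))  # -> approx. [1, 1]
```

Since the mixing matrix is doubly stochastic, the network-wide average follows plain gradient descent on the sum of the local objectives, which is the basic mechanism the paper lifts from Euclidean space to Banach spaces.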
Michigan State University
Abstract
Information theory, originating from Shannon's work on communication systems, has become a fundamental tool across neuroscience, genetics, physics, and machine learning. However, the application of information theory is often limited to the simplest case: mutual information between two variables. A central challenge in extending information theory to multivariate systems is decomposition: understanding how the information that multiple variables collectively provide about a target can be broken down into the distinct contributions assignable to individual variables or their interactions. Stated plainly, what is sought is a decomposition of the mutual information between a set of inputs (or parts) and an output (or whole). In this work, we introduce Functional Information Decomposition (FID), a new approach to information decomposition that differs from prior methods by operating on complete functional relationships rather than statistical correlations, enabling precise quantification of independent and synergistic contributions.
AI Insights
  • FID constructs the full function space from data and bounds each mutual‑information term, tightening uncertainty as samples grow.
  • Injectivity analysis links redundancy to biased or degenerate inputs and synergy to over‑injection.
  • FID assumes independent, uniform inputs, highlighting limits and prompting extensions to correlated cases.
  • Separating structural contributions from noise, FID yields plausible, interpretable values for individual and joint terms.
  • The method iteratively refines estimates, ideal for progressive data collection in neuroscience or genetics.
  • Key references: Watanabe, McGill, and Williams & Beer.
  • Emmeche, Køppe, Stjernfelt, and Anderson offer philosophical lenses on emergent information.
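FID itself operates on complete functional relationships, but the quantity it decomposes can be illustrated with the textbook XOR example, where all information is synergistic. The sketch below (not the authors' algorithm) computes the relevant mutual-information terms in plain Python, assuming the independent, uniform inputs that FID also assumes:

```python
import itertools, math

def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# XOR with independent, uniform binary inputs.
joint_both = {}    # ((x1, x2), z): both inputs vs. the output
joint_single = {}  # (x1, z): one input vs. the output, x2 marginalised out
for a, b in itertools.product([0, 1], repeat=2):
    z = a ^ b
    joint_both[((a, b), z)] = joint_both.get(((a, b), z), 0.0) + 0.25
    joint_single[(a, z)] = joint_single.get((a, z), 0.0) + 0.25

i_joint = mutual_information(joint_both)     # 1 bit: inputs jointly determine z
i_single = mutual_information(joint_single)  # 0 bits: one input alone says nothing
synergy = i_joint - 2 * i_single             # naive accounting; FID refines this
```

The naive subtraction in the last line is exactly the kind of bookkeeping that breaks down for richer functions, which is the gap decomposition frameworks such as FID aim to close.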
Object Oriented Programming
Abstract
Code data has been shown to enhance the reasoning capabilities of large language models (LLMs), but it remains unclear which aspects of code are most responsible. We investigate this question with a systematic, data-centric framework. We construct parallel instruction datasets in ten programming languages and apply controlled perturbations that selectively disrupt structural or semantic properties of code. We then finetune LLMs from five model families and eight scales on each variant and evaluate their performance on natural language, math, and code tasks. Across 3,331 experiments, our results show that LLMs are more vulnerable to structural perturbations than semantic ones, particularly on math and code tasks. Appropriate abstractions like pseudocode and flowcharts can be as effective as code, while encoding the same information with fewer tokens without adhering to original syntax can often retain or even improve performance. Remarkably, even corrupted code with misleading signals remains competitive when surface-level regularities persist. Finally, syntactic styles also shape task-specific gains, with Python favoring natural language reasoning and lower-level languages such as Java and Rust favoring math. Through our systematic framework, we aim to provide insight into how different properties of code influence reasoning and inform the design of training data for enhancing LLM reasoning capabilities.
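The distinction between structural and semantic perturbations can be made concrete. The sketch below is illustrative only (not the paper's exact perturbation suite): renaming identifiers corrupts meaning while leaving code parseable, whereas shuffling lines and stripping indentation destroys the structure entirely:

```python
import builtins, keyword, random, re

def perturb_semantics(code: str) -> str:
    """Rename user identifiers to opaque tokens; structure stays intact."""
    table = {}
    def rename(m):
        tok = m.group(0)
        if keyword.iskeyword(tok) or hasattr(builtins, tok):
            return tok                      # keep keywords and builtins
        if tok not in table:
            table[tok] = f"v{len(table)}"   # meaning-free replacement name
        return table[tok]
    return re.sub(r"\b[A-Za-z_]\w*\b", rename, code)

def perturb_structure(code: str, seed: int = 0) -> str:
    """Shuffle lines and drop indentation, destroying the structure."""
    lines = [line.strip() for line in code.splitlines()]
    random.Random(seed).shuffle(lines)
    return "\n".join(lines)

src = "def area(width, height):\n    return width * height"
renamed = perturb_semantics(src)   # still valid Python, names are gone
shuffled = perturb_structure(src)  # no longer parses at all
```

Under this framing, the paper's finding is that finetuning on `renamed`-style data hurts far less than finetuning on `shuffled`-style data.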
Programming Language Design
Abstract
Most visual programming languages (VPLs) are domain-specific, with few general-purpose VPLs like Programming Without Coding Technology (PWCT). These general-purpose VPLs are developed using textual programming languages, and improving them requires textual programming. In this thesis, we designed and developed PWCT2, a dual-language (Arabic/English), general-purpose, self-hosting visual programming language. Before doing so, we specifically designed a textual programming language called Ring for its development. Ring is a dynamically typed language with a lightweight implementation, offering syntax customization features. It permits the creation of domain-specific languages through new features that extend object-oriented programming, allowing for specialized languages resembling Cascading Style Sheets (CSS) or the Supernova language. The Ring Compiler and Virtual Machine are designed using the PWCT visual programming language, where the visual implementation is composed of 18,945 components that generate 24,743 lines of C code, which increases the abstraction level and hides unnecessary details. Using PWCT to develop Ring allowed us to identify several issues in PWCT, which led to the development of the PWCT2 visual programming language using the Ring textual programming language. PWCT2 provides approximately 36 times faster code generation and requires 20 times less storage for visual source files. It also allows for the conversion of Ring code into visual code, enabling the creation of a self-hosting VPL that can be developed using itself. PWCT2 consists of approximately 92,000 lines of Ring code and comes with 394 visual components. PWCT2 is distributed to many users through the Steam platform and has received positive feedback. On Steam, 1,772 users have launched the software, and the total recorded usage time exceeds 17,000 hours, encouraging further research and development.
Queen's University
Abstract
As software systems grow in scale and complexity, understanding the distribution of programming language topics within source code becomes increasingly important for guiding technical decisions, improving onboarding, and informing tooling and education. This paper presents the design, implementation, and evaluation of a novel programming language topic classification workflow. Our approach combines a multi-label Support Vector Machine (SVM) with a sliding window and voting strategy to enable fine-grained localization of core language concepts such as operator overloading, virtual functions, inheritance, and templates. Trained on the IBM Project CodeNet dataset, our model achieves an average F1 score of 0.90 across topics and 0.75 for code-topic highlighting. Our findings contribute empirical insights and a reusable pipeline for researchers and practitioners interested in code analysis and data-driven software engineering.
AI Insights
  • Fine‑grained localization pinpoints individual language constructs, turning code into a searchable map.
  • Multi‑label classification lets a single snippet carry several topic tags, mirroring real‑world code overlap.
  • A sliding window scans every token span, ensuring no subtle operator overloading slips past the model.
  • Voting aggregates window predictions, smoothing noise and turning the 0.75 highlighting F1 into a reliable localization signal.
  • Training on IBM’s CodeNet, the workflow learns from 1.5 M diverse projects, a treasure trove for future research.
  • The study flags that complexity can erode accuracy, hinting at richer data or deeper models as next steps.
  • For deeper dives, check Deitel’s C++ guide, Stroustrup’s classic, and Linstead’s probabilistic topic‑model paper.
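The sliding-window-and-voting workflow can be sketched independently of the trained model. Below, a toy keyword-based classifier stands in for the paper's multi-label SVM (the cues, window size, and vote threshold are illustrative assumptions), showing how per-window predictions are aggregated into per-token topic labels:

```python
def classify_window(tokens):
    """Stand-in for the trained multi-label SVM: crude keyword cues.

    Illustrative only; the real model is trained on IBM Project CodeNet.
    """
    text = " ".join(tokens)
    labels = set()
    if "virtual" in tokens:
        labels.add("virtual functions")
    if "operator" in text:
        labels.add("operator overloading")
    if "template" in tokens or "typename" in tokens:
        labels.add("templates")
    if ": public" in text:
        labels.add("inheritance")
    return labels

def localize_topics(tokens, window=5, stride=1, min_votes=2):
    """Slide a fixed-size window over the token stream and vote per token."""
    votes = [dict() for _ in tokens]
    for start in range(0, max(1, len(tokens) - window + 1), stride):
        span = tokens[start:start + window]
        for label in classify_window(span):          # multi-label per window
            for i in range(start, start + len(span)):
                votes[i][label] = votes[i].get(label, 0) + 1
    # a token keeps a topic only if enough overlapping windows agree
    return [{lbl for lbl, v in d.items() if v >= min_votes} for d in votes]

code = "template < typename T > struct Box : public Base { virtual T get ( ) ; } ;"
tokens = code.split()
topics = localize_topics(tokens)
```

Because every token is covered by several overlapping windows, the vote threshold suppresses spurious single-window predictions, which is the noise-smoothing effect the insights above describe.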
Design Patterns
Abstract
We extend the notion of being substitutional, or hierarchical, from tilings of translational finite local complexity (FLC) to Euclidean `patterns', and define, more generally, a similar notion for spaces of patterns. We prove recognisability results for these, relating injectivity of substitution to aperiodicity, with no minimality requirements. More precisely, for a suitable power of the substitution map, we determine the size of the fibre over any pattern to be the index of its inflated (translational) periods in its whole group of periods. This answers an open question of Cortez and Solomyak on whether non-periodic tilings necessarily have unique pre-images under substitution: they do, even for a wider notion of pattern and of being substitutional; conversely, for an appropriate power of the substitution, discretely periodic points necessarily have multiple pre-images. Our results cover examples of FLC pattern spaces, such as spaces of uniformly discrete but non-relatively dense point sets, that contain elements with non-discrete groups of periods.
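The fibre-size statement can be written compactly. The notation below is ours, for illustration only (it need not match the paper's): let $\sigma$ be the substitution with inflation map $L$, and let $\mathrm{Per}(P)$ denote the group of translational periods of a pattern $P$.

```latex
% For a suitable power k of the substitution, the number of preimages of
% a pattern P is the index of its inflated periods in its period group:
\[
  \#\,\sigma^{-k}(P) \;=\; \bigl[\,\mathrm{Per}(P) : L^{k}\,\mathrm{Per}(P)\,\bigr].
\]
% A non-periodic pattern has Per(P) = {0}, so the index is 1 and the
% substitution preimage is unique; a discretely periodic pattern has
% index greater than 1, i.e. multiple preimages.
```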
The University of Melbourne
Abstract
As designers become familiar with Generative AI, a new concept is emerging: Agentic AI. While generative AI produces output in response to prompts, agentic AI systems promise to perform mundane tasks autonomously, potentially freeing designers to focus on what they love: being creative. But how do designers feel about integrating agentic AI systems into their workflows? Through design fiction, we investigated how designers want to interact with a collaborative agentic AI platform. Ten professional designers imagined and discussed collaborating with an AI agent to organise inspiration sources and ideate. Our findings highlight the roles AI agents can play in supporting designers, the division of authority between humans and AI, and how designers' intent can be explained to AI agents beyond prompts. We synthesise our findings into a conceptual framework that identifies authority distribution among humans and AI agents and discuss directions for utilising AI agents in future design workflows.
AI Insights
  • Design fiction revealed that designers envision AI agents curating inspiration libraries autonomously, freeing creative time.
  • The study highlights a persistent risk of design fixation when generative outputs dominate, underscoring the need for human oversight.
  • A conceptual framework emerged mapping authority tiers, showing designers retain final approval while AI handles routine iteration.
  • Papers such as GANCollage and Code Shaping illustrate concrete tools where AI generates mood boards and edits code from free‑form sketches.
  • Early‑stage research indicates agentic AI can streamline client collaboration, yet its impact on stakeholder communication remains underexplored.
  • Core definitions: Agentic AI acts independently to achieve goals; Generative AI produces novel content from prompts.
  • Future work should investigate how intent can be communicated to agents beyond textual prompts, perhaps via structured intent models.
Programming Paradigms
Abstract
Integrating LLM powered operators in declarative query languages allows for the combination of cheap and interpretable functions with powerful, generalizable language model reasoning. However, in order to benefit from the optimized execution of a database query language like SQL, generated outputs must align with the rules enforced by both type checkers and database contents. Current approaches address this challenge with orchestrations consisting of many LLM-based post-processing calls to ensure alignment between generated outputs and database values, introducing performance bottlenecks. We perform a study on the ability of various sized open-source language models to both parse and execute functions within a query language based on SQL, showing that small language models can excel as function executors over hybrid data sources. Then, we propose an efficient solution to enforce the well-typedness of LLM functions, demonstrating 7% accuracy improvement on a multi-hop question answering dataset with 53% improvement in latency over comparable solutions. We make our implementation available at https://github.com/parkervg/blendsql
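The well-typedness idea can be sketched as a single validation pass that snaps an LLM-generated string to the target column's type and contents, so the caller can re-prompt once instead of issuing many post-processing calls. This is a minimal illustration under assumed names, not the BlendSQL implementation:

```python
def enforce_well_typed(raw: str, column_type: type, column_values: set):
    """Coerce an LLM-generated string to the column's type, then snap it
    to an actual database value (exact match, else case-insensitive).

    Returns None when the output cannot be aligned with the database,
    signalling the caller to re-prompt rather than accept an ill-typed
    value that the query engine would reject.
    """
    try:
        value = column_type(raw.strip())   # type check via coercion
    except (TypeError, ValueError):
        return None
    if value in column_values:
        return value
    if column_type is str:                 # tolerate casing differences
        lowered = {v.lower(): v for v in column_values}
        return lowered.get(value.lower())
    return None

# Hypothetical schema: a TEXT column of country names.
countries = {"Australia", "Austria", "Germany"}
hit = enforce_well_typed(" austria ", str, countries)   # -> "Austria"
miss = enforce_well_typed("Atlantis", str, countries)   # -> None
```

Grounding generated values against database contents up front is what lets the hybrid query keep SQL's optimized execution path instead of falling back to repeated LLM calls.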