Abstract
Most visual programming languages (VPLs) are domain-specific, with few
general-purpose VPLs like Programming Without Coding Technology (PWCT). These
general-purpose VPLs are developed using textual programming languages and
improving them requires textual programming. In this thesis, we designed and
developed PWCT2, a dual-language (Arabic/English), general-purpose,
self-hosting visual programming language. Before doing so, we specifically
designed a textual programming language called Ring for its development. Ring
is a dynamically typed language with a lightweight implementation, offering
syntax customization features. It permits the creation of domain-specific
languages through new features that extend object-oriented programming,
allowing for specialized languages resembling Cascading Style Sheets (CSS) or
the Supernova language. The Ring Compiler and Virtual Machine were designed
using the PWCT visual programming language; the visual implementation is
composed of 18,945 components that generate 24,743 lines of C code, which
raises the abstraction level and hides unnecessary details. Using PWCT to
develop Ring allowed us to realize several issues in PWCT, which led to the
development of the PWCT2 visual programming language using the Ring textual
programming language. PWCT2 provides approximately 36 times faster code
generation and requires 20 times less storage for visual source files. It also
allows for the conversion of Ring code into visual code, enabling the creation
of a self-hosting VPL that can be developed using itself. PWCT2 consists of
approximately 92,000 lines of Ring code and comes with 394 visual components.
PWCT2 is distributed to many users through the Steam platform and has received
positive feedback. On Steam, 1772 users have launched the software, and the
total recorded usage time exceeds 17,000 hours, encouraging further research
and development.
Queen's University
Abstract
As software systems grow in scale and complexity, understanding the
distribution of programming language topics within source code becomes
increasingly important for guiding technical decisions, improving onboarding,
and informing tooling and education. This paper presents the design,
implementation, and evaluation of a novel programming language topic
classification workflow. Our approach combines a multi-label Support Vector
Machine (SVM) with a sliding window and voting strategy to enable fine-grained
localization of core language concepts such as operator overloading, virtual
functions, inheritance, and templates. Trained on the IBM Project CodeNet
dataset, our model achieves an average F1 score of 0.90 across topics and 0.75
on code-topic highlighting. Our findings contribute empirical insights and a
reusable pipeline for researchers and practitioners interested in code analysis
and data-driven software engineering.
AI Insights
- Fine‑grained localization pinpoints individual language constructs, turning code into a searchable map.
- Multi‑label classification lets a single snippet carry several topic tags, mirroring real‑world code overlap.
- A sliding‑window scans every token span, ensuring no subtle operator overloading slips past the model.
- Voting aggregates window predictions, smoothing noise; the reported F1 score for code‑topic highlighting is 0.75.
- Training on IBM’s CodeNet, the workflow learns from 1.5 M diverse projects, a treasure trove for future research.
- The study flags that complexity can erode accuracy, hinting at richer data or deeper models as next steps.
- For deeper dives, check Deitel’s C++ guide, Stroustrup’s classic, and Linstead’s probabilistic topic‑model paper.
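
The sliding-window and voting strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the window size, the voting rule, or the features, so `window_size`, `min_share`, and the majority-share rule below are assumptions, and `keyword_classifier` is a hypothetical stand-in for the trained multi-label SVM. The sketch only shows how per-window label sets might be aggregated into per-line topic highlights.

```python
from collections import Counter
from typing import Callable, List, Set


def keyword_classifier(window: List[str]) -> Set[str]:
    """Hypothetical stand-in for the trained multi-label SVM.

    Returns a set of topic labels for a window of source lines, so a single
    window can carry several labels at once (multi-label output).
    """
    text = "\n".join(window)
    labels = set()
    if "operator" in text:
        labels.add("operator_overloading")
    if "virtual" in text:
        labels.add("virtual_functions")
    if "template" in text:
        labels.add("templates")
    return labels


def highlight_topics(
    lines: List[str],
    classify: Callable[[List[str]], Set[str]],
    window_size: int = 3,   # assumed value, not taken from the paper
    min_share: float = 0.5,  # assumed voting threshold
) -> List[Set[str]]:
    """Slide a window over the lines and vote on topics for each line."""
    votes = [Counter() for _ in lines]
    coverage = [0] * len(lines)

    # Every window start position; short inputs get a single window.
    last_start = max(len(lines) - window_size, 0)
    for start in range(last_start + 1):
        end = min(start + window_size, len(lines))
        labels = classify(lines[start:end])
        for i in range(start, end):
            coverage[i] += 1
            votes[i].update(labels)

    # A line keeps a topic if enough of the windows covering it voted for it.
    return [
        {t for t, n in votes[i].items() if n / coverage[i] >= min_share}
        for i in range(len(lines))
    ]
```

Calling `highlight_topics(source_lines, keyword_classifier)` returns one set of topics per input line. Lines adjacent to a construct tend to inherit its label because they share windows with it; a higher `min_share` tightens the highlighted region at the cost of missing weaker signals.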