Hunan University
AI Insights
- Higher computing power allows for handling more complex queries, thus increasing the assignment ratio to edge servers. (ML: 0.95)
- The assignment ratio increases as storage capacity increases, allowing more queries to be processed at the edge. (ML: 0.93)
- Variations in bandwidths between end users and edge servers affect query execution efficiency. (ML: 0.91)
- The method assumes that the number of end users is fixed and does not account for dynamic changes in network topology or load conditions. (ML: 0.89)
- Edge-First: prioritizes edge servers for query processing whenever they are capable, without considering the issue of computational resources allocation. (ML: 0.89)
- Greedy: allocates each query to the cloud or a capable edge server that minimizes execution cost. (ML: 0.87)
- Random: assigns queries randomly to either the cloud or capable edge servers. (ML: 0.84)
- The proposed method is more efficient and scalable than other strategies, especially when considering varying storage capacities, computing power, bandwidths, numbers of edge servers, and graph sizes. (ML: 0.84)
- The proposed method outperforms other strategies, demonstrating more efficient query scheduling and a performance improvement of 15.46% in response time. (ML: 0.83)
- Cloud-Only: processes all queries in the cloud. (ML: 0.82)
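The Greedy and Edge-First baselines listed above can be illustrated with a minimal Python sketch. The `Query` type, the `CLOUD` label, the per-server cost dictionary, and the choice of the cheapest capable edge server in Edge-First are all assumptions made for illustration; the paper's actual cost model and implementation are not described in this digest.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

CLOUD = "cloud"  # hypothetical label for the cloud server


@dataclass
class Query:
    name: str
    capable_edges: Set[str]  # edge servers assumed able to answer the query
    cost: Dict[str, float]   # assumed execution cost per server, including CLOUD


def greedy_assign(queries: List[Query]) -> Dict[str, str]:
    """Send each query to the cloud or a capable edge server with the lowest cost."""
    assignment = {}
    for q in queries:
        candidates = [CLOUD] + sorted(q.capable_edges)
        assignment[q.name] = min(candidates, key=lambda server: q.cost[server])
    return assignment


def edge_first_assign(queries: List[Query]) -> Dict[str, str]:
    """Use a capable edge server whenever one exists, otherwise fall back to the cloud."""
    assignment = {}
    for q in queries:
        if q.capable_edges:
            # Assumption: among capable edge servers, pick the cheapest one.
            assignment[q.name] = min(sorted(q.capable_edges), key=lambda s: q.cost[s])
        else:
            assignment[q.name] = CLOUD
    return assignment


queries = [
    Query("q1", {"e1", "e2"}, {CLOUD: 5.0, "e1": 3.0, "e2": 4.0}),
    Query("q2", set(), {CLOUD: 2.0}),
    Query("q3", {"e1"}, {CLOUD: 1.0, "e1": 6.0}),
]
greedy = greedy_assign(queries)
edge_first = edge_first_assign(queries)
```

In this toy input the two strategies differ only on `q3`: Greedy sends it to the cloud because the cloud is cheaper, while Edge-First keeps it on `e1` because an edge server is capable. Neither baseline models shared computational resources, which is the gap the paper's joint formulation is said to address.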
Abstract
With the increasing use of RDF graphs, storing and querying such data using SPARQL remains a critical problem. Current mainstream solutions rely on cloud-based data management architectures, but often suffer from performance bottlenecks in environments with limited bandwidth or high system load. To address this issue, this paper explores for the first time the integration of edge computing to move graph data storage and processing to edge environments, thereby improving query performance. This approach requires offloading query processing to edge servers, which involves addressing two challenges: data localization and network scheduling. First, the data localization challenge lies in computing the subgraphs maintained on edge servers to quickly identify the servers that can handle specific queries. To address this challenge, we introduce a new concept of pattern-induced subgraphs. Second, the network scheduling challenge involves efficiently assigning queries to edge and cloud servers to optimize overall system performance. We tackle this by constructing an overall system model that jointly captures data distribution, query characteristics, network communication, and computational resources. Accordingly, we further propose a joint formulation of query assignment and computational resource allocation, modeling it as a Mixed Integer Nonlinear Programming (MINLP) problem and solving it with a modified branch-and-bound algorithm. Experimental results on real datasets under a real cloud platform demonstrate that our proposed method outperforms the state-of-the-art baseline methods in terms of efficiency. The code is available on GitHub.
Why we are recommending this paper?
Due to your Interest in NoSQL Databases
This paper directly addresses the query processing challenges associated with large RDF graphs, aligning with your interest in NoSQL databases and data warehousing. The focus on collaborative approaches and performance optimization is highly relevant to your database design concerns.
Shanghai Jiao Tong University
AI Insights
- Table learning: A type of machine learning that involves training models on relational data stored in tables. (ML: 0.96)
- Declarative programming: A style of programming where the focus is on specifying what the program should accomplish, rather than how it should accomplish it. (ML: 0.94)
- It translates TLSQL statements into standard SQL queries and structured task descriptions consumable by downstream table learning frameworks. (ML: 0.90)
- This eliminates the need for data export, manual preprocessing, or low-level pipeline management. (ML: 0.89)
- TLSQL provides a compelling foundation for future research in database-centric learning, offering both a conceptual framework and a practical prototype. (ML: 0.87)
- TLSQL is a declarative SQL-like interface for table learning over modern relational databases. (ML: 0.84)
- Future work includes integrating large language models to generate and refine TLSQL programs from natural language, further simplifying end-to-end table learning in real-world settings. (ML: 0.83)
- SQL-like interface: An interface that allows users to specify queries and operations using a syntax similar to SQL (Structured Query Language). (ML: 0.83)
- The integration with large language models to generate and refine TLSQL programs from natural language is still future work. (ML: 0.82)
- The paper does not provide a detailed evaluation of TLSQL's performance and scalability. (ML: 0.73)
Abstract
Table learning, which lies at the intersection of machine learning and modern database systems, has recently attracted growing attention. However, existing frameworks typically require explicit data export and extensive feature engineering, creating a high barrier for database practitioners. We present TLSQL (Table Learning Structured Query Language), a system that enables table learning directly over relational databases via SQL-like declarative specifications. TLSQL is implemented as a lightweight Python library that translates these specifications into standard SQL queries and structured learning task descriptions. The generated SQL queries are executed natively by the database engine, while the task descriptions are consumed by downstream table learning frameworks. This design allows users to focus on modeling and analysis rather than low-level data preparation and pipeline orchestration. Experiments on real-world datasets demonstrate that TLSQL effectively lowers the barrier to integrating machine learning into database-centric workflows. Our code is available at https://github.com/rllmproject/tlsql/.
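The translation step described in the abstract, from one declarative specification to a SQL query plus a task description, can be pictured with a toy Python sketch. This is not TLSQL's syntax or API, neither of which is shown in this digest. `LearningSpec`, `translate`, and `run_spec` are hypothetical names, and the spec is a plain dataclass rather than a SQL-like statement.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LearningSpec:
    """Hypothetical declarative spec; NOT the actual TLSQL syntax."""
    table: str
    features: List[str]
    target: str
    task: str = "classification"


def translate(spec: LearningSpec) -> Tuple[str, Dict[str, object]]:
    """Split one spec into a standard SQL query and a task description."""
    # Identifiers are interpolated unchecked here; a real system must validate them.
    columns = ", ".join(spec.features + [spec.target])
    sql = f"SELECT {columns} FROM {spec.table}"
    task_description = {
        "task": spec.task,
        "features": list(spec.features),
        "target": spec.target,
    }
    return sql, task_description


def run_spec(conn: sqlite3.Connection, spec: LearningSpec):
    """Execute the generated SQL in the database and return rows plus the task description."""
    sql, task_description = translate(spec)
    rows = conn.execute(sql).fetchall()
    return rows, task_description
```

The point of the split is that the data never leaves the database until the engine has executed the query, while the task description carries what a downstream learning framework would need to know about features and target.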
Why we are recommending this paper?
Due to your Interest in Relational Databases
Given your interest in SQL and database systems, this paper's exploration of Table Learning offers a potentially innovative approach to query language development. The work tackles the complexities of modern database systems, directly relating to your stated interests.
Columbia University
AI Insights
- They also note that the results may not generalize to other domains or tasks beyond text-to-SQL. (ML: 0.99)
- The authors acknowledge the limitations of their study, including the small size of the dataset and the potential bias in the persona modeling prompt. (ML: 0.98)
- The authors propose a framework for generating high-quality datasets that align with business intelligence (BI) settings and evaluate the performance of various LLMs using this framework. (ML: 0.97)
- Previous studies have shown that LLMs can be effective on certain NLP tasks, but struggle with others, highlighting the need for more research in this area. (ML: 0.96)
- The proposed framework generates high-quality datasets that align with BI settings and evaluates the performance of various LLMs using this framework. (ML: 0.96)
- The text is a research paper on developing a benchmark for evaluating large language models (LLMs) on real-world text-to-SQL tasks. (ML: 0.96)
- The results show that some LLMs perform well on certain aspects of text-to-SQL, but struggle with others, highlighting the need for more research in this area. (ML: 0.95)
- Text-to-SQL: A task where a natural language question is converted into an SQL query to retrieve relevant data from a database. (ML: 0.94)
- LLMs: Large Language Models, which are artificial intelligence models that can process and generate human-like text. (ML: 0.94)
- ReAct paradigm: A prompt-based agentic framework for evaluating LLMs on complex tasks such as text-to-SQL. (ML: 0.93)
Abstract
Evaluating Text-to-SQL agents in private business intelligence (BI) settings is challenging due to the scarcity of realistic, domain-specific data. While synthetic evaluation data offers a scalable solution, existing generation methods fail to capture business realism--whether questions reflect realistic business logic and workflows. We propose a Business Logic-Driven Data Synthesis framework that generates data grounded in business personas, work scenarios, and workflows. In addition, we improve the data quality by imposing a business reasoning complexity control strategy that diversifies the analytical reasoning steps required to answer the questions. Experiments on a production-scale Salesforce database show that our synthesized data achieves high business realism (98.44%), substantially outperforming OmniSQL (+19.5%) and SQL-Factory (+54.7%), while maintaining strong question-SQL alignment (98.59%). Our synthetic data also reveals that state-of-the-art Text-to-SQL models still have significant performance gaps, achieving only 42.86% execution accuracy on the most complex business queries.
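Execution accuracy, the metric quoted at the end of the abstract, is commonly computed by running the predicted and reference queries against the same database and comparing their results. The sketch below shows one such computation using Python's built-in `sqlite3`. Comparing results as order-insensitive multisets and counting failed queries as misses are assumptions; the paper's exact protocol is not given in this digest.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Tuple


def same_result(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True when both queries yield the same multiset of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # assumption: a query that fails to run counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn: sqlite3.Connection,
                       pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) SQL pairs whose results match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(same_result(conn, predicted, gold) for predicted, gold in pairs)
    return hits / len(pairs)
```

Under this definition, a predicted query can differ textually from the reference and still count as correct, as long as it returns the same rows.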
Why we are recommending this paper?
Due to your Interest in SQL
This paper's focus on synthetic data generation for Text-to-SQL agents aligns with your interest in data warehousing and database design, particularly in the context of business intelligence applications. It provides a method for creating realistic data scenarios for testing and evaluating your database solutions.
Aalborg University
AI Insights
- They also discuss the challenges associated with processing and analyzing such massive amounts of data in real-time. (ML: 0.95)
- However, there are still challenges associated with processing and analyzing such massive amounts of data in real-time. (ML: 0.95)
- There is limited discussion on the scalability and performance of the system. (ML: 0.93)
- The paper does not provide a detailed evaluation of the proposed approach. (ML: 0.91)
- The authors discuss various existing approaches for managing trajectory data, including Hadoop GIS, TrajMesa, and MobilityDB. (ML: 0.85)
- The authors propose a novel approach for managing large-scale AIS (Automatic Identification System) datasets using a distributed database system. (ML: 0.85)
- The paper discusses the design and implementation of a maritime data warehouse called MobiSpaces, part of a European Union-funded project under grant agreement no. 101070279. (ML: 0.84)
- AIS: Automatic Identification System. MobiSpaces: A maritime data warehouse project funded by the European Union. The proposed approach for managing AIS datasets using a distributed database system is efficient and scalable. (ML: 0.81)
Abstract
AIS data from ships is excellent for analyzing single-ship movements and monitoring all ships within a specific area. However, the AIS data needs to be cleaned, processed, and stored before being usable. This paper presents a system consisting of an efficient and modular ETL process for loading AIS data, as well as a distributed spatial data warehouse storing the trajectories of ships. To efficiently analyze a large set of ships, a raster approach to querying the AIS data is proposed. A spatially partitioned data warehouse with a granularized cell representation and heatmap presentation is designed, developed, and evaluated. Currently, the data warehouse stores ~312 million kilometers of ship trajectories and more than 8 billion rows in the largest table. It is found that searching the cell representation is faster than searching the trajectory representation. Further, we show that the spatially divided shards enable a consistently good scale-up for both cell and heatmap analytics in large areas, ranging from 354% to 1164% with a 5x increase in workers.
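The raster idea in the abstract, replacing trajectories with a grid of cells that can be counted into a heatmap, can be sketched in a few lines of Python. The grid here is a simple longitude/latitude grid with a cell size in degrees. This is an assumption for illustration only, and so are the names `to_cell` and `heatmap`; the paper's actual cell definition, projection, and storage layout are not described in this digest.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def to_cell(lon: float, lat: float, cell_size_deg: float) -> Cell:
    """Map a position to the integer (column, row) index of its grid cell."""
    return (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))


def heatmap(points: Iterable[Tuple[float, float]], cell_size_deg: float) -> Counter:
    """Count how many reported positions fall into each grid cell."""
    return Counter(to_cell(lon, lat, cell_size_deg) for lon, lat in points)


# Three hypothetical AIS positions given as (longitude, latitude).
positions = [(10.02, 57.01), (10.04, 57.03), (10.62, 57.01)]
counts = heatmap(positions, cell_size_deg=0.5)
```

Once positions are reduced to cell indices, an area query becomes a lookup over a bounded set of integer keys rather than a geometric test against every trajectory, which is one plausible reason the cell representation is reported as faster to search.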
Why we are recommending this paper?
Due to your Interest in Data Warehousing
Considering your interest in data warehousing, this paper presents a system for handling spatial data, specifically AIS data, which is a valuable area for data analysis and warehousing. The focus on data processing and storage is a key aspect of your database design interests.
University of Bremen
AI Insights
- Reasoning: The process of drawing conclusions or making decisions based on the knowledge represented in an ontology. (ML: 0.98)
- Knowledge representation and reasoning (KRR): The process of representing knowledge in a way that can be used by machines to reason about it. (ML: 0.98)
- Ontology: A formal representation of knowledge that provides a common understanding of the meaning of terms and concepts. (ML: 0.97)
- They also discuss the importance of explainability and adaptability in KRR for robotics, as well as the need for more research in this area. (ML: 0.95)
- It highlights the potential benefits of using an ontology-based framework for representing and reasoning about knowledge in robotics. (ML: 0.94)
- The paper discusses the challenges of integrating knowledge representation and reasoning (KRR) into robotic systems. (ML: 0.93)
- The paper concludes by emphasizing the need for more research in KRR for robotics, particularly in areas such as explainability and adaptability. (ML: 0.89)
- The authors propose an ontology-based framework for representing and reasoning about knowledge in robotics, which can be used to integrate various components of a robotic system. (ML: 0.87)
- It highlights the need for a more flexible and open approach to KRR in robotics, rather than relying on proprietary or closed systems. (ML: 0.83)
- The authors also discuss the importance of developing more flexible and open approaches to KRR in robotics. (ML: 0.82)
Abstract
This paper introduces KRROOD, a framework designed to bridge the integration gap between modern software engineering and Knowledge Representation & Reasoning (KR&R) systems. While Object-Oriented Programming (OOP) is the standard for developing complex applications, existing KR&R frameworks often rely on external ontologies and specialized languages that are difficult to integrate with imperative code. KRROOD addresses this by treating knowledge as a first-class programming abstraction using native class structures, bridging the gap between the logic programming and OOP paradigms. We evaluate the system on the OWL2Bench benchmark and a human-robot task learning scenario. Experimental results show that KRROOD achieves strong performance while supporting the expressive reasoning required for real-world autonomous systems.
Why we are recommending this paper?
Due to your Interest in Database Design
This paper's exploration of integrating Knowledge Representation & Reasoning with Object-Oriented Programming is relevant to your interest in database design and data modeling. It offers a framework for combining these two areas, potentially informing your approach to complex database systems.