Beyond SQL: AI for Complex Data Management



Objectives

There has been a tremendous amount of work on converting natural language to SQL queries. While this work is very valuable to business users, enterprise data management involves many complex uses of SQL that current work covers less actively. Examples include querying structured and unstructured data, knowledge graphs, and property graphs; performing extract-load-transform workloads; and remediating complex SQL queries as application demands change. As AI is increasingly used to transform simpler querying workloads, there is an opportunity to apply it to these other aspects of data management. We aim to bring together researchers exploring these diverse topics to promote cross-fertilization and the sharing of techniques that may transfer well from one domain to another.

Format

Beyond SQL will be held as a half-day workshop at ICDE 2026. The proceedings of all workshops will be published jointly with the conference proceedings.

The workshop accepts regular research papers and industrial papers of the following types:

Call for Papers

Audience: Our workshop encourages participation from researchers in data management, AI, natural language processing, vision, and the semantic web who work on a wide range of problems relevant to these topics. We hope that it will serve as a single reference point for researchers and practitioners working in this area and help form new collaborations. We also aim to provide a venue for industry researchers and practitioners to present use cases and discuss their needs in addressing real-world problems and large-scale solutions, especially with respect to newer topics such as data products.

Topics of Interest include but are not limited to:

Novelty: Submissions must present original work that has not been previously published or accepted at any other conference or workshop. Contributions should demonstrate substantial novelty and provide meaningful advancement beyond prior work.

Dates

  Deadline
Paper submission: February 16th, 2026
Notification of acceptance: March 2nd, 2026
Camera-ready copy due: March 9th, 2026
Workshop day: May 4th, 2026
ICDE Main Conference: May 5-8th, 2026

Submissions

Papers must be submitted via OpenReview and will undergo single-anonymous review.

Submit your paper at: https://openreview.net/group?id=IEEE.org/ICDE/2026/Workshop/Beyond_SQL

Template: Manuscripts must be prepared in accordance with the IEEE format that is also used for the main ICDE conference.

Conference Program

Time | Program | Speaker
11:00 | Keynote 1 - BIRD-SQL: Towards Automatic Data-Centric Code Generation [slides] | Reynold Cheng
12:10 | Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL [paper] [slides] | Saurabh Deochake, Debajyoti Mukhopadhyay
12:30 | Lunch Break
14:00 | Keynote 2 - Agentic optimization for unstructured data processing [slides] | Aditya Parameswaran
14:55 | Automatic End-to-End Data Integration using Large Language Models [paper] [slides] | Aaron Steiner, Christian Bizer
15:15 | Short Break
15:20 | Towards Executing Sloppy SQL Queries Over Tabular Data Lakes [paper] [slides] | Jan-Micha Bodensohn, Jakob Steinke, Carsten Binnig
15:40 | How Far Can They Map? Probing LLM Capabilities for Cross-Schema SQL Generation [paper] | Mohammadreza Daviran, Davood Rafiei
16:00 | Closing

Keynotes:

BIRD-SQL: Towards Automatic Data-Centric Code Generation - Reynold Cheng (University of Hong Kong (HKU))

Abstract: Database systems, which provide various operations for defining and querying data, enable large-scale AI systems and intelligent applications in various domains. Due to recent advances in large language models (LLMs), automating database operations through code generation has become increasingly attainable. This capability has given rise to a new paradigm, Data-Centric Code Generation (DCCG), which aims to build systems that can automatically understand, manipulate, and reason over data. To realize DCCG, I will discuss our team’s effort in building benchmarking systems, including BIRD-SQL, a large-scale Text-to-SQL benchmark on real databases, and SWE-SQL, which gauges an LLM’s ability to resolve user SQL issues. These benchmarks, widely used in industry, reveal hallucination and other issues faced by LLMs. To address these challenges, I will present our work on graph-aware reasoning, SQL correction, and multi-turn tabular data analysis. This work aims to evolve LLMs from static code generators into autonomous, trustworthy agents that can understand and generate data-driven software systems.

Bio: Prof. Reynold Cheng is the Division Head and Professor (AI & Data Science) at the School of Computing and Data Science, established in 2024, at the University of Hong Kong (HKU). He is a Steering Committee Member of the HKU Musketeers Foundation Institute of Data Science. He is an academic advisor to the College of Professional and Continuing Education of HKPU. He was an Associate Dean of Engineering in 2022-24. His research interests are in data science, big graph analytics, and uncertain databases. Professor Cheng was named an AI 2000 Most Influential Scholar Honorable Mention in Database from 2023 to 2025. He received the ACM Distinguished Membership Award and the HKU Outstanding Research Student Supervisor Award in 2023. He was listed among the World’s Top 2% Scientists by Stanford University in 2022. He received the SIGMOD Research Highlights Award 2020, International Exhibition of Inventions Geneva Award (2026), HKICT Awards (2021, 2023), HKU Knowledge Exchange Award (2024), HKU Engineering Knowledge Exchange Award (2024, 2021), HKU Engineering Best Teaching Award (2023, 2024), and HKU Outstanding Young Researcher Award 2011-12. He received the Universitas 21 Fellowship in 2011, and Hong Kong Polytechnic University Computing Performance Awards (2006, 2007). He was a PC co-chair of IEEE ICDE 2021. He is on the editorial boards of PVLDB, ACM TSAS, IS, DAPD, and DSEJ.

Agentic optimization for unstructured data processing - Aditya Parameswaran (University of California, Berkeley)

Abstract: We’re increasingly seeing unstructured data processing make its way into data systems, thanks to the data understanding capabilities of modern LLMs. At Berkeley, we’ve been working on optimizing unstructured data processing operators and pipelines for cost and accuracy, with help from agents. I’ll describe a few anecdotes from our recent work in this vein, both on optimizing entire pipelines and on optimizing individual operators. Our work has been broadly adopted and used both by database vendors and by individual users and teams across sectors.

Bio: Aditya Parameswaran is an Associate Professor in Computer Science at UC Berkeley, and a co-director of the EPIC Data Lab. Aditya leverages techniques from artificial intelligence, databases, and human-computer interaction to solve hard data challenges. Multiple open-source tools developed in his group have received thousands of GitHub stars (including Modin, Lux, IPyFlow, DocETL)—and have been downloaded tens of millions of times overall across a spectrum of industries. His research was commercialized as a startup, Ponder, in 2021, where he served as Co-founder and President, before its acquisition by Snowflake. Aditya has received the Alfred P. Sloan Research Fellowship, VLDB Early Career Award, the NSF CAREER Award, the TCDE Rising Star Award, along with other recognitions. His website is at http://adityagp.net.

Organization

For questions, please contact: solashirai(at)ibm.com

Beyond SQL is organized by:

Program Committee