Beyond SQL: AI for Complex Data Management
Objectives
There has been a tremendous amount of work on converting natural language to SQL queries. While this work is very valuable to business users, enterprise data management has many complex uses of SQL that are less actively covered by current work. Examples include querying structured and unstructured data, knowledge graphs, and property graphs; performing extract-load-transform workloads; and remediating complex SQL queries as application demands change. As AI is increasingly used to transform simpler querying workloads, there is an opportunity to apply it to these other aspects of data management. We aim to bring together researchers exploring these diverse topics to promote cross-fertilization and the sharing of techniques that may transfer well from one domain to another.
Format
Beyond SQL will be held as a half-day workshop at ICDE 2026. The proceedings of all workshops will be published jointly with the conference proceedings.
The workshop accepts regular research papers and industrial papers of the following types:
- Full research papers: up to 8 pages + references (no appendix)
- Short papers: up to 4 pages + references (no appendix)
- Extended abstract papers: up to 2 pages + references (no appendix)
Call for Papers
Audience: Our workshop encourages participation from researchers in data management, AI, natural language processing, vision, and the semantic web working on a wide range of problems relevant to these topics. We hope that it will constitute a single reference point for researchers and practitioners working in this area and help form new collaborations. We also aim to provide a venue for industry researchers and practitioners to present use cases and discuss their needs in addressing real-world problems and large-scale solutions, especially with respect to newer topics such as data products.
Topics of Interest include but are not limited to:
- ETL/ELT workload processing
- SQL remediation
- SQL extensions
- Structuring unstructured data at scale
- Querying other structured data (RDF graphs, property graphs, JSON)
- Linking structured and unstructured data at scale
- Creation of data products
- Data governance and data contracts
- SQL editing for changing application requirements
- From data to report generation, hypothesis testing
- Benchmarks on any of the topics above
Novelty: Submissions must present original work that has not been previously published or accepted at any other conference or workshop. Contributions should demonstrate substantial novelty and provide meaningful advancement beyond prior work.
Dates
| Event | Date |
|---|---|
| Paper Submission | February 16th, 2026 |
| Notification of acceptance | March 2nd, 2026 |
| Camera-ready copy due | March 9th, 2026 |
| Workshop day | May 4th, 2026 |
| ICDE Main Conference | May 5th-8th, 2026 |
Submissions
Papers must be submitted via OpenReview and will be reviewed in a single-anonymous manner.
Submit your paper at: https://openreview.net/group?id=IEEE.org/ICDE/2026/Workshop/Beyond_SQL
Template: Manuscripts must be prepared in accordance with the IEEE format that is also used for the main ICDE conference.
Conference Program
| Time | Program | Speaker |
|---|---|---|
| 11:00 | Keynote 1 - BIRD-SQL: Towards Automatic Data-Centric Code Generation [slides] | Reynold Cheng |
| 12:10 | Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL [paper] [slides] | Saurabh Deochake, Debajyoti Mukhopadhyay |
| 12:30 | Lunch Break | |
| 14:00 | Keynote 2 - Agentic optimization for unstructured data processing [slides] | Aditya Parameswaran |
| 14:55 | Automatic End-to-End Data Integration using Large Language Models [paper] [slides] | Aaron Steiner, Christian Bizer |
| 15:15 | Short Break | |
| 15:20 | Towards Executing Sloppy SQL Queries Over Tabular Data Lakes [paper] [slides] | Jan-Micha Bodensohn, Jakob Steinke, Carsten Binnig |
| 15:40 | How Far Can They Map? Probing LLM Capabilities for Cross-Schema SQL Generation [paper] | Mohammadreza Daviran, Davood Rafiei |
| 16:00 | Closing | |
Keynotes:
BIRD-SQL: Towards Automatic Data-Centric Code Generation - Reynold Cheng (University of Hong Kong (HKU))
Abstract: Database systems, which provide various operations for defining and querying data, enable large-scale AI systems and intelligent applications in various domains. Due to recent advances in large language models (LLMs), automating database operations through code generation has become increasingly attainable. This capability has given rise to a new paradigm, Data-Centric Code Generation (DCCG), which aims to build systems that can automatically understand, manipulate, and reason over data. To realize DCCG, I will discuss our team’s effort in building benchmarking systems, including BIRD-SQL, a large-scale Text-to-SQL benchmark on real databases, and SWE-SQL, which gauges an LLM's ability to resolve user SQL issues. These benchmarks, widely used in industry, reveal hallucination and other issues faced by LLMs. To address these challenges, I will present our work on graph-aware reasoning, SQL correction, and multi-turn tabular data analysis. This work aims to evolve LLMs from static code generators into autonomous, trustworthy agents that can understand and generate data-driven software systems.
Bio: Prof. Reynold Cheng is the Division Head and Professor (AI & Data Science) at the School of Computing and Data Science, established in 2024, at the University of Hong Kong (HKU). He is a Steering Committee Member of the HKU Musketeers Foundation Institute of Data Science. He is an academic advisor to the College of Professional and Continuing Education of HKPU. He was an Associate Dean of Engineering in 2022-24. His research interests are in data science, big graph analytics, and uncertain databases. Professor Cheng was named an AI 2000 Most Influential Scholar Honorable Mention in Database from 2023 to 2025. He received the ACM Distinguished Membership Award and the HKU Outstanding Research Student Supervisor Award in 2023. He was listed among the World’s Top 2% Scientists by Stanford University in 2022. He received the SIGMOD Research Highlights Award 2020, International Exhibition of Inventions Geneva Award (2026), HKICT Awards (2021, 2023), HKU Knowledge Exchange Award (2024), HKU Engineering Knowledge Exchange Award (2024, 2021), HKU Engineering Best Teaching Award (2023, 2024), and HKU Outstanding Young Researcher Award 2011-12. He received the Universitas 21 Fellowship in 2011, and Hong Kong Polytechnic University Computing Performance Awards (2006, 2007). He was a PC co-chair of IEEE ICDE 2021. He is on the editorial board of PVLDB, ACM TSAS, IS, DAPD, and DSEJ.
Agentic optimization for unstructured data processing - Aditya Parameswaran (University of California, Berkeley)
Abstract: We’re increasingly seeing unstructured data processing make its way into data systems, thanks to the data understanding capabilities of modern LLMs. At Berkeley, we’ve been working on optimizing unstructured data processing operators and pipelines for cost and accuracy, with help from agents. I’ll describe a few anecdotes from our recent work in this vein, both for optimizing entire pipelines and for optimizing individual operators. Our work has been broadly adopted and used both by database vendors and by individual users and teams across sectors.
Bio: Aditya Parameswaran is an Associate Professor in Computer Science at UC Berkeley, and a co-director of the EPIC Data Lab. Aditya leverages techniques from artificial intelligence, databases, and human-computer interaction to solve hard data challenges. Multiple open-source tools developed in his group (including Modin, Lux, IPyFlow, and DocETL) have received thousands of GitHub stars and have been downloaded tens of millions of times overall across a spectrum of industries. His research was commercialized as a startup, Ponder, in 2021, where he served as Co-founder and President, before its acquisition by Snowflake. Aditya has received the Alfred P. Sloan Research Fellowship, the VLDB Early Career Award, the NSF CAREER Award, and the TCDE Rising Star Award, along with other recognitions. His website is at http://adityagp.net.
Organization
For questions, please contact: solashirai(at)ibm.com
Beyond SQL is organized by:
- Oktie Hassanzadeh (IBM Research)
- Kavitha Srinivas (IBM Research)
- Liane Vogel (TU Darmstadt)
- Sola Shirai (IBM Research)
- Liana Patel (Stanford University)
Program Committee
- Alon Halevy (Google Cloud)
- Haonan Wang (Columbia University)
- Paolo Papotti (EURECOM)
- Madelon Hulsebos (CWI)
- Jan-Micha Bodensohn (TU Darmstadt)
- Makbule Gulcin Ozsoy (Neo4j)