aiDM 2025
Eighth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM)
 

 
Co-located with ACM SIGMOD/PODS 2025
Sunday, June 22, 2025, Schöneberg I/II/III

 
 
Workshop Overview

The field of Artificial Intelligence (AI) has recently experienced a resurgence. AI broadly covers a wide swath of techniques, including logic-based approaches, probabilistic graphical models, and machine learning approaches such as deep learning. Advances in specialized hardware (e.g., Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Field-Programmable Gate Arrays (FPGAs)), software ecosystems (e.g., programming languages such as Python, data science frameworks, and accelerated ML libraries), and systems infrastructure (e.g., cloud servers with AI accelerators) have led to widespread adoption of AI techniques in a variety of domains, including image classification, autonomous driving, automatic speech recognition, and conversational systems (e.g., chatbots). AI solutions not only support multiple data types (e.g., images, speech, or text), but are also available in a range of configurations and settings, from personal devices to large-scale distributed systems.

Despite the widespread adoption of AI across diverse domains, its integration with data management systems remains in its infancy. Currently, most database management systems (DBMS) serve primarily as repositories for feeding input data to AI models and storing results. Recently, there has been increasing interest in using AI techniques within data management systems, including natural language interfaces to relational databases and machine learning techniques for query optimization and performance tuning. However, significant opportunities remain to harness the full potential of AI for enhancing data management workloads.
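
To make the query-optimization direction above concrete, the sketch below shows one hedged, illustrative possibility: a small regression model trained on featurized predicates and their observed result sizes, which an optimizer could consult in place of histogram-based cardinality estimates. The featurization, model choice, and numbers are assumptions made for illustration only, not a description of any particular system.

    # Hedged, illustrative sketch (not from any system discussed at the workshop):
    # a learned cardinality estimator of the kind a query optimizer could consult
    # instead of histogram-based estimates. Assumes scikit-learn and NumPy.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Toy featurization: each single-column predicate becomes [column_id, op_id, literal].
    # Real systems use much richer encodings (join graphs, sample bitmaps, etc.).
    train_features = np.array([
        [0, 0, 25],    # age = 25
        [0, 1, 40],    # age < 40
        [1, 1, 1000],  # salary < 1000
    ])
    train_cardinalities = np.array([120, 5400, 80])  # observed result sizes

    model = GradientBoostingRegressor().fit(train_features, train_cardinalities)

    def estimate_cardinality(column_id: int, op_id: int, literal: float) -> float:
        """Return a learned row-count estimate for a single predicate."""
        return float(model.predict([[column_id, op_id, literal]])[0])

    print(estimate_cardinality(0, 1, 35))  # learned estimate for "age < 35"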

aiDM'25 is a one-day workshop that will bring together people from academia and industry to explore innovative ways to integrate AI techniques into data management systems. The workshop will focus on leveraging AI to enhance various components of data management systems, including user interfaces, tooling, performance optimizations, and support for new query types and workloads. Special attention will be given to transparently exploiting AI techniques, such as Generative AI frameworks, for enterprise-class data management workloads. We aim to identify key research areas and inspire new initiatives in this emerging and transformative field.

Topics of Interest

The goal of the workshop is to take a holistic view of various AI technologies and investigate how they can be applied to different components of an end-to-end data management pipeline. Special emphasis will be given to how AI techniques can enhance the user experience, whether by reducing the complexity of tools, providing new insights, or offering better user interfaces. Topics of interest include, but are not restricted to:

  • AI-enabled improvements to foundational DB algorithms: sorting, searching, consensus
  • New AI-enabled business intelligence (BI) queries for relational databases
  • Integration of Large Language Models with databases and supporting services (e.g., Generative AI)
  • Enabling different types of RAG capabilities (see the illustrative sketch after this list)
  • Integration into Agentic and Orchestration Frameworks
  • Supporting Large Reasoning Models
  • Natural language queries and conversational interfaces
  • AI-enabled database programming (e.g., natural language queries, SQL co-pilots, etc.)
  • Design and Implementation of Vector Databases for unstructured data
  • Ethics, governance, and societal implications of AI-enabled databases
  • Reasoning over knowledge bases
  • Self-tuning databases using reinforcement learning
  • Impact of model interpretability
  • Supporting multiple datatypes (e.g., images or time-series data)
  • Supporting semi-structured, streaming, and graph databases
  • Impact of AI on tooling, e.g., ETL or data cleaning
  • Performance implications of AI-enabled queries
  • AI-enabled databases for managing and supporting AI workloads
  • AI strategies for data provenance, access control, anomaly detection, and cybersecurity
  • Case studies of AI-accelerated workloads
  • AI-driven data compression and storage optimization
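
As a concrete illustration of the RAG and vector-database topics above, the following hedged sketch builds a tiny in-memory vector index over a handful of documents and assembles a retrieval-augmented prompt. The helper names (embed, retrieve) and the toy embedding are assumptions introduced purely to keep the example self-contained; a production pipeline would use a real embedding model and a dedicated vector database.

    # Hedged sketch for the RAG / vector-database topics above. The embed()
    # function is a deterministic stand-in (not a real embedding model) so that
    # the example is self-contained and runnable; a real pipeline would call an
    # embedding model and a vector database instead.
    import hashlib
    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        """Toy pseudo-embedding: deterministic random unit vector derived from the text."""
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
        v = np.random.default_rng(seed).standard_normal(dim)
        return v / np.linalg.norm(v)

    # A minimal in-memory vector index over unstructured text.
    documents = [
        "The optimizer chooses join orders based on cardinality estimates.",
        "Vector indexes support nearest-neighbor search over embeddings.",
        "ETL pipelines clean and transform data before loading.",
    ]
    index = np.stack([embed(d) for d in documents])

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Return the k documents with the highest cosine similarity to the query."""
        scores = index @ embed(query)  # vectors are unit-normalized, so dot product = cosine
        return [documents[i] for i in np.argsort(scores)[::-1][:k]]

    # Retrieval-augmented prompt assembly: the retrieved passages would be sent
    # to an LLM together with the user question.
    context = "\n".join(retrieve("How does the database pick a join order?"))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
    print(prompt)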


Keynote Presentations


  • AI That Matters: Bridging the Gap Between Research and Business Impact (Ulf Brackmann, Vice President of Engineering at SAP Signavio)

    Ulf Brackmann is Vice President of Engineering at SAP Signavio, where he leads multiple teams focused on core suite services and central AI development. With over 20 years of experience in enterprise software development, Ulf brings deep expertise in building AI-driven systems at scale. He holds a Master’s degree in Computer Science from the University of Karlsruhe (KIT) and an MBA from Mannheim Business School. Ulf is currently pursuing a PhD at TU Darmstadt’s Systems Group, in close collaboration with the German Research Center for Artificial Intelligence (DFKI). His research explores the impact of foundation models on enterprise software systems, with a strong emphasis on real-world data and the practical challenges it presents.

    Abstract: Artificial Intelligence has made remarkable progress in academic research, yet translating these advances into real-world enterprise settings uncovers a new layer of complexity. In industry, data is messy, incomplete, and tightly coupled with evolving business processes. Tasks are rarely isolated — they’re part of intricate, interdependent workflows that must ultimately deliver measurable business value. This keynote delves into the real-world challenges of applying AI within enterprise software systems, where success hinges not only on technical innovation but on seamless, end-to-end integration with business objectives. Drawing from hands-on industry experience and ongoing research at TU Darmstadt, we’ll share a fresh perspective on the role of large language models (LLMs) in data engineering and explore how enterprise-grade AI must support compound reasoning across domains, systems, and fragmented data landscapes. This talk will focus on what it takes to make AI truly succeed in enterprise environments — where it must go beyond models and benchmarks, tackle real-world problems at scale, and deliver outcomes that genuinely move the business forward.

  • Query Optimization in the Age of Semantic Operators and LLMs (Paolo Papotti, Associate Professor at EURECOM)

    Paolo Papotti has been an Associate Professor at EURECOM (France) since 2017 and the holder of a Chair of Artificial Intelligence at the 3IA Institute since 2024. He received his PhD from Roma Tre University (Italy) in 2007 and held research positions at the Qatar Computing Research Institute (Qatar) and Arizona State University (USA). His research focuses on data management and NLP. He has authored more than 170 publications, and his work has been recognized with best paper awards (CIKM 2024, ISWC 2024), "Best of the Conference" citations (SIGMOD 2009, VLDB 2016), best demo awards (SIGMOD 2015, DBA 2020, SIGMOD 2022), and Google Faculty Research awards (2016, 2020).

    Abstract: The advent of Large Language Models (LLMs) is reshaping the foundations of data management, introducing capabilities to process and query information in ways previously infeasible. Executing declarative queries, such as SQL, over LLMs involves 'semantic operators' that fundamentally differ from traditional database operations, primarily due to their reliance on LLM inference. This talk will address the critical need for new query optimization frameworks tailored to these operators, considering their unique cost structures, quality considerations (precision/recall), and the general lack of pre-existing statistical information. After surveying current approaches in the rapidly developing field of LLM-integrated query processing, we will present an in-depth look at Galois. Galois is a system that applies and extends database optimization principles to SQL queries executed against LLMs, treating the LLM as a storage layer. We will explore its strategies for logical and physical plan optimization, including context-aware scan operators and dynamic, LLM-driven metadata collection to inform optimization choices. The talk will illustrate how these techniques can significantly improve the quality and efficiency of querying LLMs, paving the way for more powerful and intelligent data systems.
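
    To give a flavor of the semantic operators discussed in this abstract, the following hedged sketch implements a filter whose predicate is evaluated by an LLM rather than by standard comparison operators. The llm_complete placeholder, the prompt format, and the helper names are assumptions; this is not the Galois implementation, only an illustration of why per-row LLM inference introduces the cost and quality trade-offs the talk addresses.

        # Hedged sketch of a "semantic operator" in the sense described above: a
        # filter whose predicate is evaluated by an LLM rather than by comparison
        # operators. llm_complete() is a placeholder for any LLM client; this is
        # NOT the Galois implementation, only an illustration of the operator shape.
        from typing import Callable, Iterable

        def llm_complete(prompt: str) -> str:
            """Placeholder for a real LLM call; plug in an actual model client here."""
            raise NotImplementedError

        def semantic_filter(rows: Iterable[dict],
                            predicate: str,
                            ask: Callable[[str], str] = llm_complete) -> list[dict]:
            """Keep rows for which the model answers 'yes' to a natural-language predicate.

            Every keep-or-drop decision costs one LLM inference, which is why such
            operators need new optimization strategies (batching, caching, ordering
            cheap relational filters before expensive semantic ones).
            """
            kept = []
            for row in rows:
                prompt = (f"Row: {row}\n"
                          f"Does this row satisfy: '{predicate}'? Answer yes or no.")
                if ask(prompt).strip().lower().startswith("yes"):
                    kept.append(row)
            return kept

        # Usage with a stubbed model, so the sketch runs without any API access:
        rows = [{"title": "Intro to B-trees"}, {"title": "A history of jazz"}]
        def stub(prompt: str) -> str:
            return "yes" if "B-trees" in prompt else "no"
        print(semantic_filter(rows, "the title is about database internals", ask=stub))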


Workshop Schedule (9AM-5PM)

(9-9.15 AM) Workshop Opening: Welcome remarks and overview of the day


  • (9.15-10.30 AM) Keynote 1: AI That Matters: Bridging the Gap Between Research and Business Impact (Ulf Brackmann, Vice President of Engineering at SAP Signavio)


(10.30-11 AM) Coffee Break


(11 AM-12.40 PM) Session 1: Advances in Query Optimization and Execution Planning

  • (11-11.25 AM) SERAG: Self-Evolving RAG System for Query Optimization, Hanwen Liu, Qihan Zhang, University of Southern California, Ryan Marcus, University of Pennsylvania, Ibrahim Sabek, University of Southern California
  • (11.25-11.50 AM) PLANSIEVE: Real-time Suboptimal Query Plan Prediction through Incremental Learning, Asoke Datta, Yesdaulet Izenov, Brian Tsan, Abylay Amanbayev, Florin Rusu, University of California, Merced
  • (11.50 AM-12.15 PM) Grid-AR: A Grid-based Booster for Learned Cardinality Estimation and Range Joins, Damjan Gjurovski, Angjela Davitkova, and Sebastian Michel, RPTU Kaiserslautern-Landau
  • (12.15-12.40 PM) Exploring Next Token Prediction For Optimizing Databases, Yeasir Rayhan and Walid Aref, Purdue University


(12.40-1.30 PM) Lunch Break


  • (1.30-2.45 PM) Keynote 2: Query Optimization in the Age of Semantic Operators and LLMs (Paolo Papotti, Associate Professor at EURECOM)


(2.45-3.30 PM) Coffee Break


(3.30-4.45 PM) Session 2: Data-Driven Systems and Benchmarks for Modern Workloads

  • (3.30-3.55 PM) Data-driven Adaptive Processing of Streaming ML Queries, Phillip Hilliard, Rajeev Alur, Zachary Ives, University of Pennsylvania
  • (3.55-4.20 PM) Filter-Centric Vector Indexing: Geometric Transformation for Efficient Filtered Vector Search, Alireza Heidari, Wei Zhang, Huawei Technologies Co.
  • (4.20-4.45 PM) Redbench: A Benchmark Reflecting Real Workloads, Skander Krid, Mihail Stoian, and Andreas Kipf, University of Technology Nuremberg

(4.45-5 PM) Workshop Closing: Wrap-up, acknowledgments, and final remarks


Organization

Workshop Steering Committee

Workshop Program Chairs

Program Committee

  • Konstantinos Kanellis, University of Wisconsin-Madison
  • Dominik Horn, Amazon Web Services
  • Chenyuan Wu, City University of Hong Kong
  • Pascal Pfeil, TU Munich
  • Xinyu Wang, Microsoft
  • Bolin Ding, Alibaba Group
  • Mohammad Amiri, Stony Brook University
  • Pratyush Agnihotri, TU Darmstadt
  • Maximilian Boether, ETH Zurich
  • Matthias Boehm, TU Berlin
  • Tilmann Rabl, Hasso Plattner Institute
  • Kavitha Srinivas, IBM
  • Zixuan Yi, University of Pennsylvania
  • Rui Dong, University of Michigan

Submission Instructions

Important Dates 

  • Paper Submission: Tuesday, April 1 2025, 12 pm PST (UPDATED)
  • Notification of Acceptance: Friday, 25th April, 2025 (UPDATED)
  • Camera-ready Submission: Monday, 5th May, 2025

Submission Site 

All submissions will be handled electronically via EasyChair.

Formatting Guidelines 

We will use the same document templates as the SIGMOD/PODS'25 conferences (the ACM format). As with SIGMOD/PODS'25, aiDM submissions must be double-blind.

It is the authors' responsibility to ensure that their submissions adhere strictly to the ACM format. In particular, modifying the format in order to squeeze in more material is not allowed. Submissions that do not comply with the formatting detailed here will be rejected without review.

Full papers are limited to 12 pages, with unlimited additional pages for references. Shorter papers (4 or 8 pages) are also encouraged.

All accepted papers will be indexed in the ACM Digital Library and will be available for download via the workshop webpage.