aiDM 2022
Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM)

Co-located with ACM SIGMOD/PODS 2022
Friday, June 17, 2022

Workshop Overview

Recently, the field of Artificial Intelligence (AI) has been experiencing a resurgence. AI covers a wide swath of techniques, including logic-based approaches, probabilistic graphical models, and machine learning approaches such as deep learning. Advances in specialized hardware (e.g., Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Field-Programmable Gate Arrays (FPGAs)), the software ecosystem (e.g., programming languages such as Python, Data Science frameworks, and accelerated ML libraries), and systems infrastructure (e.g., cloud servers with AI accelerators) have led to widespread adoption of AI techniques in a variety of domains. Examples of such domains include image classification, autonomous driving, automatic speech recognition, and conversational systems (e.g., chatbots). AI solutions not only support multiple data types (e.g., images, speech, or text), but are also available in various configurations and settings, from personal devices to large-scale distributed systems.

Despite the wide-ranging techniques and applications of AI, their interactions with data management systems remain in their infancy. For a long time, database management systems have been used simply as repositories for feeding inputs and storing results. Only very recently have we started to see new efforts to use AI techniques in data management systems, e.g., enabling natural language interfaces to relational databases and applying machine learning techniques to query optimization. However, much more needs to be done to fully exploit the power of AI for data management systems and workloads.

aiDM is a one-day workshop that will bring together people from academia and industry to discuss various ways of integrating AI techniques with data management systems. The primary goal of the workshop is to explore opportunities for using AI techniques to enhance various components of data management systems, such as user interfaces, tooling, performance optimization, and support for new query types and workloads. Special emphasis will be given to the transparent exploitation of AI techniques using existing data management infrastructures for enterprise-class workloads. We hope this workshop will identify important areas of research and spur new efforts in this emerging field.

Topics of Interest

The goal of the workshop is to take a holistic view of various AI technologies and investigate how they can be applied to different components of an end-to-end data management pipeline. Special emphasis will be given to how AI techniques can enhance the user experience by reducing complexity in tools, providing new insights, or providing better user interfaces. Topics of interest include, but are not restricted to:

  • Characterizing different AI approaches: logic-based approaches, probabilistic graphical models, and machine learning/deep learning approaches
  • Evaluation of different learning approaches: unsupervised, self-supervised, supervised, or reinforcement learning, transfer learning, zero-shot learning, adversarial networks, and deep probabilistic models
  • New AI-enabled business intelligence (BI) queries for relational databases
  • Natural language enablement (e.g., queries, result summarization, and chatbot interfaces)
  • Explainability and interpretability
  • Fairness of AI-based system components
  • Integration with Data Science and Deep Learning toolkits (e.g., sklearn, TensorFlow, PyTorch, and ONNX)
  • Evaluating quality of approximate results from AI-enabled queries
  • Supporting multiple data types (e.g., images and time-series data)
  • Supporting semi-structured, streaming, and graph databases
  • Reasoning over knowledge bases
  • Data exploration and visualization
  • Integrating structured and unstructured data sources
  • AI-enabled data integration strategies (e.g., entity resolution and schema matching)
  • Reinforcement learning for database tuning
  • Impact of AI on tooling, e.g., ETL or data cleaning
  • Performance implications of AI-enabled queries
  • Case studies of AI-accelerated workloads
  • Social implications of AI-enabled databases (e.g., detection and elimination of bias)
  • Learned data structures, database algorithms or systems components
  • AI-enabled databases for managing and supporting AI workloads
  • AI strategies for data provenance, access control, anomaly detection, and cyber security
  • Experiences with database systems employing AI-enhanced components and interaction among AI-enhanced components

Workshop Schedule (9.15 am - 5.30 pm EST)

Session 1 (9.15-10.30 am EST) (Chair: Donatella Firmani)

  • Introductory Remarks, Donatella Firmani, Department of Statistical Sciences, Sapienza University of Rome

  • (Keynote 1) Nick Koudas, University of Toronto

Coffee Break (10.30-11 am EST)

Session 2 (11 am - 12.30 pm EST) (Chair: Oded Shmueli)

  • Neuroshard: Towards Automatic Multi-objective Sharding with Deep Reinforcement Learning, Tamer Eldeeb, Columbia University, Zhengneng Chen, Seamoney, Asaf Cidon, and Junfeng Yang, Columbia University
  • Machop: an End-to-End Generalized Entity Matching Framework, Jin Wang, Yuliang Li, Wataru Hirota and Eser Kandogan, Megagon Labs
  • GCNSplit: Bounding the State of Streaming Graph Partitioning, Michal Zwolak, Zainab Abbas, Sonia Horchidan, Paris Carbone, KTH Royal Institute of Technology, and Vasiliki Kalavri, Boston University

Lunch Break (12.30 - 2.00 pm EST)

Session 3 (2-3.30 pm EST) (Chair: Yael Amsterdamer)

  • (Keynote 2): IBM Db2 13 for z/OS SQL Data Insights, Jonathan Sloan, IBM
    Jonathan Sloan is currently a portfolio product marketing director focusing on data and AI products on the IBM Z mainframe platform. He excels in helping others understand how to apply advanced analytic and machine learning technology to business problems. Jonathan has experience in a number of industries, with a focus on health care, insurance, financial services, and consumer packaged goods. He is passionate about helping organizations drive greater insight and value from enterprise data. He excels at working directly with customers and providing leadership within team environments.
  • LSI: A Learned Secondary Index Structure, Andreas Kipf, Dominik Horn, Pascal Pfeil, Ryan Marcus and Tim Kraska, MIT CSAIL

Coffee Break (3.30-4 pm EST)

Session 4 (4-5.30 pm EST) (Chair: Rajesh Bordawekar)

  • Micro-architectural Analysis of a Learned Index, Mikkel Møller Andersen, Aalborg University, Copenhagen, and Pinar Tozun, IT University of Copenhagen
  • (Keynote 3): Turning the heat on with MySQL HeatWave, Nipun Agarwal, Oracle
    Nipun Agarwal is Senior Vice President of MySQL, HeatWave and advanced development at Oracle. His interests include data processing, machine learning, and cloud computing. Prior to this role, Nipun was in Oracle Labs, directing a number of research initiatives that were introduced as new products at Oracle, including MySQL HeatWave. Nipun joined Oracle in 1994 with an MS in Computer Science and was in the Oracle database team for several years. He has been awarded over 175 patents.


Workshop Steering Committee

Workshop Program Chairs

Program Committee

  • Andreas Kipf, MIT
  • Alekh Jindal, Microsoft
  • Felix Naumann, HPI
  • Rekha Singhal, Tata Consultancy Services
  • Hong Min, IBM Research
  • Renata Borovica-Gajic, University of Melbourne, Australia
  • Seema Sundara, Oracle
  • Saravanan Thirumuruganathan, QCRI, HBKU
  • Wolfgang Lehner, TU Dresden
  • Yuchen Li, SMU, Singapore

Submission Instructions

Important Dates 

  • Paper Submission: Friday, 11th March 2022, 12 pm PST
  • Notification of Acceptance: Monday, 18th April 2022
  • Camera-ready Submission: Monday, 9th May 2022

Submission Site 

All submissions will be handled electronically via EasyChair.

Formatting Guidelines 

We will use the same document templates as the SIGMOD/PODS'22 conferences (the ACM format).

It is the authors' responsibility to ensure that their submissions adhere strictly to the ACM format. In particular, authors may not modify the format in order to squeeze in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.

A full paper is limited to 12 pages, with unlimited additional pages for references. However, shorter papers (4 or 8 pages) are encouraged as well.

All accepted papers will be indexed in the ACM Digital Library and available for download from the workshop webpage in the Digital Library.