End-to-End Learned Query Optimization for Distributed SQL with Robustness to Schema and Workload Shifts

Authors

  • Andika Pramudito, Universitas Sains Madya Nusantara, Department of Computer Science and Engineering, Jl. Melati No. 48, Kendari, Sulawesi Tenggara, Indonesia
  • Ferdian Ramadhana, Institut Teknologi Citra Andalas, Department of Computer Systems and Networks, Jl. Imam Bonjol No. 103, Bukittinggi, Sumatera Barat, Indonesia

Abstract

Distributed SQL engines rely on query optimizers to translate declarative statements into physical execution plans under changing data, infrastructure, and workload conditions. Classical optimizers combine hand-engineered rules, cardinality estimation, and analytical cost models, but their assumptions can degrade when schemas evolve, when query templates shift, or when runtime behavior diverges from simplified models. Recent learning-based approaches often improve average-case performance on fixed benchmarks while remaining sensitive to out-of-distribution queries and schema drift, and they frequently decouple learning from the end-to-end objective of minimizing realized execution cost. This paper studies end-to-end learned query optimization for distributed SQL with explicit robustness to schema and workload shifts. The approach treats optimization as structured prediction over physical-plan decisions, using neural representations of relational algebra graphs and schema graphs, and trains with objectives aligned to measured or simulated runtime costs while accounting for resource constraints and uncertainty. Robustness is addressed through schema-invariant encodings, shift-aware regularization, conservative uncertainty penalties, and online adaptation mechanisms that avoid catastrophic regressions. The paper also integrates approximate statistics and sketches to reduce communication overhead while bounding estimation error that affects planning. The result is a framework that connects representation learning, differentiable relaxations of discrete plan search, and distributed-systems cost structures, with an evaluation protocol that isolates generalization across schemas and workloads and emphasizes reproducibility in heterogeneous clusters.

Published

2019-01-04

How to Cite

End-to-End Learned Query Optimization for Distributed SQL with Robustness to Schema and Workload Shifts. (2019). International Journal of Advanced Scientific Computation, Modeling, and Simulation, 9(1), 1-18. https://sciencespress.com/index.php/IJASCMS/article/view/2019-01-04