In application domains that store data in a tabular format, a common task is to fill the values of some cells using values stored in other cells. For instance, such data completion tasks arise in the context of \emph{missing value imputation} in data science and \emph{derived data} computation in spreadsheets and relational databases. Unfortunately, end-users and data scientists typically struggle with many data completion tasks that require non-trivial programming expertise.
This paper presents a synthesis technique for automating data completion tasks using \emph{programming-by-example (PBE)} and a very lightweight sketching approach. Given a \emph{formula sketch} (e.g., {\texttt{AVG}}($\texttt{?}_1$, $\texttt{?}_2$)) and a few input-output examples for each hole, our technique synthesizes a program to automate the desired data completion task. Towards this goal, we propose a domain-specific language (DSL) that combines spatial and relational reasoning over tabular data and a novel synthesis algorithm that can generate DSL programs that are consistent with the input-output examples. The key technical novelty of our approach is a new version space learning algorithm that is based on \emph{finite tree automata} (FTA). The use of FTAs in the learning algorithm leads to a more compact representation that allows more sharing between programs that are consistent with the examples. We have implemented the proposed approach in a tool called \textsc{DACE} and evaluated it on 84 benchmarks taken from online help forums. We also illustrate the advantages of our approach by comparing our technique against two existing synthesizers, namely Prose and Sketch.
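To make the setting concrete, the following is a minimal sketch of the sketch-plus-examples workflow for the formula sketch AVG(?1, ?2). It is an illustration only and rests on several assumptions: the toy table, the two program forms (`scan` and `step`), and all function names are hypothetical and are not the DSL of DACE, and the search is a naive enumeration over a handful of candidates rather than the FTA-based version space learning described above.

```python
from statistics import mean

# Hypothetical one-column table; None marks a missing cell.
TABLE = [[10.0], [None], [None], [40.0], [50.0], [None], [70.0]]
DIRECTIONS = {"up": (-1, 0), "down": (1, 0)}

def in_bounds(r, c):
    return 0 <= r < len(TABLE) and 0 <= c < len(TABLE[0])

def make_step(direction, k):
    """Spatial program (assumed form): move exactly k cells in a direction."""
    dr, dc = DIRECTIONS[direction]
    def prog(cell):
        r, c = cell[0] + k * dr, cell[1] + k * dc
        return (r, c) if in_bounds(r, c) else None
    return prog

def make_scan(direction):
    """Relational program (assumed form): first non-missing cell in a direction."""
    dr, dc = DIRECTIONS[direction]
    def prog(cell):
        r, c = cell[0] + dr, cell[1] + dc
        while in_bounds(r, c):
            if TABLE[r][c] is not None:
                return (r, c)
            r, c = r + dr, c + dc
        return None
    return prog

def candidates():
    """Enumerate the (tiny, hypothetical) program space."""
    for d in DIRECTIONS:
        yield f"scan_{d}", make_scan(d)
        for k in (1, 2, 3):
            yield f"step_{d}_{k}", make_step(d, k)

def synthesize(examples):
    """Keep every candidate consistent with all input-output examples."""
    return [(name, p) for name, p in candidates()
            if all(p(inp) == out for inp, out in examples)]

# Examples per hole: (cell to fill) -> (cell the hole should select).
hole1 = synthesize([((1, 0), (0, 0)), ((2, 0), (0, 0))])
hole2 = synthesize([((1, 0), (3, 0)), ((2, 0), (3, 0))])

def complete(table, p1, p2):
    """Instantiate AVG(?1, ?2) at every missing cell of the original table."""
    out = [row[:] for row in table]
    for r, row in enumerate(table):
        for c, v in enumerate(row):
            if v is None:
                a, b = p1((r, c)), p2((r, c))
                if a is not None and b is not None:
                    out[r][c] = mean([table[a[0]][a[1]], table[b[0]][b[1]]])
    return out

FILLED = complete(TABLE, hole1[0][1], hole2[0][1])
```

In this toy setup a single example per hole would leave a fixed-offset program such as `step_up_1` in play alongside `scan_up`; the second example is what rules it out. The enumeration here stores each surviving candidate separately, whereas the approach above represents the set of consistent programs compactly with a finite tree automaton.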
Wed 25 Oct (displayed time zone: Tijuana, Baja California)
15:30 - 17:22
15:30 | 22m | Talk | Model-Assisted Machine-Code Synthesis | OOPSLA | Venkatesh Srinivasan (University of Wisconsin - Madison), Ara Vartanian (University of Wisconsin-Madison, USA), Thomas Reps (University of Wisconsin - Madison and GrammaTech, Inc.)
15:52 | 22m | Talk | Synthesis of Data Completion Scripts using Finite Tree Automata | OOPSLA
16:14 | 22m | Talk | SQLizer: Query Synthesis from Natural Language | OOPSLA | Navid Yaghmazadeh (University of Texas, Austin), Yuepeng Wang (University of Texas at Austin), Işıl Dillig (UT Austin), Thomas Dillig
16:37 | 22m | Talk | Synthesizing Configuration File Specifications with Association Rule Learning | OOPSLA | Mark Santolucito (Yale University), Ennan Zhai (Yale University, USA), Rahul Dhodapkar (MongoDB, USA), Aaron Shim (Microsoft, USA), Ruzica Piskac (Yale University)
16:59 | 22m | Talk | Natural Synthesis of Provably-Correct Data-Structure Manipulations | OOPSLA