Learning to Blame: Localizing Novice Type Errors with Data-Driven Diagnosis (SPLASH 2017 - OOPSLA)

Sun 22 - Fri 27 October 2017 Vancouver, Canada

Who

Eric Seidel, Huma Sibghat, Kamalika Chaudhuri, Westley Weimer, Ranjit Jhala

Track

SPLASH 2017 OOPSLA

When

Wed 25 Oct 2017 13:52 - 14:15 at Regency C - Tools Chair(s): Joshua Sunshine

Abstract

Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data – pairs of ill-typed programs and their "fixed" versions – to automatically learn a model of where the error is most likely to be found. Given a new ill-typed program, Nate executes the model to generate a list of potential blame assignments ranked by likelihood. We evaluate Nate by comparing its precision to the state of the art on a set of over 5,000 ill-typed OCaml programs drawn from two instances of an introductory programming course. We show that when the top-ranked blame assignment is considered, Nate's data-driven model is able to correctly predict the exact sub-expression that should be changed 72% of the time, 28 points higher than OCaml and 16 points higher than the state-of-the-art SHErrLoc tool. Furthermore, Nate's accuracy surpasses 85% when we consider the top two locations and reaches 91% if we consider the top three.
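The ranked-list evaluation described in the abstract (top one, top two, top three locations) can be illustrated with a small sketch. This is not Nate's implementation: the candidate sub-expressions, their scores, and the helper names `rank_blame` and `top_k_accuracy` are all hypothetical, and the sketch assumes a prediction counts as correct when any of the top-k ranked candidates is among the sub-expressions changed in the fixed program.

```python
from typing import Dict, List, Set


def rank_blame(scores: Dict[str, float]) -> List[str]:
    """Order candidate sub-expressions from most to least likely to blame."""
    return sorted(scores, key=lambda expr: scores[expr], reverse=True)


def top_k_accuracy(ranked: List[List[str]], truths: List[Set[str]], k: int) -> float:
    """Fraction of programs where a top-k candidate was actually changed."""
    hits = sum(
        1
        for candidates, changed in zip(ranked, truths)
        if any(expr in changed for expr in candidates[:k])
    )
    return hits / len(ranked)


# Hypothetical model scores per ill-typed program, paired with the
# sub-expressions that the student's fix actually changed.
programs = [
    ({"x + 1": 0.9, "f x": 0.4, "[]": 0.1}, {"x + 1"}),          # hit at rank 1
    ({"g y": 0.7, "y :: ys": 0.6, "0": 0.2}, {"y :: ys"}),       # hit at rank 2
    ({"h z": 0.8, "z": 0.5, "true": 0.3}, {"true"}),             # hit at rank 3
    ({"k 1": 0.9, "k": 0.5, "1": 0.4, "()": 0.1}, {"()"}),       # hit at rank 4
]

ranked = [rank_blame(scores) for scores, _ in programs]
truths = [changed for _, changed in programs]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked, truths, k):.2f}")
```

On this toy data the accuracy rises as more ranked locations are considered, which mirrors the shape of the reported results but says nothing about their actual values.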

DOI

https://doi.org/10.1145/3138818

Eric Seidel

University of California at San Diego, USA

Huma Sibghat

University of California at San Diego, USA

Kamalika Chaudhuri

University of California at San Diego, USA

Westley Weimer

University of Virginia, USA

Ranjit Jhala

University of California at San Diego, USA

Session Program

Wed 25 Oct
Displayed time zone: (GMT-07:00) Tijuana, Baja California

13:30 - 15:00	Tools (OOPSLA) at Regency C. Chair(s): Joshua Sunshine (Carnegie Mellon University)

13:30 22m Talk		Effective Interactive Resolution of Static Analysis Alarms (OOPSLA). Xin Zhang (Massachusetts Institute of Technology, USA), Radu Grigore (University of Kent), Xujie Si (University of Pennsylvania), Mayur Naik (University of Pennsylvania)
13:52 22m Talk		Learning to Blame: Localizing Novice Type Errors with Data-Driven Diagnosis (OOPSLA). Eric Seidel (University of California at San Diego, USA), Huma Sibghat (University of California at San Diego, USA), Kamalika Chaudhuri (University of California at San Diego, USA), Westley Weimer (University of Virginia, USA), Ranjit Jhala (University of California at San Diego, USA)
14:15 22m Talk		Abridging Source Code (OOPSLA). Binhang Yuan (Rice University, USA), Vijayaraghavan Murali (Rice University, USA), Chris Jermaine (Rice University)
14:37 22m Talk		Evaluating and Improving Semistructured Merge (OOPSLA). Guilherme Cavalcanti (Federal University of Pernambuco, Brazil), Paulo Borba (Federal University of Pernambuco, Brazil), Paola Accioly (Federal University of Pernambuco, Brazil)