System Test Failures Diagnosis Using Grammars Generated by Mining Event Logs (Parsing@SLE 2017)

Sun 22 - Fri 27 October 2017 Vancouver, Canada

Who

Stephen Hanka, Frank Coyle

Track

Parsing@SLE 2017

When

Sun 22 Oct 2017 11:00 - 11:30 at Oxford - Session 2, Chair(s): Jurgen Vinju

Abstract

Diagnosing and correcting failures in complex, distributed systems is difficult. In a network of perhaps dozens of nodes, each executing dozens of interacting applications, sometimes from different suppliers or vendors, finding the source of a system failure is confusing, often tedious detective work. The person assigned this task must trace the failing command, event, or operation and find where it deviates from the correct, desired interaction sequence. Once the deviation is identified, the failing applications must be found and the fault or faults traced to the incorrect source code. Often the primary source for tracing failures is the set of event log files generated by the applications on each node. The event logs from several platforms, and from multiple virtual machines on those platforms, must be filtered, merged, correlated, and examined by a human expert. The expert must locate the point of failure within the logs, deduce which interaction or component failed, and reassign the problem to the people responsible for the failing component sets. Those individuals must then, in turn, use the original logs, filtered and merged using different criteria, to find the failing code modules, analyze the cause of the failure, and correct the code or even the architecture of the failing components. The goal of this project is to reduce the human effort involved in diagnosing these test failures through automated mining of the data in the logs. In this paper we explore automatically generating grammars from successful log sequences and using those grammars to identify log entries indicative of failed tests. Such grammars can also be used to mine performance data from the logs and to mine internal operational data that is not normally visible in the user interface and system output.
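
To make the idea of learning a grammar from successful log sequences concrete, here is a minimal Python sketch. It is not the authors' method: it assumes each log has already been reduced to a list of event names, and it stands in a very simple regular grammar (the set of event-to-event transitions seen in passing runs) for whatever grammar class the paper actually infers. The names learn_grammar and first_deviation and the sample events are hypothetical.

```python
from collections import defaultdict

# Sentinel symbols marking the beginning and end of a run (illustrative names).
START, END = "<START>", "<END>"


def learn_grammar(successful_runs):
    """Collect the event-to-event transitions observed in passing runs."""
    allowed = defaultdict(set)
    for run in successful_runs:
        symbols = [START, *run, END]
        for current, following in zip(symbols, symbols[1:]):
            allowed[current].add(following)
    return allowed


def first_deviation(grammar, run):
    """Return (index, event) for the first entry the grammar does not allow.

    The index points into `run`; an index equal to len(run) with event END
    means the run stopped where no passing run ever stopped.
    Returns None when every transition was seen in a passing run.
    """
    symbols = [START, *run, END]
    for index, (current, following) in enumerate(zip(symbols, symbols[1:])):
        if following not in grammar.get(current, set()):
            return index, following
    return None


# Hypothetical event sequences extracted from passing test logs.
passing = [
    ["connect", "auth", "query", "close"],
    ["connect", "auth", "query", "query", "close"],
]
grammar = learn_grammar(passing)

# A failing run: "timeout" never follows "auth" in any passing run.
failing = ["connect", "auth", "timeout", "close"]
deviation = first_deviation(grammar, failing)  # (2, "timeout")
```

A transition set like this only captures adjacent pairs of events, so it accepts some sequences no passing run produced; a richer grammar would be needed to capture nesting or long-range ordering between events.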

Stephen Hanka

Frank Coyle

SMU