Training and Tuning Machine-Learning Applications: A View from the Trenches (SPLASH 2017 - SPLASH-I)

Sun 22 - Fri 27 October 2017 Vancouver, Canada

Who

Matthew Arnold, Harold Ossher

Track

SPLASH 2017 SPLASH-I

When

Thu 26 Oct 2017 16:30 - 17:00 at Regency D - Machine Learning & Data Science
Chair(s): Cristina Cifuentes

Abstract

The software lifecycle of a system or application that includes machine-learning components involves traditional software engineering plus a new element: training and tuning machine-learning models. This is a substantially different activity, involving quite different skills and with its own set of challenges.

Conceptually, training a machine-learning model is simple: just provide it with a ground truth set consisting of a large amount of labeled data (data elements, such as questions, statements or pictures, with associated labels, classes or intents). Research on machine learning usually uses shared, carefully curated benchmark ground truth sets, an excellent approach that both saves effort and enables comparison of algorithms. The situation is quite different, however, when building an industrial application, such as a question-answering system or chatbot for a new domain. Companies do not usually have labeled data, and creating it is very costly and time-consuming. Even when enormous effort is put in, experience has shown that the resulting ground truth is often inadequate and even inconsistent, and does not reflect the reality of the inputs that users type. Figuring out exactly what is wrong and how to fix it is notoriously difficult, and quite different from conventional debugging.
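One kind of inconsistency mentioned above can be illustrated with a minimal sketch: the same input appearing in the ground truth under different labels. The example data, the label names, and the helper functions `normalize` and `find_label_conflicts` are hypothetical, invented here for illustration; they are not taken from the talk or from any particular tool.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different inputs compare equal."""
    return " ".join(text.lower().split())


def find_label_conflicts(ground_truth):
    """Return {normalized_text: sorted labels} for inputs that carry more than one label."""
    labels_by_text = defaultdict(set)
    for text, label in ground_truth:
        labels_by_text[normalize(text)].add(label)
    return {t: sorted(ls) for t, ls in labels_by_text.items() if len(ls) > 1}


# Hypothetical chatbot ground truth (invented for illustration).
ground_truth = [
    ("How do I reset my password?", "account_reset"),
    ("how do I reset my  password?", "login_help"),
    ("Where is my order?", "order_status"),
]

conflicts = find_label_conflicts(ground_truth)
print(conflicts)
```

A check like this only catches exact duplicates after normalization. Near-duplicates and overlapping intent definitions, which are likely the harder cases in practice, would need similarity measures or human review.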

Companies often do have large bodies of recorded user input, such as logs of chat exchanges between customers and human customer service agents or earlier versions of the system. They also collect actual user input to an application during a pilot phase and then on an ongoing basis once the application is in production. This data has the great advantage of being realistic. However, it is often messy and noisy, a far cry from curated datasets. Nonetheless, there is powerful motivation to mine this data, create ground truth from it, and evolve the ground truth based on experience. Approaches like active learning and topic modeling are promising, but creation and maintenance of ground truth from actual interactions is a new area with many challenges.
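As a rough illustration of how active learning might be applied to logged user input, the sketch below uses margin-based uncertainty sampling, one common active-learning strategy and not necessarily the one the speakers use. It assumes a classifier has already produced per-class probabilities for each unlabeled utterance. The utterances, class names, probabilities, and the functions `margin` and `select_for_labeling` are all hypothetical.

```python
def margin(probs):
    """Gap between the two most likely classes; a small gap means the model is unsure."""
    top_two = sorted(probs.values(), reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_for_labeling(predictions, budget):
    """Pick the `budget` utterances with the smallest margin for human labeling."""
    ranked = sorted(predictions, key=lambda item: margin(item[1]))
    return [text for text, _ in ranked[:budget]]


# Hypothetical model outputs on unlabeled log utterances (invented for illustration).
predictions = [
    ("where is my order", {"order_status": 0.95, "account_reset": 0.05}),
    ("cant get in", {"order_status": 0.48, "account_reset": 0.52}),
    ("reset pls", {"order_status": 0.10, "account_reset": 0.90}),
    ("order login broken", {"order_status": 0.55, "account_reset": 0.45}),
]

to_label = select_for_labeling(predictions, budget=2)
print(to_label)
```

The design idea is to spend scarce labeling effort on the inputs the current model finds most ambiguous, rather than labeling the logs uniformly at random.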

At the Cognitive Systems Performance Laboratory at the IBM T. J. Watson Research Center, we participate on a consulting basis in challenging customer engagements, and apply the lessons learned to develop improved approaches and tools. Informed and illustrated (with obfuscation) by our experience, this talk will provide a practical, industry perspective on training and tuning machine-learning components. We will also outline a research agenda for this area.

Matthew Arnold

Harold Ossher, Author

IBM Thomas J. Watson Research Center

United States