SPLASH 2017
Sun 22 - Fri 27 October 2017 Vancouver, Canada
Fri 27 Oct 2017 11:15 - 11:37 at Regency A - Language Design Chair(s): Gregor Richards

Today’s cloud services rely extensively on replication to ensure availability and reliability. In complex datacenter network architectures, however, seemingly independent replica servers may inadvertently share deep dependencies (e.g., aggregation switches). Such unexpected common dependencies can cause correlated failures across an entire replication deployment, undermining the replication effort. Although existing cloud management and diagnosis tools offer post-failure forensics, they typically lead to prolonged failure recovery times. In this paper, we propose a novel language framework, named RepAudit, that prevents correlated failure risks before service outages occur by allowing cloud administrators to proactively audit the replication deployments of interest. In particular, RepAudit consists of three new components: 1) a declarative domain-specific language, RAL, in which cloud administrators write auditing programs expressing diverse auditing tasks; 2) a high-performance RAL auditing engine that generates auditing results by accurately and efficiently analyzing the underlying structures of the target replication deployments; and 3) a RAL-code generator that automatically produces complex RAL programs from easily written specifications. Our evaluation shows that RepAudit can determine the top-20 critical correlated failure root causes in a replication system containing 30,528 devices within 1 minute, a 400x improvement in auditing time over state-of-the-art efforts. To the best of our knowledge, RepAudit is the first effort capable of simultaneously offering expressive, accurate, and efficient correlated failure auditing for cloud-scale replication systems.
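
To make the idea of a shared deep dependency concrete, here is a minimal Python sketch. It is not RAL and not RepAudit's engine, whose syntax and algorithms are not shown on this page; the function names, the flat dependency-set model, and the sample deployment are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def shared_dependencies(deps):
    """Return components used by two or more replicas.

    `deps` maps a replica name to the set of components it depends on.
    This flat model is an assumption made for illustration only.
    """
    users = defaultdict(set)
    for replica, components in deps.items():
        for component in components:
            users[component].add(replica)
    return {c: sorted(r) for c, r in users.items() if len(r) > 1}


def common_to_all(deps):
    """Return components that every replica depends on.

    A failure of any such component alone would affect all replicas.
    """
    if not deps:
        return set()
    return set.intersection(*(set(c) for c in deps.values()))


# Hypothetical deployment: three replicas that look independent at the
# server level but share switches and power sources underneath.
deployment = {
    "replica-1": {"tor-switch-1", "agg-switch-A", "power-1"},
    "replica-2": {"tor-switch-2", "agg-switch-A", "power-2"},
    "replica-3": {"tor-switch-3", "agg-switch-B", "power-2"},
}

if __name__ == "__main__":
    for component, replicas in sorted(shared_dependencies(deployment).items()):
        print(component, "is shared by", ", ".join(replicas))
    print("common to all replicas:", sorted(common_to_all(deployment)))
```

In this toy deployment no single component is common to all three replicas, yet two pairs of replicas share a dependency, which is the kind of hidden overlap the abstract describes as a correlated failure risk.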

Fri 27 Oct

Displayed time zone: Tijuana, Baja California

10:30 - 12:00
Language Design OOPSLA at Regency A
Chair(s): Gregor Richards University of Waterloo
10:30
22m
Talk
Project Snowflake: Non-blocking Safe Manual Memory Management for .NET
OOPSLA
Matthew Parkinson Microsoft Research, UK, Dimitrios Vytiniotis Microsoft Research, Cambridge, Kapil Vaswani Microsoft Research, Manuel Costa Microsoft Research, Pantazis Deligiannis Microsoft Research, Dylan McDermott University of Cambridge, Jonathan Balkind Princeton, USA, Aaron Blankstein Princeton, USA
10:52
22m
Talk
Alpaca: Intermittent Execution without Checkpoints
OOPSLA
Kiwan Maeng Carnegie Mellon University, USA, Alexei Colin Carnegie Mellon University, Brandon Lucia Carnegie Mellon University
11:15
22m
Talk
An Auditing Language for Preventing Correlated Failures in the Cloud
OOPSLA
Ennan Zhai Yale University, USA, Ruzica Piskac Yale University, Ronghui Gu Columbia University, USA, Xun Lao Yale University, USA, Xi Wang Yale University, USA
11:37
22m
Talk
Reliable and Automatic Composition of Language Extensions to C
OOPSLA
Ted Kaminski University of Minnesota, Lucas Kramer University of Minnesota, Travis Carlson University of Minnesota, USA, Eric Van Wyk University of Minnesota, USA