Declarative Failure Recovery for Sensor Networks

Proceedings of the Sixth International Conference on Aspect-Oriented Software Development (AOSD 2007), Vancouver, British Columbia, March 12-16, 2007.
Ramakrishna Gummadi, Nupur Kothari, Todd Millstein, Ramesh Govindan
Wireless sensor networks consist of distributed sensors embedded in the physical world, and promise to allow observation of previously unobservable phenomena. Because they are exposed to unpredictable environments, sensor-network applications must handle a wide variety of faults: software errors, node and link failures, and network partitions. Code that manually detects and recovers from faults crosscuts the entire application, is tedious to implement correctly and efficiently, and is fragile in the face of program modifications. We investigate language support for modularly managing faults. Our insight is that such support can be naturally provided as an extension to existing "macroprogramming" systems for sensor networks. In such a system, a programmer describes a sensor-network application as a centralized program; a compiler then produces equivalent node-level programs. We describe a simple checkpoint API for macroprograms, which can be automatically implemented in a distributed fashion across the network. We also describe declarative annotations that allow programmers to specify checkpointing strategies at a higher level of abstraction. We have implemented our approach in the Kairos macroprogramming system. Experiments show that it improves application availability by an order of magnitude while incurring low messaging overhead.