Mission: Web pages provide access to many critical services (e.g., health care, education, news), and are increasingly accessed by users with diverse network and device resources. Unfortunately, despite the fact that performance and functionality of page loads can vary drastically across resource settings, page loads today minimally adapt to their execution environments. This results in either underutilized resources or broken (incomplete) pages. This project aims to develop a new web paradigm called Adaptive Web Execution (AWE), in which page loads directly adapt their execution or content according to the available resources. The key goal is to maximize the performance and functionality that a web page can offer a user based on that user’s resource availability.
Floo: Automatic, Lightweight Memoization for Faster Mobile Apps
Murali Ramanujam, Helen Chen, Shaghayegh Mardani, Ravi Netravali
MobiSys 2022
Source code
Abstract: Owing to growing feature sets and sluggish improvements to smartphone CPUs
(relative to mobile networks), mobile app response times have increasingly
become bottlenecked on client-side computations. In designing a solution to
this emerging issue, our primary insight is that app computations exhibit
substantial stability over time in that they are entirely performed in
rarely-updated codebases within app binaries and the OS. Building on this, we
present Floo, a system that aims to automatically reuse (or memoize)
computation results during app operation in an effort to reduce the amount of
compute needed to handle user interactions. To ensure practicality – the
struggle with any memoization effort – in the face of limited mobile device
resources and the short-lived nature of each app computation, Floo embeds
several new techniques that collectively enable it to mask cache lookup
overheads and ensure high cache hit rates, all the while guaranteeing
correctness for any reused computations. Across a wide range of apps, live
networks, phones, and interaction traces, Floo reduces median and 95th
percentile interaction response times by 32.7% and 72.3%.
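The reuse loop at the heart of Floo can be illustrated with a generic memoization sketch (plain Python with invented names; Floo itself operates on computations inside app binaries, and additionally masks lookup overheads, which this toy omits):

```python
import functools
import hashlib
import pickle

def memoize(func):
    """Cache results keyed by a fingerprint of the inputs; a hit is only
    valid when the inputs match a prior call exactly, so a reused result
    equals what a fresh execution would produce."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            pickle.dumps((args, sorted(kwargs.items())))
        ).hexdigest()
        if key not in cache:
            cache[key] = func(*args, **kwargs)   # cold: compute and store
            wrapper.misses += 1
        else:
            wrapper.hits += 1                    # warm: reuse cached result
        return cache[key]

    wrapper.hits = 0
    wrapper.misses = 0
    return wrapper

@memoize
def layout_cost(widths):
    # Stand-in for an expensive, deterministic app computation.
    return sum(w * w for w in widths)

layout_cost((1, 2, 3))   # miss: computed and cached
layout_cost((1, 2, 3))   # hit: result reused
```

The practical challenges Floo tackles are exactly what this sketch glosses over: keeping the fingerprinting cheaper than the computation itself, and guaranteeing the computation is actually safe to reuse.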
Jawa: Web Archival in the Era of JavaScript
Ayush Goel, Jingyuan Zhu, Ravi Netravali, Harsha Madhyastha
OSDI 2022
Source code
Abstract: By repeatedly crawling and saving web pages over
time, web archives (such as the Internet Archive) enable users
to visit historical versions of any page. In this paper, we point
out that existing web archives are not well designed to cope
with the widespread presence of JavaScript on the web. Some
archives store petabytes of JavaScript code, and yet many pages
render incorrectly when users load them. Other archives which
store the end-state of page loads (e.g., screen captures) break
post-load interactions implemented in JavaScript.
To address these problems, we present Jawa, a new design for
web archives which significantly reduces the storage necessary
to save modern web pages while also improving the fidelity with
which archived pages are served. Key to enabling Jawa’s use at
scale are our observations on a) the forms of non-determinism
which impair the execution of JavaScript on archived pages,
and b) the ways in which JavaScript’s execution fundamentally
differs between live web pages and their archived copies. On a
corpus of 1 million archived pages, Jawa reduces overall storage
needs by 41% compared to the techniques currently used
by the Internet Archive.
Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation
Nikhil Kansal, Murali Ramanujam, Ravi Netravali
OSDI 2021
Source code
Abstract: Web pages today commonly include large amounts of
JavaScript code in order to offer users a dynamic experience. These scripts
often make pages slow to load, partly due to a fundamental inefficiency in how
browsers process JavaScript content: browsers make it easy for web developers
to reason about page state by serially executing all scripts on any frame in a
page, but as a result, fail to leverage the multiple CPU cores that are readily
available even on low-end phones. In this paper, we show how to address this
inefficiency without requiring pages to be rewritten or browsers to be
modified. The key to our solution, Horcrux, is to account for the
non-determinism intrinsic to web page loads and the constraints placed by the
browser’s API for parallelism. Horcrux-compliant web servers perform offline
analysis of all the JavaScript code on any frame they serve to conservatively
identify, for every JavaScript function, the union of the page state that the
function could access across all loads of that page. Horcrux’s JavaScript
scheduler then uses this information to judiciously parallelize JavaScript
execution on the client-side so that the end-state is identical to that of a
serial execution, while minimizing coordination and offloading overheads.
Across a wide range of pages, phones, and mobile networks covering web
workloads in both developed and emerging regions, Horcrux reduces median
browser computation delays by 31-44% and page load times by 18-37%.
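The conservative scheduling idea can be sketched in a few lines (a hypothetical Python model; Horcrux operates on real JavaScript heap state, not string labels): batch functions whose possible state sets are pairwise disjoint, and start a new batch whenever a conflict would otherwise reorder dependent work.

```python
def schedule(functions):
    """functions: list of (name, state_set) in original serial order.
    Returns batches; functions within a batch touch disjoint state, so
    they can run in parallel, while conflicting functions stay in serial
    batch order and the end state matches a fully serial execution."""
    batches = []
    current = []
    for name, state in functions:
        if any(state & other for _, other in current):
            batches.append(current)      # conflict: close the batch
            current = [(name, state)]
        else:
            current.append((name, state))
    if current:
        batches.append(current)
    return batches

fns = [
    ("renderHeader", {"dom.header"}),
    ("initAds",      {"window.adQueue"}),
    ("renderBody",   {"dom.header", "dom.body"}),  # conflicts with renderHeader
    ("trackView",    {"window.analytics"}),
]
schedule(fns)
```

Here renderHeader and initAds run concurrently, while renderBody starts a new batch because its (conservatively over-approximated) state overlaps renderHeader's.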
Marauder: Synergized Caching and Prefetching for Low-Risk Mobile App Acceleration
Nikhil Kansal, Murali Ramanujam, Ravi Netravali
MobiSys 2021
Source code
Abstract: Low interaction response times are crucial to the
experience that mobile apps provide for their users. Unfortunately, existing
strategies to alleviate the network latencies that hinder app responsiveness
fall short in practice. In particular, caching is plagued by challenges in
setting expiration times that match when a resource’s content changes, while
prefetching hinges on accurate predictions of user behavior that have proven
elusive. We present Marauder, a system that synergizes caching and prefetching
to improve the speedups achieved by each technique while avoiding their
inherent limitations. Key to Marauder is our observation that, like web pages,
apps handle interactions by downloading and parsing structured text resources
that entirely list (i.e., without needing to consult app binaries) the set of
other resources to load. Building on this, Marauder introduces two low-risk
optimizations that operate directly out of the app’s cache. First, guided by cached text
files, Marauder prefetches referenced resources during an already-triggered
interaction. Second, to improve the efficacy of cached content, Marauder
judiciously prefetches about-to-expire resources, extending cache lives for
unchanged resources, and downloading updates for lightweight (but crucial) text
files. Across a wide range of apps, live networks, interaction traces, and
phones, Marauder reduces median and 90th percentile interaction response times
by 27.4% and 43.5%, while increasing data usage by only 18%.
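The second optimization, refreshing about-to-expire entries, can be sketched with a toy cache (hypothetical class and thresholds, not Marauder's API):

```python
import time

class RefreshingCache:
    """Toy cache that proactively re-downloads entries nearing expiration,
    so a later interaction can be served entirely from cache."""

    def __init__(self, fetch, ttl=60.0, refresh_window=10.0):
        self.fetch = fetch              # url -> content (network download)
        self.ttl = ttl                  # seconds an entry stays fresh
        self.refresh_window = refresh_window
        self.entries = {}               # url -> (content, expiry_time)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        content, expiry = self.entries.get(url, (None, 0.0))
        stale = content is None or now >= expiry
        near_expiry = not stale and (expiry - now) <= self.refresh_window
        if stale or near_expiry:
            # Download (or revalidate) and extend the entry's cache life.
            content = self.fetch(url)
            self.entries[url] = (content, now + self.ttl)
        return content

calls = []
def fetch(url):
    calls.append(url)
    return "<json manifest>"

cache = RefreshingCache(fetch, ttl=60.0, refresh_window=10.0)
cache.get("app/feed.json", now=0.0)    # cold miss: downloaded
cache.get("app/feed.json", now=30.0)   # fresh: served from cache
cache.get("app/feed.json", now=55.0)   # 5s from expiry: proactively refreshed
```

After the refresh at t=55, the entry stays servable from cache until t=115 without any on-demand network fetch.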
Rethinking Client-Side Caching for the Mobile Web
Ayush Goel, Vaspol Ruamviboonsuk, Ravi Netravali, Harsha V. Madhyastha
HotMobile 2021
Abstract: Mobile web browsing remains slow despite many
efforts to accelerate page loads. Like others, we find that client-side
computation (in particular, JavaScript execution) is a key culprit. Prior
solutions to mitigate computation overheads, however, suffer from security,
privacy, and deployability issues, hindering their adoption. To sidestep these
issues, we propose a browser-based solution in which every client reuses
identical computations from its prior page loads. Our analysis across roughly
230 pages reveals that, even on a modern smartphone, such an approach could
reduce client-side computation by a median of 49% on pages which are most in
need of such optimizations.
Alohamora: Reviving HTTP/2 Push and Preload by Adapting Policies On the Fly
Nikhil Kansal, Murali Ramanujam, Ravi Netravali
NSDI 2021
Source code
Abstract: Despite their promise, HTTP/2's server push and preload
features have seen minimal adoption. The reason is that the efficacy of a
push/preload policy depends on subtle relationships between page content,
browser state, device resources, and network conditions; static policies that
generalize across environments remain elusive. We present Alohamora, a system
that uses Reinforcement Learning to learn (and apply) the appropriate
push/preload policy for a given page load based on inputs characterizing the
page structure and execution environment. To ensure practical training despite
the large number of pages served by a site and the massive space of potential
policies to consider for a given page, Alohamora introduces several key
innovations: a page clustering strategy that favorably balances push/preload
insight extraction with the number of pages required for training, and a
faithful page load simulator that can evaluate a policy in several milliseconds
(compared to 10s of seconds with a real browser). Experiments across a wide
range of pages and mobile environments (emulation and real-world) reveal that
Alohamora accelerates page loads by 19-61%, provides 3.6-4x more benefits than
recent push/preload systems, and properly adapts to never degrade performance.
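Why a millisecond-scale simulator matters can be seen with a deliberately crude cost model (entirely invented here; Alohamora's simulator models real browser fetch and compute dependencies): once evaluating a policy is cheap, many candidate push sets can be scored per page instead of timing each in a real browser.

```python
import itertools

def simulate_load(dependencies, pushed, rtt_ms=100):
    # Toy model: each resource discovered over the network costs one RTT
    # unless it was pushed alongside the initial HTML response.
    return rtt_ms * sum(1 for r in dependencies if r not in pushed)

def best_policy(dependencies, max_push):
    # Exhaustively score every candidate push set of the given size and
    # keep the one with the lowest simulated load time.
    candidates = itertools.combinations(dependencies, max_push)
    return min(candidates, key=lambda p: simulate_load(dependencies, set(p)))

chain = ["app.js", "style.css", "fonts.woff2", "hero.jpg"]
best_policy(chain, 2)   # cheapest 2-resource push set under the toy model
```

In practice the policy space is far too large for brute force, which is why Alohamora pairs the simulator with reinforcement learning rather than enumeration.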
WebMedic: Disentangling the Memory--Functionality Tension for the Next Billion Mobile Web Users
Usama Naseer, Theophilus Benson, Ravi Netravali
HotMobile 2021
Abstract: Users in developing regions still suffer from poor web performance,
mainly due to their unique landscape of low-end devices. In this
paper, we uncover a root cause of this suboptimal performance by
cross-analyzing longitudinal resource (in particular, memory) profiles
from a large social network, and the memory consumption of modern
webpages in five regions. We discover that the primary culprit for
hitting memory constraints is JavaScript execution which existing
optimizations are ill-suited to alleviate. To handle this, we propose
WebMedic, an approach that trades off less critical functionality of
a webpage to directly address memory and performance problems.
Semeru: A Memory-Disaggregated Managed Runtime
Chenxi Wang, Haoran Ma, Shi Liu, Yuanqi Li, Zhenyuan Ruan, Khanh Nguyen, Michael Bond, Ravi Netravali, Miryung Kim, Harry Xu
OSDI 2020
Source code
Abstract: Resource-disaggregated architectures have risen in popularity
for large datacenters. However, prior disaggregation systems are designed for
native applications; in addition, all of them require applications to possess
excellent locality to be efficiently executed. In contrast, programs written in
managed languages are subject to periodic garbage collection (GC), which is a
typical graph workload with poor locality. Although most datacenter
applications are written in managed languages, current systems are far from
delivering acceptable performance for these applications. This paper presents
Semeru, a distributed JVM that can dramatically improve the performance of
managed cloud applications in a memory-disaggregated environment. Its design
possesses three major innovations: (1) a universal Java heap, which provides a
unified abstraction of virtual memory across CPU and memory servers and allows
any legacy program to run without modifications; (2) a distributed GC, which
offloads object tracing to memory servers so that tracing is performed closer
to data; and (3) a swap system in the OS kernel that works with the runtime to
swap page data efficiently. An evaluation of Semeru on a set of widely-deployed
systems shows very promising results.
Mind the Delay: The Adverse Effects of Delay-Based TCP on HTTP
Neil Agarwal, Matteo Varvello, Andrius Aucinas, James Newman, Fabian Bustamante, Ravi Netravali
CoNEXT 2020
Source code
Abstract: The last three decades have seen much evolution in web and network
protocols: amongst them, a transition from HTTP/1.1 to HTTP/2 and a shift from
loss-based to delay-based TCP congestion control algorithms. This paper argues
that these two trends come at odds with one another, ultimately hurting web
performance. Using a controlled synthetic study, we show how delay-based
congestion control protocols (e.g., BBR and CUBIC + Hybrid Slow Start) result
in the underestimation of the available congestion window in mobile networks,
and how that dramatically hampers the effectiveness of HTTP/2. To quantify the
impact of this finding on the current web, we evolve the web performance
toolbox in two ways. First, we develop Igor, a client-side TCP congestion
control detection tool that can differentiate between loss-based and
delay-based algorithms by focusing on their behavior during slow start. Second,
we develop a Chromium patch which allows fine-grained control on the HTTP
version to be used per domain. Using these new web performance tools, we
analyze over 300 real websites and find that 67% of sites relying solely on
delay-based congestion control algorithms have better performance with
HTTP/1.1.
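The slow-start heuristic behind Igor's classification can be illustrated with a toy trace classifier (an invented model with invented thresholds, not Igor's actual algorithm): find where exponential window growth stops, then check what the sender reacted to.

```python
def classify(rounds):
    """rounds: list of (cwnd_segments, rtt_ms, saw_loss), one per RTT round.
    Toy rule: a loss-based sender keeps doubling its window until a loss,
    while a delay-based sender (or one using Hybrid Slow Start) backs off
    once RTTs inflate, before any loss occurs."""
    base_rtt = rounds[0][1]
    prev_cwnd = rounds[0][0]
    for cwnd, rtt, saw_loss in rounds[1:]:
        if cwnd > prev_cwnd * 1.5:      # still roughly doubling
            prev_cwnd = cwnd
            continue
        if saw_loss:                    # growth stopped at a loss event
            return "loss-based"
        if rtt > 1.5 * base_rtt:        # growth stopped as RTTs inflated
            return "delay-based"
        return "unknown"
    return "unknown"

loss_trace  = [(10, 50, False), (20, 50, False), (40, 55, False),
               (80, 90, False), (80, 95, True)]
delay_trace = [(10, 50, False), (20, 50, False), (40, 80, False),
               (45, 85, False)]
```

The real tool must of course infer these signals from client-side packet timings rather than being handed per-round ground truth.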
Continuous Prefetch for Interactive Data Applications
Haneen Mohammed, Ziyun Wei, Eugene Wu, Ravi Netravali
VLDB 2020
Source code
Abstract: Interactive data visualization and exploration (DVE) applications are often network-bottlenecked due to bursty request
patterns, large response sizes, and heterogeneous deployments over a range of
networks and devices. This makes it difficult to ensure consistently low
response times. Khameleon is a framework for DVE
applications that uses a novel combination of prefetching and
response tuning to dynamically trade-off response quality for
low latency. Khameleon exploits DVE's approximation tolerance:
immediate lower-quality responses are preferable to waiting for
complete results. To this end, Khameleon progressively encodes
responses, and runs a server-side scheduler that proactively
streams portions of responses using available bandwidth to
maximize user-perceived interactivity. The scheduler involves a
complex optimization based on available resources, predicted
user interactions, and response quality levels; yet, decisions
must also be made in real-time. To overcome this, Khameleon
uses a fast greedy heuristic that closely approximates the
optimal approach. Using image exploration and visualization
applications with real user interaction traces, we show that
across a wide range of network and client resource conditions,
Khameleon outperforms existing prefetching approaches that
benefit from perfect prediction models: Khameleon always lowers
response latencies (typically by 2-3 orders of magnitude) while
keeping response quality within 50-80%.
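The greedy heuristic can be sketched as follows (a toy model with invented utilities, not Khameleon's scheduler): repeatedly send the progressive chunk with the highest expected quality gain per byte, where "expected" weights each predicted request by its probability.

```python
import heapq

def plan(requests, budget_bytes):
    """requests: {name: (probability, [(chunk_bytes, quality_gain), ...])},
    where each response is progressively encoded with diminishing gains.
    Returns the (name, chunk_index) pairs to stream this window, chosen by
    expected quality gain per byte; chunks of a request stay in order."""
    heap = []   # (-utility_per_byte, name, chunk_index)
    for name, (prob, chunks) in requests.items():
        size, gain = chunks[0]
        heapq.heappush(heap, (-(prob * gain / size), name, 0))
    sent = []
    while heap and budget_bytes > 0:
        _, name, i = heapq.heappop(heap)
        prob, chunks = requests[name]
        size, gain = chunks[i]
        if size > budget_bytes:
            continue                    # next chunk no longer fits: drop it
        budget_bytes -= size
        sent.append((name, i))
        if i + 1 < len(chunks):         # unlock the next progressive chunk
            nsize, ngain = chunks[i + 1]
            heapq.heappush(heap, (-(prob * ngain / nsize), name, i + 1))
    return sent

requests = {
    "zoom_left":  (0.7, [(100, 10), (100, 4)]),
    "zoom_right": (0.3, [(100, 10), (100, 4)]),
}
plan(requests, 300)
```

Under a 300-byte budget, both likely requests get a base-quality chunk before the more probable one is upgraded, which is the quality-for-latency trade-off described above.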