Mission: Web pages provide access to many critical services (e.g., health care, education, news), and are increasingly accessed by users with diverse network and device resources. Unfortunately, despite the fact that performance and functionality of page loads can vary drastically across resource settings, page loads today minimally adapt to their execution environments. This results in either underutilized resources or broken (incomplete) pages. This project aims to develop a new web paradigm called Adaptive Web Execution (AWE), in which page loads directly adapt their execution or content according to the available resources. The key goal is to maximize the performance and functionality that a web page can offer a user based on that user’s resource availability.
Floo: Automatic, Lightweight Memoization for Faster Mobile Apps
Murali Ramanujam, Helen Chen, Shaghayegh Mardani, Ravi Netravali
MobiSys 2022
Source code
Abstract: Owing to growing feature sets and sluggish improvements to smartphone CPUs
(relative to mobile networks), mobile app response times have increasingly
become bottlenecked on client-side computations. In designing a solution to
this emerging issue, our primary insight is that app computations exhibit
substantial stability over time in that they are entirely performed in
rarely-updated codebases within app binaries and the OS. Building on this, we
present Floo, a system that aims to automatically reuse (or memoize)
computation results during app operation in an effort to reduce the amount of
compute needed to handle user interactions. To ensure practicality – the
struggle with any memoization effort – in the face of limited mobile device
resources and the short-lived nature of each app computation, Floo embeds
several new techniques that collectively enable it to mask cache lookup
overheads and ensure high cache hit rates, all the while guaranteeing
correctness for any reused computations. Across a wide range of apps, live
networks, phones, and interaction traces, Floo reduces median and 95th
percentile interaction response times by 32.7% and 72.3%.
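The reuse loop at the heart of Floo can be illustrated with a generic memoization sketch (plain Python with invented names; Floo itself operates on computations inside app binaries, and additionally masks lookup overheads, which this toy omits):

```python
import functools
import hashlib
import pickle

def memoize(func):
    """Cache results keyed by a fingerprint of the inputs; a hit is only
    valid when the inputs match a prior call exactly, so a reused result
    equals what a fresh execution would produce."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            pickle.dumps((args, sorted(kwargs.items())))
        ).hexdigest()
        if key not in cache:
            cache[key] = func(*args, **kwargs)   # cold: compute and store
            wrapper.misses += 1
        else:
            wrapper.hits += 1                    # warm: reuse cached result
        return cache[key]

    wrapper.hits = 0
    wrapper.misses = 0
    return wrapper

@memoize
def layout_cost(widths):
    # Stand-in for an expensive, deterministic app computation.
    return sum(w * w for w in widths)

layout_cost((1, 2, 3))   # miss: computed and cached
layout_cost((1, 2, 3))   # hit: result reused
```

The practical challenges Floo tackles are exactly what this sketch glosses over: keeping the fingerprinting cheaper than the computation itself, and guaranteeing the computation is actually safe to reuse.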
Jawa: Web Archival in the Era of JavaScript
Ayush Goel, Jingyuan Zhu, Ravi Netravali, Harsha Madhyastha
OSDI 2022
Source code
Abstract: By repeatedly crawling and saving web pages over
time, web archives (such as the Internet Archive) enable users
to visit historical versions of any page. In this paper, we point
out that existing web archives are not well designed to cope
with the widespread presence of JavaScript on the web. Some
archives store petabytes of JavaScript code, and yet many pages
render incorrectly when users load them. Other archives which
store the end-state of page loads (e.g., screen captures) break
post-load interactions implemented in JavaScript.
To address these problems, we present Jawa, a new design for
web archives which significantly reduces the storage necessary
to save modern web pages while also improving the fidelity with
which archived pages are served. Key to enabling Jawa’s use at
scale are our observations on a) the forms of non-determinism
which impair the execution of JavaScript on archived pages,
and b) the ways in which JavaScript’s execution fundamentally
differs between live web pages and their archived copies. On a
corpus of 1 million archived pages, Jawa reduces overall storage
needs by 41% compared to the techniques currently used
by the Internet Archive.
Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation
Nikhil Kansal, Murali Ramanujam, Ravi Netravali
OSDI 2021
Source code
Abstract: Web pages today commonly include large amounts of
JavaScript code in order to offer users a dynamic experience. These scripts
often make pages slow to load, partly due to a fundamental inefficiency in how
browsers process JavaScript content: browsers make it easy for web developers
to reason about page state by serially executing all scripts on any frame in a
page, but as a result, fail to leverage the multiple CPU cores that are readily
available even on low-end phones. In this paper, we show how to address this
inefficiency without requiring pages to be rewritten or browsers to be
modified. The key to our solution, Horcrux, is to account for the
non-determinism intrinsic to web page loads and the constraints placed by the
browser’s API for parallelism. Horcrux-compliant web servers perform offline
analysis of all the JavaScript code on any frame they serve to conservatively
identify, for every JavaScript function, the union of the page state that the
function could access across all loads of that page. Horcrux’s JavaScript
scheduler then uses this information to judiciously parallelize JavaScript
execution on the client-side so that the end-state is identical to that of a
serial execution, while minimizing coordination and offloading overheads.
Across a wide range of pages, phones, and mobile networks covering web
workloads in both developed and emerging regions, Horcrux reduces median
browser computation delays by 31-44% and page load times by 18-37%.
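The conservative scheduling idea can be sketched in a few lines (a hypothetical Python model; Horcrux operates on real JavaScript heap state, not string labels): batch functions whose possible state sets are pairwise disjoint, and start a new batch whenever a conflict would otherwise reorder dependent work.

```python
def schedule(functions):
    """functions: list of (name, state_set) in original serial order.
    Returns batches; functions within a batch touch disjoint state, so
    they can run in parallel, while conflicting functions stay in serial
    batch order and the end state matches a fully serial execution."""
    batches = []
    current = []
    for name, state in functions:
        if any(state & other for _, other in current):
            batches.append(current)      # conflict: close the batch
            current = [(name, state)]
        else:
            current.append((name, state))
    if current:
        batches.append(current)
    return batches

fns = [
    ("renderHeader", {"dom.header"}),
    ("initAds",      {"window.adQueue"}),
    ("renderBody",   {"dom.header", "dom.body"}),  # conflicts with renderHeader
    ("trackView",    {"window.analytics"}),
]
schedule(fns)
```

Here renderHeader and initAds run concurrently, while renderBody starts a new batch because its (conservatively over-approximated) state overlaps renderHeader's.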
Marauder: Synergized Caching and Prefetching for Low-Risk Mobile App Acceleration
Nikhil Kansal, Murali Ramanujam, Ravi Netravali
MobiSys 2021
Source code
Abstract: Low interaction response times are crucial to the
experience that mobile apps provide for their users. Unfortunately, existing
strategies to alleviate the network latencies that hinder app responsiveness
fall short in practice. In particular, caching is plagued by challenges in
setting expiration times that match when a resource’s content changes, while
prefetching hinges on accurate predictions of user behavior that have proven
elusive. We present Marauder, a system that synergizes caching and prefetching
to improve the speedups achieved by each technique while avoiding their
inherent limitations. Key to Marauder is our observation that, like web pages,
apps handle interactions by downloading and parsing structured text resources
that entirely list (i.e., without needing to consult app binaries) the set of
other resources to load. Building on this, Marauder introduces two low-risk
optimizations that operate directly out of the app’s cache. First, guided by cached text
files, Marauder prefetches referenced resources during an already-triggered
interaction. Second, to improve the efficacy of cached content, Marauder
judiciously prefetches about-to-expire resources, extending cache lives for
unchanged resources, and downloading updates for lightweight (but crucial) text
files. Across a wide range of apps, live networks, interaction traces, and
phones, Marauder reduces median and 90th percentile interaction response times
by 27.4% and 43.5%, while increasing data usage by only 18%.
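The second optimization, refreshing about-to-expire entries, can be sketched with a toy cache (hypothetical class and thresholds, not Marauder's API):

```python
import time

class RefreshingCache:
    """Toy cache that proactively re-downloads entries nearing expiration,
    so a later interaction can be served entirely from cache."""

    def __init__(self, fetch, ttl=60.0, refresh_window=10.0):
        self.fetch = fetch              # url -> content (network download)
        self.ttl = ttl                  # seconds an entry stays fresh
        self.refresh_window = refresh_window
        self.entries = {}               # url -> (content, expiry_time)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        content, expiry = self.entries.get(url, (None, 0.0))
        stale = content is None or now >= expiry
        near_expiry = not stale and (expiry - now) <= self.refresh_window
        if stale or near_expiry:
            # Download (or revalidate) and extend the entry's cache life.
            content = self.fetch(url)
            self.entries[url] = (content, now + self.ttl)
        return content

calls = []
def fetch(url):
    calls.append(url)
    return "<json manifest>"

cache = RefreshingCache(fetch, ttl=60.0, refresh_window=10.0)
cache.get("app/feed.json", now=0.0)    # cold miss: downloaded
cache.get("app/feed.json", now=30.0)   # fresh: served from cache
cache.get("app/feed.json", now=55.0)   # 5s from expiry: proactively refreshed
```

After the refresh at t=55, the entry stays servable from cache until t=115 without any on-demand network fetch.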
Rethinking Client-Side Caching for the Mobile Web
Ayush Goel, Vaspol Ruamviboonsuk, Ravi Netravali, Harsha V. Madhyastha
HotMobile 2021
Abstract: Mobile web browsing remains slow despite many
efforts to accelerate page loads. Like others, we find that client-side
computation (in particular, JavaScript execution) is a key culprit. Prior
solutions to mitigate computation overheads, however, suffer from security,
privacy, and deployability issues, hindering their adoption. To sidestep these
issues, we propose a browser-based solution in which every client reuses
identical computations from its prior page loads. Our analysis across roughly
230 pages reveals that, even on a modern smartphone, such an approach could
reduce client-side computation by a median of 49% on pages which are most in
need of such optimizations.
Alohamora: Reviving HTTP/2 Push and Preload by Adapting Policies On the Fly
Nikhil Kansal, Murali Ramanujam, Ravi Netravali
NSDI 2021
Source code
Abstract: Despite their promise, HTTP/2's server push and preload
features have seen minimal adoption. The reason is that the efficacy of a
push/preload policy depends on subtle relationships between page content,
browser state, device resources, and network conditions; static policies that
generalize across environments remain elusive. We present Alohamora, a system
that uses Reinforcement Learning to learn (and apply) the appropriate
push/preload policy for a given page load based on inputs characterizing the
page structure and execution environment. To ensure practical training despite
the large number of pages served by a site and the massive space of potential
policies to consider for a given page, Alohamora introduces several key
innovations: a page clustering strategy that favorably balances push/preload
insight extraction with the number of pages required for training, and a
faithful page load simulator that can evaluate a policy in several milliseconds
(compared to 10s of seconds with a real browser). Experiments across a wide
range of pages and mobile environments (emulation and real-world) reveal that
Alohamora accelerates page loads by 19-61%, provides 3.6-4x more benefits than
recent push/preload systems, and properly adapts to never degrade performance.
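Why a millisecond-scale simulator matters can be seen with a deliberately crude cost model (entirely invented here; Alohamora's simulator models real browser fetch and compute dependencies): once evaluating a policy is cheap, many candidate push sets can be scored per page instead of timing each in a real browser.

```python
import itertools

def simulate_load(dependencies, pushed, rtt_ms=100):
    # Toy model: each resource discovered over the network costs one RTT
    # unless it was pushed alongside the initial HTML response.
    return rtt_ms * sum(1 for r in dependencies if r not in pushed)

def best_policy(dependencies, max_push):
    # Exhaustively score every candidate push set of the given size and
    # keep the one with the lowest simulated load time.
    candidates = itertools.combinations(dependencies, max_push)
    return min(candidates, key=lambda p: simulate_load(dependencies, set(p)))

chain = ["app.js", "style.css", "fonts.woff2", "hero.jpg"]
best_policy(chain, 2)   # cheapest 2-resource push set under the toy model
```

In practice the policy space is far too large for brute force, which is why Alohamora pairs the simulator with reinforcement learning rather than enumeration.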
WebMedic: Disentangling the Memory--Functionality Tension for the Next Billion Mobile Web Users
Usama Naseer, Theophilus Benson, Ravi Netravali
HotMobile 2021
Abstract: Users in developing regions still suffer from poor web performance,
mainly due to their unique landscape of low-end devices. In this
paper, we uncover a root cause of this suboptimal performance by
cross-analyzing longitudinal resource (in particular, memory) profiles
from a large social network, and the memory consumption of modern
webpages in five regions. We discover that the primary culprit for
hitting memory constraints is JavaScript execution which existing
optimizations are ill-suited to alleviate. To handle this, we propose
WebMedic, an approach that trades off less critical functionality of
a webpage to directly address memory and performance problems.
Semeru: A Memory-Disaggregated Managed Runtime
Chenxi Wang, Haoran Ma, Shi Liu, Yuanqi Li, Zhenyuan Ruan, Khanh Nguyen, Michael Bond, Ravi Netravali, Miryung Kim, Harry Xu
OSDI 2020
Source code
Abstract: Resource-disaggregated architectures have risen in popularity
for large datacenters. However, prior disaggregation systems are designed for
native applications; in addition, all of them require applications to possess
excellent locality to be efficiently executed. In contrast, programs written in
managed languages are subject to periodic garbage collection (GC), which is a
typical graph workload with poor locality. Although most datacenter
applications are written in managed languages, current systems are far from
delivering acceptable performance for these applications. This paper presents
Semeru, a distributed JVM that can dramatically improve the performance of
managed cloud applications in a memory-disaggregated environment. Its design
possesses three major innovations: (1) a universal Java heap, which provides a
unified abstraction of virtual memory across CPU and memory servers and allows
any legacy program to run without modifications; (2) a distributed GC, which
offloads object tracing to memory servers so that tracing is performed closer
to data; and (3) a swap system in the OS kernel that works with the runtime to
swap page data efficiently. An evaluation of Semeru on a set of widely-deployed
systems shows very promising results.
Mind the Delay: The Adverse Effects of Delay-Based TCP on HTTP
Neil Agarwal, Matteo Varvello, Andrius Aucinas, James Newman, Fabian Bustamante, Ravi Netravali
CoNEXT 2020
Source code
Abstract: The last three decades have seen much evolution in web and network
protocols: amongst them, a transition from HTTP/1.1 to HTTP/2 and a shift from
loss-based to delay-based TCP congestion control algorithms. This paper argues
that these two trends come at odds with one another, ultimately hurting web
performance. Using a controlled synthetic study, we show how delay-based
congestion control protocols (e.g., BBR and CUBIC + Hybrid Slow Start) result
in the underestimation of the available congestion window in mobile networks,
and how that dramatically hampers the effectiveness of HTTP/2. To quantify the
impact of this finding on the current web, we evolve the web performance
toolbox in two ways. First, we develop Igor, a client-side TCP congestion
control detection tool that can differentiate between loss-based and
delay-based algorithms by focusing on their behavior during slow start. Second,
we develop a Chromium patch which allows fine-grained control on the HTTP
version to be used per domain. Using these new web performance tools, we
analyze over 300 real websites and find that 67% of sites relying solely on
delay-based congestion control algorithms have better performance with
HTTP/1.1.
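The slow-start heuristic behind Igor's classification can be illustrated with a toy trace classifier (an invented model with invented thresholds, not Igor's actual algorithm): find where exponential window growth stops, then check what the sender reacted to.

```python
def classify(rounds):
    """rounds: list of (cwnd_segments, rtt_ms, saw_loss), one per RTT round.
    Toy rule: a loss-based sender keeps doubling its window until a loss,
    while a delay-based sender (or one using Hybrid Slow Start) backs off
    once RTTs inflate, before any loss occurs."""
    base_rtt = rounds[0][1]
    prev_cwnd = rounds[0][0]
    for cwnd, rtt, saw_loss in rounds[1:]:
        if cwnd > prev_cwnd * 1.5:      # still roughly doubling
            prev_cwnd = cwnd
            continue
        if saw_loss:                    # growth stopped at a loss event
            return "loss-based"
        if rtt > 1.5 * base_rtt:        # growth stopped as RTTs inflated
            return "delay-based"
        return "unknown"
    return "unknown"

loss_trace  = [(10, 50, False), (20, 50, False), (40, 55, False),
               (80, 90, False), (80, 95, True)]
delay_trace = [(10, 50, False), (20, 50, False), (40, 80, False),
               (45, 85, False)]
```

The real tool must of course infer these signals from client-side packet timings rather than being handed per-round ground truth.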
Continuous Prefetch for Interactive Data Applications
Haneen Mohammed, Ziyun Wei, Eugene Wu, Ravi Netravali
VLDB 2020
Source code
Abstract: Interactive data visualization and exploration (DVE) applications are often network-bottlenecked due to bursty request
patterns, large response sizes, and heterogeneous deployments over a range of
networks and devices. This makes it difficult to ensure consistently low
response times. Khameleon is a framework for DVE
applications that uses a novel combination of prefetching and
response tuning to dynamically trade-off response quality for
low latency. Khameleon exploits DVE's approximation tolerance:
immediate lower-quality responses are preferable to waiting for
complete results. To this end, Khameleon progressively encodes
responses, and runs a server-side scheduler that proactively
streams portions of responses using available bandwidth to
maximize user-perceived interactivity. The scheduler involves a
complex optimization based on available resources, predicted
user interactions, and response quality levels; yet, decisions
must also be made in real-time. To overcome this, Khameleon
uses a fast greedy heuristic that closely approximates the
optimal approach. Using image exploration and visualization
applications with real user interaction traces, we show that
across a wide range of network and client resource conditions,
Khameleon outperforms existing prefetching approaches that
benefit from perfect prediction models: Khameleon always lowers
response latencies (typically by 2-3 orders of magnitude) while
keeping response quality within 50-80%.
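The greedy heuristic can be sketched as follows (a toy model with invented utilities, not Khameleon's scheduler): repeatedly send the progressive chunk with the highest expected quality gain per byte, where "expected" weights each predicted request by its probability.

```python
import heapq

def plan(requests, budget_bytes):
    """requests: {name: (probability, [(chunk_bytes, quality_gain), ...])},
    where each response is progressively encoded with diminishing gains.
    Returns the (name, chunk_index) pairs to stream this window, chosen by
    expected quality gain per byte; chunks of a request stay in order."""
    heap = []   # (-utility_per_byte, name, chunk_index)
    for name, (prob, chunks) in requests.items():
        size, gain = chunks[0]
        heapq.heappush(heap, (-(prob * gain / size), name, 0))
    sent = []
    while heap and budget_bytes > 0:
        _, name, i = heapq.heappop(heap)
        prob, chunks = requests[name]
        size, gain = chunks[i]
        if size > budget_bytes:
            continue                    # next chunk no longer fits: drop it
        budget_bytes -= size
        sent.append((name, i))
        if i + 1 < len(chunks):         # unlock the next progressive chunk
            nsize, ngain = chunks[i + 1]
            heapq.heappush(heap, (-(prob * ngain / nsize), name, i + 1))
    return sent

requests = {
    "zoom_left":  (0.7, [(100, 10), (100, 4)]),
    "zoom_right": (0.3, [(100, 10), (100, 4)]),
}
plan(requests, 300)
```

Under a 300-byte budget, both likely requests get a base-quality chunk before the more probable one is upgraded, which is the quality-for-latency trade-off described above.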