End-to-End Sequential Consistency
39th International Symposium on Computer Architecture (ISCA 2012), Portland, Oregon, June 9-13, 2012.
IEEE Micro Top Pick
Abhayendra Singh, Satish Narayanasamy, Daniel Marino, Todd Millstein, Madanlal Musuvathi
Sequential consistency (SC) is arguably the most intuitive behavior
for a shared-memory multithreaded program. It is widely accepted that
language-level SC could significantly improve programmability of a
multiprocessor system. However, efficiently supporting end-to-end SC
remains a challenge as it requires that both compiler and hardware
optimizations preserve SC semantics. While a recent study has shown
that a compiler can preserve SC semantics for a small performance
cost, efficient and complexity-effective SC hardware remains
elusive. Past hardware solutions relied on aggressive speculation
techniques, which have not yet been realized in a practical
implementation.
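To make the SC guarantee concrete, here is a minimal sketch of the classic store-buffering litmus test. It is an illustration added here, not taken from the paper, and the names `T0`, `T1`, and `sc_outcomes` are hypothetical. It enumerates every interleaving of two threads that respects program order and collects the register values each can produce. Under SC the outcome r1 = 0, r2 = 0 never appears, whereas hardware that lets a load bypass an earlier buffered store can produce it.

```python
from itertools import combinations

# Each op is (kind, variable, register); list order is program order.
# Thread 0: x = 1; r1 = y      Thread 1: y = 1; r2 = x
T0 = [("store", "x", None), ("load", "y", "r1")]
T1 = [("store", "y", None), ("load", "x", "r2")]


def sc_outcomes(t0, t1):
    """Return every (r1, r2) reachable by interleaving t0 and t1 in program order."""
    total = len(t0) + len(t1)
    outcomes = set()
    # Choose which global time slots belong to thread 0; the rest go to thread 1.
    for slots in combinations(range(total), len(t0)):
        memory = {"x": 0, "y": 0}
        regs = {}
        ops0, ops1 = iter(t0), iter(t1)
        for step in range(total):
            kind, var, reg = next(ops0) if step in slots else next(ops1)
            if kind == "store":
                memory[var] = 1
            else:
                regs[reg] = memory[var]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


if __name__ == "__main__":
    print(sorted(sc_outcomes(T0, T1)))
```

Every SC execution leaves at least one of the two loads observing the other thread's store, which is the intuition the abstract appeals to.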
This paper exploits the observation that hardware need not enforce any
memory model constraints on accesses to thread-local and shared
read-only locations. A processor can easily determine a large
fraction of these safe accesses with assistance from static compiler
analysis and the hardware memory management unit. We discuss a
low-complexity hardware design that exploits this information to
reduce the overhead of ensuring SC. Our design employs an additional
unordered store buffer for fast-tracking thread-local stores and
allowing later memory accesses to proceed without a
memory-ordering-related stall.
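The routing idea in this paragraph can be sketched as a toy software model. The abstract does not describe the microarchitecture, so everything here is an assumption made for illustration: the class `ToyCore`, its fields, the page-granularity `safe_pages` set standing in for compiler and MMU information, and the rule that an unsafe load stalls until earlier unsafe stores have drained. It is not the paper's actual design.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical single-core model of safe/unsafe store routing."""
    safe_pages: set                  # pages assumed thread-local or shared read-only
    page_size: int = 4096
    ordered_sb: deque = field(default_factory=deque)  # FIFO buffer for unsafe stores
    unordered_sb: dict = field(default_factory=dict)  # addr -> value, no order kept
    stalls: int = 0                  # count of memory-ordering stalls

    def is_safe(self, addr):
        # Assumption: safety is known per page.
        return addr // self.page_size in self.safe_pages

    def store(self, addr, value):
        if self.is_safe(addr):
            # Safe stores take the fast track; no ordering is enforced on them.
            self.unordered_sb[addr] = value
        else:
            self.ordered_sb.append((addr, value))

    def drain_ordered(self, memory):
        # Commit unsafe stores to memory in program order.
        while self.ordered_sb:
            addr, value = self.ordered_sb.popleft()
            memory[addr] = value

    def load(self, addr, memory):
        # Only an unsafe load behind pending unsafe stores has to wait.
        if not self.is_safe(addr) and self.ordered_sb:
            self.stalls += 1
            self.drain_ordered(memory)
        # Safe loads forward from the unordered buffer when possible.
        if addr in self.unordered_sb:
            return self.unordered_sb[addr]
        return memory.get(addr, 0)


if __name__ == "__main__":
    memory = {}
    core = ToyCore(safe_pages={0})
    core.store(8, 7)            # thread-local store, goes to the unordered buffer
    core.load(8192, memory)     # shared load proceeds without a stall
    core.store(8192, 5)         # shared store, goes to the ordered buffer
    core.load(8200, memory)     # shared load must wait for the shared store
    print(core.stalls)
```

In this model a pending thread-local store never delays a later access, while a pending shared store still forces a later shared load to wait, which is the distinction the paragraph draws.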
Our experimental study shows that the cost of guaranteeing end-to-end
SC is only 6.2% on average when compared to a system with TSO
hardware executing a stock compiler's output.