Garbled RAM Revisited
Part II

Steve Lu * Rafael Ostrovsky †

February 5, 2014

Abstract

In EUROCRYPT 2013, Lu and Ostrovsky proposed the notion of Garbled RAM (GRAM) programs. These GRAM programs are analogous to the classic result of Yao’s garbled circuits: a large encrypted memory can first be provided to evaluator, and then a program can separately be garbled and sent to an evaluator to securely execute while learning nothing but the output of the program and its running time. The key feature of GRAM is that it harnesses the natural complexity-theoretic power that Random Access Memory (RAM) programs have over circuits, such as in binary search or randomized algorithms, where program can be executed in a garbled way without “unrolling” it into a circuit. The candidate Lu-Ostrovsky GRAM scheme proposed in that paper is based on the existence of one-way functions, but due to an implicit reliance on a complex “circular” use of Yao garbled circuits, the scheme requires an additional circularity assumptions and may not be secure otherwise.

In this paper, we show how to construct efficient GRAM without circularity and based solely on the existence of any one-way function. The novel approach that allows us to break the circularity is a modification of the Goldreich-Goldwasser-Micali (PRF) construction. More specifically, we modify the PRF to allow PRF-keys to be “adaptively revoked” during run-time at the additive cost of roughly $\log n$ per revocation. Then, to improve the overhead of this scheme, we apply a delicate recursion technique that bootstraps mini-GRAM schemes into larger, more powerful ones while still avoiding circularity in the hybrid arguments. This results in secure GRAM with overhead of $\text{poly}(\kappa)(\text{min}(t, n^\epsilon))$ for any constant $\epsilon > 0$ where $n$ is the size of memory and $t$ is the running time.

In a companion work (Part I), Gentry, Halevi, Raykova, and Wichs show an alternative approach using identity-based encryption to solve the circularity problem\(^1\). Their scheme achieves overhead of $\text{poly}(\kappa)\text{polylog}(n)$ assuming the existence of IBE.

Keywords: Secure Computation, Oblivious RAM, Garbled RAM, Garbled Circuits.

1 Introduction

The problem of securely storing, accessing and computing on encrypted data is of increasing importance as our resources are being outsourced, such as in cloud storage and computation. This can be modeled as a general problem of secure computation of some functionality $F$. This problem has come from a long line of research, starting with the two-party case of Yao’s garbled circuits [22] and the multi-party case of the

\(^1\)A merged version of the two works appears in Eurocrypt 2014.
Goldreich-Micali-Wigderson [9] paradigm. In both of these approaches, the functionality is represented as a circuit. Many of the subsequent works, including secure computation via fully homomorphic encryption, requires representing \( F \) as a circuit, and there have only been a handful of results in other models such as branching programs, Turing Machines, RAM programs, and so forth.

On the other hand, many algorithms are naturally represented as RAM programs and simply cannot be represented by circuits without a large blowup. These include programs that run in poly-logarithmic time on large data (e.g. binary search on large data), or Las Vegas style algorithms that can have exponential running time but are efficient on average (e.g. simplex algorithm from linear programming). Performing secure RAM computation was first suggested in the work of Ostrovsky and Shoup [20], and a line of works [17, 12, 15, 16, 10] have considered the model of secure RAM computation.

We reiterate some of the main motivation of previous works of the advantages of using the RAM model of secure computation. In the case of Yao’s garbled circuits [22], you can non-interactively first garble the circuit and send it over during an “offline” phase, then when you have online inputs you can garble the input separately and send it over to the evaluator. The evaluator can then online run the garbled circuit and learn only the output. The Lu-Ostrovsky [16] GRAM construction achieves the same benefits of this non-interactive online/offline garbling paradigm but for RAM programs: you can garble a large memory and send it over, then separately garble a program and inputs that can be run online on this garbled memory. In this paper, we show how to construct a solution that also enjoys these same non-interactive garbling properties from any one-way function while avoiding circularity.

Of course, many complex real-world programs are more suitable for the RAM model instead of circuits: unrolling programs with multiple execution paths, recursion, loops, etc. into a circuit is often inefficient and prohibitive. The power of random access means that a program with small running time and code size can still access a large memory (such as in database search), but when represented as a circuit it must be at least as large as the size of the database and thus even in the online/offline model of secure computation, the online cost is still proportional to the database size. The online/offline model is still important since one can offload the cost of pre-processing a large database in preparation for secure computation.

In EUROCRYPT 2013, the work of [16] introduced the notion of Garbled RAM (GRAM) programs and presented a candidate scheme under the minimal assumption of one-way functions. However, there is a subtle “circularity” problem with the given construction, which prevents a proof of security from going through as-is, and would require an additional circularity assumption (which seems difficult to prove) to be secure. In a companion work (Part I), Gentry, Halevi, Raykova, and Wichs [6] proposed a modification to the construction that avoids the circularity in the argument and is secure under the existence of IBE schemes and asymptotically matches the efficiency of the [16] candidate.

In this work, we show how to construct a GRAM scheme that is secure based only on OWFs with a mild drop in overhead compared to the original construction of [16]. We patch the [16] scheme by directly breaking the circularity that lies in the PRF that is built into the construction. The main technique we use to break the circularity is to use a notion of revocable PRFs that allows for the “adaptive revocation” of PRF-keys and values. This notion is related to a well-studied theme of removing PRF values in the literature that appear in various forms such as delegatable PRFs [13], functional PRFs [5], or punctured PRFs [21]. That is to say, given a set \( X \) of values, and given a PRF key \( k \), there is a “revoke” transformation \( k \rightarrow k_X \) such that \( \forall y \notin X, F_{k_X}(y) = F_k(y) \) and \( \forall x \in X, F_k(x) \) is pseudorandom even given \( k_X \). When \( k \) is revoked in such a fashion, we obtain a new key \( k_X \), which as stated above is powerful enough to compute \( F_k \) on any value not in \( X \), but still reveals no information about \( F_k \) on elements in \( X \). The additional important property is that we can further revoke values, i.e. given any \( X \) and \( k_X \), and any \( Y \supset X \), there is a transformation \( k_X \rightarrow k_Y \).

This construction will be based on the Goldreich-Goldwasser-Micali [8] PRF, and it will allow keys to be revoked at the cost of roughly \( \log n \) per revocation. Then, to improve the overhead of this secure scheme, we apply a recursion technique that bootstraps mini-GRAM schemes into larger, more powerful ones. This results in our scheme having an overhead cost which is \( \text{poly}(\kappa)(\text{min}(t, n^\epsilon)) \) for any constant \( \epsilon > 0 \) where \( n \) is the size of memory and \( t \) is the running time.
1.1 Related Work

The notion of Oblivious RAM, which is used to hide the contents and access pattern (namely, the sequence of locations that is read from and written to during the execution of a RAM program) from memory, was introduced in the context of software protection by Goldreich [7] and Ostrovsky [18, 19]. In the original work by Goldreich [7], a solution was given with $O(\sqrt{n})$ and communication overhead where lookups could be done in a single round and $O(2\sqrt{\log n \log \log n})$ communication overhead for a recursive solution. Subsequently, [18, 19] gave a solution with only poly-log overhead and constant client memory (the so-called “hierarchical solution”). These works will be referenced in our construction of our GRAM scheme in order to introduce regularity properties in the access pattern of our programs.

The work of [20] suggested how to perform secure RAM computation based on an oblivious reading and writing scheme. Secure RAM computation was explored in the work of Naor and Nissim [17] using circuits with lookup tables. Gordon et al. [12] proposed a solution in the case of RAM programs with sublinear amortized cost. Namely, consider a client that holds a small input $x$, and a server that holds a large database $D$, and the client wishes to repeatedly perform private queries $f(x, D)$. In this model, an expensive initialization (depending only on $D$) is first performed (say, in an offline phase). Afterwards, if $f$ can be computed in time $T$ with space $S$ with a RAM machine, then there is a secure two-party protocol computing $f$ in time $O(T \cdot \text{polylog}(S))$ with the client using $O(\log S)$ space and the server using $O(S \cdot \text{polylog}(S))$ space. The interactive secure RAM computation solution of Lu and Ostrovsky [15] gave a construction with lower concrete complexity and can also be viewed as a generalization of the [20] model where servers must also perform sublinear work. Recently, the works [16] and [10] show how to construct non-interactive garbled RAM solutions. The former construction was based off any one-way function (and the additional circularity assumption), and the latter construction achieved full compactness under stronger assumptions.

1.2 Our Results

Our main theorem covers the entire construction which is secure given any one-way function and does not require additional circularity assumptions. We allow for the garbling of memory, inputs, program code, and program CPU steps, and the evaluator will evaluate this garbled information.

Main Theorem (Informal). Assume one-way functions exist, and let the security parameter be $\kappa$. Then, for any initial RAM contents $D$ of size $n$, programs $P_1, \ldots, P_\ell$ that run on $D$ in at most $t$ CPU steps, there exists a Garbled RAM scheme with $\text{polylog}(\kappa, n)$ overhead in the storage size of the garbled memory contents, and $\text{poly}(\kappa)(\min(t, n^e))$ overhead in the size of the garbled program and its running time.

1.3 Remarks

Polynomial vs Exponential Size/Time RAM. Since the RAM CPU is of size at least $\kappa$, it can theoretically index up to $2^\kappa$ locations. While typically we consider $\kappa$, $t$ and $n$ to be polynomially related, we obtain interesting regimes in the case of exponentially (or superpolynomially) large running-times or memory sizes. In particular, if $n$ is, say, exponentially large, we can strengthen our Main Theorem to drive the overhead to be sub-polynomial in $n$, i.e. $\text{poly}(\kappa)(\min(t, 2\sqrt{\log n}))$.

Reactive Functionalities and Reusability. One of the nice properties that we retain from the construction of [16] is that one can “reuse” the memory once initially garbled. I.e. a sequence of different programs and different inputs can be garbled in sequence that all operate on a dynamically changing memory store that is garbled once (which mimics a reactive functionality). Note that these changes, once applied, cannot be rewound by the evaluator and the way to ensure this to have the garbler maintain an ever-increasing counter. Note that we require only the garbler to choose these new programs and inputs to

\footnote{We mention that the program code and input can be thought of as a single object, but we separate them so that we can run multiple programs on the same input or vice versa.}
avoid any adaptivity issues that arise in garbling schemes (see Bellare-Hoang-Rogaway [2] for a discussion on this issue). A formal treatment of this notion is provided in Part I [6], and we retain the notation.

An interesting concept introduced in the STOC 2013 paper of Goldwasser et al. [11] is the notion of token-based obfuscation, where a reusable garbled circuit can be evaluated on multiple inputs as long as the garbler provides a new “token” for each input. Because our scheme can garble the code of a program and store it ahead of time, we achieve a slightly different notion of token-based obfuscation: we need to garble each input, but we also need to provide garbled CPU steps that allow the evaluator to run a single step of the program, thus resulting in tokens for both inputs and “time quota”.

Worst-case Versus Per-instance Running Time, Universal Programs, and Output Privacy. As was noted in the CRYPTO 2013 work of Goldwasser et al. [10], the power of secure computation on Turing Machines and RAM programs over that of circuits is that for algorithms with very different worst-case and average-case running times, the circuit must be of worst-case size. Randomized algorithms such as Las Vegas algorithms or even heuristically good-on-average programs would benefit greatly if the online running time of the secure computation ran in time proportional to that particular instance. In our solution, though we have an upper bound \( T \) on the number of execution steps of the algorithm which affects the offline time and space, the online evaluation can have a CPU step output “halt” in the clear when the program has halted and the evaluator will then only run in time depending on this particular input.

In order to further mask the program, one can consider a \( T \) time-bounded universal program \( u_T \), which takes as input the code of a program \( \pi \) and an input for that program. One can also provide an auxiliary mask so that the output of \( P \) is blinded by this value (such a modification has appeared in the literature, see, e.g. [1]).

2 Background

We follow the notational convention that was set forth in Part I [6]. A few of the subsections from this section are restated for the convenience of the reader.

2.1 RAM Model

Notation for RAM Computation. Before we describe garbled RAM, let us fix a notation for describing standard RAM computation. We will consider a program \( P \) that has random-access to a memory of size \( n \) which may initially contain some data \( D \in \{0,1\}^n \). In addition, the program gets a “short” input \( x \), which we can alternatively think of as the initial state of the program. In general, the distinction between what to include in the program \( P \), the memory data \( D \) and the short input \( x \) can be somewhat arbitrary. We use the notation \( P^D(x) \) to denote the execution of such program. The program can read/write to various locations in memory throughout the execution. We will also consider the case where several different programs are executed sequentially and the memory persists between executions. We denote this process as \( (y_1, \ldots, y_\ell) = (P_1(x_1), \ldots, P_\ell(x_\ell))^D \) to indicate that first \( P_1^D(x_1) \) is executed, resulting in some memory contents \( D_1 \) and output \( y_1 \), then \( P_2^{D_1}(x_2) \) is executed resulting in some memory contents \( D_2 \) and output \( y_2 \) etc. As a useful example to keep in mind throughout this work, imagine that \( D \) is a huge database and the programs \( P_i \) are database queries that can read and possibly write to the database and are parameterized by some values \( x_i \).

CPU-Step Circuit. A useful representation of a RAM program \( P \) is through a small CPU-Step Circuit which executes a single CPU step:

\[
C_P^{CPU}(state, b^{read}) = (state', r^{read}, r^{write}, b^{write})
\]
This circuit takes as input the current CPU state and a bit \( b^{\text{read}} \) residing in the the last read memory location. It outputs an updated state', the next location to read \( i^{\text{read}} \in [n] \), a location to write to \( i^{\text{write}} \in [n] \cup \{ \perp \} \) where \( \perp \) values are ignored, a bit \( b^{\text{write}} \) to write into that location.

The computation \( P_D(x) \) starts in the initial state \( \text{state}_1 = x \), corresponding to the “short input” and by convention we will set the initial read bit to \( b^{\text{read}} := 0 \). In each step \( j \), the computation proceeds by running \( C^D_{\text{CPU}}(\text{state}_j, i^{\text{read}}) = (\text{state}_{j+1}, i^{\text{read}}, i^{\text{write}}, b^{\text{write}}) \). We first read the requested location \( i^{\text{read}} \) by setting \( b^{\text{read}} := D[i^{\text{read}}] \), and, if \( i^{\text{write}} \neq \perp \), we write to the location by setting \( D[i^{\text{write}}] := b^{\text{write}} \). The value \( y = \text{state} \) output by the last CPU step serves as the output of the computation.

We say that a program \( P \) has read-only memory access, if it never overwrites any values in memory. In particular, using the above notation, the outputs of \( C^D_{\text{CPU}} \) always set \( i^{\text{write}} = \perp \).

### 2.2 Garbled Circuits

Garbled circuits was first suggested by Yao [22] and subsequently proven secure by Lindell and Pinkas [14]. We review the concept of garbled circuits and refer the reader to the work of Bellare et al. [3] for a thorough treatment of garbling schemes. We continue with the notation of Part I and work with a projective circuit garbling scheme which is a tuple of PPT algorithms \( (\text{GCircuit}, \text{Eval}) \). Since the scheme is projective, it is helpful to think of individual wire labels for the input so that wire \( w \) of the circuit is associated with two labels \( \text{lbl}_{w^1}, \text{lbl}_{w^2} \) corresponding to the bit-values 0, 1. Finally, since one can apply a generic transformation (see, e.g. [1]) to blind the output, we allow output wires to also have arbitrary labels associated with them. We take the following definitions and notation from Part I:

- \( (\tilde{C}, \{ (j, b, \text{lbl}^{\text{input},j}_b) \}) \) \( \leftarrow \) \( \text{GCircuit}(1^n, C, \{ (i, b, \text{lbl}^{\text{output},i}_b) \}) \): Given a circuit \( C \) with input size \( v_{\text{input}} \) and output size \( v_{\text{output}} \), and a set of output labels \( \text{lbl}^{\text{output},i}_b \) for all output wires \( i \in [v_{\text{output}}] \) and \( b \in \{0, 1\} \), outputs a garbled circuit \( \tilde{C} \) and a set of input labels \( \text{lbl}^{\text{input},j}_b \) for every input wire \( j \in [v_{\text{input}}] \) and \( b \in \{0, 1\} \).

- \( (\text{lbl}^{\text{output},1}_b, \ldots, \text{lbl}^{\text{output},v_{\text{output}}}_b) = \text{Eval}(\tilde{C}, (\text{lbl}^{\text{input},1}_b, \ldots, \text{lbl}^{\text{input},v_{\text{input}}}_b)) \): Given a garbled circuit \( \tilde{C} \) and a sequence of input labels \( \text{lbl}^{\text{input},j}_b \), outputs a sequence of output labels \( \text{lbl}^{\text{output},i}_b \). Intuitively, if the input labels correspond to some input \( x \in \{0, 1\}^{v_{\text{input}}} \) then the output labels should correspond to \( y = C(x) \).

**Correctness.** For correctness, we require that for any circuit \( C \) and any input \( x \in \{0, 1\}^{v_{\text{input}}} \) such that \( y = (y[1], \ldots, y[v_{\text{output}}]) = C(x) \) and any set of output labels \( \{ (i, b, \text{lbl}^{\text{output},i}_b) \} \) we have

\[
\Pr \left[ \text{Eval}(\tilde{C}, (\text{lbl}^{\text{input},1}_b x[1], \ldots, \text{lbl}^{\text{input},v_{\text{input}}}_b x[v_{\text{input}}])) = (\text{lbl}^{\text{output},1}_b y[1], \ldots, \text{lbl}^{\text{output},v_{\text{output}}}_b y[v_{\text{output}}]), \right] = 1.
\]

**Security.** For security, we require that there is a PPT simulator \( \text{Sim} \) such that for any \( C, x, \{ (i, b, \text{lbl}^{\text{output},i}_b) \} \) as above, we have

\[
(\tilde{C}, \text{lbl}^{\text{input},1}_b x[1], \ldots, \text{lbl}^{\text{input},v_{\text{input}}}_b x[v_{\text{input}}]) \approx_{\text{comp}} \text{Sim}(1^n, C, \text{lbl}^{\text{output},1}_b y[1], \ldots, \text{lbl}^{\text{output},v_{\text{output}}}_b y[v_{\text{output}}])
\]

where \((\tilde{C}, \{ (j, b, \text{lbl}^{\text{input},j}_b) \}) \) \( \leftarrow \) \( \text{GCircuit}(C, \{ (i, b, \text{lbl}^{\text{output},i}_b) \}) \), \( y = C(x) \).

### 2.3 Goldreich-Goldwasser-Micali Pseudorandom Function

We review the GGM [8] PRF construction and make an observation that will be useful for us later on. Let \( G \) be a PRG that stretches from \( \lambda \) bits to \( 2\lambda \) bits, and we can write \( G_0 \) to denote the left half of the output
and \( G_1 \) to denote the right half. We can write \( G_{00} \) to denote \( G_0 \circ G_0 \), and similarly for \( G_{01}, G_{10}, G_{11} \). Indeed for any bitstring \( x \), we can write \( G_x \) to denote repeatedly going left/right using \( x \) as the indicator.

Then the GGM construction of a PRF \( F \) from \( G \) is as follows. Suppose \( k \) is the seed of the PRF, then \( F_k(x) \) is defined to be \( G_x(k) \). This can be viewed as evaluating \( G \) on the “root” of a tree consisting of \( k \), and each bit of the input \( x \) defines whether to go left or right on the tree. The security property is that any PPT algorithm \( A \) given oracle access to \( F_k(\cdot) \) should behave as if it were interacting with a random oracle \( R(\cdot) \):

\[
\left| Pr[A^{F_k(\cdot)}(1^\lambda) = 1; k \leftarrow \{0,1\}^\lambda] - Pr[A^{R(\cdot)}(1^\lambda) = 1] \right| < \epsilon
\]

with \( \epsilon \) being negligible.

### 2.4 Garbled RAM

We will right-away consider a scenario where the memory data \( D \) is garbled once and then many different garbled programs can be executed sequentially with the memory changes persisting from one execution to the next. We stress that each garbled program \( \tilde{P}_i \) can only be executed on a single garbled input \( \tilde{x}_i \). In other words, although the garbled data is reusable and allows for the execution of many programs, the garbled programs are not reusable. The programs can only be executed in the specified order and are not “interchangeable”. Therefore, they cannot be garbled completely independently. In our case, we will assume that the garbling procedure of each program \( P_i \) gets \( t^{\text{init}} \) which is the total number of CPU steps executed so far by \( P_1, \ldots, P_{i-1} \) and \( t^{\text{cur}} \) which is the number of CPU steps to be executed by \( P_i \).

**Syntax & Efficiency.** A garbled RAM scheme consists of four procedures: \((\text{GData}, \text{GProg}, \text{GInput}, \text{GEval})\) with the following syntax:

- \( \tilde{D} \leftarrow \text{GData}(D,k) \): Takes memory data \( D \in \{0,1\}^n \) and a key \( k \). Outputs the garbled data \( \tilde{D} \).
- \( (\tilde{P}, k^{\text{input}}) \leftarrow \text{GProg}(P,k,n,t^{\text{init}},t^{\text{cur}}) \): Takes a key \( k \) and a description of a RAM program \( P \) with memory-size \( n \) and run-time consisting of \( t^{\text{cur}} \) CPU steps. In the case of garbling multiple programs, we also provide \( t^{\text{init}} \) indicating the cumulative number of CPU steps executed by all of the previous programs. Outputs a garbled program \( \tilde{P} \) and an input-garbling-key \( k^{\text{input}} \).
- \( \tilde{x} \leftarrow \text{GInput}(x,k^{\text{input}}) \): Takes an input \( x \) and input-garbling-key \( k^{\text{input}} \) and outputs a garbled-input \( \tilde{x} \).
- \( y = \text{GEval}^{\tilde{D}}(\tilde{P},\tilde{x}) \): Takes a garbled program \( \tilde{P} \), garbled input \( \tilde{x} \) and garbled memory data \( \tilde{D} \) and computes the output \( y = D^{\tilde{D}}(x) \). We model \( \text{GEval} \) itself as a RAM program that can read and write to arbitrary locations of its memory initially containing \( \tilde{D} \).

For efficiency, we require that the run-time of \( \text{GProg} \) and \( \text{GEval} \) is \( |C_{CPU}^{\text{GProg}}| \cdot t^{\text{cur}} \cdot \text{poly}(\kappa) \cdot \text{polylog}(n) \), which also serves as the bound on the size of the garbled program \( \tilde{P} \). Moreover, we require that the run-time of \( \text{GData} \) should be \( n \cdot \text{poly}(\kappa) \), which also serves as an upper bound on the size of \( \tilde{D} \).

**Correctness & Security.** To define the correctness and security requirements of garbled RAMs, let \( P_1, \ldots, P_\ell \) be any sequence of programs with polynomially-bounded run-times \( t_1, \ldots, t_\ell \). Let \( D \in \{0,1\}^n \) be any initial memory data, let \( x_1, \ldots, x_\ell \) be inputs and \((y_1, \ldots, y_\ell) = (P_1(x_1), \ldots, P_\ell(x_\ell))^D\) be the outputs given by the sequential execution of the programs.

Consider the following experiment: choose a key \( k \leftarrow \{0,1\}^\kappa \), \( \tilde{D} \leftarrow \text{GData}(D,k) \) and for \( i = 1, \ldots, \ell \):

\[
(\tilde{P}_i,k^{\text{input}}_i) \leftarrow \text{GProg}(P_i,n,t^{\text{init}}_i,t_i,k), \tilde{x}_i \leftarrow \text{GInput}(x_i,k^{\text{input}}_i)
\]

where \( t_i^{\text{init}} := \sum_{j=1}^{i-1} t_i \) denotes the run-time of all programs prior to \( P_i \). Let

\[
(y'_1, \ldots, y'_\ell) = (\text{GEval}(\tilde{P}_1,\tilde{x}_1), \ldots, \text{GEval}(\tilde{P}_\ell,\tilde{x}_\ell))^\tilde{D},
\]

denotes the output of evaluating the garbled programs sequentially over the garbled memory.

We require that the following properties hold:
• Correctness: We require that \( \Pr[y_1' = y_1, \ldots, y_t' = y_t] = 1 \) in the above experiment.

• Security: we require that there exists a universal simulator \( \text{Sim} \) such that:

\[
(\tilde{D}, \tilde{P}_1, \ldots, \tilde{P}_t, \tilde{x}_1, \ldots, \tilde{x}_t) \overset{\text{comp}}{\approx} \text{Sim}(1^\kappa, \{P_i, t_i, y_i\}_{i=1}^\ell, n).
\]

Our security definition is non-adaptive: the data/programs/inputs are all chosen ahead of time. This makes our definitions/analysis simpler and also matches the standard definitions for our building blocks such as ORAM. However, there does not seem to be any inherent hurdle to allowing each subsequent program/input \( (P_i, x_i) \) to be chosen adaptively after seeing \( \tilde{D}, (\tilde{P}_1, \tilde{x}_1), \ldots, (\tilde{P}_{t-1}, \tilde{x}_{t-1}) \).

**Security with Unprotected Memory Access (UMA).** We also consider a weaker security notion, which we call security with *unprotected memory access* (UMA). In this variant, the attacker may learn the initial contents of the memory \( D \), as well as the complete memory-access pattern throughout the computation including the locations being read/written and their contents. In particular, we let \( \text{MemAccess} = \{(i_{j,\text{read}}, i_{j,\text{write}}, b_{j,\text{write}}) : j = 1, \ldots, t\} \) correspond to the outputs of the CPU-step circuits during the execution of \( P^D(x) \). For security with unprotected memory access, we give the simulator the additional values \( (D, \text{MemAccess}) \). Using the notation from above, we require:

\[
(\tilde{D}, \tilde{P}_1, \ldots, \tilde{P}_t, \tilde{x}_1, \ldots, \tilde{x}_t) \overset{\text{comp}}{\approx} \text{Sim}(1^\kappa, \{P_i, t_i, y_i\}_{i=1}^\ell, D, \text{MemAccess}, n).
\]

### 2.5 Review of the Lu-Ostrovsky Construction

A thorough review and intuition of the candidate construction of [16] is given in Part I. Here, we restate the relevant technical details which will be used in our new construction. This construction is for GRAM with UMA-security on programs with so-called “predictably timed writes” which we define below.

**Predictably Timed Writes.** As a first step, we describe how to incorporate a limited form of writing to memory, which we call *predictably timed writes* (ptWrites). On a high level, this means that whenever we want to read some location \( i \) in memory, it is easy to figure out the time (i.e., CPU step) \( j \) in which that location was last written to, given only the current state of the computation and without reading any other values in memory. We will later describe how to upgrade a solution for ptWrites to one that allows arbitrary writes. We give a formal definition of ptWrites below:

**Definition 2.1 (Predictably Timed Writes (ptWrites)).** A program execution \( P^D(x) \) has predictably timed writes (ptWrites) if there exists a poly-size circuit \( \text{WriteTime} \) such that the following holds for every CPU step \( j = 1, \ldots, t \). Let the inputs/outputs of the \( j \)-th CPU step be \( C^P_{\text{CPU}}(\text{state}_j, b_{j,\text{read}}) = (\text{state}_{j+1}, i_{j,\text{read}}, i_{j,\text{write}}, b_{j,\text{write}}) \). Then, \( u = \text{WriteTime}(j, \text{state}_j, i_{j,\text{read}}) \) is the largest value of \( u < j \) such that the CPU step \( u \) wrote to location \( i_{j,\text{write}} \); i.e., \( i_{u,\text{write}} = i_{j,\text{write}} \). We also define a ptWrites property for a sequence of program executions \( (P_1(x_1), \ldots, P_t(x_t))^D \) if the above property holds for each CPU step in the sequence.

**Garbled Data.** The garbled data \( \tilde{D} \) consists of \( n \) secret keys for some symmetric-key encryption scheme. For each bit \( i \in [n] \) of the original data \( D \), the garbled data \( \tilde{D} \) contains a secret key \( \text{sk}_i \). The secret keys are chosen pseudo-randomly using a pseudo-random function (PRF) family \( F \) via \( \text{sk}_i = F_k(u, i, D[i]) \) (where \( u \) is the last CPU step in which it was written). Initially \( u = 0 \), so given \( k \), there are two possible values \( \text{sk}_{(i,0)} = F_k(0, i, 0) \) and \( \text{sk}_{(i,1)} = F_k(0, i, 1) \) that can initially reside in \( \tilde{D}[i] \) depending on the bit \( D[i] \) of the original data, and we set \( \tilde{D}[i] = \text{sk}_{(i,D[i])} \).
Garbled Program (Technical). We define an augmented CPU-step circuit $C_{\text{CPU}}^+$ which gets as input $(\text{state},b_{\text{read}})$ and outputs $(\text{state}',i_{\text{read}},i_{\text{write}},sk_{\text{write}},\text{translate})$. It contains some hard-coded parameters $(j,k,r_0,r_1,\text{lb}_b^{(\text{read})},\text{lb}_1^{(\text{read})})$, where $j$ is the current step, and performs the following computation:

- $(\text{state}',i_{\text{read}},i_{\text{write}},\text{translate}) = C_{\text{CPU}}^+(\text{state},b_{\text{read}})$ are the outputs of the basic CPU-step circuit.
- $\text{translate} = (ct_0,ct_1)$ consists of two ciphertexts, computed as follows. For $b \in \{0,1\}$, first compute $sk_{(i,b)} := F_k(u,i,b)$ for $i = i_{\text{read}}$, where $u \leftarrow \text{WriteTime}$. Then set $c_b = \text{Enc}_{sk_{(i,b)}}(\text{lb}_b^{(\text{read})}; r_b)$ where $\text{Enc}$ is a symmetric key encryption and $r_b$ is the encryption randomness.
- If $i_{\text{write}} \neq \perp$, set $sk_{\text{write}} = F_k(j, i_{\text{write}}, b_{\text{write}})$.

The garbled program $\hat{P}$ consists of $t$ garbled copies of this augmented CPU-step circuit $\hat{C}_{\text{CPU}}^+(j)$. We start garbling from the end $j = t$. Each garbled circuit $\hat{C}_{\text{CPU}}^+(j)$ outputs the values $i_{\text{read}},\text{translate}$ in the clear and the updated $\text{state}'$ is garbled with the same labels as the input $\text{state}$ in the next circuit $\hat{C}_{\text{CPU}}^+(j+1)$; the last circuit outputs $\text{state}'$ in the clear as the output of the computation. Each garbled circuit $\hat{C}_{\text{CPU}}^+(j)$ contains hard-coded values $(j,k,r_0^{(j)},r_1^{(j)},\text{lb}_0^{(\text{read},j+1)},\text{lb}_1^{(\text{read},j+1)})$ which are used to compute the translation mapping $\text{translate}$ as described above. The key $k$ is the PRF key which was used to garbled the memory data. The values $r_0^{(j)},r_1^{(j)}$ are fresh encryption random coins, and $\text{lb}_0^{(\text{read},j+1)},\text{lb}_1^{(\text{read},j+1)}$ are the labels of the input-wire for the bit $b_{\text{read}}$ in the garbled circuit $\hat{C}_{\text{CPU}}^+(j+1)$.

Garbled Input & Evaluation. The garbled input $\tilde{x}$ consists of the wire-labels for the value $\text{state}_1 = x$ for the garbled circuit $\hat{C}_{\text{CPU}}^+(j = 1)$. The evaluator simply evaluates the garbled augmented CPU-step circuits one by one starting from $j = 1$. It can evaluate the first circuit using only $\tilde{x}$, and gets out a garbled output $\text{state}_2$ along with the values $(i_{\text{read}},\text{translate} = (ct_0,ct_1))$ in the clear. The evaluator looks up the secret key $sk := D[i_{\text{read}}]$ and attempts to use it to decrypt $ct_0$ and $ct_1$ to recover a label $\text{lb}_1^{(\text{read},j=2)}$. In case of a write operation, it writes the new $sk_{\text{write}}$ to the indicated location. The evaluator then evaluates the second garbled circuit $\hat{C}_{\text{CPU}}^+(j = 2)$ using the garbled input $\text{state}_2$ and the wire-label $\text{lb}_1^{(\text{read},j=2)}$ for the wire corresponding to the bit $b_{\text{read}}$. This process continues until the last circuit $j = t$ which outputs $\text{state}'$ in the clear as the output of the computation.

Circularity in the Security Analysis. There is a complex circularity as seen below:

1. In order to argue that the evaluator does not learn anything about the “other” label $\text{lb}_1^{\text{read}}$, we need to rely on the security of the ciphertext $ct_1$.
2. In order to rely on the security of the ciphertext $ct_1$ we need to argue that the attacker does not learn the decryption key $sk_{(i,1)} = F_k(i,1)$, which requires us to argue that the attacker does not learn the PRF key $k$. However, the PRF key $k$ is contained as a hard-coded value of the second garbled circuit $\hat{C}_{\text{CPU}}^+(j = 2)$ and all future circuits as well. Therefore, to argue that the attacker does not learn $k$ we need to (at the very least) rely on the security of the second garbled circuit.
3. In order to use the security of the second garbled circuit $\hat{C}_{\text{CPU}}^+(j = 2)$, we need to argue that the evaluator only gets one label per wire, and in particular, we need to argue the the evaluator does not have the “other” label $\text{lb}_1^{\text{read}}$. But this is what we wanted to prove in the first place!

For the rest of this paper, we will show how to circumvent this issue while still using only the minimal assumption of the existence of one-way functions.

3 Warm-up Read-only Construction

The main problem that arises in the circularity is that there is only one PRF key, and that this key when embedded in any future time step is able to decode anything the circuit does in the current time step. The
intuitive way to circumvent this is to iteratively weaken the PRF key. In order to do so, we introduce the following notion of revocable PRFs.

### 3.1 revocable PRFs

We define the notion of (adaptively) revocable PRFs and explain how it differs from existing notions such as [4, 5, 13, 21]. The idea is that we can revoke values from the key so that the PRF cannot be evaluated on these values, and given an already-revoked key, one can further revoke new values.

**Definition 3.1.** A revocable PRF is a PRF $F$ equipped with an additional revoke algorithm $\text{Rev}$. Given any key $k_X$, where $X$ is the set of revoked values (with $X = \emptyset$ if $k$ is a fresh key), $k_Y \leftarrow \text{Rev}(k_X, Y)$ is a new revoked key relative to the set of values $Y \supset X$ satisfying the following properties:

**Correctness:** $F_{k_Y}(x) = F_{k_X}(x)$ for all $x \notin Y$, and $\perp$ otherwise.

**Pseudorandomness:** Given any set of keys $\{k_{Y_1}, \ldots, k_{Y_m}\}$, $F_k(x)$ is pseudorandom for all $x \notin \bigcap_i Y_i$.

Note that this definition appears similar to constrained PRFs [4]; however, we do not require that the revoked set be hidden in any way, and we allow for keys to be adaptively revoked further after the initial fresh key has been operated on. Despite these differences, the concrete GGM-based instantiation that we use is congruent to existing works, and we remind the reader of how it goes.

We start with the simple case of removing a single value $x$ from the domain of $F_k$. Suppose we write $x = x^0x^1\ldots$ as the bits of $x$. Instead of giving out $k$, we can give out the set of values $G_{1-x^0}(k), G_{x^01-x^1}(k), \ldots$, which can be viewed as the siblings of all the nodes in the path from the root to the leaf $x$ in the GGM tree. Note that any value $y \neq x$ can still be evaluated by this key since we have the root of the subtree corresponding to the first bit on which $x$ and $y$ differ.

Let $X = \{x_1, x_2, \ldots, x_s\}$ be a set of $s$ distinct values that we want to “revoke” from this PRF. We can produce a new key $k_X$ relative to the set $X$ that contains at most $s \cdot |x|$ values (corresponding to the values on the vertices of the minimal tree cover of the set $\{0, 1\}^{|x|} \setminus X$). Furthermore, if we are given $k_X$ and $Y \supset X$, we can apply another revocation to this key to get the new key $k_Y$ by removing the roots of subtrees powerful enough to compute $y \in Y$, but adding in the siblings of the subtrees on paths to $y$.

### 3.2 Garbling a Read-Only, Read-Once Program

We begin with a warmup construction that gives intuition as to how our full solution works. Consider a RAM program that is read-only, and it only reads each memory location at most once. Then for each CPU step, after we publish the translation table $\text{translate}$, the augmented CPU revokes the two values corresponding exactly to that table, namely $\{(u, i, 0), (u, i, 1)\}$ where $i$ is the memory location to be read (and since this is read-only, $u$ is always 0). Indeed, let $F$ be a revocable PRF, and let $(\text{GCircuit}, \text{CircEval})$ be a projective circuit garbling scheme. Let $k$ be the master PRF key that we choose upon initiation.

**Garbled Data.** The garbled data $\tilde{D}$ is an array of secret keys, which are just outputs of $F$ as we will describe.

- $\tilde{D} \leftarrow \text{GData}(D, k)$: For each $i \in [n]$, set $\tilde{D}[i] := sk$ where $sk = F_k(0, i, D[i])$.

**Garbled Program.** We describe the augmented-CPU-step circuit $C^P_{\text{CPU}+}$ for the program $P$ in Figure 1. The garbled RAM program for $P$ will consist of $t$ copies of a garbled augmented-CPU-Step circuit $\tilde{C}^P_{\text{CPU}+}(j)$. As before, the labels for the output wires in each circuit are chosen carefully so that some wire values are revealed in the clear while others remain garbled for the next circuit.
Input: \((\text{state}, b^{\text{read}}, X, k_X)\)  
Output: \((\text{state}', i^{\text{read}}, \text{translate}, X', k_{X'})\)

Hard-Coded Parameters: \(j, l_{0}^{\text{read}}, l_{1}^{\text{read}}, \) \(l_{\text{input}}^{\text{read}}\), \(l_{\text{output}}^{\text{read}}\)

The circuit \(C_{\text{CPU}^+}^{P}\) performs the following computation:

- \((\tilde{P}, k^{\text{input}}) \leftarrow \text{GProg}(P, k, n, t^{\text{init}}, t_{\text{cur}})\): Let \(t^{\max} := t^{\text{init}} + t_{\text{cur}}\). We let \(j\) count down from \(j = t^{\max}\) to \(t^{\text{init}} + 1\). For each \(j\), we garble \(C_{\text{CPU}^+}^{P}\) by calling \(\tilde{C}_{\text{CPU}^+}^{P}(j) \leftarrow \text{GCircuit}(1^k, C_{\text{CPU}^+}^{P}, \mathcal{I})\) where the output labels \(\mathcal{I}\) are chosen as follows:

  - The outputs \(i^{\text{read}}, \text{translate}, X'\) are given out in the clear (in fact, \(X'\) can be inferred simply from the access pattern). For the last circuit \(j = t^{\max}\), we do not provide these outputs, but instead provide the output \(\text{state}'\) in the clear and this serves as the output of the computation. This completely fixes all of the output-wire labels for that circuit.
  - For \(j \neq t^{\max}\), the labels of the output wires corresponding to \(\text{state}', k', X'\) are set to match the labels of the input wires corresponding to \(\text{state}, k, X\) in circuit \(\tilde{C}_{\text{CPU}^+}^{P}(j + 1)\).
  - For the initial circuit \(j = t^{\text{init}} + 1\), we also hard-code the input bit \(b^{\text{read}}\) to 0, and hard-code \(k\) to be the master PRF key \(k\), and \(X\) to be the empty set.

For \(j \neq t^{\max}\), let \(l_{0}^{\text{read}, j+1}, l_{1}^{\text{read}, j+1}\) be the labels of input wire for the bit \(b^{\text{read}}\) in the \((j + 1)\)st garbled circuit. The \(j\)th garbled circuit contains the hard-coded secret values:

\[
(j, l_{0}^{\text{read}, j+1}, l_{1}^{\text{read}, j+1})
\]

Set \(\tilde{P} := \left[\tilde{C}_{\text{CPU}^+}^{P}(t^{\text{init}} + 1), \ldots, \tilde{C}_{\text{CPU}^+}^{P}(t^{\max})\right]\), and set \(k^{\text{input}} := \{(i, b, l_{b}^{\text{input}, i}) : i \in [v], b \in \{0, 1\}\}\) to consist of all of the input-wire labels for the \(v\) input wires corresponding to the input in the initial circuit \(\tilde{C}_{\text{CPU}^+}^{P}(t^{\text{init}} + 1)\).

Garbled Input. Finally, the garbled input \(\tilde{x}\) is created the same way as in garbled circuits. It simply consists of the subset of labels of \(k^{\text{input}} := \{(i, b, l_{b}^{\text{input}, i}) : i \in [v], b \in \{0, 1\}\}\) corresponding to the bits of \(x\).

\[
\tilde{x} \leftarrow \text{Ginput}(x, k^{\text{input}}): \text{Parse } k^{\text{input}} := \{(i, b, l_{b}^{\text{input}, i}) : i \in [v], b \in \{0, 1\}\}\text{ and output } \tilde{x} = (l_{x[1]}^{\text{input}}, \ldots, l_{x[v]}^{\text{input}})\text{ where } x[i] \text{ denotes the } i\text{th bit of } x\text{ and } v := |x|.
\]

Evaluation. To run \(y = \text{GEval}^{\tilde{D}}(\tilde{P}, \tilde{x})\): parse \(\tilde{P} := \left[\tilde{C}_{\text{CPU}^+}^{P}(1), \ldots, \tilde{C}_{\text{CPU}^+}^{P}(t)\right]\) as consisting of \(t\) garbled circuits. The evaluator evaluates the circuits one-by-one. Set \(\tilde{x}_1 := \tilde{x}\). For \(j = 1, \ldots, t\):

- When \(j = 1\), run \(\text{CircEval}(\tilde{C}_{\text{CPU}^+}^{P}(1), \tilde{x}_1)\) else run \(\text{CircEval}(\tilde{C}_{\text{CPU}^+}^{P}(j), (\tilde{x}_j, l_{b}^{\text{read}, j}, k, \tilde{X}))\) where \(\tilde{x}_j\) consists of labels for the garbled \(\text{state}_j\), \(\tilde{k}\) consists of labels for the garbled \(k\), and \(\tilde{X}\) consists of labels for the garbled \(X\), and \(l_{b}^{\text{read}, j}\) is a label for the read-bit in the \(j\)th CPU circuit. This reveals the outputs \(i^{\text{read}}, \text{translate}_j, X'\) in the clear. For \(j < t\) it also reveals the garbled output corresponding to the labels of \(k_{X'}\) and \(\text{state}_{j-1}\) for circuit \(j + 1\). For \(j = t\) it reveals the output of the computation \(y = \text{state}_{t+1}\) in the clear.
• Look up $\text{sk} = D[\text{translate}_j]$ and decrypt the corresponding row in $\text{translate}_j$ to obtain $\text{lbl}^{\text{read},j+1}$.

**Theorem 3.2** (Read-only Warmup to Main Theorem). Assume $F$ is a revocable PRF, concretely the revocable GGM-PRF, and suppose that $(\text{GCircuit}, \text{CircEval})$ be a projective circuit garbling scheme with wire labels with security parameter $\kappa$ (both of which can be constructed from any one-way function). Then the warm-up RAM construction above is a read-only, read-once GRAM satisfying UMA-security. Furthermore the space overhead of the garbled memory is $\text{poly}(\kappa)$, and the space overhead of the garbled CPU steps, and the running time overhead are $\text{poly}(\kappa)\cdot \text{polylog}(n) \cdot t$ where $t$ is the running time.

**Proof Sketch.**

The overhead of the garbled memory is obviously the size of the outputs of the PRF which is $\text{poly}(\kappa)$. However, because each revocation increases the size of the key by a factor of $\log n$, after $t$ steps the size of this key will be $t \log n$.

We provide a sketch at a very high level to provide the intuition for the rest of our construction. This intuition will be used for our full construction, and parallels the intuition for the alternative scheme found in Part I [6]. Since we are dealing with UMA-security, the simulator is allowed to get both the contents of memory and the access pattern, the simulator can pick its own master PRF key $k$ and almost generate an honest transcript by using the underlying garbled circuit simulator for every step. The only real difficulty is in simulating the translation table, which in a real execution contains the wire labels for both zero and one for the next CPU step. Since a simulated garbled circuit will only return one wire label, we must set that to be the one corresponding to the bit we want, and the other one must be fabricated. Indeed, the simulator sets this other dummy key to be all zeroes in the simulated execution.

Then in order to prove security, we set up a series of hybrids $\text{Hyb}_j$, where in hybrid $j$, garbled circuits $1, \ldots, j$ are created as in the simulation and garbled circuits $j + 1, \ldots, \ell^{\text{max}}$ are created as in the real distribution. We then have a series of companion hybrids $\text{Hyb}'_j$, where in $\text{Hyb}'_j$ the wire label unused row in the translation table is set to the correct label instead of all zeroes as in the simulation.

The proof then proceeds as follows. $\text{Hyb}_j \equiv \text{Hyb}'_{j+1}$ directly due to the security of garbled circuits. In order to show $\text{Hyb}_j \equiv \text{Hyb}'_j$, we rely on the security of revocable PRFs. Any distinguisher $A$ can be used to construct a pseudorandomness breaker $A'$ for $F$: since $\text{Hyb}_j$ can be generated using only a revoked key, $A'$ invoke the PRF oracle to populate the entries of $\hat{D}$ that have been revoked and plant the PRF challenge in the revoked, unused row of the translation table in CPU step $j$. If the PRF challenge was purely random, the two distributions are identical, and if it were pseudorandom, the two distributions are as in the hybrid, thus $A'$ conveys any advantage $A$ has.

4 GRAM With Reading and Writing From OWF

Notice that our scheme is not only read-only, it is read-once. This is due to the fact that once an element has been read, it gets revoked, so no future time step could ever generate a valid symmetric-key encryption for that entry in memory again. A tantalizing solution to this problem is to use a key mechanism that allows the CPU to encrypt but not decrypt something, i.e. a public-key encryption scheme. The two main issues are that 1) there is a black-box separation between OWFs and basic public-key encryption, and 2) the public key needs to be compact and yet be able to encrypt to many possible secret keys. If we insist on assuming only OWFs, it seems that 1) is difficult to overcome, however there is a beautiful resolution using identity-based encryption for issue 2 in the companion work of Part I [6].

We resolve this issue of reading multiple times via repetition. By maintaining $p$ copies of the database, we can read from any location up to $p$ times. If we restrict ourselves to RAM programs that are read-only and read-$p$, we can now obtain GRAM with the overhead of another $p$. Note that this seems like a strong restriction, but when we include RAM writing, we can relax the restriction to reading at most $p$ times between writes to a location. We can trivially relax this condition by overwriting a location after it has
been read $p$ times. We define the following notion of bounded reads for RAM programs (with read and write).

**Bounded Reads.** We impose another restriction on the number of read instructions that can be performed on a given location before a new value is written to that location. We say that a program has $p$-bounded reads ($p$-bdReads) if no location is read more than $p$ times before it is overwritten. Formally:

**Definition 4.1 ($p$-Bounded Reads ($p$-bdReads)).** A program execution $P(D)(x)$ has $p$-Bounded Reads ($p$-bdReads) if the following holds for each location $i \in [n]$. For any set of $p+1$ CPU instructions $j_1 < \ldots < j_{p+1}$, let the inputs/outputs of the $j$-th CPU step be $C^P_{CPU}(state_{j}, i^{\text{read}}_{j}) = (state_{j+1}, i^{\text{read}}_{j}, i^{\text{write}}_{j})$. If $i^{\text{read}}_{\ell} = i$ for all $\ell = 1, \ldots, p + 1$ then there must exist a CPU instruction $j^*$ (not necessarily equal to any $j_{\ell}$) with $j_1 \leq j^* < j_{p+1}$ such that $i^{\text{write}}_{j^*} = i$. Similarly, we define a $p$-bdReads property for a sequence of program executions $(P_1(x_1), \ldots, P_\ell(x_\ell))^D$ if the above property holds across all the conjoined CPU steps in the sequence.

In order to enable writing, we again rely on the $p$-writes restriction on RAM. Whenever we want to write a bit $b$ to a location $i$ at time $j$, we simply write $F$ evaluated on $(j, i, b)$ under $p$ different keys to $D$. Whenever we want to read from some location $i$, we discover the time $u$ in which it was last written to, and proceed as before.

### 4.1 Overview of Construction

We describe at a high level how to construct a garbled RAM scheme for programs with $p$-writes and $p$-bdReads. We maintain a “key schedule” $K$ that is fed as garbled input to each garbled augmented CPU step, which then processes it and outputs an updated garbled $K'$. This key schedule is an array of $p$ (possibly with different revocation sets) PRF keys. Each entry of the key schedule has a revocation set $X$ associated with it that is known in the clear (we omit writing this $X$ every time for each key, and implicitly allow a key $k_{X}$ to reveal $X$).

Whenever the $j$-th CPU step wants to issue a read from a location $i$ which was last written to in step $u$ and this is the $\alpha$-th read to that location since $u$, then it picks the key $K[\alpha|X \ni (u, i, 0), (u, i, 1)] \notin X$. This is the key by which it will output (among other things) translate, and it updates $K$ to $K'$ by updating $K[\alpha|X]$ to $K[\alpha|X']$ by puncturing $(u, i, 0)$ and $(u, i, 1)$ from the key (so $X' = X \cup \{(u, i, 0), (u, i, 1)\}$). This is where we avoid the circularity: the translation table can only be decoded by this key, but this key will have the critical values punctured, so all future keys in the key schedule still computationally hide the table. Since there are $p$ values in $K$, a location can be read up to $p$ times before it is overwritten.

Whenever the $j$-th CPU step wants to write a bit $b$ to a location $i$, it computes $F_k(j, i, b)$ for every $k = K[1], \ldots, K[p]$. Since $j$ is brand new, these values could not have possibly been revoked.

### 4.2 Detailed Construction

We now give the detailed construction of our GRAM scheme under any one-way function for programs with $p$-writes and $p$-bdReads satisfying UMA (unprotected memory access) security for multi-program execution with persistent memory. Let $F$ be the revocable GGM-PRF described above, which can be based on any one-way function, and let $\mathcal{K}$ be the keyspace for $F$. Let $(\text{GCircuit}, \text{CircEval})$ be a projective circuit garbling scheme with wire labels. The master key will be the fresh key schedule $K[1] \leftarrow \mathcal{K}, \ldots, K[p] \leftarrow \mathcal{K}$. We also keep an array of sets corresponding to revoked values $X[1], \ldots, X[p]$ (implicitly), and we write $K[\ell]_X$
whenever we want to alert the reader that this is not necessarily a fresh PRF key. One minor drawback is that these sets $X$ need to be carried across program boundaries in the case of multiple program executions in order to ensure we do not run into circularity.

**Garbled Data.** The garbled data $\tilde{D}$ is a $n \times p$ matrix of “secret keys”, which are just outputs of $F$ as we will describe. We have a fresh key schedule, $K$, an array of $p$ PRF keys as our master secret.

- $\tilde{D} \leftarrow \text{GData}(D, K)$: For each $i \in [n]$, $j \in [p]$ set $\tilde{D}[i, j] := sk$ where $sk = F_{K[j]}(0, i, D[i])$.

**Garbled Program.** We describe the augmented-CPU-step circuit $C_{\text{CPU}}^P$ for the program $P$ in Figure 2. The garbled RAM program for $P$ will consist of $t$ copies of a garbled augmented-CPU-Step circuit

| Input: $(\text{state}, b_{\text{read}}, k_{\text{input}})$ | Output: $(\text{state}', i_{\text{read}}, i_{\text{write}}, \{sk_{\text{write}, 1}, \ldots, sk_{\text{write}, p}\}, \text{translate}, k_{\text{output}})$ |
| Hard-Coded Parameters: $j, lb_0|_{\text{read}}, lb_1|_{\text{read}}$ |

The circuit $C_{\text{CPU}}^P$ performs the following computation:

- $(\text{state}', i_{\text{read}}, i_{\text{write}}, b_{\text{write}}) := C_{\text{CPU}}^P(\text{state}, b_{\text{read}})$ are the outputs of the basic CPU-step circuit.
- If $i_{\text{write}} = \perp$ then set $sk_{\text{write}, \ell} := \perp$ for $\ell = 1, \ldots, p$. Else set $sk_{\text{write}, \ell} := F_{k_{\text{output}}[\ell]}(j, i_{\text{write}}, b_{\text{write}})$.
- Compute $u := \text{WriteTime}(j, \text{state}, i_{\text{read}})$. Find the entry $\alpha$ of $K$ for which $(u, i_{\text{read}}, 0), (u, i_{\text{read}}, 1) \notin X[\alpha]$ (not revoked), which we will call $k^* = k_{\text{input}}[\alpha] | X$. For $b \in \{0, 1\}$, compute $ct_b := F_{k^*}(u, i_{\text{read}}, b) \oplus lb_b_{\text{read}}$. Set translate $:= (ct_0, ct_1)$. Update $k_{\text{output}} = k_{\text{input}}$ except revoke the values $(u, i_{\text{read}}, 0), (u, i_{\text{read}}, 1)$ from $k_{\text{input}}[\alpha] | X$, resulting in $X[\alpha] := X[\alpha] \cup \{(u, i_{\text{read}}, 0), (u, i_{\text{read}}, 1)\}$.

*Figure 2: The Augmented CPU-Step Circuit*

$\tilde{C}_{\text{CPU}}^P(j)$. As before, the labels for the output wires in each circuit are chosen carefully so that some wire values are revealed in the clear while others remain garbled for the next circuit.

- $(\tilde{P}, k_{\text{input}}) \leftarrow \text{GProg}(P, K, n, t_{\text{init}}, t_{\text{cur}})$: Let $t_{\text{max}} := t_{\text{init}} + t_{\text{cur}}$. We let $j$ count down from $j = t_{\text{max}}$ to $t_{\text{init}} + 1$. For each $j$, we garble $C_{\text{CPU}}^P$ by calling $\tilde{C}_{\text{CPU}}^P(j) \leftarrow \text{GCircuit}(1^n, C_{\text{CPU}}^P, \tilde{b})$ where the output labels $\tilde{b}$ are chosen as follows:
  - The outputs $i_{\text{read}}, i_{\text{write}}, sk_{\text{write}}, \text{translate}$ are given out in the clear. For the last circuit $j = t_{\text{max}}$, we do not provide these outputs, but instead provide the output state' in the clear and this serves as the output of the computation. This completely fixes all of the output-wire labels for that circuit.
  - For $j \neq t_{\text{max}}$, the labels of the output wires corresponding to state' are set to match the labels of the input wires corresponding to state in circuit $\tilde{C}_{\text{CPU}}^P(j + 1)$.
  - For the initial circuit $j = t_{\text{init}} + 1$, we also hard-code the input bit $b_{\text{read}}$ to 0, and $k_{\text{input}} = K$ but with the appropriate “carried over” $X$ values revoked.

For $j \neq t_{\text{max}}$, let $lb_0_{\text{read}, j + 1}$, $lb_1_{\text{read}, j + 1}$ be the labels of input wire for the bit $b_{\text{read}}$ in the $(j + 1)$st garbled circuit. The $j$th garbled circuit contains the hard-coded secret values:

$(j, lb_0_{\text{read}, j + 1}, lb_1_{\text{read}, j + 1})$.

Set $\tilde{P} := \left[ \tilde{C}_{\text{CPU}}^P(t_{\text{init}} + 1), \ldots, \tilde{C}_{\text{CPU}}^P(t_{\text{max}}) \right]$, and set $k_{\text{input}} := \{(i, b, lb_b_{\text{input}, i}) : i \in [v], b \in \{0, 1\}\}$ to consist of all of the input-wire labels for the $v$ input wires corresponding to the input state in the initial circuit $\tilde{C}_{\text{CPU}}^P(t_{\text{init}} + 1)$. 

13
Garbled Input. Finally, the garbled input $\hat{x}$ is created the same way as in garbled circuits. It simply consists of the subset of labels of $k^{input} := \{(i, b, lb_b^{input,i}) : i \in [v], b \in \{0, 1\}\}$ corresponding to the bits of $x$.

- $\hat{x} \leftarrow \text{GInput}(x, K, k^{input})$: Parse $\tilde{\text{input}} := \{(i, b, lb_b^{input,i}) : i \in [v], b \in \{0, 1\}\}$ and output $\hat{x} = (lb_x^{input,1}, \ldots, lb_x^{input,v})$ where $x[i]$ denotes the $i$th bit of $x$ and $v := |x|$.

Evaluation. To run $y = \mathcal{G}\text{Eval}^{\hat{P}}(\tilde{P}, \hat{x})$: parse $\tilde{P} = [\tilde{C}^{P}_{\text{CPU}+,1}, \ldots, \tilde{C}^{P}_{\text{CPU}+}(t)]$ as consisting of $t$ garbled circuits. The evaluator evaluates the circuits one-by-one. Set $\tilde{x}_1 := \hat{x}$. For $j = 1, \ldots, t$:

- When $j = 1$, run $\text{CircEval}(\tilde{C}^{P}_{\text{CPU}+,1}, \tilde{x}_1)$ else run $\text{CircEval}(\tilde{C}^{P}_{\text{CPU}+}(j), (\tilde{x}_j, \tilde{K}_j, \text{lbl}^{\text{read},j}))$ where $\tilde{x}_j$ consists of labels for the garbled state $j$, $\tilde{K}_j$ consists of labels for the garbled $K_j$, and $\text{lbl}^{\text{read},j}$ is a label for the read-bit in the $j$th CPU circuit. This reveals the outputs $j^{\text{read},1}, j^{\text{write},1}, \ldots, j^{\text{write},p}$, translate $j = (ct^{(j)}_0, ct^{(j)}_1)$ in the clear. For $j < t$ it also reveals the garbled output $\hat{x}_{j+1}$ and $\tilde{K}_{j+1}$ corresponding to the labels of state $j$ for circuit $j+1$. For $j = t$ it reveals the output of the computation $y = \text{state}_{t+1}$ in the clear.

- For $\ell = 1, \ldots, p$, look up $\text{sk}^{\text{read}}$, $\ell_j = \tilde{D}^{\text{read},1, \ell}$. WLOG we can efficiently tell when we have a valid wire label, so we try all $2p$ possibilities: there is some unique $\ell^* \in [p]$ and $b^* \in \{0, 1\}$, such that $\text{sk}_{j^{\ell^*}}^{\text{read},1} = (ct_{b^*})$ is a valid label, which we will call $\text{lbl}^{\text{read},j+1}$.

- If $j^{\text{write}} \neq \bot$, update the row $\tilde{D}^{j^{\text{write}}} := \{\text{sk}_{j^{\text{write}},1}, \ldots, \text{sk}_{j^{\text{write},p}}\}$.

4.3 Proof of Security

We now state a technical theorem.

Theorem 4.2. Given any OWF and a secure projective circuit garbling scheme (which can be built from the OWF), the above construction is a UMA (unprotected memory access) secure garbled RAM scheme for all program executions with ptWrites and p-bdReads. Furthermore, it is also secure in the setting of multi-program execution with persistent memory. If $p$ is $O(\text{polylog}(n))$, the garbled memory size is $\tilde{O}(n)$, the garbled program size is $\tilde{O}(t^2)$, and running time of evaluating the garbled program is $\tilde{O}(t^2)$ where $\tilde{O}$ denotes big-O up to poly($\kappa$)polylog($n$) terms.

The full proof appears in Appendix A.

5 Recursive Construction

In the previous section, we showed how to achieve a garbled data size of $\tilde{O}(n)$, a garbled program size of $\tilde{O}(t^2)$ (again observing that there are $t$ CPU steps, and $K$ can grow to size $\tilde{O}(t)$) and online running time of $\tilde{O}(t^2)$ where $\tilde{O}$ denotes big-O up to poly($\kappa$)polylog($n$) terms. In this section, we describe a way to use a recursive construction that gives an garbled program size of size of $\tilde{O}(t \cdot n^c)$ and running time of $\tilde{O}(t \cdot n^c)$ where $c$ is a constant depending on the security parameter as we describe below.

We consider a balancing technique to further lower the overhead: observe that we can re-encode everything under new PRF keys more frequently if we do not want to wait $t$ steps and let the keys $K$ grow to size $t \log n$. Suppose, for example, after $\sqrt{n}$ steps we overwrite the entire contents of memory with a single big “refresh” garbled circuit with fresh PRF keys. This ensures that each CPU step is no larger than $\tilde{O}(\sqrt{n})$, but now we incur an extra $\tilde{O}(n)$-sized circuit every $\sqrt{n}$ steps, a total of $t/\sqrt{n}$ such refreshes across the full execution. Thus, the total size of the garbled program we calculate as follows: there are $t$ CPU steps, each of size at most $\tilde{O}(\sqrt{n})$, and we incur also an additional $\tilde{O}(t/\sqrt{n})$ refresh steps, each of size $\tilde{O}(n)$, giving a total of $\tilde{O}(t\sqrt{n})$. 

14
Pretend we have a CPU of size $f(n)$, then we only need a refresh of size $n$ every $f(n)$ steps, but we also need to simulate this large CPU with a smaller CPU. In order to do so, we must emulate the CPU as a RAM program. We can simply convert the circuit into a RAM program generically, and then apply ORAM to guarantee the properties we need: ptWrites and polylog$(n)$-bdReads, which incurs a poly-log overhead. We can then simulate this CPU using our GRAM construction. This simulation loses a factor of poly$(\kappa)\text{polylog}(n)$ so it needs to be carefully balanced against the CPU size $f$. If $C(n)$ denotes the average running time of emulating an ordinary CPU step with a smaller GRAM of size $f(n)$, we have a recurrence relation

$$C(n) = \frac{n}{f(n)} + \text{poly}(\kappa)\text{polylog}(n) \cdot C(f(n)).$$

The left summand is for the average running time for the refresh steps, and the right summand is average running time of emulating an ordinary CPU step with a smaller GRAM of size $f(n)$. In the case where $\kappa$ is polylog$(n)$ (for exponential size RAM), this can be solved by setting $C(n) = 2\sqrt{2 \log n \log \text{poly}(\kappa)\text{polylog}(n)}$ and $f$ appropriately. Thus the amortized overhead is less than $n^c$ for any $c > 0$. On the other hand, if $\kappa$ is polynomially related to $n$, then suppose poly$(\kappa)\text{polylog}(n)$ is $O(n^d)$, then we can solve the recurrence by setting $f$ to be $2n^a$ and $C(n)$ to be $n^\epsilon$ as follows:

$$\epsilon = \frac{n}{2n^a} + n^d \cdot (2n)^{\epsilon a}$$

which we balance as

$$\frac{n^\epsilon}{2} = \frac{n}{2n^a}$$

and

$$\frac{n^\epsilon}{2} = n^d \cdot (2n)^{\epsilon a}$$

Which is then solved as $a = 1 - \epsilon$ and $d = \epsilon^2 + \epsilon \cdot \frac{\epsilon - 1}{\log n}$. By appropriately choosing the security parameter, $\epsilon$ can be made arbitrarily small. We therefore obtain the following theorem.

**Theorem 5.1.** Given any OWF and a secure projective circuit garbling scheme (which can be built from the OWF), the above construction is a UMA (unprotected memory access) secure garbled RAM scheme for all program executions with ptWrites and $p$-bdReads. Furthermore, it is also secure in the setting of multi-program execution with persistent memory. If $p$ is $O(\text{polylog}(n))$, the garbled data size is $\tilde{O}(n)$, the garbled program size is $\tilde{O}(t \cdot n^\epsilon)$, and running time of evaluating the garbled program is $\tilde{O}(t \cdot n^\epsilon)$ where $\tilde{O}$ denotes big-O up to $\text{poly}(\kappa)\text{polylog}(n)$ terms.

Combining this with the following lemma, we obtain our main theorem.

**Lemma 5.2.** Any garbled RAM scheme $G$ that provides UMA security and supports programs with ptWrites and $p$-bdReads (where $p$ is polylog$(n)$) can be extended to a scheme $G'$ with full security supporting arbitrary programs with at most $\text{poly}(\kappa)\text{polylog}(n)$ overhead.

The proof of this lemma is of the same spirit of the security upgrade found in Part I [6]. For completeness, we include the proof in Appendix B.

We then obtain our main theorem, where we balance the running time with $n^\epsilon$.

**Theorem 5.3 (Main Theorem).** Given any OWF and a secure projective circuit garbling scheme (which can be built from the OWF), there exists a garbled RAM scheme that is fully secure supporting arbitrary programs, and also secure in the setting of multi-program execution with persistent memory. The resulting garbled memory size is $\tilde{O}(n)$, the garbled CPU steps size is $\tilde{O}(t \cdot \min(t, n^\epsilon))$, and running time of evaluating the garbled program is $\tilde{O}(t \cdot \min(t, n^\epsilon))$ where $\tilde{O}$ denotes big-O up to $\text{poly}(\kappa)\text{polylog}(n)$ terms.
6 Conclusions and Open Problems

In this paper, we showed how to construct Garbled RAM programs from one-way functions where, ignoring \(\text{poly}(\kappa)\text{polylog}(n)\) factors, the garbled memory size stays the same, and the garbled CPU size and running time grow by a factor of \(\min(t, n^\epsilon)\). It remains open whether or not \(\text{poly}(\kappa)\text{polylog}(n)\) can be achieved just under one-way functions.

In our construction, we note that although the program code can be compactly garbled, the CPU steps that are needed to run the program grow with the number of steps. The recent work of Goldwasser et al. [10] have shown how to compactly transmit a Turing Machine that can be run on encrypted data, though under stronger assumptions. We leave as an open problem the question of constructing a compact Garbled RAM where the size does not grow in the number of steps executed under just one-way functions.

References


A Proof of Theorem 5.3

Correctness follows from the definition. We calculate the efficiency as follows: $\tilde{D}$ stores $n \cdot p$ “keys” which are PRF outputs. Each CPU step must be large enough to keep the key schedule which has $p$ PRF keys, and since each time a key gets punctured it grows by $\log n$, after $t$ steps the key schedule may take up to $p \cdot t$ keys. Then during evaluation, the running time is as large as the space since the entire circuit is evaluated. If $p$ is $\text{polylog}(n)$, then our claim follows. Now we prove security.

Let $\text{CircSim}$ be the simulator of the circuit garbling scheme. We describe a simulator $\text{Sim}$ for our garbled program:

**Input:** Under UMA-security, $\text{Sim}$ gets the inputs $\{P_a, t_a, y_a\}_{a=1}^{\ell}, D, \text{MemAccess} = \{(i_{j}^{\text{read}}, i_{j}^{\text{write}}, i_{j}^{\text{write}})\}_{j=1}^{t_{\text{max}}}$, where program $P_a$ executes $t_a$ CPU steps and output $y_a$, the initial memory contents are $D \in \{0, 1\}^n$ and $\text{MemAccess}$ describes the entire memory access throughout all $t_{\text{max}} = \sum_{a=1}^{\ell} t_a$ CPU steps executed.

**Output:** The simulator $\text{Sim}$ outputs: $\tilde{D}, \tilde{P}_1, \ldots, \tilde{P}_\ell, \tilde{x}_1, \ldots, \tilde{x}_\ell, \tilde{K}_1, \ldots, \tilde{K}_\ell$.

**Initialization:** The simulator $\text{Sim}$ samples the master key schedule $K[1] \leftarrow \mathcal{K}, \ldots, K[p] \leftarrow \mathcal{K}$.
Garbled Data: It creates $\bar{D}$ using the honest process.

Garbled Programs/Inputs: The simulator Sim processes each program $a \in [\ell]$ separately. Let $j_a^{\text{init}} = \sum_{b=1}^{a-1} t_a + 1$ be the first CPU step executed by the program $P_a$ and let $j_a^{\text{max}} = j_a^{\text{init}} + t_a - 1$ be the last step. The simulator starts from $j = j_a^{\text{max}}$ and counts down to $j = j_a^{\text{init}}$. For each $j$:

- For the last circuit $j = j_a^{\text{max}}$, create $\bar{C}_{\text{CPU}^+}(j)$ by calling CircSim on the circuit $C_{\text{CPU}^+}$ with the output-labels of state$_{j+1}$ set to the value $y_a$ in the clear. This produces some set of input labels for the input state$_j$, they key schedule $K_j$, and the bit $b_j^{\text{read}}$.

- For any other $j \neq j_a^{\text{max}}$, create $\bar{C}_{\text{CPU}^+}(j)$ by calling CircSim on the circuit $C_{\text{CPU}^+}$ where the output-labels of $i_j^{\text{read}}, i_j^{\text{write}}, \{s_j^{\text{write},1}, \ldots, s_j^{\text{write},p}\}$, translate are given “in the clear” and the output-labels of of the updated state$_{j+1}$ are set to match the input labels for state$_{j+1}$ given by the circuit-simulator for the circuit $j + 1$. The actual values $\{s_j^{\text{write},1}, \ldots, s_j^{\text{write},p}\}$, translate are computed via:
  - If $i_j^{\text{write}} = \bot$ then set $s_j^{\text{write}} := \bot$. Otherwise, set $s_j^{\text{write}} = F_{K_j}(j, i_j^{\text{write}}, b_j^{\text{write}})$.
  - Let $u < j$ be the last write-time to location $i_u^{\text{read}}$ and let $b = i_u^{\text{write}}$ be the bit written to the location at time $u$. Let $\alpha$ be the $\alpha$-th time this location has been read since $u$. The values $u, b, \alpha$ can be easily computed given MemAccess. Set $K^* = K[\alpha]$ (which corresponds to the entry in the key schedule from which the CPU will read), and set:
    $$ct_b := F_{K^*}(u, i_u^{\text{read}}, b) \oplus lb^{\text{read},j+1},\quad ct_{1-b} := F_{K^*}(u, i_u^{\text{read}}, 1 - b) \oplus 0^{\text{nil}}$$

where $lb^{\text{read},j+1}$ is the label of the “read-bit” wire given by the circuit-simulator for the circuit $j + 1$ and $\text{nil}$ is the label-length. Set translate := ($ct_0, ct_1$).

Set $\bar{x}$ to be the input labels created by the circuit-simulator for the input state of the initial circuit, and similarly for $\bar{K}$.

Our proof follows the overall structure of [6] where we argue indistinguishability of the real output and the output of Sim by using a series of hybrid distributions $\text{Hyb}_j$ for $j = 1, \ldots, t_{\text{max}}$. In the hybrid $j$, garbled circuits $1, \ldots, j$ are created as in the simulation and garbled circuits $j + 1, \ldots, t_{\text{max}}$ are created as in the real distribution.

We also define a hybrid distribution $\text{Hyb}_j'$ which is like $\text{Hyb}_j$ except for the simulation of the $j$th CPU-step circuit (except for $j = t_{\text{max}}$ for which we set $\text{Hyb}_j' = \text{Hyb}_j$). Instead of choosing translate as in the simulation described above, we choose translate = ($ct_0, ct_1$) to both be encryptions of the correct label of the next circuit:

$$ct_0 := F_{K^*}(u, i_u^{\text{read}}, 0) \oplus lb_0^{\text{read},j+1},\quad ct_1 := F_{K^*}(u, i_u^{\text{read}}, 1) \oplus lb_1^{\text{read},j+1}$$

where $lb_0^{\text{read},j+1}, lb_1^{\text{read},j+1}$ are the labels corresponding to the bits 0,1 for the wire $b^{\text{read}}$ in circuit $j + 1$ which is still created using the real garbling procedure.

Notice that $\text{Hyb}_0$ is equal to the real distribution and $\text{Hyb}_{t_{\text{max}}}$ is equal to the simulated distribution. Therefore, we prove the theorem by showing that for each $j$, we have:

$$\text{Hyb}_j \approx \text{Hyb}_{j+1} \approx \text{Hyb}_{j+1}'$$

We prove this in the following two claims.

**Claim A.1.** For each $j \in \{0, \ldots, t_{\text{max}}\}$ we have $\text{Hyb}_j \approx \text{Hyb}_{j+1}'$.

**Proof.** This follows directly from the security of the circuit-garbling scheme applied only to the garbled augmented-CPU circuit $j + 1$. \qed
Claim A.2. For each \( j \in \{1, \ldots, t_{\text{max}}\} \) we have \( \text{Hyb}_{j}^{\text{comp}} \approx \text{Hyb}_{j} \).

Proof. This follows from the security of our revocable GGM-PRF scheme. The only difference between \( \text{Hyb}_{j}^{\prime} \) and \( \text{Hyb}_{j} \) is the value of \( \text{translate}_{j} = (c_{0}, c_{1}) \) used to simulate the \( j \)th garbled circuit. Let \( b = b_{j+1}^{\text{read}} \) be the value of the read-bit in location \( i_{j}^{\text{read}} \) in the real computation. Then, in \( \text{Hyb}_{j}^{\prime} \) we set

\[
ct_{b} := F_{k^{\star}}(u, i_{j}^{\text{read}}, b) \oplus lb_{b}^{\text{read},j+1}, \quad ct_{1-b} := F_{k^{\star}}(u, i_{j}^{\text{read}}, 1 - b) \oplus lb_{1-b}^{\text{read},j+1}
\]

whereas in \( \text{Hyb}_{j} \) we set

\[
ct_{b} := F_{k^{\star}}(u, i_{j}^{\text{read}}, b) \oplus lb_{b}^{\text{read},j+1}, \quad ct_{1-b} := F_{k^{\star}}(u, i_{j}^{\text{read}}, 1 - b) \oplus 0^{|\text{lbl}|}
\]

where \( u < j \). Observe the following, where we consider the key schedule and how it interacts with \( \tilde{D} \):

- When \( b \) was written to \( \tilde{D} \) in CPU step \( u \), the PRF was evaluated on \( (u, i_{j}^{\text{read}}, b) \) under \( K[1], \ldots, K[p] \), one of which is \( k^{\star} \).
- For the real CPU steps after \( j \), \( (u, i_{j}^{\text{read}}, b) \) has been revoked from \( k^{\star} \).
- If this location is written to again, it will be under some different time \( u' \neq u \), where it can then be further read again.

Suppose on the contrary that some adversary \( \mathcal{A} \) is able to distinguish the two hybrids. Then we devise an adversary \( \mathcal{A}' \) that will break the pseudorandomness of \( F \) as follows. \( \mathcal{A} \) produces an input and memory that it wishes to be challenged upon. Using \text{MemAccess}, \( \mathcal{A}' \) will determine exactly which \( k^{\star} \) will be used and what values have already been revoked from \( K \). It asks for these partially-revoked keys from the PRF challenger, and therefore can construct all subsequent real steps after \( j \). In order to obtain the \( \text{sk} \) output by some earlier CPU step in order to simulate it, \( \mathcal{A}' \) asks the PRF challenger to evaluate \( F \) at those points.

Note by the definition of \( k^{\star} \) it will never be the case that \( \mathcal{A}' \) asks the PRF oracle for \( (u, i_{j}^{\text{read}}, 0) \) or \( (u, i_{j}^{\text{read}}, 1) \). Then in order to generate \( \text{translate} \), \( \mathcal{A}' \) asks the oracle to evaluate \( F_{k^{\star}}(u, i_{j}^{\text{read}}, b) \) on the correct read bit \( b \), and asks to be challenged on \( (u, i_{j}^{\text{read}}, 1 - b) \). \( \mathcal{A}' \) gets from the PRF challenger a value \( R \) which is either real or random, and \( \mathcal{A}' \) completes the translation table by picking a challenge bit \( c \) and if \( c = 0 \) sets the ciphertext to be \( R \oplus lb_{1-b}^{\text{read},j+1} \) and \( R \) otherwise. Clearly, the advantage of \( \mathcal{A} \) is then inherited by \( \mathcal{A}' \) since these two distributions perfectly indistinguishable if \( R \) is random, and identical to the hybrids if \( R \) is real. Since the PRF key \( k^{\star} \) is punctured after step \( j \), indistinguishability follows from the pseudorandomness of \( F \).

\[ \square \]

B Upgrading to Full Security/Functionality

This section is largely a restatement from Part I [6] with the key difference of bdReads.

We now describe a general transformation from any garbled RAM scheme that only provides UMA security and only supports program executions with ptWrites and bdReads into a fully secure garbled RAM scheme for arbitrary programs. This transformation uses oblivious RAM (ORAM) to first compile the original program \( P \) into a new program \( P' \) that stores/accesses its memory using ORAM. This ensures that the memory contents and access pattern of the compiled program do not reveal anything about those of the original program. For simplicity, we assume the ORAM scheme already ensures that the compiled program satisfies the ptWrites and \( p \)-bdReads property. Indeed, many ORAM scheme do satisfy these with \( p = \text{polylog}(n) \), e.g. Ostrovsky’s hierarchical scheme [18]. Now we can simply apply our original UMA-secure garbled RAM scheme for ptWrites on this compiled program to get a fully secure solution.
The Compiler. Let $G = (G_{\text{Data}}, G_{\text{Prog}}, G_{\text{Input}}, G_{\text{Eval}})$ be any garbled RAM scheme that provides UMA security and only supports program executions with ptWrites and $p$-bdReads. Let $O = (O_{\text{Data}}, O_{\text{Prog}})$ be any ORAM that guarantees ptWrites and $p$-bdReads. We define a garbled RAM scheme $G'$ which first applies the ORAM $O$ to the program to make it oblivious and satisfy ptWrites and $p$-bdReads, and then uses $G$ to garble it. In detail, we define $G' = (G_{\text{Data}}', G_{\text{Prog}}', G_{\text{Input}}', G_{\text{Eval}})$ as follows:

- $G_{\text{Data}}'(D, k_1 = (k_1, k_2))$: Call $D^* \leftarrow O_{\text{Data}}(D, k_1), \tilde{D}^* \leftarrow G_{\text{Data}}(D^*, k_2)$. Output $\tilde{D}^*$.

- $G_{\text{Prog}}'(P, k_1 = (k_1, k_2), n, t_{\text{init}}, t_{\text{cur}})$: Call $P^* \leftarrow O_{\text{Prog}}(P)$ and $(\tilde{P}^*, k_2^{\text{input}}) \leftarrow G_{\text{Prog}}(P^*, k_2, n, t_{\text{init}}^*, t_{\text{cur}}^*)$ where $t_{\text{init}}^*, t_{\text{cur}}^*$ are the updated times with the overhead of the ORAM scheme. Output $\tilde{P}^*, k_{\text{input}} = (k_1, k_2^{\text{input}})$.

- $G_{\text{Input}}'(x, k_{\text{input}} = (k_1, k_2^{\text{input}}))$: Output $\tilde{x}^* \leftarrow G_{\text{Input}}((x, k_1), k_2^{\text{input}})$.

**Theorem B.1.** If $G$ is a garbled RAM scheme that provides UMA security and supports programs with ptWrites and $p$-bdReads and $O$ is an ORAM with ptWrites then $G'$ is a garbled RAM with full security and supporting arbitrary programs.

**Proof.** It is clear that the use of ORAM with ptWrites and $p$-bdReads in the above construction ensures that $G$ is only used on program executions that satisfy ptWrites. Therefore, we only need to prove that $G'$ provides security. Let $\text{Sim}_1$ be the ORAM simulator and let $\text{Sim}_2$ be the simulator for $G$. Then we define the simulator $\text{Sim}'$ for $G'$ which first calls $(D^*_\text{sim}, \text{MemAccess}_\text{sim}) \leftarrow \text{Sim}_1(1^\kappa, n, \{t_i\}_{i=1}^\ell)$ to compute the simulate data and access pattern, and then outputs $\text{Sim}_2(1^\kappa, \{P^*_i, t^*_i, y_i\}_{i=1}^\ell, D^*_\text{sim}, \text{MemAccess}_\text{sim})$ where $t^*_i$ are the updated running times after applying ORAM. Security follows from two simple hybrid arguments:

- By the security of $G'$, the real distribution is indistinguishable from

  $\text{Sim}_2(1^\kappa, \{P^*_i, t^*_i, y_i\}_{i=1}^\ell, D^*, \text{MemAccess})$

  where $D^*, \text{MemAccess}$ are the “real” data and access pattern produced by the oblivious RAM scheme.

- By the security of the ORAM scheme $O$, we know that $(D^*, \text{MemAccess})$ is indistinguishable from the simulated $(D^*_\text{sim}, \text{MemAccess}_\text{sim})$.

This completes the proof. \qed