# Structured Encryption and Controlled Disclosure

 Structured Encryption and Controlled Disclosure Authors Melissa Chase Seny Kamara Published 2011 Download original

Abstract. We consider the problem of encrypting structured data (e.g., a web graph or a social network) in such a way that it can be efficiently and privately queried. For this purpose, we introduce the notion of structured encryption which generalizes previous work on symmetric searchable encryption (SSE) to the setting of arbitrarily-structured data.

We present a model for structured encryption, a formal security definition and several efficient constructions.We present schemes for performing queries on two simple types of structured data, specifically lookup queries on matrix-structured data, and search queries on labeled data. We then show how these can be used to construct efficient schemes for encrypting graph data while allowing for efficient neighbor and adjacency queries.

Finally, we consider data that exhibits a more complex structure such as labeled graph data (e.g., web graphs). We show how to encrypt this type of data in order to perform focused subgraph queries, which are used in several web search algorithms. Our construction is based on our labeled data and basic graph encryption schemes and provides insight into how several simpler algorithms can be combined to generate an efficient scheme for more complex queries.

## Introduction

The most common use of encryption is to provide confidentiality by hiding all useful information about the plaintext. Encryption, however, often renders data useless in the sense that one loses the ability to operate on it. In certain settings this is undesirable and one would prefer encryption schemes that allow for some form of computation over encrypted data.

One example is in the context of remote data storage, or so-called “cloud storage”, where a data owner wishes to store structured data (e.g., a collection of web pages) on an untrusted server and only retain a constant amount of information locally. To guarantee confidentiality, the owner could encrypt the data before sending it to the server but this approach is unsatisfactory because the data loses its structure and, in turn, the owner loses the ability to query it efficiently.

To address this problem we introduce the notion of structured encryption. A structured encryption scheme encrypts structured data in such a way that it can be queried through the use of a query-specific token that can only be generated with knowledge of the secret key. In addition, the query process reveals no useful information about either the query or the data. An important consideration in this context is the efficiency of the query operation on the server side. In fact, in the context of cloud storage, where one often works with massive datasets, even linear time operations can be infeasible.

Roughly speaking, we view structured data as a combination of a data structure ${\displaystyle \delta \!}$ and a sequence of data items ${\displaystyle ~m=(m_{1},\ldots ,m_{n})}$ such that ${\displaystyle \delta \!}$ encodes the data’s structure and ${\displaystyle ~m}$ represents the actual data. For example, in the case of graph-structured data such as a social network, ${\displaystyle \delta \!}$ is a graph with ${\displaystyle ~n}$ nodes and the ${\displaystyle ~i}$th element of ${\displaystyle ~m}$ is the data associated with node ${\displaystyle ~i}$. To query the data efficiently, one queries ${\displaystyle \delta \!}$ to recover a set of pointers ${\displaystyle ~I\subseteq [1,n]}$ and then retrieves the items in ${\displaystyle ~m}$ indexed by ${\displaystyle ~I}$.

At a high level, a structured encryption scheme takes as input structured data ${\displaystyle (\delta ,m)~\,\!}$and outputs an encrypted data structure ${\displaystyle \gamma \!}$ and a sequence of ciphertexts ${\displaystyle ~c=(c_{1},\ldots ,c_{n})}$ . Using the private key, a token ${\displaystyle \tau \!}$ can be constructed for any query such that pointers to the encryptions of ${\displaystyle ~(m_{i})_{i\in I}}$ can be recovered from ${\displaystyle \gamma \!}$ and ${\displaystyle \tau \!}$. Furthermore, given the private key, one can decrypt any ciphertext ${\displaystyle ~c_{i}}$.

A certain class of symmetric searchable encryption (SSE) schemes [1] [2] [3] can be viewed as structured encryption schemes for the special purpose of private keyword search over encrypted document collections. Of course, the functionality provided by structured encryption can be achieved using general techniques like oblivious RAMs [4], secure two-party computation [5] and fully-homomorphic encryption (FHE) [6]. In our context, however, we are interested in solutions that are non-interactive and, at worst, linear in the number of data items as opposed to linear in the length of the data. All the schemes described in this work are non-interactive and optimal in that the query time is linear in the number of data items to be returned.

Informally, a basic notion of security for structured encryption guarantees that (1) an encrypted data structure${\displaystyle \gamma \!}$ and a sequence of ciphertexts ${\displaystyle ~c}$ reveal no partial information about the data ${\displaystyle ~m}$; and that (2) given, in addition, a sequence of tokens ${\displaystyle ~(\tau _{1},\ldots ,\tau _{t})}$ for queries ${\displaystyle ~q=(q_{1},\ldots ,q_{t})}$ no information is leaked about either ${\displaystyle ~m}$ or ${\displaystyle ~q}$ beyond what can be inferred from some limited leakage which is a function of ${\displaystyle \delta \!}$, ${\displaystyle ~m}$ and ${\displaystyle ~q}$. A stronger notion, introduced in [3], guarantees that (2) holds even when the queries are generated adaptively.

All known constructions that can be considered efficient structured encryption schemes (i.e., the index-based SSE schemes) reveal some limited information about the data items and queries. In particular, for any query they reveal at least (1) the access pattern, which consists of the pointers I; and (2) the query pattern, which reveals whether two tokens were for the same query.

### Applications of Structured Encryption

Private queries on encrypted data.

The most immediate application of structured encryption is for performing private queries on encrypted data. In this setting, a client encrypts its (structured) data ${\displaystyle (\delta ,m)~\,\!}$ resulting in an encrypted data structure ${\displaystyle \gamma \!}$ and a sequence of ciphertexts ${\displaystyle ~c}$. It then sends ${\displaystyle (\gamma ,c)~\,\!}$ to the server. Whenever the client wishes to query the data, it sends a token ${\displaystyle \tau \!}$ to the server which the latter uses to recover pointers ${\displaystyle ~J}$ to the appropriate ciphertexts. Using a structured encryption scheme in this manner enables the client to store its data remotely while simultaneously guaranteeing confidentiality against the server (in the sense outlined above) and efficient querying and retrieval.While this problem has received considerable attention for the special case of document collections [7] [1] [8] [9] [10] [11] [12] [13], as far as we know, it has never been considered for other kinds of data.

Controlled disclosure for local algorithms.

While the original motivation for structured encryption was to perform private queries on encrypted data (or more precisely, private searches on encrypted data), we introduce here a new application which we refer to as controlled disclosure.

In this setting, the client not only wants to store its data remotely but expects the server (or some third party) to perform some computation over the data. In particular, while the client is willing to reveal the information necessary for the server to perform its task, the client does not want to reveal anything else. Consider, e.g., a client that stores a large-scale social network remotely and that, at some point, needs the server to analyze a small subset of the network. If the social network were encrypted using a classical encryption scheme the client would have to reveal the entire network, leaking extra information to the server. Ideally, what we want in this setting is a mechanism that allows the client to encrypt the data and later disclose the “pieces” of it that are necessary for the server to perform its task.

Another application of controlled disclosure is to the emerging area of (cloudbased) data brokerage services, such as Microsoft’s Dallas [14] and Infochimps [15] .

Here, the cloud provider acts as a broker between a data provider that wishes to sell access to a massive dataset and a data consumer that needs access to the data. The data is stored “in the cloud” and the cloud operator manages the consumer’s access to the provider’s data. Using controlled disclosure, the provider could encrypt its data before storing it in the cloud and release tokens to the consumer as appropriate. Such an approach would have several advantages including (1) enabling the producer to get an accurate measure of the consumer’s use of the data; and (2) ensuring the producer that the consumer can only access the authorized segments of data, even if the consumer and the cloud operator collude.

Clearly, if the algorithm executed by the server (or the data consumer) is “global”, in the sense that it needs to read all the data, then controlled disclosure provides no security. On the other hand, if the algorithm is “local”, in that it only needs to read part of the data, then controlled disclosure preserves the confidentiality of the remaining data. There are numerous algorithms that exhibit this kind of local behavior and they are used extensively in practice to solve a variety of problems. For example, many optimization problems like the traveling salesman problem or vertex cover are handled in practice using local search algorithms (e.g., hill climbing, genetic algorithms or simulated annealing). Several link-analysis algorithms for web search such as Kleinberg’s seminal HITS algorithm [16] (and the related SALSA [17] algorithm) are local. Finally, the recent work of Brautbar and Kearns on “jump and crawl” algorithms [18] motivates and proposes several local algorithms for social network analysis, including for finding vertices with high-degree and high clustering coefficient.

Controlled disclosure can be viewed as a compromise between full security on the one hand and efficiency and functionality on the other. In settings where computation needs to be performed on massive datasets and “fully secure” solutions like multi-party computation [19] [5] [20] and fully-homomorphic encryption [6] are prohibitively expensive, controlled disclosure provides a practical solution without completely compromising security.

### Our Results

Performing private queries on encrypted data is an important goal that is well motivated by the recent trend towards cloud storage. Giving clients the means to encrypt their data without losing the ability to efficiently query and retrieve it provides obvious benefits to the client but also frees the cloud provider from many legal exposures (see [21] [22] [23] for discussion of these issues). It additionally provides a mechanism by which clients from regulated industries can make use of cloud storage (e.g., to store medical records or financial documents) while remaining compliant.

While the recent work on searchable encryption constitutes an important step towards this goal, we note that a noticeable fraction of the data generated today is not text data. Indeed, many large-scale datasets (e.g., image collections, social network data, maps or location information) exhibit a different and sometimes more complex structure that cannot be handled properly using searchable encryption.

1. introduce the notion of structured encryption, which generalizes index-based symmetric searchable encryption [1] [2] [3] to arbitrarily-structured data and propose a novel application of structured encryption (and therefore of SSE) to the problem of controlled disclosure.
2. extend the adaptive security definition of [3] to the setting of structured encryption.
3. give constructions of adaptively-secure structured encryption schemes for a variety of structures and queries including:
1. (lookup queries on matrix-structured data) given a matrix and pair ${\displaystyle (i,j)~\,\!}$, return the value stored at row ${\displaystyle ~i}$ and column ${\displaystyle ~j}$. This captures, e.g., lookup queries on digital images or retrieval of maps.
2. (search queries on labeled data) given a set of labeled items and keyword ${\displaystyle ~w}$, return the items labeled with ${\displaystyle ~w}$. This captures the familiar setting of searchable encryption. We briefly note that our construction exhibits a combination of useful properties that, as far as we know, no revious scheme achieves.
3. (neighbor queries on graph-structured data) given a graph and a node ${\displaystyle ~i}$ , return all the nodes adjacent to ${\displaystyle ~i}$. This captures, e.g., retrieving a user’s “friend list” in a social network.
4. (adjacency queries on graph-structured data) given a graph and two nodes ${\displaystyle ~i}$ and ${\displaystyle ~j}$, return 1, if they are adjacent and return 0 otherwise. This captures, e.g., testing whether two users are “friends” in a social network.
While the previous constructions are useful in their own right, an important goal with respect to structured encryption is to construct schemes that are able to encrypt complex structures and to handle expressive queries that take full advantage of the complexity of the data’s structure. As an example, consider the case of web graphs (i.e., subgraphs of the Web) which are composed of pages with both text and hyperlinks. Encrypting the pages of a web graph using a searchable encryption scheme will only enable keyword search over the encrypted pages. Web graphs, however, exhibit a much richer structure and we typically want to perform more complex queries on them. Towards this goal, our final contribution is to show how to encrypt web graphs and, more generally, what we refer to as labeled graph data. In particular, we:
1. give a structured encryption scheme for labeled graphs that handles focused subgraph queries. Roughly speaking, for a given search keywork, a focused subgraph query on a web graph returns a subgraph that encodes enough information about it to yield a good ranking of the pages for that search. These queries are an essential part of Kleinberg’s seminal HITS algorithm [16] (and its many successors). Our construction uses as building blocks some of the schemes mentioned above. We stress, however, that it is not sufficient to use the schemes “as-is” and we show a novel way of combining structured encryption schemes for simple structures in order to build schemes that handle more complex data and more expressive queries. The approach is general and can be adapted to other complex data types.
2. ## Related Work

We already mentioned work on oblivious RAMs, secure two-party computation and FHE so we restrict the following discussion to searchable and functional encryption.

Searchable encryption. As mentioned above, structured encryption is a generalization of the notion of a secure index first proposed by Goh [1] for the purpose of building symmetric searchable encryption schemes [7] . In [1] Goh gives a formal security definition for secure indexes and a construction based on Bloom filters. This was followed by [2] and [3], the latter of which gave stronger security definitions and more efficient constructions. Our security definitions for structured encryption in section 4 generalize the ones in [3] to arbitrarily-structured data. Searchable encryption has also been considered in the public-key setting [10] [9] [8] [24] [13] [25].

Functional encryption. Functional encryption [26] яis a recent paradigm that generalizes work on a variety of problems including identity-based encryption [27] [28] [27] [28], attribute-based encryption [29] [30] [31], and predicate encryption [32] [33].

Roughly speaking, a structured encryption scheme can be viewed as a functional encryption scheme for which a token can only be used on a single ciphertext. We provide a more detailed comparison between the two approaches in the full version [34] .

## Notation and Preliminaries

Notation. Given a sequence ${\displaystyle ~v}$ of ${\displaystyle ~n}$ elements, we refer to its ${\displaystyle ~i}$th element as ${\displaystyle ~v_{i}}$. If ${\displaystyle ~f}$ is a function with domain ${\displaystyle ~U}$ and ${\displaystyle ~S\subseteq U}$ then ${\displaystyle ~f[S]}$ refers to the image of ${\displaystyle ~S}$ under ${\displaystyle ~f}$. The set of all ${\displaystyle ~\lambda _{1}\times \lambda _{2}}$ matrices over a set ${\displaystyle ~S}$ is denoted ${\displaystyle ~S^{\lambda _{1}\times \lambda _{2}}}$. ${\displaystyle ~G_{m}}$ and ${\displaystyle ~G_{n}}$ are the sets of all undirected and directed graphs of size ${\displaystyle ~m}$ and ${\displaystyle ~n}$, respectively. An undirected graph ${\displaystyle ~G=(V,E)}$ consists of a set of vertices ${\displaystyle ~V}$ and a set of edges ${\displaystyle ~E={(i,j)}}$ where ${\displaystyle ~i,j\in V}$. We denote by ${\displaystyle ~(i)}$ the degree of node ${\displaystyle ~i}$. If ${\displaystyle ~G}$ is directed, then the pairs ${\displaystyle ~(i,j)}$ are ordered and we refer to ${\displaystyle ~i}$ as the tail and to ${\displaystyle ~j}$ as the head of the edge. In addition, we denote i’s in and out degrees by ${\displaystyle ~deg-(i)}$ and ${\displaystyle ~deg+(i)}$ , respectively.

Data types. An abstract data type is a collection of objects together with a set of operations defined on those objects. For simplicity and visual clarity we define data types as having a single operation but this can be extended to model data types with multiple operations in the natural way. Formally, a data type ${\displaystyle ~T}$ is defined by a universe ${\displaystyle ~U=\{Uk\}_{k\in N}}$ and an operation Query : ${\displaystyle ~U\times Q\rightarrow R}$, where ${\displaystyle ~Q=\{Qk\})_{k\in N}}$ is the operation’s query space and ${\displaystyle ~R=\{Rk\}_{k\in N}}$is its response space. The universe, query and response spaces are ensembles of finite sets indexed by the security parameter ${\displaystyle ~k}$. In this work, we assume the universe is a totally ordered set, and that the response space includes a special element ${\displaystyle ~\bot }$ denoting failure.

## Definitions

In this section we formalize structured encryption schemes and present our main security definition. Before doing so, however, we make explicit two properties of structured encryption which we will make use of throughout this work.

Induced permutation. Unlike previous work on searchable encryption we choose to include the data items (i.e., the documents in the case of searchable encryption) and their encryptions in our definitions. We prefer this approach because explicitly capturing each component of the system can bring to light subtle interactions between them. As an example, consider the correlation between the location of the data items in the sequence ${\displaystyle ~m}$ and the locations of their corresponding ciphertexts in ${\displaystyle ~c}$. More precisely, let ${\displaystyle \pi \!}$ be the permutation over ${\displaystyle ~[n]}$ such that for all ${\displaystyle ~i\in [n],m_{i}:=Dec_{k}(c_{\pi (i)})}$ . We refer to ${\displaystyle \pi \!}$ as the permutation induced by ${\displaystyle ~m}$ and ${\displaystyle ~c}$.

The reason most SSE constructions (with the exception of oblvious RAMs) leak the access pattern is because ${\displaystyle \pi \!}$ - is the identity function. This means that in order to (efficiently) retrieve items ${\displaystyle \{m_{i}:i\in I\}}$ the server must know ${\displaystyle ~I}$. Our constructions hide part of the access pattern essentially because they break this correlation by inducing a (pseudo-)random permutation between ${\displaystyle ~m}$ and ${\displaystyle ~c}$.

Associativity. We also make explicit a property possessed by some constructions (e.g., the non-adaptively secure SSE construction of [3] ) that we refer to as associativity. Intuitively, a scheme is associative if one can associate an item ${\displaystyle ~v_{i}}$ with data item ${\displaystyle ~m_{i}}$ in such a way that a query operation returns, in addition to the pointers ${\displaystyle ~J}$, the strings ${\displaystyle ~(v_{i}),i\in I}$. We capture this by re-defining the message space of our encryption algorithms to take, in addition to a data structure ${\displaystyle \delta \!}$ a sequence ${\displaystyle M=((m_{1},v_{1}),\ldots ,(m_{n},v_{n}))}$ of pairs that consist of a private data item ${\displaystyle ~m_{i}}$ and a semi-private item ${\displaystyle ~v_{i}}$. We sometimes refer to the sequences ${\displaystyle ~(m_{1},\ldots ,m_{n})}$ and ${\displaystyle ~(v_{1},\ldots ,v_{n})}$ as ${\displaystyle ~m}$ and ${\displaystyle ~v}$, respectively.

Associativity is useful for several reasons. The most direct application is to provide the client the ability to associate some meta-data with the ciphertexts that may be useful to the server (e.g., file name or size). In situations where the client wishes to grant the server access to the data, the semi-private items could even be decryption keys for the associated ciphertexts. As we will see in Section 6, however, associativity can also be used to “chain” structured encryption schemes together in order to construct complex schemes from for simpler ones.

Definition 1 (Private-key structured encryption).

Let ${\displaystyle ~T}$ be an abstract data type supporting operation Query : ${\displaystyle ~U\times Q\rightarrow R}$, where ${\displaystyle ~R=[n]}$ для ${\displaystyle ~n\in N}$ .An associative private-key structured encryption scheme for ${\displaystyle ~T}$ - is a tuple of five polynomial-time algorithms ${\displaystyle ~\Pi =(Gen,Enc,Token,Querye,Dec)}$ such that:

${\displaystyle ~K\leftarrow Gen(1_{k})}$ : is a probabilistic algorithm that takes as input a security parameter ${\displaystyle ~k}$ and outputs a private key ${\displaystyle ~K}$.

${\displaystyle ~(\gamma ,c)\leftarrow Enc(K,\delta ,M)}$  : is a probabilistic algorithm that takes as input a private key ${\displaystyle ~K}$ a data structure ${\displaystyle \delta \!}$ of type ${\displaystyle ~T}$, and a sequences of private and semi-private data ${\displaystyle ~M}$. It outputs an encrypted data structure ${\displaystyle \gamma \!}$ It outputs an encrypted data structure ${\displaystyle ~c}$. We sometimes write this as ${\displaystyle ~(\gamma ,c)\leftarrow Enc_{k}(\delta ,M)}$.

${\displaystyle ~\tau \leftarrow Token(K,q)}$ : is a (possibly probabilistic) algorithm that takes as input a private key ${\displaystyle ~K}$ and a query ${\displaystyle ~q\in Q}$ and outputs a token ${\displaystyle ~\tau \!}$. We sometimes write this ${\displaystyle ~\tau \leftarrow Token_{k}(q)}$ .

${\displaystyle ~(J,v_{I}):=Query_{e}(\gamma ,\tau )}$ : is a deterministic algorithm that takes as input an encrypted data structure ${\displaystyle \gamma \!}$ and a token ${\displaystyle \tau \!}$. It outputs a set of pointers ${\displaystyle J\subseteq [n]}$ and a sequence of semi-private data ${\displaystyle ~v_{I}=(v_{i})_{i\in I}}$ , where ${\displaystyle ~I=(\pi )-1[J]}$.

${\displaystyle ~m_{j}:=Dec(K,c_{j})}$ : is a deterministic algorithm that takes as input a secret key ${\displaystyle ~K}$ and a ciphertext ${\displaystyle ~c_{j}}$ and outputs a message ${\displaystyle ~m_{j}}$.

We say that ${\displaystyle \Pi \!}$ is correct if for all ${\displaystyle k\in N}$, for all ${\displaystyle ~K}$ output by ${\displaystyle ~Gen(1_{k})}$, for all ${\displaystyle \delta \in U_{k}}$ , for all ${\displaystyle ~M}$, for all ${\displaystyle ~(\gamma ,c)}$ output by ${\displaystyle ~Enc(K,\delta ,M)}$, for all ${\displaystyle ~q\in Q_{k}}$, for all ${\displaystyle \tau \!}$, output by ${\displaystyle Token(K,q)}$, for ${\displaystyle ~(J,v_{I})}$ output by ${\displaystyle ~Query_{e}(\gamma ,\tau )}$, ${\displaystyle ~J=\pi [Query(\delta ,q)]_{D}ec_{K}(c_{j})=m_{j}}$ for all ${\displaystyle ~j\in [n]}$, where ${\displaystyle \pi \!}$ – is the permutation induced by ${\displaystyle ~m}$ and ${\displaystyle ~c}$.

The intuitive security guarantee we seek is that (1) given an encrypted data structure ${\displaystyle \gamma \!}$ and a sequence of ciphertexts ${\displaystyle ~c}$, no adversary can learn any partial information about ${\displaystyle ~m}$; (and that (2) given, in addition, a sequence of tokens ${\displaystyle ~\tau =(\tau _{1},\ldots ,\tau _{t})}$ for an adaptively generated sequence of queries ${\displaystyle ~q=(q_{1},\ldots ,q_{t})}$, no adversary can learn any partial information about either ${\displaystyle ~m}$ or ${\displaystyle ~q}$ beyond what is revealed by the semi-private data ${\displaystyle ~(v_{I}1,\ldots ,v_{I}t)}$.

This exact intuition can be difficult to achieve and in some settings is unnecessarily strong. Consider, e.g., the fact that the number of data items ${\displaystyle ~n}$ is immediately revealed to the adversary since it receives the ciphertexts ${\displaystyle ~c}$. Another example is in the setting of SSE where, as discussed earlier, all known efficient and non-interactive schemes [1] [2] [3] reveal the access and query patterns. We would therefore like to weaken the definition appropriately by allowing some limited information about the messages and the queries to be revealed. On the other hand, it is not clear that such leakage is always necessary in order to achieve efficiency (e.g., the number of data items can be easily hidden by padding) so we prefer not to “hardcode” this leakage in our definition. To formalize this we parameterize the definition with two leakage functions ${\displaystyle ~L_{1}}$ and ${\displaystyle ~L_{2}}$ that capture precisely what is being leaked by the ciphertext and the tokens.

We now present our security definition for adaptive adversaries which is a generalization of the definition of [35]. Intuitively, we require that the view of an adversary (i.e., the encrypted data structure, the sequence of ciphertexts, and the sequence of tokens) generated from any adaptive query strategy be simulatable given the leakage information and the semi-private data.

Definition 2 (CQA2-security).

Let ${\displaystyle ~\Sigma =(Gen,Enc,Token,Querye,Dec)}$ - be an associative private-key structured encryption scheme for data of type ${\displaystyle ~T}$ supporting operation Query : ${\displaystyle ~U\times Q\rightarrow [n]}$, for some ${\displaystyle ~n\in N}$, and consider the following probabilistic experiments where ${\displaystyle ~A}$ is an adversary, ${\displaystyle ~S}$ is a simulator and ${\displaystyle ~L_{1}}$ and ${\displaystyle ~L_{2}}$ are (stateful) leakage algorithms:

${\displaystyle ~Real_{\Sigma ,A(k)}}$: the challenger begins by running ${\displaystyle ~Gen(1_{k})}$ to generate a key ${\displaystyle ~K}$. ${\displaystyle ~A}$ outputs a pair ${\displaystyle ~(\delta ,M)}$ and receives ${\displaystyle ~(\gamma ,c)\leftarrow Enc_{K}(\delta ,M)}$ from the challenger. The adversary makes a polynomial number of adaptive queries and, for each query ${\displaystyle ~q}$ receives a token ${\displaystyle \tau \leftarrow Token_{K}(q)}$ from the challenger. Finally, ${\displaystyle ~A}$ returns a bit ${\displaystyle ~b}$ that is output by the experiment.

${\displaystyle ~Ideal_{\Sigma ,A,S(k)}}$  : ${\displaystyle ~A}$ outputs a tuple ${\displaystyle ~(\delta ,M)}$. Given ${\displaystyle ~L_{1}(\delta ,M)}$, ${\displaystyle ~S}$ generates and sends a pair ${\displaystyle ~(\gamma ,c)}$ to ${\displaystyle ~A}$. The adversary makes a polynomial number of adaptive queries and for each query ${\displaystyle ~q}$ the simulator is given ${\displaystyle ~(L_{2}(\delta ,q),v_{I})}$ , where ${\displaystyle ~I:=Query(\delta ,q)}$ . The simulator returns a token ${\displaystyle \tau \!}$. Finally, ${\displaystyle A}$ returns a bit ${\displaystyle b}$, that is output by the experiment.

We say that ${\displaystyle \Sigma (L_{1},L_{2})}$ - secure against adaptive chosen-query attacks if for all ppt adversaries ${\displaystyle ~A}$ there exists a ${\displaystyle ~ppt}$ simulator ${\displaystyle ~S}$ such that

${\displaystyle ~|Pr[Real_{\Sigma ,A(k)}=1]-Pr[Ideal_{\Sigma ,A,S(k)}=1]|\leq negl(k)}$

As previously discussed, the ${\displaystyle ~L_{2}}$ leakage of our constructions mainly consists of the query and intersection patterns. Intuitively, the query pattern reveals when a query is repeated while the intersection pattern reveals when the same items are accessed. The intersection pattern reveals when the same items are accessed but not which items are accessed (i.e., their locations in ${\displaystyle ~m}$). The latter is hidden in our definition below by applying a random permutation to the item’s locations in ${\displaystyle ~m}$.

Definition 3 (Query and intersection patterns).

Let ${\displaystyle ~q}$ непустая последовательность запросов. be a non-empty sequence of queries. For any ${\displaystyle ~p_{t}\in q}$, the query pattern ${\displaystyle ~qp(q_{t})}$ is a binary vector of length ${\displaystyle ~t}$ with a 1 at location ${\displaystyle ~i}$, if ${\displaystyle ~q_{t}=q_{i}}$ and a ${\displaystyle ~0}$ otherwise. The intersection pattern ${\displaystyle ~ip(q_{t})}$ is a sequence of length ${\displaystyle ~t}$ with ${\displaystyle ~f[I]}$ at location ${\displaystyle ~t}$, where ${\displaystyle ~f}$ is a fixed random permutation over ${\displaystyle ~[n]}$ and ${\displaystyle ~I:=Query(\delta ,q_{t})}$.

## Structured Encryption for Basic Structures

In this Section we present constructions of structured encryption schemes for data with simple structures. In Section 6 we will use some of these as building blocks to design schemes for data that exhibits a more complex structure. We stress, however, that the constructions presented here are of independent interest.

### Lookup Queries on Matrices

We describe a structured encryption scheme for matrix-structured data which consists of an ${\displaystyle ~\lambda _{1}\times \lambda _{2}}$ matrix ${\displaystyle ~M}$ of pointers into a sequence of ${\displaystyle ~n}$ data items ${\displaystyle ~m}$. Here, the matrix type has universe ${\displaystyle ~U=[n]^{\lambda _{1}\times \lambda _{2}}}$ and supports the lookup operation ${\displaystyle ~Lkp:{{[n]^{\lambda _{1}\times \lambda _{2}}\times [\lambda _{1}]\times [\lambda _{2}]}\rightarrow [n]}}$, that takes as input a matrix ${\displaystyle ~M}$ and a pair ${\displaystyle ~(\alpha ,\beta )}$ and returns ${\displaystyle ~M[\alpha ,\beta ]}$.

Matrix-structured data is ubiquitous and includes any kind of two-dimensional data. Consider, e.g., the case of digital images which can be viewed as a pair ${\displaystyle ~(M,m)}$ where ${\displaystyle ~M}$ is a matrix such that the cell at location ${\displaystyle ~(\alpha ,\beta )}$ points to some ${\displaystyle ~m_{i}}$ that encodes the color of the pixel at location ${\displaystyle ~(\alpha ,\beta )}$ in the image.

Our construction, described in Figure 1 below, is associative. At a high level, encryption is done by (1) padding the data items to be of the same length; (2) randomly permuting the location of the data items, (3) randomly permuting the location of the matrix cells using a PRP; and (4) encrypting the contents of the cells (and the semi-private data) using the output of a PRF. The purpose of the last two steps are immediate. Steps (1) and (2) are what allow us hide part of the access pattern by inducing a pseudo-randompermutation between ${\displaystyle ~m}$ and ${\displaystyle ~c}$.

Lookup queries are handled by sending the permuted location of a cell (which can be recovered by the client since it stores the key to the PRP) and the output PRF of the PRF used to encrypt the contents (which can also be recovered since the client stores the key to the PRF).

In Theorem 1 below we show that the construction above is secure against adaptive chosen-query attacks.

Theorem 1

If ${\displaystyle ~F}$, ${\displaystyle ~P}$ and ${\displaystyle ~G}$ are pseudo-random and if ${\displaystyle ~\Pi \!}$ is CPA-secure then

Matrix is ${\displaystyle ~(L_{1},L_{2})}$ - secure against adaptive chosen-query attacks, where ${\displaystyle ~L_{1}(M,M)=(\lambda _{1},\lambda _{1},n,\_)}$ and ${\displaystyle ~L_{2}(M,\alpha ,\beta )=(qp(\alpha ,\beta ),ip(\alpha ,\beta ))}$.
Proof

The proof is omitted due to lack of space but appears in [12].

Let ${\displaystyle ~F:{{\{0,1\}_{k}}\times \{0,1\}_{*}\rightarrow \{0,1\}_{*}}}$ be a pseudo-random function, ${\displaystyle ~P:{{\{0,1\}_{k}}\times [\lambda _{1}]\times [\lambda _{2}]\rightarrow [\lambda _{1}]\times [\lambda _{2}]}}$ be pseudo-random permutation and ${\displaystyle ~\Pi =(Gen,Enc,Dec)}$ be a private-key encryption scheme. Our encryption scheme ${\displaystyle ~=(Gen,Enc,Token,Lkpe,Dec)}$ is defined as follows:

• ${\displaystyle ~Gen(1_{k})}$ : generate two random ${\displaystyle ~k}$ -bit strings ${\displaystyle ~K_{1}}$, ${\displaystyle ~K_{2}}$ and a key ${\displaystyle ~K_{3}\leftarrow \Pi .Gen(1_{k})}$ . Set ${\displaystyle ~K:=(K_{1},K_{2},K_{3})}$.
• ${\displaystyle ~Enc(K,M,M)}$ : construct a ${\displaystyle ~\lambda _{1}\times \lambda _{2}}$ mutrix ${\displaystyle ~C}$ размером as follows:
1. parse ${\displaystyle ~M}$ as ${\displaystyle ~m}$ and ${\displaystyle ~v}$
2. choose a pseudo-random permutation ${\displaystyle ~G:{{\{0,1\}_{k}}\times [n]\rightarrow [n]}}$
3. sample a ${\displaystyle ~k}$-bit string ${\displaystyle ~K_{4}}$ uniformly at random
4. for all ${\displaystyle ~(\alpha ,\beta )\in [\lambda _{1}]\times [\lambda _{2}]}$ ,

store ${\displaystyle ~\oplus F_{K_{1}}(\alpha ,\beta )}$ where ${\displaystyle ~i:=M[\alpha ,\beta ]}$, at location ${\displaystyle ~(\alpha ',\beta '):=P_{K_{2}}(\alpha ,\beta )inC}$.

If ${\displaystyle ~M[\alpha ,\beta ]=\bot }$ , то ${\displaystyle ~}$ above is replaced with a random string of appropriate length. Let ${\displaystyle ~m}$ be the sequence that results from padding the elements of ${\displaystyle ~m}$ so that they are of the same length and permuting them according to ${\displaystyle ~G_{K_{4}}}$. For ${\displaystyle ~1\leq j\leq n}$ , let ${\displaystyle ~c_{j}\leftarrow \Pi .Enc_{K_{3}}(m_{j})}$. Output ${\displaystyle ~\gamma :=C}$ and ${\displaystyle ~c=(c_{1},\ldots ,c_{n})}$.

• ${\displaystyle ~Token(K,\alpha ,\beta )}$ : output ${\displaystyle ~\tau :=(s,\alpha \_,\beta \_)}$ , where ${\displaystyle ~s:=F_{K_{1}}(\alpha ,\beta )}$ and ${\displaystyle ~(\alpha \_,\beta \_):=P_{K_{2}}(\alpha ,\beta )}$.
• ${\displaystyle ~Lkp_{e}(\gamma ,t)}$: parse ${\displaystyle ~\tau \!}$ as ${\displaystyle ~(s,\alpha \_,\beta \_)}$ ; compute and output ${\displaystyle ~(j,v):=s\oplus C[\alpha \_,\beta \_]}$ .
• ${\displaystyle ~Dec(K,c_{j})}$ : return ${\displaystyle ~m_{j}:=\Pi .Dec_{K_{3}}(c_{j})}$.

Fig. 1. An associative structured encryption scheme for matrices

### Search Queries on Labeled Data

We now present a structured encryption scheme for labeled data which consists of a “labeling” ${\displaystyle ~L}$ and a sequence of data items ${\displaystyle ~m}$. Informally, a labeling just associates a set of keywords to each data item. More formally, the labeling data type has as universe ${\displaystyle ~U}$ the set of all binary relations between ${\displaystyle ~[n]}$ and ${\displaystyle ~W}$, where ${\displaystyle ~W}$ is a set of keywords. In addition, it supports the operation ${\displaystyle ~Search:U\times W\rightarrow 2_{[n]}}$, that takes as input a labeling and a keyword ${\displaystyle ~w}$ and returns the set ${\displaystyle ~L(w)=\{i\in [n]:(i,w)\in L\}}$.

Our construction, described in Figure 2, is efficient, associative and adaptively secure and, as far as we know, is the first scheme to achieve all three properties. It is based on the first scheme of [3] (SSE 1), which is efficient and associative but not adaptively secure3. The second scheme of [3] , on the other hand, is adaptively secure but is inefficient and not associative.

Our construction makes use of a dictionary which is a data structure that stores pairs ${\displaystyle ~(a,b)}$ such that given ${\displaystyle ~a}$, the corresponding value ${\displaystyle ~b}$ can be recovered efficiently. We refer to${\displaystyle ~a}$ as the “search key” and to ${\displaystyle ~b}$ as the value. Dictionaries can be implemented in a variety of ways, including using search trees or hash tables. Intuitively, encryption proceeds as follows in our scheme. As in our previous construction, we pad and permute the data items with a PRP ${\displaystyle ~G}$. For each keyword ${\displaystyle ~w}$ an array is constructed where each cell stores (1) a pointer ${\displaystyle ~j}$ from the set ${\displaystyle ~L*(w)=G_{K}[L(w)]}$ and (2) the corresponding semi-private item ${\displaystyle ~v_{i}}$. The array is then padded up to a standard length, and encrypted using the output of a PRF and is stored in a dictionary using as search key the output of another PRF on the keyword. Search queries are handled by sending the search key (which can be recovered by the client using the key to the second PRF) and the output of the PRF used to encrypt the array (which can be recovered using the key to the first PRF). The efficiency of our search operation depends on how the underlying dictionary is implemented but in this context any solution based on hash tables is appropriate and will give search time that is ${\displaystyle ~O(|I|)}$, which is optimal.

Let ${\displaystyle ~F:{{\{0,1\}_{k}}\times W\rightarrow \{0,1\}_{*}}}$ and ${\displaystyle ~P:{{\{0,1\}_{k}}\times W\rightarrow \{0,1\}_{k}}}$ be pseudorandom functions and ${\displaystyle ~\Pi =(Gen,Enc,Dec)}$ be a private-key encryption scheme. Our scheme ${\displaystyle ~Label=(Gen,Enc,Token,Search_{e},Dec)}$ is defined as follows:

• ${\displaystyle ~Gen(1_{k})}$: sample two random ${\displaystyle ~k}$ - bit keys ${\displaystyle ~K_{1}}$,${\displaystyle ~K_{2}}$, and generate a key ${\displaystyle ~K_{3}\leftarrow \Pi .Gen(1_{k})}$ . Set ${\displaystyle ~K:=(K_{1},K_{2},K_{3})}$.
• ${\displaystyle ~Enc(K,L,M)}$ : construct a dictionary ${\displaystyle ~T}$ as follows:
1. parse ${\displaystyle ~M}$ as ${\displaystyle ~m}$ and ${\displaystyle ~v}$.
2. choose a pseudo-random permutation ${\displaystyle ~G:\{0,1\}_{k}\times [n]\rightarrow [n]}$
3. sample a ${\displaystyle ~k}$ - bit string ${\displaystyle ~K_{4}}$ uniformly at random
4. for each ${\displaystyle ~w\in W}$ such that ${\displaystyle ~L(w)\neq \varnothing }$, ${\displaystyle ~let_{w}:=P_{K_{2}}(w)}$ and

store ${\displaystyle ~<(G_{K_{4}}(i),v_{i})_{i\in L(w)}>\oplus F_{K_{1}}(w)}$ in ${\displaystyle ~T}$ with search key ${\displaystyle ~K_{w}}$. Use padding to ensure that the strings ${\displaystyle ~<(G_{K_{4}}(i),vi)_{i\in L(w)}>}$ are all of the same length. Let ${\displaystyle ~m}$ be the sequence that results from padding the elements of ${\displaystyle ~m}$ so that they are of the same length and permuting them according to ${\displaystyle ~G_{K_{4}}}$. For ${\displaystyle ~1\leq j\leq n}$, let ${\displaystyle ~c_{j}\leftarrow \Pi .Enc_{K_{3}}(m_{*j})}$. Output ${\displaystyle ~\gamma :=T}$ and ${\displaystyle ~c=(c_{1},\ldots ,c_{n})}$ .

• ${\displaystyle ~Token(K,w)}$ : output ${\displaystyle ~\tau :=(F_{K_{1}}(w),P_{K_{2}}(w))}$.
• ${\displaystyle ~Search_{e}(\gamma ,\tau )}$ : parse ${\displaystyle ~\tau \!}$ as ${\displaystyle ~(\alpha ,\beta )}$ and compute ${\displaystyle ~s:=T(\beta )\oplus \alpha }$, where ${\displaystyle ~T(\beta )}$ refers to the value stored in ${\displaystyle ~T}$ with search key ${\displaystyle ~\beta \!}$ . If ${\displaystyle ~\beta \!}$ is not in ${\displaystyle ~T}$ then output ${\displaystyle ~J=\varnothing }$ and ${\displaystyle ~v_{I}=\bot }$. Otherwise parse ${\displaystyle ~s}$ as ${\displaystyle ~<(j_{1},v_{i_{1}}),\ldots ,(j_{t},v_{i_{t}})>}$ and output ${\displaystyle ~J=(j_{1},\ldots ,j_{t})}$ и ${\displaystyle ~v_{I}=(v_{i_{1}},\ldots ,v_{i_{t}})}$.
• ${\displaystyle ~Dec(K,c_{j})}$ : output ${\displaystyle ~m_{j}:=\Pi .Dec_{K_{3}}(c_{j})}$ .

Fig. 2. An associative structured encryption scheme for labeled data

Theorem 2

If ${\displaystyle ~F}$, ${\displaystyle ~P}$ and ${\displaystyle ~G}$ are pseudo-random and if ${\displaystyle ~\Pi \!}$ is CPA-secure then ${\displaystyle ~Label(L_{1},L_{2})}$ - secure against adaptive chosen-query attacks, where ${\displaystyle ~L_{1}(L,M)=(|W|,n,l)}$ and ${\displaystyle ~L_{2}(L,w)=(|I|,qp(w),ip(w))}$.

Proof
The proof is omitted due to lack of space but appears in [12].

### Neighbor Queries on Graphs

We now consider encryption of graph-structured data and, in particular, of graphs that support neighbor queries. Formally, the graph type we consider has universe ${\displaystyle ~U=G_{n}}$ and supports the neighbor operation ${\displaystyle ~Neigh:{G_{n}\times [n]\rightarrow 2_{[n]}}}$, that takes as input an undirected graph ${\displaystyle ~G}$ with ${\displaystyle ~n}$ nodes and a node ${\displaystyle ~i}$ and returns the nodes adjacent to ${\displaystyle ~i}$.

Our approach here is to encode the graph as a labeling and to apply a structured encryption scheme for labeled data (such as the one described in the previous Section). Given some graph-structured data ${\displaystyle ~(G,m)}$, where ${\displaystyle ~G=(V,E)}$, we construct the labeled data ${\displaystyle ~(L,m)}$ such that ${\displaystyle ~L}$ assigns to each data item ${\displaystyle ~m_{i}}$ a label set corresponding to the set of nodes adjacent to the ${\displaystyle ~i}$th node. Neighbor queries are handled by sending a token for “keyword” ${\displaystyle ~i\in V}$ , which allows the server to recover pointers to all the data items associated with ${\displaystyle ~i}$ by the labeling. Our construction is described in detail in Figure 3 below.

Let ${\displaystyle ~Label=(Gen,Enc,Token,Search_{e},Dec)}$ - be an associative structured encryption scheme for labeled data. Our scheme ${\displaystyle ~Graph=(Gen,Enc,Token,Neigh_{e},Dec)}$ is defined as follows:

• ${\displaystyle ~Gen(1_{k})}$: generate and output ${\displaystyle ~K\leftarrow Label.Gen(1_{k})}$ .
• ${\displaystyle ~Enc(K,G,M)}$ : parse ${\displaystyle ~M}$ as ${\displaystyle ~m}$ and ${\displaystyle ~v}$ and construct a labeling ${\displaystyle ~L}$, that associates to each ${\displaystyle ~m_{i}}$ the set ${\displaystyle ~\{j\in [n]:(i,j)\in E\}}$, where ${\displaystyle ~E}$

is the set of edges in ${\displaystyle ~G}$. Output ${\displaystyle ~(\gamma ,c)\leftarrow Label.EncK(L,M)}$.

• ${\displaystyle ~Token(K,i)}$ : compute and output ${\displaystyle ~\tau \leftarrow Label.Token_{K}(i)}$ .
• ${\displaystyle ~Neigh_{e}(\gamma ,\tau )}$ : output ${\displaystyle ~J:=Label.Search(\gamma ,\tau )}$ .
• ${\displaystyle ~Dec(K,c_{j})}$ : output ${\displaystyle ~m_{j}:=Label.Dec_{K}(c_{j})}$.

Fig. 3. A structured encryption scheme for graphs supporting neighbor queries

Theorem 3

If ${\displaystyle ~Label}$ is ${\displaystyle (L_{1},L_{2})}$ -secure against adaptive chosen-query attacks, then ${\displaystyle ~Graph}$ is ${\displaystyle [itex](L_{1},L_{2})}$ - secure against adaptive chosen-query attacks as well.

Proof

The theorem follows by construction. Note that if Label is instantiated with the scheme from Section 5.2, then ${\displaystyle ~L_{1}}$ leaks the size of the graph, the number of data items and the length of the largest data item while ${\displaystyle ~L_{2}}$ leaks the degree of the node and the query and intersection patterns.

We now discuss a slight variation of this construction to handle incoming and outgoing neighbor queries on directed graphs. This will be useful as a building block for the construction we describe in Section 6. An incoming neighbor query is: given a node ${\displaystyle ~i}$ return all the nodes that point to it; and an outgoing neighbor query is: given a node ${\displaystyle ~i}$ return all the nodes that it points to. We stress that the changes we describe do not affect security in any way.

Consider the scheme ${\displaystyle ~Graph_{+}=(Gen,Enc,Token,Neighe,Dec)}$ defined exactly as ${\displaystyle ~(Graph)}$ except that the ${\displaystyle ~Enc}$ algorithm constructs the labeling in the following manner: instead of associating a data item ${\displaystyle ~m_{i}}$ to the set of nodes adjacent to node ${\displaystyle ~i}$ , associate ${\displaystyle ~m_{i}}$ to the nodes that are pointed to by node ${\displaystyle ~i}$. Similarly, a scheme ${\displaystyle ~Graph}$ can be constructed by associating to data item ${\displaystyle ~m_{i}}$ the set of nodes that point to node ${\displaystyle ~i}$ .

In this Section we give a simple scheme to encrypt graphs supporting adjacency queries based on any matrix encryption scheme. The approach is straightforward and, at a high level, consists of encrypting the graph’s adjacency matrix. Given data ${\displaystyle ~(G,m)}$ , where ${\displaystyle ~G=(V,E)}$ is a directed graph of size ${\displaystyle ~n}$ and each data item ${\displaystyle ~m_{i}}$ is assigned to some edge in ${\displaystyle ~E}$, encryption proceeds as follows. We create a matrix ${\displaystyle ~M.}$ , that holds at location ${\displaystyle ~(\alpha ,\beta )}$ a pointer to the data item associated with edge ${\displaystyle ~(\alpha ,\beta )\in V}$ (or ${\displaystyle ~\bot }$ when there is no such edge). We then use the matrix encryption scheme on ${\displaystyle ~(M.,m)}$. Our construction is described in detail in Figure 4.

Let ${\displaystyle ~Matrix=(Gen,Enc,Token,Lkp_{e},Dec)}$ be an associative encryption scheme for matrix-structured data. Our scheme ${\displaystyle ~Graph=(Gen,Enc,Token,Adj_{e},Dec)}$ is defined as follows:

• ${\displaystyle ~Gen(1_{k})}$ : generate and output ${\displaystyle ~K\leftarrow Matrix.Gen(1_{k})}$ .
• ${\displaystyle ~Enc(K,G,M)}$ : construct a matrix ${\displaystyle ~M}$ as follows:

if ${\displaystyle ~(\alpha ,\beta )\in V}$ , then ${\displaystyle ~M[\alpha ,\beta ]}$ stores a pointer to the item assigned to edge ${\displaystyle ~(\alpha ,\beta )}$ ; if ${\displaystyle ~(\alpha ,\beta )\_\in V}$ , then ${\displaystyle ~M[\alpha ,\beta ]=\bot }$ . Output ${\displaystyle ~(\gamma ,c)\leftarrow Matrix.Enc_{K}(M,M)}$.

• ${\displaystyle ~Token(K,i,j)}$ : compute and output ${\displaystyle ~\tau \leftarrow Matrix.Token_{K}(i,j)}$.
• ${\displaystyle ~Adj_{e}(\gamma ,\tau )}$ : output ${\displaystyle ~J:=Matrix.Lkpe(\gamma ,\tau )}$.
• ${\displaystyle ~Dec(K,c_{j})}$ : output ${\displaystyle ~m_{j}:=Matrix.Dec_{K}(c_{j})}$.

Fig. 4. A structured encryption scheme for graphs supporting adjacency queries

Theorem 4

If ${\displaystyle ~Matrix}$ is ${\displaystyle (L_{1},L_{2}))}$ -secure against adaptive chosen-query attacks, then so is ${\displaystyle ~Graph}$.

Proof

Again, the theorem follows by construction. If ${\displaystyle ~Matrix}$ is instantiated with the construction from Section 5.1, then ${\displaystyle ~L_{1}}$ leaks the size of the graph, the number

of edges the number of data items and the length of the largest data item. ${\displaystyle ~L_{2}}$ leaks the query and intersection patterns.

## Structured Encryption for Labeled Graphs

In this Section we describe an adaptively secure structured encryption scheme for data that is both labeled and associated with a graph-structure. As an example, consider a web graph where each page is labeled with a set of keywords (which could be the set of all the words in the page) and points to a set of other pages. Another example is social network data which consists of user profiles (with some associated meta-data) that link to other users.

While the constructions from the previous Section can be used to encrypt this type of data, the queries they support (i.e., keyword search, adjacency, and neighbor queries) are limited in this setting since they are only relevant to part of the data’s structure. Indeed, if we were to encrypt a web graph using a scheme for labeled data, then we could only perform keyword search. Similarly, if we were to use a graph encryption scheme that supports only neighbor queries then we could only retrieve pages that are linked from a particular page. But web graphs, and labeled graph data in general, exhibit a much richer structure and ideally we would like to design schemes that support more complex queries that take advantage of this structure.

Focused subgraph queries. One example of complex queries on web graphs are focused subgraph queries. These queries are an essential part of a certain class of search engine algorithms which includes Kleinberg’s seminal HITS algorithm [16] and the SALSA algorithm [17]. At a high level, they work as follows. Given a keyword ${\displaystyle ~w}$ , a keyword search is performed over the web pages. This results in a subset of pages called the root graph. A focused subgraph is then constructed by adding all the pages that either link to pages in the root graph or are linked from pages in the root graph. An iterative algorithm is then applied to the focused subgraph which returns, for each page, a score that quantifies its relevance with respect to keyword ${\displaystyle ~w}$ . The key property of these “link-analysis” algorithms (and the reason for their success) is that they take advantage not only of the information provided by the keywords associated with the pages, but also of the implicit information embedded in the graph structure (i.e., the links) of the web graph.

Our approach.

At a high level, our approach is to decompose the complex structure into simpler structures (e.g., in the case of a web graph into its graph and its labeling) and then use different structured encryption schemes to handle each “sub-structure”. We note, however, that the sub-structures cannot be handled in isolation. In particular, for this approach to work the individual schemes have to be combined in a particular way. This is where we make essential use of ssociativity, which will allow us to “chain” the schemes together in order to obtain the functionality we want (this technique will be illustrated in our discussion below).

Our construction.

We now illustrate our second approach for the case of web graphs but note that our construction applies to any labeled graph data. A detailed description of our construction is given in Figure 5. We note that it is not associative. A web graph will be viewed as a tuple ${\displaystyle ~(G,L,m)}$ , which consists of a directed graph ${\displaystyle ~G\in G_{n}}$ of size ${\displaystyle ~n}$ , a labeling ${\displaystyle ~L}$ over a keyword space ${\displaystyle ~W}$ and text pages ${\displaystyle ~m}$. The graph ${\displaystyle ~G}$ encodes the link structure of the web graph and the labeling assigns keywords to each page. The focused subgraph operation ${\displaystyle ~Subgraph:G_{n}\times W\rightarrow G\leq n}$ takes as input a directed graph ${\displaystyle ~G}$ of size ${\displaystyle ~n}$ and a keyword ${\displaystyle ~w}$ and returns the subgraph ${\displaystyle ~G(w)}$ , that consists of (1) the nodes ${\displaystyle ~i}$ in ${\displaystyle ~L(w)}$ ; (2) any node that links to the nodes in ${\displaystyle ~L(w)}$ ; and (3) any node that is linked from the nodes in ${\displaystyle ~L(w)}$ .

Our construction makes use of three structured encryption schemes: ${\displaystyle ~Label}$, that supports search over labeled data, ${\displaystyle ~Graph_{-}}$ , that supports incoming neighbor queries over graph-structured data, and ${\displaystyle ~Graph_{+}}$ , that supports outgoing neighbor queries over graph-structured data. We stress that ${\displaystyle ~Label}$ must be associative. Given a web graph ${\displaystyle ~(G,L,m)}$, we encrypt ${\displaystyle ~(G,m)}$, using both ${\displaystyle ~Graph_{+}}$ and ${\displaystyle ~Graph_{-}}$, resulting in ciphertexts ${\displaystyle ~c+}$ and ${\displaystyle ~c-}$. Now, for each node ${\displaystyle ~i}$ in ${\displaystyle ~G}$ , we generate a pair of tokens ${\displaystyle ~(\tau _{i+},\tau _{i-})}$. We then use ${\displaystyle ~Label}$, to encrypt ${\displaystyle ~(L,m)}$, using the token pairs ${\displaystyle ~(\tau _{i+},\tau _{i-})}$ as semi-private data (recall that ${\displaystyle ~Label}$ is associative). We then output the encryption ${\displaystyle ~c}$ of ${\displaystyle ~(L,m)}$ .

A focused subgraph query on keyword ${\displaystyle ~w}$ is handled as follows. A token ${\displaystyle ~\tau _{l}\leftarrow Label.Token_{K}(w)}$ is generated and sent to the server.When used with the ciphertext ${\displaystyle ~c}$, this token will reveal to the server (1) pointers to all the (encrypted) web pages labeled with keyword ${\displaystyle ~w}$; and (2) for each of these encrypted pages ${\displaystyle ~c_{j}}$, the semi-private information which consists of tokens ${\displaystyle ~(\tau _{j+},\tau _{j-})}$ . For each encrypted page, the server can then use the token pairs with ciphertexts ${\displaystyle ~c_{j+})}$ and ${\displaystyle ~c_{j-})}$ , to recover pointers to any incoming and outgoing neighbors of page ${\displaystyle ~c_{j}}$ .

Theorem 5

If ${\displaystyle ~Label}$ , ${\displaystyle ~Graph_{+}}$ and ${\displaystyle ~Graph_{-}}$ are respectively (stateless) ${\displaystyle ~(L_{l1},L_{l2})}$ - secure ${\displaystyle ~(L_{+1},L_{+2})}$ - secure and ${\displaystyle ~(L_{-1},L_{-2})}$ -secure against adaptive chosen query attacks, then the scheme described above is ${\displaystyle ~(L_{1},L_{2})}$ - secure against adaptive chosen-query attacks, where

${\displaystyle ~L_{1}(G,L,m)=(L_{l1}(L,m),L_{+1}(G,m),L_{-2}(G,m))}$

and

${\displaystyle ~L_{2}(G,L,w)=(L_{l2}(L,w),(L_{+2}(G,i))_{i\in |R(w)|},(L_{-2}(G,i))_{i\in |R(w)|})}$ .

Proof
The proof is omitted due to lack of space but appears in [12].

Let ${\displaystyle ~Label=(Gen,Enc,Token,Search_{e},Dec)}$ be an encryption scheme for labeled data, ${\displaystyle ~Graph_{+}=(Gen,Enc,Token,Neigh_{e},Dec)}$ and ${\displaystyle ~Graph_{-}=(Gen,Enc,Token,Neigh_{e},Dec)}$ be graph encryption schemes that support neighbor queries. Our scheme ${\displaystyle ~LabGraph=(Gen,Enc,Token,Subgraph_{e},Dec)}$ is defined as follows:

• ${\displaystyle ~Gen(1_{k})}$ : generate three keys ${\displaystyle ~K_{1}\leftarrow Graph_{+}.Gen(1_{k})}$, ${\displaystyle ~K_{2}\leftarrow Graph_{-}.Gen(1_{k})andK_{3}\leftarrow Label.Gen(1_{k}).LetK=(K_{1},K_{2},K_{3})}$.
• ${\displaystyle ~Enc(K,G,m)}$:
1. compute ${\displaystyle ~(\gamma _{+},c_{+})\leftarrow Graph_{+}.Enc_{K_{1}}(G,m)}$,
2. compute ${\displaystyle ~(\gamma _{-},c_{-})\leftarrow Graph_{-}.Enc_{K_{2}}(G,m)}$,
3. for ${\displaystyle ~1\leq i\leq n}$ ,
1. compute ${\displaystyle ~\tau _{+i}\leftarrow Graph_{+}.Token_{K_{1}}(i)}$,
2. compute ${\displaystyle ~\tau _{-i}\leftarrow Graph_{-}.Token_{K_{2}}(i)}$,
4. let ${\displaystyle ~L}$ be the labeling generated from all the words in ${\displaystyle ~m}$ (i.e., each ${\displaystyle ~m_{i}}$ is labeled with the words it contains) and let ${\displaystyle ~v=\{(t_{+i},t_{=}{-i})_{i}\}}$ ,
5. compute ${\displaystyle ~(\gamma _{l},c_{l})\leftarrow Label.Enc_{K_{3}}(L,M)}$, where ${\displaystyle ~M}$ is composed of ${\displaystyle ~m}$ and ${\displaystyle ~v}$,
6. output ${\displaystyle ~\gamma =(\gamma _{+},\gamma _{-},\gamma _{l})}$ and ${\displaystyle ~c=(c+,c-,c)}$.
• ${\displaystyle ~Token(K,w)}$ : output ${\displaystyle ~\tau \leftarrow Label.Token_{K_{3}}w)}$.
• ${\displaystyle ~Subgraph_{e}(\gamma ,\tau )}$ :
1. compute ${\displaystyle ~(J_{l},v_{I}):=Label.Search(\gamma _{l},\tau )}$
2. for all ${\displaystyle ~j\in J_{l}}$ ,
1. compute ${\displaystyle ~J_{+j}:=Graph_{+}.Neigh(\gamma _{+},\tau _{+j})}$,
2. compute ${\displaystyle ~J_{-j}:=Graph_{-}.Neigh(\gamma _{-},\tau _{-j})}$,
3. output ${\displaystyle ~J=(J_{l},(J_{+j},J_{-j})_{j\in J_{l}})}$.
• ${\displaystyle ~Dec(K,c_{j})}$ : return ${\displaystyle ~m_{j}:=\Pi .Dec_{K_{3}}(c_{j})}$.

Fig. 5. A structured encryption scheme for web graphs supporting focused subgraph queries

## Conclusions and Future Directions

Several interesting future directions are suggested by this work. The most immediate is whether efficient and non-interactive structured encryption can be achieved while leaking less than the query and intersection pattern. The construction of efficient dynamic structured encryption schemes (i.e., that allow for updates to the encrypted data) is another direction left open by this work. Of course, the construction of schemes that handle other types of structured data and more complex queries on the data types considered here would also be interesting.

Acknowledgements

We are grateful to Kristin Lauter for encouragement during the early stages of this work, to Sherman Chow and Satya Lokam for useful discussions regarding graph encryption and to Susan Hohenberger for insisting on a thoroughcomparison with functional encryption. We are also grateful to Adam O’Neill for several helpful discussions on functional encryption. Finally, we thank Emily Shen and Charalampos Papamanthou for useful feedback on the manuscript and the anonymous reviewers for helpful suggestions.

## References

1. E-J. Goh. Secure indexes. Technical Report 2003/216, IACR ePrint Cryptography Archive, 2003.See http://eprint.iacr.org/2003/216. See http://eprint.iacr.org/2003/216.
2. M. Brautbar and M. Kearns. Local algorithms for _nding intersting individuals in large networks.In Innovations in Computer Science (ICS '10), 2010.
3. R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky. Searchable symmetric encryption: Improved de_nitions and e_cient constructions. In ACM Conference on Computer and Communications Security (CCS '06), pages 79{88. ACM, 2006.
4. O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM, 43(3):431{473, 1996.
5. B. Waters, D. Balfanz, G. Durfee, and D. Smetters. Building an encrypted and searchable auditlog. In Network and Distributed System Security Symposium (NDSS '04). The Internet Society, 2004.
6. C. Gentry. Fully homomorphic encryption using ideal lattices. In ACM Symposium on Theory of Computing (STOC '09), pages 169{178. ACM Press, 2009.
7. E. Shi, J. Bethencourt, T. Chan, D. Song, and A. Perrig. Multi-dimensional range query over encrypted data. In IEEE Symposium on Security and Privacy, pages 350{364, Washington, DC, USA, 2007. IEEE Computer Society.
8. D. Boneh, G. di Crescenzo, R. Ostrovsky, and G. Persiano. Public key encryption with keyword search. In Advances in Cryptology { EUROCRYPT '04, volume 3027 of Lecture Notes in Computer Science, pages 506{522. Springer, 2004.
9. D. Song, D. Wagner, and A. Perrig. Practical techniques for searching on encrypted data. In IEEE Symposium on Research in Security and Privacy, pages 44{55. IEEE Computer Society, 2000.
10. M. Abdalla, M. Bellare, D. Catalano, E. Kiltz, T. Kohno, T. Lange, J. M. Lee, G. Neven, P. Paillier, and H. Shi. Searchable encryption revisited: Consistency properties, relation to anonymous IBE, and extensions. In V. Shoup, editor, Advances in Cryptology { CRYPTO '05, volume 3621 of Lecture Notes in Computer Science, pages 205{222. Springer, 2005.
11. M. Bellare, A. Boldyreva, and A. O'Neill. Deterministic and e_ciently searchable encryption. In A. Menezes, editor, Advances in Cryptology { CRYPTO '07, Lecture Notes in Computer Science, pages 535{552. Springer, 2007.
12. A. Shamir. Identity-based cryptosystems and signature schemes. In George Robert Blakley andDavid Chaum, editors, Advances in Cryptology { CRYPTO '84, volume 196 of Lecture Notes inComputer Science, pages 47{53. Springer, 1985.
13. D. Boneh, E. Kushilevitz, R. Ostrovsky, and W. Skeith. Public-key encryption that allows PIR queries. In A. Menezes, editor, Advances in Cryptology { CRYPTO '07, volume 4622 of Lecture Notes in Computer Science, pages 50{67. Springer, 2007.
14. Micrsoft Corp. Windows azure marketplace. http://www.microsoft.com/windowsazure/marketplace/.
15. Infochimps. http://www.infochimps.org.
16. J. Katz, A. Sahai, and B. Waters. Predicate encryption supporting disjunctions, polynomial equations, and inner products. In Advances in Cryptology - EUROCRYPT 2008, volume 4965 ofLecture Notes in Computer Science, pages 146{162. Springer, 2008.
17. J. Kleinberg. Authoritative sources in a hyperlinked environment. In Symposium on Discrete Algorithms (SODA '08), pages 668{677. Society for Industrial and Applied Mathematics, 1998.22
18. X. Boyen and B. Waters. Anonymous hierarchical identity-based encryption (without randomoracles). In Advances in Cryptology - CRYPTO 2006, volume 4117 of Lecture Notes in ComputerScience, pages 290{307. Springer, 2006.
19. O. Goldreich, S. Micali, and A. Wigderson. How to play ANY mental game. In ACM Symposium on the Theory of Computation (STOC '87), pages 218{229. ACM, 1987.
20. D. Chaum, C. Cr_epeau, and I. Damgard. Multiparty unconditionally secure protocols. In ACM symposium on Theory of computing (STOC '88), pages 11{19. ACM, 1988.
21. J. Bardin, J. Callas, S. Chaput, P. Fusco, F. Gilbert, C. Ho_, D. Hurst, S. Kumaraswamy, L. Lynch,S. Matsumoto, B. O'Higgins, J. Pawluk, G. Reese, J. Reich, J. Ritter, J. Spivey, and J. Viega. Security guidance for critical areas of focus in cloud computing. Technical report, Cloud Security Alliance, April 2009.
22. S. Kamara and K. Lauter. Cryptographic cloud storage. In Workshop on on Real-Life Cryptographic Protocols and Standardization, volume 6054 of Lecture Notes in Computer Science, pages 136{149. Springer, 2010.
23. E. Shen, E. Shi, and B.Waters. Predicate privacy in encryption systems. In Theory of Cryptography Conference (TCC '09), pages 457{473, Berlin, Heidelberg, 2009. Springer-Verlag.
24. D. Boneh and B. Waters. Conjunctive, subset, and range queries on encrypted data. In Theory of Cryptography Conference (TCC '07), volume 4392 of Lecture Notes in Computer Science, pages 535{554. Springer, 2007.
25. D. Boneh, A. Sahai, and B. Waters. Functional encryption: De_nitions and challenges. Technical Report 2010/543, IACR ePrint Cryptography Archive, 2010. See http://eprint.iacr.org/ 2010/543.
26. [34] C. Soghoian. Caught in the cloud: Privacy, encryption, and government back doors in the web 2.0 era. Journal on Telecommunications and High Technology Law, 8(2), 2010.
27. Yehuda Lindell and Benny Pinkas. A proof of security of yao's protocol for two-party computation. J. Cryptology, 22(2):161{188, 2009.
28. D. Boneh and M. Franklin. Identity-based encryption from the weil pairing. In Advances in Cryptology - CRYPTO 2001, volume 2139 of Lecture Notes in Computer Science, pages 213{229. Springer-Verlag, 2001.
29. R. Lempel and S. Moran. SALSA: The stochastic approach for link-structure analysis. ACM Transactions on Information Systems, 19(2):131{160, April 2001.
30. V. Goyal, O. Pandey, A. Sahai, and B. Waters. Attribute-based encryption for _ne-grained access control of encrypted data. In ACM conference on Computer and communications security (CCS'06), pages 89{98, New York, NY, USA, 2006. ACM.
31. J. Bethencourt, A. Sahai, and B. Waters. Ciphertext-policy attribute-based encryption. In IEEE Symposium on Security and Privacy, pages 321{334. IEEE Computer Society, 2007.
32. J. Katz and Y. Lindell. Introduction to Modern Cryptography. Chapman & Hall/CRC, 2008.
33. A. Sahai and B. Waters. Fuzzy identity-based encryption. In R. Cramer, editor, Advances in Cryptology { EUROCRYPT '05, volume 3494 of Lecture Notes in Computer Science, pages 457{473. Springer, 2005.
34. Y. Chang and M. Mitzenmacher. Privacy preserving keyword searches on remote encrypted data. In Applied Cryptography and Network Security (ACNS '05), volume 3531 of Lecture Notes in Computer Science, pages 442{455. Springer, 2005.
35. R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky. Searchable symmetric encryption: Improved de_nitions and e_cient constructions. Journal version (under submission), 2010.