Collision Attacks against the Knudsen-Preneel Compression Functions

From Bauman National Library
This page was last modified on 6 December 2015, at 19:06.

Collision Attacks against the Knudsen-Preneel Compression Functions
Authors: Onur Özen[@: 1] and Martijn Stam[@: 2]
Published: 2010
Web site: International Association for Cryptologic Research
Download: original


Abstract. Knudsen and Preneel (Asiacrypt’96 and Crypto’97) introduced a hash function design in which a linear error-correcting code is used to build a wide-pipe compression function from underlying blockciphers operating in Davies-Meyer mode. Their main design goal was to deliver compression functions with collision resistance up to, and even beyond, the block size of the underlying blockciphers. In this paper, we present new collision-finding attacks against these compression functions using the ideas of an unpublished work of Watanabe and the preimage attack of Özen, Shrimpton, and Stam (FSE’10). In brief, our best attack has a time complexity strictly smaller than 2^𝑛 for all but two of the parameter sets. Consequently, the time complexity lower bound proven by Knudsen and Preneel is incorrect and the compression functions do not achieve the security level they were designed for.
Keywords: Collision attack, coding theory, compression function.


Hash functions are currently at the centre of the cryptographic community’s attention. While most of this attention is geared directly towards the SHA-3 competition (by analysing its remaining candidates), other, arguably more fundamental questions regarding hash function design should not be forgotten. After all, the study of the underlying principles of hash function design is potentially beneficial for the SHA-3 decision process.

The two most revered principles in hash function design are (i) the Merkle-Damgård iterative construction, or more generally the principle of designing a secure compression function, and (ii) the Davies-Meyer construction, or more generally the principle of using a blockcipher as the underlying primitive. Indeed, the currently standardized hash functions from the SHA family follow this approach (as did their predecessor MD5), as do several of the SHA-3 candidates.

It was already recognized early on that the output sizes of traditional blockciphers are insufficient to yield a secure compression function[1]. This still holds true today: for all the (optimally secure) PGV blockcipher-based compression functions[2], [3], [4] based on an 𝑛-bit blockcipher, the (time) complexity of collision- and preimage-finding attacks is at most 2^{𝑛/2} and 2^𝑛, respectively; when 𝑛 = 128 (e.g. AES) the resulting bounds are mostly unacceptable for current practice (in particular for collision resistance).

To achieve acceptable security (based on small block sizes) it is necessary to output a multiple of the block length. In the 1990s many constructions were proposed for this goal, mostly outputting 2𝑛 bits with the explicit collision resistance target of 2^𝑛 (see[5],[6] for an overview). The standard goal for these constructions has been optimal collision resistance: a target output size is fixed and the compression function should be collision resistant up to the birthday bound for that digest size. In three papers[7],[8],[9], Knudsen and Preneel adopted a different approach, namely to fix a particular security target and let the output size (and relatedly the number of blockcipher calls) vary as needed in order to guarantee that security target, without imposing optimal security.

Specifically, given 𝑟 independent ideal compression functions each mapping 2𝑛 bits to 𝑛 bits, they create a new ‘bigger’ compression function outputting 𝑟𝑛 bits. Following principles (i) and (ii) already mentioned, they then propose to instantiate the underlying ideal compression functions with a blockcipher run in Davies-Meyer mode and to iterate the compression function to obtain a full blockcipher-based hash function. However, they derive their security from the compression function, so that is where we will focus our attention.

The 𝑓𝑖 are run in parallel, where each of their inputs is some linear combination of the blocks of message and chaining variable that are to be processed; the 𝑟𝑛-bit output of their construction is the concatenation of the outputs of these 𝑟 parallel calls. The elegance of the KP construction is in how the inputs to the 𝑓𝑖 are computed. They use the generator matrix of an error-correcting code to determine how the input blocks of the ‘big’ compression function are xor’ed together to form the inputs to the underlying functions. (In a generalization they consider the 𝑓𝑖 as mapping between blocks of a different size instead and use a code over an extension field GF(2^𝑒).)
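To make the role of the generator matrix concrete, here is a toy sketch (a hypothetical [5, 3] binary code with single-byte blocks; the actual construction uses larger blocks and, in general, a code over an extension field):

```python
def kp_inputs(blocks, G):
    """Form the r PuRF inputs from the k input blocks using the
    generator matrix G: column j of G says which input blocks are
    XORed together to feed f_j.  Toy sketch over GF(2); the actual
    KP scheme works over an extension field."""
    k, r = len(G), len(G[0])
    n = len(blocks[0])
    inputs = []
    for j in range(r):
        acc = bytes(n)
        for i in range(k):
            if G[i][j]:
                acc = bytes(a ^ b for a, b in zip(acc, blocks[i]))
        inputs.append(acc)
    return inputs

# Hypothetical systematic generator matrix of a [5, 3] binary code.
G = [[1, 0, 0, 1, 1],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1]]
blocks = [b'\x01', b'\x02', b'\x04']
print(kp_inputs(blocks, G))
# → [b'\x01', b'\x02', b'\x04', b'\x03', b'\x05']
# (systematic part first: the first three inputs are the blocks themselves)
```

Note how the systematic part of the generator matrix passes the input blocks through unchanged, while the remaining columns prescribe the XOR combinations.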

The (deliberate) effect of this design is that when two inputs to the ‘big’ compression function differ, the corresponding inputs for the underlying functions will differ for at least 𝑑 functions. In particular, when using a systematic generator, a change in the systematic part of the input results in at least 𝑑 − 𝑘 so-called active functions in the non-systematic part. Intuitively this means that one has to find a preimage, resp. a collision, for the active functions in parallel. Based on this observation, Knudsen and Preneel claim that (under an assumption) any collision attack needs time at least 2^{(𝑑−1)𝑛/2} (and as many 𝑓𝑖 evaluations) and they conjecture that a preimage attack will require time at least 2^{(𝑑−1)𝑛}. Additionally, they give preimage and collision attacks (sometimes matching their lower bounds).

Table 1. A summary of collision attacks on the Knudsen-Preneel compression functions, with constant and polynomial factors (in 𝑛) ignored. Non-MDS parameters are in italic.


Watanabe[10] already pointed out a collision attack beating the one given by Knudsen and Preneel for many of the parameter sets. In particular, his differential attack works whenever 2𝑘 > 𝑟 and has a query and time complexity of essentially 2^𝑛. Thus he demonstrated that the proven collision resistance lower bound given by Knudsen and Preneel is incorrect whenever 2𝑘 > 𝑟 and 𝑑 ≥ 3. For a code with minimum distance 𝑑 = 2 he matches the Knudsen-Preneel collision-resistance lower bound, but does not violate it; the two codes proposed by Knudsen and Preneel with 2𝑘 ≤ 𝑟 seem beyond reproach. Yet this was the first indication that something is amiss with the claim by Knudsen and Preneel. A second indication arrived at FSE’10, when Özen, Shrimpton, and Stam [7] demonstrated a remarkably efficient preimage attack that, for 9 out of 16 cases, runs in a time that was shown optimal. Moreover, using a yield-based argument, they showed that an information-theoretic adversary in principle should be able to find collisions after comparatively few queries.

Our contribution. In this paper we deal what we believe to be the final blow against the Knudsen-Preneel compression functions. Our contribution is fourfold, with a summary provided in Table 1. For completeness, we have also investigated the time complexity that one would obtain by a straightforward adaptation of the ideas and query complexities of OSS; we refer to the full version of this paper[11] for the details.

The mise en place in Section 4 provides a detailed mathematical characterization of the Knudsen-Preneel compression function’s preprocessing. As a first simple result, this allows us in Section 5.1 to revise the attack of Watanabe in a way that slightly reduces the time requirements, yet significantly increases the number of collisions it can produce. More precisely, after an initial effort, we can generate (up to) a large number of further collisions in constant time each.

In our revised version of Watanabe’s attack, we fix a pure tensor to create a differential. By adaptively looking for an arbitrary tensor and using the same type of queries as Özen, Shrimpton, and Stam, we arrive in Section 5.2 at a new, symbiotic collision-finding attack. The attack works whenever 2𝑘 > 𝑟 (as in Watanabe’s case). Even more amazing is that if the inequality is strict, the adversary can create further collisions (like our revised attack) in constant time.

Thirdly, in Section 6.1 we introduce a parametrized information-theoretic collision attack. It turns out that the new symbiotic attack and the old OSS information-theoretic collision attack lie on opposite ends of the spectrum of this parametrized attack, yet optimality is typically achieved somewhere in the middle (with the same two parameter sets again as exceptions), yielding an improved query complexity.

Our final contribution is a reduced-time variant of our optimized attack above. For this we use the same ideas as OSS, but with a crucial twist: where they used the dual code to look for preimages efficiently, we will use the dual of the shortened code to search for collisions efficiently. As a result, for 12 out of 16 suggested parameters we can mount a collision attack whose time complexity matches its query complexity (ignoring constants and logarithmic factors). Even better, only for a few parameter sets are we unable to beat the time complexity of any prior attack we are aware of; for the rest we set new records.


With some minor modifications, we will adhere to the notation also used by Özen, Shrimpton, and Stam.

Linear error-correcting codes. A linear [𝑟, 𝑘, 𝑑] error-correcting code 𝒞 is the set of elements (codewords) in a 𝑘-dimensional subspace of an 𝑟-dimensional vector space (for 𝑘 ≤ 𝑟), where the minimum distance 𝑑 is defined as the minimum Hamming weight (taken over all nonzero codewords in 𝒞). The dual code is the set of all elements in the (𝑟 − 𝑘)-dimensional subspace orthogonal to 𝒞 (with respect to the usual inner product), and its minimum distance is denoted 𝑑^⊥. The Singleton bound puts a limit on the minimum distance: 𝑑 ≤ 𝑟 − 𝑘 + 1. Codes matching the Singleton bound are called maximum distance separable (MDS). An important property of an MDS code is that its dual is MDS as well, so 𝑑^⊥ = 𝑘 + 1.
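As a quick illustration of these definitions, the following brute-force sketch (over a hypothetical [4, 2] binary code; small enough to enumerate) computes the minimum distance and checks it against the Singleton bound:

```python
from itertools import product

def codewords(G):
    """All codewords of the binary code generated by the rows of G."""
    k, r = len(G), len(G[0])
    words = set()
    for coeffs in product([0, 1], repeat=k):
        w = tuple(sum(c * G[i][j] for i, c in enumerate(coeffs)) % 2
                  for j in range(r))
        words.add(w)
    return words

def min_distance(G):
    """Minimum Hamming weight over the nonzero codewords."""
    return min(sum(w) for w in codewords(G) if any(w))

# Hypothetical [4, 2] binary code.
G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
r, k = 4, 2
d = min_distance(G)
assert d <= r - k + 1   # Singleton bound: d <= r - k + 1
print(d)
# → 2
```

For a linear code, minimizing the Hamming distance between distinct codewords is the same as minimizing the weight of nonzero codewords, which is what the sketch exploits.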

An [𝑟, 𝑘, 𝑑] code 𝒞 can be generated by a 𝑘 × 𝑟 matrix 𝐺 (using row vectors throughout). A generator matrix 𝐺 is called systematic if it has the form 𝐺 = (𝐼_𝑘 | 𝑃) for a 𝑘 × (𝑟 − 𝑘) matrix 𝑃 and the identity matrix 𝐼_𝑘. Furthermore, 𝐺 is the generator matrix of an MDS code iff any 𝑘 of its columns are linearly independent. For an index set 𝐼 we define 𝐺_𝐼 as the restriction of 𝐺 to those columns indexed by 𝐼. For a given code we want to define an index set 𝐼 such that 𝐺_𝐼 is invertible. For MDS codes, the existence of such an 𝐼 can be shown easily (and we can impose uniqueness e.g. by virtue of an ordering). For non-MDS codes there exist some index sets for which no such 𝐼 exists; however, for any target cardinality it is possible to find an 𝐼 (of that cardinality) for which 𝐺_𝐼 is invertible (e.g. by first going through the systematic columns); we call such an 𝐼 admissible.

A given code 𝒞 can be shortened to obtain a new, derived code 𝒞′. Pick a position 𝑖 and consider the set of all codewords in 𝒞 that are 0 on position 𝑖. The new code 𝒞′ consists of these codewords with position 𝑖 dropped; however, we sometimes ‘quasi-shorten’ and keep the superfluous zeroes present (we always keep the original indexing). It is easy to see that 𝒞′ is an [𝑟 − 1, 𝑘 − 1, 𝑑′] code with 𝑑′ ≥ 𝑑, unless all codewords in 𝒞 had a 0 on position 𝑖 or 𝑘 = 1 (in the latter case the shortening might result in the trivial one-codeword code). The shortening of an MDS code is an MDS code itself. By repeated application one can shorten by any index set 𝐼 with |𝐼| < 𝑘 to obtain a derived MDS code 𝒞′. If 𝐺 is systematic and 𝐼 consists of the first |𝐼| positions, we can generate the shortened code by dropping the first |𝐼| rows and the corresponding columns. (For the four non-MDS codes used by Knudsen and Preneel we will perform a separate analysis on repeated shortening.)
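Shortening can be illustrated directly on the codeword set; the sketch below uses a hypothetical [4, 2] binary code and shortens at an arbitrary position:

```python
def shorten(C, i):
    """Shorten code C (a set of codeword tuples) at position i:
    keep the codewords with a 0 at position i and drop that coordinate."""
    return {w[:i] + w[i + 1:] for w in C if w[i] == 0}

# Codewords of a hypothetical [4, 2] binary code.
C = {(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 1)}
C1 = shorten(C, 0)      # shorten at position 0
print(sorted(C1))
# → [(0, 0, 0), (1, 1, 0)]   (a [3, 1] code: length and dimension drop by one)
```

Note that the minimum weight of the surviving nonzero codewords can only stay the same or grow, matching the claim 𝑑′ ≥ 𝑑.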

Blockwise-linear compression functions. A compression function is a mapping from 𝑡𝑛 bits to 𝑠𝑛 bits for some blocksize 𝑛 and integer parameters 𝑡 > 𝑠 > 0. For positive integers 𝑐 and 𝑛, we let Func(𝑐𝑛, 𝑛) denote the set of all functions mapping 𝑐𝑛 bits into 𝑛 bits. A compression function is Public Random Function (PuRF)-based if its mapping is computed by a program with oracle access to a finite number of specified oracles 𝑓₁, …, 𝑓_𝑟, where 𝑓_𝑖 ∈ Func(𝑐𝑛, 𝑛). When a PuRF-based compression function 𝐻 operates on input 𝑊, we write 𝐻(𝑊) for the resulting value. Of primary interest for us will be single-layer PuRF-based compression functions without feedforward. These call all oracles in parallel and compute the output based only on the results of these calls; in particular, the input to the compression function itself does not enter the postprocessing.

Most PuRF-based (and blockcipher-based) compression functions are of a special type. Instead of arbitrary pre- and postprocessing, one finds only functions that are blockwise linear. The Knudsen-Preneel construction is also blockwise linear, so let us recall from [7] what a blockwise-linear scheme is.

Definition 1 (Blockwise-linear scheme).

Let 𝑟, 𝑐, 𝑏, 𝑡, 𝑠 be positive integers and let matrices 𝐶^{PRE} and 𝐶^{POST} be given. We define a family of single-layer PuRF-based compression functions, one for each suitable positive integer 𝑛. Then, on input 𝑊 (interpreted as a column vector), the compression function computes the digest as follows:

1. Compute

2. Parse and for compute

3. Parse and output

where ⊗ denotes the Kronecker product and 𝐼 the identity matrix of the appropriate dimension.

In the definition above we silently identified with the vector space . The map corresponding to will occasionally be denoted and its image . It will be convenient for us to write the codomain of as a direct sum, so we identify with where for . If and , then consequently will be in . (This extends naturally to , when )
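The Kronecker product with an identity matrix in Definition 1 simply makes a matrix over block indices act blockwise on 𝑛-bit blocks. A minimal stdlib sketch (binary entries, hypothetical dimensions):

```python
def kron_identity(G, n):
    """Kronecker product G ⊗ I_n for a binary matrix G: the result acts
    blockwise, applying G to vectors whose entries are n-bit blocks."""
    k, r = len(G), len(G[0])
    M = [[0] * (r * n) for _ in range(k * n)]
    for i in range(k):
        for j in range(r):
            if G[i][j]:
                for t in range(n):
                    M[i * n + t][j * n + t] = 1
    return M

# Hypothetical 2 x 2 matrix, block size n = 2.
G = [[1, 1],
     [0, 1]]
M = kron_identity(G, 2)
for row in M:
    print(row)
# → [1, 0, 1, 0]
#   [0, 1, 0, 1]
#   [0, 0, 1, 0]
#   [0, 0, 0, 1]
```

Each 1 in the small matrix becomes an 𝑛 × 𝑛 identity sub-block, which is exactly why the preprocessing treats the input as a sequence of blocks rather than as individual bits.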

Knudsen-Preneel compression functions. Knudsen and Preneel[7],[8] introduced a family of hash functions employing error-correcting codes. (We use the journal version[9] as our frame of reference.) Although their work was ostensibly targeted at blockcipher-based designs, the main technical thread of their work develops a transform that extends the range of an ‘ideal’ compression function (blockcipher-based or not) in a manner that delivers some target level of security. As is nowadays typical, we understand an ideal compression function to be a PuRF. In fact, the KP transform is a special instance of a blockwise-linear scheme (Definition 1), in which the inputs to the PuRFs are determined by a linear code over a binary extension field and the postprocessing matrix is the identity. The extension field itself is represented as a subring of the matrix ring (of dimension equalling the extension degree) over the base field. We formalize this by an injective ring homomorphism, applied component-wise to matrices over the extension field of arbitrary dimensions. For completeness, there is also a compatible group homomorphism.

Definition 2 (Knudsen-Preneel transform).

Let 𝒞 be a linear code with generator matrix 𝐺. Let φ be an injective ring homomorphism and let 𝑛′ be a positive divisor of 𝑛 such that the relevant dimensions match. Then the Knudsen-Preneel compression function is the corresponding blockwise-linear compression function, with preprocessing derived from 𝐺 and postprocessing the identity.

If 𝑛′ is given, the compression function is defined for all 𝑛 for which 𝑛′ divides 𝑛. Moreover, it is based on 𝑟 PuRFs. For use of 𝐻 in an iterated hash function, note that per invocation (of 𝐻) one can compress a fixed number of message blocks (hence the parameter requirement ensures that actual compression takes place), which also determines the rate of the compression function. We will concentrate in particular on the 16 parameter sets given by Knudsen and Preneel. Since 𝑛′ is uniquely determined given 𝑛 (and the code), we will often omit it.

Security notions. A collision-finding adversary is an algorithm whose goal is to find two distinct inputs 𝑊, 𝑊′ that hash to the same value, so that 𝐻(𝑊) = 𝐻(𝑊′). We will consider adversaries in two scenarios: the information-theoretic one and a more realistic concrete setting. For information-theoretic adversaries the only resource of interest is the number of queries made to their oracles. Otherwise, these adversaries are considered (computationally) unbounded. In the concrete setting, on the other hand, we are interested in the actual runtime of the algorithm and, to a lesser extent, its memory consumption (and code size).

Prior Art on the Knudsen-Preneel Hash Functions

Knudsen and Preneel’s security claims. Knudsen and Preneel concentrate on the collision resistance of their compression function in the complexity-theoretic model. Under a fairly generous (but plausible) assumption, they essentially show that finding collisions takes time at least 2^{(𝑑−1)𝑛/2}. For preimage resistance Knudsen and Preneel do not give a corresponding theorem and assumption, yet they do conjecture it to be essentially the square of the collision resistance.

Knudsen and Preneel also present two attacks, one for finding preimages [[9], Proposition 3] and one for finding collisions [[9], Proposition 4] (see the results in Table 1). Both attacks revolve around finding multi-preimages for the systematic part of the construction in sufficient numbers to make it likely that completion to the non-systematic part will yield a full preimage or a full collision, respectively.

Watanabe’s collision-finding attack. Knudsen and Preneel left a considerable gap between the actual complexity of attacks and their lower bounds in the case of collision resistance. Watanabe[10] has pointed out a collision attack that runs in time 2^𝑛 (and as many PuRF evaluations). Thus, for many of the parameter sets, it beats the one given by Knudsen and Preneel. More interestingly, his attack serves as proof that the lower bound given by Knudsen and Preneel is incorrect for a large class of parameters: whenever 2𝑘 > 𝑟 and 𝑑 ≥ 3, which involves 6 out of 16 parameter sets. (See also Table 1.)

Assume that the code’s generator matrix is systematic, that is 𝐺 = (𝐼_𝑘 | 𝑃). Then the goal is to generate, for each of the systematic positions, a colliding pair of inputs in such a way that their completion to full ‘codewords’ collides on the non-systematic positions as well. This is done by ensuring that the input differences follow a fixed differential in the kernel of the non-systematic part (since 2𝑘 > 𝑟, the kernel is guaranteed to contain a non-trivial element). Mutual independence of the inputs to the PuRFs corresponding to the code’s systematic part allows the initial collision searches to be mounted independently. Unfortunately, since the collisions need to be rather special (due to the fixed differential), the birthday paradox does not apply and a collision search costs about 2^𝑛 queries and time (per PuRF). On the plus side, the attack is trivially memoryless and parallelizable.
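The key cost observation (a fixed input difference rules out the birthday speed-up) can be demonstrated with a toy experiment. The sketch below uses truncated SHA-256 as a stand-in for a PuRF; the 16-bit input space, 8-bit output, and the difference value are all hypothetical:

```python
import hashlib

def f(x: bytes) -> bytes:
    """Toy stand-in for a PuRF with an 8-bit output."""
    return hashlib.sha256(x).digest()[:1]

def fixed_differential_collision(delta: int, n_bits: int = 16):
    """Search for x with f(x) == f(x ^ delta).  Because the input
    difference is fixed, each candidate pair succeeds with probability
    ~2^-8 here, so the expected cost is ~2^(output bits) trials:
    the birthday paradox gives no speed-up."""
    for x in range(2 ** n_bits):
        y = x ^ delta
        if x < y:  # consider each unordered pair once
            xb = x.to_bytes(2, 'big')
            yb = y.to_bytes(2, 'big')
            if f(xb) == f(yb):
                return xb, yb
    return None

pair = fixed_differential_collision(0x0101)
print(pair)
```

In contrast, an unconstrained birthday search over the same 8-bit output would need only on the order of 2^4 queries, which is precisely the gap the text describes.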

Özen-Shrimpton-Stam preimage-finding attack. An extensive security analysis for the preimage resistance of KP constructions, falsifying the designers’ conjectured lower bound, has been provided by Özen, Shrimpton, and Stam [12]. Additionally, they also provided a related collision-finding attack with a surprisingly low query complexity (but no analysis of its time complexity).

At the core of the Özen-Shrimpton-Stam attacks is the simple observation that the preprocessing yields strings with a fixed (blockwise) pattern of zero bits. More generally, any linear combination of strings with the same pattern of fixed zero bits will yield a string of the same form. By restricting PuRF queries to strings with the same (blockwise) pattern one can optimize the yield of these queries (i.e. the maximum number of KP compression function evaluations an adversary can compute for a given number of queries). Matching the yield with the size of the codomain (resp. its square root) gives rise to an information-theoretic preimage (resp. collision) attack.
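The pattern-preservation fact underlying the yield argument is elementary; a tiny sketch over GF(2) (the bit tuples and zero positions are chosen arbitrarily):

```python
def respects_pattern(x, zero_positions):
    """True iff the bit tuple x is zero on all of the given positions."""
    return all(x[i] == 0 for i in zero_positions)

zeros = {1, 3}
a = (1, 0, 1, 0, 0, 1)   # zero on positions 1 and 3
b = (0, 0, 1, 0, 1, 1)   # zero on positions 1 and 3
s = tuple(u ^ v for u, v in zip(a, b))
assert respects_pattern(s, zeros)   # XOR preserves the zero pattern
print(s)
# → (1, 0, 0, 0, 1, 0)
```

Because the set of strings respecting a fixed zero pattern is closed under XOR, queries restricted to that set combine freely, which is what makes the yield count so favourable.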

A second observation is that, in the case of a preimage attack, the dual code can be used to find the preimage far more efficiently than a naive method. Direct application of this method however is disappointing (see Table 1). The resulting time complexities are typically much higher than the corresponding query complexities and the attack is seldom competitive with that of Knudsen and Preneel, let alone with that of Watanabe.

Decoding the Knudsen-Preneel Preprocessing

An important property that is exploited by both Watanabe and OSS is the linearity of the preprocessing. Indeed, its image can be regarded as an 𝑒𝑘𝑛′-dimensional subspace, or equivalently as a linear code in its own right (whose minimum distance is largely irrelevant for our purposes). This has the consequence that if two inputs 𝑊 and 𝑊′ collide, their difference is guaranteed to map to a (nonzero) codeword. Below we will give a more detailed mathematical characterization of the preprocessing, with a special eye towards the improved collision-finding algorithms we will give later on. Most of the results below are mathematically rather straightforward (and the proofs are left to the full version); the machinery is mainly needed to ensure that, when using canonical bases for the various vector spaces, everything lines up correctly and consistently with the actual Knudsen-Preneel compression function. Recall that we are given an injective ring homomorphism and a compatible group isomorphism. Let 𝒞 be a linear code with generator matrix 𝐺 and let 𝑛′ be a positive divisor of 𝑛; the input processing 𝐶^{PRE} of the Knudsen-Preneel compression function is then defined as in Definition 2.

Characterization of the image as a sum. We have already written the codomain of 𝐶^{PRE} as a direct sum of PuRF inputs. Here we will use a second interpretation that emphasizes the code. Since the two decompositions are vector spaces over different fields, we cannot find a vector space isomorphism (as for the earlier direct sum). Nonetheless we can find a suitable group isomorphism between them.

To define the group isomorphism we exploit that, luckily, the underlying arithmetic is essentially preserved, even though the blockwise reordering garbles things up. To formalize this, we fix the group isomorphism compatible with the ring homomorphism in the natural way.

Lemma 1

Let , let be the (quasi) shortening of on and let for . Then with for all if such that .

The following proposition develops the key idea of how to recognize that a given element is an element of the image. This result is exploited in [12] to efficiently find preimages for Knudsen-Preneel compression functions.

Proposition 1

Let , and a nonzero be given. Suppose that for some , then iff for all positive integers 𝑛′ it holds that .

Since is isomorphic , this leads in a natural way to a function from to , where with and . Note that we do not discriminate between different representatives, that is for nonzero we have that .

Lemma 2

If and , then if or .

The following lemma states that invertibility of 𝐺_𝐼 suffices to invert 𝐶^{PRE}.

Lemma 3

Let 𝐺 be a generator matrix for an [𝑟, 𝑘, 𝑑] code. Let 𝐼 be such that 𝐺_𝐼 is invertible, with transposed inverse denoted accordingly. Fix an integer 𝑛′ and a direct-sum decomposition of the codomain. Then, given the blocks indexed by 𝐼, there is a unique preimage under 𝐶^{PRE} whose image agrees on those blocks.

A New Symbiotic Collision-Finding Attack

Revising Watanabe’s Attack

Watanabe’s attack has complexity 2^𝑛, requires 2𝑘 > 𝑟, and essentially finds a single collision. Below we give a revised and improved version of his algorithm. It has a slightly lower complexity and a slightly weaker requirement, and it potentially results in many, many collisions. More precisely, after the initial effort we can find each new collision in constant time, for up to a whopping number of collisions.

Whereas in Watanabe’s attack the differential was computed as some non-trivial kernel element, we will compute it based on a codeword of sufficiently low weight and an arbitrary (nonzero) ‘block multiplier’. By using a minimal-weight codeword the attack performs best.

Algorithm 1 (Revised Watanabe Collision Attack)

For the revised attack to work, we need one further ingredient. Watanabe assumes a systematic code and exploits that there exists a nonzero codeword with a suitable zero pattern. This allows easy completion of a partial collision to a full collision. Our revised version allows an arbitrary (nonzero) codeword of sufficiently low weight. Thus the codeword might no longer map to the systematic part of the code. Luckily, Lemma 3 provides completion to a full collision, provided the codeword is admissible; for the four non-MDS codes proposed by Knudsen and Preneel it can be verified that the minimum-distance codewords are admissible.

Theorem 1 (Revised Watanabe attack)
Let the parameters be given as above. Then Algorithm 1, using a minimum-weight codeword (and an arbitrary nonzero multiplier), finds collisions in the stated expected time (using as many PuRF evaluations).
Without proof.

Algorithm 2 (New Symbiotic Collision Attack)

A New Symbiotic Attack

Our revised version of Watanabe’s attack clearly shows that an attacker potentially has a lot of freedom. Below we transform some of this freedom into a faster attack. More to the point, as in the revised Watanabe attack we still look for a collision with a differential and fix the codeword, but we do not fix the multiplier up front. Instead we determine it based on the outcomes of the queries we make. To increase our success probability, we restrict to the same kind of queries as Özen, Shrimpton, and Stam did.

Theorem 2 (New symbiotic attack)
Let the parameters be given as above. Then Algorithm 2 finds collisions in the stated time (using as many PuRF evaluations) and memory (expressed in 𝑛-bit blocks).

Proof (Sketch). We will leave showing the correctness of Algorithm 2 to the full version of this paper [8] and only prove here that a collision is expected and that the query and time complexities are as claimed.

By construction, the attack has the stated query complexity (per PuRF), since all queries are made during the Query Phase. Using a naive approach, the Local Collision Detection step can be performed in roughly as many comparisons, resulting in partial collision lists of the expected cardinality.

For Global Collision Detection, we just enumerate one partial collision list and check for membership against the others. Assuming constant-time memory access, the time complexity of this step is bounded by the list sizes, so the Query Phase is dominant in the overall time complexity.

Since we have 𝑑 active PuRFs in total, the probability of finding a common element among 𝑑 such lists follows directly. To ensure an expected number of collisions of one, we need the second exponent to be at least zero, and indeed, solving for zero gives the desired bound.

A Parametrized Collision-Finding Attack.

Optimizing the Query Complexity.

Algorithm 3 (Parameterized Collision Attack).

The symbiotic attack and the information-theoretic attack by Özen, Shrimpton, and Stam have completely different query complexities, and which one is best seems very parameter-dependent. However, it turns out that both attacks are the extreme cases of a more general parametrized attack, as given by Algorithm 3.

Theorem 3
Let be given. Consider (with ). Then collisions for can be found with Alg. 3 using queries (per PuRF) where


That the attack has the stated query complexity follows readily from the usual yield observation, exactly matching the theorem statement. What remains to show is that collisions are indeed output and expected with good probability.

For correctness, let a candidate output by the algorithm be given. First, notice that Lemma 3 implies that the relevant projection lies in the image. Now, either of the steps Degrees of Freedom, Filtering or Skip ensures membership in the code. Finally, it follows by construction (Local Collision Detection) that the active PuRFs collide.

Moreover, Collision Pruning guarantees that the candidate differences are consistent, and Degrees of Freedom ensures that the projections of the two inputs onto the remaining positions are equal. Hence, the two inputs collide on all positions.

Let us move on to the number of collisions expected to be output. The expected number of local collisions found per active PuRF, together with the number of active PuRFs, determines the total number of potential collisions. For a true collision to occur, we need to find a tuple of local collisions whose differences can be completed to codewords; the difference of two such codewords is then a codeword as well. The restriction on the parameters ensures nontriviality of the shortened code (shortening any further, the shortened code would consist of the zero codeword only, resulting in no collision). In case of MDS codes, the shortened code is again MDS with correspondingly reduced length and dimension. (For non-MDS codes it is possible that a higher dimension is achieved.)

As a result, a fraction of the differentials will be satisfactory, leading to an expected number of . If or equivalently we are done whenever . Since can be rewritten to , we are in the second case, with . Writing and substitution lead to or as desired.

If on the other hand, , further filtering is needed. In particular, given a potential ‘half’ of a collision we need to check if it can correspond to a codeword. Since , we can uniquely complete to a codeword given of its elements. The remaining coordinates need to be in sync. Per remaining element, this occurs with probability , leading to . Now we are in the first case since . Writing , we obtain . Since we aim for , as desired.

Corollary 1
Assuming , substitution of in Theorem 3 gives . This is optimal (for Algorithm 3) whenever .

That the substitution does what it says can be readily verified, so we restrict ourselves to proving the optimality here. Consider the two real-valued functions in question, defined over closed intervals. Note that both are continuous in their respective domains (since their respective poles fall outside the domains), so both attain their maximum and minimum on these closed intervals. Since one function is decreasing and the other is increasing, they both attain their minimum at their shared boundary.

Remark 1. The only two parameter sets proposed by Knudsen and Preneel not satisfying the conditions of the corollary above are the two mentioned earlier. In both cases only one of the two functions is applicable. For the first parameter set one can check where the minimum is attained. For the second the applicable function is in fact constant, and both endpoints lead to the same query complexity.

Remark 2. Substitution of one extreme value in Theorem 3 gives a query complexity coinciding with that reported by Özen, Shrimpton, and Stam. On the other extreme, for MDS codes the complexity simplifies to that of our symbiotic attack. For non-MDS codes there seems to be a slight mismatch. The reason is that if a non-MDS code is maximally shortened, the shortened code has dimension 1, whereas in the derivation of Theorem 3 we pessimistically assumed a lower dimension (at least for the KP non-MDS codes in question). Correcting for this looseness would result in a match with the symbiotic attack.

Generic Collision Attack against MDS Constructions

Algorithm 4 (Collision Attack against MDS-based schemes).

If we want to run Algorithm 3 (with the parameters fixed as obtained in Corollary 1), we ideally want a time complexity almost coinciding with the targeted query complexity. In this regime the steps Filtering and Degrees of Freedom are not needed. We have already seen that Local Collision Detection costs at most a small logarithmic factor, which leaves only the Merge Phase and Collision Pruning to worry about. Together, these two steps are designed to produce the list of full collisions. A naive approach would enumerate all elements in a much larger set, which is wasteful. Our task is therefore, given the lists of partial collisions, to create this list more efficiently.

In the sequel, we will follow in the footsteps of Özen, Shrimpton, and Stam, who used the dual code in a similar problem related to their preimage-finding attack. An important innovation for the collision-finding attack stems from the realization that the relevant differences can be regarded as belonging to the (quasi-)shortened code. This allows the use of the dual of the shortened code to speed up the search. As the minimum distance of the dual code is an important parameter in determining the overall time complexity, and shortening a code reduces the minimum distance of its dual accordingly, we make a significant efficiency gain this way.

Road map. We present our collision attack against Knudsen-Preneel compression functions based on MDS codes in Alg. 4, whereas its analysis is given in Thm. 4. We leave the generalization of our attack to the (KP-suggested) non-MDS parameters, together with the proof of Thm. 4, to the full version of this work, where we also investigate a more space-efficient version of Alg. 4.

Reducing the Time Complexity. Since and , we know from Algorithm 3 that it suffices to find a nonzero of the form for to complete the collision. Now notice that lies in a smaller space , identified by , that is, the shortened code obtained from (by dropping the zeroes of the codewords from all the positions appearing in ). This observation allows us to guarantee that once we determine that a candidate is in . Hence, for our purposes it is enough to limit ourselves to , rather than looking for membership in the larger space.

To this end, we first identify an index set (the role of will be explained momentarily) defining a subspace for which , when restricted to this subspace, is not surjective. As a consequence, we will be able to prune significantly the total collection of candidates that are admissible (restricted to ). In the sequel, we show how to efficiently find such an index set and how to prune efficiently.
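As a simple illustration of the pruning step, membership in a linear code can be tested against its dual: a vector belongs to the code exactly when its inner product with every dual codeword vanishes. The sketch below is ours, not the paper's; it uses a toy prime field GF(p) and illustrative names.

```python
# Sketch (not from the paper): the shortened code is assumed to be given by a
# list of dual codewords (parity checks) over a prime field GF(p).
def in_code(candidate, dual_codewords, p):
    """A vector lies in a linear code iff its inner product with every
    codeword of the dual code vanishes modulo p."""
    return all(
        sum(a * b for a, b in zip(d, candidate)) % p == 0
        for d in dual_codewords
    )

def prune(candidates, dual_codewords, p):
    """Keep only the candidates admissible for the (shortened) code."""
    return [v for v in candidates if in_code(v, dual_codewords, p)]
```

For example, over GF(3) the repetition code of length 2 has dual generated by (1, 2), and `prune` discards every vector whose two coordinates differ.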

An important parameter determining the runtime of our collision attack is , the minimum distance of the dual of the shortened code. Let be the function that maps to the set of indices of its non-zero entries. Thus, and equals the Hamming weight of the codeword .
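The support map and Hamming weight just described can be made concrete in a few lines; this is a generic sketch with function names of our choosing, not notation from the paper:

```python
def support(codeword):
    """Return the set of indices of the nonzero entries of a codeword."""
    return {i for i, c in enumerate(codeword) if c != 0}

def hamming_weight(codeword):
    """The Hamming weight of a codeword equals the size of its support."""
    return len(support(codeword))
```

For instance, the vector (1, 0, 2, 0, 3) has support {0, 2, 4} and Hamming weight 3.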

An easy adaptation of Proposition 1 shows that, given a codeword and an element , the latter can only be in if , where the only parts of relevant for this check are those lining up with the nonzero entries of . Indeed, an element can be completed to an element in the range of the derived mapping if . Efficient creation of can be done by adapting standard techniques [13], [14], [15]: split the codeword in two and look for all collisions in the respective entries. That is, assume that with , and assume, for

Then consists of those elements for which and .
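The split-and-collide step described above is the classic meet-in-the-middle join of [13], [14], [15]: index the left halves by their partial inner product with the left half of the dual codeword, then scan the right halves for the matching value that makes the full inner product vanish. A toy sketch over GF(p), with illustrative names of our own:

```python
from collections import defaultdict

# Sketch (not from the paper): find all concatenations orthogonal to a dual
# codeword split as (d_left | d_right), working over a prime field GF(p).
def merge_on_dual_codeword(left, right, d_left, d_right, p):
    """Return all pairs (x, y), x from `left` and y from `right`, whose
    concatenation (x | y) has zero inner product with (d_left | d_right)."""
    buckets = defaultdict(list)
    for x in left:  # index the left halves by their partial inner product
        s = sum(a * b for a, b in zip(d_left, x)) % p
        buckets[s].append(x)
    out = []
    for y in right:  # a match means the full inner product vanishes mod p
        t = (-sum(a * b for a, b in zip(d_right, y))) % p
        out.extend((x, y) for x in buckets[t])
    return out
```

Sorting or hashing the left list makes the join cost roughly the sum of the list sizes plus the number of matches, rather than their product, which is exactly the saving the standard techniques provide.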

Theorem 4
Let the code be given and let be the shortened code derived from for . Let be the minimum distance of the dual code of . Suppose is MDS and consider the collision attack described in Alg. 4 run against using queries for . Then the expected number of collision outputs equals one, and the expected internal list sizes are:

, , , . The average-case time complexity of the algorithm is max , with a memory requirement of (expressed in 𝑐𝑛-bit blocks).

The proof is omitted; it appears in the full version of this work.


In this paper we provide an extensive security analysis of the Knudsen-Preneel compression functions, focusing on their collision resistance. We present three improved collision attacks, namely the revised Watanabe attack, the symbiotic collision attack, and the parametrized collision attack. Our new attacks work with the fewest queries reported so far. Moreover, for all but one of the 16 suggested parameter sets, these attacks beat the time complexity of any prior attack we are aware of.

Acknowledgments. We thank Joachim Rosenthal and Thomas Shrimpton for their useful comments and suggestions.


  1. LACAL, EPFL Station 14, CH-1015 Lausanne, Switzerland,
  2. LACAL, EPFL Station 14, CH-1015 Lausanne, Switzerland,


  1. Yuval, G.: How to swindle Rabin. Cryptologia 3, 187–189 (1979)
  2. Black, J., Rogaway, P., Shrimpton, T.: Black-box analysis of the block-cipher-based hash-function constructions from PGV. In: Yung, M. (ed.) Advances in Cryptology—Crypto’02. LNCS, vol. 2442, pp. 320–335. Springer, Heidelberg (2002)
  3. Preneel, B., Govaerts, R., Vandewalle, J.: Hash functions based on block ciphers: A synthetic approach. In: Stinson, D. (ed.) Advances in Cryptology—Crypto’93. LNCS, vol. 773, pp. 368–378. Springer, Heidelberg (1993)
  4. Stam, M.: Blockcipher-based hashing revisited. In: Dunkelman, O. (ed.) FSE’09. LNCS, vol. 5665, pp. 67–83. Springer, Heidelberg (2009)
  5. Knudsen, L., Muller, F.: Some attacks against a double length hash proposal. In: Roy, B.K. (ed.) Advances in Cryptology—Asiacrypt’05. LNCS, vol. 3788, pp. 462–473. Springer, Heidelberg (2005)
  6. O ̈zen, O., Stam, M.: Another glance at double-length hashing. In: Parker, M.G. (ed.) CCC’09. LNCS, vol. 5921, pp. 176–201. Springer, Heidelberg (2009)
  7. Knudsen, L.R., Preneel, B.: Hash functions based on block ciphers and quaternary codes. In: Kim, K., Matsumoto, T. (eds.) Advances in Cryptology—Asiacrypt’96. LNCS, vol. 1163, pp. 77–90. Springer, Heidelberg (1996)
  8. Knudsen, L.R., Preneel, B.: Fast and secure hashing based on codes. In: Kaliski Jr., B.S. (ed.) Advances in Cryptology—Crypto’97. LNCS, vol. 1294, pp. 485–498. Springer, Heidelberg (1997)
  9. Knudsen, L.R., Preneel, B.: Construction of secure and fast hash functions using nonbinary error-correcting codes. IEEE Transactions on Information Theory 48(9), 2524–2539 (2002)
  10. Watanabe, D.: A note on the security proof of Knudsen-Preneel construction of a hash function. Unpublished manuscript (2006)
  11. Özen, O., Stam, M.: Collision attacks against Knudsen-Preneel compression functions. Full version of this paper, manuscript (2010)
  12. Özen, O., Shrimpton, T., Stam, M.: Attacking the Knudsen-Preneel compression functions. In: Hong, S., Iwata, T. (eds.) FSE’10. LNCS, vol. 6147, pp. 94–115. Springer, Heidelberg (2010)
  13. Chose, P., Joux, A., Mitton, M.: Fast correlation attacks: An algorithmic point of view. In: Knudsen, L. (ed.) Advances in Cryptology—Eurocrypt’02. LNCS, vol. 2332, pp. 209–221. Springer, Heidelberg (2002)
  14. Schroeppel, R., Shamir, A.: A 𝑇 = 𝑂(2𝑛/2), 𝑆 = 𝑂(2𝑛/4) algorithm for certain NP-complete problems. SIAM Journal on Computing 10, 456–464 (1981)
  15. Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) Advances in Cryptology—Crypto’02. LNCS, vol. 2442, pp. 288–303. Springer, Heidelberg (2002)