Abstract
Modern digital communication relies heavily on cryptographic protection to ensure data integrity and privacy. In order to deploy stateofthe art cryptographic primitives and protocols in realworld scenarios, one needs to highly optimize software for both speed and security. This requires careful choices of highlevel cryptographic parameters, lowlevel optimization of software on the assembly level for a given microarchitecture and the subtle interactions between highlevel and lowlevel optimizations. This thesis considers three examples of cryptographic primitives and describes software implementations of these primitives that set new speed records. The Advanced Encryption Standard (AES) is one of the most widely used symmetric cryptographic primitives. The traditional implementation approach for AES is based on table lookups. While software based on this approach still achieves best performance for a variety of 32bit and 64bit architectures, it is usually vulnerable to cachetiming attacks. Another implementation approach for AES is the bitslicing technique. Not only is software based on this approach inherently protected against cachetiming attacks, on some microarchitectures it even achieves better performance. Ellipticcurve cryptography is the current state of the art of asymmetric cryptography. For ellipticcurve Di??eHellman key exchange, Bernstein proposed the Curve25519 function. Several speedrecordsetting implementations of this function have been developed for a variety of architectures. Optimizing Curve25519 software for the Synergistic Processor Units of the Cell Broadband Engine is a particularly interesting challenge because the small integer multipliers of this architecture do not seem to make it the bestsuited platform for publickey cryptography. Another use of elliptic curves in cryptography is in the construction of cryptographic pairings. In order to make pairings fast and secure, very special elliptic curvessocalled pairingfriendly curvesare required. For cryptographic pairings on the 128bit security level BarretoNaehrig curves of size about 256 bits are the best choice. Optimizing pairing software is more complex than optimizing, e.g., ellipticcurve Di??eHellman keyexchange software. The reason is that pairings involve multiple computation steps and multiple mathematical structures. A choice of parameters considering only some of these steps or structures is likely to incur high performance penalties in the other steps or structures. Evaluating the security of cryptographic primitives requires cryptanalytic effort. In many ways optimizing cryptanalytic algorithms is similar to optimizing cryptographic primitives: Careful choices of highlevel algorithmic parameters such as a Pollard rho iteration function need to be combined with lowlevel software optimization. A major di??erence when optimizing cryptanalytic algorithms is the high degree of parallelism that requires additional understanding in optimizing parallel algorithms and in network protocols. This thesis considers two cryptanalytical applications with very di??erent performance bottlenecks and optimization requirements. Pollard's rho algorithm is the best known algorithm to solve the ellipticcurve discretelogarithm problem for most primeorder elliptic curve groups. Large instances of this problem, such as Certicom's challenge ECC2K130 considered in this thesis, are usually solved using a parallel version of Pollard's rho algorithm, which uses a clientserver approach. The e??ciency of this approach is mainly determined by the speed of the iteration function, which runs on all clients independently in parallel. A cryptanalytical algorithm with signi??cantly more complex parallelization requirements isWagner's tree algorithm. This algorithm involves a huge amount of data for cryptographically relevant inputs; each byte of data needs to be loaded and stored several times. In a parallel environment with distributed storage data cannot be kept local: each byte also needs to be sent various times over the network. The implementation of Wagner's tree algorithm to find a collision in the toy version FSB48 of the SHA3 round1 candidate FSB on a cluster of 8 computers with a total of 5.5 TB of distributed storage demonstrates techniques to apply this algorithm in storagerestricted environments.
Original language  English 

Qualification  Doctor of Philosophy 
Awarding Institution 

Supervisors/Advisors 

Award date  24 Jan 2011 
Place of Publication  Eindhoven 
Publisher  
Print ISBNs  9789038624150 
DOIs  
Publication status  Published  2011 
Fingerprint Dive into the research topics of 'Highspeed cryptography and cryptanalysis'. Together they form a unique fingerprint.
Cite this
Schwabe, P. (2011). Highspeed cryptography and cryptanalysis. Eindhoven: Technische Universiteit Eindhoven. https://doi.org/10.6100/IR693478