Improved SIMD Implementation of Poly1305.

Date of Submission

December 2018

Date of Award

Winter 12-12-2019

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Applied Statistics Unit (ASU-Kolkata)

Supervisor

Sarkar, Palash (ASU-Kolkata; ISI)

Abstract (Summary of the Work)

A message authentication code (MAC) is a cryptographic primitive used for checking message integrity, and the Wegman-Carter construction is an important way of building one. Poly1305, the polynomial-based hash function proposed by Daniel J. Bernstein, is widely used and can instantiate the hash function required by the Wegman-Carter construction. The vectorization of Poly1305 by Shay Gueron and Martin Goll improved on the previously known implementations. Goll and Gueron used Intel intrinsics for their vectorizations: AVX2 for the 256-bit vectorization and AVX512 for the 512-bit vectorization. Their algorithm nevertheless leaves scope for improvement in both cases. In the 256-bit vectorization, 4-decimation Horner is applied incompletely, so the alignment of the input is disturbed repeatedly for messages whose number of 16-byte blocks is not a multiple of 4. In the 512-bit vectorization, the same happens with 8-decimation Horner for messages whose number of 16-byte blocks is not a multiple of 8.
We apply 4-decimation Horner and 8-decimation Horner throughout the length of the input message for the 256-bit and 512-bit vectorizations respectively, irrespective of the message length, and obtain better results for both vectorizations in the cases described above. Detailed result analysis is available for the 256-bit vectorization; it is unavailable for the 512-bit vectorization due to time constraints. In this report we show how to balance a message whose number of 16-byte blocks is not divisible by 4 so that 4-decimation Horner can be applied throughout its length. The same modifications are made for the application of 8-decimation Horner. We also provide a modified SIMD multiplication algorithm for handling messages whose number of 16-byte blocks leaves a remainder of 1 when divided by 4. Finally, we give detailed result analysis for Skylake and Kaby Lake cores using suitable graphs and tables.
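As an illustration of the idea in the abstract, the following is a minimal scalar sketch of 4-decimation Horner evaluation modulo the Poly1305 prime 2^130 - 5. It is not the dissertation's code: the function names are hypothetical, Python integers stand in for the four SIMD lanes, and the balancing step shown here (prepending zero coefficients until the block count is a multiple of 4) is an assumption chosen because leading zero coefficients do not change the polynomial's value. The dissertation's actual balancing method and SIMD multiplication are not reproduced.

```python
P = (1 << 130) - 5  # the Poly1305 prime modulus


def horner(coeffs, r):
    """Plain Horner: c1*r^l + c2*r^(l-1) + ... + cl*r (mod P)."""
    acc = 0
    for c in coeffs:
        acc = (acc + c) * r % P
    return acc


def horner_4dec(coeffs, r):
    """4-decimation Horner over four interleaved lanes (scalar sketch).

    Assumption: the input is balanced by prepending zero coefficients,
    so every group of four is complete and no tail case is needed.
    """
    pad = (-len(coeffs)) % 4
    coeffs = [0] * pad + list(coeffs)

    r2 = r * r % P
    r3 = r2 * r % P
    r4 = r2 * r2 % P

    # Each lane runs its own Horner recurrence in r^4.
    lanes = [0, 0, 0, 0]
    for i in range(0, len(coeffs), 4):
        lanes = [(lanes[j] * r4 + coeffs[i + j]) % P for j in range(4)]

    # Combine the lanes with r^4, r^3, r^2 and r respectively.
    return (lanes[0] * r4 + lanes[1] * r3 + lanes[2] * r2 + lanes[3] * r) % P
```

For any list of block values and any r, horner_4dec(coeffs, r) should agree with horner(coeffs, r), including when the number of blocks is not a multiple of 4. The tag computation of Poly1305 (adding the key-derived value and reducing modulo 2^128) is omitted here.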

Comments

ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843768

Control Number

ISI-DISS-2018-398

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

DOI

http://dspace.isical.ac.in:8080/jspui/handle/10263/6964

This document is currently not available here.
