Improved SIMD Implementation of Poly1305.
Date of Submission
December 2018
Date of Award
Winter 12-12-2019
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Applied Statistics Unit (ASU-Kolkata)
Supervisor
Sarkar, Palash (ASU-Kolkata; ISI)
Abstract (Summary of the Work)
Message Authentication Code is an important cryptographic concept which is used for checking message integrity. The Wegman-Carter construction is important in this field. The polynomial-based hash function, Poly1305 as proposed by Daniel J.Bernstein, is a widely used construction. It can be used to instantiate the hash function as required in Wegman-Carter construction. The vectorization of Poly1305 by Shay Gueron and Martin Goll has shown improvement over the known pre-existing implementations.The algorithm developed by Shay Gueron and Martin Goll has left some scope of improvement both for 256-bit and 512-bit vectorizations. In 256-bit vectorization, improvement has been achieved for messages each of whose number of 16-byte blocks is not a multiple of 4. In 512-bit vectorization, improvement has been achieved for messages each of whose number of 16-byte blocks is not a multiple of 8. For the said cases the alignment of the input is disturbed repeatedly because 4-decimation Horner and 8-decimation Horner for 256-bit and 512-bit vectorizations respectively have been applied incompletely. Goll and Gueron have used Intel Intrinsics for 256-bit and 512-bit vectorizations of Poly1305. For 256- bit vectorization AVX2 has been used. For 512-bit vectorization AVX512 has been used. We have used 4-decimation Horner and 8-decimation Horner throughout the length of input message for 256-bit and 512-bit vectorizations respectively irrespective of the message length. We have obtained better results both for 256-bit and 512-bit vectorizations. Detailed result analysis is available for 256-bit vectorization. The detailed result analysis of 512-bit vectorization is unavailable due to time constraints.In this report we have shown how to balance a message whose number of 16-byte blocks is not divisible by 4 so that it becomes suitable for application of 4-decimation Horner throughout its length. Same modifications have been done for application of 8-decimation Horner. We also provide a modified SIMD multiplication algorithm for handling messages where in each case, the number of 16-byte blocks when divided by 4 leaves 1 as remainder. Then we give detailed result analysis for Skylake and Kaby Lake cores using suitable graphs and tables.
Control Number
ISI-DISS-2018-398
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6964
Recommended Citation
Bhattacharyya, Sreyosi, "Improved SIMD Implementation of Poly1305." (2019). Master’s Dissertations. 395.
https://digitalcommons.isical.ac.in/masters-dissertations/395
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843768