Performance Aware Decoding Algorithms for H.264 Codec on a Multi-Core Platform

Date of Submission

December 2011

Date of Award

Winter 12-12-2012

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Advanced Computing and Microelectronics Unit (ACMU-Kolkata)


Bhattacharya, Bhargab Bikram (ACMU-Kolkata; ISI)

Abstract (Summary of the Work)

The latest video compression standard, H.264 (also known as MPEG-4 Part 10/AVC, for Advanced Video Coding) [3], is expected to become the video standard of choice in the coming years. H.264 is an open, licensed standard that supports the most efficient video compression techniques available today. Without compromising image quality, an H.264 encoder can reduce the size of a digital video file by more than 80% compared to the Motion JPEG format and by as much as 50% compared to the MPEG-4 Part 2 standard. This means that much less network bandwidth and storage space are required for a video file; seen another way, much higher video quality can be achieved for a given bit rate.

1.1 Motivation

The H.264 decoder and encoder both have a sequential, data-dependent flow, as shown in Figure 1.1. This property makes it difficult to leverage the potential performance gain offered by emerging many-core processors. Multimedia applications will remain important workloads in the future and demand high-speed video encoding and decoding, whose performance should scale with the number of processor cores. A central question is whether the H.264 decoder can scale to such a large number of cores. One possible way of increasing efficiency is to identify data parallelism within one or more blocks of the encoder/decoder flow: independent data can then be assigned to the multiple cores available in the processor to achieve a speed-up in execution. Dedicated silicon implementations can achieve this, but hardware implementation is costly and a different implementation is needed for each new video compression standard. That is why we need a parallel software implementation of the H.264 codec that performs as efficiently as a hardware implementation and can run on different hardware platforms.
If the hardware platform changes, a software implementation requires far less modification than a dedicated silicon implementation.

1.2 Scope of the Thesis

In this project, we consider an H.264 decoder and explore the possibilities for parallelism in the Intra-prediction block. There are two reasons for taking up the decoder (and not the encoder) in this parallelization effort. Firstly, the encoding problem is natively parallel and hence lends itself more naturally to a parallelized execution environment; there are already numerous successful attempts in this direction. The decoding algorithm, however, poses certain challenges to parallelization. Secondly, there is a decoding step inside the encoder as well, so any success in parallelizing the decoder would naturally speed up the encoder too.

The slice is the basic independent spatial element in H.264; this prevents an error in one slice from affecting other slices. Each block in the decoder receives a complete slice as input, and each slice is made up of multiple macroblocks (MBs). If we can find data independence between the constituents of a slice, i.e., between MBs, we can considerably speed up the processing of each block in the decoder. In this thesis we perform this analysis on the Intra-prediction block.

1.3 Organization of the Thesis

In Chapter 2 we describe the Intra-prediction process used in H.264 video decoding and how it creates data dependencies between the macroblocks of a video slice. In Chapter 3 we give a brief description of multi-core architectures and how the H.264 decoder can fit into such an architecture. In Chapter 4 we briefly survey related work in this area. In Chapter 5 we describe our objective and the macroblock scheduling problem in detail and derive a problem formulation. In Chapter 6 we describe the various macroblock scheduling strategies we propose.
In Chapter 7 we present performance plots of all the proposed strategies against various parameters. Finally, in Chapter 8 we analyse the performance of all the scheduling strategies and describe the scenario in which each of them is best used. We also describe the scope for future work arising from this thesis.
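To illustrate the macroblock data dependence the abstract refers to: in H.264 intra prediction an MB may use reconstructed pixels from its left, top-left, top, and top-right neighbours, so MBs on the same anti-diagonal of a slice are mutually independent and can be decoded in parallel (the well-known "wavefront" pattern). The following Python sketch is illustrative only: the slice size and the assumption of one uniform time step per MB are hypothetical, not taken from the thesis.

```python
# Hypothetical sketch: earliest decode step of each macroblock in a slice,
# given the standard intra-prediction dependencies (left, top-left, top,
# top-right), assuming one time step per MB and unlimited cores.

def wavefront_steps(rows, cols):
    """Return a grid where step[r][c] is the earliest step at which
    macroblock (r, c) can be decoded."""
    step = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(step[r][c - 1])          # left neighbour
            if r > 0:
                deps.append(step[r - 1][c])          # top neighbour
                if c > 0:
                    deps.append(step[r - 1][c - 1])  # top-left neighbour
                if c < cols - 1:
                    deps.append(step[r - 1][c + 1])  # top-right neighbour
            # An MB starts one step after its latest-finishing dependency.
            step[r][c] = 1 + max(deps, default=0)
    return step

# For a 4x6 slice the anti-diagonal wavefront emerges: MB (r, c) is ready
# at step 2r + c + 1, so all MBs with equal 2r + c run concurrently.
for row in wavefront_steps(4, 6):
    print(row)
```

Under these assumptions the schedule length grows as 2·rows + cols rather than rows·cols, which is the basic parallelism that MB scheduling strategies (Chapters 5 and 6) try to exploit and improve upon.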




Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

