It has been shown theoretically that coding in networked systems, using techniques such as Reed-Solomon codes, fountain codes, and random network coding, offers a clear advantage in simplifying protocol design. These coding techniques can be deployed on a wide range of networked nodes, from multi-gigabit-per-second servers to smartphones. However, large-scale real-world deployments of coded systems are still rare, mainly because of the computational complexity of the coding algorithms. This is a particular concern at both extremes: on high-bandwidth servers, where coding may not be fast enough to saturate the uplink bandwidth, and on smartphones, where hardware limitations prevail.
This work represents the first attempt towards a high-performance implementation of network coding. In this thesis, we provide a comprehensive toolkit that makes coding practical across a wide range of networked nodes, from servers to smartphones. We strive to push the performance of our cross-platform coding toolkit to the limits allowed by off-the-shelf hardware. To demonstrate its practicality in real-world network applications, we have used the toolkit to build coded on-demand media streaming systems that span a GPU-based server, thousands of emulated nodes, and iPhone devices with actual playback.
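To make the computational cost mentioned above concrete, here is a minimal sketch of random linear network coding over GF(2^8). It is not the toolkit described in this thesis: the function names, the choice of the reduction polynomial 0x11B, and the bit-by-bit multiplication are all assumptions made for illustration. Each coded byte costs one field multiplication per source block, which is the work a high-performance implementation would need to accelerate.

```python
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse of a nonzero byte, computed as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def random_coeffs(n, rng=random):
    """Draw one random coefficient vector (hypothetical helper)."""
    return [rng.randrange(256) for _ in range(n)]

def encode(blocks, coeffs):
    """Return one coded block: the GF(2^8) linear combination of the source blocks."""
    coded = bytearray(len(blocks[0]))
    for c, block in zip(coeffs, blocks):
        for j, byte in enumerate(block):
            coded[j] ^= gf_mul(c, byte)
    return bytes(coded)

def decode(coeff_rows, coded_blocks):
    """Recover the source blocks by Gauss-Jordan elimination on [coeffs | coded]."""
    n = len(coeff_rows[0])
    rows = [list(c) + list(b) for c, b in zip(coeff_rows, coded_blocks)]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("not enough independent coded blocks")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [v ^ gf_mul(factor, p) for v, p in zip(rows[r], rows[col])]
    return [bytes(row[n:]) for row in rows[:n]]
```

A receiver that collects coded blocks together with their coefficient vectors can call `decode` once the vectors span all source blocks. With randomly drawn coefficients, a few extra coded blocks may occasionally be needed, which is why `decode` accepts more rows than source blocks.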
Abstract
One key technique for improving the coding efficiency of H.264, the state-of-the-art
video compression standard, is the entropy coding technique known as context-adaptive
binary arithmetic coding (CABAC). However, the complexity of the CABAC encoding
process is significantly higher than that of traditional table-driven entropy
encoding schemes such as Huffman coding. CABAC is also bit-serial, and its multi-bit
parallelization is extremely difficult. For a high-definition video encoder with a
20 Mbps output stream, multi-gigahertz RISC (reduced instruction set computer)
processors would be needed to implement the CABAC encoder.
In this work, we investigate and develop an efficient, pipelined VLSI architecture
for CABAC encoding. The resulting architecture efficiently decouples and pipelines
the critical stages to address the bottlenecks of renormalization, outstanding bits, and
regular/bypass coding modes. The final design achieves a throughput of one binary
symbol per clock cycle. An FPGA (field-programmable gate array) implementation
of the proposed scheme achieves an encoding rate of 97 Mbps. An ASIC (application-specific
integrated circuit) synthesis and simulation for a 0.18 µm process technology
indicates that the design can encode 190 million binary symbols per second
in an area of 0.209 mm². The proposed design is thoroughly tested on several standard
test sequences through both software and hardware simulations, with test vectors
of up to 300 frames of the Foreman sequence. In addition, several designs for CABAC’s binarization
block and its interface are explored, each with a different level of hardware support.
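The terms renormalization and outstanding bits can be illustrated with a generic binary arithmetic coder. The sketch below is not the H.264 CABAC engine and not the architecture proposed in this thesis. It assumes a fixed probability for the symbol 0 (given as the hypothetical parameters `p0_num` and `p0_den`) instead of adaptive context models, and it uses a division where CABAC uses a table lookup. What it shares with CABAC is the data-dependent renormalization loop and the counter of bits whose value is not yet known, which are the steps that make the encoder hard to pipeline.

```python
PRECISION = 16
TOP = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)
THREE_Q = 3 * QUARTER

def encode_bits(bits, p0_num, p0_den):
    """Encode a list of 0/1 symbols, assuming P(0) = p0_num / p0_den."""
    low, high, pending, out = 0, TOP, 0, []

    def emit(bit):
        # Write a resolved bit, then the outstanding bits as its complement.
        nonlocal pending
        out.append(bit)
        out.extend([1 - bit] * pending)
        pending = 0

    for b in bits:
        span = high - low + 1
        split = low + (span * p0_num) // p0_den - 1  # top of the '0' subinterval
        if b == 0:
            high = split
        else:
            low = split + 1
        while True:  # renormalization: rescale until the interval is wide again
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low -= HALF
                high -= HALF
            elif low >= QUARTER and high < THREE_Q:
                pending += 1  # interval straddles the midpoint: bit still unknown
                low -= QUARTER
                high -= QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
    pending += 1  # flush enough bits to pin down the final interval
    emit(0 if low < QUARTER else 1)
    return out

def decode_bits(code, count, p0_num, p0_den):
    """Decode `count` symbols from the bit list produced by encode_bits."""
    low, high, value, pos, out = 0, TOP, 0, 0, []

    def next_bit():
        nonlocal pos
        bit = code[pos] if pos < len(code) else 0
        pos += 1
        return bit

    for _ in range(PRECISION):
        value = 2 * value + next_bit()
    for _ in range(count):
        span = high - low + 1
        split = low + (span * p0_num) // p0_den - 1
        if value <= split:
            out.append(0)
            high = split
        else:
            out.append(1)
            low = split + 1
        while True:  # mirror the encoder's rescaling steps
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < THREE_Q:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high, value = 2 * low, 2 * high + 1, 2 * value + next_bit()
    return out
```

In this sketch the number of renormalization iterations per symbol varies with the data, and a run of outstanding bits can only be written once a later symbol resolves it. A hardware design aiming at one symbol per cycle has to handle both without stalling.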
Online copy of master's thesis
msthesis.pdf (~36MB)