Hassan Shojania


PhD Thesis - "Making Coding Practical: From Servers to Smartphones"

Abstract

The fundamental insight of use of coding in computer networks is that information to be transmitted from the source in a session can be inferred, or decoded, by the intended receivers, and does not have to be transmitted verbatim. Several coding techniques have gained popularity over the recent years. Among them is random network coding with random linear codes, in which a node in a network topology transmits a linear combination of incoming, or source, packets to its outgoing links. Theoretically, the high computational complexity of random linear codes (RLC) is well known, and is used to motivate the application of more efficient codes, such as the traditional Reed-Solomon (RS) codes and, more recently, fountain codes (LT codes). Factors like computational complexity, network overhead, and deployment flexibility can make one coding schemes more attractive for one application than the others. While there is no one-fit-all coding solution, random linear coding is very flexible, well known to be able to achieve optimal flow rates in multicast sessions, and universally adopted in all proposed protocols using network coding. However, its practicality has been questioned, due to its high computational complexity. Unfortunately, to date, there has been no commercial real-world system reported in the literature that take advantage of the power of network coding.

This research represents the first attempt towards a high-performance design and implementation of network coding. The objective of this work is to explore the computational limits of network coding in off-the-shelf modern processors, and to provide a solid reference implementation to facilitate commercial deployment of network coding. We promote the development of new coding-based systems and protocols through a comprehensive toolkit with coding implementations that are not just reference implementations. Instead, they have attained high-performance and flexibility to find widespread adoption.

The final work, packaged as a toolkit code-named Tenor, includes high-performance implementations of a number of coding techniques: random linear network coding (RLC), fountain codes (LT codes), and Reed-Solomon (RS) codes in CPUs (single and multi core(s) for both Intel x86 and IBM POWER families), GPUs (single and multiple), and mobile/embedded devices based on ARMv6 and ARMv7 cores. Tenor is cross-platform with support on Linux, Windows, Mac OS X, and iPhone OS, and supports both 32-bit and 64-bit platforms. The toolkit includes some 23K lines of C++ code.

In order to validate the effectiveness of the Tenor toolkit, we build coding-based on-demand media streaming systems with GPU-based servers, thousands of clients emulated on a cluster of computers, and a small number of actual iPhone devices. To facilitate deployment of such large experiments, we develop Blizzard, a high-performance framework, with the main goals of: 1) emulating hundreds of client/peer applications on each physical node; 2) facilitating scalable servers that can efficiently communicate with thousands of clients. Our experiences offer an illustration of Tenor components in action, and their benefits in rapid system development. With Tenor, it is trivial to switch from one coding technique to another, scale up to thousands of clients, and deliver actual video to be played back even on mobile devices.

Online copy of PhD thesis

PhD_thesis.pdf (~17MB)



Master's Thesis - "VLSI Design and Implementation of a High Performance H.264 CABAC Encoder"

Abstract

One key technique for improving the coding efficiency of H.264, the state-of-the art video compression standard, is the entropy coding technique known as context adaptive binary arithmetic coder (CABAC). However, the complexity of the encoding process of CABAC is significantly higher than the traditional table driven entropy encoding schemes such as Huffman coding. CABAC is also bit serial and its multibit parallelization is extremely difficult. For a high definition video encoder with a 20 Mbps output stream, multi-giga hertz RISC (reduced instruction set computer) processors will be needed to implement the CABAC encoder.

In this work, we investigate and develop an efficient, pipelined VLSI architecture for CABAC encoding. The resulting architecture efficiently decouples and pipelines the critical stages to address the bottlenecks of renormalization, outstanding bits, and regular/bypass coding modes. The final solution is a single cycle throughput for encoding a binary symbol. An FPGA (field-programmable gate array) implementation of the proposed scheme is capable of 97 Mbps encoding rate. An ASIC (application specific integrated circuit) synthesis and simulation for a 0.18 µm process technology indicates that the design is capable of encoding 190 million binary symbols per second using an area of 0.209 sq-mm. The proposed design is thoroughly tested for several standard test contents through both software and hardware simulations with test vectors up to a 300 frames foreman content. Also, several designs for CABAC’s binarization block and its interface are explored each with different levels of hardware support.

Online copy of master's thesis

MS_thesis.pdf (~36MB)

Last modified: Aug 2010