Monday, September 12, 2011

Graphics Processing Units (GPUs): Computer Architecture: A Quantitative Approach, 5th Edition, Chapter 4

This reading covers three architectural approaches to exploiting data-level parallelism: vector architectures, SIMD instruction set extensions for multimedia, and GPUs.

The reading first goes through each of these architectures in close detail. Vector architectures have been around for decades but were long unpopular; increased memory bandwidth and cheaper transistors have made them viable options today. By applying a single operation across a whole line of operands, a vector architecture gains large speedups, especially on loops and matrix code. Various additions handle complications such as data dependences between vector operations (chaining) and if statements in the middle of vector loops (vector-mask control).
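
To make that concrete, here is my own sketch (not code from the chapter) of the two loop shapes involved; the function names are mine, and the first loop is the DAXPY-style pattern the chapter uses as its running example:

    // DAXPY-style loop: a vector unit can run this as a few vector
    // instructions (load x, multiply by a scalar, add y, store y).
    void daxpy_loop(int n, double a, double *x, double *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    // With an if in the loop body, vector-mask control lets the
    // hardware perform the operation only in the lanes where the
    // condition is true, instead of falling back to scalar code.
    void masked_loop(int n, double a, double *x, double *y) {
        for (int i = 0; i < n; i++)
            if (x[i] != 0.0)
                y[i] = a * x[i] + y[i];
    }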

SIMD extensions are, in effect, mini vector operations. Instead of operating on 64 64-bit operands, they target multimedia applications, where the common data sizes are 8 and 16 bits, so a single instruction might operate on 32 8-bit operands instead. The SIMD extensions thus offer a compromise between full vector architectures and today's mainstream architectures.
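
As an illustration (again my own sketch), the kind of operation these extensions provide directly is packed, saturating arithmetic on narrow data; with a 256-bit register, one instruction covers 32 of these 8-bit elements at a time:

    #include <stdint.h>

    // Saturating 8-bit add, a typical multimedia SIMD operation:
    // results clamp at 255 instead of wrapping around.
    void add_pixels(uint8_t *dst, const uint8_t *src, int n) {
        for (int i = 0; i < n; i++) {
            int sum = dst[i] + src[i];
            dst[i] = (uint8_t)(sum > 255 ? 255 : sum);  // clamp, don't wrap
        }
    }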

GPUs are a completely different beast from the other two approaches. They are essentially processing units with thousands of parallel SIMD lanes, so enormous amounts of data-level parallelism can be exploited. NVIDIA's CUDA lets programmers state cleanly which code should run on the host CPU and which should run on the GPU.
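
The chapter's DAXPY example in CUDA gives the flavor; roughly (a sketch from memory, so details may differ from the chapter's listing, and d_x/d_y are hypothetical device pointers):

    // GPU kernel: each thread computes one element of y = a*x + y.
    __global__ void daxpy(int n, double a, double *x, double *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Host side (sketch): d_x and d_y are device arrays, assumed
    // already allocated with cudaMalloc and copied over. The <<<...>>>
    // syntax launches n threads on the GPU in blocks of 256.
    // daxpy<<<(n + 255) / 256, 256>>>(n, 2.0, d_x, d_y);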

The reading then gives a brief tutorial on detecting loop-level parallelism and on eliminating dependent computations such as recurrences. The chapter also discusses factors that affect how well each of the three approaches performs: memory bandwidth, compute bandwidth, cache benefits, gather-scatter, and synchronization.
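
The dependence idea is easy to see in code (my sketch, with my own function names): the first loop below has a loop-carried dependence, a recurrence, so its iterations cannot run in parallel as written, while the second has none:

    // Recurrence: iteration i needs the result of iteration i-1,
    // so this loop cannot be vectorized or parallelized as written.
    void recurrence(int n, double *a, double *b) {
        for (int i = 1; i < n; i++)
            a[i] = a[i-1] + b[i];
    }

    // No loop-carried dependence: every iteration is independent,
    // so all of them can run in parallel across vector lanes or
    // GPU threads.
    void independent(int n, double *a, double *b, double *c) {
        for (int i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }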

The reading concludes with some fallacies and pitfalls about the previously mentioned architectures, none of which are too hard to understand.

Overall, the reading did a great job of discussing the architectural support available these days for data-level parallelism.
