Posts

Showing posts from April, 2020

Notes on CUDA programming II - Matrix multiplication

First, let us review the basics of CUDA programming and how a GPU executes code. The diagram below illustrates GPU threading with a very simple example: adding two arrays element by element. In this case we use a single block on the GPU containing, say, 500 threads (a block can hold more threads than that, but this example needs only 500 of them). The part above the dashed line represents that single block. The for loop in the middle of the bottom half is how we would usually code the array addition on a CPU: we loop over all 500 indices and do the additions one by one, serially. On a GPU the working scheme is different. Instead of a single 'worker' executing our commands one after another, we have a whole group of 'workers' ready at the same time, each waiting for our command and then doing its own job independently. Each 'worker' has its own unique ID - t
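The contrast between the serial loop and the one-thread-per-element scheme described above can be sketched as follows. This is a minimal illustration, not the code behind the diagram: the names `addArrays`, `addOnGpu` and `tid` are hypothetical, the array length is assumed to be between 1 and 1024 so that everything fits in one block, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread handles exactly one element. threadIdx.x is the thread's
// ID inside its block, so with a single block it doubles as the array index.
__global__ void addArrays(const float* a, const float* b, float* c, int n)
{
    int tid = threadIdx.x;
    if (tid < n) {               // guard in case more threads than elements are launched
        c[tid] = a[tid] + b[tid];
    }
}

// Hypothetical host-side helper: copies the inputs to the GPU, launches
// ONE block with one thread per element, and copies the result back.
// Assumes a.size() == b.size() and 1 <= a.size() <= 1024.
std::vector<float> addOnGpu(const std::vector<float>& a, const std::vector<float>& b)
{
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);
    std::vector<float> c(n);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // Serial CPU equivalent: for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
    addArrays<<<1, n>>>(dA, dB, dC, n);   // 1 block, n threads (e.g. 500)

    // This blocking copy waits for the kernel to finish before reading dC.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Calling `addOnGpu(a, b)` with two 500-element vectors launches 500 threads, each of which performs what would be one iteration of the CPU loop.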

Notes on CUDA programming I - Preparation

To understand and implement CUDA code in practice, the very first step is to understand how threads are organized on the GPU. Threads are the basic units of a GPU on which computation happens in parallel. At the first level, threads are grouped into blocks, arranged in 1D, 2D or 3D, and each thread within a block has its own "coordinate": a unique index that pins down its location in the block. At the second level, blocks are grouped into a grid, in the same way that threads are grouped into a block. Here we use three diagrams to show the basic idea of this division of GPU computation units and how we locate a thread in a block and a block in a grid. With the thread-block-grid hierarchy understood, we can move on to practical CUDA programming. Here is a simple piece of CUDA code from Ref. [1], which demonstrates the basics of
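As a separate minimal sketch (this is not the code from Ref. [1]), the thread-block-grid coordinates can be combined into one global position using CUDA's built-in `threadIdx`, `blockIdx` and `blockDim` variables. The names `recordGlobalIndex` and `globalIndices` and the 16x16 block size are assumptions chosen for illustration, and error checking is again omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread combines its block coordinate (blockIdx) with its coordinate
// inside the block (threadIdx) to find its global column and row, then
// writes its row-major linear index into the output array.
__global__ void recordGlobalIndex(int* out, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {   // threads outside the domain do nothing
        out[row * width + col] = row * width + col;
    }
}

// Hypothetical host-side helper: covers a width x height domain with a 2D
// grid of 16x16-thread blocks and returns what each thread wrote.
// Assumes width > 0 and height > 0.
std::vector<int> globalIndices(int width, int height)
{
    const size_t bytes = static_cast<size_t>(width) * height * sizeof(int);
    std::vector<int> host(static_cast<size_t>(width) * height, -1);

    int* dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemset(dev, 0xFF, bytes);        // every int starts as -1

    dim3 block(16, 16);                  // threads per block: 2D, 256 in total
    dim3 grid((width  + block.x - 1) / block.x,   // round up so the grid
              (height + block.y - 1) / block.y);  // covers the whole domain
    recordGlobalIndex<<<grid, block>>>(dev, width, height);

    // This blocking copy waits for the kernel to finish before reading dev.
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

For a 40x20 domain this launches a 3x2 grid of blocks. The threads whose computed column or row falls outside the domain are filtered out by the bounds check, which is why the grid size can be rounded up safely.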