In this post, I would like to explain a basic but confusing concept of CUDA programming: thread hierarchies. We will not cover every aspect, but it should be a good first step. If you are starting with CUDA and want to know how to set up your environment using VS2017, I recommend you read this post.

To get started, let's write something straightforward to run on the CPU, and then change that code to run on the GPU. First, let's remember some concepts we learned in a previous post:

- The `__global__` keyword indicates that the following function will run on the GPU.
- Code executed on the CPU is referred to as *host code*, and code executed on the GPU is referred to as *device code*.
- Functions defined with the `__global__` keyword are required to return `void`.
- A function launched to run on the GPU is called a *kernel* (in the example, `printHelloGPU` is the kernel).
- The `cudaDeviceSynchronize` function makes the CPU wait until all processing on the GPU is done before continuing.

When launching a kernel, we must provide an execution configuration, which is done using the `<<<...>>>` syntax. At a high level, the execution configuration allows programmers to specify the thread hierarchy for a kernel launch: how many thread blocks make up the grid, and how many threads to execute in each block.

CUDA provides a handy type, `dim3`, to keep track of these dimensions, and it is used for both grids and blocks. You can declare dimensions like this: `dim3 myDimensions(1, 2, 3);`, signifying the range of each dimension. To use a `dim3` with fewer than three dimensions, leave out the trailing arguments or set them to one; for example, `dim3(3, 3)` specifies a 2-dimensional structure (3x3x1).

In the example, `dim3(16, 16, 16)` says how the blocks are structured in the grid, and `dim3(16, 8, 8)` says how the threads are structured in each block. In total we are using 16x16x16 = 4096 blocks, each containing 16x8x8 = 1024 threads.
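The post's original code listings were images and did not survive; here is a minimal sketch of the program described, assuming the kernel is named `printHelloGPU` as the text states (the CPU function name is my own choice):

```cuda
#include <cstdio>

// Host version: an ordinary function that runs once on the CPU.
void printHelloCPU()
{
    printf("Hello from the CPU\n");
}

// Device version: __global__ marks this as a kernel that runs on
// the GPU. Kernels are required to return void.
__global__ void printHelloGPU()
{
    printf("Hello from the GPU\n");
}

int main()
{
    printHelloCPU();

    // Execution configuration: 1 block of threads, 1 thread per block.
    printHelloGPU<<<1, 1>>>();

    // Block the CPU until all work on the GPU has finished; without
    // this, the process may exit before the kernel's printf appears.
    cudaDeviceSynchronize();

    return 0;
}
```

Compile with `nvcc hello.cu -o hello`; the launch syntax and `__global__` qualifier are not valid in a plain C++ compiler.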
The execution configuration allows programmers to specify details about launching the kernel to run in parallel on multiple GPU threads. A kernel is executed once for every thread in every thread block configured when the kernel is launched. Notice that, in the previous example, the kernel is launched with 1 block of threads (the first execution configuration argument) which contains 1 thread (the second configuration argument), so the kernel body runs exactly once.
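To make the `dim3` configuration above concrete, here is a sketch using the grid and block shapes from the text; printing the block and thread indices is my addition, not part of the original post:

```cuda
#include <cstdio>

// Each thread prints its own coordinates within the hierarchy.
__global__ void printHelloGPU()
{
    printf("Hello from block (%d,%d,%d), thread (%d,%d,%d)\n",
           blockIdx.x, blockIdx.y, blockIdx.z,
           threadIdx.x, threadIdx.y, threadIdx.z);
}

int main()
{
    dim3 dimGrid(16, 16, 16);  // 16x16x16 = 4096 blocks in the grid
    dim3 dimBlock(16, 8, 8);   // 16x8x8 = 1024 threads per block

    // The kernel runs once per thread in every block:
    // 4096 * 1024 = 4,194,304 executions in total.
    printHelloGPU<<<dimGrid, dimBlock>>>();

    cudaDeviceSynchronize();
    return 0;
}
```

Note that 1024 threads per block is the maximum on current NVIDIA GPUs, so `dim3(16, 8, 8)` sits exactly at that limit.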