Hello World
The purpose of this blog is to collect some of my thoughts, notes and algorithms regarding GPGPU computations in general, along with some specific neat OpenCL code. We'll start with a small summary of the concepts and links to a few relevant sources to get you started with writing OpenCL code. The intention is mostly for me to remember the links, not to write a complete getting-started tutorial for OpenCL.
First off, GPGPU stands for general purpose GPU (computation), where GPU in turn stands for graphics processing unit. These units are very powerful, massively parallel processors that are used today mainly for rendering graphics, but which incidentally also happen to be good at solving other parallelizable tasks that consist mostly of floating point operations. In terms of raw computational performance these devices are often one or two orders of magnitude (10-100 times) faster than modern CPUs. However, the power of these devices comes mainly from a higher level of parallelism rather than from a higher clock frequency. As such, programming them efficiently is a challenge and requires very different development tools as well as very different algorithms.
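To give a feel for what "very different algorithms" can mean, here is a small sketch in plain Python (so it runs on the CPU, not on a GPU). It compares an ordinary sequential sum with a pairwise tree reduction, which is one common way of restructuring a sum so that many additions can happen at the same time. The function names `sequential_sum` and `tree_sum` are my own for this illustration, and the parallel rounds are only emulated by a loop.

```python
def sequential_sum(values):
    """Add the values one after another: each step depends on the previous one."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Pairwise (tree) reduction; returns (sum, number_of_rounds).

    Every addition within a round is independent of the others, so a parallel
    device could execute a whole round at once. Here each round is simply
    emulated on the CPU.
    """
    data = list(values)
    rounds = 0
    while len(data) > 1:
        half = (len(data) + 1) // 2
        # Element i is combined with element i + half; with an odd length
        # the middle element has no partner and is carried over unchanged.
        data = [
            data[i] + data[i + half] if i + half < len(data) else data[i]
            for i in range(half)
        ]
        rounds += 1
    return (data[0] if data else 0.0), rounds
```

For 8 values, `tree_sum` needs 3 rounds, whereas `sequential_sum` performs a chain of dependent additions that cannot overlap. Note that the tree version adds the numbers in a different order, so with floating point input the two results may differ slightly in the last digits.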
CUDA is a language-specific toolkit developed by NVIDIA that allows the programmer to perform general purpose computations (convolutions, Fourier transforms, signal processing, bitcoin mining, ...) on NVIDIA cards. Since CUDA was one of the first widespread toolkits dedicated to GPGPU computations, it has seen wide adoption in the high-performance computing community, and there exist many GPU clusters (supercomputers built from GPUs) that perform computationally heavy tasks.
The main drawback (IMHO) of CUDA is that it restricts the user to NVIDIA-specific platforms, and that it lacks support for some features. The main advantage is the ease of use that comes from the mature toolkit and from the integration into the programming language itself. (The latter is also a drawback...)
OpenCL is an open standard maintained by Khronos (the same group that develops OpenGL). It provides a standardized API through which compute devices can be used from any programming language, with the API interfacing to one or more drivers. A compute device here can be a GPU or any other device that can perform parallel computations, such as a multi-core CPU or a Cell processor.
The main advantage (IMHO) of OpenCL is that it is an open standard, that it works with a wide range of devices, and that it is (host) language agnostic. The main drawback is that it is language agnostic. Using OpenCL directly from your C/C++/Python etc. code means a significant overhead in glue code, but once you have this in place it gives you a lot of flexibility as well as very good interfacing with OpenGL.
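To make the glue code point concrete, here is a minimal sketch of adding two vectors on an OpenCL device from Python. It assumes the third-party PyOpenCL and NumPy packages and a working OpenCL driver are installed. The kernel name `vadd` and the helper `vector_add` are names I made up for this example. Treat it as an illustration of how much host-side setup surrounds a three-line kernel, not as a recommended structure.

```python
# Kernel source in OpenCL C; the driver compiles it at run time.
KERNEL_SOURCE = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out)
{
    int i = get_global_id(0);   /* one work-item per element */
    out[i] = a[i] + b[i];
}
"""


def vector_add(a, b):
    """Add two equally long, non-empty 1-D float sequences on an OpenCL device."""
    # Imported lazily so this file still loads without NumPy/PyOpenCL
    # installed (an assumption made for this sketch only).
    import numpy as np
    import pyopencl as cl

    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    if a.ndim != 1 or a.shape != b.shape or a.size == 0:
        raise ValueError("expected two non-empty 1-D inputs of equal length")
    out = np.empty_like(a)

    # Glue code: pick a device, create a command queue, compile the kernel.
    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    program = cl.Program(ctx, KERNEL_SOURCE).build()

    # Glue code: copy the inputs to device buffers, allocate the output.
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    # Launch one work-item per element, then copy the result back.
    program.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)
    cl.enqueue_copy(queue, out, out_buf)
    return out
```

On a machine with a working driver, calling `vector_add([1, 2, 3], [10, 20, 30])` should give a float32 array holding the element-wise sums. Everything in `vector_add` apart from the kernel launch itself is the glue code mentioned above.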
In this blog I will mainly write about algorithm development and neat tricks for OpenCL. I mainly use AMD/ATI's OpenCL drivers and, to a lesser extent, Intel's OpenCL drivers. The device/driver combinations I use are:
- AMD 5870 / 5870M / 6870 cards running on AMD APP drivers
- AMD CPUs (x4, x6) running on AMD APP drivers
- Intel Core i7 CPUs (x4) running on AMD APP drivers
- Intel Core i7 CPUs (x4) running on Intel's OpenCL drivers
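To check which platform (driver) and device combinations a given machine actually exposes, something like the following sketch can be used. It again assumes the third-party PyOpenCL package. The helper names `describe_platforms` and `list_opencl_devices` are my own. The formatting is kept in a separate function that only relies on a `.name` attribute and a `.get_devices()` method, which is how PyOpenCL's platform and device objects behave.

```python
def describe_platforms(platforms):
    """Return one 'platform / device' line per device.

    Accepts any objects exposing `.name` and `.get_devices()`, such as the
    platform objects returned by PyOpenCL.
    """
    lines = []
    for platform in platforms:
        for device in platform.get_devices():
            lines.append(f"{platform.name.strip()} / {device.name.strip()}")
    return lines


def list_opencl_devices():
    """Query the installed OpenCL drivers and describe every device found."""
    # Imported lazily so the formatting helper can be used without PyOpenCL.
    import pyopencl as cl

    return describe_platforms(cl.get_platforms())
```

Running `print("\n".join(list_opencl_devices()))` lists each device once per driver that exposes it, so a CPU that is visible through two different drivers would be expected to show up twice.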
In addition to the above device/driver combinations, another common combination is to use NVIDIA's OpenCL drivers. Currently I don't have access to modern NVIDIA hardware, but perhaps I'll get some at a later point to measure the differences.