OpenCL GPGPU Demo Application

Started by aw27, June 25, 2019, 05:37:43 AM

aw27

The mandelbrot sample can be made to work after some surgery, in particular removing the attempt to use the CPU for the calculations.
It produces a .bmp called output.bmp in the folder of the .exe.
I attach what is required to see the mandelbrot.
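
For readers who have not opened the sample, the per-pixel work behind such an image is the usual escape-time iteration. The following is a minimal plain C sketch of that idea, not the kernel from the attached sample: the function names, the coordinate window and the grayscale mapping are all assumptions made for illustration.

```c
/* Escape-time iteration count for the point c = (cr, ci).
   Returns max_iter if the orbit never leaves the radius-2 disc. */
int mandel_iterations(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;  /* real part of z*z + c */
        zi = 2.0 * zr * zi + ci;            /* imaginary part of z*z + c */
        zr = t;
        n++;
    }
    return n;
}

/* Fills a width*height grayscale buffer (width and height must be > 1).
   Each pixel is independent of the others, which is why the work maps
   well onto one OpenCL work-item per pixel.
   The window [-2, 1] x [-1, 1] is an assumed choice. */
void render_mandelbrot(unsigned char *pixels, int width, int height, int max_iter)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double cr = -2.0 + 3.0 * x / (width - 1);
            double ci = -1.0 + 2.0 * y / (height - 1);
            int n = mandel_iterations(cr, ci, max_iter);
            pixels[y * width + x] = (unsigned char)(255 * n / max_iter);
        }
    }
}
```

Calling render_mandelbrot(buffer, 640, 480, 256) on a 640*480 byte buffer would give the data that a program then has to wrap in a .bmp header.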

LiaoMi

Quote from: AW on June 26, 2019, 06:53:37 AM
The mandelbrot sample can be made to work after some surgery, in particular removing the attempt to use the CPU for the calculations.
It produces a .bmp called output.bmp in the folder of the .exe.
I attach what is required to see the mandelbrot.

:azn: :thumbsup:

one more example (Source Code) .. MatrixMultiplication_OpenCL_cpp

hutch--

Jose,

The last one worked OK, I deleted the existing bitmap and ran it and it produced the new one.

johnsa

Quote from: HSE on June 25, 2019, 11:26:48 PM
Quote from: AW on June 25, 2019, 11:09:03 PM
I know it works in 32-bit but the movq/movsd bug is still there in 32-bit causing problems to print the result.

Well! Then there are 2 bugs, because UASM32 builds the VMOVD instruction with EVEX disabled.

Because in 32-bit mode the general-purpose registers cannot hold a real8, I used Kusswurm's technique:

            movups reg2, xmm2
            printf("Computed sum = %.1f.\n", reg2.r8[0*8]);


where:
        RegXMM union 16
            i8  sbyte 16 dup (0)
            i16 sword 8 dup (0)
            i32 sdword 4 dup (0)
            i64 qword 2 dup (0)
            r4 real4 4 dup (0.0)
            r8 real8 2 dup (0.0)
        RegXMM ends
.data
       reg2 RegXMM <>


Computed sum = 2016.0.
Check passed.


Quick note on this: UASM has built-in types for XMM/YMM/ZMM that match the C intrinsic types. So if you don't need the code to be cross-assembler compatible, or you put the definition in an IFDEF, you can use the built-in __m128 type, which is already a union of structs with each element type (byte/word/dword/qword/real4/real8).
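
For anyone coming from the C side, the same "one 16-byte block, many lane views" idea can be sketched in portable C. This is only an illustration under assumptions: the type name mirrors HSE's RegXMM, store_xmm is a hypothetical stand-in for the movups store, and none of it is UASM's actual __m128 definition.

```c
#include <string.h>
#include <stdint.h>

/* C counterpart of the RegXMM union: 16 bytes viewed as different lane types. */
typedef union {
    int8_t  i8[16];
    int16_t i16[8];
    int32_t i32[4];
    int64_t i64[2];
    float   r4[4];
    double  r8[2];
} RegXMM;

/* Hypothetical stand-in for "movups reg2, xmm2": copies 16 raw bytes
   from src16 into the union so any lane view can be read afterwards. */
void store_xmm(RegXMM *dst, const void *src16)
{
    memcpy(dst, src16, sizeof *dst);
}
```

After store_xmm(&reg2, lanes), where lanes holds two doubles, reg2.r8[0] is the low real8 lane, which is the value the printf in the quoted code prints.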

As for the VMOVD/Q issue and the C call return, I will check these out.

HSE

Quote from: johnsa on June 26, 2019, 06:08:51 PM
... to match the C intrinsic types.    ... to be cross-asm compatible

I don't know C  :biggrin:, and the idea always is compatibility  :thumbsup:.

Thanks.
Equations in Assembly: SmplMath

daydreamer

Works great with the mandelbrot, AW. By the way, Gaussian blur is a function in 2D paint programs, so I was thinking it is related to the Gaussian math, just applied in a different way to an image.
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

aw27

Everything you always wanted to know about your GPU, CUDA, OpenCL, Vulkan etc and were afraid to ask is in this little application:


aw27


Found 2 platform(s)

*** Platforms Information ***

Platform 0 - Name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA 10.2.120
Profile: FULL_PROFILE
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer

Platform 1 - Name: Intel(R) CPU Runtime for OpenCL(TM) Applications
Vendor: Intel(R) Corporation
Version: OpenCL 2.1 WINDOWS
Profile: FULL_PROFILE
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint

*** Devices Information ***

Platform 0
Device Name: GeForce GTX 1060 6GB
Driver Version: 430.86
Device Profile: FULL_PROFILE
Device Version: OpenCL 1.2 CUDA

Platform 1
Device Name: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Driver Version: 18.1.0.0920
Device Profile: FULL_PROFILE
Device Version: OpenCL 2.1 (Build 0)

<Press any key to Exit>


My NVIDIA card exposes no OpenCL CPU device (it appears that no NVIDIA platform does). However, we can install the Intel OpenCL runtime for CPUs from Intel's site.
WARNING: The procedure is convoluted and may fail for unforeseen reasons. You have been warned.

I proceeded this way; you may need to proceed differently:
1) Download the Intel® CPU Runtime for OpenCL™ Applications 18.1 for Windows* OS (64-bit or 32-bit) from https://software.intel.com/en-us/articles/opencl-drivers. If you have an AMD CPU, this will not work for you, of course, but a similar package may be available on the AMD site.
2) Install it.
3) This will disable the CUDA OpenCL drivers. Don't worry.
4) In the Registry, under Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors, add a DWORD value named intelocl64.dll with data 0. Do the same for the 32-bit view under Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Khronos\OpenCL\Vendors, adding a DWORD value named intelocl32.dll with data 0.
For further information on this Registry procedure, search for OpenCL Khronos Registry Key.
5) Now reinstall the CUDA drivers.
You now have another OpenCL platform. Wow!



I attach the updated detection application, which now detects every OpenCL device (previously it detected only GPUs).

LiaoMi

 :thumbsup:

Found 2 platform(s)

*** Platforms Information ***

Platform 0 - Name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA 10.1.120
Profile: FULL_PROFILE
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options
cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing
cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer

Platform 1 - Name: Intel(R) OpenCL
Vendor: Intel(R) Corporation
Version: OpenCL 1.2
Profile: FULL_PROFILE
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread
cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64

*** Devices Information ***

Platform 0
Device Name: Quadro P4000
Driver Version: 430.39
Device Profile: FULL_PROFILE
Device Version: OpenCL 1.2 CUDA

Platform 1
Device Name: Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Driver Version: 5.2.0.10094
Device Profile: FULL_PROFILE
Device Version: OpenCL 1.2 (Build 10094)

six_L

Found 2 platform(s)

*** Platforms Information ***

Platform 0 - Name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA 8.0.0
Profile: FULL_PROFILE
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer

Platform 1 - Name: Intel(R) OpenCL
Vendor: Intel(R) Corporation
Version: OpenCL 2.0
Profile: FULL_PROFILE
Extensions: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir

*** Devices Information ***

Platform 0
Device Name: GeForce 940M
Driver Version: 382.05
Device Profile: FULL_PROFILE
Device Version: OpenCL 1.2 CUDA

Platform 1
Device Name: Intel(R) HD Graphics 5500
Driver Version: 20.19.15.4835
Device Profile: FULL_PROFILE
Device Version: OpenCL 2.0
Device Name: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
Driver Version: 5.2.0.10094
Device Profile: FULL_PROFILE
Device Version: OpenCL 2.0 (Build 10094)

<Press any key to Exit>
Say you, Say me, Say the codes together for ever.

daydreamer

I tested GPU Caps Viewer and it shows that OpenCL has 10+ compute units, but a new NVIDIA card lists 1500+ CUDA cores in its specifications. Does this mean it runs more than 100 times faster?
It also shows disappointing figures for usable RAM; there is a big difference between usable and installed RAM.

fearless

Found 2 platform(s)

*** Platforms Information ***

Platform 0 - Name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA 10.0.132
Profile: FULL_PROFILE
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer

Platform 1 - Name: Experimental OpenCL 2.1 CPU Only Platform
Vendor: Intel(R) Corporation
Version: OpenCL 2.1
Profile: FULL_PROFILE
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer

*** Devices Information ***

Platform 0
Device Name: GeForce GTX 980 Ti
Driver Version: 416.81
Device Profile: FULL_PROFILE
Device Version: OpenCL 1.2 CUDA

Platform 1
Device Name: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Driver Version: 6.3.0.1904
Device Profile: FULL_PROFILE
Device Version: OpenCL 2.1 (Build 18)

aw27

So, a lot of people already had OpenCL available for their CPUs (many more than I thought, but now I have joined the gang too  :biggrin:).

I modified the mandelbrot C/C++ sample provided by LiaoMi again, this time so that it uses an OpenCL CPU device as first priority if one is available; otherwise it falls back to the GPU.

BTW, the modifications are all in a single function, so I will leave it here:


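#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Creates a context on the first platform that exposes CPU devices.
   If no platform has a CPU device, falls back to the first platform
   with GPU devices. check_succeeded() is the sample's own error helper. */
cl_context create_context(cl_uint* num_devices) {
    cl_platform_id *platforms;
    cl_uint num_platforms = 0;
    cl_int err;
    cl_device_id *devices = NULL;
    cl_uint num_cpus = 0;
    cl_context context;
    cl_uint n_devices = 0;
    cl_uint i;

    *num_devices = 0;

    if (clGetPlatformIDs(0, NULL, &num_platforms) != CL_SUCCESS || num_platforms == 0)
    {
        fprintf(stderr, "No platforms\n");
        exit(1);
    }
    platforms = malloc(sizeof(cl_platform_id) * num_platforms);
    if (platforms == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }

    err = clGetPlatformIDs(num_platforms, platforms, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "Couldn't identify a platform\n");
        exit(1);
    }

    // Look for a CL_DEVICE_TYPE_CPU
    for (i = 0; i < num_platforms && devices == NULL; i++)
    {
        if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_CPU, 0, NULL, &num_cpus) == CL_SUCCESS)
        {
            devices = malloc(num_cpus * sizeof(cl_device_id));
            if (devices != NULL)
            {
                *num_devices = num_cpus;
                clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_CPU, num_cpus, devices, NULL);
            }
        }
    }

    // No CPU device on any platform: fall back to CL_DEVICE_TYPE_GPU
    for (i = 0; i < num_platforms && devices == NULL; i++)
    {
        if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 0, NULL, &n_devices) == CL_SUCCESS)
        {
            devices = malloc(n_devices * sizeof(cl_device_id));
            if (devices != NULL)
            {
                *num_devices = n_devices;
                clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, n_devices, devices, NULL);
            }
        }
    }

    if (devices == NULL)
    {
        fprintf(stderr, "No CPU or GPU device found\n");
        exit(1);
    }

    context = clCreateContext(0, *num_devices, devices, NULL, NULL, &err);
    check_succeeded((char*)"Creating context", err);

    // The context keeps its own references, so the temporary arrays can go
    free(devices);
    free(platforms);

    return context;
}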
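
The CPU-first, GPU-fallback order used in this function can also be tried out without any OpenCL runtime installed. The sketch below models only the selection logic: mock_platform, dev_kind and pick_platform are hypothetical names invented for illustration and are not part of the OpenCL API or of the sample.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT OpenCL types. */
typedef enum { DEV_CPU, DEV_GPU } dev_kind;

typedef struct {
    const char *name;
    unsigned cpu_devices;   /* what a CPU device query would report */
    unsigned gpu_devices;   /* what a GPU device query would report */
} mock_platform;

/* Index of the first platform that has devices of the given kind, or -1. */
static int first_platform_with(const mock_platform *p, size_t n, dev_kind kind)
{
    for (size_t i = 0; i < n; i++) {
        unsigned count = (kind == DEV_CPU) ? p[i].cpu_devices : p[i].gpu_devices;
        if (count > 0)
            return (int)i;
    }
    return -1;
}

/* CPU first, GPU as fallback. On success *kind_out tells which was chosen;
   returns -1 (and leaves *kind_out untouched) if nothing usable exists. */
int pick_platform(const mock_platform *p, size_t n, dev_kind *kind_out)
{
    int idx = first_platform_with(p, n, DEV_CPU);
    if (idx >= 0) { *kind_out = DEV_CPU; return idx; }
    idx = first_platform_with(p, n, DEV_GPU);
    if (idx >= 0) { *kind_out = DEV_GPU; return idx; }
    return -1;
}
```

With a table holding an NVIDIA-style platform (GPU only) followed by an Intel-style CPU platform, pick_platform returns the index of the CPU platform even though the GPU platform comes first, which matches the "Device 0: ... CPU" result reported for the modified sample.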


I also attach the built .exe.

This is what I got:

Device 0: Intel(R) Corporation Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz


Conclusion: it works and produces the expected mandelbrot.

@daydreamer
Well, from what I have read, CUDA may be 10 to 25% faster in most cases. But I have not done any tests on that (yet).
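
On the compute units versus CUDA cores figures: the two numbers count different things. On NVIDIA hardware an OpenCL compute unit is a whole multiprocessor, and each multiprocessor contains many CUDA cores, so the two figures are related by a multiplication rather than describing a 100x speed gap. A tiny sketch of the arithmetic follows; the 128 cores per unit figure is an assumption that holds for consumer Pascal cards such as the GTX 1060 and has to be looked up per architecture, since OpenCL does not report it.

```c
/* Total shader cores = compute units * cores per compute unit.
   cores_per_unit depends on the GPU architecture and must come from
   the vendor's documentation; 128 is the assumed value for Pascal. */
unsigned total_cores(unsigned compute_units, unsigned cores_per_unit)
{
    return compute_units * cores_per_unit;
}
```

For a GTX 1060 6GB, total_cores(10, 128) gives 1280, the CUDA core count in its specifications.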




TimoVJL

An older ATI/AMD driver might not have OpenCL, or perhaps the AMD Crimson driver does not provide it for older cards?

https://forums.guru3d.com/threads/non-gcn-crimson-and-opencl.407531/

EDIT: OclInfoDyn.c dynamically loads OpenCL.dll, or the DLL named on the command line:
OclInfoDyn64.exe Intelocl64.dll
Number of platforms:    1

Platform:               0

        Platform Vendor:        Intel(R) Corporation
        Number of devices:      1

        Device: 0
                Type:                           2 CL_DEVICE_TYPE_CPU
                Name:                           AMD Athlon(tm) II X2 220 Processor
                Vendor:                         Intel(R) Corporation
                Available:                      Yes
                Compute Units:                  2
                Clock Frequency:                0 MHz
                Global Memory:                  8191 mb
                Max Allocateable Memory:        2048 mb
                Local Memory:                   32768 kb
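
The "Type: 2 CL_DEVICE_TYPE_CPU" line above reflects that the OpenCL device type is a bitfield. A minimal sketch of decoding it is shown here; the bit values are the ones defined in CL/cl.h, repeated under local names so the code builds without the OpenCL SDK, and device_type_name is a hypothetical helper, not the code from OclInfoDyn.c.

```c
#include <string.h>

/* Same bit values as CL_DEVICE_TYPE_* in CL/cl.h, under local names. */
#define DEV_TYPE_DEFAULT     (1u << 0)
#define DEV_TYPE_CPU         (1u << 1)
#define DEV_TYPE_GPU         (1u << 2)
#define DEV_TYPE_ACCELERATOR (1u << 3)

/* Writes the names of all set bits into buf, separated by " | ".
   size is the capacity of buf and must be at least 1. */
void device_type_name(unsigned long type, char *buf, size_t size)
{
    static const struct { unsigned long bit; const char *name; } table[] = {
        { DEV_TYPE_DEFAULT,     "CL_DEVICE_TYPE_DEFAULT" },
        { DEV_TYPE_CPU,         "CL_DEVICE_TYPE_CPU" },
        { DEV_TYPE_GPU,         "CL_DEVICE_TYPE_GPU" },
        { DEV_TYPE_ACCELERATOR, "CL_DEVICE_TYPE_ACCELERATOR" },
    };
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (type & table[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, " | ", size - strlen(buf) - 1);
            strncat(buf, table[i].name, size - strlen(buf) - 1);
        }
    }
}
```

Passing 2 yields the CPU name, and passing 4 would yield the GPU name for a graphics card.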
May the source be with you