Topic: OpenCL GPGPU Demo Application

AW

  • Member
  • *****
  • Posts: 2232
  • Let's Make ASM Great Again!
OpenCL GPGPU Demo Application
« on: June 25, 2019, 05:37:43 AM »
This is an OpenCL GPGPU application which computes the sum of the floats from 0 to 63. As we know from high school, there is a formula invented by Gauss for that, n(n+1)/2, so the expected result is 63*64/2 = 2016.0.

- The code is strongly based on this Dr. Dobb's article. However, the kernel source is supplied in the resources, not loaded from the directory as in the Dr. Dobb's example.

- It was tested successfully on Intel, AMD and NVIDIA GPUs.
- It was produced with UASM.

Code:
Computed sum = 2016.0.
Check passed.
<Press any key to Exit>
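
For anyone who wants to confirm the expected value without a GPU, here is a minimal CPU-side sketch in C. It is not the OpenCL kernel from the attachment. It only assumes, as stated above, that the input is the 64 floats 0.0 to 63.0. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sum of 0.0f, 1.0f, ..., (n-1) computed element by element. */
float sum_sequential(size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += (float)i;
    return sum;
}

/* Closed form for 0 + 1 + ... + (n-1), i.e. n(n-1)/2. */
float sum_closed_form(size_t n)
{
    /* Keeps every value below 2^24, where float still holds integers exactly. */
    assert(n <= 4096);
    return (float)(n * (n - 1) / 2);
}

/* Returns 1 when both methods agree for n elements, 0 otherwise. */
int sums_agree(size_t n)
{
    return sum_sequential(n) == sum_closed_form(n);
}
```

With n = 64 both functions give 64*63/2 = 2016, the value the demo checks against.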

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 6582
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: OpenCL GPGPU Demo Application
« Reply #1 on: June 25, 2019, 01:40:49 PM »
 :biggrin:

Damn, I was expecting a graphics extravaganza.

Computed sum = 2016.0.
Check passed.

<Press any key to Exit>

hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

AW

Re: OpenCL GPGPU Demo Application
« Reply #2 on: June 25, 2019, 05:30:30 PM »
Quote
:biggrin:
Damn, I was expecting a graphics extravaganza.

Maybe someone will at least do a Mandelbrot?  :icon_idea:

HSE

  • Member
  • *****
  • Posts: 1079
  • <AMD>< 7-32>
Re: OpenCL GPGPU Demo Application
« Reply #3 on: June 25, 2019, 08:39:43 PM »
Hi Atelier!

The kernel doesn't work in 32 bit. How did you build it?

LATER: the .res I obtained is not exactly identical, but it looks good. Anyway, clCreateProgramWithSource fails.


LATER2: my mistake creating context  :thumbsup:
« Last Edit: June 25, 2019, 10:05:39 PM by HSE »

daydreamer

  • Member
  • ****
  • Posts: 894
  • watch Chebyshev on the backside of the Moon
Re: OpenCL GPGPU Demo Application
« Reply #4 on: June 25, 2019, 09:46:35 PM »
Quote
:biggrin:
Damn, I was expecting a graphics extravaganza.
Maybe someone will at least do a Mandelbrot?  :icon_idea:

It works OK here, but I expected something more blurry (like a Gaussian blur).  :biggrin:
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
what cpu handle "press any key"? any cpu of course(from C#) :D

jj2007

  • Member
  • *****
  • Posts: 9635
  • Assembler is fun ;-)
    • MasmBasic
Re: OpenCL GPGPU Demo Application
« Reply #5 on: June 25, 2019, 09:48:00 PM »
"Check passed" - but why does it take ages?

AW

Re: OpenCL GPGPU Demo Application
« Reply #6 on: June 25, 2019, 11:09:03 PM »
Quote
LATER2: my mistake creating context  :thumbsup:

I know it works in 32-bit, but the movq bug is still there in 32-bit and causes problems when printing the result.
« Last Edit: June 26, 2019, 04:54:37 PM by AW »

AW

Re: OpenCL GPGPU Demo Application
« Reply #7 on: June 25, 2019, 11:13:29 PM »
Quote
"Check passed" - but why does it take ages?

It does not take ages here, any idea?

Quote
it works ok here,but I expect something more blurry(like in gaussian blur)  :biggrin:

I am not an expert in blurry stuff.  :sad:

HSE

Re: OpenCL GPGPU Demo Application
« Reply #8 on: June 25, 2019, 11:26:48 PM »
Quote
I know it works in 32-bit but the movq/movsd bug is still there in 32-bit causing problems to print the result.

Well! Then there are 2 bugs, because UASM32 builds the VMOVD instruction with EVEX disabled.

Because general-purpose registers cannot hold a real8 in 32-bit mode, I used Kusswurm's technique:
Code:
            movups reg2, xmm2
            printf("Computed sum = %.1f.\n", reg2.r8[0*8]);

where:
Code:
        RegXMM union 16
            i8  sbyte 16 dup (0)
            i16 sword 8 dup (0)
            i32 sdword 4 dup (0)
            i64 qword 2 dup (0)
            r4 real4 4 dup (0.0)
            r8 real8 2 dup (0.0)
        RegXMM ends
.data
       reg2 RegXMM <>

Code:
Computed sum = 2016.0.
Check passed.
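
For readers more at home in C, the same overlay idea can be sketched as follows. This is only a rough equivalent of the RegXMM union above, not the code from the attachment. The helper name low_double is hypothetical, and the memcpy stands in for the movups store. Note that MASM indexes r8[0*8] in bytes, while the C index below counts elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16-byte overlay of one XMM register, mirroring the RegXMM union. */
typedef union {
    int8_t   i8[16];
    int16_t  i16[8];
    int32_t  i32[4];
    uint64_t i64[2];
    float    r4[4];
    double   r8[2];
} RegXMM;

/* Hypothetical helper: copy 16 raw bytes (what movups would store)
   into the union and read back the low double, like reg2.r8[0*8]. */
double low_double(const void *xmm_bytes)
{
    RegXMM reg;
    assert(sizeof(RegXMM) == 16);   /* every member spans the full register */
    memcpy(&reg, xmm_bytes, sizeof reg);
    return reg.r8[0];
}
```

The point of the union is that the value goes to printf from memory as a real8, so it never has to pass through a 32-bit general register.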
« Last Edit: June 26, 2019, 01:27:34 AM by HSE »

AW

Re: OpenCL GPGPU Demo Application
« Reply #9 on: June 26, 2019, 02:08:43 AM »
This is a 32-bit version (without using the Kusswurm union).
In 32-bit, C-style calling does not appear to support assignment of return values (or maybe I am missing something).

HSE

Re: OpenCL GPGPU Demo Application
« Reply #10 on: June 26, 2019, 02:48:25 AM »
Fantastic  :thumbsup:

32-bit return values perhaps rely on another secret option  :biggrin:

jj2007

Re: OpenCL GPGPU Demo Application
« Reply #11 on: June 26, 2019, 03:51:15 AM »
Quote
"Check passed" - but why does it take ages?
It does not take ages here, any idea?

Maybe some initialisation is required? It takes about 1.5 seconds until I see the result, while launching a complex graphics application takes only a few milliseconds. Note that this is nothing against your code (compliments); I am just curious why there is a delay. You are not generating millions of numbers somewhere, afaics...

LiaoMi

  • Member
  • ****
  • Posts: 573
Re: OpenCL GPGPU Demo Application
« Reply #12 on: June 26, 2019, 04:11:48 AM »
Quote
:biggrin:
Damn, I was expecting a graphics extravaganza.
Maybe someone will at least do a Mandelbrot?  :icon_idea:

Hi AW,

The program says that the platform is invalid, so apparently I am doing something wrong. I read on the web that function calls look different on NVIDIA systems; most likely this is the problem.  :undecided:

Quote
OpenCLMandelBrot>ucl
Found 2 platform(s)

*** Platforms Information ***

Platform 0 - Name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA 10.1.120
Profile: FULL_PROFILE
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer

Platform 1 - Name: Intel(R) OpenCL
Vendor: Intel(R) Corporation
Version: OpenCL 1.2
Profile: FULL_PROFILE
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64

*** Devices Information ***

Platform 0
Device Name: Quadro P4000
Driver Version: 430.39
Device Profile: FULL_PROFILE
Device Version: OpenCL 1.2 CUDA
clGetDeviceIDs failed.

Quote
Computed sum = 2016.0.
Check passed.

<Press any key to Exit>

AW

Re: OpenCL GPGPU Demo Application
« Reply #13 on: June 26, 2019, 05:19:51 AM »
Quote
the program says that the platform is invalid,

That makes two of us getting an invalid platform  :sad:
I will have a look tomorrow and see if I can figure out what's going on.

Quote
Some initialisation required maybe? It takes about 1.5 seconds until I see the result, while launching a complex graphics application takes only a few milliseconds. Note this is nothing against your code (compliments), I am just curious why there is a delay. You are not generating Millions of numbers somewhere afaics...

Some people report delays. Try searching Google for "why opencl slow to initialize".

hutch--

Re: OpenCL GPGPU Demo Application
« Reply #14 on: June 26, 2019, 05:36:10 AM »
My Win 10 Pro 64-bit does not like it either.

A:\OpenCL\Release>openclmandelbrot
Getting device IDs: Invalid platform
Creating context: Invalid device
Getting context info: Invalid context
Getting device info: Invalid device
Device 0:

x64 version

A:\OpenCL\x64\Release>openclmandelbrot
Getting device IDs: Invalid platform
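
The messages in the log above match standard OpenCL status codes, so a small lookup table can help when reading such output. The sketch below is an illustration only and is not taken from the demo. The SKETCH_ names and the cl_error_name helper are hypothetical. The numeric values are the ones defined in the Khronos CL/cl.h header, repeated locally so that the file builds without an OpenCL SDK.

```c
#include <assert.h>
#include <string.h>

/* Status values as defined in CL/cl.h, under hypothetical SKETCH_ names. */
enum {
    SKETCH_CL_SUCCESS          =   0,
    SKETCH_CL_DEVICE_NOT_FOUND =  -1,
    SKETCH_CL_INVALID_PLATFORM = -32,
    SKETCH_CL_INVALID_DEVICE   = -33,
    SKETCH_CL_INVALID_CONTEXT  = -34
};

/* Hypothetical helper: map an OpenCL status code to a short message. */
const char *cl_error_name(int status)
{
    switch (status) {
    case SKETCH_CL_SUCCESS:          return "Success";
    case SKETCH_CL_DEVICE_NOT_FOUND: return "Device not found";
    case SKETCH_CL_INVALID_PLATFORM: return "Invalid platform";
    case SKETCH_CL_INVALID_DEVICE:   return "Invalid device";
    case SKETCH_CL_INVALID_CONTEXT:  return "Invalid context";
    default:                         return "Unknown error";
    }
}

/* Usage check: the three messages that appear in the 32-bit log. */
void check_log_messages(void)
{
    assert(strcmp(cl_error_name(-32), "Invalid platform") == 0);
    assert(strcmp(cl_error_name(-33), "Invalid device") == 0);
    assert(strcmp(cl_error_name(-34), "Invalid context") == 0);
}
```

Read this way, the 32-bit log looks like one failure cascading: once getting the device IDs fails with an invalid platform, the later calls would receive an invalid device and then an invalid context. That is only a guess from the output, not a diagnosis.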