OpenCL GPGPU Demo Application

Started by aw27, June 25, 2019, 05:37:43 AM

aw27

This is an OpenCL GPGPU application that computes the sum of the floats from 0 to 63. As we know from high school, there is a formula attributed to Gauss for that, and the expected result is 2016.0.

- The code is closely based on this Dr. Dobb's article. However, the kernel source is supplied in the resources, not loaded from the directory as in the Dr. Dobb's example.

- It was tested successfully on Intel, AMD and NVIDIA GPUs.
- It was produced with UASM.
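
The expected value quoted above can be checked on the host without any GPU. What follows is only a minimal sketch in C, not the UASM source of the demo; the function names `sum_loop` and `sum_gauss` are made up for illustration.

```c
#include <stdio.h>

/* Reference loop: add the floats 0.0, 1.0, ..., n-1 one at a time.
   Name is illustrative, not taken from the demo. */
static float sum_loop(int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += (float)i;
    return s;
}

/* Closed form attributed to Gauss: 0 + 1 + ... + (n-1) = (n-1)*n/2. */
static float sum_gauss(int n)
{
    return (float)((n - 1) * n / 2);
}

/* Print in the same style as the demo output and compare both results.
   Returns 1 when the check passes, 0 otherwise. */
static int check_sum(int n)
{
    float computed = sum_loop(n);
    printf("Computed sum = %.1f.\n", computed);
    if (computed == sum_gauss(n)) {
        puts("Check passed.");
        return 1;
    }
    puts("Check failed.");
    return 0;
}
```

With n = 64 (the values 0 to 63) both functions give 63*64/2 = 2016.0, and every intermediate sum is small enough to be exact in single precision.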


Computed sum = 2016.0.
Check passed.
<Press any key to Exit>

hutch--

 :biggrin:

Damn, I was expecting a graphics extravaganza.

Computed sum = 2016.0.
Check passed.

<Press any key to Exit>


aw27

Quote from: hutch-- on June 25, 2019, 01:40:49 PM
:biggrin:
Damn, I was expecting a graphics extravaganza.
Maybe someone will at least do a Mandelbrot?  :icon_idea:

HSE

Hi Atelier!

The kernel doesn't work in 32-bit. How did you build it?

LATER: The .res I obtained is not exactly identical, but it looks good. Anyway, clCreateProgramWithSource fails.


LATER2: My mistake, it was in creating the context  :thumbsup:
Equations in Assembly: SmplMath

daydreamer

Quote from: AW on June 25, 2019, 05:30:30 PM
Quote from: hutch-- on June 25, 2019, 01:40:49 PM
:biggrin:
Damn, I was expecting a graphics extravaganza.
Maybe someone will at least do a Mandelbrot?  :icon_idea:
It works OK here, but I expected something more blurry (like a Gaussian blur)  :biggrin:
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

"Check passed" - but why does it take ages?

aw27

Quote from: HSE on June 25, 2019, 08:39:43 PM
LATER2: my mistake creating context  :thumbsup:

I know it works in 32-bit, but the movq bug is still there in 32-bit and causes problems when printing the result.

aw27

Quote from: jj2007 on June 25, 2019, 09:48:00 PM
"Check passed" - but why does it take ages?
It does not take ages here. Any idea why?

Quote from: daydreamer on June 25, 2019, 09:46:35 PM
It works OK here, but I expected something more blurry (like a Gaussian blur)  :biggrin:
I am not an expert in blurry stuff.  :sad:

HSE

Quote from: AW on June 25, 2019, 11:09:03 PM
I know it works in 32-bit, but the movq/movsd bug is still there in 32-bit and causes problems when printing the result.

Well! Then there are 2 bugs, because UASM32 builds the VMOVD instruction with EVEX disabled.

Because in 32-bit the general-purpose registers cannot hold a real8, I used Kusswurm's technique:

            movups reg2, xmm2
            printf("Computed sum = %.1f.\n", reg2.r8[0*8]);


where:
        RegXMM union 16
            i8  sbyte 16 dup (0)
            i16 sword 8 dup (0)
            i32 sdword 4 dup (0)
            i64 qword 2 dup (0)
            r4 real4 4 dup (0.0)
            r8 real8 2 dup (0.0)
        RegXMM ends
.data
       reg2 RegXMM <>
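
For readers more at home in C, the idea behind the union can be sketched as below. This is only an illustration under stated assumptions: the `memcpy` stands in for the `movups reg2, xmm2` store, and the helper names `store_xmm` and `low_real8` are hypothetical, not part of the demo.

```c
#include <stdint.h>
#include <string.h>

/* C counterpart of the RegXMM union: one 16-byte block seen through
   several element types, mirroring the fields of the assembly version. */
typedef union {
    int8_t  i8[16];
    int16_t i16[8];
    int32_t i32[4];
    int64_t i64[2];
    float   r4[4];
    double  r8[2];
} RegXMM;

/* Stand-in for "movups reg2, xmm2": copy the 16 raw register bytes
   into memory. (Hypothetical helper for this sketch.) */
static void store_xmm(RegXMM *dst, const void *xmm_bytes)
{
    memcpy(dst, xmm_bytes, sizeof *dst);
}

/* Read the low double lane, i.e. what reg2.r8[0*8] names in the
   assembly version (byte offset 0 into the r8 field). */
static double low_real8(const RegXMM *r)
{
    return r->r8[0];
}
```

The point of the detour through memory is that the value never has to pass through a 32-bit general-purpose register; the double is read back directly from the stored bytes and can then be handed to printf.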


Computed sum = 2016.0.
Check passed.

aw27

This is a 32-bit version (without using the Kusswurm union).
In 32-bit, C-style calling does not appear to support assignment of return values (or maybe I am missing something).

HSE

Fantastic  :thumbsup:

Perhaps 32-bit return values rely on some other secret option  :biggrin:

jj2007

Quote from: AW on June 25, 2019, 11:13:29 PM
Quote from: jj2007 on June 25, 2019, 09:48:00 PM
"Check passed" - but why does it take ages?
It does not take ages here, any idea?

Some initialisation required maybe? It takes about 1.5 seconds until I see the result, while launching a complex graphics application takes only a few milliseconds. Note this is nothing against your code (compliments), I am just curious why there is a delay. You are not generating millions of numbers somewhere afaics...

LiaoMi

Quote from: AW on June 25, 2019, 05:30:30 PM
Quote from: hutch-- on June 25, 2019, 01:40:49 PM
:biggrin:
Damn, I was expecting a graphics extravaganza.
Maybe someone will at least do a Mandelbrot?  :icon_idea:

Hi AW,

The program says that the platform is invalid, so apparently I am doing something wrong. On the web I read that function calls on NVIDIA systems look different ... most likely this is the problem  :undecided:

Quote
OpenCLMandelBrot>ucl
Found 2 platform(s)

*** Platforms Information ***

Platform 0 - Name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA 10.1.120
Profile: FULL_PROFILE
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer

Platform 1 - Name: Intel(R) OpenCL
Vendor: Intel(R) Corporation
Version: OpenCL 1.2
Profile: FULL_PROFILE
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64

*** Devices Information ***

Platform 0
Device Name: Quadro P4000
Driver Version: 430.39
Device Profile: FULL_PROFILE
Device Version: OpenCL 1.2 CUDA
clGetDeviceIDs failed.

Quote
Computed sum = 2016.0.
Check passed.

<Press any key to Exit>

aw27

Quote
the program says that the platform is invalid,

That makes two of us getting an invalid platform  :sad:
I will have a look tomorrow and see if I can figure out what's going on.

Quote from: jj2007 on June 26, 2019, 03:51:15 AM
Some initialisation required maybe? It takes about 1.5 seconds until I see the result, while launching a complex graphics application takes only a few milliseconds. Note this is nothing against your code (compliments), I am just curious why there is a delay. You are not generating millions of numbers somewhere afaics...

Some people report delays. Try searching Google for "why opencl slow to initialize"...

hutch--

My Win 10 Pro 64-bit does not like it either.

A:\OpenCL\Release>openclmandelbrot
Getting device IDs: Invalid platform
Creating context: Invalid device
Getting context info: Invalid context
Getting device info: Invalid device
Device 0:

x64 version

A:\OpenCL\x64\Release>openclmandelbrot
Getting device IDs: Invalid platform
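
The messages in these listings read like text for OpenCL status codes. As a rough sketch only, and assuming the program maps each returned code to a string in this way (its actual source is not shown in the thread), a reporting helper in C could look like the following. The names `cl_error_text` and `report_step` are hypothetical; the numeric values are the ones defined in the Khronos CL/cl.h header, repeated locally so the sketch builds without an OpenCL SDK.

```c
#include <stdio.h>

/* Status codes as defined in CL/cl.h; redefined here only so that the
   sketch compiles without the OpenCL headers. */
#ifndef CL_SUCCESS
#define CL_SUCCESS            0
#define CL_DEVICE_NOT_FOUND  (-1)
#define CL_INVALID_VALUE     (-30)
#define CL_INVALID_PLATFORM  (-32)
#define CL_INVALID_DEVICE    (-33)
#define CL_INVALID_CONTEXT   (-34)
#endif

/* Hypothetical helper: turn an OpenCL status code into a short message. */
static const char *cl_error_text(int err)
{
    switch (err) {
    case CL_SUCCESS:          return "Success";
    case CL_DEVICE_NOT_FOUND: return "Device not found";
    case CL_INVALID_VALUE:    return "Invalid value";
    case CL_INVALID_PLATFORM: return "Invalid platform";
    case CL_INVALID_DEVICE:   return "Invalid device";
    case CL_INVALID_CONTEXT:  return "Invalid context";
    default:                  return "Unknown error";
    }
}

/* Hypothetical helper: print "<step>: <message>" when a call failed.
   Returns 1 on failure and 0 on success, so the caller can stop early. */
static int report_step(const char *step, int err)
{
    if (err == CL_SUCCESS)
        return 0;
    printf("%s: %s\n", step, cl_error_text(err));
    return 1;
}
```

If that assumption holds, the first line is the one that matters: once the platform handle is rejected, the later calls receive handles that were never filled in and fail in turn, which would produce the same cascade of device and context errors. Stopping at the first non-zero return from `report_step` would leave only the root cause on screen.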