How to use the GPU for fast work with huge arrays of large structures

Started by assembler, February 20, 2017, 02:51:55 AM

assembler

I have a huge array of large structures, and my program needs to modify every structure in it. I already have a procedure that modifies one structure the way I want. I could iterate over the array with the .while loop macro and call this procedure for each structure, passing the structure's address, but that is very slow. The loop instruction, or cmp with conditional jumps and add, is not fast enough either. CreateThread does not help: the array is HUGE, I can't call CreateThread for each structure, and my CPU has only a few cores. I need the GPU to do this task quickly, because the GPU has thousands of cores and thousands of threads, so I could use one GPU thread per structure. The problem is that I don't know how to use the GPU at all.

Suppose that "foo" is the procedure that modifies one structure in the array, and "bar" is the procedure that should invoke "foo" for each structure in the array, and I want to invoke "bar" this way:

invoke bar, foo, myArray, number_of_structures_in_my_array, size_of_each_structure_in_my_array

The first argument is always the address of the procedure to invoke, the second is the address of the array, the third is the length of the array (the number of structures in it), and the fourth is the size of each structure in bytes.

Again, I can easily implement "bar" with the .while macro, the loop instruction, or cmp, conditional jump and add instructions, but that is slow and inefficient for huge arrays of large structures. To make "bar" fast in all cases, even for huge arrays of large structures, "bar" must invoke the GPU in some way, but I don't know how to do this. If a procedure like "bar" already exists, please tell me about it and how I can use it in my programs. I will thank anybody who helps me.
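
For reference, the plain sequential "bar" described in this post can be sketched as below. It is written in C++ rather than MASM32 purely as an illustration; the `Item` layout and the body of `foo` are hypothetical, since the thread never shows the real structure or what the modification is. The argument order follows the invoke line above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 64-byte structure, for illustration only.
struct Item {
    std::uint32_t value;
    std::uint8_t  payload[60];
};

// Type of "foo": it receives the address of one structure.
using StructProc = void (*)(void* element);

// Hypothetical "foo": modifies a single structure in place.
void foo(void* element) {
    static_cast<Item*>(element)->value += 1;
}

// "bar": calls proc once for each of `count` structures of `size` bytes,
// starting at address `base`.
void bar(StructProc proc, void* base, std::size_t count, std::size_t size) {
    assert(proc != nullptr && size != 0);
    unsigned char* p = static_cast<unsigned char*>(base);
    for (std::size_t i = 0; i < count; ++i, p += size) {
        proc(p);  // address of structure number i
    }
}
```

A call such as `bar(foo, items, n, sizeof(Item))` mirrors the invoke line. In a loop like this, the compare, add and jump per iteration are usually cheap next to the work `foo` does on a large structure and the memory traffic it causes, so it is worth measuring before assuming the loop itself is the bottleneck.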

hutch--

I imagine you would need to obtain the GPU libraries for the video card you are using and the OS version you are using and learn how to use them. GPU libraries are not x86 assembler code and unless you have access to the type of code in the libraries, you will have to use what the vendor provides.

jj2007

http://www.iwocl.org/resources/opencl-libraries-and-toolkits/ should give you some ideas. One question is what to do if your machine is not running on NVIDIA ::)

assembler

Quote from: jj2007 on February 20, 2017, 04:28:54 AM
http://www.iwocl.org/resources/opencl-libraries-and-toolkits/ should give you some ideas. One question is what to do if your machine is not running on NVIDIA ::)

My machine does run on Nvidia, and I don't know what to do on the website your link leads to.

qWord

Quote from: assembler on February 20, 2017, 02:51:55 AM
CreateThread is not helpful, because the array is HUGE, and I can't use CreateThread for each structure in the array, my CPU has only a few cores.

A common strategy is to use a few threads and partition the array accordingly: for example, two threads per physical core, each of them processing several million structures.
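
The partitioning strategy described here can be sketched as follows. This is a minimal C++ illustration, not MASM32 code and not an existing library routine; `bar_threaded` and its `workers` parameter are hypothetical names. Each thread processes one contiguous slice of the array, so only a handful of threads is created no matter how long the array is.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Type of the per-structure procedure ("foo" in the first post).
using StructProc = void (*)(void* element);

// Splits the index range [0, count) into `workers` contiguous slices and
// runs proc over each slice on its own thread.
void bar_threaded(StructProc proc, void* base, std::size_t count,
                  std::size_t size, unsigned workers) {
    assert(proc != nullptr && size != 0);
    if (count == 0) return;
    if (workers == 0) workers = 1;
    if (workers > count) workers = static_cast<unsigned>(count);

    unsigned char* bytes = static_cast<unsigned char*>(base);
    const std::size_t chunk = count / workers;  // minimum slice length
    const std::size_t extra = count % workers;  // first `extra` slices get +1

    std::vector<std::thread> pool;
    pool.reserve(workers);
    std::size_t begin = 0;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t end = begin + chunk + (w < extra ? 1 : 0);
        // begin and end are copied into the lambda, so each thread keeps
        // its own slice bounds.
        pool.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) {
                proc(bytes + i * size);
            }
        });
        begin = end;
    }
    for (std::thread& t : pool) t.join();  // wait for every slice
}
```

For the "two threads per core" suggestion, `workers` could be derived from `std::thread::hardware_concurrency()`, which reports logical processors and may return 0 when the count is unknown. The sketch assumes `proc` touches only the structure it is given, so the slices never overlap and no locking is needed.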
MREAL macros - when you need floating point arithmetic while assembling!

assembler

Quote from: qWord on February 20, 2017, 04:57:25 AM
A common strategy is to use a few threads and partition the array accordingly: for example, two threads per physical core, each of them processing several million structures.

Are you serious?

Raistlin

He/she is probably correct (unsure of motive vs. intellect) :icon_rolleyes: but this is normal for server-type implementations, e.g. multi-server queries from clients. Think grid. Possibly, thinking about it, an actual genius answer...
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...

qWord

Quote from: assembler on February 20, 2017, 06:10:46 AM
Are you serious?

What do "HUGE" and "modify" mean?

assembler

Quote from: qWord on February 20, 2017, 06:41:39 AM
What do "HUGE" and "modify" mean?

"HUGE" means an extremely large array, for instance an array of millions of structures, as said earlier.
"modify" means "change": for instance, if some member of the structure at index 1005 was the dword 12, then after modification it is 31.

Raistlin

As I said: genius. qWord does have the crux of the matter. Multithreading makes sense when confronted by 'HUGE' data structures, provided huge = really huge. Else: think normal...

My 'HUGE' = MB per struct; otherwise, rather use a DBMS.

jj2007

Use one thread per core (more), and concentrate your learning efforts on SIMD instructions that can process your structures in parallel.
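
As a rough idea of what the SIMD suggestion looks like in practice, the sketch below adds a constant to an array of dwords four at a time with SSE2. It rests on an assumption that is not in the thread: the member being changed is stored contiguously (one array per member) rather than interleaved inside large structures. `add_to_all` is a hypothetical name, and the code is x86/x86-64 only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Adds `delta` to every 32-bit element of `values`.
void add_to_all(std::uint32_t* values, std::size_t count, std::uint32_t delta) {
    const __m128i d = _mm_set1_epi32(static_cast<int>(delta));  // 4 copies of delta
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i* p = reinterpret_cast<__m128i*>(values + i);
        __m128i v = _mm_loadu_si128(p);  // unaligned load of 4 dwords (movdqu)
        v = _mm_add_epi32(v, d);         // 4 additions in one instruction (paddd)
        _mm_storeu_si128(p, v);          // unaligned store of 4 dwords
    }
    for (; i < count; ++i) {
        values[i] += delta;              // scalar tail for the last 0..3 elements
    }
}
```

The intrinsics correspond to the `movdqu` and `paddd` instructions, which can be written directly in MASM. With the 12 to 31 example from earlier in the thread, `delta` would be 19. This can be combined with one thread per core, each thread running the SIMD loop over its own slice.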

Or go for the GPU stuff, but the learning curve will be steep.

qWord

Quote from: assembler on February 20, 2017, 06:44:38 AM
"HUGE" means an extremely large array, for instance an array of millions of structures, as said earlier.
"modify" means "change": for instance, if some member of the structure at index 1005 was the dword 12, then after modification it is 31.

How do you get the new values? Anyway, try it as proposed; it's quick to implement.

BTW: you can't expect people to have read all the posts you made in other threads before...

assembler

Quote from: qWord on February 20, 2017, 07:07:04 AM
BTW: you can't expect people to have read all the posts you made in other threads before...

What is the connection? My other threads don't relate to this thread at all.

qWord

Quote from: assembler on February 20, 2017, 07:39:52 AM
What is the connection? My other threads don't relate to this thread at all.
Just a misunderstanding of "as said earlier" on my side; ignore it.

assembler

Quote from: qWord on February 20, 2017, 07:46:47 AM
Just a misunderstanding of "as said earlier" on my side; ignore it.

I said "for instance an array of millions of structures, as said earlier".
And someone else had said earlier: "for example, two threads per physical core, each of them processing several million structures."

That is what I meant by "as said earlier"; I was not referring to my other, past threads in this forum.
Do you understand this now?