The MASM Forum

General => The Campus => Topic started by: assembler on February 20, 2017, 02:51:55 AM

Title: How to use the GPU for fast work with huge arrays of large structures
Post by: assembler on February 20, 2017, 02:51:55 AM
I have a huge array of large structures, and I want my program to modify every structure in it. I have a procedure that modifies one structure the way I want. I can use the .while loop macro to iterate through all the structures in the array and call this procedure for each one, passing the address of the structure, but iterating this way is very slow. The loop, cmp, conditional jump and add instructions are not fast enough either. CreateThread is not helpful, because the array is HUGE, and I can't use CreateThread for each structure in the array; my CPU has only a few cores. I need to use the GPU here to do this task quickly, because the GPU has thousands of cores and thousands of threads, so I can use one GPU thread for each structure in the array, but the problem is that I don't know how to use the GPU at all.

Suppose that "foo" is the procedure that modifies one structure in the array, and "bar" is the procedure that should invoke "foo" for each structure in the array, and I want to invoke "bar" this way:

invoke bar, foo, myArray, number_of_structures_in_my_array, size_of_each_structure_in_my_array

The first argument is always the address of the procedure to invoke, the second is the address of the array, the third is the length of the array (i.e. the number of structures in it), and the fourth is the size of each structure in bytes.
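
For readers who want to see the interface spelled out, here is a minimal sequential sketch of it, written in C++ rather than MASM so it stays portable. The `Record` layout, the `ModifyProc` type name and the body of `foo` are assumptions made up for illustration; the thread never shows the real structure or procedure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structure layout; the real one is not given in the thread.
struct Record {
    std::uint32_t value;
    std::uint8_t  payload[60];
};

// Callback type playing the role of "foo": it receives the address of one structure.
using ModifyProc = void (*)(void* element);

// Stand-in for "foo" (assumed behaviour): bump one dword member.
void foo(void* element) {
    static_cast<Record*>(element)->value += 1;
}

// "bar": calls proc once per structure, advancing elem_size bytes each step.
void bar(ModifyProc proc, void* array, std::size_t count, std::size_t elem_size) {
    assert(proc != nullptr && elem_size > 0);
    auto* p = static_cast<unsigned char*>(array);
    for (std::size_t i = 0; i < count; ++i, p += elem_size) {
        proc(p);
    }
}
```

A call such as `bar(foo, records, n, sizeof(Record))` mirrors the `invoke` line above. The per-element work inside `foo`, not the loop overhead, usually dominates the running time of a loop like this.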

Again, I can easily implement the "bar" procedure with the .while macro, the loop instruction, or cmp, conditional jump and add instructions, but this is slow and inefficient for huge arrays of large structures. To make "bar" fast in every case, even for huge arrays of large structures, "bar" must invoke the GPU in some way, but I don't know how to do this. If a procedure like "bar" already exists, please tell me about it and how I can use it in my programs. I will thank anybody who helps me.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: hutch-- on February 20, 2017, 03:44:00 AM
I imagine you would need to obtain the GPU libraries for the video card you are using and the OS version you are using and learn how to use them. GPU libraries are not x86 assembler code and unless you have access to the type of code in the libraries, you will have to use what the vendor provides.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: jj2007 on February 20, 2017, 04:28:54 AM
http://www.iwocl.org/resources/opencl-libraries-and-toolkits/ should give you some ideas. One question is what to do if your machine is not running on NVIDIA ::)
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: assembler on February 20, 2017, 04:51:23 AM
Quote from: jj2007 on February 20, 2017, 04:28:54 AM
http://www.iwocl.org/resources/opencl-libraries-and-toolkits/ should give you some ideas. One question is what to do if your machine is not running on NVIDIA ::)

My machine does run on Nvidia, and I don't know what to do on the website your link leads to.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: qWord on February 20, 2017, 04:57:25 AM
Quote from: assembler on February 20, 2017, 02:51:55 AM
CreateThread is not helpful, because the array is HUGE, and I can't use CreateThread for each structure in the array; my CPU has only a few cores.
It is a common strategy to use a few threads and partition the array accordingly. For example, two threads per physical core, each of them processing several million structures.
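
A minimal sketch of that partitioning strategy, in portable C++ with `std::thread` instead of CreateThread. The names `bar_threaded`, `ModifyProc` and `increment_dword` are hypothetical, and the sketch assumes the per-structure procedure touches only its own structure, so the chunks can be processed independently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Callback type: receives the address of one structure.
using ModifyProc = void (*)(void* element);

// Example callback (assumed): treats the element as a dword and increments it.
void increment_dword(void* element) {
    ++*static_cast<std::uint32_t*>(element);
}

// Splits the array into contiguous chunks and gives each chunk to one thread.
void bar_threaded(ModifyProc proc, void* array, std::size_t count,
                  std::size_t elem_size, unsigned num_threads) {
    assert(proc != nullptr && elem_size > 0);
    if (num_threads == 0) num_threads = 1;
    auto* base = static_cast<unsigned char*>(array);
    // Chunk length rounded up so that every element is covered.
    const std::size_t chunk = (count + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = std::min(count, t * chunk);
        const std::size_t last  = std::min(count, first + chunk);
        if (first == last) break;  // nothing left to hand out
        workers.emplace_back([=] {
            for (std::size_t i = first; i < last; ++i) {
                proc(base + i * elem_size);
            }
        });
    }
    for (auto& w : workers) w.join();  // wait until every chunk is done
}
```

The thread count is left as a parameter; `std::thread::hardware_concurrency()` is one way to pick it, though it reports logical rather than physical cores and may return 0 when the value is unknown.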
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: assembler on February 20, 2017, 06:10:46 AM
Quote from: qWord on February 20, 2017, 04:57:25 AM
Quote from: assembler on February 20, 2017, 02:51:55 AM
CreateThread is not helpful, because the array is HUGE, and I can't use CreateThread for each structure in the array; my CPU has only a few cores.
It is a common strategy to use a few threads and partition the array accordingly. For example, two threads per physical core, each of them processing several million structures.
Are you serious?
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: Raistlin on February 20, 2017, 06:34:07 AM
He/she is probably correct (unsure of motive vs. intellect) :icon_rolleyes: but it is normal for server-type implementations, e.g. multi-server queries from clients. Think grid. Possibly, thinking about it, an actual genius answer.....
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: qWord on February 20, 2017, 06:41:39 AM
Quote from: assembler on February 20, 2017, 06:10:46 AM
Quote from: qWord on February 20, 2017, 04:57:25 AM
Quote from: assembler on February 20, 2017, 02:51:55 AM
CreateThread is not helpful, because the array is HUGE, and I can't use CreateThread for each structure in the array; my CPU has only a few cores.
It is a common strategy to use a few threads and partition the array accordingly. For example, two threads per physical core, each of them processing several million structures.
Are you serious?
What do "HUGE" and "modify" mean?
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: assembler on February 20, 2017, 06:44:38 AM
Quote from: qWord on February 20, 2017, 06:41:39 AM
Quote from: assembler on February 20, 2017, 06:10:46 AM
Quote from: qWord on February 20, 2017, 04:57:25 AM
Quote from: assembler on February 20, 2017, 02:51:55 AM
CreateThread is not helpful, because the array is HUGE, and I can't use CreateThread for each structure in the array; my CPU has only a few cores.
It is a common strategy to use a few threads and partition the array accordingly. For example, two threads per physical core, each of them processing several million structures.
Are you serious?
What do "HUGE" and "modify" mean?
"HUGE" means an extremely large array, for instance an array of millions of structures, as said earlier.
"modify" means "change": for instance, if some member of the structure at index 1005 was the dword 12, after modification it is 31.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: Raistlin on February 20, 2017, 06:49:06 AM
As I said - genius. qWord does have the crux of the matter. Multithreading makes sense when confronted by 'HUGE' data structures, provided huge = really huge.
Else: think normal....

My HUGE = MB per struct; otherwise, rather use a DBMS.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: jj2007 on February 20, 2017, 06:56:04 AM
Use one thread per core (more (http://superuser.com/questions/663166/single-threaded-qaud-core-v-s-hyper-threading-dual-core)), and concentrate your learning efforts on SIMD instructions that can process your structures in parallel.

Or go for the GPU stuff, but the learning curve will be steep.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: qWord on February 20, 2017, 07:07:04 AM
Quote from: assembler on February 20, 2017, 06:44:38 AM
"HUGE" means an extremely large array, for instance an array of millions of structures, as said earlier.
"modify" means "change": for instance, if some member of the structure at index 1005 was the dword 12, after modification it is 31.
How do you get the new values? Anyway, try it as proposed - it's quickly implemented.

BTW: you can't expect that people read all the posts you did in other threads before...
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: assembler on February 20, 2017, 07:39:52 AM
Quote from: qWord on February 20, 2017, 07:07:04 AM
Quote from: assembler on February 20, 2017, 06:44:38 AM
"HUGE" means an extremely large array, for instance an array of millions of structures, as said earlier.
"modify" means "change": for instance, if some member of the structure at index 1005 was the dword 12, after modification it is 31.
How do you get the new values? Anyway, try it as proposed - it's quickly implemented.

BTW: you can't expect that people read all the posts you did in other threads before...

Quote
BTW: you can't expect that people read all the posts you did in other threads before...

What is the connection? My other threads don't relate to this thread at all.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: qWord on February 20, 2017, 07:46:47 AM
Quote from: assembler on February 20, 2017, 07:39:52 AM
What is the connection? My other threads don't relate to this thread at all.
just a misunderstanding of "as said earlier" from my side - ignore it.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: assembler on February 20, 2017, 08:40:14 AM
Quote from: qWord on February 20, 2017, 07:46:47 AM
Quote from: assembler on February 20, 2017, 07:39:52 AM
What is the connection? My other threads don't relate to this thread at all.
just a misunderstanding of "as said earlier" from my side - ignore it.

I said "for instance an array of millions of structures, as said earlier".
And someone else said earlier, "For example, two threads per physical core, each of them processing several million structures."

That is what I meant by "as said earlier"; I was not referring to my other, past threads in this forum.
Do you understand this now?
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: jj2007 on February 20, 2017, 10:06:09 AM
Quote from: assembler on February 20, 2017, 02:51:55 AM
I have a huge array of large structures, and I want my program to modify every structure in it. I have a procedure that modifies one structure the way I want. I can use the .while loop macro to iterate through all the structures in the array and call this procedure for each one, passing the address of the structure, but iterating this way is very slow. The loop, cmp, conditional jump and add instructions are not fast enough either.

OK, maybe your program is so slow that it takes several days to process the whole dataset. Does that justify investing several weeks into the mysteries of GPU programming?

It would be helpful if you replaced "HUGE" etc with real figures. How big is your dataset, which operations do you need, and how long does it currently take? Did you already use the relevant parallel SIMD instructions?
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: hutch-- on February 20, 2017, 11:29:56 AM
assembler,

Let's be clear about what is going on here: you have posted no code and no viable description of what you are trying to do, and you expect members to go and read your posting as if you are doing them a favour. Filtering the data through your interpretation and opinions makes it impossible for anyone to respond accurately. Your bad manners in response to members trying to help you will only end up with one result: the pleasure of finding another venue to act in an arrogant manner. You have had strike 2; there will be no strike 3.
Title: Re: How to use the GPU for fast work with huge arrays of large structures
Post by: K_F on March 02, 2017, 04:45:43 AM
I've been looking at this for a while now, and this is the lowdown.

Nvidia and AMD (not sure here) do not give out register specifics and structures, nor their GPU asm instruction mnemonics.
There is no real specific assembler, only a C compiler (and other shyt stuff), AFAIK, for the GPUs.

All the dev kits are in C (ugh), and they want you to use their pre-compiled libraries, so if you're willing to play around with C and then try to decipher the opcodes from there... good luck, as the GPU structures and opcodes seem to change with each generation of GPU. :eusa_clap: