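fAddGPU proc uses esi edi lpVertexDest:dword,lpVertexA:dword,lpVertexB:dword
LOCAL m[16]:dword                 ; receives the 4x4 modelview matrix (column-major)
mov esi,lpVertexA
mov edi,lpVertexB
invoke glMatrixMode,GL_MODELVIEW  ; push/load/translate act on the current mode; leaves GL_MODELVIEW current
invoke glPushMatrix               ; save the caller's modelview matrix
invoke glLoadIdentity             ; M = I
invoke glTranslatef,[esi].VERTEX.x,[esi].VERTEX.y,[esi].VERTEX.z   ; M = T(A)
invoke glTranslatef,[edi].VERTEX.x,[edi].VERTEX.y,[edi].VERTEX.z   ; M = T(A)*T(B), translation column = A+B
invoke glGetFloatv,GL_MODELVIEW_MATRIX,addr m
invoke glPopMatrix                ; restore the caller's modelview matrix
lea edi,m
add edi,12*4                      ; m[12..15] = A.x+B.x, A.y+B.y, A.z+B.z, 1.0
mov esi,lpVertexDest
invoke MemCopy,edi,esi,4*4        ; masm32 MemCopy(Source, Dest, bytes): copies x, y, z, w; use 3*4 if VERTEX has only x, y, z
ret
fAddGPU endp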
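To see why the sum lands in elements 12 to 14, here is a plain C model of what the fixed-function calls compute; it does not call OpenGL. OpenGL returns matrices in column-major order, and glTranslatef multiplies the current matrix on the right by a translation matrix, so two translations applied to the identity leave A+B in the fourth column. The names `Mat4`, `mat4_translate`, `add_via_translation` and `Vertex4` are hypothetical, and the four-float `x, y, z, w` layout is only an assumption about `VERTEX`.

```c
#include <string.h>

/* Column-major 4x4 matrix, same element order as GL_MODELVIEW_MATRIX:
   element (row r, column c) is stored at m[c*4 + r]. */
typedef struct { float m[16]; } Mat4;

/* Hypothetical vertex layout: four floats, 16 bytes, no padding. */
typedef struct { float x, y, z, w; } Vertex4;

static void mat4_identity(Mat4 *a) {
    memset(a->m, 0, sizeof a->m);
    a->m[0] = a->m[5] = a->m[10] = a->m[15] = 1.0f;
}

/* a = a * b, both column-major (what the fixed-function pipeline does). */
static void mat4_mul(Mat4 *a, const Mat4 *b) {
    Mat4 r;
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a->m[k * 4 + row] * b->m[c * 4 + k];
            r.m[c * 4 + row] = s;
        }
    *a = r;
}

/* Models glTranslatef: current = current * T(x, y, z).
   T is the identity with x, y, z in elements 12, 13, 14. */
static void mat4_translate(Mat4 *a, float x, float y, float z) {
    Mat4 t;
    mat4_identity(&t);
    t.m[12] = x;
    t.m[13] = y;
    t.m[14] = z;
    mat4_mul(a, &t);
}

/* Same steps as fAddGPU: identity, translate by A, translate by B,
   then read elements 12..15 (A.x+B.x, A.y+B.y, A.z+B.z, 1.0). */
static Vertex4 add_via_translation(Vertex4 a, Vertex4 b) {
    Mat4 m;
    mat4_identity(&m);
    mat4_translate(&m, a.x, a.y, a.z);
    mat4_translate(&m, b.x, b.y, b.z);
    Vertex4 out;
    memcpy(&out, &m.m[12], sizeof out);
    return out;
}
```

Each addition here costs two full 4x4 matrix products (128 multiplies) plus the copy, which is one plausible reason why a vector add done this way gains so little compared with a plain three-float add.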
I'm curious, so I'm testing the speed difference between the CPU and the GPU when they do the same task on their different processors. For multiplying a vector, the GPU is far faster than the CPU, about 60 times faster, but in this test, where I add two vectors, the GPU's advantage is much smaller. It may well be that my code is not optimized enough, but the result might still help you analyze your own algorithm.
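For the CPU side of the comparison, a minimal C baseline could look like the sketch below. `Vec4`, `vec_add_cpu` and `time_vec_add_cpu` are hypothetical names, and the four-float layout is again an assumption. It is also worth keeping in mind that, in typical drivers, the legacy matrix-stack calls used by fAddGPU (glPushMatrix, glTranslatef, glGetFloatv) are evaluated by the driver on the CPU, so that routine probably measures API call overhead more than GPU arithmetic; I have not measured this.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical vertex layout: four floats. */
typedef struct { float x, y, z, w; } Vec4;

/* dst[i] = a[i] + b[i] on x, y, z; w is set to 1.0 to match what
   fAddGPU copies out of element 15 of the modelview matrix. */
static void vec_add_cpu(Vec4 *dst, const Vec4 *a, const Vec4 *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i].x = a[i].x + b[i].x;
        dst[i].y = a[i].y + b[i].y;
        dst[i].z = a[i].z + b[i].z;
        dst[i].w = 1.0f;
    }
}

/* Returns processor time in seconds for `reps` passes over n vertices. */
static double time_vec_add_cpu(Vec4 *dst, const Vec4 *a, const Vec4 *b,
                               size_t n, int reps) {
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        vec_add_cpu(dst, a, b, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare fairly, time the same number of additions with fAddGPU and divide both totals by the vertex count. With optimization enabled, a compiler may merge the identical repeated passes, so check the generated code or vary the inputs between passes.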
Yesterday I was playing with GPU assembler instructions, and I hope that within 2 months I will be able to write simple code and understand how to obtain the output value of each GPU program.