SSE Math Library

Started by Farabi, June 16, 2012, 06:22:27 PM

Farabi




fsmVERTEX32 struct
    X real4 0.
    Y real4 0.
    Z real4 0.
    W real4 0.
fsmVERTEX32 ends


.code

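fsmVecSub proc uses esi edi lpVDest:dword,lpVA:dword,lpVB:dword
    ; [lpVDest] = [lpVA] - [lpVB], each pointer addresses one fsmVERTEX32

    mov esi,lpVA
    mov edi,lpVB
    mov eax,lpVDest

    movups xmm0,[esi]       ; load A (unaligned load, any address works)
    movups xmm1,[edi]       ; load B
    subps xmm0,xmm1         ; xmm0 = A - B for all four components
    movups [eax],xmm0       ; store the result

    ret
fsmVecSub endp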

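fsmVecAdd proc uses esi edi lpVDest:dword,lpVA:dword,lpVB:dword
    ; [lpVDest] = [lpVA] + [lpVB], each pointer addresses one fsmVERTEX32

    mov esi,lpVA
    mov edi,lpVB
    mov eax,lpVDest

    movups xmm0,[esi]       ; load A (unaligned load, any address works)
    movups xmm1,[edi]       ; load B
    addps xmm0,xmm1         ; xmm0 = A + B for all four components
    movups [eax],xmm0       ; store the result

    ret
fsmVecAdd endp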

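fsmVecMul proc uses esi edi lpVDest:dword,lpVA:dword,lpVB:dword
    ; [lpVDest] = [lpVA] * [lpVB], component by component (not a dot or cross product)

    mov esi,lpVA
    mov edi,lpVB
    mov eax,lpVDest

    movups xmm0,[esi]       ; load A (unaligned load, any address works)
    movups xmm1,[edi]       ; load B
    mulps xmm0,xmm1         ; xmm0 = A * B for all four components
    movups [eax],xmm0       ; store the result

    ret
fsmVecMul endp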



Has anyone done this already? If no one has, I'll do it.
http://farabidatacenter.url.ph/MySoftware/
My 3D Game Engine Demo.

Contact me at Whatsapp: 6283818314165

Farabi

After reading the SSE1-4 documentation, I think the FPU will stay on the chip for a long time, unless SSE gets geometry calculation instructions.

qWord

Wrapping basic instructions in a function or even a macro does not make much sense. Also, you are using the unaligned instructions (movups), which are slower than their aligned counterparts (movaps).
A library should add functions that are not available through SSEx, e.g. sin, arcsin, exp, ln, ...
Some time back, Clive pointed out a good book on this topic: "Math Toolkit for Real-Time Development" by Jack W. Crenshaw.
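
The alignment point can be sketched as follows. This is only an illustration, not code from the thread: fsmVecAddA, vA, vB and vR are hypothetical names, and it assumes .686 and .xmm are enabled, the fsmVERTEX32 structure from the first post is defined, the data segment permits 16-byte alignment, and every pointer passed in is 16-byte aligned.

```masm
.data
align 16                                ; fsmVERTEX32 is 16 bytes, so the
vA  fsmVERTEX32 <1.0, 2.0, 3.0, 1.0>    ; vertices that follow stay aligned
vB  fsmVERTEX32 <0.5, 0.5, 0.5, 0.0>
vR  fsmVERTEX32 <>

.code
; Hypothetical aligned variant of fsmVecAdd.
; movaps faults on an address that is not a multiple of 16,
; so callers must guarantee the alignment.
fsmVecAddA proc lpVDest:dword, lpVA:dword, lpVB:dword
    mov ecx, lpVA
    mov edx, lpVB
    mov eax, lpVDest

    movaps xmm0, [ecx]      ; aligned load of A
    movaps xmm1, [edx]      ; aligned load of B
    addps  xmm0, xmm1       ; four single-precision adds at once
    movaps [eax], xmm0      ; aligned store of the result

    ret
fsmVecAddA endp
```

Memory that comes from a general-purpose allocator is not guaranteed to be 16-byte aligned, which is one reason a library might keep the movups versions as the safe default.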
MREAL macros - when you need floating point arithmetic while assembling!

RuiLoureiro

Farabi,
            Is it meant to add, sub, mul etc. only 2 elements,
            or an array of elements?

Farabi

Quote from: RuiLoureiro on June 16, 2012, 11:34:54 PM
Farabi,
            Is it meant to add, sub, mul etc. only 2 elements,
            or an array of elements?

Yeah, only for 2 vertices, where the members are the X, Y, Z, W real4 values.
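
A call on two vertices might look like the sketch below. The variable names vA, vB and vSum are made up for illustration, and it assumes the fsmVERTEX32 structure and fsmVecAdd from the first post are defined (or prototyped) before the invoke, which itself has to sit inside a procedure or after the program's entry label.

```masm
.data
vA   fsmVERTEX32 <1.0, 2.0, 3.0, 1.0>   ; X, Y, Z, W
vB   fsmVERTEX32 <4.0, 5.0, 6.0, 0.0>
vSum fsmVERTEX32 <>                      ; members take the default 0.

.code
    ; vSum = vA + vB, component by component:
    ; X = 5.0, Y = 7.0, Z = 9.0, W = 1.0
    invoke fsmVecAdd, addr vSum, addr vA, addr vB
```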

Farabi

Quote from: qWord on June 16, 2012, 10:55:03 PM
Wrapping basic instructions in a function or even a macro does not make much sense. Also, you are using the unaligned instructions (movups), which are slower than their aligned counterparts (movaps).
A library should add functions that are not available through SSEx, e.g. sin, arcsin, exp, ln, ...
Some time back, Clive pointed out a good book on this topic: "Math Toolkit for Real-Time Development" by Jack W. Crenshaw.

Well, I think the library should save some people some time. If anyone has done it before, I would just be wasting time reinventing it.

Anyway, the code above is only part of it. I'm not done yet.

dedndave

your code might be useful to show us newbies how to do it   :P

but, i think what qWord and Rui are alluding to is that the advantage of SSE is "pipelining" operations
so - if you can do a bunch of something (a la Henry Ford), you get an advantage
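
One way to read the "do a bunch of something" idea is a routine that walks whole arrays of vertices in a single call, so the call overhead is paid once rather than per vertex. The sketch below is hypothetical (fsmVecAddArray and its nCount parameter are not from the thread) and assumes .686 and .xmm are enabled and the fsmVERTEX32 structure from the first post is defined.

```masm
; Hypothetical batch routine: lpDest[i] = lpA[i] + lpB[i] for nCount vertices.
; Unaligned moves are kept so that any array address is accepted.
fsmVecAddArray proc uses esi edi lpDest:dword, lpA:dword, lpB:dword, nCount:dword
    mov esi, lpA
    mov edi, lpB
    mov eax, lpDest
    mov ecx, nCount
    test ecx, ecx
    jz vecDone                      ; nothing to do for an empty array

vecLoop:
    movups xmm0, [esi]              ; load A[i]
    movups xmm1, [edi]              ; load B[i]
    addps  xmm0, xmm1               ; A[i] + B[i], four components at once
    movups [eax], xmm0              ; store into Dest[i]

    add esi, sizeof fsmVERTEX32     ; advance all three pointers by 16 bytes
    add edi, sizeof fsmVERTEX32
    add eax, sizeof fsmVERTEX32
    dec ecx
    jnz vecLoop

vecDone:
    ret
fsmVecAddArray endp
```

Whether this is actually faster than per-vertex calls would need to be timed on the target machine.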

Farabi

Ouch, my code is slower than the original "Hitchckr" vertex code.