fsmVERTEX32 struct
    X real4 0.
    Y real4 0.
    Z real4 0.
    W real4 0.
fsmVERTEX32 ends
.code
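; lpVDest = lpVA - lpVB, component-wise on X, Y, Z, W
fsmVecSub proc uses esi edi lpVDest:dword,lpVA:dword,lpVB:dword
    mov esi,lpVA
    mov edi,lpVB
    mov eax,lpVDest
    movups xmm0,[esi]       ; load A (unaligned load, no alignment required)
    movups xmm1,[edi]       ; load B
    subps xmm0,xmm1         ; xmm0 = A - B, four floats at once
    movups [eax],xmm0       ; store the result
    ret
fsmVecSub endp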
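; lpVDest = lpVA + lpVB, component-wise on X, Y, Z, W
fsmVecAdd proc uses esi edi lpVDest:dword,lpVA:dword,lpVB:dword
    mov esi,lpVA
    mov edi,lpVB
    mov eax,lpVDest
    movups xmm0,[esi]       ; load A (unaligned load, no alignment required)
    movups xmm1,[edi]       ; load B
    addps xmm0,xmm1         ; xmm0 = A + B, four floats at once
    movups [eax],xmm0       ; store the result
    ret
fsmVecAdd endp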
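; lpVDest = lpVA * lpVB, component-wise on X, Y, Z, W (not a dot or cross product)
fsmVecMul proc uses esi edi lpVDest:dword,lpVA:dword,lpVB:dword
    mov esi,lpVA
    mov edi,lpVB
    mov eax,lpVDest
    movups xmm0,[esi]       ; load A (unaligned load, no alignment required)
    movups xmm1,[edi]       ; load B
    mulps xmm0,xmm1         ; xmm0 = A * B, four floats at once
    movups [eax],xmm0       ; store the result
    ret
fsmVecMul endp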
Has anyone done this already? If no one has, I'll do it.
After reading the SSE1-4 documentation, I think the FPU will stay on the chip for a long time, unless SSE gets geometry calculation instructions.
Wrapping basic instructions in a function, or even a macro, does not make much sense. Also, you are using the unaligned moves (movups), which are slower than their aligned counterparts (movaps).
A library should add functions that are not available through SSEx, e.g. sin, arcsin, exp, ln, ...
Some time back, Clive pointed out a good book about this topic: "Math Toolkit for Real-Time Programming", Jack W. Crenshaw
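To illustrate the alignment point, here is a minimal sketch of an aligned variant of the add routine. The name fsmVecAddA is hypothetical, and the sketch assumes the caller guarantees that all three pointers are 16-byte aligned (for example by placing the vertices after an align 16 directive, if the segment alignment allows it). It also assumes the same .xmm setup as the listing above.

```asm
; Hypothetical aligned variant of fsmVecAdd (sketch only).
; ASSUMPTION: lpVDest, lpVA and lpVB are all 16-byte aligned.
; movaps raises a general-protection fault on a misaligned address.
fsmVecAddA proc lpVDest:dword,lpVA:dword,lpVB:dword
    mov eax,lpVA
    mov edx,lpVB
    mov ecx,lpVDest         ; eax, ecx, edx are scratch, so no "uses" is needed
    movaps xmm0,[eax]       ; aligned load of A
    movaps xmm1,[edx]       ; aligned load of B
    addps xmm0,xmm1         ; xmm0 = A + B, four floats at once
    movaps [ecx],xmm0       ; aligned store of the result
    ret
fsmVecAddA endp
```

Whether this is measurably faster than the movups version depends on the CPU, and the call overhead is still there, so it is worth timing both before committing to one.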
Farabi,
Is this meant to add, subtract, multiply, etc. only two elements,
or an array of elements?
Quote from: RuiLoureiro on June 16, 2012, 11:34:54 PM
Farabi,
Is this meant to add, subtract, multiply, etc. only two elements,
or an array of elements?
Yeah, only for two vertices, where the members X, Y, Z and W are real4 numbers.
Quote from: qWord on June 16, 2012, 10:55:03 PM
Wrapping basic instructions in a function, or even a macro, does not make much sense. Also, you are using the unaligned moves (movups), which are slower than their aligned counterparts (movaps).
A library should add functions that are not available through SSEx, e.g. sin, arcsin, exp, ln, ...
Some time back, Clive pointed out a good book about this topic: "Math Toolkit for Real-Time Programming", Jack W. Crenshaw
Well, I think the library should save some people some time. If anyone has done it before, I would just be wasting time reinventing it.
Anyway, the code above is only part of it. I'm not done yet.
your code might be useful to show us newbies how to do it :P
but, i think what qWord and Rui are alluding to is that the advantage of SSE is "pipelining" operations
so - if you can do a bunch of something (a la Henry Ford), you get an advantage
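A rough sketch of what "doing a bunch" could look like: one call that adds whole arrays of vertices, so the call overhead is paid once per batch instead of once per vertex. The name fsmVecAddArray and the nCount parameter are hypothetical and not part of the code posted above. The sketch assumes the three arrays each hold nCount tightly packed fsmVERTEX32 entries and that the destination either does not overlap the sources or is exactly one of them.

```asm
; Hypothetical batch version of fsmVecAdd (sketch only).
; ASSUMPTION: each pointer addresses nCount packed fsmVERTEX32 entries.
fsmVecAddArray proc uses esi edi lpVDest:dword,lpVA:dword,lpVB:dword,nCount:dword
    mov esi,lpVA
    mov edi,lpVB
    mov eax,lpVDest
    mov ecx,nCount
    test ecx,ecx
    jz addDone              ; nothing to do for an empty array
addLoop:
    movups xmm0,[esi]       ; load A[i]
    movups xmm1,[edi]       ; load B[i]
    addps xmm0,xmm1         ; A[i] + B[i], four floats at once
    movups [eax],xmm0       ; store Dest[i]
    add esi,sizeof fsmVERTEX32   ; advance all three pointers by 16 bytes
    add edi,sizeof fsmVERTEX32
    add eax,sizeof fsmVERTEX32
    dec ecx
    jnz addLoop
addDone:
    ret
fsmVecAddArray endp
```

If the arrays are known to be 16-byte aligned, the movups instructions could be swapped for movaps, with the same caveat that a misaligned address will then fault.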
Ouch, my code is slower than the original "Hitchckr" vertex code.