Hi guys,
Does anyone have an example (ideally tested) of an SSE2 version of a null-terminated string copy function, for both ANSI and Unicode?
I have one (for ANSI) that I use to copy memory (and also fixed-length strings), adapted from JJ, but it only works for a fixed length, like this:
Proc memcpy_SSE_V3:
Arguments @pDest, @pSource, @Length
Uses esi, edi, ecx, edx, eax
mov edi D@pDest
mov esi D@pSource
; we copy the memory 128 bits (16 bytes) at a time
mov ecx D@Length
mov eax ecx | shr ecx 4 ; block count: divide the length by 16 (4 dwords)
jz L0> ; The memory size is smaller than 16 bytes. Jump over
; Now we must compute the remainder, to see how many times the byte loop will run
mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes
mov edx 0 ; here it is used as an index
L1:
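movdqu XMM1 X$esi+edx*8 ; load 4 dwords (16 bytes) from esi into register XMM1
movdqu X$edi+edx*8 XMM1 ; store the 4 dwords (16 bytes) from register XMM1 to edi
dec ecx ; one block less to copy
lea edx D$edx+2 ; index += 2 (2*8 = 16 bytes). lea does not change the flags
jnz L1< ; so this still tests the result of dec ecx
test eax eax | jz L4> ; No remainder? Exit
jmp L9> ; jump to the remainder copy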
L0:
; If we are here, it means that the data is smaller than 16 bytes, and we need to compute the remainder.
mov edx ecx | shl edx 4 | sub eax edx ; ecx is 0 here, so edx = 0 and eax keeps the whole length (0 to 15 bytes)
L2:
; If the length is not a multiple of 16 bytes (4 dwords) we may have some remainder here. So, just copy those bytes.
test eax eax | jz L4> ; No remainder? Exit
L9:
lea edi D$edi+edx*8 ; edx*8 = bytes already copied. Advance to that position
lea esi D$esi+edx*8 ; edx*8 = bytes already copied. Advance to that position
L3: movsb | dec eax | jnz L3<
L4:
EndP
I'm trying to do the same for null-terminated strings (ANSI and Unicode), but I'm clueless about how to make it work :(
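For reference, one common way to approach this is to compare each 16-byte block against zero (pcmpeqb for ANSI, pcmpeqw for Unicode), collect the result with pmovmskb, and store the block only while the mask is zero. Below is a minimal sketch of that idea, written in C with SSE2 intrinsics rather than RosAsm so it can be compiled and checked. The names strcpy_sse2 and wcscpy_sse2 are hypothetical, and "Unicode" is assumed to mean 16-bit UTF-16 units. The source is first brought to a 16-byte boundary with plain byte copies so that the aligned loads never cross a page boundary. They may still read a few bytes past the terminator inside the same block, which is harmless on x86 but goes beyond what the C standard guarantees.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies a NUL-terminated byte string.
   Returns the number of bytes copied, not counting the terminator. */
size_t strcpy_sse2(char *dst, const char *src)
{
    size_t i = 0;
    const __m128i zero = _mm_setzero_si128();

    assert(dst != NULL && src != NULL);

    /* Head: copy byte by byte until src+i is 16-byte aligned. */
    while (((uintptr_t)(src + i) & 15) != 0) {
        if ((dst[i] = src[i]) == '\0') return i;
        i++;
    }
    /* Body: aligned 16-byte loads (movdqa), unaligned stores (movdqu). */
    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)(src + i));
        /* pcmpeqb + pmovmskb: one mask bit per byte that equals zero. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero)) != 0) break;
        _mm_storeu_si128((__m128i *)(dst + i), chunk);
        i += 16;
    }
    /* Tail: the terminator is inside this block, finish byte by byte. */
    while ((dst[i] = src[i]) != '\0') i++;
    return i;
}

/* Same idea for 16-bit characters (assumed UTF-16 code units).
   Returns the number of 16-bit units copied, not counting the terminator. */
size_t wcscpy_sse2(uint16_t *dst, const uint16_t *src)
{
    size_t i = 0;
    const __m128i zero = _mm_setzero_si128();

    assert(dst != NULL && src != NULL);

    /* The block loop needs a 2-byte aligned source; otherwise only the
       scalar loop at the end is used. */
    if (((uintptr_t)src & 1) == 0) {
        while (((uintptr_t)(src + i) & 15) != 0) {
            if ((dst[i] = src[i]) == 0) return i;
            i++;
        }
        for (;;) {
            __m128i chunk = _mm_load_si128((const __m128i *)(src + i));
            /* pcmpeqw: compare 8 words at once against zero. */
            if (_mm_movemask_epi8(_mm_cmpeq_epi16(chunk, zero)) != 0) break;
            _mm_storeu_si128((__m128i *)(dst + i), chunk);
            i += 8;
        }
    }
    while ((dst[i] = src[i]) != 0) i++;
    return i;
}
```

As with the fixed-length version, the destination is assumed to be large enough for the whole string including its terminator. Only the loads depend on alignment, so the stores stay unaligned (movdqu) and the destination can sit at any address.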