I am not used to the three-operand AVX syntax (mnemonic ymm0,ymm1,ymm2) yet,
which is annoying.
outc db 32 dup("0")
asciimul dw 1000,100,10,1,1000,100,10,1,1000,100,10,1,1000,100,10,1
; digits = the ASCII input (32 bytes, 32-byte aligned because it is loaded with vmovaps)
lea eax, outc
lea ebx, asciimul
vmovaps ymm0, digits
vmovups ymm7, [eax]
vmovups ymm6, [ebx]
vpsubb ymm1, ymm0, ymm7
VPUNPCKLBW ymm1, ymm2, ymm0
; VPUNPCKHBW ymm1, ymm2, ymm0
vpmullw ymm4,ymm1,ymm6
VPHADDW ymm1,ymm2,ymm3
vmovd eax,xmm3
Hello sir daydreamer,
I tested your AVX code and it's not giving the correct value.
I can try that for you, no problem. Can you write an xmm version? That would make it easier to translate to an AVX version.
Well, I do not understand that "add eax,1". I suppose the ymm2 (xmm2) register should be initialized with zeros?
Your idea is a strong candidate for a 64-bit version.
On the other hand, I was thinking of a lookup table of 128 KB. That is larger than the usual 32 KB to 48 KB L1 data cache, but it should stay resident in L2 on most processors, and it could give a faster result for the 8 million conversions.
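To make the lookup-table idea concrete, here is a minimal sketch in Python. The post does not say how the table is laid out, so the layout is an assumption: 65536 entries of 2 bytes each (128 KB), indexed by the 16-bit value of two ASCII characters read little-endian. The names build_pair_table, PAIR_TABLE and four_digits are made up for the illustration.

```python
import array

def build_pair_table():
    # Assumed layout: one 16-bit entry per possible pair of bytes.
    # 65536 entries * 2 bytes = 128 KB.
    table = array.array("H", [0]) * 65536
    for tens in range(10):
        for ones in range(10):
            # Little-endian 16-bit load: first character lands in the low byte.
            index = (0x30 + tens) | ((0x30 + ones) << 8)
            table[index] = tens * 10 + ones
    return table

PAIR_TABLE = build_pair_table()

def four_digits(text):
    # Two table lookups replace the subtract / multiply / add chain.
    lo = text[0] | (text[1] << 8)   # characters 0 and 1
    hi = text[2] | (text[3] << 8)   # characters 2 and 3
    return PAIR_TABLE[lo] * 100 + PAIR_TABLE[hi]
```

With this layout, four_digits(b"1234") returns 1234. Only 100 of the 65536 entries are ever filled, so the working set that actually gets touched is far smaller than the table itself.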
I posted my SSE version earlier. In AVX an operation takes two source registers plus a separate destination, so the previous value is preserved, unlike for example the SSE form mulps reg1,reg2, which overwrites reg1.
Corrected code.
vpsubb works.
The unpack puts wrong values in the high bytes of the unpacked words.
I prefer the floating-point SIMD instructions; they don't lack things like divps, rcpps and sqrtps.
AVX syntax: xmm1 = xmm2 + xmm3 is written vaddps xmm1,xmm2,xmm3
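Almost there:
lea eax, outc
lea ebx, asciimul
vmovaps ymm0, digits        ; load 32 ASCII digits
vmovups ymm7, [eax]         ; 32 x "0"
vmovups ymm6, [ebx]         ; 1000,100,10,1 ...
vpxor ymm5, ymm5, ymm5      ; zero
vpsubb ymm1, ymm0, ymm7     ; ASCII -> digit values 0..9
vpunpcklbw ymm2, ymm1, ymm5 ; zero-extend the low 8 bytes of each 128-bit lane to words
; vpunpckhbw ymm1, ymm2, ymm0
vpmullw ymm4, ymm2, ymm6    ; digit * weight
vphaddw ymm3, ymm4, ymm4    ; add adjacent pairs of words
vphaddw ymm3, ymm3, ymm3    ; each 4-digit group is now summed into one word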
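For anyone who wants to check the steps without an AVX2 machine, here is a rough Python model of the sequence above. It is only a sketch: the helper functions are named after the instructions they imitate, almost_there is a made-up name, and the model assumes the documented per-128-bit-lane behaviour of the 256-bit vpunpcklbw and vphaddw.

```python
ZERO = [0] * 32                     # ymm5 after vpxor
WEIGHTS = [1000, 100, 10, 1] * 4    # asciimul, 16 words

def vpsubb(a, b):
    # Byte-wise subtract, wrapping at 8 bits.
    return [(x - y) & 0xFF for x, y in zip(a, b)]

def vpunpcklbw(a, b):
    # Per 128-bit lane: interleave the low 8 bytes of a (low byte of each
    # word) with the low 8 bytes of b (high byte of each word).
    words = []
    for lane in (0, 16):
        for i in range(8):
            words.append(a[lane + i] | (b[lane + i] << 8))
    return words

def vpmullw(a, b):
    # Word-wise multiply, keeping the low 16 bits.
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]

def vphaddw(a, b):
    # Per 128-bit lane: four pair sums from a, then four pair sums from b.
    out = []
    for lane in (0, 8):
        for src in (a, b):
            for i in range(0, 8, 2):
                out.append((src[lane + i] + src[lane + i + 1]) & 0xFFFF)
    return out

def almost_there(digits):
    d = vpsubb(list(digits), [ord("0")] * 32)
    w = vpunpcklbw(d, ZERO)
    p = vpmullw(w, WEIGHTS)
    h = vphaddw(p, p)
    return vphaddw(h, h)            # 16 words, the model of ymm3
```

Feeding it b"12345678" + b"0" * 8 + b"87654321" + b"0" * 8 gives 1234 and 5678 in words 0 and 1, and 8765 and 4321 in words 8 and 9. In this model only input bytes 0 to 7 and 16 to 23 reach the result, because the unpack works on each 128-bit lane separately.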
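Unrolled AVX2 version: loads 32 bytes and is unrolled twice:
lea eax, outc
lea ebx, asciimul
vmovaps ymm0, digits         ; load 32 ASCII digits
vmovups ymm7, [eax]          ; 32 x "0"
vmovups ymm6, [ebx]          ; 1000,100,10,1 ...
vpxor ymm2, ymm2, ymm2
vpxor ymm5, ymm5, ymm5       ; zero
vpsubb ymm1, ymm0, ymm7      ; ASCII -> digit values 0..9
vextractf128 xmm2, ymm1, 1   ; split into two 128-bit registers (xmm2 = upper half)
vpunpcklbw ymm1, ymm1, ymm5  ; bytes -> words, low 8 bytes of each 128-bit lane
vpunpcklbw ymm2, ymm2, ymm5
; vpunpckhbw ymm1, ymm2, ymm0
vpmullw ymm1, ymm1, ymm6     ; digit * weight
vpmullw ymm2, ymm2, ymm6
vphaddw ymm3, ymm1, ymm1     ; add adjacent pairs of words
vphaddw ymm3, ymm3, ymm3     ; 4-digit sums from ymm1
vphaddw ymm4, ymm2, ymm2
vphaddw ymm4, ymm4, ymm4     ; 4-digit sums from ymm2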
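A quick way to see which of the 32 input bytes the unrolled version actually reaches is to track byte positions instead of values. The Python sketch below does that, assuming the documented lane behaviour of vextractf128 and the 256-bit vpunpcklbw. The last part models vpmovzxbw ymm, xmm (an AVX2 instruction that zero-extends 16 bytes to 16 words) as one possible alternative; that is my suggestion, not something from the posts above, and all function names are made up.

```python
def vextractf128_high(src):
    # vextractf128 xmm, ymm, 1: the upper 16 bytes move to the low half.
    # The upper half of the destination reads as zero (None = no input byte).
    return src[16:32] + [None] * 16

def vpunpcklbw_sources(src):
    # 256-bit vpunpcklbw takes the low 8 bytes of each 128-bit lane.
    return src[0:8] + src[16:24]

def vpmovzxbw_sources(src):
    # vpmovzxbw ymm, xmm takes the low 16 bytes in order.
    return src[0:16]

ymm1 = list(range(32))              # position of each input byte after vpsubb
ymm2 = vextractf128_high(ymm1)

# The sequence as posted: vpunpcklbw on ymm1 and on ymm2.
posted = vpunpcklbw_sources(ymm1) + vpunpcklbw_sources(ymm2)
covered = sorted({i for i in posted if i is not None})
missing = [i for i in range(32) if i not in covered]

# Hypothetical alternative: vpmovzxbw on each 16-byte half.
alternative = vpmovzxbw_sources(ymm1) + vpmovzxbw_sources(ymm2)
alt_covered = sorted({i for i in alternative if i is not None})
```

In this model covered comes out as positions 0 to 7 and 16 to 23, missing as 8 to 15 and 24 to 31, and alt_covered as all 32 positions. The vpmullw and vphaddw steps would not need to change with the alternative, since the 1000,100,10,1 pattern repeats every four words and each 4-digit group stays inside one lane.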