News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

ASM for FUN NEW step #1

Started by frktons, November 09, 2021, 09:04:58 AM

Previous topic - Next topic

frktons

Hello.

I have just created a starting point for validating the ALGO to convert
some millions strings into numbers.

Here is the example I used in PowerBasic, that is considered quite a good Compiler,
and the results I got. From here the ASM versions should start and confront the speed
against it.

The Code I used, quite simple:

QuoteSUB Convert
   
    TIX CycleCount
   
    FOR Counter = 1 TO Indice
   
       MyNumbers(Counter) = VAL(MyStrings(Counter))
   
    NEXT
   
    TIX END CycleCount
   
END SUB   

The strings are 7,110,268 in the string array.
The CPU tix used: 2,607,804,194

Let's see how we can speed it up with some ASM code.  :eusa_dance:

The file that I use for the test is quite big > 300 MB. They are real data from
S&P 500 quotations from 1997 up to now with one minute lapse.

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

jj2007

Try the sval("123") macro, it's part of masm32rt.inc:

include \masm32\include\masm32rt.inc

.data
x123 db "-12345", 0

.code
start:
  MsgBox 0, cat$(str$(sval(offset x123)), " is the value"), "val is great:", MB_OK
  exit

end start


MasmBasic Val(a string) is probably a bit slower (it's an allrounder that reads dec, hex, bin in various flavours).

mineiro

This algo only deals with non signed numbers.

;first approach
;"1234" == 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0 == 1000+200+30+4
mov eax,"1234"          ;30303030h to 39393939h
xor eax,"0000"          ;sub "0000"
xor ecx,ecx
xor edx,edx
xor ebx,ebx

shld ecx,eax,8          ;"1"
shl eax,8               ;"234"
lea ebx,[ecx*8]         ;1*8=8
lea ebx,[ebx+ecx*2]     ;8+(1*2)=10

xor ecx,ecx
shld ecx,eax,8          ;"2"
shl eax,8               ;"34"
add ebx,ecx             ;10+2=12
lea edx,[ebx*8]         ;12*8=96
lea edx,[edx+ebx*2]     ;96+(12*2)=120

xor ecx,ecx
xor ebx,ebx
shld ecx,eax,8          ;"3"
shl eax,8               ;"4"
add edx,ecx             ;120+3=123
lea ebx,[edx*8]         ;123*8=984
lea ebx,[ebx+edx*2]     ;984+(123*2)=1230

xor ecx,ecx
shld ecx,eax,8          ;"4"
add ebx,ecx             ;1230+4=1234
;ebx="1234" converted to binary equals to 4d2h

You can create a lookup table with all possible combinations from 0000 to 9999 instead of do calculus of each number each time. Will have a execution time gain in 1 million attempts.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

frktons

Quote from: jj2007 on November 09, 2021, 10:53:52 AM
Try the sval("123") macro, it's part of masm32rt.inc:

include \masm32\include\masm32rt.inc

.data
x123 db "-12345", 0

.code
start:
  MsgBox 0, cat$(str$(sval(offset x123)), " is the value"), "val is great:", MB_OK
  exit

end start


MasmBasic Val(a string) is probably a bit slower (it's an allrounder that reads dec, hex, bin in various flavours).

For didactical purpose I'd like to use just pure and simple ASM opcodes, registers, jumps, moves, adds and so on.
Using a MACRO or a FUNCTION that somebody else has assembled can just be done to test the efficiency and speed
but it's not fully satisfying for my thirst of ASM knowledge  :-)

Moreover I can't find the sval() macro in MASM32 help, so I can't see what is it and how it works.

Thanks anyway JJ for your suggestions. :-)

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

Quote from: mineiro on November 09, 2021, 01:11:54 PM
This algo only deals with non signed numbers.

;first approach
;"1234" == 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0 == 1000+200+30+4
mov eax,"1234"          ;30303030h to 39393939h
xor eax,"0000"          ;sub "0000"
xor ecx,ecx
xor edx,edx
xor ebx,ebx

shld ecx,eax,8          ;"1"
shl eax,8               ;"234"
lea ebx,[ecx*8]         ;1*8=8
lea ebx,[ebx+ecx*2]     ;8+(1*2)=10

xor ecx,ecx
shld ecx,eax,8          ;"2"
shl eax,8               ;"34"
add ebx,ecx             ;10+2=12
lea edx,[ebx*8]         ;12*8=96
lea edx,[edx+ebx*2]     ;96+(12*2)=120

xor ecx,ecx
xor ebx,ebx
shld ecx,eax,8          ;"3"
shl eax,8               ;"4"
add edx,ecx             ;120+3=123
lea ebx,[edx*8]         ;123*8=984
lea ebx,[ebx+edx*2]     ;984+(123*2)=1230

xor ecx,ecx
shld ecx,eax,8          ;"4"
add ebx,ecx             ;1230+4=1234
;ebx="1234" converted to binary equals to 4d2h

You can create a lookup table with all possible combinations from 0000 to 9999 instead of do calculus of each number each time. Will have a execution time gain in 1 million attempts.

Yes Mineiro, this is the kind of reasoning I'm doing myself.
Just need time to regain my ability to use the MASM opcodes.

My data set has not negative numbers, all of them are not signed.

Thanks
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

I am thinking about an ASCII approach to the conversion from string to number.
If I have the string number "4735", every digit has an ASCII representation, and value:

4 is ASCII 52
7 is ASCII 55
3 is ASCII 51
5 is ASCII 53

The four bytes in memory have these ASCII codes 52,55,51,53.

If I move these 4 bytes into four registers: EAX, EBX, ECX, EDX
and subtract from each the value 48, I have the value

4 in EAX
7 in EBX
3 in ECX
5 in EDX

EAX contains the thousands (4) so it's value has to be multiplied for 1000
EBX contains the hundreds (7) so it's value has to be multiplied for 100
ECX contains the tens (3) so it's value  has to be multiplied for 10
EDX contains the units

the sum of the contents of the 4 registers is the number that is represented in the ASCII string.

I don't know if this approach is efficient for speed, but the logic should work.

If there are MMX or SSE opcodes that do in parallel the work on all the four bytes/registers the
efficiency should be better. But I am not aware at the moment if these opcodes exist.



There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

daydreamer

Quote from: frktons on November 09, 2021, 02:24:00 PM
I am thinking about an ASCII approach to the conversion from string to number.
If I have the string number "4735", every digit has an ASCII representation, and value:

4 is ASCII 52
7 is ASCII 55
3 is ASCII 51
5 is ASCII 53

The four bytes in memory have these ASCII codes 52,55,51,53.

If I move these 4 bytes into four registers: EAX, EBX, ECX, EDX
and subtract from each the value 48, I have the value

4 in EAX
7 in EBX
3 in ECX
5 in EDX

EAX contains the thousands (4) so it's value has to be multiplied for 1000
EBX contains the hundreds (7) so it's value has to be multiplied for 100
ECX contains the tens (3) so it's value  has to be multiplied for 10
EDX contains the units

the sum of the contents of the 4 registers is the number that is represented in the ASCII string.

I don't know if this approach is efficient for speed, but the logic should work.

If there are MMX or SSE opcodes that do in parallel the work on all the four bytes/registers the
efficiency should be better. But I am not aware at the moment if these opcodes exist.
you can get ascii numbers being " 123" too instead of "0123",so I prefer AND to mask out 32+16 instead of sub

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

frktons

Yes daydreamer, I think it is an efficient way of getting the same result.

I am not used to using these tricks, and my ASM is a bit rusty from long inactivity.

But I understand the point. Thanks

In the data set I am using there are no spaces, but leading "0" if the case.
There are no cases of leading zero, all the values are between 1000 and 5000 indeed.

:thumbsup:
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

jj2007

Quote from: frktons on November 09, 2021, 02:24:00 PMEAX contains the thousands (4) so it's value has to be multiplied for 1000
EBX contains the hundreds (7) so it's value has to be multiplied for 100
ECX contains the tens (3) so it's value  has to be multiplied for 10
EDX contains the units

include \masm32\include\masm32rt.inc

.data
x123 db "1234", 0

.code
start:
  mov ecx, dword ptr x123
  movzx eax, cl ; "1"
  and eax, 15 ; 1
  imul eax, eax, 1000 ; 1000
  movzx edx, ch ; "2"
  and edx, 15 ; 2
  imul edx, edx, 100 ; 200
  add eax, edx ; 1200
  bswap ecx
  movzx edx, ch ; "3"
  and edx, 15 ; 3
  imul edx, edx, 10 ; 30
  add eax, edx ; 1230
  and ecx, 15 ; "4" -> 4
  add eax, ecx ; 1234
  MsgBox 0, cat$(str$(eax), " is the value"), "That was simple:", MB_OK
  exit
end start

frktons

Hi JJ.

Nice solution using only one register and its one byte segments.

Great idea, compliment my friend.

I was thinking about something similar, but I was wondering if Full 32 bit registers are faster
doing these operations. Maybe not. Need to check it out.

:thumbsup:
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

hutch--

What about a lookup table with 0000 to 9999 members ? Only 40k.

mineiro

This is a variation from daydreamer code.

.data
align 8
total dw 1,10,100,1000
.code
movd xmm1,qword ptr [total]
pxor xmm2,xmm2

mov eax,"1234"      ;mov eax,dword ptr []
                    ;bswap eax
xor eax,"0000"      ;and eax,0f0f0f0fh
movd xmm0,eax       ;01020304h
punpcklbw xmm0,xmm2 ;0001000200030004h
pmaddwd xmm0,xmm1   ;mul and add
phaddd xmm0,xmm0    ;add
movd eax,xmm0       ;eax=4d2h
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

jj2007

Quote from: hutch-- on November 09, 2021, 11:32:05 PM
What about a lookup table with 0000 to 9999 members ? Only 40k.

How would you look up, for example, "1234"? Conversion is from Ascii to binary...

hutch--

Simple, make a dedicated integer conversion that dealt exactly with 4 characters then use it as an index for the array. What is in the array ? Anything you like.

daydreamer

Quote from: mineiro on November 10, 2021, 01:15:12 AM
This is a variation from daydreamer code.

.data
align 8
total dw 1,10,100,1000
.code
movd xmm1,qword ptr [total]
pxor xmm2,xmm2

mov eax,"1234"      ;mov eax,dword ptr []
                    ;bswap eax
xor eax,"0000"      ;and eax,0f0f0f0fh
movd xmm0,eax       ;01020304h
punpcklbw xmm0,xmm2 ;0001000200030004h
pmaddwd xmm0,xmm1   ;mul and add
phaddd xmm0,xmm0    ;add
movd eax,xmm0       ;eax=4d2h

great :thumbsup:
actually my code was just first to get it working right,after that make ascii->double and double->ascii and possible 128bit and 256bit
@Hutch
LUT works fast,how fast if there is a prime among first 1-65535 LUT used code somewhere in forum

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding