AsciiHextoDword (SSE2 version)

Started by guga, March 08, 2025, 11:29:50 AM


zedd151

Quote from: guga on March 08, 2025, 06:13:38 PM
Btw..benchmarking would be nice.
:smiley:

With jj's  test
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (SSE4)

20151  cycles for 100 * Val()
992    cycles for 100 * AsciiHex2dwNew

21162  cycles for 100 * Val()
1018    cycles for 100 * AsciiHex2dwNew

20740  cycles for 100 * Val()
1004    cycles for 100 * AsciiHex2dwNew

20399  cycles for 100 * Val()
1029    cycles for 100 * AsciiHex2dwNew

21908  cycles for 100 * Val()
987    cycles for 100 * AsciiHex2dwNew

Averages:
20767  cycles for Val()
1005    cycles for AsciiHex2dwNew

13      bytes for Val()
202    bytes for AsciiHex2dwNew

1234ABCDh      eax Val()
1234ABCDh      eax AsciiHex2dwNew

--- ok ---
¯\_(ツ)_/¯   :azn:

'As we don't do "requests", show us your code first.'  -  hutch—

zedd151

#16
With guga's test, disregarding the StrLenW results, which seem irrelevant here.

Quote
0 cycles -> StrLenW_Guga ANSI,  Return in EAX: 0
13 cycles -> StrLenW_Lingo ANSI,  Return in EAX: 100
30 cycles -> StrLenW_Guga No SAR,  Return in EAX: 200
22 cycles -> StrLenW_Lingo No SAR,  Return in EAX: 200
15 cycles -> StrLenW_Guga with SAR,  Return in EAX: 34
16 cycles -> StrLenW_Lingo with SAR,  Return in EAX: 34



17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7
17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: A45B7 . Return in EAX: A45B7
16 cycles -> Ascii Hex to Dw by Guga (Old version - fixed Lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7

Press enter to exit...

Quote from: guga on March 08, 2025, 06:13:38 PM
Btw..benchmarking would be nice. I tried to compile with qeditor but got an error. But I did a small test in one tiny benchmark app I made to test some of Lingo's functions.
Sorry guga, I somehow missed your attached testbed the first time around...  It's about 6:20 AM here, I haven't had my morning coffee yet.  :biggrin:

guga

Quote from: jj2007 on March 08, 2025, 10:10:42 PM
Quote from: guga on March 08, 2025, 07:33:12 PM
Maybe it could be useful to also return in eax the amount of bytes converted....thinking

IMHO eax should return the value. Val(), for example, returns the value in eax, the number of bytes used in dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)

Hi JJ. Sure, eax will still return the converted value. As for using the edx register, I'm not sure; for people who code in C it may be better to return those values in another variable (or perhaps a structure that fits in a Dword). So an extended AsciiHex2dw_Ex function might work like this:

call AsciiHex2dw_Ex {B$ "0AFCDE5", 0}, Output

or

[Sz_Input: B$ "0AFCDE5", 0]
[Output.Bytes: W$ 0 ; return the amount of converted bytes
Output.Error: W$ 0] ; returns some error checking Flag here

call AsciiHex2dw_Ex Sz_Input, Output

About this: dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)

You mean making the function identify the input type? If so, I don't think it would be very useful in this function. It may kill performance if I have to add more input checks.

I was considering checking the input basically for case (making the function case-insensitive by converting the input in xmm0 to uppercase), plus some basic error checks, such as characters outside 0-9 and A-F. That should be enough without killing performance, and in such cases the function would export an error flag in the Output.Error member of the structure (or whatever that structure ends up being called).
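As a rough illustration of the case folding and character check described above, here is a minimal sketch using SSE2 intrinsics in C. It is not guga's code: the function name `hex_fold_and_check` and its interface are hypothetical, and it assumes the caller supplies a 16-byte buffer that is nul-padded after the string.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper (not from the thread).
   in  : 16-byte buffer holding the hex string, nul-padded to the end.
   out : receives the same bytes with a-f folded to A-F.
   Returns a bitmask with bit i set when byte i is neither a hex digit
   nor a nul byte; 0 means the whole input is acceptable. */
unsigned hex_fold_and_check(const char in[16], char out[16])
{
    __m128i v     = _mm_loadu_si128((const __m128i *)in);
    __m128i lower = _mm_or_si128(v, _mm_set1_epi8(0x20)); /* 'A'-'F' -> 'a'-'f' */

    /* '0' <= v <= '9', tested on the original bytes */
    __m128i is_digit = _mm_and_si128(_mm_cmpgt_epi8(v, _mm_set1_epi8('0' - 1)),
                                     _mm_cmplt_epi8(v, _mm_set1_epi8('9' + 1)));
    /* 'a' <= lower <= 'f', true only for A-F and a-f */
    __m128i is_alpha = _mm_and_si128(_mm_cmpgt_epi8(lower, _mm_set1_epi8('a' - 1)),
                                     _mm_cmplt_epi8(lower, _mm_set1_epi8('f' + 1)));
    __m128i is_nul   = _mm_cmpeq_epi8(v, _mm_setzero_si128());

    /* Clear bit 5 only on letter bytes, which converts them to uppercase. */
    __m128i upper = _mm_andnot_si128(_mm_and_si128(is_alpha, _mm_set1_epi8(0x20)), v);
    _mm_storeu_si128((__m128i *)out, upper);

    __m128i ok = _mm_or_si128(_mm_or_si128(is_digit, is_alpha), is_nul);
    return (unsigned)_mm_movemask_epi8(ok) ^ 0xFFFFu;
}
```

A nonzero return value could be mapped to an error flag such as the Output.Error member mentioned above; the whole check costs a handful of compares and one pmovmskb, with no loop.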

This function is best written without any loops, since the maximum allowed input is only 8 chars and it always returns a Dword in eax.

I'm working on another function that can handle 16 bytes at once, and if I manage not to ruin the performance on that one, then perhaps that is the function that should do more error checking and run in a loop, converting a null-terminated hexadecimal string of any size.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Quote from: zedd151 on March 08, 2025, 11:20:53 PM
with guga's test, disregarding the StrLenW  results, seems irrelevant.

Quote
0 cycles -> StrLenW_Guga ANSI,  Return in EAX: 0
13 cycles -> StrLenW_Lingo ANSI,  Return in EAX: 100
30 cycles -> StrLenW_Guga No SAR,  Return in EAX: 200
22 cycles -> StrLenW_Lingo No SAR,  Return in EAX: 200
15 cycles -> StrLenW_Guga with SAR,  Return in EAX: 34
16 cycles -> StrLenW_Lingo with SAR,  Return in EAX: 34



17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7
17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: A45B7 . Return in EAX: A45B7
16 cycles -> Ascii Hex to Dw by Guga (Old version - fixed Lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7

Press enter to exit...

Quote from: guga on March 08, 2025, 06:13:38 PM
Btw..benchmarking would be nice. I tried to compile with qeditor but got an error. But I did a small test in one tiny benchmark app I made to test some of Lingo's functions.
Sorry guga, I somehow missed your attached testbed the first time around...  It's about 6:20 AM here, I haven't had my morning coffee yet.  :biggrin:

 :mrgreen:  :mrgreen:  :mrgreen:  :mrgreen:  :mrgreen:

zedd151

 :biggrin:
I'm good now, I've had my second cup already.

jj2007

Quote from: guga on March 09, 2025, 02:49:12 AM
You mean making the function identify the input type? If so, I don't think it would be very useful in this function. It may kill performance if I have to add more input checks.

Indeed, Val() is an all-rounder, and therefore much slower than your algo. It also returns -127 in edx in case of an error, such as a bad format.

guga

Hi JJ. We can test the newer functions for speed later. If they are OK, I'll go further and add some error checks, and then we check again for the final tests.

This morning I managed to find the proper mask to handle the Qword hexadecimal string. I put the results in an Excel table to try to understand the maths behind this and see how to set the index properly without losing performance.

So far, the newer routine for Qword is this (not working yet, because I forced the shl to shift only 16 bits for my tests until I identify the proper maths, but I'm getting closer to a solution):

[<16 Mask5e: W$ 0, 0, 0, 0, 0, 0, 0D00, 0] ; To remove trash. In fact it is the value of a negative byte: -36 (it fits what I found so far in Excel, when it exceeds the 16 chars)
; Note: The trash could also be removed with something like:  pslldq xmm2 8 |  psrldq xmm2 8  But that would take some extra clocks and needs 2 instructions rather than a single pand

Proc AsciiHex2dw_Ex2:
    Arguments @pString, @pOutput
    Uses ecx

    mov eax, D@pString         ; EAX = pointer to the input string
    movdqu xmm0, X$eax           ; Loads 16 bytes from the string into XMM0 (only the first 8 chars are used)
                               ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; Get the length of the string to calculate the shift count applied at the end
    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb ecx xmm0 | bsf cx cx
    ;mov ch 64 | shl cl 3 | sub ch cl | shr ch 1|  mov cl ch
    mov ch 32 | shl cl 2 | sub ch cl; | mov cl ch

;;
    Examples:
    0F2A45B7 = shr eax 0,  ax = 8 => 32-32 = 32-(8*4) = 0*4
     F2A45B7 = shr eax 4,  ax = 7 => 32-28 = 32-(7*4) = 1*4
      2A45B7 = shr eax 8,  ax = 6 => 32-24 = 32-(6*4) = 2*4
       A45B7 = shr eax 12, ax = 5 => 32-20 = 32-(5*4) = 3*4
        45B7 = shr eax 16, ax = 4 => 32-16 = 32-(4*4) = 4*4
         5B7 = shr eax 20, ax = 3 => 32-12 = 32-(3*4) = 5*4
          B7 = shr eax 24, ax = 2 => 32-8  = 32-(2*4) = 6*4
           7 = shr eax 28, ax = 1 => 32-4  = 32-(1*4) = 7*4
;;

    movdqu xmm0, X$eax
    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Compares each byte with 9 to identify A-F
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Applies a 7 correction to bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 is set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates low nibbles (keeps bits 0-3 of each byte)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

;pand xmm0, X$Mask5a

    pxor xmm0 xmm1              ; Removes low nibbles from XMM0, keeping only high nibbles
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00
;movupd xmm2 X$Mask4a
;pand xmm0, X$Mask5c


   
    psllw xmm0 4                ; Shifts high nibbles 4 bits left
                                ; XMM0: D000 D000 D000 D000 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pshuflw xmm0, xmm0, {SHUFFLE 3, 2, 1, 0} ; Reorders the lower 4 words: [0, 1, 2, 3] => {SHUFFLE 3, 2, 1, 0} = 27 (In decimal)
                                ; XMM0 Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM0 After:  000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psrld xmm0 8                ; Shifts 8 bits right to align values
                                ; XMM0 Before: 000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM0 After:  0000 000D 0000 000D 000F 002A 0045 00B7

;movsldup xmm3 xmm0;movupd xmm2 X$Mask5d | pandn xmm3 xmm2
;movupd xmm3 xmm0 | pand xmm3 X$Mask5d;pmaxsw xmm3 X$Mask5a
;movupd xmm4 X$Mask5e | movupd xmm3 xmm0 | pxor xmm3 xmm4;X$Mask5e;pmaxsw xmm3 X$Mask5a
    pxor xmm0 X$Mask5e          ; Removes trash using Mask5e
;pand xmm0, X$Mask5a ; ok here
    packuswb xmm0 xmm0          ; Packs words into bytes with unsigned saturation (here: the low byte of each word)
                                ; XMM0 Before:  0000 000D 0000 000D 000F 002A 0045 00B7
                                ; XMM0 After:   00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7


    movd eax xmm0               ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0F2A45B7 (Correct result)


    mov cl ch
    Test_If ch 00_1000_0000 ; if ch < 0
      add cl 32
      pshufd xmm1, xmm0, {SHUFFLE 2, 2, 2, 2}
      ;pshuflw xmm1, xmm0, {SHUFFLE 1, 0, 1, 0}
      movd edx xmm1

      mov cl 4
      shr edx cl
      mov cl 0
    ;Test_Else
     ;   mov cl ch
    Test_End

    shr eax cl

    mov ecx D@pOutput
    mov D$ecx eax

EndP
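
For comparison, the Dword (up to 8 characters) path can be sketched with SSE2 intrinsics in C. This is only an illustration of the same general steps (subtract '0', adjust A-F, merge nibble pairs, reverse the word order, pack, then shift by 32 - 4*length), not a translation of the routine above. The name `hex8_to_u32_sse2` is hypothetical, the input is assumed to be uppercase and already valid, and padding trash is cleared with a plain 0x0F mask rather than Mask5e.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper, not from the thread: converts up to 8 uppercase
   hex characters (nul-terminated) to a 32-bit value. No validation. */
uint32_t hex8_to_u32_sse2(const char *s)
{
    char buf[16] = {0};                 /* nul-padded copy, so the 16-byte load is safe */
    for (int i = 0; i < 8 && s[i] != '\0'; i++)
        buf[i] = s[i];

    __m128i v = _mm_loadu_si128((const __m128i *)buf);

    /* Length = index of the first nul byte (pcmpeqb + pmovmskb + bsf in asm). */
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()));
    unsigned len = 0;
    while (((mask >> len) & 1u) == 0)   /* bit 8 is always set, so len <= 8 */
        len++;
    if (len == 0)
        return 0;

    /* ASCII -> nibble: subtract '0', then 7 more where the result is > 9 (A-F). */
    __m128i d  = _mm_sub_epi8(v, _mm_set1_epi8(0x30));
    __m128i af = _mm_cmpgt_epi8(d, _mm_set1_epi8(9));
    d = _mm_sub_epi8(d, _mm_and_si128(af, _mm_set1_epi8(7)));
    d = _mm_and_si128(d, _mm_set1_epi8(0x0F));   /* padding bytes (0xD0) become 0 */

    /* Merge each pair of nibbles into the low byte of its 16-bit word. */
    __m128i hi = _mm_slli_epi16(d, 4);
    __m128i lo = _mm_srli_epi16(d, 8);
    __m128i w  = _mm_and_si128(_mm_or_si128(hi, lo), _mm_set1_epi16(0x00FF));

    /* Reverse the four low words so the first character pair ends up most significant. */
    w = _mm_shufflelo_epi16(w, _MM_SHUFFLE(0, 1, 2, 3));
    w = _mm_packus_epi16(w, w);

    uint32_t value = (uint32_t)_mm_cvtsi128_si32(w);
    return value >> (32u - 4u * len);   /* drop the zero nibbles added by padding */
}
```

Calling `hex8_to_u32_sse2("A45B7")` follows the "A45B7" row of the examples table: the length is 5, so the packed value is shifted right by 32 - 20 = 12 bits. The small scalar loop that finds the first set bit stands in for bsf.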



NoCforMe

Quote from: jj2007 on March 08, 2025, 10:10:42 PM
Quote from: guga on March 08, 2025, 07:33:12 PM
Maybe it could be useful to also return in eax the amount of bytes converted....thinking
IMHO eax should return the value. Val(), for example, returns the value in eax, the number of bytes used in dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)

Mmm, I just loves functions that return side effects like that. Seriously. Do it all the time in my own code. Goes against the dominant paradigm of "proper programming". Why not?
Assembly language programming should be fun. That's why I do it.

guga

Hi NoCforMe

Putting error values in edx is an alternative, but I just don't know if it would be useful for others who don't use assembly. For example, I plan to use the function not only inside the RosAsm code itself to fix some very old bugs, but also to include it in a dll I created some time ago that others can use as well, regardless of whether they code in asm, C, etc. So if I return the error values in edx, I don't know if that would be useful for them.

Personally I prefer to return only what is necessary in eax, leaving the other registers intact for use in other functions.

I'm trying the newer version to see if I can find the proper math to retrieve the index on a qword. Once I succeed, I'll think more about which error flags should be returned and how (in a register like edx, or in an output variable).


zedd151

@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?

NoCforMe

Quote from: guga on March 09, 2025, 08:40:06 AM
Putting error values in edx is an alternative, but i just don´t know if it could be useful for others that don't uses assembly.

Right. C programmers don't have access to anything in registers after a function returns other than the main result in EAX (or RAX). This is an assembly-only thing.

NoCforMe

Quote from: zedd151 on March 09, 2025, 08:50:12 AM
@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?

Well, sure: Windoze does that all the time:
BOOL WINAPI ReadFile (
   HANDLE       hFile,
   LPVOID       lpBuffer,
   DWORD        nNumberOfBytesToRead,
   LPDWORD      lpNumberOfBytesRead,
   LPOVERLAPPED lpOverlapped);

The last 2 parameters are pointers to variables, the first of which gets set to the number of bytes read after the function completes.

zedd151

Obviously. But perhaps guga had not yet thought of that, which is why I mentioned it.  :smiley:

guga

Quote from: zedd151 on March 09, 2025, 08:50:12 AM
@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?

Yep, that's the idea: use other arguments to pass the result, and eax for the error values.
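
Seen from the C side, that convention (status in the return value, result through pointer arguments, in the same spirit as ReadFile) might look like the sketch below. Everything here is an assumption made for illustration: the function name `ascii_hex_to_dword_out`, the status codes, and the optional `chars_used` argument are hypothetical, and the body is a plain scalar loop rather than the SSE2 routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes, not from the thread. */
enum { HEX2DW_OK = 0, HEX2DW_BAD_CHAR = 1, HEX2DW_TOO_LONG = 2 };

/* Converts up to 8 hex characters (either case) from a nul-terminated string.
   The return value carries the status (what eax would hold); the converted
   value and the number of characters consumed go through the pointers.
   chars_used may be NULL when the caller does not need it. */
int ascii_hex_to_dword_out(const char *s, uint32_t *value, uint32_t *chars_used)
{
    uint32_t v = 0;
    uint32_t n = 0;
    int status = HEX2DW_OK;

    for (; s[n] != '\0'; n++) {
        unsigned c = (unsigned char)s[n];
        unsigned digit;

        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else { status = HEX2DW_BAD_CHAR; break; }

        if (n == 8) { status = HEX2DW_TOO_LONG; break; }  /* a 9th digit does not fit */
        v = (v << 4) | digit;
    }

    *value = v;                 /* digits converted before any error are still reported */
    if (chars_used != NULL)
        *chars_used = n;
    return status;
}
```

A C caller would then write something like `if (ascii_hex_to_dword_out(text, &val, &used) != HEX2DW_OK) { ... }`, which works from any language that can call a DLL export, since nothing depends on reading edx after the call.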

guga

Quote from: NoCforMe on March 09, 2025, 08:59:25 AM
Quote from: zedd151 on March 09, 2025, 08:50:12 AM
@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?

Well, sure: Windoze does that all the time:
BOOL WINAPI ReadFile (
   HANDLE       hFile,
   LPVOID       lpBuffer,
   DWORD        nNumberOfBytesToRead,
   LPDWORD      lpNumberOfBytesRead,
   LPOVERLAPPED lpOverlapped);

The last 2 parameters are pointers to variables, the first of which gets set to the number of bytes read after the function completes.

Yep, this is what I plan to do: use other arguments to store the values returned by the conversion, and leave eax for the error return.