The MASM Forum

General => The Laboratory => Topic started by: guga on March 08, 2025, 11:29:50 AM

Title: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 11:29:50 AM
Hi Guys

I was thinking about a faster way to convert a hex string to a dword using SSE2 only. I found some good starting points here: Masm forum reference (https://masm32.com/board/index.php?topic=984.msg8975#msg8975) and Stackoverflow reference (https://stackoverflow.com/questions/67054154/is-there-an-algorithm-to-convert-massive-hex-string-to-bytes-stream-quickly-asm)

I came up with code that converts an 8-character hexadecimal string (for now it only works with exactly 8 characters, in uppercase) and is ready to be tested.

Here is the RosAsm version:

RosAsm Macros used

; using values from 0 to 3
[SHUFFLE | (255 - ((#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4))]

; using values from 3 to 0
[SHUFFLE_INV | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Marinus/Sieekmanski

[pshufd | pshufd #1 #2 #3]
[shufps | shufps #1 #2 #3]
[shufpd | shufpd #1 #2 #3]
[pshuflw | pshuflw #1 #2 #3]
[pshufhw | pshufhw #1 #2 #3]
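
To see what immediate the SHUFFLE macro produces, here is a small Python model of the two macros and of pshuflw on the low four words. It is only a sketch: the function names are made up for illustration, and the word list uses index 0 for the least significant word.

```python
def shuffle(a, b, c, d):
    """Model of the SHUFFLE macro: 255 - ((a shl 6) or (b shl 4) or (c shl 2) or d)."""
    return 255 - ((a << 6) | (b << 4) | (c << 2) | d)

def shuffle_inv(a, b, c, d):
    """Model of SHUFFLE_INV: the plain pshufd-style immediate encoding."""
    return (a << 6) | (b << 4) | (c << 2) | d

def pshuflw_low(words, imm8):
    """Model of pshuflw on the low four words (index 0 = least significant).
    Destination word i receives source word number (imm8 >> 2*i) & 3."""
    return [words[(imm8 >> (2 * i)) & 3] for i in range(4)]

imm = shuffle(3, 2, 1, 0)
print(imm, format(imm, "08b"))   # 27 00011011

# Low four words of the worked example, least significant first.
before = [0x0F00, 0x2A00, 0x4500, 0xB700]
after = pshuflw_low(before, imm)
print([format(w, "04X") for w in after])   # ['B700', '4500', '2A00', '0F00']
```

With immediate 27 every destination word takes the mirrored source word, so the four low words end up in reverse order.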

Main function
;;
 AsciiHex2dw

 Converts an 8-character ASCII hexadecimal string into a 32-bit DWORD value.

 Syntax:
   AsciiHex2dw (pString: pointer)

 Parameters:
   pString [in] - Pointer to an 8-character ASCII string representing a hexadecimal value
                  (e.g., "0F2A45B7"). The string must contain only digits 0-9 and uppercase
                  letters A-F, with no explicit null terminator.

 Return Value:
   Returns in EAX the 32-bit DWORD value corresponding to the converted hexadecimal string.
   For example, for "0F2A45B7", returns EAX = 0x0F2A45B7.

 Remarks:
   This function uses SSE2 instructions to process the string: it converts ASCII
   characters to binary values, adjusts the letters A-F, separates high and low nibbles,
   and packs the result into a DWORD. The ESI register is preserved per calling convention.
   The function assumes valid input and performs no validation. The
   SHUFFLE macro defines the pshuflw immediate as 27 (binary 00011011), reversing the order
   of the lower 4 words to align the nibbles correctly. Word values in XMM registers are
   listed from the most significant word (left) to the least significant (right), as
   displayed in the RosAsm debugger.

   Masks used:
   - Mask1: Subtracts the ASCII value of '0' (0x30) to convert characters to numeric values.
   - Mask2: Compares with 9 (0x09), after the '0' subtraction, to identify the letters A-F.
   - Mask3: Subtracts 7 to move the letters A-F into their correct hexadecimal range (10-15).
   - Mask4a: Isolates the low-nibble digits (0x0F000F00 per dword), i.e. the interleaved byte pattern 0F 00 0F 00.
   - Mask5a: Isolates the high-nibble digits (0x00F000F0 per dword). Not used in this version; it is the opposite interleaving, 00 F0 00 F0.

 Example:
   For pString pointing to "0F2A45B7":
   - Input: "0F2A45B7"
   - Output: EAX = 0x0F2A45B7 (decimal: 254,428,599)

References: https://masm32.com/board/index.php?topic=984.msg8975#msg8975
            https://stackoverflow.com/questions/67054154/is-there-an-algorithm-to-convert-massive-hex-string-to-bytes-stream-quickly-asm


;;

[<16 Mask1: Q$ 030303030_30303030, 030303030_30303030]  ; '0'
[<16 Mask2: Q$ 09090909_09090909, 09090909_09090909]  ; 9 (compared after subtracting '0')
[<16 Mask3: Q$ 07070707_07070707, 07070707_07070707]  ; 7 (correction for A-F)
[<16 Mask4a: Q$ 0F_00_0F_00_0F_00_0F_00, 0F_00_0F_00_0F_00_0F_00]  ; 0x0F_00_0F_00
[<16 Mask5a: Q$  0_F0_00_F0_00_F0_00_F0, 0_F0_00_F0_00_F0_00_F0]   ; 0x00_F0_00_F0

Proc AsciiHex2dw:
    Arguments @pString
    Uses esi

    mov esi, D@pString         ; ESI = pointer to the input string
    movq xmm0, Q$esi           ; Loads 8 bytes of the string into XMM0
                               ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

;    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb eax xmm0 | add esi 16 | sub esi D@pString  | add esi 0-16 | bsf ax ax

    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Signed compare of each byte with 9 to identify A-F
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Keeps a correction of 7 only in the bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 is set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates the low-nibble digits (bits 0-3 of the high byte of each word)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pxor xmm0 xmm1              ; Removes the low-nibble digits from XMM0, keeping only the high-nibble digits
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psllw xmm0 4                ; Shifts each word 4 bits left (high-nibble digits move up one nibble)
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pshuflw xmm0, xmm0, {SHUFFLE 3, 2, 1, 0} ; Reverses the lower 4 words: [0, 1, 2, 3] => {SHUFFLE 3, 2, 1, 0} = 27 (in decimal)
                                ; XMM0 Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM0 After:  000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psrld xmm0 8                ; Shifts each dword 8 bits right to align values
                                ; XMM0 Before: 000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM0 After:  0000 0D00 0000 0D00 000F 002A 0045 00B7


    packuswb xmm0 xmm0          ; Packs words into bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0 Before:  0000 0D00 0000 0D00 000F 002A 0045 00B7
                                ; XMM0 After:   00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7


    movd eax xmm0               ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0F2A45B7 (Correct result)


EndP
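
For anyone who wants to check the register dumps without a debugger, the steps can be modelled byte by byte in Python. This is only a sketch, not the RosAsm code itself: a register is a list of 16 bytes (index 0 = lowest byte), the helper names mirror the SSE2 mnemonics they imitate, and `ascii_hex2dw` is a hypothetical name for the model.

```python
MASK1 = [0x30] * 16          # '0'
MASK2 = [0x09] * 16          # 9
MASK3 = [0x07] * 16          # 7
MASK4A = [0x00, 0x0F] * 8    # word value 0F00, stored little-endian

def signed8(b):
    return b - 256 if b >= 128 else b

def lanes(reg, size):
    """Split a register into little-endian integers of `size` bytes each."""
    return [int.from_bytes(bytes(reg[i:i + size]), "little") for i in range(0, 16, size)]

def unlanes(values, size):
    """Inverse of lanes(): rebuild the 16-byte list."""
    out = []
    for v in values:
        out.extend(v.to_bytes(size, "little"))
    return out

def psubb(a, b):   return [(x - y) & 0xFF for x, y in zip(a, b)]
def pcmpgtb(a, b): return [0xFF if signed8(x) > signed8(y) else 0 for x, y in zip(a, b)]
def pand(a, b):    return [x & y for x, y in zip(a, b)]
def pxor(a, b):    return [x ^ y for x, y in zip(a, b)]
def por(a, b):     return [x | y for x, y in zip(a, b)]
def psllw(a, n):   return unlanes([(w << n) & 0xFFFF for w in lanes(a, 2)], 2)
def pslld(a, n):   return unlanes([(d << n) & 0xFFFFFFFF for d in lanes(a, 4)], 4)
def psrld(a, n):   return unlanes([d >> n for d in lanes(a, 4)], 4)

def pshuflw(a, imm8):
    w = lanes(a, 2)
    low = [w[(imm8 >> (2 * i)) & 3] for i in range(4)]
    return unlanes(low + w[4:], 2)

def packuswb(a, b):
    def sat(w):
        s = w - 0x10000 if w >= 0x8000 else w   # words are treated as signed
        return max(0, min(255, s))              # and saturated to 0..255
    return [sat(w) for w in lanes(a, 2)] + [sat(w) for w in lanes(b, 2)]

def ascii_hex2dw(s):
    data = s.encode("ascii")
    if len(data) != 8:
        raise ValueError("this model expects exactly 8 uppercase hex characters")
    x0 = list(data) + [0] * 8                   # movq zero-extends to 128 bits
    x0 = psubb(x0, MASK1)                       # subtract '0'
    x1 = pand(pcmpgtb(x0, MASK2), MASK3)        # 7 where the byte is > 9
    x0 = psubb(x0, x1)                          # A-F become 10-15
    x1 = pand(x0, MASK4A)                       # low-nibble digits
    x0 = pxor(x0, x1)                           # high-nibble digits
    x0 = pslld(psllw(x0, 4), 8)
    x0 = por(x0, x1)
    x0 = psrld(pshuflw(x0, 27), 8)
    x0 = packuswb(x0, x0)
    return int.from_bytes(bytes(x0[:4]), "little")   # movd eax, xmm0

print(hex(ascii_hex2dw("0F2A45B7")))   # 0xf2a45b7
```

The model only covers the fixed 8-character, uppercase case, the same restriction the routine has.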

And here is the masm translation of it (I hope the porting to masm is ok)
AsciiHex2dw PROC USES esi pString:PTR BYTE
    ; ESI = pointer to the input string (pString is automatically available via stack)
    mov esi, pString            ; Loads the pointer from the parameter
    movq xmm0, qword ptr [esi]  ; Loads 8 bytes of the string into XMM0
                                ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                                ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; Subtract '0'
    psubb xmm0, xmmword ptr [Mask1]  ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                     ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
    pcmpgtb xmm1, xmmword ptr [Mask2]  ; Signed compare of each byte with 9 to identify A-F
                                       ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00
    pand xmm1, xmmword ptr [Mask3]     ; Keeps a correction of 7 only in the bytes > 9 (A-F)
                                       ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700
    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
    pand xmm1, xmmword ptr [Mask4a]  ; Isolates the low-nibble digits (bits 0-3 of the high byte of each word)
                                     ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00
    pxor xmm0, xmm1             ; Removes the low-nibble digits from XMM0, keeping only the high-nibble digits
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
    psllw xmm0, 4               ; Shifts each word 4 bits left
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
    pslld xmm0, 8               ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
    por xmm0, xmm1              ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00

    pshuflw xmm0, xmm0, 27      ; Reverses the lower 4 words (SHUFFLE 3, 2, 1, 0 = 27)
                                ; Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; After:  000D 0000 000D 0000 0F00 2A00 4500 B700
    psrld xmm0, 8               ; Shifts each dword 8 bits right to align values
                                ; XMM0: 0000 0D00 0000 0D00 000F 002A 0045 00B7

    packuswb xmm0, xmm0         ; Packs words into bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0: 00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7

    movd eax, xmm0              ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0x0F2A45B7

    ret                         ; Return (stack cleanup handled by stdcall)

AsciiHex2dw ENDP

.data
    ; Input string
    SzInputHex db "0F2A45B7", 0

    ; Masks (16-byte aligned for XMM operations)
    ALIGN 16
Mask1           xmmword 30303030303030303030303030303030h
Mask2           xmmword 9090909090909090909090909090909h
Mask3           xmmword 7070707070707070707070707070707h
Mask4a          xmmword 0F000F000F000F000F000F000F000F00h
.end

The same masks, defined as qwords (Mask5a is unused in this test):
Mask1           dq 3030303030303030h
                 dq 3030303030303030h
Mask2           dq 909090909090909h
                 dq 909090909090909h
Mask3           dq 707070707070707h
                 dq 707070707070707h
Mask4a          dq 0F000F000F000F00h
                 dq 0F000F000F000F00h
Mask5a          dq 0F000F000F000F0h
                 dq 0F000F000F000F0h
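
A quick way to convince yourself that a 128-bit constant and a pair of qwords describe the same 16 bytes, and that dropping the leading zero of a hex literal does not change the mask, is to compare the little-endian byte images. The sketch below builds the constants itself instead of copying the literals above, and its helper names are made up for illustration.

```python
def oword_bytes(value):
    """16-byte little-endian memory image of one 128-bit constant."""
    return value.to_bytes(16, "little")

def dq_pair_bytes(low, high):
    """Memory image of two consecutive qwords (the first one sits at the lower address)."""
    return low.to_bytes(8, "little") + high.to_bytes(8, "little")

# Mask2 as sixteen 09 bytes; writing it with or without the leading zero gives the same number.
mask2_full = int("09" * 16, 16)
mask2_short = int("9" + "09" * 15, 16)
assert mask2_full == mask2_short
assert oword_bytes(mask2_full) == bytes([0x09] * 16)

# Mask4a: word value 0F00 repeated, so memory holds 00 0F 00 0F ...
mask4a_qword = int("0F00" * 4, 16)
mask4a_oword = int("0F00" * 8, 16)
assert oword_bytes(mask4a_oword) == dq_pair_bytes(mask4a_qword, mask4a_qword)
print(oword_bytes(mask4a_oword).hex(" "))
```

The printed image starts with 00 0f, which is why Mask4a keeps the second character of each pair rather than the first.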




Can someone test the MASM version for speed? I want to see if it's worth continuing with this function. Later I'll add a way to check the length of the input and turn it into a shift count applied (as a right shift) before the function returns. If that succeeds, I'll create a variation that works with a 16-byte string (or even longer).
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 12:01:08 PM
Note: Updated the mask values in the MASM version. (I had ported them incorrectly earlier.)
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 08, 2025, 01:02:16 PM
Hi guga...

Quote: And here is the masm translation of it (I hope the porting to masm is ok)

Cannot assemble... using ml 6.14.8444

guga_A_Dw.asm(13) : fatal error A1016: Internal Assembler Error

Here I try a later ml version:
Microsoft (R) Macro Assembler Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: guga_A_Dw.asm

***********
ASCII build
***********

guga_A_Dw.asm(15) : error A2138:invalid data initializer
guga_A_Dw.asm(16) : error A2138:invalid data initializer
guga_A_Dw.asm(17) : error A2138:invalid data initializer
guga_A_Dw.asm(18) : error A2138:invalid data initializer
Press any key to continue . . .
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 03:14:29 PM
I´ll try to disassemble my rosasm version and see if it is different then the version i tried to translate by hand. Hold on.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 08, 2025, 03:49:45 PM
Quote from: zedd151 on March 08, 2025, 01:02:16 PM
guga_A_Dw.asm(15) : error A2138:invalid data initializer
guga_A_Dw.asm(16) : error A2138:invalid data initializer
guga_A_Dw.asm(17) : error A2138:invalid data initializer
guga_A_Dw.asm(18) : error A2138:invalid data initializer

Dumb question: is xmmword defined somewhere?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 08, 2025, 04:13:19 PM
Quote from: NoCforMe on March 08, 2025, 03:49:45 PM
Quote from: zedd151 on March 08, 2025, 01:02:16 PM
guga_A_Dw.asm(15) : error A2138:invalid data initializer
guga_A_Dw.asm(16) : error A2138:invalid data initializer
guga_A_Dw.asm(17) : error A2138:invalid data initializer
guga_A_Dw.asm(18) : error A2138:invalid data initializer

Dumb question: is xmmword defined somewhere?

You are right, I looked it up.

It should be 'oword'...  guga.   :biggrin:

include \masm32\include\masm32rt.inc
.586p
.mmx
.xmm

AsciiHex2dw PROTO :PTR BYTE

.data

    ; Input string
    SzInputHex db "0F2A45B7", 0

    ; Masks (16-byte aligned for XMM operations)
    ALIGN 16
Mask1          oword 30303030303030303030303030303030h
Mask2          oword 9090909090909090909090909090909h
Mask3          oword 7070707070707070707070707070707h
Mask4a          oword 0F000F000F000F000F000F000F000F00h


.code

start proc

    invoke AsciiHex2dw, addr SzInputHex

    invoke MessageBox, 0, hex$(eax), 0, 0
    invoke ExitProcess, 0
start endp


AsciiHex2dw PROC USES esi pString:PTR BYTE
    ; ESI = pointer to the input string (pString is automatically available via stack)
    mov esi, pString            ; Loads the pointer from the parameter
    movq xmm0, qword ptr [esi]  ; Loads 8 bytes of the string into XMM0
                                ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                                ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; Subtract '0'
    psubb xmm0, xmmword ptr [Mask1]  ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                     ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
    pcmpgtb xmm1, xmmword ptr [Mask2]  ; Signed compare of each byte with 9 to identify A-F
                                       ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00
    pand xmm1, xmmword ptr [Mask3]     ; Keeps a correction of 7 only in the bytes > 9 (A-F)
                                       ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700
    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
    pand xmm1, xmmword ptr [Mask4a]  ; Isolates the low-nibble digits (bits 0-3 of the high byte of each word)
                                     ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00
    pxor xmm0, xmm1             ; Removes the low-nibble digits from XMM0, keeping only the high-nibble digits
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
    psllw xmm0, 4               ; Shifts each word 4 bits left
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
    pslld xmm0, 8               ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
    por xmm0, xmm1              ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00

    pshuflw xmm0, xmm0, 27      ; Reverses the lower 4 words (SHUFFLE 3, 2, 1, 0 = 27)
                                ; Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; After:  000D 0000 000D 0000 0F00 2A00 4500 B700
    psrld xmm0, 8               ; Shifts each dword 8 bits right to align values
                                ; XMM0: 0000 0D00 0000 0D00 000F 002A 0045 00B7

    packuswb xmm0, xmm0         ; Packs words into bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0: 00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7

    movd eax, xmm0              ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0x0F2A45B7

    ret                        ; Return (stack cleanup handled by stdcall)

AsciiHex2dw ENDP

end start

Now it assembles fine with ml.exe v 14.xxxxx, guga. Appears to work as well.

But ml.exe 6.14.8444  chokes with an "internal error" still.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 04:37:17 PM
oword ? Ahn, ok....Tks. I thought xmmword  also was valid for masm
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 08, 2025, 04:39:23 PM
Quote from: guga on March 08, 2025, 04:37:17 PM
oword ? Ahn, ok....Tks. I thought xmmword  also was valid for masm
Probably defined somewhere in rosasm, but apparently not in masm32 SDK.
Now that it assembles fine, we need a timing testbed, plus other similar functions to test it against.  :biggrin:

Odd though, that ml accepts "xmmword ptr xxxxxx" in the code as valid, but not "xmmword" in the .data section.  Nice job, Microsoft.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 04:50:25 PM
btw, this is a new version that handles variable-length input (I mean, it works for strings from 1 to 8 characters long)

AsciiHex2dwNew  proc near               ; CODE XREF: start+32↑p
                                        ; .text:0042BC94↑j

pString         = dword ptr  8

                push    ebp
                mov     ebp, esp
                push    ecx
                mov     eax, [ebp+pString]
                movq    xmm0, qword ptr [eax]
                xorps   xmm1, xmm1
                pcmpeqb xmm0, xmm1
                pmovmskb ecx, xmm0
                bsf     cx, cx
                mov     ch, 32
                shl     cl, 2
                sub     ch, cl
                mov     cl, ch
                movq    xmm0, qword ptr [eax]
                psubb   xmm0, Mask1
                movdqa  xmm1, xmm0
                pcmpgtb xmm1, Mask2
                pand    xmm1, Mask3
                psubb   xmm0, xmm1
                movdqa  xmm1, xmm0
                pand    xmm1, Mask4a
                pxor    xmm0, xmm1
                psllw   xmm0, 4
                pslld   xmm0, 8
                por     xmm0, xmm1
                pshuflw xmm0, xmm0, 27
                psrld   xmm0, 8
                packuswb xmm0, xmm0
                movd    eax, xmm0
                shr     eax, cl
                pop     ecx
                mov     esp, ebp
                pop     ebp
                retn    4
AsciiHex2dwNew  endp




And the RosAsm version:


Proc AsciiHex2dwNew3:
    Arguments @pString
    Uses ecx

    mov eax, D@pString         ; EAX = pointer to the input string
    movq xmm0, Q$eax           ; Loads 8 bytes of the string into XMM0
                               ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; Get the length of the string (index of the first null byte) to compute the shift count applied at the end
    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb ecx xmm0 | bsf cx cx
    mov ch 32 | shl cl 2 | sub ch cl | mov cl ch

;;
    Examples (cx = string length returned by bsf):
    0F2A45B7 = shr eax 0,  cx = 8 => 32-32 = 32-(8*4) = 0*4
     F2A45B7 = shr eax 4,  cx = 7 => 32-28 = 32-(7*4) = 1*4
      2A45B7 = shr eax 8,  cx = 6 => 32-24 = 32-(6*4) = 2*4
       A45B7 = shr eax 12, cx = 5 => 32-20 = 32-(5*4) = 3*4
        45B7 = shr eax 16, cx = 4 => 32-16 = 32-(4*4) = 4*4
         5B7 = shr eax 20, cx = 3 => 32-12 = 32-(3*4) = 5*4
          B7 = shr eax 24, cx = 2 => 32-8  = 32-(2*4) = 6*4
           7 = shr eax 28, cx = 1 => 32-4  = 32-(1*4) = 7*4
;;

    movq xmm0, Q$eax
    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Signed compare of each byte with 9 to identify A-F
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Keeps a correction of 7 only in the bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 is set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates the low-nibble digits (bits 0-3 of the high byte of each word)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pxor xmm0 xmm1              ; Removes the low-nibble digits from XMM0, keeping only the high-nibble digits
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psllw xmm0 4                ; Shifts each word 4 bits left (high-nibble digits move up one nibble)
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pshuflw xmm0, xmm0, {SHUFFLE 3, 2, 1, 0} ; Reverses the lower 4 words: [0, 1, 2, 3] => {SHUFFLE 3, 2, 1, 0} = 27 (in decimal)
                                ; XMM0 Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM0 After:  000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psrld xmm0 8                ; Shifts each dword 8 bits right to align values
                                ; XMM0 Before: 000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM0 After:  0000 0D00 0000 0D00 000F 002A 0045 00B7


    packuswb xmm0 xmm0          ; Packs words into bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0 Before:  0000 0D00 0000 0D00 000F 002A 0045 00B7
                                ; XMM0 After:   00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7


    movd eax xmm0               ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0F2A45B7 (Correct result)

    shr eax cl

EndP
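
The length detection and the final shift can be checked with a few lines of Python. This is a sketch of the pcmpeqb / pmovmskb / bsf sequence and of the `mov ch 32 | shl cl 2 | sub ch cl` arithmetic, not the routine itself; `shift_count` and `apply_shift` are hypothetical names, and the bytes after the terminator stand in for whatever happens to follow the string in memory.

```python
def shift_count(raw8):
    """raw8: the 8 bytes loaded by movq. Returns (length, count used by shr eax, cl)."""
    if len(raw8) != 8:
        raise ValueError("movq always loads exactly 8 bytes")
    reg = list(raw8) + [0] * 8                   # movq zero-extends to 128 bits
    # pcmpeqb with zero + pmovmskb: bit i is set when byte i is 0.
    mask = sum(1 << i for i, b in enumerate(reg) if b == 0)
    # bsf: index of the lowest set bit. Bits 8-15 are always set, so the
    # result is at most 8 even when the first 8 bytes hold no terminator.
    length = (mask & -mask).bit_length() - 1
    cl = (length << 2) & 0xFF                    # shl cl 2
    ch = (32 - cl) & 0xFF                        # mov ch 32 | sub ch cl
    return length, ch

def apply_shift(eax, cl):
    """shr eax, cl: for a 32-bit operand the CPU uses only the low 5 bits of CL."""
    return (eax & 0xFFFFFFFF) >> (cl & 31)

print(shift_count(b"0F2A45B7"))                  # (8, 0)
print(shift_count(b"A45B7\x00zz"))               # (5, 12)
print(hex(apply_shift(0xA45B7000, 12)))          # 0xa45b7
```

One edge case falls out of the model: for an empty string the count is 32, which the CPU masks to 0, so nothing is shifted out and the result is not cleared. The routine also always reads 8 bytes, even when the string is shorter.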





Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 04:53:33 PM
Quote from: zedd151 on March 08, 2025, 04:39:23 PM
Quote from: guga on March 08, 2025, 04:37:17 PM
oword ? Ahn, ok....Tks. I thought xmmword  also was valid for masm
Probably defined somewhere in rosasm, but apparently not in masm32 SDK.
Now that it assembles fine, we need a timing testbed, plus other similar functions to test it against.   :biggrin:

No, this is not a RosAsm token. It comes from IDA Pro. RosAsm doesn't have any of those by default; I kept it simple: only D$ for dword, Q$ for qword, X$ for SSE registers, W$ for word, T$ for ten-byte and B$ for byte. You can, however, use things like dword ptr by enabling the preparser routine, but personally I prefer the simple way.

I'm currently working on RosAsm, trying to fix some very, very old issues.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 08, 2025, 04:56:04 PM
Quote from: guga on March 08, 2025, 04:53:33 PM
No, this is not a RosAsm token. It comes from IDA Pro. RosAsm doesn't have any of those by default; I kept it simple: only D$ for dword, Q$ for qword, X$ for SSE registers, W$ for word, T$ for ten-byte and B$ for byte. You can, however, use things like dword ptr by enabling the preparser routine, but personally I prefer the simple way.

I'm currently working on RosAsm, trying to fix some very, very old issues.
Ah, okay.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 06:13:38 PM
Btw, benchmarking would be nice. I tried to compile with qeditor but got an error. But I did a small test in a tiny benchmark app I made to test some of Lingo's functions.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: jj2007 on March 08, 2025, 07:14:04 PM
Congrats, it's fast :thumbsup:

Quote from: guga on March 08, 2025, 11:29:50 AM
Can someone test the masm version for speed ?
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

18532  cycles for 100 * Val()
708    cycles for 100 * AsciiHex2dwNew

18615  cycles for 100 * Val()
711    cycles for 100 * AsciiHex2dwNew

18780  cycles for 100 * Val()
743    cycles for 100 * AsciiHex2dwNew

18562  cycles for 100 * Val()
741    cycles for 100 * AsciiHex2dwNew

18535  cycles for 100 * Val()
725    cycles for 100 * AsciiHex2dwNew

Averages:
18571  cycles for Val()
726    cycles for AsciiHex2dwNew

13      bytes for Val()
202    bytes for AsciiHex2dwNew

1234ABCDh      eax Val()
1234ABCDh      eax AsciiHex2dwNew
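
To put the averages in perspective, the arithmetic below turns them into a per-call cost and a speed ratio. It assumes the averages, like the individual runs, cover 100 calls each; the helper names are made up for illustration.

```python
def per_call(cycles, calls=100):
    """Average cycles per call, assuming `cycles` covers `calls` invocations."""
    return cycles / calls

def speedup(slow_cycles, fast_cycles):
    """How many times faster the second routine is than the first."""
    return slow_cycles / fast_cycles

val_avg, sse_avg = 18571, 726   # averages reported above

print(round(per_call(val_avg), 2))          # 185.71
print(round(per_call(sse_avg), 2))          # 7.26
print(round(speedup(val_avg, sse_avg), 1))  # 25.6
```

Under that assumption the SSE2 routine costs roughly 7 cycles per call on this machine, about 25 times less than Val(), which also has to work out what kind of number it was given.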
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 08, 2025, 07:33:12 PM
Tks a lot JJ.

I'm now trying to make an extended version to convert 16 chars. The current code does in fact already convert them. The only problem now is finding the proper shift index and where to put the extra bytes after the first dword.

(...)
mov ch 32 | shl cl 2 | sub ch cl | mov cl ch <--- only this needs to change, so that ch and cl receive a different index. E.g., if ch = 0 (or smaller than 8), we are dealing with a dword; otherwise a qword is being converted. I'm trying to figure out how to get the proper index to set at:
(...)
    shr eax cl <--- Will also need a test on ch, in order to put the values from another xmm register into the proper output buffer.

Maybe it could also be useful to return in eax the number of bytes converted....thinking... :dazzled:
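
One possible way to think about the index for the 16-character case, written as a Python sketch: if all 16 character positions are converted into one 64-bit value, a string of n characters needs a right shift of 64 - 4*n bits, and the result can then be split into two dwords. Everything here is an assumption for illustration: `hex_to_qword` is a hypothetical name, returning the halves as an (edx, eax) pair is only one option, and `int(..., 16)` stands in for the SIMD conversion.

```python
def hex_to_qword(s):
    """Model of a variable-length conversion of 1..16 uppercase hex characters.
    Returns (high dword, low dword), e.g. what EDX:EAX could hold."""
    n = len(s)
    if not 1 <= n <= 16:
        raise ValueError("expected 1 to 16 hex characters")
    # Pad to 16 digits; the padding stands in for whatever follows the string
    # in memory, since those digits are shifted out below anyway.
    full = int(s.ljust(16, "0"), 16)
    value = full >> (64 - 4 * n)      # the shift index: 64 - 4 * length
    return value >> 32, value & 0xFFFFFFFF

print([hex(x) for x in hex_to_qword("0F2A45B7")])          # ['0x0', '0xf2a45b7']
print([hex(x) for x in hex_to_qword("123456789ABCDEF0")])  # ['0x12345678', '0x9abcdef0']
```

In this model a string of 8 characters or fewer always leaves the high dword at zero, which matches the idea of using the length to tell a dword result from a qword result.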

btw.i´m deeply tired..it´s 06:00AM right now :mrgreen:  :mrgreen:  :mrgreen:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: jj2007 on March 08, 2025, 10:10:42 PM
Quote from: guga on March 08, 2025, 07:33:12 PM
Maybe it could be useful to also return in eax the number of bytes converted....thinking

IMHO eax should return the value. Val() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1202), for example, returns the value in eax, the number of bytes used in dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)


Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 08, 2025, 10:42:51 PM
Quote from: guga on March 08, 2025, 06:13:38 PM
Btw..benchmarking would be nice.
:smiley:

With jj's  test
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (SSE4)

20151  cycles for 100 * Val()
992    cycles for 100 * AsciiHex2dwNew

21162  cycles for 100 * Val()
1018    cycles for 100 * AsciiHex2dwNew

20740  cycles for 100 * Val()
1004    cycles for 100 * AsciiHex2dwNew

20399  cycles for 100 * Val()
1029    cycles for 100 * AsciiHex2dwNew

21908  cycles for 100 * Val()
987    cycles for 100 * AsciiHex2dwNew

Averages:
20767  cycles for Val()
1005    cycles for AsciiHex2dwNew

13      bytes for Val()
202    bytes for AsciiHex2dwNew

1234ABCDh      eax Val()
1234ABCDh      eax AsciiHex2dwNew

--- ok ---
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 08, 2025, 11:20:53 PM
With guga's test, disregarding the StrLenW results, which seem irrelevant here.

Quote
0 cycles -> StrLenW_Guga ANSI,  Return in EAX: 0
13 cycles -> StrLenW_Lingo ANSI,  Return in EAX: 100
30 cycles -> StrLenW_Guga No SAR,  Return in EAX: 200
22 cycles -> StrLenW_Lingo No SAR,  Return in EAX: 200
15 cycles -> StrLenW_Guga with SAR,  Return in EAX: 34
16 cycles -> StrLenW_Lingo with SAR,  Return in EAX: 34



17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7
17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: A45B7 . Return in EAX: A45B7
16 cycles -> Ascii Hex to Dw by Guga (Old version - fixed Lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7

Press enter to exit...

Quote from: guga on March 08, 2025, 06:13:38 PM
Btw, benchmarking would be nice. I tried to compile with qeditor but got an error. But I did a small test in a tiny benchmark app I made to test some of Lingo's functions.
Sorry guga, I somehow missed your attached testbed the first time around...  It's about 6:20 AM here, I haven't had my morning coffee yet.  :biggrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 02:49:12 AM
Quote from: jj2007 on March 08, 2025, 10:10:42 PM
Quote from: guga on March 08, 2025, 07:33:12 PM
Maybe it could be useful to also return in eax the number of bytes converted....thinking

IMHO eax should return the value. Val() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1202), for example, returns the value in eax, the number of bytes used in dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)

Hi JJ. Sure, eax will still return the converted value. As for using the edx register, I don't know; for people who code in C it might be better to return those values in another variable (or perhaps a structure packed into a dword). So an extended AsciiHex2dw_Ex function might work like this:

call AsciiHex2dw_Ex {B$ "0AFCDE5", 0}, Output

or

[Sz_Input: B$ "0AFCDE5", 0]
[Output.Bytes: W$ 0 ; return the amount of converted bytes
Output.Error: W$ 0] ; returns some error checking Flag here

call AsciiHex2dw_Ex Sz_Input, Output

About this: dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)

Do you mean making the function identify the input type? If so, I don't think it would be very useful for this function. It may kill performance if I have to add more input checks.

I was considering checking the input basically for case (making the function case-insensitive by converting the input in xmm0 to caps), plus some basic error checks, such as chars on input that are not 0-9 or A-F. That should be enough without killing performance, and in such cases an error flag would be exported in the Output.Error member of the structure (or whatever that structure ends up being called).

This function is better written without any loops, since the maximum allowed input is only 8 chars and it will always return a Dword in eax.

I'm working on another function that can handle 16 bytes at once. If I manage not to ruin the performance of that one, then perhaps that's the function where more error checking belongs, and it could loop to convert a null-terminated hexadecimal string of any size.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 02:52:39 AM
Quote from: zedd151 on March 08, 2025, 11:20:53 PMwith guga's test, disregarding the StrLenW  results, seems irrelevant.

Quote
0 cycles -> StrLenW_Guga ANSI,  Return in EAX: 0
13 cycles -> StrLenW_Lingo ANSI,  Return in EAX: 100
30 cycles -> StrLenW_Guga No SAR,  Return in EAX: 200
22 cycles -> StrLenW_Lingo No SAR,  Return in EAX: 200
15 cycles -> StrLenW_Guga with SAR,  Return in EAX: 34
16 cycles -> StrLenW_Lingo with SAR,  Return in EAX: 34



17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7
17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: A45B7 . Return in EAX: A45B7
16 cycles -> Ascii Hex to Dw by Guga (Old version - fixed Lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7

Press enter to exit...

Quote from: guga on March 08, 2025, 06:13:38 PMBtw..benchmarking would be nice. I tried to compíle with qeditor but got an error. But i did a small test in one tiny benchmark app i made to test some Lingo´s functions.
Sorry guga, I somehow missed your attached testbed the first time around...  It's about 6:20 AM here, I haven't had my morning coffee yet.  :biggrin:

 :mrgreen:  :mrgreen:  :mrgreen:  :mrgreen:  :mrgreen:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 09, 2025, 03:01:33 AM
 :biggrin:
I'm good now, I've had my second cup already.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: jj2007 on March 09, 2025, 05:37:06 AM
Quote from: guga on March 09, 2025, 02:49:12 AMYou mean making the function identify the input type ? If it is, i don´t think it could be much useful on this function. It may kill performance if i had to add more input checkings.

Indeed, Val() is an allrounder, and therefore much slower than your algo. It also returns in edx -127 in case of an error, such as a bad format.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 06:48:26 AM
Hi JJ. We can test the newer functions for speed later. If that is OK, then I'll go further and add some error checking, and we check again for the final tests.

This morning I managed to find the proper mask to handle the Qword hexadecimal string. I put the results in an Excel table to try to understand the maths behind this and see how to set the index properly without losing performance.

So far, the newer routine for Qword is this (not working yet, because I forced the shl to shift only 16 bits for my tests until I identify the proper maths, but I'm getting closer to a solution):

[<16 Mask5e: W$ 0, 0, 0, 0, 0, 0, 0D00, 0] ; To remove trash. In fact it is the value of a negative byte: -36 (It fits what I found so far in Excel, when it exceeds the 16 chars)
; Note: The trash could also be removed with something like:  pslldq xmm2 8 |  psrldq xmm2 8  But that would take some extra clocks and needs 2 instructions rather than a single pand

Proc AsciiHex2dw_Ex2:
    Arguments @pString, @pOutput
    Uses ecx

    mov eax, D@pString         ; EAX = pointer to the input string
    movdqu xmm0, X$eax           ; Loads 16 bytes from the string into XMM0 (only the first 8 are hex digits here)
                               ; Low qword of XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; get the size of the string to calculate a index to be shifted at the end
    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb ecx xmm0 | bsf cx cx
    ;mov ch 64 | shl cl 3 | sub ch cl | shr ch 1|  mov cl ch
    mov ch 32 | shl cl 2 | sub ch cl; | mov cl ch

;;
    Examples:
    0F2A45B7 = shr eax 0,  ax = 8 => 32-32 = 32-(8*4) = 0*4
     F2A45B7 = shr eax 4,  ax = 7 => 32-28 = 32-(7*4) = 1*4
      2A45B7 = shr eax 8,  ax = 6 => 32-24 = 32-(6*4) = 2*4
       A45B7 = shr eax 12, ax = 5 => 32-20 = 32-(5*4) = 3*4
        45B7 = shr eax 16, ax = 4 => 32-16 = 32-(4*4) = 4*4
         5B7 = shr eax 20, ax = 3 => 32-12 = 32-(3*4) = 5*4
          B7 = shr eax 24, ax = 2 => 32-8  = 32-(2*4) = 6*4
           7 = shr eax 28, ax = 1 => 32-4  = 32-(1*4) = 7*4
;;

    movdqu xmm0, X$eax
    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Signed compare of each byte with 9 to identify A-F (values above 9)
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Applies a 7 correction to bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 is set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates low nibbles (keeps bits 0-3 of each byte)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

;pand xmm0, X$Mask5a

    pxor xmm0 xmm1              ; Removes low nibbles from XMM0, keeping only high nibbles
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00
;movupd xmm2 X$Mask4a
;pand xmm0, X$Mask5c


   
    psllw xmm0 4                ; Shifts each word 4 bits left (moves the high nibbles up)
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pshuflw xmm0, xmm0, {SHUFFLE 3, 2, 1, 0} ; Reorders the lower 4 words: [0, 1, 2, 3] => {SHUFFLE 3, 2, 1, 0} = 27 (In decimal)
                                ; XMM0 Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM0 After:  000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psrld xmm0 8                ; Shifts each dword 8 bits right to align values
                                ; XMM0 Before: 000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM0 After:  0000 0D00 0000 0D00 000F 002A 0045 00B7

;movsldup xmm3 xmm0;movupd xmm2 X$Mask5d | pandn xmm3 xmm2
;movupd xmm3 xmm0 | pand xmm3 X$Mask5d;pmaxsw xmm3 X$Mask5a
;movupd xmm4 X$Mask5e | movupd xmm3 xmm0 | pxor xmm3 xmm4;X$Mask5e;pmaxsw xmm3 X$Mask5a
pxor xmm0 X$Mask5e ; remove trash Mask5e
;pand xmm0, X$Mask5a ; ok here
    packuswb xmm0 xmm0          ; Packs bytes, taking the low byte of each word
                                ; XMM0 Before:  0000 000D 0000 000D 000F 002A 0045 00B7
                                ; XMM0 After:   00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7


    movd eax xmm0               ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0F2A45B7 (Correct result)


    mov cl ch
    Test_If ch 00_1000_0000 ; if ch < 0
      add cl 32
      pshufd xmm1, xmm0, {SHUFFLE 2, 2, 2, 2}
      ;pshuflw xmm1, xmm0, {SHUFFLE 1, 0, 1, 0}
      movd edx xmm1

      mov cl 4
      shr edx cl
      mov cl 0
    ;Test_Else
     ;   mov cl ch
    Test_End

    shr eax cl

    mov ecx D@pOutput
    mov D$ecx eax

EndP
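
The shift-index idea from the Examples block can be modelled in plain C. This is a scalar sketch, not a translation of the SSE2 routine: it assumes valid upper-case input, builds the value as if the string were left-aligned in 8 digits (missing positions zeroed, which stands in for masking off the "trash" bytes), and then shifts right by 32 - 4*len. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one upper-case hex digit, using the same arithmetic as the
   psubb/pcmpgtb/pand steps: subtract '0', then subtract 7 more if above 9. */
static unsigned hex_digit(unsigned char c)
{
    unsigned v = (unsigned)(c - '0');
    if (v > 9)
        v -= 7;
    return v & 0x0F;
}

/* Left-align up to 8 digits in a dword, then shift right by 32 - 4*len. */
uint32_t hex_left_aligned_then_shift(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len > 8)
        len = 8;
    if (len == 0)
        return 0;                       /* a shift by 32 is undefined in C */

    uint32_t value = 0;
    for (size_t i = 0; i < 8; i++) {
        /* positions past the end of the string contribute zero */
        unsigned nibble = (i < len) ? hex_digit((unsigned char)s[i]) : 0;
        value = (value << 4) | nibble;
    }
    return value >> (32 - 4 * len);     /* e.g. len = 5 gives a shift of 12 */
}
```

With "A45B7" the left-aligned value is A45B7000h and the shift count is 12, which matches the fourth row of the Examples table.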


Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 09, 2025, 07:30:04 AM
Quote from: jj2007 on March 08, 2025, 10:10:42 PM
Quote from: guga on March 08, 2025, 07:33:12 PMMaybe it could be usefull to also return in eax the amount of bytes converted....thinking
IMHO eax should return the value. Val() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1202), for example, returns the value in eax, the number of bytes used in dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)

Mmm, I just loves functions that return side effects like that. Seriously. Do it all the time in my own code. Goes against the dominant paradigm of "proper programming". Why not?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 08:40:06 AM
Hi NoCforMe

Putting error values in edx is an alternative, but I just don't know if it would be useful for others who don't use assembly. For example, I plan to use the function not only inside the RosAsm code itself to fix some very old bugs, but also to include it in a dll I created some time ago that others can use as well, regardless of whether they code in asm, C, etc. So if I return the error values in edx, I don't know if that would be useful for them.

Personally I prefer to return only what is necessary in eax, leaving the other registers intact for use in other functions.

I'm trying the newer version to see if I can find the proper math to retrieve the index on a qword. Once I succeed I'll think more about which error flags should be returned and how (in a register, like edx, or in an output variable).

Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 09, 2025, 08:50:12 AM
@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 09, 2025, 08:55:51 AM
Quote from: guga on March 09, 2025, 08:40:06 AMPutting error values in edx is an alternative, but i just don´t know if it could be useful for others that don't uses assembly.

Right. C programmers don't have access to anything in registers after a function returns other than the main result in EAX (or RAX). This is an assembly-only thing.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 09, 2025, 08:59:25 AM
Quote from: zedd151 on March 09, 2025, 08:50:12 AM@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?

Well, sure: Windoze does that all the time:
BOOL WINAPI ReadFile (
   HANDLE       hFile,
   LPVOID       lpBuffer,
   DWORD        nNumberOfBytesToRead,
   LPDWORD      lpNumberOfBytesRead,
   LPOVERLAPPED lpOverlapped);

The last 2 parameters are pointers to variables, the first of which gets set to the number of bytes read after the function completes.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 09, 2025, 09:03:20 AM
Obviously. But perhaps guga had not yet thought of that, which is why I mentioned it.  :smiley:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 02:58:43 PM
Quote from: zedd151 on March 09, 2025, 08:50:12 AM@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?

Yep, that's the idea: using other arguments to pass the result, and eax for the error values.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 03:01:07 PM
Quote from: NoCforMe on March 09, 2025, 08:59:25 AM
Quote from: zedd151 on March 09, 2025, 08:50:12 AM@guga,
Rather than returning values in more registers, maybe use another argument to pass the address of a structure or variable(s), for the function to fill with additional info that might be required by the caller?

Well, sure: Windoze does that all the time:
BOOL WINAPI ReadFile (
   HANDLE       hFile,
   LPVOID       lpBuffer,
   DWORD        nNumberOfBytesToRead,
   LPDWORD      lpNumberOfBytesRead,
   LPOVERLAPPED lpOverlapped);

The last 2 parameters are pointers to variables, the first of which gets set to the number of bytes read after the function completes.

Yep, this is what I plan to do: use other arguments to store the returned values of the conversion and leave eax for the error return.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 09, 2025, 04:01:44 PM
Quote from: guga on March 09, 2025, 03:01:07 PMYep, this is what i plan to do. Using other arguments to store the returned values of the conversion and leave eax to the error return.
:thumbsup:

But shouldn't eax/rax hold the returned value of the conversion unless an error condition is met, and in that case return a defined error code in eax/rax instead? That seems more logical, and it probably aligns better with most usage from what I have seen.

The other variables (address passed as arguments - if used) can hold any other additional info needed by the caller.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 09, 2025, 04:30:35 PM
Quote from: zedd151 on March 09, 2025, 04:01:44 PMBut shouldn't  eax/rax hold either the returned value of the conversion, unless an error condition is met, then return a defined error code in eax/rax instead?
Well, think about it:
You could do that, so long as the conversion values are all positive, in which case the error codes would have to be negative.

If the conversion value could be either positive or negative, this wouldn't work.

It's the old problem of trying to return two things at once. Probably better to return the error status in EAX and the converted value in a pointed-to variable.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 09, 2025, 04:31:39 PM
Good point. I forgot about possible intended negative conversion values.   :rolleyes:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 05:07:20 PM
Quote from: zedd151 on March 09, 2025, 04:01:44 PM
Quote from: guga on March 09, 2025, 03:01:07 PMYep, this is what i plan to do. Using other arguments to store the returned values of the conversion and leave eax to the error return.
:thumbsup:

But shouldn't  eax/rax hold either the returned value of the conversion, unless an error condition is met, then return a defined error code in eax/rax instead?  Seems more logical. And probably aligns better with most usage from what I have seen.

The other variables (address passed as arguments - if used) can hold any other additional info needed by the caller.
Ok, but the new version I'm making converts a qword (64 bits), so it can't be returned in eax, because it won't fit. So, to avoid using edx for the other half of the qword, it is better to pass the value back in an output buffer pointed to by a parameter of the function.

Btw, if the user wants to convert only a 32-bit hexadecimal value (a dword), he could also use the function made to convert a qword, because the output buffer will fit both types of input anyway. That's the problem I'm facing right now: finding the proper index for both cases without affecting performance.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 09, 2025, 05:08:10 PM
Quote from: NoCforMe on March 09, 2025, 04:30:35 PM
Quote from: zedd151 on March 09, 2025, 04:01:44 PMBut shouldn't  eax/rax hold either the returned value of the conversion, unless an error condition is met, then return a defined error code in eax/rax instead?
Well, think about it:
You could do that, so long as the conversion values are all positive, in which case the error codes would have to be negative.

If the conversion value could be either positive or negative, this wouldn't work.

It's the old problem of trying to return two things at once. Probably better to return the error status in EAX and the converted value in a pointed-to variable.

Agree
Title: Re: AsciiHextoDword (SSE2 version)
Post by: sinsi on March 09, 2025, 05:09:03 PM
Quote from: NoCforMe on March 09, 2025, 04:30:35 PMProbably better to return the error status in EAX and the converted value in a pointed-to variable.
That's what I am starting to do after noticing that Windows (interfaces for example) will just return a HRESULT in EAX.
Any returned variables are returned via an address passed to the function.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 09, 2025, 05:48:50 PM
Quote from: sinsi on March 09, 2025, 05:09:03 PM
Quote from: NoCforMe on March 09, 2025, 04:30:35 PMProbably better to return the error status in EAX and the converted value in a pointed-to variable.
That's what I am starting to do after noticing that Windows (interfaces for example) will just return a HRESULT in EAX.
Any returned variables are returned via an address passed to the function.
On the other hand:
Since I assume you're writing assembly-language code, why not just use EDX (RDX) to pass the other value? As long as your code isn't being used by any high-level language, why not use the easy way? That's what I do. (Just make sure you self-document this behavior.)
Title: Re: AsciiHextoDword (SSE2 version)
Post by: sinsi on March 09, 2025, 06:04:48 PM
Quote from: NoCforMe on March 09, 2025, 05:48:50 PM
Quote from: sinsi on March 09, 2025, 05:09:03 PM
Quote from: NoCforMe on March 09, 2025, 04:30:35 PMProbably better to return the error status in EAX and the converted value in a pointed-to variable.
That's what I am starting to do after noticing that Windows (interfaces for example) will just return a HRESULT in EAX.
Any returned variables are returned via an address passed to the function.
On the other hand:
Since I assume you're writing assembly-language code, why not just use EDX (RDX) to pass the other value? As long as your code isn't being used by any high-level language, why not use the easy way? That's what I do. (Just make sure you self-document this behavior.)
It's not following the ABI
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 09, 2025, 06:09:43 PM
ABI? Schmabi.

At least for 32-bit programs: in my own code I feel free to do this, since EDX is a "scratch" register anyhow. The WinAPI isn't even going to know what I'm doing behind its back, and what it doesn't know can't hurt me.

Dunno about x64, but really, what's stopping you from using RDX to pass parameters back (that is, from callee to caller)? It's a volatile register, so who cares?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: sinsi on March 09, 2025, 07:08:44 PM
All I have to remember is that EAX=0 is success.
Same way that in 64-bit code I always allocate shadow space, even though I don't use it.

Habit. Don't have to think.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: jj2007 on March 09, 2025, 08:17:49 PM
Quote from: NoCforMe on March 09, 2025, 05:48:50 PMwhy not just use EDX (RDX) to pass the other value?

That's what I do sometimes. For example, Instr_ (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1153)() returns the index in edx and the pointer to the match in eax. It's documented and it's not meant for use in C/C++ :cool:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: TimoVJL on March 10, 2025, 01:03:09 AM
Quote from: jj2007 on March 09, 2025, 08:17:49 PM
Quote from: NoCforMe on March 09, 2025, 05:48:50 PMwhy not just use EDX (RDX) to pass the other value?

That's what I do sometimes. For example, Instr_ (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1153)() returns the index in edx and the pointer to the match in eax. It's documented and it's not meant for use in C/C++ :cool:
Quote
When returning struct/class,
Plain old data (POD) return values 32 bits or smaller are in the EAX register
POD return values 33–64 bits in size are returned via the EAX:EDX registers.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 10, 2025, 04:02:11 AM
Ok, guys, done, but I'm in doubt about how the data should be laid out.


; ---------------------------------------------------------------------------

ShiftTbl        struc ; (sizeof=0x20, mappedto_30)
ShiftTbl_Data0  dd 28
ShiftTbl_Data1  dd 24                    ; base 10
ShiftTbl_Data2  dd 20                    ; base 10
ShiftTbl_Data3  dd 16                    ; base 10
ShiftTbl_Data4  dd 12                    ; base 10
ShiftTbl_Data5  dd 8
ShiftTbl_Data6  dd 4
ShiftTbl_Data7  dd 0
ShiftTbl        ends

the same as: ShiftTbl  ShiftTbl <28, 24, 20, 16, 12, 8, 4, 0>


; =============== S U B R O U T I N E =======================================

; Attributes: bp-based frame

AsciiHex2dw_Ex4 proc near               ; CODE XREF: start+2D↑p
                                        ; .text:0042BF57↑j

Lenght          = dword ptr -4
pString         = dword ptr  8
pOutput         = dword ptr  0Ch

                push    ebp
                mov     ebp, esp
                sub     esp, 4
                push    ecx
                mov     eax, [ebp+pString]
                movdqu  xmm0, xmmword ptr [eax]
                xorps   xmm1, xmm1
                pcmpeqb xmm0, xmm1
                pmovmskb ecx, xmm0
                bsf     cx, cx
                jnz     short loc_42BF84
                mov     ecx, 16

loc_42BF84:                             ; CODE XREF: AsciiHex2dw_Ex4+1D↑j
                mov     [ebp+Lenght], ecx
                dec     ecx
                movdqu  xmm0, xmmword ptr [eax]
                psubb   xmm0, Mask1
                movdqa  xmm1, xmm0
                pcmpgtb xmm1, Mask2
                pand    xmm1, Mask3
                psubb   xmm0, xmm1
                movdqa  xmm1, xmm0
                pand    xmm1, Mask4a
                pxor    xmm0, xmm1
                psllw   xmm0, 4
                pslld   xmm0, 8
                por     xmm0, xmm1
                pshuflw xmm0, xmm0, 1Bh
                psrld   xmm0, 8
                psllw   xmm0, 8
                psrlw   xmm0, 8
                movdqa  xmm1, xmm0
                packuswb xmm0, xmm0
                pshufd  xmm1, xmm1, 1Eh
                pshuflw xmm1, xmm1, 1Bh
                packuswb xmm1, xmm1
                movd    eax, xmm0
                mov     edi, [ebp+pOutput]
                mov     ecx, ShiftTbl[ecx*4]
                cmp     [ebp+Lenght], 8
                jbe     short loc_42C014
                movd    ecx, xmm1
                mov     [edi+4], ecx
                mov     ecx, 0

loc_42C014:                             ; CODE XREF: AsciiHex2dw_Ex4+A6↑j
                shr     eax, cl
                mov     [edi], eax
                mov     eax, [ebp+Lenght]
                pop     ecx
                mov     esp, ebp
                pop     ebp
                retn    8
AsciiHex2dw_Ex4 endp
Mask1, Mask2, Mask3 and Mask4a are the same as before. I just added a table of indexes (previously it contained all the indexes, but since I'm in doubt I placed only the first 8 here).


it works like:
[SzInputHex:  B$ "43210F2A45B7", 0 ] ; our string
[Output: Q$ 0] ; A buffer with 8 bytes long

call AsciiHex2dw_Ex4 SzInputHex, Output

The problem is that, currently it displays the data in this order:
Output dd 43210F2A
       dd 000045B7

But is it supposed to return the data like this? Is that correct, or should it be:
Output dd 00004321
       dd 0F2A45B7
or
Output dd 43210F2A
       dd 45B70000

Right ?

In eax it returns the size of the string

I'm a bit confused, because the RosAsm debugger shows the return value as follows (which is supposed to be correct):

Output dd 43210F2A
       dd 000045B7

But the IDA Pro debugger is showing me this in memory:

Output dd 43210F2Ah
       dd 45B70000h
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 10, 2025, 04:44:47 AM
Is it producing the correct result, and perhaps IDA just displays the order of the data differently than RosAsm does, or am I missing something?

I tested with the other version (The one that works only for a dword), and if i use a sequence, like:
    mov edi SzInputHex
    call AsciiHex2dwNew edi
    mov D$Output eax
    add edi 8
    call AsciiHex2dwNew edi
    mov D$Output+4 eax

It correctly displays the result as:
Output dd 43210F2A
       dd 000045B7

Which is the same result on the Qword version (AsciiHex2dw_Ex4 ). Btw..i´ll rename the function to AsciiHex2Qword or AsciiHex2_Ex or something like that to distinguish from the Dword version.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 10, 2025, 04:52:29 AM
Damn, forget the last post.

It only shows the proper result in one test I made.

The former shift table i was using during the tests was correct. I´ll review the code and set the proper indexes.

It won´t work if the input is
[SzInputHex:  B$ "6543210F2A45B7", 0 ]
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 10, 2025, 05:07:53 AM
That's alright guga. Just keep this topic open as a "Work In Progress" topic.

Once you have a version that is 100% polished and ready, you can give it its own topic to showcase it, away from all of the distractions posted here (by myself included.  :tongue: )

:thumbsup:  We have faith in you.  :smiley:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 10, 2025, 05:55:21 AM
Tks Zedd

But, what should be the proper result ? I´m a bit confused right now.

For example, say the input is this string:
[SzInputHex:  B$ "6543210F2A45B7", 0 ]

How should the output be displayed/stored? I mean, in both dwords (this new version now saves 2 dwords). Should it be displayed as

[Output: D$ 654321
0F2A45B7]
or
[Output: D$ 6543210F
02A45B7]
or
[Output: D$ 6543210F
2A45B700]

????


Returning this:
[Output: D$ 6543210F
02A45B7]

Is the same as if i used the string twice on the Dword version of the function AsciiHex2dwNew
    mov edi SzInputHex
    call AsciiHex2dwNew edi
    mov D$Output eax
    add edi 8
    call AsciiHex2dwNew edi
    mov D$Output+4 eax

Which is the expected result, considering we are reading the string from left to right. So we first process the first 8 chars (6543210F) and then continue with the remaining 6 (2A45B7).

But on the new version that handles 2 dwords at once (AsciiHex2dw_Ex4), should it produce the same result as a sequence of AsciiHex2dwNew functions, or a different one ?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 10, 2025, 06:13:52 AM
Quote from: guga on March 10, 2025, 05:55:21 AMFor example, say the input is this string:
[SzInputHex:  B$ "6543210F2A45B7", 0 ]


the output it seems should show:
00654321h for first dword,  and  0F2A45B7h for the second dword.

or a qword of  006543210F2A45B7h
;---------------------------------------------------------------------
Otherwise you could end up with
6543210Fh  for first dword and  002A45B7h for the second NOT what you really want.

or a qword of
6543210F002A45B7h NOT what you really want.

One of the results has a single zero inserted...

[Output: D$ 6543210F
02A45B7]  which is definitely not right.

Think of the first dword as carried hex digits (carried from the last 8 bytes, or second dword)
Maybe process the rightmost 8 bytes first? That is about the best way that I can articulate the way I see this problem.

Consider the first dword and the second as if they were joined into a qword, in the order of bytes in them.
Or maybe easier to do a 64 bit version first?
This way, you will have a way to compare for correct results?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: TimoVJL on March 10, 2025, 06:27:48 AM
6543210F2A45B7
==
00654321 0F2A45B7
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 10, 2025, 06:29:59 AM
Quote from: TimoVJL on March 10, 2025, 06:27:48 AM6543210F2A45B7
==
00654321 0F2A45B7
That's what I said, in maybe too many words.  :biggrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 10, 2025, 08:02:58 AM
Thanks Guga for the interesting algorithm :thumbsup:

I rewrote it with some improvements, like expanding the input string range to work with different lengths from 1 to 8 bytes. :smiley:

; MASM64 SSE2 implementation of AsciiHexToDword for strings with a length from 1 to 8 bytes

.data
align   16                                           ; Masks (16-byte aligned for XMM operations)  
    Mask1   oword 30303030303030303030303030303030h  ; '0' (0x30) repeated 16 times
    Mask2   oword 09090909090909090909090909090909h  ; 0x09 repeated 16 times (threshold for digits)
    Mask3   oword 07070707070707070707070707070707h  ; 0x07 repeated 16 times (adjustment for A-F)
    Mask4   oword 0F000F000F000F000F000F000F000F00h  ; Mask to isolate low nibbles
; Masks for lowercase conversion
    LowerMinMinus1  db 60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h   ; 'a' - 1 (0x60)
    LowerMax        db 66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h   ; 'f' (0x66)
    AdjustLowercase db 20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h   ; 0x20 to convert to uppercase
align   16
    szTest   db "0f2A45b7",0                        ; Example input (8 bytes)
    ;szTest  db "f2A45",0                           ; Example input (5 bytes)
    ;szTest  db "0f2A45b78",0                       ; Example input (9 bytes)
    ;szTest  db "f",0                               ; Example input (1 byte small case)
    ;szTest  db "0",0                               ; Example input (1 byte=0)

.code
;***************************************************;
    ; Function:AsciiHexTodw_Og
    ; Input: rcx = pointer to the null-terminated ASCII hex string (1 to 8 characters)
    ; Output: eax = 32-bit DWORD result
    ; On entry: lea  rcx,szTest
    ;           call AsciiHexTodw_Og
;***************************************************;
align 16
AsciiHexTodw_Og PROC                                ; rcx = pointer to the input string
; Load input bytes and apply length limit to 8 bytes
    movq  xmm0, qword ptr [rcx]                     ; Load 8 bytes from input in xmm0
    mov   eax, 1                                    ; XMM0 = 0000000000000000-3762353441326630            
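    cmp   byte ptr[rcx],0                           ; Empty string?
    je    @Ret                                      ; Yes: return early (eax still holds 1)
@@:                                                 ; Count the characters into eax
    cmp   byte ptr[rcx+rax],0                       ; Terminator reached?
    je    @f 
    add   eax,1 
    cmp   eax,8                                     ; Stop counting at 8 characters
    jb    @b
@@:
    lea   rcx,[rax-8]                               ; rcx = length - 8 (rax=8 -> rcx=0)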
; Convert lowercase letters to uppercase
    movdqa   xmm1, xmm0                             ; XMM1 = 0000000000000000-3762353441326630    
    pcmpgtb  xmm1, xmmword ptr [LowerMinMinus1]     ; Check if >= 'a' (61h)
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00
    movdqa   xmm3, xmm0                             ; XMM3 = 0000000000000000-3762353441326630
    pcmpgtb  xmm3, xmmword ptr [LowerMax]           ; Check if > 'f' (66h)
                                                    ; XMM3 = 0000000000000000-0000000000000000  
    pxor     xmm1, xmm3                             ; XMM1 = FF where 'a' <= char <= 'f'
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00 
    pand     xmm1, xmmword ptr [AdjustLowercase]    ; Apply 20h adjustment
                                                    ; XMM1 = 0000000000000000-0020000000002000
    psubb    xmm0, xmm1                             ; Convert lowercase to uppercase
                                                    ; XMM0 = 0000000000000000-3742353441324630
; Subtract '0' to convert ASCII to numeric values
    psubb    xmm0, xmmword ptr [Mask1]              ; Digits become 0-9, A-F become 11h-16h
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-0712050411021600
; Adjust for A-F (values > 9)
    movdqa   xmm1, xmm0                             ; XMM1 = D0D0D0D0D0D0D0D0-0712050411021600    
    pcmpgtb  xmm1, xmmword ptr [Mask2]              ; XMM1 = FF where value > 9
                                                    ; XMM1 = 0000000000000000-00FF0000FF00FF00
    pand     xmm1, xmmword ptr [Mask3]              ; Apply 7 adjustment
                                                    ; XMM1 = 0000000000000000-0007000007000700    
    psubb    xmm0, xmm1                             ; Subtract 7 from A-F values
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-070B05040A020F00
; Combine nibbles into bytes
    movdqa   xmm1, xmm0                             ; XMM1 = D0D0D0D0D0D0D0D0-070B05040A020F00
    pand     xmm1, xmmword ptr [Mask4]              ; Isolate low nibbles
                                                    ; XMM1 = 0000000000000000-070005000A000F00    
    pxor     xmm0, xmm1                             ; Isolate high nibbles
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-000B000400020000
    psllw    xmm0, 4                                ; Shift high nibbles left by 4 bits
                                                    ; XMM0 = 0D000D000D000D00-00B0004000200000
    pslld    xmm0, 8                                ; Align high nibbles
                                                    ; XMM0 = 000D0000000D0000-B000400020000000
    por      xmm0, xmm1                             ; Combine high and low nibbles
                                                    ; XMM0 = 000D0000000D0000-B70045002A000F00
    psrld    xmm0, 8                                ; Align to lower 32 bits
                                                    ; XMM0 = 00000D0000000D00-00B70045002A000F
    packuswb xmm0, xmm0                             ; Pack bytes into lower 32 bits
                                                    ; XMM0 = 00FF00FFB7452A0F-00FF00FFB7452A0F     
    neg      ecx                                    ; ecx = 8 - length (0 here)
    movd     eax,  xmm0                             ; Move result to eax
                                                    ; RAX = 00000000B7452A0F   
    shl      rcx,  2                                ; rcx = 4 * (8 - length), the shift count in bits
    bswap    eax                                    ; Correct byte order in eax
                                                    ; RAX = 000000000F2A45B7  -> End Result      
    shr      eax,  cl                               ; Drop the nibbles past the end of a short string
@Ret:
    ret
AsciiHexTodw_Og ENDP
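
For anyone who wants to check results without stepping through the XMM registers, here is a minimal scalar sketch in C of what the routine above computes. It is only a model: the name ascii_hex_to_dword_model is made up for illustration, each loop iteration stands in for one byte lane, and it copies characters one at a time instead of doing the 8-byte movq load. Returning 0 for an empty string is an assumption of the sketch (the listing as posted returns with eax = 1 in that case).

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of the SSE2 steps; hypothetical name, not part of the posted code. */
uint32_t ascii_hex_to_dword_model(const char *s)
{
    uint8_t lane[8] = {0};
    size_t len = 0;

    /* Copy at most 8 characters and stop at the terminator. */
    while (len < 8 && s[len] != '\0') {
        lane[len] = (uint8_t)s[len];
        len++;
    }
    if (len == 0)
        return 0;                       /* assumption: empty input yields 0 */

    for (size_t i = 0; i < 8; i++) {
        uint8_t c = lane[i];
        /* pcmpgtb / pxor / pand / psubb: fold 'a'..'f' to 'A'..'F' */
        if (c > 0x60 && c <= 0x66)
            c -= 0x20;
        /* psubb Mask1: subtract '0' */
        c = (uint8_t)(c - 0x30);
        /* pcmpgtb Mask2 is a signed compare, so 0xD0 (from a zero byte) is not "> 9" */
        if ((int8_t)c > 9)
            c -= 7;
        lane[i] = c;
    }

    /* Each character pair becomes one byte; the first character is the high nibble. */
    uint32_t value = 0;
    for (size_t i = 0; i < 8; i += 2) {
        uint8_t byte = (uint8_t)((lane[i] << 4) | (lane[i + 1] & 0x0F));
        value = (value << 8) | byte;
    }

    /* shr eax, cl with cl = 4 * (8 - length) */
    return value >> (4 * (8 - len));
}
```

With the test strings from the .data section, "0f2A45b7" should give 0F2A45B7h, "f2A45" should give 000F2A45h, and the 9-character "0f2A45b78" should give the same value as its first 8 characters.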
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 10, 2025, 03:24:26 PM
Tks Ognil

Can u port this to 32 Bits, pls ?  (I cannot use 64 Bit versions yet)

I tried to port the modifications you did but i failed in some values, such as:

[SzInputHex:  B$ "18F2A45B7", 0 ]

It should return
00000001 8F2A45B7

or return an error case since the string is odd, correct ?


Can u test this string on your version and tell me what is the result ?
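
As a reference for the expected values, here is a plain scalar sketch in C (not SSE2, and the helper name hex_to_two_dwords is hypothetical). It assumes an odd-length string is read as if it had a leading zero rather than being rejected, which is one possible answer to the question above, not the only one.

```c
#include <stdint.h>

/* Hypothetical reference helper: converts 1 to 16 hex digits into two dwords.
   Returns the number of digits, or -1 on empty, too long or invalid input. */
int hex_to_two_dwords(const char *s, uint32_t *hi, uint32_t *lo)
{
    uint64_t value = 0;
    int len = 0;

    for (; s[len] != '\0'; len++) {
        char c = s[len];
        unsigned digit;

        if (len == 16)
            return -1;                          /* more than 16 digits */
        if (c >= '0' && c <= '9')
            digit = (unsigned)(c - '0');
        else if (c >= 'A' && c <= 'F')
            digit = (unsigned)(c - 'A' + 10);
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned)(c - 'a' + 10);
        else
            return -1;                          /* not a hex digit */
        value = (value << 4) | digit;
    }
    if (len == 0)
        return -1;

    *hi = (uint32_t)(value >> 32);              /* printed first */
    *lo = (uint32_t)value;                      /* printed second */
    return len;
}
```

Under that assumption, "18F2A45B7" comes out as hi = 00000001 and lo = 8F2A45B7.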
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 10, 2025, 03:31:26 PM
Quote from: guga on March 10, 2025, 03:24:26 PMCan u test this string on your version and tell me what is the result ?
His version only works for up to an 8 byte string, guga. Not 16 bytes.

Quote from: ognil on March 10, 2025, 08:02:58 AMI rewrote it with some improvements like expanding the output string range to work with different lengths from 1 to 8 bytes. :smiley:
I have assembled his version and can confirm, it only works for up to an 8 byte string, returning a dword value only. Inputting a longer string (up to 16 bytes) will result in only the first 8 bytes being processed.

    include \masm64\include64\masm64rt.inc

.data
align  16                                          ; Masks (16-byte aligned for XMM operations) 
    Mask1  oword 30303030303030303030303030303030h  ; '0' (0x30) repeated 16 times
    Mask2  oword 09090909090909090909090909090909h  ; 0x09 repeated 16 times (threshold for digits)
    Mask3  oword 07070707070707070707070707070707h  ; 0x07 repeated 16 times (adjustment for A-F)
    Mask4  oword 0F000F000F000F000F000F000F000F00h  ; Mask to isolate low nibbles
; Masks for lowercase conversion
    LowerMinMinus1  db 60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h  ; 'a' - 1 (0x60)
    LowerMax        db 66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h  ; 'f' (0x66)
    AdjustLowercase db 20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h  ; 0x20 to convert to uppercase
   

    string1 db "123456789ABCDEF", 0
   
.code

start proc

    invoke AsciiHexTodw_Og, addr string1

    invoke MessageBox, 0, hex$(rax), 0, 0
    invoke ExitProcess, eax
start endp

; MASM64 SSE2 implementation of AsciiHexToDword for strings with a length from 1 to 8 bytes

.code
;***************************************************;
    ; Function:AsciiHexTodw_Og
    ; Input: rcx = pointer to a zero-terminated ASCII hex string, 1 to 8 characters long
    ;        (movq always reads 8 bytes, so the buffer must be readable that far)
    ; Output: eax = 32-bit DWORD result
    ; On entry: lea  rcx,szTest
    ;          call AsciiHexTodw_Og
;***************************************************;
align 16
AsciiHexTodw_Og PROC                                ; rcx = pointer to the input string
; Load input bytes and apply length limit to 8 bytes
    movq  xmm0, qword ptr [rcx]                    ; Load 8 bytes from input in xmm0
    mov  eax, 1                                    ; XMM0 = 0000000000000000-3762353441326630           
    cmp  byte ptr[rcx],0
    je    @Ret
@@:
    cmp  byte ptr[rcx+rax],0
    je    @f
    add  eax,1
    cmp  eax,8
    jb    @b
@@:
    lea  rcx,[rax-8]                              ; rax=8 -> rcx=0 
; Convert lowercase letters to uppercase
    movdqa  xmm1, xmm0                            ; XMM1 = 0000000000000000-3762353441326630   
    pcmpgtb  xmm1, xmmword ptr [LowerMinMinus1]    ; Check if >= 'a' (61h)
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00
    movdqa  xmm3, xmm0                            ; XMM3 = 0000000000000000-3762353441326630
    pcmpgtb  xmm3, xmmword ptr [LowerMax]          ; Check if > 'f' (66h)
                                                    ; XMM3 = 0000000000000000-0000000000000000 
    pxor    xmm1, xmm3                            ; XMM1 = FF where 'a' <= char <= 'f'
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00
    pand    xmm1, xmmword ptr [AdjustLowercase]    ; Apply 20h adjustment
                                                    ; XMM1 = 0000000000000000-0020000000002000
    psubb    xmm0, xmm1                            ; Convert lowercase to uppercase
                                                    ; XMM0 = 0000000000000000-3742353441324630
; Subtract '0' to convert ASCII to numeric values
    psubb    xmm0, xmmword ptr [Mask1]              ; Digits become 0-9, A-F become 11h-16h
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-0712050411021600
; Adjust for A-F (values > 9)
    movdqa  xmm1, xmm0                            ; XMM1 = D0D0D0D0D0D0D0D0-0712050411021600   
    pcmpgtb  xmm1, xmmword ptr [Mask2]              ; XMM1 = FF where value > 9
                                                    ; XMM1 = 0000000000000000-00FF0000FF00FF00
    pand    xmm1, xmmword ptr [Mask3]              ; Apply 7 adjustment
                                                    ; XMM1 = 0000000000000000-0007000007000700   
    psubb    xmm0, xmm1                            ; Subtract 7 from A-F values
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-070B05040A020F00
; Combine nibbles into bytes
    movdqa  xmm1, xmm0                            ; XMM1 = D0D0D0D0D0D0D0D0-070B05040A020F00
    pand    xmm1, xmmword ptr [Mask4]              ; Isolate low nibbles
                                                    ; XMM1 = 0000000000000000-070005000A000F00   
    pxor    xmm0, xmm1                            ; Isolate high nibbles
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-000B000400020000
    psllw    xmm0, 4                                ; Shift high nibbles left by 4 bits
                                                    ; XMM0 = 0D000D000D000D00-00B0004000200000
    pslld    xmm0, 8                                ; Align high nibbles
                                                    ; XMM0 = 000D0000000D0000-B000400020000000
    por      xmm0, xmm1                            ; Combine high and low nibbles
                                                    ; XMM0 = 000D0000000D0000-B70045002A000F00
    psrld    xmm0, 8                                ; Align to lower 32 bits
                                                    ; XMM0 = 00000D0000000D00-00B70045002A000F
    packuswb xmm0, xmm0                            ; Pack bytes into lower 32 bits
                                                    ; XMM0 = 00FF00FFB7452A0F-00FF00FFB7452A0F   
    neg      ecx                                    ; ecx=0 
    movd    eax,  xmm0                            ; Move result to eax
                                                    ; RAX = 00000000B7452A0F 
    shl      rcx,  2                                ; RCX = 0000000000000000
    bswap    eax                                    ; Correct byte order in eax
                                                    ; RAX = 000000000F2A45B7  -> End Result     
    shr      eax,  cl                              ; RCX = 0000000000000000
@Ret:
    ret
AsciiHexTodw_Og ENDP

end

Quote from: ognil on March 10, 2025, 08:02:58 AM; Function:AsciiHexTodw_Og
    ; Input: ecx = pointer to the 8-byte ASCII hex string with length less then 8 bytes
    ; Output: eax = 32-bit DWORD result


input = "123456789ABCDEF"    <-------  the trailing chars "9ABCDEF" here, guga, will never get processed using ognil's code.
result = 12345678h  in rax/eax

:biggrin:

I did try to convert it to 32 bit, but had too many issues with doing that. I don't have enough coding mojo I guess.   :toothy: 
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 12:13:04 AM
Quote from: zedd151 on March 10, 2025, 03:31:26 PM
Quote from: guga on March 10, 2025, 03:24:26 PMCan u test this string on your version and tell me what is the result ?
His version only works for up to an 8 byte string, guga. Not 16 bytes.


Hi  zedd151

His version has some minor flaws, but I'm trying to adapt it to my code, which handles 2 dwords at once. I liked the way he used bswap (at the cost of only 2 clock cycles, I presume), which avoids some other SSE opcodes and makes the algo a bit shorter (and perhaps a bit faster). The problem lies with strings longer than 8 bytes. I'm reviewing the tables in order to adjust the position and shift when a string is identified as being longer than 8 chars. I'm just concerned about how much performance it will cost, but I guess I'm close to a solution.
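
To make the bswap point concrete, here is a small sketch in C. It assumes the packed result bytes sit in memory in string order (0F 2A 45 B7 for "0F2A45B7"), so a little-endian dword load sees them reversed. The function names bswap32 and packed_bytes_to_value are illustrative only, and bswap32 is a portable stand-in for the instruction, not a statement about its timing.

```c
#include <stdint.h>

/* Portable stand-in for the x86 bswap instruction. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Models "movd eax, xmm0" followed by "bswap eax" on little-endian x86. */
uint32_t packed_bytes_to_value(const uint8_t bytes[4])
{
    /* The byte at the lowest address lands in the low 8 bits of the register. */
    uint32_t loaded = (uint32_t)bytes[0]
                    | ((uint32_t)bytes[1] << 8)
                    | ((uint32_t)bytes[2] << 16)
                    | ((uint32_t)bytes[3] << 24);
    return bswap32(loaded);
}
```

For the bytes 0F 2A 45 B7 the load gives B7452A0Fh and the swap gives 0F2A45B7h, matching the register comments in Ognil's listing.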
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 04:31:22 AM
Hey guga, since you are effectively trying to convert ascii hex to a qword (via two dwords) maybe the topic title should be changed?

Either AsciiHextoDwords (with an 's' at the end) or alternatively AsciiHextoQword???
Maybe that is why ognil only processed a single dword in his code? "AsciiHextoDword (SSE2 version)"
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 05:04:40 AM
Maybe... I started with a dword, now I'm doing the qword version, and later I'll do one for all sizes. Perhaps I'll change the title to HexAscii Conversions (or something).

Btw, I succeeded in making it work for the qword version. I also managed to fix Ognil's optimization for 32 bits, but I haven't tested yet whether it really speeds up the function.

I'll try to port it to Masm and make it work with JJ's benchmark app, which is better for this sort of test IMHO. I'm not used to MasmBasic yet, but I can probably give it a try and see if I can make it work with his tool so we can test the algorithm.

Although I liked the use of bswap, I don't know if it will be good or bad for the speed of the algorithm itself. Once I succeed with the port, I'll post the new version here.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 05:49:15 AM
Quote from: guga on March 11, 2025, 05:04:40 AMPerhaps changing the title to HexAscii Conversions (or something).
I think you can think of a good name for what it will do.  :smiley:  I only offered a suggestion.

Quote from: guga on March 11, 2025, 05:04:40 AMOnce i succeed to port i´ll post it here the new version
:thumbsup:
I wouldn't worry about speed so much. It's more important that it works first, exactly the way that you intend it to work.
Adjustments for speed can always come later.

But, is being faster really necessary though? It would only need to be super fast if it is called many, many times (100's, 1000's or more times) within a given program. If used only once or twice in a program, speed won't make much difference overall. Unless of course it takes way too long to do the conversion only once or twice, like several seconds.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 07:00:50 AM
Hi Zedd

I'm trying to make it fast in order to use it in RosAsm. Currently I'm trying to fix some old bugs and make it work better, but it is hard since all the internal code is a mess, and making the functions work independently, I mean without the need to reuse hundreds of global variables, is true hell. My goal is to make DLLs for RosAsm, such as one each for the encoder, decoder, disassembler and resources editor. The problem is that when we started with RosAsm, we chose at the time to make it work that way, and we allowed contributors to code directly on it, which led to several bugs. Even though contributors could work on RosAsm, the ideal was that they used the same coding style, but it didn't happen that way. Several major functions in RosAsm were written by different people, and many of them (if not all) reused several global variables that should have been made local by default.

That's the reason why we were never able to make it a DLL etc. I succeeded in isolating hundreds of functions and improving them in their own DLLs, such as RosMem.dll (a memory management library that can be used not only for RosAsm but for other purposes), a FastCRT dll, FastMath.dll etc. Now I need to do the same for the encoder, etc., which is true hell, and after working on it for a while I tend to get bored and take a long time to come back to it.

The problem is that I now need RosAsm to be as fixed as possible, especially because I plan to create some plugins for Sony Vegas, VirtualDub, Audacity, etc. That's not an easy task with RosAsm in its current state. Not to mention that I was never able to implement a 64-bit version of it.


Anyway... here is the new version that works for qword (I hope the port to Masm is OK this time)

; ---------------------------------------------------------------------------
ShiftTbl        struc
Distance        db ?
Shift           db ?
IsQword         db ?
Reserved        db ?
ShiftTbl        ends


ShiftTbl2       ShiftTbl <0, 24, 0, 0>          ; length 1: one result byte, same shift as length 2
                ShiftTbl <0, 24, 0, 0>
                ShiftTbl 2 dup(<0, 16, 0, 0>)
                ShiftTbl 2 dup(<0, 8, 0, 0>)
                ShiftTbl 2 dup(<0, 0, 0, 0>)
                ShiftTbl 2 dup(<1, 24, 1, 0>)
                ShiftTbl 2 dup(<2, 16, 1, 0>)
                ShiftTbl 2 dup(<3, 8, 1, 0>)
                ShiftTbl 2 dup(<4, 0, 1, 0>)

                align 16                ; the masks are memory operands of SSE2 instructions and must be 16-byte aligned
MaskOddAdjust   oword 30h
Mask1           oword 30303030303030303030303030303030h
Mask2           oword 9090909090909090909090909090909h
Mask3           oword 7070707070707070707070707070707h
Mask4a          oword 0F000F000F000F000F000F000F000F00h


AsciiHex2dw_Ex5 proc near
TmpStorage1Dis  = dword ptr -18h
TmpStorage2Dis  = dword ptr -14h
TmpStorage3Dis  = dword ptr -10h
TmpStorage      = dword ptr -8
Lenght          = dword ptr -4
pString         = dword ptr  8
pOutput         = dword ptr  0Ch

                push    ebp
                mov     ebp, esp
                sub     esp, 4
                sub     esp, 14h
                mov     [ebp+TmpStorage], esp
                push    ecx
                push    edi
                push    esi
                mov     eax, [ebp+TmpStorage]
                mov     [ebp+TmpStorage1Dis], 0
                mov     [ebp+TmpStorage2Dis], 0
                mov     [ebp+TmpStorage3Dis], 0
                mov     eax, [ebp+pString]
                movdqu  xmm0, xmmword ptr [eax] ; load 16 bytes of the string
                xorps   xmm1, xmm1
                pcmpeqb xmm0, xmm1
                pmovmskb ecx, xmm0
                bsf     cx, cx
                jnz     short loc_42C2A4
                mov     ecx, 10h

loc_42C2A4:
                mov     [ebp+Lenght], ecx
                movdqu  xmm0, xmmword ptr [eax]
                mov     edi, [ebp+TmpStorage]
                test    ecx, 1                  ; odd length?
                jz      short loc_42C2C7
                movdqu  xmmword ptr [edi+1], xmm0 ; shift the string up by one byte (this store ends 1 byte past the 16-byte buffer)
                movdqu  xmm0, xmmword ptr [edi]
                por     xmm0, MaskOddAdjust     ; put a leading '0' into byte 0

loc_42C2C7:
                psubb   xmm0, Mask1
                movdqa  xmm1, xmm0
                pcmpgtb xmm1, Mask2
                pand    xmm1, Mask3
                psubb   xmm0, xmm1
                movdqa  xmm1, xmm0
                pand    xmm1, Mask4a
                pxor    xmm0, xmm1
                psllw   xmm0, 4
                pslld   xmm0, 8
                por     xmm0, xmm1
                psrld   xmm0, 8
                packuswb xmm0, xmm0
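                movdqu  xmmword ptr [edi], xmm0 ; store the packed bytes
                dec     ecx                     ; table index = length - 1
                mov     ecx, dword ptr ShiftTbl2.Distance[ecx*4] ; cl = Distance, ch = Shift, bit 16 = IsQword
                movzx   eax, cl
                mov     eax, [eax+edi]          ; dword starting at offset Distance
                bswap   eax
                mov     esi, [ebp+pOutput]
                test    ecx, 10000h             ; IsQword set (more than 8 digits)?
                jz      short loc_42C334
                mov     [esi+4], eax            ; second dword of the result
                mov     eax, [edi]              ; first 4 packed bytes
                bswap   eax

loc_42C334:
                movzx   ecx, ch                 ; shift count for the first dword
                shr     eax, cl
                mov     [esi], eax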
                mov     eax, [ebp+Lenght]
                pop     esi
                pop     edi
                pop     ecx
                mov     esp, ebp
                pop     ebp
                retn    8
AsciiHex2dw_Ex5 endp
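
The two preparation steps in the listing above (finding the length, then padding an odd-length string) can be sketched in scalar C as follows. This is a model under assumptions, not a port: the names scan_length16 and pad_odd_length are hypothetical, the 16-byte block stands in for XMM0, and the padding drops the last lane instead of writing 17 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Model of pcmpeqb + pmovmskb + bsf: index of the first zero byte, or 16 if none. */
unsigned scan_length16(const uint8_t block[16])
{
    unsigned mask = 0;
    for (unsigned i = 0; i < 16; i++) {
        if (block[i] == 0)
            mask |= 1u << i;            /* pmovmskb: one bit per byte lane */
    }
    if (mask == 0)
        return 16;                      /* bsf sets ZF, so "mov ecx, 10h" runs */

    unsigned index = 0;
    while (((mask >> index) & 1u) == 0) /* bsf: position of the lowest set bit */
        index++;
    return index;
}

/* Model of the odd-length fix-up: move every lane up by one and put '0' in lane 0. */
void pad_odd_length(uint8_t block[16], unsigned len)
{
    if (len & 1u) {
        memmove(block + 1, block, 15);
        block[0] = '0';
    }
}
```

For "18F2A45B7" the scan gives 9, and after padding the block starts with "018F2A45B7", so the nibble pairs line up on byte boundaries again.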




The RosAsm syntax is:


[ShiftTbl2:
ShiftTbl2.Data0: B$ 0, 24, 0, 0 ; Length = 1 ; one result byte, same shift as Length = 2 (2nd byte was 28)
ShiftTbl2.Data1: B$ 0, 24, 0, 0 ; Length = 2 ; ok
ShiftTbl2.Data2: B$ 0, 16, 0, 0 ; Length = 3 ; OK. (2nd byte was 20)
ShiftTbl2.Data3: B$ 0, 16, 0, 0 ; Length = 4; ok
ShiftTbl2.Data4: B$ 0, 8, 0, 0 ; Length = 5 ; OK. (2nd byte was 12)
ShiftTbl2.Data5: B$ 0, 8, 0, 0 ; Length = 6 ; ok
ShiftTbl2.Data6: B$ 0, 0, 0, 0 ; Length = 7 ; OK. (2nd byte was 4)
ShiftTbl2.Data7: B$ 0, 0, 0, 0 ; Length = 8; OK

; now the 2Nd dword (Distance is from the 2nd dword) and ch for the shr the 1st dword
ShiftTbl2.Data8: B$ 1, 24, 1, 0 ; Length = 9 ; OK
ShiftTbl2.Data9: B$ 1, 24, 1, 0 ; Length = 10 ; OK
ShiftTbl2.Data10: B$ 2, 16, 1, 0 ; Length = 11 ; OK
ShiftTbl2.Data11: B$ 2, 16, 1, 0 ; Length = 12 ; OK
ShiftTbl2.Data12: B$ 3, 8, 1, 0 ; Length = 13 ; OK
ShiftTbl2.Data13: B$ 3, 8, 1, 0 ; Length = 14 ; OK
ShiftTbl2.Data14: B$ 4, 0, 1, 0 ; Length = 15 ; OK
ShiftTbl2.Data15: B$ 4, 0, 1, 0];0 ]; Length = 16 ecx = 0 pos = 0

; 1st byte = distance (byte offset of the second dword in the packed result). 2nd byte = shift (shr count for the first dword). 3rd byte = flag for size: true if size > 8 bytes, false otherwise
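
The rows of the table follow one rule, sketched here in C. The struct and function names (ShiftEntry, shift_entry_for_length) are made up for illustration, and the rule assumes odd lengths have already been padded with a leading '0', so a string of length L packs into (L + 1) / 2 bytes.

```c
#include <stdint.h>

/* Hypothetical mirror of one 4-byte table row. */
typedef struct {
    uint8_t distance;   /* byte offset of the second dword in the packed result */
    uint8_t shift;      /* shr count applied to the first dword */
    uint8_t is_qword;   /* 1 when the string is longer than 8 chars */
    uint8_t reserved;
} ShiftEntry;

/* len is the string length, 1 to 16. */
ShiftEntry shift_entry_for_length(unsigned len)
{
    ShiftEntry e = {0, 0, 0, 0};
    unsigned bytes = (len + 1) / 2;     /* packed bytes after odd-length padding */

    if (bytes <= 4) {
        /* dword case: the value sits in the top bytes after bswap */
        e.shift = (uint8_t)(8 * (4 - bytes));
    } else {
        /* qword case: the last 4 packed bytes form the second dword */
        e.distance = (uint8_t)(bytes - 4);
        e.shift    = (uint8_t)(8 * (8 - bytes));
        e.is_qword = 1;
    }
    return e;
}
```

It reproduces the rows for lengths 2 to 16; for length 1 the same rule yields a shift of 24 (one result byte, as for length 2).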

[<16 MaskOddAdjust: Q$ 030, 0, 0, 0]  ; '0'

[<16 Mask1: Q$ 030303030_30303030, 030303030_30303030]  ; '0'
[<16 Mask2: Q$ 09090909_09090909, 09090909_09090909]  ; 9 (highest decimal digit value)
[<16 Mask3: Q$ 07070707_07070707, 07070707_07070707]  ; 7 (gap between '9' and 'A')
[<16 Mask4a: Q$ 0F_00_0F_00_0F_00_0F_00, 0F_00_0F_00_0F_00_0F_00]  ; 0x0F_00_0F_00

[HEXCNV_LONG_STR 00__0000_0001__0000_0000__0000_0000]

Proc AsciiHex2dw_Ex5:
    Arguments @pString, @pOutput
    Local @Lenght
    Structure @TmpStorage 16, @TmpStorage1Dis 0, @TmpStorage2Dis 4, @TmpStorage3Dis 8
    Uses ecx, edi, esi


    mov eax D@TmpStorage | mov D@TmpStorage1Dis 0 | mov D@TmpStorage2Dis 0 | mov D@TmpStorage3Dis 0
    mov eax, D@pString
    movdqu xmm0, X$eax         ; Loads 16 bytes of the string into XMM0
                               ; low qword of XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; get the size of the string to calculate a index to be shifted at the end
    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb ecx xmm0 | bsf cx cx | jnz L1> | mov ecx 16 | L1:
    mov D@Lenght ecx
;;
; qword
String: 876543210F2A45B7    - CH: 64 - CL:  4 - CX: 16388
String: 76543210F2A45B7     - CH: 4  - CL: 64 - CX: 1088
String: 6543210F2A45B7      - CH: 8  - CL: 60 - CX: 2108
String: 543210F2A45B7       - CH: 12 - CL: 56 - CX: 3128
String: 43210F2A45B7        - CH: 16 - CL: 52 - CX: 4148
String: 3210F2A45B7         - CH: 20 - CL: 48 - CX: 5168
String: 210F2A45B7          - CH: 24 - CL: 44 - CX: 6188
String: 10F2A45B7           - CH: 28 - CL: 40 - CX: 7208
;dword
String: 0F2A45B7            - CH: 32 - CL: 36 - CX: 8228
String: F2A45B7             - CH: 36 - CL: 32 - CX: 9248
String: 2A45B7              - CH: 40 - CL: 28 - CX: 10268
String: A45B7               - CH: 44 - CL: 24 - CX: 11288
String: 45B7                - CH: 48 - CL: 20 - CX: 12308
String: 5B7                 - CH: 52 - CL: 16 - CX: 13328
String: B7                  - CH: 56 - CL: 12 - CX: 14348
String: 7                   - CH: 60 - CL:  8 - CX: 15368


; qword
String: 876543210F2A45B7    - CH: 64    - CL:  0 - CX: 16384
String: 76543210F2A45B7     - CH: 4     - CL: 60 - CX: 1084
String: 6543210F2A45B7      - CH: 8     - CL: 56 - CX: 2104
String: 543210F2A45B7       - CH: 12    - CL: 52 - CX: 3124
String: 43210F2A45B7        - CH: 16    - CL: 48 - CX: 4144
String: 3210F2A45B7         - CH: 20    - CL: 44 - CX: 5164
String: 210F2A45B7          - CH: 24    - CL: 40 - CX: 6184
String: 10F2A45B7           - CH: 28    - CL: 36 - CX: 7204

;dword
String: 0F2A45B7            - CH: 32    - CL: 32 - CX: 8224
String: F2A45B7             - CH: 36    - CL: 28 - CX: 9244
String: 2A45B7              - CH: 40    - CL: 24 - CX: 10264
String: A45B7               - CH: 44    - CL: 20 - CX: 11284
String: 45B7                - CH: 48    - CL: 16 - CX: 12304
String: 5B7                 - CH: 52    - CL: 12 - CX: 13324
String: B7                  - CH: 56    - CL:  8 - CX: 14344
String: 7                   - CH: 60    - CL:  4 - CX: 15364



    Examples:
    0F2A45B7 = shr eax 0,  ax = 8 => 32-32 = 32-(8*4) = 0*4
     F2A45B7 = shr eax 4,  ax = 7 => 32-28 = 32-(7*4) = 1*4
      2A45B7 = shr eax 8,  ax = 6 => 32-24 = 32-(6*4) = 2*4
       A45B7 = shr eax 12, ax = 5 => 32-20 = 32-(5*4) = 3*4
        45B7 = shr eax 16, ax = 4 => 32-16 = 32-(4*4) = 4*4
         5B7 = shr eax 20, ax = 3 => 32-12 = 32-(3*4) = 5*4
          B7 = shr eax 24, ax = 2 => 32-8  = 32-(2*4) = 6*4
           7 = shr eax 28, ax = 1 => 32-4  = 32-(1*4) = 7*4
;;

    movdqu xmm0, X$eax
    mov edi D@TmpStorage
    Test_If ecx 00_0000_0001 ; Check if the length of the number is odd and adjust the input accordingly. So, check for 9, 11, 13, 15
        movdqu X$edi+1 xmm0 | movdqu xmm0 X$edi | por xmm0 X$MaskOddAdjust ; OR with an '0' at the end
    Test_End

    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Compares each byte with '9' to identify A-F
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Applies a 7 correction to bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 is set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Combine nibbles into bytes

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates the low-nibble bytes (keeps bits 0-3 of every second byte)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pxor xmm0 xmm1              ; Removes low nibbles from XMM0, keeping only high nibbles
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psllw xmm0 4                ; Shifts high nibbles 4 bits left
                                ; XMM0: D000 D000 D000 D000 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psrld    xmm0, 8
    packuswb xmm0, xmm0

    movdqu X$edi xmm0 ; save it to TmpStorage
    dec ecx ; our index to the shift table
    mov ecx D$ShiftTbl2+ecx*4

    ; 1st calculate distance (1st byte in ShiftTbl2)
    ; cl distance, ch = shift
    movzx eax cl
    mov eax D$edi+eax
    ; Reverse the value to be stored either in the 1st dword (if the string is less than or equal to 8 bytes)
    ; or in the 2nd dword (if the string is bigger than 8 bytes)
    bswap eax
    mov esi D@pOutput
    ; Now check if the string is bigger than 8 bytes (3rd byte in ShiftTbl2)
    Test_If ecx HEXCNV_LONG_STR ;   010000
        ; get distance and shift for the 2nd dword
        mov D$esi+4 eax
        mov eax D$edi
        bswap eax
    Test_End
    ; Now calculate the shift (2nd byte in ShiftTbl2)
    movzx ecx ch
    shr eax cl
    ; store it either in 1st dword (If string is less or equal to 8 bytes) or in the 2nd Dword (If string is bigger than 8 bytes)
    mov D$esi eax

    mov eax D@Lenght

EndP
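
For readers who don't follow the XMM dumps in the comments, here is a rough scalar model in C of what the SIMD steps do for the fixed 8-character, uppercase-only case. It is only a sketch: `hex8_to_u32` is a made-up name, it handles one byte at a time instead of 16 lanes, and it assumes valid input just like the listing does.

```c
#include <stdint.h>

/* Scalar model of the SSE2 steps for exactly 8 uppercase hex characters.
   hex8_to_u32 is a hypothetical name, not a function from the listing. */
uint32_t hex8_to_u32(const char *s)
{
    uint8_t v[8];
    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(s[i] - '0');    /* psubb with Mask1 (0x30) */
        uint8_t fix = (b > 9) ? 7 : 0;        /* pcmpgtb with Mask2 (9), pand with Mask3 (7) */
        v[i] = (uint8_t)(b - fix);            /* psubb: 'A'..'F' become 10..15 */
    }

    uint32_t r = 0;
    for (int i = 0; i < 8; i += 2) {
        /* Each pair of characters forms one byte: high nibble, then low nibble. */
        uint8_t byte = (uint8_t)((v[i] << 4) | v[i + 1]);
        r = (r << 8) | byte;                  /* first pair ends up most significant */
    }
    return r;
}
```

Calling `hex8_to_u32("0F2A45B7")` gives 0x0F2A45B7, the same value the header comment of the routine promises in EAX.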


And here goes JJ´s version. I just renamed the function in his app to AsciiHex2Qword, but it is the same as this one. It stores the output in a buffer pointed to by a parameter (Output) and returns the length of the input in eax.

Btw... JJ, can you fix the code in your app so that it shows the proper values on return? I couldn´t find where inside the ".asc" file I could change it to pass the result stored in the parameter Output (a buffer containing 2 dwords).

The speed seems as fast as the previous version :)


Note:

Now it shows the proper result order, i suppose.

Ex:

[SzInputHex:  B$ "543210F2A45B7", 0 ]

[Output: D$ 0 #2] ; 8 bytes = 2 Dwords

    call AsciiHex2dw_Ex5 SzInputHex, Output

eax = 13 bytes
Output = 1st dword = 054321, 2nd dword = 0F2A45B7
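
To check results like the one above without the SSE2 code, a plain C reference can help. The sketch below is hypothetical (the name `hex_to_2dw` and the "return 0 on bad length" rule are my own assumptions). It follows the behaviour as I read it from the listing: strings longer than 8 characters fill both dwords (high part first), shorter ones write only the 1st dword, and the return value is the input length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference: 1..16 uppercase hex characters.
   out[0] = 1st dword, out[1] = 2nd dword (written only when len > 8).
   Returns the input length, or 0 when the length is out of range. */
size_t hex_to_2dw(const char *s, uint32_t out[2])
{
    size_t len = strlen(s);
    if (len == 0 || len > 16)
        return 0;

    uint64_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned d = (unsigned)((unsigned char)s[i] - '0');
        if (d > 9)
            d -= 7;                 /* 'A'..'F' -> 10..15 (valid input assumed) */
        acc = (acc << 4) | d;
    }

    if (len <= 8) {
        out[0] = (uint32_t)acc;     /* short string: value goes to the 1st dword */
    } else {
        out[0] = (uint32_t)(acc >> 32);   /* leading digits */
        out[1] = (uint32_t)acc;           /* last 8 digits */
    }
    return len;
}
```

With "543210F2A45B7" this returns 13 and leaves 0x54321 in the 1st dword and 0x0F2A45B7 in the 2nd, matching the example.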
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 07:40:38 AM
for several instructions such as

"movdqu  xmm0, qword ptr [eax]"
"movdqu  qword ptr [edi+1], xmm0"

guga2.asm(67) : error A2022:instruction operands must be the same size
You might need some additional help here. My knowledge of SSE is very, very limited. Practically zero.  :tongue:

Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 11, 2025, 07:59:12 AM
Hi Guga,

Yesterday I saw your last version of algo. Congratulations! :thumbsup:

I want to ask a stupid question to everyone:
1. Which masochist would write such a long string as szTest db "6543210F2A45B7", 0, to get the same result, instead of leaving only the numbers and putting an "h" at the end?
Who, when, where and for what would practically use such a large QWORD number?
Please give an example! :badgrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 08:01:37 AM
Quote from: zedd151 on March 11, 2025, 07:40:38 AMfor several instructions such as

"movdqu  xmm0, qword ptr [eax]"
"movdqu  qword ptr [edi+1], xmm0"

guga2.asm(67) : error A2022:instruction operands must be the same size
You might need some additional help here. My knowledge of SSE is very, very limited. Practically zero.  :tongue:


Hi Zedd

Use the version i assembled with JJ´s app. I don´t remember what configuration is necessary to assemble it in masm. For such SSE routines i normally use JJ´s MasmBasic benchmark routine.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 08:03:48 AM
Quote from: guga on March 11, 2025, 08:01:37 AM
Hi Zedd

Use the version i assembled with JJ´s app. I don´t remember what are the necessary configuration to assemble it in masm. For such routines in SSE i normally use JJ´s Masm basic benchmark routine.
Okay, if I can untangle that mess.  :tongue:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 08:05:03 AM
Quote from: ognil on March 11, 2025, 07:59:12 AMI want to ask a stupid question to everyone:
1. Which masochist would write such a long string as szTest db "6543210F2A45B7", 0, to get the same result, instead of
Do you mean this?
Quote from: guga on March 10, 2025, 05:55:21 AMFor example, say the input is this string:
[SzInputHex:  B$ "6543210F2A45B7", 0 ]
Play nice, ognil.

He is working on converting ascii hex ---> qword (but as 2 dwords)

Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 08:24:43 AM
Quote from: ognil on March 11, 2025, 07:59:12 AMHi Guga,

Yesterday I saw your last version of algo. Congratulations! :thumbsup:

I want to ask a stupid question to everyone:
1. Which masochist would write such a long string as szTest db "6543210F2A45B7", 0, to get the same result, instead of leaving only the numbers and putting an "h" at the end?
Who, when, where and for what would practically use such a large QWORD number?
Please give an example! :badgrin:

Ognil .... Lingo ? Is that u ?  :biggrin:  :biggrin:  :biggrin:

Tks, lingo.

About your question, well... i can think of a few tools, such as apps that use pattern recognition. IDA Pro uses a system where such functions can be useful: it has a tool called FLIRT, which is basically a pattern recognition system for files. Some signatures used in IDA also come in the form of text and may contain hexadecimal strings longer than 50, 100 bytes, etc.
René and i started a similar system 20 years ago, but never finished it. Other old tools use pattern recognition with text files; if i remember correctly, PEiD did that too. Such functions can be useful for someone who needs a faster way to convert those text files for their databases or something.

I don´t know if games use such text files containing hexadecimal values to be converted internally, but that may also exist.

Nobody knows when this could be useful until it is needed, so... why don't we take the 1st step and write some functions that may be useful for others and yet be really fast?


Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 11, 2025, 08:27:09 AM
Zedd151,
you didn't answer my question :undecided:


Quote"movdqu  xmm0, qword ptr [eax]"
"movdqu  qword ptr [edi+1], xmm0"

try:
"movdqu  xmm0, oword ptr [RAX]"
"movdqu  oword ptr [RDI+1], xmm0"

Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 08:32:15 AM
Quote from: guga on March 11, 2025, 08:01:37 AMUse the version i assembled with JJ´s app.

attached...
I am not seeing both dwords as result.  :sad:

Check if I missed something copying over the code and data
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 08:35:19 AM
Quote from: ognil on March 11, 2025, 08:27:09 AMyou didn't answer my question :undecided:
What question was addressed to me?
If it was concerning such large qwords, I believe guga covered that. This is not MY project, but gugas.

Quote from: ognil on March 11, 2025, 08:27:09 AMtry:
"movdqu  xmm0, oword ptr [RAX]"
"movdqu  oword ptr [RDI+1], xmm0"

In 32 bit???  :eusa_naughty:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 08:43:09 AM
Ognil/Lingo, I have to wholeheartedly agree with you here.
I can't really see the utility of this function either. When in real life would someone need such a conversion?
And if needed, who would care how fast it is? Not like anyone's going to be converting millions of such fields from a database, eh?
Interesting mental exercise, but that's about it.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 08:56:17 AM
Quote from: zedd151 on March 11, 2025, 08:32:15 AM
Quote from: guga on March 11, 2025, 08:01:37 AMUse the version i assembled with JJ´s app.

attached...
I am not seeing both dwords as result.  :sad:

Check if I missed something copying over the code and data

It´s not showing both results, because i dont know exactly how make the output for them in masmbasic. JJ can help, because he knows where and how use his benchmark tool in a way to export (show) the results.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 09:05:41 AM
Quote from: guga on March 11, 2025, 08:56:17 AMIt´s not showing both results, because i dont know exactly how make the output for them in masmbasic.
The code I attached in post #65 is pure Masm32.
Just your function and its data, and a Message Box to display the results.

From your comments, I thought Output would be for two dwords, one after the other... ?? unless I am missing some detail.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 11, 2025, 09:16:59 AM
zedd151,

If you are really translating from 32bit to 64bit
you will know that there are no such registers as RAX and RDI in MASM32. :sad:
I missed Guga's answer because he mentions the name of IDA without having the source code of IDA or IDA64. :smiley:

NoCforMe,

Thanks for the correct answer. :thumbsup:
Now take a break and go for a walk in nature :badgrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 09:19:04 AM
Quote from: ognil on March 11, 2025, 09:16:59 AMIf you are really translating from 32bit to 64bit
No! I assembled YOUR 64 bit version (https://masm32.com/board/index.php?msg=136790)  to test the results. I did no such conversions here in this topic. guga had wanted you to port your version to 32 bit so that he can test the results. I did the next best thing, and assembled your 64 bit code (thank you, btw) to test it for guga, to see if the results matched the results from his version.   :badgrin:

And yes I know the difference between 32 bit and 64 bit maximum sized registers.  :rolleyes:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 09:31:16 AM
Quote from: ognil on March 11, 2025, 09:16:59 AMzedd151,

If you are really translating from 32bit to 64bit
you will know that there are no such registers as RAX and RDI in MASM32. :sad:
I missed Guga's answer because he mentions the name of IDA without having the source code of IDA or IDA64. :smiley:

NoCforMe,

Thanks for the correct answer. :thumbsup:
Now take a break and go for a walk in nature :badgrin:

IDA Pro is not open source. The FLIRT system has been well known since they started it a long time ago. I just gave an example of a tool i know that uses (imports and exports) such text formats (not only binary data).

Btw... about the results... here is a new version of the simple benchmark i made for RosAsm that can display the results until the asc file is fixed (by JJ or others who are used to MasmBasic syntax).

Since i´ll use the tool inside RosAsm and also in a dll, i´ll write the necessary error flags and also see how much performance is lost if i include the routines that convert the input string to make the function work case-insensitively.

Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 09:34:46 AM
0 cycles -> StrLenW_Guga ANSI,  Return in EAX: 0
14 cycles -> StrLenW_Lingo ANSI,  Return in EAX: 100
28 cycles -> StrLenW_Guga No SAR,  Return in EAX: 200
20 cycles -> StrLenW_Lingo No SAR,  Return in EAX: 200
16 cycles -> StrLenW_Guga with SAR,  Return in EAX: 34
14 cycles -> StrLenW_Lingo with SAR,  Return in EAX: 34




17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7

17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: A45B7 . Return in EAX: A45B7

15 cycles -> Ascii Hex to Dw by Guga (Old version - fixed Lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7



52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 543210F2A45B7 . Return in EAX: 13 (Bytes)
Output:
D$ 54321
D$ F2A45B7

52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 18F2A45B7 . Return in EAX: 9 (Bytes)
Output:
D$ 1
D$ 8F2A45B7

52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 76543210F2A45B7 . Return in EAX: 15 (Bytes)
Output:
D$ 7654321
D$ F2A45B7

52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 76543210F2A45B7 . Return in EAX: 15 (Bytes)
Output:
D$ 7654321
D$ F2A45B7

25 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 876543210F2A45B7 . Return in EAX: 16 (Bytes)
Output:
D$ 87654321
D$ F2A45B7

:azn:  Looks good so far. Unless you might want to print the leading zeroes for the rightmost 8 bytes so that the two dwords printed in sequence appears like a qword.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 10:03:13 AM
The leading zeroes? Ok... it´s not part of the function itself; it was only the way i set up sprintf to see if the results were as expected.

Lingo´s usage of bswap was a good tip, although the code needed to be fixed to work with any odd and even sizes. That explains why it is a bit slower when dealing with odd-length input strings, since i had to create a workaround and use jmps there.

The next step is making it case-insensitive and adding the necessary flags to return in eax, and then it´s ok to go :)
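
The bswap-then-shift idea can be sketched in C as follows. This is only an illustration of the relation shift = 32 - (length * 4) for strings of up to 8 characters. The helper name and the assumption that the converted nibbles sit left-aligned and zero-padded in 4 bytes are mine; the sketch does not model the odd-length workaround.

```c
#include <stdint.h>

/* Hypothetical helper: 'packed' holds the converted nibbles left-aligned in
   memory order (first character in the high nibble of byte 0), zero-padded.
   len is the number of hex characters and must be 1..8. */
uint32_t bswap_and_shift(const uint8_t packed[4], unsigned len)
{
    /* Assemble the bytes most-significant first. This is the value a
       little-endian dword load followed by bswap would produce. */
    uint32_t v = ((uint32_t)packed[0] << 24) | ((uint32_t)packed[1] << 16)
               | ((uint32_t)packed[2] << 8)  |  (uint32_t)packed[3];

    unsigned shift = 32u - 4u * len;   /* len 8 -> 0, len 5 -> 12, len 1 -> 28 */
    return v >> shift;                 /* drop the unused padding nibbles */
}
```

For "A45B7" the packed bytes would be A4 5B 70 00; after the byte swap the dword is 0xA45B7000 and a shift by 12 leaves 0xA45B7.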
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 10:04:43 AM
It's dinner time here, I will return shortly.  :smiley:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 10:10:15 AM
Quote from: guga on March 11, 2025, 10:03:13 AM... since i had to create a workaround and use jmps there.
If it works, it works!... you might think of a better way to handle those cases later on.  But for now, it is working. Congrats.

As for the leading zero's, that is just to make a nicer display, not a major issue right now.

Now, I gotta go and eat.....
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 10:17:12 AM
Quote from: zedd151 on March 11, 2025, 10:04:43 AMIt's dinner time here, I will return shortly.  :smiley:

Here too. Need also go eat something :biggrin:  :biggrin:  :biggrin:  :biggrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 10:49:22 AM
Now, does anyone care to answer Ognil/Lingo's (and my) questions about who would ever want to use such a function? What's the use case for this? I'm not seeing it.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 11, 2025, 11:13:32 AM
Thanks Guga for the answer, but
imagine the following:

1. Program A generates ASCII text in a buffer in memory:
szText db "123456789AB0",0
2. Program B starts your algo with the address of szText
3. Program C takes the result of your algo in a RAX register
and passes it to the next program.....

Why is the path so long?
Why not:

1. Program A generates ASCII text in a buffer in memory:
123456789AB0h and passes the address to the next program.....skipping steps 2 and 3!!

QuoteOgnil .... Lingo ? is that you

A lot of people here are making me unsolicited advertisements. :badgrin:
When hutch, jj, Lingo, etc. started this forum
I was a little boy born in the mid-90s.
On the other hand, I also became famous through Lingo. :badgrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 11:31:13 AM
Quote from: ognil on March 11, 2025, 11:13:32 AMWhen hutch, jj, Lingo, etc. started this forum
I was a little boy born in the mid-90s.
I doubt you had anything to do with starting this forum.

Prove it.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 11, 2025, 12:20:55 PM
Doubt is a way to discover the truth. :badgrin:
I am not Lingo and I never lie.
And for proof, ask JJ, Vortex, etc. who were at the opening of the forum or look for Lingo in the old archives from that time.

Now relax and go for a walk in nature. :badgrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 12:37:17 PM
Quote from: ognil on March 11, 2025, 12:20:55 PM... look for Lingo in the old archives from that time.
We did and found that lingo had used the pseudonym "ognil" more than once in the past.  :badgrin:

I myself do not believe in such coincidences, especially the one where you said that you happened upon lingo's forum by chance - by searching for "Shattering shaders with the real Slim Shady" or something to that effect.  But the only result that came up on Google at the time was this forum.    :eusa_naughty:    :eusa_naughty:    :eusa_naughty:    :eusa_naughty:

Here is your exact quote.
Quote from: ognil on September 24, 2024, 03:47:25 AMI searched the internet about the SHADERS and came across a very good post
from the other admin - "Shattering shaders with the real Slim Shady".
Also, saying "the other admin" means that you are an admin there? That's called a Freudian slip.

I think your pants are on fire.... Pinocchio  :badgrin:
So, what did you learn about shaders???  :biggrin:
And I have noticed... Slim Shady no longer resides on that other thing that calls itself a forum.

QuoteNow relax and go for a walk in nature. :badgrin:
And you, too.  :greensml:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 12:46:08 PM
Quote from: NoCforMe on March 11, 2025, 10:49:22 AMNow, does anyone care to answer ....questions about who would ever want to use such a function? What's the use case for this? I'm not seeing it.
They used to say the same thing about my "ascii_adder" that takes two ascii decimal strings of literally any length and adds them together - the result being returned as an ascii decimal string of the appropriate length.

That function was quite useful to me when I built my fibonacci sequence generator, which generated a fibonacci sequence up to any required (string) length.

One early version of the fibonacci generator (not the final version, though)  is   Here. (https://masm32.com/board/index.php?msg=79712)

Nowadays, it might be useful for calculating the interest on the U.S. National Debt.  :joking:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 01:16:00 PM
Quote from: NoCforMe on March 11, 2025, 10:49:22 AMNow, does anyone care to answer Ognil/Lingo's (and my) questions about who would ever want to use such a function? What's the use case for this? I'm not seeing it.

I did
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 11, 2025, 01:18:12 PM
QuoteWe did and found that lingo had used the pseudonym "ognil" more than once in the past.  :badgrin:

Give a link for Ognil and for Lingo I found this
https://masmforum.com/board/index.php?topic=1589.45 (https://masmforum.com/board/index.php?topic=1589.45)

On the other hand, I respect old people and never argue with them about their beliefs, emotional experiences and mental changes.

Now relax and go for a walk in nature. :badgrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 01:27:51 PM
QuoteWe did and found that lingo had used the pseudonym "ognil" more than once in the past.  :badgrin:
Quote from: ognil on March 11, 2025, 01:18:12 PMGive a link for Ognil...
Your post here (https://masm32.com/board/index.php?msg=55912)

Your post was quoted by jj2007 here (https://masm32.com/board/index.php?msg=55961)

The original post that jj2007 had quoted was deleted by hutch  (https://masm32.com/board/index.php?msg=55962), you must have pissed him off, lingo.

I have more links, but I will save them.  :badgrin:



Title: Re: AsciiHextoDword (SSE2 version)
Post by: TimoVJL on March 11, 2025, 01:28:16 PM
Quote from: NoCforMe on March 11, 2025, 10:49:22 AMNow, does anyone care to answer Ognil/Lingo's (and my) questions about who would ever want to use such a function? What's the use case for this? I'm not seeing it.
Actually, haven't answers to this same question been seen on this site earlier, in similar cases?
Like processing ASCII-hex-coded numerical data, e.g. from GPS route mapping?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 01:32:49 PM
Quote from: ognil on March 11, 2025, 11:13:32 AMThanks Guga for the answer, but
imagine the following:

1. Program A generates ASCII text in a buffer in memory:
szText db "123456789AB0",0
2. Program B starts your algo with the address of szText
3. Program C takes the result of your algo in a RAX register
and passes it to the next program.....

Why is the path so long?
Why not:

1. Program A generates ASCII text in a buffer in memory:
123456789AB0h and passes the address to the next program.....skipping steps 2 and 3!!

QuoteOgnil .... Lingo ? is that you

A lot of people here are making me unsolicited advertisements. :badgrin:
When hutch, jj, Lingo, etc. started this forum
I was a little boy born in the mid-90s.
On the other hand, I also became famous through Lingo. :badgrin:

I agree with you. It would be easier to use the raw data rather than a text file containing long hexadecimal strings, but such files do exist in some apps like the ones i mentioned. On a daily basis short conversions would be more than enough, but i don´t see anything bad in creating a faster function that can be useful for someone, even if i don´t agree with the way they may use it, since there are better ways to retrieve/store this sort of long chain of data.

I started because i needed a faster function to replace a very old one that was in RosAsm, and ended up trying to make other derivative functions for general-purpose usage to be included in a dll.

Don´t blame me for the profile name you chose; you created the profile with lingo written backwards :mrgreen:  :mrgreen:  :mrgreen:

You are Lingo´s lost twin brother

:biggrin:  :biggrin:  :biggrin:

(https://i.ibb.co/ZRzBd4N4/Screenshot-2025-03-10-at-23-31-34-Twins-1988.png) (https://ibb.co/ZRzBd4N4)
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 01:34:39 PM
 :biggrin:  that was funny
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 11, 2025, 02:15:57 PM
QuoteYou are Lingo´s lost twin brother :badgrin:

Thank you Guga,
Nice... :badgrin:  :badgrin: 
Yes,I'm Ognil Da Vito :badgrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 02:22:11 PM
Okay guga, where did we leave off before this side-show started?
Oh yeah, you posted a new attachment. I will download it once I'm back at my computer. I'm on my iPad on the back porch right now....
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 02:31:13 PM
Okay, I'm back at my computer....
From your latest attachment guga, "BenchMarkTest3a2" ...
Quote17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7
17 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: A45B7 . Return in EAX: A45B7
15 cycles -> Ascii Hex to Dw by Guga (Old version - fixed Lenght),  Input: 0F2A45B7 . Return in EAX: F2A45B7

52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 543210F2A45B7 . Return in EAX: 13 (Bytes)
Output:
D$ 0x54321
D$ 0xF2A45B7

52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 18F2A45B7 . Return in EAX: 9 (Bytes)
Output:
D$ 0x1
D$ 0x8F2A45B7

52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 76543210F2A45B7 . Return in EAX: 15 (Bytes)
Output:
D$ 0x7654321
D$ 0xF2A45B7

52 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 76543210F2A45B7 . Return in EAX: 15 (Bytes)
Output:
D$ 0x7654321
D$ 0xF2A45B7

25 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 876543210F2A45B7 . Return in EAX: 16 (Bytes)
Output:
D$ 0x87654321
D$ 0xF2A45B7
:thumbsup:
Did you make any changes to the algorithm code here?
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 02:55:50 PM
OK, let me set a minor cat amongst the pigeons here:

I've cooked up a really simple ASCII hex--> binary conversion routine.
So simple it borders on the dumbass: just translates each ASCII char. to a binary nybble in a loop and stuffs it into an accumulator:

;====================================
; Partial ASCII table:
; This contains translation elements up to
; ASCII 'f'. Non-hex values have zeroes;
; Valid hex chars. have their corresponding values.
;====================================

HexXlatTable LABEL BYTE
; Chars. below '0':
DB 48 DUP (0)
; '0'-'9':
DB 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
; Chars. up to 'A':
DB 7 DUP (0)
; 'A' - 'F':
DB 10, 11, 12, 13, 14, 15
; Chars. up to 'a':
DB 26 DUP (0)
; 'a' - 'f':
DB 10, 11, 12, 13, 14, 15


;====================================
; AscHex2Bin()
;
; Converts a string of ASCII hex characters
; @ EAX to a numeric value in EAX
; (string must be NULL-terminated)
;
; Hex string can contain:
;  o 0-9
;  o A-F
;  o a-f
;
; No error checking is done on ASCII text.
;====================================

AscHex2Bin PROC

        PUSH    EBX
        MOV     EBX, OFFSET HexXlatTable
        MOV     ECX, EAX        ;ECX--> ASCII chars.

        XOR     EDX, EDX        ;EDX: accumulator.

next:   XOR     EAX, EAX        ;Clear entire register.
        MOV     AL, [ECX]       ;Get next ASCII char.
        INC     ECX
        TEST    EAX, EAX        ;Check for end of string.
        JZ      done
        XLATB                   ;Get its hex value.
        SHL     EDX, 4          ;Shift existing accumulator contents.
        OR      EDX, EAX        ;Lay nybble into the accumulator.
        JMP     next

done:   MOV     EAX, EDX        ;Put into return reg.
        POP     EBX
        RET

AscHex2Bin ENDP

So I'm curious how much slower this might be than all that fancy-schmancy SSE/XMM or whatever code y'all are using here.

Anyone care to put this into a testbed and give it a spin? Zedd? Shouldn't be hard to do.

Routine takes the hex chars. pointed to by EAX, returns value in EAX.
Does a maximum of 8 hex chars. (largest value in 32-bit reg.).
Could easily be converted to 64-bit, giving a max. of 16 hex chars (1 register) or 32 chars. (2-register pair). 32-bit code could be expanded to max. 16. hex chars in a 2-register pair.

If anyone wants the testbed (console program) I can attach that here.
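
For anyone who wants to try the same table-lookup idea outside an assembler, here is a rough C equivalent. It is a sketch, not a translation: the names are made up, and unlike the XLATB version it bounds-checks the table index, so characters above 'f' count as zero instead of reading past the table.

```c
#include <stdint.h>

/* 103-entry table covering ASCII 0 through 'f'; entries not listed are zero,
   as in the assembly table. hex_xlat is a hypothetical name. */
static const uint8_t hex_xlat[103] = {
    ['0'] = 0,  ['1'] = 1,  ['2'] = 2,  ['3'] = 3,  ['4'] = 4,
    ['5'] = 5,  ['6'] = 6,  ['7'] = 7,  ['8'] = 8,  ['9'] = 9,
    ['A'] = 10, ['B'] = 11, ['C'] = 12, ['D'] = 13, ['E'] = 14, ['F'] = 15,
    ['a'] = 10, ['b'] = 11, ['c'] = 12, ['d'] = 13, ['e'] = 14, ['f'] = 15
};

/* Converts a NUL-terminated hex string (up to 8 digits) to a 32-bit value. */
uint32_t asc_hex_to_bin(const char *s)
{
    uint32_t acc = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        /* Bounds check added here; the XLATB loop has none. */
        uint8_t nib = (c < sizeof hex_xlat) ? hex_xlat[c] : 0;
        acc = (acc << 4) | nib;      /* shift accumulator, lay in the nybble */
    }
    return acc;
}
```

As with the assembly routine, digits beyond the eighth simply push the oldest ones out of the top of the accumulator.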
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 03:08:36 PM
Quote from: NoCforMe on March 11, 2025, 02:55:50 PMSo I'm curious how much slower this might be than all that fancy-schmancy SSE/XMM or whatever code y'all are using here.

This should give you some indication of the possible speed differences. One of guga's algorithms was tested against one of jj's, and jj usually has pretty fast code...    FROM HERE  (https://masm32.com/board/index.php?msg=136709)
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

18532  cycles for 100 * Val()
708    cycles for 100 * AsciiHex2dwNew

18615  cycles for 100 * Val()
711    cycles for 100 * AsciiHex2dwNew

18780  cycles for 100 * Val()
743    cycles for 100 * AsciiHex2dwNew

18562  cycles for 100 * Val()
741    cycles for 100 * AsciiHex2dwNew

18535  cycles for 100 * Val()
725    cycles for 100 * AsciiHex2dwNew

Averages:
18571  cycles for Val()
726    cycles for AsciiHex2dwNew

13      bytes for Val()
202    bytes for AsciiHex2dwNew

1234ABCDh      eax Val()
1234ABCDh      eax AsciiHex2dwNew
I would write my version very similar to yours, NoCforme. What guga is doing is totally different. Its like comparing apples to oranges.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 03:12:45 PM
Quote from: zedd151 on March 11, 2025, 03:08:36 PMWhat guga is doing is totally different. Its like comparing apples to oranges.

Yeah, yeah, I know all that: he's using them fancy bit-shuffling instructions instead of the regular x86 stuff.

Just curious how much slower my "old-school" code is.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 03:14:15 PM
Quote from: NoCforMe on March 11, 2025, 03:12:45 PM
Quote from: zedd151 on March 11, 2025, 03:08:36 PMWhat guga is doing is totally different. Its like comparing apples to oranges.

Yeah, yeah, I know all that: he's using them fancy bit-shuffling instructions instead of the regular x86 stuff.

Just curious how much slower my "old-school" code is.
Maybe guga can set them both up in a testbed. He knows best how his own function works.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 03:16:44 PM
Quote from: zedd151 on March 11, 2025, 02:31:13 PMOkay, I'm back at my computer....
From your latest attachment guga, "BenchMarkTest3a2" ...
Did you make any changes to the algorithm code here?

Hi Zedd

No, I didn't make any changes to this version. It is the same one I used for JJ's test and in the other upload, the one that didn't show the extra '0' before each Dword.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 03:17:56 PM
Quote from: guga on March 11, 2025, 03:16:44 PM
No, I didn't make any changes to this version. It is the same one I used for JJ's test and in the other upload, the one that didn't show the extra '0' before each Dword.
Just cosmetic changes to the display, got it.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 03:18:23 PM
Quote from: NoCforMe on March 11, 2025, 02:55:50 PM
OK, let me set a minor cat amongst the pigeons here:

I've cooked up a really simple ASCII hex--> binary conversion routine.
So simple it borders on the dumbass: just translates each ASCII char. to a binary nybble in a loop and stuffs it into an accumulator:

;====================================
; Partial ASCII table:
; This contains translation elements up to
; ASCII 'f'. Non-hex values have zeroes;
; Valid hex chars. have their corresponding values.
;====================================

HexXlatTable    LABEL BYTE
; Chars. below '0':
    DB    48 DUP (0)
; '0'-'9':
    DB    0, 1, 2, 3, 4, 5, 6, 7, 8, 9
; Chars. up to 'A':
    DB    7 DUP (0)
; 'A' - 'F':
    DB    10, 11, 12, 13, 14, 15
; Chars. up to 'a':
    DB    26 DUP (0)
; 'a' - 'f':
    DB    10, 11, 12, 13, 14, 15


;====================================
; AscHex2Bin()
;
; Converts a string of ASCII hex characters
; @ EAX to a numeric value in EAX
; (string must be NULL-terminated)
;
; Hex string can contain:
;  o 0-9
;  o A-F
;  o a-f
;
; No error checking is done on ASCII text.
;====================================

AscHex2Bin    PROC

    PUSH    EBX
    MOV    EBX, OFFSET HexXlatTable
    MOV    ECX, EAX        ;ECX--> ASCII chars.

    XOR    EDX, EDX        ;EDX: accumulator.

next:    XOR    EAX, EAX        ;Clear entire register.
    MOV    AL, [ECX]        ;Get next ASCII char.
    INC    ECX
    TEST    EAX, EAX        ;Check for end of string.
    JZ    done
    XLATB                ;Get its hex value.
    SHL    EDX, 4            ;Shift existing accumulator contents.
    OR    EDX, EAX        ;Lay nybble into the accumulator.
    JMP    next

done:    MOV    EAX, EDX        ;Put into return reg.
    POP    EBX
    RET

AscHex2Bin    ENDP

So I'm curious how much slower this might be than all that fancy-schmancy SSE/XMM or whatever code y'all are using here.

Anyone care to put this into a testbed and give it a spin? Zedd? Shouldn't be hard to do.

Routine takes the hex chars. pointed to by EAX, returns value in EAX.
Does a maximum of 8 hex chars. (largest value in 32-bit reg.).
Could easily be converted to 64-bit, giving a max. of 16 hex chars (1 register) or 32 chars. (2-register pair). 32-bit code could be expanded to max. 16. hex chars in a 2-register pair.

If anyone wants the testbed (console program) I can attach that here.

Yes, please. It would be nice to compare the results.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 03:23:56 PM
Quote from: NoCforMe on March 11, 2025, 03:12:45 PM
Quote from: zedd151 on March 11, 2025, 03:08:36 PM
What guga is doing is totally different. It's like comparing apples to oranges.

Yeah, yeah, I know all that: he's using them fancy bit-shuffling instructions instead of the regular x86 stuff.

Just curious how much slower my "old-school" code is.

Btw, this was the old code used in RosAsm

Equates
[LowSigns            31
    TextSign            30

  NoSpaceAfterThis    29
    numSign             28   ; #  01C
    IfNumSign           27   ; Substitute of # for the Conditional macros #If, ... 01B

    OpenParaMacro       26   ; { for ParaMacros  01A
  NoSpaceBeforeThis   25
    CloseParaMacro      24   ; } for ParaMacros

    CommaSign           23   ; ,

    OpenVirtual         22   ; [   016 (Macros expanded '[' -{-)
    CloseVirtual        21   ; ]   015 (Macros expanded ']' -}-) 019
    OpenBracket         20   ; [   014
    CloseBracket        19   ; ]   013
; 18, 17 >>> NewOpenBracket / NewCloseBracket
  PartEnds            16
    memMarker           15   ; $ or $  exemple: MOV B$MYVALUE 1
    colonSign           14   ; :
    openSign            13   ; (
    closeSign           12   ; )

  OperatorSigns       11
    addSign             10   ; +
    subSign              9   ; -
    mulSign              8   ; *
    divSign              7   ; /
    expSign              6   ; ^
; 5
  Separators          4
   ; Statement           0FF
    Space               3    ; space
    EOI                 2    ; |  End Of Instruction (separator)
    meEOI               1]   ; |  End Of Instruction in macro expansion
                             ; 0 is used as erase sign inside treatments


TranslateHexa:
    lodsb                                               ; clear first '0'
NackedHexa:
    mov ebx 0,  edx 0, ecx 0
L0: lodsb | cmp al LowSigns | jbe L9>
        sub al '0' | cmp al 9 | jbe L2>
            sub al 7
L2: shld edx ebx 4 | shl ebx 4 | or bl al
    cmp edx ecx | jb L8>
        mov ecx edx
            cmp al 0F | jbe L0<
L8: mov ecx D$HexTypePtr | jmp BadNumberFormat ; <--- for errors routine
L9: mov eax ebx
ret

This routine was written in the late 90's, so an upgrade was long overdue.
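For readers who do not follow RosAsm syntax, the loop above can be modeled roughly in C. This is only a sketch of the NackedHexa entry point under stated assumptions: the name `naked_hexa_model`, the 0 / -1 return convention, the rejection of the characters between '9' and 'A', and the explicit overflow test are hypothetical and not part of the original, which jumps to BadNumberFormat and detects overflow by comparing the high dword with its previous value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the old NackedHexa loop (not the original code).
   Returns 0 and stores the value in *out, or -1 on a bad digit or on
   overflow past 64 bits. */
int naked_hexa_model(const char *s, uint64_t *out)
{
    assert(s != NULL && out != NULL);
    uint64_t acc = 0;                       /* EDX:EBX in the listing */

    for (;;) {
        unsigned char c = (unsigned char)*s++;
        if (c <= 31)                        /* LowSigns: a control char ends the number */
            break;
        unsigned char d = (unsigned char)(c - '0');
        if (d > 9) {
            d = (unsigned char)(d - 7);     /* 'A'..'F' -> 10..15, uppercase only */
            if (d < 10)
                return -1;                  /* ':'..'@' (assumed check, absent in the listing) */
        }
        if (d > 0x0F)
            return -1;                      /* not a hex digit */
        if (acc >> 60)
            return -1;                      /* the next shift would drop a nibble */
        acc = (acc << 4) | d;               /* shld edx ebx 4 / shl ebx 4 / or bl al */
    }
    *out = acc;
    return 0;
}
```

Calling `naked_hexa_model("0F2A45B7", &v)` leaves 0x0F2A45B7 in `v`; lowercase digits are rejected, matching the uppercase-only behavior of the listing.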
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 03:25:52 PM
Easy peasy:
HexChars2Test DB "1234cDeF", 0

MOV EAX, OFFSET HexChars2Test
CALL AscHex2Bin

; result now in EAX
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 03:29:01 PM
BTW, what a weird assembler, RosAsm, that is.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 03:57:20 PM
Hi NoCforMe, here is the test on yours.

The equivalent for RosAsm is:

Naked version (ported as is):

[HexXlatTable:
 HexXlatTable.Chars:    B$ 0 #48, ; Chars. below '0':
 HexXlatTable.Numbers:  B$ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ; '0'-'9':
 HexXlatTable.Chars2:   B$ 0 #7, ; Chars. up to 'A':
 HexXlatTable.AtoF:     B$ 10, 11, 12, 13, 14, 15, ; 'A' - 'F':
 HexXlatTable.SmallCaps:  B$ 0 #26,; Chars. up to 'a':
 HexXlatTable.atof2:     B$ 10, 11, 12, 13, 14, 15]; ; 'a'-'f' (ASCII 97-102): 10 to 15

Proc AscHex2Bin:
    Uses ebx

    mov ebx HexXlatTable   ; EBX points to translation table
    mov ecx eax            ; ECX = pointer to ASCII string
    xor edx edx            ; EDX = accumulator, zeroed

@NextChar:
    xor eax eax
    mov al B$ecx
    inc ecx
    test eax eax | jz @Done
    xlatb
    shl edx 4
    or edx eax
    jmp @NextChar

@Done:
    mov eax edx

EndP


Modified for test comparisons (registers preserved and using a parameter as input):


Proc AscHex2BinRegPreserved:
    Arguments @pString
    Uses ebx, ecx, edx

    mov ebx HexXlatTable   ; EBX points to translation table
    mov ecx D@pString      ; ECX = pointer to ASCII string
    xor edx edx            ; EDX = accumulator, zeroed

@NextChar:
    xor eax eax
    mov al B$ecx
    inc ecx
    test eax eax | jz @Done
    xlatb
    shl edx 4
    or edx eax
    jmp @NextChar

@Done:
    mov eax edx

EndP


Quote
BTW, what a weird assembler, RosAsm, that is.
Well... The syntax is based on NASM and it is very old. I never had enough time to fix all the necessary issues in it, nor to change the syntax to be a bit more MASM-friendly (although you can emulate some of its syntax with the preparser token).

Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 04:01:26 PM
Not bad, NoCforMe. Only slightly slower than guga's algorithm, for 8 char input.

24 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 87654321 . Return in EAX: 8 (Bytes)
Output:
D$ 0x87654321
D$ 0x0

30 cycles -> Ascii Hex to Dword by NoCForMe,  Input: 87654321 . Return in EAX: 87654321

31 cycles -> Ascii Hex to Dword by NoCForMe - Registers preserved,  Input: 87654321 . Return in EAX: 87654321

Guga...
Maybe you could optimize that, and then there would be no need for SSE code, guga? I thought that there would be a much bigger difference between NoCforMe's old skool bytewise algo and yours using SSE...
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 04:31:41 PM
Quote from: zedd151 on March 11, 2025, 04:01:26 PM
Not bad, NoCforMe. Only slightly slower than guga's algorithm, for 8 char input.

24 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 87654321 . Return in EAX: 8 (Bytes)
Output:
D$ 0x87654321
D$ 0x0

30 cycles -> Ascii Hex to Dword by NoCForMe,  Input: 87654321 . Return in EAX: 87654321

31 cycles -> Ascii Hex to Dword by NoCForMe - Registers preserved,  Input: 87654321 . Return in EAX: 87654321

Guga...
Maybe you could optimize that, and then there would be no need for SSE code, guga? I thought that there would be a much bigger difference between NoCforMe's old skool bytewise algo and yours using SSE...

His version using normal registers is not slow, but the way I did it (in the Dword version) was about twice as fast. Even the Qword version is a bit faster than the regular x86 code.

But it all depends on what you need. Using SSE2 the way I did gave a speed gain of about 2x. Since I'm trying to fix a lot of other functions in RosAsm (and in other DLLs I use for other apps), gains like that are needed.


19 cycles -> Ascii Hex to Dw by Guga (new version. variable lenght),  Input: 87654321 . Return in EAX: 87654321
33 cycles -> Ascii Hex to Qword by Guga (Variable Lenght),  Input: 87654321 . Return in EAX: 8 (Bytes)
Output:
D$ 0x87654321
D$ 0x0
43 cycles -> Ascii Hex to Dword by NoCForMe,  Input: 87654321 . Return in EAX: 87654321
40 cycles -> Ascii Hex to Dword by NoCForMe - Registers preserved,  Input: 87654321 . Return in EAX: 87654321

Press enter to exit...


I don't know whether regular x86 code could be faster than what I have right now. At least, I can't think of an even faster way at the moment.

Take a look at this attachment. I didn't modify anything in either routine. I just removed the other functions that were also being tested, so you can compare more easily.

But... this is a simple benchmark tool. It would be better to use JJ's tool for that, or the Codetune app I uploaded here some time ago. In any case, I think it's only a matter of what is needed.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 04:32:08 PM
Wellll, it probably could use some error checking.
It occurred to me that if you replaced the bytes in the translation table that didn't correspond to valid hex characters with -1, you could add this code to catch non-hex chars.:
next:    XOR    EAX, EAX        ;Clear entire register.
    MOV    AL, [ECX]        ;Get next ASCII char.
    INC    ECX
    TEST    EAX, EAX        ;Check for end of string.
    JZ    done
    XLATB
    CMP    AL, 0FFh        ;Is it a valid hex char?
    JE    error            ;  Nope.

error:    ; Do what needs to be done to return an error.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 04:35:30 PM
Don't worry; I won't feel bad if you don't use my code in your assembler library
(sob, sniffle ...)
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 11, 2025, 04:39:33 PM
I'll be back tomorrow...
Time for sleeep...
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 04:45:44 PM
Quote from: NoCforMe on March 11, 2025, 04:32:08 PM
Wellll, it probably could use some error checking.
It occurred to me that if you replaced the bytes in the translation table that didn't correspond to valid hex characters with -1, you could add this code to catch non-hex chars.:
next:    XOR    EAX, EAX        ;Clear entire register.
    MOV    AL, [ECX]        ;Get next ASCII char.
    INC    ECX
    TEST    EAX, EAX        ;Check for end of string.
    JZ    done
    XLATB
    CMP    AL, 0FFh        ;Is it a valid hex char?
    JE    error            ;  Nope.

error:    ; Do what needs to be done to return an error.


Yeah, using an error check is a good way to avoid unwanted results. Also, you have room to make it work for Qword if needed.

But it would be better to test it with JJ's benchmark tool, since it is more accurate than this simple version I made. Steve also made a benchmark app a long time ago (I don't know if I still have it), and Siekmanski has some good tools for speed testing too.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: guga on March 11, 2025, 04:49:42 PM
Quote from: NoCforMe on March 11, 2025, 04:35:30 PM
Don't worry; I won't feel bad if you don't use my code in your assembler library
(sob, sniffle ...)

I didn't say that :bgrin:  :bgrin:  :bgrin: If you code some functions that are needed, not only I but others could also use them, if you allow it. It's not a competition, you know... :cool:  :cool:  :cool: I'm just trying to fix a lot of things in RosAsm right now, and I'm looking for alternatives to some very old code that needs to be upgraded and fixed.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 11, 2025, 05:01:43 PM
Oh, I understand. Whatever is fine with me. And if you ever want to use any of my code that I post here, please feel free to do so; I make no claims upon it. (I like to use "copyleft".)
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 12, 2025, 02:38:47 AM
Hi Guga,

My five cents... :smiley:

.data

    ; Constants for comparisons.
    ; pcmpgtb is a strict, signed "greater than", so each bound sits one
    ; step outside the range it guards. The memory operands of the SSE2
    ; instructions below must be 16-byte aligned.
    align 16
    HexLowerBound db 16 dup(2Fh)  ; '0' - 1
    HexUpperBound db 16 dup(3Ah)  ; '9' + 1
    HexALowerBound db 16 dup(40h) ; 'A' - 1
    HexAUpperBound db 16 dup(47h) ; 'F' + 1
    HexaLowerBound db 16 dup(60h) ; 'a' - 1
    HexaUpperBound db 16 dup(67h) ; 'f' + 1

.code
; Input: RCX = pointer to the 16-byte string
; Output: RAX = 1 if all characters are valid hex, 0 otherwise
; Only xmm0-xmm3 are used: xmm6 and up are callee-saved in the Windows x64
; calling convention, and plain SSE2 pcmpgtb takes two operands, not three.

CheckHexString PROC
    ; Load the 16-byte string into xmm0
    movdqu xmm0, xmmword ptr [rcx]

    ; Digits: xmm1 = ('0' <= c <= '9')
    movdqa xmm1, xmm0
    pcmpgtb xmm1, xmmword ptr [HexLowerBound]  ; c > '0' - 1
    movdqa xmm2, xmmword ptr [HexUpperBound]
    pcmpgtb xmm2, xmm0                         ; '9' + 1 > c
    pand xmm1, xmm2

    ; Uppercase letters: xmm2 = ('A' <= c <= 'F')
    movdqa xmm2, xmm0
    pcmpgtb xmm2, xmmword ptr [HexALowerBound] ; c > 'A' - 1
    movdqa xmm3, xmmword ptr [HexAUpperBound]
    pcmpgtb xmm3, xmm0                         ; 'F' + 1 > c
    pand xmm2, xmm3
    por xmm1, xmm2                             ; digits || 'A'..'F'

    ; Lowercase letters: xmm2 = ('a' <= c <= 'f')
    movdqa xmm2, xmm0
    pcmpgtb xmm2, xmmword ptr [HexaLowerBound] ; c > 'a' - 1
    movdqa xmm3, xmmword ptr [HexaUpperBound]
    pcmpgtb xmm3, xmm0                         ; 'f' + 1 > c
    pand xmm2, xmm3
    por xmm1, xmm2                             ; digits || 'A'..'F' || 'a'..'f'

    ; Extract the mask of valid bytes
    pmovmskb eax, xmm1        ; EAX = bitmask of valid bytes (1 bit per byte)

    ; Check if all 16 bytes are valid (mask should be 0xFFFF)
    cmp eax, 0FFFFh
    sete al                   ; AL = 1 if all bytes are valid, 0 otherwise

    ; Zero-extend AL to RAX
    movzx rax, al
    ret

CheckHexString ENDP
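One detail worth spelling out for a check of this kind: pcmpgtb is a strict, signed comparison, so for the endpoints ('0', '9', 'A', 'F', 'a', 'f') to be accepted, each bound has to sit one step outside its range. The following is a scalar C model of the per-byte logic, offered as a sketch rather than as the routine itself; the names `gt_signed` and `check_hex16_model` are hypothetical, and the loop stands in for the 16 parallel lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of PCMPGTB: 0xFF where a > b as signed bytes, otherwise 0. */
static uint8_t gt_signed(uint8_t a, uint8_t b)
{
    return ((int8_t)a > (int8_t)b) ? 0xFF : 0x00;
}

/* Hypothetical scalar model: returns 1 if all 16 bytes at p are hex
   digits (0-9, A-F, a-f), 0 otherwise. */
int check_hex16_model(const unsigned char *p)
{
    assert(p != NULL);
    unsigned mask = 0;                        /* what PMOVMSKB would return */

    for (int i = 0; i < 16; i++) {
        uint8_t c = p[i];
        /* Strict compares against bounds one step outside each range. */
        uint8_t digit = gt_signed(c, '0' - 1) & gt_signed('9' + 1, c);
        uint8_t upper = gt_signed(c, 'A' - 1) & gt_signed('F' + 1, c);
        uint8_t lower = gt_signed(c, 'a' - 1) & gt_signed('f' + 1, c);
        uint8_t ok = digit | upper | lower;   /* PAND / POR per lane */
        mask |= (unsigned)(ok >> 7) << i;     /* top bit of each lane */
    }
    return mask == 0xFFFFu;                   /* all 16 lanes valid */
}
```

Because the comparison is signed, bytes of 80h and above count as negative and fail all three range tests, so they are rejected without any extra check.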
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 12, 2025, 05:06:17 AM
Late addition:
If you use my method (AscHex2Bin()) with the error checking I added, you should add this to the hex translation table to cover the remainder of the ASCII character set:
      DB 153 DUP(-1)
That'll return an error for any non-hex character up to the limit (255).
Makes the translation table 256 bytes long, not too bad for size overall.
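The full 256-entry table with -1 (0FFh) as the "not a hex character" marker can be sketched in C as follows. This is an illustration under assumptions, not NoCforMe's code: the table is filled at run time instead of being declared statically, the names `hex_table`, `init_hex_table` and `asc_hex_to_bin_checked` are hypothetical, and the 8-digit limit is an extra check that the posted routine does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t hex_table[256];
static int hex_table_ready = 0;

/* Fill all 256 entries; 0xFF marks a non-hex character. */
static void init_hex_table(void)
{
    memset(hex_table, 0xFF, sizeof hex_table);
    for (int i = 0; i < 10; i++)
        hex_table['0' + i] = (uint8_t)i;
    for (int i = 0; i < 6; i++) {
        hex_table['A' + i] = (uint8_t)(10 + i);
        hex_table['a' + i] = (uint8_t)(10 + i);
    }
    hex_table_ready = 1;
}

/* Returns 0 and stores the value in *out, or -1 on a non-hex character
   or on more than 8 digits (the digit limit is an assumed extra). */
int asc_hex_to_bin_checked(const char *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (!hex_table_ready)
        init_hex_table();

    uint32_t acc = 0;
    int digits = 0;
    for (; *s != '\0'; s++) {
        uint8_t nibble = hex_table[(unsigned char)*s];  /* XLATB */
        if (nibble == 0xFF)
            return -1;                                  /* CMP AL, 0FFh / JE error */
        if (++digits > 8)
            return -1;                                  /* would not fit in 32 bits */
        acc = (acc << 4) | nibble;                      /* SHL EDX, 4 / OR EDX, EAX */
    }
    *out = acc;
    return 0;
}
```

With the test string from earlier in the thread, `asc_hex_to_bin_checked("1234cDeF", &v)` succeeds and leaves 0x1234CDEF in `v`, while a string such as "12G4" returns -1.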
Title: Re: AsciiHextoDword (SSE2 version)
Post by: stoo23 on March 15, 2025, 11:10:55 AM
Quote
On the other hand, I respect old people and never argue with them
Hmmmm NOT Lingo eh ???

That is a Direct 'Lingo' Quote !!! used more than once in the past ....
Let alone NOT being true,.. you often DO

So do you suffer from Multiple Personality disorder ??
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 15, 2025, 11:24:08 AM
I just call him Pinocchio Pants on Fire.  :biggrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 19, 2025, 07:44:01 AM
I'm glad the impolite old lads are having fun with a new detective game. :badgrin:
It's just not clear who's playing Sherlock Holmes and who's playing Dr. Watson? :badgrin:

Q. Why are old people often mean and rude?
A:
1. They may be physically in pain. Adding onto this, chronic pain takes up a LOT of mental bandwidth and energy. Being kind, empathetic, patient, and understanding are intentional acts that take time and effort, which is something we often take for granted. Pain robs a lot of energy and patience from people, and people will default to being 'snappy', impatient, and presumptuous because it's quicker and more automatic. Chronic pain makes people unhappy and they don't even realize it after a while.
2. They don't hear well and/or vision failing.
3. They may be socially isolated and depressed. Sometimes depression expresses itself as irritability.
4. Sometimes older people lose their filters. They feel more comfortable doing and saying whatever, because they really don't care what other people think anymore.
5. They may be unhappy and bitter over how their lives have turned out.
6. Fatigue and a complete refusal to fight to appear positive and hopeful in their final years.
7. They're in pain from various ailments, and this reduces their energy levels and makes them tired and irritable.
8. The world they grew up in and which felt familiar to them has faded, and they dislike or feel no place in a different popular culture.
9. They feel cheated by life, that they worked and sacrificed but did not receive the rewards or comforts they expected.
10. They feel disrespected or unwanted by younger people, and their advice and opinions have been ignored.
11. They were bitter and rude jerks when they were young, and now just have more leisure time to express it.
12.Testosterone deficiency due to decreasing testosterone levels associated with low testicular production, genetic factors, adiposity, and illness. Low testosterone levels in men are associated with sexual dysfunction (low sexual desire, erectile dysfunction), reduced skeletal muscle mass and strength, decreased bone mineral density, increased cardiovascular risk and alterations of the glycometabolic profile.
:undecided:  :sad:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 19, 2025, 07:46:42 AM
STOP POSTING THAT "AI" BULLSHIT!
Title: Re: AsciiHextoDword (SSE2 version)
Post by: ognil on March 19, 2025, 08:47:36 AM
Quote
STOP POSTING THIS "AI" BULLSHIT!.

I don't understand why you criticize without being able to offer a solution: who can answer my questions?
Maybe someone from the forum? No thanks... :badgrin:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 19, 2025, 09:58:32 AM
Quote from: ognil on March 19, 2025, 08:47:36 AM
Quote
STOP POSTING THIS "AI" BULLSHIT!.

I don't understand why you criticize without being able to give a solution
He did give a solution. Stop posting this/that AI Bullshit. (He changed it after you quoted him)
Stop posting ANY and ALL bullshit. An even better solution.

See how that works?  :badgrin:
No one forces you to read anything here, or post here.

You could go back to your empty forum, and talk to yourself.  :biggrin:
Only a half dozen or so members (in name only, not really participating members) there.
No real guests interested in anything there. Only curiosity seekers from here.
No Google search hits leading to there.
No search bots, spiders, web crawlers. No indexing on any web search platform.
No one to interfere with your 'important' work there. (Meant to be very tongue-in-cheek).  :badgrin:
Want 'peace and quiet'?? ... go back there and don't come back. We would like some 'peace and quiet' here too. More easily accomplished without your presence.  :biggrin:

Disclaimer: These are my opinions. These opinions may not reflect the opinions of other members, administration, forum staff, or the general public. If you would like to appeal my opinions, in the immortal words of hutch--, TUFF SHYTE!

Btw, have a nice day!   :smiley:
Title: Re: AsciiHextoDword (SSE2 version)
Post by: NoCforMe on March 19, 2025, 12:13:48 PM
Quote from: zedd151 on March 19, 2025, 09:58:32 AM
Disclaimer: These are my opinions. These opinions may not reflect the opinions of other members, administration, forum staff, or the general public.

Zedd, I think you can be pretty confident that your opinions here match the bulk of other members of this forum.
Title: Re: AsciiHextoDword (SSE2 version)
Post by: zedd151 on March 19, 2025, 12:29:08 PM
Of course. But just in case, I added the disclaimer.  :biggrin: