AsciiHextoDword (SSE2 version)

Started by guga, March 08, 2025, 11:29:50 AM


guga

Hi Guys

I was thinking of a faster way to convert a hex string to a dword using SSE2 only. I saw some good starting points here: Masm forum reference and Stackoverflow reference

I came up with code that converts an 8-character hexadecimal string, to be tested. (For now it works only for strings exactly 8 bytes in length, in uppercase.)

Here is the RosAsm version:

RosAsm Macros used

; using values from 0 to 3
[SHUFFLE | (255 - ((#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4))]

; using values from 3 to 0
[SHUFFLE_INV | ( (#1 shl 6) or (#2 shl 4) or (#3 shl 2) or #4 )] ; Marinus/Sieekmanski

[pshufd | pshufd #1 #2 #3]
[shufps | shufps #1 #2 #3]
[shufpd | shufpd #1 #2 #3]
[pshuflw | pshuflw #1 #2 #3]
[pshufhw | pshufhw #1 #2 #3]
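
As a quick cross-check of the two macros above, here is a minimal C sketch of the same arithmetic. The function names `shuffle_imm` and `shuffle_inv_imm` are made up for this illustration; only the formulas come from the macros.

```c
/* Mirrors SHUFFLE: the packed 2-bit fields are inverted by subtracting from 255. */
static unsigned shuffle_imm(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return 255u - ((a << 6) | (b << 4) | (c << 2) | d);
}

/* Mirrors SHUFFLE_INV: fields packed directly, first argument in the top two bits. */
static unsigned shuffle_inv_imm(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (a << 6) | (b << 4) | (c << 2) | d;
}
```

With these, `shuffle_imm(3, 2, 1, 0)` and `shuffle_inv_imm(0, 1, 2, 3)` both evaluate to 27 (binary 00011011), the immediate used by pshuflw further down.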

Main function
;;
 AsciiHex2dw

 Converts an 8-character ASCII hexadecimal string into a 32-bit DWORD value.

 Syntax:
   AsciiHex2dw (pString: pointer)

 Parameters:
   pString [in] - Pointer to an 8-character ASCII string representing a hexadecimal value
                  (e.g., "0F2A45B7"). The string must contain only digits 0-9 and uppercase
                  letters A-F, with no explicit null terminator.

 Return Value:
   Returns in EAX the 32-bit DWORD value corresponding to the converted hexadecimal string.
   For example, for "0F2A45B7", returns EAX = 0x0F2A45B7.

 Remarks:
   This function uses SSE2 instructions to efficiently process the string, converting ASCII
   characters to binary values, adjusting A-F letters, separating high and low nibbles,
   and packing the result into a DWORD. The ESI register is preserved per calling convention.
   The function assumes a valid input and does not perform additional validation. The
   SHUFFLE macro defines the pshuflw immediate as 27 (binary 00011011), reversing the order
   of the lower 4 words to align the nibbles correctly. Word values in XMM registers are
   displayed in memory order (left-to-right), as shown in the RosAsm debugger.

   Masks used:
   - Mask1: Subtracts the ASCII value of '0' (0x30) to convert characters to numeric values.
   - Mask2: Compares each byte with 9 (0x09, applied after the '0' subtraction) to identify A-F letters.
   - Mask3: Subtracts 7 to adjust A-F letters to their correct hexadecimal range.
   - Mask4a: Isolates low nibbles (0x0F000F00 per dword), i.e. the interleaved byte pattern 0F 00 0F 00.
   - Mask5a: Isolates high nibbles (0x00F000F0 per dword), the opposite interleaving 00 F0 00 F0. Not used in this version.

 Example:
   For pString pointing to "0F2A45B7":
   - Input: "0F2A45B7"
   - Output: EAX = 0x0F2A45B7 (decimal: 254,428,599)

References: https://masm32.com/board/index.php?topic=984.msg8975#msg8975
            https://stackoverflow.com/questions/67054154/is-there-an-algorithm-to-convert-massive-hex-string-to-bytes-stream-quickly-asm


;;

[<16 Mask1: Q$ 030303030_30303030, 030303030_30303030]  ; '0'
[<16 Mask2: Q$ 09090909_09090909, 09090909_09090909]  ; 9 (compared after the '0' subtraction)
[<16 Mask3: Q$ 07070707_07070707, 07070707_07070707]  ; 7 (the A-F correction)
[<16 Mask4a: Q$ 0F_00_0F_00_0F_00_0F_00, 0F_00_0F_00_0F_00_0F_00]  ; 0x0F_00_0F_00
[<16 Mask5a: Q$  0_F0_00_F0_00_F0_00_F0, 0_F0_00_F0_00_F0_00_F0]   ; 0x00_F0_00_F0

Proc AsciiHex2dw:
    Arguments @pString
    Uses esi

    mov esi, D@pString         ; ESI = pointer to the input string
    movq xmm0, Q$esi           ; Loads 8 bytes of the string into XMM0
                               ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

;    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb eax xmm0 | add esi 16 | sub esi D@pString  | add esi 0-16 | bsf ax ax

    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
        ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
        ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Compares each byte with '9' to identify A-F
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
        ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Applies a 7 correction to bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
        ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates low nibbles (keeps bits 0-3 of each byte)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
        ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pxor xmm0 xmm1              ; Removes low nibbles from XMM0, keeping only high nibbles
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
        ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psllw xmm0 4                ; Shifts high nibbles 4 bits left
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00
   
    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pshuflw xmm0, xmm0, {SHUFFLE 3, 2, 1, 0} ; Reorders the lower 4 words: [0, 1, 2, 3] => {SHUFFLE 3, 2, 1, 0} = 27 (In decimal)
                                ; XMM0 Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM0 After:  000D 0000 000D 0000 0F00 2A00 4500 B700
        ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psrld xmm0 8                ; Shifts each dword 8 bits right to align values
                                ; XMM0 Before: 000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM0 After:  0000 0D00 0000 0D00 000F 002A 0045 00B7


    packuswb xmm0 xmm0          ; Packs words to bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0 Before:  0000 0D00 0000 0D00 000F 002A 0045 00B7
                                ; XMM0 After:   00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7


    movd eax xmm0               ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0F2A45B7 (Correct result)


EndP
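
For readers who want to try the same sequence outside RosAsm, the following is a minimal sketch of the routine above using C SSE2 intrinsics. It is a port made for illustration, not part of the original code: the name `ascii_hex8_to_u32` is hypothetical, and the constants are built with `_mm_set1_*` instead of being loaded from the Mask1..Mask4a data. Like the original, it assumes exactly 8 uppercase hex characters and does no validation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical C port of AsciiHex2dw: 8 uppercase hex chars in, 32-bit value out. */
static uint32_t ascii_hex8_to_u32(const char *s)
{
    const __m128i ascii0 = _mm_set1_epi8(0x30);     /* Mask1 */
    const __m128i nine   = _mm_set1_epi8(9);        /* Mask2 */
    const __m128i seven  = _mm_set1_epi8(7);        /* Mask3 */
    const __m128i odd    = _mm_set1_epi16(0x0F00);  /* Mask4a */

    __m128i v = _mm_loadl_epi64((const __m128i *)s);  /* movq: 8 bytes, upper half zeroed */
    v = _mm_sub_epi8(v, ascii0);                      /* psubb: subtract '0' */

    /* pcmpgtb + pand: 7 in every byte that is greater than 9 (the A-F digits) */
    __m128i fix = _mm_and_si128(_mm_cmpgt_epi8(v, nine), seven);
    v = _mm_sub_epi8(v, fix);                         /* psubb: A-F become 10..15 */

    __m128i lo = _mm_and_si128(v, odd);               /* pand: second digit of each pair */
    v = _mm_xor_si128(v, lo);                         /* pxor: first digit of each pair */
    v = _mm_slli_epi16(v, 4);                         /* psllw 4 */
    v = _mm_slli_epi32(v, 8);                         /* pslld 8 */
    v = _mm_or_si128(v, lo);                          /* por: each word is now (byte << 8) */
    v = _mm_shufflelo_epi16(v, 27);                   /* pshuflw 27: reverse the low 4 words */
    v = _mm_srli_epi32(v, 8);                         /* psrld 8: each word is now 00xx */
    v = _mm_packus_epi16(v, v);                       /* packuswb */
    return (uint32_t)_mm_cvtsi128_si32(v);            /* movd */
}
```

Calling `ascii_hex8_to_u32("0F2A45B7")` returns 0x0F2A45B7, matching the register trace in the comments above.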

And here is the masm translation of it (I hope the porting to masm is ok)
AsciiHex2dw PROC USES esi pString:PTR BYTE
    ; ESI = pointer to the input string (pString is automatically available via stack)
    mov esi, pString            ; Loads the pointer from the parameter
    movq xmm0, qword ptr [esi]  ; Loads 8 bytes of the string into XMM0
                                ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                                ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; Subtract '0'
    psubb xmm0, xmmword ptr [Mask1]  ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                     ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
    pcmpgtb xmm1, xmmword ptr [Mask2]  ; Compares each byte with 9 to identify A-F
                                       ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00
    pand xmm1, xmmword ptr [Mask3]     ; Applies a 7 correction to bytes > 9 (A-F)
                                       ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700
    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
    pand xmm1, xmmword ptr [Mask4a]  ; Isolates low nibbles (bits 0-3 of every second byte)
                                     ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00
    pxor xmm0, xmm1             ; Removes low nibbles from XMM0, keeping only high nibbles
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
    psllw xmm0, 4               ; Shifts high nibbles 4 bits left
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
    pslld xmm0, 8               ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
    por xmm0, xmm1              ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00

    pshuflw xmm0, xmm0, 27      ; Reverses the lower 4 words (SHUFFLE 3, 2, 1, 0 = 27)
                                ; Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; After:  000D 0000 000D 0000 0F00 2A00 4500 B700
    psrld xmm0, 8               ; Shifts each dword 8 bits right to align values
                                ; XMM0: 0000 0D00 0000 0D00 000F 002A 0045 00B7

    packuswb xmm0, xmm0         ; Packs words to bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0: 00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7

    movd eax, xmm0              ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0x0F2A45B7

    ret                         ; Return (stack cleanup handled by stdcall)

AsciiHex2dw ENDP

.data
    ; Input string
    SzInputHex db "0F2A45B7", 0

    ; Masks (16-byte aligned for XMM operations)
    ALIGN 16
Mask1           xmmword 30303030303030303030303030303030h
Mask2           xmmword 9090909090909090909090909090909h
Mask3           xmmword 7070707070707070707070707070707h
Mask4a          xmmword 0F000F000F000F000F000F000F000F00h
.end

The same masks, using qwords (Mask5a is unused in this test):
Mask1           dq 3030303030303030h
                 dq 3030303030303030h
Mask2           dq 909090909090909h
                 dq 909090909090909h
Mask3           dq 707070707070707h
                 dq 707070707070707h
Mask4a          dq 0F000F000F000F00h
                 dq 0F000F000F000F00h
Mask5a          dq 0F000F000F000F0h
                 dq 0F000F000F000F0h




Can someone test the masm version for speed? I want to see if it's worth continuing with this function. Later I'll add a check of the input length, producing an index used to shift the value right before the function returns. If that succeeds, I'll create a variation that works with a 16-byte string (or even longer).
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Note: Updated the Mask values in the masm version. (I had ported them incorrectly earlier.)

zedd151

Hi guga...

Quote: And here is the masm translation of it (I hope the porting to masm is ok)

Cannot assemble... using ml 6.14.8444

guga_A_Dw.asm(13) : fatal error A1016: Internal Assembler Error

Here I try a later ml version:
Microsoft (R) Macro Assembler Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: guga_A_Dw.asm

***********
ASCII build
***********

guga_A_Dw.asm(15) : error A2138:invalid data initializer
guga_A_Dw.asm(16) : error A2138:invalid data initializer
guga_A_Dw.asm(17) : error A2138:invalid data initializer
guga_A_Dw.asm(18) : error A2138:invalid data initializer
Press any key to continue . . .
¯\_(ツ)_/¯   :azn:

'As we don't do "requests", show us your code first.'  -  hutch—

guga

I'll try to disassemble my RosAsm version and see if it is different from the version I tried to translate by hand. Hold on.

NoCforMe

Quote from: zedd151 on March 08, 2025, 01:02:16 PM
guga_A_Dw.asm(15) : error A2138:invalid data initializer
guga_A_Dw.asm(16) : error A2138:invalid data initializer
guga_A_Dw.asm(17) : error A2138:invalid data initializer
guga_A_Dw.asm(18) : error A2138:invalid data initializer

Dumb question: is xmmword defined somewhere?
Assembly language programming should be fun. That's why I do it.

zedd151

Quote from: NoCforMe on March 08, 2025, 03:49:45 PM
Quote from: zedd151 on March 08, 2025, 01:02:16 PM
guga_A_Dw.asm(15) : error A2138:invalid data initializer
guga_A_Dw.asm(16) : error A2138:invalid data initializer
guga_A_Dw.asm(17) : error A2138:invalid data initializer
guga_A_Dw.asm(18) : error A2138:invalid data initializer

Dumb question: is xmmword defined somewhere?

You are right, I looked it up.

It should be 'oword'...  guga.   :biggrin:

include \masm32\include\masm32rt.inc
.586p
.mmx
.xmm

AsciiHex2dw PROTO :PTR BYTE

.data

    ; Input string
    SzInputHex db "0F2A45B7", 0

    ; Masks (16-byte aligned for XMM operations)
    ALIGN 16
Mask1          oword 30303030303030303030303030303030h
Mask2          oword 9090909090909090909090909090909h
Mask3          oword 7070707070707070707070707070707h
Mask4a          oword 0F000F000F000F000F000F000F000F00h


.code

start proc

    invoke AsciiHex2dw, addr SzInputHex

    invoke MessageBox, 0, hex$(eax), 0, 0
    invoke ExitProcess, 0
start endp


AsciiHex2dw PROC USES esi pString:PTR BYTE
    ; ESI = pointer to the input string (pString is automatically available via stack)
    mov esi, pString            ; Loads the pointer from the parameter
    movq xmm0, qword ptr [esi]  ; Loads 8 bytes of the string into XMM0
                                ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                                ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; Subtract '0'
    psubb xmm0, xmmword ptr [Mask1]  ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                     ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
    pcmpgtb xmm1, xmmword ptr [Mask2]  ; Compares each byte with 9 to identify A-F
                                       ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00
    pand xmm1, xmmword ptr [Mask3]     ; Applies a 7 correction to bytes > 9 (A-F)
                                       ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700
    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
    pand xmm1, xmmword ptr [Mask4a]  ; Isolates low nibbles (bits 0-3 of every second byte)
                                     ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00
    pxor xmm0, xmm1             ; Removes low nibbles from XMM0, keeping only high nibbles
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
    psllw xmm0, 4               ; Shifts high nibbles 4 bits left
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
    pslld xmm0, 8               ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
    por xmm0, xmm1              ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00

    pshuflw xmm0, xmm0, 27      ; Reverses the lower 4 words (SHUFFLE 3, 2, 1, 0 = 27)
                                ; Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; After:  000D 0000 000D 0000 0F00 2A00 4500 B700
    psrld xmm0, 8               ; Shifts each dword 8 bits right to align values
                                ; XMM0: 0000 0D00 0000 0D00 000F 002A 0045 00B7

    packuswb xmm0, xmm0         ; Packs words to bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0: 00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7

    movd eax, xmm0              ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0x0F2A45B7

    ret                        ; Return (stack cleanup handled by stdcall)

AsciiHex2dw ENDP

end start

Now it assembles fine with ml.exe v 14.xxxxx, guga. Appears to work as well.

But ml.exe 6.14.8444  chokes with an "internal error" still.

guga

oword? Ah, ok... Thanks. I thought xmmword was also valid for masm.

zedd151

Quote from: guga on March 08, 2025, 04:37:17 PM
oword? Ah, ok... Thanks. I thought xmmword was also valid for masm.

Probably defined somewhere in rosasm, but apparently not in the masm32 SDK.
Now that it assembles fine, we need a timing testbed, plus other similar functions to test it against.  :biggrin:

Odd though, that ml accepts "xmmword ptr xxxxxx" in the code as valid, but not "xmmword" in the .data section.  Nice job, Microsoft.
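
One candidate for the "similar functions to test it against" is a plain scalar loop. The C sketch below is only a suggested baseline (the name `hex_to_u32_scalar` and the explicit length parameter are assumptions made for this example); it follows the same contract as the SSE2 routine: uppercase digits only, no validation.

```c
#include <stdint.h>

/* Scalar reference: converts len uppercase hex characters (len <= 8), no validation. */
static uint32_t hex_to_u32_scalar(const char *s, int len)
{
    uint32_t value = 0;
    for (int i = 0; i < len; i++) {
        /* '0'..'9' become 0..9, 'A'..'F' become 17..22 */
        uint32_t digit = (uint32_t)(unsigned char)s[i] - '0';
        if (digit > 9)
            digit -= 7;               /* 'A'..'F' become 10..15 */
        value = (value << 4) | digit; /* append one nibble */
    }
    return value;
}
```

It uses the same two steps as the vector code (subtract '0', then subtract 7 where the result exceeds 9), so it can double as a correctness check for any input the SSE2 version accepts.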

guga

btw, this is a new version that handles variable-size input (I mean, it works for a string from 1 to 8 bytes in length)

AsciiHex2dwNew  proc near               ; CODE XREF: start+32↑p
                                        ; .text:0042BC94↑j

pString         = dword ptr  8

                push    ebp
                mov     ebp, esp
                push    ecx
                mov     eax, [ebp+pString]
                movq    xmm0, qword ptr [eax]
                xorps   xmm1, xmm1
                pcmpeqb xmm0, xmm1
                pmovmskb ecx, xmm0
                bsf     cx, cx
                mov     ch, 32
                shl     cl, 2
                sub     ch, cl
                mov     cl, ch
                movq    xmm0, qword ptr [eax]
                psubb   xmm0, Mask1
                movdqa  xmm1, xmm0
                pcmpgtb xmm1, Mask2
                pand    xmm1, Mask3
                psubb   xmm0, xmm1
                movdqa  xmm1, xmm0
                pand    xmm1, Mask4a
                pxor    xmm0, xmm1
                psllw   xmm0, 4
                pslld   xmm0, 8
                por     xmm0, xmm1
                pshuflw xmm0, xmm0, 27
                psrld   xmm0, 8
                packuswb xmm0, xmm0
                movd    eax, xmm0
                shr     eax, cl
                pop     ecx
                mov     esp, ebp
                pop     ebp
                retn    4
AsciiHex2dwNew  endp




And the RosAsm version:


Proc AsciiHex2dwNew3:
    Arguments @pString
    Uses ecx

    mov eax, D@pString         ; EAX = pointer to the input string
    movq xmm0, Q$eax           ; Loads 8 bytes of the string into XMM0
                               ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; Get the length of the string to calculate the shift count applied at the end
    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb ecx xmm0 | bsf cx cx
    mov ch 32 | shl cl 2 | sub ch cl | mov cl ch

;;
    Examples:
    0F2A45B7 = shr eax 0,  cx = 8 => 32-32 = 32-(8*4) = 0*4
     F2A45B7 = shr eax 4,  cx = 7 => 32-28 = 32-(7*4) = 1*4
      2A45B7 = shr eax 8,  cx = 6 => 32-24 = 32-(6*4) = 2*4
       A45B7 = shr eax 12, cx = 5 => 32-20 = 32-(5*4) = 3*4
        45B7 = shr eax 16, cx = 4 => 32-16 = 32-(4*4) = 4*4
         5B7 = shr eax 20, cx = 3 => 32-12 = 32-(3*4) = 5*4
          B7 = shr eax 24, cx = 2 => 32-8  = 32-(2*4) = 6*4
           7 = shr eax 28, cx = 1 => 32-4  = 32-(1*4) = 7*4
;;

    movq xmm0, Q$eax
    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Compares each byte with '9' to identify A-F
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Applies a 7 correction to bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates low nibbles (keeps bits 0-3 of each byte)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pxor xmm0 xmm1              ; Removes low nibbles from XMM0, keeping only high nibbles
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psllw xmm0 4                ; Shifts high nibbles 4 bits left
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pshuflw xmm0, xmm0, {SHUFFLE 3, 2, 1, 0} ; Reorders the lower 4 words: [0, 1, 2, 3] => {SHUFFLE 3, 2, 1, 0} = 27 (In decimal)
                                ; XMM0 Before: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM0 After:  000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psrld xmm0 8                ; Shifts each dword 8 bits right to align values
                                ; XMM0 Before: 000D 0000 000D 0000 0F00 2A00 4500 B700
                                ; XMM0 After:  0000 0D00 0000 0D00 000F 002A 0045 00B7


    packuswb xmm0 xmm0          ; Packs words to bytes with unsigned saturation (0D00 saturates to FF)
                                ; XMM0 Before:  0000 0D00 0000 0D00 000F 002A 0045 00B7
                                ; XMM0 After:   00FF 00FF 0F2A 45B7 00FF 00FF 0F2A 45B7


    movd eax xmm0               ; Moves the lower 4 bytes of XMM0 to EAX
                                ; EAX = 0F2A45B7 (Correct result)

    shr eax cl

EndP
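
The variable-length version can be sketched in C with SSE2 intrinsics as follows. This is an illustrative port, not the original code: the name `ascii_hex_to_u32_var` is hypothetical, the bsf instruction is replaced by a small loop, and the early return for an empty string is an addition (the assembly has no such check, and a 32-bit shift by 32 is undefined in C). As in the assembly, 8 bytes are always read, so the buffer must be at least 8 bytes long even for shorter strings.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical C port of AsciiHex2dwNew: 1 to 8 uppercase hex chars, zero-terminated. */
static uint32_t ascii_hex_to_u32_var(const char *s)
{
    __m128i v = _mm_loadl_epi64((const __m128i *)s);  /* movq: always reads 8 bytes */

    /* pcmpeqb + pmovmskb: one mask bit per zero byte. Bits 8..15 are always set
       because movq clears the upper half, so the loop below stops at 8 at most. */
    unsigned zeros = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()));
    unsigned len = 0;
    while (((zeros >> len) & 1u) == 0)                /* stands in for bsf */
        len++;
    if (len == 0)
        return 0;                                     /* guard added in this sketch */

    v = _mm_sub_epi8(v, _mm_set1_epi8(0x30));         /* subtract '0' */
    __m128i fix = _mm_and_si128(_mm_cmpgt_epi8(v, _mm_set1_epi8(9)), _mm_set1_epi8(7));
    v = _mm_sub_epi8(v, fix);                         /* A-F become 10..15 */
    __m128i lo = _mm_and_si128(v, _mm_set1_epi16(0x0F00));
    v = _mm_xor_si128(v, lo);
    v = _mm_slli_epi16(v, 4);
    v = _mm_slli_epi32(v, 8);
    v = _mm_or_si128(v, lo);
    v = _mm_shufflelo_epi16(v, 27);
    v = _mm_srli_epi32(v, 8);
    v = _mm_packus_epi16(v, v);
    uint32_t value = (uint32_t)_mm_cvtsi128_si32(v);

    /* The digits sit in the top nibbles; drop the (8 - len) unused nibbles. */
    return value >> (32u - 4u * len);
}
```

For a buffer holding "B7" followed by zero bytes the shift count is 32 - (2*4) = 24 and the result is 0xB7, in line with the table of examples above.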






guga

Quote from: zedd151 on March 08, 2025, 04:39:23 PM
Quote from: guga on March 08, 2025, 04:37:17 PM
oword? Ah, ok... Thanks. I thought xmmword was also valid for masm.

Probably defined somewhere in rosasm, but apparently not in the masm32 SDK.
Now that it assembles fine, we need a timing testbed, plus other similar functions to test it against.   :biggrin:

No, this is not a token from RosAsm. It is from IdaPro. RosAsm doesn't have any of those things by default. I kept it simple: only D$ for Dword, Q$ for Qword, X$ for SSE registers, W$ for Word, T$ for TenByte and B$ for Byte. You can, however, use things like dword ptr by enabling the preparser routine, but personally I prefer the simple way.

I'm currently working on RosAsm, trying to fix some very, very old issues.

zedd151

Quote from: guga on March 08, 2025, 04:53:33 PM
No, this is not a token from RosAsm. It is from IdaPro. RosAsm doesn't have any of those things by default. I kept it simple: only D$ for Dword, Q$ for Qword, X$ for SSE registers, W$ for Word, T$ for TenByte and B$ for Byte. You can, however, use things like dword ptr by enabling the preparser routine, but personally I prefer the simple way.

I'm currently working on RosAsm, trying to fix some very, very old issues.

Ah, okay.

guga

Btw, benchmarking would be nice. I tried to compile with qeditor but got an error. But I did a small test in a tiny benchmark app I made to test some of Lingo's functions.

jj2007

Congrats, it's fast :thumbsup:

Quote from: guga on March 08, 2025, 11:29:50 AM
Can someone test the masm version for speed?

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

18532  cycles for 100 * Val()
708    cycles for 100 * AsciiHex2dwNew

18615  cycles for 100 * Val()
711    cycles for 100 * AsciiHex2dwNew

18780  cycles for 100 * Val()
743    cycles for 100 * AsciiHex2dwNew

18562  cycles for 100 * Val()
741    cycles for 100 * AsciiHex2dwNew

18535  cycles for 100 * Val()
725    cycles for 100 * AsciiHex2dwNew

Averages:
18571  cycles for Val()
726    cycles for AsciiHex2dwNew

13      bytes for Val()
202    bytes for AsciiHex2dwNew

1234ABCDh      eax Val()
1234ABCDh      eax AsciiHex2dwNew

guga

Tks a lot JJ.

I'm now trying to make an extended version to convert 16 chars. The current code does in fact convert them already. The only problem now is finding the proper index to shift by, and how to place the extra bytes after the first dword.

(...)
mov ch 32 | shl cl 2 | sub ch cl | mov cl ch <--- only this needs to change, so that ch and cl hold a different index. Ex: if ch = 0 (or smaller than 8) we are dealing with a dword; otherwise it is a qword to be converted. I'm trying to figure out how to get the proper index to use at:
(...)
    shr eax cl <--- will need a test on ch as well, in order to put the values from the other xmm register into the proper place in the output buffer.

Maybe it could also be useful to return in eax the number of bytes converted... thinking... :dazzled:

btw, I'm deeply tired... it's 06:00 AM right now :mrgreen:  :mrgreen:  :mrgreen:
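
To illustrate the 16-character idea for the fixed-length case only, here is a rough C sketch with SSE2 intrinsics. It is not guga's planned code: the name `ascii_hex16_to_u64` is hypothetical, it assumes exactly 16 uppercase hex digits, and it does not address the variable-length indexing discussed above. It also assumes the upper four words need the same reversal as the lower four (pshufhw with the same immediate), and it swaps the two resulting dwords in scalar code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical fixed-length extension: 16 uppercase hex chars to a 64-bit value. */
static uint64_t ascii_hex16_to_u64(const char *s)
{
    __m128i v = _mm_loadu_si128((const __m128i *)s);  /* movdqu: all 16 characters */
    v = _mm_sub_epi8(v, _mm_set1_epi8(0x30));         /* subtract '0' */
    __m128i fix = _mm_and_si128(_mm_cmpgt_epi8(v, _mm_set1_epi8(9)), _mm_set1_epi8(7));
    v = _mm_sub_epi8(v, fix);                         /* A-F become 10..15 */
    __m128i lo = _mm_and_si128(v, _mm_set1_epi16(0x0F00));
    v = _mm_xor_si128(v, lo);
    v = _mm_slli_epi16(v, 4);
    v = _mm_slli_epi32(v, 8);
    v = _mm_or_si128(v, lo);
    v = _mm_shufflelo_epi16(v, 27);                   /* pshuflw: reverse words 0..3 */
    v = _mm_shufflehi_epi16(v, 27);                   /* pshufhw: reverse words 4..7 */
    v = _mm_srli_epi32(v, 8);
    v = _mm_packus_epi16(v, v);

    uint32_t parts[4];
    _mm_storeu_si128((__m128i *)parts, v);
    /* parts[0] holds the first 8 characters, parts[1] the last 8. */
    return ((uint64_t)parts[0] << 32) | parts[1];
}
```

For "0123456789ABCDEF" this yields 0x0123456789ABCDEF. In assembly the final swap might be done with a pshufd before storing the qword, but that part is left open here.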

jj2007

Quote from: guga on March 08, 2025, 07:33:12 PM
Maybe it could also be useful to return in eax the number of bytes converted... thinking

IMHO eax should return the value. Val(), for example, returns the value in eax, the number of bytes used in dl and the type in dh (where dh 0=decimal, dh 1=binary, dh 2=hex)