AsciiHextoDword (SSE2 version)

Started by guga, March 08, 2025, 11:29:50 AM


zedd151

That's alright guga. Just keep this topic open as a "Work In Progress" topic.

Once you have a version that is 100% polished and ready, you can give it its own topic to showcase it, away from all of the distractions posted here (by myself included.  :tongue: )

:thumbsup:  We have faith in you.  :smiley:
¯\_(ツ)_/¯   :azn:

'As we don't do "requests", show us your code first.'  -  hutch—

guga

Tks Zedd

But what should the proper result be? I'm a bit confused right now.

For example, say the input is this string:
[SzInputHex:  B$ "6543210F2A45B7", 0 ]

How should the output be displayed/stored? I mean, in both dwords (this new version now saves 2 dwords). Should it be displayed as

[Output: D$ 654321
0F2A45B7]
or
[Output: D$ 6543210F
02A45B7]
or
[Output: D$ 6543210F
2A45B700]

????


Returning this:
[Output: D$ 6543210F
02A45B7]

is the same as if I called the dword version of the function, AsciiHex2dwNew, twice on the string:
    mov edi SzInputHex
    call AsciiHex2dwNew edi
    mov D$Output eax
    add edi 8
    call AsciiHex2dwNew edi
    mov D$Output+4 eax

This is the expected result if we read the string from left to right: we first process the first 8 chars (6543210F) and then continue with the remaining 6 (2A45B7).

But should the new version that handles 2 dwords at once (AsciiHex2dw_Ex4) produce the same result as a sequence of AsciiHex2dwNew calls, or a different one?
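
For reference, the left-to-right behaviour described above (two back-to-back calls on 8-character chunks) can be modelled in plain C. This is only a scalar sketch, not the SSE2 routine itself; the names `parse_hex_chunk` and `hex_to_dwords_chunked` are made up for illustration, and the model assumes each call stops at the first non-hex character and right-aligns its result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical model of one dword call: parse up to 8 hex digits,
   stop at the first non-hex character, result right-aligned. */
static uint32_t parse_hex_chunk(const char *s, size_t *consumed)
{
    uint32_t value = 0;
    size_t n = 0;

    assert(s != NULL);
    while (n < 8 && hex_nibble(s[n]) >= 0) {
        value = (value << 4) | (uint32_t)hex_nibble(s[n]);
        n++;
    }
    if (consumed != NULL) *consumed = n;
    return value;
}

/* Left-to-right chunking: first 8 chars into out[0], the rest into out[1]. */
static void hex_to_dwords_chunked(const char *s, uint32_t out[2])
{
    size_t used = 0;

    out[0] = parse_hex_chunk(s, &used);
    out[1] = parse_hex_chunk(s + used, NULL);
}
```

With the 14-character input from this post, the model gives 6543210F for the first dword and 002A45B7 for the second, i.e. the two dwords do not join into one contiguous number.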
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

zedd151

Quote from: guga on March 10, 2025, 05:55:21 AMFor example, say the input is this string:
[SzInputHex:  B$ "6543210F2A45B7", 0 ]


the output it seems should show:
00654321h for first dword,  and  0F2A45B7h for the second dword.

or a qword of  006543210F2A45B7h
;---------------------------------------------------------------------
Otherwise you could end up with
6543210Fh  for first dword and  002A45B7h for the second NOT what you really want.

or a qword of
6543210F002A45B7h NOT what you really want.

One of the results has a single zero inserted...

[Output: D$ 6543210F
02A45B7]  which is definitely not right.

Think of the first dword as carried hex digits (carried from the last 8 bytes, or second dword)
Maybe process the rightmost 8 bytes first? That is about the best way that I can articulate the way I see this problem.

Consider the first dword and the second as if they were joined into a qword, in the order of bytes in them.
Or maybe easier to do a 64 bit version first?
This way, you will have a way to compare for correct results?

TimoVJL

May the source be with you

zedd151

Quote from: TimoVJL on March 10, 2025, 06:27:48 AM6543210F2A45B7
==
00654321 0F2A45B7
That's what I said, in maybe too many words.  :biggrin:

ognil

Thanks Guga for the interesting algorithm :thumbsup:

I rewrote it with some improvements, such as extending the input string range so it works with lengths from 1 to 8 bytes. :smiley:

; MASM64 SSE2 implementation of AsciiHexToDword for strings with length from 1 to 8 bytes

.data
align   16                                           ; Masks (16-byte aligned for XMM operations)  
    Mask1   oword 30303030303030303030303030303030h  ; '0' (0x30) repeated 16 times
    Mask2   oword 09090909090909090909090909090909h  ; 0x09 repeated 16 times (threshold for digits)
    Mask3   oword 07070707070707070707070707070707h  ; 0x07 repeated 16 times (adjustment for A-F)
    Mask4   oword 0F000F000F000F000F000F000F000F00h  ; Mask to isolate low nibbles
; Masks for lowercase conversion
    LowerMinMinus1  db 60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h   ; 'a' - 1 (0x60)
    LowerMax        db 66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h   ; 'f' (0x66)
    AdjustLowercase db 20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h   ; 0x20 to convert to uppercase
align   16
    szTest   db "0f2A45b7",0                        ; Example input (8 bytes)
    ;szTest  db "f2A45",0                           ; Example input (5 bytes)
    ;szTest  db "0f2A45b78",0                       ; Example input (9 bytes)
    ;szTest  db "f",0                               ; Example input (1 byte small case)
    ;szTest  db "0",0                               ; Example input (1 byte=0)

.code
;***************************************************;
    ; Function:AsciiHexTodw_Og
    ; Input: rcx = pointer to a null-terminated ASCII hex string, 1 to 8 characters
    ; Output: eax = 32-bit DWORD result (as written, an empty string returns eax = 1)
    ; On entry: lea  rcx,szTest
    ;           call AsciiHexTodw_Og
;***************************************************;
align 16
AsciiHexTodw_Og PROC                                ; rcx = pointer to the input string
; Load input bytes and apply length limit to 8 bytes
    movq  xmm0, qword ptr [rcx]                     ; Load 8 bytes from input in xmm0
    mov   eax, 1                                    ; XMM0 = 0000000000000000-3762353441326630            
    cmp   byte ptr[rcx],0
    je    @Ret 
@@:
    cmp   byte ptr[rcx+rax],0
    je    @f 
    add   eax,1 
    cmp   eax,8 
    jb    @b
@@:
    lea   rcx,[rax-8]                               ; rax=8 -> rcx=0   
; Convert lowercase letters to uppercase
    movdqa   xmm1, xmm0                             ; XMM1 = 0000000000000000-3762353441326630    
    pcmpgtb  xmm1, xmmword ptr [LowerMinMinus1]     ; Check if >= 'a' (61h)
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00
    movdqa   xmm3, xmm0                             ; XMM3 = 0000000000000000-3762353441326630
    pcmpgtb  xmm3, xmmword ptr [LowerMax]           ; Check if > 'f' (66h)
                                                    ; XMM3 = 0000000000000000-0000000000000000  
    pxor     xmm1, xmm3                             ; XMM1 = FF where 'a' <= char <= 'f'
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00 
    pand     xmm1, xmmword ptr [AdjustLowercase]    ; Apply 20h adjustment
                                                    ; XMM1 = 0000000000000000-0020000000002000
    psubb    xmm0, xmm1                             ; Convert lowercase to uppercase
                                                    ; XMM0 = 0000000000000000-3742353441324630
; Subtract '0' to convert ASCII to numeric values
    psubb    xmm0, xmmword ptr [Mask1]              ; '0'-'9' -> 00h-09h, 'A'-'F' -> 11h-16h (fixed up below)
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-0712050411021600
; Adjust for A-F (values > 9)
    movdqa   xmm1, xmm0                             ; XMM1 = D0D0D0D0D0D0D0D0-0712050411021600    
    pcmpgtb  xmm1, xmmword ptr [Mask2]              ; XMM1 = FF where value > 9
                                                    ; XMM1 = 0000000000000000-00FF0000FF00FF00
    pand     xmm1, xmmword ptr [Mask3]              ; Apply 7 adjustment
                                                    ; XMM1 = 0000000000000000-0007000007000700    
    psubb    xmm0, xmm1                             ; Subtract 7 from A-F values
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-070B05040A020F00
; Combine nibbles into bytes
    movdqa   xmm1, xmm0                             ; XMM1 = D0D0D0D0D0D0D0D0-070B05040A020F00
    pand     xmm1, xmmword ptr [Mask4]              ; Isolate low nibbles
                                                    ; XMM1 = 0000000000000000-070005000A000F00    
    pxor     xmm0, xmm1                             ; Isolate high nibbles
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-000B000400020000
    psllw    xmm0, 4                                ; Shift high nibbles left by 4 bits
                                                    ; XMM0 = 0D000D000D000D00-00B0004000200000
    pslld    xmm0, 8                                ; Align high nibbles
                                                    ; XMM0 = 000D0000000D0000-B000400020000000
    por      xmm0, xmm1                             ; Combine high and low nibbles
                                                    ; XMM0 = 000D0000000D0000-B70045002A000F00
    psrld    xmm0, 8                                ; Align to lower 32 bits
                                                    ; XMM0 = 00000D0000000D00-00B70045002A000F
    packuswb xmm0, xmm0                             ; Pack bytes into lower 32 bits
                                                    ; XMM0 = 00FF00FFB7452A0F-00FF00FFB7452A0F     
    neg      ecx                                    ; ecx=0   
    movd     eax,  xmm0                             ; Move result to eax
                                                    ; RAX = 00000000B7452A0F   
    shl      rcx,  2                                ; RCX = 0000000000000000 
    bswap    eax                                    ; Correct byte order in eax
                                                    ; RAX = 000000000F2A45B7  -> End Result      
    shr      eax,  cl                               ; RCX = 0000000000000000
@Ret:
    ret
AsciiHexTodw_Og ENDP
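
The last steps of the routine above (bswap, then a right shift by 4 bits per missing character) can be checked against a scalar model. The C sketch below is only an illustration of that idea, not a translation of the SSE2 code; the name `hex8_left_aligned_then_shift` is made up, and returning 0 for an empty string is a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: pack up to 8 hex digits left-aligned
   (first character in the top nibble, as after bswap), then shift right
   by (8 - len) * 4 bits so shorter strings end up right-aligned. */
static uint32_t hex8_left_aligned_then_shift(const char *s)
{
    uint32_t packed = 0;
    unsigned len = 0;

    assert(s != NULL);
    while (len < 8 && s[len] != '\0') {
        unsigned char c = (unsigned char)s[len];
        if (c >= 'a' && c <= 'f') c -= 0x20;   /* lowercase -> uppercase */
        unsigned v = c - '0';                  /* '0'-'9' -> 0-9, 'A'-'F' -> 17-22 */
        if (v > 9) v -= 7;                     /* 'A'-'F' -> 10-15 */
        packed |= (uint32_t)(v & 0xF) << (28 - 4 * len);
        len++;
    }
    if (len == 0) return 0;                    /* sketch-only choice for "" */
    return packed >> ((8 - len) * 4);
}
```

Anything past the eighth character is ignored by this model, so a 9-character input gives the same result as its first 8 characters.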
"Not keeping emotions under control is another type of mental distortion."

guga

Tks Ognil

Can you port this to 32 bits, please? (I cannot use 64-bit versions yet.)

I tried to port the modifications you made, but I got wrong results for some values, such as:

[SzInputHex:  B$ "18F2A45B7", 0 ]

It should return
00000001 8F2A45B7

or return an error, since the string has an odd number of digits, correct?


Can you test this string on your version and tell me what the result is?

zedd151

Quote from: guga on March 10, 2025, 03:24:26 PMCan u test this string on your version and tell me what is the result ?
His version only works for up to an 8 byte string, guga. Not 16 bytes.

Quote from: ognil on March 10, 2025, 08:02:58 AMI rewrote it with some improvements like expanding the output string range to work with different lengths from 1 to 8 bytes. :smiley:
I have assembled his version and can confirm: it only works for up to an 8 byte string, returning a dword value only. Inputting a longer string (up to 16 bytes) will result in only the first 8 bytes being processed.

    include \masm64\include64\masm64rt.inc

.data
align  16                                          ; Masks (16-byte aligned for XMM operations) 
    Mask1  oword 30303030303030303030303030303030h  ; '0' (0x30) repeated 16 times
    Mask2  oword 09090909090909090909090909090909h  ; 0x09 repeated 16 times (threshold for digits)
    Mask3  oword 07070707070707070707070707070707h  ; 0x07 repeated 16 times (adjustment for A-F)
    Mask4  oword 0F000F000F000F000F000F000F000F00h  ; Mask to isolate low nibbles
; Masks for lowercase conversion
    LowerMinMinus1  db 60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h,60h  ; 'a' - 1 (0x60)
    LowerMax        db 66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h,66h  ; 'f' (0x66)
    AdjustLowercase db 20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h,20h  ; 0x20 to convert to uppercase
   

    string1 db "123456789ABCDEF", 0
   
.code

start proc

    invoke AsciiHexTodw_Og, addr string1

    invoke MessageBox, 0, hex$(rax), 0, 0
    invoke ExitProcess, eax
start endp

; MASM64 SSE2 implementation of AsciiHexToDword for strings with length from 1 to 8 bytes

.code
;***************************************************;
    ; Function:AsciiHexTodw_Og
    ; Input: rcx = pointer to a null-terminated ASCII hex string, 1 to 8 characters
    ; Output: eax = 32-bit DWORD result
    ; On entry: lea  rcx,szTest
    ;          call AsciiHexTodw_Og
;***************************************************;
align 16
AsciiHexTodw_Og PROC                                ; rcx = pointer to the input string
; Load input bytes and apply length limit to 8 bytes
    movq  xmm0, qword ptr [rcx]                    ; Load 8 bytes from input in xmm0
    mov  eax, 1                                    ; XMM0 = 0000000000000000-3762353441326630           
    cmp  byte ptr[rcx],0
    je    @Ret
@@:
    cmp  byte ptr[rcx+rax],0
    je    @f
    add  eax,1
    cmp  eax,8
    jb    @b
@@:
    lea  rcx,[rax-8]                              ; rax=8 -> rcx=0 
; Convert lowercase letters to uppercase
    movdqa  xmm1, xmm0                            ; XMM1 = 0000000000000000-3762353441326630   
    pcmpgtb  xmm1, xmmword ptr [LowerMinMinus1]    ; Check if >= 'a' (61h)
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00
    movdqa  xmm3, xmm0                            ; XMM3 = 0000000000000000-3762353441326630
    pcmpgtb  xmm3, xmmword ptr [LowerMax]          ; Check if > 'f' (66h)
                                                    ; XMM3 = 0000000000000000-0000000000000000 
    pxor    xmm1, xmm3                            ; XMM1 = FF where 'a' <= char <= 'f'
                                                    ; XMM1 = 0000000000000000-00FF00000000FF00
    pand    xmm1, xmmword ptr [AdjustLowercase]    ; Apply 20h adjustment
                                                    ; XMM1 = 0000000000000000-0020000000002000
    psubb    xmm0, xmm1                            ; Convert lowercase to uppercase
                                                    ; XMM0 = 0000000000000000-3742353441324630
; Subtract '0' to convert ASCII to numeric values
    psubb    xmm0, xmmword ptr [Mask1]              ; '0'-'9' -> 00h-09h, 'A'-'F' -> 11h-16h (fixed up below)
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-0712050411021600
; Adjust for A-F (values > 9)
    movdqa  xmm1, xmm0                            ; XMM1 = D0D0D0D0D0D0D0D0-0712050411021600   
    pcmpgtb  xmm1, xmmword ptr [Mask2]              ; XMM1 = FF where value > 9
                                                    ; XMM1 = 0000000000000000-00FF0000FF00FF00
    pand    xmm1, xmmword ptr [Mask3]              ; Apply 7 adjustment
                                                    ; XMM1 = 0000000000000000-0007000007000700   
    psubb    xmm0, xmm1                            ; Subtract 7 from A-F values
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-070B05040A020F00
; Combine nibbles into bytes
    movdqa  xmm1, xmm0                            ; XMM1 = D0D0D0D0D0D0D0D0-070B05040A020F00
    pand    xmm1, xmmword ptr [Mask4]              ; Isolate low nibbles
                                                    ; XMM1 = 0000000000000000-070005000A000F00   
    pxor    xmm0, xmm1                            ; Isolate high nibbles
                                                    ; XMM0 = D0D0D0D0D0D0D0D0-000B000400020000
    psllw    xmm0, 4                                ; Shift high nibbles left by 4 bits
                                                    ; XMM0 = 0D000D000D000D00-00B0004000200000
    pslld    xmm0, 8                                ; Align high nibbles
                                                    ; XMM0 = 000D0000000D0000-B000400020000000
    por      xmm0, xmm1                            ; Combine high and low nibbles
                                                    ; XMM0 = 000D0000000D0000-B70045002A000F00
    psrld    xmm0, 8                                ; Align to lower 32 bits
                                                    ; XMM0 = 00000D0000000D00-00B70045002A000F
    packuswb xmm0, xmm0                            ; Pack bytes into lower 32 bits
                                                    ; XMM0 = 00FF00FFB7452A0F-00FF00FFB7452A0F   
    neg      ecx                                    ; ecx=0 
    movd    eax,  xmm0                            ; Move result to eax
                                                    ; RAX = 00000000B7452A0F 
    shl      rcx,  2                                ; RCX = 0000000000000000
    bswap    eax                                    ; Correct byte order in eax
                                                    ; RAX = 000000000F2A45B7  -> End Result     
    shr      eax,  cl                              ; RCX = 0000000000000000
@Ret:
    ret
AsciiHexTodw_Og ENDP

end

Quote from: ognil on March 10, 2025, 08:02:58 AM; Function:AsciiHexTodw_Og
    ; Input: ecx = pointer to the 8-byte ASCII hex string with length less then 8 bytes
    ; Output: eax = 32-bit DWORD result


input = "123456789ABCDEF"    <-------  the trailing chars "9ABCDEF" here, guga, will never get processed using ognil's code.
result = 12345678h  in rax/eax

:biggrin:

I did try to convert it to 32 bit, but had too many issues with doing that. I don't have enough coding mojo I guess.   :toothy: 

guga

Quote from: zedd151 on March 10, 2025, 03:31:26 PM
Quote from: guga on March 10, 2025, 03:24:26 PMCan u test this string on your version and tell me what is the result ?
His version only works for up to an 8 byte string, guga. Not 16 bytes.


Hi  zedd151

His version has some minor flaws, but I'm trying to adapt it to my code that handles 2 dwords at once. I liked the way he used bswap (at the cost of only 2 clock cycles, I presume), which avoids some other SSE opcodes and makes the algo a bit shorter (and perhaps a bit faster). The problem lies with strings longer than 8 bytes. I'm reviewing the tables in order to adjust the position and shift when a string is identified as being longer than 8 chars. I'm just concerned about how much performance loss that will cause, but I guess I'm closer to a solution.

zedd151

Hey guga, since you are effectively trying to convert ascii hex to a qword (via two dwords) maybe the topic title should be changed?

Either AsciiHextoDwords (with an 's' at the end) or alternatively AsciiHextoQword???
Maybe that is why ognil only processed a single dword in his code? "AsciiHextoDword (SSE2 version)"

guga

Maybe... I started with a dword, now I'm doing the qword version, and later I will do one for all sizes. Perhaps changing the title to HexAscii Conversions (or something).

Btw, I succeeded in making it work for the qword version. I also managed to fix ognil's optimization for 32 bits, but I haven't tested it yet to see whether it really optimizes the function in terms of speed.

I'll try to port it to Masm and make it work with JJ's benchmark app, which is better for this sort of test IMHO. I'm not used to MasmBasic yet, but I can probably give it a try and see if I can make it work in his tool so we can test the algorithm.

Although I liked the use of bswap, I don't know whether it will be good or bad for the speed of the algorithm itself. Once I succeed with the port, I'll post the new version here.

zedd151

Quote from: guga on March 11, 2025, 05:04:40 AMPerhaps changing the title to HexAscii Conversions (or something).
I think you can think of a good name for what it will do.  :smiley:  I only offered a suggestion.

Quote from: guga on March 11, 2025, 05:04:40 AMOnce i succeed to port i´ll post it here the new version
:thumbsup:
I wouldn't worry about speed so much. It's more important that it works first, exactly the way that you intend it to work.
Adjustments for speed can always come later.

But, is being faster really necessary though? It would only need to be super fast if it is called many, many times (100's, 1000's or more times) within a given program. If used only once or twice in a program, speed won't make much difference overall. Unless of course it takes way too long to do the conversion only once or twice, like several seconds.

guga

Hi Zedd

I'm going for speed because I want to use it in RosAsm. Currently I'm trying to fix some old bugs and make it work better, but that is hard since all the internal code is a mess, and making the functions work independently, I mean without the need to reuse hundreds of global variables, is true hell. My goal is to make dlls for RosAsm, such as one each for the encoder, decoder, disassembler and resources editor. The problem is that when we started RosAsm, we chose to make it work that way, and we allowed contributors to code directly on it, which led to several bugs. Ideally the contributors would all have used the same coding style, but it didn't happen that way. Several major functions in RosAsm were written by different people, and many of them (if not all) reuse global variables that should really have been made local.

That's the reason why we were never able to turn it into a dll, etc. I succeeded in isolating hundreds of functions and improving them in their own dlls, such as RosMem.dll (a memory management library that can be used not only for RosAsm but for other purposes), a FastCRT dll, FastMath.dll, etc. Now I need to do the same for the encoder, etc., which is true hell; after working on it for a while I tend to get bored, and it takes me a long time to get back to it.

The problem is that I now need RosAsm to be as fixed as possible, especially because I plan to create some plugins for Sony Vegas, VirtualDub, Audacity, etc., and that's not an easy task with RosAsm in its current state. Not to mention that I was never able to implement a 64-bit version of it.


Anyway... here is the new version that works for qword (I hope the port to masm is ok this time)

; ---------------------------------------------------------------------------
ShiftTbl        struc
Distance        db ?
Shift           db ?
IsQword         db ?
Reserved        db ?
ShiftTbl        ends

ShiftTbl2       ShiftTbl <0, 0, 0, 0>
                ShiftTbl <0, 24, 0, 0>
                ShiftTbl 2 dup(<0, 16, 0, 0>)
                ShiftTbl 2 dup(<0, 8, 0, 0>)
                ShiftTbl 2 dup(<0, 0, 0, 0>)
                ShiftTbl 2 dup(<1, 24, 1, 0>)
                ShiftTbl 2 dup(<2, 16, 1, 0>)
                ShiftTbl 2 dup(<3, 8, 1, 0>)
                ShiftTbl 2 dup(<4, 0, 1, 0>)

MaskOddAdjust   oword 30h
Mask1           oword 30303030303030303030303030303030h
Mask2           oword 9090909090909090909090909090909h
Mask3           oword 7070707070707070707070707070707h
Mask4a          oword 0F000F000F000F000F000F000F000F00h


AsciiHex2dw_Ex5 proc near
TmpStorage1Dis  = dword ptr -18h
TmpStorage2Dis  = dword ptr -14h
TmpStorage3Dis  = dword ptr -10h
TmpStorage      = dword ptr -8
Lenght          = dword ptr -4
pString         = dword ptr  8
pOutput         = dword ptr  0Ch

                push    ebp
                mov     ebp, esp
                sub     esp, 4
                sub     esp, 14h
                mov     [ebp+TmpStorage], esp
                push    ecx
                push    edi
                push    esi
                mov     eax, [ebp+TmpStorage]
                mov     [ebp+TmpStorage1Dis], 0
                mov     [ebp+TmpStorage2Dis], 0
                mov     [ebp+TmpStorage3Dis], 0
                mov     eax, [ebp+pString]
                movdqu  xmm0, xmmword ptr [eax]
                xorps   xmm1, xmm1
                pcmpeqb xmm0, xmm1
                pmovmskb ecx, xmm0
                bsf     cx, cx
                jnz     short loc_42C2A4
                mov     ecx, 10h

loc_42C2A4:                             ; CODE XREF: AsciiHex2dw_Ex5+3D↑j
                mov     [ebp+Lenght], ecx
                movdqu  xmm0, xmmword ptr [eax]
                mov     edi, [ebp+TmpStorage]
                test    ecx, 1
                jz      short loc_42C2C7
                movdqu  xmmword ptr [edi+1], xmm0
                movdqu  xmm0, xmmword ptr [edi]
                por     xmm0, MaskOddAdjust

loc_42C2C7:                             ; CODE XREF: AsciiHex2dw_Ex5+54↑j
                psubb   xmm0, Mask1
                movdqa  xmm1, xmm0
                pcmpgtb xmm1, Mask2
                pand    xmm1, Mask3
                psubb   xmm0, xmm1
                movdqa  xmm1, xmm0
                pand    xmm1, Mask4a
                pxor    xmm0, xmm1
                psllw   xmm0, 4
                pslld   xmm0, 8
                por     xmm0, xmm1
                psrld   xmm0, 8
                packuswb xmm0, xmm0
                movdqu  xmmword ptr [edi], xmm0
                dec     ecx
                mov     ecx, dword ptr ShiftTbl2.Distance[ecx*4]
                movzx   eax, cl
                mov     eax, [eax+edi]
                bswap   eax
                mov     esi, [ebp+pOutput]
                test    ecx, 10000h
                jz      short loc_42C334
                mov     [esi+4], eax
                mov     eax, [edi]
                bswap   eax

loc_42C334:                             ; CODE XREF: AsciiHex2dw_Ex5+CB↑j
                movzx   ecx, ch
                shr     eax, cl
                mov     [esi], eax
                mov     eax, [ebp+Lenght]
                pop     esi
                pop     edi
                pop     ecx
                mov     esp, ebp
                pop     ebp
                retn    8
AsciiHex2dw_Ex5 endp




The RosAsm syntax is:


[ShiftTbl2:
ShiftTbl2.Data0: B$ 0, 24, 0, 0 ; Length = 1 ; shift must be 24, as for Length = 2, once the leading '0' is added (2nd byte was 28)
ShiftTbl2.Data1: B$ 0, 24, 0, 0 ; Length = 2 ; ok
ShiftTbl2.Data2: B$ 0, 16, 0, 0 ; Length = 3 ; OK. (2nd byte was 20)
ShiftTbl2.Data3: B$ 0, 16, 0, 0 ; Length = 4; ok
ShiftTbl2.Data4: B$ 0, 8, 0, 0 ; Length = 5 ; OK. (2nd byte was 12)
ShiftTbl2.Data5: B$ 0, 8, 0, 0 ; Length = 6 ; ok
ShiftTbl2.Data6: B$ 0, 0, 0, 0 ; Length = 7 ; OK. (2nd byte was 4)
ShiftTbl2.Data7: B$ 0, 0, 0, 0 ; Length = 8; OK

; Strings longer than 8 chars: Distance is the byte offset of the 2nd dword in TmpStorage; Shift (read into ch) is applied to the 1st dword
ShiftTbl2.Data8: B$ 1, 24, 1, 0 ; Length = 9 ; OK
ShiftTbl2.Data9: B$ 1, 24, 1, 0 ; Length = 10 ; OK
ShiftTbl2.Data10: B$ 2, 16, 1, 0 ; Length = 11 ; OK
ShiftTbl2.Data11: B$ 2, 16, 1, 0 ; Length = 12 ; OK
ShiftTbl2.Data12: B$ 3, 8, 1, 0 ; Length = 13 ; OK
ShiftTbl2.Data13: B$ 3, 8, 1, 0 ; Length = 14 ; OK
ShiftTbl2.Data14: B$ 4, 0, 1, 0 ; Length = 15 ; OK
ShiftTbl2.Data15: B$ 4, 0, 1, 0] ; Length = 16

; 1st byte = Distance, 2nd byte = Shift, 3rd byte = size flag: True if the string is longer than 8 chars, False otherwise

[<16 MaskOddAdjust: Q$ 030, 0, 0, 0]  ; '0'

[<16 Mask1: Q$ 030303030_30303030, 030303030_30303030]  ; '0'
[<16 Mask2: Q$ 09090909_09090909, 09090909_09090909]  ; '9'
[<16 Mask3: Q$ 07070707_07070707, 07070707_07070707]  ; '7'
[<16 Mask4a: Q$ 0F_00_0F_00_0F_00_0F_00, 0F_00_0F_00_0F_00_0F_00]  ; 0x0F_00_0F_00

[HEXCNV_LONG_STR 00__0000_0001__0000_0000__0000_0000]

Proc AsciiHex2dw_Ex5:
    Arguments @pString, @pOutput
    Local @Lenght
    Structure @TmpStorage 16, @TmpStorage1Dis 0, @TmpStorage2Dis 4, @TmpStorage3Dis 8
    Uses ecx, edi, esi


    mov eax D@TmpStorage | mov D@TmpStorage1Dis 0 | mov D@TmpStorage2Dis 0 | mov D@TmpStorage3Dis 0
    mov eax, D@pString
    movdqu xmm0, X$eax         ; Loads 16 bytes of the string into XMM0 (the trace below follows the low 8)
                               ; XMM0 = 0x37423534_41324630 ("0F2A45B7" in ASCII)
                               ; Words: 0000 0000 0000 0000 3742 3534 4132 4630

    ; get the size of the string to calculate an index into the shift table, used at the end
    xorps xmm1 xmm1 | pcmpeqb xmm0 xmm1 | pmovmskb ecx xmm0 | bsf cx cx | jnz L1> | mov ecx 16 | L1:
    mov D@Lenght ecx
;;
; qword
String: 876543210F2A45B7    - CH: 64 - CL:  4 - CX: 16388
String: 76543210F2A45B7     - CH: 4  - CL: 64 - CX: 1088
String: 6543210F2A45B7      - CH: 8  - CL: 60 - CX: 2108
String: 543210F2A45B7       - CH: 12 - CL: 56 - CX: 3128
String: 43210F2A45B7        - CH: 16 - CL: 52 - CX: 4148
String: 3210F2A45B7         - CH: 20 - CL: 48 - CX: 5168
String: 210F2A45B7          - CH: 24 - CL: 44 - CX: 6188
String: 10F2A45B7           - CH: 28 - CL: 40 - CX: 7208
;dword
String: 0F2A45B7            - CH: 32 - CL: 36 - CX: 8228
String: F2A45B7             - CH: 36 - CL: 32 - CX: 9248
String: 2A45B7              - CH: 40 - CL: 28 - CX: 10268
String: A45B7               - CH: 44 - CL: 24 - CX: 11288
String: 45B7                - CH: 48 - CL: 20 - CX: 12308
String: 5B7                 - CH: 52 - CL: 16 - CX: 13328
String: B7                  - CH: 56 - CL: 12 - CX: 14348
String: 7                   - CH: 60 - CL:  8 - CX: 15368


; qword
String: 876543210F2A45B7    - CH: 64    - CL:  0 - CX: 16384
String: 76543210F2A45B7     - CH: 4     - CL: 60 - CX: 1084
String: 6543210F2A45B7      - CH: 8     - CL: 56 - CX: 2104
String: 543210F2A45B7       - CH: 12    - CL: 52 - CX: 3124
String: 43210F2A45B7        - CH: 16    - CL: 48 - CX: 4144
String: 3210F2A45B7         - CH: 20    - CL: 44 - CX: 5164
String: 210F2A45B7          - CH: 24    - CL: 40 - CX: 6184
String: 10F2A45B7           - CH: 28    - CL: 36 - CX: 7204

;dword
String: 0F2A45B7            - CH: 32    - CL: 32 - CX: 8224
String: F2A45B7             - CH: 36    - CL: 28 - CX: 9244
String: 2A45B7              - CH: 40    - CL: 24 - CX: 10264
String: A45B7               - CH: 44    - CL: 20 - CX: 11284
String: 45B7                - CH: 48    - CL: 16 - CX: 12304
String: 5B7                 - CH: 52    - CL: 12 - CX: 13324
String: B7                  - CH: 56    - CL:  8 - CX: 14344
String: 7                   - CH: 60    - CL:  4 - CX: 15364



    Examples:
    0F2A45B7 = shr eax 0,  ax = 8 => 32-32 = 32-(8*4) = 0*4
     F2A45B7 = shr eax 4,  ax = 7 => 32-28 = 32-(7*4) = 1*4
      2A45B7 = shr eax 8,  ax = 6 => 32-24 = 32-(6*4) = 2*4
       A45B7 = shr eax 12, ax = 5 => 32-20 = 32-(5*4) = 3*4
        45B7 = shr eax 16, ax = 4 => 32-16 = 32-(4*4) = 4*4
         5B7 = shr eax 20, ax = 3 => 32-12 = 32-(3*4) = 5*4
          B7 = shr eax 24, ax = 2 => 32-8  = 32-(2*4) = 6*4
           7 = shr eax 28, ax = 1 => 32-4  = 32-(1*4) = 7*4
;;

    movdqu xmm0, X$eax
    mov edi D@TmpStorage
    Test_If ecx 00_0000_0001; Check if the length of the number is odd (1, 3, 5, ... 15) and adjust the input accordingly
        movdqu X$edi+1 xmm0 | movdqu xmm0 X$edi | por xmm0 X$MaskOddAdjust ; shift the string up one byte and OR a leading '0' into byte 0
    Test_End

    ; Subtract '0'
    psubb xmm0, X$Mask1         ; Subtracts 0x30 ('0') from each byte to convert ASCII to values
                                ; Mask1: 3030 3030 3030 3030 3030 3030 3030 3030
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    ; Adjust A-F
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for adjustment
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600

    pcmpgtb xmm1, X$Mask2       ; Compares each byte with '9' to identify A-F
                                ; Mask2: 0909 0909 0909 0909 0909 0909 0909 0909
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 00FF 0000 FF00 FF00 (FF where > 9)

    pand xmm1, X$Mask3          ; Applies a 7 correction to bytes > 9 (A-F)
                                ; Mask3: 0707 0707 0707 0707 0707 0707 0707 0707
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 0712 0504 1102 1600
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700 (7 is set in the bytes that were greater than 9)

    psubb xmm0, xmm1            ; Subtracts 7 from A-F bytes to adjust to hex range
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0007 0000 0700 0700

    ; Combine nibbles into bytes

    ; Separate and combine nibbles
    movdqa xmm1, xmm0           ; Copies XMM0 to XMM1 for nibble separation
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00

    pand xmm1, X$Mask4a         ; Isolates the odd-indexed bytes (the low nibble of each digit pair)
                                ; Mask4a: 0F00 0F00 0F00 0F00 0F00 0F00 0F00 0F00
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 070B 0504 0A02 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pxor xmm0 xmm1              ; Clears those bytes in XMM0, leaving the even-indexed bytes (the high nibble of each pair)
                                ; XMM0: D0D0 D0D0 D0D0 D0D0 000B 0004 0002 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    psllw xmm0 4                ; Shifts high nibbles 4 bits left
                                ; XMM0: 0D00 0D00 0D00 0D00 00B0 0040 0020 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    pslld xmm0 8                ; Shifts each dword 8 bits left (aligns high nibbles with the low-nibble bytes)
                                ; XMM0: 000D 0000 000D 0000 B000 4000 2000 0000
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

    por xmm0 xmm1               ; Combines high and low nibbles
                                ; XMM0: 000D 0000 000D 0000 B700 4500 2A00 0F00
                                ; XMM1: 0000 0000 0000 0000 0700 0500 0A00 0F00

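    psrld    xmm0, 8            ; Shifts each dword 8 bits right, leaving one combined byte in the low byte of each word
                                ; XMM0: 0000 0D00 0000 0D00 00B7 0045 002A 000F
    packuswb xmm0, xmm0         ; Packs the 8 words into 8 bytes (with unsigned saturation)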

    movdqu X$edi xmm0 ; save it to TmpStorage
    dec ecx ; our index to the shift table
    mov ecx D$ShiftTbl2+ecx*4

    ; 1st calculate distance (1st byte in ShiftTbl2)
    ; cl distance, ch = shift
    movzx eax cl
    mov eax D$edi+eax
    ; Reverse the byte order of the value, to be stored either in the 1st dword (if the string is 8 chars or fewer)
    ; or in the 2nd dword (if the string is longer than 8 chars)
    bswap eax
    mov esi D@pOutput
    ; Now check if the string is bigger than 8 bytes (3rd byte in ShiftTbl2)
    Test_If ecx HEXCNV_LONG_STR ;   010000
        ; store the 2nd (low) dword, then reload the first 4 packed bytes for the 1st dword
        mov D$esi+4 eax
        mov eax D$edi
        bswap eax
    Test_End
    ; Now calculate the shift (2nd byte in ShiftTbl2)
    movzx ecx ch
    shr eax cl
    ; store it in the 1st dword: the whole value if the string is 8 chars or fewer, otherwise the high part
    mov D$esi eax

    mov eax D@Lenght

EndP

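For checking results against the routine above, here is a minimal scalar sketch in Python of what AsciiHex2dw_Ex5 appears to compute. It is not the author's code. The name `ascii_hex_to_dwords` is hypothetical, bytes past the end of the string are treated as zero, only '0'-'9' and uppercase 'A'-'F' in strings of 1 to 16 characters are assumed, and Distance and Shift are derived from the length instead of being read from ShiftTbl2.

```python
def ascii_hex_to_dwords(text):
    """Scalar model: returns (length, first_dword, second_dword).

    second_dword is None for 8 digits or fewer, because the routine
    only writes the second output dword for longer strings.
    """
    digits = text[:16]                  # the routine reads at most 16 characters
    length = len(digits)
    if length % 2:                      # odd length: prepend an ASCII '0'
        digits = "0" + digits

    nibbles = []
    for ch in digits:
        v = ord(ch) - 0x30              # psubb Mask1: subtract '0'
        if v > 9:                       # pcmpgtb Mask2 / pand Mask3 / psubb
            v -= 7                      # 'A'..'F' become 10..15
        nibbles.append(v)

    # Combine each pair of nibbles into one byte, high nibble first.
    packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    packed = packed.ljust(8, b"\x00")   # assumption: unused bytes are zero

    pairs = (length + 1) // 2           # number of meaningful packed bytes
    if length > 8:
        distance = pairs - 4            # byte offset of the low dword
        shift = 32 - 8 * distance       # drops bytes that belong to the low dword
        second = int.from_bytes(packed[distance:distance + 4], "big")
        first = int.from_bytes(packed[0:4], "big") >> shift
        return length, first, second

    shift = 32 - 8 * pairs              # drops bytes past the end of the string
    first = int.from_bytes(packed[0:4], "big") >> shift
    return length, first, None
```

Under these assumptions, Distance is `pairs - 4` and Shift is `32 - 8 * Distance` for strings longer than 8 characters, which agrees with the Data8 to Data15 entries of ShiftTbl2.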

And here goes JJ's version. (I just renamed the function in his app to AsciiHex2Qword, but it is the same as this one.) It stores the output in a buffer pointed to by a parameter (Output) and returns the length of the input in eax.

Btw... JJ, can you fix the code in your app so that it shows the proper values on return? I couldn't find where inside the ".asc" file I could change it to pass the result stored in the parameter Output (a buffer containing 2 dwords).

The speed seems as fast as the previous version :)


Note:

Now it shows the proper result order, I suppose.

Ex:

[SzInputHex:  B$ "543210F2A45B7", 0 ]

[Output: D$ 0 #2] ; 8 bytes = 2 Dwords

    call AsciiHex2dw_Ex5 SzInputHex, Output

eax = 13 (the length of the input string)
Output = 1st dword = 054321, 2nd dword = 0F2A45B7
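
The result order shown here can be cross-checked with a plain integer conversion. This is a minimal sketch, assuming the convention of this example: strings longer than 8 digits put the high part in the 1st dword and the low 8 digits in the 2nd, while shorter strings use only the 1st dword. `expected_output` is a hypothetical helper, not part of the routine.

```python
def expected_output(text):
    """Expected (eax, 1st dword, 2nd dword) for 1 to 16 hex digits."""
    value = int(text, 16)
    if len(text) > 8:
        # high part in the 1st dword, low 8 digits in the 2nd dword
        return len(text), value >> 32, value & 0xFFFFFFFF
    # short strings: only the 1st dword is written
    return len(text), value, None


length, first, second = expected_output("543210F2A45B7")
print(length, hex(first), hex(second))   # 13 0x54321 0xf2a45b7
```

With the 14-digit string from earlier in the thread, "6543210F2A45B7", the same helper gives 0x654321 and 0x0F2A45B7.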

zedd151

I get an error for several instructions, such as

"movdqu  xmm0, qword ptr [eax]"
"movdqu  qword ptr [edi+1], xmm0"

guga2.asm(67) : error A2022:instruction operands must be the same size
You might need some additional help here. My knowledge of SSE is very, very limited. Practically zero.  :tongue:


ognil

Hi Guga,

Yesterday I saw your last version of algo. Congratulations! :thumbsup:

I want to ask a stupid question to everyone:
1. Which masochist would write such a long string as szTest db "6543210F2A45B7", 0, to get the same result, instead of leaving only the numbers and putting an "h" at the end?
Who, when, where and for what would practically use such a large QWORD number?
Please give an example! :badgrin:
"Not keeping emotions under control is another type of mental distortion."