News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Hexadecimal string to numeric function

Started by hyder, January 30, 2024, 08:09:18 AM

Previous topic - Next topic

hyder

A while back I posted a function that converts 64-bit numeric values to a string of hexadecimal digits. After much work, I've come up with an algorithm for the reverse operation: converting hexadecimal strings to a 64-bit numeric value. I originally wrote this code in 32-bit ARM assembly language for a book I'm working on, but I ported the code to x86-64 just for fun. It processes 1.5 billion strings (of various lengths) in about 10 seconds on my iMac Pro running Windows under Parallels (YMMV).

This code will skip over any leading delimiter characters (space, comma, colon, semicolon, line feed/newline, carriage return, and tab) and the process all the hexadecimal digits that follow (up to the next delimiter character or the end of the string).

It uses a 256-entry lookup table to classify each input character (and convert hexadecimal digits to the values 0 to F). The other main trick is that I (largely) unrolled the conversion loop and didn't bother checking for overflow on the first 16 input characters (because overflow cannot occur until at least 17 input characters).

Enjoy,
Randy Hyde

hyder

One additional comment: this code does not follow the Intel ABI; it was written for assembly coders. As such, I preserve all registers I modify on the call (except for RAX, where it returns the result, and ESI, which is the string pointer and is left pointing at the end of the processed string). There is a small facade function that preserves RSI (leaving it pointing at the start of the string) if you prefer that.

Note that if you can live with the confines of the Intel ABI and don't mind preserving volatile registers on call to this function, it can be sped up a little bit by changing all the registers it uses to volatile registers and removing the pushes and pops.

Note that you will need Visual C, MASM, and make to build the code as I've written it. However, you can easily lift the bufToH64 function out of the code and include it in any other pure assembly source file you write.
Cheers,
Randy Hyde

NoCforMe

Interesting. After reading this, I decided to roll my own version, attached. I wonder if you'd care to take a look at it; I have to say I'm not sure your version is any better, what with all those macros and everything.

Mine is just a straightforward, dumbass iterative routine. (I did, however, use your lookup table, although mine's simpler: it just separates valid hex chars. from non-valid ones.) My function looks for a valid hex string embedded within any other characters and converts up to 8 hex digits (32-bit result; obviously this could be expanded to 64 bits easily). The string must be NULL-terminated. It returns an error if there are no valid hex digits found or if there are too many (>8).

If you like maybe you could "race" this against your routine, although I'm not sure that really proves anything: who really cares how fast a text-to-binary routine is? At least for any applications I can think of.

Here's my conversion routine:
$hexChar EQU 0
$nonHexChar EQU 1
$maxHexDigits EQU 8

XlatTable LABEL BYTE
; Chars. below '0':
DB 48 DUP ($nonHexChar)
; '0'-'9':
DB 10 DUP ($hexChar)
; Chars. up to 'A':
DB 7 DUP ($nonHexChar)
; 'A' - 'F':
DB 6 DUP ($hexChar)
; Chars. up to 'a':
DB 26 DUP ($nonHexChar)
; 'a' - 'f':
DB 6 DUP ($hexChar)
; Remaining chars:
DB 25 + 128 DUP ($nonHexChar)

;===========================================================
; String2Hex()
;
; Converts a string which may or may not contain a sequence
; of valid hex characters (0-9/A-F/a-f) to binary.
;
; Starts @ 1st valid hex char., stops at 1st non-hex char.
; (the hex string can be embedded in any non-hex chars.)
; Scans off leading zeroes.
;
; On entry,
; EAX--> string (NULL-terminated)
;
; Returns:
; EAX = binary value of hex string
; EDX: result of conversion:
;   0: a valid result returned
;   1: no valid hex chars. found in string
;   2: overflow (too many hex chars.)
;============================================================

String2Hex PROC

PUSH EBX
PUSH ESI
MOV ESI, EAX
MOV EBX, OFFSET XlatTable

XOR DL, DL ;Clear digit count.
XOR ECX, ECX ;Clear "accumulator".

; Scan off any leading non-hex chars., if any:
prelp: LODSB
MOV DH, AL ;Save character in case it's valid.
TEST AL, AL ;Look for NULL terminator.
JZ endstr
CMP AL, '0' ;Leading zero?
JE prelp ;  Yep, get rid of it!
XLATB
CMP AL, $hexChar
JE valid
JMP prelp ;Continue scanning leading chars.

cnvlp: LODSB
MOV DH, AL
TEST AL, AL
JZ endstr
XLATB ;See what kind of char.
CMP AL, $hexChar
JNE endstr

; We've seen a valid char.--how many hex digits so far?
valid: INC DL
CMP DL, $maxHexDigits
JA excess
MOVZX EAX, DH ;Place char. in EAX.

; Valid hex char. seen--convert it:
CMP AL, '9' ;Numeric digit?
JBE isnum
CMP AL, 'F' ;Uppercase hex digit?
JBE noupr

; Char. is 'a'-'f', so uppercase it:
AND AL, 5Fh
noupr: SUB AL, 'A' - 10

shift: SHL ECX, 4 ;Shift 1 nybble over.
OR CL, AL ;Lay nybble into accumulator.
JMP cnvlp

; Valid numeric digit seen--convert it:
isnum: SUB AL, '0'
JMP shift ;Put this nybble into total.

; End of string seen--return result:
endstr: TEST DL, DL ;Did we convert ANY chars.?
JZ none ;  Nope, return error.
XOR EDX, EDX ;  Yep, indicate success
MOV EAX, ECX ;  Return accumulator.
exit99: POP ESI
POP EBX
RET

none: MOV EDX, 1
JMP exit99
excess: MOV EDX, 2
JMP exit99

String2Hex ENDP
Assembly language programming should be fun. That's why I do it.

NoCforMe

Erg; that routine name should probly be HexString2Bin() ...
Assembly language programming should be fun. That's why I do it.

sinsi

If you are so set on using such a huge table, why not like this:
XlatTable LABEL BYTE
; Chars. below '0':
DB 48 DUP (-1)
; '0'-'9':
DB 0,1,2,3,4,5,6,7,8,9
; Chars. up to 'A':
DB 7 DUP (-1)
; 'A' - 'F':
DB 10,11,12,13,14,15
; Chars. up to 'a':
DB 26 DUP (-1)
; 'a' - 'f':
DB 10,11,12,13,14,15
; Remaining chars:
DB 25 + 128 DUP (-1)

    ...
    lodsb
    xlatb
    cmp al,-1
    jz invalid
    ...

🍺🍺🍺

NoCforMe

Thanks, sinsi. I was just about to post my revised code. I realized just what you pointed out--how stupid of me not to take advantage of that table to return the value of the digit!.

Rather than edit my post to make it seem like everything was OK in the first place, I'm reposting with my tail between my legs ...

$nonHexChar EQU 0FFh
$maxHexDigits EQU 8

XlatTable LABEL BYTE
; Chars. below '0':
DB 48 DUP ($nonHexChar)
; '0'-'9':
DB 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
; Chars. up to 'A':
DB 7 DUP ($nonHexChar)
; 'A' - 'F':
DB 10, 11, 12, 13, 14, 15
; Chars. up to 'a':
DB 26 DUP ($nonHexChar)
; 'a' - 'f':
DB 10, 11, 12, 13, 14, 15
; Remaining chars:
DB 25 + 128 DUP ($nonHexChar)

;===========================================================
; HexString2Bin()
;
; Converts a string which may or may not contain a sequence
; of valid hex characters (0-9/A-F/a-f) to binary.
;
; Starts @ 1st valid hex char., stops at 1st non-hex char.
; (the hex string can be embedded in any non-hex chars.)
; Scans off leading zeroes.
;
; On entry,
; EAX--> string (NULL-terminated)
;
; Returns:
; EAX = binary value of hex string
; EDX: result of conversion:
;   0: a valid result returned
;   1: no valid hex chars. found in string
;   2: overflow (too many hex chars.)
;============================================================

HexString2Bin PROC

PUSH EBX
PUSH ESI
MOV ESI, EAX
MOV EBX, OFFSET XlatTable

XOR DL, DL ;Clear digit count.
XOR ECX, ECX ;Clear "accumulator".

; Scan off any leading non-hex chars., if any:
prelp: LODSB
TEST AL, AL ;Look for NULL terminator.
JZ endstr
CMP AL, '0' ;Leading zero?
JE prelp ;  Yep, get rid of it!
XLATB
CMP AL, $nonHexChar
JNE valid
JMP prelp ;Continue scanning leading chars.

cnvlp: LODSB
TEST AL, AL
JZ endstr
XLATB ;Get the hex char's value.
CMP AL, $nonHexChar
JE endstr

; We've got a valid char.--how many digits so far?
valid: INC DL
CMP DL, $maxHexDigits
JA excess

; Valid hex char. seen--stuff it:
stuff: SHL ECX, 4 ;Shift 1 nybble over.
OR CL, AL ;Lay nybble into accumulator.
JMP cnvlp

; End of string seen--return result:
endstr: TEST DL, DL ;Did we convert ANY chars.?
JZ none ;  Nope, return error.
XOR EDX, EDX ;  Yep, indicate success
MOV EAX, ECX ;  Return accumulator.
exit99: POP ESI
POP EBX
RET

none: MOV EDX, 1
JMP exit99
excess: MOV EDX, 2
JMP exit99

HexString2Bin ENDP
Assembly language programming should be fun. That's why I do it.

sinsi

You could filter out < '0' and > 'f' with a couple of compares and cut your table down by more than half.
Memory accesses are pretty slow, you could also put it in the code section with the proc to use the cache.
🍺🍺🍺

NoCforMe

By memory accesses you mean XLAT, correct? That instruction seems to execute in 4 cycles, and it would take longer to load EAX and do something like MOV EAX, [EBX+EAX], wouldn't it?
Assembly language programming should be fun. That's why I do it.

sinsi

Probably with today's processors MOV is optimised and XLAT is considered legacy.
You can't really trust cycle counts with newer CPUs either.
🍺🍺🍺

daydreamer

Quote from: sinsi on January 31, 2024, 01:17:26 PMYou could filter out < '0' and > 'f' with a couple of compares and cut your table down by more than half.
Memory accesses are pretty slow, you could also put it in the code section with the proc to use the cache.

Wouldn't it be better subtract if between "0" -"9" to be between 0-9
If between "a" and "f" subtract so it's between 10-15?
Ofcourse remove bit that says if it is uppercase or not
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

NoCforMe

Check my revised code; none of that is needed. The table delivers the exact (binary) value of the hex character seen with no conversion necessary. That was sinsi's point.
Assembly language programming should be fun. That's why I do it.

daydreamer

I am working on a SIMD version,but maybe part of it needs to be scalar

Hyder if its faster without preserve registers,wouldnt it be candidate for put in a workerthread ?
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

InfiniteLoop

My best effort for now.
 It converts 2x 16 byte strings from [rdx] and outputs 2x unsigned long longs into [rcx].
HexToInt proc ;"(0)(x)"ff ff ff ff ff ff ff ff"(H)" = 16bytes
;'0' = 48,'9' = 57,'a' = 65, 'F' = 70 'f' = 102 => 0000,1001,1010,1111,1111
vmovdqu ymm0,ymmword ptr [rdx]
vpxor xmm1,xmm1,xmm1
vgf2p8affineqb ymm2,ymm1,ymm1,'9'
vgf2p8affineqb ymm3,ymm1,ymm1,'F'
vgf2p8affineqb ymm4,ymm1,ymm1,48
vgf2p8affineqb ymm5,ymm1,ymm1,7
vgf2p8affineqb ymm1,ymm1,ymm1,32
vpcmpgtb k1,ymm0,ymm2 ;> '9'
vpcmpgtb k2,ymm0,ymm3 ;> 'F'
vpaddb ymm4{k1},ymm4,ymm5
vpaddb ymm4{k2},ymm4,ymm1
vpsubb ymm0,ymm0,ymm4 ;00001111 00002222 00003333 00004444 00007777 00008888 ..00009999 0000aaaa 0000bbbb
vpsrlq ymm1,ymm0,4 ;22220000 33330000 44440000 55550000 88880000 00000000 ..aaaa0000 xxxxxxxx cccc0000
vpor ymm0,ymm0,ymm1 ;22221111 xxxxxxxx 44443333 xxxxxxxx 88887777 xxxxxxxx
vmovdqu ymm3,ymmword ptr [PermB_Hex]
vpbroadcastq xmm4,qword ptr [G4_Reverse]
vpermb ymm0,ymm3,ymm0 ;22221111 44443333...
vgf2p8affineqb xmm0,xmm0,xmm4,0
vmovdqu xmmword ptr [rcx],xmm0
vzeroupper
ret
PermB_Hex BYTE 14,12,10,8,6,4,2,0,30,28,26,24,22,20,18,16,16 DUP (-1)
G4_Reverse QWORD 1020408001020408H
HexToInt endp

hyder

Quote from: sinsi on January 31, 2024, 11:46:42 AMIf you are so set on using such a huge table, why not like this:
...

I basically wrote a C program to generate the table, not something I did by hand :)