Fast hexstring to binary conversion

Started by jj2007, February 11, 2024, 06:33:52 AM


jj2007

May I have some timings, please?

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

20314   cycles for 100 * MasmBasic Val
39045   cycles for 100 * CRT a2ud
2536    cycles for 100 * Masm32 SDK hex2bin
2686    cycles for 100 * HexStr2Bin
2092    cycles for 100 * HexStr2BinT (with table)
9181    cycles for 100 * HexStr2X (64-bit)

19732   cycles for 100 * MasmBasic Val
40132   cycles for 100 * CRT a2ud
3281    cycles for 100 * Masm32 SDK hex2bin
3952    cycles for 100 * HexStr2Bin
2231    cycles for 100 * HexStr2BinT (with table)
8839    cycles for 100 * HexStr2X (64-bit)

19825   cycles for 100 * MasmBasic Val
39673   cycles for 100 * CRT a2ud
3598    cycles for 100 * Masm32 SDK hex2bin
2747    cycles for 100 * HexStr2Bin
2054    cycles for 100 * HexStr2BinT (with table)
8678    cycles for 100 * HexStr2X (64-bit)

20010   cycles for 100 * MasmBasic Val
39973   cycles for 100 * CRT a2ud
3242    cycles for 100 * Masm32 SDK hex2bin
3565    cycles for 100 * HexStr2Bin
2150    cycles for 100 * HexStr2BinT (with table)
9123    cycles for 100 * HexStr2X (64-bit)

19712   cycles for 100 * MasmBasic Val
39055   cycles for 100 * CRT a2ud
2539    cycles for 100 * Masm32 SDK hex2bin
3479    cycles for 100 * HexStr2Bin
2385    cycles for 100 * HexStr2BinT (with table)
9158    cycles for 100 * HexStr2X (64-bit)

3       bytes for MasmBasic Val
19      bytes for CRT a2ud
12      bytes for Masm32 SDK hex2bin
48      bytes for HexStr2Bin
76      bytes for HexStr2BinT (with table)
104     bytes for HexStr2X (64-bit)

12ABCDEFh       eax MasmBasic Val
12ABCDEFh       eax CRT a2ud
12ABCDEFh       eax Masm32 SDK hex2bin
12ABCDEFh       eax HexStr2Bin
12ABCDEFh       eax HexStr2BinT (with table)
56789DEFh       eax HexStr2X (64-bit)

Remarks:
- The string used is 12AbCdEfh
- MasmBasic Val is slower because it's an allrounder; you can throw $123, 456h, 12345, 010010101y at it, and you'll always get the correct result. Therefore it's only twice as fast as crt_sscanf :biggrin:
- The last one, HexStr2X, uses the string 1234AbcD56789Defh; however, it returns the result in xmm0, therefore (for technical reasons) eax shows only the second half, 56789DEFh
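For readers without the source at hand, a table-driven conversion in the spirit of HexStr2BinT (with table) might look like the C sketch below. The names hex_table, hex_table_init and hexstr_to_u32_table are made up for illustration and are not taken from the attached assembly source. The sketch assumes at most eight hex digits and simply stops at the first non-hex character, such as the trailing h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table: digit value 0..15, or -1 for "not a hex digit". */
static int8_t hex_table[256];
static int hex_table_ready = 0;

static void hex_table_init(void)
{
    for (int i = 0; i < 256; i++)
        hex_table[i] = -1;
    for (int i = 0; i < 10; i++)
        hex_table['0' + i] = (int8_t)i;
    for (int i = 0; i < 6; i++) {
        hex_table['A' + i] = (int8_t)(10 + i);   /* upper case */
        hex_table['a' + i] = (int8_t)(10 + i);   /* lower case */
    }
    hex_table_ready = 1;
}

/* Converts up to eight hex digits; extra leading digits are shifted out. */
uint32_t hexstr_to_u32_table(const char *s)
{
    if (!hex_table_ready)
        hex_table_init();

    uint32_t value = 0;
    for (; *s != '\0'; s++) {
        int8_t d = hex_table[(unsigned char)*s];
        if (d < 0)
            break;                    /* e.g. the 'h' suffix ends the number */
        value = (value << 4) | (uint32_t)d;
    }
    return value;
}
```

The table replaces the compare-and-branch chain per character with a single load, which is one plausible reason a table variant can come out ahead in timings like the ones above.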

NoCforMe

Two things:
  • Is my HexString2Bin() in there somewhere?
  • Again: who really cares how fast this is? It's hard to imagine a use for this where picayune differences in speed really matter to anyone.
Assembly language programming should be fun. That's why I do it.

fearless

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

15921  cycles for 100 * MasmBasic Val
26961  cycles for 100 * CRT a2ud
2126    cycles for 100 * Masm32 SDK hex2bin
2209    cycles for 100 * HexStr2Bin
1845    cycles for 100 * HexStr2BinT (with table)
6857    cycles for 100 * HexStr2X (64-bit)

16110  cycles for 100 * MasmBasic Val
26237  cycles for 100 * CRT a2ud
2205    cycles for 100 * Masm32 SDK hex2bin
2314    cycles for 100 * HexStr2Bin
1763    cycles for 100 * HexStr2BinT (with table)
6459    cycles for 100 * HexStr2X (64-bit)

15842  cycles for 100 * MasmBasic Val
26733  cycles for 100 * CRT a2ud
2039    cycles for 100 * Masm32 SDK hex2bin
2167    cycles for 100 * HexStr2Bin
1740    cycles for 100 * HexStr2BinT (with table)
6541    cycles for 100 * HexStr2X (64-bit)

15816  cycles for 100 * MasmBasic Val
26510  cycles for 100 * CRT a2ud
2085    cycles for 100 * Masm32 SDK hex2bin
2287    cycles for 100 * HexStr2Bin
1792    cycles for 100 * HexStr2BinT (with table)
6441    cycles for 100 * HexStr2X (64-bit)

15748  cycles for 100 * MasmBasic Val
26419  cycles for 100 * CRT a2ud
2087    cycles for 100 * Masm32 SDK hex2bin
2131    cycles for 100 * HexStr2Bin
1665    cycles for 100 * HexStr2BinT (with table)
6532    cycles for 100 * HexStr2X (64-bit)

3      bytes for MasmBasic Val
19      bytes for CRT a2ud
12      bytes for Masm32 SDK hex2bin
48      bytes for HexStr2Bin
76      bytes for HexStr2BinT (with table)
104    bytes for HexStr2X (64-bit)

12ABCDEFh      eax MasmBasic Val
12ABCDEFh      eax CRT a2ud
12ABCDEFh      eax Masm32 SDK hex2bin
12ABCDEFh      eax HexStr2Bin
12ABCDEFh      eax HexStr2BinT (with table)
56789DEFh      eax HexStr2X (64-bit)

--- ok ---

HSE

Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz (SSE4)

21786   cycles for 100 * MasmBasic Val
38609   cycles for 100 * CRT a2ud
2921    cycles for 100 * Masm32 SDK hex2bin
2688    cycles for 100 * HexStr2Bin
1686    cycles for 100 * HexStr2BinT (with table)
6975    cycles for 100 * HexStr2X (64-bit)

19965   cycles for 100 * MasmBasic Val
38482   cycles for 100 * CRT a2ud
2960    cycles for 100 * Masm32 SDK hex2bin
3349    cycles for 100 * HexStr2Bin
1829    cycles for 100 * HexStr2BinT (with table)
6983    cycles for 100 * HexStr2X (64-bit)

19990   cycles for 100 * MasmBasic Val
38434   cycles for 100 * CRT a2ud
2973    cycles for 100 * Masm32 SDK hex2bin
3279    cycles for 100 * HexStr2Bin
1725    cycles for 100 * HexStr2BinT (with table)
6875    cycles for 100 * HexStr2X (64-bit)

19922   cycles for 100 * MasmBasic Val
38457   cycles for 100 * CRT a2ud
3042    cycles for 100 * Masm32 SDK hex2bin
2749    cycles for 100 * HexStr2Bin
1725    cycles for 100 * HexStr2BinT (with table)
6979    cycles for 100 * HexStr2X (64-bit)

19995   cycles for 100 * MasmBasic Val
38463   cycles for 100 * CRT a2ud
7092    cycles for 100 * Masm32 SDK hex2bin
4304    cycles for 100 * HexStr2Bin
5392    cycles for 100 * HexStr2BinT (with table)
14712   cycles for 100 * HexStr2X (64-bit)

3       bytes for MasmBasic Val
19      bytes for CRT a2ud
12      bytes for Masm32 SDK hex2bin
48      bytes for HexStr2Bin
76      bytes for HexStr2BinT (with table)
104     bytes for HexStr2X (64-bit)

12ABCDEFh       eax MasmBasic Val
12ABCDEFh       eax CRT a2ud
12ABCDEFh       eax Masm32 SDK hex2bin
12ABCDEFh       eax HexStr2Bin
12ABCDEFh       eax HexStr2BinT (with table)
56789DEFh       eax HexStr2X (64-bit)

--- ok ---

jj2007

Quote from: NoCforMe on February 11, 2024, 07:20:27 AM
  • Is my HexString2Bin() in there somewhere?

I don't think so, but check for yourself. Search the source for endp.

Quote
  • Again: who really cares how fast this is?

The OP?

@fearless & Héctor: Thanks :thup:

NoCforMe

Quote from: jj2007 on February 11, 2024, 08:11:41 AM
Quote from: NoCforMe on February 11, 2024, 07:20:27 AM
  • Again: who really cares how fast this is?

The OP?

Speaking of which:

Quote from: hyder on January 30, 2024, 08:09:18 AM
A while back I posted a function that converts 64-bit numeric values to a string of hexadecimal digits. After much work, I've come up with an algorithm for the reverse operation: converting hexadecimal strings to a 64-bit numeric value.

So Mr. Hyde, do you have anything to say about what's evolved here from your post?

jj2007

Version 2 beats the CRT by a factor of 17:
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

19526   cycles for 100 * MasmBasic Val
39707   cycles for 100 * CRT a2ud
3668    cycles for 100 * Masm32 SDK hex2bin
3695    cycles for 100 * HexStr2Bin
2588    cycles for 100 * HexStr2BinT (with table)
2139    cycles for 100 * HexStr2X (64-bit)

20294   cycles for 100 * MasmBasic Val
39182   cycles for 100 * CRT a2ud
3594    cycles for 100 * Masm32 SDK hex2bin
3672    cycles for 100 * HexStr2Bin
2596    cycles for 100 * HexStr2BinT (with table)
2207    cycles for 100 * HexStr2X (64-bit)

19826   cycles for 100 * MasmBasic Val
39041   cycles for 100 * CRT a2ud
2552    cycles for 100 * Masm32 SDK hex2bin
3765    cycles for 100 * HexStr2Bin
2543    cycles for 100 * HexStr2BinT (with table)
2186    cycles for 100 * HexStr2X (64-bit)

20136   cycles for 100 * MasmBasic Val
39231   cycles for 100 * CRT a2ud
3732    cycles for 100 * Masm32 SDK hex2bin
3786    cycles for 100 * HexStr2Bin
2509    cycles for 100 * HexStr2BinT (with table)
2237    cycles for 100 * HexStr2X (64-bit)

20169   cycles for 100 * MasmBasic Val
39727   cycles for 100 * CRT a2ud
5039    cycles for 100 * Masm32 SDK hex2bin
2817    cycles for 100 * HexStr2Bin
2662    cycles for 100 * HexStr2BinT (with table)
2386    cycles for 100 * HexStr2X (64-bit)

3       bytes for MasmBasic Val
19      bytes for CRT a2ud
12      bytes for Masm32 SDK hex2bin
48      bytes for HexStr2Bin
76      bytes for HexStr2BinT (with table)
128     bytes for HexStr2X (64-bit)

12ABCDEFh       eax MasmBasic Val
12ABCDEFh       eax CRT a2ud
12ABCDEFh       eax Masm32 SDK hex2bin
12ABCDEFh       eax HexStr2Bin
12ABCDEFh       eax HexStr2BinT (with table)
12ABCDEFh       eax HexStr2X (64-bit)

For a better comparison, the same string is used for all algos. Note that the 64-bit HexStr2X is now the fastest, thanks to some SSE 4.1 acrobatics - sorry for our friends with legacy CPUs :cool:

Note also that the Masm32 SDK hex2bin has a strange little bug with odd-sized strings.

sinsi

Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (SSE4)

15415   cycles for 100 * MasmBasic Val
27670   cycles for 100 * CRT a2ud
2401    cycles for 100 * Masm32 SDK hex2bin
2577    cycles for 100 * HexStr2Bin
1515    cycles for 100 * HexStr2BinT (with table)
1991    cycles for 100 * HexStr2X (64-bit)

15419   cycles for 100 * MasmBasic Val
27507   cycles for 100 * CRT a2ud
2387    cycles for 100 * Masm32 SDK hex2bin
2578    cycles for 100 * HexStr2Bin
1538    cycles for 100 * HexStr2BinT (with table)
1776    cycles for 100 * HexStr2X (64-bit)

15464   cycles for 100 * MasmBasic Val
27932   cycles for 100 * CRT a2ud
2384    cycles for 100 * Masm32 SDK hex2bin
2589    cycles for 100 * HexStr2Bin
1611    cycles for 100 * HexStr2BinT (with table)
1776    cycles for 100 * HexStr2X (64-bit)

15432   cycles for 100 * MasmBasic Val
27581   cycles for 100 * CRT a2ud
2393    cycles for 100 * Masm32 SDK hex2bin
2529    cycles for 100 * HexStr2Bin
1555    cycles for 100 * HexStr2BinT (with table)
1728    cycles for 100 * HexStr2X (64-bit)

15411   cycles for 100 * MasmBasic Val
27860   cycles for 100 * CRT a2ud
2441    cycles for 100 * Masm32 SDK hex2bin
2613    cycles for 100 * HexStr2Bin
1654    cycles for 100 * HexStr2BinT (with table)
1735    cycles for 100 * HexStr2X (64-bit)

3       bytes for MasmBasic Val
19      bytes for CRT a2ud
12      bytes for Masm32 SDK hex2bin
48      bytes for HexStr2Bin
76      bytes for HexStr2BinT (with table)
128     bytes for HexStr2X (64-bit)

12ABCDEFh       eax MasmBasic Val
12ABCDEFh       eax CRT a2ud
12ABCDEFh       eax Masm32 SDK hex2bin
12ABCDEFh       eax HexStr2Bin
12ABCDEFh       eax HexStr2BinT (with table)
12ABCDEFh       eax HexStr2X (64-bit)

--- ok ---
The first version ran OK; this second one tripped Windows Defender.

daydreamer

Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)

26332   cycles for 100 * MasmBasic Val
49023   cycles for 100 * CRT a2ud
3193    cycles for 100 * Masm32 SDK hex2bin
6746    cycles for 100 * HexStr2Bin
3455    cycles for 100 * HexStr2BinT (with table)
10090   cycles for 100 * HexStr2X (64-bit)

25977   cycles for 100 * MasmBasic Val
42327   cycles for 100 * CRT a2ud
3540    cycles for 100 * Masm32 SDK hex2bin
3117    cycles for 100 * HexStr2Bin
1996    cycles for 100 * HexStr2BinT (with table)
7329    cycles for 100 * HexStr2X (64-bit)

29533   cycles for 100 * MasmBasic Val
43779   cycles for 100 * CRT a2ud
4857    cycles for 100 * Masm32 SDK hex2bin
5937    cycles for 100 * HexStr2Bin
3429    cycles for 100 * HexStr2BinT (with table)
7939    cycles for 100 * HexStr2X (64-bit)

41624   cycles for 100 * MasmBasic Val
50731   cycles for 100 * CRT a2ud
3926    cycles for 100 * Masm32 SDK hex2bin
6351    cycles for 100 * HexStr2Bin
2003    cycles for 100 * HexStr2BinT (with table)
7744    cycles for 100 * HexStr2X (64-bit)

36712   cycles for 100 * MasmBasic Val
42581   cycles for 100 * CRT a2ud
3201    cycles for 100 * Masm32 SDK hex2bin
6077    cycles for 100 * HexStr2Bin
2002    cycles for 100 * HexStr2BinT (with table)
7664    cycles for 100 * HexStr2X (64-bit)

3       bytes for MasmBasic Val
19      bytes for CRT a2ud
12      bytes for Masm32 SDK hex2bin
48      bytes for HexStr2Bin
76      bytes for HexStr2BinT (with table)
104     bytes for HexStr2X (64-bit)

12ABCDEFh       eax MasmBasic Val
12ABCDEFh       eax CRT a2ud
12ABCDEFh       eax Masm32 SDK hex2bin
12ABCDEFh       eax HexStr2Bin
12ABCDEFh       eax HexStr2BinT (with table)
56789DEFh       eax HexStr2X (64-bit)

I'm also working on an SSE2 packed conversion, written while commuting on the train; now that I'm at home, I need to run it through the debugger and fix it.


jj2007

Quote from: sinsi on February 11, 2024, 12:50:53 PM
this second one tripped Windows Defender

The OS must be defended against exotic modern stuff like psllq, pextrb and pinsrb :thumbsup: 

Quote from: daydreamer on February 11, 2024, 06:20:52 PM
working on an SSE2 packed conversion

What kind of conversion?

hyder

Quote from: NoCforMe on February 11, 2024, 08:52:15 AM
So Mr. Hyde, do you have anything to say about what's evolved here from your post?

FWIW, I'm currently working on 32-bit ARM code for Volume 2 of "The Art of ARM Assembly" and I will occasionally post an x86 conversion here to see if I can get some ideas for improving the ARM code. Most of the crazy SSE/AVX stuff won't translate well, but the generic x86 code is useful to look at. I would have loved to find an SSE/AVX algorithm that processes multiple characters at a time, but nothing like that appears here (that I could see, anyway).
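As a rough illustration of what "multiple characters at a time" can mean even without SSE/AVX, here is a SWAR (SIMD within a register) sketch in portable C that converts exactly eight hex characters using 64-bit integer operations. It is not one of the routines timed in this thread; the name hex8_swar is hypothetical, and the sketch assumes the input is exactly eight valid hex digits (no validation, no underscores, no suffix).

```c
#include <stdint.h>

/* Converts exactly 8 valid hex characters; no error checking (assumption). */
uint32_t hex8_swar(const char *s)
{
    /* Gather the 8 characters into one 64-bit word, first character in the
       lowest byte. Built byte by byte so the result does not depend on the
       host's endianness. */
    uint64_t x = 0;
    for (int i = 0; i < 8; i++)
        x |= (uint64_t)(unsigned char)s[i] << (8 * i);

    /* Per byte: value = (c & 0x0F) + 9 * (bit 6 of c).
       '0'..'9' have bit 6 clear; 'A'..'F' and 'a'..'f' have it set. */
    uint64_t low    = x & 0x0F0F0F0F0F0F0F0FULL;
    uint64_t letter = (x >> 6) & 0x0101010101010101ULL;
    uint64_t nib    = low + letter * 9;          /* each byte now holds 0..15 */

    /* Merge neighbours: 8 nibbles -> 4 bytes -> 2 words -> 1 dword. */
    uint64_t t = ((nib << 4) | (nib >> 8)) & 0x00FF00FF00FF00FFULL;
    t = ((t << 8)  | (t >> 16)) & 0x0000FFFF0000FFFFULL;
    t = ((t << 16) | (t >> 32)) & 0x00000000FFFFFFFFULL;
    return (uint32_t)t;
}
```

The same three ideas (mask the low nibble, add 9 for letters, then merge neighbouring lanes) carry over to wider registers, which is why this shape tends to map onto packed instructions on either x86 or ARM.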

There are considerable apples-to-oranges comparisons going on here (for example, none of the routines I've seen handle underscores in the input, so comparing my function against those is not a good comparison; likewise, MasmBasic Val does so much more that it is also an unfair comparison). Running the tests on a single input string is dangerous, to say the least. That's why I used a large number of strings as inputs to my function, chosen so that they tended to hit some boundary conditions. Of course, in the real world, most input strings are going to be relatively short (probably four digits or less), so choosing a large number of longer strings (or a long string as your only input) can be misleading for certain algorithms.
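To make the point about underscores and boundary conditions concrete, here is a minimal C sketch of a 64-bit hex parser that skips underscores and rejects bad input. It is not the function from my original post; the name parse_hex_u64 and the exact error rules (empty input, invalid character, or more than 64 significant bits all fail) are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parser: hex digits with optional '_' separators.
   Returns false on empty input, an invalid character, or 64-bit overflow. */
bool parse_hex_u64(const char *s, uint64_t *out)
{
    uint64_t value = 0;
    int digits = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        unsigned d;

        if (c == '_')
            continue;                         /* separator, carries no value */
        if (c >= '0' && c <= '9')
            d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            d = (unsigned)(c - 'A') + 10;
        else
            return false;                     /* not a hex digit */

        if (value >> 60)
            return false;                     /* top nibble in use: overflow */
        value = (value << 4) | d;
        digits++;
    }

    if (digits == 0)
        return false;                         /* "" or only underscores */
    *out = value;
    return true;
}
```

A test set along these lines would include short inputs ("0", "F"), the full 16-digit maximum, a 17-digit overflow, underscored input such as "1234_AbcD_5678_9Def", and malformed strings, rather than one fixed string.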

Also, I rarely drop down into optimizations involving instruction scheduling or code alignment. Such code executes well on *one* CPU, not as well on other CPUs (in the same CPU family). Modern compilers (with command-line switches) do a much better job of this kind of optimization these days. I'm not saying that a human couldn't beat a compiler if they really tried; I'm just saying that a human probably wouldn't redo the code for every CPU possibility, whereas the C programmer can just change a command-line option and get better code for a different CPU. FWIW, I back ported my ARM code to (very unstructured) C code and the compiler generated code almost identical to my hand-written code. I was then able to generate Cortex-A72 (Pi 3), -A74 (Pi 4), -A76 (Pi 5), Cortex-M7F (Teensy 4.1), Cortex-M4 (Teensy 3.2), and Cortex-M0+ (Pico) code just by changing a command-line option. Except for the Cortex-M0+ (a brain-dead instruction set), the resulting code was quite good (this was all 32-bit code, btw).


And to answer the question about "why would anyone care about the speed?"
If you're writing library code for others to call, it should be optimized for space or speed (depending on the user's requirements). I generally choose speed. Of course, for generic library code, you cannot get away with some of the algorithms posted here as they wouldn't mesh well with calling code. As I am writing my code for use in "The Art of ARM Assembly Volume 2" (32-bit code), I like to preserve all the registers, which makes the code much easier to use by assembly language programmers (especially beginners, who tend to be the ones reading my books) even if it costs a little performance. For example, I have a "print" function I call, which is a front end for the C printf function, that preserves all the registers that printf() might wipe out (a large number, considering the SSE/AVX set). Not that making printf() any faster would be noticeable (as it is *really* slow to begin with), but not having to preserve any registers around the call to print (other than possible parameters you are passing in registers) is a big win, even with the performance loss. 


Cheers,
Randy Hyde

jj2007

Quote from: hyder on February 14, 2024, 08:19:47 AM
for generic library code, you cannot get away with some of the algorithms posted here as they wouldn't mesh well with calling code

Hi Randy,

Can you elaborate a bit on that one? The algos posted here are normally compatible with the Windows ABI. Some of mine may not run on very old CPUs, but they run perfectly on 99% of all machines... so I don't quite understand what you mean :cool:

jj2007

Version 3: CRT sscanf is out, slightly faster strtoull is in:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

19228   cycles for 100 * MasmBasic Val
2638    cycles for 100 * Masm32 SDK hex2bin
25866   cycles for 100 * strtoull
1647    cycles for 100 * HexVal

19188   cycles for 100 * MasmBasic Val
2627    cycles for 100 * Masm32 SDK hex2bin
25486   cycles for 100 * strtoull
1713    cycles for 100 * HexVal

19315   cycles for 100 * MasmBasic Val
2649    cycles for 100 * Masm32 SDK hex2bin
25776   cycles for 100 * strtoull
1651    cycles for 100 * HexVal

19143   cycles for 100 * MasmBasic Val
2678    cycles for 100 * Masm32 SDK hex2bin
25735   cycles for 100 * strtoull
1665    cycles for 100 * HexVal

19124   cycles for 100 * MasmBasic Val
2641    cycles for 100 * Masm32 SDK hex2bin
25542   cycles for 100 * strtoull
1840    cycles for 100 * HexVal

strtoull sits in ucrtbase.dll, which might not be available on older Windows versions. The program checks for its presence, though.
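For reference, calling strtoull from C with base 16 looks roughly like the sketch below. The wrapper name hex_via_strtoull is hypothetical; strtoull itself is standard C. Note that it stops at the first character that is not a hex digit, so the trailing h of 12AbCdEfh is left unconsumed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Thin wrapper around the standard strtoull with base 16.
   If rest is non-NULL, it receives a pointer to the first unparsed char. */
uint64_t hex_via_strtoull(const char *s, const char **rest)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 16);
    if (rest)
        *rest = end;
    return (uint64_t)v;
}
```

Being a general-purpose routine (whitespace skipping, optional sign and 0x prefix, overflow handling), it does far more per call than a dedicated hex loop, so it is not a like-for-like comparison with the short routines.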

TimoVJL

OS msvcrt.dll
_strtoui64

AMD Athlon(tm) II X2 220 Processor (SSE3)

42190   cycles for 100 * MasmBasic Val
4935    cycles for 100 * Masm32 SDK hex2bin
41442   cycles for 100 * strtoull

jj2007

Quote from: TimoVJL on February 14, 2024, 02:31:59 PM
AMD Athlon(tm) II X2 220 Processor (SSE3)

42190   cycles for 100 * MasmBasic Val
41442   cycles for 100 * strtoull

Congrats, you beat MasmBasic :thup: