Hi
qWord :t
Yeah, unrolling is the way to make it faster, but the tested algo is deliberately not unrolled; that's the point. It's more or less "classic", small, looped code. These characteristics are intentional: this was not a contest but rather a test.
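To make the looped vs. unrolled distinction concrete, here is a minimal C sketch. It is not one of the MASM procs benchmarked here. Both functions assume a zero-terminated string of hex digits and do no validation. `hexval`, `hex2dw_looped` and `hex2dw_unrolled4` are hypothetical names. The digit decode uses the branchless trick `(c & 0x0F) + 9 * bit6`, which handles both cases because bit 6 is set only for letters.

```c
#include <assert.h>
#include <stdint.h>

/* Value of one hex digit, case insensitive, no validation:
   '0'..'9' -> (c & 0x0F), 'A'..'F' / 'a'..'f' -> (c & 0x0F) + 9
   (bit 6 is set only for the letters). */
static uint32_t hexval(unsigned char c)
{
    return (c & 0x0Fu) + 9u * ((c >> 6) & 1u);
}

/* Looped: exactly one iteration per digit, stops at the terminating zero. */
uint32_t hex2dw_looped(const char *s)
{
    uint32_t r = 0;
    while (*s)
        r = (r << 4) | hexval((unsigned char)*s++);
    return r;
}

/* Unrolled 4 times: four digits per iteration while at least four remain,
   then a short tail loop for the leftovers. Same result, fewer
   loop-control steps per digit. */
uint32_t hex2dw_unrolled4(const char *s)
{
    uint32_t r = 0;
    /* && short-circuits, so we never read past the terminator */
    while (s[0] && s[1] && s[2] && s[3]) {
        r = (r << 16)
          | (hexval((unsigned char)s[0]) << 12)
          | (hexval((unsigned char)s[1]) << 8)
          | (hexval((unsigned char)s[2]) << 4)
          |  hexval((unsigned char)s[3]);
        s += 4;
    }
    while (*s)
        r = (r << 4) | hexval((unsigned char)*s++);
    return r;
}
```

Both return 0xABCDEF01 for "ABCDEF01". The unrolled one pays in code size for fewer branches per digit, which is presumably where the advantage of an unrolled proc like hsz2dw2 comes from; the looped one stays small, and that trade-off is exactly what is being tested.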

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
40 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
31 hsz2dw2 (unrolled 4 times)
43 cycles for Small 1
45 cycles for Small 2
66 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
37 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
31 hsz2dw2 (unrolled 4 times)
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
37 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
31 hsz2dw2 (unrolled 4 times)
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
Actually, the testbed I posted was a trimmed version I made some time ago; I will post the full one now. It contains the procs that were in the contest here earlier. I used Axhex2dw just because it is the fastest (at least so far) of the tested hex2dw procs with these characteristics: case insensitive, no input checking, and looped (i.e. one loop iteration per digit, not unrolled at all). Also, it looks like it is copyrighted by me; at least no one has disputed the rights for ~3 years :lol:
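Here is a sketch of those same three characteristics (case insensitive, no input check, one iteration per digit) using a different, branching digit decode: fold the case with `| 0x20`, then pick the range. It is written in C with a hypothetical name and is not Axhex2dw itself.

```c
#include <assert.h>
#include <stdint.h>

/* Looped, case insensitive, no input validation.
   OR-ing with 0x20 folds 'A'..'F' to 'a'..'f' and leaves '0'..'9'
   unchanged (their bit 5 is already set). A char that is not a hex
   digit still produces some value: garbage in, garbage out. */
uint32_t hex2dw_casefold(const char *s)
{
    uint32_t r = 0;
    unsigned char c;
    while ((c = (unsigned char)*s++) != 0) {
        c |= 0x20;                                       /* fold case */
        uint32_t d = (c <= '9') ? (uint32_t)(c - '0')       /* '0'..'9' */
                                : (uint32_t)(c - 'a' + 10); /* 'a'..'f' */
        r = (r << 4) | d;
    }
    return r;
}
```

Because nothing is validated, a non-hex char simply yields a wrong result instead of an error, which is what keeps this kind of proc small.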
OK, here are the timings for the attached archive (it is the old testbed):
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
25 cycles for Fast version
27 cycles for Fast version under AMD
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
28 cycles for MMX 1
28 cycles for MMX 2
32 cycles for SSE1
Other's Versions:
48 cycles for Axhex2dw improved by Hutch (1)
83 cycles for Axhex2dw improved by Hutch (2)
28 cycles for Lingo's SSE version
24 cycles for Lingo's BIG integer version
23 cycles for Jochen's WORD-Indexed version
27 cycles for Dave's version (with minor changes)
25 cycles for Fast version
27 cycles for Fast version under AMD
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
59 cycles for Small 3.1
43 cycles for Small 4
28 cycles for MMX 1
28 cycles for MMX 2
30 cycles for SSE1
Other's Versions:
48 cycles for Axhex2dw improved by Hutch (1)
83 cycles for Axhex2dw improved by Hutch (2)
28 cycles for Lingo's SSE version
24 cycles for Lingo's BIG integer version
23 cycles for Jochen's WORD-Indexed version
27 cycles for Dave's version (with minor changes)
25 cycles for Fast version
30 cycles for Fast version under AMD
43 cycles for Small 1
116 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
28 cycles for MMX 1
28 cycles for MMX 2
46 cycles for SSE1
Other's Versions:
48 cycles for Axhex2dw improved by Hutch (1)
83 cycles for Axhex2dw improved by Hutch (2)
28 cycles for Lingo's SSE version
24 cycles for Lingo's BIG integer version
23 cycles for Jochen's WORD-Indexed version
27 cycles for Dave's version (with minor changes)
==========
Codesizes:
Axhex2dw_Unrolled: 396
Axhex2dw_Unrolled_AMD: 396
Axhex2dw1 - 1: 69
Axhex2dw2 - 2: 48
Axhex2dw3 - 3: 57
Axhex2dw3_1 - 3.1: 56
Axhex2dw3 - 4: 61
Axhex2dw_MMX: 128
Axhex2dw_MMX2: 160
Axhex2dw_SSE: 160
Alex_Short_Hutch: 59
Axhex2dw_Hutch2: 54
Hex2dwLingoSSE: 160
lingo_htodw: 1950
ax_jj_htodw: 174
krbhtodw: 547
--- ok ---
krbhtodw: the author is Dave (KeepingRealBusy); I made minor changes with his permission. It's the most universal proc: it checks the input and can skip "ignorable" chars. It's lookup-table based.
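This is a rough C sketch of the checked, lookup-table approach described for krbhtodw. It is not Dave's code. The table encoding, the choice of space and underscore as the "ignorable" chars, and the 8-digit limit are my assumptions for illustration. `hex2dw_checked` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table encoding (not Dave's actual table):
   0..15 = digit value, IGN = skip this char, BAD = reject the input. */
enum { IGN = 0x40, BAD = 0x80 };

static unsigned char cls[256];

static void init_table(void)
{
    for (int i = 0; i < 256; i++) cls[i] = BAD;
    for (int i = 0; i < 10; i++) cls['0' + i] = (unsigned char)i;
    for (int i = 0; i < 6; i++) {
        cls['A' + i] = (unsigned char)(10 + i);
        cls['a' + i] = (unsigned char)(10 + i);
    }
    cls[' '] = IGN;   /* assumed "ignorable" chars */
    cls['_'] = IGN;
}

/* Returns 1 and stores the value on success; returns 0 on an invalid
   char, on more than 8 digits, or when the string has no digits. */
int hex2dw_checked(const char *s, uint32_t *out)
{
    static int ready = 0;             /* lazy init, not thread safe */
    if (!ready) { init_table(); ready = 1; }

    uint32_t r = 0;
    int digits = 0;
    for (; *s; s++) {
        unsigned char c = cls[(unsigned char)*s];
        if (c == IGN) continue;                    /* skip separators */
        if (c == BAD || ++digits > 8) return 0;    /* reject input    */
        r = (r << 4) | c;
    }
    if (digits == 0) return 0;
    *out = r;
    return 1;
}
```

With these assumptions, "ABCD_EF01" is accepted as 0xABCDEF01, while "12G4" is rejected. In real code the table would be a precomputed constant rather than built at run time.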
The fastest GPR code is by Jochen (jj2007): ax_jj_htodw, which uses a WORD-indexed lookup table.
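Here is the WORD-indexed idea as a C sketch. Two adjacent chars are read as one 16-bit index into a 64K table, which directly yields the byte value of that digit pair. Each iteration therefore handles two digits. The layout is only an assumption: the first char is the low byte of the index, as a little-endian x86 word load would give. This is not Jochen's actual table, and `pair_val` and `hex2dw_pairs` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64K entries: index = first char | (second char << 8),
   value = byte formed by the two hex digits (0 for non-hex pairs). */
static uint8_t pair_val[65536];

static int hexdigit(int c)   /* -1 if c is not a hex digit */
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static void init_pairs(void)
{
    for (int lo = 0; lo < 256; lo++)          /* first char  */
        for (int hi = 0; hi < 256; hi++) {    /* second char */
            int a = hexdigit(lo), b = hexdigit(hi);
            if (a >= 0 && b >= 0)
                pair_val[lo | (hi << 8)] = (uint8_t)((a << 4) | b);
        }
}

/* Two digits per iteration, no input validation. */
uint32_t hex2dw_pairs(const char *s)
{
    static int ready = 0;             /* lazy init, not thread safe */
    if (!ready) { init_pairs(); ready = 1; }

    uint32_t r = 0;
    if (strlen(s) & 1)   /* odd length: pair the lone leading digit with '0' */
        r = pair_val['0' | ((unsigned)(unsigned char)*s++ << 8)];
    for (; *s; s += 2) { /* remaining length is even, so s[1] is a digit */
        unsigned idx = (unsigned char)s[0]
                     | ((unsigned)(unsigned char)s[1] << 8);
        r = (r << 8) | pair_val[idx];
    }
    return r;
}
```

The price of halving the iterations is a 64 KB table, which may hurt cache behaviour once the proc runs outside a tight benchmark loop.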
All versions not listed under "Other's Versions" are mine, but when posting in this thread I excluded every non-GPR, every unrolled and every lookup-table-based version. Well, new CPUs have been released since then, so it may be interesting to test all these procs again.
