Hi

**qWord** :t

Yeah, unrolling is the way to make it faster, but the tested algo is deliberately not unrolled - that's the point. It's "classic", more or less small, "looped" code. These characteristics are intentional: this was not a contest but rather a test.
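To show what "unrolled 4 times" means compared with a plain loop, here is a rough C sketch. It is not the hsz2dw2 source. The names `hex_digit` and `hex2dw_unrolled4` are made up for illustration, and it assumes a zero-terminated string of valid hex digits with no input checking.

```c
#include <stdint.h>

/* Hypothetical helper: one hex character to its value, case insensitive,
   no validation (garbage in, garbage out). */
static uint32_t hex_digit(unsigned c)
{
    c |= 0x20;                          /* 'A'..'F' -> 'a'..'f', digits unchanged */
    return (c > '9') ? c - 'a' + 10 : c - '0';
}

/* Hypothetical 4x unrolled converter: up to four digits are handled per
   pass, so the backward jump is taken once per four digits instead of
   once per digit. */
static uint32_t hex2dw_unrolled4(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t r = 0;
    for (;;) {
        if (!p[0]) break;
        r = (r << 4) | hex_digit(p[0]);
        if (!p[1]) break;
        r = (r << 4) | hex_digit(p[1]);
        if (!p[2]) break;
        r = (r << 4) | hex_digit(p[2]);
        if (!p[3]) break;
        r = (r << 4) | hex_digit(p[3]);
        p += 4;
    }
    return r;
}
```

The price is code size: the body is repeated four times, which is exactly the trade-off the small "looped" procs avoid on purpose.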

`Intel(R) Celeron(R) CPU 2.13GHz (SSE3)`

43 cycles for Small 1

45 cycles for Small 2

43 cycles for Small 3

47 cycles for Small 3.1

43 cycles for Small 4

79 cycles for C version

79 cycles for C mod JJ

40 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)

31 hsz2dw2 (unrolled 4 times)

43 cycles for Small 1

45 cycles for Small 2

66 cycles for Small 3

47 cycles for Small 3.1

43 cycles for Small 4

79 cycles for C version

79 cycles for C mod JJ

37 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)

31 hsz2dw2 (unrolled 4 times)

43 cycles for Small 1

45 cycles for Small 2

43 cycles for Small 3

47 cycles for Small 3.1

43 cycles for Small 4

79 cycles for C version

79 cycles for C mod JJ

37 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)

31 hsz2dw2 (unrolled 4 times)

48 bytes for Axhex2dw_C2

43 bytes for Axhex2dw_CJ

ABCDEF01 returned

--- ok ---

Actually, the testbed I posted was a trimmed version I made some time ago... I will post the full one now. It contains the procs which were in the contest here earlier. I used Axhex2dw just because it is the fastest (at least so far) of the tested hex2dw procs with these characteristics: case insensitive, does not check input, and looped (i.e. one loop iteration per digit, not unrolled at all). And it looks like it is copyrighted by me, at least no one has disputed the rights for ~3 years :lol:

OK, here are the timings for the attached archive (it is the old testbed):
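For readers who want those three characteristics spelled out (case insensitive, no input check, one loop iteration per digit), here is a minimal C sketch. It is only an illustration and not the Axhex2dw code. The name `hex2dw_looped` is hypothetical, and the input is assumed to be a zero-terminated string of valid hex digits.

```c
#include <stdint.h>

/* Hypothetical looped converter: exactly one iteration per digit. */
static uint32_t hex2dw_looped(const char *s)
{
    uint32_t result = 0;
    while (*s) {
        uint32_t c = (unsigned char)*s++;
        c |= 0x20;                               /* fold upper case to lower case */
        c = (c > '9') ? c - 'a' + 10 : c - '0';  /* letter or digit, unchecked */
        result = (result << 4) | c;              /* shift in the next nibble */
    }
    return result;
}
```

Under these assumptions, `hex2dw_looped("abcDEF01")` gives `0xABCDEF01`. Any character outside `0-9`, `a-f`, `A-F` silently produces a wrong value, which is what "does not check input" implies.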

`Intel(R) Celeron(R) CPU 2.13GHz (SSE3)`

25 cycles for Fast version

27 cycles for Fast version under AMD

43 cycles for Small 1

45 cycles for Small 2

43 cycles for Small 3

47 cycles for Small 3.1

43 cycles for Small 4

28 cycles for MMX 1

28 cycles for MMX 2

32 cycles for SSE1

Other's Versions:

48 cycles for Axhex2dw improved by Hutch (1)

83 cycles for Axhex2dw improved by Hutch (2)

28 cycles for Lingo's SSE version

24 cycles for Lingo's BIG integer version

23 cycles for Jochen's WORD-Indexed version

27 cycles for Dave's version (with minor changes)

25 cycles for Fast version

27 cycles for Fast version under AMD

43 cycles for Small 1

45 cycles for Small 2

43 cycles for Small 3

59 cycles for Small 3.1

43 cycles for Small 4

28 cycles for MMX 1

28 cycles for MMX 2

30 cycles for SSE1

Other's Versions:

48 cycles for Axhex2dw improved by Hutch (1)

83 cycles for Axhex2dw improved by Hutch (2)

28 cycles for Lingo's SSE version

24 cycles for Lingo's BIG integer version

23 cycles for Jochen's WORD-Indexed version

27 cycles for Dave's version (with minor changes)

25 cycles for Fast version

30 cycles for Fast version under AMD

43 cycles for Small 1

116 cycles for Small 2

43 cycles for Small 3

47 cycles for Small 3.1

43 cycles for Small 4

28 cycles for MMX 1

28 cycles for MMX 2

46 cycles for SSE1

Other's Versions:

48 cycles for Axhex2dw improved by Hutch (1)

83 cycles for Axhex2dw improved by Hutch (2)

28 cycles for Lingo's SSE version

24 cycles for Lingo's BIG integer version

23 cycles for Jochen's WORD-Indexed version

27 cycles for Dave's version (with minor changes)

==========

Codesizes:

Axhex2dw_Unrolled: 396

Axhex2dw_Unrolled_AMD: 396

Axhex2dw1 - 1: 69

Axhex2dw2 - 2: 48

Axhex2dw3 - 3: 57

Axhex2dw3_1 - 3.1: 56

Axhex2dw3 - 4: 61

Axhex2dw_MMX: 128

Axhex2dw_MMX2: 160

Axhex2dw_SSE: 160

Alex_Short_Hutch: 59

Axhex2dw_Hutch2: 54

Hex2dwLingoSSE: 160

lingo_htodw: 1950

ax_jj_htodw: 174

krbhtodw: 547

--- ok ---

krbhtodw - the author is Dave (KeepingRealBusy), with minor changes made with his permission. It's the most universal proc: it checks the input and it can process "ignorant chars". It is lookup-table based.
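As a rough idea of how a checking lookup-table converter can work, here is a C sketch. It is not Dave's krbhtodw. The names, the 8-digit limit, and the choice of space and underscore as characters to ignore are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical character classes stored next to the digit values 0..15. */
enum { CH_IGNORE = 0xFE, CH_BAD = 0xFF };

static uint8_t class_table[256];

/* Fill the 256-entry table once before use. */
static void init_class_table(void)
{
    for (int i = 0; i < 256; i++) class_table[i] = CH_BAD;
    for (int i = 0; i < 10; i++)  class_table['0' + i] = (uint8_t)i;
    for (int i = 0; i < 6; i++) {
        class_table['a' + i] = (uint8_t)(10 + i);
        class_table['A' + i] = (uint8_t)(10 + i);
    }
    class_table[' '] = CH_IGNORE;   /* assumed ignorable characters */
    class_table['_'] = CH_IGNORE;
}

/* Returns 1 and stores the value on success; returns 0 on an invalid
   character, on more than 8 digits, or when there are no digits. */
static int hex2dw_checked(const char *s, uint32_t *out)
{
    uint32_t r = 0;
    int digits = 0;
    for (; *s; s++) {
        uint8_t v = class_table[(unsigned char)*s];
        if (v == CH_IGNORE) continue;
        if (v == CH_BAD || digits == 8) return 0;
        r = (r << 4) | v;
        digits++;
    }
    if (digits == 0) return 0;
    *out = r;
    return 1;
}
```

One table read replaces the compare-and-adjust logic per character, and the same read also classifies the character, so checking costs only one extra compare.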

The fastest GPR code is by Jochen (jj2007) - ax_jj_htodw - it's a word-indexed lookup table.
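The general idea of a word-indexed table can be sketched in C as follows. This is not jj2007's ax_jj_htodw. The names `pair_table`, `init_pair_table` and `hex8_to_dw` are hypothetical, and the sketch assumes exactly 8 valid hex digits with no checking. Two characters at a time form a 16-bit index, and the table entry is the byte those two digits encode.

```c
#include <stdint.h>

/* Hypothetical 64 KB table: index = first char | (second char << 8),
   entry = (value of first digit << 4) | value of second digit. */
static uint8_t pair_table[65536];

/* Value of one hex character, or -1 if it is not a hex digit. */
static int nibble(unsigned c)
{
    if (c >= '0' && c <= '9') return (int)(c - '0');
    c |= 0x20;
    if (c >= 'a' && c <= 'f') return (int)(c - 'a' + 10);
    return -1;
}

/* Fill the table once; pairs containing a non-hex character map to 0. */
static void init_pair_table(void)
{
    for (unsigned first = 0; first < 256; first++)
        for (unsigned second = 0; second < 256; second++) {
            int hi = nibble(first), lo = nibble(second);
            pair_table[first | (second << 8)] =
                (hi < 0 || lo < 0) ? 0 : (uint8_t)((hi << 4) | lo);
        }
}

/* Four table reads convert eight characters. */
static uint32_t hex8_to_dw(const char *s)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i += 2) {
        unsigned idx = (unsigned char)s[i]
                     | ((unsigned)(unsigned char)s[i + 1] << 8);
        r = (r << 8) | pair_table[idx];
    }
    return r;
}
```

The code stays small and fast because all the per-character work moves into the table, but the table itself is large and has to be in cache for the timing to hold.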

Everything not listed under "Other's Versions" is mine, but when I posted in this thread I excluded every non-GPR, unrolled and/or lookup-table-based version. Well, new CPUs have been released since then, so maybe it's interesting to test all these procs again.