Hi guys, I'm facing a small problem.
I need a fast DWORD-to-hex algorithm that produces short strings with exactly one zero at the start.
Examples:
Input AAB12F
Output string: 0AAB12F
Input AA
Output string: 0AA
Input FFFFFFFF
Output string: 0FFFFFFFF
etc.
I'm using Biterider's optimized algo (no loops), from http://www.asmcommunity.net/forums/topic/?id=17763&page=2 :
Proc dwtohexEx:
Arguments @dNumber, @pBuffer
Uses edx, ecx, eax, edi
mov edx D@dNumber
mov ecx edx
shr edx 4
and edx 0F0F0F0F
and ecx 0F0F0F0F
mov eax edx
mov edi ecx
add edx (080808080 - 0A0A0A0A) ; Build mask to discern digit > 9
add ecx (080808080 - 0A0A0A0A)
shr edx 4
shr ecx 4
not edx
not ecx
and edx 07070707 ; Mask digit > 9 ... mask = 0111
and ecx 07070707
add edx eax ; Add 'A' - '9' - 1 (= 7) if digit > 9
add ecx edi
add edx 030303030 ; Add ascii '0'
add ecx 030303030
mov edi D@pBuffer ; Using edi is faster
mov B$edi+7 cl
mov B$edi+6 dl
mov B$edi+5 ch
mov B$edi+4 dh
shr ecx 16
shr edx 16
mov B$edi+3 cl
mov B$edi+2 dl
mov B$edi+1 ch
mov B$edi+0 dh
mov B$edi+8 0
EndP
But the result is always a fixed 8-character string (e.g. input 012A gives 0000012A) instead of the short form I need. Does anyone know a fast algo that can do this conversion? (As in the example I posted: input 12A, output 012A.)
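For readers who don't read RosAsm, the same branchless trick can be modelled in C. This is a sketch, not Biterider's code, and the function name dw_to_hex8_swar is hypothetical. Each byte lane of a 32-bit word holds one nibble, adding 0x76 to a lane flags digits above 9, and the output is the same fixed 8-character string, which is exactly the limitation described above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* dw_to_hex8_swar: hypothetical name. C model of the branchless ("no loops")
   conversion: one nibble per byte lane, four digits converted in parallel. */
static void dw_to_hex8_swar(uint32_t value, char out[9])
{
    uint32_t hi = (value >> 4) & 0x0F0F0F0Fu; /* high nibble of every byte */
    uint32_t lo = value & 0x0F0F0F0Fu;        /* low nibble of every byte  */

    /* n + 0x76 is 0x76..0x7F for n <= 9 (bits 6..4 = 111) and 0x80..0x85
       for n >= 10 (bits 6..4 = 000). Shifting right by 4 and inverting turns
       that into a per-lane mask of 0 or 7; no lane carries into the next. */
    uint32_t mask_hi = ~((hi + 0x76767676u) >> 4) & 0x07070707u;
    uint32_t mask_lo = ~((lo + 0x76767676u) >> 4) & 0x07070707u;

    /* '0' + n, plus 7 ('A' - '9' - 1) when n >= 10. */
    hi += mask_hi + 0x30303030u;
    lo += mask_lo + 0x30303030u;

    /* Lane 0 holds the least significant byte, so it lands in out[6..7]. */
    for (int lane = 0; lane < 4; lane++) {
        out[6 - 2 * lane] = (char)(hi >> (8 * lane));
        out[7 - 2 * lane] = (char)(lo >> (8 * lane));
    }
    out[8] = '\0';
}
```

Because every lane stays below 0x100 at each step, no carries cross lanes, which is what lets the original avoid a loop entirely.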
Just use a buffer for the biggest possible string length plus 1.
- call the algo
- check the bytes at the start until you find a non-"0"
- put "0" before that byte, and use this as the start address.
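A rough C sketch of that convert-then-scan suggestion, assuming a 10-byte buffer (one spare byte for the prefix, 8 digits, 1 terminator). The name short_hex_scan is hypothetical; note that the returned pointer, not buf itself, is the start of the string:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* short_hex_scan: hypothetical helper. Writes all 8 digits into buf[1..8],
   skips leading '0' digits (always keeping the last digit), stores a single
   '0' just before the first kept digit and returns that address.
   buf must hold at least 10 bytes: prefix slot + 8 digits + terminator. */
static const char *short_hex_scan(uint32_t value, char buf[10])
{
    static const char digits[] = "0123456789ABCDEF";
    char *p = buf + 1;                  /* buf[0] is spare room for the prefix */

    for (int i = 0; i < 8; i++)         /* most significant nibble first */
        p[i] = digits[(value >> (28 - 4 * i)) & 0xF];
    p[8] = '\0';

    int k = 0;
    while (k < 7 && p[k] == '0')        /* find the first non-"0" */
        k++;
    p[k - 1] = '0';                     /* k == 0 writes buf[0] */
    return p + k - 1;
}
```

Keeping the last digit means a value of 0 comes out as "00" rather than a lone "0".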
Haven't tested it, and it could probably be optimized, but based on JJ's suggestion this is something I'd write to format the hex string. Hopefully that helps.
FormatHexString PROTO :DWORD, :DWORD
;-------------------------------------------------------------------------------
; Formats a hex string so that it starts with exactly one zero
; Input: lpszHexString
; Output: lpszFormattedString
;
; the buffer at lpszFormattedString must hold at least szLen(lpszHexString) + 2 bytes
;
;
; Example: Invoke FormatHexString, Addr szHEX, Addr szMyNewHexString
;
;-------------------------------------------------------------------------------
FormatHexString PROC USES EDI ESI lpszHexString:DWORD, lpszFormattedString:DWORD
LOCAL Position:DWORD
LOCAL FlagFoundHexChars:DWORD
LOCAL nMaxLen:DWORD ; length of lpszHexString, set from szLen below
Invoke szLen, lpszHexString
mov nMaxLen, eax
mov edi, lpszFormattedString
mov esi, lpszHexString
mov byte ptr [edi], '0' ; start our formatted string with a ascii zero character
inc edi
mov FlagFoundHexChars, FALSE ; set flag to false initially
mov Position, 0
mov eax, 0
.WHILE eax < nMaxLen
.IF FlagFoundHexChars == FALSE
movzx eax, byte ptr [esi]
.IF al != '0' ; ascii zero
mov FlagFoundHexChars, TRUE ; looks like we found some ascii chars
mov byte ptr [edi], al ; so start storing the first of them into our formatted string (edi)
inc edi ; position for next char when we loop next and we branch to next bit below till end of string
.ENDIF
.ELSE ; we have a flag set, so we fetch rest of characters in string till we hit end of string or a null char
movzx eax, byte ptr [esi]
.IF al != 0 ; null
mov byte ptr [edi], al ; start storing next byte in formatted string (edi)
inc edi ; position for next char when we loop next
.ELSE
.BREAK ; break if null found
.ENDIF
.ENDIF
inc esi
inc Position
mov eax, Position
.ENDW
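.IF FlagFoundHexChars == FALSE ; input was all zeros, e.g. "00000000"
mov byte ptr [edi], '0' ; so emit "00" rather than a lone "0"
inc edi
.ENDIF
mov byte ptr [edi], 0 ; final null of formatted string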
ret
FormatHexString ENDP
adding the 0 is simple enough
for converting to hex, a look-up table is likely the fastest
i would think a 512-byte table (256 entries of 2 characters) would work well
if speed isn't that important, but you want it UNICODE aware....
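To make the 512-byte figure concrete: 256 possible byte values times two ASCII characters each. A C sketch follows; the names build_hex_pairs and dw_to_hex8_table are hypothetical, and the table is filled at startup here, whereas the versions later in the thread declare it as literal data:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hex_pairs: 256 entries x 2 ASCII chars = 512 bytes; entry b holds the two
   hex digits of byte value b (hypothetical name). */
static char hex_pairs[512];

static void build_hex_pairs(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int b = 0; b < 256; b++) {
        hex_pairs[2 * b]     = digits[b >> 4];
        hex_pairs[2 * b + 1] = digits[b & 0xF];
    }
}

/* Fixed-width conversion: one table lookup per byte instead of per nibble. */
static void dw_to_hex8_table(uint32_t value, char out[9])
{
    for (int i = 0; i < 4; i++) {
        unsigned b = (value >> (24 - 8 * i)) & 0xFF;  /* most significant byte first */
        memcpy(out + 2 * i, &hex_pairs[2 * b], 2);    /* one 16-bit copy per byte */
    }
    out[8] = '\0';
}
```

Each byte then costs one load and one 16-bit store, which is the pattern the later table-based routines use.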
awDw2Hex PROC USES EDI dwVal:DWORD,lpBuf:LPSTR
;UNICODE aware Dword to Hex - DednDave, 3-2013
; Returns: EAX = pointer to string buffer
; ECX = length in characters (8)
; EDX = original binary value
;the buffer must be large enough for at least 9 TCHARs (includes null terminator)
;--------------------------------------------
mov ecx,8
mov edx,dwVal
mov edi,lpBuf
push ecx
push edx
push edi
IFDEF __UNICODE__
add edi,16
mov word ptr [edi],0
ELSE
add edi,ecx
mov byte ptr [edi],0
ENDIF
.repeat
mov eax,edx
IFDEF __UNICODE__
sub edi,2
ELSE
dec edi
ENDIF
and eax,0Fh
shr edx,4
cmp al,0Ah
sbb al,69h
das
dec ecx
IFDEF __UNICODE__
mov [edi],ax
ELSE
mov [edi],al
ENDIF
.until ZERO?
pop eax
pop edx
pop ecx
ret
awDw2Hex ENDP
just modify that to add a 0, as required
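Why cmp/sbb/das turns a nibble into its ASCII digit is not obvious, so here is a C emulation of the three instructions for a single nibble, written from the documented SBB and DAS flag rules. das_nibble_to_ascii is a hypothetical name, and only AL, AF and CF are modelled, as far as this sequence needs them:

```c
#include <assert.h>
#include <stdint.h>

/* das_nibble_to_ascii: hypothetical helper that emulates, for one nibble n
   (0..15), the sequence  cmp al,0Ah / sbb al,69h / das  using the documented
   SBB and DAS flag rules (only AL, AF and CF are modelled). */
static uint8_t das_nibble_to_ascii(uint8_t n)
{
    unsigned cf = n < 0x0A;                     /* cmp al,0Ah: CF = (al < 0Ah) */

    /* sbb al,69h: al = al - 69h - CF, AF/CF = borrow out of bit 3 / bit 7 */
    uint8_t al = (uint8_t)(n - 0x69 - cf);
    unsigned af = (n & 0x0Fu) < 0x09u + cf;
    cf = n < 0x69u + cf;

    /* das */
    uint8_t old_al = al;
    unsigned old_cf = cf;
    if ((al & 0x0F) > 9 || af)
        al = (uint8_t)(al - 0x06);
    if (old_al > 0x99 || old_cf)
        al = (uint8_t)(al - 0x60);
    return al;                                  /* '0'..'9' or 'A'..'F' */
}
```

For n = 0..9 the result is n + 30h ('0'..'9'); for n = 10..15 it is n + 37h ('A'..'F'). DAS, like the other BCD-adjust instructions, is not valid in 64-bit mode, so this trick is limited to 32-bit code.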
Thanks, guys...
I'll give it a try. Speed is what matters most to me (that's why I used Biterider's algo), but I need a "short" output, not the whole 8-character string.
Perhaps using a bswap at the beginning and doing what JJ suggested?
I'll give it a try and test all of the algos for speed.
Check
\Masm32\m32lib\dw2hex.asm
\Masm32\m32lib\dw2h_ex.asm
P.S.: I have hacked together an algo using a table; here are some results:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4144 cycles for 100 * dw2hex
7838 cycles for 100 * MB Hex$
51698 cycles for 100 * CRT sprintf
1066 cycles for 100 * Bin2Hex
765 cycles for 100 * Bin2Hex2
4143 cycles for 100 * dw2hex
7854 cycles for 100 * MB Hex$
51713 cycles for 100 * CRT sprintf
1066 cycles for 100 * Bin2Hex
765 cycles for 100 * Bin2Hex2
4141 cycles for 100 * dw2hex
7848 cycles for 100 * MB Hex$
51750 cycles for 100 * CRT sprintf
1066 cycles for 100 * Bin2Hex
765 cycles for 100 * Bin2Hex2
4145 cycles for 100 * dw2hex
7884 cycles for 100 * MB Hex$
51708 cycles for 100 * CRT sprintf
1065 cycles for 100 * Bin2Hex
764 cycles for 100 * Bin2Hex2
20 bytes for dw2hex
17 bytes for MB Hex$
29 bytes for CRT sprintf
225 bytes for Bin2Hex
150 bytes for Bin2Hex2
00345678 = eax dw2hex
00345678 = eax MB Hex$
345678 = eax CRT sprintf
345678 = eax Bin2Hex
00345678 = eax Bin2Hex2
As you can see, both CRT sprintf and the first variant of my algo can handle the short form.
Hi JJ
I analysed it and I'm trying to gain a bit more speed on dw2hex.asm and dw2h_ex.asm.
In my tests I made a faster variation of dw2hex using a fixed table (instead of computing it as in your example).
Can you please test it to check whether it is really that fast? In my tests it takes about half the time of dw2hex and is 10-18% faster than your Bin2Hex2.
The variation I made was:
[hex_table: B$ "000102030405060708090A0B0C0D0E0F"
B$ "101112131415161718191A1B1C1D1E1F"
B$ "202122232425262728292A2B2C2D2E2F"
B$ "303132333435363738393A3B3C3D3E3F"
B$ "404142434445464748494A4B4C4D4E4F"
B$ "505152535455565758595A5B5C5D5E5F"
B$ "606162636465666768696A6B6C6D6E6F"
B$ "707172737475767778797A7B7C7D7E7F"
B$ "808182838485868788898A8B8C8D8E8F"
B$ "909192939495969798999A9B9C9D9E9F"
B$ "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
B$ "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
B$ "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
B$ "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
B$ "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
B$ "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF", 0]
Proc Bin2Hex6:
Arguments @Input, @Output
Local @DwordStorage
Uses eax, edi ; preserves eax and edi on output
mov eax D@Input
mov edi D@Output
mov D@DwordStorage eax
movzx eax B@DwordStorage+3 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi ax . Using stosw is faster than mov W$ on an i7
movzx eax B@DwordStorage+2 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi+2 ax. Using stosw is faster than mov W$ on an i7
movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi+4 ax. Using stosw is faster than mov W$ on an i7
movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi+6 ax. Using stosw is faster than mov W$ on an i7
mov B$edi 0
EndP
The above version does not produce the shorter string yet. I'm trying to speed it up first, because if I use the "repeat+until" macros the loop will slow the code down.
I'll try replacing it with a TEST opcode to see if that speeds it up a bit.
Hi Guga,
Of course, my algo uses the same identical table. If you have an algo that uses this table, please post it (in Masm syntax), and I will add it to the testbed.
Btw if Repeat ... Until loops slow down your code, check what RosAsm generates. In Masm, the macro produces the fastest possible version.
i haven't looked at the current algorithms
but - if you can write the routine so the address is returned in EAX,
rather than left-justifying the string in a fixed buffer, it should help speed it up
do you want a leading 0 when the first character is less than A ?
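A C sketch of returning the start address instead of left-justifying, under stated assumptions: a 10-byte buffer, digits written from the end backwards, and the single '0' always prepended. short_hex_right_aligned is a hypothetical name, and its returned pointer plays the role of EAX:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* short_hex_right_aligned: hypothetical helper. Digits are written backwards
   from the end of a 10-byte buffer until the value runs out, then one '0' is
   prepended. The returned pointer plays the role of EAX: for short values the
   string starts somewhere inside buf, not at buf[0]. */
static char *short_hex_right_aligned(uint32_t value, char buf[10])
{
    static const char digits[] = "0123456789ABCDEF";
    char *p = buf + 9;
    *p = '\0';
    do {
        *--p = digits[value & 0xF];     /* least significant nibble first */
        value >>= 4;
    } while (value != 0);
    *--p = '0';                         /* the single leading zero */
    return p;
}
```

No scan and no copy is needed; the trade-off is that callers must use the returned pointer, because buf[0] holds the string only when all 8 digits are used.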
Hi Dave, yes. I need a leading 0 on every value, whatever the first digit (0 to F). E.g.: 00, 01234A, 023, 0FFFF, etc.
JJ, I'll try porting it to Masm syntax so you can test it.
Hi JJ, here is the Masm syntax version:
hex_table db '000102030405060708090A0B0C0D0E0F'
db '101112131415161718191A1B1C1D1E1F'
db '202122232425262728292A2B2C2D2E2F'
db '303132333435363738393A3B3C3D3E3F'
db '404142434445464748494A4B4C4D4E4F'
db '505152535455565758595A5B5C5D5E5F'
db '606162636465666768696A6B6C6D6E6F'
db '707172737475767778797A7B7C7D7E7F'
db '808182838485868788898A8B8C8D8E8F'
db '909192939495969798999A9B9C9D9E9F'
db 'A0A1A2A3A4A5A6A7A8A9AAABACADAEAF'
db 'B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF'
db 'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF'
db 'D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF'
db 'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF'
db 'F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF',0
; =============== S U B R O U T I N E =======================================
; Attributes: bp-based frame
Bin2Hex6 proc near
DwordStorage = dword ptr -4
Input = dword ptr 8
Output = dword ptr 0Ch
push ebp
mov ebp, esp
sub esp, 4
push eax
push edi
mov eax, [ebp+Input]
mov edi, [ebp+Output]
mov [ebp+DwordStorage], eax
movzx eax, byte ptr [ebp+DwordStorage+3]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [ebp+DwordStorage+2]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [ebp+DwordStorage+1]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [ebp+DwordStorage]
mov ax, word ptr hex_table[eax*2]
stosw
mov byte ptr [edi], 0
pop edi
pop eax
mov esp, ebp
pop ebp
retn 8
Bin2Hex6 endp
Congrats, Guga, it's much faster than the standard Masm32 algo:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4120 cycles for 100 * dw2hex
7851 cycles for 100 * MB Hex$
52652 cycles for 100 * CRT sprintf
1071 cycles for 100 * Bin2Hex
772 cycles for 100 * Bin2Hex2 cx
1656 cycles for 100 * Bin2Hex6
4115 cycles for 100 * dw2hex
7832 cycles for 100 * MB Hex$
52043 cycles for 100 * CRT sprintf
1072 cycles for 100 * Bin2Hex
771 cycles for 100 * Bin2Hex2 cx
1660 cycles for 100 * Bin2Hex6
4115 cycles for 100 * dw2hex
7820 cycles for 100 * MB Hex$
52089 cycles for 100 * CRT sprintf
1069 cycles for 100 * Bin2Hex
771 cycles for 100 * Bin2Hex2 cx
1658 cycles for 100 * Bin2Hex6
4115 cycles for 100 * dw2hex
7819 cycles for 100 * MB Hex$
52067 cycles for 100 * CRT sprintf
1070 cycles for 100 * Bin2Hex
773 cycles for 100 * Bin2Hex2 cx
1662 cycles for 100 * Bin2Hex6
20 bytes for dw2hex
17 bytes for MB Hex$
29 bytes for CRT sprintf
225 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
616 bytes for Bin2Hex6
00345678 = eax dw2hex
00345678 = eax MB Hex$
345678 = eax CRT sprintf
345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
12345678 = eax Bin2Hex6
Thanks... can you test it while preserving the registers in the other algos too? I would like to compare the true speed.
My version saves the registers it uses (eax and edi), while the other versions don't save anything. I would like to test the functions with the same behaviour so the speed comparison is fair.
Also, when I test it against your Bin2Hex3, mine is still fast, so I don't understand the different speeds.
For the benchmark tests I'm using the GUI version that Steve made, the one that uses the GetTickCount and SleepEx APIs as part of its calibration.
Btw, Dave and guys, I succeeded in making the short-output version, and the speed stayed the same in my tests. It seems fast.
[hex_table: B$ "000102030405060708090A0B0C0D0E0F"
B$ "101112131415161718191A1B1C1D1E1F"
B$ "202122232425262728292A2B2C2D2E2F"
B$ "303132333435363738393A3B3C3D3E3F"
B$ "404142434445464748494A4B4C4D4E4F"
B$ "505152535455565758595A5B5C5D5E5F"
B$ "606162636465666768696A6B6C6D6E6F"
B$ "707172737475767778797A7B7C7D7E7F"
B$ "808182838485868788898A8B8C8D8E8F"
B$ "909192939495969798999A9B9C9D9E9F"
B$ "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
B$ "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
B$ "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
B$ "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
B$ "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
B$ "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF", 0]
Proc Bin2Hex7:
Arguments @Input, @Output
Local @DwordStorage
Uses eax, edi, ecx
mov eax D@Input
mov edi D@Output
mov D@DwordStorage eax
mov B$edi '0' | inc edi
movzx eax B@DwordStorage+3
Test_If eax eax
On ax <= 0F, dec edi
mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+2 | mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
mov B$edi 0
ExitP
Test_End
movzx eax B@DwordStorage+2
Test_If eax eax
On ax <= 0F, dec edi
mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
mov B$edi 0
ExitP
Test_End
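movzx eax B@DwordStorage+1
Test_If eax eax
On ax <= 0F, dec edi
mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+0
Test_Else
; bytes 3..1 are zero: byte 0 alone decides whether the '0' prefix stays
movzx eax B@DwordStorage+0
On ax <= 0F, dec edi
Test_End
mov ax W$hex_table+eax*2 | stosw
mov B$edi 0
EndP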
add this one to your tests :P
0
01
012
0123
01234
012345
0123456
01234567
012345678
Press any key to continue ...
Thanks Dave
It is quite close. The problem is that the string is not stored at the start of the output buffer: EAX returns a pointer into the middle of it, so when the input is short the buffer begins with zero bytes. For example:
[OutputBuff: B$ 0 #12] ; 12 bytes long
call FastHex 01000, OutputBuff
EAX points at "01000", and the buffer holds 0 0 0 0 0 + "01000": five zero bytes at the start, followed by the converted data.
Although the result is correct, this may cause problems if the output is part of a string chain. For example:
[OutputBuff: B$ "Test" 0#256]
mov edi OutputBuff
add edi 4 ;
call FastHex 01000, edi
The result will be:
[OutputBuff: B$ "Test" 0 0 0 0 0
B$ "01000"
B$ 0....]
instead of
[OutputBuff: B$ "Test01000"
B$ 0....]
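One way around the string-chain problem, sketched in C: count the significant nibbles first, then write the digits backwards from dst + length, so the string starts exactly at the address the caller passes and no copy or scan is needed. short_hex_at is a hypothetical name, not one of the thread's algos:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* short_hex_at: hypothetical helper. Counts the significant nibbles first,
   then fills the digits backwards from dst + length, so the string begins
   exactly at dst (e.g. right after "Test"). dst needs room for up to 10 bytes.
   Returns the number of characters written, excluding the terminator. */
static size_t short_hex_at(uint32_t value, char *dst)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t ndigits = 1;                     /* zero still needs one digit */
    for (uint32_t v = value >> 4; v != 0; v >>= 4)
        ndigits++;

    size_t len = ndigits + 1;               /* plus the single leading '0' */
    dst[0] = '0';
    dst[len] = '\0';
    for (size_t i = len - 1; i >= 1; i--) { /* last digit first */
        dst[i] = digits[value & 0xF];
        value >>= 4;
    }
    return len;
}
```

Returning the length also lets the caller advance its write pointer for the next piece of the chain. In assembly the nibble count could probably come from BSR instead of a loop; BSR leaves its destination undefined for a zero source, so zero would need a separate check.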
Concerning speed, I made a couple of tests; here is the result:
(http://i67.tinypic.com/novhty.jpg)
Here is your code ported to RosAsm so that it behaves the same as the one I'm testing.
Both preserve the registers they use internally (with the "uses" macro, which is a simple push/pop). I'm doing this to be sure all functions are tested under the same conditions. The main change I'll try next is to make yours return the length of the converted data in eax, so that both functions behave exactly the same and I get a better idea of their relative speed.
Proc FastHex:
Arguments @Input, @Output
Uses ecx, edx
mov eax D@Output | add eax 8
mov ecx D@Input
test ecx ecx
mov D$eax 03030 | je P2>
FHex00: M6:
movzx edx cl
mov dx W$edx*2+hex_table
mov W$eax dx
sub eax 02
shr ecx 08 | jne M6<
inc eax
mov B$eax 030
FHex01: P2:
cmp B$eax 030
lea eax D$eax+01 | je P2<
sub eax 02
EndP
Btw, I updated my version:
Proc dwtoHex_Ex2:
Arguments @Input, @Output
Local @DwordStorage
Uses edi
mov eax D@Input
mov edi D@Output
mov D@DwordStorage eax
mov B$edi '0' | inc edi
movzx eax B@DwordStorage+3
Test_If eax eax
On ax <= 0F, dec edi
mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+2 | mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
mov B$edi 0
sub edi D@Output | mov eax edi
ExitP
Test_End
movzx eax B@DwordStorage+2
Test_If eax eax
On ax <= 0F, dec edi
mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
mov B$edi 0
sub edi D@Output | mov eax edi
ExitP
Test_End
movzx eax B@DwordStorage+1
Test_If eax eax
On ax <= 0F, dec edi
mov ax W$hex_table+eax*2 | stosw
movzx eax B@DwordStorage+0
Test_Else
movzx eax B@DwordStorage+0
On ax <= 0F, dec edi
Test_End
mov ax W$hex_table+eax*2 | stosw
mov B$edi 0
sub edi D@Output | mov eax edi
EndP
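A C model of the byte-table logic in dwtoHex_Ex2, including its return of the length. dw_to_short_hex is a hypothetical name, and to keep the sketch short it derives each pair from a 16-character digit string instead of the 512-byte hex_table:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const char hex_digits[] = "0123456789ABCDEF";

/* dw_to_short_hex: hypothetical C model of the short-form logic.
   Skip zero high bytes, then:
   - first significant byte >= 0x10: emit '0' followed by its two digits;
   - first significant byte <= 0x0F: its pair already starts with '0', so the
     extra '0' is dropped (the "dec edi" trick);
   then two digits for every remaining byte. Returns the length, like EAX. */
static size_t dw_to_short_hex(uint32_t value, char *out)
{
    int top = 3;
    while (top > 0 && ((value >> (8 * top)) & 0xFF) == 0)
        top--;                                      /* byte 0 is always emitted */

    char *p = out;
    unsigned first = (value >> (8 * top)) & 0xFF;
    if (first > 0x0F)
        *p++ = '0';
    for (int i = top; i >= 0; i--) {
        unsigned b = (value >> (8 * i)) & 0xFF;
        *p++ = hex_digits[b >> 4];
        *p++ = hex_digits[b & 0xF];
    }
    *p = '\0';
    return (size_t)(p - out);
}
```

For 12345678h this gives "012345678" (length 9), the same string FastHex returns in the benchmark output.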
Quote from: dedndave on November 29, 2015, 01:11:41 PM
add this one to your tests :P
It gets a bit crowded now :biggrin:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4166 cycles for 100 * dw2hex
7871 cycles for 100 * MB Hex$
52117 cycles for 100 * CRT sprintf
660 cycles for 100 * Bin2Hex
760 cycles for 100 * Bin2Hex2 cx
1635 cycles for 100 * Bin2Hex6
1981 cycles for 100 * FastHex
4217 cycles for 100 * dw2hex
7799 cycles for 100 * MB Hex$
52408 cycles for 100 * CRT sprintf
659 cycles for 100 * Bin2Hex
759 cycles for 100 * Bin2Hex2 cx
1645 cycles for 100 * Bin2Hex6
1763 cycles for 100 * FastHex
4180 cycles for 100 * dw2hex
7841 cycles for 100 * MB Hex$
52083 cycles for 100 * CRT sprintf
658 cycles for 100 * Bin2Hex
778 cycles for 100 * Bin2Hex2 cx
1656 cycles for 100 * Bin2Hex6
1904 cycles for 100 * FastHex
4214 cycles for 100 * dw2hex
7866 cycles for 100 * MB Hex$
52062 cycles for 100 * CRT sprintf
660 cycles for 100 * Bin2Hex
757 cycles for 100 * Bin2Hex2 cx
1647 cycles for 100 * Bin2Hex6
1995 cycles for 100 * FastHex
20 bytes for dw2hex
17 bytes for MB Hex$
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
616 bytes for Bin2Hex6
66 bytes for FastHex
00345678 = eax dw2hex
00345678 = eax MB Hex$
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G (SSE4)
11919 cycles for 100 * dw2hex
10902 cycles for 100 * MB Hex$
54302 cycles for 100 * CRT sprintf
742 cycles for 100 * Bin2Hex
932 cycles for 100 * Bin2Hex2 cx
2277 cycles for 100 * Bin2Hex6
2171 cycles for 100 * FastHex
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
3566 cycles for 100 * dw2hex
7054 cycles for 100 * MB Hex$
53879 cycles for 100 * CRT sprintf
589 cycles for 100 * Bin2Hex
722 cycles for 100 * Bin2Hex2 cx
1430 cycles for 100 * Bin2Hex6
1497 cycles for 100 * FastHex
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz (SSE4)
3077 cycles for 100 * dw2hex
5258 cycles for 100 * MB Hex$
44338 cycles for 100 * CRT sprintf
526 cycles for 100 * Bin2Hex
521 cycles for 100 * Bin2Hex2 cx
1157 cycles for 100 * Bin2Hex6
1261 cycles for 100 * FastHex
AMD Athlon(tm) II X2 220 Processor (SSE3)
8622 cycles for 100 * dw2hex
8122 cycles for 100 * MB Hex$
78598 cycles for 100 * CRT sprintf
902 cycles for 100 * Bin2Hex
901 cycles for 100 * Bin2Hex2 cx
4208 cycles for 100 * Bin2Hex6
2019 cycles for 100 * FastHex
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
3956 cycles for 100 * dw2hex
7890 cycles for 100 * MB Hex$
50529 cycles for 100 * CRT sprintf
665 cycles for 100 * Bin2Hex
760 cycles for 100 * Bin2Hex2 cx
1550 cycles for 100 * Bin2Hex6
1952 cycles for 100 * FastHex
3955 cycles for 100 * dw2hex
7845 cycles for 100 * MB Hex$
50451 cycles for 100 * CRT sprintf
663 cycles for 100 * Bin2Hex
760 cycles for 100 * Bin2Hex2 cx
1553 cycles for 100 * Bin2Hex6
1974 cycles for 100 * FastHex
3953 cycles for 100 * dw2hex
7889 cycles for 100 * MB Hex$
50417 cycles for 100 * CRT sprintf
662 cycles for 100 * Bin2Hex
762 cycles for 100 * Bin2Hex2 cx
1561 cycles for 100 * Bin2Hex6
2015 cycles for 100 * FastHex
3933 cycles for 100 * dw2hex
7851 cycles for 100 * MB Hex$
50340 cycles for 100 * CRT sprintf
661 cycles for 100 * Bin2Hex
766 cycles for 100 * Bin2Hex2 cx
1584 cycles for 100 * Bin2Hex6
1967 cycles for 100 * FastHex
20 bytes for dw2hex
17 bytes for MB Hex$
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
616 bytes for Bin2Hex6
66 bytes for FastHex
00345678 = eax dw2hex
00345678 = eax MB Hex$
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
JJ, I'm trying to make the test functions behave the same (I mean all inside a regular proc, instead of a void function), but I'm having problems with the syntax in Masm.
I rebuilt the function as:
Bin2Hex6 proc Input:DWORD, Output:DWORD
Local DwordStorage:DWORD
push eax
push edi
mov eax, Input
mov edi, Output
mov DwordStorage, eax
movzx eax, byte ptr [DwordStorage+3]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [DwordStorage+2]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [DwordStorage+1]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [DwordStorage]
mov ax, word ptr hex_table[eax*2]
stosw
mov byte ptr [edi], 0
pop edi
pop eax
ret
Bin2Hex6 endp
NameG equ <Bin2Hex6> ; assign a descriptive name here
TestG proc
mov ebx, AlgoLoops-1 ; loop e.g. 100x
align 4
.Repeat
;push offset somestring
;push 12345678h
invoke Bin2Hex6, 12345678h, offset somestring
dec ebx
.Until Sign?
mov eax, offset somestring
ret
TestG endp
But why can't Masm assemble it? It reports a symbol redefinition. Is this the proper syntax?
Well... using Input and Output as equates is kind of courageous ;)
Equates? I thought they were arguments of the function :icon_mrgreen: It's been a long time since I last used Masm, but I succeeded in porting something that produces output closer to the others. Despite the differences in the way the results are built, the timings are close :)
I hope the syntax is OK now. I rebuilt the Bin2Hex function to work as a proc with 2 arguments:
Bin2Hex proc near
_Input = dword ptr 8
_Output = dword ptr 0Ch
push ebp
mov ebp, esp
push ecx
push edx
push edi
mov eax, [ebp+_Input]
mov edi, [ebp+_Output]
mov edx, offset hex_table ; "000102030405060708090A0B0C0D0E0F1011121"...
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edi+6], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edi+4], cx
shr eax, 10h
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edi+2], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edi], cx
mov byte ptr [edi+8], 0
lea eax, [edi]
pop edi
pop edx
pop ecx
mov esp, ebp
pop ebp
retn 8
Bin2Hex endp
Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz (SSE4)
8867 cycles for 100 * dw2hex
7048 cycles for 100 * MB Hex$
48977 cycles for 100 * CRT sprintf
787 cycles for 100 * Bin2Hex
245 cycles for 100 * Bin2Hex2 cx
1243 cycles for 100 * Bin2Hex6
1679 cycles for 100 * FastHex
4721 cycles for 100 * dw2hex
9396 cycles for 100 * MB Hex$
65490 cycles for 100 * CRT sprintf
1332 cycles for 100 * Bin2Hex
558 cycles for 100 * Bin2Hex2 cx
1598 cycles for 100 * Bin2Hex6
1727 cycles for 100 * FastHex
4414 cycles for 100 * dw2hex
9983 cycles for 100 * MB Hex$
77615 cycles for 100 * CRT sprintf
1574 cycles for 100 * Bin2Hex
682 cycles for 100 * Bin2Hex2 cx
1934 cycles for 100 * Bin2Hex6
1990 cycles for 100 * FastHex
5609 cycles for 100 * dw2hex
11614 cycles for 100 * MB Hex$
81394 cycles for 100 * CRT sprintf
1708 cycles for 100 * Bin2Hex
760 cycles for 100 * Bin2Hex2 cx
1992 cycles for 100 * Bin2Hex6
2185 cycles for 100 * FastHex
20 bytes for dw2hex
17 bytes for MB Hex$
29 bytes for CRT sprintf
139 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
616 bytes for Bin2Hex6
66 bytes for FastHex
00345678 = eax dw2hex
00345678 = eax MB Hex$
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
--- ok ---
Quote from: guga on November 30, 2015, 02:28:00 AM
Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz (SSE4)
787 cycles for 100 * Bin2Hex
245 cycles for 100 * Bin2Hex2 cx
Your i7 is cheating, Guga :eusa_naughty:
My code is fast but 2.45 cycles is fake 8)
Cheating? How is that possible?
I didn't touch Bin2Hex2 cx.
(http://i66.tinypic.com/i2tsw2.jpg)
i suggest you write a little test piece, like i did for FastHex, to verify proper results
if it takes 2 or 3 clock cycles to finish, it's a good bet that it isn't working to begin with
There is something very weird. I isolated JJ's Bin2Hex2 cx so that the app displays only this algo (I deleted all the others), and I keep getting different results every time I run the app.
To get these different results, all I did was open the app, close it, wait 3 to 5 seconds, open it again and close it, and so on.
(http://i65.tinypic.com/2rftvrq.jpg)
Hi Guga,
I was just making fun, but this is indeed weird! I thought it was just a strange outlier. These are my very stable results:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
3386 cycles for 100 * dw2hex
6473 cycles for 100 * MB Hex$
42045 cycles for 100 * CRT sprintf
1141 cycles for 100 * Bin2Hex Guga
613 cycles for 100 * Bin2Hex2 cx
1328 cycles for 100 * Bin2Hex6
1348 cycles for 100 * FastHex
3389 cycles for 100 * dw2hex
6370 cycles for 100 * MB Hex$
42029 cycles for 100 * CRT sprintf
1147 cycles for 100 * Bin2Hex Guga
612 cycles for 100 * Bin2Hex2 cx
1329 cycles for 100 * Bin2Hex6
1633 cycles for 100 * FastHex
Try setting AlgoLoops or TimerLoops ten times higher, sometimes this help to stabilise timings.
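I suggest:
Hex PROC USES EDI __lpszString:LPSTR,__dwNumber:DWORD
mov edi,__lpszString
mov edx,__dwNumber
mov ecx,8 ; always emit 8 digits; exiting on "shl edx,4 / jnz" would drop trailing zero digits (12340000h -> "1234")
@Loop :
mov eax,edx
and eax,0f0000000h
shr eax,28 ; top nibble -> al
cmp al,10
jb @NotAlpha
add al,'A' - 10
jmp @Next
@NotAlpha :
add al,'0'
@Next :
mov Byte Ptr [edi],al
add edi,1
shl edx,4
dec ecx
jnz @Loop
mov Byte Ptr [edi],0
ret ; without RET, execution would run past ENDP
Hex ENDP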
Works fine:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
3371 cycles for 100 * dw2hex
3589 cycles for 100 * Hex$ Grincheux
42320 cycles for 100 * CRT sprintf
526 cycles for 100 * Bin2Hex
605 cycles for 100 * Bin2Hex2 cx
1321 cycles for 100 * Bin2Hex6
1626 cycles for 100 * FastHex
3375 cycles for 100 * dw2hex
3518 cycles for 100 * Hex$ Grincheux
42303 cycles for 100 * CRT sprintf
524 cycles for 100 * Bin2Hex
605 cycles for 100 * Bin2Hex2 cx
1321 cycles for 100 * Bin2Hex6
1626 cycles for 100 * FastHex
JJ, you surprise me; mine is very, very slow. I will rewrite it. How do you compute the cycles? It is very interesting.
Quote from: Grincheux on December 07, 2015, 01:36:26 PM
JJ, you surprise me; mine is very, very slow. I will rewrite it. How do you compute the cycles? It is very interesting.
Philippe,
Your solution is not slow, it's on par with the standard Masm32 algo, and a factor 12 faster than the C runtime library.
Bin2Hex is even faster because it's a table-based solution.
Re cycles, it's complicated but public: open the source in WordPad or RichMasm and search for counter_begin, a macro:
include \masm32\macros\timers.asm ; download from the Masm32 Laboratory (http://masm32.com/board/index.php?topic=49.0)
this is the template i use with Michael's timing routines
;###############################################################################################
.XCREF
.NoList
INCLUDE \Masm32\Include\Masm32rt.inc
.686p
.MMX
.XMM
INCLUDE \Masm32\Macros\Timers.asm
.List
;###############################################################################################
Loop_Count = 10000 ;adjust so that each pass is roughly 0.5 seconds or longer
;###############################################################################################
.DATA
;***********************************************************************************************
.DATA?
;###############################################################################################
.CODE
;***********************************************************************************************
main PROC
INVOKE GetCurrentProcess
INVOKE SetProcessAffinityMask,eax,1
INVOKE Sleep,750
mov ecx,5
Loop00: push ecx
counter_begin Loop_Count,HIGH_PRIORITY_CLASS
;code to time goes here
counter_end
print str$(eax),32
pop ecx
dec ecx
jnz Loop00
print chr$(13,10)
inkey
INVOKE ExitProcess,0
main ENDP
;###############################################################################################
END main
See if this has got any legs.
In the .DATA section.
.data
align 16
hex_table2 \
db "000102030405060708090A0B0C0D0E0F"
db "101112131415161718191A1B1C1D1E1F"
db "202122232425262728292A2B2C2D2E2F"
db "303132333435363738393A3B3C3D3E3F"
db "404142434445464748494A4B4C4D4E4F"
db "505152535455565758595A5B5C5D5E5F"
db "606162636465666768696A6B6C6D6E6F"
db "707172737475767778797A7B7C7D7E7F"
db "808182838485868788898A8B8C8D8E8F"
db "909192939495969798999A9B9C9D9E9F"
db "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
db "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
db "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
db "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
db "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
db "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF"
The algorithm.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
utoh proc valu:DWORD,pbuf:DWORD
; -----------------------------------------------
; convert an unsigned DWORD value to a hex string
; -----------------------------------------------
push ebx
mov eax, [esp+8][4] ; pbuf
mov ebx, [esp+4][4] ; valu
movzx ecx, bh ; 3rd byte
mov dx, WORD PTR [ecx*2+hex_table2]
mov [eax+4], dx
movzx ecx, bl ; 4th byte
mov dx, WORD PTR [ecx*2+hex_table2]
mov [eax+6], dx
bswap ebx
movzx ecx, bl ; 1st byte
mov dx, WORD PTR [ecx*2+hex_table2]
mov [eax], dx
movzx ecx, bh ; 2nd byte
mov dx, WORD PTR [ecx*2+hex_table2]
mov [eax+2], dx
mov DWORD PTR [eax+8], 0 ; terminate buffer
pop ebx
ret 8
utoh endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Note that the buffer should be 12 bytes long.
Quote from: hutch-- on December 08, 2015, 09:34:21 AM
See if this has got any legs.
Quite an improvement on the standard Masm32 algo, and a factor 43 faster than CRT ;-)
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4342 cycles for 100 * dw2hex
1230 cycles for 100 * utoh (Hutch)
52286 cycles for 100 * CRT sprintf
667 cycles for 100 * Bin2Hex
764 cycles for 100 * Bin2Hex2 cx
765 cycles for 100 * Bin2Hex3 ecx
1645 cycles for 100 * Bin2Hex6
1716 cycles for 100 * FastHex
4121 cycles for 100 * dw2hex
1233 cycles for 100 * utoh (Hutch)
52256 cycles for 100 * CRT sprintf
665 cycles for 100 * Bin2Hex
766 cycles for 100 * Bin2Hex2 cx
763 cycles for 100 * Bin2Hex3 ecx
1661 cycles for 100 * Bin2Hex6
2029 cycles for 100 * FastHex
4122 cycles for 100 * dw2hex
1221 cycles for 100 * utoh (Hutch)
52273 cycles for 100 * CRT sprintf
671 cycles for 100 * Bin2Hex
766 cycles for 100 * Bin2Hex2 cx
768 cycles for 100 * Bin2Hex3 ecx
1655 cycles for 100 * Bin2Hex6
1848 cycles for 100 * FastHex
Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz (SSE4)
3717 cycles for 100 * dw2hex
950 cycles for 100 * utoh (Hutch)
60822 cycles for 100 * CRT sprintf
659 cycles for 100 * Bin2Hex
659 cycles for 100 * Bin2Hex2 cx
659 cycles for 100 * Bin2Hex3 ecx
1515 cycles for 100 * Bin2Hex6
1621 cycles for 100 * FastHex
3716 cycles for 100 * dw2hex
950 cycles for 100 * utoh (Hutch)
60830 cycles for 100 * CRT sprintf
658 cycles for 100 * Bin2Hex
659 cycles for 100 * Bin2Hex2 cx
659 cycles for 100 * Bin2Hex3 ecx
1515 cycles for 100 * Bin2Hex6
1621 cycles for 100 * FastHex
3716 cycles for 100 * dw2hex
950 cycles for 100 * utoh (Hutch)
60818 cycles for 100 * CRT sprintf
658 cycles for 100 * Bin2Hex
659 cycles for 100 * Bin2Hex2 cx
659 cycles for 100 * Bin2Hex3 ecx
1514 cycles for 100 * Bin2Hex6
1621 cycles for 100 * FastHex
00345678 = eax dw2hex
12345678 = eax utoh (Hutch)
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
00345678 = eax Bin2Hex3 ecx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
3855 cycles for 100 * dw2hex
1098 cycles for 100 * utoh (Hutch)
50477 cycles for 100 * CRT sprintf
665 cycles for 100 * Bin2Hex
760 cycles for 100 * Bin2Hex2 cx
767 cycles for 100 * Bin2Hex3 ecx
1550 cycles for 100 * Bin2Hex6
1859 cycles for 100 * FastHex
3848 cycles for 100 * dw2hex
1113 cycles for 100 * utoh (Hutch)
50596 cycles for 100 * CRT sprintf
661 cycles for 100 * Bin2Hex
763 cycles for 100 * Bin2Hex2 cx
759 cycles for 100 * Bin2Hex3 ecx
1557 cycles for 100 * Bin2Hex6
1762 cycles for 100 * FastHex
3849 cycles for 100 * dw2hex
1111 cycles for 100 * utoh (Hutch)
50491 cycles for 100 * CRT sprintf
661 cycles for 100 * Bin2Hex
760 cycles for 100 * Bin2Hex2 cx
766 cycles for 100 * Bin2Hex3 ecx
1536 cycles for 100 * Bin2Hex6
1962 cycles for 100 * FastHex
00345678 = eax dw2hex
12345678 = eax utoh (Hutch)
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
00345678 = eax Bin2Hex3 ecx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
--- ok ---
There was not much to tweak in this algo, but I reused ECX, freed up EDX and removed the push/pop of EBX, and with a fall-through algo like this the times did improve. I don't have the right stuff set up to use JJ's testbed, so I could not test it against the other reference algos.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
align 16
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
utoh proc valu:DWORD,pbuf:DWORD
; -----------------------------------------------
; convert an unsigned DWORD value to a hex string
; the "pbuf" address is also returned in EAX
; -----------------------------------------------
mov edx, [esp+4] ; valu
mov eax, [esp+8] ; pbuf
movzx ecx, dl ; 4th byte
mov cx, WORD PTR [ecx*2+hex_table2]
mov [eax+6], cx
movzx ecx, dh ; 3rd byte
mov cx, WORD PTR [ecx*2+hex_table2]
mov [eax+4], cx
bswap edx
movzx ecx, dl ; 1st byte
mov cx, WORD PTR [ecx*2+hex_table2]
mov [eax], cx
movzx ecx, dh ; 2nd byte
mov cx, WORD PTR [ecx*2+hex_table2]
mov [eax+2], cx
mov DWORD PTR [eax+8], 0 ; terminate buffer
ret 8
utoh endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
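For readers following the logic rather than the exact instructions, here is a rough C sketch of what utoh does: one 256-entry table of two-character strings, indexed once per byte of the DWORD. The contents of hex_table2 are not shown in the post, so this sketch assumes it holds uppercase ASCII pairs ("00" to "FF"); the names init_hex_table2 and utoh_c are hypothetical. The asm uses bswap to reach the two high bytes through DL/DH, while C just shifts.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: 256 entries x 2 chars = 512 bytes, indexed in the asm
   as WORD PTR [ecx*2+hex_table2]. Uppercase digits are an assumption. */
static char hex_table2[256][2];

/* Hypothetical helper: fill the table with "00".."FF". */
static void init_hex_table2(void) {
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        hex_table2[i][0] = digits[i >> 4];    /* high nibble */
        hex_table2[i][1] = digits[i & 0x0F];  /* low nibble  */
    }
}

/* Writes exactly 8 hex digits plus a terminator and returns buf,
   the way utoh returns pbuf in EAX. No leading zeros are stripped. */
static char *utoh_c(uint32_t value, char *buf) {
    memcpy(buf + 6, hex_table2[value & 0xFF], 2);          /* lowest byte  */
    memcpy(buf + 4, hex_table2[(value >> 8) & 0xFF], 2);
    memcpy(buf + 2, hex_table2[(value >> 16) & 0xFF], 2);
    memcpy(buf + 0, hex_table2[value >> 24], 2);           /* highest byte */
    buf[8] = '\0';
    return buf;
}
```

Usage: call init_hex_table2() once, then utoh_c(0x2A, buf) gives "0000002A". Note that the asm version stores a whole DWORD of zeros at [eax+8], so its buffer must be at least 12 bytes, not 9.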
utoh: 1183 -> 1061 cycles :t
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4119 cycles for 100 * dw2hex
1061 cycles for 100 * utoh (Hutch)
52100 cycles for 100 * CRT sprintf
659 cycles for 100 * Bin2Hex
758 cycles for 100 * Bin2Hex2 cx
756 cycles for 100 * Bin2Hex3 ecx
1646 cycles for 100 * Bin2Hex6
1750 cycles for 100 * FastHex
I did a couple of other variants, put them in a rough benchmark and zipped it up. The FASTCALL variant is clearly faster than the others, as it's free of stack overhead.
-------------
test accuracy
-------------
12345678 FASTCALL
12345678 NO STACK FRAME
12345678 NO STACK FRAME VARIANT
12345678 WITH STACK FRAME
------------
algo timings
------------
358 FASTCALL
577 NO STACK FRAME
608 NO STACK FRAME VARIANT
639 WITH STACK FRAME
Press any key to continue ...
Quote from: hutch-- on December 09, 2015, 09:08:03 AM
The FASTCALL variant is clearly faster than the others as it's free of stack overhead.
I included it in the testbed:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4114 cycles for 100 * dw2hex
670 cycles for 100 * utoh (Hutch)
52229 cycles for 100 * CRT sprintf
661 cycles for 100 * Bin2Hex
761 cycles for 100 * Bin2Hex2 cx
759 cycles for 100 * Bin2Hex3 ecx
1650 cycles for 100 * Bin2Hex6
1674 cycles for 100 * FastHex
Bin2Hex preserves ecx (MasmBasic ABI ;)), but that has no influence on the timings.
For comparison:
Hutch:
; value to convert passed in ECX
; output buffer address passed in EAX
; -----------------------------------
movzx edx, cl ; 4th byte
mov dx, WORD PTR [edx*2+hex_table2]
mov [eax+6], dx
movzx edx, ch ; 3rd byte
mov dx, WORD PTR [edx*2+hex_table2]
mov [eax+4], dx
bswap ecx
movzx edx, cl ; 1st byte
mov dx, WORD PTR [edx*2+hex_table2]
mov [eax], dx
movzx edx, ch ; 2nd byte
mov dx, WORD PTR [edx*2+hex_table2]
mov [eax+2], dx
mov DWORD PTR [eax+8], 0 ; terminate buffer
ret
JJ:
Bin2Hex proc ; value passed in eax, pBuffer returned in eax
mov edx, offset xHex
cmp dword ptr [edx], 0
je crtHexTable
push ecx
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+6], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+4], cx
shr eax, 16
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+2], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+0], cx
lea eax, [edx+512] ; return pBuffer
pop ecx
ret
Hutch's version needs 600 bytes, mine 138; the reason is simply that mine builds its 512-byte table once, at runtime on the first call, instead of shipping it as static data. Note that the currently attached version does not require MasmBasic.
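The size difference comes from lazy initialisation: Bin2Hex checks whether the first DWORD of xHex is still zero and, if so, jumps to crtHexTable to build the table before converting. A rough C sketch of that idea follows. The crtHexTable routine is not shown in the post, so create_hex_table is a hypothetical stand-in, and the layout (table at xhex, output string at xhex+512) is an assumption based on the [edx+512] stores.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: 512-byte table followed by a small output area,
   mirroring the [edx+512] writes in Bin2Hex. Static, so zero-filled. */
static char xhex[512 + 12];

/* Hypothetical stand-in for crtHexTable: build "00".."FF" pairs. */
static void create_hex_table(void) {
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        xhex[2 * i]     = digits[i >> 4];
        xhex[2 * i + 1] = digits[i & 0x0F];
    }
}

/* Returns a pointer into the shared output area, as Bin2Hex returns
   pBuffer in EAX; each call overwrites the previous result. */
static char *bin2hex_c(uint32_t value) {
    if (xhex[0] == 0)                 /* first call: table not built yet */
        create_hex_table();
    char *out = xhex + 512;
    for (int k = 0; k < 4; k++) {
        unsigned b = (value >> (8 * (3 - k))) & 0xFF;  /* highest byte first */
        memcpy(out + 2 * k, xhex + 2 * b, 2);
    }
    out[8] = '\0';
    return out;
}
```

The trade-off is one compare per call and a non-reentrant shared buffer, in exchange for roughly 460 fewer bytes in the executable.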
Thanks for adding that, I had no way to compare these algos to the ones in your benchmark.
Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz (SSE4)
3721 cycles for 100 * dw2hex
569 cycles for 100 * utoh (Hutch)
60842 cycles for 100 * CRT sprintf
667 cycles for 100 * Bin2Hex
690 cycles for 100 * Bin2Hex2 cx
665 cycles for 100 * Bin2Hex3 ecx
1543 cycles for 100 * Bin2Hex6
1629 cycles for 100 * FastHex
3722 cycles for 100 * dw2hex
569 cycles for 100 * utoh (Hutch)
61080 cycles for 100 * CRT sprintf
664 cycles for 100 * Bin2Hex
662 cycles for 100 * Bin2Hex2 cx
665 cycles for 100 * Bin2Hex3 ecx
1521 cycles for 100 * Bin2Hex6
1627 cycles for 100 * FastHex
3721 cycles for 100 * dw2hex
569 cycles for 100 * utoh (Hutch)
60834 cycles for 100 * CRT sprintf
664 cycles for 100 * Bin2Hex
665 cycles for 100 * Bin2Hex2 cx
665 cycles for 100 * Bin2Hex3 ecx
1521 cycles for 100 * Bin2Hex6
1627 cycles for 100 * FastHex
20 bytes for dw2hex
600 bytes for utoh (Hutch)
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
214 bytes for Bin2Hex3 ecx
616 bytes for Bin2Hex6
66 bytes for FastHex
00345678 = eax dw2hex
12345678 = eax utoh (Hutch)
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
00345678 = eax Bin2Hex3 ecx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
--- ok ---
AMD Athlon(tm) II X2 220 Processor (SSE3) 2.8 GHz
8781 cycles for 100 * dw2hex
797 cycles for 100 * utoh (Hutch)
78313 cycles for 100 * CRT sprintf
903 cycles for 100 * Bin2Hex
903 cycles for 100 * Bin2Hex2 cx
933 cycles for 100 * Bin2Hex3 ecx
4241 cycles for 100 * Bin2Hex6
2025 cycles for 100 * FastHex
8676 cycles for 100 * dw2hex
797 cycles for 100 * utoh (Hutch)
78244 cycles for 100 * CRT sprintf
908 cycles for 100 * Bin2Hex
922 cycles for 100 * Bin2Hex2 cx
919 cycles for 100 * Bin2Hex3 ecx
4227 cycles for 100 * Bin2Hex6
2034 cycles for 100 * FastHex
8712 cycles for 100 * dw2hex
810 cycles for 100 * utoh (Hutch)
78209 cycles for 100 * CRT sprintf
991 cycles for 100 * Bin2Hex
916 cycles for 100 * Bin2Hex2 cx
925 cycles for 100 * Bin2Hex3 ecx
4480 cycles for 100 * Bin2Hex6
2191 cycles for 100 * FastHex
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
23333 cycles for 100 * dw2hex
1261 cycles for 100 * utoh (Hutch)
173108 cycles for 100 * CRT sprintf
1684 cycles for 100 * Bin2Hex
1480 cycles for 100 * Bin2Hex2 cx
1549 cycles for 100 * Bin2Hex3 ecx
5593 cycles for 100 * Bin2Hex6
2880 cycles for 100 * FastHex
23375 cycles for 100 * dw2hex
1266 cycles for 100 * utoh (Hutch)
173666 cycles for 100 * CRT sprintf
1460 cycles for 100 * Bin2Hex
1485 cycles for 100 * Bin2Hex2 cx
1474 cycles for 100 * Bin2Hex3 ecx
5608 cycles for 100 * Bin2Hex6
2874 cycles for 100 * FastHex
23111 cycles for 100 * dw2hex
1275 cycles for 100 * utoh (Hutch)
172950 cycles for 100 * CRT sprintf
1465 cycles for 100 * Bin2Hex
1481 cycles for 100 * Bin2Hex2 cx
1481 cycles for 100 * Bin2Hex3 ecx
5592 cycles for 100 * Bin2Hex6
2805 cycles for 100 * FastHex
Hi,
Quote from: jj2007 on December 09, 2015, 01:37:36 PM
Note that the currently attached version does not require MasmBasic.
In that case, some oldies. Somewhat odd utoh results for
the P-MMX?
{P-MMX}
pre-P4
15186 cycles for 100 * dw2hex
7585 cycles for 100 * utoh (Hutch)
253408 cycles for 100 * CRT sprintf
7057 cycles for 100 * Bin2Hex
7626 cycles for 100 * Bin2Hex2 cx
7585 cycles for 100 * Bin2Hex3 ecx
8991 cycles for 100 * Bin2Hex6
6013 cycles for 100 * FastHex
14894 cycles for 100 * dw2hex
8126 cycles for 100 * utoh (Hutch)
245623 cycles for 100 * CRT sprintf
7622 cycles for 100 * Bin2Hex
7604 cycles for 100 * Bin2Hex2 cx
7590 cycles for 100 * Bin2Hex3 ecx
8918 cycles for 100 * Bin2Hex6
5857 cycles for 100 * FastHex
13313 cycles for 100 * dw2hex
7410 cycles for 100 * utoh (Hutch)
245176 cycles for 100 * CRT sprintf
8035 cycles for 100 * Bin2Hex
7633 cycles for 100 * Bin2Hex2 cx
6907 cycles for 100 * Bin2Hex3 ecx
9412 cycles for 100 * Bin2Hex6
6373 cycles for 100 * FastHex
20 bytes for dw2hex
600 bytes for utoh (Hutch)
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
214 bytes for Bin2Hex3 ecx
616 bytes for Bin2Hex6
66 bytes for FastHex
00345678 = eax dw2hex
12345678 = eax utoh (Hutch)
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
00345678 = eax Bin2Hex3 ecx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
--- ok ---
{P-III}
pre-P4 (SSE1)
7668 cycles for 100 * dw2hex
1311 cycles for 100 * utoh (Hutch)
166093 cycles for 100 * CRT sprintf
1614 cycles for 100 * Bin2Hex
2016 cycles for 100 * Bin2Hex2 cx
1617 cycles for 100 * Bin2Hex3 ecx
3698 cycles for 100 * Bin2Hex6
5975 cycles for 100 * FastHex
7687 cycles for 100 * dw2hex
1312 cycles for 100 * utoh (Hutch)
166001 cycles for 100 * CRT sprintf
1728 cycles for 100 * Bin2Hex
2025 cycles for 100 * Bin2Hex2 cx
1652 cycles for 100 * Bin2Hex3 ecx
3717 cycles for 100 * Bin2Hex6
5958 cycles for 100 * FastHex
7719 cycles for 100 * dw2hex
1322 cycles for 100 * utoh (Hutch)
166667 cycles for 100 * CRT sprintf
1625 cycles for 100 * Bin2Hex
2023 cycles for 100 * Bin2Hex2 cx
1613 cycles for 100 * Bin2Hex3 ecx
3718 cycles for 100 * Bin2Hex6
5966 cycles for 100 * FastHex
20 bytes for dw2hex
600 bytes for utoh (Hutch)
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
214 bytes for Bin2Hex3 ecx
616 bytes for Bin2Hex6
66 bytes for FastHex
00345678 = eax dw2hex
12345678 = eax utoh (Hutch)
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
00345678 = eax Bin2Hex3 ecx
12345678 = eax Bin2Hex6
012345678 = eax FastHex
--- ok ---
Regards,
Steve N.
Steve,
The variations on the older hardware are probably due to the different handling of the Intel complex addressing modes. It's been a long time since I worked on a PIII or earlier, but from memory you preferably loaded the table address into a register first and then indexed from that register, so the address was built from registers only rather than from a table OFFSET combined with registers. The techniques changed with the PIV and then again with the Core2 series. So far the i7 I am using behaves much like the Core2 Quad I used to use, and that is what I tested the last set of algos on.
Hi Steve,
Thanks for the reply. So complex addressing is the difference.
I should have remembered that from when I first got a Pentium Pro
computer. It performed quite a bit better than the Pentiums that
were then current, if you had a good 32-bit compiler. 16-bit stuff
was faster only due to the faster clock and better cache.
Regards,
Steve N.
I like the Pentium MMX - it makes my code look the best :lol:
:greenclp:
Dave,
If you can optimise code for a Prescott you will find the later processors easy in comparison. The Prescott processors had the longest and fussiest pipeline of any of the Intel processors I remember.
I would say it's easiest to optimize for whatever processor you are currently using - lol
Dave
Which processors support the RDTSCP opcode? i7 and above only?
And RDTSC? Does it exist on a PII or PIII too? (If so, how do I detect it?)
I mean, RDTSCP is detectable through bit 27 of EDX returned by CPUID function 80000001h. But which bit is used to recognize RDTSC?
I'm not sure if RDTSC exists in all Pentium series or only from the PII or PIII and above.
I believe RDTSC is supported on all Pentiums and newer (that's from memory, so check it).
The CPUID bit is named TSC:
it is CPUID function 1, EDX bit 4.
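A small C sketch of both checks, assuming GCC or Clang on x86, where <cpuid.h> provides __get_cpuid (it returns 0 when the requested leaf is not supported). TSC is leaf 1, EDX bit 4; RDTSCP is leaf 80000001h, EDX bit 27. The helper names tsc_bit, rdtscp_bit and report_tsc_support are hypothetical; other compilers fall through to the #else branch.

```c
#include <stdint.h>
#include <stdio.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* CPUID leaf 1, EDX bit 4 (TSC): RDTSC is available. */
static int tsc_bit(uint32_t edx_leaf1) { return (edx_leaf1 >> 4) & 1; }

/* CPUID leaf 0x80000001, EDX bit 27: RDTSCP is available. */
static int rdtscp_bit(uint32_t edx_ext1) { return (edx_ext1 >> 27) & 1; }

static void report_tsc_support(void) {
#ifdef HAVE_GNU_CPUID
    unsigned int a, b, c, d;
    if (__get_cpuid(1, &a, &b, &c, &d))
        printf("RDTSC:  %s\n", tsc_bit(d) ? "yes" : "no");
    else
        puts("RDTSC:  CPUID leaf 1 not supported");
    if (__get_cpuid(0x80000001u, &a, &b, &c, &d))
        printf("RDTSCP: %s\n", rdtscp_bit(d) ? "yes" : "no");
    else
        puts("RDTSCP: extended leaf 80000001h not supported");
#else
    puts("CPUID check not implemented for this compiler/target");
#endif
}
```

In assembly the same test is just mov eax, 1 / cpuid / test edx, 10h for TSC, and mov eax, 80000001h / cpuid / test edx, 8000000h for RDTSCP (after checking that 80000000h returns a maximum extended leaf of at least 80000001h).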
Many thanks, Dave :t