The MASM Forum

General => The Laboratory => Topic started by: jj2007 on April 01, 2024, 03:42:50 AM

Title: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 01, 2024, 03:42:50 AM
Inspired by ahsat's Binary to displayable text, any base (https://masm32.com/board/index.php?topic=11817.0) thread, I searched the Lab for our fastest dword-to-string. To my surprise, there is almost nothing; please correct me if I am wrong.

So I fumbled together a testbed. Can I have some timings, please? Even better: some really fast algos?

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

4471    cycles for 100 * dwtoa
31143   cycles for 100 * dw2a
3531    cycles for 100 * dw2str

4428    cycles for 100 * dwtoa
31325   cycles for 100 * dw2a
3456    cycles for 100 * dw2str

4463    cycles for 100 * dwtoa
31232   cycles for 100 * dw2a
3440    cycles for 100 * dw2str

4400    cycles for 100 * dwtoa
31364   cycles for 100 * dw2a
3458    cycles for 100 * dw2str

4440    cycles for 100 * dwtoa
31783   cycles for 100 * dw2a
3472    cycles for 100 * dw2str

Averages:
4444    cycles for dwtoa
31307   cycles for dw2a
3462    cycles for dw2str

20      bytes for dwtoa
20      bytes for dw2a
74      bytes for dw2str

dwtoa                                   123456789
dw2a                                    123456789
dw2str                                  123456789

dw2a is CRT printf
dw2str is an adaption of the Masm32 lib dwtoa algo
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 01, 2024, 04:48:49 AM
Version 2, with Ray's algo:
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

4485    cycles for 100 * dwtoa
3452    cycles for 100 * dw2str
16911  cycles for 100 * MasmBasic Str$()
12479  cycles for 100 * Ray's algo

4412    cycles for 100 * dwtoa
3480    cycles for 100 * dw2str
17000  cycles for 100 * MasmBasic Str$()
12466  cycles for 100 * Ray's algo

4446    cycles for 100 * dwtoa
3470    cycles for 100 * dw2str
16844  cycles for 100 * MasmBasic Str$()
12497  cycles for 100 * Ray's algo

4400    cycles for 100 * dwtoa
3444    cycles for 100 * dw2str
16814  cycles for 100 * MasmBasic Str$()
12511  cycles for 100 * Ray's algo

Averages:
4429    cycles for dwtoa
3461    cycles for dw2str
16878  cycles for MasmBasic Str$()
12488  cycles for Ray's algo

20      bytes for dwtoa
74      bytes for dw2str
16      bytes for MasmBasic Str$()
138    bytes for Ray's algo

dwtoa                                  123456789
dw2a                                    123456789
dw2str                                  123456789
MasmBasic Str$()                        123456789
Ray's algo                              123456789

Here is Ray's algo, with minor modifications:
;----------------------------------------------------------------------------
;          Original by Ray Gwinn
; Any number to any base with a specified digit length, will pad leading zeros.
; Parameters:
;   eax    The number to convert
;   ebx    The desired base
;   ecx    The desired digit count
;   edi    Where to place the converted string
;----------------------------------------------------------------------------
ctAnyToAny:
  push edx ; save edx, or a digit
  cdq ; same as xor edx, edx: zero edx
  div ebx ; generate next digit in edx
  dec ecx ; decrement digit count
  .if !Zero? ; br if done
call ctAnyToAny ; generate next digit
  .endif
  mov al, @digits[edx] ; get the ascii value
  stosb ; save it
  pop edx ; restore edx, or get next digit
  ret
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 01, 2024, 06:22:32 AM
13th Gen Intel(R) Core(TM) i9-13900KF (SSE4)

2155    cycles for 100 * dwtoa
1928    cycles for 100 * dw2str
11807   cycles for 100 * MasmBasic Str$()
2912    cycles for 100 * Ray's algo

2137    cycles for 100 * dwtoa
1897    cycles for 100 * dw2str
11673   cycles for 100 * MasmBasic Str$()
2922    cycles for 100 * Ray's algo

2133    cycles for 100 * dwtoa
1913    cycles for 100 * dw2str
11662   cycles for 100 * MasmBasic Str$()
2934    cycles for 100 * Ray's algo

2133    cycles for 100 * dwtoa
1897    cycles for 100 * dw2str
11687   cycles for 100 * MasmBasic Str$()
2924    cycles for 100 * Ray's algo

Averages:
2135    cycles for dwtoa
1905    cycles for dw2str
11680   cycles for MasmBasic Str$()
2923    cycles for Ray's algo

20      bytes for dwtoa
74      bytes for dw2str
16      bytes for MasmBasic Str$()
138     bytes for Ray's algo

dwtoa                                   123456789
dw2a                                    123456789
dw2str                                  123456789
MasmBasic Str$()                        123456789
Ray's algo                              123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: TimoVJL on April 01, 2024, 06:59:01 AM
AMD Athlon(tm) II X2 220 Processor (SSE3)

7967    cycles for 100 * dwtoa
7799    cycles for 100 * dw2str
27640  cycles for 100 * MasmBasic Str$()
31049  cycles for 100 * Ray's algo

7999    cycles for 100 * dwtoa
7750    cycles for 100 * dw2str
27413  cycles for 100 * MasmBasic Str$()
31224  cycles for 100 * Ray's algo

7968    cycles for 100 * dwtoa
7747    cycles for 100 * dw2str
27365  cycles for 100 * MasmBasic Str$()
31046  cycles for 100 * Ray's algo

7965    cycles for 100 * dwtoa
7747    cycles for 100 * dw2str
27587  cycles for 100 * MasmBasic Str$()
31092  cycles for 100 * Ray's algo

Averages:
7968    cycles for dwtoa
7748    cycles for dw2str
27500  cycles for MasmBasic Str$()
31070  cycles for Ray's algo

20      bytes for dwtoa
74      bytes for dw2str
16      bytes for MasmBasic Str$()
138    bytes for Ray's algo

dwtoa                                  123456789
dw2a                                    123456789
dw2str                                  123456789
MasmBasic Str$()                        123456789
Ray's algo                              123456789

--- ok ---
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 01, 2024, 07:25:55 AM
Any base that is a power of two, such as, 2=bianry, 8=octal, 16=hex, will convert faster because they can be done with shifts, avoiding timely multiply and divide instructions. They will always be faster than the recursive code.

Any base that is not a power of two, like our everyday base 10=decimal and base 60=Sexagesimal :-), will convert faster using the recursive code.

Don't forget that in the everyday world, base 10 is the most used.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 01, 2024, 07:50:13 AM
Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (SSE4)

3709    cycles for 100 * dwtoa
3567    cycles for 100 * dw2str
17710   cycles for 100 * MasmBasic Str$()
11878   cycles for 100 * Ray's algo

4336    cycles for 100 * dwtoa
3609    cycles for 100 * dw2str
15036   cycles for 100 * MasmBasic Str$()
9591    cycles for 100 * Ray's algo

3862    cycles for 100 * dwtoa
3340    cycles for 100 * dw2str
13489   cycles for 100 * MasmBasic Str$()
10160   cycles for 100 * Ray's algo

4346    cycles for 100 * dwtoa
3768    cycles for 100 * dw2str
16407   cycles for 100 * MasmBasic Str$()
10623   cycles for 100 * Ray's algo
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 01, 2024, 08:00:16 AM
Quote from: sinsi on April 01, 2024, 06:22:32 AM13th Gen Intel(R) Core(TM) i9-13900KF (SSE4)
Intel must have really improved the speed of a divide on this processor.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 01, 2024, 09:17:19 AM
Version 3, dword to decimal string, this time with a negative number:

Averages:
4670    cycles for dwtoa
2923    cycles for dw2str
17191   cycles for MasmBasic Str$()
14168   cycles for Ray's algo

20      bytes for dwtoa
74      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo                              171510507

For positive numbers, all algos work fine. My dw2str got a little speed boost, strangely enough also thanks to a jecxz :cool:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 01, 2024, 09:37:41 AM
OK, it's spring, and apparently that's time for a favorite indoor sport around here: How Fast is My Algo?

I always have to laugh at all this attention paid here to speed when it comes to numeric-conversion routines, which seem to be the favorite category in this sport. Why? Because it's all so ridiculous; really, except in a few cases, who cares how fast they are? Usually these routines will be used to either accept or display user input/output, in which case the time it takes to do the conversion is grossly swamped by the input/output time.

But "Oh, look, my a2bin function is a whole 25 microseconds faster than yours!".

But I suppose it's a harmless sport at that. And so far as our OP here goes (here's looking at you, @ahsat), it's all part of your assembly learning experience, so I won't begrudge you that. Just don't fall for the hype here.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 01, 2024, 09:51:27 AM
Oh dear... can you explain, in a few words, why you are hanging around here? Apparently, it's not for assembly programming.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 01, 2024, 09:57:18 AM
I just like pissing you off, JJ.

Besides, as you well know, I've posted plenty of code here and continue to do so.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 01, 2024, 10:00:49 AM
Quote from: NoCforMe on April 01, 2024, 09:37:41 AMOK, it's spring, and apparently that's time for a favorite indoor sport around here: How Fast is My Algo?

I always have to laugh at all this attention paid here to speed when it comes to numeric-conversion routines, which seem to be the favorite category in this sport. Why? Because it's all so ridiculous; really, except in a few cases, who cares how fast they are? Usually these routines will be used to either accept or display user input/output, in which case the time it takes to do the conversion is grossly swamped by the input/output time.

But I suppose it's a harmless sport at that. And so far as our OP here goes (here's looking at you, @ahsat), it's all part of your assembly learning experience, so I won't begrudge you that. Just don't fall for the hype here.
So when Excel wants to show 10000 integers in a spreadsheet an algo 4 times slower is enough?


More results
13th Gen Intel(R) Core(TM) i9-13900KF (SSE4)

3219    cycles for 100 * dwtoa
2059    cycles for 100 * dw2str
14581   cycles for 100 * MasmBasic Str$()
3617    cycles for 100 * Ray's algo

2692    cycles for 100 * dwtoa
2056    cycles for 100 * dw2str
14797   cycles for 100 * MasmBasic Str$()
3627    cycles for 100 * Ray's algo

3120    cycles for 100 * dwtoa
2063    cycles for 100 * dw2str
11700   cycles for 100 * MasmBasic Str$()
2900    cycles for 100 * Ray's algo

2201    cycles for 100 * dwtoa
1669    cycles for 100 * dw2str
11825   cycles for 100 * MasmBasic Str$()
2916    cycles for 100 * Ray's algo

Averages:
2906    cycles for dwtoa
2058    cycles for dw2str
13203   cycles for MasmBasic Str$()
3266    cycles for Ray's algo

20      bytes for dwtoa
74      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo                              171510507

--- ok ---
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 01, 2024, 10:02:21 AM
Quote from: NoCforMe on April 01, 2024, 09:57:18 AMI just like pissing you off, JJ.

Besides, as you well know, I've posted plenty of code here and continue to do so.

No problem. From now on, I will intervene under each and every of your coding attempts and leave some crispy remarks. I'm quite good at that game, too :thup:

Btw what happened to your funny "cubby holes" editor (https://masm32.com/board/index.php?topic=10319.0)? Any chance to see a release version this year?
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 01, 2024, 10:07:05 AM
Quote from: sinsi on April 01, 2024, 10:00:49 AM
Quote from: NoCforMe on April 01, 2024, 09:37:41 AMOK, it's spring, and apparently that's time for a favorite indoor sport around here: How Fast is My Algo?

I always have to laugh at all this attention paid here to speed when it comes to numeric-conversion routines, which seem to be the favorite category in this sport. Why? Because it's all so ridiculous; really, except in a few cases, who cares how fast they are? Usually these routines will be used to either accept or display user input/output, in which case the time it takes to do the conversion is grossly swamped by the input/output time.
So when Excel wants to show 10000 integers in a spreadsheet an algo 4 times slower is enough?
As I wrote: except in a few cases, who cares how fast they are?. That's one of those few cases.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 01, 2024, 10:10:12 AM
Quote from: jj2007 on April 01, 2024, 10:02:21 AMBtw what happened to your funny "cubby holes" editor (https://masm32.com/board/index.php?topic=10319.0)? Any chance to see a release version this year?
Yes, good question and thanks for asking. Yes, there'll be a new "release" coming along. And I think that's a perfectly fine (and humorous) way to describe it. (I do wish I had a bit more feedback on it, though.)

And of course I wish you the best of luck with your editor efforts (which I have admitted have some very nice functionality that mine lacks).
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 01, 2024, 10:24:11 AM
@Sinsi: thanks :thumbsup:

Quote from: ahsat on April 01, 2024, 08:00:16 AMIntel must have really improved the speed of a divide on this processor.

The div ebx is indeed the bottleneck in your algo. I've tried to replace it with a mul 0CCCCCCCDh but with little success, speedwise. It's a pity because your algo is really very elegant and short :thumbsup:

ctAnyToAny:
  push edx ; save edx, or a digit
  xor edx, edx ; same as cdq but maybe a tick faster
  div ebx ; generate next digit in edx
  ; dec ecx ; decrement digit count
  test eax, eax ; anything left?
  .if !Zero? ; br if done
call ctAnyToAny ; generate next digit
  .endif
  mov al, @digits[edx] ; get the ascii value
  stosb ; save it
  pop edx ; restore edx, or get next digit
  ret

Btw you could drop the digit counter and instead check if anything is left in eax:
  div ebx ; generate next digit in edx
  ; dec ecx ; decrement digit count
  test eax, eax ; anything left?
  .if !Zero? ; br if done
call ctAnyToAny ; generate next digit
  .endif

Works fine and leaves one register intact. Speedwise there is no difference, it seems.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 01, 2024, 11:02:59 AM
Quote from: NoCforMe on April 01, 2024, 10:07:05 AMAs I wrote: except in a few cases, who cares how fast they are?. That's one of those few cases.
Or Explorer showing the details of 1000 files.
Or 7-zip showing a 20,000 file archive in flat view.
Anyway, I wouldn't class Excel (or Access) as "a few cases", seeing how well Office sells  :biggrin:

For me, a more useful algo would be to have thousand separators.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 01, 2024, 11:41:47 AM
Rather than continue this (tangential) discussion here, I've started a new topic in the Soap Box (https://masm32.com/board/index.php?topic=11822.0) in this whole algorithm-speed thing.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 01, 2024, 11:47:56 AM
Quote from: sinsi on April 01, 2024, 11:02:59 AMFor me, a more useful algo would be to have thousand separators.
If you're talking about putting in, say, commas every 3 digits:

3,549,782

then here's my code that does that:
;====================================================================
; CommaNum (inputBuffer, outputBuffer)
;
; Number comma-formatting routine.
; On entry,
; inputBuffer has ASCII number string (zero-terminated)
;
; On exit,
; outputBuffer contains the comma-formatted number string.
;====================================================================

CommaNum PROC inputBuffer:DWORD, outputBuffer:DWORD
LOCAL digitMoveCount:DWORD, quotient:DWORD, remainder:DWORD

PUSH ESI
PUSH EDI

MOV EDI, outputBuffer ;Where to store our finished product.
MOV digitMoveCount, 3 ;Default # of digits to move/"triad".

; Inline strlen() here:
MOV EAX, inputBuffer
MOV ESI, EAX ;Set up for later moves.
XOR EDX, EDX ;Len. counter
sl10: CMP BYTE PTR [EAX + EDX], NULL
JE sl88
INC EDX
JMP sl10

sl88: MOV EAX, EDX ;EAX = # digits.
TEST EAX, EAX ;Test for zero length.
JZ done ;Cap off output buffer and exit.
XOR EDX, EDX
MOV ECX, 3
DIV ECX ;How many "triads"?
MOV quotient, EAX
MOV remainder, EDX

; If quotient = 0, use remainder as digit count for last "triad":
CMP quotient, 0
JNE @F
MOV EAX, remainder
MOV digitMoveCount, EAX
MOV quotient, 1 ;1 time through move loop.
JMP SHORT triads

; First see if there's a short "triad" at beginning:
@@: MOV ECX, remainder
JECXZ triads ;No remainder, so no short "triad".
REP MOVSB ;Yes, so move that many digits.
MOV AL, ','
STOSB ;Put in comma.

; Now we're just moving "triads" and putting in commas after them:
triads: MOV ECX, quotient
nloop: PUSH ECX
MOV ECX, digitMoveCount
REP MOVSB
POP ECX

; Do we put a comma in, or are we on the last "triad"?
CMP ECX, 1
JE nocom
MOV AL, ','
STOSB
nocom: LOOP nloop

; Cap off formatted # buffer:
done: MOV BYTE PTR [EDI], 0

exit99: POP EDI
POP ESI
RET

CommaNum ENDP
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 01, 2024, 12:27:08 PM
Quote from: jj2007 on April 01, 2024, 10:24:11 AMIt's a pity because your algo is really very elegant and short
Thank you. I like elegant and short. I developed this algo circa 1978. We needed it for debugging on embedded systems. Replace the stosb with a print, and it printed one character at a time, usually on a tty.

This algo is also much more powerful, because it will do any base.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 01, 2024, 12:37:17 PM
Oh, I have code to do that. It does it while converting the number to ascii.
I was just wondering if there was a way of optimising it.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 01, 2024, 12:58:32 PM
Quote from: jj2007 on April 01, 2024, 10:24:11 AMBtw you could drop the digit counter and instead check if anything is left in eax:
You did not look at the code for Anybase.exe. There are two versions of the algorithm. One test eax, the other does a jecxz.

I now wonder if your speed tests were flawed. Base 10 should have use the algo that uses eax.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 01, 2024, 02:06:24 PM
Thank you ahsat, :thumbsup:

for positive integers less than 100,000,000 = 5F5E100h  I use:

; Int_to_string - Convert a small binary interger into an string
; IN:  RAX = binary integer
;       RCX = address of location to store string

align 16
db  90h,90h,90h
i2a       proc   
          mov    r9,  rcx
          mov    r8d, 10
          xor    ecx, ecx
          xor    edx, edx
@@:         
          div    r8               ; divide by the number-base
          shl    rcx, 8
          add    rdx, 30h
          xchg   dl,  cl
          test   eax, eax           
          jne    @b
          mov    [r9],rcx
          ret
i2a       endp
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 01, 2024, 03:53:46 PM
Quote from: lingo on April 01, 2024, 02:06:24 PMThank you ahsat,
You are welcome.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 01, 2024, 04:14:19 PM
xchg   dl,  clThat, mister, is the best instruction (for how many things it does) that I've ever seen.  :greenclp:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 01, 2024, 07:34:04 PM
Quote from: ahsat on April 01, 2024, 12:58:32 PMThere are two versions of the algorithm. One test eax, the other does a jecxz.

Apologies. I tested both versions, and there is no difference, speedwise. If you have a 32-bit version of the jecxz algo, please post it, so that I can incorporate it into the testbed. I also noted that your code produces 00000123 for dword=123, i.e. leading zeros.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 03:56:52 AM
Quote from: NoCforMe on April 01, 2024, 10:10:12 AMI wish you the best of luck with your editor
I wish you both good luck. I am amazed at how poor text editors are today compared to what they use to be. The best text editor I have ever used was Brief. There is a fair start of al clone of Brief, called Grief (https://github.com/adamyg/grief), on Github. You guys should take a look if you haven't already.

A real text editor is like assembly language, it is not for the beginners.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 04:17:09 AM
Quote from: jj2007 on April 01, 2024, 07:34:04 PMI also noted that your code produces 00000123 for dword=123
Not my code, but your implementation of my code. The original 64bit version that I posted only put leading zeros into hex and Sexagesimal, but no leading zeros with base 10. You would not use the jecxz version of the algo, unless you wanted the possibility of leading zeros.

The 32 bit version of Anybase.exe is working now. I have to clean it up but I will post it latter today.

Look closer this time, and you will see two versions of the algo, one with leading zeros and one without leading zeros.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 02, 2024, 04:20:44 AM
I remember using Brief, briefly back in the day. I thought it was OK.
The editor I really, really miss is MultiEdit, which up to XP would run under Windoze (no more on Win7). It did everything: regular expression search & replace, infinitely extensible through a sophisticated macro language (yes, JJ!), columnar text selection, so much more.

But then for years I plodded along using Notepad as an editor, until I wrote my own.

BTW, if you care to take a look at my editor, here's the latest version (https://masm32.com/board/index.php?msg=127757). If you do peek at it I'd be very interested in your take on it.

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 05:56:36 AM
Quote from: jj2007 on April 01, 2024, 07:34:04 PMIf you have a 32-bit version of the jecxz algo
I just posted the 32 bit version of Anybase. It has both, and always had, both versions of the algo.

Sorry, I don't know how to directly link you to the post, but it is in the same message thread that I released the 64 bit version, the "Miscellaneous Projects " part of this forum.
EDIT: https://masm32.com/board/index.php?topic=11817.60 (https://masm32.com/board/index.php?topic=11817.60)
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 02, 2024, 06:02:28 AM
On each post, the text at top ("Today at xx:xx:xx AM/PM") is a link you can copy to link to that reply in the thread. Use the link button (3rd button in 5th group) to insert a link into the post.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 02, 2024, 06:32:50 AM
Quote from: ahsat on April 02, 2024, 05:56:36 AMI don't know how to directly link you to the post

It's here (https://masm32.com/board/index.php?topic=11817.msg128470#msg128470).
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 06:41:20 AM
Quote from: NoCforMe on April 02, 2024, 06:02:28 AMOn each post, the text at top
Thanks.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 02, 2024, 07:43:17 AM
Quote from: jj2007 on April 01, 2024, 10:24:11 AMThe div ebx is indeed the bottleneck in your algo. I've tried to replace it with a mul 0CCCCCCCDh but with little success, speedwise. It's a pity because your algo is really very elegant and short :thumbsup:


Using your version 3 and enabling test F I get

Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz (SSE4)

6026    cycles for 100 * dwtoa
6271    cycles for 100 * dw2str
22536  cycles for 100 * MasmBasic Str$()
15352  cycles for 100 * Ray's algo
5964    cycles for 100 * Ray+mul

6054    cycles for 100 * dwtoa
6152    cycles for 100 * dw2str
22515  cycles for 100 * MasmBasic Str$()
15347  cycles for 100 * Ray's algo
5983    cycles for 100 * Ray+mul

5941    cycles for 100 * dwtoa
6034    cycles for 100 * dw2str
22455  cycles for 100 * MasmBasic Str$()
15310  cycles for 100 * Ray's algo
5944    cycles for 100 * Ray+mul

6060    cycles for 100 * dwtoa
6359    cycles for 100 * dw2str
22421  cycles for 100 * MasmBasic Str$()
15315  cycles for 100 * Ray's algo
5984    cycles for 100 * Ray+mul

Averages:
6040    cycles for dwtoa
6212    cycles for dw2str
22485  cycles for MasmBasic Str$()
15331  cycles for Ray's algo
5974    cycles for Ray+mul

20      bytes for dwtoa
74      bytes for dw2str
16      bytes for MasmBasic Str$()
110    bytes for Ray's algo
74      bytes for Ray+mul

dwtoa                                  -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo                              171510507
Ray+mul                                j

Seems faster with the multiply to me ?

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 09:27:49 AM
Quote from: jimg on April 02, 2024, 07:43:17 AMI've tried to replace it with a mul 0CCCCCCCDh but with little success, speedwise.
Quote from: jimg on April 02, 2024, 07:43:17 AMSeems faster with the multiply to me ?

If you made it faster, show me the code.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 02, 2024, 09:40:50 AM
Quote from: jimg on April 02, 2024, 07:43:17 AMUsing your version 3 and enabling test F I get

It's definitely faster, Jim, but look at the result: there was a reason why I disabled it :biggrin:

Some fine tuning needed; dwtoa and dw2str use the mul variant.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 09:46:58 AM
Quote from: jj2007 on April 02, 2024, 09:40:50 AMIt's definitely faster, Jim, but look at the result:
Is there any chance I can see the code with the multiply?
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 02, 2024, 09:55:46 AM
Quote from: ahsat on April 02, 2024, 09:46:58 AMIs there any chance I can see the code with the multiply?

It's in the archive posted in your thread (https://masm32.com/board/index.php?msg=128470) as Test_F, but disabled. Here is the code:
_xmul    dd 0CCCCCCCDh
ctAnyToAnyMul:
  push edx        ; save edx, or a digit
  push ecx
  mov ecx, eax
;   int 3
  mul _xmul    ; something is wrong here
  shr edx, 3
  mov eax, edx
  add edx, edx
  lea edx, [edx*4+edx+"0"]
  sub edx, ecx
  neg edx
  pop ecx
  dec ecx        ; decrement digit count
  .if !Zero?        ; br if done
    call ctAnyToAnyMul    ; generate next digit
  .endif
  mov al, @digits[edx]    ; get the ascii value
  stosb            ; save it
  pop edx        ; restore edx, or get next digit
  ret

This is a multiply by 10, then add 30h:
  add edx, edx
  lea edx, [edx*4+edx+"0"]

Note it will probably work with minor changes, but I was too tired to fix it...

Instead of the decrement digit count, I would vote for a test eax, eax :cool:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 02, 2024, 12:37:19 PM
it normally looks like this.  I added comments so it isn't so cryptic-

      mov ecx, 0CCCCCCCDh ; 8 * 1/10
      mov ebx,eax         ; save original
      mul ecx           ; num * 1/10 magic number
      shr edx, 3          ; magic number fixup
      lea edx,[edx*4+edx] ; *5
      add edx,edx         ; *10
      sub ebx,edx         ; remainder
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 02, 2024, 01:15:09 PM
I couldn't resist.  This seems to produce the correct number.  Patched together from dwtoa and ctAnyToAnyMul.  Could use some refinement, I just wanted to see it work.

ctAnyToAnyMul:  ; setup
  test eax,eax
  jnz sign
  mov word ptr [edi],30h
  ret

sign:
    jns pos
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
pos:
   call ctanydoit
   ret

ctanydoit:
  push edx ; save edx, or a digit
  xor edx,edx
  push ecx
      mov ecx,0CCCCCCCDh  ; 8 * 1/10
      mov ebx,eax         ; save original
      mul ecx             ; num * 1/10 magic number
      shr edx, 3          ; magic number fixup
      mov eax,edx
      lea edx,[edx*4+edx] ; *5
      add edx,edx         ; *10
      sub ebx,edx         ; remainder
      mov edx,ebx     
  pop ecx
  dec ecx ; decrement digit count
  .if !Zero? ; br if done
call ctanydoit ; generate next digit
  .endif
  mov al, @digits[edx] ; get the ascii value
  stosb ; save it
  pop edx ; restore edx, or get next digit
  ret

A little slower than dwtoa, but I didn't spend any time optimizing for speed.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 01:41:58 PM
Quote from: jimg on April 02, 2024, 01:15:09 PMI couldn't resist.
That is really cool, if it works. But I would leave it up to the caller to handle the sigh, unless negative numbers break things.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 02, 2024, 02:52:37 PM
Quote from: jimg on April 02, 2024, 01:15:09 PMA little slower than dwtoa, but I didn't spend any time optimizing for speed.
It can be optimized a little, but I tested it and now see that it only works for base 10. I like my algorithm which will convert to any base.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 02, 2024, 04:52:38 PM
Of course.  This was just for fun.  You'd need the magic number for each base you wanted to use.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 02, 2024, 08:37:10 PM
Quote from: jimg on April 02, 2024, 04:52:38 PMYou'd need the magic number for each base you wanted to use.

Which wouldn't be too difficult, actually.

.CODE
_xmul    dd 0CCCCCCCDh
ctAnyToAnyMul:
  push ecx        ; save edx, or a digit
  mov ecx, eax
  mul esi        ; e.g. 0CCCCCCCDh
  shr edx, 3
  mov eax, edx
  add edx, edx
  lea edx, [edx*4+edx-"0"+1]
  sub ecx, edx
  test eax, eax
  .if !Zero?        ; br if done
    call ctAnyToAnyMul    ; generate next digit
  .endif
  mov al, dl    ; @digits[edx]    ; get the ascii value
  stosb            ; save it
  pop ecx        ; restore edx, or get next digit
  ret
...
    mov eax, TheNumber
    mov edi, offset result2
    push esi
    mov esi, _xmul[0*4]    ; 0...2 index
    call ctAnyToAnyMul
    pop esi

And voilà, it beats good ol' dwtoa :biggrin:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
Averages:
3400    cycles for dwtoa
2180    cycles for dw2str
15316   cycles for MasmBasic Str$()
10984   cycles for Ray's algo I
3086    cycles for Ray+mul

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
64      bytes for Ray+mul

dwtoa                                   1234567
dw2a                                    1234567
dw2str                                  1234567
MasmBasic Str$()                        1234567
Ray's algo I                            001234567
Ray+mul
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 02, 2024, 08:42:46 PM
QuoteYou'd need the magic number for each base you wanted to use.
n/base = n*(1/base) ; Not all bases have magic numbers,so "div" can't change into "mul" sometimes.
Only 10,19,28,37,46,55 is a magic number in 2-60.

QuoteCheck if a number is magic (Recursive sum of digits is 1)
A number is said to be a magic number, if the sum of its digits are calculated till a single digit recursively by adding the sum of the digits after every addition. If the single digit comes out to be 1,then the number is a magic number.

For example-
Number= 50113
=> 5+0+1+1+3=10
=> 1+0=1
This is a Magic Number

For example-
Number= 1234
=> 1+2+3+4=10
=> 1+0=1
This is a Magic Number
ChkMagicNumber proc uses rbx n:QWORD
    Local sum:QWORD

    mov    rax,n
    mov    sum,0
    mov    rcx,10
    .while (rax > 0) || (sum > 9)
        .if    rax == 0
            mov    rax,sum
            mov    sum,0
        .endif
        xor    rdx,rdx
        div    rcx
        mov    rbx,sum
        add    rbx,rdx
        mov    sum,rbx
    .endw
    mov    rax,sum    ;rax = 1, n is a magic number.
    ret

ChkMagicNumber endp

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 02, 2024, 08:58:34 PM
Quote from: six_L on April 02, 2024, 08:42:46 PMOnly 10,19,28,37,46,55 is a magic number in 2-60.

Oops, that's bad news :sad:

So we'll better stick with Ray's div ebx. It's a very elegant algo indeed.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 12:12:55 AM
Quote from: jj2007 on April 02, 2024, 08:58:34 PMSo we'll better stick with Ray's div ebx. It's a very elegant algo indeed.
But now you will have to make another decision. Which is valid, 0x0A or 0x0a.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 12:22:46 AM
Recursion can also be used to very simply handle unzipping zips within zips, or any like problem.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: daydreamer on April 03, 2024, 12:54:31 AM
Quote from: ahsat on April 03, 2024, 12:22:46 AMRecursion can also be used to very simply handle unzipping zips within zips, or any like problem.
I wonder if you have skill with chess recursive code to climb up and down a tree until find best chess move? 
Simpler code is do it with tic tac toe
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 01:08:53 AM
Magic number generator by qWord
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 03, 2024, 01:14:44 AM
Hi,jj2007

Quotethat's bad news
That's a good news because of avoid to waste your unnecessary times.

regard
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 02:18:04 AM
Finally got something to beat dwtoa consistently using Rays recursion technique.  (Apologies to Ray, we're trying to beat dwtoa in this thread, not trash your fine general purpose algorithm.)

ctAnyToAnyMul:  ; setup
  test eax,eax
  jnz sign
  mov word ptr [edi],30h
  ret

sign:
    jns pos
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
pos:
   push ebx
   mov ecx,0CCCCCCCDh  ; 8 * 1/10
   call ctanydoit
   pop ebx
   ret

align 16
ctanydoit:
  push edx ; save edx, or a digit
      mov ebx,eax         ; save original
      mul ecx             ; 0CCCCCCCDh 8 * 1/10   num * 1/10 magic number
      shr edx, 3          ; magic number fixup (divide by 8)
      mov eax,edx
      lea edx,[edx*4+edx] ; *5
      add edx,edx         ; *10
      sub ebx,edx         ; remainder
      mov edx,ebx     
   test eax,eax
   .if !zero?
      call ctanydoit ; generate next digit
   .endif
   add dl,'0'
   mov [edi],dl
   inc edi
   pop edx ; restore edx, or get next digit
   ret

Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz (SSE4)

5984    cycles for 100 * dwtoa
5921    cycles for 100 * Ray+mul

5979    cycles for 100 * dwtoa
5905    cycles for 100 * Ray+mul

5976    cycles for 100 * dwtoa
5877    cycles for 100 * Ray+mul

5978    cycles for 100 * dwtoa
5882    cycles for 100 * Ray+mul

Averages:
5978    cycles for dwtoa
5894    cycles for Ray+mul

20      bytes for dwtoa
104    bytes for Ray+mul

dwtoa                                  -123456789
Ray+mul                                -123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 02:46:08 AM
Okay, those test are probably invalid.  I added a duplicate in TestG, called it Ray+mul2, and it ran slower than the other two, even though the code was identical.  Maybe a page crossing?  Which one is faster depends on where it sits in the program and in memory I guess.  But the new routine is at least as fast as dwtoa.

6014    cycles for 100 * dwtoa
5975    cycles for 100 * Ray+mul
6083    cycles for 100 * Ray+mul2

6038    cycles for 100 * dwtoa
5914    cycles for 100 * Ray+mul
6111    cycles for 100 * Ray+mul2

6041    cycles for 100 * dwtoa
5886    cycles for 100 * Ray+mul
6074    cycles for 100 * Ray+mul2

6032    cycles for 100 * dwtoa
5870    cycles for 100 * Ray+mul
6108    cycles for 100 * Ray+mul2

Averages:
6035    cycles for dwtoa
5900    cycles for Ray+mul
6096    cycles for Ray+mul2

20      bytes for dwtoa
104     bytes for Ray+mul
104     bytes for Ray+mul2

dwtoa                                   -123456789
Ray+mul                                 -123456789
Ray+mul2                                -123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 02:56:09 AM
Quote from: daydreamer on April 03, 2024, 12:54:31 AMI wonder if you have skill with chess recursive code
No, but I did a tic tac toe in fortran once, without recursion.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 03, 2024, 03:38:58 AM
Hi,all
Perhaps the Ray's algorithm has an important research and application field in compression/encryption/large number calculation. since it includes "any".

regard
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 04:31:31 AM
Quote from: jimg on April 03, 2024, 02:46:08 AMMaybe a page crossing?
I think you can force either or both to a page boundary with an "Align 4096" and then a "db 4090 dup (?)" and then the code.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 03, 2024, 05:40:41 AM
Can someone here explain how these magic numbers work (for example, how you can use them to turn a divide into a multiply)? In a simple way? My non-higher-math head hurts from all this.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: Biterider on April 03, 2024, 06:13:14 AM
Hi
The web is full of good examples. 
This technique is now part of compiler optimisations.

https://www.bartol.udel.edu/mri/sam/Athlon_code_optimization_guide.pdf (https://www.bartol.udel.edu/mri/sam/Athlon_code_optimization_guide.pdf)   Integer Optimizations
https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf (https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf)  Good intro
https://gmplib.org/~tege/divcnst-pldi94.pdf (https://gmplib.org/~tege/divcnst-pldi94.pdf)  Original publication by Torbjörn Granlund (~1991)

Biterider

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 03, 2024, 06:28:08 AM
Thanks, but all of those are too technical or too mathematically abstract. Can't someone here explain, in simple terms, how it works?
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: Biterider on April 03, 2024, 06:34:57 AM
Hi
In chapter "2 Background" of the second link, it is explained very well.

Biterider
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 07:08:35 AM
Quote from: NoCforMe on April 03, 2024, 06:28:08 AMhow it works?
Simply stated, you can divide by a number, or multiply by it's reciprocal and get the same result. The reciprocal of a number is dividing 1 by the number.

For example 1 divided by 2 is 0.5, thus 0.5 is the reciprocal of 2. So we can divide a number by 2, or we can multiply by 0.5 and get the same result.

Very simply stated, what they are doing is multiplying by the reciprocal of 10, instead of dividing my 10. But its the remainder of the divide they want, so that is what the rest of the code is about. I have not studied the code to see how they come up with the remainder.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 03, 2024, 07:14:28 AM
Thanks! That's what I was after, a simple explanation rather than a 5,000-word treatise on processor optimizations or whatever.

Still not sure how 0CCCCCCCDh works out to be the reciprocal of 10 ...
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 07:42:47 AM
0ccccccc..ccc = 0.0001100110011001100110011..... in binary
you know the stuff to the left of the binary point are powers of two, 2^1,2^2,2^3, etc.
the stuff to the right are also, 2^-1,2^-2,2^-3,etc.  (1/2, 1/4,1/8,1/16,etc)
so in .00001100110011 we have 0*1/2 + 0*1/4 + 0*1/8 + 1*1/16 + 1*1/32 + 0*1/64 etc. so
                    sum
  1/16=.0625       .0625
+ 1/32=.03125      .09375
+ 1/256=.00390625  .09765625
etc, approaching .1
The final D is just the remaining CCCC...  rounded up.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 03, 2024, 08:02:31 AM
OK ... I guess.

What's still hard to wrap my head around here is that what we're actually doing is multiplying, by a very large non-zero number, so I still don't get how we get multiplication by the reciprocal powers of two (2^-1, 2^-2, etc.) from that operation. How does that work? How do we get that "virtual" binary point?

Thanks to all who are putting up with my simple-mindedness here.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 08:35:18 AM
Quote from: NoCforMe on April 03, 2024, 07:14:28 AMStill not sure how 0CCCCCCCDh works out to be the reciprocal of 10 ...
Neither am I, but I understand what they are doing, but not the details.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 08:56:04 AM
we are throwing away the overflow and keeping only the bits that line up the way we want.  If you look closely, we are actually multiplying by 8*1/10 to get things lined up.  Later down we shift edx over 3 places to divide out the extra 8.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 10:05:44 AM
okay, here is the latest and final for me.  rearranged some registers and scrapped of a tiny bit of time.

ctAnyToAnyMul:  ; setup   (needs a better name)
   test eax,eax
   jnz @f
   mov word ptr [edi],30h
   ret

@@: jns @f
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
@@: mov ecx,0CCCCCCCDh  ; 8 * 1/10
    call ctanydoit
    ret

align 16
ctanydoit:             ; this one needs a better name also
   push ebx            ; save ebx, or a digit
   mov ebx,eax         ; save original
   mul ecx             ; 0CCCCCCCDh 8 * 1/10   num * 1/10 magic number
   shr edx, 3          ; magic number fixup (divide by 8)
   mov eax,edx         ; save answer
   lea edx,[edx*4+edx] ; *5
   add edx,edx         ; *10
   sub ebx,edx         ; remainder
   test eax,eax
   .if !zero?
      call ctanydoit ; generate next digit
   .endif
   add bl,'0'
   mov [edi],bl
   inc edi
   pop ebx ; restore ebx, or get next digit
   ret
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 10:33:49 AM
Quote from: jimg on April 03, 2024, 10:05:44 AMokay, here is the latest and final for me.
How was the timing? Post what you got.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 03, 2024, 11:00:09 AM
Quote from: jimg on April 03, 2024, 10:05:44 AMokay, here is the latest and final for me

Well done :thumbsup:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
Averages:
4483    cycles for dwtoa
2836    cycles for dw2str
16663   cycles for MasmBasic Str$()
13924   cycles for Ray's algo I
3870    cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
153     bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 03, 2024, 11:22:50 AM
Thank you jimg,

For big integers use my faster code.

Would you like to include my code in your speed test?

; Int_to_string - Convert a big integer into an string
; IN:  RAX = binary integer
;      RCX = address of location to store string

align 16
db 15 dup(90h)
 
i2aA      proc       
          push   rdi
          xor    r9d, r9d
          mov    rdi, rcx
          mov    r8d, 10
          xor    ecx, ecx
          xor    edx, edx
@@1:
          div     r8                                                    ; divide by the number-base 
          shl     rcx, 8
          or      edx, 30h
          add     r9,  1 
          xchg    dl,  cl
          test    eax, eax
          je      @@3
          cmp     r9d, 8
          jne     @@1   
          push    rsi
          mov     rsi,rcx
          xor     ecx, ecx
@@2:
          div     r8                                                    ; divide by the number-base 
          shl     rcx, 8
          or      edx, 30h
          inc     r9                                                    ; r9 -> counter
          xchg    dl,  cl
          test    eax, eax
          jne     @@2   
          mov     [rdi],rcx
          mov     [rdi+r9-8],rsi
          pop     rsi       
          pop     rdi
          ret
align 16
@@3:
          mov    [rdi],rcx
          pop    rdi
          ret
i2aA      endp

Note:
The code for the minus sign and leading zeros must be kept out of the algorithm.
It's not normal to bloat the algorithm with 99.99% unusable code.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 03, 2024, 11:28:46 AM
Quote from: jimg on April 03, 2024, 02:46:08 AMWhich one is faster depends on where it sits in the program and in memory I guess

Thanks, Ray. I checked that and you are right, there was a problem with the align_64 macro that I borrowed from Nidud (https://masm32.com/board/index.php?topic=4545.msg48734) years ago.

The latest test (https://masm32.com/board/index.php?msg=128545) (above, reply #68) has an align 256. That assembles with UAsm only, though. M$ ml.exe throws an error.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 03, 2024, 11:33:45 AM
Quote from: lingo on April 03, 2024, 11:22:50 AMWould you like to include my code in your speed test?

My speed test is 32-bit code. Besides, we all know how slow div is - no need to include a slow algo.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 03, 2024, 12:00:18 PM
Quote from: jj2007 on April 03, 2024, 11:28:46 AM
Quote from: jimg on April 03, 2024, 02:46:08 AMWhich one is faster depends on where it sits in the program and in memory I guess

Thanks, Ray. I checked that and you are right, there was a problem with the align_64 macro that I borrowed from Nidud (https://masm32.com/board/index.php?topic=4545.msg48734) years ago.

The latest test (https://masm32.com/board/index.php?msg=128545) (above, reply #68) has an align 256. That assembles with UAsm only, though. M$ ml.exe throws an error.

I am glad you got it working.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 01:02:51 PM
Quote from: NoCforMe on April 03, 2024, 08:02:31 AMOK ... I guess.

What's still hard to wrap my head around here is that what we're actually doing is multiplying, by a very large non-zero number, so I still don't get how we get multiplication by the reciprocal powers of two (2^-1, 2^-2, etc.) from that operation. How does that work? How do we get that "virtual" binary point?

Thanks to all who are putting up with my simple-mindedness here.

You know how exponential number work, right?

2000 * 300 = (2.0*10^3) * (3.0*10^2) =  (2*3)*10^(3+2)= 6* 10^5 = 600000
to multiply exponentials, you multiply the mantissas and add the exponents
Same for binary, it's just powers of 2 rather than 10.

So, if you look a the hex representation of .1 in IEEE754 floating point it looks like 3dcccccd hex.
The leftmost bit is the sign, followed by 8 bits of exponent followed by the mantissa with an implied 1 as the high bit.

exp=7bh=123, exponents are stored with 127 added, so 123-127=-4, so exponent is 2^-4
The mantissa for 1/10 is cccccdh.  look familiar?

so to multiply by 1/10 you multiply your numbers mantissa by that mantissa, and adjust the exponent as needed.
The mantissa of your number is just the number with the exponent of (10^0=1)  e.g.  1234 is  1234*10^0

Try these if you are hazy on floating point numbers.
FloatConverter (https://www.h-schmidt.net/FloatConverter/IEEE754.html)
FloatToHex (https://gregstoll.com/~gregstoll/floattohex/)






Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 03, 2024, 01:06:22 PM
OK, I do understand floating-point. But what I don't understand is how you get an integer value (say 0CCCCCCCDh) to behave as if it were a floating-point value. It's the mechanics of the procedure that escape me: would you mind going through the process, step by step, including the shifts and all, for dummies like me?

I could actually use some pictures and diagrams (like in Alice's Restaurant: 8x10 color glossy photographs with circles and arrows and a paragraph on the back of each one explainin' what each one was).
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 03, 2024, 01:15:53 PM
okay, that's going to have to wait for tomorrow.

But you do see that magic number is the mantissa of .1, right?

So you multiply that mantissa by the mantissa of the number you want to divide by 10.

I didn't design this system, and it took me a while to figure it out.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 03, 2024, 01:41:21 PM
Well, I have to give you points for seeming to understand it.

I can wait.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 03, 2024, 03:21:26 PM
It's called a magic number for a reason  :biggrin:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: TimoVJL on April 03, 2024, 07:16:15 PM
AMD Athlon(tm) II X2 220 Processor (SSE3)
61      bytes for other

8105    cycles for 100 * dwtoa
6404    cycles for 100 * dw2str
48371   cycles for 100 * MasmBasic Str$()
34694   cycles for 100 * Ray's algo I
10410   cycles for 100 * Ray's algo, mod JimG

8207    cycles for 100 * dwtoa
6404    cycles for 100 * dw2str
26515   cycles for 100 * MasmBasic Str$()
34499   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

8109    cycles for 100 * dwtoa
6405    cycles for 100 * dw2str
26517   cycles for 100 * MasmBasic Str$()
34405   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

8104    cycles for 100 * dwtoa
6413    cycles for 100 * dw2str
27009   cycles for 100 * MasmBasic Str$()
34508   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

Averages:
8107    cycles for dwtoa
6404    cycles for dw2str
26763   cycles for MasmBasic Str$()
34504   cycles for Ray's algo I
10303   cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
153     bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789

--- ok ---
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 03, 2024, 10:15:32 PM
For the sake of completeness, Ray's algo used for converting to a bin$:
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
29156   cycles for crt__itoa
928     cycles for MasmBasic Bin$()
45257   cycles for Ray's algo

26      bytes for crt__itoa
16      bytes for MasmBasic Bin$()
110     bytes for Ray's algo

crt__itoa                               100101101011010000111
MasmBasic Bin$()                        00000000000100101101011010000111
Ray's algo                              00000000000100101101011010000111

As expected, the div ebx with ebx=2 is not efficient, compared to a simple sar eax, 1.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 04, 2024, 01:21:34 AM
One more bin-to-dec, with a modification proposed by JimG:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4440    cycles for dwtoa
2843    cycles for dw2str
16348   cycles for MasmBasic Str$()
13864   cycles for Ray's algo I
3767    cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
1129    bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 04:59:41 AM
Quote from: NoCforMe on April 03, 2024, 01:41:21 PMWell, I have to give you points for seeming to understand it.

I can wait.

I've given this a lot of thought, and I'm not good enough to explain this more than I have.  If there is something in particular you don't understand, I'd be happy to explain that, but a blanket explain everything is beyond me.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 04, 2024, 05:29:49 AM
Hi,Lingo
Quote00001220 clock cycles, (Ray_AnyToAny)x1000, OutPut: 987654321098
00001130 clock cycles, (lingo_i2aA)x1000, OutPut: 987654321098
00000416 clock cycles, (Roberts_dqtoa)x1000, OutPut: 987654321098

Roberts_dqtoa:
dqtoa proc uses rbx rsi rdi dqValue:QWORD, lpBuffer:QWORD
; -------------------------------------------------------------
; convert QWORD to ascii string
; dqValue is value to be converted
; lpBuffer is the address of the receiving buffer
; EXAMPLE:
; invoke dqtoa,dqValue,ADDR buffer
; -------------------------------------------------------------

mov rax, dqValue
mov rdi, lpBuffer
test rax,rax
jnz @pos
@zero:
mov word ptr [rdi],30h
jmp @exit
@pos:
mov rcx, 0cccccccccccccccdh ; 8 * 1/10
mov rsi, rdi
.while (rax > 0)
mov rbx,rax ; save original
mul rcx ; num * 1/10 magic number. rax/10 = rax* magic number
shr rdx, 3 ; magic number fixup
mov rax,rdx ;
lea rdx,[rdx*4+rdx] ; *5
add rdx,rdx ; *10
sub rbx,rdx ; remainder
add bl,'0' ; get the ascii value
mov [rdi],bl ; save it
add rdi, 1 ; get next digit
.endw
mov byte ptr [rdi], 0       ; terminate the string

; We now have all the digits, but in reverse order.
.while (rsi < rdi)
sub rdi, 1
mov al, [rsi]
mov ah, [rdi]
mov [rdi], al
mov [rsi], ah
add rsi, 1
.endw
@exit:
ret

dqtoa endp
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 04, 2024, 05:35:24 AM
Quote from: jimg on April 04, 2024, 04:59:41 AM
Quote from: NoCforMe on April 03, 2024, 01:41:21 PMWell, I have to give you points for seeming to understand it.

I can wait.
I've given this a lot of thought, and I'm not good enough to explain this more than I have.  If there is something in particular you don't understand, I'd be happy to explain that, but a blanket explain everything is beyond me.

Maybe you're better at explaining than you think you are.

Let's try this: can you explain, in some detail, how the bits of the number being "divided" (multiplied) and the magic number get shuffled and manipulated to come up with the final answer? This would explain how the virtual binary point comes into play, and how we turn a lowly integer (like 0CCCC ... CCCDh) into a virtual floating-point number. Possible?

I know I'm asking a lot, but at this point I really do not understand how this works. And that bothers me. I don't like taking things that I don't understand for granted.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 04, 2024, 05:44:37 AM
Quote from: six_L on April 04, 2024, 05:29:49 AMlea   rdx,[rdx*4+rdx]

That can be made faster, see reply #80.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 04, 2024, 07:08:19 AM
Thank you six_L, :thumbsup:

Perfect!
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 07:35:50 AM
Quote from: NoCforMe on April 04, 2024, 05:35:24 AMLet's try this: can you explain, in some detail, how the bits of the number being "divided" (multiplied) and the magic number get shuffled and manipulated to come up with the final answer? This would explain how the virtual binary point comes into play, and how we turn a lowly integer (like 0CCCC ... CCCDh) into a virtual floating-point number. Possible?

I know I'm asking a lot, but at this point I really do not understand how this works. And that bothers me. I don't like taking things that I don't understand for granted.



Okay, you made me take a closer look and found out my assumptions were wrong all along.
The answers come out correct, the algorithm works, it just that it seems the extra
multiply by 8, divide by 8 was unnecessary as far as I can see.  I'll have to write
a test program before I'm 100% sure, but it looks like it from here.
So, removing the factor of eight means we save a shr,3 instruction.
The "magic" number now becomes 1/8 less or 19999999 hex.  Strange coincidence with
a decimal looking number, but it's not.  Originally, I thought the extra factor of eight
was required to maximize the precision of the numbers, left justifying the binary in the
maximum 32 bits.  For now, it seems it's unneccessary.  I tested on the min and max
32 bit binary number (minimum value of -2,147,483,648 and a maximum value of 2,147,483,647)
and it seems to work.
This was my original explanation, in case it turns out we do need the extra *8
"              1/10 is a repeating number in binary  0.000110011001100....
              we multiply by 8 to maximize the precision= .11001100110011001100
              on the bottom end, we slid in additional bits from the repeating binary
              then, since there is more repeating binary left, we round up to a D
              We do all this because, without full 32 bits of precision, the last
              digit of our result would suffer from truncation error and would be
              incorrect.
"
What hogwash.

Anyway, here is my new dialog-


  eax = number in binary = 1234

  mov ebx,eax        ; save original for subtraction below to get remainder
  mul ecx            ; 19999999h = 1/10 
              1/10 in binary = 0.00011001100110011001100110011001 = 19999999 hex

        1234 * 19999999h = 1234 * 429,496,729.625 = 529,998,964,357.25 = 7B66666685 hex
                        7B66666685 =>  edx=7B eax=66666685  edx= 123
                           edx is the overflow, from the multiplication, the integer part of the answer
                                  the fractional part being in eax.
  mov eax,edx        ; save answer
  lea edx,[edx*4+edx] ; *5  = 123*5 = 615
  add edx,edx        ; * = *10 total = 1230
  sub ebx,edx        ; remainder  1234-1230 = 4  our last digit,
                          which we push onto the stack for saving
                         
                      next time through we have 123 in eax and do the same thing
                     
  mov ebx,eax        ; save original 123
  mul ecx            ; 19999999h = 1/10  gives 12 in edx, don't care about eax
  mov eax,edx        ; save answer
  lea edx,[edx*4+edx] ; *5 12*5=60
  add edx,edx        ; *2=120
  sub ebx,edx        ; remainder 123-120 = 3, our next digit
 
  etc until our remainder was zero, then we are done.
  at the end we pop the digits off the stack and save them.  You know this.
 
  I hope this helps.  If not, ask again.


Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 04, 2024, 08:50:09 AM
Quote from: six_L on April 04, 2024, 05:29:49 AMRoberts_dqtoa:

No negative numbers?
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 04, 2024, 10:09:28 AM
Quote from: jimg on April 04, 2024, 07:35:50 AMOkay, you made me take a closer look and found out my assumptions were wrong all along.
Wow. Very impressive answer. I haven't had time to delve into it yet, but I will. Thanks!
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 04, 2024, 10:40:09 AM
Thank you  six_L :thumbsup:

Would you like to include my new 64 bit code in your perfect speed test, together with the Robert's code?

; Int_to_string - Convert a big integer into an string
; IN:  RAX = binary integer
;       RCX = address of location to store string
; OUT: RAX = new start of the string in the buffer
align 16
db 90h
 
i2a64   proc   
        push rbx
        lea  r9,  [rcx+24]                   ; lpBuffer
        mov  r8,  0cccccccccccccccdh         ; 8 * 1/10
@@:       
        mov  rbx, rax
        mul  r8
        shr  rdx, 3
        sub  r9,  1
        lea  rcx, [rdx*4+rdx-18h]
        lea  rax, [rcx+rcx]
        sub  rbx, rax
        mov  rax, rdx
        mov  [r9],bl
        test eax, eax   
        jnz  @b
        mov  rax, r9                          ; return in RAX new start of the buffer
        pop  rbx
        ret
i2a64   endp
Note:
The code for the minus sign and leading zeros must be kept out of the algorithm.
It's not normal to bloat the algorithm with 99.99% unusable code.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 10:52:19 AM
So....
I know I said I was done before, but with all the changes, here's the latest.
You can blame NoCforMe.


.DATA?
align 4
resultJim        db 16 dup(?)
.CODE
alignProc
TestG_s:
db 2 dup (?) ; dummy spacer to align instructions best
ctAnyToAnyMulJim:  ; setup  (needs a better name)
  test eax,eax
  jnz @f
  mov word ptr [edi],30h
  ret

@@: jns @f
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
@@: mov ecx,19999999h  ; 1/10
    call ctanydoit
    ret

align 16
nop
ctanydoit:            ; this one needs a better name also
  push ebx            ; save ebx, or a digit
  mov ebx,eax        ; save original
  mul ecx            ; 19999999h = 1/10
  mov eax,edx        ; save answer
  lea edx,[edx*4+edx] ; *5
  add edx,edx        ; *10
  sub ebx,edx        ; subtract from original gives remainder digit
  test eax,eax
  .if !zero?
      call ctanydoit ; generate next digit
  .endif
  add bl,'0'
  mov [edi],bl
  inc edi
  pop ebx ; restore ebx, or get next digit
  ret



results:

Averages:
5989    cycles for dwtoa
5380    cycles for dw2str
22244  cycles for MasmBasic Str$()
14633  cycles for Ray's algo I
5313    cycles for Ray's algo, mod JimG

Averages:
5862    cycles for dwtoa
5386    cycles for dw2str
22392  cycles for MasmBasic Str$()
14412  cycles for Ray's algo I
5341    cycles for Ray's algo, mod JimG

Averages:
5861    cycles for dwtoa
5376    cycles for dw2str
22837  cycles for MasmBasic Str$()
14746  cycles for Ray's algo I
5310    cycles for Ray's algo, mod JimG

Averages:
5856    cycles for dwtoa
5415    cycles for dw2str
22320  cycles for MasmBasic Str$()
14484  cycles for Ray's algo I
5230    cycles for Ray's algo, mod JimG

That four consecutive passes, no cherry picking.

Not bad for having used dwtoa for a couple decades.  Even beat that cheater dw2str.

Of course, now we need to update all those other routines with the changes (use the new 1/10 and don't shift) to get fair comparisons.

I've attach JJ's routine with changes I made because some things were skewing the results.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 04, 2024, 11:15:24 AM
Quote from: jimg on April 04, 2024, 10:52:19 AMI know I said I was done before
A good programmer is never done, always looking for a better way. Its best to say, "done for now".

You certainly did a very good job of adapting the algorithm for base 10.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 11:35:51 AM
Well, with that being said......

If you weren't impressed with the last one, hang onto your socks.

I did something I've been meaning to try all along.
Personally I dislike recursion in the strongest terms.
So I got rid of the recursion.


.DATA?
align 4
resultJim2        db 16 dup(?)
.CODE
alignProc
TestH_s:
db 2 dup (?) ; dummy spacer to align instructions best
dword2ascii:  ; setup
  test eax,eax
  jnz @f
  mov word ptr [edi],30h
  ret

@@: jns @f
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
@@: mov ecx,19999999h  ; 1/10
    push esi
    mov esi,0
    call dword2asciiloop
pop esi
    ret

align 16
  nop              ; code optimization
dword2asciiloop:
  push ebx            ; save ebx, or a digit
  inc esi
  mov ebx,eax        ; save original
  mul ecx            ; 19999999h = 1/10
  mov eax,edx        ; save answer
  lea edx,[edx*4+edx] ; *5
  add edx,edx        ; *10
  sub ebx,edx        ; subtract from original gives remainder digit
  test eax,eax        ; are we done?
  jnz dword2asciiloop
  .repeat
      add bl,'0'
      mov [edi],bl
      inc edi
      pop ebx ; restore ebx, or get next digit
      dec esi
  .until zero?
  ret



results:

Averages:
5932    cycles for dwtoa
5363    cycles for dw2str
22231  cycles for MasmBasic Str$()
5204    cycles for Ray's algo, mod JimG
4581    cycles for jgnorecurse

Averages:
5859    cycles for dwtoa
5376    cycles for dw2str
22270  cycles for MasmBasic Str$()
5282    cycles for Ray's algo, mod JimG
4596    cycles for jgnorecurse

Averages:
5860    cycles for dwtoa
5376    cycles for dw2str
22506  cycles for MasmBasic Str$()
5282    cycles for Ray's algo, mod JimG
4595    cycles for jgnorecurse

Averages:
5855    cycles for dwtoa
5376    cycles for dw2str
22690  cycles for MasmBasic Str$()
5264    cycles for Ray's algo, mod JimG
4593    cycles for jgnorecurse

dwtoa                                  -123456789
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo, mod JimG                    -123456789
jgnorecurse                            -123456789


Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 04, 2024, 11:39:10 AM
Quote from: jimg on April 04, 2024, 10:52:19 AMEven beat that cheater dw2str

How dare you :rofl:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
Averages:
4618    cycles for dwtoa
2994    cycles for dw2str
3930    cycles for Ray's algo, mod JimG
3040    cycles for dw2$X

Averages:
4620    cycles for dwtoa
2994    cycles for dw2str
3951    cycles for Ray's algo, mod JimG
3044    cycles for dw2$X

Averages:
4634    cycles for dwtoa
3000    cycles for dw2str
3975    cycles for Ray's algo, mod JimG
3036    cycles for dw2$X

> If you weren't impressed with the last one, hang onto your socks.
P.S.: socks algo not yet included, sorry - it's 2:40 AM here...
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 12:10:44 PM
And a final cleanup.  Only gained about 200 from the change.


alignProc
TestH_s:
db 2 dup (?) ; dummy spacer to align instructions best
dword2ascii:  ; setup
    test eax,eax
    jnz @f
    mov word ptr [edi],30h
    ret
@@: jns @f
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
@@: mov ecx,19999999h   ; 1/10
    push esi
    mov esi,0
    .repeat
      push ebx            ; save ebx, or a digit
      inc esi
      mov ebx,eax         ; save original
      mul ecx             ; 19999999h = 1/10
      mov eax,edx         ; save answer
      lea edx,[edx*4+edx-24] ; *5  also subtract out half of '0' (24) to convert last byte to ascii  ala lingo
      add edx,edx         ; *10    double it, so full value of '0' is covered.  so -48 extra to dl
      sub ebx,edx         ; subtract from original gives remainder digit  -(-48) =+48=+'0' bl now in ascii
      test eax,eax        ; are we done?
    .until zero?
    .repeat
      mov [edi],bl
      inc edi
      pop ebx      ; restore ebx, or get next digit
      dec esi
    .until zero?
    pop esi
    ret

NameH equ <jgnorecurse>
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 04, 2024, 01:05:30 PM
Quote from: jimg on April 04, 2024, 12:10:44 PMAnd a final cleanup.
Very good job, excellent work.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 04, 2024, 01:31:03 PM
Just read the comments.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 01:37:50 PM
I'm using Jochen's trick here.  I didn't see any gain in speed, but I left it in anyway-

    lea edx,[edx*4+edx-26] ; *5    also subtract out half of '0' (26) to convert last byte to ascii
    add edx,edx         ; *10      double it, so full value of '0' is covered.  so -52 extra to dl
    sub ebx,edx         ; subtract from original gives remainder digit  -(-52) =+52=+'0'   bl now in ascii

I'll add these to the code above
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 04, 2024, 03:08:53 PM
QuoteI'm using Jochen's trick here.  I didn't see any gain in speed, but I left it in anyway-

    lea edx,[edx*4+edx-26] ; *5    also subtract out half of '0' (26) to convert last byte to ascii
    add edx,edx        ; *10      double it, so full value of '0' is covered.  so -52 extra to dl
    sub ebx,edx        ; subtract from original gives remainder digit  -(-52) =+52=+'0'  bl now in ascii
26*2=52='4'  -> ASCII code

This trick was stolen from my post (see page 6 bottom in my post)  by a very stupid thief, ha-ha-hah... :badgrin:  :badgrin:  :badgrin:  :skrewy:
The true number in my post is for 18h=24 * 2 = 30h=48 = '0'
      lea  rcx, [rdx*4+rdx-18h]
        lea  rax, [rcx+rcx]
 ...........

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 03:34:38 PM
You are absolutely correct.  My appologies.  I'm going to see why it works.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 03:42:13 PM
This is totally my fault.  I tried to modify one of Jochen's test procs to test my new proc, and I missed where I had to change one value, and it picked up the results from the previous tests answers.  No excuse, just stupidity.  Had I been printing the correct results, I would have immediately seen that something was wrong.   Stupid.
Again, my apologies for not crediting the correct person.  But anyway, I'll fix the above code.
Please don't start a war over this, this was totally unintentional.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 04, 2024, 04:08:14 PM
On another note, I said above that dw2str was a cheater.   That because it requires one to preset what the length of the answer is going to be.  But of course, one doesn't know what the length is going to be.  By presetting it, it saves all the code to reverse the string.  Not usable unless you know you want a right justified string, which is totally possible of being the case, just not for this test.

Unless I'm completely misreading the code, which, given what happened just above, is possible.

One other caveat.  I'm optimizing this on an Intel computer.  If you are running AMD, use Jochen's result to choose a routine.  Here is my latest results for three consequtive runs fixing the screwup above-

Averages:
5862    cycles for dwtoa
5389    cycles for dw2str
22890   cycles for MasmBasic Str$()
5332    cycles for Ray's algo, mod JimG
4280    cycles for jgnorecurse

Averages:
5876    cycles for dwtoa
5377    cycles for dw2str
23038   cycles for MasmBasic Str$()
5293    cycles for Ray's algo, mod JimG
4256    cycles for jgnorecurse

Averages:
5865    cycles for dwtoa
5376    cycles for dw2str
22772   cycles for MasmBasic Str$()
5220    cycles for Ray's algo, mod JimG
4256    cycles for jgnorecurse

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
100     bytes for Ray's algo, mod JimG
80      bytes for jgnorecurse

dwtoa                                   -123456789
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo, mod JimG                    -123456789
jgnorecurse                             -123456789

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 04, 2024, 04:11:49 PM
Hi,jj2007

QuoteNo negative numbers?

the lingo_i2aA hasn't the negative numbers.
For the sake of fair race, I made a bit modification to remove the negative numbers.

Hi,Lingo
1)
Quote00001088 clock cycles, (Ray_AnyToAny)x1000, OutPut: 987654321098
00000281 clock cycles, (lingo_i2a64)x1000, OutPut: 987654321098
00000386 clock cycles, (Roberts_dqtoa)x1000, OutPut: 987654321098
You are the fastest this time.

2)
if you have some idle time, help me check the X64timers.asm. whether is it correctly? it's been translated from 32bit asm.
;¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
  ; These two macros perform the grunt work involved in measuring the
  ; processor clock cycle count for a block of code. These macros must
  ; be used in pairs, and the block of code must be placed in between
  ; the counter_begin and counter_end macro calls. The counter_end macro
  ; returns the clock cycle count for a single pass through the block of
  ; code, corrected for the test loop overhead, in RAX.
  ;
  ; These macros require a .586 or higher processor directive.
  ;
  ; If your code is using MMX instructions and not executing an EMMS
  ; at the end of each MMX instruction sequence, defining the symbol
  ; _EMMS will cause the ctr_end macro to insert an EMMS in front of
  ; the FPU instructions.
  ;
  ; The loopcount parameter should be set to a relatively high value to
  ; produce repeatable results.
  ;
  ; Note that setting the priority parameter to REALTIME_PRIORITY_CLASS
  ; involves some risk, as it will cause your process to preempt *all*
  ; other processes, including critical Windows processes. Setting the
  ; priority parameter to HIGH_PRIORITY_CLASS instead will significantly
  ; reduce the risk, and in most cases will produce the same cycle count.
;------------------------------------------------------------------------------
counter_begin MACRO loopcount:REQ, priority
LOCAL label

        IFNDEF __counter__stuff__defined__
__counter__stuff__defined__ equ <1>
.data
__counter__pc__count__delta LARGE_INTEGER <>
__counter__pc__count__0 LARGE_INTEGER <>
__counter__pc__count__1 LARGE_INTEGER <>
__counter__pc__count__2 LARGE_INTEGER <>
__counter__pc__count__3 LARGE_INTEGER <>
__counter__loop__count__ dq 0
__counter__loop__counter__ dq 0
__counter__dq_count__ dq 0
.code
        ENDIF

        mov __counter__loop__count__, loopcount
        IFNB <priority>
invoke GetCurrentProcess
invoke SetPriorityClass, rax, priority
        ENDIF
invoke QueryPerformanceCounter, ADDR __counter__pc__count__0
        mov   __counter__loop__counter__, loopcount
@@: ;; Start an empty reference loop
        sub   __counter__loop__counter__, 1
        jnz   @B
invoke QueryPerformanceCounter, ADDR __counter__pc__count__1

mov   rax,__counter__pc__count__1.QuadPart
sub   rax,__counter__pc__count__0.QuadPart
mov   __counter__pc__count__delta.QuadPart,rax ;; Overhead count
       
mov   __counter__loop__counter__, loopcount
invoke QueryPerformanceCounter, ADDR __counter__pc__count__2
label: ;; Start test loop
        __counter__loop__label__ equ <label>
ENDM
;------------------------------------------------------------------------------
counter_end MACRO
        sub   __counter__loop__counter__, 1
        jnz   __counter__loop__label__

invoke QueryPerformanceCounter, ADDR __counter__pc__count__3

        invoke GetCurrentProcess
        invoke SetPriorityClass, rax, NORMAL_PRIORITY_CLASS

        IFDEF _EMMS
EMMS
        ENDIF

mov   rax,__counter__pc__count__3.QuadPart
sub   rax,__counter__pc__count__2.QuadPart
sub   rax,__counter__pc__count__delta.QuadPart
mov   __counter__pc__count__3.QuadPart,rax ;; count
       
finit
        fild  __counter__pc__count__3.QuadPart
        fild  __counter__loop__count__
        fdiv
        fistp __counter__dq_count__

        mov   rax, __counter__dq_count__
ENDM
;------------------------------------------------------------------------------
  ; These two macros perform the grunt work involved in measuring the
  ; execution time in milliseconds for a specified number of loops
  ; through a block of code. These macros must be used in pairs, and
  ; the block of code must be placed in between the timer_begin and
  ; timer_end macro calls. The timer_end macro returns the elapsed
  ; milliseconds for the entire loop in RAX.
  ;
  ; These macros utilize the high-resolution performance counter.
  ; The return value will be zero if the high-resolution performance
  ; counter is not available.
  ;
  ; If your code is using MMX instructions and not executing an EMMS
  ; at the end of each MMX instruction sequence, defining the symbol
  ; _EMMS will cause the timer_end macro to insert an EMMS in front of
  ; the FPU instructions.
  ;
  ; The loopcount parameter should be set to a relatively high value to
  ; produce repeatable results.
  ;
  ; Note that setting the priority parameter to REALTIME_PRIORITY_CLASS
  ; involves some risk, as it will cause your process to preempt *all*
  ; other processes, including critical Windows processes. Setting the
  ; priority parameter to HIGH_PRIORITY_CLASS instead will significantly
  ; reduce the risk, and in most cases will produce very nearly the same
  ; result.
;------------------------------------------------------------------------------
timer_begin MACRO loopcount:REQ, priority
LOCAL label

        IFNDEF __timer__stuff__defined__
__timer__stuff__defined__ equ <1>
.data
__timer__pc__frequency__    LARGE_INTEGER <>
__timer__pc__count__delta   LARGE_INTEGER <>
__timer__pc__count__0       LARGE_INTEGER <>
__timer__pc__count__1       LARGE_INTEGER <>
__timer__pc__count__2       LARGE_INTEGER <>
__timer__pc__count__3       LARGE_INTEGER <>
__timer__loop__counter__    dq 0
__timer__dq_count__         dq 0
.code
        ENDIF

        invoke QueryPerformanceFrequency, ADDR __timer__pc__frequency__
        .if rax != 0

IFNB <priority>
invoke GetCurrentProcess
invoke SetPriorityClass, rax, priority
ENDIF

invoke QueryPerformanceCounter, ADDR __timer__pc__count__0

mov   __timer__loop__counter__, loopcount
@@: ;; Start an empty reference loop
sub   __timer__loop__counter__, 1
jnz   @B

invoke QueryPerformanceCounter, ADDR __timer__pc__count__1

mov   rax,__timer__pc__count__1.QuadPart
sub   rax, __timer__pc__count__0.QuadPart
mov   __timer__pc__count__delta.QuadPart,rax ;; Overhead count

invoke QueryPerformanceCounter, ADDR __timer__pc__count__2 ;; Start test count

mov __timer__loop__counter__, loopcount
label: ;; Start test loop
__timer__loop__label__ equ <label>
        .endif
ENDM
;------------------------------------------------------------------------------
timer_end MACRO
        sub   __timer__loop__counter__, 1
        jnz   __timer__loop__label__

        invoke QueryPerformanceFrequency, ADDR __timer__pc__frequency__
        .IF rax != 0
invoke QueryPerformanceCounter, ADDR __timer__pc__count__3

invoke GetCurrentProcess
invoke SetPriorityClass, rax, NORMAL_PRIORITY_CLASS

IFDEF _EMMS
EMMS
ENDIF

mov   rax,__timer__pc__count__3.QuadPart
sub   rax, __timer__pc__count__2.QuadPart
sub   rax, __timer__pc__count__delta.QuadPart
mov   __timer__pc__count__3.QuadPart,rax ;; count

finit

fild  __timer__pc__count__3.QuadPart
fild  __timer__pc__frequency__.QuadPart
fdiv
mov   __timer__dq_count__, 1000000 ;ns*1000000=ms
fild  __timer__dq_count__
fmul
fistp __timer__dq_count__
mov   rax, __timer__dq_count__
        .ELSE
xor   rax, rax        ;; No performance counter
        .ENDIF
ENDM
;------------------------------------------------------------------------------
SpinUp MACRO
invoke Sleep, 1
mov rax, 100000000 ; tell the CPU it's needed
.Repeat
dec rax
.Until Sign?
ENDM
;¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
i modify the "counter_begin 10, HIGH_PRIORITY_CLASS" into "counter_begin 20, HIGH_PRIORITY_CLASS", there has no distinction.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 04, 2024, 04:56:06 PM
Thank you six_L,  :thumbsup:

Excellent!
That's better than ever. :greenclp:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 04, 2024, 05:39:26 PM
Hi,Lingo
Quote00001237 clock cycles, (Ray_AnyToAny)x1000, OutPut: 987654321098
00000297 clock cycles, (lingo_i2a64)x1000, OutPut: 987654321098
00000458 clock cycles, (Roberts_dqtoa)x1000, OutPut: 987654321098
00000319 clock cycles, (jimg_qword2ascii)x1000, OutPut: 98765431::98
jimg_qword2ascii:
;db 2 dup (?) ; dummy spacer to align instructions best
qword2ascii proc
test rax,rax
jnz @f
mov word ptr [rdi],30h
ret
;@@:
; jns @f
; mov byte ptr [rdi],'-'
; neg rax
; add rdi, 1
@@:
mov rcx,1999999999999999h   ; 1/10
push rsi
mov rsi,0
.repeat
push rbx ; save rbx, or a digit
inc rsi
mov rbx,rax         ; save original
mul rcx             ; 19999999h = 1/10
mov rax,rdx         ; save answer
lea rdx,[rdx*4+rdx-18h] ; *5  also subtract out half of '0' (24) to convert last byte to ascii  ala lingo
add rdx,rdx         ; *10    double it, so full value of '0' is covered.  so -48 extra to dl
sub rbx,rdx         ; subtract from original gives remainder digit  -(-48) =+48=+'0' bl now in ascii
test rax,rax ; are we done?
.until zero?
.repeat
mov [rdi],bl
inc rdi
pop rbx ; restore rbx, or get next digit
dec rsi
.until zero?
pop rsi
ret

qword2ascii  endp
but the jimg_qword2ascii is not right result. maybe i translated errorly.

regard
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 04, 2024, 07:53:29 PM
Quote from: lingo on April 04, 2024, 03:08:53 PMa very stupid thief, ha-ha-hah... :badgrin:  :badgrin:  :badgrin:  :skrewy:

See Bravo, Lingo! (https://masm32.com/board/index.php?topic=11831.0) in the Colosseum ;-)
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 04, 2024, 10:31:14 PM
Quote from: jimg on April 04, 2024, 12:10:44 PMAnd a final cleanup.

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
Averages:
4638    cycles for dwtoa
2986    cycles for dw2str
3225    cycles for jgnorecurse
3042    cycles for dw2$X

Averages:
4616    cycles for dwtoa
2988    cycles for dw2str
3232    cycles for jgnorecurse
3042    cycles for dw2$X

Averages:
4608    cycles for dwtoa
2982    cycles for dw2str
3210    cycles for jgnorecurse
3026    cycles for dw2$X

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Averages:
7034    cycles for dwtoa
5718    cycles for dw2str
4764    cycles for jgnorecurse
6172    cycles for dw2$X

Averages:
7232    cycles for dwtoa
5883    cycles for dw2str
4932    cycles for jgnorecurse
6285    cycles for dw2$X

Averages:
7382    cycles for dwtoa
6289    cycles for dw2str
5239    cycles for jgnorecurse
6760    cycles for dw2$X

Looks like an Intel vs AMD problem...
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 05, 2024, 12:32:01 AM
That is a significant and surprising difference.   
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 05, 2024, 01:44:25 AM
Jochen-

Would you check this out if you get a chance?

This is the test program but with a conditional compile around test H.  On line 462 you should find

dotest=2  ; 1=test without DW2$X, else test with DS2$X

if dotest eq 1


Case 1 is just a copy of my routine for TestH.
Case 2 is your DS2$X routine.

When I run, my routine runs 120 cycles faster for case 1 than case 2.  This makes no sense to me, but it is consistent.  Does this happen on AMD also?
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 05, 2024, 02:15:48 AM
Quote from: jimg on April 05, 2024, 01:44:25 AMWould you check this out if you get a chance?

Averages:
4552    cycles for dwtoa
3041    cycles for dw2str
3176    cycles for jgnorecurse
2993    cycles for dw2$X

Averages:
4450    cycles for dwtoa
2845    cycles for dw2str
3072    cycles for jgnorecurse
3072    cycles for jgnorecurse2

Timings are not very consistent :rolleyes:

Same but with TimerLoops = 100000 (the default - you set it very low at 1000):

Averages:
4480    cycles for dwtoa
2870    cycles for dw2str
3100    cycles for jgnorecurse
3099    cycles for jgnorecurse2

Averages:
4597    cycles for dwtoa
2968    cycles for dw2str
3227    cycles for jgnorecurse
3016    cycles for dw2$X

Hmmmm...
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 05, 2024, 02:25:13 AM
Looks like it ran an extra 100 cycles on the first test, and an extra 127 on the second test, so you are seeing the same effect.  Strange.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: zedd on April 05, 2024, 03:04:00 AM
From reply #106

Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)
61      bytes for other

9439    cycles for 100 * dwtoa
7928    cycles for 100 * dw2str
7370    cycles for 100 * jgnorecurse
7980    cycles for 100 * dw2$X

9527    cycles for 100 * dwtoa
8051    cycles for 100 * dw2str
7511    cycles for 100 * jgnorecurse
7954    cycles for 100 * dw2$X

9533    cycles for 100 * dwtoa
8081    cycles for 100 * dw2str
7475    cycles for 100 * jgnorecurse
7945    cycles for 100 * dw2$X

9567    cycles for 100 * dwtoa
8053    cycles for 100 * dw2str
7472    cycles for 100 * jgnorecurse
7942    cycles for 100 * dw2$X

Averages:
9530    cycles for dwtoa
8052    cycles for dw2str
7474    cycles for jgnorecurse
7950    cycles for dw2$X

20      bytes for dwtoa
82      bytes for dw2str
80      bytes for jgnorecurse
106    bytes for dw2$X

dwtoa                                  -123456789
dw2str                                  -123456789
jgnorecurse                            -123456789
dw2$X                                  -123456789

--- ok ---
:smiley:

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 05, 2024, 03:25:12 AM
Hi,Lingo
Quotebut the jimg_qword2ascii is not right result. maybe i translated errorly.
1) i know that's why. we can't use the 01999,9999,9999,9999h(1/10),must use the 0ccc,cccc,cccc,cccdh(8*1/10).
otherwise, when a few zero in qword, the qword2ascii will be failed.
2) you use the "little end" to save the result. that eliminate the unnecessary codes.
3) {lea rdx,[rdx*4+rdx-18h]}, We really can't deliberate it. That's very nice.

i learned a lot skills from you.
Thank you very much!

regard

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 05, 2024, 04:13:46 AM
Thanks for that.  I was just getting around to testing.   So for now, back to the old way.


.DATA?
align 4
resultJim        db 16 dup(?)
.CODE
alignProc
TestG_s:
db 2 dup (?) ; dummy spacer to align instructions best
dword2ascii:  ; setup
    test eax,eax
    jnz @f
    mov word ptr [edi],30h
    ret
@@: jns @f
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
@@: mov ecx,0CCCCCCCDh  ; 8 * 1/10
    push esi
    mov esi,0
    .repeat
      push ebx            ; save ebx, or a digit
      inc esi
      mov ebx,eax         ; save original
      mul ecx             ; 0CCCCCCCDh =8 * 1/10
      shr edx,3           ; take out the factor of 8
      mov eax,edx         ; save answer
      lea edx,[edx*4+edx-24] ; *5  also subtract out half of '0' (24) to convert last byte to ascii
      add edx,edx         ; *10    double it, so full value of '0' is covered.  so -48 extra to dl
      sub ebx,edx         ; subtract from original gives remainder digit  -(-48) =+48=+'0' bl now in ascii
      test eax,eax        ; are we done?
    .until zero?
    .repeat
      mov [edi],bl
      inc edi
      pop ebx      ; restore ebx, or get next digit
      dec esi
    .until zero?
    pop esi
    ret
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 05, 2024, 04:30:50 AM
After a little testing, it turns out that two bits was enough, i.e. could use 66666667h to multiply, and shr 2, but no gain in speed so I'll leave it at 3 bits ( *8) to be compatible with everyone else.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 05, 2024, 04:52:03 AM
Hi, jimg
Quoteuse 66666667h to multiply
This is a significant improvement.
Quote00000374 clock cycles, (lingo_i2a64)x1000, OutPut: 98765432109876  ;use 8*1/10
00000360 clock cycles, (lingo_i2a64+)x1000, OutPut: 98765432109876 ;use 4*1/10
Thank you very much!

regard
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 05, 2024, 05:06:11 AM
it shouldn't really make a difference as you have to shift anyway.  And I tested only 32 bit numbers, may not be true for 64 bit.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 05, 2024, 09:54:20 AM
Quote from: jimg on April 05, 2024, 04:30:50 AMAfter a little testing, it turns out that two bits was enough, i.e. could use 66666667h to multiply, and shr 2, but no gain in speed so I'll leave it at 3 bits ( *8) to be compatible with everyone else.

66666667h/2 and shr 1 would be a tick faster. The sh* reg32, 1 instructions are faster than shifts with a counter higher than one. Unfortunately, that limits the range of valid multiplies - try the numbers above 1,073,741,824 :cool:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 06, 2024, 03:05:42 AM
Hi,jj2007
Quote66666667h/2 and shr 1 would be a tick faster.
:cool:
This ia an another improvement.
Could you test the speed of below codes?
Ldword2ascii proc uses ebx esi edi dwVaule:DWORD,pOutBuf:DWORD
    local    nFlag:BOOL
   
    mov    nFlag,FALSE
    mov    eax,dwVaule
    mov    edi,pOutBuf
    lea    esi,[edi+16]
    test    eax,eax
    jnz    @f
    mov    word ptr [esi],30h
    jmp    @exit
@@:
    jns    @f
    mov    nFlag,TRUE
    neg    eax
@@:
    mov    ecx,033333334h            ; 2*(1/10)
    .repeat
        mov    ebx,eax            ; save original
        mul    ecx            ; 033333334h = 2*(1/10)
        shr    edx,1            ; /2
        mov    edi,edx            ; edx = remainder of eax/10
        lea    edx,[edx*4+edx-18h]    ; *5  also subtract out half of '0' (24) to convert last byte to ascii  ala lingo
        lea    edx,[edx+edx]        ; *10    double it, so full value of '0' is covered.  so -48 extra to dl
        sub    ebx,edx            ; subtract from original gives remainder digit  -(-48) =+48=+'0' bl now in ascii
        sub    esi,1
        mov    [esi],bl
        mov    eax,edi            ; save answer
        test    eax,eax            ; are we done?
    .until zero?
    .if    nFlag == TRUE
        sub    esi, 1
        mov    byte ptr [esi],'-'
    .endif
@exit:   
    mov    eax,esi                ;eax output addr in pOutBuf

    ret

Ldword2ascii  endp

local Outbuf[64] :BYTE

    invoke    RtlZeroMemory,addr Outbuf,sizeof Outbuf
        invoke    Ldword2ascii, -9076541,addr Outbuf
        invoke    MessageBox,0,eax,0,0
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jimg on April 06, 2024, 03:14:24 AM
This is it for me.  This is the all up proc.  Overall, about a 10% improvement over dwtoa (dwtoa averaged 72 cycles, dword2ascii averaged 64 cycles). 



align 16
dword2ascii proc uses esi edi, Value,buff
   ; converts 32 bit signed integer to ascii
   ; Value to be converted
   ; buff = address of where to store the results
    mov eax,Value
    mov edi,buff
    test eax,eax
    jnz @f
    mov word ptr [edi],30h
    ret
@@: jns @f
    mov byte ptr [edi],'-'
    neg eax 
    add edi, 1
@@: mov ecx,66666667h    ; 4 * 1/10
    push esi
    xor esi,esi
    .repeat
      push ebx           ; save ebx, or a digit
      inc esi
      mov ebx,eax        ; save original
      mul ecx            ; 66666667h = 4 * 1/10
      shr edx,2          ; take out the factor of 4     
      mov eax,edx        ; save answer
      add edx,edx        ; *2
      lea edx,[edx*4+edx-'0'] ; *10  also subtract out '0' to convert last byte to ascii
      sub ebx,edx        ; subtract from original gives remainder, digit - (-'0') = +'0' bl now in ascii
      test eax,eax       ; are we done?
    .until zero?
    .repeat
      mov [edi],bl
      inc edi
      pop ebx      ; restore ebx, or get next digit
      dec esi
    .until zero?
    mov byte ptr [edi],0
    pop esi
    ret
dword2ascii endp

Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 06, 2024, 03:27:57 AM
Hi six_L,

It is a new version which return length of the string in rcx.

; qword to string - Convert a qword into string
; IN:  RAX = qword
;        RCX = address of location to store string
; OUT: RAX = new start of the string in the buffer
           RCX = length of the string
align 16
db 8 Dup (90h)

q2a64         proc
lea  r9,  [rcx+24]                       ; lpBuffer
mov  r8,  0cccccccccccccccdh             ; 8 * 1/10
mov  byte ptr[r9],0
mov  r10, r9
mov  rcx, rax
@@:
mul  r8
shr  rdx, 3    ; : 8
sub  r9,  1
lea  rax, [rdx*4+rdx]    ; * 5
neg  rax
lea  rcx, [rcx+2*rax+30h]    ; * 2 +30h
mov  rax, rdx
mov  [r9],cl
mov  rcx, rdx
test eax, eax
jnz  @b
mov  rcx, r10
mov  rax, r9            ; return in RAX new starting address of tne string of the buffer
sub  rcx, r9    ; return in RCX length of the string
                ret
q2a64           endp
Note:
The code for the minus sign and leading zeros must be kept out of the algorithm.
It's not normal to bloat the algorithm with 99.99% unusable code.

I am curious when the stupid thief will come back...ha,ha :skrewy:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 06, 2024, 06:15:25 AM
Hi,Lingo
This is the tested result.
Quote00005214 clock cycles, (Roberts_dqtoa)x10000, OutPut: 98765432109876
00004638 clock cycles, (Roberts_Lqword2ascii)x10000, OutPut: 98765432109876
00014990 clock cycles, (Ray_AnyToAny)x10000, OutPut: 98765432109876
00004458 clock cycles, (lingo_i2a64_1)x10000, OutPut: 98765432109876
00003845 clock cycles, (lingo_q2a64_2)x10000, OutPut: 98765432109876
the attachment is the tested exe.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 06, 2024, 09:21:46 AM
Thank you six_L, :thumbsup:

Excellent results!
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 06, 2024, 09:50:42 AM
Quote from: jimg on April 04, 2024, 03:42:13 PMAgain, my apologies for not crediting the correct person.

Jim, as described here, you credited the right person (https://masm32.com/board/index.php?topic=11831.0). Btw I wouldn't call Lingo a thief just because he used one of my ideas, that's not my style. Actually, I feel flattered that our great genius, the greatest coder since Adam & Eve, uses my ideas.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 06, 2024, 11:23:08 AM
Hi,All

Note to all:
If anyone wants to argue, hurry up and argue hotly. Actually, there will be a few time left. because our member NoCforMe forecasted "No planet, Zero bytes at 2030".

I quite approve of his view. Since the World Thief Biden forgot the UN's rule which the five permanent members of the UN cannot attack each other.

Good Luck to all in those remaining time.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 06, 2024, 11:24:49 AM
Like I say, enjoy the ride while it lasts!
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 06, 2024, 11:44:28 AM
Thank you six_L,

I was raised not to argue with elderly people with mental and/or physical disabilities.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: stoo23 on April 07, 2024, 03:26:58 AM
With specific regard to certain statements, suggestions and accusations made here recently, may I Politely and Respectfully refer you ALL to this:
A Reiteration of certain forum rules (https://masm32.com/board/index.php?topic=11836.0)
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 07, 2024, 08:52:16 AM
Thanks, Stoo :thup:

Back to the topic: may I have some timings, please?

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4603    cycles for dwtoa
2954    cycles for dw2str
2212    cycles for dw2$JJ
3248    cycles for jgnorecurse
2957    cycles for dw2$X
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: ahsat on April 07, 2024, 09:31:20 AM
Quote from: jj2007 on April 07, 2024, 08:52:16 AMBack to the topic: may I have some timings, please?

Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (SSE4)

3144   cycles for 100 * dwtoa
2625   cycles for 100 * dw2str
1474   cycles for 100 * dw2$JJ
2276   cycles for 100 * jgnorecurse
2810   cycles for 100 * dw2$X

3243   cycles for 100 * dwtoa
2634   cycles for 100 * dw2str
1514   cycles for 100 * dw2$JJ
2338   cycles for 100 * jgnorecurse
2880   cycles for 100 * dw2$X

3226   cycles for 100 * dwtoa
2704   cycles for 100 * dw2str
1605   cycles for 100 * dw2$JJ
2359   cycles for 100 * jgnorecurse
2861   cycles for 100 * dw2$X

3275   cycles for 100 * dwtoa
2686   cycles for 100 * dw2str
1528   cycles for 100 * dw2$JJ
2388   cycles for 100 * jgnorecurse
2882   cycles for 100 * dw2$X

Averages:
3234   cycles for dwtoa
2660   cycles for dw2str
1521   cycles for dw2$JJ
2348   cycles for jgnorecurse
2870   cycles for dw2$X

20   bytes for dwtoa
82   bytes for dw2str
94   bytes for dw2$JJ
76   bytes for jgnorecurse
106   bytes for dw2$X

dwtoa-123456789
dw2str-123456789
dw2$JJ-123456789
jgnorecurse-123456789
dw2$X-123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 07, 2024, 10:11:24 AM
13th Gen Intel(R) Core(TM) i9-13900KF (SSE4)

2180    cycles for 100 * dwtoa
1748    cycles for 100 * dw2str
905     cycles for 100 * dw2$JJ
1798    cycles for 100 * jgnorecurse
2090    cycles for 100 * dw2$X

2227    cycles for 100 * dwtoa
1799    cycles for 100 * dw2str
995     cycles for 100 * dw2$JJ
1843    cycles for 100 * jgnorecurse
2088    cycles for 100 * dw2$X

2249    cycles for 100 * dwtoa
1796    cycles for 100 * dw2str
957     cycles for 100 * dw2$JJ
1815    cycles for 100 * jgnorecurse
2056    cycles for 100 * dw2$X

2241    cycles for 100 * dwtoa
1790    cycles for 100 * dw2str
958     cycles for 100 * dw2$JJ
1814    cycles for 100 * jgnorecurse
2061    cycles for 100 * dw2$X

Averages:
2234    cycles for dwtoa
1793    cycles for dw2str
958     cycles for dw2$JJ
1814    cycles for jgnorecurse
2074    cycles for dw2$X

20      bytes for dwtoa
82      bytes for dw2str
94      bytes for dw2$JJ
76      bytes for jgnorecurse
106     bytes for dw2$X

dwtoa                                   -123456789
dw2str                                  -123456789
dw2$JJ                                  -123456789
jgnorecurse                             -123456789
dw2$X                                   -123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: NoCforMe on April 07, 2024, 10:37:40 AM
And the winner is ... dw2$JJ by a length!

I'm sure this'll be great news to all coders who use this to display user output ... wow, I saved a whole 800 cycles!
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 07, 2024, 11:02:06 AM
Quote from: NoCforMe on April 07, 2024, 10:37:40 AMAnd the winner is ... dw2$JJ by a length!

Not yet. The algo works fine for 123456789, less so for smaller numbers. Will be fixed tomorrow ;-)
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 07, 2024, 11:17:13 AM
Quote from: NoCforMe on April 07, 2024, 10:37:40 AMAnd the winner is ... dw2$JJ by a length!

I'm sure this'll be great news to all coders who use this to display user output ... wow, I saved a whole 800 cycles!
For some of us that aren't just hobbyists, having a user sitting twiddling their thumbs is not a good look.
After all, this is The Laboratory
QuoteThe Laboratory
Algorithm and code design research laboratory. This is the place to post assembler algorithms and code design for discussion, optimisation and any other improvements that can be made on it. Post code here to be beaten to death to make it better, smaller, faster or more powerful.
Maybe you should hang around The Campus :biggrin:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 07, 2024, 07:55:18 PM
The show must go on :thumbsup:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4761    cycles for dwtoa
3044    cycles for dw2str
2122    cycles for MbDw2Str  <<<<<<<<<<<<<<<<<<<< tested for the full range -1...0
3364    cycles for jgnorecurse
2948    cycles for dw2$X

The new algo uses a table, as suggested initially by Ray alias ahsat :thumbsup:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: LiaoMi on April 07, 2024, 08:04:51 PM
Quote from: jj2007 on April 07, 2024, 08:52:16 AMThanks, Stoo :thup:

Back to the topic: may I have some timings, please?

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4603    cycles for dwtoa
2954    cycles for dw2str
2212    cycles for dw2$JJ
3248    cycles for jgnorecurse
2957    cycles for dw2$X

13th Gen Intel(R) Core(TM) i9-13980HX (SSE4)

1794    cycles for 100 * dwtoa
1416    cycles for 100 * dw2str
751     cycles for 100 * dw2$JJ
1442    cycles for 100 * jgnorecurse
1707    cycles for 100 * dw2$X

1845    cycles for 100 * dwtoa
1464    cycles for 100 * dw2str
793     cycles for 100 * dw2$JJ
1496    cycles for 100 * jgnorecurse
1710    cycles for 100 * dw2$X

1839    cycles for 100 * dwtoa
1466    cycles for 100 * dw2str
806     cycles for 100 * dw2$JJ
1509    cycles for 100 * jgnorecurse
1686    cycles for 100 * dw2$X

1837    cycles for 100 * dwtoa
1466    cycles for 100 * dw2str
799     cycles for 100 * dw2$JJ
1612    cycles for 100 * jgnorecurse
1693    cycles for 100 * dw2$X

Averages:
1838    cycles for dwtoa
1465    cycles for dw2str
796     cycles for dw2$JJ
1502    cycles for jgnorecurse
1700    cycles for dw2$X

20      bytes for dwtoa
82      bytes for dw2str
94      bytes for dw2$JJ
76      bytes for jgnorecurse
106     bytes for dw2$X

dwtoa                                   -123456789
dw2str                                  -123456789
dw2$JJ                                  -123456789
jgnorecurse                             -123456789
dw2$X                                   -123456789

--- ok ---
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: LiaoMi on April 07, 2024, 08:06:59 PM
Quote from: jj2007 on April 07, 2024, 07:55:18 PMThe show must go on :thumbsup:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4761    cycles for dwtoa
3044    cycles for dw2str
2122    cycles for MbDw2Str  <<<<<<<<<<<<<<<<<<<< tested for the full range -1...0
3364    cycles for jgnorecurse
2948    cycles for dw2$X

The new algo uses a table, as suggested initially by Ray alias ahsat :thumbsup:

13th Gen Intel(R) Core(TM) i9-13980HX (SSE4)

1782    cycles for 100 * dwtoa
1383    cycles for 100 * dw2str
997     cycles for 100 * MbDw2Str
1400    cycles for 100 * jgnorecurse
1650    cycles for 100 * dw2$X

1830    cycles for 100 * dwtoa
1442    cycles for 100 * dw2str
1052    cycles for 100 * MbDw2Str
1440    cycles for 100 * jgnorecurse
1657    cycles for 100 * dw2$X

1846    cycles for 100 * dwtoa
1442    cycles for 100 * dw2str
1038    cycles for 100 * MbDw2Str
1447    cycles for 100 * jgnorecurse
1650    cycles for 100 * dw2$X

1833    cycles for 100 * dwtoa
1446    cycles for 100 * dw2str
1000    cycles for 100 * MbDw2Str
1437    cycles for 100 * jgnorecurse
1647    cycles for 100 * dw2$X

Averages:
1832    cycles for dwtoa
1442    cycles for dw2str
1019    cycles for MbDw2Str
1438    cycles for jgnorecurse
1650    cycles for dw2$X

20      bytes for dwtoa
82      bytes for dw2str
102     bytes for MbDw2Str
76      bytes for jgnorecurse
106     bytes for dw2$X

dwtoa                                   123456789
dw2str                                  123456789
MbDw2Str                                123456789
jgnorecurse                             123456789
dw2$X                                   123456789

--- ok ---
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 07, 2024, 08:12:53 PM
Quote from: LiaoMi on April 07, 2024, 08:06:59 PM13th Gen Intel(R) Core(TM) i9-13980HX (SSE4)

Thanks, LiaoMi - so even on Intel MbDw2Str performs well :thumbsup:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 08, 2024, 02:55:33 AM
Special edition for JimG - I had activated an older version of his algo. Here is the correct one :thup:

Sorry for that :cool:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 08, 2024, 07:33:28 AM
56.7% faster than dwtoa is ok, right?

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4618    cycles for dwtoa
2938    cycles for dw2str
2000    cycles for MbDw2Str
3972    cycles for jgnorecurse
2968    cycles for dw2$X

20      bytes for dwtoa
82      bytes for dw2str
94      bytes for MbDw2Str
104    bytes for jgnorecurse
106    bytes for dw2$X

dwtoa                                  -123456789
dw2str                                  -123456789
MbDw2Str                                -123456789
jgnorecurse                            -123456789
dw2$X                                  -123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 09, 2024, 07:37:47 PM
Hi,Lingo
In my memory, jj2007 always loses in the speed racing. But this time, He eventually won once.

Quote00004502 clock cycles, (Roberts_dqtoa)x10000, OutPut: 98765432109876
00004291 clock cycles, (Roberts_Lqword2ascii)x10000, OutPut: 98765432109876
00014634 clock cycles, (Ray_AnyToAny)x10000, OutPut: 98765432109876
00004312 clock cycles, (Lingo_i2a64_1)x10000, OutPut: 98765432109876
00003799 clock cycles, (Lingo_q2a64_2)x10000, OutPut: 98765432109876
00002310 clock cycles, (jj2007_MbDq2Str)x10000, OutPut: 98765432109876

His method:
1) div 100, handle 2 digits at once.
2) query the table includes 100dw digits

; init:
    dw2aTable    dw 100 dup (0)

    xor    rbx,rbx
    lea    rdi,dw2aTable
    .repeat
        invoke    RtlZeroMemory,addr tmp,sizeof tmp
        invoke    wsprintf,addr tmp,CStr("%02i"),rbx
        lea    rcx,tmp
        mov    ax,[rcx]
        mov    [rdi+2*rbx],ax
        inc    rbx
    .until rbx == 100
...
MbDq2Str proc uses rsi rdi rbx dqValue:QWORD,OutBuffer:QWORD

    mov    rax, dqValue
    ;mov    r8,  rax    ;for sign
    lea    rsi, dw2aTable
    mov    rdi, OutBuffer
    lea    rdi, [rdi+24]
    test    rax, rax
    jnz    @F
    mov    word ptr [rdi],30h
    jmp    @Exit
;@@:   
    ;jns    @F
    ;neg    rax
@@:   
    mov    byte ptr [rdi], 0            ; terminate the string
    dec    rdi
    mov    rcx, 051eb851eb851eb85h            ; 051EB851EB851EB85h=32*1/100=4*(0cccccccccccccccdh=8*1/10)/10   
    .while rax > 0
        xor    rdx, rdx
        mov    rbx, rax            ; number
        mul    rcx                ; *32x1/100
        shr    rdx, 5                ; /32
        mov    rax, rdx
        imul    rdx, rdx, 100
        sub    rbx, rdx
        movzx    rdx, word ptr [rsi+2*rbx]    ;00-99,word table
        sub    rdi, 2
        mov    [rdi], dx
    .endw
    .if    rbx < 10
        mov byte ptr [rdi], 0            ;figure the "0" in front number out
        inc rdi
    .endif
    ;test    r8, r8
    ;.if Sign?
    ;    dec    rdi
    ;    mov    byte ptr [rdi], "-"
    ;.endif
@Exit:   
    xchg    rax, rdi
    ret
MbDq2Str endp
Edit: + exe attachment
regard
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 10, 2024, 12:57:07 AM
Thank you six_L, :thumbsup:

Do you include the time to complete the table?
Can I have your test?
I want to run your test on my computer.
Title: move the post
Post by: six_L on April 10, 2024, 02:34:46 AM
Hi, six_L
Have 'moved' / 'merged' your post into the requested Dword to ascii thread in The Laboratory + the attachment.

Admin'
:thumbsup:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 10, 2024, 08:39:51 AM
I don't really like 64-bit code, but I guess there is enough interest for a 64-bit version:
843     ms for 50Mio*MbDw2Str   result  $rdi    1234567890123456789
2172    ms for 100Mio*Roberts   result  $rdi    1234567890123456789
15406   ms for 100Mio*sprintf   result  $rdi    1234567890123456789

"Roberts" is the code from reply #82 (https://masm32.com/board/index.php?topic=11820.msg128574#msg128574). Unfortunately, it doesn't handle negative numbers, which distorts the numbers a bit (handling negative numbers costs some cycles).
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 10, 2024, 02:07:09 PM
Hi,Lingo
QuoteDo you include the time to complete the table?
No, the digit's table is outside the MbDq2Str proc. it is being created at the testing thread startup.

Hi,jj2007
QuoteI don't really like 64-bit code.
the 32bit codes are working in VM of 64bit OS. Perhaps your tested result was thingemyed.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 10, 2024, 04:42:10 PM
Thank you six_L,

I don't believe there is a masochist who would fill in the table before using this "algo"...?! :badgrin:
It's 32 bit slow and bloated garbage...
Any 64 bit code will be shorter and faster.
You can try the same with 20 rows less code:
align 16
db 8 dup(90h)
Dq2Str proc
       lea  r9,  dw2aTable
       mov  r10, rax
       mov  r8,  51EB851EB851EB85h
       add  rcx, 18h
@@:
       mul  r8
       shr  rdx, 5           ; :32
       sub  rcx, 2
       mov  rax, rdx
       imul rdx, 64h         ; *100
       sub  r10, rdx
       mov  dx,  [r9+r10*2]  ; r9 = addr dw2aTable
       mov  r10, rax
       mov  [rcx], dx
       cmp  rax, 0
       ja   @b
       cmp  rax, 0
       ja   @f
       cmp  rbx, 0Ah
       jnb  @Exit
@@:
       mov  byte ptr [rcx], 0
       inc  rcx
@Exit:
       mov  rax, rcx
       ret     ; minus 20 rows
Dq2Str endp
QuoteHis method:
1) div 100, handle 2 digits at once.
2) query the table includes 100dw digits
Next version will:
1) div 1000, handle 4 digits at once.
2) query the table includes ?? dd digits...ha,ha,hah  :badgrin:  :badgrin:  :skrewy:
QuotePerhaps your tested result was thingemyed.
Don't believe his manipulated "tests". The reason for that is that there is no one from this forum who can compile his test "sources" normally... :undecided:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: sinsi on April 10, 2024, 05:51:54 PM
Quote from: lingo on April 10, 2024, 04:42:10 PMThe reason for that is that there is no one from this forum who can compile his test "sources" normally... :undecided:
Some of us can even understand the syntax  :biggrin:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 10, 2024, 06:52:17 PM
Quote from: six_L on April 10, 2024, 02:07:09 PMthe 32bit codes are working in VM of 64bit OS.

You should try to understand how 32-bit code runs on a 64-bit OS.

QuotePerhaps your tested result was thingemyed.

All benchmarks have little problems. Mine have been tested hundreds of times in the Lab, and many members of this forum have contributed to make them reliable. Btw most of us here understand that including the time to build the table has no measurable effect on a test that runs with a Million iterations. We know our stuff here in the Lab.

0 µs for building the table

Sometimes it's even one microsecond :bgrin:

Quote from: sinsi on April 10, 2024, 05:51:54 PMSome of us can even understand the syntax  :biggrin:

Right :biggrin:

And some even know how to open a Rich Text Format file, and happily build my sources:

Quote from: jimg on April 02, 2024, 07:43:17 AMUsing your version 3 and enabling test F I get
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 10, 2024, 07:13:00 PM
Quote from: lingo on April 10, 2024, 04:42:10 PMYou can try the same with 20 rows less code:
align 16
db 8 dup(90h)
Dq2Str proc
       lea  r9,  dw2aTable
       mov  r10, rax
       mov  r8,  51EB851EB851EB85h
       add  rcx, 18h
@@:
       mul  r8

To those who think that "Lingo" is just an impostor: No, he is the real Lingo. His code is badly copied, and of course, it crashes (he also forgets to handle negative numbers, but that's only minor criticism).
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 11, 2024, 01:46:24 AM
Hi,Lingo
1)
You have surpassed jj2007 again.
Quote00004451 clock cycles, (Roberts_dqtoa)x10000, OutPut: 98765432109876
00004603 clock cycles, (Roberts_Lqword2ascii)x10000, OutPut: 98765432109876
00014591 clock cycles, (Ray_AnyToAny)x10000, OutPut: 98765432109876
00004130 clock cycles, (Lingo_i2a64_1)x10000, OutPut: 98765432109876
00004036 clock cycles, (Lingo_q2a64_2)x10000, OutPut: 98765432109876
00002609 clock cycles, (jj2007_MbDq2Str)x10000, OutPut: 98765432109876
00002426 clock cycles, (Lingo3_Dq2Str)x10000, OutPut: 98765432109876

No the best codes, Only has the better codes. I hope the competition continues.
2)
; init:
    dw2aTable    dw 100 dup (0)

    xor    rbx,rbx
    lea    rdi,dw2aTable
    .repeat
        invoke    RtlZeroMemory,addr tmp,sizeof tmp
        invoke    wsprintf,addr tmp,CStr("%02i"),rbx
        lea    rcx,tmp
        mov    ax,[rcx]
        mov    [rdi+2*rbx],ax
        inc    rbx
    .until rbx == 100
...
QuoteI don't believe there is a masochist who would fill in the table before using this "algo"...?! :badgrin:
That's my codes for read easiely and verifying the correctness of his logic. You should blame me, not him.

regard.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: lingo on April 11, 2024, 02:29:45 AM
Hi six_L, :thumbsup:

QuoteYou should blame me, not him.
I cannot and do not blame anyone directly, neither you nor jj.
I only expressed my opinion about the algorithm and the code of the program.
I wonder why jj is interrupting OUR conversation and acting in such a nervous manner.

QuoteI hope the competition continues.
After these insults from jj, I lost the interest, sorry! :sad:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 11, 2024, 05:25:22 AM
The latest version. I had to tweak the code from our genius a little bit because a) it crashed and b) it overwrote the table, but now it works :thumbsup:

And indeed, Lingo's code is faster than mine :thumbsup:  :thumbsup:  :thumbsup:

I feel honoured that he used almost all my tricks :cool:

There are tiny little problems here and there, but hey, it's fast :thumbsup:

This program was assembled with UAsm64 in 64-bit format.
AMD Athlon Gold 3150U with Radeon Graphics
41      bytes for loadTable
140     bytes for MbDw2Str
-1
0
1
-1234567890123456789
1234567890123456789


Lingo: a) no negative numbers, b) leading zero, c) but only for numbers with odd number of digits

938     ms for 50Mio*MbDw2Str   result  $rdi    1234567890123456789
2187    ms for 50Mio*Roberts    result  $rdi    1234567890123456789
797     ms for 50Mio*Lingo's    result  $rdi    01234567890123456789
1532    ms for 5 Mio*sprintf    result  $rdi    1234567890123456789

718     ms for 50Mio*MbDw2Str   result  $rdi    123456789012345678
2032    ms for 50Mio*Roberts    result  $rdi    123456789012345678
687     ms for 50Mio*Lingo's    result  $rdi    123456789012345678
1484    ms for 5 Mio*sprintf    result  $rdi    123456789012345678

719     ms for 50Mio*MbDw2Str   result  $rdi    12345678901234567
1953    ms for 50Mio*Roberts    result  $rdi    12345678901234567
672     ms for 50Mio*Lingo's    result  $rdi    012345678901234567
1250    ms for 5 Mio*sprintf    result  $rdi    12345678901234567

906     ms for 50Mio*MbDw2Str   result  $rdi    -1234567890123456789
2297    ms for 50Mio*Roberts    result  $rdi    17212176183586094827
766     ms for 50Mio*Lingo's    result  $rdi    17212176183586094827
1531    ms for 5 Mio*sprintf    result  $rdi    -1234567890123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: LiaoMi on April 11, 2024, 05:46:11 PM
Quote from: jj2007 on April 11, 2024, 05:25:22 AMThe latest version. I had to tweak the code from our genius a little bit because a) it crashed and b) it overwrote the table, but now it works :thumbsup:

And indeed, Lingo's code is faster than mine :thumbsup:  :thumbsup:  :thumbsup:

I feel honoured that he used almost all my tricks :cool:

There are tiny little problems here and there, but hey, it's fast :thumbsup:

This program was assembled with UAsm64 in 64-bit format.
AMD Athlon Gold 3150U with Radeon Graphics
41      bytes for loadTable
140     bytes for MbDw2Str
-1
0
1
-1234567890123456789
1234567890123456789


Lingo: a) no negative numbers, b) leading zero, c) but only for numbers with odd number of digits

938     ms for 50Mio*MbDw2Str   result  $rdi    1234567890123456789
2187    ms for 50Mio*Roberts    result  $rdi    1234567890123456789
797     ms for 50Mio*Lingo's    result  $rdi    01234567890123456789
1532    ms for 5 Mio*sprintf    result  $rdi    1234567890123456789

718     ms for 50Mio*MbDw2Str   result  $rdi    123456789012345678
2032    ms for 50Mio*Roberts    result  $rdi    123456789012345678
687     ms for 50Mio*Lingo's    result  $rdi    123456789012345678
1484    ms for 5 Mio*sprintf    result  $rdi    123456789012345678

719     ms for 50Mio*MbDw2Str   result  $rdi    12345678901234567
1953    ms for 50Mio*Roberts    result  $rdi    12345678901234567
672     ms for 50Mio*Lingo's    result  $rdi    012345678901234567
1250    ms for 5 Mio*sprintf    result  $rdi    12345678901234567

906     ms for 50Mio*MbDw2Str   result  $rdi    -1234567890123456789
2297    ms for 50Mio*Roberts    result  $rdi    17212176183586094827
766     ms for 50Mio*Lingo's    result  $rdi    17212176183586094827
1531    ms for 5 Mio*sprintf    result  $rdi    -1234567890123456789

This program was assembled with UAsm64 in 64-bit format.
13th Gen Intel(R) Core(TM) i9-13980HX
41      bytes for loadTable
140     bytes for MbDw2Str
-1
0
1
-1234567890123456789
1234567890123456789


Lingo: a) no negative numbers, b) leading zero, c) but only for numbers with odd number of digits

313     ms for 50Mio*MbDw2Str   result  $rdi    1234567890123456789
734     ms for 50Mio*Roberts    result  $rdi    1234567890123456789
250     ms for 50Mio*Lingo's    result  $rdi    01234567890123456789
438     ms for 5 Mio*sprintf    result  $rdi    1234567890123456789

281     ms for 50Mio*MbDw2Str   result  $rdi    123456789012345678
687     ms for 50Mio*Roberts    result  $rdi    123456789012345678
219     ms for 50Mio*Lingo's    result  $rdi    123456789012345678
406     ms for 5 Mio*sprintf    result  $rdi    123456789012345678

297     ms for 50Mio*MbDw2Str   result  $rdi    12345678901234567
641     ms for 50Mio*Roberts    result  $rdi    12345678901234567
219     ms for 50Mio*Lingo's    result  $rdi    012345678901234567
391     ms for 5 Mio*sprintf    result  $rdi    12345678901234567

484     ms for 50Mio*MbDw2Str   result  $rdi    -1234567890123456789
765     ms for 50Mio*Roberts    result  $rdi    17212176183586094827
250     ms for 50Mio*Lingo's    result  $rdi    17212176183586094827
438     ms for 5 Mio*sprintf    result  $rdi    -1234567890123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 11, 2024, 06:46:09 PM
Thanks, LiaoMi :thup:

Your cpu beats mine by almost a factor three. That is a fine machine - I could become a bit envious :mrgreen:

I attach sources and executables for both 64- and 32-bit versions. Note it's the same source ("dual assembly"), there is just one option OPT_64 n on top, with n=0 or 1. My MbDw2Str has only two tiny conditionals on entry:
    if @64
        mov rax, rcx        ; xwValue
    else
        mov eax, xwValue
    endif
...
    if @64
        mov rcx, 051EB851EB851EB85h
    else
        mov ecx, 051EB851Fh
    endif

CRT sprintf requires a different format specifier:
    mov rcx, number
    if @64
        jinvoke sprintf, rdi, Chr$("%lld"), rcx
    else
        jinvoke sprintf, rdi, Chr$("%d"), rcx
    endif

The Roberts and Lingo versions cannot be converted to 32-bit because of their register usage. Latest timings:
a) 64-bit:
This program was assembled with UAsm64 in 64-bit format.
AMD Athlon Gold 3150U with Radeon Graphics
41      bytes for loadTable
111    bytes for MbDw2Str
-1      minus 1
0      zero
1      one
Lingo: a) no negative numbers, b) leading zero, c) but only for numbers with odd number of digits

828     ms for 50Mio*MbDw2Str   result  $rdi    1234567890123456789
2156    ms for 50Mio*Roberts    result  $rdi    1234567890123456789
782     ms for 50Mio*Lingo's    result  $rdi    01234567890123456789
1500    ms for 5 Mio*sprintf    result  $rdi    1234567890123456789

687     ms for 50Mio*MbDw2Str   result  $rdi    123456789012345678
1984    ms for 50Mio*Roberts    result  $rdi    123456789012345678
688     ms for 50Mio*Lingo's    result  $rdi    123456789012345678
1453    ms for 5 Mio*sprintf    result  $rdi    123456789012345678

703     ms for 50Mio*MbDw2Str   result  $rdi    12345678901234567
1953    ms for 50Mio*Roberts    result  $rdi    12345678901234567
672     ms for 50Mio*Lingo's    result  $rdi    012345678901234567
1235    ms for 5 Mio*sprintf    result  $rdi    12345678901234567

843     ms for 50Mio*MbDw2Str   result  $rdi    -1234567890123456789
2219    ms for 50Mio*Roberts    result  $rdi    17212176183586094827
766     ms for 50Mio*Lingo's    result  $rdi    17212176183586094827
1531    ms for 5 Mio*sprintf    result  $rdi    -1234567890123456789

b) 32-bit:
This program was assembled with UAsm64 in 32-bit format.
AMD Athlon Gold 3150U with Radeon Graphics
37      bytes for loadTable
86      bytes for MbDw2Str
-1      minus 1
0       zero
1       one
469     ms for 50Mio*MbDw2Str   result  $rdi    123456789
1078    ms for 5 Mio*sprintf    result  $rdi    123456789

328     ms for 50Mio*MbDw2Str   result  $rdi    12345678
1016    ms for 5 Mio*sprintf    result  $rdi    12345678

375     ms for 50Mio*MbDw2Str   result  $rdi    1234567
765     ms for 5 Mio*sprintf    result  $rdi    1234567

469     ms for 50Mio*MbDw2Str   result  $rdi    -123456789
1063    ms for 5 Mio*sprintf    result  $rdi    -123456789
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: six_L on April 11, 2024, 10:52:13 PM
Hi,Lingo

QuoteAfter these insults from jj, I lost the interest, sorry! :sad:
This is a scene that most of our members do not want to see. I respect your choice. It's you who exhibited us the power of assembly language and the dazzling skills.

jj2007 not only argued with you, but also with many members. such as hutch--/aw27/nidud/_japheth. Some outstanding members had left forever because of arguing. Arguing is his inherent attribute. Human beings need the biological diversity. i guess our forum also needs member diversity. you may ignore him.

regard
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 11, 2024, 11:41:49 PM
Quote from: six_L on April 11, 2024, 10:52:13 PMjj2007 not only argued with you, but also with many members. such as hutch--/aw27/nidud/_japheth.

Stop this nonsense, dear. Yes, I argued with some of them, but it was always civilised. With Hutch, I had a very good relationship.

Lingo comes here out of the blue, never supplied a valid email address to our administrators, wants to take over the forum, posts crappy code, most of which (badly) copied from mine, and then has the chuzpe to call me a thief.

This is all documented in Bravo, Lingo! (https://masm32.com/board/index.php?topic=11831.msg128620#msg128620)

I have no problem with Lingo using my code, although just using other registers to hide that sort of "reuse" is a bit fishy imho. I do have a problem, though, if he calls me a thief.

If I were the moderator, I would have kicked him out a long time ago.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: TimoVJL on April 11, 2024, 11:52:36 PM
Many of us hoped, that this site's old style vanish, but old dogs are, what they are.
I am only here, as i use Pelle's C and i need to know something about assembly, when i try to hunt bugs.
Hopefully this site users still have an open mind for programming.
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: jj2007 on April 12, 2024, 10:19:53 AM
I've added the fast algo to the MasmBasic library of 12 April 2024 (http://masm32.com/board/index.php?topic=94.0), in two flavours:

a) as Str$(number):
  Print Str$("Rect left/top/right/bottom=%i", [esi.RECT.left]), "/", Str$(MyRC.top), "/", Str$(MyRC.right), "/", Str$([esi.RECT.bottom])
Flexibel, and a tick faster than the Masm32 SDK str$().

b) invoke MbDw2Str, number: ultrafast but no concatenation as in a).
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: stoo23 on April 21, 2024, 08:00:14 AM
I'll try this Again

lingo

Can you Please Read and Respond to the PM's I have sent.
Thanking you in advance,
Stewart
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: zedd on June 06, 2024, 02:25:01 PM
Testing the laptop. As predicted elsewhere, not very high performance. Not even medium-performance.

64 bit



This program was assembled with UAsm64 in 64-bit format.
AMD A8-6410 APU with AMD Radeon R5 Graphics
41      bytes for loadTable
111     bytes for MbDw2Str
-1      minus 1
0       zero
1       one


Lingo: a) no negative numbers, b) leading zero, c) but only for numbers with odd
 number of digits

2762    ms for 50Mio*MbDw2Str   result  $rdi    1234567890123456789
5382    ms for 50Mio*Roberts    result  $rdi    1234567890123456789
2558    ms for 50Mio*Lingo's    result  $rdi    01234567890123456789
2449    ms for 5 Mio*sprintf    result  $rdi    1234567890123456789

2528    ms for 50Mio*MbDw2Str   result  $rdi    123456789012345678
4492    ms for 50Mio*Roberts    result  $rdi    123456789012345678
2309    ms for 50Mio*Lingo's    result  $rdi    123456789012345678
2356    ms for 5 Mio*sprintf    result  $rdi    123456789012345678

2558    ms for 50Mio*MbDw2Str   result  $rdi    12345678901234567
5008    ms for 50Mio*Roberts    result  $rdi    12345678901234567
2309    ms for 50Mio*Lingo's    result  $rdi    012345678901234567
2230    ms for 5 Mio*sprintf    result  $rdi    12345678901234567

2793    ms for 50Mio*MbDw2Str   result  $rdi    -1234567890123456789
4852    ms for 50Mio*Roberts    result  $rdi    17212176183586094827
2558    ms for 50Mio*Lingo's    result  $rdi    17212176183586094827
2558    ms for 5 Mio*sprintf    result  $rdi    -1234567890123456789


32 bit



This program was assembled with UAsm64 in 32-bit format.
AMD A8-6410 APU with AMD Radeon R5 Graphics
37      bytes for loadTable
86      bytes for MbDw2Str
-1      minus 1
0       zero
1       one
1092    ms for 50Mio*MbDw2Str   result  $rdi    123456789
1779    ms for 5 Mio*sprintf    result  $rdi    123456789

936     ms for 50Mio*MbDw2Str   result  $rdi    12345678
1622    ms for 5 Mio*sprintf    result  $rdi    12345678

936     ms for 50Mio*MbDw2Str   result  $rdi    1234567
1482    ms for 5 Mio*sprintf    result  $rdi    1234567

1139    ms for 50Mio*MbDw2Str   result  $rdi    -123456789
1841    ms for 5 Mio*sprintf    result  $rdi    -123456789


On the bright side, it does have 4 cores...  :undecided:
Title: Re: Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)
Post by: TimoVJL on June 06, 2024, 07:25:15 PM
With old PC


This program was assembled with UAsm64 in 32-bit format.
AMD Athlon(tm) II X2 220 Processor
37      bytes for loadTable
86      bytes for MbDw2Str
-1      minus 1
0       zero
1       one
967     ms for 50Mio*MbDw2Str   result  $rdi    123456789
1903    ms for 5 Mio*sprintf    result  $rdi    123456789

874     ms for 50Mio*MbDw2Str   result  $rdi    12345678
1638    ms for 5 Mio*sprintf    result  $rdi    12345678

780     ms for 50Mio*MbDw2Str   result  $rdi    1234567
1357    ms for 5 Mio*sprintf    result  $rdi    1234567

1092    ms for 50Mio*MbDw2Str   result  $rdi    -123456789
1763    ms for 5 Mio*sprintf    result  $rdi    -123456789



This program was assembled with UAsm64 in 64-bit format.
AMD Athlon(tm) II X2 220 Processor
41      bytes for loadTable
111     bytes for MbDw2Str
-1      minus 1
0       zero
1       one


Lingo: a) no negative numbers, b) leading zero, c) but only for numbers with odd
 number of digits

2075    ms for 50Mio*MbDw2Str   result  $rdi    1234567890123456789
4040    ms for 50Mio*Roberts    result  $rdi    1234567890123456789
1841    ms for 50Mio*Lingo's    result  $rdi    01234567890123456789
2792    ms for 5 Mio*sprintf    result  $rdi    1234567890123456789

1841    ms for 50Mio*MbDw2Str   result  $rdi    123456789012345678
3760    ms for 50Mio*Roberts    result  $rdi    123456789012345678
1669    ms for 50Mio*Lingo's    result  $rdi    123456789012345678
2590    ms for 5 Mio*sprintf    result  $rdi    123456789012345678

1840    ms for 50Mio*MbDw2Str   result  $rdi    12345678901234567
3604    ms for 50Mio*Roberts    result  $rdi    12345678901234567
1654    ms for 50Mio*Lingo's    result  $rdi    012345678901234567
2433    ms for 5 Mio*sprintf    result  $rdi    12345678901234567

2091    ms for 50Mio*MbDw2Str   result  $rdi    -1234567890123456789
4181    ms for 50Mio*Roberts    result  $rdi    17212176183586094827
1856    ms for 50Mio*Lingo's    result  $rdi    17212176183586094827
2824    ms for 5 Mio*sprintf    result  $rdi    -1234567890123456789