Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)

Started by jj2007, April 01, 2024, 03:42:50 AM


lingo

Hi six_L,

Here is a new version which returns the length of the string in RCX.

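; qword to string - convert an unsigned qword into a decimal string
; IN:  RAX = qword
;      RCX = address of the buffer (at least 25 bytes, see lea below)
; OUT: RAX = new start of the string in the buffer
;      RCX = length of the string
align 16
db 8 dup (90h)

q2a64   proc
    lea  r9,  [rcx+24]              ; r9 -> end of buffer (lpBuffer + 24)
    mov  r8,  0cccccccccccccccdh    ; 8 * 1/10, scaled by 2^64
    mov  byte ptr [r9], 0           ; zero terminator
    mov  r10, r9                    ; remember the end for the length
    mov  rcx, rax                   ; rcx = current value
@@:
    mul  r8                         ; rdx = high qword of value * magic
    shr  rdx, 3                     ; : 8, rdx = value / 10
    sub  r9,  1                     ; step back one character
    lea  rax, [rdx*4+rdx]           ; quotient * 5
    neg  rax
    lea  rcx, [rcx+2*rax+30h]       ; value - quotient*10 + 30h = ASCII digit
    mov  rax, rdx                   ; quotient is the next value for mul
    mov  [r9], cl                   ; store the digit
    mov  rcx, rdx
    test rax, rax                   ; all 64 bits: test eax, eax stops early when the quotient is a multiple of 2^32
    jnz  @b
    mov  rcx, r10
    mov  rax, r9                    ; return in RAX the new starting address of the string in the buffer
    sub  rcx, r9                    ; return in RCX the length of the string
    ret
q2a64   endp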
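For readers who prefer to follow the arithmetic in a higher-level form, here is a minimal C sketch of the same idea: take the high 64 bits of value * 0CCCCCCCCCCCCCCCDh, shift right by 3 to get value / 10, and write the digits backwards from offset 24 of the buffer. The names mulhi64 and q2a_sketch are made up for this illustration and are not part of the posted code; the sketch loops until the whole 64-bit quotient is zero.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: high 64 bits of a 64x64-bit product,
   i.e. what MUL leaves in RDX. Portable C, no 128-bit type needed. */
static uint64_t mulhi64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* carries out of the middle 32-bit column */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Hypothetical sketch: buf must hold at least 25 bytes.
   Returns the start of the string inside buf, stores its length in *len. */
static char *q2a_sketch(uint64_t value, char *buf, size_t *len)
{
    char *end = buf + 24;
    char *p = end;
    *p = '\0';
    do {
        /* quotient = value / 10 via multiply by the reciprocal, then shift by 3 */
        uint64_t q = mulhi64(value, 0xCCCCCCCCCCCCCCCDull) >> 3;
        *--p = (char)('0' + (value - q * 10));   /* remainder as ASCII digit */
        value = q;
    } while (value != 0);
    *len = (size_t)(end - p);
    return p;
}
```

As in the assembly, the string does not start at the beginning of the buffer, so the caller has to use the returned pointer.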
Note:
The code for the minus sign and leading zeros must be kept out of the algorithm.
It's not normal to bloat the algorithm with 99.99% unusable code.
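One way to read the note above, sketched in C: the unsigned core stays minimal, and a thin wrapper adds the minus sign only for callers who need it. Both u64_core and i64_to_str are hypothetical names invented for this example, and the core here is a plain division loop standing in for whatever fast routine is actually used.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in unsigned core (assumption: plain % and /, not the posted algorithm).
   Writes backwards from end, returns the start of the string. */
static char *u64_core(uint64_t value, char *end)
{
    *end = '\0';
    do {
        *--end = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    return end;
}

/* Hypothetical signed wrapper: the sign logic lives here, outside the core.
   buf must hold at least 25 bytes. */
static char *i64_to_str(int64_t value, char *buf, size_t *len)
{
    char *end = buf + 24;
    /* magnitude computed in unsigned arithmetic so INT64_MIN is safe */
    uint64_t mag = value < 0 ? 0ull - (uint64_t)value : (uint64_t)value;
    char *p = u64_core(mag, end);
    if (value < 0)
        *--p = '-';
    *len = (size_t)(end - p);
    return p;
}
```

Callers that only ever convert unsigned values can call the core directly and never pay for the sign test.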

I am curious when the stupid thief will come back...ha,ha :skrewy:
Quid sit futurum cras fuge quaerere.

six_L

Hi, Lingo
Here are the test results.
Quote
00005214 clock cycles, (Roberts_dqtoa)x10000, OutPut: 98765432109876
00004638 clock cycles, (Roberts_Lqword2ascii)x10000, OutPut: 98765432109876
00014990 clock cycles, (Ray_AnyToAny)x10000, OutPut: 98765432109876
00004458 clock cycles, (lingo_i2a64_1)x10000, OutPut: 98765432109876
00003845 clock cycles, (lingo_q2a64_2)x10000, OutPut: 98765432109876
The attachment is the tested exe.
Say you, Say me, Say the codes together for ever.


jj2007

Quote from: jimg on April 04, 2024, 03:42:13 PM
Again, my apologies for not crediting the correct person.

Jim, as described here, you credited the right person. Btw I wouldn't call Lingo a thief just because he used one of my ideas, that's not my style. Actually, I feel flattered that our great genius, the greatest coder since Adam & Eve, uses my ideas.

six_L

Hi, All

Note to all:
If anyone wants to argue, hurry up and argue hotly. Actually, there is little time left, because our member NoCforMe forecast "No planet, Zero bytes at 2030".

I quite approve of his view, since the World Thief Biden forgot the UN rule that the five permanent members of the UN cannot attack each other.

Good luck to all in the remaining time.

NoCforMe

Assembly language programming should be fun. That's why I do it.

lingo

Thank you six_L,

I was raised not to argue with elderly people with mental and/or physical disabilities.

stoo23

With specific regard to certain statements, suggestions and accusations made here recently, may I Politely and Respectfully refer you ALL to this:
A Reiteration of certain forum rules

jj2007

Thanks, Stoo :thup:

Back to the topic: may I have some timings, please?

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4603    cycles for dwtoa
2954    cycles for dw2str
2212    cycles for dw2$JJ
3248    cycles for jgnorecurse
2957    cycles for dw2$X

ahsat

Quote from: jj2007 on April 07, 2024, 08:52:16 AM
Back to the topic: may I have some timings, please?

Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (SSE4)

3144   cycles for 100 * dwtoa
2625   cycles for 100 * dw2str
1474   cycles for 100 * dw2$JJ
2276   cycles for 100 * jgnorecurse
2810   cycles for 100 * dw2$X

3243   cycles for 100 * dwtoa
2634   cycles for 100 * dw2str
1514   cycles for 100 * dw2$JJ
2338   cycles for 100 * jgnorecurse
2880   cycles for 100 * dw2$X

3226   cycles for 100 * dwtoa
2704   cycles for 100 * dw2str
1605   cycles for 100 * dw2$JJ
2359   cycles for 100 * jgnorecurse
2861   cycles for 100 * dw2$X

3275   cycles for 100 * dwtoa
2686   cycles for 100 * dw2str
1528   cycles for 100 * dw2$JJ
2388   cycles for 100 * jgnorecurse
2882   cycles for 100 * dw2$X

Averages:
3234   cycles for dwtoa
2660   cycles for dw2str
1521   cycles for dw2$JJ
2348   cycles for jgnorecurse
2870   cycles for dw2$X

20   bytes for dwtoa
82   bytes for dw2str
94   bytes for dw2$JJ
76   bytes for jgnorecurse
106   bytes for dw2$X

dwtoa-123456789
dw2str-123456789
dw2$JJ-123456789
jgnorecurse-123456789
dw2$X-123456789

sinsi

13th Gen Intel(R) Core(TM) i9-13900KF (SSE4)

2180    cycles for 100 * dwtoa
1748    cycles for 100 * dw2str
905     cycles for 100 * dw2$JJ
1798    cycles for 100 * jgnorecurse
2090    cycles for 100 * dw2$X

2227    cycles for 100 * dwtoa
1799    cycles for 100 * dw2str
995     cycles for 100 * dw2$JJ
1843    cycles for 100 * jgnorecurse
2088    cycles for 100 * dw2$X

2249    cycles for 100 * dwtoa
1796    cycles for 100 * dw2str
957     cycles for 100 * dw2$JJ
1815    cycles for 100 * jgnorecurse
2056    cycles for 100 * dw2$X

2241    cycles for 100 * dwtoa
1790    cycles for 100 * dw2str
958     cycles for 100 * dw2$JJ
1814    cycles for 100 * jgnorecurse
2061    cycles for 100 * dw2$X

Averages:
2234    cycles for dwtoa
1793    cycles for dw2str
958     cycles for dw2$JJ
1814    cycles for jgnorecurse
2074    cycles for dw2$X

20      bytes for dwtoa
82      bytes for dw2str
94      bytes for dw2$JJ
76      bytes for jgnorecurse
106     bytes for dw2$X

dwtoa                                   -123456789
dw2str                                  -123456789
dw2$JJ                                  -123456789
jgnorecurse                             -123456789
dw2$X                                   -123456789

NoCforMe

And the winner is ... dw2$JJ by a length!

I'm sure this'll be great news to all coders who use this to display user output ... wow, I saved a whole 800 cycles!

jj2007

Quote from: NoCforMe on April 07, 2024, 10:37:40 AM
And the winner is ... dw2$JJ by a length!

Not yet. The algo works fine for 123456789, less so for smaller numbers. Will be fixed tomorrow ;-)

sinsi

Quote from: NoCforMe on April 07, 2024, 10:37:40 AM
And the winner is ... dw2$JJ by a length!

I'm sure this'll be great news to all coders who use this to display user output ... wow, I saved a whole 800 cycles!
For some of us that aren't just hobbyists, having a user sitting twiddling their thumbs is not a good look.
After all, this is The Laboratory:
Quote
The Laboratory
Algorithm and code design research laboratory. This is the place to post assembler algorithms and code design for discussion, optimisation and any other improvements that can be made on it. Post code here to be beaten to death to make it better, smaller, faster or more powerful.
Maybe you should hang around The Campus :biggrin:

jj2007

The show must go on :thumbsup:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4761    cycles for dwtoa
3044    cycles for dw2str
2122    cycles for MbDw2Str  <<<<<<<<<<<<<<<<<<<< tested for the full range -1...0
3364    cycles for jgnorecurse
2948    cycles for dw2$X

The new algo uses a table, as suggested initially by Ray alias ahsat :thumbsup:
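The post does not show the table, so the following C sketch is only one common table-driven approach and not necessarily how MbDw2Str works: a 200-byte table of the digit pairs "00" to "99" lets each division by 100 produce two characters at once. The names kDigitPairs and dw2str_table_sketch are invented for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed table layout: digit pairs "00".."99", two bytes per entry. */
static const char kDigitPairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Hypothetical sketch: signed dword to string, two digits per table lookup.
   out must hold at least 12 bytes. Returns the string length. */
static size_t dw2str_table_sketch(int32_t value, char *out)
{
    char tmp[10];                       /* a dword has at most 10 digits */
    char *p = tmp + sizeof tmp;
    /* magnitude in unsigned arithmetic so INT32_MIN is safe */
    uint32_t u = value < 0 ? 0u - (uint32_t)value : (uint32_t)value;
    size_t n = 0;

    if (value < 0)
        out[n++] = '-';
    while (u >= 100) {                  /* peel off two digits per step */
        uint32_t pair = (u % 100) * 2;
        u /= 100;
        *--p = kDigitPairs[pair + 1];
        *--p = kDigitPairs[pair];
    }
    if (u >= 10) {                      /* two leading digits */
        *--p = kDigitPairs[u * 2 + 1];
        *--p = kDigitPairs[u * 2];
    } else {                            /* single leading digit */
        *--p = (char)('0' + u);
    }
    while (p < tmp + sizeof tmp)        /* copy digits after the optional sign */
        out[n++] = *p++;
    out[n] = '\0';
    return n;
}
```

The trade-off is the usual one for table algorithms: roughly half as many divisions per number, at the cost of 200 bytes of data that must be in cache for the timings to look good.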