The fastest way to fill a dword array with string values

Started by frktons, December 09, 2012, 02:49:23 AM

KeepingRealBusy

If you want them in ascending sequence, it should be [   3   2   1   0] since the xmm reg is loaded/stored in little endian order. Then you need to take care of carries from 9 to 0 with the next character incremented. This could get troublesome.

What I did was to set a string up to "0000", pick it up and store it, set a pointer to the lowest character, then increment the character pointed to by the pointer. Check if the character exceeds '9'; if not, pick up and save the string. If greater than '9', add (256-10) to the WORD which has the current pointer as its lowest BYTE, which will set the current character to '0' (256-10+'9'+1 = '0' plus a carry to the upper BYTE) and increment the next higher character, then change the pointer to check that character for exceeding '9', etc, etc. After finding a valid character, always drop the pointer back to the lowest character and pick up the value and save it and increment the character at the pointer.
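
To show the carry trick in isolation, here is a minimal sketch in C rather than MASM, so that it can be compiled and checked anywhere. It is one reading of the description above, not the actual routine: the names ascii_increment and fill_ascii_counter are hypothetical, the digits are assumed to be stored units-first (so the carry from the WORD add lands in the next higher character), and the 16-bit add is modelled with byte operations instead of a real word access.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): bump a units-first ASCII
   counter by one. digits[0] is the lowest character; the byte after the
   highest digit must be writable, because a carry lands there on overflow. */
static void ascii_increment(unsigned char *digits)
{
    unsigned char *p = digits;          /* pointer to the lowest character */
    ++*p;                               /* increment the character it points to */
    while (*p > '9') {
        /* Models "add word ptr [p], 256-10" on a little-endian WORD:
           the low byte wraps from '9'+1 to '0', the carry bumps p[1]. */
        unsigned w = (unsigned)p[0] | ((unsigned)p[1] << 8);
        w += 256 - 10;
        p[0] = (unsigned char)(w & 0xFF);
        p[1] = (unsigned char)((w >> 8) & 0xFF);
        ++p;                            /* now check the next higher character */
    }
}

/* Hypothetical driver: store `count` successive 4-character values,
   starting at "0000", into out (4 bytes per entry, units digit first). */
static void fill_ascii_counter(unsigned char *out, size_t count)
{
    unsigned char cur[5] = { '0', '0', '0', '0', 0 };  /* 5th byte is a guard */
    assert(count <= 10000);             /* four digits hold 0..9999 */
    for (size_t i = 0; i < count; ++i) {
        memcpy(out + 4 * i, cur, 4);    /* pick up the value and save it */
        ascii_increment(cur);           /* then increment for the next entry */
    }
}
```

Calling fill_ascii_counter(buf, 1000) on a 4000-byte buffer leaves entry 109 as the bytes '9', '0', '1', '0', that is, 109 with the units digit at the lowest address. A real asm version would keep the string in a register and use different offsets, which this sketch does not attempt.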

Dave

frktons

Quote from: KeepingRealBusy on December 09, 2012, 05:05:11 AM
If you want them in ascending sequence, it should be [   3   2   1   0] since the xmm reg is loaded/stored in little endian order. Then you need to take care of carries from 9 to 0 with the next character incremented. This could get troublesome.

What I did was to set a string up to "0000", pick it up and store it, set a pointer to the lowest character, then increment the character pointed to by the pointer. Check if the character exceeds '9'; if not, pick up and save the string. If greater than '9', add (256-10) to the WORD which has the current pointer as its lowest BYTE, which will set the current character to '0' (256-10+'9'+1 = '0' plus a carry to the upper BYTE) and increment the next higher character, then change the pointer to check that character for exceeding '9', etc, etc. After finding a valid character, always drop the pointer back to the lowest character and pick up the value and save it and increment the character at the pointer.

Dave

The order of bytes you suggested is correct, Dave.

It should not be that difficult, at least as I foresee it. I haven't
implemented it yet, only imagined it. Jochen's suggestion is
probably the fastest path, but we are still in the preliminary steps;
everything should be tested before drawing conclusions.

Your idea seems to use mem vars, and this could slow down the
process. 

If you test it, let me know your results.  :t
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

KeepingRealBusy

You could use regs and different offsets in the regs for the testing and correcting. I was using an 8 byte value for my test.

Dave.

frktons

A couple of years ago I wrote a prototype in PBCC and afterwards translated it
to asm, for a program that displays on a console screen all the ASCII chars
from 1 to 255, each one of them preceded by the ASCII number of the char.
I used 3 BYTE variables to represent the hundreds, tens and units, and filled
the console buffer with them.

Now I'd like to have a much faster routine, so instead of variables I'm going
to use GPRs and XMM registers. I need some time to write and test the code.
In the meantime, anybody who tries a solution is welcome to post it
along with the performance test results.  ;)
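
As a plain baseline for comparison, the three-counter idea (hundreds, tens, units kept as ASCII bytes) can be sketched like this. It is written in C rather than asm so it can be checked easily, and it is only an assumption about how such a routine might look: the name fill_array, the ENTRIES constant and the blank-padding rule for leading zeros are illustrative, not the original PBCC or asm code.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES 1000   /* assumed table size: values 0..999 */

/* Hypothetical routine: fill the table with "   0" .. " 999",
   four characters per entry, right-aligned, no terminator. */
static void fill_array(unsigned char table[ENTRIES][4])
{
    unsigned char hundreds = '0', tens = '0', units = '0';
    assert(table != NULL);
    for (size_t i = 0; i < ENTRIES; ++i) {
        table[i][0] = ' ';                                   /* always blank */
        table[i][1] = (hundreds != '0') ? hundreds : ' ';    /* hide leading 0 */
        table[i][2] = (hundreds != '0' || tens != '0') ? tens : ' ';
        table[i][3] = units;                                 /* always shown */
        /* Advance the three ASCII counters with manual carries. */
        if (++units > '9') {
            units = '0';
            if (++tens > '9') {
                tens = '0';
                ++hundreds;
            }
        }
    }
}
```

Each entry is exactly one dword wide, so on a little-endian machine the first character of the string sits in the lowest byte of the dword. This version touches memory byte by byte, which is the part a register-based routine would try to avoid.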






CommonTater

Quote from: frktons on December 09, 2012, 02:49:23 AM
I'd like to fill an array of 1000 dword with the string values ranging
from '   0' to ' 999'.
To avoid a 4k increment in program size and writing manually 1000
values in .data, I prefer to declare the array in .data? and fill the
array with a proc: call FillArray.
I've some ideas on how to do that, but before starting the tests I'd like
your suggestions, code, or experiments already done, to think upon.

Let me know.

Frank

Rather than attempting to contain a large array inside your program it would be better to allocate memory from a heap at run time.  It's the same thing... but it doesn't bloat the daylights out of your program image.


dedndave

tater,
1000 dwords is quite small, in terms of memory allocation
when you put something in the uninitialized data section, the size of the exe does not increase
only the image at run-time

if you allocate memory or put it in uninitialized data, it uses memory either way
the advantage to allocation is that it may be free'd up
so - if the table is to be used throughout program execution, that advantage is lost
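
The two options being compared can be put side by side in a short sketch. It is in C rather than MASM and rests on assumptions: on typical toolchains a zero-initialised static array ends up in uninitialised data (the counterpart of .data?), and the names static_table, alloc_table and release_table are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES 1000   /* 1000 dwords = 4000 bytes */

/* Option 1: static storage with no initialiser. It lives for the whole
   run of the program and cannot be handed back. */
static uint32_t static_table[ENTRIES];

/* Option 2: the same table taken from the heap at run time. */
static uint32_t *alloc_table(void)
{
    return calloc(ENTRIES, sizeof(uint32_t));   /* zero-filled, may be NULL */
}

/* The only advantage of option 2: the memory can be freed again. */
static void release_table(uint32_t *table)
{
    assert(table != static_table);   /* only heap tables may be freed */
    free(table);
}
```

Either way the table occupies 4000 bytes while it is in use, so for a table needed throughout program execution the static form is the simpler choice.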

frktons

Quote from: dedndave on December 09, 2012, 06:54:28 AM
tater,
1000 dwords is quite small, in terms of memory allocation
when you put something in the uninitialized data section, the size of the exe does not increase
only the image at run-time

if you allocate memory or put it in uninitialized data, it uses memory either way
the advantage to allocation is that it may be free'd up
so - if the table is to be used throughout program execution, that advantage is lost

Agreed, the idea is to use the array throughout program execution.

jj2007

Quote from: dedndave on December 09, 2012, 06:54:28 AM
when you put something in the uninitialized data section, the size of the exe does not increase
only the image at run-time

Test it:
include \masm32\include\masm32rt.inc

.data?
MyArray dd 280000 dup(?)

.code
start: inkey "Hello World"
exit

end start


... and get to know a well-known MASM bug. Even ML 10 still shows that odd behaviour...

dedndave

yah - i know that bug, Jochen - lol
but, we are only allocating 1000 dwords   :biggrin:

frktons

Quote from: jj2007 on December 09, 2012, 07:14:20 AM

Test it:
include \masm32\include\masm32rt.inc

.data?
MyArray dd 280000 dup(?)

.code
start: inkey "Hello World"
exit

end start


... and get to know a well-known MASM bug. Even ML 10 still shows that odd behaviour...

I didn't get any strange behaviour, and the exe size is 3072 bytes.
When the pgm runs it displays Hello World and waits for the
keystroke.  :icon_rolleyes:

N.B. I use ML10.

dedndave

the strange behaviour is that it takes long to assemble - at least using ML v6.x it does

here you go, Frank
i get about 2.3 cycles per string on my P4 prescott

frktons

Quote from: dedndave on December 09, 2012, 08:41:05 AM
the strange behaviour is that it takes long to assemble - at least using ML v6.x it does

here you go, Frank
i get about 2.3 cycles per string on my P4 prescott


Good starting point, Dave. Let's see the next algos to decide the fastest around.

On my pc I have 1.3 cycles per string.  :t

Quote
;that'll be $50, Frank
We'll see in a few days if you earned them  :lol:

frktons

For the time being, Dave's routine got these results on my pc:
Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
1934    cycles for Dedndave code
------------------------------------------------------------------------
1927    cycles for Dedndave code
------------------------------------------------------------------------
1917    cycles for Dedndave code
------------------------------------------------------------------------
1918    cycles for Dedndave code
------------------------------------------------------------------------

That is about 1.9 cycles per dword string. Not bad. I think we
can arrive at 0.7 cycles per dword string, but it has yet to be
demonstrated.  :lol:

dedndave

------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------------------
2556    cycles for Dedndave code
------------------------------------------------------------------------
2256    cycles for Dedndave code
------------------------------------------------------------------------
2258    cycles for Dedndave code
------------------------------------------------------------------------
2271    cycles for Dedndave code
------------------------------------------------------------------------