The MASM Forum

General => The Laboratory => Topic started by: frktons on December 26, 2012, 11:06:06 AM

Title: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 26, 2012, 11:06:06 AM
There are a lot of algos to convert integer binary to ASCII numbers.
Some of them also add thousand separator. The best algo for this
kind of job was posted some time ago by clive. I don't know
if something new and better has come out in the last two years, but
for what I have in my mind, that best algo can be replaced with a faster
one that is about 2:1 faster.
While I'm going to do some tests to see how my ideas can become a
real algo, I'll post clive's algo to see if someone has already done
something faster, testing it vs all the solutions you have realized for
your own pleasure or necessity, and are eager to compare with other's.

This is another drop in the ocean called "File Compressor" that is and
will be my main project for the months to come.  :biggrin:

Frank
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 26, 2012, 01:08:52 PM
Here we start with the algos to deal with.
Both were written by clive about 2 years ago and,
at that time, were the fastest around.

Enjoy the challenge.  8)

Frank

Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
3.233   cycles for NumFormat - IDIV / Stack written by clive
1.934   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
2.947   cycles for NumFormat - IDIV / Stack written by clive
1.962   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
2.939   cycles for NumFormat - IDIV / Stack written by clive
2.869   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
4.356   cycles for NumFormat - IDIV / Stack written by clive
2.899   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 05:30:10 AM
According to my vision, these algos are simple and efficient
in their design. And this fact is confirmed by the speed test
we performed some time ago.

After two years of no coding at all, in my mind have arised
new ideas about optimization, that mostly depends on a
paradigm shift and a change in the algos design.

I see now clearly some weakness in the overall design that
I didn't see some time ago.

The most important weakness I can see is the single byte
processing
. To obtain a single formatted byte the algo has to:
divide or multiply / move / add / push / pop / move.

I have the idea that we can have a much faster algo if we
use a parallel approach [at least 4 bytes in the same cycle]
bypassing unnecessary steps altogether.

So the first step is to put together some processing, making
the single byte processing into a multibytes processing.

In next post I'll show you the code doing this kind of transformation.
Stay tuned, and post your code as well if you like.

Frank  :biggrin:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 27, 2012, 05:43:22 AM
feeling kinda lonely in this thread, Frank ?   :(
i will come and keep you company for a minute

you may not get a lot of participation on this one
that's because you are optimizing a function that will be used only a few times per session
very rare are the cases where you might want to format a million string values   :P
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 06:07:40 AM
Quote from: dedndave on December 27, 2012, 05:43:22 AM
feeling kinda lonely in this thread, Frank ?   :(
i will come and keep you company for a minute

you may not get a lot of participation on this one
that's because you are optimizing a function that will be used only a few times per session
very rare are the cases where you might want to format a million string values   :P

I need this function for formatting about 100-120 numbers per page
when I'll examine the recurrences of chars or tokens in the program
to compress a text file.

It doesn't matter, for the time being, how many partecipants will be
into this drop of optimization. I'm just sharing my thoughts and code.

Thanks anyway for your welcome company. :t

Of course everybody is welcome.  :biggrin:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 06:16:16 AM
And talking about optimization, here we are with the first step
in the speed up process. I'm afraid the first step already crashed
the previous fastest algo  :redface:

But the new algo design is so powerful it just will shine in the
next steps.   :lol:
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
4.489   cycles for NumFormat - IDIV / Stack written by clive
2.082   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.301   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
2.995   cycles for NumFormat - IDIV / Stack written by clive
2.075   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.950   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
4.488   cycles for NumFormat - IDIV / Stack written by clive
3.093   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.961   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
4.505   cycles for NumFormat - IDIV / Stack written by clive
3.117   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.951   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 27, 2012, 07:33:00 AM
prescott w/htt
Quote----------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,627   cycles for NumFormat - IDIV / Stack written by clive
3,551   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,606   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,715   cycles for NumFormat - IDIV / Stack written by clive
3,578   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,587   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,654   cycles for NumFormat - IDIV / Stack written by clive
3,535   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,622   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,631   cycles for NumFormat - IDIV / Stack written by clive
3,619   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,663   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 07:39:40 AM
Dave, for your old PC  :redface: I think you'll have to wait until
the II step to see "The Crash"  :lol:

So far the new version is 2.5:1 faster compared to its old equivalent
anyway, so my suspects are confirmed. Designing a parallel data
processing pays.  :P
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: jj2007 on December 27, 2012, 10:40:39 AM
Ciao Frank,

It looks very promising :t

Buon Natale,
Jochen
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 11:02:17 AM
Quote from: jj2007 on December 27, 2012, 10:40:39 AM
Ciao Frank,

It looks very promising :t

Buon Natale,
Jochen

Buone feste anche a te, Jochen.

Well tomorrow I'm going to post the II step algo, that should
definitively outdo the old formatting algo performances.
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: Gunther on December 27, 2012, 11:12:47 PM
Hi frktons,

here are my results:


----------------------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
----------------------------------------------------------------------------
2.323 cycles for NumFormat - IDIV / Stack written by clive
1.472 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.210 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
2.158 cycles for NumFormat - IDIV / Stack written by clive
1.795 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.143 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
1.749 cycles for NumFormat - IDIV / Stack written by clive
1.264 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.204 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
1.937 cycles for NumFormat - IDIV / Stack written by clive
1.476 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.463 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------

--- ok ---


Gunther
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 28, 2012, 01:30:51 AM
Quote from: Gunther on December 27, 2012, 11:12:47 PM
Hi frktons,

here are my results:

Gunther

Thanks Gunther, your results confirm mine, on new machines
parallel processing is much faster.
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: Gunther on December 28, 2012, 01:39:32 AM
Hi frktons,

Quote from: frktons on December 28, 2012, 01:30:51 AM
Thanks Gunther, your results confirm mine, on new machines
parallel processing is much faster.

yes, of course. That isn't a surprise.

Gunther
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 28, 2012, 11:35:37 AM
Quote from: Gunther on December 28, 2012, 01:39:32 AM
Hi frktons,

yes, of course. That isn't a surprise.

Gunther

Not a surprise, but it is good to see it by myself. :P

The III step version is coming in a few days.

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
2.957   cycles for NumFormatX - IDIV / Stack
2.210   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.581   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.430   cycles for NumFormatX - IDIV / Stack
3.017   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.563   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.545   cycles for NumFormatX - IDIV / Stack
3.034   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.562   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.461   cycles for NumFormatX - IDIV / Stack
3.016   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.562   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------

Press a key to exit ...

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 28, 2012, 02:22:57 PM
if you want more repeatable results, let the test run a little longer   :P
QuoteIntel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,338   cycles for NumFormatX - IDIV / Stack
3,616   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,319   cycles for NumFormatF-I - IDIV / Stack / Table
1,296   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
8,598   cycles for NumFormatX - IDIV / Stack
3,499   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,281   cycles for NumFormatF-I - IDIV / Stack / Table
1,297   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
9,171   cycles for NumFormatX - IDIV / Stack
3,497   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,313   cycles for NumFormatF-I - IDIV / Stack / Table
1,297   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
8,566   cycles for NumFormatX - IDIV / Stack
3,479   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,295   cycles for NumFormatF-I - IDIV / Stack / Table
1,296   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 28, 2012, 03:44:10 PM
Quote from: dedndave on December 28, 2012, 02:22:57 PM
if you want more repeatable results, let the test run a little longer   :P
These results are more real than making the loop 1 million times.
In real life, and in the "File Compressor" 1000 times are more than enough.
:P
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: Gunther on December 28, 2012, 10:31:18 PM
Hi frktons,

Quote from: frktons on December 28, 2012, 03:44:10 PM
In real life, and in the "File Compressor" 1000 times are more than enough.
:P

that's not quiet right. A few years ago, I've written a fractal image compressor. The bottle neck were 5 simple procedures (calculating arithmetic mean, dot product etc.) which wasted 90% of computing time. For a small image with, lets say, 256x256 pixels those functions were called over 30 million times. That's the reality.

Gunther
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 28, 2012, 10:52:06 PM
what i was refering to was increasing the loop count for longer tests
these numbers won't jump all over the place - makes it easier to get a comparison
Quote8,568   cycles for NumFormatX - IDIV / Stack
8,523   cycles for NumFormatX - IDIV / Stack
8,678   cycles for NumFormatX - IDIV / Stack
8,526   cycles for NumFormatX - IDIV / Stack
8,576   cycles for NumFormatX - IDIV / Stack
9,091   cycles for NumFormatX - IDIV / Stack
8,394   cycles for NumFormatX - IDIV / Stack
8,301   cycles for NumFormatX - IDIV / Stack
8,709   cycles for NumFormatX - IDIV / Stack
8,390   cycles for NumFormatX - IDIV / Stack
8,605   cycles for NumFormatX - IDIV / Stack
8,648   cycles for NumFormatX - IDIV / Stack
8,643   cycles for NumFormatX - IDIV / Stack
8,529   cycles for NumFormatX - IDIV / Stack
8,199   cycles for NumFormatX - IDIV / Stack
8,527   cycles for NumFormatX - IDIV / Stack
8,406   cycles for NumFormatX - IDIV / Stack
8,622   cycles for NumFormatX - IDIV / Stack
8,533   cycles for NumFormatX - IDIV / Stack
8,516   cycles for NumFormatX - IDIV / Stack
8,832   cycles for NumFormatX - IDIV / Stack
8,570   cycles for NumFormatX - IDIV / Stack
8,402   cycles for NumFormatX - IDIV / Stack
8,663   cycles for NumFormatX - IDIV / Stack
8,141   cycles for NumFormatX - IDIV / Stack
8,369   cycles for NumFormatX - IDIV / Stack
8,360   cycles for NumFormatX - IDIV / Stack
8,292   cycles for NumFormatX - IDIV / Stack
they are jumping around +/- 6 %   ::)
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 07:26:22 AM
Quote from: Gunther on December 28, 2012, 10:31:18 PM
Hi frktons,

Quote from: frktons on December 28, 2012, 03:44:10 PM
In real life, and in the "File Compressor" 1000 times are more than enough.
:P

that's not quiet right. A few years ago, I've written a fractal image compressor. The bottle neck were 5 simple procedures (calculating arithmetic mean, dot product etc.) which wasted 90% of computing time. For a small image with, lets say, 256x256 pixels those functions were called over 30 million times. That's the reality.

Gunther
Hi Gunther,
In your case it's good to think in these terms, in my case, with
a routine that formats binary numbers into strings, it isn't.
The reality I'm speaking of is just the program in which I'm going
to use the routine, not the universal and irreversible truth. :P

Quote from: dedndave on December 28, 2012, 10:52:06 PM
what i was refering to was increasing the loop count for longer tests
these numbers won't jump all over the place - makes it easier to get a comparison,
....
they are jumping around +/- 6 %   ::)
According to my experience, the most credible tests are those that
simulate in the best possible way the reality we have to simulate.
Having a loop with million/s calls to the routines is useful only
when you presume the program is going to do that, otherwise there
is no point simulating a non-existing/high improbable case, that takes in the overhead
of a multitask environment, among other influences as well.

+/- 6% is a reasonable amount I can accept.  :biggrin:

Frank
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 29, 2012, 07:50:39 AM
i'm good if you are   8)
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 07:56:39 AM
Quote from: dedndave on December 29, 2012, 07:50:39 AM
i'm good if you are   8)
I'm more than good if you are good. :biggrin:
I only hope that you understand my point of view, even if you disagree,
or your experince tells you otherwise. :t
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: MichaelW on December 29, 2012, 07:58:16 AM
Quote from: frktons on December 29, 2012, 07:26:22 AM
+/- 6% is a reasonable amount I can accept.

What about when you are trying to optimize your code for speed? In this case with a 12% uncertainty in the run time you would have to run multiple trials to recognize small increases in speed. With a higher loop count you could recognize these increases in one trial.
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 08:13:20 AM
Quote from: MichaelW on December 29, 2012, 07:58:16 AM
Quote from: frktons on December 29, 2012, 07:26:22 AM
+/- 6% is a reasonable amount I can accept.

What about when you are trying to optimize your code for speed? In this case with a 12% uncertainty in the run time you would have to run multiple trials to recognize small increases in speed. With a higher loop count you could recognize these increases in one trial.


Yes Michael,  of course the tests have to be useful for
something. In this case I'm just showing that an old algo can
be surpassed by a new one parallellizing some passages.
I started the thread saying that we can have an algo that
is about 2:1 faster than the previous one. If I run the test 1,000
times or 1 million times, it doesn't change a lot the results,
and if it does, it is highly probable external influences that do
it, not the algo in itself.
But the most important thing is, from my point of view, what are we
looking for in a test. About 2:1 is about 100% faster, that's enough
for what I'm searching for. If a +/- 6% occurr in "about 100%" then
it is reasonable, +/-20/30% would be less reasonable.

Frank
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 08:40:01 AM
For lovers of longer loop counts  :lol:

This test had a loop count of 10 millions:
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3.005   cycles for NumFormatX - IDIV / Stack
2.203   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.367   cycles for NumFormatF-I - IDIV / Stack / Table
1.185   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.113   cycles for NumFormatX - IDIV / Stack
2.338   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.475   cycles for NumFormatF-I - IDIV / Stack / Table
1.126   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.130   cycles for NumFormatX - IDIV / Stack
2.121   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.496   cycles for NumFormatF-I - IDIV / Stack / Table
1.109   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.197   cycles for NumFormatX - IDIV / Stack
2.285   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.463   cycles for NumFormatF-I - IDIV / Stack / Table
1.015   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------

Press a key to exit ...

This is NOT the final test, by the way. The III step is work in progress...

Dave, in your opinion/knowledge, what's the meaning of 6% difference
between:
Quote
3.005   cycles for NumFormatX - IDIV / Stack
and
Quote
3.197   cycles for NumFormatX - IDIV / Stack

I'm not able to interpret it correctly, not to speak of 16% difference:
Quote
1.185   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
and
Quote
1.015   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table

Is somebody able to tell me what this means? I just think this is
a sign of external influence [OS, swap, other processes in memory...]
and has nothing to do with the algo we are testing.
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: MichaelW on December 29, 2012, 06:54:29 PM
Maybe it will work as expected if you try 10 billion :biggrin:

Now that I look at your code I see that you are not setting an affinity mask to limit the process/thread to a single core. You cannot depend on the time-stamp counters being in sync between the cores, and the system can (allegedly) move the process/thread to a different core than it started on. If that does not correct the problem then try increasing the priority. Running on a P4 processor with HT, I have had no problems at the highest possible priority (combining REALTIME_PRIORITY_CLASS  with THREAD_PRIORITY_TIME_CRITICAL), even with test code that triggered an exception. Note that the macros you are using cannot control the thread priority, but there is nothing to stop you from doing this separately.


Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 12:20:09 AM
Quote from: MichaelW on December 29, 2012, 06:54:29 PM
Maybe it will work as expected if you try 10 billion :biggrin:

Now that I look at your code I see that you are not setting an affinity mask to limit the process/thread to a single core. You cannot depend on the time-stamp counters being in sync between the cores, and the system can (allegedly) move the process/thread to a different core than it started on. If that does not correct the problem then try increasing the priority. Running on a P4 processor with HT, I have had no problems at the highest possible priority (combining REALTIME_PRIORITY_CLASS  with THREAD_PRIORITY_TIME_CRITICAL), even with test code that triggered an exception. Note that the macros you are using cannot control the thread priority, but there is nothing to stop you from doing this separately.
Well Michael, I'm here to learn something. So what should I do?
Should I modify this line:

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS

or use a new macro/call/whatever?
Better if you post the code other than your suggestion, so I can
understand what you mean. :biggrin:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 12:50:32 AM
before you run any tests...
        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,500

that sets the processor affinity to a single core (core 0)
the Sleep gives it a chance to bind, 750 mS works a little better, but who wants to wait that long   :P
the idea is to relinquish the current time-slice, then wait a little extra

then, for each test, try to choose a loop count that yields a 500 mS test
even using HIGH_PRIORITY_CLASS, you will see fairly stable results
if you use REALTIME_PRIORITY_CLASS, and the test is a bit too long, it may hang

you may get a little extra stability by diddling with the thread priority, but generally not worth the effort
on machines with more than 64 cores, you might have to play with thread affinity
the documents tell us little about machines that have 33 to 64 cores   :lol:
i don't think any forum members have more than 8 cores
for us normal people, thread affinity is simply a subset of process affinity
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 01:05:00 AM
there is some behaviour that i have not figured out
that is - we usually run some code to identify and display the cpu name and features
then we run the tests

if you put the cpu id code after the tests, you will get different results
at least, that's how it is on my machine, which is a little weird because it's running XP MCE
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 02:41:01 AM
Using the 3 lines of code that Dave suggests, and running the
test 1 million times:
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
2.923   cycles for NumFormatX - IDIV / Stack
2.179   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.463   cycles for NumFormatF-I - IDIV / Stack / Table
1.249   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.119   cycles for NumFormatX - IDIV / Stack
2.197   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.475   cycles for NumFormatF-I - IDIV / Stack / Table
1.249   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.131   cycles for NumFormatX - IDIV / Stack
2.198   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.484   cycles for NumFormatF-I - IDIV / Stack / Table
1.256   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.134   cycles for NumFormatX - IDIV / Stack
2.210   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.475   cycles for NumFormatF-I - IDIV / Stack / Table
1.248   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
What can we say now?
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 03:11:36 AM
it's not "how many times" you run the test, really
it's how much time you spend running it
you want to run it for a long enough period of time so that OS intervention is negligible
or, at least, consistent - lol

Quotethen, for each test, try to choose a loop count that yields a 500 mS test
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 06:19:03 AM
Quote from: dedndave on December 30, 2012, 03:11:36 AM
it's not "how many times" you run the test, really
it's how much time you spend running it
you want to run it for a long enough period of time so that OS intervention is negligible
or, at least, consistent - lol

Quotethen, for each test, try to choose a loop count that yields a 500 mS test

The program I'm going to use this function in will never have so long
time to run for this specific function. So I really don't understand what's
the point of doing so long test. The results I get are not the speed the
function is going to have while I use it.

By the way I've almost finished:
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3.268   cycles for NumFormatX - IDIV / Stack
2.332   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.514   cycles for NumFormatF-I - IDIV / Stack / Table
1.282   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.199   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
891     cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.371   cycles for NumFormatX - IDIV / Stack
2.253   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.592   cycles for NumFormatF-I - IDIV / Stack / Table
1.237   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.296   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
973     cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.220   cycles for NumFormatX - IDIV / Stack
2.306   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.477   cycles for NumFormatF-I - IDIV / Stack / Table
1.267   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.215   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
908     cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

In a few days I'm going to post the final results and program. :biggrin:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 08:07:30 AM
the purpose is merely to get stable, repeatable results
it isn't for my benefit - it is for yours
it will make it easier to interpret the data, is all
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 09:00:32 AM
Quote from: dedndave on December 30, 2012, 08:07:30 AM
the purpose is merely to get stable, repeatable results
it isn't for my benefit - it is for yours
it will make it easier to interpret the data, is all

If the principle of getting stable and repeatable results
is referred to general algos, I do understand all the things
that you, Michael, or others have said.
My doubts are only referred to the application of these
principles to a case in which I forecast the routine will
be called 100-200 times for each cycle.
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 10:20:04 AM
Time to move towards other routines and the main
project: Text File Scanner & Compressor.

For the time being I've already fulfilled the goal of a
2:1 faster algo. It is in reality about 3:1 faster.
My routines right align the numbers, as you can see with
one of the two programs attached, clive's ones align left.
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3.091   cycles for NumFormatX - IDIV / Stack
2.463   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.425   cycles for NumFormatF-I - IDIV / Stack / Table
1.225   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.193   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  912   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.252   cycles for NumFormatX - IDIV / Stack
2.447   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.408   cycles for NumFormatF-I - IDIV / Stack / Table
1.261   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.177   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  929   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.240   cycles for NumFormatX - IDIV / Stack
2.434   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.403   cycles for NumFormatF-I - IDIV / Stack / Table
1.230   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.179   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  902   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

If somebody posts his results I'd be nice.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 10:26:41 AM
 ;)
QuoteMicrosoft Windows XP [Version 5.1.2600]
(C) Copyri0ht 1985-2001 Micros0ft Corp.                       0
  0       9                   9                               9
  9Documen10 and Settings\Dave10esktop => pf                 10
10       99                  99                             99
99       100                 100                           100
100       999                 999                           999
999       1,000               1,000                       1.000               1.
000       9,999               9,999                       9.999               9.
999       10,000              10,000                     10.000              10.
000       99,999              99,999                     99.999              99.
999       100,000             100,000                   100.000             100.
000       999,999             999,999                   999.999             999.
999       1,000,000           1,000,000               1.000.000           1.000.
000       9,999,999           9,999,999               9.999.999           9.999.
999       10,000,000          10,000,000             10.000.000          10.000.
000       99,999,999          99,999,999             99.999.999          99.999.
999       100,000,000         100,000,000           100.000.000         100.000.
000       999,999,999         999,999,999           999.999.999         999.999.
999       1,000,000,000       1,000,000,000       1.000.000.000       1.000.000.
000       4,294,967,295       4,294,967,295       4.294.967.295       4.294.967.295

prescott w/htt
Quote----------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,328   cycles for NumFormatX - IDIV / Stack
3,178   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,364   cycles for NumFormatF-I - IDIV / Stack / Table
2,581   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,283   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
1,124   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,322   cycles for NumFormatX - IDIV / Stack
3,201   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,365   cycles for NumFormatF-I - IDIV / Stack / Table
2,564   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,280   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
1,125   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,266   cycles for NumFormatX - IDIV / Stack
3,172   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,382   cycles for NumFormatF-I - IDIV / Stack / Table
2,545   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,285   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
1,109   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 10:30:01 AM
Thanks Dave for your collaboration. As you can see your
student is trying to learn something, my master.  :lol:

Oh, sorry. I forgot the usual $50 for you.  :t
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 10:35:18 AM
I decided not to do, for the time being, the SSE version,
maybe in the future. Now there are a thousand more routines to prepare  :lol:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 11:27:22 AM
If somebody with an english setting of Windows can run
this test and post the results, I'll be glad.

I have to verify that the routines use the appropriate thousand separator
for english and non-english setting.

These routines have been checked and corrected  in
order to adapt to the locale setting of the system.

Thanks
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: sinsi on December 30, 2012, 01:57:17 PM


          0                   0                               0                   0
          9                   9                               9                   9
          10                  10                             10                  10
          99                  99                             99                  99
          100                 100                           100                 100
          999                 999                           999                 999
          1,000               1,000                       1,000               1,000
          9,999               9,999                       9,999               9,999
          10,000              10,000                     10,000              10,000
          99,999              99,999                     99,999              99,999
          100,000             100,000                   100,000             100,000
          999,999             999,999                   999,999             999,999
          1,000,000           1,000,000               1,000,000           1,000,000
          9,999,999           9,999,999               9,999,999           9,999,999
          10,000,000          10,000,000             10,000,000          10,000,000
          99,999,999          99,999,999             99,999,999          99,999,999
          100,000,000         100,000,000           100,000,000         100,000,000
          999,999,999         999,999,999           999,999,999         999,999,999
          1,000,000,000       1,000,000,000       1,000,000,000       1,000,000,000
          4,294,967,295       4,294,967,295       4,294,967,295       4,294,967,295

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 02:46:11 PM
Thanks Sinsi. This is a good test  :t
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 06:52:14 AM
I forgot the last touch to get an algo 3:1 faster than previous ones:
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3,243   cycles for NumFormatX - IDIV / Stack
2,432   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,463   cycles for NumFormatF-I - IDIV / Stack / Table
1,278   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,173   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  763   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3,293   cycles for NumFormatX - IDIV / Stack
2,419   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,453   cycles for NumFormatF-I - IDIV / Stack / Table
1,250   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,170   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  776   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3,259   cycles for NumFormatX - IDIV / Stack
2,411   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,453   cycles for NumFormatF-I - IDIV / Stack / Table
1,238   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,174   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  768   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

Now it's OK.  :t
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: jj2007 on January 07, 2013, 07:41:33 AM
 :t
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz

Instructions: MMX, SSE1, SSE2, SSE3
--------------------------------------------------------------------
2,896   cycles for NumFormatX - IDIV / Stack
2,279   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,327   cycles for NumFormatF-I - IDIV / Stack / Table
1,231   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,055   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  777   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
--------------------------------------------------------------------
2,895   cycles for NumFormatX - IDIV / Stack
2,279   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,328   cycles for NumFormatF-I - IDIV / Stack / Table
1,241   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,076   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  777   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
--------------------------------------------------------------------
2,889   cycles for NumFormatX - IDIV / Stack
2,286   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,341   cycles for NumFormatF-I - IDIV / Stack / Table
1,271   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,056   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  777   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on January 07, 2013, 10:36:30 AM
prescott w/htt
Quote----------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,306   cycles for NumFormatX - IDIV / Stack
3,730   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,355   cycles for NumFormatF-I - IDIV / Stack / Table
2,556   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,293   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  968   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,285   cycles for NumFormatX - IDIV / Stack
3,744   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,368   cycles for NumFormatF-I - IDIV / Stack / Table
2,569   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,293   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  995   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,381   cycles for NumFormatX - IDIV / Stack
3,719   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,364   cycles for NumFormatF-I - IDIV / Stack / Table
2,543   cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,286   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
  972   cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 10:41:22 AM
Dave,
on your PC my last algo is 4:1 faster than previous ones.
Very good my Master, you teached it well.  :t
You'll find your $50 at the usual link.  :lol:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on January 07, 2013, 10:52:05 AM
what do you think you are, some kind of jedi knight or something ?
waving your link around in front of me
i'm an assembly programmer
mind tricks don't work on me
only money !

(http://img593.imageshack.us/img593/5135/tworkonmeonlymoney.jpg)
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 11:08:08 AM
Quote from: dedndave on January 07, 2013, 10:52:05 AM
what do you think you are, some kind of jedi knight or something ?
waving your link around in front of me
i'm an assembly programmer
mind tricks don't work on me
only money !

(http://img593.imageshack.us/img593/5135/tworkonmeonlymoney.jpg)
Nowadays we only use electronic money, so here it is, prepayment
for next algo included  8)
(http://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Usdollar100front.jpg/640px-Usdollar100front.jpg)
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on January 07, 2013, 12:00:15 PM
ole' Ben looks pissed  :lol:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 12:12:20 PM
Quote from: dedndave on January 07, 2013, 12:00:15 PM
ole' Ben looks pissed  :lol:
On his planet they don't use US dollars anymore. :lol:
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: jj2007 on January 07, 2013, 05:44:09 PM
So is it ready to be posted as a challenge on Pelle's C Forum? What is the corresponding CRT algo?
:P
Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 07:47:30 PM
Quote from: jj2007 on January 07, 2013, 05:44:09 PM
So is it ready to be posted as a challenge on Pelle's C Forum? What is the corresponding CRT algo?
:P

Hi Jochen.
I don't know about C matters, or challenge on Pelle's C Forum.
Do you mean they could try to do better in C or there is already
a formatting routine in C? I haven't seen any of them so far.  :icon_rolleyes: