Print Page - Time to crash the "best algo" to format dword unsigned numbers

Title: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 26, 2012, 11:06:06 AM

There are a lot of algos to convert integer binary to ASCII numbers.
Some of them also add thousand separator. The best algo for this
kind of job was posted some time ago by clive. I don't know
if something new and better has come out in the last two years, but
for what I have in my mind, that best algo can be replaced with a faster
one that is about 2:1 faster.
While I'm going to do some tests to see how my ideas can become a
real algo, I'll post clive's algo to see if someone has already done
something faster, testing it vs all the solutions you have realized for
your own pleasure or necessity, and are eager to compare with other's.

This is another drop in the ocean called "File Compressor" that is and
will be my main project for the months to come. :biggrin:

Frank

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 26, 2012, 01:08:52 PM

Here we start with the algos to deal with.
Both were written by clive about 2 years ago and,
at that time, were the fastest around.

Enjoy the challenge. 8)

Frank

Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
3.233 cycles for NumFormat - IDIV / Stack written by clive
1.934 cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
2.947 cycles for NumFormat - IDIV / Stack written by clive
1.962 cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
2.939 cycles for NumFormat - IDIV / Stack written by clive
2.869 cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
4.356 cycles for NumFormat - IDIV / Stack written by clive
2.899 cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 05:30:10 AM

According to my vision, these algos are simple and efficient
in their design. And this fact is confirmed by the speed test
we performed some time ago.

After two years of no coding at all, in my mind have arised
new ideas about optimization, that mostly depends on a
paradigm shift and a change in the algos design.

I see now clearly some weakness in the overall design that
I didn't see some time ago.

The most important weakness I can see is the single byte
processing. To obtain a single formatted byte the algo has to:
divide or multiply / move / add / push / pop / move.

I have the idea that we can have a much faster algo if we
use a parallel approach [at least 4 bytes in the same cycle]
bypassing unnecessary steps altogether.

So the first step is to put together some processing, making
the single byte processing into a multibytes processing.

In next post I'll show you the code doing this kind of transformation.
Stay tuned, and post your code as well if you like.

Frank :biggrin:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 27, 2012, 05:43:22 AM

feeling kinda lonely in this thread, Frank ? :(
i will come and keep you company for a minute

you may not get a lot of participation on this one
that's because you are optimizing a function that will be used only a few times per session
very rare are the cases where you might want to format a million string values :P

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 06:07:40 AM

Quote from: dedndave on December 27, 2012, 05:43:22 AM
feeling kinda lonely in this thread, Frank ? :(
i will come and keep you company for a minute

you may not get a lot of participation on this one
that's because you are optimizing a function that will be used only a few times per session
very rare are the cases where you might want to format a million string values :P

I need this function for formatting about 100-120 numbers per page
when I'll examine the recurrences of chars or tokens in the program
to compress a text file.

It doesn't matter, for the time being, how many partecipants will be
into this drop of optimization. I'm just sharing my thoughts and code.

Thanks anyway for your welcome company. :t

Of course everybody is welcome. :biggrin:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 06:16:16 AM

And talking about optimization, here we are with the first step
in the speed up process. I'm afraid the first step already crashed
the previous fastest algo :redface:

But the new algo design is so powerful it just will shine in the
next steps. :lol:

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
4.489 cycles for NumFormat - IDIV / Stack written by clive
2.082 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.301 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
2.995 cycles for NumFormat - IDIV / Stack written by clive
2.075 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.950 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
4.488 cycles for NumFormat - IDIV / Stack written by clive
3.093 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.961 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
4.505 cycles for NumFormat - IDIV / Stack written by clive
3.117 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.951 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 27, 2012, 07:33:00 AM

prescott w/htt

Quote----------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,627 cycles for NumFormat - IDIV / Stack written by clive
3,551 cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,606 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,715 cycles for NumFormat - IDIV / Stack written by clive
3,578 cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,587 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,654 cycles for NumFormat - IDIV / Stack written by clive
3,535 cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,622 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,631 cycles for NumFormat - IDIV / Stack written by clive
3,619 cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,663 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 07:39:40 AM

Dave, for your old PC :redface: I think you'll have to wait until
the II step to see "The Crash" :lol:

So far the new version is 2.5:1 faster compared to its old equivalent
anyway, so my suspects are confirmed. Designing a parallel data
processing pays. :P

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: jj2007 on December 27, 2012, 10:40:39 AM

Ciao Frank,

It looks very promising :t

Buon Natale,
Jochen

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 27, 2012, 11:02:17 AM

Quote from: jj2007 on December 27, 2012, 10:40:39 AM
Ciao Frank,

It looks very promising :t

Buon Natale,
Jochen

Buone feste anche a te, Jochen.

Well tomorrow I'm going to post the II step algo, that should
definitively outdo the old formatting algo performances.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: Gunther on December 27, 2012, 11:12:47 PM

Hi frktons,

here are my results:

Code Select


----------------------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
----------------------------------------------------------------------------
2.323	cycles for NumFormat - IDIV / Stack written by clive
1.472	cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.210	cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
2.158	cycles for NumFormat - IDIV / Stack written by clive
1.795	cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.143	cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
1.749	cycles for NumFormat - IDIV / Stack written by clive
1.264	cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.204	cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
1.937	cycles for NumFormat - IDIV / Stack written by clive
1.476	cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.463	cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------

--- ok ---

Gunther

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 28, 2012, 01:30:51 AM

Quote from: Gunther on December 27, 2012, 11:12:47 PM
Hi frktons,

here are my results:

Gunther

Thanks Gunther, your results confirm mine, on new machines
parallel processing is much faster.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: Gunther on December 28, 2012, 01:39:32 AM

Hi frktons,

Quote from: frktons on December 28, 2012, 01:30:51 AM
Thanks Gunther, your results confirm mine, on new machines
parallel processing is much faster.

yes, of course. That isn't a surprise.

Gunther

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 28, 2012, 11:35:37 AM

Quote from: Gunther on December 28, 2012, 01:39:32 AM
Hi frktons,

yes, of course. That isn't a surprise.

Gunther

Not a surprise, but it is good to see it by myself. :P

The III step version is coming in a few days.

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
2.957 cycles for NumFormatX - IDIV / Stack
2.210 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988 cycles for NumFormatF-I - IDIV / Stack / Table
1.581 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.430 cycles for NumFormatX - IDIV / Stack
3.017 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988 cycles for NumFormatF-I - IDIV / Stack / Table
1.563 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.545 cycles for NumFormatX - IDIV / Stack
3.034 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988 cycles for NumFormatF-I - IDIV / Stack / Table
1.562 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.461 cycles for NumFormatX - IDIV / Stack
3.016 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988 cycles for NumFormatF-I - IDIV / Stack / Table
1.562 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------

Press a key to exit ...

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 28, 2012, 02:22:57 PM

if you want more repeatable results, let the test run a little longer :P

QuoteIntel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,338 cycles for NumFormatX - IDIV / Stack
3,616 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,319 cycles for NumFormatF-I - IDIV / Stack / Table
1,296 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
8,598 cycles for NumFormatX - IDIV / Stack
3,499 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,281 cycles for NumFormatF-I - IDIV / Stack / Table
1,297 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
9,171 cycles for NumFormatX - IDIV / Stack
3,497 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,313 cycles for NumFormatF-I - IDIV / Stack / Table
1,297 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
8,566 cycles for NumFormatX - IDIV / Stack
3,479 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,295 cycles for NumFormatF-I - IDIV / Stack / Table
1,296 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 28, 2012, 03:44:10 PM

Quote from: dedndave on December 28, 2012, 02:22:57 PM
if you want more repeatable results, let the test run a little longer :P

These results are more real than making the loop 1 million times.
In real life, and in the "File Compressor" 1000 times are more than enough.
:P

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: Gunther on December 28, 2012, 10:31:18 PM

Hi frktons,

Quote from: frktons on December 28, 2012, 03:44:10 PM
In real life, and in the "File Compressor" 1000 times are more than enough.
:P

that's not quiet right. A few years ago, I've written a fractal image compressor. The bottle neck were 5 simple procedures (calculating arithmetic mean, dot product etc.) which wasted 90% of computing time. For a small image with, lets say, 256x256 pixels those functions were called over 30 million times. That's the reality.

Gunther

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 28, 2012, 10:52:06 PM

what i was refering to was increasing the loop count for longer tests
these numbers won't jump all over the place - makes it easier to get a comparison

Quote8,568 cycles for NumFormatX - IDIV / Stack
8,523 cycles for NumFormatX - IDIV / Stack
8,678 cycles for NumFormatX - IDIV / Stack
8,526 cycles for NumFormatX - IDIV / Stack
8,576 cycles for NumFormatX - IDIV / Stack
9,091 cycles for NumFormatX - IDIV / Stack
8,394 cycles for NumFormatX - IDIV / Stack
8,301 cycles for NumFormatX - IDIV / Stack
8,709 cycles for NumFormatX - IDIV / Stack
8,390 cycles for NumFormatX - IDIV / Stack
8,605 cycles for NumFormatX - IDIV / Stack
8,648 cycles for NumFormatX - IDIV / Stack
8,643 cycles for NumFormatX - IDIV / Stack
8,529 cycles for NumFormatX - IDIV / Stack
8,199 cycles for NumFormatX - IDIV / Stack
8,527 cycles for NumFormatX - IDIV / Stack
8,406 cycles for NumFormatX - IDIV / Stack
8,622 cycles for NumFormatX - IDIV / Stack
8,533 cycles for NumFormatX - IDIV / Stack
8,516 cycles for NumFormatX - IDIV / Stack
8,832 cycles for NumFormatX - IDIV / Stack
8,570 cycles for NumFormatX - IDIV / Stack
8,402 cycles for NumFormatX - IDIV / Stack
8,663 cycles for NumFormatX - IDIV / Stack
8,141 cycles for NumFormatX - IDIV / Stack
8,369 cycles for NumFormatX - IDIV / Stack
8,360 cycles for NumFormatX - IDIV / Stack
8,292 cycles for NumFormatX - IDIV / Stack

they are jumping around +/- 6 % ::)

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 07:26:22 AM

Quote from: Gunther on December 28, 2012, 10:31:18 PM
Hi frktons,

Quote from: frktons on December 28, 2012, 03:44:10 PM
In real life, and in the "File Compressor" 1000 times are more than enough.
:P

that's not quiet right. A few years ago, I've written a fractal image compressor. The bottle neck were 5 simple procedures (calculating arithmetic mean, dot product etc.) which wasted 90% of computing time. For a small image with, lets say, 256x256 pixels those functions were called over 30 million times. That's the reality.

Gunther

Hi Gunther,
In your case it's good to think in these terms, in my case, with
a routine that formats binary numbers into strings, it isn't.
The reality I'm speaking of is just the program in which I'm going
to use the routine, not the universal and irreversible truth. :P

Quote from: dedndave on December 28, 2012, 10:52:06 PM
what i was refering to was increasing the loop count for longer tests
these numbers won't jump all over the place - makes it easier to get a comparison,
....
they are jumping around +/- 6 % ::)

According to my experience, the most credible tests are those that
simulate in the best possible way the reality we have to simulate.
Having a loop with million/s calls to the routines is useful only
when you presume the program is going to do that, otherwise there
is no point simulating a non-existing/high improbable case, that takes in the overhead
of a multitask environment, among other influences as well.

+/- 6% is a reasonable amount I can accept. :biggrin:

Frank

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 29, 2012, 07:50:39 AM

i'm good if you are 8)

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 07:56:39 AM

Quote from: dedndave on December 29, 2012, 07:50:39 AM
i'm good if you are 8)

I'm more than good if you are good. :biggrin:
I only hope that you understand my point of view, even if you disagree,
or your experince tells you otherwise. :t

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: MichaelW on December 29, 2012, 07:58:16 AM

Quote from: frktons on December 29, 2012, 07:26:22 AM
+/- 6% is a reasonable amount I can accept.

What about when you are trying to optimize your code for speed? In this case with a 12% uncertainty in the run time you would have to run multiple trials to recognize small increases in speed. With a higher loop count you could recognize these increases in one trial.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 08:13:20 AM

Quote from: MichaelW on December 29, 2012, 07:58:16 AM
Quote from: frktons on December 29, 2012, 07:26:22 AM
+/- 6% is a reasonable amount I can accept.

What about when you are trying to optimize your code for speed? In this case with a 12% uncertainty in the run time you would have to run multiple trials to recognize small increases in speed. With a higher loop count you could recognize these increases in one trial.

Yes Michael, of course the tests have to be useful for
something. In this case I'm just showing that an old algo can
be surpassed by a new one parallellizing some passages.
I started the thread saying that we can have an algo that
is about 2:1 faster than the previous one. If I run the test 1,000
times or 1 million times, it doesn't change a lot the results,
and if it does, it is highly probable external influences that do
it, not the algo in itself.
But the most important thing is, from my point of view, what are we
looking for in a test. About 2:1 is about 100% faster, that's enough
for what I'm searching for. If a +/- 6% occurr in "about 100%" then
it is reasonable, +/-20/30% would be less reasonable.

Frank

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 29, 2012, 08:40:01 AM

For lovers of longer loop counts :lol:

This test had a loop count of 10 millions:

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3.005 cycles for NumFormatX - IDIV / Stack
2.203 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.367 cycles for NumFormatF-I - IDIV / Stack / Table
1.185 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.113 cycles for NumFormatX - IDIV / Stack
2.338 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.475 cycles for NumFormatF-I - IDIV / Stack / Table
1.126 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.130 cycles for NumFormatX - IDIV / Stack
2.121 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.496 cycles for NumFormatF-I - IDIV / Stack / Table
1.109 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.197 cycles for NumFormatX - IDIV / Stack
2.285 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.463 cycles for NumFormatF-I - IDIV / Stack / Table
1.015 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------

Press a key to exit ...

This is NOT the final test, by the way. The III step is work in progress...

Dave, in your opinion/knowledge, what's the meaning of 6% difference
between:

Quote
3.005 cycles for NumFormatX - IDIV / Stack

and

Quote
3.197 cycles for NumFormatX - IDIV / Stack

I'm not able to interpret it correctly, not to speak of 16% difference:

Quote
1.185 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table

and

Quote
1.015 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table

Is somebody able to tell me what this means? I just think this is
a sign of external influence [OS, swap, other processes in memory...]
and has nothing to do with the algo we are testing.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: MichaelW on December 29, 2012, 06:54:29 PM

Maybe it will work as expected if you try 10 billion :biggrin:

Now that I look at your code I see that you are not setting an affinity mask to limit the process/thread to a single core. You cannot depend on the time-stamp counters being in sync between the cores, and the system can (allegedly) move the process/thread to a different core than it started on. If that does not correct the problem then try increasing the priority. Running on a P4 processor with HT, I have had no problems at the highest possible priority (combining REALTIME_PRIORITY_CLASS with THREAD_PRIORITY_TIME_CRITICAL), even with test code that triggered an exception. Note that the macros you are using cannot control the thread priority, but there is nothing to stop you from doing this separately.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 12:20:09 AM

Quote from: MichaelW on December 29, 2012, 06:54:29 PM
Maybe it will work as expected if you try 10 billion :biggrin:

Now that I look at your code I see that you are not setting an affinity mask to limit the process/thread to a single core. You cannot depend on the time-stamp counters being in sync between the cores, and the system can (allegedly) move the process/thread to a different core than it started on. If that does not correct the problem then try increasing the priority. Running on a P4 processor with HT, I have had no problems at the highest possible priority (combining REALTIME_PRIORITY_CLASS with THREAD_PRIORITY_TIME_CRITICAL), even with test code that triggered an exception. Note that the macros you are using cannot control the thread priority, but there is nothing to stop you from doing this separately.

Well Michael, I'm here to learn something. So what should I do?
Should I modify this line:

Code Select


	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS

or use a new macro/call/whatever?
Better if you post the code other than your suggestion, so I can
understand what you mean. :biggrin:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 12:50:32 AM

before you run any tests...

Code Select

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,500

that sets the processor affinity to a single core (core 0)
the Sleep gives it a chance to bind, 750 mS works a little better, but who wants to wait that long :P
the idea is to relinquish the current time-slice, then wait a little extra

then, for each test, try to choose a loop count that yields a 500 mS test
even using HIGH_PRIORITY_CLASS, you will see fairly stable results
if you use REALTIME_PRIORITY_CLASS, and the test is a bit too long, it may hang

you may get a little extra stability by diddling with the thread priority, but generally not worth the effort
on machines with more than 64 cores, you might have to play with thread affinity
the documents tell us little about machines that have 33 to 64 cores :lol:
i don't think any forum members have more than 8 cores
for us normal people, thread affinity is simply a subset of process affinity

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 01:05:00 AM

there is some behaviour that i have not figured out
that is - we usually run some code to identify and display the cpu name and features
then we run the tests

if you put the cpu id code after the tests, you will get different results
at least, that's how it is on my machine, which is a little weird because it's running XP MCE

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 02:41:01 AM

Using the 3 lines of code that Dave suggests, and running the
test 1 million times:

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
2.923 cycles for NumFormatX - IDIV / Stack
2.179 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.463 cycles for NumFormatF-I - IDIV / Stack / Table
1.249 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.119 cycles for NumFormatX - IDIV / Stack
2.197 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.475 cycles for NumFormatF-I - IDIV / Stack / Table
1.249 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.131 cycles for NumFormatX - IDIV / Stack
2.198 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.484 cycles for NumFormatF-I - IDIV / Stack / Table
1.256 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
3.134 cycles for NumFormatX - IDIV / Stack
2.210 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.475 cycles for NumFormatF-I - IDIV / Stack / Table
1.248 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------

What can we say now?

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 03:11:36 AM

it's not "how many times" you run the test, really
it's how much time you spend running it
you want to run it for a long enough period of time so that OS intervention is negligible
or, at least, consistent - lol

Quotethen, for each test, try to choose a loop count that yields a 500 mS test

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 06:19:03 AM

Quote from: dedndave on December 30, 2012, 03:11:36 AM
it's not "how many times" you run the test, really
it's how much time you spend running it
you want to run it for a long enough period of time so that OS intervention is negligible
or, at least, consistent - lol

Quotethen, for each test, try to choose a loop count that yields a 500 mS test

The program I'm going to use this function in will never have so long
time to run for this specific function. So I really don't understand what's
the point of doing so long test. The results I get are not the speed the
function is going to have while I use it.

By the way I've almost finished:

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3.268 cycles for NumFormatX - IDIV / Stack
2.332 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.514 cycles for NumFormatF-I - IDIV / Stack / Table
1.282 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.199 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
891 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.371 cycles for NumFormatX - IDIV / Stack
2.253 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.592 cycles for NumFormatF-I - IDIV / Stack / Table
1.237 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.296 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
973 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.220 cycles for NumFormatX - IDIV / Stack
2.306 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.477 cycles for NumFormatF-I - IDIV / Stack / Table
1.267 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.215 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
908 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

In a few days I'm going to post the final results and program. :biggrin:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 08:07:30 AM

the purpose is merely to get stable, repeatable results
it isn't for my benefit - it is for yours
it will make it easier to interpret the data, is all

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 09:00:32 AM

Quote from: dedndave on December 30, 2012, 08:07:30 AM
the purpose is merely to get stable, repeatable results
it isn't for my benefit - it is for yours
it will make it easier to interpret the data, is all

If the principle of getting stable and repeatable results
is referred to general algos, I do understand all the things
that you, Michael, or others have said.
My doubts are only referred to the application of these
principles to a case in which I forecast the routine will
be called 100-200 times for each cycle.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 10:20:04 AM

Time to move towards other routines and the main
project: Text File Scanner & Compressor.

For the time being I've already fulfilled the goal of a
2:1 faster algo. It is in reality about 3:1 faster.
My routines right align the numbers, as you can see with
one of the two programs attached, clive's ones align left.

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3.091 cycles for NumFormatX - IDIV / Stack
2.463 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.425 cycles for NumFormatF-I - IDIV / Stack / Table
1.225 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.193 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
912 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.252 cycles for NumFormatX - IDIV / Stack
2.447 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.408 cycles for NumFormatF-I - IDIV / Stack / Table
1.261 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.177 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
929 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3.240 cycles for NumFormatX - IDIV / Stack
2.434 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.403 cycles for NumFormatF-I - IDIV / Stack / Table
1.230 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1.179 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
902 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

If somebody posts his results I'd be nice.

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on December 30, 2012, 10:26:41 AM

;)

QuoteMicrosoft Windows XP [Version 5.1.2600]
(C) Copyri0ht 1985-2001 Micros0ft Corp. 0
0 9 9 9
9Documen10 and Settings\Dave10esktop => pf 10
10 99 99 99
99 100 100 100
100 999 999 999
999 1,000 1,000 1.000 1.
000 9,999 9,999 9.999 9.
999 10,000 10,000 10.000 10.
000 99,999 99,999 99.999 99.
999 100,000 100,000 100.000 100.
000 999,999 999,999 999.999 999.
999 1,000,000 1,000,000 1.000.000 1.000.
000 9,999,999 9,999,999 9.999.999 9.999.
999 10,000,000 10,000,000 10.000.000 10.000.
000 99,999,999 99,999,999 99.999.999 99.999.
999 100,000,000 100,000,000 100.000.000 100.000.
000 999,999,999 999,999,999 999.999.999 999.999.
999 1,000,000,000 1,000,000,000 1.000.000.000 1.000.000.
000 4,294,967,295 4,294,967,295 4.294.967.295 4.294.967.295

prescott w/htt

Quote----------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,328 cycles for NumFormatX - IDIV / Stack
3,178 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,364 cycles for NumFormatF-I - IDIV / Stack / Table
2,581 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,283 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
1,124 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,322 cycles for NumFormatX - IDIV / Stack
3,201 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,365 cycles for NumFormatF-I - IDIV / Stack / Table
2,564 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,280 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
1,125 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,266 cycles for NumFormatX - IDIV / Stack
3,172 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,382 cycles for NumFormatF-I - IDIV / Stack / Table
2,545 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,285 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
1,109 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 10:30:01 AM

Thanks Dave for your collaboration. As you can see your
student is trying to learn something, my master. :lol:

Oh, sorry. I forgot the usual $50 for you. :t

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 10:35:18 AM

I decided not to do, for the time being, the SSE version,
maybe in the future. Now there are a thousand more routines to prepare :lol:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 11:27:22 AM

If somebody with an english setting of Windows can run
this test and post the results, I'll be glad.

I have to verify that the routines use the appropriate thousand separator
for english and non-english setting.

These routines have been checked and corrected in
order to adapt to the locale setting of the system.

Thanks

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: sinsi on December 30, 2012, 01:57:17 PM

0 0 0 0
9 9 9 9
10 10 10 10
99 99 99 99
100 100 100 100
999 999 999 999
1,000 1,000 1,000 1,000
9,999 9,999 9,999 9,999
10,000 10,000 10,000 10,000
99,999 99,999 99,999 99,999
100,000 100,000 100,000 100,000
999,999 999,999 999,999 999,999
1,000,000 1,000,000 1,000,000 1,000,000
9,999,999 9,999,999 9,999,999 9,999,999
10,000,000 10,000,000 10,000,000 10,000,000
99,999,999 99,999,999 99,999,999 99,999,999
100,000,000 100,000,000 100,000,000 100,000,000
999,999,999 999,999,999 999,999,999 999,999,999
1,000,000,000 1,000,000,000 1,000,000,000 1,000,000,000
4,294,967,295 4,294,967,295 4,294,967,295 4,294,967,295

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on December 30, 2012, 02:46:11 PM

Thanks Sinsi. This is a good test :t

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 06:52:14 AM

I forgot the last touch to get an algo 3:1 faster than previous ones:

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
3,243 cycles for NumFormatX - IDIV / Stack
2,432 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,463 cycles for NumFormatF-I - IDIV / Stack / Table
1,278 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,173 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
763 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3,293 cycles for NumFormatX - IDIV / Stack
2,419 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,453 cycles for NumFormatF-I - IDIV / Stack / Table
1,250 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,170 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
776 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
3,259 cycles for NumFormatX - IDIV / Stack
2,411 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,453 cycles for NumFormatF-I - IDIV / Stack / Table
1,238 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,174 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
768 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

Now it's OK. :t

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: jj2007 on January 07, 2013, 07:41:33 AM

:t
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz

Instructions: MMX, SSE1, SSE2, SSE3
--------------------------------------------------------------------
2,896 cycles for NumFormatX - IDIV / Stack
2,279 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,327 cycles for NumFormatF-I - IDIV / Stack / Table
1,231 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,055 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
777 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
--------------------------------------------------------------------
2,895 cycles for NumFormatX - IDIV / Stack
2,279 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,328 cycles for NumFormatF-I - IDIV / Stack / Table
1,241 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,076 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
777 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
--------------------------------------------------------------------
2,889 cycles for NumFormatX - IDIV / Stack
2,286 cycles for NumFormatX2 - Reciprocal IMUL / Stack

1,341 cycles for NumFormatF-I - IDIV / Stack / Table
1,271 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,056 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
777 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on January 07, 2013, 10:36:30 AM

prescott w/htt

Quote----------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,306 cycles for NumFormatX - IDIV / Stack
3,730 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,355 cycles for NumFormatF-I - IDIV / Stack / Table
2,556 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,293 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
968 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,285 cycles for NumFormatX - IDIV / Stack
3,744 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,368 cycles for NumFormatF-I - IDIV / Stack / Table
2,569 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,293 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
995 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------
8,381 cycles for NumFormatX - IDIV / Stack
3,719 cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,364 cycles for NumFormatF-I - IDIV / Stack / Table
2,543 cycles for NumFormatF-IA - IDIV / STRUCT / Table
1,286 cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
972 cycles for NumFormatF-III - Reciprocal IMUL / STRUCT / Table
----------------------------------------------------------------------------

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 10:41:22 AM

Dave,
on your PC my last algo is 4:1 faster than previous ones.
Very good my Master, you teached it well. :t
You'll find your $50 at the usual link. :lol:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on January 07, 2013, 10:52:05 AM

what do you think you are, some kind of jedi knight or something ?
waving your link around in front of me
i'm an assembly programmer
mind tricks don't work on me
only money !

(http://img593.imageshack.us/img593/5135/tworkonmeonlymoney.jpg)

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 11:08:08 AM

Quote from: dedndave on January 07, 2013, 10:52:05 AM
what do you think you are, some kind of jedi knight or something ?
waving your link around in front of me
i'm an assembly programmer
mind tricks don't work on me
only money !

(http://img593.imageshack.us/img593/5135/tworkonmeonlymoney.jpg)

Nowadays we only use electronic money, so here it is, prepayment
for next algo included 8)
(http://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Usdollar100front.jpg/640px-Usdollar100front.jpg)

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: dedndave on January 07, 2013, 12:00:15 PM

ole' Ben looks pissed :lol:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 12:12:20 PM

Quote from: dedndave on January 07, 2013, 12:00:15 PM
ole' Ben looks pissed :lol:

On his planet they don't use US dollars anymore. :lol:

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: jj2007 on January 07, 2013, 05:44:09 PM

So is it ready to be posted as a challenge on Pelle's C Forum? What is the corresponding CRT algo?
:P

Title: Re: Time to crash the "best algo" to format dword unsigned numbers
Post by: frktons on January 07, 2013, 07:47:30 PM

Quote from: jj2007 on January 07, 2013, 05:44:09 PM
So is it ready to be posted as a challenge on Pelle's C Forum? What is the corresponding CRT algo?
:P

Hi Jochen.
I don't know about C matters, or challenge on Pelle's C Forum.
Do you mean they could try to do better in C or there is already
a formatting routine in C? I haven't seen any of them so far. :icon_rolleyes:

The MASM Forum

General => The Laboratory => Topic started by: frktons on December 26, 2012, 11:06:06 AM