News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Time to crash the "best algo" to format dword unsigned numbers

Started by frktons, December 26, 2012, 11:06:06 AM

Previous topic - Next topic

frktons

There are a lot of algos to convert integer binary to ASCII numbers.
Some of them also add thousand separator. The best algo for this
kind of job was posted some time ago by clive. I don't know
if something new and better has come out in the last two years, but
for what I have in my mind, that best algo can be replaced with a faster
one that is about 2:1 faster.
While I'm going to do some tests to see how my ideas can become a
real algo, I'll post clive's algo to see if someone has already done
something faster, testing it vs all the solutions you have realized for
your own pleasure or necessity, and are eager to compare with other's.

This is another drop in the ocean called "File Compressor" that is and
will be my main project for the months to come.  :biggrin:

Frank
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

Here we start with the algos to deal with.
Both were written by clive about 2 years ago and,
at that time, were the fastest around.

Enjoy the challenge.  8)

Frank

Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
3.233   cycles for NumFormat - IDIV / Stack written by clive
1.934   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
2.947   cycles for NumFormat - IDIV / Stack written by clive
1.962   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
2.939   cycles for NumFormat - IDIV / Stack written by clive
2.869   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------
4.356   cycles for NumFormat - IDIV / Stack written by clive
2.899   cycles for NumFormatX - Rec.IMUL / Stack written by clive
------------------------------------------------------------------------

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

According to my vision, these algos are simple and efficient
in their design. And this fact is confirmed by the speed test
we performed some time ago.

After two years of no coding at all, in my mind have arised
new ideas about optimization, that mostly depends on a
paradigm shift and a change in the algos design.

I see now clearly some weakness in the overall design that
I didn't see some time ago.

The most important weakness I can see is the single byte
processing
. To obtain a single formatted byte the algo has to:
divide or multiply / move / add / push / pop / move.

I have the idea that we can have a much faster algo if we
use a parallel approach [at least 4 bytes in the same cycle]
bypassing unnecessary steps altogether.

So the first step is to put together some processing, making
the single byte processing into a multibytes processing.

In next post I'll show you the code doing this kind of transformation.
Stay tuned, and post your code as well if you like.

Frank  :biggrin:
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

feeling kinda lonely in this thread, Frank ?   :(
i will come and keep you company for a minute

you may not get a lot of participation on this one
that's because you are optimizing a function that will be used only a few times per session
very rare are the cases where you might want to format a million string values   :P

frktons

Quote from: dedndave on December 27, 2012, 05:43:22 AM
feeling kinda lonely in this thread, Frank ?   :(
i will come and keep you company for a minute

you may not get a lot of participation on this one
that's because you are optimizing a function that will be used only a few times per session
very rare are the cases where you might want to format a million string values   :P

I need this function for formatting about 100-120 numbers per page
when I'll examine the recurrences of chars or tokens in the program
to compress a text file.

It doesn't matter, for the time being, how many partecipants will be
into this drop of optimization. I'm just sharing my thoughts and code.

Thanks anyway for your welcome company. :t

Of course everybody is welcome.  :biggrin:
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

And talking about optimization, here we are with the first step
in the speed up process. I'm afraid the first step already crashed
the previous fastest algo  :redface:

But the new algo design is so powerful it just will shine in the
next steps.   :lol:
Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
4.489   cycles for NumFormat - IDIV / Stack written by clive
2.082   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.301   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
2.995   cycles for NumFormat - IDIV / Stack written by clive
2.075   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.950   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
4.488   cycles for NumFormat - IDIV / Stack written by clive
3.093   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.961   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
4.505   cycles for NumFormat - IDIV / Stack written by clive
3.117   cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.951   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

prescott w/htt
Quote----------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,627   cycles for NumFormat - IDIV / Stack written by clive
3,551   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,606   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,715   cycles for NumFormat - IDIV / Stack written by clive
3,578   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,587   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,654   cycles for NumFormat - IDIV / Stack written by clive
3,535   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,622   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
8,631   cycles for NumFormat - IDIV / Stack written by clive
3,619   cycles for NumFormatX - Rec.IMUL / Stack written by clive
3,663   cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------

frktons

Dave, for your old PC  :redface: I think you'll have to wait until
the II step to see "The Crash"  :lol:

So far the new version is 2.5:1 faster compared to its old equivalent
anyway, so my suspects are confirmed. Designing a parallel data
processing pays.  :P
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

jj2007


frktons

Quote from: jj2007 on December 27, 2012, 10:40:39 AM
Ciao Frank,

It looks very promising :t

Buon Natale,
Jochen

Buone feste anche a te, Jochen.

Well tomorrow I'm going to post the II step algo, that should
definitively outdo the old formatting algo performances.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

Gunther

Hi frktons,

here are my results:


----------------------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
----------------------------------------------------------------------------
2.323 cycles for NumFormat - IDIV / Stack written by clive
1.472 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.210 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
2.158 cycles for NumFormat - IDIV / Stack written by clive
1.795 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.143 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
1.749 cycles for NumFormat - IDIV / Stack written by clive
1.264 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.204 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------
1.937 cycles for NumFormat - IDIV / Stack written by clive
1.476 cycles for NumFormatX - Rec.IMUL / Stack written by clive
1.463 cycles for NumFormatF-I - IDIV / Stack / Table written by frktons
----------------------------------------------------------------------------

--- ok ---


Gunther
You have to know the facts before you can distort them.

frktons

Quote from: Gunther on December 27, 2012, 11:12:47 PM
Hi frktons,

here are my results:

Gunther

Thanks Gunther, your results confirm mine, on new machines
parallel processing is much faster.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

Gunther

Hi frktons,

Quote from: frktons on December 28, 2012, 01:30:51 AM
Thanks Gunther, your results confirm mine, on new machines
parallel processing is much faster.

yes, of course. That isn't a surprise.

Gunther
You have to know the facts before you can distort them.

frktons

Quote from: Gunther on December 28, 2012, 01:39:32 AM
Hi frktons,

yes, of course. That isn't a surprise.

Gunther

Not a surprise, but it is good to see it by myself. :P

The III step version is coming in a few days.

Quote
----------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------------
2.957   cycles for NumFormatX - IDIV / Stack
2.210   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.581   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.430   cycles for NumFormatX - IDIV / Stack
3.017   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.563   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.545   cycles for NumFormatX - IDIV / Stack
3.034   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.562   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
4.461   cycles for NumFormatX - IDIV / Stack
3.016   cycles for NumFormatX2 - Reciprocal IMUL / Stack

1.988   cycles for NumFormatF-I - IDIV / Stack / Table
1.562   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------

Press a key to exit ...

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

if you want more repeatable results, let the test run a little longer   :P
QuoteIntel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
----------------------------------------------------------------------------
8,338   cycles for NumFormatX - IDIV / Stack
3,616   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,319   cycles for NumFormatF-I - IDIV / Stack / Table
1,296   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
8,598   cycles for NumFormatX - IDIV / Stack
3,499   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,281   cycles for NumFormatF-I - IDIV / Stack / Table
1,297   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
9,171   cycles for NumFormatX - IDIV / Stack
3,497   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,313   cycles for NumFormatF-I - IDIV / Stack / Table
1,297   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table
----------------------------------------------------------------------------
8,566   cycles for NumFormatX - IDIV / Stack
3,479   cycles for NumFormatX2 - Reciprocal IMUL / Stack

3,295   cycles for NumFormatF-I - IDIV / Stack / Table
1,296   cycles for NumFormatF-II - Reciprocal IMUL / Stack / Table