Topic: The fastest way to fill a dword array with string values

frktons

  • Member
  • Posts: 491
Re: The fastest way to fill a dword array with string values
« Reply #75 on: December 12, 2012, 11:18:46 PM »
Quote
Frktons I Step / 2 GPRs and Frktons I Step / 4 GPRs are remarkably fast but you should have a look at the output.

I didn't check, and it is possible that some instructions are not correct.
I was testing for speed first; now I'm going to check for size optimization
and correctness of the code. Give me a few more days, I'm quite slow indeed.

dedndave

  • Member
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: The fastest way to fill a dword array with string values
« Reply #76 on: December 12, 2012, 11:25:42 PM »
lol
it has to work first, otherwise you are comparing apples with oranges in the timing tests   :t
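A minimal sketch of that rule in C (not the thread's MASM): fill a poisoned buffer with each candidate and compare it against a slow but obviously correct reference before timing anything. The table layout (1000 dwords, entry n holding the zero-padded digits of n followed by a NUL byte) and every name here (`fill_reference`, `fill_off_by_one`, `first_mismatch`) are hypothetical, chosen only for illustration.

```c
#include <assert.h>   /* for the checks in the usage note */
#include <stdint.h>
#include <string.h>

#define TABLE_LEN 1000   /* assumed: one dword per value 000..999 */

typedef void (*fill_fn)(uint32_t *dst);

/* Store the zero-padded digits of n plus a NUL byte, in string order. */
static void put_entry(uint32_t *dst, int n)
{
    char s[4] = { (char)('0' + n / 100), (char)('0' + n / 10 % 10),
                  (char)('0' + n % 10), '\0' };
    memcpy(&dst[n], s, sizeof s);
}

/* Slow but obviously correct reference. */
static void fill_reference(uint32_t *dst)
{
    for (int n = 0; n < TABLE_LEN; n++)
        put_entry(dst, n);
}

/* Hypothetical "fast" candidate with a classic loop-bound bug. */
static void fill_off_by_one(uint32_t *dst)
{
    for (int n = 0; n < TABLE_LEN - 1; n++)   /* never writes the last entry */
        put_entry(dst, n);
}

/* Index of the first entry that differs from the reference, or -1. */
static int first_mismatch(fill_fn candidate)
{
    static uint32_t want[TABLE_LEN], got[TABLE_LEN];
    fill_reference(want);
    memset(got, 0xCC, sizeof got);   /* poison: unwritten slots can't match */
    candidate(got);
    for (int i = 0; i < TABLE_LEN; i++)
        if (got[i] != want[i])
            return i;
    return -1;
}
```

For example, `assert(first_mismatch(fill_reference) == -1);` passes, while `first_mismatch(fill_off_by_one)` returns 999, the one slot it never writes. Without the poison fill, data left over from a previously tested candidate could hide that bug.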

frktons

  • Member
  • Posts: 491
Re: The fastest way to fill a dword array with string values
« Reply #77 on: December 13, 2012, 05:34:04 AM »
Quote
lol
it has to work first, otherwise you are comparing apples with oranges in the timing tests   :t

Here we are, tested and posted. The performance doesn't change;
there were some typing errors, and in a few places I was adding one instead of two.
These errors didn't affect performance, but they did affect the results.  :P

I included a PROC to display the content of the filled array, on demand.  :t
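A display routine like that can be sketched in C as follows. It assumes each dword stores three ASCII digits followed by a NUL byte in memory order; `entry_to_string` and `dump_table` are hypothetical names, not frktons' actual PROC.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy one dword back out as a C string (assumed: 3 digits + NUL byte). */
static void entry_to_string(uint32_t v, char out[5])
{
    memcpy(out, &v, 4);   /* bytes come back in the order they were stored */
    out[4] = '\0';        /* guard in case an entry lacks its NUL byte */
}

/* Print `count` entries, `per_line` per row, separated by spaces. */
static void dump_table(const uint32_t *table, int count, int per_line)
{
    char s[5];
    for (int i = 0; i < count; i++) {
        entry_to_string(table[i], s);
        int end_of_row = (i + 1) % per_line == 0 || i + 1 == count;
        printf("%s%c", s, end_of_row ? '\n' : ' ');
    }
}
```

Calling `dump_table(table, 1000, 10)` would print the table ten entries per line, which makes a wrong digit easy to spot by eye.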

dedndave

  • Member
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: The fastest way to fill a dword array with string values
« Reply #78 on: December 13, 2012, 06:50:59 AM »
prescott w/htt
------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------------------
2246    cycles for Dedndave code - 5 GPRs
1964    cycles for Frktons I Step / 2 GPRs
1999    cycles for Frktons I Step / 4 GPRs
 Frktons II Step requires a PC with SSSE3
2451    cycles for Frktons II Step / 5 MMX without SSSE3
1122    cycles for Frktons III Step / XMM/MMX with SSE2
1167    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2240    cycles for Dedndave code - 5 GPRs
1939    cycles for Frktons I Step / 2 GPRs
2015    cycles for Frktons I Step / 4 GPRs
 Frktons II Step requires a PC with SSSE3
2588    cycles for Frktons II Step / 5 MMX without SSSE3
1115    cycles for Frktons III Step / XMM/MMX with SSE2
1167    cycles for Jochen / 5 XMM
------------------------------------------------------------------------

frktons

  • Member
  • Posts: 491
Re: The fastest way to fill a dword array with string values
« Reply #79 on: December 13, 2012, 07:10:26 AM »
Two out of three goals (speed and correctness) are accomplished; now for the last, but not least,
optimization: code size, plus some small improvements if needed.  ;)

jj2007

  • Member
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: The fastest way to fill a dword array with string values
« Reply #80 on: December 13, 2012, 11:46:51 AM »
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz

Instructions: MMX, SSE1, SSE2, SSE3
--------------------------------------------------------
1740    cycles for Dedndave code - 5 GPRs
1348    cycles for Frktons I Step / 2 GPRs
1262    cycles for Frktons I Step / 4 GPRs
 Frktons II Step requires a PC with SSSE3
943     cycles for Frktons II Step / 5 MMX without SSSE3
941     cycles for Frktons III Step / XMM/MMX with SSE2
728     cycles for Jochen / 5 XMM
--------------------------------------------------------
1742    cycles for Dedndave code - 5 GPRs
1350    cycles for Frktons I Step / 2 GPRs
1262    cycles for Frktons I Step / 4 GPRs
 Frktons II Step requires a PC with SSSE3
944     cycles for Frktons II Step / 5 MMX without SSSE3
937     cycles for Frktons III Step / XMM/MMX with SSE2
728     cycles for Jochen / 5 XMM

frktons

  • Member
  • Posts: 491
Re: The fastest way to fill a dword array with string values
« Reply #81 on: December 14, 2012, 09:58:46 AM »
On my home PC the III step is a bit faster than Jochen's code,
and on my office PC it is even faster.
Quote
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
-------------------------------------------------------------
813     cycles for Frktons III Step / XMM/MMX with SSE2
835     cycles for Jochen / 5 XMM

Quote
Intel(R) Pentium(R) CPU G6950  @ 2.80GHz
-------------------------------------------------------------
...
405     cycles for Frktons III Step / XMM/MMX with SSE2
644     cycles for Jochen / 5 XMM

I'm actually studying a faster/smaller solution because, as Jochen said:
Quote
P.S.: If you don't agree with the suffix "FINAL", write a faster algo :bgrin:
I don't agree with the suffix; I'm not at the FINAL stage yet.   :lol:
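To put "a bit faster" and "even faster" into numbers, divide Jochen's cycle count by the III step's count from the two quoted runs. These are single runs, so the ratios are only rough:

```latex
\text{speedup} = \frac{\text{cycles (Jochen / 5 XMM)}}{\text{cycles (III Step)}}
\qquad
\text{Core 2 6600: } \frac{835}{813} \approx 1.03
\qquad
\text{Pentium G6950: } \frac{644}{405} \approx 1.59
```

Put the other way, the III step needs about 3% fewer cycles on the Core 2 (813/835 ≈ 0.97) and about 37% fewer on the G6950 (405/644 ≈ 0.63).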

jj2007

  • Member
  • Posts: 11177
  • Assembler is fun ;-)
    • MasmBasic
Re: The fastest way to fill a dword array with string values
« Reply #82 on: December 14, 2012, 01:46:03 PM »
Quote
On my home PC the III step is a bit faster than Jochen's code,
and on my office PC it is even faster.
...
I'm actually studying a faster/smaller solution because, as Jochen said:
Quote
P.S.: If you don't agree with the suffix "FINAL", write a faster algo :bgrin:
I don't agree with the suffix; I'm not at the FINAL stage yet.   :lol:

So the incentive worked :greensml: :t

dedndave

  • Member
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: The fastest way to fill a dword array with string values
« Reply #83 on: December 14, 2012, 02:59:18 PM »

frktons

  • Member
  • Posts: 491
Re: The fastest way to fill a dword array with string values
« Reply #84 on: December 15, 2012, 04:35:30 AM »
This could be my final test, unless you have any suggestions
to improve the performance of the latest code:

Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
1915    cycles for Dedndave code - 5 GPRs
1890    cycles for Frktons I Step / 2 GPRs
1964    cycles for Frktons I Step / 4 GPRs with LEA
1114    cycles for Frktons II Step / 5 MMX with SSSE3
1199    cycles for Frktons II Step / 5 MMX without SSSE3
811     cycles for Frktons III Step / XMM/MMX with SSE2
630     cycles for Frktons III Step / XMM with SSE2 - enhanced
706     cycles for Jochen / 5 XMM
------------------------------------------------------------------------
1915    cycles for Dedndave code - 5 GPRs
1896    cycles for Frktons I Step / 2 GPRs
1978    cycles for Frktons I Step / 4 GPRs with LEA
1110    cycles for Frktons II Step / 5 MMX with SSSE3
1199    cycles for Frktons II Step / 5 MMX without SSSE3
813     cycles for Frktons III Step / XMM/MMX with SSE2
628     cycles for Frktons III Step / XMM with SSE2 - enhanced
704     cycles for Jochen / 5 XMM
------------------------------------------------------------------------

--- ok ---


dedndave

  • Member
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: The fastest way to fill a dword array with string values
« Reply #85 on: December 15, 2012, 04:56:58 AM »
prescott w/htt
Quote
------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------------------
2267    cycles for Dedndave code - 5 GPRs
1957    cycles for Frktons I Step / 2 GPRs
2035    cycles for Frktons I Step / 4 GPRs with LEA
 Frktons II Step requires a PC with SSSE3
2459    cycles for Frktons II Step / 5 MMX without SSSE3
1126    cycles for Frktons III Step / XMM/MMX with SSE2
1197    cycles for Frktons III Step / XMM with SSE2 - enhanced
1159    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2282    cycles for Dedndave code - 5 GPRs
1967    cycles for Frktons I Step / 2 GPRs
2031    cycles for Frktons I Step / 4 GPRs with LEA
 Frktons II Step requires a PC with SSSE3
2483    cycles for Frktons II Step / 5 MMX without SSSE3
1126    cycles for Frktons III Step / XMM/MMX with SSE2
1185    cycles for Frktons III Step / XMM with SSE2 - enhanced
1158    cycles for Jochen / 5 XMM
------------------------------------------------------------------------

six_L

  • Member
  • Posts: 201
Re: The fastest way to fill a dword array with string values
« Reply #86 on: December 15, 2012, 05:02:10 AM »
Quote
------------------------------------------------------------------------
Intel(R) Core(TM) i3 CPU       M 370  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
------------------------------------------------------------------------
1335   cycles for Dedndave code - 5 GPRs
1855   cycles for Frktons I Step / 2 GPRs
1009   cycles for Frktons I Step / 4 GPRs with LEA
587   cycles for Frktons II Step / 5 MMX with SSSE3
643   cycles for Frktons II Step / 5 MMX without SSSE3
389   cycles for Frktons III Step / XMM/MMX with SSE2
344   cycles for Frktons III Step / XMM with SSE2 - enhanced
460   cycles for Jochen / 5 XMM
------------------------------------------------------------------------
1278   cycles for Dedndave code - 5 GPRs
1025   cycles for Frktons I Step / 2 GPRs
1015   cycles for Frktons I Step / 4 GPRs with LEA
585   cycles for Frktons II Step / 5 MMX with SSSE3
633   cycles for Frktons II Step / 5 MMX without SSSE3
400   cycles for Frktons III Step / XMM/MMX with SSE2
345   cycles for Frktons III Step / XMM with SSE2 - enhanced
437   cycles for Jochen / 5 XMM
------------------------------------------------------------------------

--- ok ---

Say you, Say me, Say the codes together for ever.

frktons

  • Member
  • Posts: 491
Re: The fastest way to fill a dword array with string values
« Reply #87 on: December 15, 2012, 05:04:18 AM »
As usual, different CPUs = different performance  :P

Since I'm targeting Core Duo and later CPUs, I can be satisfied
that I got below 0.7 cycles per dword string.  :biggrin:
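The per-dword figure is the cycles per call divided by the number of entries N. This part of the thread doesn't state N, so the last value below assumes N = 1000 (a hypothetical value, e.g. one entry per three-digit number). The middle value needs no assumption: at the 628 to 630 cycles the enhanced III step takes on the Core 2 6600, staying under 0.7 cycles per dword requires N > 900.

```latex
\frac{\text{cycles per call}}{N} < 0.7
\;\Longleftrightarrow\;
N > \frac{\text{cycles per call}}{0.7},
\qquad
\frac{630}{0.7} = 900,
\qquad
\frac{628}{1000} \approx 0.63 \quad (\text{assuming } N = 1000)
```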

Farabi

  • Member
  • Posts: 969
  • Neuroscience Fans
Re: The fastest way to fill a dword array with string values
« Reply #88 on: December 23, 2012, 09:24:58 AM »
ahhh nice algo

http://farabidatacenter.url.ph/MySoftware/
My 3D Game Engine Demo.

Contact me at Whatsapp: 6283818314165

frktons

  • Member
  • Posts: 491
Re: The fastest way to fill a dword array with string values
« Reply #89 on: December 23, 2012, 12:06:14 PM »
Quote
ahhh nice algo

Thanks, my friend. Loop unrolling and SSE2 code
are what make this algo so fast. :t
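Frktons' actual routine isn't shown in this part of the thread, so the following is only a sketch of the general idea, written in C with SSE2 intrinsics rather than MASM, under an assumed table layout (1000 dwords holding "000\0" to "999\0", x86 little-endian). Every row of ten entries shares its hundreds and tens digits, so one broadcast prefix plus three precomputed units-digit vectors build a whole row with three adds and three unaligned 16-byte stores; that unrolled row body replaces ten scalar stores. `fill_sse2_rows`, `TABLE_LEN` and `TABLE_PAD` are hypothetical names. Compile for x86 with SSE2 enabled (the default on x86-64).

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

#define TABLE_LEN 1000   /* assumed: entries for 000..999 */
#define TABLE_PAD 2      /* slack for the last row's 12-dword store */

/* Fill dst[0..999] with "000\0".."999\0" (x86 little-endian layout).
   dst must have room for TABLE_LEN + TABLE_PAD dwords. */
static void fill_sse2_rows(uint32_t *dst)
{
    /* Units digit sits in byte 2 of each dword. Only the first 10 lanes
       matter; lanes 10-11 spill into the next row and get overwritten. */
    const __m128i u0 = _mm_setr_epi32('0' << 16, '1' << 16, '2' << 16, '3' << 16);
    const __m128i u1 = _mm_setr_epi32('4' << 16, '5' << 16, '6' << 16, '7' << 16);
    const __m128i u2 = _mm_setr_epi32('8' << 16, '9' << 16, '0' << 16, '1' << 16);

    for (int row = 0; row < TABLE_LEN / 10; row++) {
        /* Hundreds digit in byte 0, tens digit in byte 1, byte 3 stays NUL. */
        int prefix = ('0' + row / 10) | (('0' + row % 10) << 8);
        __m128i p = _mm_set1_epi32(prefix);
        uint32_t *out = dst + row * 10;
        /* Unrolled body: three adds and three unaligned stores per row. */
        _mm_storeu_si128((__m128i *)(out + 0), _mm_add_epi32(p, u0));
        _mm_storeu_si128((__m128i *)(out + 4), _mm_add_epi32(p, u1));
        _mm_storeu_si128((__m128i *)(out + 8), _mm_add_epi32(p, u2));
    }
}
```

The last store of each row writes two dwords past the row, and the next row's first store then overwrites them; that overlap is why the buffer needs two dwords of slack at the end. An aligned-store variant would need a different grouping, since a 10-dword row is not a multiple of 16 bytes.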