The MASM Forum

General => The Campus => Topic started by: RuiLoureiro on May 29, 2014, 06:15:48 AM

Title: Sorting strings
Post by: RuiLoureiro on May 29, 2014, 06:15:48 AM
 :biggrin:
Hi
    Does anyone know a procedure/macro to generate random strings ?
    Do you want to write it ?
      :t


    EDIT: i wrote something about it.       
Title: Re: Sorting strings
Post by: jj2007 on May 29, 2014, 07:02:52 AM
Something like this?

include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)
  Init
  LenMin=5
  LenMax=50
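  xor esi, esi                          ; string counter
  Dim My$()
  Let edi=New$(LenMax+1)                ; work buffer for one string
  .Repeat
      push edi                          ; save start of buffer
      add Rand(LenMax-LenMin), LenMin   ; random length in eax
      push eax                          ; keep the length on the stack
      xor ecx, ecx
      .Repeat
            add Rand(26), "a"           ; random lowercase letter
            stosb
            inc ecx
      .Until ecx>=stack                 ; until the saved length is reached
      pop eax                           ; discard the saved length
      xor eax, eax
      stosb                             ; zero terminator
      pop edi                           ; back to start of buffer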
      Let My$(esi)=edi
      inc esi
  .Until esi>=1000
  Store "Random.txt", My$()
  Inkey Str$("%i strings generated", eax)
  Exit
end start
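
For readers without MasmBasic, here is a minimal C sketch of the same idea. It is not a translation of the code above: the name random_string is made up for illustration, rand() stands in for Rand(), and it assumes Rand(n) yields a value from 0 to n-1.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill buf with random lowercase letters. The length lies in
   [len_min, len_max), so buf must hold at least len_max bytes.
   Returns the length written, not counting the zero terminator.
   (random_string is a hypothetical helper, not part of MasmBasic.) */
size_t random_string(char *buf, size_t len_min, size_t len_max)
{
    assert(buf != NULL && len_min < len_max);
    size_t len = len_min + (size_t)rand() % (len_max - len_min);
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)('a' + rand() % 26);   /* one letter a..z */
    buf[len] = '\0';
    return len;
}
```

Calling random_string(buf, 5, 50) in a loop of 1000 iterations gives roughly the set the MasmBasic snippet stores; writing the strings to a file is left out here.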
Title: Re: Sorting strings
Post by: KeepingRealBusy on May 29, 2014, 07:25:49 AM
It is more fun if you get a spelling dictionary, and randomly select words and concatenate them to the desired random string length then terminate the line.

Dave.
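
A rough C sketch of the dictionary idea, assuming the word list is already in memory. The name random_phrase and its parameters are hypothetical, and the sketch stops at the first word that does not fit, so the result can be shorter than the target length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Append randomly chosen words, separated by single spaces, until the
   next word would exceed target_len. out must hold target_len + 1 bytes.
   Returns the length written. (Hypothetical helper for illustration.) */
size_t random_phrase(char *out, size_t target_len,
                     const char *const *words, size_t word_count)
{
    assert(out != NULL && words != NULL && word_count > 0);
    size_t len = 0;
    out[0] = '\0';
    for (;;) {
        const char *w = words[(size_t)rand() % word_count];
        size_t wl = strlen(w);
        size_t need = wl + (len > 0 ? 1 : 0);   /* word plus separator */
        if (wl == 0 || len + need > target_len)
            break;                              /* does not fit: stop */
        if (len > 0)
            out[len++] = ' ';
        memcpy(out + len, w, wl);
        len += wl;
        out[len] = '\0';
    }
    return len;
}
```

Loading the words from a real spelling dictionary is not shown; any array of non-empty strings will do.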
Title: Re: Sorting strings
Post by: dedndave on May 29, 2014, 07:33:21 AM
my concern is more toward the test you devise   :t

if you generate a set of random strings for a test
it's only fair if you use the same set on each routine tested
rather than generating a different set for each test run
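
One way to keep the test fair is to pass the same two strings to every routine through one timing function. This is only a C sketch under assumptions: time_compare_ms and baseline_cmp are made-up names, and clock() measures processor time, which is not necessarily how the timings later in this thread were taken.

```c
#include <assert.h>
#include <time.h>

/* Signature shared by all routines under test. */
typedef int (*cmp_fn)(const char *, const char *);

/* Hypothetical stand-in for a routine under test:
   it only compares the first byte of each string. */
int baseline_cmp(const char *x, const char *y)
{
    return (unsigned char)*x - (unsigned char)*y;
}

/* Call cmp(x, y) the given number of times and return the
   processor time used, in milliseconds. */
double time_compare_ms(cmp_fn cmp, const char *x, const char *y,
                       long iterations)
{
    assert(cmp != NULL && iterations > 0);
    volatile int sink = 0;          /* keeps the call from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        sink = cmp(x, y);
    clock_t stop = clock();
    (void)sink;
    return 1000.0 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
}
```

Each routine is then timed with identical arguments, for example time_compare_ms(baseline_cmp, x, y, 1000000), so differences come from the routine and not from the data.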
Title: Re: Sorting strings
Post by: KeepingRealBusy on May 29, 2014, 07:36:28 AM
Dave,

Agreed! Write the file with the random data, then test with different programs.

Dave.
Title: Re: Sorting strings
Post by: RuiLoureiro on May 29, 2014, 06:32:46 PM
 :biggrin:
Thanks all,
                  i started to think about this yesterday but now i have some new ideas.
Title: Re: Sorting strings
Post by: sinsi on May 29, 2014, 07:11:35 PM
Random strings? Just wait for some
"kitchen"
"k.i.t.c.h.e.n"
"k i t c h e n"
"kit chen" spam :biggrin:
That handbag one the other day was almost comprehensible.

Back on topic, do you mean random letters or words?
Title: Re: Sorting strings
Post by: jj2007 on May 29, 2014, 09:46:09 PM
Quote from: dedndave on May 29, 2014, 07:33:21 AM
it's only fair if you use the same set on each routine tested

With most pseudo-random number generators including Rand() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1030), you get the same set anyway - unless you explicitly set a new seed. So that shouldn't be a problem.
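
The same behaviour can be shown in C, where the standard guarantees that srand() with the same seed makes rand() repeat the same sequence. This sketch says nothing about how Rand() is implemented; make_set and same_seed_same_set are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { COUNT = 8, MAXLEN = 16 };

/* Generate COUNT random lowercase strings from the given seed. */
void make_set(unsigned seed, char set[COUNT][MAXLEN + 1])
{
    assert(set != NULL);
    srand(seed);                          /* same seed, same sequence */
    for (int i = 0; i < COUNT; i++) {
        int len = 1 + rand() % MAXLEN;    /* length 1..MAXLEN */
        for (int j = 0; j < len; j++)
            set[i][j] = (char)('a' + rand() % 26);
        set[i][len] = '\0';
    }
}

/* Return 1 if two runs with the same seed give identical strings. */
int same_seed_same_set(unsigned seed)
{
    char a[COUNT][MAXLEN + 1], b[COUNT][MAXLEN + 1];
    make_set(seed, a);
    make_set(seed, b);
    for (int i = 0; i < COUNT; i++)
        if (strcmp(a[i], b[i]) != 0)
            return 0;
    return 1;
}
```

So a fixed seed is enough to give every routine the same input without saving the strings to a file first.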
Title: Re: Sorting strings
Post by: dedndave on May 29, 2014, 09:50:42 PM
you can see how one set of random strings might yield very different results from another
especially seeing as you have been working on sort routines, lately   :P
Title: Re: Sorting strings
Post by: hutch-- on May 30, 2014, 12:36:38 AM
sinsi,

I am glad you approve of the method of encouraging our spamming friends to post elsewhere.  :P
Title: Re: Sorting strings
Post by: RuiLoureiro on May 30, 2014, 03:30:08 AM
Quote from: sinsi on May 29, 2014, 07:11:35 PM
Random strings? Just wait for some
" your link does not work IDIOT "
"/\/\/\/\ BANCI /\/\/\/\"
"/ / / This was posted by an IDIOT / / /"
"kit chen" spam :biggrin:
That handbag one the other day was almost comprehensible.

Back on topic, do you mean random letters or words?

:biggrin: :biggrin:
No, like these:
"two things are infinite:"
"universe and the stupidity."
"but about the first i am not sure."
"Albert Einstein"

on topic:
             maybe sets of random letters/numbers
         
Title: Re: Sorting strings
Post by: RuiLoureiro on June 01, 2014, 12:02:33 AM
Quote from: dedndave on May 29, 2014, 07:33:21 AM
my concern is more toward the test you devise   :t

if you generate a set of random strings for a test
it's only fair if you use the same set on each routine tested
rather than generating a different set for each test run
Hi Dave,
              Yes it must be  :t
Title: Re: Sorting strings
Post by: RuiLoureiro on June 10, 2014, 03:00:44 AM
Hi
        2 procedures to compare 2 "strings
        with spaces":       
                              CompareStringXYS    <- compare DWORDS
                              CompareStringXYBS   <- compare BYTES

note that: "abcdef"  is EQUAL to "a b c d e      F"
                  For example, we often have product names that are
                  nearly EQUAL, and they have spaces between words.

        Could you post your results ?
        Thanks

Jochen,
              add some other procedures that you know
              and test them.
              Thanks  :t

Strings used:

Quote
_string01X       db "a bcd efg hij klm nop",0
_string01Y       db "ab cdefg hijk lmnopA",0    <-- equal X+ 'A'

_string02X       db "abc de fghijkl mn op",0
_string02Y       db "abc defg hi jk lm nop",0   <--- equal X

_string03X       db "abc de fghijkl mn op A",0  <--- equal Y+ 'A'
_string03Y       db "abc defg hi jk lm nop",0

Results:
Quote
STRINGS:
a bcd efg hij klm nop
ab cdefg hijk lmnopA

abc de fghijkl mn op
abc defg hi jk lm nop

abc de fghijkl mn op A
abc defg hi jk lm nop

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
132 milliseconds, CompareStringXYS, _string01X, _string01Y
121 milliseconds, CompareStringXYS, _string02X, _string02Y
137 milliseconds, CompareStringXYS, _string03X, _string03Y
158 milliseconds, CompareStringXYBS, _string01X, _string01Y
152 milliseconds, CompareStringXYBS, _string02X, _string02Y
159 milliseconds, CompareStringXYBS, _string03X, _string03Y
*** Press any key to get the time table ***

***** Time table *****
121 milliseconds, CompareStringXYS -_string02X EQUAL _string02Y-16 bytes
132 milliseconds, CompareStringXYS -_string01X LESS _string01Y-16 bytes
137 milliseconds, CompareStringXYS -_string03X GREATER _string03Y-16 bytes
152 milliseconds, CompareStringXYBS -_string02X EQUAL _string02Y-16 bytes
158 milliseconds, CompareStringXYBS -_string01X LESS _string01Y-16 bytes
159 milliseconds, CompareStringXYBS -_string03X GREATER _string03Y-16 bytes
********** END 2 **********
Title: Re: Sorting strings
Post by: jj2007 on June 10, 2014, 04:04:15 AM
Quote from: RuiLoureiro on June 10, 2014, 03:00:44 AM
        2 procedures to compare 2 "strings with spaces":

Hi Rui,
What would be a "real life" application of a sort routine that ignores spaces between words?
Title: Re: Sorting strings
Post by: RuiLoureiro on June 10, 2014, 04:08:46 AM
Quote from: jj2007 on June 10, 2014, 04:04:15 AM
Quote from: RuiLoureiro on June 10, 2014, 03:00:44 AM
        2 procedures to compare 2 "strings with spaces":

Hi Rui,
What would be a "real life" application of a sort routine that ignores spaces between words?
For students, and that is "real life" too ! :biggrin:
                 But not only that, Jochen ! If the input procedure cleans spaces ...
                 For instance, the date field. We need to test if it is correct
                 and we remove spaces at that time: "2088- 02- 30", just
                 your birthday ! :P 
                 Sorry if I am wrong  :biggrin:
             
Title: Re: Sorting strings
Post by: dedndave on June 10, 2014, 07:41:41 AM
prescott w/htt XP SP3
STRINGS:
a bcd efg hij klm nop
ab cdefg hijk lmnopA

abc de fghijkl mn op
abc defg hi jk lm nop

abc de fghijkl mn op A
abc defg hi jk lm nop

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
145 milliseconds, CompareStringXYS, _string01X, _string01Y
138 milliseconds, CompareStringXYS, _string02X, _string02Y
142 milliseconds, CompareStringXYS, _string03X, _string03Y
171 milliseconds, CompareStringXYBS, _string01X, _string01Y
170 milliseconds, CompareStringXYBS, _string02X, _string02Y
179 milliseconds, CompareStringXYBS, _string03X, _string03Y
*** Press any key to get the time table ***

***** Time table *****

138 milliseconds, CompareStringXYS -_string02X EQUAL _string02Y-16 bytes
142 milliseconds, CompareStringXYS -_string03X GREATER _string03Y-16 bytes
145 milliseconds, CompareStringXYS -_string01X LESS _string01Y-16 bytes
170 milliseconds, CompareStringXYBS -_string02X EQUAL _string02Y-16 bytes
171 milliseconds, CompareStringXYBS -_string01X LESS _string01Y-16 bytes
179 milliseconds, CompareStringXYBS -_string03X GREATER _string03Y-16 bytes
********** END 2 **********
Title: Re: Sorting strings
Post by: RuiLoureiro on June 12, 2014, 04:04:08 AM
Dave,
      9 downloads and only your answer,
      which seems to mean that they
      don't want to help, just the code(?)
      Ok thank you.  :t

Last results:

      CompareStringXYS  -> remove spaces, convert to uppercase
                           and copy to stack;
                           load dwords, compare.
      CompareStringXYBS -> remove spaces, convert to uppercase
                           and copy to stack;
                           load bytes, compare.
      CompareStringXYT  -> load byte, remove space,
                           convert to uppercase
                           compare.
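
The two-pass variant (normalize a copy first, then compare) can be sketched in C like this. The names normalize and compare_two_pass are hypothetical, the local arrays play the role of the stack copy, and strcmp stands in for the dword or byte compare loop, so this shows the structure only, not the timing behaviour.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { MAXLEN = 255 };   /* assumed upper bound on input length */

/* Copy src to dst without spaces and in uppercase.
   dst must hold strlen(src) + 1 bytes. Returns the new length. */
size_t normalize(char *dst, const char *src)
{
    size_t n = 0;
    for (; *src != '\0'; src++)
        if (*src != ' ')
            dst[n++] = (char)toupper((unsigned char)*src);
    dst[n] = '\0';
    return n;
}

/* Pass 1: normalize both strings into local buffers.
   Pass 2: compare the cleaned copies. */
int compare_two_pass(const char *x, const char *y)
{
    char bx[MAXLEN + 1], by[MAXLEN + 1];
    assert(strlen(x) <= MAXLEN && strlen(y) <= MAXLEN);
    normalize(bx, x);
    normalize(by, y);
    return strcmp(bx, by);
}
```

The one-pass approach of CompareStringXYT skips the copy and cleans each byte as it is loaded, which may explain part of the difference in the timings, although the thread does not say so.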

      Does anyone have another one to compare ?
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  95 milliseconds, CompareStringXYT - string01X LESS     string01Y
111 milliseconds, CompareStringXYT - string02X EQUAL    string02Y
114 milliseconds, CompareStringXYS - string02X EQUAL    string02Y
121 milliseconds, CompareStringXYT - string03X GREATER  string03Y
124 milliseconds, CompareStringXYS - string01X LESS     string01Y
129 milliseconds, CompareStringXYS - string03X GREATER  string03Y
175 milliseconds, CompareStringXYBS- string01X LESS    string01Y
180 milliseconds, CompareStringXYBS- string03X GREATER string03Y
189 milliseconds, CompareStringXYBS- string02X EQUAL   string02Y
********** END 2 **********
Title: Re: Sorting strings
Post by: FORTRANS on June 12, 2014, 11:28:59 PM
Hi,

Quote from: RuiLoureiro on June 12, 2014, 04:04:08 AM
Dave,
      9 downloads and only your answer,
      which seems to mean that they
      don't want to help, just the code(?)

   Tried the code from Reply #12 on two systems, and it does
not seem to complete.  P-III, Win2x, and Pentium M, Win XP.


STRINGS:
a bcd efg hij klm nop
ab cdefg hijk lmnopA

abc de fghijkl mn op
abc defg hi jk lm nop

abc de fghijkl mn op A
abc defg hi jk lm nop

X is less than Y
ShowResultXY


Regards,

Steve N.
Title: Re: Sorting strings
Post by: dedndave on June 13, 2014, 02:18:20 AM
you have to hit Enter (or any key) a few times
Title: Re: Sorting strings
Post by: FORTRANS on June 13, 2014, 05:09:47 AM
Quote from: dedndave on June 13, 2014, 02:18:20 AM
you have to hit Enter (or any key) a few times

Hi Dave,

   Oh.  Duh.  In that case.


{P-III}

STRINGS:
a bcd efg hij klm nop
ab cdefg hijk lmnopA

abc de fghijkl mn op
abc defg hi jk lm nop

abc de fghijkl mn op A
abc defg hi jk lm nop

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
515 milliseconds, CompareStringXYS, _string01X, _string01Y
587 milliseconds, CompareStringXYS, _string02X, _string02Y
632 milliseconds, CompareStringXYS, _string03X, _string03Y
569 milliseconds, CompareStringXYBS, _string01X, _string01Y
645 milliseconds, CompareStringXYBS, _string02X, _string02Y
686 milliseconds, CompareStringXYBS, _string03X, _string03Y
*** Press any key to get the time table ***

***** Time table *****

515 milliseconds, CompareStringXYS -_string01X LESS _string01Y-16 bytes
569 milliseconds, CompareStringXYBS -_string01X LESS _string01Y-16 bytes
587 milliseconds, CompareStringXYS -_string02X EQUAL _string02Y-16 bytes
632 milliseconds, CompareStringXYS -_string03X GREATER _string03Y-16 bytes
645 milliseconds, CompareStringXYBS -_string02X EQUAL _string02Y-16 bytes
686 milliseconds, CompareStringXYBS -_string03X GREATER _string03Y-16 bytes
********** END 2 **********

{Pentium M}

STRINGS:
a bcd efg hij klm nop
ab cdefg hijk lmnopA

abc de fghijkl mn op
abc defg hi jk lm nop

abc de fghijkl mn op A
abc defg hi jk lm nop

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY
433 milliseconds, CompareStringXYS, _string01X, _string01Y
203 milliseconds, CompareStringXYS, _string02X, _string02Y
236 milliseconds, CompareStringXYS, _string03X, _string03Y
252 milliseconds, CompareStringXYBS, _string01X, _string01Y
229 milliseconds, CompareStringXYBS, _string02X, _string02Y
245 milliseconds, CompareStringXYBS, _string03X, _string03Y
*** Press any key to get the time table ***

***** Time table *****

203 milliseconds, CompareStringXYS -_string02X EQUAL _string02Y-16 bytes
229 milliseconds, CompareStringXYBS -_string02X EQUAL _string02Y-16 bytes
236 milliseconds, CompareStringXYS -_string03X GREATER _string03Y-16 bytes
245 milliseconds, CompareStringXYBS -_string03X GREATER _string03Y-16 bytes
252 milliseconds, CompareStringXYBS -_string01X LESS _string01Y-16 bytes
433 milliseconds, CompareStringXYS -_string01X LESS _string01Y-16 bytes
********** END 2 **********


HTH,

Steve N.
Title: Re: Sorting strings
Post by: RuiLoureiro on June 14, 2014, 12:37:27 AM
Thanks Dave
     and FORTRANS  :t

now i have this:


X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

126 milliseconds, CompareStringXYS, _string01X, _string01Y
118 milliseconds, CompareStringXYS, _string02X, _string02Y
123 milliseconds, CompareStringXYS, _string03X, _string03Y
104 milliseconds, CompareStringXYT, _string01X, _string01Y
105 milliseconds, CompareStringXYT, _string02X, _string02Y
117 milliseconds, CompareStringXYT, _string03X, _string03Y
152 milliseconds, CompareStringXYBS, _string01X, _string01Y
167 milliseconds, CompareStringXYBS, _string02X, _string02Y
176 milliseconds, CompareStringXYBS, _string03X, _string03Y
91 milliseconds, StringCmpC, _string01X, _string01Y
94 milliseconds, StringCmpC, _string02X, _string02Y
95 milliseconds, StringCmpC, _string03X, _string03Y
78 milliseconds, StringCmpD, _string01X, _string01Y
70 milliseconds, StringCmpD, _string02X, _string02Y
69 milliseconds, StringCmpD, _string03X, _string03Y
106 milliseconds, StringCmpE, _string01X, _string01Y
84 milliseconds, StringCmpE, _string02X, _string02Y
97 milliseconds, StringCmpE, _string03X, _string03Y
79 milliseconds, StringCmpF, _string01X, _string01Y
86 milliseconds, StringCmpF, _string02X, _string02Y
64 milliseconds, StringCmpF, _string03X, _string03Y
*** Press any key to get the time table ***

Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

64 milliseconds, StringCmpF -_string03X GREATER _string03Y-16 bytes
69 milliseconds, StringCmpD -_string03X GREATER _string03Y-16 bytes
70 milliseconds, StringCmpD -_string02X EQUAL _string02Y-16 bytes
78 milliseconds, StringCmpD -_string01X LESS _string01Y-16 bytes
79 milliseconds, StringCmpF -_string01X LESS _string01Y-16 bytes
84 milliseconds, StringCmpE -_string02X EQUAL _string02Y-16 bytes
86 milliseconds, StringCmpF -_string02X EQUAL _string02Y-16 bytes
;--------------------------------------------------------------------
91 milliseconds, StringCmpC -_string01X LESS _string01Y-16 bytes
94 milliseconds, StringCmpC -_string02X EQUAL _string02Y-16 bytes
95 milliseconds, StringCmpC -_string03X GREATER _string03Y-16 bytes
97 milliseconds, StringCmpE -_string03X GREATER _string03Y-16 bytes
104 milliseconds, CompareStringXYT -_string01X LESS _string01Y-16 bytes
105 milliseconds, CompareStringXYT -_string02X EQUAL _string02Y-16 bytes
106 milliseconds, StringCmpE -_string01X LESS _string01Y-16 bytes
117 milliseconds, CompareStringXYT -_string03X GREATER _string03Y-16 bytes
118 milliseconds, CompareStringXYS -_string02X EQUAL _string02Y-16 bytes
123 milliseconds, CompareStringXYS -_string03X GREATER _string03Y-16 bytes
126 milliseconds, CompareStringXYS -_string01X LESS _string01Y-16 bytes
152 milliseconds, CompareStringXYBS -_string01X LESS _string01Y-16 bytes
167 milliseconds, CompareStringXYBS -_string02X EQUAL _string02Y-16 bytes
176 milliseconds, CompareStringXYBS -_string03X GREATER _string03Y-16 bytes
********** END 2 **********

FOR strings WITHOUT spaces:
note: A_stricmp is an optimized version
written by an expert.

STRINGS:
_string01X       db "abCdefghijklmnop",0
_string01Y       db "aBcdEFGHIJKLMNOPA",0

_string02X       db "aBcefghijklmnop",0
_string02Y       db "abCEFGHIJKLMNOP",0

_string03X       db "abcefghijklmnopA",0
_string03Y       db "abcEFGHIJKLMNOP",0


X is less than Y
ShowResultAB
X is EQUAL Y
ShowResultAB
X is greater than Y
ShowResultAB

X is less than Y
ShowResultAB
X is EQUAL Y
ShowResultAB
X is greater than Y
ShowResultAB

X is less than Y
ShowResultAB
X is EQUAL Y
ShowResultAB
X is greater than Y
ShowResultAB

84 milliseconds, A_stricmp, _string01X, _string01Y
71 milliseconds, A_stricmp, _string02X, _string02Y
73 milliseconds, A_stricmp, _string03X, _string03Y
68 milliseconds, B_stricmp, _string01X, _string01Y
63 milliseconds, B_stricmp, _string02X, _string02Y
65 milliseconds, B_stricmp, _string03X, _string03Y
32 milliseconds, C_stricmp, _string01X, _string01Y
29 milliseconds, C_stricmp, _string02X, _string02Y
32 milliseconds, C_stricmp, _string03X, _string03Y
*** Press any key to get the time table ***


Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

29 milliseconds, C_stricmp -_string02X EQUAL   _string02Y-16 bytes
32 milliseconds, C_stricmp -_string03X GREATER _string03Y-16 bytes
32 milliseconds, C_stricmp -_string01X LESS    _string01Y-16 bytes
63 milliseconds, B_stricmp -_string02X EQUAL   _string02Y-16 bytes
65 milliseconds, B_stricmp -_string03X GREATER _string03Y-16 bytes
68 milliseconds, B_stricmp -_string01X LESS    _string01Y-16 bytes
71 milliseconds, A_stricmp -_string02X EQUAL   _string02Y-16 bytes
73 milliseconds, A_stricmp -_string03X GREATER _string03Y-16 bytes
84 milliseconds, A_stricmp -_string01X LESS    _string01Y-16 bytes
********** END 2 **********

more...
Quote
-------------------------------------------------------------
Intel(R) Atom(TM) CPU N455   @ 1.66GHz (SSE4)
-------------------------------------------------------------
***** Time table *****

79 milliseconds, C_stricmp -_string01X LESS    _string01Y-16 bytes
79 milliseconds, B_stricmp -_string01X LESS    _string01Y-16 bytes
86 milliseconds, B_stricmp -_string02X EQUAL   _string02Y-16 bytes
86 milliseconds, C_stricmp -_string02X EQUAL   _string02Y-16 bytes
89 milliseconds, B_stricmp -_string03X GREATER _string03Y-16 bytes
92 milliseconds, C_stricmp -_string03X GREATER _string03Y-16 bytes
142 milliseconds, A_stricmp -_string03X GREATER _string03Y-16 bytes
149 milliseconds, A_stricmp -_string02X EQUAL   _string02Y-16 bytes
152 milliseconds, A_stricmp -_string01X LESS    _string01Y-16 bytes
********** END 2 **********
Title: Re: Sorting strings
Post by: dedndave on June 14, 2014, 02:00:21 AM
i just realized - you and i have the same processor  :P
Title: Re: Sorting strings
Post by: RuiLoureiro on June 14, 2014, 03:42:37 AM
 Dave,
              i posted another
Title: Re: Sorting strings
Post by: dedndave on June 14, 2014, 04:00:31 AM
my processor runs at 3.0 GHz
otherwise, my results will be the same as yours   :biggrin:
Title: Re: Sorting strings
Post by: RuiLoureiro on June 16, 2014, 03:06:17 AM
 :biggrin:
Hi
    This is the latest
    Could you run it ?
    Thank you !  :t

note: any explanation about it costs $250 per hour (very cheap!) :bgrin:

string lowercase= !#$%&/().=<>;:@  a bcd efg hij klm nopqrstuvxyz [\]^_{|}~
string uppercase - string01Z= !#$%&/().=<>;:@  A BCD EFG HIJ KLM NOPQRSTUVXYZ [\]^_{|}~
********* PRESS A KEY ... *********
string uppercase - string01Z =A BCD EFG HIJ KLM NOP
string uppercase - string01W=AB CDEFG HIJK LMN OPA

NOTE: 16 bytes are EQUAL, the last is NOT (lowercase);
           they are compared in uppercase.

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

176 milliseconds, StringCmpC, string01X, string01Y
173 milliseconds, StringCmpC, string02X, string02Y
101 milliseconds, StringCmpC, string03X, string03Y
93 milliseconds, StringCmpD, string01X, string01Y
116 milliseconds, StringCmpD, string02X, string02Y
85 milliseconds, StringCmpD, string03X, string03Y
76 milliseconds, StringCmpE, string01X, string01Y
90 milliseconds, StringCmpE, string02X, string02Y
57 milliseconds, StringCmpE, string03X, string03Y
81 milliseconds, StringCmpF, string01X, string01Y
82 milliseconds, StringCmpF, string02X, string02Y
62 milliseconds, StringCmpF, string03X, string03Y
89 milliseconds, StringCmpDT, string01X, string01Y
100 milliseconds, StringCmpDT, string02X, string02Y
85 milliseconds, StringCmpDT, string03X, string03Y
77 milliseconds, StringCmpET, string01X, string01Y
96 milliseconds, StringCmpET, string02X, string02Y
54 milliseconds, StringCmpET, string03X, string03Y
92 milliseconds, StringCmpFT, string01X, string01Y
116 milliseconds, StringCmpFT, string02X, string02Y
144 milliseconds, StringCmpFT, string03X, string03Y
116 milliseconds, StringCmpEB, string01X, string01Y
100 milliseconds, StringCmpEB, string02X, string02Y
80 milliseconds, StringCmpEB, string03X, string03Y
82 milliseconds, StringCmpGA, string01X, string01Y
103 milliseconds, StringCmpGA, string02X, string02Y
58 milliseconds, StringCmpGA, string03X, string03Y
71 milliseconds, StringCmpGB, string01X, string01Y
88 milliseconds, StringCmpGB, string02X, string02Y
77 milliseconds, StringCmpGB, string03X, string03Y
70 milliseconds, StringCmpGC, string01X, string01Y
86 milliseconds, StringCmpGC, string02X, string02Y
67 milliseconds, StringCmpGC, string03X, string03Y
*** Press any key to get the time table ***


Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

54 milliseconds, StringCmpET -string03X GREATER string03Y-16 bytes
57 milliseconds, StringCmpE -string03X GREATER string03Y-16 bytes
58 milliseconds, StringCmpGA -string03X GREATER string03Y-16 bytes
62 milliseconds, StringCmpF -string03X GREATER string03Y-16 bytes
67 milliseconds, StringCmpGC -string03X GREATER string03Y-16 bytes
70 milliseconds, StringCmpGC -string01X LESS    string01Y-16 bytes
71 milliseconds, StringCmpGB -string01X LESS    string01Y-16 bytes
76 milliseconds, StringCmpE -string01X LESS    string01Y-16 bytes
77 milliseconds, StringCmpET -string01X LESS    string01Y-16 bytes
77 milliseconds, StringCmpGB -string03X GREATER string03Y-16 bytes
80 milliseconds, StringCmpEB -string03X GREATER string03Y-16 bytes
81 milliseconds, StringCmpF -string01X LESS    string01Y-16 bytes
82 milliseconds, StringCmpF -string02X EQUAL   string02Y-16 bytes
82 milliseconds, StringCmpGA -string01X LESS    string01Y-16 bytes
85 milliseconds, StringCmpDT -string03X GREATER string03Y-16 bytes
85 milliseconds, StringCmpD -string03X GREATER string03Y-16 bytes
86 milliseconds, StringCmpGC -string02X EQUAL   string02Y-16 bytes
88 milliseconds, StringCmpGB -string02X EQUAL   string02Y-16 bytes
89 milliseconds, StringCmpDT -string01X LESS    string01Y-16 bytes
90 milliseconds, StringCmpE -string02X EQUAL   string02Y-16 bytes
92 milliseconds, StringCmpFT -string01X LESS    string01Y-16 bytes
93 milliseconds, StringCmpD -string01X LESS    string01Y-16 bytes
96 milliseconds, StringCmpET -string02X EQUAL   string02Y-16 bytes
100 milliseconds, StringCmpDT -string02X EQUAL   string02Y-16 bytes
100 milliseconds, StringCmpEB -string02X EQUAL   string02Y-16 bytes
101 milliseconds, StringCmpC -string03X GREATER string03Y-16 bytes
103 milliseconds, StringCmpGA -string02X EQUAL   string02Y-16 bytes
116 milliseconds, StringCmpD -string02X EQUAL   string02Y-16 bytes
116 milliseconds, StringCmpEB -string01X LESS    string01Y-16 bytes
116 milliseconds, StringCmpFT -string02X EQUAL   string02Y-16 bytes
144 milliseconds, StringCmpFT -string03X GREATER string03Y-16 bytes
173 milliseconds, StringCmpC -string02X EQUAL   string02Y-16 bytes
176 milliseconds, StringCmpC -string01X LESS    string01Y-16 bytes
********** END 2 **********
Title: Re: Sorting strings
Post by: Gunther on June 16, 2014, 04:27:04 AM
Hi Rui,


string lowercase= !#$%&/().=<>;:@  a bcd efg hij klm nopqrstuvxyz [\]^_{|}~
string uppercase - string01Z= !#$%&/().=<>;:@  A BCD EFG HIJ KLM NOPQRSTUVXYZ [\]^_{|}~
********* PRESS A KEY ... *********
string uppercase - string01Z=A BCD EFG HIJ KLM NOP
string uppercase - string01W=AB CDEFG HIJK LMNOPA

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

33 milliseconds, StringCmpC, string01X, string01Y
20 milliseconds, StringCmpC, string02X, string02Y
21 milliseconds, StringCmpC, string03X, string03Y
19 milliseconds, StringCmpD, string01X, string01Y
21 milliseconds, StringCmpD, string02X, string02Y
20 milliseconds, StringCmpD, string03X, string03Y
19 milliseconds, StringCmpE, string01X, string01Y
19 milliseconds, StringCmpE, string02X, string02Y
20 milliseconds, StringCmpE, string03X, string03Y
20 milliseconds, StringCmpF, string01X, string01Y
20 milliseconds, StringCmpF, string02X, string02Y
21 milliseconds, StringCmpF, string03X, string03Y
19 milliseconds, StringCmpDT, string01X, string01Y
20 milliseconds, StringCmpDT, string02X, string02Y
21 milliseconds, StringCmpDT, string03X, string03Y
19 milliseconds, StringCmpET, string01X, string01Y
20 milliseconds, StringCmpET, string02X, string02Y
21 milliseconds, StringCmpET, string03X, string03Y
20 milliseconds, StringCmpFT, string01X, string01Y
20 milliseconds, StringCmpFT, string02X, string02Y
22 milliseconds, StringCmpFT, string03X, string03Y
21 milliseconds, StringCmpEB, string01X, string01Y
21 milliseconds, StringCmpEB, string02X, string02Y
22 milliseconds, StringCmpEB, string03X, string03Y
19 milliseconds, StringCmpGA, string01X, string01Y
19 milliseconds, StringCmpGA, string02X, string02Y
21 milliseconds, StringCmpGA, string03X, string03Y
19 milliseconds, StringCmpGB, string01X, string01Y
19 milliseconds, StringCmpGB, string02X, string02Y
20 milliseconds, StringCmpGB, string03X, string03Y
19 milliseconds, StringCmpGC, string01X, string01Y
20 milliseconds, StringCmpGC, string02X, string02Y
21 milliseconds, StringCmpGC, string03X, string03Y
*** Press any key to get the time table ***

------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

19 milliseconds, StringCmpGC -string01X LESS    string01Y-16 bytes
19 milliseconds, StringCmpGB -string02X EQUAL   string02Y-16 bytes
19 milliseconds, StringCmpGB -string01X LESS    string01Y-16 bytes
19 milliseconds, StringCmpGA -string02X EQUAL   string02Y-16 bytes
19 milliseconds, StringCmpGA -string01X LESS    string01Y-16 bytes
19 milliseconds, StringCmpET -string01X LESS    string01Y-16 bytes
19 milliseconds, StringCmpDT -string01X LESS    string01Y-16 bytes
19 milliseconds, StringCmpE -string02X EQUAL   string02Y-16 bytes
19 milliseconds, StringCmpE -string01X LESS    string01Y-16 bytes
19 milliseconds, StringCmpD -string01X LESS    string01Y-16 bytes
20 milliseconds, StringCmpGC -string02X EQUAL   string02Y-16 bytes
20 milliseconds, StringCmpDT -string02X EQUAL   string02Y-16 bytes
20 milliseconds, StringCmpGB -string03X GREATER string03Y-16 bytes
20 milliseconds, StringCmpF -string02X EQUAL   string02Y-16 bytes
20 milliseconds, StringCmpF -string01X LESS    string01Y-16 bytes
20 milliseconds, StringCmpE -string03X GREATER string03Y-16 bytes
20 milliseconds, StringCmpFT -string02X EQUAL   string02Y-16 bytes
20 milliseconds, StringCmpFT -string01X LESS    string01Y-16 bytes
20 milliseconds, StringCmpD -string03X GREATER string03Y-16 bytes
20 milliseconds, StringCmpET -string02X EQUAL   string02Y-16 bytes
20 milliseconds, StringCmpC -string02X EQUAL   string02Y-16 bytes
21 milliseconds, StringCmpEB -string02X EQUAL   string02Y-16 bytes
21 milliseconds, StringCmpEB -string01X LESS    string01Y-16 bytes
21 milliseconds, StringCmpDT -string03X GREATER string03Y-16 bytes
21 milliseconds, StringCmpGC -string03X GREATER string03Y-16 bytes
21 milliseconds, StringCmpGA -string03X GREATER string03Y-16 bytes
21 milliseconds, StringCmpD -string02X EQUAL   string02Y-16 bytes
21 milliseconds, StringCmpF -string03X GREATER string03Y-16 bytes
21 milliseconds, StringCmpC -string03X GREATER string03Y-16 bytes
21 milliseconds, StringCmpET -string03X GREATER string03Y-16 bytes
22 milliseconds, StringCmpFT -string03X GREATER string03Y-16 bytes
22 milliseconds, StringCmpEB -string03X GREATER string03Y-16 bytes
33 milliseconds, StringCmpC -string01X LESS    string01Y-16 bytes
********** END 2 **********


Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 16, 2014, 06:29:16 AM
Hi Gunther,
                   Thanks. Your i7 is powerful !

                   By the way, could you run
                   that CmpString0 in my reply #20
                   if you don't mind, Gunther ?
                   Thanks  :t
Title: Re: Sorting strings
Post by: Gunther on June 16, 2014, 09:25:54 PM
Hi Rui,

Quote from: RuiLoureiro on June 16, 2014, 06:29:16 AM
                   By the way, could you run
                   that CmpString0 in my reply #20
                   if you don't mind, Gunther ?
                   Thanks  :t

Sure. I'll run it this evening, because I am at the University now.

Gunther
Title: Re: Sorting strings
Post by: FORTRANS on June 17, 2014, 12:13:01 AM
Quote from: RuiLoureiro on June 16, 2014, 03:06:17 AM
Hi
    This is the latest
    Could you run it ?
    Thank you !

Hi,

   Two results for you.

Regards,

Steve

string lowercase= !#$%&/().=<>;:@  a bcd efg hij klm nopqrstuvxyz [\]^_{|}~
string uppercase - string01Z= !#$%&/().=<>;:@  A BCD EFG HIJ KLM NOPQRSTUVXYZ [\]^_{|}~
********* PRESS A KEY ... *********
string uppercase - string01Z=A BCD EFG HIJ KLM NOP
string uppercase - string01W=AB CDEFG HIJK LMNOPA

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

384 milliseconds, StringCmpC, string01X, string01Y
373 milliseconds, StringCmpC, string02X, string02Y
424 milliseconds, StringCmpC, string03X, string03Y
348 milliseconds, StringCmpD, string01X, string01Y
352 milliseconds, StringCmpD, string02X, string02Y
390 milliseconds, StringCmpD, string03X, string03Y
355 milliseconds, StringCmpE, string01X, string01Y
363 milliseconds, StringCmpE, string02X, string02Y
384 milliseconds, StringCmpE, string03X, string03Y
390 milliseconds, StringCmpF, string01X, string01Y
357 milliseconds, StringCmpF, string02X, string02Y
383 milliseconds, StringCmpF, string03X, string03Y
295 milliseconds, StringCmpDT, string01X, string01Y
346 milliseconds, StringCmpDT, string02X, string02Y
398 milliseconds, StringCmpDT, string03X, string03Y
358 milliseconds, StringCmpET, string01X, string01Y
363 milliseconds, StringCmpET, string02X, string02Y
382 milliseconds, StringCmpET, string03X, string03Y
345 milliseconds, StringCmpFT, string01X, string01Y
345 milliseconds, StringCmpFT, string02X, string02Y
379 milliseconds, StringCmpFT, string03X, string03Y
348 milliseconds, StringCmpEB, string01X, string01Y
351 milliseconds, StringCmpEB, string02X, string02Y
389 milliseconds, StringCmpEB, string03X, string03Y
317 milliseconds, StringCmpGA, string01X, string01Y
377 milliseconds, StringCmpGA, string02X, string02Y
407 milliseconds, StringCmpGA, string03X, string03Y
310 milliseconds, StringCmpGB, string01X, string01Y
363 milliseconds, StringCmpGB, string02X, string02Y
424 milliseconds, StringCmpGB, string03X, string03Y
347 milliseconds, StringCmpGC, string01X, string01Y
355 milliseconds, StringCmpGC, string02X, string02Y
399 milliseconds, StringCmpGC, string03X, string03Y
*** Press any key to get the time table ***

------------------------------------------
 (SSE1)
------------------------------------------
***** Time table *****

295 milliseconds, StringCmpDT -string01X LESS    string01Y-16 bytes
310 milliseconds, StringCmpGB -string01X LESS    string01Y-16 bytes
317 milliseconds, StringCmpGA -string01X LESS    string01Y-16 bytes
345 milliseconds, StringCmpFT -string01X LESS    string01Y-16 bytes
345 milliseconds, StringCmpFT -string02X EQUAL   string02Y-16 bytes
346 milliseconds, StringCmpDT -string02X EQUAL   string02Y-16 bytes
347 milliseconds, StringCmpGC -string01X LESS    string01Y-16 bytes
348 milliseconds, StringCmpEB -string01X LESS    string01Y-16 bytes
348 milliseconds, StringCmpD -string01X LESS    string01Y-16 bytes
351 milliseconds, StringCmpEB -string02X EQUAL   string02Y-16 bytes
352 milliseconds, StringCmpD -string02X EQUAL   string02Y-16 bytes
355 milliseconds, StringCmpGC -string02X EQUAL   string02Y-16 bytes
355 milliseconds, StringCmpE -string01X LESS    string01Y-16 bytes
357 milliseconds, StringCmpF -string02X EQUAL   string02Y-16 bytes
358 milliseconds, StringCmpET -string01X LESS    string01Y-16 bytes
363 milliseconds, StringCmpGB -string02X EQUAL   string02Y-16 bytes
363 milliseconds, StringCmpET -string02X EQUAL   string02Y-16 bytes
363 milliseconds, StringCmpE -string02X EQUAL   string02Y-16 bytes
373 milliseconds, StringCmpC -string02X EQUAL   string02Y-16 bytes
377 milliseconds, StringCmpGA -string02X EQUAL   string02Y-16 bytes
379 milliseconds, StringCmpFT -string03X GREATER string03Y-16 bytes
382 milliseconds, StringCmpET -string03X GREATER string03Y-16 bytes
383 milliseconds, StringCmpF -string03X GREATER string03Y-16 bytes
384 milliseconds, StringCmpE -string03X GREATER string03Y-16 bytes
384 milliseconds, StringCmpC -string01X LESS    string01Y-16 bytes
389 milliseconds, StringCmpEB -string03X GREATER string03Y-16 bytes
390 milliseconds, StringCmpF -string01X LESS    string01Y-16 bytes
390 milliseconds, StringCmpD -string03X GREATER string03Y-16 bytes
398 milliseconds, StringCmpDT -string03X GREATER string03Y-16 bytes
399 milliseconds, StringCmpGC -string03X GREATER string03Y-16 bytes
407 milliseconds, StringCmpGA -string03X GREATER string03Y-16 bytes
424 milliseconds, StringCmpGB -string03X GREATER string03Y-16 bytes
424 milliseconds, StringCmpC -string03X GREATER string03Y-16 bytes
********** END 2 **********



string lowercase= !#$%&/().=<>;:@  a bcd efg hij klm nopqrstuvxyz [\]^_{|}~
string uppercase - string01Z= !#$%&/().=<>;:@  A BCD EFG HIJ KLM NOPQRSTUVXYZ [\]^_{|}~
********* PRESS A KEY ... *********
string uppercase - string01Z=A BCD EFG HIJ KLM NOP
string uppercase - string01W=AB CDEFG HIJK LMNOPA

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

242 milliseconds, StringCmpC, string01X, string01Y
86 milliseconds, StringCmpC, string02X, string02Y
92 milliseconds, StringCmpC, string03X, string03Y
103 milliseconds, StringCmpD, string01X, string01Y
94 milliseconds, StringCmpD, string02X, string02Y
106 milliseconds, StringCmpD, string03X, string03Y
147 milliseconds, StringCmpE, string01X, string01Y
95 milliseconds, StringCmpE, string02X, string02Y
111 milliseconds, StringCmpE, string03X, string03Y
103 milliseconds, StringCmpF, string01X, string01Y
152 milliseconds, StringCmpF, string02X, string02Y
118 milliseconds, StringCmpF, string03X, string03Y
100 milliseconds, StringCmpDT, string01X, string01Y
101 milliseconds, StringCmpDT, string02X, string02Y
103 milliseconds, StringCmpDT, string03X, string03Y
131 milliseconds, StringCmpET, string01X, string01Y
148 milliseconds, StringCmpET, string02X, string02Y
109 milliseconds, StringCmpET, string03X, string03Y
101 milliseconds, StringCmpFT, string01X, string01Y
96 milliseconds, StringCmpFT, string02X, string02Y
108 milliseconds, StringCmpFT, string03X, string03Y
103 milliseconds, StringCmpEB, string01X, string01Y
131 milliseconds, StringCmpEB, string02X, string02Y
115 milliseconds, StringCmpEB, string03X, string03Y
92 milliseconds, StringCmpGA, string01X, string01Y
73 milliseconds, StringCmpGA, string02X, string02Y
77 milliseconds, StringCmpGA, string03X, string03Y
88 milliseconds, StringCmpGB, string01X, string01Y
88 milliseconds, StringCmpGB, string02X, string02Y
129 milliseconds, StringCmpGB, string03X, string03Y
101 milliseconds, StringCmpGC, string01X, string01Y
104 milliseconds, StringCmpGC, string02X, string02Y
117 milliseconds, StringCmpGC, string03X, string03Y
*** Press any key to get the time table ***

------------------------------------------
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
------------------------------------------
***** Time table *****

73 milliseconds, StringCmpGA -string02X EQUAL   string02Y-16 bytes
77 milliseconds, StringCmpGA -string03X GREATER string03Y-16 bytes
86 milliseconds, StringCmpC -string02X EQUAL   string02Y-16 bytes
88 milliseconds, StringCmpGB -string01X LESS    string01Y-16 bytes
88 milliseconds, StringCmpGB -string02X EQUAL   string02Y-16 bytes
92 milliseconds, StringCmpC -string03X GREATER string03Y-16 bytes
92 milliseconds, StringCmpGA -string01X LESS    string01Y-16 bytes
94 milliseconds, StringCmpD -string02X EQUAL   string02Y-16 bytes
95 milliseconds, StringCmpE -string02X EQUAL   string02Y-16 bytes
96 milliseconds, StringCmpFT -string02X EQUAL   string02Y-16 bytes
100 milliseconds, StringCmpDT -string01X LESS    string01Y-16 bytes
101 milliseconds, StringCmpDT -string02X EQUAL   string02Y-16 bytes
101 milliseconds, StringCmpGC -string01X LESS    string01Y-16 bytes
101 milliseconds, StringCmpFT -string01X LESS    string01Y-16 bytes
103 milliseconds, StringCmpDT -string03X GREATER string03Y-16 bytes
103 milliseconds, StringCmpD -string01X LESS    string01Y-16 bytes
103 milliseconds, StringCmpF -string01X LESS    string01Y-16 bytes
103 milliseconds, StringCmpEB -string01X LESS    string01Y-16 bytes
104 milliseconds, StringCmpGC -string02X EQUAL   string02Y-16 bytes
106 milliseconds, StringCmpD -string03X GREATER string03Y-16 bytes
108 milliseconds, StringCmpFT -string03X GREATER string03Y-16 bytes
109 milliseconds, StringCmpET -string03X GREATER string03Y-16 bytes
111 milliseconds, StringCmpE -string03X GREATER string03Y-16 bytes
115 milliseconds, StringCmpEB -string03X GREATER string03Y-16 bytes
117 milliseconds, StringCmpGC -string03X GREATER string03Y-16 bytes
118 milliseconds, StringCmpF -string03X GREATER string03Y-16 bytes
129 milliseconds, StringCmpGB -string03X GREATER string03Y-16 bytes
131 milliseconds, StringCmpEB -string02X EQUAL   string02Y-16 bytes
131 milliseconds, StringCmpET -string01X LESS    string01Y-16 bytes
147 milliseconds, StringCmpE -string01X LESS    string01Y-16 bytes
148 milliseconds, StringCmpET -string02X EQUAL   string02Y-16 bytes
152 milliseconds, StringCmpF -string02X EQUAL   string02Y-16 bytes
242 milliseconds, StringCmpC -string01X LESS    string01Y-16 bytes
********** END 2 **********
Title: Re: Sorting strings
Post by: Gunther on June 17, 2014, 01:59:21 AM
Hi Rui,

the results from CmpString1 (post #20):

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

X is less than Y
ShowResultXY
X is EQUAL Y
ShowResultXY
X is greater than Y
ShowResultXY

70 milliseconds, CompareStringXYS, _string01X, _string01Y
44 milliseconds, CompareStringXYS, _string02X, _string02Y
47 milliseconds, CompareStringXYS, _string03X, _string03Y
28 milliseconds, CompareStringXYT, _string01X, _string01Y
29 milliseconds, CompareStringXYT, _string02X, _string02Y
28 milliseconds, CompareStringXYT, _string03X, _string03Y
51 milliseconds, CompareStringXYBS, _string01X, _string01Y
50 milliseconds, CompareStringXYBS, _string02X, _string02Y
53 milliseconds, CompareStringXYBS, _string03X, _string03Y
20 milliseconds, StringCmpC, _string01X, _string01Y
20 milliseconds, StringCmpC, _string02X, _string02Y
22 milliseconds, StringCmpC, _string03X, _string03Y
27 milliseconds, StringCmpD, _string01X, _string01Y
27 milliseconds, StringCmpD, _string02X, _string02Y
28 milliseconds, StringCmpD, _string03X, _string03Y
27 milliseconds, StringCmpE, _string01X, _string01Y
27 milliseconds, StringCmpE, _string02X, _string02Y
28 milliseconds, StringCmpE, _string03X, _string03Y
19 milliseconds, StringCmpF, _string01X, _string01Y
19 milliseconds, StringCmpF, _string02X, _string02Y
21 milliseconds, StringCmpF, _string03X, _string03Y
*** Press any key to get the time table ***

------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

19 milliseconds, StringCmpF -_string02X EQUAL _string02Y-16 bytes
19 milliseconds, StringCmpF -_string01X LESS _string01Y-16 bytes
20 milliseconds, StringCmpC -_string02X EQUAL _string02Y-16 bytes
20 milliseconds, StringCmpC -_string01X LESS _string01Y-16 bytes
21 milliseconds, StringCmpF -_string03X GREATER _string03Y-16 bytes
22 milliseconds, StringCmpC -_string03X GREATER _string03Y-16 bytes
27 milliseconds, StringCmpD -_string01X LESS _string01Y-16 bytes
27 milliseconds, StringCmpE -_string02X EQUAL _string02Y-16 bytes
27 milliseconds, StringCmpE -_string01X LESS _string01Y-16 bytes
27 milliseconds, StringCmpD -_string02X EQUAL _string02Y-16 bytes
28 milliseconds, StringCmpD -_string03X GREATER _string03Y-16 bytes
28 milliseconds, StringCmpE -_string03X GREATER _string03Y-16 bytes
28 milliseconds, CompareStringXYT -_string03X GREATER _string03Y-16 bytes
28 milliseconds, CompareStringXYT -_string01X LESS _string01Y-16 bytes
29 milliseconds, CompareStringXYT -_string02X EQUAL _string02Y-16 bytes
44 milliseconds, CompareStringXYS -_string02X EQUAL _string02Y-16 bytes
47 milliseconds, CompareStringXYS -_string03X GREATER _string03Y-16 bytes
50 milliseconds, CompareStringXYBS -_string02X EQUAL _string02Y-16 bytes
51 milliseconds, CompareStringXYBS -_string01X LESS _string01Y-16 bytes
53 milliseconds, CompareStringXYBS -_string03X GREATER _string03Y-16 bytes
70 milliseconds, CompareStringXYS -_string01X LESS _string01Y-16 bytes
********** END 2 **********


Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 17, 2014, 07:06:08 AM
Thanks all

          No Gunther, this one,
          if you don't mind.

A_stricmp does this:

_string01X       db "abC",0
_string01Y       db "aBc",0

00000061                === new byte
00000061
SeeHexa AL AH --X

00000062                === new byte
00000042
SeeHexa AL AH --X

00000042
00000042                <---- here they are EQUAL
SeeHexa AL AH --Y

00000062                ----> BUT DO THIS
00000042
SeeHexa AL AH --Z

00000001                ----> BUT DO THIS
00000019
SeeHexa AL AH --W

00000043                ----> NOW new byte !
00000063           
SeeHexa AL AH --X

00000063                === new byte
00000063                <---- here they are EQUAL

SeeHexa AL AH --Y

00000063                ----> BUT DO THIS
00000063

SeeHexa AL AH --Z

00000002                ----> BUT DO THIS
00000019

SeeHexa AL AH --W

00000000                ----> BUT DO THIS
00000000

SeeHexa AL AH --X
X is EQUAL Y
ShowResultXY
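
For readers following the trace above, here is a minimal C sketch of a case-insensitive compare that folds case only when the two raw bytes differ, so a pair that is already identical goes straight to the next byte. This is one reading of what the "BUT DO THIS" steps point at, not a transcription of A_stricmp. The names `fold_upper` and `stricmp_sketch` and the -1/0/1 return convention are assumptions made for illustration, and only ASCII letters are folded.

```c
#include <stddef.h>

/* Fold an ASCII lowercase letter to uppercase; other bytes pass through. */
static unsigned char fold_upper(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 32) : c;
}

/* Hypothetical helper: returns -1, 0 or 1 for LESS, EQUAL, GREATER. */
int stricmp_sketch(const char *x, const char *y)
{
    for (;; ++x, ++y) {
        unsigned char a = (unsigned char)*x;
        unsigned char b = (unsigned char)*y;

        if (a != b) {                  /* fold only when the raw bytes differ */
            a = fold_upper(a);
            b = fold_upper(b);
            if (a != b)
                return (a < b) ? -1 : 1;
        }
        if (*x == '\0')                /* bytes matched and X ended, so Y ended too */
            return 0;
    }
}
```

With the two test strings from the trace, `stricmp_sketch("abC", "aBc")` takes the fast path on the first byte and folds the second and third pairs only because their raw values differ.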
 
Title: Re: Sorting strings
Post by: Gunther on June 17, 2014, 07:10:00 PM
Hi Rui,

okay, I'll post the result this evening.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 19, 2014, 07:09:00 AM
Quote from: Gunther on June 17, 2014, 07:10:00 PM
Hi Rui,

okay, I'll post the result this evening.

Gunther
OK Gunther, these are my results:

Quote
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
***** Time table *****

29 milliseconds, C_stricmp -_string02X EQUAL   _string02Y-16 bytes
32 milliseconds, C_stricmp -_string03X GREATER _string03Y-16 bytes
32 milliseconds, C_stricmp -_string01X LESS    _string01Y-16 bytes
63 milliseconds, B_stricmp -_string02X EQUAL   _string02Y-16 bytes
65 milliseconds, B_stricmp -_string03X GREATER _string03Y-16 bytes
68 milliseconds, B_stricmp -_string01X LESS    _string01Y-16 bytes
71 milliseconds, A_stricmp -_string02X EQUAL   _string02Y-16 bytes
73 milliseconds, A_stricmp -_string03X GREATER _string03Y-16 bytes
84 milliseconds, A_stricmp -_string01X LESS    _string01Y-16 bytes
********** END 2 **********
Title: Re: Sorting strings
Post by: Gunther on June 20, 2014, 02:00:31 AM
Hi Rui,

please excuse the delay, but it's exam time here and I've a lot to do. Here are the results:

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

X is less than Y
X is EQUAL Y
X is greater than Y

28 milliseconds, A_stricmp, _string01X, _string01Y
23 milliseconds, A_stricmp, _string02X, _string02Y
23 milliseconds, A_stricmp, _string03X, _string03Y
18 milliseconds, B_stricmp, _string01X, _string01Y
16 milliseconds, B_stricmp, _string02X, _string02Y
16 milliseconds, B_stricmp, _string03X, _string03Y
18 milliseconds, C_stricmp, _string01X, _string01Y
16 milliseconds, C_stricmp, _string02X, _string02Y
16 milliseconds, C_stricmp, _string03X, _string03Y
*** Press any key to get the time table ***

------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

16 milliseconds, C_stricmp -_string03X GREATER _string03Y-16 bytes
16 milliseconds, C_stricmp -_string02X EQUAL   _string02Y-16 bytes
16 milliseconds, B_stricmp -_string03X GREATER _string03Y-16 bytes
16 milliseconds, B_stricmp -_string02X EQUAL   _string02Y-16 bytes
18 milliseconds, C_stricmp -_string01X LESS    _string01Y-16 bytes
18 milliseconds, B_stricmp -_string01X LESS    _string01Y-16 bytes
23 milliseconds, A_stricmp -_string03X GREATER _string03Y-16 bytes
23 milliseconds, A_stricmp -_string02X EQUAL   _string02Y-16 bytes
28 milliseconds, A_stricmp -_string01X LESS    _string01Y-16 bytes
********** END 2 **********


Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 21, 2014, 04:44:09 AM
good Gunther  :t
Title: Re: Sorting strings
Post by: RuiLoureiro on June 21, 2014, 07:40:24 AM
Hi Gunther,
            Could you run this .exe
            if you don't mind?
            I am testing algos to copy
            strings.
            Thank you  :t

This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is
one that we want to copy to ***This string here is one that we want to copy to**
This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is
one that we want to copy to ***This string here is one that we want to copy to**
This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is
one that we want to copy to ***This string here is one that we want to copy to**

This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is
one that we want to copy to ***This string here is one that we want to copy to**

This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is
one that we want to copy to ***This string here is one that we want to copy to**

9 milliseconds, MOVEAtoB_X, 10 bytes - copy lenght DWORDS
9 milliseconds, MOVEAtoB_Y, 10 bytes - copy lenght MOVZX
9 milliseconds, MOVEAtoB_Z, 10 bytes - copy length BYTES
9 milliseconds, MOVEAtoB_W, 10 bytes - copy null terminated
71 milliseconds, MOVEAtoB_X, 50 bytes - copy lenght DWORDS
43 milliseconds, MOVEAtoB_Y, 50 bytes - copy lenght MOVZX
50 milliseconds, MOVEAtoB_Z, 50 bytes - copy length BYTES
55 milliseconds, MOVEAtoB_W, 50 bytes - copy null terminated
100 milliseconds, MOVEAtoB_X, 100 bytes - copy lenght DWORDS
70 milliseconds, MOVEAtoB_Y, 100 bytes - copy lenght MOVZX
79 milliseconds, MOVEAtoB_Z, 100 bytes - copy length BYTES
90 milliseconds, MOVEAtoB_W, 100 bytes - copy null terminated
206 milliseconds, MOVEAtoB_X, 200 bytes - copy lenght DWORDS
134 milliseconds, MOVEAtoB_Y, 200 bytes - copy lenght MOVZX
149 milliseconds, MOVEAtoB_Z, 200 bytes - copy length BYTES
167 milliseconds, MOVEAtoB_W, 200 bytes - copy null terminated
15 milliseconds, MOVEAtoB_YY, 10 bytes - copy lenght MOVZX
56 milliseconds, MOVEAtoB_YY, 50 bytes - copy lenght MOVZX
101 milliseconds, MOVEAtoB_YY, 100 bytes - copy lenght MOVZX
208 milliseconds, MOVEAtoB_YY, 200 bytes - copy lenght MOVZX
*** Press any key to get the time table ***



Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  9 milliseconds, MOVEAtoB_W - 10 bytes- copy null terminated
  9 milliseconds, MOVEAtoB_Z - 10 bytes- copy length BYTES
  9 milliseconds, MOVEAtoB_Y - 10 bytes- copy lenght MOVZX
  9 milliseconds, MOVEAtoB_X - 10 bytes- copy lenght DWORDS
15 milliseconds, MOVEAtoB_YY- 10  bytes- copy lenght MOVZX

43 milliseconds, MOVEAtoB_Y - 50 bytes- copy lenght MOVZX
50 milliseconds, MOVEAtoB_Z - 50 bytes- copy length BYTES
55 milliseconds, MOVEAtoB_W - 50 bytes- copy null terminated
56 milliseconds, MOVEAtoB_YY- 50  bytes- copy lenght MOVZX
70 milliseconds, MOVEAtoB_Y - 100 bytes- copy lenght MOVZX
71 milliseconds, MOVEAtoB_X - 50 bytes- copy lenght DWORDS

79 milliseconds, MOVEAtoB_Z - 100 bytes- copy length BYTES
90 milliseconds, MOVEAtoB_W - 100 bytes- copy null terminated
100 milliseconds, MOVEAtoB_X - 100 bytes- copy lenght DWORDS
101 milliseconds, MOVEAtoB_YY- 100 bytes- copy lenght MOVZX

134 milliseconds, MOVEAtoB_Y - 200 bytes- copy lenght MOVZX
149 milliseconds, MOVEAtoB_Z - 200 bytes- copy length BYTES
167 milliseconds, MOVEAtoB_W - 200 bytes- copy null terminated
206 milliseconds, MOVEAtoB_X - 200 bytes- copy lenght DWORDS
208 milliseconds, MOVEAtoB_YY- 200 bytes- copy lenght MOVZX
********** END 2 **********
Title: Re: Sorting strings
Post by: Gunther on June 21, 2014, 07:26:01 PM
Quote from: RuiLoureiro on June 21, 2014, 07:40:24 AM
Hi Gunther,
            Could you run this .exe
            if you don't mind?

Why not? Here's the result:

This string here is one that we want to copy to ***This string here is one that
we want to copy to**This string here is one that we want to copy to ***This stri
ng here is one that we want to copy to**
This string here is one that we want to copy to ***This string here is one that
we want to copy to**This string here is one that we want to copy to ***This stri
ng here is one that we want to copy to**
This string here is one that we want to copy to ***This string here is one that
we want to copy to**This string here is one that we want to copy to ***This stri
ng here is one that we want to copy to**

This string here is one that we want to copy to ***This string here is one that
we want to copy to**This string here is one that we want to copy to ***This stri
ng here is one that we want to copy to**

This string here is one that we want to copy to ***This string here is one that
we want to copy to**This string here is one that we want to copy to ***This stri
ng here is one that we want to copy to**

5 milliseconds, MOVEAtoB_X, 10 bytes - copy lenght DWORDS
6 milliseconds, MOVEAtoB_Y, 10 bytes - copy lenght MOVZX
4 milliseconds, MOVEAtoB_Z, 10 bytes - copy length BYTES
5 milliseconds, MOVEAtoB_W, 10 bytes - copy null terminated
29 milliseconds, MOVEAtoB_X, 50 bytes - copy lenght DWORDS
39 milliseconds, MOVEAtoB_Y, 50 bytes - copy lenght MOVZX
29 milliseconds, MOVEAtoB_Z, 50 bytes - copy length BYTES
32 milliseconds, MOVEAtoB_W, 50 bytes - copy null terminated
55 milliseconds, MOVEAtoB_X, 100 bytes - copy lenght DWORDS
66 milliseconds, MOVEAtoB_Y, 100 bytes - copy lenght MOVZX
55 milliseconds, MOVEAtoB_Z, 100 bytes - copy length BYTES
57 milliseconds, MOVEAtoB_W, 100 bytes - copy null terminated
106 milliseconds, MOVEAtoB_X, 200 bytes - copy lenght DWORDS
116 milliseconds, MOVEAtoB_Y, 200 bytes - copy lenght MOVZX
106 milliseconds, MOVEAtoB_Z, 200 bytes - copy length BYTES
109 milliseconds, MOVEAtoB_W, 200 bytes - copy null terminated
5 milliseconds, MOVEAtoB_YY, 10 bytes - copy lenght MOVZX
31 milliseconds, MOVEAtoB_YY, 50 bytes - copy lenght MOVZX
57 milliseconds, MOVEAtoB_YY, 100 bytes - copy lenght MOVZX
109 milliseconds, MOVEAtoB_YY, 200 bytes - copy lenght MOVZX
*** Press any key to get the time table ***

------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

4 milliseconds, MOVEAtoB_Z - 10 bytes- copy length BYTES
5 milliseconds, MOVEAtoB_W - 10 bytes- copy null terminated
5 milliseconds, MOVEAtoB_YY- 10  bytes- copy lenght MOVZX
5 milliseconds, MOVEAtoB_X - 10 bytes- copy lenght DWORDS
6 milliseconds, MOVEAtoB_Y - 10 bytes- copy lenght MOVZX
29 milliseconds, MOVEAtoB_Z - 50 bytes- copy length BYTES
29 milliseconds, MOVEAtoB_X - 50 bytes- copy lenght DWORDS
31 milliseconds, MOVEAtoB_YY- 50  bytes- copy lenght MOVZX
32 milliseconds, MOVEAtoB_W - 50 bytes- copy null terminated
39 milliseconds, MOVEAtoB_Y - 50 bytes- copy lenght MOVZX
55 milliseconds, MOVEAtoB_Z - 100 bytes- copy length BYTES
55 milliseconds, MOVEAtoB_X - 100 bytes- copy lenght DWORDS
57 milliseconds, MOVEAtoB_YY- 100 bytes- copy lenght MOVZX
57 milliseconds, MOVEAtoB_W - 100 bytes- copy null terminated
66 milliseconds, MOVEAtoB_Y - 100 bytes- copy lenght MOVZX
106 milliseconds, MOVEAtoB_Z - 200 bytes- copy length BYTES
106 milliseconds, MOVEAtoB_X - 200 bytes- copy lenght DWORDS
109 milliseconds, MOVEAtoB_YY- 200 bytes- copy lenght MOVZX
109 milliseconds, MOVEAtoB_W - 200 bytes- copy null terminated
116 milliseconds, MOVEAtoB_Y - 200 bytes- copy lenght MOVZX
********** END 2 **********


Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 22, 2014, 05:02:30 AM
Gunther,
          Thanks  :t
          Could you run this new test for copying strings,
          please?

          EDIT: my results seem to show that working with
                    strings that carry their length is better than working with
                    null-terminated ones (far better!). My strings have both a length
                    and a null terminator, so the copies keep the null terminator too.
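
To illustrate the difference described in the EDIT, here is a small C sketch (not the MOVEAtoB code from the attachment). The struct layout `lstring`, the 256-byte buffer and both function names are assumptions made for the example. The point is only that a copy with a known length needs no per-byte test for zero, while a null-terminated copy must check every byte before moving on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a length field, then the bytes, then a trailing 0. */
struct lstring {
    size_t len;          /* byte count, not including the terminator */
    char   text[256];    /* assumed large enough: len must be < 256  */
};

/* Null-terminated copy: each byte is tested for 0 before the next is copied. */
size_t copy_null_terminated(char *dst, const char *src)
{
    size_t n = 0;
    while ((dst[n] = src[n]) != '\0')
        ++n;
    return n;            /* bytes copied, not counting the terminator */
}

/* Length-based copy: the count is known up front, so the library routine
   can move the data in larger chunks without looking for a terminator. */
size_t copy_with_length(struct lstring *dst, const struct lstring *src)
{
    memcpy(dst->text, src->text, src->len + 1);   /* +1 carries the 0 along */
    dst->len = src->len;
    return dst->len;
}
```

Both functions leave a null terminator in the destination, which matches the strings described above that have a length and a terminator at the same time.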


This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is
one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_X10= This strin$$$
MOVEAtoB_Y10= This strin$$$
MOVEAtoB_Z10= This strin$$$
MOVEAtoB_W10= This strin$$$

MOVEAtoB_X50= This string here is one that we want to copy to **$$$
MOVEAtoB_Y50= This string here is one that we want to copy to **$$$
MOVEAtoB_Z50= This string here is one that we want to copy to **$$$
MOVEAtoB_W50= This string here is one that we want to copy to **$$$

MOVEAtoB_X100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Y100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Z100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_W100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_X200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Y200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Z200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_W200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_YY10= This strin$$$
MOVEAtoB_YY50= This string here is one that we want to copy to **$$$
MOVEAtoB_YY100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_YY200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_XX10= This strin$$$
MOVEAtoB_XX50= This string here is one that we want to copy to **$$$
MOVEAtoB_XX100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_XX200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_XZ10= This strin$$$
MOVEAtoB_XZ50= This string here is one that we want to copy to **$$$
MOVEAtoB_XZ100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_XZ200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

szCopy10MASM= This strin$$$
szCopy50MASM= This string here is one that we want to copy to **$$$
szCopy100MASM= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
szCopy200MASM= This string here is one that we want to copy to ***This string here is one that we want to copy to**This
string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

11 milliseconds, MOVEAtoB_X, 13 bytes - copy lenght DWORDS
12 milliseconds, MOVEAtoB_Y, 13 bytes - copy lenght MOVZX
12 milliseconds, MOVEAtoB_Z, 13 bytes - copy length BYTES
12 milliseconds, MOVEAtoB_W, 13 bytes - copy null terminated
68 milliseconds, MOVEAtoB_X, 53 bytes - copy lenght DWORDS
45 milliseconds, MOVEAtoB_Y, 53 bytes - copy lenght MOVZX
49 milliseconds, MOVEAtoB_Z, 53 bytes - copy length BYTES
55 milliseconds, MOVEAtoB_W, 53 bytes - copy null terminated
97 milliseconds, MOVEAtoB_X, 103 bytes - copy lenght DWORDS
73 milliseconds, MOVEAtoB_Y, 103 bytes - copy lenght MOVZX
97 milliseconds, MOVEAtoB_Z, 103 bytes - copy length BYTES
92 milliseconds, MOVEAtoB_W, 103 bytes - copy null terminated
217 milliseconds, MOVEAtoB_X, 203 bytes - copy lenght DWORDS
136 milliseconds, MOVEAtoB_Y, 203 bytes - copy lenght MOVZX
146 milliseconds, MOVEAtoB_Z, 203 bytes - copy length BYTES
175 milliseconds, MOVEAtoB_W, 203 bytes - copy null terminated
13 milliseconds, MOVEAtoB_YY, 13 bytes - copy lenght MOVZX
59 milliseconds, MOVEAtoB_YY, 53 bytes - copy lenght MOVZX
104 milliseconds, MOVEAtoB_YY, 103 bytes - copy lenght MOVZX
197 milliseconds, MOVEAtoB_YY, 203 bytes - copy lenght MOVZX
5 milliseconds, MOVEAtoB_XX, 13 bytes - copy lenght DWORDS+MOVZX
27 milliseconds, MOVEAtoB_XX, 53 bytes - copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XX, 103 bytes - copy lenght DWORDS+MOVZX
136 milliseconds, MOVEAtoB_XX, 203 bytes - copy lenght DWORDS+MOVZX
5 milliseconds, MOVEAtoB_XZ, 13 bytes - copy lenght DWORDS+MOVZX
19 milliseconds, MOVEAtoB_XZ, 53 bytes - copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZ, 103 bytes - copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZ, 203 bytes - copy lenght DWORDS+MOVZX
13 milliseconds, szCopy10MASM, 13 bytes - copy lenght DWORDS+MOVZX
61 milliseconds, szCopy50MASM, 53 bytes - copy lenght DWORDS+MOVZX
105 milliseconds, szCopy100MASM, 103 bytes - copy lenght DWORDS+MOVZX
197 milliseconds, szCopy200MASM, 203 bytes - copy lenght DWORDS+MOVZX
*** Press any key to get the time table ***


From  Gunther:
Quote
-------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
-------------------------------------------------------------
***** Time table *****

  1 milliseconds, MOVEAtoB_XX-   13 bytes- copy lenght DWORDS+MOVZX
  2 milliseconds, MOVEAtoB_XZ-   13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_Z -   13 bytes- copy length BYTES
  5 milliseconds, MOVEAtoB_X -   13 bytes- copy lenght DWORDS
  6 milliseconds, MOVEAtoB_Y -   13 bytes- copy lenght MOVZX
  6 milliseconds, MOVEAtoB_XZ-   53 bytes- copy lenght DWORDS+MOVZX
  9 milliseconds, szCopy10MASM - 13 bytes- copy null terminated
  9 milliseconds, MOVEAtoB_YY-   13 bytes- copy lenght MOVZX
  9 milliseconds, MOVEAtoB_W -   13 bytes- copy null terminated
 
10 milliseconds, MOVEAtoB_XX-   53 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZ-  103 bytes- copy lenght DWORDS+MOVZX
20 milliseconds, MOVEAtoB_XX-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_Z -   53 bytes- copy length BYTES
30 milliseconds, MOVEAtoB_X -   53 bytes- copy lenght DWORDS
31 milliseconds, MOVEAtoB_Y -   53 bytes- copy lenght MOVZX
32 milliseconds, szCopy50MASM - 53 bytes- copy null terminated
32 milliseconds, MOVEAtoB_XZ-  203 bytes- copy lenght DWORDS+MOVZX
39 milliseconds, MOVEAtoB_XX-  203 bytes- copy lenght DWORDS+MOVZX
42 milliseconds, MOVEAtoB_W -   53 bytes- copy null terminated
44 milliseconds, MOVEAtoB_YY-   53 bytes- copy lenght MOVZX

56 milliseconds, MOVEAtoB_Z -  103 bytes- copy length BYTES
56 milliseconds, MOVEAtoB_X -  103 bytes- copy lenght DWORDS
59 milliseconds, MOVEAtoB_Y -  103 bytes- copy lenght MOVZX
67 milliseconds, MOVEAtoB_YY-  103 bytes- copy lenght MOVZX
67 milliseconds, szCopy100MASM-103 bytes- copy null terminated
68 milliseconds, MOVEAtoB_W -  103 bytes- copy null terminated

108 milliseconds, MOVEAtoB_Z -  203 bytes- copy length BYTES
108 milliseconds, MOVEAtoB_X -  203 bytes- copy lenght DWORDS
109 milliseconds, MOVEAtoB_Y -  203 bytes- copy lenght MOVZX
110 milliseconds, szCopy200MASM-203 bytes- copy null terminated
119 milliseconds, MOVEAtoB_YY-  203 bytes- copy lenght MOVZX
120 milliseconds, MOVEAtoB_W -  203 bytes- copy null terminated
********** END 2 **********

These are my results:
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  5 milliseconds, MOVEAtoB_XZ-   13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XX-   13 bytes- copy lenght DWORDS+MOVZX
11 milliseconds, MOVEAtoB_X -   13 bytes- copy lenght DWORDS
12 milliseconds, MOVEAtoB_Z -   13 bytes- copy length BYTES
12 milliseconds, MOVEAtoB_Y -   13 bytes- copy lenght MOVZX
12 milliseconds, MOVEAtoB_W -   13 bytes- copy null terminated
13 milliseconds, szCopy10MASM - 13 bytes- copy null terminated
13 milliseconds, MOVEAtoB_YY-   13 bytes- copy lenght MOVZX
;---***---
19 milliseconds, MOVEAtoB_XZ-   53 bytes- copy lenght DWORDS+MOVZX
27 milliseconds, MOVEAtoB_XX-   53 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZ-  103 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZ-  203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_Y -   53 bytes- copy lenght MOVZX
49 milliseconds, MOVEAtoB_Z -   53 bytes- copy length BYTES
51 milliseconds, MOVEAtoB_XX-  103 bytes- copy lenght DWORDS+MOVZX
55 milliseconds, MOVEAtoB_W -   53 bytes- copy null terminated
59 milliseconds, MOVEAtoB_YY-   53 bytes- copy lenght MOVZX
61 milliseconds, szCopy50MASM - 53 bytes- copy null terminated
68 milliseconds, MOVEAtoB_X -   53 bytes- copy lenght DWORDS
;---***---
73 milliseconds, MOVEAtoB_Y -  103 bytes- copy lenght MOVZX
92 milliseconds, MOVEAtoB_W -  103 bytes- copy null terminated
97 milliseconds, MOVEAtoB_Z -  103 bytes- copy length BYTES
97 milliseconds, MOVEAtoB_X -  103 bytes- copy lenght DWORDS
104 milliseconds, MOVEAtoB_YY-  103 bytes- copy lenght MOVZX
105 milliseconds, szCopy100MASM-103 bytes- copy null terminated
;---***---
136 milliseconds, MOVEAtoB_XX-  203 bytes- copy lenght DWORDS+MOVZX
136 milliseconds, MOVEAtoB_Y -  203 bytes- copy lenght MOVZX
146 milliseconds, MOVEAtoB_Z -  203 bytes- copy length BYTES
175 milliseconds, MOVEAtoB_W -  203 bytes- copy null terminated
197 milliseconds, szCopy200MASM-203 bytes- copy null terminated
197 milliseconds, MOVEAtoB_YY-  203 bytes- copy lenght MOVZX
217 milliseconds, MOVEAtoB_X -  203 bytes- copy lenght DWORDS
********** END 2 **********

Quote
------------------------------------------------------------
Intel(R) Atom(TM) CPU N455   @ 1.66GHz (SSE4)
------------------------------------------------------------
***** Time table *****

21 milliseconds, MOVEAtoB_XX-    13 bytes- copy lenght DWORDS+MOVZX
26 milliseconds, MOVEAtoB_XZ-    13 bytes- copy lenght DWORDS+MOVZX
63 milliseconds, szCopy10MASM -  13 bytes- copy null terminated
70 milliseconds, MOVEAtoB_W -    13 bytes- copy null terminated
71 milliseconds, MOVEAtoB_XZ-    53 bytes- copy lenght DWORDS+MOVZX
74 milliseconds, MOVEAtoB_YY-    13 bytes- copy lenght MOVZX
74 milliseconds, MOVEAtoB_XX-    53 bytes- copy lenght DWORDS+MOVZX
75 milliseconds, MOVEAtoB_Y -    13 bytes- copy lenght MOVZX
76 milliseconds, MOVEAtoB_Z -    13 bytes- copy length BYTES
77 milliseconds, MOVEAtoB_X -    13 bytes- copy lenght DWORDS

121 milliseconds, MOVEAtoB_XZ-   103 bytes- copy lenght DWORDS+MOVZX
134 milliseconds, MOVEAtoB_XX-   103 bytes- copy lenght DWORDS+MOVZX
174 milliseconds, MOVEAtoB_W -    53 bytes- copy null terminated
205 milliseconds, MOVEAtoB_Z -    53 bytes- copy length BYTES
206 milliseconds, MOVEAtoB_Y -    53 bytes- copy lenght MOVZX
207 milliseconds, szCopy50MASM -  53 bytes- copy null terminated
211 milliseconds, MOVEAtoB_XZ-   203 bytes- copy lenght DWORDS+MOVZX
234 milliseconds, MOVEAtoB_X -    53 bytes- copy lenght DWORDS
242 milliseconds, MOVEAtoB_YY-    53 bytes- copy lenght MOVZX
268 milliseconds, MOVEAtoB_XX-   203 bytes- copy lenght DWORDS+MOVZX

325 milliseconds, MOVEAtoB_W -   103 bytes- copy null terminated
386 milliseconds, MOVEAtoB_Z -   103 bytes- copy length BYTES
386 milliseconds, MOVEAtoB_Y -   103 bytes- copy lenght MOVZX
388 milliseconds, szCopy100MASM- 103 bytes- copy null terminated
413 milliseconds, MOVEAtoB_X -   103 bytes- copy lenght DWORDS
453 milliseconds, MOVEAtoB_YY-   103 bytes- copy lenght MOVZX

625 milliseconds, MOVEAtoB_W -   203 bytes- copy null terminated
747 milliseconds, MOVEAtoB_Y -   203 bytes- copy lenght MOVZX
749 milliseconds, szCopy200MASM- 203 bytes- copy null terminated
765 milliseconds, MOVEAtoB_Z -   203 bytes- copy length BYTES
848 milliseconds, MOVEAtoB_X -   203 bytes- copy lenght DWORDS
875 milliseconds, MOVEAtoB_YY-   203 bytes- copy lenght MOVZX
********** END 2 **********
Title: Re: Sorting strings
Post by: Gunther on June 22, 2014, 06:51:40 PM
Rui,

results from CopyString1:

This string here is one that we want to copy to ***This string here is one that
we want to copy to**This string here is one that we want to copy to ***This stri
ng here is one that we want to copy to**$$$

MOVEAtoB_X10= This strin$$$
MOVEAtoB_Y10= This strin$$$
MOVEAtoB_Z10= This strin$$$
MOVEAtoB_W10= This strin$$$

MOVEAtoB_X50= This string here is one that we want to copy to **$$$
MOVEAtoB_Y50= This string here is one that we want to copy to **$$$
MOVEAtoB_Z50= This string here is one that we want to copy to **$$$
MOVEAtoB_W50= This string here is one that we want to copy to **$$$

MOVEAtoB_X100= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**$$$
MOVEAtoB_Y100= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**$$$
MOVEAtoB_Z100= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**$$$
MOVEAtoB_W100= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**$$$

MOVEAtoB_X200= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Y200= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Z200= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$
MOVEAtoB_W200= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$

MOVEAtoB_YY10= This strin$$$
MOVEAtoB_YY50= This string here is one that we want to copy to **$$$
MOVEAtoB_YY100= This string here is one that we want to copy to ***This string h
ere is one that we want to copy to**$$$
MOVEAtoB_YY200= This string here is one that we want to copy to ***This string h
ere is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$

MOVEAtoB_XX10= This strin$$$
MOVEAtoB_XX50= This string here is one that we want to copy to **$$$
MOVEAtoB_XX100= This string here is one that we want to copy to ***This string h
ere is one that we want to copy to**$$$
MOVEAtoB_XX200= This string here is one that we want to copy to ***This string h
ere is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$

MOVEAtoB_XZ10= This strin$$$
MOVEAtoB_XZ50= This string here is one that we want to copy to **$$$
MOVEAtoB_XZ100= This string here is one that we want to copy to ***This string h
ere is one that we want to copy to**$$$
MOVEAtoB_XZ200= This string here is one that we want to copy to ***This string h
ere is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$

szCopy10MASM= This strin$$$
szCopy50MASM= This string here is one that we want to copy to **$$$
szCopy100MASM= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**$$$
szCopy200MASM= This string here is one that we want to copy to ***This string he
re is one that we want to copy to**This string here is one that we want to copy
to ***This string here is one that we want to copy to**$$$

5 milliseconds, MOVEAtoB_X, 13 bytes - copy lenght DWORDS
6 milliseconds, MOVEAtoB_Y, 13 bytes - copy lenght MOVZX
5 milliseconds, MOVEAtoB_Z, 13 bytes - copy length BYTES
9 milliseconds, MOVEAtoB_W, 13 bytes - copy null terminated
30 milliseconds, MOVEAtoB_X, 53 bytes - copy lenght DWORDS
31 milliseconds, MOVEAtoB_Y, 53 bytes - copy lenght MOVZX
30 milliseconds, MOVEAtoB_Z, 53 bytes - copy length BYTES
42 milliseconds, MOVEAtoB_W, 53 bytes - copy null terminated
56 milliseconds, MOVEAtoB_X, 103 bytes - copy lenght DWORDS
59 milliseconds, MOVEAtoB_Y, 103 bytes - copy lenght MOVZX
56 milliseconds, MOVEAtoB_Z, 103 bytes - copy length BYTES
68 milliseconds, MOVEAtoB_W, 103 bytes - copy null terminated
108 milliseconds, MOVEAtoB_X, 203 bytes - copy lenght DWORDS
109 milliseconds, MOVEAtoB_Y, 203 bytes - copy lenght MOVZX
108 milliseconds, MOVEAtoB_Z, 203 bytes - copy length BYTES
120 milliseconds, MOVEAtoB_W, 203 bytes - copy null terminated
9 milliseconds, MOVEAtoB_YY, 13 bytes - copy lenght MOVZX
44 milliseconds, MOVEAtoB_YY, 53 bytes - copy lenght MOVZX
67 milliseconds, MOVEAtoB_YY, 103 bytes - copy lenght MOVZX
119 milliseconds, MOVEAtoB_YY, 203 bytes - copy lenght MOVZX
1 milliseconds, MOVEAtoB_XX, 13 bytes - copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XX, 53 bytes - copy lenght DWORDS+MOVZX
20 milliseconds, MOVEAtoB_XX, 103 bytes - copy lenght DWORDS+MOVZX
39 milliseconds, MOVEAtoB_XX, 203 bytes - copy lenght DWORDS+MOVZX
2 milliseconds, MOVEAtoB_XZ, 13 bytes - copy lenght DWORDS+MOVZX
6 milliseconds, MOVEAtoB_XZ, 53 bytes - copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZ, 103 bytes - copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZ, 203 bytes - copy lenght DWORDS+MOVZX
9 milliseconds, szCopy10MASM, 13 bytes - copy lenght DWORDS+MOVZX
32 milliseconds, szCopy50MASM, 53 bytes - copy lenght DWORDS+MOVZX
67 milliseconds, szCopy100MASM, 103 bytes - copy lenght DWORDS+MOVZX
110 milliseconds, szCopy200MASM, 203 bytes - copy lenght DWORDS+MOVZX
*** Press any key to get the time table ***

------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

1 milliseconds, MOVEAtoB_XX-  13  bytes- copy lenght DWORDS+MOVZX
2 milliseconds, MOVEAtoB_XZ-  13 bytes- copy lenght DWORDS+MOVZX
5 milliseconds, MOVEAtoB_Z -  13 bytes- copy length BYTES
5 milliseconds, MOVEAtoB_X -  13 bytes- copy lenght DWORDS
6 milliseconds, MOVEAtoB_Y -  13 bytes- copy lenght MOVZX
6 milliseconds, MOVEAtoB_XZ-  53 bytes- copy lenght DWORDS+MOVZX
9 milliseconds, szCopy10MASM -  13 bytes- copy null terminated
9 milliseconds, MOVEAtoB_YY-  13  bytes- copy lenght MOVZX
9 milliseconds, MOVEAtoB_W -  13 bytes- copy null terminated
10 milliseconds, MOVEAtoB_XX-  53  bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZ- 103 bytes- copy lenght DWORDS+MOVZX
20 milliseconds, MOVEAtoB_XX- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_Z -  53 bytes- copy length BYTES
30 milliseconds, MOVEAtoB_X -  53 bytes- copy lenght DWORDS
31 milliseconds, MOVEAtoB_Y -  53 bytes- copy lenght MOVZX
32 milliseconds, szCopy50MASM -  53 bytes- copy null terminated
32 milliseconds, MOVEAtoB_XZ- 203 bytes- copy lenght DWORDS+MOVZX
39 milliseconds, MOVEAtoB_XX- 203 bytes- copy lenght DWORDS+MOVZX
42 milliseconds, MOVEAtoB_W -  53 bytes- copy null terminated
44 milliseconds, MOVEAtoB_YY-  53  bytes- copy lenght MOVZX
56 milliseconds, MOVEAtoB_Z - 103 bytes- copy length BYTES
56 milliseconds, MOVEAtoB_X - 103 bytes- copy lenght DWORDS
59 milliseconds, MOVEAtoB_Y - 103 bytes- copy lenght MOVZX
67 milliseconds, MOVEAtoB_YY- 103 bytes- copy lenght MOVZX
67 milliseconds, szCopy100MASM- 103 bytes- copy null terminated
68 milliseconds, MOVEAtoB_W - 103 bytes- copy null terminated
108 milliseconds, MOVEAtoB_Z - 203 bytes- copy length BYTES
108 milliseconds, MOVEAtoB_X - 203 bytes- copy lenght DWORDS
109 milliseconds, MOVEAtoB_Y - 203 bytes- copy lenght MOVZX
110 milliseconds, szCopy200MASM- 203 bytes- copy null terminated
119 milliseconds, MOVEAtoB_YY- 203 bytes- copy lenght MOVZX
120 milliseconds, MOVEAtoB_W - 203 bytes- copy null terminated
********** END 2 **********


Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 22, 2014, 07:11:46 PM
Gunther,
              thank you!  :t
              I added your results to my reply #37.
              We now have 3 different sets of results.
              szCopy from MASM is better only on the i7
              for copying null-terminated strings, and
              not in all cases.
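
              To make the two families in the time tables concrete, here is a minimal C sketch (not the MASM routines timed in this thread). It contrasts a null-terminated copy, which must test every byte for zero, with a known-length copy that moves 4 bytes at a time and then the leftover bytes. The function names are hypothetical, and the dword-plus-tail loop is only an assumption about what the "DWORDS" variants do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Null-terminated copy: each byte is stored and then tested for 0.
   Returns the number of characters copied, terminator excluded. */
static size_t copy_null_terminated(char *dst, const char *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    while ((dst[i] = src[i]) != '\0')
        i++;
    return i;
}

/* Known-length copy: whole 32-bit words first, then the 0..3 leftover
   bytes. dst must have room for len + 1 bytes; a terminator is appended. */
static size_t copy_known_length(char *dst, const char *src, size_t len)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    for (; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4);   /* memcpy keeps unaligned access well defined */
        memcpy(dst + i, &w, 4);
    }
    for (; i < len; i++)          /* tail: at most 3 bytes */
        dst[i] = src[i];
    dst[len] = '\0';
    return len;
}
```

              The known-length version does no per-byte zero test, which is one plausible reason the length-based routines lead most of the tables; the sketch itself makes no timing claim.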
Title: Re: Sorting strings
Post by: Gunther on June 22, 2014, 07:17:53 PM
Hi Rui,

Quote from: RuiLoureiro on June 22, 2014, 07:11:46 PM
              I added your results to my reply #37.
              We now have 3 different sets of results.

Don't forget: we have different processors, too.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 22, 2014, 07:20:47 PM
 Yes, see reply #37, where we can see the i7 ...
Title: Re: Sorting strings
Post by: RuiLoureiro on June 22, 2014, 09:15:17 PM
Gunther,
          Thank you for your work!
          I am now testing these new procedures.
          Could you run this new test for copying strings,
          please?

          NOTE: when I copy by length, the string is also null terminated.
                     All lengths are stored in the strings.
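
          As an illustration of the note above, here is a small C sketch of a string that carries its own length and is still null terminated. The layout (a 32-bit length in front of the characters), the buffer size and all names are assumptions made for the example; the actual layout used by the MASM test program is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: [32-bit length][characters...][0] */
typedef struct {
    uint32_t length;      /* number of characters, terminator excluded */
    char     text[508];   /* room for a 503-byte test string plus the 0 */
} LenString;

/* Fill a LenString from an ordinary C string. */
static void lenstring_set(LenString *s, const char *src)
{
    size_t n;
    assert(s != NULL && src != NULL);
    n = strlen(src);
    assert(n < sizeof s->text);
    s->length = (uint32_t)n;
    memcpy(s->text, src, n + 1);          /* characters plus terminator */
}

/* Copy by stored length: no scan for 0 is needed, and copying
   length + 1 bytes carries the terminator along with the data. */
static uint32_t lenstring_copy(LenString *dst, const LenString *src)
{
    assert(dst != NULL && src != NULL);
    assert(src->length < sizeof src->text);
    dst->length = src->length;
    memcpy(dst->text, src->text, (size_t)src->length + 1);
    return dst->length;
}
```

          With this layout the same buffer can be handed both to the length-based routines and to the null-terminated ones, which is what allows both kinds to be timed on identical data.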

Results from Gunther (thanks  :t ):

Quote
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

  2 milliseconds, MOVEAtoB_XZ-   13 bytes- copy lenght DWORDS+MOVZX
  2 milliseconds, MOVEAtoB_XX-   13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_X -   13 bytes- copy lenght DWORDS
  6 milliseconds, MOVEAtoB_XZ-   53 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_Z -   13 bytes- copy length BYTES
  6 milliseconds, MOVEAtoB_Y -   13 bytes- copy lenght MOVZX
  6 milliseconds, MOVEAtoB_WZ-   13 bytes- copy null terminated
  9 milliseconds, szCopy10MASM - 13 bytes- copy null terminated
  9 milliseconds, MOVEAtoB_YY-   13 bytes- copy lenght MOVZX
  9 milliseconds, MOVEAtoB_W -   13 bytes- copy null terminated
10 milliseconds, MOVEAtoB_WW-   13 bytes- copy null terminated
10 milliseconds, MOVEAtoB_XX-   53 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZ-  103 bytes- copy lenght DWORDS+MOVZX
20 milliseconds, MOVEAtoB_XX-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_Y -   53 bytes- copy lenght MOVZX
30 milliseconds, MOVEAtoB_X -   53 bytes- copy lenght DWORDS
30 milliseconds, MOVEAtoB_Z -   53 bytes- copy length BYTES
32 milliseconds, MOVEAtoB_XZ-  203 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, szCopy50MASM - 53 bytes- copy null terminated
32 milliseconds, MOVEAtoB_WZ-   53 bytes- copy null terminated
33 milliseconds, MOVEAtoB_WW-   53 bytes- copy null terminated
40 milliseconds, MOVEAtoB_XX-  203 bytes- copy lenght DWORDS+MOVZX
42 milliseconds, MOVEAtoB_W -   53 bytes- copy null terminated
43 milliseconds, MOVEAtoB_YY-   53 bytes- copy lenght MOVZX

56 milliseconds, MOVEAtoB_Z -  103 bytes- copy length BYTES
56 milliseconds, MOVEAtoB_X -  103 bytes- copy lenght DWORDS
58 milliseconds, MOVEAtoB_Y -  103 bytes- copy lenght MOVZX
58 milliseconds, MOVEAtoB_WZ-  103 bytes- copy null terminated
67 milliseconds, MOVEAtoB_YY-  103 bytes- copy lenght MOVZX
68 milliseconds, MOVEAtoB_WW-  103 bytes- copy null terminated
69 milliseconds, MOVEAtoB_W -  103 bytes- copy null terminated
69 milliseconds, szCopy100MASM-103 bytes- copy null terminated
71 milliseconds, MOVEAtoB_XZ-  503 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XX-  503 bytes- copy lenght DWORDS+MOVZX
108 milliseconds, MOVEAtoB_Z -  203 bytes- copy length BYTES
108 milliseconds, MOVEAtoB_Y -  203 bytes- copy lenght MOVZX
109 milliseconds, MOVEAtoB_WZ-  203 bytes- copy null terminated
110 milliseconds, MOVEAtoB_X -  203 bytes- copy lenght DWORDS
111 milliseconds, szCopy200MASM-203 bytes- copy null terminated
111 milliseconds, MOVEAtoB_WW-  203 bytes- copy null terminated
120 milliseconds, MOVEAtoB_W -  203 bytes- copy null terminated
120 milliseconds, MOVEAtoB_YY-  203 bytes- copy lenght MOVZX

264 milliseconds, MOVEAtoB_X -  503 bytes- copy lenght DWORDS
265 milliseconds, MOVEAtoB_Z -  503 bytes- copy length BYTES
265 milliseconds, MOVEAtoB_WZ-  503 bytes- copy null terminated
266 milliseconds, MOVEAtoB_Y -  503 bytes- copy lenght MOVZX
275 milliseconds, MOVEAtoB_WW-  503 bytes- copy null terminated
277 milliseconds, MOVEAtoB_W -  503 bytes- copy null terminated
277 milliseconds, szCopy200MASM-503 bytes- copy null terminated
281 milliseconds, MOVEAtoB_YY-  503 bytes- copy lenght MOVZX
********** END II **********

These are my results:
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  5 milliseconds, MOVEAtoB_XZ-   13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XX-   13 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_WZ-   13 bytes- copy null terminated
10 milliseconds, MOVEAtoB_WW-   13 bytes- copy null terminated
11 milliseconds, MOVEAtoB_W -   13 bytes- copy null terminated
11 milliseconds, MOVEAtoB_X -   13 bytes- copy lenght DWORDS
12 milliseconds, MOVEAtoB_Y -   13 bytes- copy lenght MOVZX
12 milliseconds, MOVEAtoB_Z -   13 bytes- copy length BYTES
13 milliseconds, szCopy10MASM - 13 bytes- copy null terminated
13 milliseconds, MOVEAtoB_YY-   13 bytes- copy lenght MOVZX

16 milliseconds, MOVEAtoB_XZ-   53 bytes- copy lenght DWORDS+MOVZX
26 milliseconds, MOVEAtoB_XX-   53 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_XZ-  103 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_Y -   53 bytes- copy lenght MOVZX
44 milliseconds, MOVEAtoB_XZ-  203 bytes- copy lenght DWORDS+MOVZX
46 milliseconds, MOVEAtoB_Z -   53 bytes- copy length BYTES
52 milliseconds, MOVEAtoB_WW-   53 bytes- copy null terminated
52 milliseconds, MOVEAtoB_WZ-   53 bytes- copy null terminated
53 milliseconds, MOVEAtoB_W -   53 bytes- copy null terminated
57 milliseconds, MOVEAtoB_XX-  103 bytes- copy lenght DWORDS+MOVZX
59 milliseconds, MOVEAtoB_YY-   53 bytes- copy lenght MOVZX
62 milliseconds, szCopy50MASM - 53 bytes- copy null terminated
67 milliseconds, MOVEAtoB_X -   53 bytes- copy lenght DWORDS

71 milliseconds, MOVEAtoB_Y -  103 bytes- copy lenght MOVZX
79 milliseconds, MOVEAtoB_Z -  103 bytes- copy length BYTES
89 milliseconds, MOVEAtoB_WW-  103 bytes- copy null terminated
89 milliseconds, MOVEAtoB_WZ-  103 bytes- copy null terminated
93 milliseconds, MOVEAtoB_W -  103 bytes- copy null terminated
96 milliseconds, MOVEAtoB_X -  103 bytes- copy lenght DWORDS
98 milliseconds, MOVEAtoB_XZ-  503 bytes- copy lenght DWORDS+MOVZX
105 milliseconds, MOVEAtoB_YY-  103 bytes- copy lenght MOVZX
108 milliseconds, szCopy100MASM-103 bytes- copy null terminated

131 milliseconds, MOVEAtoB_Y -  203 bytes- copy lenght MOVZX
139 milliseconds, MOVEAtoB_XX-  203 bytes- copy lenght DWORDS+MOVZX
146 milliseconds, MOVEAtoB_Z -  203 bytes- copy length BYTES
165 milliseconds, MOVEAtoB_WZ-  203 bytes- copy null terminated
167 milliseconds, MOVEAtoB_WW-  203 bytes- copy null terminated
169 milliseconds, MOVEAtoB_W -  203 bytes- copy null terminated
193 milliseconds, MOVEAtoB_YY-  203 bytes- copy lenght MOVZX
197 milliseconds, szCopy200MASM-203 bytes- copy null terminated
213 milliseconds, MOVEAtoB_X -  203 bytes- copy lenght DWORDS

313 milliseconds, MOVEAtoB_XX-  503 bytes- copy lenght DWORDS+MOVZX
318 milliseconds, MOVEAtoB_Y -  503 bytes- copy lenght MOVZX
355 milliseconds, MOVEAtoB_Z -  503 bytes- copy length BYTES
388 milliseconds, MOVEAtoB_WW-  503 bytes- copy null terminated
394 milliseconds, MOVEAtoB_WZ-  503 bytes- copy null terminated
407 milliseconds, MOVEAtoB_W -  503 bytes- copy null terminated
466 milliseconds, MOVEAtoB_YY-  503 bytes- copy lenght MOVZX
477 milliseconds, szCopy200MASM-503 bytes- copy null terminated
519 milliseconds, MOVEAtoB_X -  503 bytes- copy lenght DWORDS
********** END II **********

Quote
-------------------------------------------------------------
Intel(R) Atom(TM) CPU N455   @ 1.66GHz (SSE4)
-------------------------------------------------------------
***** Time table *****

25 milliseconds, MOVEAtoB_XX-    13 bytes- copy lenght DWORDS+MOVZX
26 milliseconds, MOVEAtoB_XZ-    13 bytes- copy lenght DWORDS+MOVZX
52 milliseconds, MOVEAtoB_WZ-    13 bytes- copy null terminated
52 milliseconds, MOVEAtoB_W -    13 bytes- copy null terminated
54 milliseconds, MOVEAtoB_WW-    13 bytes- copy null terminated
60 milliseconds, MOVEAtoB_Y -    13 bytes- copy lenght MOVZX
63 milliseconds, szCopy10MASM -  13 bytes- copy null terminated
65 milliseconds, MOVEAtoB_Z -    13 bytes- copy length BYTES
72 milliseconds, MOVEAtoB_XZ-    53 bytes- copy lenght DWORDS+MOVZX
75 milliseconds, MOVEAtoB_X -    13 bytes- copy lenght DWORDS
76 milliseconds, MOVEAtoB_XX-    53 bytes- copy lenght DWORDS+MOVZX
89 milliseconds, MOVEAtoB_YY-    13 bytes- copy lenght MOVZX

121 milliseconds, MOVEAtoB_XZ-   103 bytes- copy lenght DWORDS+MOVZX
134 milliseconds, MOVEAtoB_XX-   103 bytes- copy lenght DWORDS+MOVZX
174 milliseconds, MOVEAtoB_WW-    53 bytes- copy null terminated
174 milliseconds, MOVEAtoB_W -    53 bytes- copy null terminated
175 milliseconds, MOVEAtoB_WZ-    53 bytes- copy null terminated
205 milliseconds, MOVEAtoB_Y -    53 bytes- copy lenght MOVZX
206 milliseconds, MOVEAtoB_Z -    53 bytes- copy length BYTES
208 milliseconds, szCopy50MASM -  53 bytes- copy null terminated
211 milliseconds, MOVEAtoB_XZ-   203 bytes- copy lenght DWORDS+MOVZX
232 milliseconds, MOVEAtoB_X -    53 bytes- copy lenght DWORDS
268 milliseconds, MOVEAtoB_XX-   203 bytes- copy lenght DWORDS+MOVZX
298 milliseconds, MOVEAtoB_YY-    53 bytes- copy lenght MOVZX

331 milliseconds, MOVEAtoB_W -   103 bytes- copy null terminated
333 milliseconds, MOVEAtoB_WZ-   103 bytes- copy null terminated
336 milliseconds, MOVEAtoB_WW-   103 bytes- copy null terminated
385 milliseconds, MOVEAtoB_Z -   103 bytes- copy length BYTES
387 milliseconds, MOVEAtoB_Y -   103 bytes- copy lenght MOVZX
389 milliseconds, szCopy100MASM- 103 bytes- copy null terminated
411 milliseconds, MOVEAtoB_X -   103 bytes- copy lenght DWORDS
483 milliseconds, MOVEAtoB_XZ-   503 bytes- copy lenght DWORDS+MOVZX
551 milliseconds, MOVEAtoB_YY-   103 bytes- copy lenght MOVZX
600 milliseconds, MOVEAtoB_XX-   503 bytes- copy lenght DWORDS+MOVZX

627 milliseconds, MOVEAtoB_W -   203 bytes- copy null terminated
628 milliseconds, MOVEAtoB_WW-   203 bytes- copy null terminated
640 milliseconds, MOVEAtoB_WZ-   203 bytes- copy null terminated
749 milliseconds, MOVEAtoB_Z -   203 bytes- copy length BYTES
751 milliseconds, MOVEAtoB_Y -   203 bytes- copy lenght MOVZX
753 milliseconds, szCopy200MASM- 203 bytes- copy null terminated
862 milliseconds, MOVEAtoB_X -   203 bytes- copy lenght DWORDS
926 milliseconds, MOVEAtoB_YY-   203 bytes- copy lenght MOVZX
---***---
1536 milliseconds, MOVEAtoB_WW-  503 bytes- copy null terminated
1538 milliseconds, MOVEAtoB_WZ-  503 bytes- copy null terminated
1834 milliseconds, MOVEAtoB_Y -  503 bytes- copy lenght MOVZX
1838 milliseconds, szCopy200MASM-503 bytes- copy null terminated
1842 milliseconds, MOVEAtoB_Z -  503 bytes- copy length BYTES
1939 milliseconds, MOVEAtoB_W -  503 bytes- copy null terminated
2026 milliseconds, MOVEAtoB_X -  503 bytes- copy lenght DWORDS
2147 milliseconds, MOVEAtoB_YY-  503 bytes- copy lenght MOVZX
********** END II **********
Title: Re: Sorting strings
Post by: Gunther on June 22, 2014, 11:03:21 PM
Hi Rui,


This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_X10= This strin$$$
MOVEAtoB_Y10= This strin$$$
MOVEAtoB_Z10= This strin$$$
MOVEAtoB_W10= This strin$$$

MOVEAtoB_X50= This string here is one that we want to copy to **$$$
MOVEAtoB_Y50= This string here is one that we want to copy to **$$$
MOVEAtoB_Z50= This string here is one that we want to copy to **$$$
MOVEAtoB_W50= This string here is one that we want to copy to **$$$

MOVEAtoB_X100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Y100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Z100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_W100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_X200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Y200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Z200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_W200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_X500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Y500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_Z500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_W500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_YY10= This strin$$$
MOVEAtoB_YY50= This string here is one that we want to copy to **$$$
MOVEAtoB_YY100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_YY200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_YY500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_XX10= This strin$$$
MOVEAtoB_XX50= This string here is one that we want to copy to **$$$
MOVEAtoB_XX100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_XX200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_XX500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_XZ10= This strin$$$
MOVEAtoB_XZ50= This string here is one that we want to copy to **$$$
MOVEAtoB_XZ100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_XZ200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_XZ500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

szCopy10MASM= This strin$$$
szCopy50MASM= This string here is one that we want to copy to **$$$
szCopy100MASM= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
szCopy200MASM= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
szCopy500MASM= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_WW10= This strin$$$
MOVEAtoB_WW50= This string here is one that we want to copy to **$$$
MOVEAtoB_WW100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_WW200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_WW500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

MOVEAtoB_WZ10= This strin$$$
MOVEAtoB_WZ50= This string here is one that we want to copy to **$$$
MOVEAtoB_WZ100= This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_WZ200= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$
MOVEAtoB_WZ500= This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**This string here is one that we want to copy to ***This string here is one that we want to copy to**$$$

********** END I **********
5 milliseconds, MOVEAtoB_X, 13 bytes - copy lenght DWORDS
6 milliseconds, MOVEAtoB_Y, 13 bytes - copy lenght MOVZX
6 milliseconds, MOVEAtoB_Z, 13 bytes - copy length BYTES
9 milliseconds, MOVEAtoB_W, 13 bytes - copy null terminated
30 milliseconds, MOVEAtoB_X, 53 bytes - copy lenght DWORDS
30 milliseconds, MOVEAtoB_Y, 53 bytes - copy lenght MOVZX
30 milliseconds, MOVEAtoB_Z, 53 bytes - copy length BYTES
42 milliseconds, MOVEAtoB_W, 53 bytes - copy null terminated
56 milliseconds, MOVEAtoB_X, 103 bytes - copy lenght DWORDS
58 milliseconds, MOVEAtoB_Y, 103 bytes - copy lenght MOVZX
56 milliseconds, MOVEAtoB_Z, 103 bytes - copy length BYTES
69 milliseconds, MOVEAtoB_W, 103 bytes - copy null terminated
110 milliseconds, MOVEAtoB_X, 203 bytes - copy lenght DWORDS
108 milliseconds, MOVEAtoB_Y, 203 bytes - copy lenght MOVZX
108 milliseconds, MOVEAtoB_Z, 203 bytes - copy length BYTES
120 milliseconds, MOVEAtoB_W, 203 bytes - copy null terminated
264 milliseconds, MOVEAtoB_X, 503 bytes - copy lenght DWORDS
266 milliseconds, MOVEAtoB_Y, 503 bytes - copy lenght MOVZX
265 milliseconds, MOVEAtoB_Z, 503 bytes - copy length BYTES
277 milliseconds, MOVEAtoB_W, 503 bytes - copy null terminated
9 milliseconds, MOVEAtoB_YY, 13 bytes - copy lenght MOVZX
43 milliseconds, MOVEAtoB_YY, 53 bytes - copy lenght MOVZX
67 milliseconds, MOVEAtoB_YY, 103 bytes - copy lenght MOVZX
120 milliseconds, MOVEAtoB_YY, 203 bytes - copy lenght MOVZX
281 milliseconds, MOVEAtoB_YY, 503 bytes - copy lenght MOVZX
2 milliseconds, MOVEAtoB_XX, 13 bytes - copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XX, 53 bytes - copy lenght DWORDS+MOVZX
20 milliseconds, MOVEAtoB_XX, 103 bytes - copy lenght DWORDS+MOVZX
40 milliseconds, MOVEAtoB_XX, 203 bytes - copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XX, 503 bytes - copy lenght DWORDS+MOVZX
2 milliseconds, MOVEAtoB_XZ, 13 bytes - copy lenght DWORDS+MOVZX
6 milliseconds, MOVEAtoB_XZ, 53 bytes - copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZ, 103 bytes - copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZ, 203 bytes - copy lenght DWORDS+MOVZX
71 milliseconds, MOVEAtoB_XZ, 503 bytes - copy lenght DWORDS+MOVZX
9 milliseconds, szCopy10MASM, 13 bytes - copy lenght DWORDS+MOVZX
32 milliseconds, szCopy50MASM, 53 bytes - copy lenght DWORDS+MOVZX
69 milliseconds, szCopy100MASM, 103 bytes - copy lenght DWORDS+MOVZX
111 milliseconds, szCopy200MASM, 203 bytes - copy lenght DWORDS+MOVZX
277 milliseconds, szCopy500MASM, 503 bytes - copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_WW, 13 bytes - copy lenght DWORDS+MOVZX
33 milliseconds, MOVEAtoB_WW, 53 bytes - copy lenght DWORDS+MOVZX
68 milliseconds, MOVEAtoB_WW, 103 bytes - copy lenght DWORDS+MOVZX
111 milliseconds, MOVEAtoB_WW, 203 bytes - copy lenght DWORDS+MOVZX
275 milliseconds, MOVEAtoB_WW, 503 bytes - copy lenght DWORDS+MOVZX
6 milliseconds, MOVEAtoB_WZ, 13 bytes - copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_WZ, 53 bytes - copy lenght DWORDS+MOVZX
58 milliseconds, MOVEAtoB_WZ, 103 bytes - copy lenght DWORDS+MOVZX
109 milliseconds, MOVEAtoB_WZ, 203 bytes - copy lenght DWORDS+MOVZX
265 milliseconds, MOVEAtoB_WZ, 503 bytes - copy lenght DWORDS+MOVZX
*** Press any key to get the time table ***

------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

2 milliseconds, MOVEAtoB_XZ-  13 bytes- copy lenght DWORDS+MOVZX
2 milliseconds, MOVEAtoB_XX-  13  bytes- copy lenght DWORDS+MOVZX
5 milliseconds, MOVEAtoB_X -  13 bytes- copy lenght DWORDS
6 milliseconds, MOVEAtoB_XZ-  53 bytes- copy lenght DWORDS+MOVZX
6 milliseconds, MOVEAtoB_Z -  13 bytes- copy length BYTES
6 milliseconds, MOVEAtoB_Y -  13 bytes- copy lenght MOVZX
6 milliseconds, MOVEAtoB_WZ-  13 bytes- copy null terminated
9 milliseconds, szCopy10MASM -  13 bytes- copy null terminated
9 milliseconds, MOVEAtoB_YY-  13  bytes- copy lenght MOVZX
9 milliseconds, MOVEAtoB_W -  13 bytes- copy null terminated
10 milliseconds, MOVEAtoB_WW-  13 bytes- copy null terminated
10 milliseconds, MOVEAtoB_XX-  53  bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZ- 103 bytes- copy lenght DWORDS+MOVZX
20 milliseconds, MOVEAtoB_XX- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_Y -  53 bytes- copy lenght MOVZX
30 milliseconds, MOVEAtoB_X -  53 bytes- copy lenght DWORDS
30 milliseconds, MOVEAtoB_Z -  53 bytes- copy length BYTES
32 milliseconds, MOVEAtoB_XZ- 203 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, szCopy50MASM -  53 bytes- copy null terminated
32 milliseconds, MOVEAtoB_WZ-  53 bytes- copy null terminated
33 milliseconds, MOVEAtoB_WW-  53 bytes- copy null terminated
40 milliseconds, MOVEAtoB_XX- 203 bytes- copy lenght DWORDS+MOVZX
42 milliseconds, MOVEAtoB_W -  53 bytes- copy null terminated
43 milliseconds, MOVEAtoB_YY-  53  bytes- copy lenght MOVZX
56 milliseconds, MOVEAtoB_Z - 103 bytes- copy length BYTES
56 milliseconds, MOVEAtoB_X - 103 bytes- copy lenght DWORDS
58 milliseconds, MOVEAtoB_Y - 103 bytes- copy lenght MOVZX
58 milliseconds, MOVEAtoB_WZ- 103 bytes- copy null terminated
67 milliseconds, MOVEAtoB_YY- 103 bytes- copy lenght MOVZX
68 milliseconds, MOVEAtoB_WW- 103 bytes- copy null terminated
69 milliseconds, MOVEAtoB_W - 103 bytes- copy null terminated
69 milliseconds, szCopy100MASM- 103 bytes- copy null terminated
71 milliseconds, MOVEAtoB_XZ- 503 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XX- 503 bytes- copy lenght DWORDS+MOVZX
108 milliseconds, MOVEAtoB_Z - 203 bytes- copy length BYTES
108 milliseconds, MOVEAtoB_Y - 203 bytes- copy lenght MOVZX
109 milliseconds, MOVEAtoB_WZ- 203 bytes- copy null terminated
110 milliseconds, MOVEAtoB_X - 203 bytes- copy lenght DWORDS
111 milliseconds, szCopy200MASM- 203 bytes- copy null terminated
111 milliseconds, MOVEAtoB_WW- 203 bytes- copy null terminated
120 milliseconds, MOVEAtoB_W - 203 bytes- copy null terminated
120 milliseconds, MOVEAtoB_YY- 203 bytes- copy lenght MOVZX
264 milliseconds, MOVEAtoB_X - 503 bytes- copy lenght DWORDS
265 milliseconds, MOVEAtoB_Z - 503 bytes- copy length BYTES
265 milliseconds, MOVEAtoB_WZ- 503 bytes- copy null terminated
266 milliseconds, MOVEAtoB_Y - 503 bytes- copy lenght MOVZX
275 milliseconds, MOVEAtoB_WW- 503 bytes- copy null terminated
277 milliseconds, MOVEAtoB_W - 503 bytes- copy null terminated
277 milliseconds, szCopy500MASM- 503 bytes- copy null terminated
281 milliseconds, MOVEAtoB_YY- 503 bytes- copy lenght MOVZX
********** END II **********


Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 22, 2014, 11:40:42 PM
 :biggrin: 
Hi Gunther,
                     thank you !  :t
                     I added your results to my reply #42 (see it).
                     Impressive!

                    Your i7 is roughly 4 times faster (+/-)!
Title: Re: Sorting strings
Post by: RuiLoureiro on June 26, 2014, 06:03:28 AM
Hi
    Now I am testing these 2 new procedures
    (versions A...E).
    Could you post your results, please?
    Thanks

These are my results.
       
Quote
ALIGNED
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

12 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX

29 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX

44 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
44 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
44 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
46 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
46 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
47 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
47 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX

97 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
97 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
97 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX

179 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
180 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
180 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
181 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
184 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
184 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
186 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
203 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********

Quote
NOT ALIGNED
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

15 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
15 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
15 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
25 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX

29 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX

48 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
49 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
49 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
49 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
49 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
50 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
50 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
52 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX

99 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
102 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX

185 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
186 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
186 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
186 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
187 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
189 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
190 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
191 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
191 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
192 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
193 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: nidud on June 26, 2014, 09:14:28 PM
deleted
Title: Re: Sorting strings
Post by: Gunther on June 27, 2014, 12:27:06 AM
Hi Rui,

the attached archive contains the results.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 27, 2014, 05:52:25 AM
 :biggrin:
Hi
    Thank you nidud
    and
    thank you also Gunther  :t

---------------
contradictions
---------------
Reading the code of each procedure, there is no doubt:
the best should be XZE, XZZE or XZZF
(they are complete solutions, for all cases).

But reading the results,
the best seem to be XZZA, XZZB or XZZC!

One conclusion: XZZ seems to be better than XZ,
but I need more time to study these results
(this conclusion does not come from my own results).

XZ  uses:    mov  eax, [esi+edx*4]
XZZ uses:   mov  eax, [esi+edx]
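
To make the difference concrete, here is a minimal sketch in C (not MASM, and not the actual XZ/XZZ procedures, which are not shown in this post). It assumes that in the XZ form edx counts dwords and is scaled by 4 inside the address, while in the XZZ form edx is already a byte offset that advances by 4. The function names copy_scaled_index and copy_byte_offset are hypothetical. Both loops touch exactly the same addresses; only the way the address is formed differs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* XZ style (assumed): the index counts dwords and the address is
   base + index*4, like  mov eax, [esi+edx*4]. */
void copy_scaled_index(unsigned char *dst, const unsigned char *src,
                       size_t ndwords)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < ndwords; i++) {
        uint32_t v;
        memcpy(&v, src + i * 4, sizeof v);   /* load one dword  */
        memcpy(dst + i * 4, &v, sizeof v);   /* store one dword */
    }
}

/* XZZ style (assumed): the index is a byte offset stepped by 4 and the
   address is base + offset, like  mov eax, [esi+edx]. */
void copy_byte_offset(unsigned char *dst, const unsigned char *src,
                      size_t ndwords)
{
    assert(dst != NULL && src != NULL);
    size_t nbytes = ndwords * 4;             /* dword count -> byte count */
    for (size_t off = 0; off < nbytes; off += 4) {
        uint32_t v;
        memcpy(&v, src + off, sizeof v);     /* load one dword  */
        memcpy(dst + off, &v, sizeof v);     /* store one dword */
    }
}
```

Since both forms copy the same bytes, any timing difference between XZ and XZZ has to come from how each CPU handles the addressing form and the surrounding loop, not from the amount of data moved.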

------------------------
Results from nidud
------------------------
Quote
ALIGNED
------------------------------------------------------
AMD Athlon(tm) II X2 245 Processor (SSE3)
------------------------------------------------------
***** Time table *****

***>> CASES 1,2,3,4 AND 13 bytes WERE REMOVED <<<<****

13 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
13 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX  <<<--
14 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
18 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
18 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
18 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX

23 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX  <<<--
23 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
24 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX
24 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
25 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
26 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX

41 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX  <<<--
56 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
58 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
58 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX

92 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX  <<<--
92 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX
93 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
94 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
95 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX

182 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX <<<-
182 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
182 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
182 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
183 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
269 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
269 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
269 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
270 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
270 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
270 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********

Quote
NOT ALIGNED
-------------------------------------------------------
AMD Athlon(tm) II X2 245 Processor (SSE3)
-------------------------------------------------------
***** Time table *****

***>> CASES 1,2,3,4 AND 13 bytes WERE REMOVED <<<<****
 
14 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX   <<<-
14 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
18 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
18 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX

23 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
23 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX <<<--
25 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
25 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX

41 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX <<<--
56 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX

92 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
92 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
93 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX
94 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX  <<<-
94 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
105 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX

182 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX<<<-
182 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
182 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
182 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
183 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
269 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
269 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
270 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
270 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
270 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
270 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********

-----------------------------
Results from Gunther:
-----------------------------
Quote
ALIGNED
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

***>> CASES 1,2,3,4 AND 13 bytes WERE REMOVED <<<<****
 
  7 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
  7 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
  7 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
  7 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX    <<<--
  7 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX

14 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
          <<<- all equal--

31 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX <<<--
32 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
40 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
40 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
40 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX

70 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
                 <<<- all equal--
71 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX

137 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
138 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX <<<-
138 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
138 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
138 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
146 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
146 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
146 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
147 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
147 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
147 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********

Quote
NOT ALIGNED
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
-------------------------------------------------------------
***** Time table *****
   
  7 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
  7 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
  7 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
  7 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX  <<<---
  7 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX

14 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
    <<<- all equal--

31 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX <<<--
32 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
40 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
40 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
40 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
41 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX

70 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
          <<<- all equal--

137 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
138 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX<<<--
138 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
138 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
138 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
146 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
146 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
146 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
147 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
147 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
147 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: FORTRANS on June 27, 2014, 09:58:04 PM
Hi,

   Do you want results from older processors?

HTH,

Steve N.
Title: Re: Sorting strings
Post by: RuiLoureiro on June 27, 2014, 11:53:20 PM
Thank you, Steve  :t

The same conclusion as before:
XZZA, XZZB, XZZC are better !!!

---------------------------------
Results from   FORTRANS
---------------------------------
Quote
NOT ALIGNED
------------------------------------------
???? (SSE1)
------------------------------------------
***** Time table *****

***>> CASES 1,2,3,4 AND 13 bytes WERE REMOVED <<<<****
 
  91 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
  91 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
  92 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
  92 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
  92 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX
  93 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
  93 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
  99 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
101 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
101 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX

108 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
108 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
108 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
116 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
126 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
126 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
127 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
136 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
136 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
136 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
139 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX

171 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
171 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX
172 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
219 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
222 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
222 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
230 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
230 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
231 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
233 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX

361 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
361 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
362 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
362 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
504 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
504 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
505 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
514 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX
514 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
514 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
523 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX

690 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
690 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
691 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
699 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
998 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
998 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
999 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
1010 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
1010 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
1010 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
1012 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: guga on June 28, 2014, 01:12:37 AM
Rui

here are my results
Title: Re: Sorting strings
Post by: RuiLoureiro on June 28, 2014, 02:29:50 AM
Hello guga!
                   Thank you! :t

The same conclusion:
XZZA, XZZB, XZZC are better, as before!
XZZC seems to be the best of the three.

But when we read the code of each procedure, there is no doubt:
the best are XZE, XZZE or XZZF
(note: all these procedures were written by me)

------------------------
Results from  guga
------------------------

Quote
NOT ALIGNED

---------------------------------------------------------------
Intel(R) Core(TM) i7 CPU  870  @ 2.93GHz (SSE4)
---------------------------------------------------------------
***** Time table *****

***>> CASES 1,2,3,4 AND 13 bytes WERE REMOVED <<<<****

  8 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
  8 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
  9 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX
  9 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
  9 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
10 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
11 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
11 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
11 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX

16 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX

30 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
43 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
43 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
45 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
47 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX

77 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
77 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
78 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
78 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
78 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
78 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX
80 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX

152 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
152 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
153 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
153 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
153 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
154 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
154 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
154 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
155 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
155 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
156 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on June 28, 2014, 03:37:07 AM
Hi Rui,

it seems that you've found the right way.  :t

Gunther
Title: Re: Sorting strings
Post by: guga on June 28, 2014, 05:01:14 AM
Hi Rui.

You're welcome!  :t

If you need more tests, let me know, ok?

Regards
Title: Re: Sorting strings
Post by: RuiLoureiro on June 28, 2014, 06:18:58 AM
Gunther,
              It seems !

Hi guga!
               ok !
--------------------------------------------------------
:biggrin:
Hi
          I replaced 2 instructions with a single one
          in some procedures.
          Could you run it again now?

Quote
NOT ALIGNED
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****


  5 milliseconds, MOVEAtoB_XZZF-  13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XZZE-  13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XZE-   13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZC-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZB-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZA-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZD-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZD-   13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZC-   13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZB-   13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZA-   13 bytes- copy lenght DWORDS+MOVZX
 
16 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX

29 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX

49 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
50 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
50 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
50 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
50 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
50 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
52 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX

99 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX
102 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
108 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
116 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX

187 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
187 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
188 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
189 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
189 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
190 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
191 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
192 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
192 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
193 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
197 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: guga on June 28, 2014, 07:20:17 AM
Better  :t

Btw... does the routine you are testing work for both aligned and unaligned strings?
Title: Re: Sorting strings
Post by: RuiLoureiro on June 28, 2014, 07:42:16 AM
Quote from: guga on June 28, 2014, 07:20:17 AM
Better  :t

Btw... does the routine you are testing work for both aligned and unaligned strings?
Thank you guga  :t
              For both.
Title: Re: Sorting strings
Post by: jj2007 on June 28, 2014, 08:56:58 AM
Here it is, Rui :icon14:

------------------------------------------
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
------------------------------------------
***** Time table *****

8 milliseconds, MOVEAtoB_XZZF-  13 bytes- copy lenght DWORDS+MOVZX
8 milliseconds, MOVEAtoB_XZE-  13 bytes- copy lenght DWORDS+MOVZX
9 milliseconds, MOVEAtoB_XZZE-  13 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZC-  13 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZB-  13 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZD-  13 bytes- copy lenght DWORDS+MOVZX
15 milliseconds, MOVEAtoB_XZZA-  13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZD-  13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZC-  13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZB-  13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZA-  13 bytes- copy lenght DWORDS+MOVZX
26 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
27 milliseconds, MOVEAtoB_XZE-  53 bytes- copy lenght DWORDS+MOVZX
27 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZB-  53 bytes- copy lenght DWORDS+MOVZX
33 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX
33 milliseconds, MOVEAtoB_XZC-  53 bytes- copy lenght DWORDS+MOVZX
33 milliseconds, MOVEAtoB_XZD-  53 bytes- copy lenght DWORDS+MOVZX
34 milliseconds, MOVEAtoB_XZA-  53 bytes- copy lenght DWORDS+MOVZX
43 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
43 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
44 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
46 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
53 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
53 milliseconds, MOVEAtoB_XZE- 103 bytes- copy lenght DWORDS+MOVZX
56 milliseconds, MOVEAtoB_XZA- 103 bytes- copy lenght DWORDS+MOVZX
56 milliseconds, MOVEAtoB_XZD- 103 bytes- copy lenght DWORDS+MOVZX
56 milliseconds, MOVEAtoB_XZC- 103 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZB- 103 bytes- copy lenght DWORDS+MOVZX
74 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
75 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
75 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZE- 203 bytes- copy lenght DWORDS+MOVZX
101 milliseconds, MOVEAtoB_XZC- 203 bytes- copy lenght DWORDS+MOVZX
101 milliseconds, MOVEAtoB_XZD- 203 bytes- copy lenght DWORDS+MOVZX
103 milliseconds, MOVEAtoB_XZA- 203 bytes- copy lenght DWORDS+MOVZX
103 milliseconds, MOVEAtoB_XZB- 203 bytes- copy lenght DWORDS+MOVZX
174 milliseconds, MOVEAtoB_XZE- 503 bytes- copy lenght DWORDS+MOVZX
174 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
174 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
176 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
176 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
177 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
177 milliseconds, MOVEAtoB_XZA- 503 bytes- copy lenght DWORDS+MOVZX
178 milliseconds, MOVEAtoB_XZD- 503 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZC- 503 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
181 milliseconds, MOVEAtoB_XZB- 503 bytes- copy lenght DWORDS+MOVZX
338 milliseconds, MOVEAtoB_XZE-1027 bytes- copy lenght DWORDS+MOVZX
338 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
339 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
340 milliseconds, MOVEAtoB_XZA-1027 bytes- copy lenght DWORDS+MOVZX
341 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
341 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
341 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
344 milliseconds, MOVEAtoB_XZD-1027 bytes- copy lenght DWORDS+MOVZX
344 milliseconds, MOVEAtoB_XZC-1027 bytes- copy lenght DWORDS+MOVZX
345 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
346 milliseconds, MOVEAtoB_XZB-1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: RuiLoureiro on June 28, 2014, 07:53:19 PM
Hi Jochen,
           Thank you for your work !
            :icon14:
-----------------------------------
contradictions are SOLVED ?
-----------------------------------
As i said:
When we read each procedure, we have no doubts,
the best are: XZE, XZZE or XZZF
(they are perfect solutions- for all cases).

note I: XZ  uses:   mov  eax, [esi+edx*4]
        XZZ uses:   mov  eax, [esi+edx]
        ZE  uses:   shr, shl
        ZF  uses:   and

note II: these procedures don't use cmp or test instructions,
         but only operations and negative counters.
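
The two notes above can be sketched in C. This is only an illustration under assumptions, not the source of MOVEAtoB_XZZE or MOVEAtoB_XZZF (which is not shown in this thread): the name copy_counted, the loop shape, and the exact role given to the shift and the and are guesses. The sketch splits a known length into whole DWORDs and 0..3 leftover bytes, then walks both buffers with an offset that starts negative and counts up to zero, so the loop can end on the result of the add itself instead of a separate cmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy exactly len bytes, DWORDs first, then the tail. */
static void copy_counted(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    size_t dword_bytes = (len >> 2) << 2;   /* shift right, then left: bytes covered by whole DWORDs */
    size_t tail        = len & 3;           /* and: the 0..3 bytes left over */

    /* Point both cursors at the END of the DWORD area and use a negative offset. */
    const unsigned char *s_end = src + dword_bytes;
    unsigned char       *d_end = dst + dword_bytes;
    ptrdiff_t i = -(ptrdiff_t)dword_bytes;

    while (i != 0) {                        /* in assembly: add edx, 4 / jnz */
        uint32_t w;
        memcpy(&w, s_end + i, 4);           /* unaligned-safe 4-byte load  */
        memcpy(d_end + i, &w, 4);           /* unaligned-safe 4-byte store */
        i += 4;
    }

    for (size_t k = 0; k < tail; k++)       /* remaining bytes, one at a time */
        d_end[k] = s_end[k];
}
```

Because the offset is added to the end pointers, one register serves as both the index and the loop counter, which is the point of the negative-counter trick.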

--------------------------
Results from Jochen
--------------------------
Quote
---------------------------------------------------------------
Intel(R) Celeron(R) M CPU 420  @ 1.60GHz (SSE3)
---------------------------------------------------------------
***** Time table *****

  8 milliseconds, MOVEAtoB_XZZF-  13 bytes- copy lenght DWORDS+MOVZX
  8 milliseconds, MOVEAtoB_XZE-   13 bytes- copy lenght DWORDS+MOVZX
  9 milliseconds, MOVEAtoB_XZZE-  13 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZC-  13 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZB-  13 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZD-  13 bytes- copy lenght DWORDS+MOVZX
15 milliseconds, MOVEAtoB_XZZA-  13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZD-   13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZC-   13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZB-   13 bytes- copy lenght DWORDS+MOVZX
16 milliseconds, MOVEAtoB_XZA-   13 bytes- copy lenght DWORDS+MOVZX

26 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
27 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
27 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
33 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX
33 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
33 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
34 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX

43 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
43 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
44 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
46 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
51 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
53 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
53 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
56 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
56 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
56 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX

74 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
75 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
75 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
79 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX
98 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
100 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
101 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
101 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
103 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
103 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX

174 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
174 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
174 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
176 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
176 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
177 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
177 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX
178 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
179 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
181 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX

338 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
338 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
339 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
340 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
341 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
341 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
341 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
344 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX
344 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
345 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
346 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: guga on June 28, 2014, 10:09:27 PM
Rui, your algo requires the string to have a fixed length, right?

I mean, you need to predefine the length. Ex:

[MyString: D$ 4
                 B$ "Guga", 0]


Can you do it without the predefinition, so that it works according only to the null terminator "0"?

Or was it designed to work only with predefined string lengths?
Title: Re: Sorting strings
Post by: RuiLoureiro on June 28, 2014, 10:43:19 PM
guga, see my reply #42, where we used null-terminated strings and compared szCopy with _WZ. You are right: my strings have a predefined length, and they are far better than null-terminated ones. See the results. I predefine the length this way:
---------------------------------------------------
.data
           dd 4              ; <--- here is the length
sguga db "guga",0            ; it is also null terminated because the system needs it
---------------------------------------------------
In the next set of tests we may see the results when we want to add one string to another. If you want _WZ, I'll send it to you.  :t
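
For readers more used to C, the layout above (a DWORD length stored just before the text, plus a trailing zero) might look like the sketch below. The names counted_init and counted_len are made up for this illustration, and placing the length exactly 4 bytes before the text pointer follows the .data snippet; everything else is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lay out [length: 4 bytes][text][0] in buf
   and return a pointer to the text, like the label sguga. */
static char *counted_init(unsigned char *buf, size_t cap, const char *text)
{
    uint32_t n = (uint32_t)strlen(text);
    assert(cap >= 4 + (size_t)n + 1);       /* room for length, text and the zero */
    memcpy(buf, &n, 4);                     /* the "dd 4" part                    */
    memcpy(buf + 4, text, (size_t)n + 1);   /* the text, including its zero       */
    return (char *)(buf + 4);
}

/* Hypothetical helper: read the length stored 4 bytes before the text. */
static uint32_t counted_len(const char *text)
{
    uint32_t n;
    memcpy(&n, text - 4, 4);
    return n;
}
```

A copy routine that receives the text pointer can then fetch the length in one load instead of scanning for the zero, while OS calls that expect a zero-terminated string can still use the same pointer.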
Title: Re: Sorting strings
Post by: guga on June 28, 2014, 11:13:03 PM
I'll read it.

But... if it uses a fixed length, there is no need to also search for the null terminator ;) It would be a bit faster if you avoided the check for the terminating zero. If it needs only 4 bytes ("Guga"), checking the terminating 0 (5th byte) is unnecessary, since the length (4) is already defined.

Yes, I would like to see _WZ. It looks interesting for certain kinds of data (and not just strings). I mean, if you remove the check for the terminating zero (it will be a bit faster and will also be consistent with what the algorithm itself does), you can use the algorithm to copy a whole range of data and not only "strings". Mainly because it is an alternative (and faster) way to copy memory, for example as a replacement for functions such as memcpy.
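
A small C sketch of that point, with made-up function names (copy_until_zero and copy_by_length are not routines from this thread): a copy that looks for the terminating zero stops at the first zero byte, while a copy driven only by a known length never inspects the data, so it also works for binary data with embedded zeros.

```c
#include <assert.h>
#include <stddef.h>

/* Null-terminated style: stops at the first zero byte.
   Returns the number of bytes copied, not counting the zero. */
static size_t copy_until_zero(unsigned char *dst, const unsigned char *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    while (src[i] != 0) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}

/* Counted style: copies exactly len bytes and never looks at their values. */
static size_t copy_by_length(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    return len;
}
```

With the bytes 1, 2, 0, 3, 4 as input, the first function copies two bytes and stops, while the second copies all five when told the length is 5.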
Title: Re: Sorting strings
Post by: RuiLoureiro on June 28, 2014, 11:26:46 PM
guga, of course I don't test for the zero; it is only there for the OS, if and when it is needed.
I'll send it to you right away.
note: these procedures don't use cmp or test or
         similar instructions, but only operations and negative counters.
         (predefined length)
Title: Re: Sorting strings
Post by: RuiLoureiro on June 29, 2014, 07:37:01 AM
one special word to Gunther: thanks  :t
Title: Re: Sorting strings
Post by: Gunther on June 29, 2014, 11:34:59 PM
Quote from: RuiLoureiro on June 29, 2014, 07:37:01 AM
one special word to Gunther: thanks  :t

You're welcome.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 30, 2014, 01:33:10 AM
Hi
------------------------------
Adds a string to another
------------------------------

    . ADDAtoB_X is the reference procedure: it adds only BYTES;
    . all the others add DWORDS+BYTES;
    . all follow the same approach as the procedures that copy strings.
    . In these procedures, it is not possible to add
      201 bytes if the maximum is defined as 200.
    . If the string contains 199 bytes and we try to
      add 1000, we add only 1 byte and exit.
    . ADDAtoB_X stringA, stringB    <<<- add A to B

    . In these examples, we add 0,1,2,3,4,15,55,103,203,503,1027 bytes
      to a string that contains 11 bytes and whose maximum is 1100.
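
The clamping rule in the list above (199 bytes present, maximum 200, asked to add 1000, so only 1 byte is added) can be sketched in C as follows. This is a minimal sketch under assumptions: the struct cstr, its field names and cstr_append are invented for the example, and the real ADDAtoB_X procedures may store the current length and the maximum differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string with a fixed maximum length. */
struct cstr {
    size_t len;    /* bytes currently stored                   */
    size_t max;    /* maximum number of bytes allowed          */
    char  *text;   /* buffer with room for max + 1 bytes       */
};

/* Append up to n bytes of src to b; returns how many bytes were really added. */
static size_t cstr_append(struct cstr *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL && b->len <= b->max);

    size_t room = b->max - b->len;      /* space left before the maximum */
    if (n > room)
        n = room;                       /* add only what fits, then exit */

    memcpy(b->text + b->len, src, n);
    b->len += n;
    b->text[b->len] = '\0';             /* keep the zero for the OS      */
    return n;
}
```

Returning the number of bytes actually added lets the caller detect truncation without a second length check.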
   
    Could you post your results, please ?
    Thank you  :t
   
Here are my results
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****


   2 milliseconds, ADDAtoB_X-       0 bytes
   3 milliseconds, ADDAtoB_XZZEE-   0 bytes
   3 milliseconds, ADDAtoB_XZZE-    0 bytes
   3 milliseconds, ADDAtoB_XZE-     0 bytes
   3 milliseconds, ADDAtoB_X-       2 bytes
   3 milliseconds, ADDAtoB_X-       1 bytes
   3 milliseconds, ADDAtoB_XZZF-    0 bytes
   4 milliseconds, ADDAtoB_XZZF-    2 bytes
   4 milliseconds, ADDAtoB_XZZE-    4 bytes
   4 milliseconds, ADDAtoB_XZZE-    2 bytes
   4 milliseconds, ADDAtoB_XZZE-    1 bytes
   4 milliseconds, ADDAtoB_XZZF-    1 bytes
   4 milliseconds, ADDAtoB_XZE-     4 bytes
   4 milliseconds, ADDAtoB_XZE-     1 bytes
   4 milliseconds, ADDAtoB_XZZF-    4 bytes
   4 milliseconds, ADDAtoB_X-       4 bytes
   4 milliseconds, ADDAtoB_XZZEE-   4 bytes
   4 milliseconds, ADDAtoB_XZZEE-   2 bytes
   4 milliseconds, ADDAtoB_XZZEE-   1 bytes
   5 milliseconds, ADDAtoB_XZZE-    3 bytes
   5 milliseconds, ADDAtoB_XZZF-    3 bytes
   5 milliseconds, ADDAtoB_XZE-     3 bytes
   5 milliseconds, ADDAtoB_XZE-     2 bytes
   5 milliseconds, ADDAtoB_XZZEE-   3 bytes
   
   7 milliseconds, ADDAtoB_XZZEE-  15 bytes
   7 milliseconds, ADDAtoB_XZZF-   15 bytes
   7 milliseconds, ADDAtoB_XZE-    15 bytes
   7 milliseconds, ADDAtoB_XZZE-   15 bytes   
  11 milliseconds, ADDAtoB_X-      15 bytes
  13 milliseconds, ADDAtoB_X-       3 bytes
 
  36 milliseconds, ADDAtoB_XZZF-   55 bytes
  36 milliseconds, ADDAtoB_XZZEE-  55 bytes
  37 milliseconds, ADDAtoB_XZZE-   55 bytes
  37 milliseconds, ADDAtoB_XZE-    55 bytes
 
  57 milliseconds, ADDAtoB_XZZE-  103 bytes
  58 milliseconds, ADDAtoB_XZZF-  103 bytes
  58 milliseconds, ADDAtoB_XZZEE- 103 bytes
  59 milliseconds, ADDAtoB_XZE-   103 bytes
  70 milliseconds, ADDAtoB_X-      55 bytes
123 milliseconds, ADDAtoB_X-     103 bytes

127 milliseconds, ADDAtoB_XZZE-  203 bytes
127 milliseconds, ADDAtoB_XZZF-  203 bytes
127 milliseconds, ADDAtoB_XZZEE- 203 bytes
127 milliseconds, ADDAtoB_XZE-   203 bytes

128 milliseconds, ADDAtoB_XZE-   503 bytes
128 milliseconds, ADDAtoB_XZZEE- 503 bytes
129 milliseconds, ADDAtoB_XZZE-  503 bytes
129 milliseconds, ADDAtoB_XZZF-  503 bytes
202 milliseconds, ADDAtoB_X-     203 bytes
520 milliseconds, ADDAtoB_X-     503 bytes

700 milliseconds, ADDAtoB_XZZEE-1027 bytes
701 milliseconds, ADDAtoB_XZZF- 1027 bytes
701 milliseconds, ADDAtoB_XZE-  1027 bytes
716 milliseconds, ADDAtoB_XZZE- 1027 bytes
1093 milliseconds, ADDAtoB_X-    1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: FORTRANS on June 30, 2014, 02:19:43 AM
Hi,

   Three older processors, P-III, Pentium M, and P-MMX.  The
last seems to vary a bit from run to run.  66597 milliseconds
was larger in the first run.

Regards,

Steve N.
Title: Re: Sorting strings
Post by: Gunther on June 30, 2014, 03:34:04 AM
Rui,

the result is attached.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on June 30, 2014, 07:01:55 AM
Thank you, Steve  :t
(i am sorry, thank you Dave)
--------------------------------
Results from FORTRANS
--------------------------------
------------------------------------------
???? (SSE1)
------------------------------------------
***** Time table *****

   8 milliseconds, ADDAtoB_X-       0 bytes
  10 milliseconds, ADDAtoB_X-       1 bytes
  13 milliseconds, ADDAtoB_XZZF-    0 bytes
  14 milliseconds, ADDAtoB_X-       2 bytes
  14 milliseconds, ADDAtoB_XZZF-    1 bytes
  14 milliseconds, ADDAtoB_XZZF-    4 bytes
  19 milliseconds, ADDAtoB_XZZF-    2 bytes
  21 milliseconds, ADDAtoB_X-       4 bytes
  22 milliseconds, ADDAtoB_XZZE-    0 bytes
  22 milliseconds, ADDAtoB_XZE-     0 bytes
  24 milliseconds, ADDAtoB_XZE-     1 bytes
  24 milliseconds, ADDAtoB_XZZE-    1 bytes
  24 milliseconds, ADDAtoB_XZZEE-   0 bytes
  26 milliseconds, ADDAtoB_XZZEE-   4 bytes
  27 milliseconds, ADDAtoB_XZZEE-   1 bytes
  28 milliseconds, ADDAtoB_XZZF-   15 bytes
  29 milliseconds, ADDAtoB_XZZE-    4 bytes
  29 milliseconds, ADDAtoB_XZZE-    2 bytes
  29 milliseconds, ADDAtoB_XZE-     2 bytes
  30 milliseconds, ADDAtoB_XZZF-    3 bytes
  32 milliseconds, ADDAtoB_XZZEE-   2 bytes
  35 milliseconds, ADDAtoB_XZE-     4 bytes
  40 milliseconds, ADDAtoB_XZZEE-  15 bytes
  45 milliseconds, ADDAtoB_XZE-     3 bytes
  45 milliseconds, ADDAtoB_XZZE-    3 bytes
  49 milliseconds, ADDAtoB_XZZEE-   3 bytes
  54 milliseconds, ADDAtoB_XZE-    15 bytes
  55 milliseconds, ADDAtoB_XZZE-   15 bytes
  69 milliseconds, ADDAtoB_X-       3 bytes
  74 milliseconds, ADDAtoB_X-      15 bytes
 
134 milliseconds, ADDAtoB_XZZE-   55 bytes
134 milliseconds, ADDAtoB_XZE-    55 bytes
137 milliseconds, ADDAtoB_XZZF-   55 bytes
143 milliseconds, ADDAtoB_XZZEE-  55 bytes

176 milliseconds, ADDAtoB_XZZEE- 103 bytes
177 milliseconds, ADDAtoB_XZE-   103 bytes
178 milliseconds, ADDAtoB_XZZE-  103 bytes
194 milliseconds, ADDAtoB_XZZF-  103 bytes
258 milliseconds, ADDAtoB_X-      55 bytes

295 milliseconds, ADDAtoB_XZZEE- 203 bytes
300 milliseconds, ADDAtoB_XZE-   203 bytes
300 milliseconds, ADDAtoB_XZZE-  203 bytes
346 milliseconds, ADDAtoB_XZZF-  203 bytes
492 milliseconds, ADDAtoB_X-     103 bytes

545 milliseconds, ADDAtoB_XZZF-  503 bytes
683 milliseconds, ADDAtoB_XZZEE- 503 bytes
691 milliseconds, ADDAtoB_XZE-   503 bytes
691 milliseconds, ADDAtoB_XZZE-  503 bytes
831 milliseconds, ADDAtoB_X-     203 bytes

1165 milliseconds, ADDAtoB_XZZF- 1027 bytes
1452 milliseconds, ADDAtoB_XZZEE-1027 bytes
1465 milliseconds, ADDAtoB_XZE-  1027 bytes
1466 milliseconds, ADDAtoB_XZZE- 1027 bytes
2071 milliseconds, ADDAtoB_X-     503 bytes
4481 milliseconds, ADDAtoB_X-    1027 bytes
********** END III **********

-------------------------------------------------------------
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
-------------------------------------------------------------
***** Time table *****

   4 milliseconds, ADDAtoB_XZZF-    0 bytes
   6 milliseconds, ADDAtoB_XZZF-    2 bytes
   6 milliseconds, ADDAtoB_XZZF-    1 bytes
   6 milliseconds, ADDAtoB_XZZF-    4 bytes
   8 milliseconds, ADDAtoB_XZZF-    3 bytes
   8 milliseconds, ADDAtoB_X-       0 bytes
  10 milliseconds, ADDAtoB_XZZF-   15 bytes
  11 milliseconds, ADDAtoB_XZZE-    1 bytes
  11 milliseconds, ADDAtoB_XZZE-    0 bytes
  11 milliseconds, ADDAtoB_XZE-     2 bytes
  11 milliseconds, ADDAtoB_XZE-     1 bytes
  11 milliseconds, ADDAtoB_XZE-     0 bytes
  11 milliseconds, ADDAtoB_X-       1 bytes
  11 milliseconds, ADDAtoB_XZZE-    2 bytes
  12 milliseconds, ADDAtoB_XZZE-    4 bytes
  12 milliseconds, ADDAtoB_XZZEE-   2 bytes
  12 milliseconds, ADDAtoB_XZZEE-   1 bytes
  12 milliseconds, ADDAtoB_XZZEE-   0 bytes
  12 milliseconds, ADDAtoB_XZE-     4 bytes
  13 milliseconds, ADDAtoB_XZE-     3 bytes
  13 milliseconds, ADDAtoB_XZZEE-   4 bytes
  13 milliseconds, ADDAtoB_XZZE-    3 bytes
  14 milliseconds, ADDAtoB_XZZEE-   3 bytes
  14 milliseconds, ADDAtoB_X-       2 bytes
 
  16 milliseconds, ADDAtoB_XZZEE-  15 bytes
  18 milliseconds, ADDAtoB_XZZE-   15 bytes
  18 milliseconds, ADDAtoB_XZE-    15 bytes
  19 milliseconds, ADDAtoB_X-       4 bytes
  25 milliseconds, ADDAtoB_X-      15 bytes
 
  30 milliseconds, ADDAtoB_XZZF-   55 bytes
  36 milliseconds, ADDAtoB_XZZE-   55 bytes
  38 milliseconds, ADDAtoB_XZZEE-  55 bytes
  40 milliseconds, ADDAtoB_XZE-    55 bytes
  44 milliseconds, ADDAtoB_X-       3 bytes
 
  55 milliseconds, ADDAtoB_XZZF-  103 bytes
  56 milliseconds, ADDAtoB_XZZEE- 103 bytes
  56 milliseconds, ADDAtoB_XZZE-  103 bytes
  58 milliseconds, ADDAtoB_XZE-   103 bytes
 
  94 milliseconds, ADDAtoB_XZZF-  203 bytes
  98 milliseconds, ADDAtoB_X-      55 bytes
100 milliseconds, ADDAtoB_XZZEE- 203 bytes
100 milliseconds, ADDAtoB_XZZE-  203 bytes
100 milliseconds, ADDAtoB_XZE-   203 bytes
169 milliseconds, ADDAtoB_X-     103 bytes

210 milliseconds, ADDAtoB_XZZF-  503 bytes
213 milliseconds, ADDAtoB_XZZE-  503 bytes
217 milliseconds, ADDAtoB_XZE-   503 bytes
217 milliseconds, ADDAtoB_XZZEE- 503 bytes
305 milliseconds, ADDAtoB_X-     203 bytes

483 milliseconds, ADDAtoB_XZZF- 1027 bytes
492 milliseconds, ADDAtoB_XZZEE-1027 bytes
493 milliseconds, ADDAtoB_XZZE- 1027 bytes
494 milliseconds, ADDAtoB_XZE-  1027 bytes
744 milliseconds, ADDAtoB_X-     503 bytes
1783 milliseconds, ADDAtoB_X-    1027 bytes
********** END III **********

------------------------------------------
;      this is one REFERENCE !
------------------------------------------
***** Time table *****

   87 milliseconds, ADDAtoB_X-       0 bytes
  111 milliseconds, ADDAtoB_XZE-     0 bytes
  111 milliseconds, ADDAtoB_XZZE-    0 bytes
  123 milliseconds, ADDAtoB_XZZEE-   0 bytes
  123 milliseconds, ADDAtoB_XZZF-    0 bytes
 
  215 milliseconds, ADDAtoB_XZE-     2 bytes
  215 milliseconds, ADDAtoB_XZZE-    2 bytes
  215 milliseconds, ADDAtoB_XZZF-    2 bytes
  222 milliseconds, ADDAtoB_X-       2 bytes
  227 milliseconds, ADDAtoB_XZZE-    1 bytes
  227 milliseconds, ADDAtoB_XZZEE-   2 bytes
  228 milliseconds, ADDAtoB_XZZF-    1 bytes
  228 milliseconds, ADDAtoB_XZZEE-   1 bytes
  229 milliseconds, ADDAtoB_XZE-     1 bytes
  233 milliseconds, ADDAtoB_XZE-     3 bytes
  233 milliseconds, ADDAtoB_X-       3 bytes
  234 milliseconds, ADDAtoB_XZZF-    3 bytes
  234 milliseconds, ADDAtoB_XZZE-    3 bytes
  242 milliseconds, ADDAtoB_X-       1 bytes
  246 milliseconds, ADDAtoB_XZZEE-   3 bytes
  288 milliseconds, ADDAtoB_XZE-     4 bytes
  289 milliseconds, ADDAtoB_XZZE-    4 bytes
  289 milliseconds, ADDAtoB_XZZF-    4 bytes
  290 milliseconds, ADDAtoB_XZZEE-   4 bytes
 
  580 milliseconds, ADDAtoB_XZZEE-  15 bytes
  581 milliseconds, ADDAtoB_XZE-    15 bytes
  582 milliseconds, ADDAtoB_XZZE-   15 bytes
  595 milliseconds, ADDAtoB_XZZF-   15 bytes
  713 milliseconds, ADDAtoB_X-       4 bytes
2038 milliseconds, ADDAtoB_X-      15 bytes

3470 milliseconds, ADDAtoB_XZZF-   55 bytes
3705 milliseconds, ADDAtoB_XZZEE-  55 bytes
3934 milliseconds, ADDAtoB_XZZE-   55 bytes
3945 milliseconds, ADDAtoB_XZE-    55 bytes
5890 milliseconds, ADDAtoB_X-      55 bytes

6554 milliseconds, ADDAtoB_XZZF-  103 bytes
6932 milliseconds, ADDAtoB_XZZEE- 103 bytes
7170 milliseconds, ADDAtoB_XZE-   103 bytes
7706 milliseconds, ADDAtoB_XZZE-  103 bytes

10091 milliseconds, ADDAtoB_XZZEE- 203 bytes
10320 milliseconds, ADDAtoB_XZZEE- 503 bytes
10421 milliseconds, ADDAtoB_XZZF-  203 bytes
10490 milliseconds, ADDAtoB_XZE-   203 bytes
10491 milliseconds, ADDAtoB_XZZE-  203 bytes


10644 milliseconds, ADDAtoB_XZZF-  503 bytes
10651 milliseconds, ADDAtoB_X-     103 bytes
10721 milliseconds, ADDAtoB_XZE-   503 bytes
10731 milliseconds, ADDAtoB_XZZE-  503 bytes
17916 milliseconds, ADDAtoB_X-     203 bytes
33803 milliseconds, ADDAtoB_X-     503 bytes


34786 milliseconds, ADDAtoB_XZZE- 1027 bytes
34792 milliseconds, ADDAtoB_XZZF- 1027 bytes
34796 milliseconds, ADDAtoB_XZZEE-1027 bytes
35036 milliseconds, ADDAtoB_XZE-  1027 bytes
66597 milliseconds, ADDAtoB_X-    1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: dedndave on June 30, 2014, 07:10:10 AM
psst - FORTRANS is Steve   :biggrin:

i haven't been running these, because we have essentially the same processor
my results will always look like yours
Title: Re: Sorting strings
Post by: RuiLoureiro on June 30, 2014, 07:18:43 AM
Dave, I made the correction in 2 posts  :t

Thank you Gunther,  :t

XZZF seems to be better here with 137 milliseconds to ADD 1027 bytes
and we got the same 137 to MOV 1027 bytes with XZZC (see my reply #48)!

IS THIS GOOD ? What do you think ?

----------------------------
Results from Gunther
----------------------------
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

  0 milliseconds, ADDAtoB_XZZF-    0 bytes
  0 milliseconds, ADDAtoB_XZZEE-   0 bytes
  0 milliseconds, ADDAtoB_XZZE-    4 bytes
  0 milliseconds, ADDAtoB_XZZE-    1 bytes
  0 milliseconds, ADDAtoB_XZZE-    0 bytes
  0 milliseconds, ADDAtoB_XZE-     0 bytes
  0 milliseconds, ADDAtoB_X-       1 bytes
  1 milliseconds, ADDAtoB_XZZF-    4 bytes
  1 milliseconds, ADDAtoB_XZZE-    3 bytes
  1 milliseconds, ADDAtoB_XZZE-    2 bytes
  1 milliseconds, ADDAtoB_XZZEE-   4 bytes
  1 milliseconds, ADDAtoB_XZZEE-   2 bytes
  1 milliseconds, ADDAtoB_XZE-     4 bytes
  1 milliseconds, ADDAtoB_XZE-     3 bytes
  1 milliseconds, ADDAtoB_XZE-     2 bytes
  1 milliseconds, ADDAtoB_XZE-     1 bytes
  1 milliseconds, ADDAtoB_XZZEE-   1 bytes
  1 milliseconds, ADDAtoB_X-       2 bytes
  1 milliseconds, ADDAtoB_XZZF-    1 bytes
  1 milliseconds, ADDAtoB_X-       0 bytes
  2 milliseconds, ADDAtoB_XZZF-    2 bytes
  3 milliseconds, ADDAtoB_XZZF-   15 bytes
  3 milliseconds, ADDAtoB_XZE-    15 bytes
  3 milliseconds, ADDAtoB_X-       4 bytes
  3 milliseconds, ADDAtoB_X-       3 bytes
  3 milliseconds, ADDAtoB_XZZF-    3 bytes
  3 milliseconds, ADDAtoB_XZZEE-   3 bytes
  3 milliseconds, ADDAtoB_XZZE-   15 bytes
  4 milliseconds, ADDAtoB_XZZEE-  15 bytes
  7 milliseconds, ADDAtoB_X-      15 bytes
 
  8 milliseconds, ADDAtoB_XZZF-   55 bytes
10 milliseconds, ADDAtoB_XZZEE-  55 bytes
11 milliseconds, ADDAtoB_XZZE-   55 bytes
11 milliseconds, ADDAtoB_XZE-    55 bytes

14 milliseconds, ADDAtoB_XZZEE- 103 bytes
14 milliseconds, ADDAtoB_XZZF-  103 bytes
20 milliseconds, ADDAtoB_XZE-   103 bytes
20 milliseconds, ADDAtoB_XZZE-  103 bytes

31 milliseconds, ADDAtoB_XZZF-  203 bytes
40 milliseconds, ADDAtoB_XZE-   203 bytes
40 milliseconds, ADDAtoB_XZZEE- 203 bytes
40 milliseconds, ADDAtoB_XZZE-  203 bytes
41 milliseconds, ADDAtoB_X-      55 bytes
57 milliseconds, ADDAtoB_X-     103 bytes

70 milliseconds, ADDAtoB_XZZEE- 503 bytes
70 milliseconds, ADDAtoB_XZZF-  503 bytes
78 milliseconds, ADDAtoB_XZE-   503 bytes
79 milliseconds, ADDAtoB_XZZE-  503 bytes
117 milliseconds, ADDAtoB_X-     203 bytes

137 milliseconds, ADDAtoB_XZZF- 1027 bytes
146 milliseconds, ADDAtoB_XZZE- 1027 bytes
146 milliseconds, ADDAtoB_XZE-  1027 bytes
146 milliseconds, ADDAtoB_XZZEE-1027 bytes
263 milliseconds, ADDAtoB_X-     503 bytes
557 milliseconds, ADDAtoB_X-    1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on July 01, 2014, 02:04:39 AM
Hi Rui,

Quote from: RuiLoureiro on June 30, 2014, 07:18:43 AM
XZZF seems to be better here with 137 milliseconds to ADD 1027 bytes
and we got the same 137 to MOV 1027 bytes with XZZC !

IS THIS GOOD ? What do you think ?

I think it's not bad.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 02, 2014, 03:26:08 AM
Hi
--------------------------------
Insert a string into another
--------------------------------

    . INSAtoB_X is the reference procedure:
      it moves and adds only BYTES;
    . all the others move/add DWORDS+BYTES;
    . INSAtoB_X stringA, stringB, Position    <<<- insert A into B

    . In these examples, we insert strings of 0,1,2,3,4,15,55,103,203,503,1027 bytes
      into a string that contains 100 bytes.
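
To make the operation concrete, here is a minimal C sketch of what "insert A into B at Position" amounts to. It is not the INSAtoB_X source (which is not shown in this thread). The name ins_a_to_b, the zero-terminated strings and the assumption that the caller reserves enough room in B are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: insert zero-terminated string a into b at byte
   offset pos. The caller must guarantee that b has room for
   strlen(a) + strlen(b) + 1 bytes. Returns the new length of b. */
size_t ins_a_to_b(const char *a, char *b, size_t pos)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);

    assert(pos <= len_b);   /* the position must lie inside b */

    /* Step 1: shift the tail of b (terminator included) to the right
       by len_a bytes. memmove is used because the regions overlap. */
    memmove(b + pos + len_a, b + pos, len_b - pos + 1);

    /* Step 2: copy a into the gap that was opened. */
    memcpy(b + pos, a, len_a);

    return len_a + len_b;
}
```

With a 100-byte B, inserting at position 7 shifts 93 bytes, at position 50 shifts 50 bytes and at position 93 shifts only 7 bytes, which is one plausible reason the position 93 tables show smaller times.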

    Gunther, could you post your results, please ?
    Thank you  :t
   
Here are my results

Quote
INSERTING AT POSITION 7 -string length=100
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  3 milliseconds, INSAtoB_XZZF-   0 bytes
  3 milliseconds, INSAtoB_XZZE-   0 bytes
  6 milliseconds, INSAtoB_X-      0 bytes
 
45 milliseconds, INSAtoB_XZZE-   3 bytes
59 milliseconds, INSAtoB_XZZF-   3 bytes
64 milliseconds, INSAtoB_XZZF-   1 bytes
65 milliseconds, INSAtoB_XZZE-   1 bytes

69 milliseconds, INSAtoB_XZZF-  15 bytes
76 milliseconds, INSAtoB_XZZE- 103 bytes
87 milliseconds, INSAtoB_XZZE-  15 bytes
91 milliseconds, INSAtoB_XZZE-   2 bytes
100 milliseconds, INSAtoB_X-      4 bytes
101 milliseconds, INSAtoB_X-      1 bytes
102 milliseconds, INSAtoB_XZZE-  55 bytes
103 milliseconds, INSAtoB_XZZE-   4 bytes
103 milliseconds, INSAtoB_XZZF-   4 bytes
105 milliseconds, INSAtoB_XZZF- 203 bytes
105 milliseconds, INSAtoB_XZZF-  55 bytes
106 milliseconds, INSAtoB_XZZF-   2 bytes
113 milliseconds, INSAtoB_X-     15 bytes
126 milliseconds, INSAtoB_X-      3 bytes
126 milliseconds, INSAtoB_X-      2 bytes
152 milliseconds, INSAtoB_X-     55 bytes

156 milliseconds, INSAtoB_XZZF- 103 bytes
186 milliseconds, INSAtoB_X-    103 bytes
219 milliseconds, INSAtoB_XZZE- 203 bytes
243 milliseconds, INSAtoB_X-    203 bytes
443 milliseconds, INSAtoB_X-    503 bytes
479 milliseconds, INSAtoB_XZZE- 503 bytes
539 milliseconds, INSAtoB_XZZF- 503 bytes

816 milliseconds, INSAtoB_X-   1027 bytes
830 milliseconds, INSAtoB_XZZE-1027 bytes
999 milliseconds, INSAtoB_XZZF-1027 bytes
********** END III **********
Quote
INSERTING AT POSITION 50 -string length=100
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  4 milliseconds, INSAtoB_XZZE-   0 bytes
  7 milliseconds, INSAtoB_X-      0 bytes
  8 milliseconds, INSAtoB_XZZF-   0 bytes
 
29 milliseconds, INSAtoB_XZZE-   3 bytes
34 milliseconds, INSAtoB_XZZE-   1 bytes
38 milliseconds, INSAtoB_XZZF-   1 bytes
43 milliseconds, INSAtoB_XZZE-   2 bytes
46 milliseconds, INSAtoB_XZZF-   3 bytes
46 milliseconds, INSAtoB_XZZF-  15 bytes
54 milliseconds, INSAtoB_XZZE-   4 bytes
55 milliseconds, INSAtoB_XZZF-   4 bytes
59 milliseconds, INSAtoB_XZZE-  15 bytes

61 milliseconds, INSAtoB_XZZE-  55 bytes
62 milliseconds, INSAtoB_XZZF-   2 bytes
70 milliseconds, INSAtoB_XZZF-  55 bytes
78 milliseconds, INSAtoB_X-      2 bytes
83 milliseconds, INSAtoB_X-      4 bytes
84 milliseconds, INSAtoB_X-      1 bytes
84 milliseconds, INSAtoB_X-      3 bytes
110 milliseconds, INSAtoB_X-     55 bytes

112 milliseconds, INSAtoB_XZZE- 103 bytes
122 milliseconds, INSAtoB_X-     15 bytes
131 milliseconds, INSAtoB_XZZF- 103 bytes
132 milliseconds, INSAtoB_X-    103 bytes

139 milliseconds, INSAtoB_XZZE- 203 bytes
148 milliseconds, INSAtoB_XZZF- 203 bytes

190 milliseconds, INSAtoB_XZZF- 503 bytes
201 milliseconds, INSAtoB_X-    203 bytes
442 milliseconds, INSAtoB_X-    503 bytes
459 milliseconds, INSAtoB_XZZE- 503 bytes

776 milliseconds, INSAtoB_XZZF-1027 bytes
784 milliseconds, INSAtoB_XZZE-1027 bytes
795 milliseconds, INSAtoB_X-   1027 bytes
********** END III **********
Quote
INSERTING AT POSITION 93 -string length=100
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  3 milliseconds, INSAtoB_XZZE-   0 bytes
  7 milliseconds, INSAtoB_X-      0 bytes
  8 milliseconds, INSAtoB_XZZF-   0 bytes
12 milliseconds, INSAtoB_X-      1 bytes
13 milliseconds, INSAtoB_X-      2 bytes
15 milliseconds, INSAtoB_XZZE-   2 bytes
16 milliseconds, INSAtoB_XZZE-   3 bytes
17 milliseconds, INSAtoB_XZZE-   4 bytes
19 milliseconds, INSAtoB_XZZF-   4 bytes
22 milliseconds, INSAtoB_XZZF-   1 bytes
24 milliseconds, INSAtoB_XZZF-   2 bytes
24 milliseconds, INSAtoB_XZZF-   3 bytes
25 milliseconds, INSAtoB_X-      3 bytes
25 milliseconds, INSAtoB_XZZE-   1 bytes
27 milliseconds, INSAtoB_X-      4 bytes

35 milliseconds, INSAtoB_XZZE-  15 bytes
37 milliseconds, INSAtoB_XZZF-  55 bytes
40 milliseconds, INSAtoB_XZZF-  15 bytes
45 milliseconds, INSAtoB_X-     15 bytes
60 milliseconds, INSAtoB_XZZE-  55 bytes
67 milliseconds, INSAtoB_X-     55 bytes

72 milliseconds, INSAtoB_XZZF- 103 bytes
101 milliseconds, INSAtoB_X-    103 bytes
119 milliseconds, INSAtoB_XZZE- 103 bytes
146 milliseconds, INSAtoB_XZZF- 203 bytes
155 milliseconds, INSAtoB_X-    203 bytes
156 milliseconds, INSAtoB_XZZE- 503 bytes
163 milliseconds, INSAtoB_XZZE- 203 bytes

315 milliseconds, INSAtoB_XZZF-1027 bytes
398 milliseconds, INSAtoB_X-    503 bytes
417 milliseconds, INSAtoB_XZZF- 503 bytes
782 milliseconds, INSAtoB_XZZE-1027 bytes
876 milliseconds, INSAtoB_X-   1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on July 02, 2014, 08:07:01 PM
Hi Rui,

Quote from: RuiLoureiro on July 02, 2014, 03:26:08 AM
    Gunther, could you post your results, please ?
    Thank you  :t

No problem. Do you need the results of all 3 attachments? I can do it this evening.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 02, 2014, 09:06:03 PM
Hi Gunther,
                   yes for all 3, if you dont mind
                   Thanks.
Title: Re: Sorting strings
Post by: RuiLoureiro on July 03, 2014, 02:33:21 AM
In the following example, each procedure
moves 100 bytes (string length) to the right
to insert strings at position 0.

Strange result (I need to find out why; nothing seems to be wrong):
Quote
289 milliseconds, INSAtoB_XZZE-1027 bytes
492 milliseconds, INSAtoB_XZZE- 503 bytes   
INSAtoB_X   ->>> moves only BYTES

INSAtoB_BA and INSAtoB_BB use an auxiliary buffer

INSAtoB_BA   ->>> moves only BYTES
INSAtoB_BB   ->>> moves DWORDS+BYTES

It seems that using an auxiliary buffer is not a good idea.
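
The thread does not show how INSAtoB_BA and INSAtoB_BB use their auxiliary buffer, so the following C sketch is only one plausible reading: save the tail of B in a scratch buffer, write A, then copy the tail back. The name ins_with_aux and the AUX_SIZE limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { AUX_SIZE = 2048 };   /* hypothetical scratch buffer size */

/* Hypothetical sketch: insert a into b at pos through an auxiliary
   buffer. The caller must guarantee that b has enough room. */
size_t ins_with_aux(const char *a, char *b, size_t pos)
{
    char aux[AUX_SIZE];
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    size_t tail;

    assert(pos <= len_b);
    tail = len_b - pos + 1;             /* tail of b plus terminator */
    assert(tail <= AUX_SIZE);

    memcpy(aux, b + pos, tail);         /* 1: save the tail          */
    memcpy(b + pos, a, len_a);          /* 2: write a in its place   */
    memcpy(b + pos + len_a, aux, tail); /* 3: put the tail back      */

    return len_a + len_b;
}
```

Under this reading the tail is copied twice (out and back) where an in-place shift copies it once, which would be consistent with the BA/BB rows sitting near the bottom of the table below.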

Quote
INSERTING AT POSITION 0 -string length=100
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  39 milliseconds, INSAtoB_XZZE-   3 bytes
  41 milliseconds, INSAtoB_XZZE-   1 bytes
  55 milliseconds, INSAtoB_XZZF-   3 bytes
  62 milliseconds, INSAtoB_XZZF-  15 bytes
  62 milliseconds, INSAtoB_XZZF-   1 bytes
  74 milliseconds, INSAtoB_XZZE-  55 bytes
  84 milliseconds, INSAtoB_XZZE-  15 bytes
  86 milliseconds, INSAtoB_XZZE-   2 bytes
  95 milliseconds, INSAtoB_XZZE-   4 bytes
  96 milliseconds, INSAtoB_XZZF-  55 bytes
  96 milliseconds, INSAtoB_XZZF-   2 bytes
  96 milliseconds, INSAtoB_XZZF-   4 bytes
102 milliseconds, INSAtoB_X-      2 bytes
103 milliseconds, INSAtoB_X-      1 bytes
113 milliseconds, INSAtoB_XZZF- 103 bytes
119 milliseconds, INSAtoB_X-     15 bytes
121 milliseconds, INSAtoB_XZZE- 103 bytes
126 milliseconds, INSAtoB_X-      3 bytes
129 milliseconds, INSAtoB_X-      4 bytes
130 milliseconds, INSAtoB_BB-     2 bytes
146 milliseconds, INSAtoB_X-     55 bytes
161 milliseconds, INSAtoB_XZZF- 203 bytes
168 milliseconds, INSAtoB_BB-     1 bytes
173 milliseconds, INSAtoB_BB-     3 bytes
173 milliseconds, INSAtoB_X-    103 bytes
174 milliseconds, INSAtoB_BA-     1 bytes
178 milliseconds, INSAtoB_BA-     4 bytes
178 milliseconds, INSAtoB_BB-     4 bytes
183 milliseconds, INSAtoB_BA-     3 bytes
186 milliseconds, INSAtoB_BA-     2 bytes
197 milliseconds, INSAtoB_BB-    15 bytes
202 milliseconds, INSAtoB_BB-    55 bytes
204 milliseconds, INSAtoB_BA-    15 bytes
217 milliseconds, INSAtoB_XZZE- 203 bytes
259 milliseconds, INSAtoB_BA-    55 bytes
283 milliseconds, INSAtoB_X-    203 bytes

289 milliseconds, INSAtoB_XZZE-1027 bytes

320 milliseconds, INSAtoB_BA-   103 bytes
363 milliseconds, INSAtoB_BB-   103 bytes
417 milliseconds, INSAtoB_X-    503 bytes
448 milliseconds, INSAtoB_BA-   203 bytes
476 milliseconds, INSAtoB_BB-   203 bytes

492 milliseconds, INSAtoB_XZZE- 503 bytes
494 milliseconds, INSAtoB_XZZF- 503 bytes

754 milliseconds, INSAtoB_X-   1027 bytes
768 milliseconds, INSAtoB_XZZF-1027 bytes

841 milliseconds, INSAtoB_BA-   503 bytes
950 milliseconds, INSAtoB_BB-   503 bytes
1081 milliseconds, INSAtoB_BB-  1027 bytes
1489 milliseconds, INSAtoB_BA-  1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on July 03, 2014, 04:37:09 AM
Hi Rui,

here are the results. String position 7:

INSERTING AT POSITION 7 -string length=100
------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

0 milliseconds, INSAtoB_XZZF-   0 bytes
0 milliseconds, INSAtoB_X-    0 bytes
1 milliseconds, INSAtoB_XZZE-   0 bytes
13 milliseconds, INSAtoB_XZZF-   4 bytes
14 milliseconds, INSAtoB_XZZE-   1 bytes
14 milliseconds, INSAtoB_XZZF-   1 bytes
14 milliseconds, INSAtoB_XZZF-   2 bytes
15 milliseconds, INSAtoB_XZZE-   3 bytes
15 milliseconds, INSAtoB_XZZF-   3 bytes
17 milliseconds, INSAtoB_XZZF-  15 bytes
18 milliseconds, INSAtoB_XZZE-  15 bytes
18 milliseconds, INSAtoB_XZZE-   4 bytes
20 milliseconds, INSAtoB_XZZE-   2 bytes
23 milliseconds, INSAtoB_XZZF-  55 bytes
32 milliseconds, INSAtoB_XZZE- 103 bytes
32 milliseconds, INSAtoB_XZZE-  55 bytes
32 milliseconds, INSAtoB_XZZF- 103 bytes
53 milliseconds, INSAtoB_XZZF- 203 bytes
55 milliseconds, INSAtoB_X-    1 bytes
56 milliseconds, INSAtoB_XZZE- 203 bytes
57 milliseconds, INSAtoB_X-    3 bytes
62 milliseconds, INSAtoB_X-    2 bytes
63 milliseconds, INSAtoB_X-    4 bytes
64 milliseconds, INSAtoB_X-   15 bytes
84 milliseconds, INSAtoB_XZZE- 503 bytes
91 milliseconds, INSAtoB_XZZF- 503 bytes
92 milliseconds, INSAtoB_X-   55 bytes
112 milliseconds, INSAtoB_X-  103 bytes
160 milliseconds, INSAtoB_XZZF-1027 bytes
160 milliseconds, INSAtoB_XZZE-1027 bytes
171 milliseconds, INSAtoB_X-  203 bytes
319 milliseconds, INSAtoB_X-  503 bytes
601 milliseconds, INSAtoB_X- 1027 bytes
********** END III **********

String position 50:

INSERTING AT POSITION 50 -string length=100
------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

0 milliseconds, INSAtoB_XZZF-   0 bytes
0 milliseconds, INSAtoB_X-    0 bytes
1 milliseconds, INSAtoB_XZZE-   0 bytes
8 milliseconds, INSAtoB_XZZF-   4 bytes
9 milliseconds, INSAtoB_XZZF-   1 bytes
9 milliseconds, INSAtoB_XZZE-   1 bytes
10 milliseconds, INSAtoB_XZZF-   3 bytes
10 milliseconds, INSAtoB_XZZF-   2 bytes
10 milliseconds, INSAtoB_XZZE-   3 bytes
11 milliseconds, INSAtoB_XZZE-   4 bytes
12 milliseconds, INSAtoB_XZZE-   2 bytes
12 milliseconds, INSAtoB_XZZF-  15 bytes
13 milliseconds, INSAtoB_XZZE-  15 bytes
18 milliseconds, INSAtoB_XZZF-  55 bytes
21 milliseconds, INSAtoB_XZZE-  55 bytes
22 milliseconds, INSAtoB_XZZE- 103 bytes
28 milliseconds, INSAtoB_XZZF- 103 bytes
33 milliseconds, INSAtoB_X-    1 bytes
34 milliseconds, INSAtoB_X-    3 bytes
40 milliseconds, INSAtoB_X-    2 bytes
41 milliseconds, INSAtoB_X-    4 bytes
42 milliseconds, INSAtoB_X-   15 bytes
48 milliseconds, INSAtoB_XZZF- 203 bytes
51 milliseconds, INSAtoB_XZZE- 203 bytes
71 milliseconds, INSAtoB_X-   55 bytes
79 milliseconds, INSAtoB_XZZE- 503 bytes
86 milliseconds, INSAtoB_XZZF- 503 bytes
91 milliseconds, INSAtoB_X-  103 bytes
150 milliseconds, INSAtoB_X-  203 bytes
155 milliseconds, INSAtoB_XZZF-1027 bytes
158 milliseconds, INSAtoB_XZZE-1027 bytes
299 milliseconds, INSAtoB_X-  503 bytes
579 milliseconds, INSAtoB_X- 1027 bytes
********** END III **********

String position 93:

INSERTING AT POSITION 93 -string length=100
------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------
***** Time table *****

1 milliseconds, INSAtoB_XZZF-   0 bytes
1 milliseconds, INSAtoB_XZZE-   0 bytes
1 milliseconds, INSAtoB_X-    0 bytes
3 milliseconds, INSAtoB_X-    1 bytes
4 milliseconds, INSAtoB_XZZF-   1 bytes
4 milliseconds, INSAtoB_XZZF-   2 bytes
4 milliseconds, INSAtoB_XZZE-   1 bytes
5 milliseconds, INSAtoB_XZZE-   2 bytes
5 milliseconds, INSAtoB_XZZF-   3 bytes
5 milliseconds, INSAtoB_XZZF-   4 bytes
5 milliseconds, INSAtoB_X-    3 bytes
5 milliseconds, INSAtoB_XZZE-   4 bytes
5 milliseconds, INSAtoB_XZZE-   3 bytes
7 milliseconds, INSAtoB_XZZE-  15 bytes
7 milliseconds, INSAtoB_X-    2 bytes
7 milliseconds, INSAtoB_XZZF-  15 bytes
7 milliseconds, INSAtoB_X-    4 bytes
13 milliseconds, INSAtoB_X-   15 bytes
13 milliseconds, INSAtoB_XZZF-  55 bytes
14 milliseconds, INSAtoB_XZZE-  55 bytes
17 milliseconds, INSAtoB_XZZE- 103 bytes
23 milliseconds, INSAtoB_XZZF- 103 bytes
38 milliseconds, INSAtoB_X-   55 bytes
42 milliseconds, INSAtoB_XZZE- 203 bytes
42 milliseconds, INSAtoB_XZZF- 203 bytes
62 milliseconds, INSAtoB_X-  103 bytes
74 milliseconds, INSAtoB_XZZE- 503 bytes
81 milliseconds, INSAtoB_XZZF- 503 bytes
116 milliseconds, INSAtoB_X-  203 bytes
149 milliseconds, INSAtoB_XZZE-1027 bytes
149 milliseconds, INSAtoB_XZZF-1027 bytes
271 milliseconds, INSAtoB_X-  503 bytes
545 milliseconds, INSAtoB_X- 1027 bytes
********** END III **********


I hope that helps.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 03, 2014, 05:43:03 AM
Thank you so much, Gunther  :t

XZZE or XZZF are good options

18 milliseconds to insert 55 bytes in the middle
seems to be good.

----------------------------
Results from Gunther
----------------------------
Quote
INSERTING AT POSITION 7 -string length=100
-------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
-------------------------------------------------------------
***** Time table *****

17 milliseconds, INSAtoB_XZZF-  15 bytes
18 milliseconds, INSAtoB_XZZE-  15 bytes
23 milliseconds, INSAtoB_XZZF-  55 bytes
32 milliseconds, INSAtoB_XZZE- 103 bytes
32 milliseconds, INSAtoB_XZZE-  55 bytes
32 milliseconds, INSAtoB_XZZF- 103 bytes
53 milliseconds, INSAtoB_XZZF- 203 bytes
56 milliseconds, INSAtoB_XZZE- 203 bytes
64 milliseconds, INSAtoB_X-     15 bytes
84 milliseconds, INSAtoB_XZZE- 503 bytes
91 milliseconds, INSAtoB_XZZF- 503 bytes
92 milliseconds, INSAtoB_X-     55 bytes
112 milliseconds, INSAtoB_X-    103 bytes
160 milliseconds, INSAtoB_XZZF-1027 bytes
160 milliseconds, INSAtoB_XZZE-1027 bytes
171 milliseconds, INSAtoB_X-    203 bytes
319 milliseconds, INSAtoB_X-    503 bytes
601 milliseconds, INSAtoB_X-   1027 bytes
********** END III **********
Quote
INSERTING AT POSITION 50 -string length=100
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

12 milliseconds, INSAtoB_XZZF-  15 bytes
13 milliseconds, INSAtoB_XZZE-  15 bytes
18 milliseconds, INSAtoB_XZZF-  55 bytes
21 milliseconds, INSAtoB_XZZE-  55 bytes
22 milliseconds, INSAtoB_XZZE- 103 bytes
28 milliseconds, INSAtoB_XZZF- 103 bytes
42 milliseconds, INSAtoB_X-     15 bytes
48 milliseconds, INSAtoB_XZZF- 203 bytes
51 milliseconds, INSAtoB_XZZE- 203 bytes
71 milliseconds, INSAtoB_X-     55 bytes
79 milliseconds, INSAtoB_XZZE- 503 bytes
86 milliseconds, INSAtoB_XZZF- 503 bytes
91 milliseconds, INSAtoB_X-    103 bytes
150 milliseconds, INSAtoB_X-    203 bytes
155 milliseconds, INSAtoB_XZZF-1027 bytes  <<<<-----
158 milliseconds, INSAtoB_XZZE-1027 bytes
299 milliseconds, INSAtoB_X-    503 bytes
579 milliseconds, INSAtoB_X-   1027 bytes
********** END III **********
Quote
INSERTING AT POSITION 93 -string length=100
-------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
-------------------------------------------------------------
***** Time table *****

  7 milliseconds, INSAtoB_XZZE-  15 bytes
  7 milliseconds, INSAtoB_XZZF-  15 bytes
13 milliseconds, INSAtoB_X-     15 bytes
13 milliseconds, INSAtoB_XZZF-  55 bytes
14 milliseconds, INSAtoB_XZZE-  55 bytes
17 milliseconds, INSAtoB_XZZE- 103 bytes
23 milliseconds, INSAtoB_XZZF- 103 bytes
38 milliseconds, INSAtoB_X-     55 bytes
42 milliseconds, INSAtoB_XZZE- 203 bytes
42 milliseconds, INSAtoB_XZZF- 203 bytes
62 milliseconds, INSAtoB_X-    103 bytes
74 milliseconds, INSAtoB_XZZE- 503 bytes
81 milliseconds, INSAtoB_XZZF- 503 bytes
116 milliseconds, INSAtoB_X-    203 bytes
149 milliseconds, INSAtoB_XZZE-1027 bytes
149 milliseconds, INSAtoB_XZZF-1027 bytes
271 milliseconds, INSAtoB_X-    503 bytes
545 milliseconds, INSAtoB_X-   1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on July 03, 2014, 06:12:53 AM
Hi Rui,

Quote from: RuiLoureiro on July 03, 2014, 05:43:03 AM
Thank you so much, Gunther  :t

You're welcome.

Quote from: RuiLoureiro on July 03, 2014, 05:43:03 AM
XZZE or XZZF are good options

18 milliseconds to insert 55 bytes in the middle
seems to be good.

Yes, indeed. Not so bad.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 03, 2014, 07:17:21 AM
Hi Gunther  :t
              In my reply #71 we see that we need 137 milliseconds to MOV or ADD 1027 bytes. Now, to insert 1027 bytes at position 50, which means moving 50 bytes first, we need 155 milliseconds. That seems to be good. Of course, that is on your powerful i7!
Note: "moving bytes" means DWORDS+BYTES. If we need to move 15 bytes, we move 3 DWORDS + 3 BYTES.
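
The DWORDS+BYTES split can be sketched in C as follows. This is an illustration of the arithmetic only, not the code of any XZZ procedure; the name copy_dwords_bytes and the two output parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy n bytes as (n / 4) DWORDs followed by
   (n % 4) single bytes, and report how many of each were moved. */
void copy_dwords_bytes(void *dst, const void *src, size_t n,
                       size_t *dwords_out, size_t *bytes_out)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n >> 2;   /* n / 4 */
    size_t bytes  = n & 3;    /* n % 4 */
    size_t i;

    assert(n == 0 || (dst != NULL && src != NULL));

    for (i = 0; i < dwords; i++) {
        uint32_t w;
        memcpy(&w, s, 4);     /* load one DWORD, alignment-safe */
        memcpy(d, &w, 4);     /* store one DWORD */
        s += 4;
        d += 4;
    }
    for (i = 0; i < bytes; i++)
        *d++ = *s++;          /* leftover single bytes */

    if (dwords_out) *dwords_out = dwords;
    if (bytes_out)  *bytes_out  = bytes;
}
```

For n = 15 this gives 15 >> 2 = 3 DWORDs and 15 & 3 = 3 bytes, matching the example in the note.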
Title: Re: Sorting strings
Post by: guga on July 03, 2014, 12:15:48 PM
Hi Rui
here is a function I made that you could adapt to your needs or even improve. It copies 128 bits at a time.

It works for strings or any other data set. I haven't fully tested it, but so far it is very fast.


; A replacement for memcpy inside msvcrt.dll. Fast memory copy of large buffers. It copies 128 bits (4 dwords) at a time.
Proc memcpy:
    Arguments @pDest, @pSource, @Length
    Uses esi, edi, ecx, edx, eax

    mov edi D@pDest
    mov esi D@pSource

    ; we copy the memory in blocks of 128 bits (16 bytes)
    mov ecx D@Length
    mov eax ecx | shr ecx 4 ; block count: Length divided by 16 (4 dwords)
    mov edx ecx | shl edx 4 | sub eax edx ; remainder: Length - 16*blocks

    xor edx edx ; here it is used as an index
    While ecx <> 0
        movupd XMM1 X$esi+edx*8 ; copy 4 dwords from esi+edx*8 to register XMM1
        movupd X$edi+edx*8 XMM1 ; copy 4 dwords from register XMM1 to edi+edx*8
        add edx 2 ; each block is 16 bytes, but the largest scale factor is 8, so the index advances by 2 per block instead of 1
                  ; when edx = 0, edx*8 = 0: X$esi+edx*8 points to esi+0 bytes
                  ; when edx = 2, edx*8 = 16: X$esi+edx*8 points to esi+16 bytes
                  ; when edx = 4, edx*8 = 32: X$esi+edx*8 points to esi+32 bytes
                  ; What matters is that after each pass the source and destination addresses are 16 bytes further on.
        dec ecx   ; ecx is our counter. It starts at Length/16 because each pass copies 4 dwords, so the loop needs 16 times fewer passes than a byte-by-byte copy.
    End_While
    emms ; clear the MMX state so the registers can be used by the FPU again
    shl edx 3 ; multiply edx by 8 to get the byte offset reached
    add edi edx
    add esi edx

    ; If the length is not a multiple of 16 bytes, some bytes remain. Copy them one by one.
    mov ecx eax
    While ecx <> 0
        movsb
        dec ecx
    End_While

EndP
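
For readers who do not use RosAsm syntax, the same idea (16-byte blocks, then the leftover bytes) can be sketched in C. This is an illustration, not a translation checked against the routine above. The name copy16 is hypothetical, and the SSE2 path uses the integer unaligned load/store intrinsics rather than movupd; on targets without SSE2 it falls back to a plain 16-byte memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical sketch: copy len bytes in 16-byte blocks, then copy
   the remaining (len mod 16) bytes one at a time. Regions must not
   overlap. Returns dst, like memcpy. */
void *copy16(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t blocks = len >> 4;   /* len / 16  */
    size_t rest   = len & 15;   /* len mod 16 */

    assert(len == 0 || (dst != NULL && src != NULL));

    while (blocks--) {
#if defined(__SSE2__)
        __m128i x = _mm_loadu_si128((const __m128i *)s); /* unaligned load  */
        _mm_storeu_si128((__m128i *)d, x);               /* unaligned store */
#else
        memcpy(d, s, 16);       /* portable fallback */
#endif
        s += 16;
        d += 16;
    }
    while (rest--)
        *d++ = *s++;            /* tail bytes */

    return dst;
}
```

For example, a length of 1027 bytes gives 64 blocks of 16 bytes plus 3 tail bytes.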
Title: Re: Sorting strings
Post by: jj2007 on July 03, 2014, 01:22:19 PM
Hi Gustavo,

The emms must be a leftover from previous MMX versions ;-)

BTW memcpy has been beaten to death here (http://www.masmforum.com/board/index.php?topic=11454.msg87609#msg87609).
Title: Re: Sorting strings
Post by: guga on July 03, 2014, 02:33:18 PM
Hi JJ, tks

I searched the link, but the file was deleted from the old forum. Do you have the source for the memcpy so i can test ?

By the way, is the memcpy from lingo? He usually codes well for speed.
Title: Re: Sorting strings
Post by: guga on July 03, 2014, 03:20:37 PM
I found one in my old copy of the 2012 masm forum


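OPTION PROLOGUE:NONE        ; no automatic stack frame: the arguments are read via esp
OPTION EPILOGUE:NONE

regcopy proc src:DWORD,dst:DWORD,ln:DWORD

    push ebx
    push esi
    push edi
    push ebp

    mov esi, [esp+4+16]     ; src (the 4 saved registers take 16 bytes)
    mov edi, [esp+8+16]     ; dst
    mov ecx, [esp+12+16]    ; ln

    shr ecx, 4              ; div by 16: number of 16-byte blocks
    jz tail

  @@:
    mov eax, [esi]          ; load 16 bytes into four registers
    mov ebx, [esi+4]
    mov edx, [esi+8]
    mov ebp, [esi+12]
    mov [edi], eax          ; store them
    mov [edi+4], ebx
    mov [edi+8], edx
    mov [edi+12], ebp
    add esi, 16
    add edi, 16
    sub ecx, 1
    jnz @B                  ; loop exactly ln/16 times (jns would copy one extra block)

  tail:
    mov ecx, [esp+12+16]    ; ln
    and ecx, 15             ; remaining 0..15 bytes (ln mod 16)
    jz quit

  @@:
    movzx eax, BYTE PTR [esi]
    mov [edi], al
    add esi, 1
    add edi, 1
    sub ecx, 1
    jnz @B

  quit:
    pop ebp
    pop edi
    pop esi
    pop ebx

    ret 12

regcopy endp

OPTION PROLOGUE:PrologueDef ; restore the default prologue/epilogue
OPTION EPILOGUE:EpilogueDef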


But if this is the one, in my tests the function I wrote is about twice as fast. Or am I testing the wrong function?
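
When comparing routines like these it helps to check the loop bookkeeping separately from the timing, since a routine that copies the wrong number of bytes cannot be compared fairly. The Python sketch below is a model under assumptions (the helper names `iterations` and `tail_bytes` are hypothetical): it counts how often a loop ending in `sub ecx, 1` followed by `jnz` or `jns` runs, and shows what different `and ecx, mask` values leave for the byte-wise tail.

```python
def iterations(count: int, branch: str) -> int:
    """Count passes through a do-while loop that ends with 'sub ecx, 1'.

    branch == "jnz": loop again while ecx != 0
    branch == "jns": loop again while ecx >= 0 (sign flag clear)
    Assumes count >= 1, because the loop is only entered for a non-zero count.
    """
    ecx = count
    passes = 0
    while True:
        passes += 1
        ecx -= 1
        if branch == "jnz" and ecx == 0:
            break
        if branch == "jns" and ecx < 0:
            break
    return passes


def tail_bytes(length: int, mask: int) -> int:
    """Bytes left for the byte-wise tail when 'and ecx, mask' is used."""
    return length & mask
```

With a block count of 6, a `jnz` ending gives 6 passes and a `jns` ending gives 7, i.e. one extra 16-byte block. For a length of 17, the mask 15 leaves 1 tail byte, while the mask 4 leaves none.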
Title: Re: Sorting strings
Post by: jj2007 on July 03, 2014, 04:56:53 PM
Quote from: guga on July 03, 2014, 02:33:18 PM
I searched the link, but the file was deleted from the old forum. Do you have the source for the memcpy so i can test ?

Here it is. (http://masm32.com/board/index.php?topic=1971.msg20617#msg20617)
Title: Re: Sorting strings
Post by: nidud on July 03, 2014, 11:00:03 PM
deleted
Title: Re: Sorting strings
Post by: guga on July 04, 2014, 01:07:03 AM
Thanks JJ. I forgot I had already tested it :)

I'll take a look and test it in the new BenchMark Template app
Title: Re: Sorting strings
Post by: RuiLoureiro on July 04, 2014, 03:17:26 AM
guga, I'll try it as soon as I can  :t
Title: Re: Sorting strings
Post by: RuiLoureiro on July 04, 2014, 03:52:27 AM
Hi
In the following example, each procedure
moves 100 bytes (string length) to the right
to insert strings at position 0.

This is my first version with SSE instructions

INSAtoB_SSEE uses «movups  xmm0,[...]»

    Gunther, could you run it, please ?
    Thanks

Quote
INSERTING AT POSITION 0 -string length=100
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  37 milliseconds, INSAtoB_SSEE-   3 bytes
  38 milliseconds, INSAtoB_XZZE-   1 bytes
  40 milliseconds, INSAtoB_XZZE-   3 bytes
  55 milliseconds, INSAtoB_XZZF-   3 bytes
  58 milliseconds, INSAtoB_SSEE-   1 bytes
  61 milliseconds, INSAtoB_XZZF-   1 bytes
  62 milliseconds, INSAtoB_XZZF-  15 bytes

  66 milliseconds, INSAtoB_XZZE-  55 bytes
  73 milliseconds, INSAtoB_SSEE-  55 bytes
 
  73 milliseconds, INSAtoB_SSEE-   2 bytes
  73 milliseconds, INSAtoB_SSEE-  15 bytes
  81 milliseconds, INSAtoB_SSEE-   4 bytes
  83 milliseconds, INSAtoB_XZZE-  15 bytes
  84 milliseconds, INSAtoB_XZZE-   2 bytes
  94 milliseconds, INSAtoB_XZZE-   4 bytes
  96 milliseconds, INSAtoB_XZZF-  55 bytes
  96 milliseconds, INSAtoB_XZZF-   2 bytes
  96 milliseconds, INSAtoB_XZZF-   4 bytes
103 milliseconds, INSAtoB_X-      4 bytes
104 milliseconds, INSAtoB_X-      1 bytes
105 milliseconds, INSAtoB_X-      2 bytes
105 milliseconds, INSAtoB_X-      3 bytes
118 milliseconds, INSAtoB_X-     15 bytes

120 milliseconds, INSAtoB_SSEE- 103 bytes
121 milliseconds, INSAtoB_XZZE- 103 bytes
123 milliseconds, INSAtoB_XZZF- 103 bytes

144 milliseconds, INSAtoB_X-     55 bytes
151 milliseconds, INSAtoB_BB-     2 bytes
158 milliseconds, INSAtoB_XZZF- 203 bytes
174 milliseconds, INSAtoB_BA-     1 bytes
175 milliseconds, INSAtoB_BB-     1 bytes
182 milliseconds, INSAtoB_BA-     3 bytes
185 milliseconds, INSAtoB_BB-     3 bytes
185 milliseconds, INSAtoB_X-    103 bytes
186 milliseconds, INSAtoB_BA-     4 bytes
189 milliseconds, INSAtoB_BA-     2 bytes
193 milliseconds, INSAtoB_BA-    15 bytes
196 milliseconds, INSAtoB_SSEE- 203 bytes
211 milliseconds, INSAtoB_BB-     4 bytes
214 milliseconds, INSAtoB_XZZE- 203 bytes
223 milliseconds, INSAtoB_BB-    55 bytes
228 milliseconds, INSAtoB_BB-    15 bytes
238 milliseconds, INSAtoB_X-    203 bytes
256 milliseconds, INSAtoB_BA-    55 bytes
306 milliseconds, INSAtoB_XZZE-1027 bytes
342 milliseconds, INSAtoB_BA-   103 bytes

389 milliseconds, INSAtoB_SSEE- 503 bytes
397 milliseconds, INSAtoB_BB-   103 bytes
415 milliseconds, INSAtoB_X-    503 bytes
445 milliseconds, INSAtoB_BA-   203 bytes
506 milliseconds, INSAtoB_BB-   203 bytes

509 milliseconds, INSAtoB_XZZF- 503 bytes
518 milliseconds, INSAtoB_XZZE- 503 bytes
602 milliseconds, INSAtoB_SSEE-1027 bytes
753 milliseconds, INSAtoB_X-   1027 bytes
779 milliseconds, INSAtoB_XZZF-1027 bytes
822 milliseconds, INSAtoB_BA-   503 bytes
1000 milliseconds, INSAtoB_BB-   503 bytes
1073 milliseconds, INSAtoB_BB-  1027 bytes
1485 milliseconds, INSAtoB_BA-  1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: guga on July 04, 2014, 04:00:40 AM
Hi Rui.

Ok, many thanks. Here is an updated version that is a bit faster (for large data, of course)




Proc memcpy_SSE_V2:
    Arguments @pDest, @pSource, @Length
    Uses esi, edi, ecx, edx, eax

    mov edi D@pDest
    mov esi D@pSource
    ; we copy the memory 128 bits (16 bytes) at a time
    mov ecx D@Length
    mov eax ecx | shr ecx 4 ; integer count. Divide by 16 (4 dwords)
    jz L0> ; The memory size is smaller than 16 bytes. Jump over

        ; Now we must compute the remainder, i.e. the bytes left over after the 16-byte blocks
        mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes
        mov edx 0 ; here it is used as an index
        L1:
            movupd XMM1 X$esi+edx*8 ; copy the 1st 4 dwords from esi to register XMM
            movupd X$edi+edx*8 XMM1 ; copy the 1st 4 dwords from register XMM to edi
            dec ecx
            lea edx D$edx+2
            jnz L1<
        ; emms ; clear the registers for FPU use <--- Removed, thanks to JJ. Old MMX instruction, not needed here
        test eax eax | jz L4> ; No remainders ? Exit
        jmp L9> ; jmp to the remainder computation

L0:
   ; If we are here, it means that the data is smaller than 16 bytes, and we need to compute the remainder.
   mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes

L2:

    ; If the length is not a multiple of 16 bytes (4 dwords) some bytes remain, so copy them one by one.
    test eax eax | jz L4>  ; No remainders ? Exit
L9:
        lea edi D$edi+edx*8 ; mul edx by 8 to get the pos
        ; mov eax eax ; fix potential stallings <--- Not needed. There is no stall.
        lea esi D$esi+edx*8 ; mul edx by 8 to get the pos

L3:  movsb | dec eax | jnz L3<

L4:

EndP
Title: Re: Sorting strings
Post by: Gunther on July 04, 2014, 04:07:52 AM
Hi Rui,

Quote from: RuiLoureiro on July 04, 2014, 03:52:27 AM
    Gunther, could you run it, please ?
    Thanks

Yes, of course. The results are in the attachment.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 04, 2014, 05:45:57 AM
Thank you, Gunther  :t

Here SSE is far better
in all cases

----------------------------
Results from Gunther
----------------------------

Quote
INSERTING AT POSITION 0 -string length=100
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

   7 milliseconds, INSAtoB_SSEE-   2 bytes
   8 milliseconds, INSAtoB_SSEE-   4 bytes
   8 milliseconds, INSAtoB_SSEE-   1 bytes
   9 milliseconds, INSAtoB_SSEE-  15 bytes
  10 milliseconds, INSAtoB_SSEE-   3 bytes
  13 milliseconds, INSAtoB_SSEE-  55 bytes

  14 milliseconds, INSAtoB_XZZE-   4 bytes
  15 milliseconds, INSAtoB_SSEE- 103 bytes
  15 milliseconds, INSAtoB_XZZE-   2 bytes
  20 milliseconds, INSAtoB_XZZF-   4 bytes
  21 milliseconds, INSAtoB_XZZF-   1 bytes
  21 milliseconds, INSAtoB_SSEE- 203 bytes
  21 milliseconds, INSAtoB_XZZF-   2 bytes
  21 milliseconds, INSAtoB_XZZE-   1 bytes
  22 milliseconds, INSAtoB_XZZE-   3 bytes
  22 milliseconds, INSAtoB_XZZE-  55 bytes
  22 milliseconds, INSAtoB_XZZF-   3 bytes
  24 milliseconds, INSAtoB_XZZF-  15 bytes
  25 milliseconds, INSAtoB_XZZE-  15 bytes
  29 milliseconds, INSAtoB_XZZF-  55 bytes
  34 milliseconds, INSAtoB_SSEE- 503 bytes
  34 milliseconds, INSAtoB_XZZE- 103 bytes
  35 milliseconds, INSAtoB_XZZF- 103 bytes
  44 milliseconds, INSAtoB_SSEE-1027 bytes

  45 milliseconds, INSAtoB_XZZE- 203 bytes
  52 milliseconds, INSAtoB_XZZF- 203 bytes
  59 milliseconds, INSAtoB_X-      3 bytes
  65 milliseconds, INSAtoB_X-      2 bytes
  66 milliseconds, INSAtoB_X-      4 bytes
  67 milliseconds, INSAtoB_X-     15 bytes
  72 milliseconds, INSAtoB_BB-     1 bytes
  72 milliseconds, INSAtoB_X-      1 bytes
  74 milliseconds, INSAtoB_BB-     3 bytes
  83 milliseconds, INSAtoB_BB-    15 bytes
  86 milliseconds, INSAtoB_BB-     2 bytes
  88 milliseconds, INSAtoB_BB-     4 bytes
  90 milliseconds, INSAtoB_XZZE- 503 bytes
  91 milliseconds, INSAtoB_XZZF- 503 bytes
  98 milliseconds, INSAtoB_X-     55 bytes
115 milliseconds, INSAtoB_X-    103 bytes
121 milliseconds, INSAtoB_BA-     1 bytes
123 milliseconds, INSAtoB_BA-     3 bytes
123 milliseconds, INSAtoB_BA-     2 bytes
125 milliseconds, INSAtoB_BA-     4 bytes
138 milliseconds, INSAtoB_BB-    55 bytes
139 milliseconds, INSAtoB_BA-    15 bytes
146 milliseconds, INSAtoB_BB-   103 bytes
151 milliseconds, INSAtoB_XZZE-1027 bytes
159 milliseconds, INSAtoB_XZZF-1027 bytes
174 milliseconds, INSAtoB_X-    203 bytes
190 milliseconds, INSAtoB_BA-    55 bytes
230 milliseconds, INSAtoB_BA-   103 bytes
234 milliseconds, INSAtoB_BB-   203 bytes
321 milliseconds, INSAtoB_X-    503 bytes
343 milliseconds, INSAtoB_BA-   203 bytes
404 milliseconds, INSAtoB_BB-   503 bytes
598 milliseconds, INSAtoB_X-   1027 bytes
642 milliseconds, INSAtoB_BA-   503 bytes
765 milliseconds, INSAtoB_BB-  1027 bytes
1192 milliseconds, INSAtoB_BA-  1027 bytes
********** END III **********
Title: Re: Sorting strings
Post by: RuiLoureiro on July 04, 2014, 06:15:32 AM
Hi
        What is «movupd» ?

        Where can i see these instructions (a good reference)

        I am not able to assemble this:
        movups      xmm1, [esi+edx*8]

        I GET: instruction or register not accepted in current CPU mode
Title: Re: Sorting strings
Post by: dedndave on July 04, 2014, 06:23:17 AM
http://x86.renejeschke.de/html/file_module_x86_id_207.html (http://x86.renejeschke.de/html/file_module_x86_id_207.html)

http://x86.renejeschke.de/ (http://x86.renejeschke.de/)
Title: Re: Sorting strings
Post by: qWord on July 04, 2014, 06:38:53 AM
Quote from: RuiLoureiro on July 04, 2014, 06:15:32 AM
Where can i see these instructions (a good reference)

        I am not able to assemble this:
Have you never heard of Intel's and AMD's developer manuals?

Quote from: RuiLoureiro on July 04, 2014, 06:15:32 AM
        movups      xmm1, [esi+edx*8]

        I GET: instruction or register not accepted in current CPU mode
MOVUPD is an SSE2 instruction; SSE2 has been supported since MASM v6.15 (6.14 supports only SSE). I would also recommend using MOVDQU instead, because you are not dealing with FP data.
Title: Re: Sorting strings
Post by: RuiLoureiro on July 04, 2014, 07:15:41 AM
qWord,
             «Have you never heard of Intel's and AMD's developer manuals?»
              No, no one knows them ! :P
              I want to use movupd because guga used it.
              Thanks.
Dave,
              Thanks  :t
Title: Re: Sorting strings
Post by: guga on July 04, 2014, 07:59:13 AM
Great idea, qWord. Thanks for the reminder  :t.

It is always worth keeping compatibility with older CPUs, but I don't recall an unaligned move instruction in SSE, except movups, which is meant for floating-point data. Does SSE1 have an instruction similar to movdqu (SSE2)?

Rui, you may use movdqu instead. It won't change the performance at all (if I recall Agner Fog's recommendations correctly).

Quote from: Agner Fog's Optimization Manual
"Using unaligned read instructions
The instructions MOVDQU, MOVUPS, MOVUPD and LDDQU are all able to read unaligned vectors.
LDDQU is faster than the alternatives on P4E and PM processors, but not on any later processors. The unaligned read instructions are relatively slow on older Intel processors
and on Intel Atom, but fast on Nehalem and later Intel processors as well as on AMD and VIA processors.
; Example 13.10. Unaligned vector read
; esi contains pointer to unaligned array
movdqu xmm0, [esi] ; Read vector unaligned
On contemporary processors, there is no penalty for using the unaligned instruction MOVDQU rather than the aligned MOVDQA if the data are in fact aligned. Therefore, it is convenient to
use MOVDQU if you are not sure whether the data are aligned or not."

Ref: http://www.agner.org/optimize/blog/read.php?i=285

So, you can try using the following code:



Proc memcpy_SSE_V3:
    Arguments @pDest, @pSource, @Length
    Uses esi, edi, ecx, edx, eax

    mov edi D@pDest
    mov esi D@pSource
    ; we copy the memory 128 bits (16 bytes) at a time
    mov ecx D@Length
    mov eax ecx | shr ecx 4 ; integer count. Divide by 16 (4 dwords)
    jz L0> ; The memory size is smaller than 16 bytes. Jump over

        ; Now we must compute the remainder, i.e. the bytes left over after the 16-byte blocks
        mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes
        mov edx 0 ; here it is used as an index
        L1:
            movdqu XMM1 X$esi+edx*8 ; copy the 1st 4 dwords from esi to register XMM
            movdqu X$edi+edx*8 XMM1 ; copy the 1st 4 dwords from register XMM to edi
            dec ecx
            lea edx D$edx+2
            jnz L1<
        test eax eax | jz L4> ; No remainders ? Exit
        jmp L9> ; jmp to the remainder computation

L0:
   ; If we are here, it means that the data is smaller than 16 bytes, and we need to compute the remainder.
   mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes

L2:

    ; If the length is not a multiple of 16 bytes (4 dwords) some bytes remain, so copy them one by one.
    test eax eax | jz L4>  ; No remainders ? Exit
L9:
        lea edi D$edi+edx*8 ; mul edx by 8 to get the pos
        lea esi D$esi+edx*8 ; mul edx by 8 to get the pos

L3:  movsb | dec eax | jnz L3<

L4:

EndP


Title: Re: Sorting strings
Post by: guga on July 04, 2014, 08:49:49 AM
Another version, which stores with movups (SSE1), seems a bit faster here (about 3%). The store is still a floating-point instruction, though, and the lddqu load needs SSE3 (I couldn't find an alternative for movups yet :( )

Try this....


Proc memcpy_SSE_V4:
    Arguments @pDest, @pSource, @Length
    Uses esi, edi, ecx, edx, eax

    mov edi D@pDest
    mov esi D@pSource
    ; we copy the memory 128 bits (16 bytes) at a time
    mov ecx D@Length
    mov eax ecx | shr ecx 4 ; integer count. Divide by 16 (4 dwords)
    jz L0> ; The memory size is smaller than 16 bytes. Jump over

        ; Now we must compute the remainder, i.e. the bytes left over after the 16-byte blocks
        mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes
        mov edx 0 ; here it is used as an index
        L1:
            lddqu XMM1 X$esi+edx*8 ; copy the 1st 4 dwords from esi to register XMM
            movups X$edi+edx*8 XMM1 ; copy the 1st 4 dwords from register XMM to edi
            dec ecx
            lea edx D$edx+2
            jnz L1<
        test eax eax | jz L4> ; No remainders ? Exit
        jmp L9> ; jmp to the remainder computation

L0:
   ; If we are here, it means that the data is smaller than 16 bytes, and we need to compute the remainder.
   mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes

L2:

    ; If the length is not a multiple of 16 bytes (4 dwords) some bytes remain, so copy them one by one.
    test eax eax | jz L4>  ; No remainders ? Exit
L9:
        lea edi D$edi+edx*8 ; mul edx by 8 to get the pos
        lea esi D$esi+edx*8 ; mul edx by 8 to get the pos

L3:  movsb | dec eax | jnz L3<

L4:

EndP
Title: Re: Sorting strings
Post by: nidud on July 04, 2014, 09:04:28 AM
deleted
Title: Re: Sorting strings
Post by: RuiLoureiro on July 04, 2014, 07:57:32 PM
Hi guga,
               The problem is this:

               I am not able to assemble this instruction
               movups      xmm1, [esi+edx*8]               
               I GET: instruction or register not accepted in current CPU mode
Title: Re: Sorting strings
Post by: guga on July 05, 2014, 01:19:56 AM
I don't remember the MASM syntax completely, but you can try these initializers

http://masm32.com/board/index.php?topic=1217.0
and
http://www.masmforum.com/board/index.php?PHPSESSID=786dd40408172108b65a5a36b09c88c0&topic=15872.0


;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        .686p
        .MMX
        .XMM
        INCLUDE    \Masm32\Macros\Timers.asm
        .List

;###############################################################################################


And, of course, use MASM v6.15
Title: Re: Sorting strings
Post by: guga on July 05, 2014, 02:02:05 AM
Ok, found it...It assembled correctly as JJ suggested here
http://masm32.com/board/index.php?topic=3369.0;topicseen


Rui, try to assemble this


include \masm32\include\masm32rt.inc
.686
.xmm

.code

start:   MsgBox 0, "Hello World", "Hi", MB_OK
   exit
   movups      xmm0, [esi]
end start


If you succeed, you will be able to assemble the timing app with the SSE instructions
Title: Re: Sorting strings
Post by: dedndave on July 05, 2014, 03:12:18 AM
prescott w/htt XP SP3
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
44875      cycles - a   1..256  (  0) crt_memcpy
37179      cycles - a   1..256  ( 89) regcopy
76425      cycles - a   1..256  ( 48) memcpy SSE
74412      cycles - a   1..256  (171) memcpyxmmU SSE
73665      cycles - a   1..256  ( 91) memcpy_SSE_V2

53738      cycles - u   1..256  (  0) crt_memcpy
70189      cycles - u   1..256  ( 89) regcopy
110211     cycles - u   1..256  ( 48) memcpy SSE
97812      cycles - u   1..256  (171) memcpyxmmU SSE
80674      cycles - u   1..256  ( 91) memcpy_SSE_V2

904814     cycles - a 400..2000 (  0) crt_memcpy
1144614    cycles - a 400..2000 ( 89) regcopy
1813848    cycles - a 400..2000 ( 48) memcpy SSE
1651549    cycles - a 400..2000 (171) memcpyxmmU SSE
1723459    cycles - a 400..2000 ( 91) memcpy_SSE_V2

1887691    cycles - u 400..2000 (  0) crt_memcpy
4090627    cycles - u 400..2000 ( 89) regcopy
4085473    cycles - u 400..2000 ( 48) memcpy SSE
4061232    cycles - u 400..2000 (171) memcpyxmmU SSE
3912585    cycles - u 400..2000 ( 91) memcpy_SSE_V2
Title: Re: Sorting strings
Post by: guga on July 05, 2014, 03:54:11 AM
Can someone please test this zeromem routine ?




Proc ZeroMem_SSE:
    Arguments @pMem, @Length
    Uses esi, edi, ecx, edx, eax

    mov edi D@pMem
    ; we zero the memory 128 bits (16 bytes) at a time
    mov ecx D@Length
    mov eax ecx | shr ecx 4 ; integer count. Divide by 16 (4 dwords)
    jz L0> ; The memory size is smaller than 16 bytes. Jump over

        pxor XMM1 XMM1 ; clear XMM1 register
        ; Now we must compute the remainder, i.e. the bytes left over after the 16-byte blocks
        mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes
        mov edx 0 ; here it is used as an index
        L1:
            movdqu X$edi+edx*8 XMM1 ; copy the 1st 4 dwords from register XMM to edi
            dec ecx
            lea edx D$edx+2
            jnz L1<
        test eax eax | jz L4> ; No remainders ? Exit
        jmp L9> ; jmp to the remainder computation

L0:
   ; If we are here, it means that the data is smaller than 16 bytes, and we need to compute the remainder.
   mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes

L2:

    ; If the length is not a multiple of 16 bytes (4 dwords) some bytes remain, so clear them one by one.
    test eax eax | jz L4>  ; No remainders ? Exit
L9:
        lea edi D$edi+edx*8 ; mul edx by 8 to get the pos

L3:  mov B$edi+eax-1 0 | dec eax | jnz L3<

L4:

EndP
Title: Re: Sorting strings
Post by: nidud on July 05, 2014, 06:34:22 AM
deleted
Title: Re: Sorting strings
Post by: dedndave on July 05, 2014, 07:56:19 AM
i would modify that code so that it aligns itself, first
should be faster for large blocks   :t
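
The idea of letting the routine align itself first can be sketched as follows. This is a Python model of the arithmetic only; `aligned_split` is a hypothetical helper, and the 16-byte boundary is an assumption taken from the XMM register width. It splits a region into a byte-wise head up to the next 16-byte boundary, a run of aligned 16-byte stores, and a byte-wise tail.

```python
def aligned_split(address: int, length: int, align: int = 16) -> tuple:
    """Return (head, blocks, tail) for a region starting at `address`.

    head   : bytes to write one at a time until the address is aligned
    blocks : number of aligned `align`-byte stores
    tail   : bytes left over after the last aligned store
    `align` is assumed to be a power of two.
    """
    head = (-address) & (align - 1)      # distance to the next boundary
    if head > length:                    # the region ends before the boundary
        head = length
    rest = length - head
    blocks = rest // align
    tail = rest % align
    return head, blocks, tail
```

With this split, the stores in the middle section could use an aligned instruction, at the cost of a few extra byte writes at the start.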

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
------------------------------------------------------
47730      cycles - a   1..256  (  0) crt_memset
131030     cycles - a   1..256  ( 22) stosb
41845      cycles - a   1..256  ( 67) memzero
54881      cycles - a   1..256  ( 80) ZeroMem_SSE

55975      cycles - u   1..256  (  0) crt_memset
148928     cycles - u   1..256  ( 22) stosb
62356      cycles - u   1..256  ( 67) memzero
77576      cycles - u   1..256  ( 80) ZeroMem_SSE

13075466   cycles - a 400..8192 (  0) crt_memset
14596370   cycles - a 400..8192 ( 22) stosb
12471946   cycles - a 400..8192 ( 67) memzero
23698809   cycles - a 400..8192 ( 80) ZeroMem_SSE

24246850   cycles - u 400..8192 (  0) crt_memset
94611193   cycles - u 400..8192 ( 22) stosb
22922458   cycles - u 400..8192 ( 67) memzero
71302961   cycles - u 400..8192 ( 80) ZeroMem_SSE
Title: Re: Sorting strings
Post by: jj2007 on July 05, 2014, 10:35:30 AM
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
------------------------------------------------------
25333      cycles - a   1..256  (  0) crt_memcpy
23268      cycles - a   1..256  ( 98) memcpy
23663      cycles - a   1..256  (157) memcpy SSE2
24829      cycles - a   1..256  ( 89) regcopy
24889      cycles - a   1..256  (172) memcpyxmmU SSE
34273      cycles - a   1..256  ( 87) memcpy_SSE_V2
34119      cycles - a   1..256  ( 84) memcpy_SSE_V4

94394      cycles - u   1..256  (  0) crt_memcpy
69785      cycles - u   1..256  ( 98) memcpy
47876      cycles - u   1..256  (157) memcpy SSE2
56750      cycles - u   1..256  ( 89) regcopy
45682      cycles - u   1..256  (172) memcpyxmmU SSE
54575      cycles - u   1..256  ( 87) memcpy_SSE_V2
54696      cycles - u   1..256  ( 84) memcpy_SSE_V4

2127610    cycles - a 400..4000 (  0) crt_memcpy
2070954    cycles - a 400..4000 ( 98) memcpy
3257495    cycles - a 400..4000 (157) memcpy SSE2
2717699    cycles - a 400..4000 ( 89) regcopy
3279839    cycles - a 400..4000 (172) memcpyxmmU SSE
3782636    cycles - a 400..4000 ( 87) memcpy_SSE_V2
3785678    cycles - a 400..4000 ( 84) memcpy_SSE_V4

20412269   cycles - u 400..4000 (  0) crt_memcpy
12605121   cycles - u 400..4000 ( 98) memcpy
5820491    cycles - u 400..4000 (157) memcpy SSE2
8852658    cycles - u 400..4000 ( 89) regcopy
4978994    cycles - u 400..4000 (172) memcpyxmmU SSE
7896637    cycles - u 400..4000 ( 87) memcpy_SSE_V2
7894145    cycles - u 400..4000 ( 84) memcpy_SSE_V4
Title: Re: Sorting strings
Post by: guga on July 05, 2014, 10:52:33 AM
This ?




Proc ZeroMem_SSE:
    Arguments @pMem, @Length
    Uses esi, edi, ecx, edx, eax

    mov edi D@pMem
    ; we zero the memory 128 bits (16 bytes) at a time
    mov ecx D@Length
    mov eax ecx | shr ecx 4 ; integer count. Divide by 16 (4 dwords)
    jz L0> ; The memory size is smaller than 16 bytes. Jump over
        PREFETCHNTA B$edi
        align 16;16
        pxor XMM1 XMM1 ; clear XMM1 register
        ; Now we must compute the remainder, i.e. the bytes left over after the 16-byte blocks
        mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes
        mov edx 0 ; here it is used as an index
        L1:
            movdqu X$edi+edx*8 XMM1 ; copy the 1st 4 dwords from register XMM to edi
            dec ecx
            lea edx D$edx+2
            jnz L1<
        test eax eax | jz L4> ; No remainders ? Exit
        jmp L9> ; jmp to the remainder computation

L0:
   ; If we are here, it means that the data is smaller than 16 bytes, and we need to compute the remainder.
   mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only be 0 to 15 bytes

L2:

    ; If the length is not a multiple of 16 bytes (4 dwords) some bytes remain, so clear them one by one.
    test eax eax | jz L4>  ; No remainders ? Exit
L9:
        lea edi D$edi+edx*8 ; mul edx by 8 to get the pos

L3:  mov B$edi+eax-1 0 | dec eax | jnz L3<

L4:

EndP

Title: Re: Sorting strings
Post by: dedndave on July 05, 2014, 12:29:14 PM
that's GoAsm syntax, Gustavo
Dave <--- Masm guy   :biggrin:
Title: Re: Sorting strings
Post by: guga on July 05, 2014, 02:40:54 PM
Hmm...I don't remember the proper syntax for MASM.

But it should be something like:


ZeroMem_SSE proc uses esi edi ecx edx eax pMem:DWORD, nBytes:DWORD
    ; "uses" stands in for the RosAsm Uses macro: it pushes those registers on entry and pops them on exit.
    ; The parameter is called nBytes because LENGTH is a reserved word in MASM.

    mov edi, pMem
    ; we zero the memory 128 bits (16 bytes) at a time
    mov ecx, nBytes
    mov eax, ecx
    shr ecx, 4              ; integer count. Divide by 16 (4 dwords)
    jz L0                   ; The memory size is smaller than 16 bytes. Jump over
        prefetchnta byte ptr [edi]
        align 16
        pxor xmm1, xmm1     ; clear XMM1 register
        ; Now we must compute the remainder, i.e. the bytes left over after the 16-byte blocks
        mov edx, ecx
        shl edx, 4
        sub eax, edx        ; remainder. It can only be 0 to 15 bytes
        mov edx, 0          ; here it is used as an index
        L1:
            movdqu [edi+edx*8], xmm1 ; store 16 zero bytes (4 dwords) from register XMM1 to edi+edx*8
            dec ecx
            lea edx, [edx+2]
            jnz L1
        test eax, eax
        jz L4               ; No remainder? Exit
        jmp L9              ; jump to the remainder handling

L0:
    ; If we are here, it means that the data is smaller than 16 bytes, and we need to compute the remainder.
    mov edx, ecx
    shl edx, 4
    sub eax, edx            ; remainder. It can only be 0 to 15 bytes

L2:

    ; If the length is not a multiple of 16 bytes (4 dwords) some bytes remain, so clear them one by one.
    test eax, eax
    jz L4                   ; No remainder? Exit
L9:
        lea edi, [edi+edx*8] ; mul edx by 8 to get the position

L3: mov byte ptr [edi+eax-1], 0
    dec eax
    jnz L3

L4:
    ret                     ; MASM needs an explicit ret before endp

ZeroMem_SSE endp


Although it has been some years since I last coded in MASM syntax, the two are basically not all that different (except for the use of macros). The main difference I see is that in RosAsm we must write a size prefix, which makes it easier to "see" what is a dword, a word, a byte, etc.
D$ = dword ptr etc
W$ = word ptr etc
B$ = byte ptr
X$ = any type size used mainly in SSE
T$ = terabyte ptr
F$ = floating ptr
R$ = real floating ptr
Q$ = quadword ptr

Those are the main differences.

The rest basically comes down to whether or not the macros are used (I like to use them for readability)
Title: Re: Sorting strings
Post by: dedndave on July 05, 2014, 03:42:43 PM
T is probably TenByte Ptr   :P
Title: Re: Sorting strings
Post by: guga on July 05, 2014, 04:09:04 PM
 :biggrin: my bad  :icon_mrgreen: TenByte not "Terabyte" :icon_mrgreen: :icon_mrgreen:
Title: Re: Sorting strings
Post by: Gunther on July 05, 2014, 09:04:55 PM
Hi nidud,

results for memcpy:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
14554      cycles - a   1..256  (  0) crt_memcpy
14385      cycles - a   1..256  ( 98) memcpy
12017      cycles - a   1..256  (157) memcpy SSE2
13916      cycles - a   1..256  ( 89) regcopy
12463      cycles - a   1..256  (172) memcpyxmmU SSE
39451      cycles - a   1..256  ( 87) memcpy_SSE_V2
46031      cycles - a   1..256  ( 84) memcpy_SSE_V4

44040      cycles - u   1..256  (  0) crt_memcpy
41462      cycles - u   1..256  ( 98) memcpy
29933      cycles - u   1..256  (157) memcpy SSE2
37559      cycles - u   1..256  ( 89) regcopy
30145      cycles - u   1..256  (172) memcpyxmmU SSE
41172      cycles - u   1..256  ( 87) memcpy_SSE_V2
45774      cycles - u   1..256  ( 84) memcpy_SSE_V4

757294     cycles - a 400..4000 (  0) crt_memcpy
767330     cycles - a 400..4000 ( 98) memcpy
668195     cycles - a 400..4000 (157) memcpy SSE2
2342388    cycles - a 400..4000 ( 89) regcopy
643145     cycles - a 400..4000 (172) memcpyxmmU SSE
1147039    cycles - a 400..4000 ( 87) memcpy_SSE_V2
1286232    cycles - a 400..4000 ( 84) memcpy_SSE_V4

2441358    cycles - u 400..4000 (  0) crt_memcpy
2334729    cycles - u 400..4000 ( 98) memcpy
1461821    cycles - u 400..4000 (157) memcpy SSE2
2672089    cycles - u 400..4000 ( 89) regcopy
1289715    cycles - u 400..4000 (172) memcpyxmmU SSE
1159220    cycles - u 400..4000 ( 87) memcpy_SSE_V2
1288160    cycles - u 400..4000 ( 84) memcpy_SSE_V4
--- ok ---

Results for memzero:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
------------------------------------------------------
39303      cycles - a   1..256  (  0) crt_memset
22441      cycles - a   1..256  ( 22) stosb
32617      cycles - a   1..256  ( 67) memzero
22725      cycles - a   1..256  ( 80) ZeroMem_SSE

41195      cycles - u   1..256  (  0) crt_memset
22984      cycles - u   1..256  ( 22) stosb
35609      cycles - u   1..256  ( 67) memzero
23456      cycles - u   1..256  ( 80) ZeroMem_SSE

2369821    cycles - a 400..8192 (  0) crt_memset
2143114    cycles - a 400..8192 ( 22) stosb
2355755    cycles - a 400..8192 ( 67) memzero
4063180    cycles - a 400..8192 ( 80) ZeroMem_SSE

2739486    cycles - u 400..8192 (  0) crt_memset
2944992    cycles - u 400..8192 ( 22) stosb
3212215    cycles - u 400..8192 ( 67) memzero
4038264    cycles - u 400..8192 ( 80) ZeroMem_SSE
--- ok ---


Gunther
Title: Re: Sorting strings
Post by: jj2007 on July 06, 2014, 11:42:56 AM
Quote from: guga on July 05, 2014, 10:52:33 AM
This ?

I've given it a try, Gustavo - version 0.01 of Ros2Masm is attached here (http://masm32.com/board/index.php?topic=3375.msg35680#msg35680).
It expects a text file in the commandline; if there is no argument, RosAsmTest.asm is assumed. Use Console Build All in qEditor.
Title: Re: Sorting strings
Post by: dedndave on July 06, 2014, 12:20:43 PM
you have to test aligned and unaligned starting addresses, of course   :P
Title: Re: Sorting strings
Post by: guga on July 06, 2014, 12:24:39 PM
WOW..great work, JJ :t :t
Title: Re: Sorting strings
Post by: guga on July 06, 2014, 12:30:22 PM
 :icon_mrgreen: Sure, dave. The aligned version works faster :):):)

I tested it here and it works like a gem . Many tks :t
Title: Re: Sorting strings
Post by: RuiLoureiro on July 08, 2014, 02:35:07 AM
Hi guga

I couldn't do anything this weekend. I was sick (and...).
About your memcpy_SSE_V4, it could be improved:

    1. If it jumps to L0, ECX=0.
        So we don't need to do this: mov edx ecx | shl edx 4
        (EDX only has to be zero for the lea at L9)
    2. L2 is not used
    3. At L0, after «sub eax edx» we don't need: test eax eax
       but only jz L4> (the flags are already set by the sub)
   
Quote
Proc memcpy_SSE_V4:
    Arguments @pDest, @pSource, @Length
    Uses esi, edi, ecx, edx, eax

    mov edi D@pDest
    mov esi D@pSource
    ; we are copying a memory from 128 to 128 bytes at once
    mov ecx D@Length
    mov eax ecx | shr ecx 4 ; integer count. Divide by 16 (4 dwords)
    jz L0> ; The memory size if smaller then 16 bytes long. Jmp over

        ; No we must compute he remainder, to see how many times we will loop
        mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only have be 0 to 15 remainders bytes
        mov edx 0 ; here it is used as an index
        L1:
            lddqu XMM1 X$esi+edx*8 ; copy the 1st 4 dwords from esi to register XMM
            movups X$edi+edx*8 XMM1 ; copy the 1st 4 dwords from register XMM to edi
            dec ecx
            lea edx D$edx+2
            jnz L1<
        test eax eax | jz L4> ; No remainders ? Exit
        jmp L9> ; jmp to the remainder computation
L0:
   ; If we are here, It means that the data is smaller then 16 bytes, and we ned to compute the remainder.
   mov edx ecx | shl edx 4 | sub eax edx ; remainder. It can only have be 0 to 15 remainders bytes

L2:

    ; If the memory is not 4 dword aligned we may have some remainder here So, just clean them.
    test eax eax | jz L4>  ; No remainders ? Exit
L9:
        lea edi D$edi+edx*8 ; mul edx by 8 to get the pos
        lea esi D$esi+edx*8 ; mul edx by 8 to get the pos
L3:  movsb | dec eax | jnz L3<
L4:
EndP
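
To make the three points above concrete, here is a minimal sketch in C of the same structure (full 16-byte blocks first, then the 0 to 15 leftover bytes). It is not guga's routine: the name copy_blocks16 is made up for illustration, and the fixed-size 16-byte memcpy only stands in for the lddqu/movups pair. The remainder is computed once, and when the length is below 16 it is simply the length itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy len bytes in
   16-byte blocks, then finish the 0..15 leftover bytes one by one. */
static void *copy_blocks16(void *dest, const void *src, size_t len)
{
    assert(len == 0 || (dest != NULL && src != NULL));

    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t blocks = len >> 4;   /* number of full 16-byte blocks      */
    size_t rest   = len & 15;   /* same value as len - (blocks << 4)  */

    while (blocks--) {
        memcpy(d, s, 16);       /* stands in for lddqu + movups       */
        d += 16;
        s += 16;
    }
    while (rest--)              /* tail: at most 15 bytes             */
        *d++ = *s++;

    return dest;
}
```

With len < 16 the first loop is skipped and rest == len, so no extra shift/subtract is needed on that path.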
Title: Re: Sorting strings
Post by: RuiLoureiro on July 08, 2014, 02:46:28 AM
Hi,
        These are my results.
       
        MOVEAtoB_SSEG is a macro with your V4 (+/-)
        (MOVEAtoB_SSEG srcName, dstName, cpyLen)
       
        Could you post your results ?

        Gunther, could you run this CopyString47, and CopyString48, please ?
        Thanks
     
** In the results, read "128 BYTES" as "128 BITS"

Quote
NOT ALIGNED
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  5 milliseconds, MOVEAtoB_SSEE-  13 bytes- copy 128 BYTES+MOVZX
  5 milliseconds, MOVEAtoB_XZZF-  13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XZZC-  13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XZE-   13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XZD-   13 bytes- copy lenght DWORDS+MOVZX
  5 milliseconds, MOVEAtoB_XZC-   13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZA-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZD-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZE-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZB-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZB-   13 bytes- copy lenght DWORDS+MOVZX
  8 milliseconds, MOVEAtoB_XZA-   13 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZD-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZB-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZC-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZD-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZB-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZA-  53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZA-   53 bytes- copy lenght DWORDS+MOVZX
12 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
18 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
34 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
34 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_XZZA- 103 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_XZZB- 103 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_XZB-  103 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_XZD-  103 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_XZZD- 103 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_XZC-  103 bytes- copy lenght DWORDS+MOVZX
37 milliseconds, MOVEAtoB_SSEE-  53 bytes- copy 128 BYTES+MOVZX
37 milliseconds, MOVEAtoB_XZA-  103 bytes- copy lenght DWORDS+MOVZX
39 milliseconds, MOVEAtoB_SSEH-  53 bytes- copy 128 BYTES+MOVZX
40 milliseconds, MOVEAtoB_SSEG-  53 bytes- copy 128 BYTES+MOVZX
42 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
42 milliseconds, MOVEAtoB_SSEG-  13 bytes- copy 128 BYTES+MOVZX
42 milliseconds, MOVEAtoB_SSEH-  13 bytes- copy 128 BYTES+MOVZX
51 milliseconds, MOVEAtoB_SSEE- 103 bytes- copy 128 BYTES+MOVZX
52 milliseconds, MOVEAtoB_SSEH- 103 bytes- copy 128 BYTES+MOVZX
56 milliseconds, MOVEAtoB_SSEG- 103 bytes- copy 128 BYTES+MOVZX
57 milliseconds, MOVEAtoB_XZB-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZZA- 203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZC-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZZD- 203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZD-  203 bytes- copy lenght DWORDS+MOVZX
57 milliseconds, MOVEAtoB_XZZB- 203 bytes- copy lenght DWORDS+MOVZX
58 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
59 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
59 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
61 milliseconds, MOVEAtoB_XZA-  203 bytes- copy lenght DWORDS+MOVZX
122 milliseconds, MOVEAtoB_SSEH- 203 bytes- copy 128 BYTES+MOVZX
124 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
124 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
124 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
124 milliseconds, MOVEAtoB_XZC-  503 bytes- copy lenght DWORDS+MOVZX
124 milliseconds, MOVEAtoB_XZZA- 503 bytes- copy lenght DWORDS+MOVZX
124 milliseconds, MOVEAtoB_XZZB- 503 bytes- copy lenght DWORDS+MOVZX
124 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
124 milliseconds, MOVEAtoB_SSEG- 203 bytes- copy 128 BYTES+MOVZX
125 milliseconds, MOVEAtoB_XZD-  503 bytes- copy lenght DWORDS+MOVZX
125 milliseconds, MOVEAtoB_SSEE- 203 bytes- copy 128 BYTES+MOVZX
126 milliseconds, MOVEAtoB_XZZD- 503 bytes- copy lenght DWORDS+MOVZX
126 milliseconds, MOVEAtoB_XZB-  503 bytes- copy lenght DWORDS+MOVZX
145 milliseconds, MOVEAtoB_XZA-  503 bytes- copy lenght DWORDS+MOVZX
238 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
238 milliseconds, MOVEAtoB_XZZA-1027 bytes- copy lenght DWORDS+MOVZX
238 milliseconds, MOVEAtoB_XZZB-1027 bytes- copy lenght DWORDS+MOVZX
238 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
239 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
240 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
240 milliseconds, MOVEAtoB_XZB- 1027 bytes- copy lenght DWORDS+MOVZX
240 milliseconds, MOVEAtoB_XZZD-1027 bytes- copy lenght DWORDS+MOVZX
241 milliseconds, MOVEAtoB_XZA- 1027 bytes- copy lenght DWORDS+MOVZX
242 milliseconds, MOVEAtoB_XZC- 1027 bytes- copy lenght DWORDS+MOVZX
249 milliseconds, MOVEAtoB_XZD- 1027 bytes- copy lenght DWORDS+MOVZX

308 milliseconds, MOVEAtoB_SSEG- 503 bytes- copy 128 BYTES+MOVZX
312 milliseconds, MOVEAtoB_SSEH- 503 bytes- copy 128 BYTES+MOVZX
317 milliseconds, MOVEAtoB_SSEE- 503 bytes- copy 128 BYTES+MOVZX
601 milliseconds, MOVEAtoB_SSEG-1027 bytes- copy 128 BYTES+MOVZX
629 milliseconds, MOVEAtoB_SSEH-1027 bytes- copy 128 BYTES+MOVZX
674 milliseconds, MOVEAtoB_SSEE-1027 bytes- copy 128 BYTES+MOVZX
********** END III **********
Quote
NOT ALIGNED
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  5 milliseconds, MOVEAtoB_SSED-  13 bytes- copy 128 BYTES+MOVZX
  5 milliseconds, MOVEAtoB_SSEF-  13 bytes- copy 128 BYTES+MOVZX
  5 milliseconds, MOVEAtoB_SSEE-  13 bytes- copy 128 BYTES+MOVZX
  5 milliseconds, MOVEAtoB_XZZF-  13 bytes- copy lenght DWORDS+MOVZX
  6 milliseconds, MOVEAtoB_XZZC-  13 bytes- copy lenght DWORDS+MOVZX
  8 milliseconds, MOVEAtoB_XZE-   13 bytes- copy lenght DWORDS+MOVZX
14 milliseconds, MOVEAtoB_XZZE-  13 bytes- copy lenght DWORDS+MOVZX

15 milliseconds, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
15 milliseconds, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
17 milliseconds, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
18 milliseconds, MOVEAtoB_SSEF-  53 bytes- copy 128 BYTES+MOVZX
20 milliseconds, MOVEAtoB_SSED-  53 bytes- copy 128 BYTES+MOVZX
25 milliseconds, MOVEAtoB_SSEE-  53 bytes- copy 128 BYTES+MOVZX
29 milliseconds, MOVEAtoB_SSEG-  53 bytes- copy 128 BYTES+MOVZX
29 milliseconds, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
29 milliseconds, MOVEAtoB_SSEH-  53 bytes- copy 128 BYTES+MOVZX

30 milliseconds, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
30 milliseconds, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
31 milliseconds, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
32 milliseconds, MOVEAtoB_SSED- 103 bytes- copy 128 BYTES+MOVZX
34 milliseconds, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
35 milliseconds, MOVEAtoB_SSEE- 103 bytes- copy 128 BYTES+MOVZX
40 milliseconds, MOVEAtoB_SSEF- 103 bytes- copy 128 BYTES+MOVZX
42 milliseconds, MOVEAtoB_SSEH-  13 bytes- copy 128 BYTES+MOVZX
43 milliseconds, MOVEAtoB_SSEG-  13 bytes- copy 128 BYTES+MOVZX

48 milliseconds, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
49 milliseconds, MOVEAtoB_SSEH- 103 bytes- copy 128 BYTES+MOVZX
49 milliseconds, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
49 milliseconds, MOVEAtoB_SSEG- 103 bytes- copy 128 BYTES+MOVZX
51 milliseconds, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
53 milliseconds, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
67 milliseconds, MOVEAtoB_SSED- 203 bytes- copy 128 BYTES+MOVZX
67 milliseconds, MOVEAtoB_SSEF- 203 bytes- copy 128 BYTES+MOVZX
70 milliseconds, MOVEAtoB_SSEE- 203 bytes- copy 128 BYTES+MOVZX
91 milliseconds, MOVEAtoB_SSEH- 203 bytes- copy 128 BYTES+MOVZX
92 milliseconds, MOVEAtoB_SSEG- 203 bytes- copy 128 BYTES+MOVZX

99 milliseconds, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
99 milliseconds, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
114 milliseconds, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
116 milliseconds, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
127 milliseconds, MOVEAtoB_SSED- 503 bytes- copy 128 BYTES+MOVZX
127 milliseconds, MOVEAtoB_SSEE- 503 bytes- copy 128 BYTES+MOVZX
129 milliseconds, MOVEAtoB_SSEF- 503 bytes- copy 128 BYTES+MOVZX
144 milliseconds, MOVEAtoB_SSEH- 503 bytes- copy 128 BYTES+MOVZX
151 milliseconds, MOVEAtoB_SSEG- 503 bytes- copy 128 BYTES+MOVZX

185 milliseconds, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
187 milliseconds, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
193 milliseconds, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
209 milliseconds, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
241 milliseconds, MOVEAtoB_SSEE-1027 bytes- copy 128 BYTES+MOVZX
241 milliseconds, MOVEAtoB_SSEF-1027 bytes- copy 128 BYTES+MOVZX
243 milliseconds, MOVEAtoB_SSED-1027 bytes- copy 128 BYTES+MOVZX
248 milliseconds, MOVEAtoB_SSEG-1027 bytes- copy 128 BYTES+MOVZX
252 milliseconds, MOVEAtoB_SSEH-1027 bytes- copy 128 BYTES+MOVZX
********** END III **********
Quote
NOT ALIGNED
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
------------------------------------------------------
***** Time table *****

  16 cycles, MOVEAtoB_SSED-  13 bytes- copy 128 BITS+MOVZX
  17 cycles, MOVEAtoB_SSEF-  13 bytes- copy 128 BITS+MOVZX
  18 cycles, MOVEAtoB_SSEE-  13 bytes- copy 128 BITS+MOVZX
  18 cycles, MOVEAtoB_XZZF-  13 bytes- copy lenght DWORDS+MOVZX
  18 cycles, MOVEAtoB_XZZE-  13 bytes- copy lenght DWORDS+MOVZX
  19 cycles, MOVEAtoB_XZZC-  13 bytes- copy lenght DWORDS+MOVZX
  27 cycles, MOVEAtoB_XZE-   13 bytes- copy lenght DWORDS+MOVZX
 
  39 cycles, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
  40 cycles, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
  40 cycles, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
  40 cycles, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
117 cycles, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
117 cycles, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
118 cycles, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
120 cycles, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX

121 cycles, MOVEAtoB_SSED-  53 bytes- copy 128 BITS+MOVZX
126 cycles, MOVEAtoB_SSEF-  53 bytes- copy 128 BITS+MOVZX
129 cycles, MOVEAtoB_SSEE-  53 bytes- copy 128 BITS+MOVZX
131 cycles, MOVEAtoB_SSEH-  53 bytes- copy 128 BITS+MOVZX
137 cycles, MOVEAtoB_SSEG-  53 bytes- copy 128 BITS+MOVZX
144 cycles, MOVEAtoB_SSEH-  13 bytes- copy 128 BITS+MOVZX
144 cycles, MOVEAtoB_SSEG-  13 bytes- copy 128 BITS+MOVZX
171 cycles, MOVEAtoB_SSED- 103 bytes- copy 128 BITS+MOVZX
173 cycles, MOVEAtoB_SSEF- 103 bytes- copy 128 BITS+MOVZX
175 cycles, MOVEAtoB_SSEE- 103 bytes- copy 128 BITS+MOVZX
178 cycles, MOVEAtoB_SSEH- 103 bytes- copy 128 BITS+MOVZX
188 cycles, MOVEAtoB_SSEG- 103 bytes- copy 128 BITS+MOVZX

193 cycles, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
194 cycles, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
196 cycles, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
210 cycles, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
367 cycles, MOVEAtoB_SSED- 203 bytes- copy 128 BITS+MOVZX
410 cycles, MOVEAtoB_SSEG- 203 bytes- copy 128 BITS+MOVZX
423 cycles, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
425 cycles, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
426 cycles, MOVEAtoB_SSEE- 203 bytes- copy 128 BITS+MOVZX
427 cycles, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
437 cycles, MOVEAtoB_SSEH- 203 bytes- copy 128 BITS+MOVZX
442 cycles, MOVEAtoB_SSEF- 203 bytes- copy 128 BITS+MOVZX

493 cycles, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
823 cycles, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
835 cycles, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
933 cycles, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
987 cycles, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
1024 cycles, MOVEAtoB_SSED- 503 bytes- copy 128 BITS+MOVZX
1043 cycles, MOVEAtoB_SSEG- 503 bytes- copy 128 BITS+MOVZX
1064 cycles, MOVEAtoB_SSEH- 503 bytes- copy 128 BITS+MOVZX
1077 cycles, MOVEAtoB_SSEF- 503 bytes- copy 128 BITS+MOVZX
1081 cycles, MOVEAtoB_SSEE- 503 bytes- copy 128 BITS+MOVZX
2045 cycles, MOVEAtoB_SSEG-1027 bytes- copy 128 BITS+MOVZX
2117 cycles, MOVEAtoB_SSED-1027 bytes- copy 128 BITS+MOVZX
2141 cycles, MOVEAtoB_SSEH-1027 bytes- copy 128 BITS+MOVZX
2293 cycles, MOVEAtoB_SSEE-1027 bytes- copy 128 BITS+MOVZX
2293 cycles, MOVEAtoB_SSEF-1027 bytes- copy 128 BITS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: RuiLoureiro on July 09, 2014, 08:17:51 PM
Hi,
        Now I have also used the crt_memcpy, memcpy_1, memcpy_2 and memcpy_3
        procedures.
        Copying 16 bytes at a time does not give the best results on my P4.

        The versions YZZE, YZZG and YZZH copy 16 bytes at a time using
        the 32 bit registers.
        The strings are not aligned.
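
As an illustration only, this is roughly what "16 bytes at a time using the 32 bit registers" looks like when written in C. It is a sketch, not the YZZE/YZZG/YZZH macros: the name copy_4dwords is hypothetical, and the 4-byte memcpy calls are just a portable way to express unaligned dword loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: move four 32-bit values per iteration
   (16 bytes), then copy the remaining 0..15 bytes one by one. */
static void copy_4dwords(unsigned char *d, const unsigned char *s, size_t len)
{
    assert(len == 0 || (d != NULL && s != NULL));

    while (len >= 16) {
        uint32_t a, b, c, e;            /* play the role of four registers */
        memcpy(&a, s,      4);          /* unaligned dword loads           */
        memcpy(&b, s + 4,  4);
        memcpy(&c, s + 8,  4);
        memcpy(&e, s + 12, 4);
        memcpy(d,      &a, 4);          /* unaligned dword stores          */
        memcpy(d + 4,  &b, 4);
        memcpy(d + 8,  &c, 4);
        memcpy(d + 12, &e, 4);
        d += 16;
        s += 16;
        len -= 16;
    }
    while (len--)                       /* tail bytes                      */
        *d++ = *s++;
}
```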
       
        Gunther, could you post your results, if you don't mind ?
        Thank you.
       
        These are my results.
Quote
NOT ALIGNED
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****
  32 cycles, COPYAtoB_SSEDS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
  32 cycles, COPYAtoB_SSEES-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
  32 cycles, COPYAtoB_XZZFS-  15 bytes- copy lenght DWORDS+MOVZX- uses ESP
  33 cycles, COPYAtoB_XZES-   15 bytes- copy lenght DWORDS+MOVZX- uses ESP
  34 cycles, COPYAtoB_XZZES-  15 bytes- copy lenght DWORDS+MOVZX- uses ESP
  35 cycles, COPYAtoB_XZZCS-  15 bytes- copy lenght DWORDS+MOVZX- uses ESP
  35 cycles, COPYAtoB_SSEFS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
  35 cycles, crt_memcpy-      15 bytes- copy crt_memcpy
  35 cycles, COPYAtoB_SSED-   15 bytes- copy 16 BYTES+MOVZX
  35 cycles, COPYAtoB_XZZF-   15 bytes- copy lenght DWORDS+MOVZX
  36 cycles, COPYAtoB_XZZC-   15 bytes- copy lenght DWORDS+MOVZX

  37 cycles, COPYAtoB_YZZH-   15 bytes- copy length 16 BYTES+MOVZX  <<<<---
 
  37 cycles, COPYAtoB_SSEF-   15 bytes- copy 16 BYTES+MOVZX
  38 cycles, COPYAtoB_SSEE-   15 bytes- copy 16 BYTES+MOVZX
  39 cycles, COPYAtoB_XZZE-   15 bytes- copy lenght DWORDS+MOVZX
  41 cycles, COPYAtoB_YZZG-   15 bytes- copy length 16 BYTES+MOVZX  <<<<---
 
  41 cycles, COPYAtoB_YZZE-   15 bytes- copy length 16 BYTES+MOVZX
  45 cycles, COPYAtoB_XZE-    15 bytes- copy lenght DWORDS+MOVZX
  50 cycles, COPYAtoB_WZZE-   15 bytes- copy 16 BYTES+MOVZX
  63 cycles, COPYAtoB_XZES-   53 bytes- copy lenght DWORDS+MOVZX- uses ESP
  65 cycles, COPYAtoB_XZZFS-  53 bytes- copy lenght DWORDS+MOVZX- uses ESP

  69 cycles, memcpy_1-        53 bytes- copy regcopy                >>>>---
  72 cycles, COPYAtoB_XZZES-  53 bytes- copy lenght DWORDS+MOVZX- uses ESP
  74 cycles, COPYAtoB_XZZCS-  53 bytes- copy lenght DWORDS+MOVZX- uses ESP

  74 cycles, COPYAtoB_YZZG-   53 bytes- copy length 16 BYTES+MOVZX  <<<<---

  75 cycles, COPYAtoB_XZZE-   53 bytes- copy lenght DWORDS+MOVZX
  75 cycles, COPYAtoB_XZE-    53 bytes- copy lenght DWORDS+MOVZX
  75 cycles, COPYAtoB_XZZF-   53 bytes- copy lenght DWORDS+MOVZX
  76 cycles, COPYAtoB_YZZE-   53 bytes- copy length 16 BYTES+MOVZX
  77 cycles, COPYAtoB_XZZC-   53 bytes- copy lenght DWORDS+MOVZX

  78 cycles, memcpy_1-        15 bytes- copy regcopy                >>>>---
 
  78 cycles, COPYAtoB_WZZE-   53 bytes- copy 16 BYTES+MOVZX       

  82 cycles, COPYAtoB_YZZH-   53 bytes- copy length 16 BYTES+MOVZX  <<<<---
 
  85 cycles, COPYAtoB_SSEDS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
  90 cycles, COPYAtoB_SSEFS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
  91 cycles, COPYAtoB_SSEES-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
  93 cycles, COPYAtoB_SSEF-   53 bytes- copy 16 BYTES+MOVZX
  93 cycles, COPYAtoB_SSED-   53 bytes- copy 16 BYTES+MOVZX
  93 cycles, COPYAtoB_SSEE-   53 bytes- copy 16 BYTES+MOVZX
106 cycles, COPYAtoB_SSEHS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP

106 cycles, memcpy_1-       103 bytes- copy regcopy                >>>>---
107 cycles, crt_memcpy-      53 bytes- copy crt_memcpy
109 cycles, COPYAtoB_YZZH-  103 bytes- copy length 16 BYTES+MOVZX  <<<<---
109 cycles, COPYAtoB_YZZG-  103 bytes- copy length 16 BYTES+MOVZX  <<<<---

110 cycles, COPYAtoB_SSEH-   53 bytes- copy 16 BYTES+MOVZX
110 cycles, COPYAtoB_SSEG-   53 bytes- copy 16 BYTES+MOVZX
110 cycles, COPYAtoB_SSEGS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
111 cycles, COPYAtoB_YZZE-  103 bytes- copy length 16 BYTES+MOVZX
113 cycles, COPYAtoB_WZZE-  103 bytes- copy 16 BYTES+MOVZX
117 cycles, COPYAtoB_XZZCS- 103 bytes- copy lenght DWORDS+MOVZX- uses ESP
117 cycles, COPYAtoB_XZZFS- 103 bytes- copy lenght DWORDS+MOVZX- uses ESP
117 cycles, COPYAtoB_XZZES- 103 bytes- copy lenght DWORDS+MOVZX- uses ESP
117 cycles, COPYAtoB_XZES-  103 bytes- copy lenght DWORDS+MOVZX- uses ESP
119 cycles, COPYAtoB_XZZE-  103 bytes- copy lenght DWORDS+MOVZX
120 cycles, COPYAtoB_XZZC-  103 bytes- copy lenght DWORDS+MOVZX
120 cycles, COPYAtoB_XZE-   103 bytes- copy lenght DWORDS+MOVZX
122 cycles, COPYAtoB_XZZF-  103 bytes- copy lenght DWORDS+MOVZX
141 cycles, crt_memcpy-     103 bytes- copy crt_memcpy

173 cycles, COPYAtoB_YZZH-  203 bytes- copy length 16 BYTES+MOVZX  <<<<---
173 cycles, COPYAtoB_YZZG-  203 bytes- copy length 16 BYTES+MOVZX  <<<<---

174 cycles, COPYAtoB_YZZE-  203 bytes- copy length 16 BYTES+MOVZX
175 cycles, COPYAtoB_SSEH-   15 bytes- copy 16 BYTES+MOVZX
176 cycles, COPYAtoB_SSEG-   15 bytes- copy 16 BYTES+MOVZX
177 cycles, COPYAtoB_SSEHS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
177 cycles, memcpy_1-       203 bytes- copy regcopy                >>>>---

177 cycles, COPYAtoB_SSEGS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
188 cycles, COPYAtoB_SSEDS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
205 cycles, COPYAtoB_XZES-  203 bytes- copy lenght DWORDS+MOVZX- uses ESP
206 cycles, COPYAtoB_XZZFS- 203 bytes- copy lenght DWORDS+MOVZX- uses ESP
207 cycles, COPYAtoB_XZZCS- 203 bytes- copy lenght DWORDS+MOVZX- uses ESP
208 cycles, COPYAtoB_SSEFS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
209 cycles, COPYAtoB_SSEHS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
210 cycles, COPYAtoB_SSED-  103 bytes- copy 16 BYTES+MOVZX
210 cycles, COPYAtoB_XZZC-  203 bytes- copy lenght DWORDS+MOVZX
211 cycles, COPYAtoB_SSEES- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
211 cycles, COPYAtoB_XZZE-  203 bytes- copy lenght DWORDS+MOVZX
211 cycles, COPYAtoB_XZZF-  203 bytes- copy lenght DWORDS+MOVZX
212 cycles, COPYAtoB_XZE-   203 bytes- copy lenght DWORDS+MOVZX
213 cycles, COPYAtoB_SSEF-  103 bytes- copy 16 BYTES+MOVZX
213 cycles, COPYAtoB_SSEE-  103 bytes- copy 16 BYTES+MOVZX
217 cycles, COPYAtoB_SSEH-  103 bytes- copy 16 BYTES+MOVZX
217 cycles, COPYAtoB_WZZE-  203 bytes- copy 16 BYTES+MOVZX
223 cycles, crt_memcpy-     203 bytes- copy crt_memcpy
224 cycles, COPYAtoB_SSEGS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
229 cycles, COPYAtoB_SSEG-  103 bytes- copy 16 BYTES+MOVZX
248 cycles, memcpy_2-        15 bytes- copy memcpy SSE
252 cycles, memcpy_3-        15 bytes- copy memcpyxmmU SSE
259 cycles, COPYAtoB_XZZES- 203 bytes- copy lenght DWORDS+MOVZX- uses ESP
276 cycles, memcpy_3-        53 bytes- copy memcpyxmmU SSE
300 cycles, memcpy_2-        53 bytes- copy memcpy SSE
379 cycles, memcpy_3-       103 bytes- copy memcpyxmmU SSE

395 cycles, COPYAtoB_YZZH-  503 bytes- copy length 16 BYTES+MOVZX  <<<<---
397 cycles, COPYAtoB_YZZG-  503 bytes- copy length 16 BYTES+MOVZX  <<<<---

403 cycles, COPYAtoB_YZZE-  503 bytes- copy length 16 BYTES+MOVZX
406 cycles, COPYAtoB_SSEE-  203 bytes- copy 16 BYTES+MOVZX
406 cycles, COPYAtoB_SSEFS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
407 cycles, COPYAtoB_SSEF-  203 bytes- copy 16 BYTES+MOVZX
412 cycles, memcpy_2-       103 bytes- copy memcpy SSE
417 cycles, COPYAtoB_SSEES- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
419 cycles, COPYAtoB_XZZES- 503 bytes- copy lenght DWORDS+MOVZX- uses ESP
419 cycles, COPYAtoB_XZZF-  503 bytes- copy lenght DWORDS+MOVZX
420 cycles, COPYAtoB_XZZCS- 503 bytes- copy lenght DWORDS+MOVZX- uses ESP
421 cycles, COPYAtoB_XZES-  503 bytes- copy lenght DWORDS+MOVZX- uses ESP
421 cycles, COPYAtoB_XZZC-  503 bytes- copy lenght DWORDS+MOVZX
422 cycles, COPYAtoB_XZE-   503 bytes- copy lenght DWORDS+MOVZX
424 cycles, COPYAtoB_XZZFS- 503 bytes- copy lenght DWORDS+MOVZX- uses ESP
425 cycles, COPYAtoB_WZZE-  503 bytes- copy 16 BYTES+MOVZX
427 cycles, COPYAtoB_XZZE-  503 bytes- copy lenght DWORDS+MOVZX
428 cycles, COPYAtoB_SSEDS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
435 cycles, COPYAtoB_SSED-  203 bytes- copy 16 BYTES+MOVZX
443 cycles, crt_memcpy-     503 bytes- copy crt_memcpy

447 cycles, memcpy_1-       503 bytes- copy regcopy                >>>>---

466 cycles, COPYAtoB_SSEGS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
466 cycles, COPYAtoB_SSEHS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
466 cycles, COPYAtoB_SSEG-  203 bytes- copy 16 BYTES+MOVZX
472 cycles, COPYAtoB_SSEH-  203 bytes- copy 16 BYTES+MOVZX
624 cycles, memcpy_2-       203 bytes- copy memcpy SSE
634 cycles, memcpy_3-       203 bytes- copy memcpyxmmU SSE

728 cycles, memcpy_1-      1027 bytes- copy regcopy                >>>>---
730 cycles, COPYAtoB_YZZH- 1027 bytes- copy length 16 BYTES+MOVZX  <<<<---
760 cycles, COPYAtoB_YZZG- 1027 bytes- copy length 16 BYTES+MOVZX  <<<<---

784 cycles, COPYAtoB_YZZE- 1027 bytes- copy length 16 BYTES+MOVZX
792 cycles, COPYAtoB_XZZES-1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
793 cycles, COPYAtoB_XZE-  1027 bytes- copy lenght DWORDS+MOVZX
793 cycles, COPYAtoB_XZZCS-1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
796 cycles, COPYAtoB_XZZC- 1027 bytes- copy lenght DWORDS+MOVZX
799 cycles, COPYAtoB_XZZE- 1027 bytes- copy lenght DWORDS+MOVZX
801 cycles, COPYAtoB_XZZF- 1027 bytes- copy lenght DWORDS+MOVZX
808 cycles, COPYAtoB_XZES- 1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
812 cycles, COPYAtoB_WZZE- 1027 bytes- copy 16 BYTES+MOVZX
817 cycles, COPYAtoB_XZZFS-1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
831 cycles, crt_memcpy-    1027 bytes- copy crt_memcpy
962 cycles, COPYAtoB_SSEDS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
968 cycles, COPYAtoB_SSEGS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
969 cycles, COPYAtoB_SSED-  503 bytes- copy 16 BYTES+MOVZX
980 cycles, COPYAtoB_SSEFS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
981 cycles, COPYAtoB_SSEE-  503 bytes- copy 16 BYTES+MOVZX
982 cycles, COPYAtoB_SSEG-  503 bytes- copy 16 BYTES+MOVZX
984 cycles, COPYAtoB_SSEES- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
987 cycles, COPYAtoB_SSEF-  503 bytes- copy 16 BYTES+MOVZX
991 cycles, COPYAtoB_SSEH-  503 bytes- copy 16 BYTES+MOVZX
995 cycles, COPYAtoB_SSEHS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
1140 cycles, memcpy_2-       503 bytes- copy memcpy SSE
1162 cycles, memcpy_3-       503 bytes- copy memcpyxmmU SSE

1527 cycles, COPYAtoB_YZZH- 2062 bytes- copy length 16 BYTES+MOVZX  <<<<---
1546 cycles, COPYAtoB_YZZG- 2062 bytes- copy length 16 BYTES+MOVZX  <<<<---

1557 cycles, COPYAtoB_XZZE- 2062 bytes- copy lenght DWORDS+MOVZX
1558 cycles, memcpy_1-      2062 bytes- copy regcopy                >>>>---

1568 cycles, COPYAtoB_XZZC- 2062 bytes- copy lenght DWORDS+MOVZX
1569 cycles, COPYAtoB_XZZF- 2062 bytes- copy lenght DWORDS+MOVZX
1570 cycles, COPYAtoB_XZZFS-2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
1571 cycles, COPYAtoB_WZZE- 2062 bytes- copy 16 BYTES+MOVZX
1573 cycles, COPYAtoB_XZZCS-2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
1576 cycles, COPYAtoB_XZE-  2062 bytes- copy lenght DWORDS+MOVZX
1577 cycles, COPYAtoB_XZZES-2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
1580 cycles, COPYAtoB_XZES- 2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
1589 cycles, COPYAtoB_YZZE- 2062 bytes- copy length 16 BYTES+MOVZX
1635 cycles, crt_memcpy-    2062 bytes- copy crt_memcpy
1908 cycles, COPYAtoB_SSEES-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
2026 cycles, COPYAtoB_SSEFS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
2030 cycles, COPYAtoB_SSEF- 1027 bytes- copy 16 BYTES+MOVZX
2043 cycles, COPYAtoB_SSEHS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
2045 cycles, COPYAtoB_SSEE- 1027 bytes- copy 16 BYTES+MOVZX
2046 cycles, COPYAtoB_SSED- 1027 bytes- copy 16 BYTES+MOVZX
2078 cycles, COPYAtoB_SSEH- 1027 bytes- copy 16 BYTES+MOVZX
2087 cycles, COPYAtoB_SSEDS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
2109 cycles, memcpy_3-      1027 bytes- copy memcpyxmmU SSE
2118 cycles, memcpy_2-      1027 bytes- copy memcpy SSE
2123 cycles, COPYAtoB_SSEGS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
2150 cycles, COPYAtoB_SSEG- 1027 bytes- copy 16 BYTES+MOVZX
3949 cycles, COPYAtoB_SSEGS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
4010 cycles, COPYAtoB_SSEG- 2062 bytes- copy 16 BYTES+MOVZX
4020 cycles, COPYAtoB_SSEES-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
4046 cycles, COPYAtoB_SSEE- 2062 bytes- copy 16 BYTES+MOVZX
4052 cycles, COPYAtoB_SSEDS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
4054 cycles, COPYAtoB_SSEFS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
4060 cycles, COPYAtoB_SSED- 2062 bytes- copy 16 BYTES+MOVZX
4081 cycles, COPYAtoB_SSEHS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
4086 cycles, COPYAtoB_SSEF- 2062 bytes- copy 16 BYTES+MOVZX
4099 cycles, COPYAtoB_SSEH- 2062 bytes- copy 16 BYTES+MOVZX
4116 cycles, memcpy_2-      2062 bytes- copy memcpy SSE
4368 cycles, memcpy_3-      2062 bytes- copy memcpyxmmU SSE
********** END III **********
_YZZH    mean: 436.14
_YZZG    mean: 444.57
memcpy_1 mean: 451.86
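
For anyone who wants to re-check the means, here is a small sketch in C. The arrays hold the cycle counts copied from the P4 table above for the seven sizes (15, 53, 103, 203, 503, 1027, 2062 bytes); the names mean_cycles, yzzh, yzzg and memcpy1 are made up for this example, and it assumes the "memcpy" mean refers to memcpy_1, since that is the row set that matches.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n cycle counts. */
static double mean_cycles(const int *v, size_t n)
{
    assert(v != NULL && n > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return (double)sum / (double)n;
}

/* Cycle counts from the table above, in order of size:
   15, 53, 103, 203, 503, 1027, 2062 bytes. */
static const int yzzh[7]    = { 37, 82, 109, 173, 395, 730, 1527 };
static const int yzzg[7]    = { 41, 74, 109, 173, 397, 760, 1546 };
static const int memcpy1[7] = { 78, 69, 106, 177, 447, 728, 1558 };
```

Calling mean_cycles(yzzh, 7) and mean_cycles(memcpy1, 7) gives about 436.14 and 451.86, the same as posted. With the rows as listed, mean_cycles(yzzg, 7) comes to about 442.86, a little below the posted 444.57, which may simply come from a different run.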
Title: Re: Sorting strings
Post by: Gunther on July 09, 2014, 09:09:41 PM
Hi Rui,

the results of #47 and #48 are attached. copy.zip is the archive's name.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 09, 2014, 09:44:04 PM
Thank you Gunther  :t

On your powerful i7, _SSEE is the best

Now, I would like to know the results for CopyString53
if you don't mind.

Quote
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

  5 cycles, MOVEAtoB_SSEF-  13 bytes- copy 128 BITS+MOVZX
  5 cycles, MOVEAtoB_SSEE-  13 bytes- copy 128 BITS+MOVZX
  6 cycles, MOVEAtoB_SSED-  13 bytes- copy 128 BITS+MOVZX
  7 cycles, MOVEAtoB_XZZF-  13 bytes- copy lenght DWORDS+MOVZX
  7 cycles, MOVEAtoB_XZZE-  13 bytes- copy lenght DWORDS+MOVZX
  7 cycles, MOVEAtoB_XZZC-  13 bytes- copy lenght DWORDS+MOVZX
 
11 cycles, MOVEAtoB_SSEE-  53 bytes- copy 128 BITS+MOVZX
12 cycles, MOVEAtoB_SSEF-  53 bytes- copy 128 BITS+MOVZX
14 cycles, MOVEAtoB_XZE-   13 bytes- copy lenght DWORDS+MOVZX
15 cycles, MOVEAtoB_SSED-  53 bytes- copy 128 BITS+MOVZX
19 cycles, MOVEAtoB_SSEE- 103 bytes- copy 128 BITS+MOVZX
20 cycles, MOVEAtoB_SSEF- 103 bytes- copy 128 BITS+MOVZX
21 cycles, MOVEAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
22 cycles, MOVEAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX

25 cycles, MOVEAtoB_SSED- 103 bytes- copy 128 BITS+MOVZX
29 cycles, MOVEAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
30 cycles, MOVEAtoB_SSEH-  53 bytes- copy 128 BITS+MOVZX
32 cycles, MOVEAtoB_SSEG-  53 bytes- copy 128 BITS+MOVZX
34 cycles, MOVEAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX

37 cycles, MOVEAtoB_SSEE- 203 bytes- copy 128 BITS+MOVZX
37 cycles, MOVEAtoB_SSEF- 203 bytes- copy 128 BITS+MOVZX
47 cycles, MOVEAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
47 cycles, MOVEAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
47 cycles, MOVEAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
48 cycles, MOVEAtoB_SSEH- 103 bytes- copy 128 BITS+MOVZX
49 cycles, MOVEAtoB_SSED- 203 bytes- copy 128 BITS+MOVZX
50 cycles, MOVEAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
52 cycles, MOVEAtoB_SSEG- 103 bytes- copy 128 BITS+MOVZX
63 cycles, MOVEAtoB_SSEH-  13 bytes- copy 128 BITS+MOVZX

63 cycles, MOVEAtoB_SSEE- 503 bytes- copy 128 BITS+MOVZX
63 cycles, MOVEAtoB_SSEF- 503 bytes- copy 128 BITS+MOVZX
66 cycles, MOVEAtoB_SSEG-  13 bytes- copy 128 BITS+MOVZX
72 cycles, MOVEAtoB_SSEG- 203 bytes- copy 128 BITS+MOVZX
78 cycles, MOVEAtoB_SSEH- 203 bytes- copy 128 BITS+MOVZX
90 cycles, MOVEAtoB_SSED- 503 bytes- copy 128 BITS+MOVZX
105 cycles, MOVEAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
106 cycles, MOVEAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
106 cycles, MOVEAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
110 cycles, MOVEAtoB_SSEH- 503 bytes- copy 128 BITS+MOVZX
128 cycles, MOVEAtoB_SSEG- 503 bytes- copy 128 BITS+MOVZX

129 cycles, MOVEAtoB_SSEF-1027 bytes- copy 128 BITS+MOVZX
129 cycles, MOVEAtoB_SSEE-1027 bytes- copy 128 BITS+MOVZX
130 cycles, MOVEAtoB_SSEG-1027 bytes- copy 128 BITS+MOVZX
136 cycles, MOVEAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
158 cycles, MOVEAtoB_SSED-1027 bytes- copy 128 BITS+MOVZX
165 cycles, MOVEAtoB_SSEH-1027 bytes- copy 128 BITS+MOVZX
237 cycles, MOVEAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
238 cycles, MOVEAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
238 cycles, MOVEAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
240 cycles, MOVEAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
465 cycles, MOVEAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
466 cycles, MOVEAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
466 cycles, MOVEAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
495 cycles, MOVEAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on July 09, 2014, 10:10:24 PM
Hi Rui,

Quote from: RuiLoureiro on July 09, 2014, 09:44:04 PM
Now, I would like to know the results for CopyString53
if you don't mind.

no problem. Results are attached.

Gunther
Title: Re: Sorting strings
Post by: nidud on July 10, 2014, 12:45:41 AM
deleted
Title: Re: Sorting strings
Post by: RuiLoureiro on July 10, 2014, 01:34:05 AM
Thank you so much Gunther  :t

In this test, for copying a large number of bytes
(.../1027/2062 bytes), the best seems to be memcpy_3

nidud,
        I will add your new procedure
        in the next post.

--------------------------
Results from Gunther
---------------------------
---------------------------------------------
    names used by nidud
---------------------------------------------
memcpy_1  -> is regcopy
memcpy_2  -> is memcpy SSE
memcpy_3  -> is memcpyxmmU SSE
---------------------------------------------
Quote
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

13 cycles, COPYAtoB_YZZH-   15 bytes- copy length 16 BYTES+MOVZX
13 cycles, crt_memcpy-      15 bytes- copy crt_memcpy
14 cycles, COPYAtoB_SSEE-   15 bytes- copy 16 BYTES+MOVZX

15 cycles, COPYAtoB_SSED-   15 bytes- copy 16 BYTES+MOVZX
15 cycles, COPYAtoB_SSEES-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
16 cycles, COPYAtoB_SSEDS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
16 cycles, COPYAtoB_SSEFS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
16 cycles, COPYAtoB_WZZE-   15 bytes- copy 16 BYTES+MOVZX
17 cycles, COPYAtoB_YZZE-   15 bytes- copy length 16 BYTES+MOVZX
17 cycles, COPYAtoB_SSED-   53 bytes- copy 16 BYTES+MOVZX
17 cycles, COPYAtoB_SSEES-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
17 cycles, COPYAtoB_SSEF-   53 bytes- copy 16 BYTES+MOVZX
17 cycles, COPYAtoB_SSEF-   15 bytes- copy 16 BYTES+MOVZX
17 cycles, COPYAtoB_SSEFS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
17 cycles, COPYAtoB_XZZE-   15 bytes- copy lenght DWORDS+MOVZX
17 cycles, COPYAtoB_XZZC-   15 bytes- copy lenght DWORDS+MOVZX
18 cycles, COPYAtoB_XZZCS-  15 bytes- copy lenght DWORDS+MOVZX- uses ESP
18 cycles, COPYAtoB_XZES-   15 bytes- copy lenght DWORDS+MOVZX- uses ESP
18 cycles, COPYAtoB_SSEDS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
18 cycles, COPYAtoB_YZZG-   15 bytes- copy length 16 BYTES+MOVZX
18 cycles, COPYAtoB_XZZES-  15 bytes- copy lenght DWORDS+MOVZX- uses ESP
19 cycles, COPYAtoB_XZZFS-  15 bytes- copy lenght DWORDS+MOVZX- uses ESP
19 cycles, COPYAtoB_XZZF-   15 bytes- copy lenght DWORDS+MOVZX
21 cycles, COPYAtoB_SSEE-   53 bytes- copy 16 BYTES+MOVZX
22 cycles, COPYAtoB_YZZH-   53 bytes- copy length 16 BYTES+MOVZX
22 cycles, COPYAtoB_YZZG-   53 bytes- copy length 16 BYTES+MOVZX
24 cycles, COPYAtoB_WZZE-   53 bytes- copy 16 BYTES+MOVZX
25 cycles, memcpy_1-        53 bytes- copy regcopy
25 cycles, COPYAtoB_XZZFS-  53 bytes- copy lenght DWORDS+MOVZX- uses ESP
26 cycles, COPYAtoB_YZZE-   53 bytes- copy length 16 BYTES+MOVZX
26 cycles, COPYAtoB_XZZES-  53 bytes- copy lenght DWORDS+MOVZX- uses ESP
26 cycles, COPYAtoB_XZZF-   53 bytes- copy lenght DWORDS+MOVZX
27 cycles, COPYAtoB_XZZE-   53 bytes- copy lenght DWORDS+MOVZX
28 cycles, COPYAtoB_XZZC-   53 bytes- copy lenght DWORDS+MOVZX
28 cycles, memcpy_1-        15 bytes- copy regcopy
31 cycles, COPYAtoB_SSEGS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
32 cycles, COPYAtoB_SSEG-   53 bytes- copy 16 BYTES+MOVZX
32 cycles, memcpy_2-        15 bytes- copy memcpy SSE
32 cycles, COPYAtoB_SSEH-   53 bytes- copy 16 BYTES+MOVZX
34 cycles, memcpy_3-        15 bytes- copy memcpyxmmU SSE
34 cycles, COPYAtoB_SSEHS-  53 bytes- copy 16 BYTES+MOVZX- uses ESP
36 cycles, COPYAtoB_XZZCS-  53 bytes- copy lenght DWORDS+MOVZX- uses ESP
37 cycles, crt_memcpy-      53 bytes- copy crt_memcpy

37 cycles, COPYAtoB_YZZG-  103 bytes- copy length 16 BYTES+MOVZX
37 cycles, COPYAtoB_XZES-   53 bytes- copy lenght DWORDS+MOVZX- uses ESP
37 cycles, COPYAtoB_YZZH-  103 bytes- copy length 16 BYTES+MOVZX
38 cycles, memcpy_2-        53 bytes- copy memcpy SSE
38 cycles, COPYAtoB_SSEES- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
38 cycles, COPYAtoB_SSEDS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
38 cycles, COPYAtoB_SSEFS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
39 cycles, memcpy_1-       103 bytes- copy regcopy
39 cycles, memcpy_3-        53 bytes- copy memcpyxmmU SSE
40 cycles, COPYAtoB_SSED-  103 bytes- copy 16 BYTES+MOVZX
40 cycles, COPYAtoB_SSEF-  103 bytes- copy 16 BYTES+MOVZX
40 cycles, COPYAtoB_WZZE-  103 bytes- copy 16 BYTES+MOVZX
40 cycles, COPYAtoB_SSEE-  103 bytes- copy 16 BYTES+MOVZX
41 cycles, COPYAtoB_XZE-    15 bytes- copy lenght DWORDS+MOVZX
42 cycles, COPYAtoB_YZZE-  103 bytes- copy length 16 BYTES+MOVZX
47 cycles, crt_memcpy-     103 bytes- copy crt_memcpy
48 cycles, COPYAtoB_SSEGS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
48 cycles, COPYAtoB_SSEH-  103 bytes- copy 16 BYTES+MOVZX
49 cycles, memcpy_2-       103 bytes- copy memcpy SSE
49 cycles, COPYAtoB_XZZES- 103 bytes- copy lenght DWORDS+MOVZX- uses ESP
50 cycles, COPYAtoB_XZZFS- 103 bytes- copy lenght DWORDS+MOVZX- uses ESP
50 cycles, COPYAtoB_SSEHS- 103 bytes- copy 16 BYTES+MOVZX- uses ESP
51 cycles, memcpy_3-       103 bytes- copy memcpyxmmU SSE  <<<<<<<-----

51 cycles, COPYAtoB_SSEES- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
51 cycles, COPYAtoB_XZZE-  103 bytes- copy lenght DWORDS+MOVZX
51 cycles, COPYAtoB_SSEFS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
52 cycles, COPYAtoB_SSED-  203 bytes- copy 16 BYTES+MOVZX
52 cycles, COPYAtoB_SSEDS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
52 cycles, COPYAtoB_SSEG-  103 bytes- copy 16 BYTES+MOVZX
52 cycles, COPYAtoB_XZZC-  103 bytes- copy lenght DWORDS+MOVZX
52 cycles, COPYAtoB_XZZF-  103 bytes- copy lenght DWORDS+MOVZX
52 cycles, COPYAtoB_SSEF-  203 bytes- copy 16 BYTES+MOVZX
52 cycles, COPYAtoB_XZE-   103 bytes- copy lenght DWORDS+MOVZX
52 cycles, COPYAtoB_SSEE-  203 bytes- copy 16 BYTES+MOVZX
55 cycles, memcpy_3-       203 bytes- copy memcpyxmmU SSE      <<<<<<<-----
59 cycles, COPYAtoB_XZE-    53 bytes- copy lenght DWORDS+MOVZX
59 cycles, memcpy_2-       203 bytes- copy memcpy SSE
64 cycles, COPYAtoB_YZZH-  203 bytes- copy length 16 BYTES+MOVZX
64 cycles, COPYAtoB_YZZG-  203 bytes- copy length 16 BYTES+MOVZX
65 cycles, COPYAtoB_WZZE-  203 bytes- copy 16 BYTES+MOVZX
68 cycles, COPYAtoB_YZZE-  203 bytes- copy length 16 BYTES+MOVZX
68 cycles, memcpy_1-       203 bytes- copy regcopy
69 cycles, crt_memcpy-     203 bytes- copy crt_memcpy

69 cycles, memcpy_3-       503 bytes- copy memcpyxmmU SSE
70 cycles, COPYAtoB_SSEG-  203 bytes- copy 16 BYTES+MOVZX
72 cycles, COPYAtoB_XZZCS- 103 bytes- copy lenght DWORDS+MOVZX- uses ESP
72 cycles, COPYAtoB_SSEE-  503 bytes- copy 16 BYTES+MOVZX
72 cycles, COPYAtoB_XZES-  103 bytes- copy lenght DWORDS+MOVZX- uses ESP
75 cycles, COPYAtoB_SSEGS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
76 cycles, COPYAtoB_SSEH-   15 bytes- copy 16 BYTES+MOVZX
79 cycles, COPYAtoB_SSEGS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
80 cycles, COPYAtoB_SSEG-   15 bytes- copy 16 BYTES+MOVZX
82 cycles, COPYAtoB_SSEH-  203 bytes- copy 16 BYTES+MOVZX
84 cycles, COPYAtoB_SSEGS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
87 cycles, COPYAtoB_SSEHS- 203 bytes- copy 16 BYTES+MOVZX- uses ESP
88 cycles, COPYAtoB_SSEG-  503 bytes- copy 16 BYTES+MOVZX
92 cycles, COPYAtoB_SSEHS-  15 bytes- copy 16 BYTES+MOVZX- uses ESP
93 cycles, crt_memcpy-     503 bytes- copy crt_memcpy
93 cycles, COPYAtoB_SSED-  503 bytes- copy 16 BYTES+MOVZX
94 cycles, COPYAtoB_SSEFS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
94 cycles, COPYAtoB_SSEDS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
94 cycles, COPYAtoB_SSEF-  503 bytes- copy 16 BYTES+MOVZX
107 cycles, COPYAtoB_SSEES- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
108 cycles, COPYAtoB_XZZFS- 203 bytes- copy lenght DWORDS+MOVZX- uses ESP
109 cycles, COPYAtoB_XZZES- 203 bytes- copy lenght DWORDS+MOVZX- uses ESP
110 cycles, COPYAtoB_XZZF-  203 bytes- copy lenght DWORDS+MOVZX
110 cycles, COPYAtoB_XZZC-  203 bytes- copy lenght DWORDS+MOVZX
111 cycles, COPYAtoB_XZZE-  203 bytes- copy lenght DWORDS+MOVZX
111 cycles, COPYAtoB_SSEH-  503 bytes- copy 16 BYTES+MOVZX
111 cycles, COPYAtoB_XZE-   203 bytes- copy lenght DWORDS+MOVZX

115 cycles, memcpy_3-      1027 bytes- copy memcpyxmmU SSE
129 cycles, COPYAtoB_SSEHS- 503 bytes- copy 16 BYTES+MOVZX- uses ESP
130 cycles, memcpy_2-       503 bytes- copy memcpy SSE
135 cycles, COPYAtoB_SSEE- 1027 bytes- copy 16 BYTES+MOVZX

137 cycles, crt_memcpy-    1027 bytes- copy crt_memcpy
137 cycles, COPYAtoB_XZES-  203 bytes- copy lenght DWORDS+MOVZX- uses ESP
137 cycles, COPYAtoB_XZZCS- 203 bytes- copy lenght DWORDS+MOVZX- uses ESP
143 cycles, COPYAtoB_YZZE-  503 bytes- copy length 16 BYTES+MOVZX
145 cycles, memcpy_1-       503 bytes- copy regcopy
150 cycles, COPYAtoB_YZZH-  503 bytes- copy length 16 BYTES+MOVZX
150 cycles, COPYAtoB_YZZG-  503 bytes- copy length 16 BYTES+MOVZX
153 cycles, COPYAtoB_SSEG- 1027 bytes- copy 16 BYTES+MOVZX
153 cycles, COPYAtoB_SSEGS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
160 cycles, COPYAtoB_SSEDS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
161 cycles, COPYAtoB_SSED- 1027 bytes- copy 16 BYTES+MOVZX
161 cycles, COPYAtoB_SSEF- 1027 bytes- copy 16 BYTES+MOVZX
162 cycles, COPYAtoB_SSEFS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
162 cycles, COPYAtoB_SSEES-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
165 cycles, COPYAtoB_SSEHS-1027 bytes- copy 16 BYTES+MOVZX- uses ESP
169 cycles, COPYAtoB_WZZE-  503 bytes- copy 16 BYTES+MOVZX
170 cycles, COPYAtoB_SSEH- 1027 bytes- copy 16 BYTES+MOVZX
185 cycles, memcpy_2-      1027 bytes- copy memcpy SSE

192 cycles, memcpy_3-      2062 bytes- copy memcpyxmmU SSE
216 cycles, crt_memcpy-    2062 bytes- copy crt_memcpy
239 cycles, COPYAtoB_XZZES- 503 bytes- copy lenght DWORDS+MOVZX- uses ESP
241 cycles, COPYAtoB_XZZE-  503 bytes- copy lenght DWORDS+MOVZX
241 cycles, COPYAtoB_XZE-   503 bytes- copy lenght DWORDS+MOVZX
242 cycles, COPYAtoB_XZZC-  503 bytes- copy lenght DWORDS+MOVZX
242 cycles, COPYAtoB_XZZFS- 503 bytes- copy lenght DWORDS+MOVZX- uses ESP
243 cycles, COPYAtoB_XZZF-  503 bytes- copy lenght DWORDS+MOVZX
268 cycles, COPYAtoB_XZES-  503 bytes- copy lenght DWORDS+MOVZX- uses ESP
269 cycles, COPYAtoB_XZZCS- 503 bytes- copy lenght DWORDS+MOVZX- uses ESP
275 cycles, COPYAtoB_SSEE- 2062 bytes- copy 16 BYTES+MOVZX

278 cycles, COPYAtoB_YZZE- 1027 bytes- copy length 16 BYTES+MOVZX
283 cycles, COPYAtoB_YZZH- 1027 bytes- copy length 16 BYTES+MOVZX
284 cycles, COPYAtoB_YZZG- 1027 bytes- copy length 16 BYTES+MOVZX
286 cycles, memcpy_1-      1027 bytes- copy regcopy

292 cycles, COPYAtoB_SSED- 2062 bytes- copy 16 BYTES+MOVZX
293 cycles, COPYAtoB_SSEF- 2062 bytes- copy 16 BYTES+MOVZX
294 cycles, COPYAtoB_SSEES-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
294 cycles, COPYAtoB_SSEDS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
294 cycles, COPYAtoB_SSEFS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
296 cycles, memcpy_2-      2062 bytes- copy memcpy SSE
310 cycles, COPYAtoB_SSEG- 2062 bytes- copy 16 BYTES+MOVZX
319 cycles, COPYAtoB_SSEGS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
329 cycles, COPYAtoB_WZZE- 1027 bytes- copy 16 BYTES+MOVZX
331 cycles, COPYAtoB_SSEHS-2062 bytes- copy 16 BYTES+MOVZX- uses ESP
331 cycles, COPYAtoB_SSEH- 2062 bytes- copy 16 BYTES+MOVZX
468 cycles, COPYAtoB_XZZFS-1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
469 cycles, COPYAtoB_XZZES-1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
470 cycles, COPYAtoB_XZZF- 1027 bytes- copy lenght DWORDS+MOVZX
471 cycles, COPYAtoB_XZZE- 1027 bytes- copy lenght DWORDS+MOVZX
471 cycles, COPYAtoB_XZZC- 1027 bytes- copy lenght DWORDS+MOVZX
473 cycles, COPYAtoB_XZE-  1027 bytes- copy lenght DWORDS+MOVZX
497 cycles, COPYAtoB_XZES- 1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
499 cycles, COPYAtoB_XZZCS-1027 bytes- copy lenght DWORDS+MOVZX- uses ESP
509 cycles, COPYAtoB_YZZE- 2062 bytes- copy length 16 BYTES+MOVZX
568 cycles, COPYAtoB_YZZH- 2062 bytes- copy length 16 BYTES+MOVZX
570 cycles, COPYAtoB_YZZG- 2062 bytes- copy length 16 BYTES+MOVZX
583 cycles, memcpy_1-      2062 bytes- copy regcopy
677 cycles, COPYAtoB_WZZE- 2062 bytes- copy 16 BYTES+MOVZX
919 cycles, COPYAtoB_XZZFS-2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
919 cycles, COPYAtoB_XZZES-2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
920 cycles, COPYAtoB_XZZE- 2062 bytes- copy lenght DWORDS+MOVZX
921 cycles, COPYAtoB_XZE-  2062 bytes- copy lenght DWORDS+MOVZX
922 cycles, COPYAtoB_XZZC- 2062 bytes- copy lenght DWORDS+MOVZX
922 cycles, COPYAtoB_XZZF- 2062 bytes- copy lenght DWORDS+MOVZX
948 cycles, COPYAtoB_XZZCS-2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
948 cycles, COPYAtoB_XZES- 2062 bytes- copy lenght DWORDS+MOVZX- uses ESP
********** END III **********
Title: Re: Sorting strings
Post by: RuiLoureiro on July 10, 2014, 03:03:09 AM
Hi,
        This time I used the crt_memcpy, memcpy_1, memcpy_2, memcpy_3 and memcpy_4
        procedures and some SSE procedures.
       
        Gunther, could you post the results from your i7,
        if you don't mind?
        Thank you.
       
        These are my results.
Quote
NOT ALIGNED
----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Time table *****

  34 cycles, COPYAtoB_SSEE-  15 bytes- copy 16 BYTES+MOVZX
  35 cycles, COPYAtoB_YZZI-  15 bytes- copy length 16 BYTES+MOVZX
  35 cycles, crt_memcpy-     15 bytes- copy crt_memcpy
  35 cycles, COPYAtoB_XZZF-  15 bytes- copy lenght DWORDS+MOVZX
  37 cycles, COPYAtoB_WZZE-  15 bytes- copy 16 BYTES+MOVZX
  37 cycles, COPYAtoB_YZZH-  15 bytes- copy length 16 BYTES+MOVZX
  37 cycles, COPYAtoB_XZZE-  15 bytes- copy lenght DWORDS+MOVZX
  37 cycles, COPYAtoB_XZZC-  15 bytes- copy lenght DWORDS+MOVZX
  39 cycles, COPYAtoB_YZZG-  15 bytes- copy length 16 BYTES+MOVZX
  39 cycles, COPYAtoB_YZZE-  15 bytes- copy length 16 BYTES+MOVZX
  47 cycles, COPYAtoB_XZE-   15 bytes- copy lenght DWORDS+MOVZX
  50 cycles, COPYAtoB_SSEI-  15 bytes- copy 16 BYTES+MOVZX
  51 cycles, COPYAtoB_SSEJ-  15 bytes- copy 16 BYTES+MOVZX
  61 cycles, COPYAtoB_SSEH-  15 bytes- copy 16 BYTES+MOVZX
  68 cycles, memcpy_1-       53 bytes- copy regcopy
  72 cycles, memcpy_1-       15 bytes- copy regcopy
 
  72 cycles, COPYAtoB_YZZI-  53 bytes- copy length 16 BYTES+MOVZX
  73 cycles, memcpy_4-       15 bytes- copy memcpy * 8
  73 cycles, COPYAtoB_YZZH-  53 bytes- copy length 16 BYTES+MOVZX
  75 cycles, COPYAtoB_YZZG-  53 bytes- copy length 16 BYTES+MOVZX
  75 cycles, COPYAtoB_XZZC-  53 bytes- copy lenght DWORDS+MOVZX
  76 cycles, COPYAtoB_XZE-   53 bytes- copy lenght DWORDS+MOVZX
  76 cycles, COPYAtoB_XZZF-  53 bytes- copy lenght DWORDS+MOVZX
  78 cycles, COPYAtoB_XZZE-  53 bytes- copy lenght DWORDS+MOVZX
  78 cycles, COPYAtoB_YZZE-  53 bytes- copy length 16 BYTES+MOVZX
  89 cycles, COPYAtoB_WZZE-  53 bytes- copy 16 BYTES+MOVZX
  90 cycles, COPYAtoB_SSEJ-  53 bytes- copy 16 BYTES+MOVZX
  94 cycles, COPYAtoB_SSEI-  53 bytes- copy 16 BYTES+MOVZX
  98 cycles, COPYAtoB_SSEH-  53 bytes- copy 16 BYTES+MOVZX
100 cycles, COPYAtoB_SSEE-  53 bytes- copy 16 BYTES+MOVZX

103 cycles, memcpy_4-       53 bytes- copy memcpy * 8
106 cycles, memcpy_1-      103 bytes- copy regcopy
108 cycles, COPYAtoB_YZZG- 103 bytes- copy length 16 BYTES+MOVZX
108 cycles, COPYAtoB_YZZI- 103 bytes- copy length 16 BYTES+MOVZX
108 cycles, crt_memcpy-     53 bytes- copy crt_memcpy
109 cycles, COPYAtoB_YZZH- 103 bytes- copy length 16 BYTES+MOVZX
111 cycles, COPYAtoB_YZZE- 103 bytes- copy length 16 BYTES+MOVZX
112 cycles, COPYAtoB_WZZE- 103 bytes- copy 16 BYTES+MOVZX
121 cycles, COPYAtoB_XZZF- 103 bytes- copy lenght DWORDS+MOVZX
122 cycles, COPYAtoB_XZZC- 103 bytes- copy lenght DWORDS+MOVZX
125 cycles, COPYAtoB_XZE-  103 bytes- copy lenght DWORDS+MOVZX
130 cycles, COPYAtoB_XZZE- 103 bytes- copy lenght DWORDS+MOVZX
132 cycles, memcpy_4-      103 bytes- copy memcpy * 8
139 cycles, crt_memcpy-    103 bytes- copy crt_memcpy

172 cycles, COPYAtoB_YZZG- 203 bytes- copy length 16 BYTES+MOVZX
173 cycles, COPYAtoB_YZZI- 203 bytes- copy length 16 BYTES+MOVZX
173 cycles, COPYAtoB_YZZE- 203 bytes- copy length 16 BYTES+MOVZX
174 cycles, COPYAtoB_YZZH- 203 bytes- copy length 16 BYTES+MOVZX
177 cycles, memcpy_1-      203 bytes- copy regcopy
206 cycles, COPYAtoB_SSEJ- 103 bytes- copy 16 BYTES+MOVZX
209 cycles, COPYAtoB_SSEI- 103 bytes- copy 16 BYTES+MOVZX
211 cycles, COPYAtoB_SSEE- 103 bytes- copy 16 BYTES+MOVZX
212 cycles, COPYAtoB_WZZE- 203 bytes- copy 16 BYTES+MOVZX
212 cycles, memcpy_4-      203 bytes- copy memcpy * 8
213 cycles, COPYAtoB_XZZE- 203 bytes- copy lenght DWORDS+MOVZX
213 cycles, COPYAtoB_XZZC- 203 bytes- copy lenght DWORDS+MOVZX
214 cycles, COPYAtoB_SSEH- 103 bytes- copy 16 BYTES+MOVZX
214 cycles, COPYAtoB_XZZF- 203 bytes- copy lenght DWORDS+MOVZX
225 cycles, crt_memcpy-    203 bytes- copy crt_memcpy
231 cycles, COPYAtoB_XZE-  203 bytes- copy lenght DWORDS+MOVZX
250 cycles, memcpy_2-       15 bytes- copy memcpy SSE
256 cycles, memcpy_3-       15 bytes- copy memcpyxmmU SSE
277 cycles, memcpy_3-       53 bytes- copy memcpyxmmU SSE
278 cycles, memcpy_2-       53 bytes- copy memcpy SSE
379 cycles, memcpy_3-      103 bytes- copy memcpyxmmU SSE
393 cycles, memcpy_2-      103 bytes- copy memcpy SSE
394 cycles, COPYAtoB_YZZI- 503 bytes- copy length 16 BYTES+MOVZX
395 cycles, COPYAtoB_YZZG- 503 bytes- copy length 16 BYTES+MOVZX
398 cycles, COPYAtoB_YZZH- 503 bytes- copy length 16 BYTES+MOVZX

412 cycles, COPYAtoB_SSEE- 203 bytes- copy 16 BYTES+MOVZX
414 cycles, memcpy_1-      503 bytes- copy regcopy
418 cycles, COPYAtoB_YZZE- 503 bytes- copy length 16 BYTES+MOVZX
427 cycles, COPYAtoB_XZZC- 503 bytes- copy lenght DWORDS+MOVZX
428 cycles, COPYAtoB_WZZE- 503 bytes- copy 16 BYTES+MOVZX
429 cycles, COPYAtoB_SSEH- 203 bytes- copy 16 BYTES+MOVZX
430 cycles, COPYAtoB_SSEI- 203 bytes- copy 16 BYTES+MOVZX
431 cycles, COPYAtoB_SSEJ- 203 bytes- copy 16 BYTES+MOVZX

436 cycles, memcpy_4-      503 bytes- copy memcpy * 8
442 cycles, COPYAtoB_XZZF- 503 bytes- copy lenght DWORDS+MOVZX
444 cycles, COPYAtoB_XZZE- 503 bytes- copy lenght DWORDS+MOVZX
454 cycles, crt_memcpy-    503 bytes- copy crt_memcpy
503 cycles, COPYAtoB_XZE-  503 bytes- copy lenght DWORDS+MOVZX
620 cycles, memcpy_2-      203 bytes- copy memcpy SSE
631 cycles, memcpy_3-      203 bytes- copy memcpyxmmU SSE

726 cycles, memcpy_1-     1027 bytes- copy regcopy
734 cycles, COPYAtoB_YZZG-1027 bytes- copy length 16 BYTES+MOVZX
769 cycles, COPYAtoB_YZZI-1027 bytes- copy length 16 BYTES+MOVZX
787 cycles, COPYAtoB_YZZH-1027 bytes- copy length 16 BYTES+MOVZX
788 cycles, COPYAtoB_YZZE-1027 bytes- copy length 16 BYTES+MOVZX
808 cycles, COPYAtoB_XZZE-1027 bytes- copy lenght DWORDS+MOVZX
815 cycles, COPYAtoB_XZE- 1027 bytes- copy lenght DWORDS+MOVZX
815 cycles, COPYAtoB_XZZC-1027 bytes- copy lenght DWORDS+MOVZX
817 cycles, COPYAtoB_WZZE-1027 bytes- copy 16 BYTES+MOVZX
825 cycles, memcpy_4-     1027 bytes- copy memcpy * 8
835 cycles, crt_memcpy-   1027 bytes- copy crt_memcpy
851 cycles, COPYAtoB_XZZF-1027 bytes- copy lenght DWORDS+MOVZX

965 cycles, COPYAtoB_SSEH- 503 bytes- copy 16 BYTES+MOVZX
968 cycles, COPYAtoB_SSEJ- 503 bytes- copy 16 BYTES+MOVZX
970 cycles, COPYAtoB_SSEI- 503 bytes- copy 16 BYTES+MOVZX
989 cycles, COPYAtoB_SSEE- 503 bytes- copy 16 BYTES+MOVZX
1142 cycles, memcpy_2-      503 bytes- copy memcpy SSE
1163 cycles, memcpy_3-      503 bytes- copy memcpyxmmU SSE
1529 cycles, COPYAtoB_YZZI-2062 bytes- copy length 16 BYTES+MOVZX
1538 cycles, COPYAtoB_YZZG-2062 bytes- copy length 16 BYTES+MOVZX
1563 cycles, COPYAtoB_XZZC-2062 bytes- copy lenght DWORDS+MOVZX
1581 cycles, COPYAtoB_YZZE-2062 bytes- copy length 16 BYTES+MOVZX
1585 cycles, COPYAtoB_WZZE-2062 bytes- copy 16 BYTES+MOVZX
1603 cycles, memcpy_1-     2062 bytes- copy regcopy
1622 cycles, COPYAtoB_XZZF-2062 bytes- copy lenght DWORDS+MOVZX
1622 cycles, memcpy_4-     2062 bytes- copy memcpy * 8
1626 cycles, crt_memcpy-   2062 bytes- copy crt_memcpy
1626 cycles, COPYAtoB_XZE- 2062 bytes- copy lenght DWORDS+MOVZX
1630 cycles, COPYAtoB_XZZE-2062 bytes- copy lenght DWORDS+MOVZX
1734 cycles, COPYAtoB_YZZH-2062 bytes- copy length 16 BYTES+MOVZX
2029 cycles, COPYAtoB_SSEE-1027 bytes- copy 16 BYTES+MOVZX
2060 cycles, COPYAtoB_SSEH-1027 bytes- copy 16 BYTES+MOVZX
2074 cycles, COPYAtoB_SSEJ-1027 bytes- copy 16 BYTES+MOVZX
2097 cycles, memcpy_3-     1027 bytes- copy memcpyxmmU SSE
2119 cycles, COPYAtoB_SSEI-1027 bytes- copy 16 BYTES+MOVZX
2123 cycles, memcpy_2-     1027 bytes- copy memcpy SSE
4043 cycles, COPYAtoB_SSEE-2062 bytes- copy 16 BYTES+MOVZX
4065 cycles, COPYAtoB_SSEI-2062 bytes- copy 16 BYTES+MOVZX
4070 cycles, COPYAtoB_SSEH-2062 bytes- copy 16 BYTES+MOVZX
4086 cycles, COPYAtoB_SSEJ-2062 bytes- copy 16 BYTES+MOVZX
4088 cycles, memcpy_2-     2062 bytes- copy memcpy SSE
4391 cycles, memcpy_3-     2062 bytes- copy memcpyxmmU SSE
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on July 10, 2014, 04:06:42 AM
Hi Rui,

The results for CopyString54 are attached as CopyStrin54.zip. I hope that helps.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 10, 2014, 04:44:14 AM
Hi Gunther,
            It helps, thank you.  :t
            Now I want to test another procedure.
            Could you run CopyString55, please?
            Thanks.
       
These are my results.
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
----------------------------------------------------
***** Time table *****

34 cycles, COPYAtoB_YZZI-   15 bytes- copy length 16 BYTES+MOVZX
35 cycles, COPYAtoB_SSEE-   15 bytes- copy 16 BYTES+MOVZX
35 cycles, crt_memcpy-      15 bytes- copy crt_memcpy
35 cycles, COPYAtoB_XZZF-   15 bytes- copy lenght DWORDS+MOVZX
36 cycles, COPYAtoB_YZZH-   15 bytes- copy length 16 BYTES+MOVZX
36 cycles, COPYAtoB_XZZC-   15 bytes- copy lenght DWORDS+MOVZX
37 cycles, COPYAtoB_XZZE-   15 bytes- copy lenght DWORDS+MOVZX
37 cycles, COPYAtoB_WZZE-   15 bytes- copy 16 BYTES+MOVZX
39 cycles, COPYAtoB_YZZG-   15 bytes- copy length 16 BYTES+MOVZX
39 cycles, COPYAtoB_YZZE-   15 bytes- copy length 16 BYTES+MOVZX
44 cycles, COPYAtoB_XZE-    15 bytes- copy lenght DWORDS+MOVZX
48 cycles, COPYAtoB_SSEK-   15 bytes- copy 16 BYTES+MOVZX
50 cycles, COPYAtoB_SSEJ-   15 bytes- copy 16 BYTES+MOVZX
51 cycles, COPYAtoB_SSEI-   15 bytes- copy 16 BYTES+MOVZX
59 cycles, COPYAtoB_SSEH-   15 bytes- copy 16 BYTES+MOVZX
68 cycles, memcpy_1-        53 bytes- copy regcopy
71 cycles, COPYAtoB_YZZI-   53 bytes- copy length 16 BYTES+MOVZX
71 cycles, COPYAtoB_YZZG-   53 bytes- copy length 16 BYTES+MOVZX
71 cycles, memcpy_4-        15 bytes- copy memcpy * 8
74 cycles, COPYAtoB_WZZE-   53 bytes- copy 16 BYTES+MOVZX
75 cycles, COPYAtoB_XZZE-   53 bytes- copy lenght DWORDS+MOVZX
75 cycles, COPYAtoB_XZZF-   53 bytes- copy lenght DWORDS+MOVZX
77 cycles, COPYAtoB_YZZE-   53 bytes- copy length 16 BYTES+MOVZX
79 cycles, COPYAtoB_XZZC-   53 bytes- copy lenght DWORDS+MOVZX
81 cycles, memcpy_1-        15 bytes- copy regcopy
83 cycles, COPYAtoB_XZE-    53 bytes- copy lenght DWORDS+MOVZX
84 cycles, COPYAtoB_SSEK-   53 bytes- copy 16 BYTES+MOVZX
89 cycles, COPYAtoB_SSEI-   53 bytes- copy 16 BYTES+MOVZX
91 cycles, COPYAtoB_SSEE-   53 bytes- copy 16 BYTES+MOVZX
94 cycles, COPYAtoB_SSEJ-   53 bytes- copy 16 BYTES+MOVZX
100 cycles, COPYAtoB_YZZH-   53 bytes- copy length 16 BYTES+MOVZX
100 cycles, COPYAtoB_SSEH-   53 bytes- copy 16 BYTES+MOVZX
102 cycles, memcpy_4-        53 bytes- copy memcpy * 8
107 cycles, crt_memcpy-      53 bytes- copy crt_memcpy
107 cycles, memcpy_1-       103 bytes- copy regcopy
108 cycles, COPYAtoB_YZZG-  103 bytes- copy length 16 BYTES+MOVZX
108 cycles, COPYAtoB_YZZI-  103 bytes- copy length 16 BYTES+MOVZX
110 cycles, COPYAtoB_YZZE-  103 bytes- copy length 16 BYTES+MOVZX
110 cycles, COPYAtoB_YZZH-  103 bytes- copy length 16 BYTES+MOVZX
112 cycles, COPYAtoB_WZZE-  103 bytes- copy 16 BYTES+MOVZX
121 cycles, COPYAtoB_XZZF-  103 bytes- copy lenght DWORDS+MOVZX
122 cycles, COPYAtoB_XZZC-  103 bytes- copy lenght DWORDS+MOVZX
123 cycles, COPYAtoB_XZZE-  103 bytes- copy lenght DWORDS+MOVZX
133 cycles, memcpy_4-       103 bytes- copy memcpy * 8
134 cycles, COPYAtoB_XZE-   103 bytes- copy lenght DWORDS+MOVZX
139 cycles, crt_memcpy-     103 bytes- copy crt_memcpy
170 cycles, COPYAtoB_YZZG-  203 bytes- copy length 16 BYTES+MOVZX
175 cycles, COPYAtoB_YZZH-  203 bytes- copy length 16 BYTES+MOVZX
175 cycles, memcpy_1-       203 bytes- copy regcopy
179 cycles, COPYAtoB_YZZE-  203 bytes- copy length 16 BYTES+MOVZX
188 cycles, COPYAtoB_YZZI-  203 bytes- copy length 16 BYTES+MOVZX
198 cycles, COPYAtoB_SSEK-  103 bytes- copy 16 BYTES+MOVZX
207 cycles, COPYAtoB_SSEI-  103 bytes- copy 16 BYTES+MOVZX
210 cycles, COPYAtoB_WZZE-  203 bytes- copy 16 BYTES+MOVZX
211 cycles, COPYAtoB_SSEE-  103 bytes- copy 16 BYTES+MOVZX
211 cycles, COPYAtoB_XZZC-  203 bytes- copy lenght DWORDS+MOVZX
212 cycles, COPYAtoB_SSEH-  103 bytes- copy 16 BYTES+MOVZX
212 cycles, COPYAtoB_XZZE-  203 bytes- copy lenght DWORDS+MOVZX
213 cycles, memcpy_4-       203 bytes- copy memcpy * 8
214 cycles, COPYAtoB_XZZF-  203 bytes- copy lenght DWORDS+MOVZX
216 cycles, COPYAtoB_SSEJ-  103 bytes- copy 16 BYTES+MOVZX
224 cycles, crt_memcpy-     203 bytes- copy crt_memcpy
233 cycles, COPYAtoB_XZE-   203 bytes- copy lenght DWORDS+MOVZX
250 cycles, memcpy_2-        15 bytes- copy memcpy SSE
254 cycles, memcpy_3-        15 bytes- copy memcpyxmmU SSE
276 cycles, memcpy_3-        53 bytes- copy memcpyxmmU SSE
299 cycles, memcpy_2-        53 bytes- copy memcpy SSE
381 cycles, memcpy_3-       103 bytes- copy memcpyxmmU SSE
392 cycles, COPYAtoB_YZZG-  503 bytes- copy length 16 BYTES+MOVZX
392 cycles, COPYAtoB_YZZI-  503 bytes- copy length 16 BYTES+MOVZX
393 cycles, memcpy_1-       503 bytes- copy regcopy
397 cycles, COPYAtoB_YZZH-  503 bytes- copy length 16 BYTES+MOVZX
403 cycles, memcpy_2-       103 bytes- copy memcpy SSE
408 cycles, COPYAtoB_YZZE-  503 bytes- copy length 16 BYTES+MOVZX
408 cycles, COPYAtoB_SSEE-  203 bytes- copy 16 BYTES+MOVZX
421 cycles, COPYAtoB_XZZC-  503 bytes- copy lenght DWORDS+MOVZX
424 cycles, COPYAtoB_XZZE-  503 bytes- copy lenght DWORDS+MOVZX
428 cycles, COPYAtoB_SSEH-  203 bytes- copy 16 BYTES+MOVZX
429 cycles, COPYAtoB_SSEI-  203 bytes- copy 16 BYTES+MOVZX
435 cycles, COPYAtoB_SSEJ-  203 bytes- copy 16 BYTES+MOVZX
437 cycles, COPYAtoB_WZZE-  503 bytes- copy 16 BYTES+MOVZX
440 cycles, crt_memcpy-     503 bytes- copy crt_memcpy
447 cycles, COPYAtoB_XZZF-  503 bytes- copy lenght DWORDS+MOVZX
448 cycles, COPYAtoB_XZE-   503 bytes- copy lenght DWORDS+MOVZX
452 cycles, COPYAtoB_SSEK-  203 bytes- copy 16 BYTES+MOVZX
464 cycles, memcpy_4-       503 bytes- copy memcpy * 8
623 cycles, memcpy_2-       203 bytes- copy memcpy SSE
631 cycles, memcpy_3-       203 bytes- copy memcpyxmmU SSE
728 cycles, COPYAtoB_YZZI- 1027 bytes- copy length 16 BYTES+MOVZX
732 cycles, COPYAtoB_YZZG- 1027 bytes- copy length 16 BYTES+MOVZX
732 cycles, memcpy_1-      1027 bytes- copy regcopy
733 cycles, COPYAtoB_YZZH- 1027 bytes- copy length 16 BYTES+MOVZX
787 cycles, COPYAtoB_YZZE- 1027 bytes- copy length 16 BYTES+MOVZX
806 cycles, COPYAtoB_XZZF- 1027 bytes- copy lenght DWORDS+MOVZX
807 cycles, COPYAtoB_XZE-  1027 bytes- copy lenght DWORDS+MOVZX
830 cycles, memcpy_4-      1027 bytes- copy memcpy * 8
830 cycles, COPYAtoB_XZZE- 1027 bytes- copy lenght DWORDS+MOVZX
850 cycles, crt_memcpy-    1027 bytes- copy crt_memcpy
864 cycles, COPYAtoB_WZZE- 1027 bytes- copy 16 BYTES+MOVZX
874 cycles, COPYAtoB_XZZC- 1027 bytes- copy lenght DWORDS+MOVZX
960 cycles, COPYAtoB_SSEK-  503 bytes- copy 16 BYTES+MOVZX
964 cycles, COPYAtoB_SSEI-  503 bytes- copy 16 BYTES+MOVZX
965 cycles, COPYAtoB_SSEJ-  503 bytes- copy 16 BYTES+MOVZX
965 cycles, COPYAtoB_SSEH-  503 bytes- copy 16 BYTES+MOVZX
986 cycles, COPYAtoB_SSEE-  503 bytes- copy 16 BYTES+MOVZX
1166 cycles, memcpy_2-      503 bytes- copy memcpy SSE
1241 cycles, memcpy_3-      503 bytes- copy memcpyxmmU SSE
1526 cycles, COPYAtoB_YZZH-2062 bytes- copy length 16 BYTES+MOVZX
1527 cycles, COPYAtoB_YZZG-2062 bytes- copy length 16 BYTES+MOVZX
1536 cycles, COPYAtoB_YZZI-2062 bytes- copy length 16 BYTES+MOVZX
1556 cycles, memcpy_1-     2062 bytes- copy regcopy
1559 cycles, COPYAtoB_XZZE-2062 bytes- copy lenght DWORDS+MOVZX
1561 cycles, COPYAtoB_XZE- 2062 bytes- copy lenght DWORDS+MOVZX
1571 cycles, COPYAtoB_XZZF-2062 bytes- copy lenght DWORDS+MOVZX
1574 cycles, COPYAtoB_YZZE-2062 bytes- copy length 16 BYTES+MOVZX
1577 cycles, COPYAtoB_WZZE-2062 bytes- copy 16 BYTES+MOVZX
1585 cycles, COPYAtoB_XZZC-2062 bytes- copy lenght DWORDS+MOVZX
1606 cycles, memcpy_4-     2062 bytes- copy memcpy * 8
1620 cycles, crt_memcpy-   2062 bytes- copy crt_memcpy
2019 cycles, COPYAtoB_SSEE-1027 bytes- copy 16 BYTES+MOVZX
2038 cycles, COPYAtoB_SSEK-1027 bytes- copy 16 BYTES+MOVZX
2049 cycles, COPYAtoB_SSEH-1027 bytes- copy 16 BYTES+MOVZX
2051 cycles, COPYAtoB_SSEJ-1027 bytes- copy 16 BYTES+MOVZX
2061 cycles, COPYAtoB_SSEI-1027 bytes- copy 16 BYTES+MOVZX
2110 cycles, memcpy_3-     1027 bytes- copy memcpyxmmU SSE
2116 cycles, memcpy_2-     1027 bytes- copy memcpy SSE
4029 cycles, COPYAtoB_SSEI-2062 bytes- copy 16 BYTES+MOVZX
4031 cycles, COPYAtoB_SSEJ-2062 bytes- copy 16 BYTES+MOVZX
4042 cycles, COPYAtoB_SSEE-2062 bytes- copy 16 BYTES+MOVZX
4046 cycles, COPYAtoB_SSEK-2062 bytes- copy 16 BYTES+MOVZX
4057 cycles, COPYAtoB_SSEH-2062 bytes- copy 16 BYTES+MOVZX
4085 cycles, memcpy_2-     2062 bytes- copy memcpy SSE
4389 cycles, memcpy_3-     2062 bytes- copy memcpyxmmU SSE
********** END III **********
Title: Re: Sorting strings
Post by: Gunther on July 10, 2014, 04:55:12 AM
Rui,

results are attached.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 10, 2014, 08:19:48 PM
Hi Gunther,
            Thank you for your work  :t
            I am trying to understand the behaviour of
            each procedure.
            COPYAtoB_SSEJ is EQUAL to COPYAtoB_SSEK,
            except that 'J' has 1 push + 1 pop and 'K' does not,
            yet 'K' shows about 30 cycles more.
            I don't believe that 'J' is better than 'K'.
            Something seems to be wrong here.
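One way to check whether such a gap is a measurement artifact is to time each procedure in several batches and keep only the smallest value, since interrupts and cache effects can only make a run slower. The following is a minimal C sketch of that idea, not the timing code used by the CopyString tests. The names copy_fn and min_ticks are hypothetical, and clock() is a coarse stand-in for the cycle counter used in the tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with the memcpy signature can be plugged in. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical harness: run `trials` batches of `reps` calls each and
   return the smallest elapsed clock() tick count of any batch. */
static clock_t min_ticks(copy_fn fn, void *dst, const void *src, size_t n,
                         int trials, int reps)
{
    clock_t best = 0;

    assert(fn != NULL && trials > 0 && reps > 0);

    for (int t = 0; t < trials; t++) {
        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            fn(dst, src, n);
        clock_t elapsed = clock() - start;

        /* keep the fastest batch; slower ones include outside disturbance */
        if (t == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

For example, min_ticks(memcpy, dst, src, 503, 5, 1000) times five batches of 1000 copies of 503 bytes. If two routines that differ by only a push and a pop still show a stable gap under this kind of repetition, the cause is more likely in the code itself than in the measurement.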
---------------------------
Results from Gunther
---------------------------
Quote
CopyString54.txt
-------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

14 cycles, crt_memcpy-     15 bytes- copy crt_memcpy
17 cycles, COPYAtoB_SSEJ-  53 bytes- copy 16 BYTES+MOVZX           <<<<---J

23 cycles, COPYAtoB_SSEJ- 103 bytes- copy 16 BYTES+MOVZX           <<<<---J
25 cycles, memcpy_1-       53 bytes- copy regcopy
27 cycles, memcpy_1-       15 bytes- copy regcopy
27 cycles, COPYAtoB_SSEJ-  15 bytes- copy 16 BYTES+MOVZX
30 cycles, memcpy_4-       15 bytes- copy memcpy * 8
32 cycles, memcpy_2-       15 bytes- copy memcpy SSE
34 cycles, memcpy_3-       15 bytes- copy memcpyxmmU SSE
35 cycles, memcpy_4-       53 bytes- copy memcpy * 8
36 cycles, crt_memcpy-     53 bytes- copy crt_memcpy
38 cycles, memcpy_2-       53 bytes- copy memcpy SSE
39 cycles, memcpy_3-       53 bytes- copy memcpyxmmU SSE
39 cycles, memcpy_1-      103 bytes- copy regcopy

41 cycles, COPYAtoB_SSEJ- 203 bytes- copy 16 BYTES+MOVZX           <<<<---J
44 cycles, memcpy_2-      103 bytes- copy memcpy SSE
46 cycles, memcpy_4-      103 bytes- copy memcpy * 8
47 cycles, crt_memcpy-    103 bytes- copy crt_memcpy
48 cycles, memcpy_3-      103 bytes- copy memcpyxmmU SSE
50 cycles, memcpy_3-      203 bytes- copy memcpyxmmU SSE
60 cycles, memcpy_2-      203 bytes- copy memcpy SSE
67 cycles, memcpy_4-      203 bytes- copy memcpy * 8
68 cycles, memcpy_1-      203 bytes- copy regcopy
68 cycles, crt_memcpy-    203 bytes- copy crt_memcpy

69 cycles, memcpy_3-      503 bytes- copy memcpyxmmU SSE           >>>>>---
82 cycles, COPYAtoB_SSEJ- 503 bytes- copy 16 BYTES+MOVZX           <<<<---J
86 cycles, memcpy_4-      503 bytes- copy memcpy * 8
93 cycles, crt_memcpy-    503 bytes- copy crt_memcpy

102 cycles, memcpy_3-     1027 bytes- copy memcpyxmmU SSE           >>>>>---
122 cycles, memcpy_4-     1027 bytes- copy memcpy * 8
129 cycles, crt_memcpy-   1027 bytes- copy crt_memcpy
132 cycles, memcpy_2-      503 bytes- copy memcpy SSE
132 cycles, COPYAtoB_SSEJ-1027 bytes- copy 16 BYTES+MOVZX           <<<<---J
146 cycles, memcpy_1-      503 bytes- copy regcopy

184 cycles, memcpy_3-     2062 bytes- copy memcpyxmmU SSE           >>>>>---
187 cycles, memcpy_2-     1027 bytes- copy memcpy SSE
210 cycles, memcpy_4-     2062 bytes- copy memcpy * 8
215 cycles, crt_memcpy-   2062 bytes- copy crt_memcpy
262 cycles, COPYAtoB_SSEJ-2062 bytes- copy 16 BYTES+MOVZX           <<<<---J

286 cycles, memcpy_1-     1027 bytes- copy regcopy
297 cycles, memcpy_2-     2062 bytes- copy memcpy SSE
582 cycles, memcpy_1-     2062 bytes- copy regcopy
********** END III **********
Quote
CopyString55.txt
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
***** Time table *****

14 cycles, crt_memcpy-     15 bytes- copy crt_memcpy
17 cycles, COPYAtoB_SSEK-  53 bytes- copy 16 BYTES+MOVZX       <<<<<---- K
17 cycles, COPYAtoB_SSEJ-  53 bytes- copy 16 BYTES+MOVZX       <<<<<---- J

25 cycles, memcpy_1-       53 bytes- copy regcopy
26 cycles, COPYAtoB_SSEJ-  15 bytes- copy 16 BYTES+MOVZX       <<<<<---- J
29 cycles, memcpy_1-       15 bytes- copy regcopy
30 cycles, COPYAtoB_SSEK-  15 bytes- copy 16 BYTES+MOVZX       <<<<<---- K
31 cycles, memcpy_4-       15 bytes- copy memcpy * 8
33 cycles, memcpy_2-       15 bytes- copy memcpy SSE
35 cycles, memcpy_3-       15 bytes- copy memcpyxmmU SSE
36 cycles, memcpy_4-       53 bytes- copy memcpy * 8
37 cycles, COPYAtoB_SSEJ- 103 bytes- copy 16 BYTES+MOVZX       <<<<<---- J
37 cycles, COPYAtoB_SSEK- 103 bytes- copy 16 BYTES+MOVZX       <<<<<---- K

37 cycles, crt_memcpy-     53 bytes- copy crt_memcpy
38 cycles, memcpy_2-       53 bytes- copy memcpy SSE
39 cycles, memcpy_3-       53 bytes- copy memcpyxmmU SSE
41 cycles, memcpy_1-      103 bytes- copy regcopy
46 cycles, memcpy_4-      103 bytes- copy memcpy * 8
47 cycles, crt_memcpy-    103 bytes- copy crt_memcpy
48 cycles, COPYAtoB_SSEJ- 203 bytes- copy 16 BYTES+MOVZX       <<<<<---- J
51 cycles, COPYAtoB_SSEK- 203 bytes- copy 16 BYTES+MOVZX       <<<<<---- K

52 cycles, memcpy_2-      103 bytes- copy memcpy SSE
53 cycles, memcpy_3-      103 bytes- copy memcpyxmmU SSE
55 cycles, memcpy_3-      203 bytes- copy memcpyxmmU SSE
62 cycles, memcpy_2-      203 bytes- copy memcpy SSE
67 cycles, COPYAtoB_SSEJ- 503 bytes- copy 16 BYTES+MOVZX       <<<<<---- J

68 cycles, memcpy_4-      203 bytes- copy memcpy * 8
69 cycles, memcpy_1-      203 bytes- copy regcopy
70 cycles, crt_memcpy-    203 bytes- copy crt_memcpy
72 cycles, memcpy_3-      503 bytes- copy memcpyxmmU SSE       >>>>>>>--
88 cycles, memcpy_4-      503 bytes- copy memcpy * 8
94 cycles, COPYAtoB_SSEK- 503 bytes- copy 16 BYTES+MOVZX       <<<<<---- K
97 cycles, crt_memcpy-    503 bytes- copy crt_memcpy

119 cycles, memcpy_3-     1027 bytes- copy memcpyxmmU SSE       >>>>>>--
126 cycles, memcpy_2-      503 bytes- copy memcpy SSE
133 cycles, memcpy_4-     1027 bytes- copy memcpy * 8
133 cycles, COPYAtoB_SSEJ-1027 bytes- copy 16 BYTES+MOVZX       <<<<<---- J
138 cycles, crt_memcpy-   1027 bytes- copy crt_memcpy
149 cycles, memcpy_1-      503 bytes- copy regcopy
162 cycles, COPYAtoB_SSEK-1027 bytes- copy 16 BYTES+MOVZX       <<<<<---- K
194 cycles, memcpy_2-     1027 bytes- copy memcpy SSE

196 cycles, memcpy_3-     2062 bytes- copy memcpyxmmU SSE       >>>>>>--
219 cycles, crt_memcpy-   2062 bytes- copy crt_memcpy
220 cycles, memcpy_4-     2062 bytes- copy memcpy * 8
262 cycles, COPYAtoB_SSEJ-2062 bytes- copy 16 BYTES+MOVZX       <<<<<---- J
289 cycles, memcpy_1-     1027 bytes- copy regcopy
296 cycles, COPYAtoB_SSEK-2062 bytes- copy 16 BYTES+MOVZX       <<<<<---- K

306 cycles, memcpy_2-     2062 bytes- copy memcpy SSE
597 cycles, memcpy_1-     2062 bytes- copy regcopy
********** END III **********
Title: Re: Sorting strings
Post by: RuiLoureiro on July 10, 2014, 09:30:03 PM
Hi Gunther,

            Now I want to test SSEL on your i7.
            Could you run CopyString56, please?
            Thanks.
       
These are my results.
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
...   
  59 cycles, COPYAtoB_SSEL,   15 bytes - copy 16 BYTES+MOVZX
  64 cycles, COPYAtoB_SSEL,   53 bytes - copy 16 BYTES+MOVZX
199 cycles, COPYAtoB_SSEL,  103 bytes - copy 16 BYTES+MOVZX
445 cycles, COPYAtoB_SSEL,  203 bytes - copy 16 BYTES+MOVZX
959 cycles, COPYAtoB_SSEL,  503 bytes - copy 16 BYTES+MOVZX
2026 cycles, COPYAtoB_SSEL, 1027 bytes - copy 16 BYTES+MOVZX
4025 cycles, COPYAtoB_SSEL, 2062 bytes - copy 16 BYTES+MOVZX

  58 cycles, COPYAtoB_SSEJ,   15 bytes - copy 16 BYTES+MOVZX
  91 cycles, COPYAtoB_SSEJ,   53 bytes - copy 16 BYTES+MOVZX
204 cycles, COPYAtoB_SSEJ,  103 bytes - copy 16 BYTES+MOVZX
430 cycles, COPYAtoB_SSEJ,  203 bytes - copy 16 BYTES+MOVZX
974 cycles, COPYAtoB_SSEJ,  503 bytes - copy 16 BYTES+MOVZX
2046 cycles, COPYAtoB_SSEJ, 1027 bytes - copy 16 BYTES+MOVZX
4025 cycles, COPYAtoB_SSEJ, 2062 bytes - copy 16 BYTES+MOVZX

  50 cycles, COPYAtoB_SSEK,   15 bytes - copy 16 BYTES+MOVZX
  88 cycles, COPYAtoB_SSEK,   53 bytes - copy 16 BYTES+MOVZX
189 cycles, COPYAtoB_SSEK,  103 bytes - copy 16 BYTES+MOVZX
429 cycles, COPYAtoB_SSEK,  203 bytes - copy 16 BYTES+MOVZX
962 cycles, COPYAtoB_SSEK,  503 bytes - copy 16 BYTES+MOVZX
2035 cycles, COPYAtoB_SSEK, 1027 bytes - copy 16 BYTES+MOVZX
4044 cycles, COPYAtoB_SSEK, 2062 bytes - copy 16 BYTES+MOVZX
Title: Re: Sorting strings
Post by: Gunther on July 11, 2014, 12:06:29 AM
Rui,

results are attached in c56.zip. I think it's time now to sum up and provide the source code for the best procedures.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 11, 2014, 01:28:19 AM
Quote from: Gunther on July 11, 2014, 12:06:29 AM
Rui,
...
I think it's time now to sum up and provide the source code for the best procedures.
Gunther
Hi Gunther,
            You are doing very good work.
            Thank you so much for it.  :t

I am trying to find the best procedure that uses SSE (on your i7)
without alignment. I think I still need to adjust some code.
But if someone wants to see one of the procs, I will give it to them.
If you want, I will send it to you.

Only now can we compare COPYAtoB_SSEJ with COPYAtoB_SSEK,
and we can see that K is better than J, as I said before.
But it is not good for 15 bytes.

---------------------------------
Some results from Gunther
----------------------------------
Quote
--------------------------------------------------------------
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
--------------------------------------------------------------
13 cycles, crt_memcpy,   15 bytes - copy crt_memcpy
35 cycles, crt_memcpy,   53 bytes - copy crt_memcpy


46 cycles, crt_memcpy,  103 bytes - copy crt_memcpy
72 cycles, crt_memcpy,  203 bytes - copy crt_memcpy
95 cycles, crt_memcpy,  503 bytes - copy crt_memcpy
138 cycles, crt_memcpy, 1027 bytes - copy crt_memcpy
219 cycles, crt_memcpy, 2062 bytes - copy crt_memcpy

40 cycles, memcpy_1,   15 bytes - copy memcpy_1
24 cycles, memcpy_1,   53 bytes - copy memcpy_1
39 cycles, memcpy_1,  103 bytes - copy memcpy_1
69 cycles, memcpy_1,  203 bytes - copy memcpy_1
147 cycles, memcpy_1,  503 bytes - copy memcpy_1
294 cycles, memcpy_1, 1027 bytes - copy memcpy_1
606 cycles, memcpy_1, 2062 bytes - copy memcpy_1

33 cycles, memcpy_2,   15 bytes - copy memcpy_2
38 cycles, memcpy_2,   53 bytes - copy memcpy_2
54 cycles, memcpy_2,  103 bytes - copy memcpy_2
62 cycles, memcpy_2,  203 bytes - copy memcpy_2
125 cycles, memcpy_2,  503 bytes - copy memcpy_2
189 cycles, memcpy_2, 1027 bytes - copy memcpy_2
304 cycles, memcpy_2, 2062 bytes - copy memcpy_2
-----------------------------------------------------------
43 cycles, COPYAtoB_SSEL,   15 bytes - copy 16 BYTES+MOVZX
16 cycles, COPYAtoB_SSEL,   53 bytes - copy 16 BYTES+MOVZX

36 cycles, COPYAtoB_SSEL,  103 bytes - copy 16 BYTES+MOVZX
51 cycles, COPYAtoB_SSEL,  203 bytes - copy 16 BYTES+MOVZX

109 cycles, COPYAtoB_SSEL,  503 bytes - copy 16 BYTES+MOVZX
165 cycles, COPYAtoB_SSEL, 1027 bytes - copy 16 BYTES+MOVZX
301 cycles, COPYAtoB_SSEL, 2062 bytes - copy 16 BYTES+MOVZX
-----------------------------------------------------------
34 cycles, memcpy_3,   15 bytes - copy memcpy_3
39 cycles, memcpy_3,   53 bytes - copy memcpy_3

52 cycles, memcpy_3,  103 bytes - copy memcpy_3
54 cycles, memcpy_3,  203 bytes - copy memcpy_3

71 cycles, memcpy_3,  503 bytes - copy memcpy_3
117 cycles, memcpy_3, 1027 bytes - copy memcpy_3
199 cycles, memcpy_3, 2062 bytes - copy memcpy_3
----------------------------------------------------------
32 cycles, memcpy_4,   15 bytes - copy memcpy_4
35 cycles, memcpy_4,   53 bytes - copy memcpy_4
46 cycles, memcpy_4,  103 bytes - copy memcpy_4
69 cycles, memcpy_4,  203 bytes - copy memcpy_4
87 cycles, memcpy_4,  503 bytes - copy memcpy_4
130 cycles, memcpy_4, 1027 bytes - copy memcpy_4
214 cycles, memcpy_4, 2062 bytes - copy memcpy_4

16 cycles, COPYAtoB_SSEE,   15 bytes - copy 16 BYTES+MOVZX
18 cycles, COPYAtoB_SSEE,   53 bytes - copy 16 BYTES+MOVZX
40 cycles, COPYAtoB_SSEE,  103 bytes - copy 16 BYTES+MOVZX
53 cycles, COPYAtoB_SSEE,  203 bytes - copy 16 BYTES+MOVZX
110 cycles, COPYAtoB_SSEE,  503 bytes - copy 16 BYTES+MOVZX
164 cycles, COPYAtoB_SSEE, 1027 bytes - copy 16 BYTES+MOVZX
300 cycles, COPYAtoB_SSEE, 2062 bytes - copy 16 BYTES+MOVZX

48 cycles, COPYAtoB_SSEH,   15 bytes - copy 16 BYTES+MOVZX
18 cycles, COPYAtoB_SSEH,   53 bytes - copy 16 BYTES+MOVZX
41 cycles, COPYAtoB_SSEH,  103 bytes - copy 16 BYTES+MOVZX
55 cycles, COPYAtoB_SSEH,  203 bytes - copy 16 BYTES+MOVZX
72 cycles, COPYAtoB_SSEH,  503 bytes - copy 16 BYTES+MOVZX
135 cycles, COPYAtoB_SSEH, 1027 bytes - copy 16 BYTES+MOVZX
282 cycles, COPYAtoB_SSEH, 2062 bytes - copy 16 BYTES+MOVZX

27 cycles, COPYAtoB_SSEI,   15 bytes - copy 16 BYTES+MOVZX
15 cycles, COPYAtoB_SSEI,   53 bytes - copy 16 BYTES+MOVZX
39 cycles, COPYAtoB_SSEI,  103 bytes - copy 16 BYTES+MOVZX
49 cycles, COPYAtoB_SSEI,  203 bytes - copy 16 BYTES+MOVZX
68 cycles, COPYAtoB_SSEI,  503 bytes - copy 16 BYTES+MOVZX
134 cycles, COPYAtoB_SSEI, 1027 bytes - copy 16 BYTES+MOVZX
268 cycles, COPYAtoB_SSEI, 2062 bytes - copy 16 BYTES+MOVZX

41 cycles, COPYAtoB_SSEJ,   15 bytes - copy 16 BYTES+MOVZX
20 cycles, COPYAtoB_SSEJ,   53 bytes - copy 16 BYTES+MOVZX
39 cycles, COPYAtoB_SSEJ,  103 bytes - copy 16 BYTES+MOVZX
59 cycles, COPYAtoB_SSEJ,  203 bytes - copy 16 BYTES+MOVZX
115 cycles, COPYAtoB_SSEJ,  503 bytes - copy 16 BYTES+MOVZX
163 cycles, COPYAtoB_SSEJ, 1027 bytes - copy 16 BYTES+MOVZX
309 cycles, COPYAtoB_SSEJ, 2062 bytes - copy 16 BYTES+MOVZX

41 cycles, COPYAtoB_SSEK,   15 bytes - copy 16 BYTES+MOVZX
19 cycles, COPYAtoB_SSEK,   53 bytes - copy 16 BYTES+MOVZX
38 cycles, COPYAtoB_SSEK,  103 bytes - copy 16 BYTES+MOVZX
52 cycles, COPYAtoB_SSEK,  203 bytes - copy 16 BYTES+MOVZX
87 cycles, COPYAtoB_SSEK,  503 bytes - copy 16 BYTES+MOVZX
139 cycles, COPYAtoB_SSEK, 1027 bytes - copy 16 BYTES+MOVZX
277 cycles, COPYAtoB_SSEK, 2062 bytes - copy 16 BYTES+MOVZX
Title: Re: Sorting strings
Post by: RuiLoureiro on July 11, 2014, 02:28:57 AM
Hi Gunther,

        I would like to see the results from CopyString57.
        Could you run it, please?
        Thanks.
(Unfortunately I do not yet have an i7 like yours to run it myself.)

Here are some results.
Quote
memcpy_3 doesn't preserve ESI, EDI

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
  35 cycles, crt_memcpy,   15 bytes - copy crt_memcpy
108 cycles, crt_memcpy,   53 bytes - copy crt_memcpy
139 cycles, crt_memcpy,  103 bytes - copy crt_memcpy
226 cycles, crt_memcpy,  203 bytes - copy crt_memcpy
446 cycles, crt_memcpy,  503 bytes - copy crt_memcpy
836 cycles, crt_memcpy, 1027 bytes - copy crt_memcpy
1670 cycles, crt_memcpy, 2062 bytes - copy crt_memcpy

  82 cycles, memcpy_1,   15 bytes - copy memcpy_1
  68 cycles, memcpy_1,   53 bytes - copy memcpy_1
106 cycles, memcpy_1,  103 bytes - copy memcpy_1
179 cycles, memcpy_1,  203 bytes - copy memcpy_1
400 cycles, memcpy_1,  503 bytes - copy memcpy_1
726 cycles, memcpy_1, 1027 bytes - copy memcpy_1
1596 cycles, memcpy_1, 2062 bytes - copy memcpy_1

251 cycles, memcpy_2,   15 bytes - copy memcpy_2
278 cycles, memcpy_2,   53 bytes - copy memcpy_2
383 cycles, memcpy_2,  103 bytes - copy memcpy_2
623 cycles, memcpy_2,  203 bytes - copy memcpy_2
1141 cycles, memcpy_2,  503 bytes - copy memcpy_2
2115 cycles, memcpy_2, 1027 bytes - copy memcpy_2
4113 cycles, memcpy_2, 2062 bytes - copy memcpy_2

  61 cycles, COPYAtoB_SSEL, 15 bytes - copy 16 BYTES+MOVZX
  73 cycles, COPYAtoB_SSEL, 53 bytes - copy 16 BYTES+MOVZX
198 cycles, COPYAtoB_SSEL, 103 bytes - copy 16 BYTES+MOVZX
441 cycles, COPYAtoB_SSEL, 203 bytes - copy 16 BYTES+MOVZX
957 cycles, COPYAtoB_SSEL, 503 bytes - copy 16 BYTES+MOVZX
2028 cycles, COPYAtoB_SSEL, 1027 bytes - copy 16 BYTES+MOVZX
4037 cycles, COPYAtoB_SSEL, 2062 bytes - copy 16 BYTES+MOVZX

256 cycles, memcpy_3, 15 bytes - copy memcpy_3
279 cycles, memcpy_3, 53 bytes - copy memcpy_3
380 cycles, memcpy_3, 103 bytes - copy memcpy_3
634 cycles, memcpy_3, 203 bytes - copy memcpy_3
1154 cycles, memcpy_3, 503 bytes - copy memcpy_3
2088 cycles, memcpy_3, 1027 bytes - copy memcpy_3
4364 cycles, memcpy_3, 2062 bytes - copy memcpy_3

  72 cycles, memcpy_4,   15 bytes - copy memcpy_4
102 cycles, memcpy_4,   53 bytes - copy memcpy_4
132 cycles, memcpy_4,  103 bytes - copy memcpy_4
213 cycles, memcpy_4,  203 bytes - copy memcpy_4
431 cycles, memcpy_4,  503 bytes - copy memcpy_4
828 cycles, memcpy_4, 1027 bytes - copy memcpy_4
1628 cycles, memcpy_4, 2062 bytes - copy memcpy_4

  50 cycles, COPYAtoB_SSEJ,   15 bytes - copy 16 BYTES+MOVZX
  90 cycles, COPYAtoB_SSEJ,   53 bytes - copy 16 BYTES+MOVZX
209 cycles, COPYAtoB_SSEJ,  103 bytes - copy 16 BYTES+MOVZX
432 cycles, COPYAtoB_SSEJ,  203 bytes - copy 16 BYTES+MOVZX
963 cycles, COPYAtoB_SSEJ,  503 bytes - copy 16 BYTES+MOVZX
2048 cycles, COPYAtoB_SSEJ, 1027 bytes - copy 16 BYTES+MOVZX
4021 cycles, COPYAtoB_SSEJ, 2062 bytes - copy 16 BYTES+MOVZX

  47 cycles, COPYAtoB_SSEK,   15 bytes - copy 16 BYTES+MOVZX
  85 cycles, COPYAtoB_SSEK,   53 bytes - copy 16 BYTES+MOVZX
186 cycles, COPYAtoB_SSEK,  103 bytes - copy 16 BYTES+MOVZX
431 cycles, COPYAtoB_SSEK,  203 bytes - copy 16 BYTES+MOVZX
961 cycles, COPYAtoB_SSEK,  503 bytes - copy 16 BYTES+MOVZX
2038 cycles, COPYAtoB_SSEK, 1027 bytes - copy 16 BYTES+MOVZX
4067 cycles, COPYAtoB_SSEK, 2062 bytes - copy 16 BYTES+MOVZX

  28 cycles, COPYAtoB_SSEM,   15 bytes - copy 16 BYTES+MOVZX
  61 cycles, COPYAtoB_SSEM,   53 bytes - copy 16 BYTES+MOVZX
177 cycles, COPYAtoB_SSEM,  103 bytes - copy 16 BYTES+MOVZX
416 cycles, COPYAtoB_SSEM,  203 bytes - copy 16 BYTES+MOVZX
962 cycles, COPYAtoB_SSEM,  503 bytes - copy 16 BYTES+MOVZX
2056 cycles, COPYAtoB_SSEM, 1027 bytes - copy 16 BYTES+MOVZX
4019 cycles, COPYAtoB_SSEM, 2062 bytes - copy 16 BYTES+MOVZX

  36 cycles, COPYAtoB_SSEN,   15 bytes - copy 16 BYTES+MOVZX
  66 cycles, COPYAtoB_SSEN,   53 bytes - copy 16 BYTES+MOVZX
158 cycles, COPYAtoB_SSEN,  103 bytes - copy 16 BYTES+MOVZX
421 cycles, COPYAtoB_SSEN,  203 bytes - copy 16 BYTES+MOVZX
950 cycles, COPYAtoB_SSEN,  503 bytes - copy 16 BYTES+MOVZX
2043 cycles, COPYAtoB_SSEN, 1027 bytes - copy 16 BYTES+MOVZX
4032 cycles, COPYAtoB_SSEN, 2062 bytes - copy 16 BYTES+MOVZX

  34 cycles, COPYAtoB_SSEA,   15 bytes - copy 16 BYTES+MOVZX
  97 cycles, COPYAtoB_SSEA,   53 bytes - copy 16 BYTES+MOVZX
212 cycles, COPYAtoB_SSEA,  103 bytes - copy 16 BYTES+MOVZX
446 cycles, COPYAtoB_SSEA,  203 bytes - copy 16 BYTES+MOVZX
988 cycles, COPYAtoB_SSEA,  503 bytes - copy 16 BYTES+MOVZX
2065 cycles, COPYAtoB_SSEA, 1027 bytes - copy 16 BYTES+MOVZX
4093 cycles, COPYAtoB_SSEA, 2062 bytes - copy 16 BYTES+MOVZX
Title: Re: Sorting strings
Post by: nidud on July 11, 2014, 05:19:29 AM
deleted
Title: Re: Sorting strings
Post by: RuiLoureiro on July 11, 2014, 06:04:07 AM
Quote
If you compare two functions to test the copy-algorithm
they should be called with the same arguments.
Yes, if you don't want to see the difference between
        invoke  ProcX, pDst, pSrc, Len
        and
        invoke  ProcY, pDst, pSrc (where the length is taken from pSrc).
Quote
The memcpy I wrote have three arguments and is compared
with functions using two, so this should be aligned before the test.
Not in all cases. Some of them have three arguments as well.
        I don't want to compare them that way because
        I have no interest in procedures like ProcX.
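To make the ProcX / ProcY distinction concrete, here is a small C sketch (C rather than MASM, so it can be compiled anywhere). It assumes a hypothetical string layout in which a 32-bit length sits in the four bytes just before the string bytes; the thread does not state the real layout, and the names make_prefixed, copy_with_len and copy_prefixed are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout (not confirmed by the thread):
   [4-byte length][string bytes][zero terminator] */

/* Builds a length-prefixed string in buf and returns a pointer to its bytes. */
char *make_prefixed(unsigned char *buf, const char *text)
{
    uint32_t len = (uint32_t)strlen(text);
    assert(buf != NULL);
    memcpy(buf, &len, sizeof len);               /* store the prefix */
    memcpy(buf + sizeof len, text, len + 1);     /* bytes plus terminator */
    return (char *)(buf + sizeof len);
}

/* ProcX style: the caller supplies the length as a third argument. */
void copy_with_len(char *dst, const char *src, uint32_t len)
{
    memcpy(dst, src, len);
}

/* ProcY style: only two arguments; the length is read from the
   four bytes in front of src. Returns the number of bytes copied. */
uint32_t copy_prefixed(char *dst, const char *src)
{
    uint32_t len;
    memcpy(&len, src - sizeof len, sizeof len);  /* fetch the prefix */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

The two-argument form pays for one extra memory read on every call, which is part of what a timing comparison between the two styles would measure.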
Quote
There should also be a test to see if the algo actually works,
so I added some of the functions used and most of them failed.
I do that, but it is possible that some of them still have bugs.
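A check of that kind could look like the following C sketch. It is only an illustration, not the test used in this thread: the three-argument copy_fn signature, the guard size of 16 bytes and the names copy_is_correct, ref_copy and bad_copy are all assumptions. The idea is to fill the destination with a known byte, run the copy, then verify both the copied bytes and the guard zones on either side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a copy routine under test. */
typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t len);

enum { MAX_LEN = 2062, GUARD = 16, FILL = 0xCC };

/* Returns 1 if fn copies exactly len bytes and leaves the guard
   zones before and after the destination untouched, else 0. */
int copy_is_correct(copy_fn fn, size_t len)
{
    static unsigned char src[MAX_LEN];
    static unsigned char dst[GUARD + MAX_LEN + GUARD];
    size_t i;

    assert(len <= MAX_LEN);
    for (i = 0; i < len; i++)
        src[i] = (unsigned char)(i * 7 + 1);     /* recognisable pattern */
    memset(dst, FILL, sizeof dst);

    fn(dst + GUARD, src, len);

    if (memcmp(dst + GUARD, src, len) != 0)
        return 0;                                /* wrong bytes copied */
    for (i = 0; i < GUARD; i++)
        if (dst[i] != FILL || dst[GUARD + len + i] != FILL)
            return 0;                            /* wrote outside the range */
    return 1;
}

/* A reference copy, and a deliberately broken one that writes one byte too many. */
void ref_copy(unsigned char *d, const unsigned char *s, size_t n) { memcpy(d, s, n); }
void bad_copy(unsigned char *d, const unsigned char *s, size_t n) { memcpy(d, s, n); d[n] = 0; }
```

Running it over the sizes used in the tables (15, 53, 103, 203, 503, 1027, 2062) would flag a routine that copies in 16-byte blocks and overruns the end of the destination.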

I think you tried to modify COPYAtoB_XZE to follow your model,
but this:

  toend:
   mov   eax,dst                ; added
   ret
is of no use to me.

EDIT:
Quote
so I added some of the functions used and most of them failed.
I am reading the COPYAtoB_XZE procedure that
you wrote, and it seems that you need to study
what I wrote.

Where did you get COPYAtoB_XZE?

Title: Re: Sorting strings
Post by: nidud on July 11, 2014, 07:43:40 AM
deleted
Title: Re: Sorting strings
Post by: Gunther on July 11, 2014, 08:03:38 AM
Hi Rui,

results are attached in c57.zip.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 12, 2014, 12:12:52 AM
Quote
As I follow my model you should follow yours.
But you said: «so I added some of the functions used
and most of them failed.»
        What fails?
        You may use COPYAtoB_XZE, etc., but call it
        COPYAtoB_XZEm, etc., because it is not
        exactly the same as COPYAtoB_XZE.
Quote
So why do you include the memcpy_x procedures then?
They are just more procedures to compare.
        You may make whatever comparisons you want;
        the results are here. In my reply 120
        I made one comparison (on my P4). After that
        the question was about SSE on the i7. memcpy_3
        is one that uses movdqu.
        It may serve as a reference for developing
        other procs.
Another thing:
   myPocX  proc uses esi edi dst, src
                 mov   edi,dst
                 mov   esi,src
                 mov   ecx,[edi-4]
   You may do that, but I have nothing to do with it.
Title: Re: Sorting strings
Post by: nidud on July 12, 2014, 03:10:00 AM
deleted
Title: Re: Sorting strings
Post by: RuiLoureiro on July 12, 2014, 07:35:06 AM
Quote from: Gunther on July 11, 2014, 08:03:38 AM
Hi Rui,
results are attached in c57.zip.
Gunther
Thank you Gunther  :t
Title: Re: Sorting strings
Post by: Gunther on July 13, 2014, 08:14:31 PM
Hi Rui,

Quote from: RuiLoureiro on July 12, 2014, 07:35:06 AM
               Thank you Gunther  :t

You're welcome. Did you solve all of your questions?

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 14, 2014, 06:44:24 PM
Hi Gunther,
                   No. I need to do some work yet.
Title: Re: Sorting strings
Post by: Gunther on July 14, 2014, 10:40:15 PM
Hi Rui,

Quote from: RuiLoureiro on July 14, 2014, 06:44:24 PM
                   No. I need to do some work yet.

Good luck.  :icon14:

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 16, 2014, 01:40:38 AM
Thanks Gunther  :icon14:
Hi
    I wrote a new macro to compute the mean values (and...).
    The second mean is computed with the worst case removed.
    For example, we run crt_memcpy 5 times (each 5 times).
    For the second mean, we remove 46768 (First SAMPLE).
    In these tests, crt_memcpy is by far the best.
    Note: COPYAtoB_MOVSB is simply rep movsb.
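The two means can be sketched in C as follows. This is not the macro itself, only an illustration of the arithmetic; the function names mean_cycles and mean_without_worst are made up, and truncating integer division is an assumption that happens to agree with the printed tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain mean of n cycle counts (integer division, truncating). */
uint64_t mean_cycles(const uint32_t *samples, size_t n)
{
    uint64_t sum = 0;
    size_t i;
    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}

/* Mean after discarding the single largest (worst) sample. */
uint64_t mean_without_worst(const uint32_t *samples, size_t n)
{
    uint64_t sum = 0;
    uint32_t worst;
    size_t i;
    assert(n > 1);
    worst = samples[0];
    for (i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] > worst)
            worst = samples[i];
    }
    return (sum - worst) / (n - 1);
}
```

Fed the five crt_memcpy figures of the First SAMPLE (46768, 43461, 41069, 41262, 40815), the first function gives 42675 and the second 41651, the same values that appear in the two Means tables.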
First SAMPLE
Quote
46768 cycles, crt_memcpy - 1...256
43461 cycles, crt_memcpy - 1...256
41069 cycles, crt_memcpy - 1...256
41262 cycles, crt_memcpy - 1...256
40815 cycles, crt_memcpy - 1...256
72366 cycles, memcpy_1 - 1...256
72305 cycles, memcpy_1 - 1...256
74966 cycles, memcpy_1 - 1...256
72619 cycles, memcpy_1 - 1...256
72288 cycles, memcpy_1 - 1...256
109811 cycles, memcpy_2 - 1...256
110395 cycles, memcpy_2 - 1...256
109911 cycles, memcpy_2 - 1...256
110186 cycles, memcpy_2 - 1...256
110124 cycles, memcpy_2 - 1...256
113295 cycles, memcpy_3 - 1...256
113058 cycles, memcpy_3 - 1...256
113821 cycles, memcpy_3 - 1...256
113101 cycles, memcpy_3 - 1...256
112885 cycles, memcpy_3 - 1...256
74974 cycles, memcpy_4 - 1...256
74030 cycles, memcpy_4 - 1...256
73724 cycles, memcpy_4 - 1...256
74066 cycles, memcpy_4 - 1...256
74708 cycles, memcpy_4 - 1...256
75788 cycles, COPYAtoB_XZE - 1...256
73080 cycles, COPYAtoB_XZE - 1...256
72975 cycles, COPYAtoB_XZE - 1...256
73927 cycles, COPYAtoB_XZE - 1...256
73285 cycles, COPYAtoB_XZE - 1...256
121278 cycles, COPYAtoB_MOVSB - 1...256
121220 cycles, COPYAtoB_MOVSB - 1...256
122102 cycles, COPYAtoB_MOVSB - 1...256
121056 cycles, COPYAtoB_MOVSB - 1...256
121417 cycles, COPYAtoB_MOVSB - 1...256
*** Press any key to get the time table ***

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

42675 cycles, crt_memcpy
72908 cycles, memcpy_1
73811 cycles, COPYAtoB_XZE
74300 cycles, memcpy_4
110085 cycles, memcpy_2
113232 cycles, memcpy_3
121414 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

41651 cycles, crt_memcpy
72394 cycles, memcpy_1
73316 cycles, COPYAtoB_XZE
74132 cycles, memcpy_4
110008 cycles, memcpy_2
113084 cycles, memcpy_3
121242 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
Second SAMPLE
Quote
47700 cycles, crt_memcpy - 1...256
41819 cycles, crt_memcpy - 1...256
41564 cycles, crt_memcpy - 1...256
41168 cycles, crt_memcpy - 1...256
41471 cycles, crt_memcpy - 1...256
72293 cycles, memcpy_1 - 1...256
75131 cycles, memcpy_1 - 1...256
74058 cycles, memcpy_1 - 1...256
73880 cycles, memcpy_1 - 1...256
73448 cycles, memcpy_1 - 1...256
111278 cycles, memcpy_2 - 1...256
111630 cycles, memcpy_2 - 1...256
109758 cycles, memcpy_2 - 1...256
110357 cycles, memcpy_2 - 1...256
113262 cycles, memcpy_2 - 1...256
112782 cycles, memcpy_3 - 1...256
114002 cycles, memcpy_3 - 1...256
113916 cycles, memcpy_3 - 1...256
112757 cycles, memcpy_3 - 1...256
113764 cycles, memcpy_3 - 1...256
73673 cycles, memcpy_4 - 1...256
73540 cycles, memcpy_4 - 1...256
73474 cycles, memcpy_4 - 1...256
75833 cycles, memcpy_4 - 1...256
75212 cycles, memcpy_4 - 1...256
74600 cycles, COPYAtoB_XZE - 1...256
73008 cycles, COPYAtoB_XZE - 1...256
73138 cycles, COPYAtoB_XZE - 1...256
73939 cycles, COPYAtoB_XZE - 1...256
73418 cycles, COPYAtoB_XZE - 1...256
121127 cycles, COPYAtoB_MOVSB - 1...256
122576 cycles, COPYAtoB_MOVSB - 1...256
121290 cycles, COPYAtoB_MOVSB - 1...256
121280 cycles, COPYAtoB_MOVSB - 1...256
121743 cycles, COPYAtoB_MOVSB - 1...256
*** Press any key to get the time table ***

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

42744 cycles, crt_memcpy
73620 cycles, COPYAtoB_XZE
73762 cycles, memcpy_1
74346 cycles, memcpy_4
111257 cycles, memcpy_2
113444 cycles, memcpy_3
121603 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

41505 cycles, crt_memcpy
73375 cycles, COPYAtoB_XZE
73419 cycles, memcpy_1
73974 cycles, memcpy_4
110755 cycles, memcpy_2
113304 cycles, memcpy_3
121360 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
The effect of the worst case
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****
...
73463 cycles, COPYAtoB_XZZE
73816 cycles, COPYAtoB_XZE
73919 cycles, COPYAtoB_XZZC
74104 cycles, memcpy_4
74454 cycles, COPYAtoB_YZZE
74939 cycles, COPYAtoB_XZZF
110328 cycles, memcpy_2
113520 cycles, memcpy_3
121795 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****
...
73323 cycles, COPYAtoB_XZZE
73475 cycles, COPYAtoB_XZE
73599 cycles, COPYAtoB_XZZF
73779 cycles, COPYAtoB_XZZC
73885 cycles, memcpy_4
74152 cycles, COPYAtoB_YZZE
110006 cycles, memcpy_2
113457 cycles, memcpy_3
121506 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
Title: Re: Sorting strings
Post by: RuiLoureiro on July 16, 2014, 08:22:43 PM
Hi,
    In the following test I did this:

JUMPCODE            MACRO
                    LOCAL       label0
                    invoke      Sleep, 100          ; pause 100 ms before each group of runs
                    jmp         label0              ; jump over the padding below
                    db 4096 dup('x')                ; 4096 bytes of padding between groups
    label0:
ENDM

REPEAT 3
        JUMPCODE
        BEGIN_COUNTER_CYCLE_HIGH_PRIORITY_CLASS  $start, $end
        invoke      PROCEDURE_???, ...
        END_COUNTER_CYCLE <...>
        ...
        ; (the measurement above is repeated 5 times)
       ;-----------------------------------------------------
        JUMPCODE
        BEGIN_COUNTER_CYCLE_HIGH_PRIORITY_CLASS  $start, $end
        invoke      newPROCEDURE_???, ...
        END_COUNTER_CYCLE <...>
        ...
        ; (the measurement above is repeated 5 times)
       ;-----------------------------------------------------
        ...
ENDM
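The shape of that loop, as I read it (3 rounds, and in each round every procedure is measured 5 times), can be sketched in C like this. It is a structural illustration only: the names run_rounds, fake_now, proc_a and proc_b are made up, the fake clock stands in for the real cycle counter, and the Sleep-and-padding step is only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROUNDS = 3, RUNS = 5 };

typedef void     (*proc_fn)(void);   /* procedure under test */
typedef uint64_t (*tick_fn)(void);   /* returns the current tick count */

/* Times every procedure RUNS times per round, for ROUNDS rounds.
   out must have room for ROUNDS * nprocs * RUNS entries.
   Returns the number of samples written. */
size_t run_rounds(const proc_fn *procs, size_t nprocs, tick_fn now, uint64_t *out)
{
    size_t k = 0;
    size_t p;
    int round, run;

    assert(procs != NULL && now != NULL && out != NULL);
    for (round = 0; round < ROUNDS; round++) {
        for (p = 0; p < nprocs; p++) {
            /* the JUMPCODE step (Sleep plus padding) would go here */
            for (run = 0; run < RUNS; run++) {
                uint64_t start = now();
                procs[p]();
                out[k++] = now() - start;
            }
        }
    }
    return k;
}

/* Stand-ins for illustration only: a fake clock that advances
   10 ticks per read, and two empty procedures. */
static uint64_t fake_ticks;
uint64_t fake_now(void) { fake_ticks += 10; return fake_ticks; }
void     proc_a(void)   { }
void     proc_b(void)   { }
```

With two procedures this yields 3 * 2 * 5 = 30 samples, which could then be averaged per procedure with and without the worst case.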

Here are the results:
SAMPLE 1

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

178495 cycles, crt_memcpy
178933 cycles, crt_memcpy
243835 cycles, COPYAtoB_SSEK
243843 cycles, COPYAtoB_SSEN
245876 cycles, COPYAtoB_SSEB
247870 cycles, COPYAtoB_SSEM
248160 cycles, COPYAtoB_SSEJ
251895 cycles, COPYAtoB_SSEH
252662 cycles, COPYAtoB_SSEY
252845 cycles, COPYAtoB_SSEE
258006 cycles, COPYAtoB_SSEI
260916 cycles, COPYAtoB_WZZE
261092 cycles, COPYAtoB_SSEL
262787 cycles, COPYAtoB_YZZH
263338 cycles, COPYAtoB_YZZI
263700 cycles, COPYAtoB_YZZK
263713 cycles, COPYAtoB_YZZJ
267937 cycles, COPYAtoB_SSEX
269713 cycles, COPYAtoB_XZZC
270168 cycles, COPYAtoB_XZZF
270763 cycles, COPYAtoB_YZZG
276750 cycles, memcpy_4
279076 cycles, COPYAtoB_YZZE
279560 cycles, COPYAtoB_WZZF
288258 cycles, memcpy_1
295782 cycles, COPYAtoB_XZE
297157 cycles, MOVEAtoB_SSEC
300198 cycles, COPYAtoB_SSEA
303091 cycles, COPYAtoB_SSEC
305265 cycles, COPYAtoB_XZZE
325851 cycles, memcpy_2
335337 cycles, memcpy_3
461265 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, crt_memcpy
241691 cycles, COPYAtoB_SSEB
243236 cycles, COPYAtoB_SSEM
243653 cycles, COPYAtoB_SSEK
243672 cycles, COPYAtoB_SSEN
247654 cycles, COPYAtoB_SSEH
247750 cycles, COPYAtoB_SSEJ
251988 cycles, COPYAtoB_SSEY
252137 cycles, COPYAtoB_SSEE
256390 cycles, COPYAtoB_SSEI
260426 cycles, COPYAtoB_WZZE
260548 cycles, COPYAtoB_SSEL
262277 cycles, memcpy_1
262317 cycles, COPYAtoB_SSEX
262507 cycles, COPYAtoB_YZZH
262724 cycles, COPYAtoB_YZZI
262966 cycles, COPYAtoB_YZZK
263086 cycles, COPYAtoB_YZZJ
267991 cycles, COPYAtoB_YZZG
268505 cycles, COPYAtoB_XZZC
268694 cycles, COPYAtoB_XZZF
271661 cycles, COPYAtoB_YZZE
273190 cycles, memcpy_4
278803 cycles, COPYAtoB_WZZF
294645 cycles, COPYAtoB_XZE
296811 cycles, MOVEAtoB_SSEC
299978 cycles, COPYAtoB_SSEA
301585 cycles, COPYAtoB_XZZE
302374 cycles, COPYAtoB_SSEC
325495 cycles, memcpy_2
328389 cycles, memcpy_3
459663 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

SAMPLE 2

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

178495 cycles, crt_memcpy
178933 cycles, COPYAtoB_MOVSB
243835 cycles, COPYAtoB_SSEI
243843 cycles, COPYAtoB_SSEY
245876 cycles, COPYAtoB_SSEJ
247870 cycles, COPYAtoB_SSEE
248160 cycles, COPYAtoB_WZZE
251895 cycles, COPYAtoB_YZZH
252662 cycles, COPYAtoB_SSEK
252845 cycles, COPYAtoB_YZZI
258006 cycles, COPYAtoB_SSEL
260916 cycles, COPYAtoB_SSEX
261092 cycles, COPYAtoB_YZZK
262787 cycles, COPYAtoB_YZZE
263338 cycles, memcpy_4
263700 cycles, COPYAtoB_XZZC
263713 cycles, COPYAtoB_YZZG
267937 cycles, COPYAtoB_SSEN
269713 cycles, COPYAtoB_XZE
270168 cycles, memcpy_1
270763 cycles, COPYAtoB_XZZF
276750 cycles, COPYAtoB_SSEC
279076 cycles, COPYAtoB_WZZF
279560 cycles, COPYAtoB_YZZJ
288258 cycles, memcpy_3
295782 cycles, COPYAtoB_SSEA
297157 cycles, crt_memcpy
300198 cycles, COPYAtoB_SSEH
303091 cycles, COPYAtoB_SSEM
305265 cycles, MOVEAtoB_SSEC
325851 cycles, memcpy_2
335337 cycles, COPYAtoB_XZZE
461265 cycles, COPYAtoB_SSEB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_MOVSB
241691 cycles, COPYAtoB_SSEJ
243236 cycles, COPYAtoB_SSEE
243653 cycles, COPYAtoB_SSEI
243672 cycles, COPYAtoB_SSEY
247654 cycles, COPYAtoB_YZZH
247750 cycles, COPYAtoB_WZZE
251988 cycles, COPYAtoB_SSEK
252137 cycles, COPYAtoB_YZZI
256390 cycles, COPYAtoB_SSEL
260426 cycles, COPYAtoB_SSEX
260548 cycles, COPYAtoB_YZZK
262277 cycles, memcpy_3
262317 cycles, COPYAtoB_SSEN
262507 cycles, COPYAtoB_YZZE
262724 cycles, memcpy_4
262966 cycles, COPYAtoB_XZZC
263086 cycles, COPYAtoB_YZZG
267991 cycles, COPYAtoB_XZZF
268505 cycles, COPYAtoB_XZE
268694 cycles, memcpy_1
271661 cycles, COPYAtoB_WZZF
273190 cycles, COPYAtoB_SSEC
278803 cycles, COPYAtoB_YZZJ
294645 cycles, COPYAtoB_SSEA
296811 cycles, crt_memcpy
299978 cycles, COPYAtoB_SSEH
301585 cycles, MOVEAtoB_SSEC
302374 cycles, COPYAtoB_SSEM
325495 cycles, memcpy_2
328389 cycles, COPYAtoB_XZZE
459663 cycles, COPYAtoB_SSEB
********** END SortMeans **********

SAMPLE 3

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

178495 cycles, crt_memcpy
178933 cycles, COPYAtoB_SSEB
243835 cycles, COPYAtoB_SSEL
243843 cycles, COPYAtoB_SSEK
245876 cycles, COPYAtoB_WZZE
247870 cycles, COPYAtoB_YZZI
248160 cycles, COPYAtoB_SSEX
251895 cycles, COPYAtoB_YZZE
252662 cycles, COPYAtoB_SSEI
252845 cycles, memcpy_4
258006 cycles, COPYAtoB_YZZK
260916 cycles, COPYAtoB_SSEN
261092 cycles, COPYAtoB_XZZC
262787 cycles, COPYAtoB_WZZF
263338 cycles, COPYAtoB_SSEC
263700 cycles, COPYAtoB_XZE
263713 cycles, COPYAtoB_XZZF
267937 cycles, COPYAtoB_SSEY
269713 cycles, COPYAtoB_SSEA
270168 cycles, memcpy_3
270763 cycles, memcpy_1
276750 cycles, COPYAtoB_SSEM
279076 cycles, COPYAtoB_YZZJ
279560 cycles, COPYAtoB_YZZG
288258 cycles, COPYAtoB_XZZE
295782 cycles, COPYAtoB_SSEH
297157 cycles, COPYAtoB_MOVSB
300198 cycles, COPYAtoB_YZZH
303091 cycles, COPYAtoB_SSEE
305265 cycles, crt_memcpy
325851 cycles, memcpy_2
335337 cycles, MOVEAtoB_SSEC
461265 cycles, COPYAtoB_SSEJ
********** END SortMeans **********


-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_SSEB
241691 cycles, COPYAtoB_WZZE
243236 cycles, COPYAtoB_YZZI
243653 cycles, COPYAtoB_SSEL
243672 cycles, COPYAtoB_SSEK
247654 cycles, COPYAtoB_YZZE
247750 cycles, COPYAtoB_SSEX
251988 cycles, COPYAtoB_SSEI
252137 cycles, memcpy_4
256390 cycles, COPYAtoB_YZZK
260426 cycles, COPYAtoB_SSEN
260548 cycles, COPYAtoB_XZZC
262277 cycles, COPYAtoB_XZZE
262317 cycles, COPYAtoB_SSEY
262507 cycles, COPYAtoB_WZZF
262724 cycles, COPYAtoB_SSEC
262966 cycles, COPYAtoB_XZE
263086 cycles, COPYAtoB_XZZF
267991 cycles, memcpy_1
268505 cycles, COPYAtoB_SSEA
268694 cycles, memcpy_3
271661 cycles, COPYAtoB_YZZJ
273190 cycles, COPYAtoB_SSEM
278803 cycles, COPYAtoB_YZZG
294645 cycles, COPYAtoB_SSEH
296811 cycles, COPYAtoB_MOVSB
299978 cycles, COPYAtoB_YZZH
301585 cycles, crt_memcpy
302374 cycles, COPYAtoB_SSEE
325495 cycles, memcpy_2
328389 cycles, MOVEAtoB_SSEC
459663 cycles, COPYAtoB_SSEJ
********** END SortMeans **********

Title: Re: Sorting strings
Post by: RuiLoureiro on July 16, 2014, 08:26:41 PM
Now, see the tables of the mean values.

SAMPLE 1
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means- worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, crt_memcpy
241691 cycles, COPYAtoB_SSEB
243236 cycles, COPYAtoB_SSEM
243653 cycles, COPYAtoB_SSEK
243672 cycles, COPYAtoB_SSEN
247654 cycles, COPYAtoB_SSEH
247750 cycles, COPYAtoB_SSEJ
251988 cycles, COPYAtoB_SSEY
252137 cycles, COPYAtoB_SSEE
256390 cycles, COPYAtoB_SSEI
260426 cycles, COPYAtoB_WZZE
260548 cycles, COPYAtoB_SSEL
262277 cycles, memcpy_1
262317 cycles, COPYAtoB_SSEX
262507 cycles, COPYAtoB_YZZH
262724 cycles, COPYAtoB_YZZI
262966 cycles, COPYAtoB_YZZK
263086 cycles, COPYAtoB_YZZJ
267991 cycles, COPYAtoB_YZZG
268505 cycles, COPYAtoB_XZZC
268694 cycles, COPYAtoB_XZZF
271661 cycles, COPYAtoB_YZZE
273190 cycles, memcpy_4
278803 cycles, COPYAtoB_WZZF
294645 cycles, COPYAtoB_XZE
296811 cycles, MOVEAtoB_SSEC
299978 cycles, COPYAtoB_SSEA
301585 cycles, COPYAtoB_XZZE
302374 cycles, COPYAtoB_SSEC
325495 cycles, memcpy_2
328389 cycles, memcpy_3
459663 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
What happened here?

COPYAtoB_SSEB is last, and one of the crt_memcpy runs
is not first. COPYAtoB_MOVSB is second.

SAMPLE 2
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_MOVSB
241691 cycles, COPYAtoB_SSEJ
243236 cycles, COPYAtoB_SSEE
243653 cycles, COPYAtoB_SSEI
243672 cycles, COPYAtoB_SSEY
247654 cycles, COPYAtoB_YZZH
247750 cycles, COPYAtoB_WZZE
251988 cycles, COPYAtoB_SSEK
252137 cycles, COPYAtoB_YZZI
256390 cycles, COPYAtoB_SSEL
260426 cycles, COPYAtoB_SSEX
260548 cycles, COPYAtoB_YZZK
262277 cycles, memcpy_3
262317 cycles, COPYAtoB_SSEN
262507 cycles, COPYAtoB_YZZE
262724 cycles, memcpy_4
262966 cycles, COPYAtoB_XZZC
263086 cycles, COPYAtoB_YZZG
267991 cycles, COPYAtoB_XZZF
268505 cycles, COPYAtoB_XZE
268694 cycles, memcpy_1
271661 cycles, COPYAtoB_WZZF
273190 cycles, COPYAtoB_SSEC
278803 cycles, COPYAtoB_YZZJ
294645 cycles, COPYAtoB_SSEA
296811 cycles, crt_memcpy
299978 cycles, COPYAtoB_SSEH
301585 cycles, MOVEAtoB_SSEC
302374 cycles, COPYAtoB_SSEM
325495 cycles, memcpy_2
328389 cycles, COPYAtoB_XZZE
459663 cycles, COPYAtoB_SSEB
********** END SortMeans **********
Here, COPYAtoB_SSEB is very close to crt_memcpy

SAMPLE 3
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_SSEB
241691 cycles, COPYAtoB_WZZE
243236 cycles, COPYAtoB_YZZI
243653 cycles, COPYAtoB_SSEL
243672 cycles, COPYAtoB_SSEK
247654 cycles, COPYAtoB_YZZE
247750 cycles, COPYAtoB_SSEX
251988 cycles, COPYAtoB_SSEI
252137 cycles, memcpy_4
256390 cycles, COPYAtoB_YZZK
260426 cycles, COPYAtoB_SSEN
260548 cycles, COPYAtoB_XZZC
262277 cycles, COPYAtoB_XZZE
262317 cycles, COPYAtoB_SSEY
262507 cycles, COPYAtoB_WZZF
262724 cycles, COPYAtoB_SSEC
262966 cycles, COPYAtoB_XZE
263086 cycles, COPYAtoB_XZZF
267991 cycles, memcpy_1
268505 cycles, COPYAtoB_SSEA
268694 cycles, memcpy_3
271661 cycles, COPYAtoB_YZZJ
273190 cycles, COPYAtoB_SSEM
278803 cycles, COPYAtoB_YZZG
294645 cycles, COPYAtoB_SSEH
296811 cycles, COPYAtoB_MOVSB
299978 cycles, COPYAtoB_YZZH
301585 cycles, crt_memcpy
302374 cycles, COPYAtoB_SSEE
325495 cycles, memcpy_2
328389 cycles, MOVEAtoB_SSEC
459663 cycles, COPYAtoB_SSEJ
********** END SortMeans **********
----------------------------------------------------------
EDIT:
Sorry, I put REPEAT 3 in the wrong place.
The problem is with the messages.
The results are always very regular.
The SSE COPYAtoB_SSEL is far from crt_memcpy.

COPYAtoB_SSEL uses only one push/pop of ebx.
It copies forward 16 bytes at a time (one movdqu),
and the remainder is copied byte by byte.
If the length is less than 16, it copies byte by byte forward.
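
As a rough illustration of the scheme described above (this is not the COPYAtoB_SSEL source, which is not posted here), a copy loop of this kind could look like the sketch below. The name CopySketch and its parameter names are made up for the example, and it assumes an assembler that accepts SSE2 instructions (.686 and .xmm enabled) and a flat stdcall model.

```asm
; Hypothetical sketch, NOT the author's COPYAtoB_SSEL.
; Assumes an SSE2-capable assembler and a flat stdcall model.
.686
.xmm
.model flat, stdcall
option casemap:none

.code
; "uses ebx" makes the assembler emit exactly one push ebx / pop ebx.
CopySketch proc uses ebx pSrc:DWORD, pDst:DWORD, nBytes:DWORD
    mov     eax, pSrc           ; eax -> source
    mov     edx, pDst           ; edx -> destination
    mov     ecx, nBytes         ; ecx = bytes still to copy
    jmp     chk16
blk16:
    movdqu  xmm0, [eax]         ; load 16 bytes, any alignment
    movdqu  [edx], xmm0         ; store 16 bytes, any alignment
    add     eax, 16
    add     edx, 16
    sub     ecx, 16
chk16:
    cmp     ecx, 16
    jae     blk16               ; loop while at least 16 bytes remain
    jmp     chk1
tail1:
    mov     bl, [eax]           ; remainder (or lengths below 16): byte by byte
    mov     [edx], bl
    inc     eax
    inc     edx
    dec     ecx
chk1:
    test    ecx, ecx
    jnz     tail1
    ret
CopySketch endp
end
```

A call would look like `invoke CopySketch, ADDR src, ADDR dst, 100`, where src and dst are buffers of at least 100 bytes (also assumptions of the example). The sketch does not write a null terminator.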

COPYAtoB_SSEX and COPYAtoB_SSEY use
the idea from reply #140.
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

171473 cycles, crt_memcpy
183587 cycles, crt_memcpy
265709 cycles, COPYAtoB_SSEL
266661 cycles, COPYAtoB_SSEI
267588 cycles, COPYAtoB_SSEJ
268569 cycles, COPYAtoB_SSEK
268958 cycles, COPYAtoB_SSEM
269628 cycles, COPYAtoB_SSEN
270791 cycles, COPYAtoB_SSEH
271806 cycles, COPYAtoB_SSEB
278893 cycles, COPYAtoB_SSEY
279461 cycles, COPYAtoB_SSEX
282468 cycles, COPYAtoB_SSEE
285235 cycles, memcpy_1
286518 cycles, COPYAtoB_WZZF
287311 cycles, COPYAtoB_YZZI
287359 cycles, COPYAtoB_YZZE
289489 cycles, COPYAtoB_YZZH
289555 cycles, COPYAtoB_YZZG
295935 cycles, COPYAtoB_YZZJ
296890 cycles, COPYAtoB_YZZK
297993 cycles, COPYAtoB_WZZE
298908 cycles, memcpy_4
300577 cycles, MOVEAtoB_SSEC
301085 cycles, COPYAtoB_XZE
301981 cycles, COPYAtoB_XZZC
302835 cycles, COPYAtoB_XZZF
303579 cycles, COPYAtoB_SSEC
304886 cycles, COPYAtoB_SSEA
321286 cycles, COPYAtoB_XZZE
337077 cycles, memcpy_3
338400 cycles, memcpy_2
377135 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case removed *****

169657 cycles, crt_memcpy
177504 cycles, crt_memcpy
265000 cycles, COPYAtoB_SSEL
265024 cycles, COPYAtoB_SSEK
266432 cycles, COPYAtoB_SSEI
266540 cycles, COPYAtoB_SSEJ
268096 cycles, COPYAtoB_SSEM
269270 cycles, COPYAtoB_SSEN
270243 cycles, COPYAtoB_SSEH
270836 cycles, COPYAtoB_SSEB
277654 cycles, COPYAtoB_SSEY
277846 cycles, COPYAtoB_SSEE
278930 cycles, COPYAtoB_SSEX
282976 cycles, COPYAtoB_WZZF- use registers
284974 cycles, memcpy_1
286742 cycles, COPYAtoB_YZZI- use registers
286824 cycles, COPYAtoB_YZZE- use registers
287296 cycles, COPYAtoB_YZZG- use registers
288297 cycles, COPYAtoB_YZZH- use registers
295148 cycles, COPYAtoB_YZZJ- use registers
296640 cycles, COPYAtoB_YZZK- use registers
296923 cycles, memcpy_4
297154 cycles, COPYAtoB_WZZE- use registers
298818 cycles, COPYAtoB_XZZC- use registers
300358 cycles, COPYAtoB_XZE- use registers
300368 cycles, MOVEAtoB_SSEC
300885 cycles, COPYAtoB_XZZF- use registers
303185 cycles, COPYAtoB_SSEC
303831 cycles, COPYAtoB_SSEA
308246 cycles, COPYAtoB_XZZE- use registers
336636 cycles, memcpy_3
338190 cycles, memcpy_2
376028 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
Title: Re: Sorting strings
Post by: RuiLoureiro on July 18, 2014, 03:56:59 AM
Hi
    It seems I have found a good way of getting
    good results using SSE on my P4.
   

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

98848 cycles, COPYAtoB_DQUB
171329 cycles, crt_memcpy
172769 cycles, crt_memcpy
249969 cycles, COPYAtoB_SSEP
251874 cycles, COPYAtoB_SSEM
251935 cycles, COPYAtoB_SSEB
253224 cycles, COPYAtoB_SSEY
254572 cycles, COPYAtoB_SSEX
254574 cycles, COPYAtoB_SSEH
255315 cycles, COPYAtoB_SSEK
255530 cycles, COPYAtoB_SSEN
255983 cycles, COPYAtoB_SSEE
257129 cycles, COPYAtoB_SSEL
257529 cycles, COPYAtoB_SSEO
258470 cycles, COPYAtoB_SSEJ
258917 cycles, COPYAtoB_SSEI
259654 cycles, COPYAtoB_DQUA
271520 cycles, memcpy_1
272597 cycles, COPYAtoB_WZZE- use registers
273014 cycles, COPYAtoB_WZZF- use registers
273203 cycles, COPYAtoB_YZZI- use registers
273599 cycles, memcpy_4
274972 cycles, COPYAtoB_YZZJ- use registers
276144 cycles, COPYAtoB_YZZH- use registers
276701 cycles, COPYAtoB_YZZK- use registers
276805 cycles, COPYAtoB_YZZG- use registers
276938 cycles, COPYAtoB_YZZE- use registers
282309 cycles, COPYAtoB_XZZC- use registers
283912 cycles, COPYAtoB_XZZF- use registers
285191 cycles, COPYAtoB_XZE - use registers
290683 cycles, COPYAtoB_XZZE- use registers
297654 cycles, MOVEAtoB_SSEC
301184 cycles, COPYAtoB_SSEA
303333 cycles, COPYAtoB_SSEC
329163 cycles, memcpy_3
340136 cycles, memcpy_2
379940 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED*****

94062 cycles, COPYAtoB_DQUB
171003 cycles, crt_memcpy
171048 cycles, crt_memcpy
249520 cycles, COPYAtoB_SSEP
251563 cycles, COPYAtoB_SSEB
251637 cycles, COPYAtoB_SSEM
252778 cycles, COPYAtoB_SSEY
252926 cycles, COPYAtoB_SSEN
253431 cycles, COPYAtoB_SSEX
253792 cycles, COPYAtoB_SSEH
254940 cycles, COPYAtoB_SSEL
255056 cycles, COPYAtoB_SSEK
255683 cycles, COPYAtoB_SSEE
257176 cycles, COPYAtoB_SSEO
257666 cycles, COPYAtoB_SSEJ
258695 cycles, COPYAtoB_SSEI
259191 cycles, COPYAtoB_DQUA
268607 cycles, COPYAtoB_WZZF- use registers
271345 cycles, memcpy_1
272253 cycles, COPYAtoB_WZZE- use registers
272671 cycles, memcpy_4
272980 cycles, COPYAtoB_YZZI- use registers
273729 cycles, COPYAtoB_YZZG- use registers
274777 cycles, COPYAtoB_YZZJ- use registers
275167 cycles, COPYAtoB_YZZH- use registers
276398 cycles, COPYAtoB_YZZK- use registers
276478 cycles, COPYAtoB_YZZE- use registers
281989 cycles, COPYAtoB_XZZC- use registers
282306 cycles, COPYAtoB_XZZF- use registers
283778 cycles, COPYAtoB_XZE - use registers
288701 cycles, COPYAtoB_XZZE- use registers
296902 cycles, MOVEAtoB_SSEC
298823 cycles, COPYAtoB_SSEA
302362 cycles, COPYAtoB_SSEC
328756 cycles, memcpy_3
338998 cycles, memcpy_2
379394 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
Title: Re: Sorting strings
Post by: Gunther on July 18, 2014, 06:31:22 AM
Hi Rui,

Quote from: RuiLoureiro on July 18, 2014, 03:56:59 AM
Hi
    It seems I have found a good way of getting
    good results using SSE on my P4.

sounds good.  :t Go forward.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 18, 2014, 07:15:24 PM
Hi Gunther,  :t

            Now we can see that
            we can get good results
            using registers (not SSE)
            and it has nothing to do with
            these helps (?) that we get here
            (compare these results with the previous ones).
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case removed *****

68156 cycles, COPYAtoB_DQUB-
69378 cycles, COPYAtoB_DQUC-
93990 cycles, COPYAtoB_WZZE- use registers
94727 cycles, memcpy_1
97214 cycles, COPYAtoB_YZZJ- use registers
97233 cycles, COPYAtoB_YZZK- use registers
98405 cycles, COPYAtoB_XZZF- use registers
98632 cycles, COPYAtoB_WZZF- use registers
99312 cycles, COPYAtoB_SSEB
99919 cycles, COPYAtoB_YZZE- use registers
100136 cycles, COPYAtoB_YZZI- use registers
100177 cycles, COPYAtoB_YZZG- use registers
100616 cycles, COPYAtoB_XZZC- use registers
100691 cycles, COPYAtoB_YZZH- use registers
105280 cycles, COPYAtoB_XZE - use registers
108290 cycles, COPYAtoB_DQUA-
108642 cycles, COPYAtoB_XZZE- use registers
120814 cycles, COPYAtoB_SSEO
120996 cycles, COPYAtoB_SSEM
122012 cycles, COPYAtoB_SSEP
125973 cycles, COPYAtoB_SSEY
126106 cycles, COPYAtoB_SSEN
126260 cycles, COPYAtoB_SSEX
126943 cycles, COPYAtoB_SSEH
127500 cycles, crt_memcpy
131576 cycles, COPYAtoB_SSEK
132600 cycles, COPYAtoB_SSEE
133228 cycles, COPYAtoB_SSEJ
134229 cycles, COPYAtoB_SSEI
134992 cycles, COPYAtoB_SSEL
154953 cycles, memcpy_4
159392 cycles, crt_memcpy
165111 cycles, MOVEAtoB_SSEC
167700 cycles, COPYAtoB_SSEA
170294 cycles, COPYAtoB_SSEC
185132 cycles, memcpy_3
211344 cycles, memcpy_2
473021 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
Title: Re: Sorting strings
Post by: Gunther on July 18, 2014, 10:38:42 PM
Hi Rui,

Quote from: RuiLoureiro on July 18, 2014, 07:15:24 PM
Hi Gunther,  :t

            Now we can see that
            we can get good results
            using registers (not SSE)
            and it has nothing to do with
            these helps (?) that we get here
            (compare these results with the previous ones).

yes, very interesting. Good work.  :t

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 21, 2014, 07:39:34 AM
Hi
        I did all the tests I wanted to do.
        The best seems to be COPYAtoB_DQUE
        on my P4 when the address is aligned.
        From 8160 to 8223 bytes, its mean is
        256264 cycles.
        If the address is unaligned, the best
        seems to be COPYAtoB_DQUB.

Here are the results
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 1...512 *****

130482 cycles, COPYAtoB_DQUB-    ALIGNED --> 89 670 cycles
131587 cycles, COPYAtoB_DQUD-            --> 85 424 cycles
131788 cycles, COPYAtoB_DQUC-
134390 cycles, COPYAtoB_DQUE-            --> 89 661 cycles
135774 cycles, crt_memcpy
248412 cycles, COPYAtoB_SSEM
252771 cycles, COPYAtoB_SSEH
253568 cycles, COPYAtoB_SSEO
253608 cycles, COPYAtoB_SSEN
254133 cycles, COPYAtoB_SSEL
255974 cycles, COPYAtoB_SSEP
256932 cycles, COPYAtoB_SSEB
257831 cycles, COPYAtoB_SSEK
258858 cycles, COPYAtoB_SSEJ
261882 cycles, COPYAtoB_SSEI
262240 cycles, COPYAtoB_SSEY
262609 cycles, COPYAtoB_SSEX
263376 cycles, COPYAtoB_WZZE- use registers
264587 cycles, memcpy_4
265348 cycles, COPYAtoB_SSEE
270437 cycles, COPYAtoB_WZZF- use registers
276074 cycles, COPYAtoB_YZZK- use registers
277334 cycles, COPYAtoB_YZZJ- use registers
281863 cycles, COPYAtoB_YZZG- use registers
281877 cycles, COPYAtoB_YZZE- use registers
281942 cycles, COPYAtoB_YZZI- use registers
282510 cycles, COPYAtoB_YZZH- use registers
291839 cycles, memcpy_1
296347 cycles, COPYAtoB_DQUA-
309981 cycles, COPYAtoB_SSEA
311312 cycles, COPYAtoB_SSEC
313779 cycles, MOVEAtoB_SSEC
340787 cycles, memcpy_2
341081 cycles, COPYAtoB_XZZE- use registers
342978 cycles, memcpy_3
344042 cycles, COPYAtoB_XZZF- use registers
351251 cycles, COPYAtoB_XZZC- use registers
351355 cycles, COPYAtoB_XZE - use registers
378696 cycles, COPYAtoB_MOVSB
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 992...1055 *****

41318 cycles, COPYAtoB_DQUB-    ALIGNED --> 34 982 cycles
41516 cycles, COPYAtoB_DQUD-
41638 cycles, COPYAtoB_DQUC-
42056 cycles, COPYAtoB_DQUE-            --> 33 072 cycles
60389 cycles, crt_memcpy
126227 cycles, COPYAtoB_SSEE
127620 cycles, COPYAtoB_SSEN
127659 cycles, COPYAtoB_SSEH
127845 cycles, COPYAtoB_SSEM
128484 cycles, COPYAtoB_SSEL
128532 cycles, COPYAtoB_SSEY
129163 cycles, COPYAtoB_SSEK
129230 cycles, COPYAtoB_SSEP
129432 cycles, COPYAtoB_SSEX
129955 cycles, COPYAtoB_SSEI
130076 cycles, COPYAtoB_WZZE- use registers
130196 cycles, COPYAtoB_DQUA-
130327 cycles, COPYAtoB_SSEJ
130936 cycles, memcpy_4
132255 cycles, COPYAtoB_SSEO
132921 cycles, COPYAtoB_SSEB
137932 cycles, COPYAtoB_WZZF- use registers
139090 cycles, COPYAtoB_YZZG- use registers
139273 cycles, COPYAtoB_YZZI- use registers
139625 cycles, COPYAtoB_YZZH- use registers
140394 cycles, COPYAtoB_YZZJ- use registers
140546 cycles, memcpy_1
140832 cycles, COPYAtoB_YZZE- use registers
141484 cycles, COPYAtoB_SSEC
141676 cycles, COPYAtoB_SSEA
141896 cycles, memcpy_2
142128 cycles, memcpy_3
143169 cycles, COPYAtoB_YZZK- use registers
143228 cycles, MOVEAtoB_SSEC
156236 cycles, COPYAtoB_MOVSB
161533 cycles, COPYAtoB_XZZE- use registers
161612 cycles, COPYAtoB_XZZF- use registers
162562 cycles, COPYAtoB_XZE - use registers
163142 cycles, COPYAtoB_XZZC- use registers
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 2016...2079 *****

68657 cycles, COPYAtoB_DQUB-    ALIGNED --> 67 286 cycles
69082 cycles, COPYAtoB_DQUD-
69388 cycles, COPYAtoB_DQUC-
71219 cycles, COPYAtoB_DQUE-            --> 59 517 cycles
106330 cycles, crt_memcpy
244328 cycles, COPYAtoB_SSEY
247038 cycles, COPYAtoB_SSEE
248428 cycles, COPYAtoB_SSEX
250080 cycles, COPYAtoB_SSEP
251992 cycles, COPYAtoB_DQUA-
254174 cycles, COPYAtoB_SSEN
254364 cycles, COPYAtoB_SSEH
255110 cycles, COPYAtoB_SSEM
256986 cycles, COPYAtoB_SSEL
257096 cycles, COPYAtoB_SSEK
257837 cycles, COPYAtoB_SSEI
258466 cycles, COPYAtoB_SSEJ
260602 cycles, memcpy_4
260685 cycles, COPYAtoB_WZZE- use registers
263622 cycles, COPYAtoB_SSEO
265307 cycles, COPYAtoB_SSEB
273274 cycles, memcpy_2
273774 cycles, COPYAtoB_SSEC
273835 cycles, memcpy_3
274148 cycles, MOVEAtoB_SSEC
274170 cycles, COPYAtoB_WZZF- use registers
274871 cycles, COPYAtoB_YZZH- use registers
275333 cycles, COPYAtoB_YZZG- use registers
275539 cycles, COPYAtoB_SSEA
275914 cycles, COPYAtoB_YZZI- use registers
275975 cycles, memcpy_1
280238 cycles, COPYAtoB_YZZE- use registers
280803 cycles, COPYAtoB_YZZJ- use registers
282198 cycles, COPYAtoB_MOVSB
288596 cycles, COPYAtoB_YZZK- use registers
315167 cycles, COPYAtoB_XZZF- use registers
321628 cycles, COPYAtoB_XZE - use registers
322624 cycles, COPYAtoB_XZZE- use registers
324055 cycles, COPYAtoB_XZZC- use registers
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 4064...4127 ** *****

133524 cycles, COPYAtoB_DQUB-    ALIGNED --> 138 234 cycles
133974 cycles, COPYAtoB_DQUD-
134362 cycles, COPYAtoB_DQUC-
139861 cycles, COPYAtoB_DQUE-            --> 131 426 cycles
219491 cycles, crt_memcpy
494054 cycles, COPYAtoB_DQUA-
499609 cycles, COPYAtoB_SSEY
500909 cycles, COPYAtoB_SSEP
501054 cycles, COPYAtoB_SSEX
503233 cycles, COPYAtoB_SSEE
514665 cycles, COPYAtoB_SSEH
515061 cycles, COPYAtoB_SSEN
518292 cycles, COPYAtoB_SSEM
519263 cycles, COPYAtoB_SSEL
520033 cycles, COPYAtoB_SSEK
521694 cycles, COPYAtoB_SSEI
522140 cycles, COPYAtoB_SSEJ
529019 cycles, COPYAtoB_WZZE- use registers
531639 cycles, memcpy_4
537646 cycles, COPYAtoB_SSEO
538366 cycles, COPYAtoB_SSEB
544214 cycles, memcpy_2
545965 cycles, memcpy_3
546893 cycles, COPYAtoB_SSEA
548279 cycles, MOVEAtoB_SSEC
550654 cycles, COPYAtoB_SSEC
552476 cycles, COPYAtoB_MOVSB
555669 cycles, COPYAtoB_YZZG- use registers
555863 cycles, COPYAtoB_YZZI- use registers
556214 cycles, COPYAtoB_WZZF- use registers
557247 cycles, COPYAtoB_YZZH- use registers
558249 cycles, memcpy_1
563636 cycles, COPYAtoB_YZZE- use registers
569430 cycles, COPYAtoB_YZZJ- use registers
584721 cycles, COPYAtoB_YZZK- use registers
629480 cycles, COPYAtoB_XZZF- use registers
631297 cycles, COPYAtoB_XZE - use registers
631658 cycles, COPYAtoB_XZZC- use registers
632115 cycles, COPYAtoB_XZZE- use registers
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 8160...8223 **

267038 cycles, COPYAtoB_DQUE-   ALIGNED --> 256 264 cycles
284186 cycles, COPYAtoB_DQUB-
284812 cycles, COPYAtoB_DQUC-
294833 cycles, COPYAtoB_DQUD-           --> 282 396 cycles
456833 cycles, crt_memcpy
988290 cycles, COPYAtoB_SSEP
990949 cycles, COPYAtoB_DQUA-
1004369 cycles, COPYAtoB_SSEE
1023446 cycles, COPYAtoB_SSEY
1024652 cycles, COPYAtoB_SSEX
1034689 cycles, COPYAtoB_SSEN
1036664 cycles, COPYAtoB_SSEH
1042597 cycles, COPYAtoB_SSEM
1043202 cycles, COPYAtoB_SSEK
1046816 cycles, COPYAtoB_SSEI
1048882 cycles, COPYAtoB_SSEJ
1055650 cycles, COPYAtoB_SSEL
1062057 cycles, COPYAtoB_WZZE- use registers
1066017 cycles, memcpy_4
1074905 cycles, COPYAtoB_SSEO
1077120 cycles, COPYAtoB_SSEB
1077648 cycles, memcpy_2
1079904 cycles, memcpy_3
1084412 cycles, COPYAtoB_SSEC
1084800 cycles, MOVEAtoB_SSEC
1087530 cycles, memcpy_1
1087803 cycles, COPYAtoB_SSEA
1098031 cycles, COPYAtoB_MOVSB
1125443 cycles, COPYAtoB_YZZE- use registers
1126103 cycles, COPYAtoB_YZZI- use registers
1126736 cycles, COPYAtoB_YZZG- use registers
1128012 cycles, COPYAtoB_YZZH- use registers
1128422 cycles, COPYAtoB_WZZF- use registers
1139396 cycles, COPYAtoB_YZZJ- use registers
1158849 cycles, COPYAtoB_YZZK- use registers
1262942 cycles, COPYAtoB_XZZF- use registers
1268578 cycles, COPYAtoB_XZZC- use registers
1269365 cycles, COPYAtoB_XZZE- use registers
1271633 cycles, COPYAtoB_XZE - use registers
********** STOP SortMeans **********
Title: Re: Sorting strings
Post by: Gunther on July 21, 2014, 08:34:41 AM
Rui,

you're a hard working man.  :t

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 23, 2014, 09:43:30 PM
Hi
    Here are the results of my last test
    to copy from 1 to 512 bytes.

All procedures were tested copying from 2025 bytes
down to 0 bytes, and all work correctly. At the end,
source and destination are exactly equal. The
destination ends with a null terminator in the
correct place.
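
The check described above can be sketched roughly as follows. This is only an illustration, not the test code actually used: CheckCopy and its parameter names are hypothetical, and it assumes a flat stdcall model.

```asm
; Hypothetical checker, not the author's test harness.
; Returns eax = 1 if the first nBytes of pSrc and pDst are equal
; and pDst[nBytes] is the null terminator; otherwise eax = 0.
.686
.model flat, stdcall
option casemap:none

.code
CheckCopy proc uses esi edi pSrc:DWORD, pDst:DWORD, nBytes:DWORD
    mov     esi, pSrc
    mov     edi, pDst
    mov     ecx, nBytes
    xor     eax, eax            ; assume failure
    cld                         ; compare forward
    cmp     ecx, ecx            ; force ZF = 1 so a 0-byte compare counts as equal
    repe    cmpsb               ; stops at the first mismatch or when ecx = 0
    jne     finish              ; mismatch found
    cmp     byte ptr [edi], 0   ; edi now points at pDst + nBytes
    jne     finish              ; terminator missing or misplaced
    inc     eax                 ; eax = 1: copy is correct
finish:
    ret
CheckCopy endp
end
```

Calling it in a loop with nBytes going from 2025 down to 0 after each copy would reproduce the kind of test described, under the assumptions stated.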

note1:   MOVEAtoB_DQUB is a macro to copy
         a string with a predefined length.
         It defines the length of the destination
         and ends with a null terminator.


note2:   COPYAtoB_DQU? is a procedure to copy
         a given length from source to destination.
         It defines the length of the destination
         and ends with a null terminator.


note3:   COPYAtoB_DQA? is a procedure to copy
         a string with a predefined length.
         It defines the length of the destination
         and ends with a null terminator.

Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 1...512 **

67340 cycles, MOVEAtoB_DQUB
72405 cycles, COPYAtoB_DQUB-
72424 cycles, COPYAtoB_DQAB-
73656 cycles, COPYAtoB_DQAD-
73777 cycles, COPYAtoB_DQUC-
73785 cycles, COPYAtoB_DQAC-
74340 cycles, COPYAtoB_DQAE-
74509 cycles, COPYAtoB_DQUE-
74615 cycles, COPYAtoB_DQUD-
95053 cycles, COPYAtoB_WZZG- use registers
107305 cycles, crt_memcpy
122627 cycles, COPYAtoB_DQAF-
123921 cycles, COPYAtoB_DQUF-
203780 cycles, COPYAtoB_DQAA-
207501 cycles, COPYAtoB_SSEP
208377 cycles, COPYAtoB_DQUA-
208983 cycles, COPYAtoB_SSEX
209303 cycles, COPYAtoB_SSEY
209411 cycles, COPYAtoB_SSEN
210635 cycles, COPYAtoB_SSEM
210645 cycles, COPYAtoB_SSEH
211094 cycles, COPYAtoB_SSEL
211875 cycles, COPYAtoB_SSEB
212347 cycles, COPYAtoB_SSEO
212850 cycles, COPYAtoB_SSEK
213498 cycles, COPYAtoB_SSEF
215333 cycles, COPYAtoB_SSEI
215825 cycles, COPYAtoB_SSEE
216628 cycles, COPYAtoB_SSEJ
218807 cycles, memcpy_4
219464 cycles, COPYAtoB_WZZE- use registers
223717 cycles, COPYAtoB_WZZF- use registers
230277 cycles, COPYAtoB_YZZJ- use registers
231627 cycles, COPYAtoB_YZZK- use registers
231689 cycles, COPYAtoB_YZZE- use registers
232139 cycles, COPYAtoB_XZE - use registers
232573 cycles, memcpy_1
233462 cycles, COPYAtoB_XZZE- use registers
233801 cycles, COPYAtoB_XZZF- use registers
234350 cycles, COPYAtoB_YZZH- use registers
234512 cycles, COPYAtoB_YZZI- use registers
234802 cycles, COPYAtoB_YZZG- use registers
235530 cycles, COPYAtoB_XZZC- use registers
248025 cycles, COPYAtoB_SSEA
249516 cycles, COPYAtoB_SSEC
278887 cycles, memcpy_2
282360 cycles, memcpy_3
300852 cycles, COPYAtoB_MOVSB
********** STOP SortMeans **********
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - ALIGNED-worst case removed - 1...512 **

66970 cycles, MOVEAtoB_DQUB
67416 cycles, COPYAtoB_DQAC-
68079 cycles, COPYAtoB_DQAD-
68437 cycles, COPYAtoB_DQUC-
69240 cycles, COPYAtoB_DQAB-
69305 cycles, COPYAtoB_DQUD-
69496 cycles, COPYAtoB_DQAE-
70094 cycles, COPYAtoB_DQUE-
71099 cycles, COPYAtoB_DQUB-
82996 cycles, COPYAtoB_SSEB
90245 cycles, COPYAtoB_WZZG- use registers
90598 cycles, COPYAtoB_YZZK- use registers
91764 cycles, COPYAtoB_YZZJ- use registers
92333 cycles, COPYAtoB_WZZE- use registers
93184 cycles, COPYAtoB_YZZE- use registers
93192 cycles, memcpy_1
93742 cycles, COPYAtoB_WZZF- use registers
93823 cycles, COPYAtoB_YZZG- use registers
94104 cycles, COPYAtoB_YZZH- use registers
94752 cycles, COPYAtoB_YZZI- use registers
99354 cycles, COPYAtoB_SSEP
99480 cycles, COPYAtoB_SSEO
101173 cycles, COPYAtoB_SSEM
101757 cycles, COPYAtoB_SSEY
102397 cycles, COPYAtoB_SSEN
102470 cycles, COPYAtoB_SSEX
102610 cycles, COPYAtoB_SSEH
104716 cycles, COPYAtoB_SSEK
105919 cycles, COPYAtoB_SSEL
106437 cycles, COPYAtoB_SSEI
107891 cycles, COPYAtoB_SSEF
108012 cycles, COPYAtoB_SSEJ
108711 cycles, COPYAtoB_SSEE
120209 cycles, COPYAtoB_DQUF-
120272 cycles, COPYAtoB_DQAF-
127760 cycles, memcpy_4
129327 cycles, crt_memcpy
135247 cycles, COPYAtoB_SSEA
154312 cycles, memcpy_3
168126 cycles, memcpy_2
198618 cycles, COPYAtoB_DQAA-
200537 cycles, COPYAtoB_DQUA-
231671 cycles, COPYAtoB_XZZF- use registers
232702 cycles, COPYAtoB_XZE - use registers
233570 cycles, COPYAtoB_XZZE- use registers
235548 cycles, COPYAtoB_XZZC- use registers
236412 cycles, COPYAtoB_MOVSB
249124 cycles, COPYAtoB_SSEC
********** STOP SortMeans **********
Title: Re: Sorting strings
Post by: Gunther on July 23, 2014, 09:46:59 PM
Hi Rui,

you've nothing attached.

Gunther
Title: Re: Sorting strings
Post by: RuiLoureiro on July 24, 2014, 04:44:15 AM
Quote from: Gunther on July 23, 2014, 09:46:59 PM
Hi Rui,

you've nothing attached.

Gunther
Hi Gunther,

           I don't need to test any code now.
           If you need any of it, I will send it to you.

Now, see what cls does. It doesn't work properly.
After cls I got this junk:
(I solved the problem with my ScreenClear.
It seems to work properly.)

How do I replace ClearScreen with ScreenClear
in MASM?
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-992...1055--

33186 cycles, COPYAtoB_DQUE-
33955 cycles, COPYAtoB_DQAE-
35447 cycles, MOVEAtoB_DQUB
36020 cycles, COPYAtoB_DQUB-
36075 cycles, COPYAtoB_DQAD-
36096 cycles, COPYAtoB_DQUD-
37942 cycles, COPYAtoB_DQUC-
38617 cycles, COPYAtoB_DQAC-
38732 cycles, COPYAtoB_DQAB-
42871 cycles, COPYAtoB_DQUF-
42976 cycles, COPYAtoB_DQAF-
52857 cycles, COPYAtoB_WZZG- use registers
53586 cycles, crt_memcpy
133432 cycles, COPYAtoB_SSEK
134433 cycles, COPYAtoB_SSEJ
135038 cycles, COPYAtoB_SSEO
135287 cycles, COPYAtoB_YZZK- use registers
135458 cycles, COPYAtoB_WZZE- use registers
135655 cycles, COPYAtoB_SSEN
135759 cycles, COPYAtoB_SSEM
135852 cycles, COPYAtoB_SSEI
135865 cycles, COPYAtoB_YZZJ- use registers
137361 cycles, COPYAtoB_SSEH
138192 cycles, COPYAtoB_SSEB
138272 cycles, COPYAtoB_SSEL
140704 cycles, memcpy_4
141706 cycles, memcpy_2
142657 cycles, COPYAtoB_SSEC
143276 cycles, COPYAtoB_SSEA
145980 cycles, COPYAtoB_WZZF- use registers
147074 cycles, memcpy_3
147502 cycles, COPYAtoB_MOVSB
147890 cycles, COPYAtoB_DQUA-
148308 cycles, COPYAtoB_DQAA-
153250 cycles, COPYAtoB_YZZG- use registers
153341 cycles, COPYAtoB_YZZH- use registers
153443 cycles, COPYAtoB_SSEF
153817 cycles, COPYAtoB_SSEE
153964 cycles, COPYAtoB_SSEP
154037 cycles, COPYAtoB_YZZE- use registers
154083 cycles, COPYAtoB_SSEY
154313 cycles, COPYAtoB_YZZI- use registers
154803 cycles, COPYAtoB_XZE - use registers
155999 cycles, memcpy_1
159922 cycles, COPYAtoB_SSEX
191833 cycles, COPYAtoB_XZZF- use registers
201893 cycles, COPYAtoB_XZZE- use registers
202891 cycles, COPYAtoB_XZZC- use registers
********** STOP SortMeans ********** 055
153747 cycles, COPYAtoB_YZZI - 992...1055
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)5
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 992...1055--
156242 cycles, COPYAtoB_YZZH - 992...1055  <-- junk
26524 cycles, COPYAtoB_DQUE- - 992...1055
26633 cycles, COPYAtoB_DQAE- - 992...1055
28275 cycles, MOVEAtoB_DQUBE - 992...1055
28761 cycles, COPYAtoB_DQUB- - 992...1055
28771 cycles, COPYAtoB_DQUD- - 992...1055
28776 cycles, COPYAtoB_DQAD- - 992...1055
28820 cycles, COPYAtoB_DQAC- - 992...1055
29074 cycles, COPYAtoB_DQAB- - 992...1055
29349 cycles, COPYAtoB_DQUC- - 992...1055
34240 cycles, COPYAtoB_DQUF- - 992...1055
34340 cycles, COPYAtoB_DQAF- - 992...1055
42217 cycles, COPYAtoB_WZZG- use registers
42810 cycles, crt_memcpyYZZK - 992...1055    <--- junk
106628 cycles, COPYAtoB_SSEK - 992...1055
107189 cycles, COPYAtoB_SSEI - 992...1055
107265 cycles, COPYAtoB_SSEJ - 992...1055
107908 cycles, COPYAtoB_SSEO - 992...1055
108056 cycles, COPYAtoB_WZZE- use registers
108059 cycles, COPYAtoB_YZZK- use registers
108318 cycles, COPYAtoB_SSEM - 992...1055
108432 cycles, COPYAtoB_SSEN - 992...1055
108483 cycles, COPYAtoB_YZZJ- use registers
108518 cycles, COPYAtoB_SSEH - 992...1055
109727 cycles, memcpy_4_WZZF - 992...1055   <-- junk
109814 cycles, COPYAtoB_SSEL - 992...1055
110396 cycles, COPYAtoB_SSEB- 992...1055
113353 cycles, memcpy_2WZZG - 992...1055
113947 cycles, COPYAtoB_SSEC- 992...1055
114278 cycles, COPYAtoB_SSEA- 992...1055
116339 cycles, COPYAtoB_WZZF- use registers
117104 cycles, memcpy_3_SSEL - 992...1055
117349 cycles, COPYAtoB_MOVSB- 992...1055
118148 cycles, COPYAtoB_DQUA-- 992...1055
118415 cycles, COPYAtoB_DQAA-- 992...1055
121758 cycles, COPYAtoB_SSEF - 992...1055
121808 cycles, COPYAtoB_SSEE - 992...1055
122303 cycles, COPYAtoB_YZZG- use registers
122413 cycles, COPYAtoB_YZZH- use registers
123036 cycles, COPYAtoB_SSEP - 992...1055
123064 cycles, COPYAtoB_YZZI- use registers
123084 cycles, COPYAtoB_YZZE- use registers
123157 cycles, COPYAtoB_SSEY - 992...1055
123724 cycles, COPYAtoB_XZE - use registers
124587 cycles, memcpy_1_SSEP - 992...1055
125577 cycles, COPYAtoB_SSEX - 992...1055
149397 cycles, COPYAtoB_XZZC- use registers
152096 cycles, COPYAtoB_XZZF- use registers
155795 cycles, COPYAtoB_XZZE- use registers
********** STOP SortMeans ********** 55
...
and a lot of junk here after
Title: Re: Sorting strings
Post by: RuiLoureiro on July 25, 2014, 09:54:30 PM
How do I replace ClearScreen with ScreenClear
in MASM?

I replaced this ClearScreen

ClearScreen  proc
  ; -----------------------------------------------------------
  ; This procedure reads the column and row count, multiplies
  ; them together to get the number of characters that will fit
  ; onto the screen, writes that number of blank spaces to the
  ; screen and reposition the prompt at position 0,0.
  ; -----------------------------------------------------------
    LOCAL hOutPut:DWORD
    LOCAL noc    :DWORD
    LOCAL cnt    :DWORD
    LOCAL sbi    :CONSOLE_SCREEN_BUFFER_INFO

    invoke GetStdHandle,STD_OUTPUT_HANDLE
    mov hOutPut, eax

    invoke GetConsoleScreenBufferInfo,hOutPut,ADDR sbi
    mov eax, sbi.dwSize     ; 2 word values returned for screen size

  ; -----------------------------------------------
  ; extract the 2 values and multiply them together
  ; -----------------------------------------------
    push ax
    rol eax, 16
    mov cx, ax
    pop ax
    mul cx
    cwde
    mov cnt, eax

    invoke FillConsoleOutputCharacter,hOutPut,32,cnt,NULL,ADDR noc
    invoke locate,0,0
    ret
ClearScreen  endp

with this ClearScreen in the file clearscr.asm,
and then I ran make.bat.

ClearScreen         proc
                    LOCAL hOutPut:DWORD, noc:DWORD
                    LOCAL sbi:CONSOLE_SCREEN_BUFFER_INFO

                    invoke GetStdHandle,STD_OUTPUT_HANDLE
                    mov    hOutPut, eax
                    invoke GetConsoleScreenBufferInfo,hOutPut,ADDR sbi
                    mov     eax, sbi.dwSize     ; 2 word values returned for screen size

                    ; -----------------------------------------------
                    ; extract the 2 values and multiply them together
                    ; -----------------------------------------------
                    mov     edx, eax
                    and     edx, 0FFFFh         ; number of columns
                    shr     eax, 16             ; number of lines
                    mul     edx                 ; eax = columns * lines (32-bit product)
                    mov     edx, eax            ; character count for the fill

                    invoke  FillConsoleOutputCharacter, hOutPut, 32, edx, 0, ADDR noc
                    invoke  locate,0,0
                    ret
ClearScreen         endp
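
One possible reason why the two versions behave differently (an assumption, not something confirmed in this thread): the first version multiplies with `mul cx`, which leaves the product in dx:ax, and `cwde` then rebuilds eax from ax alone, so any product above 16 bits is cut short. The sketch below works both calculations through for a made-up buffer of 80 columns by 1000 lines. CountDemo is a hypothetical name and the dwSize value is invented for the example.

```asm
; Hypothetical demo, not part of the masm32 library.
; On return: ecx = count from the 16-bit path, eax = count from the 32-bit path.
.686
.model flat, stdcall
option casemap:none

.code
CountDemo proc
    ; --- 16-bit path, as in the first ClearScreen ---
    mov     eax, 03E80050h      ; assumed dwSize: X = 80 (low word), Y = 1000 (high word)
    push    ax                  ; save X
    rol     eax, 16
    mov     cx, ax              ; cx = Y = 1000
    pop     ax                  ; ax = X = 80
    mul     cx                  ; dx:ax = 80000 = 0001:3880h
    cwde                        ; eax = 00003880h = 14464, dx is ignored
    mov     ecx, eax            ; ecx = 14464 (truncated count)

    ; --- 32-bit path, as in the replacement ---
    mov     eax, 03E80050h
    mov     edx, eax
    and     edx, 0FFFFh         ; edx = 80 columns
    shr     eax, 16             ; eax = 1000 lines
    mul     edx                 ; edx:eax = 80000
    ret                         ; eax = 80000, ecx = 14464
CountDemo endp
end
```

With these example numbers the first path would ask for only 14464 characters to be filled, while the second asks for all 80000. Whether this is what produced the junk above depends on the actual console buffer size, which is not given here.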
Title: Re: Sorting strings
Post by: RuiLoureiro on August 01, 2014, 02:25:01 AM
Hi,
    now I am testing a procedure to insert
    a string A into a string B of length=???
    at any position from 0 to length+x.

    In this test, the length of string A varies from
    1 to 100 and the length of string B is 200.
    In every case the procedure first moves the 200 bytes
    of B forward and then inserts string A at position 0.

    Note: INSAtoB_?    is a macro
          InsertAtoB_? is a procedure

    My conclusion: the best way is to use SSE on my P4.

    It doesn't matter whether the strings are aligned
    or unaligned.
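
    To make the two steps concrete (move the tail of B forward, then copy A into the gap), here is a plain byte-wise sketch. It is only a reference version and not one of the routines timed below: the name InsertAtoB_ref and its parameters are made up for illustration, it assumes .model flat, stdcall, a zero-terminated B, a buffer with room for lenB+lenA+1 bytes, and pos <= lenB.

```asm
; Hypothetical reference version, not one of the timed routines.
; pB   -> buffer holding string B (room for lenB+lenA+1 bytes is assumed)
; lenB  = current length of B, pos = insert position (0..lenB)
; pA   -> string A, lenA = length of A
InsertAtoB_ref proc uses esi edi pB:DWORD, lenB:DWORD, pos:DWORD, pA:DWORD, lenA:DWORD

    ; 1) move the tail of B (lenB-pos bytes plus the zero terminator) forward by lenA bytes
    mov     ecx, lenB
    sub     ecx, pos
    inc     ecx                 ; include the zero terminator
    mov     esi, pB
    add     esi, lenB           ; esi -> last byte to move (the terminator)
    mov     edi, esi
    add     edi, lenA           ; edi -> its new place
    std                         ; copy backwards because source and destination overlap
    rep     movsb
    cld                         ; restore the direction flag

    ; 2) copy A into the gap that was opened at pos
    mov     esi, pA
    mov     edi, pB
    add     edi, pos
    mov     ecx, lenA
    rep     movsb

    mov     eax, lenB
    add     eax, lenA           ; return the new length of B
    ret
InsertAtoB_ref endp
```

    Called as invoke InsertAtoB_ref, ADDR bufB, 200, 0, ADDR strA, 50 it would do what the test describes for a 50-byte A: move the 200 bytes of B (plus the terminator) forward and put A at position 0. The names bufB and strA are placeholders.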
   

INSERTING AT POSITION 0-string length= 200

********** END I - Press a key **********
77675 cycles, INSAtoB_X, insert 1...100--
77465 cycles, INSAtoB_X, insert 1...100--
77586 cycles, INSAtoB_X, insert 1...100--
88744 cycles, INSAtoB_X, insert 1...100--
80409 cycles, INSAtoB_X, insert 1...100--

54655 cycles, INSAtoB_XZZE, insert 1...100--
54866 cycles, INSAtoB_XZZE, insert 1...100--
54472 cycles, INSAtoB_XZZE, insert 1...100--
57474 cycles, INSAtoB_XZZE, insert 1...100--
55063 cycles, INSAtoB_XZZE, insert 1...100--

54452 cycles, INSAtoB_XZZF, insert 1...100--
54363 cycles, INSAtoB_XZZF, insert 1...100--
54351 cycles, INSAtoB_XZZF, insert 1...100--
54272 cycles, INSAtoB_XZZF, insert 1...100--
54556 cycles, INSAtoB_XZZF, insert 1...100--

54204 cycles, INSAtoB_XZZG, insert 1...100--
54211 cycles, INSAtoB_XZZG, insert 1...100--
54224 cycles, INSAtoB_XZZG, insert 1...100--
54209 cycles, INSAtoB_XZZG, insert 1...100--
54185 cycles, INSAtoB_XZZG, insert 1...100--

138734 cycles, INSAtoB_BA, insert 1...100--
135168 cycles, INSAtoB_BA, insert 1...100--
136854 cycles, INSAtoB_BA, insert 1...100--
135718 cycles, INSAtoB_BA, insert 1...100--
135550 cycles, INSAtoB_BA, insert 1...100--

154274 cycles, INSAtoB_BB, insert 1...100--
142298 cycles, INSAtoB_BB, insert 1...100--
143103 cycles, INSAtoB_BB, insert 1...100--
143249 cycles, INSAtoB_BB, insert 1...100--
140827 cycles, INSAtoB_BB, insert 1...100--

54098 cycles, INSAtoB_SSEE, insert 1...100--
53574 cycles, INSAtoB_SSEE, insert 1...100--
53622 cycles, INSAtoB_SSEE, insert 1...100--
53708 cycles, INSAtoB_SSEE, insert 1...100--
53168 cycles, INSAtoB_SSEE, insert 1...100--

54125 cycles, INSAtoB_SSEF, insert 1...100--
51203 cycles, INSAtoB_SSEF, insert 1...100--
51473 cycles, INSAtoB_SSEF, insert 1...100--
51682 cycles, INSAtoB_SSEF, insert 1...100--
51510 cycles, INSAtoB_SSEF, insert 1...100--

47988 cycles, INSAtoB_SSEG, insert 1...100--
48405 cycles, INSAtoB_SSEG, insert 1...100--
47919 cycles, INSAtoB_SSEG, insert 1...100--
48074 cycles, INSAtoB_SSEG, insert 1...100--
48730 cycles, INSAtoB_SSEG, insert 1...100--

48796 cycles, InsertAtoB_SSEG, insert 1...100--
48843 cycles, InsertAtoB_SSEG, insert 1...100--
48854 cycles, InsertAtoB_SSEG, insert 1...100--
49345 cycles, InsertAtoB_SSEG, insert 1...100--
48686 cycles, InsertAtoB_SSEG, insert 1...100--

36357 cycles, InsertAtoB_DQUA, insert 1...100--
37054 cycles, InsertAtoB_DQUA, insert 1...100--
36460 cycles, InsertAtoB_DQUA, insert 1...100--
36575 cycles, InsertAtoB_DQUA, insert 1...100--
35550 cycles, InsertAtoB_DQUA, insert 1...100--

*** Press any key to get the mean values table ***

Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -Inserting at position 0 - 1...100--

36399 cycles, InsertAtoB_DQUA- insert 1...100--
48223 cycles, INSAtoB_SSEG- insert 1...100--
48904 cycles, InsertAtoB_SSEG- insert 1...100--
51998 cycles, INSAtoB_SSEF- insert 1...100--
53634 cycles, INSAtoB_SSEE- insert 1...100--
54206 cycles, INSAtoB_XZZG- insert 1...100--
54398 cycles, INSAtoB_XZZF- insert 1...100--
55306 cycles, INSAtoB_XZZE- insert 1...100--
80375 cycles, INSAtoB_X-    insert 1...100--
136404 cycles, INSAtoB_BA-   insert 1...100--
144750 cycles, INSAtoB_BB-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -worst case removed-Inserting at position 0 - 1...100--

36235 cycles, InsertAtoB_DQUA- insert 1...100--
48096 cycles, INSAtoB_SSEG- insert 1...100--
48794 cycles, InsertAtoB_SSEG- insert 1...100--
51467 cycles, INSAtoB_SSEF- insert 1...100--
53518 cycles, INSAtoB_SSEE- insert 1...100--
54202 cycles, INSAtoB_XZZG- insert 1...100--
54359 cycles, INSAtoB_XZZF- insert 1...100--
54764 cycles, INSAtoB_XZZE- insert 1...100--
78283 cycles, INSAtoB_X-    insert 1...100--
135822 cycles, INSAtoB_BA-   insert 1...100--
142369 cycles, INSAtoB_BB-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means-MINIMUM values-Inserting at position 0 - 1...100--

35550 cycles, InsertAtoB_DQUA- insert 1...100--
47919 cycles, INSAtoB_SSEG- insert 1...100--
48686 cycles, InsertAtoB_SSEG- insert 1...100--
51203 cycles, INSAtoB_SSEF- insert 1...100--
53168 cycles, INSAtoB_SSEE- insert 1...100--
54185 cycles, INSAtoB_XZZG- insert 1...100--
54272 cycles, INSAtoB_XZZF- insert 1...100--
54472 cycles, INSAtoB_XZZE- insert 1...100--
77465 cycles, INSAtoB_X-    insert 1...100--
135168 cycles, INSAtoB_BA-   insert 1...100--
140827 cycles, INSAtoB_BB-   insert 1...100--
********** STOP ShowAllMinimum **********
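
The three tables can be checked against the five raw runs. For InsertAtoB_DQUA the runs sum to 181996, so the mean is 181996/5 = 36399 (truncated); dropping the worst run (37054) gives 144942/4 = 36235; the minimum is 35550. A minimal sketch of that arithmetic follows. The name RunStats and its register convention are hypothetical and not taken from the test program; it assumes at least two samples and a sum that fits in 32 bits.

```asm
; Hypothetical helper, not taken from the test program.
; Returns: eax = mean of all runs, edx = mean with the worst run removed, ecx = minimum
; Assumes count >= 2 and that the sum of all samples fits in 32 bits.
RunStats proc uses esi edi ebx pSamples:DWORD, count:DWORD
    mov     esi, pSamples
    mov     ecx, count
    xor     eax, eax            ; eax = running sum
    xor     ebx, ebx            ; ebx = largest sample so far
    or      edi, -1             ; edi = smallest sample so far (starts at 0FFFFFFFFh)
sample_loop:
    mov     edx, [esi]
    add     eax, edx
    cmp     edx, ebx
    jbe     not_max
    mov     ebx, edx
not_max:
    cmp     edx, edi
    jae     not_min
    mov     edi, edx
not_min:
    add     esi, 4
    dec     ecx
    jnz     sample_loop

    push    eax                 ; keep the full sum
    sub     eax, ebx            ; sum without the worst (largest) run
    xor     edx, edx
    mov     ecx, count
    dec     ecx
    div     ecx                 ; eax = mean with worst case removed (truncated)
    mov     ebx, eax
    pop     eax
    xor     edx, edx
    div     count               ; eax = mean of all runs (truncated)
    mov     edx, ebx            ; edx = mean with worst case removed
    mov     ecx, edi            ; ecx = minimum
    ret
RunStats endp
```

The integer division truncates, which matches the printed values (for example 80375 for INSAtoB_X, where the exact mean is 80375.8).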
Quote
INSERTING AT POSITION 100-string length= 200

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -Inserting at position 100 - 1...100--

25601 cycles, INSAtoB_SSEG- insert 1...100--
29856 cycles, INSAtoB_SSEF- insert 1...100--
30978 cycles, INSAtoB_SSEE- insert 1...100--
31435 cycles, INSAtoB_XZZF- insert 1...100--
31557 cycles, INSAtoB_XZZE- insert 1...100--
31735 cycles, INSAtoB_XZZG- insert 1...100--
35899 cycles, InsertAtoB_DQUA- insert 1...100--
40805 cycles, InsertAtoB_SSEG- insert 1...100--
49302 cycles, INSAtoB_X-    insert 1...100--
123924 cycles, INSAtoB_BB-   insert 1...100--
133299 cycles, INSAtoB_BA-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -worst case removed-Inserting at position 100 - 1...100--

25492 cycles, INSAtoB_SSEG- insert 1...100--
29797 cycles, INSAtoB_SSEF- insert 1...100--
30820 cycles, INSAtoB_SSEE- insert 1...100--
31406 cycles, INSAtoB_XZZF- insert 1...100--
31434 cycles, INSAtoB_XZZE- insert 1...100--
31534 cycles, INSAtoB_XZZG- insert 1...100--
35843 cycles, InsertAtoB_DQUA- insert 1...100--
40628 cycles, InsertAtoB_SSEG- insert 1...100--
48681 cycles, INSAtoB_X-    insert 1...100--
123564 cycles, INSAtoB_BB-   insert 1...100--
131552 cycles, INSAtoB_BA-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means-MINIMUM values-Inserting at position 100 - 1...100--

25295 cycles, INSAtoB_SSEG- insert 1...100--
29585 cycles, INSAtoB_SSEF- insert 1...100--
29889 cycles, INSAtoB_SSEE- insert 1...100--
31094 cycles, INSAtoB_XZZE- insert 1...100--
31261 cycles, INSAtoB_XZZG- insert 1...100--
31299 cycles, INSAtoB_XZZF- insert 1...100--
35649 cycles, InsertAtoB_DQUA- insert 1...100--
39612 cycles, InsertAtoB_SSEG- insert 1...100--
47736 cycles, INSAtoB_X-    insert 1...100--
122062 cycles, INSAtoB_BB-   insert 1...100--
129815 cycles, INSAtoB_BA-   insert 1...100--
********** STOP ShowAllMinimum **********