Sorting strings

Started by RuiLoureiro, May 29, 2014, 06:15:48 AM


nidud

#135
deleted

RuiLoureiro

#136
Quote
If you compare two functions to test the copy-algorithm
they should be called with the same arguments.
Yes if you don't want to see the difference between
        invoke  ProcX, pDst, pSrc, Len
        and
        invoke  ProcY, pDst, pSrc (we get the length from pSrc).
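To make the difference between the two call styles concrete, here is a minimal C sketch (not code from this thread). The names `copy_with_len` and `copy_prefixed` are hypothetical. The thread does not show how ProcY obtains the length, so the sketch assumes a 32-bit length stored in the 4 bytes just before the string data, a layout that the `mov ecx,[edi-4]` fragment further down only hints at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ProcX style: the caller passes the length as a third argument. */
static void *copy_with_len(void *dst, const void *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    return memcpy(dst, src, len);
}

/* ProcY style: only dst and src are passed.
   ASSUMPTION: a 32-bit length is stored in the 4 bytes right before
   the source data. Only the data bytes are copied, not the prefix. */
static void *copy_prefixed(void *dst, const void *src)
{
    uint32_t len;
    assert(dst != NULL && src != NULL);
    /* memcpy avoids an unaligned 32-bit read of the prefix */
    memcpy(&len, (const unsigned char *)src - 4, sizeof len);
    return memcpy(dst, src, len);
}
```

With the second form the length lookup is part of the measured work, which is why timing a two-argument procedure against a three-argument one does not compare the copy loop alone.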
Quote
The memcpy I wrote have three arguments and is compared
with functions using two, so this should be aligned before the test.
Not in all cases. Some also have three arguments.
        I don't want to compare them that way because
        I have no interest in procedures like ProcX.
Quote
There should also be a test to see if the algo actually works,
so I added some of the functions used and most of them failed.
I do test them, but it is possible that some still have bugs.

I think you tried to modify COPYAtoB_XZE to follow your model
but this:

  toend:
   mov   eax,dst                ; added
   ret
is useless for me.

EDIT:
Quote
so I added some of the functions used and most of them failed.
I am reading the procedure COPYAtoB_XZE that
you wrote, and it seems that you need to study
what I wrote.

Where did you get COPYAtoB_XZE ?


nidud

#137
deleted

Gunther

Hi Rui,

results are attached in c57.zip.

Gunther
You have to know the facts before you can distort them.

RuiLoureiro

Quote
As I follow my model you should follow yours.
But you said: «so I added some of the functions used
and most of them failed.»
        What fails?
        You may use COPYAtoB_XZE, etc., but call it
        COPYAtoB_XZEm, etc., because it is not
        exactly the same as COPYAtoB_XZE.
Quote
So why do you include the memcpy_x procedures then?
They are simply more procedures.
        You may do whatever comparisons you want;
        the results are here. In my reply #120
        I did one comparison (on my P4). After that
        the problem was about SSE on the i7. memcpy_3
        is one that uses movdqu.
        It can serve as a reference for developing
        other procs.
Another thing:
   myProcX  proc uses esi edi dst, src
                 mov   edi,dst
                 mov   esi,src
                 mov   ecx,[edi-4]
   You may do it, but I have nothing to do with it.

nidud

#140
deleted

RuiLoureiro

Quote from: Gunther on July 11, 2014, 08:03:38 AM
Hi Rui,
results are attached in c57.zip.
Gunther
Thank you Gunther  :t

Gunther

Hi Rui,

Quote from: RuiLoureiro on July 12, 2014, 07:35:06 AM
               Thank you Gunther  :t

You're welcome. Did you solve all of your questions?

Gunther

RuiLoureiro

Hi Gunther,
                   No. I need to do some work yet.

Gunther

Hi Rui,

Quote from: RuiLoureiro on July 14, 2014, 06:44:24 PM
                   No. I need to do some work yet.

Good luck.  :icon14:

Gunther

RuiLoureiro

Thanks Gunther  :icon14:
Hi
    I wrote a new macro to get the mean values (and...).
    In the second mean, I remove the worst case.
    For example, we run crt_memcpy 5 times (each 5 times).
    In the second mean, we remove 46768 (First SAMPLE).
    In these tests, crt_memcpy is far better.
    Note: COPYAtoB_MOVSB is just rep movsb.
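The two means can be reproduced with a short C sketch. This is not the macro itself, which is not shown in the thread; the function names `mean_cycles` and `mean_without_worst` are hypothetical, and truncating integer division is an assumption that happens to match the crt_memcpy and memcpy_1 rows of the First SAMPLE.

```c
#include <assert.h>
#include <stddef.h>

/* Plain mean of n cycle counts, using truncating integer division. */
static unsigned long mean_cycles(const unsigned long *v, size_t n)
{
    unsigned long sum = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Mean after dropping the single worst (largest) cycle count. */
static unsigned long mean_without_worst(const unsigned long *v, size_t n)
{
    unsigned long sum = 0, worst = 0;
    assert(n > 1);
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
        if (v[i] > worst)
            worst = v[i];
    }
    return (sum - worst) / (n - 1);
}
```

For the five crt_memcpy runs of the First SAMPLE (46768, 43461, 41069, 41262, 40815) this gives 42675 for the plain mean and 41651 once 46768 is removed, the same values as in the two "Means" tables.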
First SAMPLE
Quote
46768 cycles, crt_memcpy - 1...256
43461 cycles, crt_memcpy - 1...256
41069 cycles, crt_memcpy - 1...256
41262 cycles, crt_memcpy - 1...256
40815 cycles, crt_memcpy - 1...256
72366 cycles, memcpy_1 - 1...256
72305 cycles, memcpy_1 - 1...256
74966 cycles, memcpy_1 - 1...256
72619 cycles, memcpy_1 - 1...256
72288 cycles, memcpy_1 - 1...256
109811 cycles, memcpy_2 - 1...256
110395 cycles, memcpy_2 - 1...256
109911 cycles, memcpy_2 - 1...256
110186 cycles, memcpy_2 - 1...256
110124 cycles, memcpy_2 - 1...256
113295 cycles, memcpy_3 - 1...256
113058 cycles, memcpy_3 - 1...256
113821 cycles, memcpy_3 - 1...256
113101 cycles, memcpy_3 - 1...256
112885 cycles, memcpy_3 - 1...256
74974 cycles, memcpy_4 - 1...256
74030 cycles, memcpy_4 - 1...256
73724 cycles, memcpy_4 - 1...256
74066 cycles, memcpy_4 - 1...256
74708 cycles, memcpy_4 - 1...256
75788 cycles, COPYAtoB_XZE - 1...256
73080 cycles, COPYAtoB_XZE - 1...256
72975 cycles, COPYAtoB_XZE - 1...256
73927 cycles, COPYAtoB_XZE - 1...256
73285 cycles, COPYAtoB_XZE - 1...256
121278 cycles, COPYAtoB_MOVSB - 1...256
121220 cycles, COPYAtoB_MOVSB - 1...256
122102 cycles, COPYAtoB_MOVSB - 1...256
121056 cycles, COPYAtoB_MOVSB - 1...256
121417 cycles, COPYAtoB_MOVSB - 1...256
*** Press any key to get the time table ***

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

42675 cycles, crt_memcpy
72908 cycles, memcpy_1
73811 cycles, COPYAtoB_XZE
74300 cycles, memcpy_4
110085 cycles, memcpy_2
113232 cycles, memcpy_3
121414 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

41651 cycles, crt_memcpy
72394 cycles, memcpy_1
73316 cycles, COPYAtoB_XZE
74132 cycles, memcpy_4
110008 cycles, memcpy_2
113084 cycles, memcpy_3
121242 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
Second SAMPLE
Quote
47700 cycles, crt_memcpy - 1...256
41819 cycles, crt_memcpy - 1...256
41564 cycles, crt_memcpy - 1...256
41168 cycles, crt_memcpy - 1...256
41471 cycles, crt_memcpy - 1...256
72293 cycles, memcpy_1 - 1...256
75131 cycles, memcpy_1 - 1...256
74058 cycles, memcpy_1 - 1...256
73880 cycles, memcpy_1 - 1...256
73448 cycles, memcpy_1 - 1...256
111278 cycles, memcpy_2 - 1...256
111630 cycles, memcpy_2 - 1...256
109758 cycles, memcpy_2 - 1...256
110357 cycles, memcpy_2 - 1...256
113262 cycles, memcpy_2 - 1...256
112782 cycles, memcpy_3 - 1...256
114002 cycles, memcpy_3 - 1...256
113916 cycles, memcpy_3 - 1...256
112757 cycles, memcpy_3 - 1...256
113764 cycles, memcpy_3 - 1...256
73673 cycles, memcpy_4 - 1...256
73540 cycles, memcpy_4 - 1...256
73474 cycles, memcpy_4 - 1...256
75833 cycles, memcpy_4 - 1...256
75212 cycles, memcpy_4 - 1...256
74600 cycles, COPYAtoB_XZE - 1...256
73008 cycles, COPYAtoB_XZE - 1...256
73138 cycles, COPYAtoB_XZE - 1...256
73939 cycles, COPYAtoB_XZE - 1...256
73418 cycles, COPYAtoB_XZE - 1...256
121127 cycles, COPYAtoB_MOVSB - 1...256
122576 cycles, COPYAtoB_MOVSB - 1...256
121290 cycles, COPYAtoB_MOVSB - 1...256
121280 cycles, COPYAtoB_MOVSB - 1...256
121743 cycles, COPYAtoB_MOVSB - 1...256
*** Press any key to get the time table ***

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

42744 cycles, crt_memcpy
73620 cycles, COPYAtoB_XZE
73762 cycles, memcpy_1
74346 cycles, memcpy_4
111257 cycles, memcpy_2
113444 cycles, memcpy_3
121603 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

41505 cycles, crt_memcpy
73375 cycles, COPYAtoB_XZE
73419 cycles, memcpy_1
73974 cycles, memcpy_4
110755 cycles, memcpy_2
113304 cycles, memcpy_3
121360 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
The effect of the worst case
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****
...
73463 cycles, COPYAtoB_XZZE
73816 cycles, COPYAtoB_XZE
73919 cycles, COPYAtoB_XZZC
74104 cycles, memcpy_4
74454 cycles, COPYAtoB_YZZE
74939 cycles, COPYAtoB_XZZF
110328 cycles, memcpy_2
113520 cycles, memcpy_3
121795 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****
...
73323 cycles, COPYAtoB_XZZE
73475 cycles, COPYAtoB_XZE
73599 cycles, COPYAtoB_XZZF
73779 cycles, COPYAtoB_XZZC
73885 cycles, memcpy_4
74152 cycles, COPYAtoB_YZZE
110006 cycles, memcpy_2
113457 cycles, memcpy_3
121506 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

RuiLoureiro

Hi,
    In the following test I did this:

JUMPCODE            MACRO
                    LOCAL       label0
                    invoke      Sleep, 100          ; sleep for 100 ms
                    jmp         label0              ; jump over the filler bytes
                    db 4096 dup('x')                ; 4096 bytes of filler
    label0:
ENDM

REPEAT 3
        JUMPCODE
        BEGIN_COUNTER_CYCLE_HIGH_PRIORITY_CLASS  $start, $end
        invoke      PROCEDURE_???, ...
        END_COUNTER_CYCLE <...>
        ...                                         ; repeated 5 times
       ;-----------------------------------------------------
        JUMPCODE
        BEGIN_COUNTER_CYCLE_HIGH_PRIORITY_CLASS  $start, $end
        invoke      newPROCEDURE_???, ...
        END_COUNTER_CYCLE <...>
        ...                                         ; repeated 5 times
       ;-----------------------------------------------------
        ...
ENDM

Here are the results:
SAMPLE 1

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

178495 cycles, crt_memcpy
178933 cycles, crt_memcpy
243835 cycles, COPYAtoB_SSEK
243843 cycles, COPYAtoB_SSEN
245876 cycles, COPYAtoB_SSEB
247870 cycles, COPYAtoB_SSEM
248160 cycles, COPYAtoB_SSEJ
251895 cycles, COPYAtoB_SSEH
252662 cycles, COPYAtoB_SSEY
252845 cycles, COPYAtoB_SSEE
258006 cycles, COPYAtoB_SSEI
260916 cycles, COPYAtoB_WZZE
261092 cycles, COPYAtoB_SSEL
262787 cycles, COPYAtoB_YZZH
263338 cycles, COPYAtoB_YZZI
263700 cycles, COPYAtoB_YZZK
263713 cycles, COPYAtoB_YZZJ
267937 cycles, COPYAtoB_SSEX
269713 cycles, COPYAtoB_XZZC
270168 cycles, COPYAtoB_XZZF
270763 cycles, COPYAtoB_YZZG
276750 cycles, memcpy_4
279076 cycles, COPYAtoB_YZZE
279560 cycles, COPYAtoB_WZZF
288258 cycles, memcpy_1
295782 cycles, COPYAtoB_XZE
297157 cycles, MOVEAtoB_SSEC
300198 cycles, COPYAtoB_SSEA
303091 cycles, COPYAtoB_SSEC
305265 cycles, COPYAtoB_XZZE
325851 cycles, memcpy_2
335337 cycles, memcpy_3
461265 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, crt_memcpy
241691 cycles, COPYAtoB_SSEB
243236 cycles, COPYAtoB_SSEM
243653 cycles, COPYAtoB_SSEK
243672 cycles, COPYAtoB_SSEN
247654 cycles, COPYAtoB_SSEH
247750 cycles, COPYAtoB_SSEJ
251988 cycles, COPYAtoB_SSEY
252137 cycles, COPYAtoB_SSEE
256390 cycles, COPYAtoB_SSEI
260426 cycles, COPYAtoB_WZZE
260548 cycles, COPYAtoB_SSEL
262277 cycles, memcpy_1
262317 cycles, COPYAtoB_SSEX
262507 cycles, COPYAtoB_YZZH
262724 cycles, COPYAtoB_YZZI
262966 cycles, COPYAtoB_YZZK
263086 cycles, COPYAtoB_YZZJ
267991 cycles, COPYAtoB_YZZG
268505 cycles, COPYAtoB_XZZC
268694 cycles, COPYAtoB_XZZF
271661 cycles, COPYAtoB_YZZE
273190 cycles, memcpy_4
278803 cycles, COPYAtoB_WZZF
294645 cycles, COPYAtoB_XZE
296811 cycles, MOVEAtoB_SSEC
299978 cycles, COPYAtoB_SSEA
301585 cycles, COPYAtoB_XZZE
302374 cycles, COPYAtoB_SSEC
325495 cycles, memcpy_2
328389 cycles, memcpy_3
459663 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

SAMPLE 2

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

178495 cycles, crt_memcpy
178933 cycles, COPYAtoB_MOVSB
243835 cycles, COPYAtoB_SSEI
243843 cycles, COPYAtoB_SSEY
245876 cycles, COPYAtoB_SSEJ
247870 cycles, COPYAtoB_SSEE
248160 cycles, COPYAtoB_WZZE
251895 cycles, COPYAtoB_YZZH
252662 cycles, COPYAtoB_SSEK
252845 cycles, COPYAtoB_YZZI
258006 cycles, COPYAtoB_SSEL
260916 cycles, COPYAtoB_SSEX
261092 cycles, COPYAtoB_YZZK
262787 cycles, COPYAtoB_YZZE
263338 cycles, memcpy_4
263700 cycles, COPYAtoB_XZZC
263713 cycles, COPYAtoB_YZZG
267937 cycles, COPYAtoB_SSEN
269713 cycles, COPYAtoB_XZE
270168 cycles, memcpy_1
270763 cycles, COPYAtoB_XZZF
276750 cycles, COPYAtoB_SSEC
279076 cycles, COPYAtoB_WZZF
279560 cycles, COPYAtoB_YZZJ
288258 cycles, memcpy_3
295782 cycles, COPYAtoB_SSEA
297157 cycles, crt_memcpy
300198 cycles, COPYAtoB_SSEH
303091 cycles, COPYAtoB_SSEM
305265 cycles, MOVEAtoB_SSEC
325851 cycles, memcpy_2
335337 cycles, COPYAtoB_XZZE
461265 cycles, COPYAtoB_SSEB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_MOVSB
241691 cycles, COPYAtoB_SSEJ
243236 cycles, COPYAtoB_SSEE
243653 cycles, COPYAtoB_SSEI
243672 cycles, COPYAtoB_SSEY
247654 cycles, COPYAtoB_YZZH
247750 cycles, COPYAtoB_WZZE
251988 cycles, COPYAtoB_SSEK
252137 cycles, COPYAtoB_YZZI
256390 cycles, COPYAtoB_SSEL
260426 cycles, COPYAtoB_SSEX
260548 cycles, COPYAtoB_YZZK
262277 cycles, memcpy_3
262317 cycles, COPYAtoB_SSEN
262507 cycles, COPYAtoB_YZZE
262724 cycles, memcpy_4
262966 cycles, COPYAtoB_XZZC
263086 cycles, COPYAtoB_YZZG
267991 cycles, COPYAtoB_XZZF
268505 cycles, COPYAtoB_XZE
268694 cycles, memcpy_1
271661 cycles, COPYAtoB_WZZF
273190 cycles, COPYAtoB_SSEC
278803 cycles, COPYAtoB_YZZJ
294645 cycles, COPYAtoB_SSEA
296811 cycles, crt_memcpy
299978 cycles, COPYAtoB_SSEH
301585 cycles, MOVEAtoB_SSEC
302374 cycles, COPYAtoB_SSEM
325495 cycles, memcpy_2
328389 cycles, COPYAtoB_XZZE
459663 cycles, COPYAtoB_SSEB
********** END SortMeans **********

SAMPLE 3

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

178495 cycles, crt_memcpy
178933 cycles, COPYAtoB_SSEB
243835 cycles, COPYAtoB_SSEL
243843 cycles, COPYAtoB_SSEK
245876 cycles, COPYAtoB_WZZE
247870 cycles, COPYAtoB_YZZI
248160 cycles, COPYAtoB_SSEX
251895 cycles, COPYAtoB_YZZE
252662 cycles, COPYAtoB_SSEI
252845 cycles, memcpy_4
258006 cycles, COPYAtoB_YZZK
260916 cycles, COPYAtoB_SSEN
261092 cycles, COPYAtoB_XZZC
262787 cycles, COPYAtoB_WZZF
263338 cycles, COPYAtoB_SSEC
263700 cycles, COPYAtoB_XZE
263713 cycles, COPYAtoB_XZZF
267937 cycles, COPYAtoB_SSEY
269713 cycles, COPYAtoB_SSEA
270168 cycles, memcpy_3
270763 cycles, memcpy_1
276750 cycles, COPYAtoB_SSEM
279076 cycles, COPYAtoB_YZZJ
279560 cycles, COPYAtoB_YZZG
288258 cycles, COPYAtoB_XZZE
295782 cycles, COPYAtoB_SSEH
297157 cycles, COPYAtoB_MOVSB
300198 cycles, COPYAtoB_YZZH
303091 cycles, COPYAtoB_SSEE
305265 cycles, crt_memcpy
325851 cycles, memcpy_2
335337 cycles, MOVEAtoB_SSEC
461265 cycles, COPYAtoB_SSEJ
********** END SortMeans **********


-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_SSEB
241691 cycles, COPYAtoB_WZZE
243236 cycles, COPYAtoB_YZZI
243653 cycles, COPYAtoB_SSEL
243672 cycles, COPYAtoB_SSEK
247654 cycles, COPYAtoB_YZZE
247750 cycles, COPYAtoB_SSEX
251988 cycles, COPYAtoB_SSEI
252137 cycles, memcpy_4
256390 cycles, COPYAtoB_YZZK
260426 cycles, COPYAtoB_SSEN
260548 cycles, COPYAtoB_XZZC
262277 cycles, COPYAtoB_XZZE
262317 cycles, COPYAtoB_SSEY
262507 cycles, COPYAtoB_WZZF
262724 cycles, COPYAtoB_SSEC
262966 cycles, COPYAtoB_XZE
263086 cycles, COPYAtoB_XZZF
267991 cycles, memcpy_1
268505 cycles, COPYAtoB_SSEA
268694 cycles, memcpy_3
271661 cycles, COPYAtoB_YZZJ
273190 cycles, COPYAtoB_SSEM
278803 cycles, COPYAtoB_YZZG
294645 cycles, COPYAtoB_SSEH
296811 cycles, COPYAtoB_MOVSB
299978 cycles, COPYAtoB_YZZH
301585 cycles, crt_memcpy
302374 cycles, COPYAtoB_SSEE
325495 cycles, memcpy_2
328389 cycles, MOVEAtoB_SSEC
459663 cycles, COPYAtoB_SSEJ
********** END SortMeans **********


RuiLoureiro

#147
Now, see the tables of the mean values.

SAMPLE 1
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means- worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, crt_memcpy
241691 cycles, COPYAtoB_SSEB
243236 cycles, COPYAtoB_SSEM
243653 cycles, COPYAtoB_SSEK
243672 cycles, COPYAtoB_SSEN
247654 cycles, COPYAtoB_SSEH
247750 cycles, COPYAtoB_SSEJ
251988 cycles, COPYAtoB_SSEY
252137 cycles, COPYAtoB_SSEE
256390 cycles, COPYAtoB_SSEI
260426 cycles, COPYAtoB_WZZE
260548 cycles, COPYAtoB_SSEL
262277 cycles, memcpy_1
262317 cycles, COPYAtoB_SSEX
262507 cycles, COPYAtoB_YZZH
262724 cycles, COPYAtoB_YZZI
262966 cycles, COPYAtoB_YZZK
263086 cycles, COPYAtoB_YZZJ
267991 cycles, COPYAtoB_YZZG
268505 cycles, COPYAtoB_XZZC
268694 cycles, COPYAtoB_XZZF
271661 cycles, COPYAtoB_YZZE
273190 cycles, memcpy_4
278803 cycles, COPYAtoB_WZZF
294645 cycles, COPYAtoB_XZE
296811 cycles, MOVEAtoB_SSEC
299978 cycles, COPYAtoB_SSEA
301585 cycles, COPYAtoB_XZZE
302374 cycles, COPYAtoB_SSEC
325495 cycles, memcpy_2
328389 cycles, memcpy_3
459663 cycles, COPYAtoB_MOVSB
********** END SortMeans **********
What happened here?

In SAMPLE 2 (below), COPYAtoB_SSEB is last and one of
the crt_memcpy entries is not at the top. COPYAtoB_MOVSB is second.

SAMPLE 2
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_MOVSB
241691 cycles, COPYAtoB_SSEJ
243236 cycles, COPYAtoB_SSEE
243653 cycles, COPYAtoB_SSEI
243672 cycles, COPYAtoB_SSEY
247654 cycles, COPYAtoB_YZZH
247750 cycles, COPYAtoB_WZZE
251988 cycles, COPYAtoB_SSEK
252137 cycles, COPYAtoB_YZZI
256390 cycles, COPYAtoB_SSEL
260426 cycles, COPYAtoB_SSEX
260548 cycles, COPYAtoB_YZZK
262277 cycles, memcpy_3
262317 cycles, COPYAtoB_SSEN
262507 cycles, COPYAtoB_YZZE
262724 cycles, memcpy_4
262966 cycles, COPYAtoB_XZZC
263086 cycles, COPYAtoB_YZZG
267991 cycles, COPYAtoB_XZZF
268505 cycles, COPYAtoB_XZE
268694 cycles, memcpy_1
271661 cycles, COPYAtoB_WZZF
273190 cycles, COPYAtoB_SSEC
278803 cycles, COPYAtoB_YZZJ
294645 cycles, COPYAtoB_SSEA
296811 cycles, crt_memcpy
299978 cycles, COPYAtoB_SSEH
301585 cycles, MOVEAtoB_SSEC
302374 cycles, COPYAtoB_SSEM
325495 cycles, memcpy_2
328389 cycles, COPYAtoB_XZZE
459663 cycles, COPYAtoB_SSEB
********** END SortMeans **********
In SAMPLE 3 (below), COPYAtoB_SSEB is very close to crt_memcpy.

SAMPLE 3
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - worst case REMOVED *****

174774 cycles, crt_memcpy
175668 cycles, COPYAtoB_SSEB
241691 cycles, COPYAtoB_WZZE
243236 cycles, COPYAtoB_YZZI
243653 cycles, COPYAtoB_SSEL
243672 cycles, COPYAtoB_SSEK
247654 cycles, COPYAtoB_YZZE
247750 cycles, COPYAtoB_SSEX
251988 cycles, COPYAtoB_SSEI
252137 cycles, memcpy_4
256390 cycles, COPYAtoB_YZZK
260426 cycles, COPYAtoB_SSEN
260548 cycles, COPYAtoB_XZZC
262277 cycles, COPYAtoB_XZZE
262317 cycles, COPYAtoB_SSEY
262507 cycles, COPYAtoB_WZZF
262724 cycles, COPYAtoB_SSEC
262966 cycles, COPYAtoB_XZE
263086 cycles, COPYAtoB_XZZF
267991 cycles, memcpy_1
268505 cycles, COPYAtoB_SSEA
268694 cycles, memcpy_3
271661 cycles, COPYAtoB_YZZJ
273190 cycles, COPYAtoB_SSEM
278803 cycles, COPYAtoB_YZZG
294645 cycles, COPYAtoB_SSEH
296811 cycles, COPYAtoB_MOVSB
299978 cycles, COPYAtoB_YZZH
301585 cycles, crt_memcpy
302374 cycles, COPYAtoB_SSEE
325495 cycles, memcpy_2
328389 cycles, MOVEAtoB_SSEC
459663 cycles, COPYAtoB_SSEJ
********** END SortMeans **********
----------------------------------------------------------
EDIT:
Sorry, I put REPEAT 3 in the wrong place.
The problem is with the messages.
The results are always very regular.
The SSE procedure COPYAtoB_SSEL is far from crt_memcpy.

COPYAtoB_SSEL uses only one push/pop of ebx.
It copies forward 16 bytes at a time (one movdqu),
and the remainder is copied byte by byte.
If the length is less than 16, it copies byte by byte forward.
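The copy strategy described above can be sketched in C with SSE2 intrinsics. This is only an illustration of the idea, not the COPYAtoB_SSEL source, which is not shown in the thread. The name `copy_sse_forward` is hypothetical, and the plain `memcpy` fallback for builds without SSE2 is an addition made here so that the sketch compiles everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>   /* _mm_loadu_si128 / _mm_storeu_si128 */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Forward copy: 16 bytes per step with unaligned SSE2 moves,
   then the remaining 0..15 bytes one at a time. */
static void *copy_sse_forward(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    assert(n == 0 || (dst != NULL && src != NULL));

    while (n >= 16) {
#if HAVE_SSE2
        /* unaligned 16-byte load and store (movdqu on x86) */
        __m128i block = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, block);
#else
        memcpy(d, s, 16);   /* fallback when SSE2 is unavailable */
#endif
        s += 16;
        d += 16;
        n -= 16;
    }
    while (n > 0) {         /* tail, also the whole copy when n < 16 */
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Lengths below 16 never enter the first loop, so they take the byte-by-byte path exactly as described. For a range such as 1...256 the byte tail runs for most lengths, which may be part of why this approach stays behind crt_memcpy in the tables.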

COPYAtoB_SSEX and COPYAtoB_SSEY use
the idea from reply #140.
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

171473 cycles, crt_memcpy
183587 cycles, crt_memcpy
265709 cycles, COPYAtoB_SSEL
266661 cycles, COPYAtoB_SSEI
267588 cycles, COPYAtoB_SSEJ
268569 cycles, COPYAtoB_SSEK
268958 cycles, COPYAtoB_SSEM
269628 cycles, COPYAtoB_SSEN
270791 cycles, COPYAtoB_SSEH
271806 cycles, COPYAtoB_SSEB
278893 cycles, COPYAtoB_SSEY
279461 cycles, COPYAtoB_SSEX
282468 cycles, COPYAtoB_SSEE
285235 cycles, memcpy_1
286518 cycles, COPYAtoB_WZZF
287311 cycles, COPYAtoB_YZZI
287359 cycles, COPYAtoB_YZZE
289489 cycles, COPYAtoB_YZZH
289555 cycles, COPYAtoB_YZZG
295935 cycles, COPYAtoB_YZZJ
296890 cycles, COPYAtoB_YZZK
297993 cycles, COPYAtoB_WZZE
298908 cycles, memcpy_4
300577 cycles, MOVEAtoB_SSEC
301085 cycles, COPYAtoB_XZE
301981 cycles, COPYAtoB_XZZC
302835 cycles, COPYAtoB_XZZF
303579 cycles, COPYAtoB_SSEC
304886 cycles, COPYAtoB_SSEA
321286 cycles, COPYAtoB_XZZE
337077 cycles, memcpy_3
338400 cycles, memcpy_2
377135 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case removed *****

169657 cycles, crt_memcpy
177504 cycles, crt_memcpy
265000 cycles, COPYAtoB_SSEL
265024 cycles, COPYAtoB_SSEK
266432 cycles, COPYAtoB_SSEI
266540 cycles, COPYAtoB_SSEJ
268096 cycles, COPYAtoB_SSEM
269270 cycles, COPYAtoB_SSEN
270243 cycles, COPYAtoB_SSEH
270836 cycles, COPYAtoB_SSEB
277654 cycles, COPYAtoB_SSEY
277846 cycles, COPYAtoB_SSEE
278930 cycles, COPYAtoB_SSEX
282976 cycles, COPYAtoB_WZZF- use registers
284974 cycles, memcpy_1
286742 cycles, COPYAtoB_YZZI- use registers
286824 cycles, COPYAtoB_YZZE- use registers
287296 cycles, COPYAtoB_YZZG- use registers
288297 cycles, COPYAtoB_YZZH- use registers
295148 cycles, COPYAtoB_YZZJ- use registers
296640 cycles, COPYAtoB_YZZK- use registers
296923 cycles, memcpy_4
297154 cycles, COPYAtoB_WZZE- use registers
298818 cycles, COPYAtoB_XZZC- use registers
300358 cycles, COPYAtoB_XZE- use registers
300368 cycles, MOVEAtoB_SSEC
300885 cycles, COPYAtoB_XZZF- use registers
303185 cycles, COPYAtoB_SSEC
303831 cycles, COPYAtoB_SSEA
308246 cycles, COPYAtoB_XZZE- use registers
336636 cycles, memcpy_3
338190 cycles, memcpy_2
376028 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

RuiLoureiro

#148
Hi
    It seems I have found a good way of getting
    good results using SSE on my P4.
   

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means *****

98848 cycles, COPYAtoB_DQUB
171329 cycles, crt_memcpy
172769 cycles, crt_memcpy
249969 cycles, COPYAtoB_SSEP
251874 cycles, COPYAtoB_SSEM
251935 cycles, COPYAtoB_SSEB
253224 cycles, COPYAtoB_SSEY
254572 cycles, COPYAtoB_SSEX
254574 cycles, COPYAtoB_SSEH
255315 cycles, COPYAtoB_SSEK
255530 cycles, COPYAtoB_SSEN
255983 cycles, COPYAtoB_SSEE
257129 cycles, COPYAtoB_SSEL
257529 cycles, COPYAtoB_SSEO
258470 cycles, COPYAtoB_SSEJ
258917 cycles, COPYAtoB_SSEI
259654 cycles, COPYAtoB_DQUA
271520 cycles, memcpy_1
272597 cycles, COPYAtoB_WZZE- use registers
273014 cycles, COPYAtoB_WZZF- use registers
273203 cycles, COPYAtoB_YZZI- use registers
273599 cycles, memcpy_4
274972 cycles, COPYAtoB_YZZJ- use registers
276144 cycles, COPYAtoB_YZZH- use registers
276701 cycles, COPYAtoB_YZZK- use registers
276805 cycles, COPYAtoB_YZZG- use registers
276938 cycles, COPYAtoB_YZZE- use registers
282309 cycles, COPYAtoB_XZZC- use registers
283912 cycles, COPYAtoB_XZZF- use registers
285191 cycles, COPYAtoB_XZE - use registers
290683 cycles, COPYAtoB_XZZE- use registers
297654 cycles, MOVEAtoB_SSEC
301184 cycles, COPYAtoB_SSEA
303333 cycles, COPYAtoB_SSEC
329163 cycles, memcpy_3
340136 cycles, memcpy_2
379940 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case REMOVED*****

94062 cycles, COPYAtoB_DQUB
171003 cycles, crt_memcpy
171048 cycles, crt_memcpy
249520 cycles, COPYAtoB_SSEP
251563 cycles, COPYAtoB_SSEB
251637 cycles, COPYAtoB_SSEM
252778 cycles, COPYAtoB_SSEY
252926 cycles, COPYAtoB_SSEN
253431 cycles, COPYAtoB_SSEX
253792 cycles, COPYAtoB_SSEH
254940 cycles, COPYAtoB_SSEL
255056 cycles, COPYAtoB_SSEK
255683 cycles, COPYAtoB_SSEE
257176 cycles, COPYAtoB_SSEO
257666 cycles, COPYAtoB_SSEJ
258695 cycles, COPYAtoB_SSEI
259191 cycles, COPYAtoB_DQUA
268607 cycles, COPYAtoB_WZZF- use registers
271345 cycles, memcpy_1
272253 cycles, COPYAtoB_WZZE- use registers
272671 cycles, memcpy_4
272980 cycles, COPYAtoB_YZZI- use registers
273729 cycles, COPYAtoB_YZZG- use registers
274777 cycles, COPYAtoB_YZZJ- use registers
275167 cycles, COPYAtoB_YZZH- use registers
276398 cycles, COPYAtoB_YZZK- use registers
276478 cycles, COPYAtoB_YZZE- use registers
281989 cycles, COPYAtoB_XZZC- use registers
282306 cycles, COPYAtoB_XZZF- use registers
283778 cycles, COPYAtoB_XZE - use registers
288701 cycles, COPYAtoB_XZZE- use registers
296902 cycles, MOVEAtoB_SSEC
298823 cycles, COPYAtoB_SSEA
302362 cycles, COPYAtoB_SSEC
328756 cycles, memcpy_3
338998 cycles, memcpy_2
379394 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

Gunther

Hi Rui,

Quote from: RuiLoureiro on July 18, 2014, 03:56:59 AM
Hi
    It seems i found one good way of getting
    good results using SSE in my P4.

Sounds good.  :t Go forward.

Gunther