Sorting strings

Started by RuiLoureiro, May 29, 2014, 06:15:48 AM

RuiLoureiro

Hi Gunther,  :t

            Now we can see that
            we can get good results
            using general-purpose registers (not SSE),
            and this has nothing to do with
            the help (?) that we get here.
            (Compare these results with the previous ones.)
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means worst case removed *****

68156 cycles, COPYAtoB_DQUB-
69378 cycles, COPYAtoB_DQUC-
93990 cycles, COPYAtoB_WZZE- use registers
94727 cycles, memcpy_1
97214 cycles, COPYAtoB_YZZJ- use registers
97233 cycles, COPYAtoB_YZZK- use registers
98405 cycles, COPYAtoB_XZZF- use registers
98632 cycles, COPYAtoB_WZZF- use registers
99312 cycles, COPYAtoB_SSEB
99919 cycles, COPYAtoB_YZZE- use registers
100136 cycles, COPYAtoB_YZZI- use registers
100177 cycles, COPYAtoB_YZZG- use registers
100616 cycles, COPYAtoB_XZZC- use registers
100691 cycles, COPYAtoB_YZZH- use registers
105280 cycles, COPYAtoB_XZE - use registers
108290 cycles, COPYAtoB_DQUA-
108642 cycles, COPYAtoB_XZZE- use registers
120814 cycles, COPYAtoB_SSEO
120996 cycles, COPYAtoB_SSEM
122012 cycles, COPYAtoB_SSEP
125973 cycles, COPYAtoB_SSEY
126106 cycles, COPYAtoB_SSEN
126260 cycles, COPYAtoB_SSEX
126943 cycles, COPYAtoB_SSEH
127500 cycles, crt_memcpy
131576 cycles, COPYAtoB_SSEK
132600 cycles, COPYAtoB_SSEE
133228 cycles, COPYAtoB_SSEJ
134229 cycles, COPYAtoB_SSEI
134992 cycles, COPYAtoB_SSEL
154953 cycles, memcpy_4
159392 cycles, crt_memcpy
165111 cycles, MOVEAtoB_SSEC
167700 cycles, COPYAtoB_SSEA
170294 cycles, COPYAtoB_SSEC
185132 cycles, memcpy_3
211344 cycles, memcpy_2
473021 cycles, COPYAtoB_MOVSB
********** END SortMeans **********

Gunther

Hi Rui,

Quote from: RuiLoureiro on July 18, 2014, 07:15:24 PM
Hi Gunther,  :t

            Now we can see that
            we can get good results
            using general-purpose registers (not SSE),
            and this has nothing to do with
            the help (?) that we get here.
            (Compare these results with the previous ones.)

Yes, very interesting. Good work.  :t

Gunther
You have to know the facts before you can distort them.

RuiLoureiro

Hi
        I did all the tests I wanted to do.
        On my P4, the best seems to be COPYAtoB_DQUE
        when the address is aligned:
        from 8160 to 8223 bytes, its mean is
        256264 cycles.
        If the address is unaligned, the best
        seems to be COPYAtoB_DQUB.

Here are the results
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 1...512 *****

130482 cycles, COPYAtoB_DQUB-    ALIGNED --> 89 670 cycles
131587 cycles, COPYAtoB_DQUD-            --> 85 424 cycles
131788 cycles, COPYAtoB_DQUC-
134390 cycles, COPYAtoB_DQUE-            --> 89 661 cycles
135774 cycles, crt_memcpy
248412 cycles, COPYAtoB_SSEM
252771 cycles, COPYAtoB_SSEH
253568 cycles, COPYAtoB_SSEO
253608 cycles, COPYAtoB_SSEN
254133 cycles, COPYAtoB_SSEL
255974 cycles, COPYAtoB_SSEP
256932 cycles, COPYAtoB_SSEB
257831 cycles, COPYAtoB_SSEK
258858 cycles, COPYAtoB_SSEJ
261882 cycles, COPYAtoB_SSEI
262240 cycles, COPYAtoB_SSEY
262609 cycles, COPYAtoB_SSEX
263376 cycles, COPYAtoB_WZZE- use registers
264587 cycles, memcpy_4
265348 cycles, COPYAtoB_SSEE
270437 cycles, COPYAtoB_WZZF- use registers
276074 cycles, COPYAtoB_YZZK- use registers
277334 cycles, COPYAtoB_YZZJ- use registers
281863 cycles, COPYAtoB_YZZG- use registers
281877 cycles, COPYAtoB_YZZE- use registers
281942 cycles, COPYAtoB_YZZI- use registers
282510 cycles, COPYAtoB_YZZH- use registers
291839 cycles, memcpy_1
296347 cycles, COPYAtoB_DQUA-
309981 cycles, COPYAtoB_SSEA
311312 cycles, COPYAtoB_SSEC
313779 cycles, MOVEAtoB_SSEC
340787 cycles, memcpy_2
341081 cycles, COPYAtoB_XZZE- use registers
342978 cycles, memcpy_3
344042 cycles, COPYAtoB_XZZF- use registers
351251 cycles, COPYAtoB_XZZC- use registers
351355 cycles, COPYAtoB_XZE - use registers
378696 cycles, COPYAtoB_MOVSB
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 992...1055 *****

41318 cycles, COPYAtoB_DQUB-    ALIGNED --> 34 982 cycles
41516 cycles, COPYAtoB_DQUD-
41638 cycles, COPYAtoB_DQUC-
42056 cycles, COPYAtoB_DQUE-            --> 33 072 cycles
60389 cycles, crt_memcpy
126227 cycles, COPYAtoB_SSEE
127620 cycles, COPYAtoB_SSEN
127659 cycles, COPYAtoB_SSEH
127845 cycles, COPYAtoB_SSEM
128484 cycles, COPYAtoB_SSEL
128532 cycles, COPYAtoB_SSEY
129163 cycles, COPYAtoB_SSEK
129230 cycles, COPYAtoB_SSEP
129432 cycles, COPYAtoB_SSEX
129955 cycles, COPYAtoB_SSEI
130076 cycles, COPYAtoB_WZZE- use registers
130196 cycles, COPYAtoB_DQUA-
130327 cycles, COPYAtoB_SSEJ
130936 cycles, memcpy_4
132255 cycles, COPYAtoB_SSEO
132921 cycles, COPYAtoB_SSEB
137932 cycles, COPYAtoB_WZZF- use registers
139090 cycles, COPYAtoB_YZZG- use registers
139273 cycles, COPYAtoB_YZZI- use registers
139625 cycles, COPYAtoB_YZZH- use registers
140394 cycles, COPYAtoB_YZZJ- use registers
140546 cycles, memcpy_1
140832 cycles, COPYAtoB_YZZE- use registers
141484 cycles, COPYAtoB_SSEC
141676 cycles, COPYAtoB_SSEA
141896 cycles, memcpy_2
142128 cycles, memcpy_3
143169 cycles, COPYAtoB_YZZK- use registers
143228 cycles, MOVEAtoB_SSEC
156236 cycles, COPYAtoB_MOVSB
161533 cycles, COPYAtoB_XZZE- use registers
161612 cycles, COPYAtoB_XZZF- use registers
162562 cycles, COPYAtoB_XZE - use registers
163142 cycles, COPYAtoB_XZZC- use registers
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 2016...2079 *****

68657 cycles, COPYAtoB_DQUB-    ALIGNED --> 67 286 cycles
69082 cycles, COPYAtoB_DQUD-
69388 cycles, COPYAtoB_DQUC-
71219 cycles, COPYAtoB_DQUE-            --> 59 517 cycles
106330 cycles, crt_memcpy
244328 cycles, COPYAtoB_SSEY
247038 cycles, COPYAtoB_SSEE
248428 cycles, COPYAtoB_SSEX
250080 cycles, COPYAtoB_SSEP
251992 cycles, COPYAtoB_DQUA-
254174 cycles, COPYAtoB_SSEN
254364 cycles, COPYAtoB_SSEH
255110 cycles, COPYAtoB_SSEM
256986 cycles, COPYAtoB_SSEL
257096 cycles, COPYAtoB_SSEK
257837 cycles, COPYAtoB_SSEI
258466 cycles, COPYAtoB_SSEJ
260602 cycles, memcpy_4
260685 cycles, COPYAtoB_WZZE- use registers
263622 cycles, COPYAtoB_SSEO
265307 cycles, COPYAtoB_SSEB
273274 cycles, memcpy_2
273774 cycles, COPYAtoB_SSEC
273835 cycles, memcpy_3
274148 cycles, MOVEAtoB_SSEC
274170 cycles, COPYAtoB_WZZF- use registers
274871 cycles, COPYAtoB_YZZH- use registers
275333 cycles, COPYAtoB_YZZG- use registers
275539 cycles, COPYAtoB_SSEA
275914 cycles, COPYAtoB_YZZI- use registers
275975 cycles, memcpy_1
280238 cycles, COPYAtoB_YZZE- use registers
280803 cycles, COPYAtoB_YZZJ- use registers
282198 cycles, COPYAtoB_MOVSB
288596 cycles, COPYAtoB_YZZK- use registers
315167 cycles, COPYAtoB_XZZF- use registers
321628 cycles, COPYAtoB_XZE - use registers
322624 cycles, COPYAtoB_XZZE- use registers
324055 cycles, COPYAtoB_XZZC- use registers
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 4064...4127 *****

133524 cycles, COPYAtoB_DQUB-    ALIGNED --> 138 234 cycles
133974 cycles, COPYAtoB_DQUD-
134362 cycles, COPYAtoB_DQUC-
139861 cycles, COPYAtoB_DQUE-            --> 131 426 cycles
219491 cycles, crt_memcpy
494054 cycles, COPYAtoB_DQUA-
499609 cycles, COPYAtoB_SSEY
500909 cycles, COPYAtoB_SSEP
501054 cycles, COPYAtoB_SSEX
503233 cycles, COPYAtoB_SSEE
514665 cycles, COPYAtoB_SSEH
515061 cycles, COPYAtoB_SSEN
518292 cycles, COPYAtoB_SSEM
519263 cycles, COPYAtoB_SSEL
520033 cycles, COPYAtoB_SSEK
521694 cycles, COPYAtoB_SSEI
522140 cycles, COPYAtoB_SSEJ
529019 cycles, COPYAtoB_WZZE- use registers
531639 cycles, memcpy_4
537646 cycles, COPYAtoB_SSEO
538366 cycles, COPYAtoB_SSEB
544214 cycles, memcpy_2
545965 cycles, memcpy_3
546893 cycles, COPYAtoB_SSEA
548279 cycles, MOVEAtoB_SSEC
550654 cycles, COPYAtoB_SSEC
552476 cycles, COPYAtoB_MOVSB
555669 cycles, COPYAtoB_YZZG- use registers
555863 cycles, COPYAtoB_YZZI- use registers
556214 cycles, COPYAtoB_WZZF- use registers
557247 cycles, COPYAtoB_YZZH- use registers
558249 cycles, memcpy_1
563636 cycles, COPYAtoB_YZZE- use registers
569430 cycles, COPYAtoB_YZZJ- use registers
584721 cycles, COPYAtoB_YZZK- use registers
629480 cycles, COPYAtoB_XZZF- use registers
631297 cycles, COPYAtoB_XZE - use registers
631658 cycles, COPYAtoB_XZZC- use registers
632115 cycles, COPYAtoB_XZZE- use registers
********** STOP SortMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 8160...8223 **

267038 cycles, COPYAtoB_DQUE-   ALIGNED --> 256 264 cycles
284186 cycles, COPYAtoB_DQUB-
284812 cycles, COPYAtoB_DQUC-
294833 cycles, COPYAtoB_DQUD-           --> 282 396 cycles
456833 cycles, crt_memcpy
988290 cycles, COPYAtoB_SSEP
990949 cycles, COPYAtoB_DQUA-
1004369 cycles, COPYAtoB_SSEE
1023446 cycles, COPYAtoB_SSEY
1024652 cycles, COPYAtoB_SSEX
1034689 cycles, COPYAtoB_SSEN
1036664 cycles, COPYAtoB_SSEH
1042597 cycles, COPYAtoB_SSEM
1043202 cycles, COPYAtoB_SSEK
1046816 cycles, COPYAtoB_SSEI
1048882 cycles, COPYAtoB_SSEJ
1055650 cycles, COPYAtoB_SSEL
1062057 cycles, COPYAtoB_WZZE- use registers
1066017 cycles, memcpy_4
1074905 cycles, COPYAtoB_SSEO
1077120 cycles, COPYAtoB_SSEB
1077648 cycles, memcpy_2
1079904 cycles, memcpy_3
1084412 cycles, COPYAtoB_SSEC
1084800 cycles, MOVEAtoB_SSEC
1087530 cycles, memcpy_1
1087803 cycles, COPYAtoB_SSEA
1098031 cycles, COPYAtoB_MOVSB
1125443 cycles, COPYAtoB_YZZE- use registers
1126103 cycles, COPYAtoB_YZZI- use registers
1126736 cycles, COPYAtoB_YZZG- use registers
1128012 cycles, COPYAtoB_YZZH- use registers
1128422 cycles, COPYAtoB_WZZF- use registers
1139396 cycles, COPYAtoB_YZZJ- use registers
1158849 cycles, COPYAtoB_YZZK- use registers
1262942 cycles, COPYAtoB_XZZF- use registers
1268578 cycles, COPYAtoB_XZZC- use registers
1269365 cycles, COPYAtoB_XZZE- use registers
1271633 cycles, COPYAtoB_XZE - use registers
********** STOP SortMeans **********

Gunther

Rui,

you're a hard working man.  :t

Gunther

RuiLoureiro

Hi
    Here are the results of my last test
    to copy from 1 to 512 bytes.

All procedures were tested by copying every length
from 2025 bytes down to 0 bytes, and all work correctly.
At the end, source and destination are exactly equal,
and the destination ends with a null terminator in the
correct place.
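
The check described above can be sketched as follows. This is a minimal sketch in C (not MASM, so it can be compiled anywhere), and it is not the test code used for these results. The names `copy_fn`, `copy_ref` and `verify_copy`, the fill pattern, and the extra guard byte are all assumptions made for illustration; `copy_ref` is a stand-in built on `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 2025

/* Hypothetical signature for a routine under test:
   copy n bytes, then write a null terminator at dst[n]. */
typedef void (*copy_fn)(char *dst, const char *src, size_t n);

/* Stand-in routine built on memcpy, used only to exercise the checker. */
static void copy_ref(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Tries every length from MAX_LEN down to 0.
   Returns the number of lengths that failed (0 means all passed). */
static int verify_copy(copy_fn copy)
{
    static char src[MAX_LEN + 1];
    static char dst[MAX_LEN + 2];   /* one guard byte after the terminator */
    int failures = 0;

    assert(copy != NULL);

    for (size_t i = 0; i < MAX_LEN; i++)
        src[i] = (char)('A' + i % 26);      /* non-zero test pattern */
    src[MAX_LEN] = '\0';

    for (size_t n = MAX_LEN + 1; n-- > 0; ) {
        memset(dst, 0x7F, sizeof dst);      /* poison the destination */
        copy(dst, src, n);
        if (memcmp(dst, src, n) != 0)
            failures++;                     /* contents differ */
        else if (dst[n] != '\0')
            failures++;                     /* terminator missing or misplaced */
        else if (dst[n + 1] != 0x7F)
            failures++;                     /* wrote past the terminator */
    }
    return failures;
}
```

Calling `verify_copy(copy_ref)` returns 0; a routine that drops the terminator or writes one byte too many would be counted as a failure for that length.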

note1:   MOVEAtoB_DQUB is a macro to copy
         a string with a predefined length.
         It sets the length of the destination
         and appends a null terminator.

note2:   COPYAtoB_DQU? is a procedure to copy
         a given length from source to destination.
         It sets the length of the destination
         and appends a null terminator.

note3:   COPYAtoB_DQA? is a procedure to copy
         a string with a predefined length.
         It sets the length of the destination
         and appends a null terminator.

Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 1...512 **

67340 cycles, MOVEAtoB_DQUB
72405 cycles, COPYAtoB_DQUB-
72424 cycles, COPYAtoB_DQAB-
73656 cycles, COPYAtoB_DQAD-
73777 cycles, COPYAtoB_DQUC-
73785 cycles, COPYAtoB_DQAC-
74340 cycles, COPYAtoB_DQAE-
74509 cycles, COPYAtoB_DQUE-
74615 cycles, COPYAtoB_DQUD-
95053 cycles, COPYAtoB_WZZG- use registers
107305 cycles, crt_memcpy
122627 cycles, COPYAtoB_DQAF-
123921 cycles, COPYAtoB_DQUF-
203780 cycles, COPYAtoB_DQAA-
207501 cycles, COPYAtoB_SSEP
208377 cycles, COPYAtoB_DQUA-
208983 cycles, COPYAtoB_SSEX
209303 cycles, COPYAtoB_SSEY
209411 cycles, COPYAtoB_SSEN
210635 cycles, COPYAtoB_SSEM
210645 cycles, COPYAtoB_SSEH
211094 cycles, COPYAtoB_SSEL
211875 cycles, COPYAtoB_SSEB
212347 cycles, COPYAtoB_SSEO
212850 cycles, COPYAtoB_SSEK
213498 cycles, COPYAtoB_SSEF
215333 cycles, COPYAtoB_SSEI
215825 cycles, COPYAtoB_SSEE
216628 cycles, COPYAtoB_SSEJ
218807 cycles, memcpy_4
219464 cycles, COPYAtoB_WZZE- use registers
223717 cycles, COPYAtoB_WZZF- use registers
230277 cycles, COPYAtoB_YZZJ- use registers
231627 cycles, COPYAtoB_YZZK- use registers
231689 cycles, COPYAtoB_YZZE- use registers
232139 cycles, COPYAtoB_XZE - use registers
232573 cycles, memcpy_1
233462 cycles, COPYAtoB_XZZE- use registers
233801 cycles, COPYAtoB_XZZF- use registers
234350 cycles, COPYAtoB_YZZH- use registers
234512 cycles, COPYAtoB_YZZI- use registers
234802 cycles, COPYAtoB_YZZG- use registers
235530 cycles, COPYAtoB_XZZC- use registers
248025 cycles, COPYAtoB_SSEA
249516 cycles, COPYAtoB_SSEC
278887 cycles, memcpy_2
282360 cycles, memcpy_3
300852 cycles, COPYAtoB_MOVSB
********** STOP SortMeans **********
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - ALIGNED-worst case removed - 1...512 **

66970 cycles, MOVEAtoB_DQUB
67416 cycles, COPYAtoB_DQAC-
68079 cycles, COPYAtoB_DQAD-
68437 cycles, COPYAtoB_DQUC-
69240 cycles, COPYAtoB_DQAB-
69305 cycles, COPYAtoB_DQUD-
69496 cycles, COPYAtoB_DQAE-
70094 cycles, COPYAtoB_DQUE-
71099 cycles, COPYAtoB_DQUB-
82996 cycles, COPYAtoB_SSEB
90245 cycles, COPYAtoB_WZZG- use registers
90598 cycles, COPYAtoB_YZZK- use registers
91764 cycles, COPYAtoB_YZZJ- use registers
92333 cycles, COPYAtoB_WZZE- use registers
93184 cycles, COPYAtoB_YZZE- use registers
93192 cycles, memcpy_1
93742 cycles, COPYAtoB_WZZF- use registers
93823 cycles, COPYAtoB_YZZG- use registers
94104 cycles, COPYAtoB_YZZH- use registers
94752 cycles, COPYAtoB_YZZI- use registers
99354 cycles, COPYAtoB_SSEP
99480 cycles, COPYAtoB_SSEO
101173 cycles, COPYAtoB_SSEM
101757 cycles, COPYAtoB_SSEY
102397 cycles, COPYAtoB_SSEN
102470 cycles, COPYAtoB_SSEX
102610 cycles, COPYAtoB_SSEH
104716 cycles, COPYAtoB_SSEK
105919 cycles, COPYAtoB_SSEL
106437 cycles, COPYAtoB_SSEI
107891 cycles, COPYAtoB_SSEF
108012 cycles, COPYAtoB_SSEJ
108711 cycles, COPYAtoB_SSEE
120209 cycles, COPYAtoB_DQUF-
120272 cycles, COPYAtoB_DQAF-
127760 cycles, memcpy_4
129327 cycles, crt_memcpy
135247 cycles, COPYAtoB_SSEA
154312 cycles, memcpy_3
168126 cycles, memcpy_2
198618 cycles, COPYAtoB_DQAA-
200537 cycles, COPYAtoB_DQUA-
231671 cycles, COPYAtoB_XZZF- use registers
232702 cycles, COPYAtoB_XZE - use registers
233570 cycles, COPYAtoB_XZZE- use registers
235548 cycles, COPYAtoB_XZZC- use registers
236412 cycles, COPYAtoB_MOVSB
249124 cycles, COPYAtoB_SSEC
********** STOP SortMeans **********

Gunther

Hi Rui,

you've nothing attached.

Gunther

RuiLoureiro

Quote from: Gunther on July 23, 2014, 09:46:59 PM
Hi Rui,

you've nothing attached.

Gunther
Hi Gunther,

           I don't need any code tested right now.
           If you need any of it, I will send it to you.

Now, see what cls does. It doesn't work properly.
After cls I got the junk shown below.
(I solved the problem with my own ScreenClear.
It seems to work properly.)

How do I replace ClearScreen with ScreenClear
in MASM?
Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means - UNALIGNED-992...1055--

33186 cycles, COPYAtoB_DQUE-
33955 cycles, COPYAtoB_DQAE-
35447 cycles, MOVEAtoB_DQUB
36020 cycles, COPYAtoB_DQUB-
36075 cycles, COPYAtoB_DQAD-
36096 cycles, COPYAtoB_DQUD-
37942 cycles, COPYAtoB_DQUC-
38617 cycles, COPYAtoB_DQAC-
38732 cycles, COPYAtoB_DQAB-
42871 cycles, COPYAtoB_DQUF-
42976 cycles, COPYAtoB_DQAF-
52857 cycles, COPYAtoB_WZZG- use registers
53586 cycles, crt_memcpy
133432 cycles, COPYAtoB_SSEK
134433 cycles, COPYAtoB_SSEJ
135038 cycles, COPYAtoB_SSEO
135287 cycles, COPYAtoB_YZZK- use registers
135458 cycles, COPYAtoB_WZZE- use registers
135655 cycles, COPYAtoB_SSEN
135759 cycles, COPYAtoB_SSEM
135852 cycles, COPYAtoB_SSEI
135865 cycles, COPYAtoB_YZZJ- use registers
137361 cycles, COPYAtoB_SSEH
138192 cycles, COPYAtoB_SSEB
138272 cycles, COPYAtoB_SSEL
140704 cycles, memcpy_4
141706 cycles, memcpy_2
142657 cycles, COPYAtoB_SSEC
143276 cycles, COPYAtoB_SSEA
145980 cycles, COPYAtoB_WZZF- use registers
147074 cycles, memcpy_3
147502 cycles, COPYAtoB_MOVSB
147890 cycles, COPYAtoB_DQUA-
148308 cycles, COPYAtoB_DQAA-
153250 cycles, COPYAtoB_YZZG- use registers
153341 cycles, COPYAtoB_YZZH- use registers
153443 cycles, COPYAtoB_SSEF
153817 cycles, COPYAtoB_SSEE
153964 cycles, COPYAtoB_SSEP
154037 cycles, COPYAtoB_YZZE- use registers
154083 cycles, COPYAtoB_SSEY
154313 cycles, COPYAtoB_YZZI- use registers
154803 cycles, COPYAtoB_XZE - use registers
155999 cycles, memcpy_1
159922 cycles, COPYAtoB_SSEX
191833 cycles, COPYAtoB_XZZF- use registers
201893 cycles, COPYAtoB_XZZE- use registers
202891 cycles, COPYAtoB_XZZC- use registers
********** STOP SortMeans ********** 055
153747 cycles, COPYAtoB_YZZI - 992...1055
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)5
-----------------------------------------------------
***** Means - UNALIGNED-worst case removed - 992...1055--
156242 cycles, COPYAtoB_YZZH - 992...1055  <-- junk
26524 cycles, COPYAtoB_DQUE- - 992...1055
26633 cycles, COPYAtoB_DQAE- - 992...1055
28275 cycles, MOVEAtoB_DQUBE - 992...1055
28761 cycles, COPYAtoB_DQUB- - 992...1055
28771 cycles, COPYAtoB_DQUD- - 992...1055
28776 cycles, COPYAtoB_DQAD- - 992...1055
28820 cycles, COPYAtoB_DQAC- - 992...1055
29074 cycles, COPYAtoB_DQAB- - 992...1055
29349 cycles, COPYAtoB_DQUC- - 992...1055
34240 cycles, COPYAtoB_DQUF- - 992...1055
34340 cycles, COPYAtoB_DQAF- - 992...1055
42217 cycles, COPYAtoB_WZZG- use registers
42810 cycles, crt_memcpyYZZK - 992...1055    <--- junk
106628 cycles, COPYAtoB_SSEK - 992...1055
107189 cycles, COPYAtoB_SSEI - 992...1055
107265 cycles, COPYAtoB_SSEJ - 992...1055
107908 cycles, COPYAtoB_SSEO - 992...1055
108056 cycles, COPYAtoB_WZZE- use registers
108059 cycles, COPYAtoB_YZZK- use registers
108318 cycles, COPYAtoB_SSEM - 992...1055
108432 cycles, COPYAtoB_SSEN - 992...1055
108483 cycles, COPYAtoB_YZZJ- use registers
108518 cycles, COPYAtoB_SSEH - 992...1055
109727 cycles, memcpy_4_WZZF - 992...1055   <-- junk
109814 cycles, COPYAtoB_SSEL - 992...1055
110396 cycles, COPYAtoB_SSEB- 992...1055
113353 cycles, memcpy_2WZZG - 992...1055
113947 cycles, COPYAtoB_SSEC- 992...1055
114278 cycles, COPYAtoB_SSEA- 992...1055
116339 cycles, COPYAtoB_WZZF- use registers
117104 cycles, memcpy_3_SSEL - 992...1055
117349 cycles, COPYAtoB_MOVSB- 992...1055
118148 cycles, COPYAtoB_DQUA-- 992...1055
118415 cycles, COPYAtoB_DQAA-- 992...1055
121758 cycles, COPYAtoB_SSEF - 992...1055
121808 cycles, COPYAtoB_SSEE - 992...1055
122303 cycles, COPYAtoB_YZZG- use registers
122413 cycles, COPYAtoB_YZZH- use registers
123036 cycles, COPYAtoB_SSEP - 992...1055
123064 cycles, COPYAtoB_YZZI- use registers
123084 cycles, COPYAtoB_YZZE- use registers
123157 cycles, COPYAtoB_SSEY - 992...1055
123724 cycles, COPYAtoB_XZE - use registers
124587 cycles, memcpy_1_SSEP - 992...1055
125577 cycles, COPYAtoB_SSEX - 992...1055
149397 cycles, COPYAtoB_XZZC- use registers
152096 cycles, COPYAtoB_XZZF- use registers
155795 cycles, COPYAtoB_XZZE- use registers
********** STOP SortMeans ********** 55
...
and a lot more junk after this

RuiLoureiro

How do I replace ClearScreen with ScreenClear
in MASM?

I replaced this ClearScreen

ClearScreen  proc
  ; -----------------------------------------------------------
  ; This procedure reads the column and row count, multiplies
  ; them together to get the number of characters that will fit
  ; onto the screen, writes that number of blank spaces to the
  ; screen and reposition the prompt at position 0,0.
  ; -----------------------------------------------------------
    LOCAL hOutPut:DWORD
    LOCAL noc    :DWORD
    LOCAL cnt    :DWORD
    LOCAL sbi    :CONSOLE_SCREEN_BUFFER_INFO

    invoke GetStdHandle,STD_OUTPUT_HANDLE
    mov hOutPut, eax

    invoke GetConsoleScreenBufferInfo,hOutPut,ADDR sbi
    mov eax, sbi.dwSize     ; 2 word values returned for screen size

  ; -----------------------------------------------
  ; extract the 2 values and multiply them together
  ; -----------------------------------------------
    push ax
    rol eax, 16
    mov cx, ax
    pop ax
    mul cx
    cwde
    mov cnt, eax

    invoke FillConsoleOutputCharacter,hOutPut,32,cnt,NULL,ADDR noc
    invoke locate,0,0
    ret
ClearScreen  endp

with this ClearScreen in the file clearscr.asm,
and then I ran make.bat.

ClearScreen         proc
                    LOCAL hOutPut,noc:DWORD
                    LOCAL sbi:CONSOLE_SCREEN_BUFFER_INFO

                    invoke GetStdHandle,STD_OUTPUT_HANDLE
                    mov    hOutPut, eax
                    invoke GetConsoleScreenBufferInfo,hOutPut,ADDR sbi
                    mov     eax, sbi.dwSize     ; 2 word values returned for screen size

                    ; -----------------------------------------------
                    ; extract the 2 values and multiply them together
                    ; -----------------------------------------------
                    mov     edx, eax
                    and     edx, 0FFFFh         ; number of columns
                    shr     eax, 16             ; number of lines
                    mul     edx                 ; eax = columns * lines (32-bit product)
                    mov     edx, eax            ; keep the count in edx; invoke uses eax for ADDR noc

                    invoke  FillConsoleOutputCharacter, hOutPut, 32, edx, 0, ADDR noc
                    invoke  locate,0,0
                    ret
ClearScreen         endp
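
A likely reason the first version misbehaves on large console buffers, offered as an assumption and not something confirmed in this thread: `mul cx` leaves only the low 16 bits of the product in AX, and `cwde` then sign-extends AX, so any product above 32767 becomes a wrong count. The sketch below models both calculations in C (not MASM, so it can be compiled anywhere). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack columns (low word) and lines (high word) the way
   sbi.dwSize looks after "mov eax, sbi.dwSize". */
static uint32_t pack_size(uint32_t columns, uint32_t lines)
{
    assert(columns <= 0xFFFFu && lines <= 0xFFFFu);
    return (lines << 16) | columns;
}

/* Models "mul cx" followed by "cwde":
   keep only AX, then sign-extend it to 32 bits. */
static uint32_t cells_mul16_cwde(uint32_t size)
{
    uint32_t product = (size & 0xFFFFu) * (size >> 16);
    uint32_t ax = product & 0xFFFFu;
    return (ax & 0x8000u) ? (ax | 0xFFFF0000u) : ax;
}

/* Models the replacement: "mul edx" on the zero-extended halves. */
static uint32_t cells_mul32(uint32_t size)
{
    return (size & 0xFFFFu) * (size >> 16);
}
```

For 80 columns by 25 lines both functions give 2000. For 80 columns by 500 lines, `cells_mul32` gives 40000, while `cells_mul16_cwde` gives 0xFFFF9C40, which is not a usable character count.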

RuiLoureiro

Hi,
    now I am testing a procedure to insert
    a string A into a string B of length=???
    at any position from 0 to length+x.

    In this test, the length of string A varies from
    1 to 100. The length of string B = 200.
    In every case, the procedure moves 200 bytes forward
    and then inserts string A at position 0.

    note: INSAtoB_?    is a macro
          InsertAtoB_? is a procedure

    My conclusion: on my P4, the best way is to use SSE.

    The strings may be aligned or unaligned;
    it doesn't matter.
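
The insertion described above (move the tail of B forward, then copy A into the gap) can be sketched in C as follows. This is only an illustration of the technique, not the code of INSAtoB_? or InsertAtoB_?. The function name, the capacity parameter and the restriction to positions no greater than the length of B are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inserts len_a bytes of a into b at position pos.
   b holds len_b bytes plus a terminator in a buffer of cap bytes.
   Returns the new length, or (size_t)-1 if the result does not fit. */
static size_t insert_a_into_b(char *b, size_t len_b, size_t cap,
                              const char *a, size_t len_a, size_t pos)
{
    assert(pos <= len_b);                  /* assumption: no insert past the end */
    if (len_b + len_a + 1 > cap)
        return (size_t)-1;

    /* Step 1: move the tail forward. The regions overlap, so memmove is required. */
    memmove(b + pos + len_a, b + pos, len_b - pos);
    /* Step 2: copy A into the gap. */
    memcpy(b + pos, a, len_a);
    /* Step 3: terminate at the new length. */
    b[len_b + len_a] = '\0';
    return len_b + len_a;
}
```

With `char b[32] = "hello world";`, the call `insert_a_into_b(b, 11, sizeof b, "big ", 4, 6)` returns 15 and leaves "hello big world" in `b`. Inserting at position 0 moves all of B, which is the case measured in the first set of tables.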
   

INSERTING AT POSITION 0-string length= 200

********** END I - Press a key **********
77675 cycles, INSAtoB_X, insert 1...100--
77465 cycles, INSAtoB_X, insert 1...100--
77586 cycles, INSAtoB_X, insert 1...100--
88744 cycles, INSAtoB_X, insert 1...100--
80409 cycles, INSAtoB_X, insert 1...100--

54655 cycles, INSAtoB_XZZE, insert 1...100--
54866 cycles, INSAtoB_XZZE, insert 1...100--
54472 cycles, INSAtoB_XZZE, insert 1...100--
57474 cycles, INSAtoB_XZZE, insert 1...100--
55063 cycles, INSAtoB_XZZE, insert 1...100--

54452 cycles, INSAtoB_XZZF, insert 1...100--
54363 cycles, INSAtoB_XZZF, insert 1...100--
54351 cycles, INSAtoB_XZZF, insert 1...100--
54272 cycles, INSAtoB_XZZF, insert 1...100--
54556 cycles, INSAtoB_XZZF, insert 1...100--

54204 cycles, INSAtoB_XZZG, insert 1...100--
54211 cycles, INSAtoB_XZZG, insert 1...100--
54224 cycles, INSAtoB_XZZG, insert 1...100--
54209 cycles, INSAtoB_XZZG, insert 1...100--
54185 cycles, INSAtoB_XZZG, insert 1...100--

138734 cycles, INSAtoB_BA, insert 1...100--
135168 cycles, INSAtoB_BA, insert 1...100--
136854 cycles, INSAtoB_BA, insert 1...100--
135718 cycles, INSAtoB_BA, insert 1...100--
135550 cycles, INSAtoB_BA, insert 1...100--

154274 cycles, INSAtoB_BB, insert 1...100--
142298 cycles, INSAtoB_BB, insert 1...100--
143103 cycles, INSAtoB_BB, insert 1...100--
143249 cycles, INSAtoB_BB, insert 1...100--
140827 cycles, INSAtoB_BB, insert 1...100--

54098 cycles, INSAtoB_SSEE, insert 1...100--
53574 cycles, INSAtoB_SSEE, insert 1...100--
53622 cycles, INSAtoB_SSEE, insert 1...100--
53708 cycles, INSAtoB_SSEE, insert 1...100--
53168 cycles, INSAtoB_SSEE, insert 1...100--

54125 cycles, INSAtoB_SSEF, insert 1...100--
51203 cycles, INSAtoB_SSEF, insert 1...100--
51473 cycles, INSAtoB_SSEF, insert 1...100--
51682 cycles, INSAtoB_SSEF, insert 1...100--
51510 cycles, INSAtoB_SSEF, insert 1...100--

47988 cycles, INSAtoB_SSEG, insert 1...100--
48405 cycles, INSAtoB_SSEG, insert 1...100--
47919 cycles, INSAtoB_SSEG, insert 1...100--
48074 cycles, INSAtoB_SSEG, insert 1...100--
48730 cycles, INSAtoB_SSEG, insert 1...100--

48796 cycles, InsertAtoB_SSEG, insert 1...100--
48843 cycles, InsertAtoB_SSEG, insert 1...100--
48854 cycles, InsertAtoB_SSEG, insert 1...100--
49345 cycles, InsertAtoB_SSEG, insert 1...100--
48686 cycles, InsertAtoB_SSEG, insert 1...100--

36357 cycles, InsertAtoB_DQUA, insert 1...100--
37054 cycles, InsertAtoB_DQUA, insert 1...100--
36460 cycles, InsertAtoB_DQUA, insert 1...100--
36575 cycles, InsertAtoB_DQUA, insert 1...100--
35550 cycles, InsertAtoB_DQUA, insert 1...100--

*** Press any key to get the mean values table ***

Quote
-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -Inserting at position 0 - 1...100--

36399 cycles, InsertAtoB_DQUA- insert 1...100--
48223 cycles, INSAtoB_SSEG- insert 1...100--
48904 cycles, InsertAtoB_SSEG- insert 1...100--
51998 cycles, INSAtoB_SSEF- insert 1...100--
53634 cycles, INSAtoB_SSEE- insert 1...100--
54206 cycles, INSAtoB_XZZG- insert 1...100--
54398 cycles, INSAtoB_XZZF- insert 1...100--
55306 cycles, INSAtoB_XZZE- insert 1...100--
80375 cycles, INSAtoB_X-    insert 1...100--
136404 cycles, INSAtoB_BA-   insert 1...100--
144750 cycles, INSAtoB_BB-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -worst case removed-Inserting at position 0 - 1...100--

36235 cycles, InsertAtoB_DQUA- insert 1...100--
48096 cycles, INSAtoB_SSEG- insert 1...100--
48794 cycles, InsertAtoB_SSEG- insert 1...100--
51467 cycles, INSAtoB_SSEF- insert 1...100--
53518 cycles, INSAtoB_SSEE- insert 1...100--
54202 cycles, INSAtoB_XZZG- insert 1...100--
54359 cycles, INSAtoB_XZZF- insert 1...100--
54764 cycles, INSAtoB_XZZE- insert 1...100--
78283 cycles, INSAtoB_X-    insert 1...100--
135822 cycles, INSAtoB_BA-   insert 1...100--
142369 cycles, INSAtoB_BB-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means-MINIMUM values-Inserting at position 0 - 1...100--

35550 cycles, InsertAtoB_DQUA- insert 1...100--
47919 cycles, INSAtoB_SSEG- insert 1...100--
48686 cycles, InsertAtoB_SSEG- insert 1...100--
51203 cycles, INSAtoB_SSEF- insert 1...100--
53168 cycles, INSAtoB_SSEE- insert 1...100--
54185 cycles, INSAtoB_XZZG- insert 1...100--
54272 cycles, INSAtoB_XZZF- insert 1...100--
54472 cycles, INSAtoB_XZZE- insert 1...100--
77465 cycles, INSAtoB_X-    insert 1...100--
135168 cycles, INSAtoB_BA-   insert 1...100--
140827 cycles, INSAtoB_BB-   insert 1...100--
********** STOP ShowAllMinimum **********
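
The three summary tables above appear to follow from the five raw runs listed per routine: the plain mean, the mean with the largest value dropped, and the minimum, each truncated to an integer. The sketch below is in C (not MASM) and uses hypothetical function names; it is not the program that produced the tables, but it reproduces the INSAtoB_X row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean of all runs, truncated to an integer. */
static uint64_t mean_all(const uint64_t *runs, size_t count)
{
    uint64_t sum = 0;
    assert(count > 0);
    for (size_t i = 0; i < count; i++)
        sum += runs[i];
    return sum / count;
}

/* Mean after dropping the single largest run ("worst case removed"). */
static uint64_t mean_without_worst(const uint64_t *runs, size_t count)
{
    uint64_t sum = 0, worst = 0;
    assert(count > 1);
    for (size_t i = 0; i < count; i++) {
        sum += runs[i];
        if (runs[i] > worst)
            worst = runs[i];
    }
    return (sum - worst) / (count - 1);
}

/* Smallest run. */
static uint64_t min_run(const uint64_t *runs, size_t count)
{
    uint64_t best;
    assert(count > 0);
    best = runs[0];
    for (size_t i = 1; i < count; i++)
        if (runs[i] < best)
            best = runs[i];
    return best;
}
```

For the INSAtoB_X runs 77675, 77465, 77586, 88744 and 80409, these functions return 80375, 78283 and 77465, matching the three tables.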
Quote
INSERTING AT POSITION 100-string length= 200

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -Inserting at position 100 - 1...100--

25601 cycles, INSAtoB_SSEG- insert 1...100--
29856 cycles, INSAtoB_SSEF- insert 1...100--
30978 cycles, INSAtoB_SSEE- insert 1...100--
31435 cycles, INSAtoB_XZZF- insert 1...100--
31557 cycles, INSAtoB_XZZE- insert 1...100--
31735 cycles, INSAtoB_XZZG- insert 1...100--
35899 cycles, InsertAtoB_DQUA- insert 1...100--
40805 cycles, InsertAtoB_SSEG- insert 1...100--
49302 cycles, INSAtoB_X-    insert 1...100--
123924 cycles, INSAtoB_BB-   insert 1...100--
133299 cycles, INSAtoB_BA-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means -worst case removed-Inserting at position 100 - 1...100--

25492 cycles, INSAtoB_SSEG- insert 1...100--
29797 cycles, INSAtoB_SSEF- insert 1...100--
30820 cycles, INSAtoB_SSEE- insert 1...100--
31406 cycles, INSAtoB_XZZF- insert 1...100--
31434 cycles, INSAtoB_XZZE- insert 1...100--
31534 cycles, INSAtoB_XZZG- insert 1...100--
35843 cycles, InsertAtoB_DQUA- insert 1...100--
40628 cycles, InsertAtoB_SSEG- insert 1...100--
48681 cycles, INSAtoB_X-    insert 1...100--
123564 cycles, INSAtoB_BB-   insert 1...100--
131552 cycles, INSAtoB_BA-   insert 1...100--
********** STOP ShowAllMeans **********

-----------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
-----------------------------------------------------
***** Means-MINIMUM values-Inserting at position 100 - 1...100--

25295 cycles, INSAtoB_SSEG- insert 1...100--
29585 cycles, INSAtoB_SSEF- insert 1...100--
29889 cycles, INSAtoB_SSEE- insert 1...100--
31094 cycles, INSAtoB_XZZE- insert 1...100--
31261 cycles, INSAtoB_XZZG- insert 1...100--
31299 cycles, INSAtoB_XZZF- insert 1...100--
35649 cycles, InsertAtoB_DQUA- insert 1...100--
39612 cycles, InsertAtoB_SSEG- insert 1...100--
47736 cycles, INSAtoB_X-    insert 1...100--
122062 cycles, INSAtoB_BB-   insert 1...100--
129815 cycles, INSAtoB_BA-   insert 1...100--
********** STOP ShowAllMinimum **********