Hi all,
Could you run TestTranspose_cycles1 and post your results here? (Files: all the .asm sources and Timing.inc are included.) Thanks :t
Note: every element of the matrices used here is a REAL4 number.
EDIT: See all results in my reply #7 below.
EDIT: tests using SSE
Here are my results:
434 cycles, MatrixTransposeZ, transposeMatX
399 cycles, MatrixTransposeDD, transposeMatZ
411 cycles, MatrixTransposeDF, transposeMatZ
417 cycles, MatrixTransposeX, transposeMatX
476 cycles, MatrixTransposeZZ, transposeMatX
470 cycles, MatrixTransposeDDD, transposeMatZ
540 cycles, MatrixTransposeDFF, transposeMatZ
414 cycles, MatrixTransposeXX, transposeMatX
1311 cycles, MatrixTransposeZ, transposeMatXX
1170 cycles, MatrixTransposeDD, transposeMatZZ
1156 cycles, MatrixTransposeDF, transposeMatZZ
1326 cycles, MatrixTransposeX, transposeMatXX
1620 cycles, MatrixTransposeZZ, transposeMatXX
1548 cycles, MatrixTransposeDDD, transposeMatZZ
1610 cycles, MatrixTransposeDFF, transposeMatZZ
1322 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
399 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
411 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
414 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
417 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
434 cycles, MatrixTransposeZ, testMatX 4x4- last to first
470 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
476 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
540 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
1156 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
1170 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
1311 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
1322 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
1326 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
1548 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1610 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
1620 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
Jochen :t
For now (2 samples: P4, i5), MatrixTransposeDD seems to be the best.
Intel Core i5:
145 cycles, MatrixTransposeZ, transposeMatX
109 cycles, MatrixTransposeDD, transposeMatZ
115 cycles, MatrixTransposeDF, transposeMatZ
192 cycles, MatrixTransposeX, transposeMatX
264 cycles, MatrixTransposeZZ, transposeMatX
140 cycles, MatrixTransposeDDD, transposeMatZ
118 cycles, MatrixTransposeDFF, transposeMatZ
200 cycles, MatrixTransposeXX, transposeMatX
332 cycles, MatrixTransposeZ, transposeMatXX
316 cycles, MatrixTransposeDD, transposeMatZZ
329 cycles, MatrixTransposeDF, transposeMatZZ
300 cycles, MatrixTransposeX, transposeMatXX
378 cycles, MatrixTransposeZZ, transposeMatXX
402 cycles, MatrixTransposeDDD, transposeMatZZ
387 cycles, MatrixTransposeDFF, transposeMatZZ
310 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
109 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
115 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
118 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
140 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
145 cycles, MatrixTransposeZ, testMatX 4x4- last to first
192 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
200 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
264 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
300 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
310 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
316 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
329 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
332 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
378 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
387 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
402 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
********** END **********
86 cycles, MatrixTransposeZ, transposeMatX
87 cycles, MatrixTransposeDD, transposeMatZ
91 cycles, MatrixTransposeDF, transposeMatZ
88 cycles, MatrixTransposeX, transposeMatX
92 cycles, MatrixTransposeZZ, transposeMatX
90 cycles, MatrixTransposeDDD, transposeMatZ
99 cycles, MatrixTransposeDFF, transposeMatZ
97 cycles, MatrixTransposeXX, transposeMatX
402 cycles, MatrixTransposeZ, transposeMatXX
315 cycles, MatrixTransposeDD, transposeMatZZ
318 cycles, MatrixTransposeDF, transposeMatZZ
324 cycles, MatrixTransposeX, transposeMatXX
664 cycles, MatrixTransposeZZ, transposeMatXX
272 cycles, MatrixTransposeDDD, transposeMatZZ
323 cycles, MatrixTransposeDFF, transposeMatZZ
369 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4) 1.6 Ghz
86 cycles, MatrixTransposeZ, testMatX 4x4- last to first
87 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
88 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
90 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
91 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
92 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
97 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
99 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
272 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
315 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
318 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
323 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
324 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
369 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
402 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
664 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
:biggrin:
223 cycles, MatrixTransposeZ, transposeMatX
217 cycles, MatrixTransposeDD, transposeMatZ
219 cycles, MatrixTransposeDF, transposeMatZ
220 cycles, MatrixTransposeX, transposeMatX
290 cycles, MatrixTransposeZZ, transposeMatX
289 cycles, MatrixTransposeDDD, transposeMatZ
304 cycles, MatrixTransposeDFF, transposeMatZ
230 cycles, MatrixTransposeXX, transposeMatX
743 cycles, MatrixTransposeZ, transposeMatXX
746 cycles, MatrixTransposeDD, transposeMatZZ
743 cycles, MatrixTransposeDF, transposeMatZZ
780 cycles, MatrixTransposeX, transposeMatXX
1021 cycles, MatrixTransposeZZ, transposeMatXX
1016 cycles, MatrixTransposeDDD, transposeMatZZ
1055 cycles, MatrixTransposeDFF, transposeMatZZ
826 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz (SSE4)
217 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
219 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
220 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
223 cycles, MatrixTransposeZ, testMatX 4x4- last to first
230 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
289 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
290 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
304 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
743 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
743 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
746 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
780 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
826 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
1016 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1021 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
1055 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
i5 here too. Windows 8.1. :icon14:
100 cycles, MatrixTransposeZ, transposeMatX
101 cycles, MatrixTransposeDD, transposeMatZ
101 cycles, MatrixTransposeDF, transposeMatZ
79 cycles, MatrixTransposeX, transposeMatX
117 cycles, MatrixTransposeZZ, transposeMatX
98 cycles, MatrixTransposeDDD, transposeMatZ
123 cycles, MatrixTransposeDFF, transposeMatZ
102 cycles, MatrixTransposeXX, transposeMatX
368 cycles, MatrixTransposeZ, transposeMatXX
337 cycles, MatrixTransposeDD, transposeMatZZ
333 cycles, MatrixTransposeDF, transposeMatZZ
258 cycles, MatrixTransposeX, transposeMatXX
413 cycles, MatrixTransposeZZ, transposeMatXX
317 cycles, MatrixTransposeDDD, transposeMatZZ
375 cycles, MatrixTransposeDFF, transposeMatZZ
266 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
79 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
98 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
100 cycles, MatrixTransposeZ, testMatX 4x4- last to first
101 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
101 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
102 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
117 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
123 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
258 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
266 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
317 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
333 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
337 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
368 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
375 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
413 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
Intel i7-4930K, Win 8.1
203 cycles, MatrixTransposeZ, transposeMatX
125 cycles, MatrixTransposeDD, transposeMatZ
123 cycles, MatrixTransposeDF, transposeMatZ
101 cycles, MatrixTransposeX, transposeMatX
133 cycles, MatrixTransposeZZ, transposeMatX
117 cycles, MatrixTransposeDDD, transposeMatZ
139 cycles, MatrixTransposeDFF, transposeMatZ
103 cycles, MatrixTransposeXX, transposeMatX
384 cycles, MatrixTransposeZ, transposeMatXX
398 cycles, MatrixTransposeDD, transposeMatZZ
380 cycles, MatrixTransposeDF, transposeMatZZ
318 cycles, MatrixTransposeX, transposeMatXX
430 cycles, MatrixTransposeZZ, transposeMatXX
380 cycles, MatrixTransposeDDD, transposeMatZZ
465 cycles, MatrixTransposeDFF, transposeMatZZ
330 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
101 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
103 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
117 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
123 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
125 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
133 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
139 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
203 cycles, MatrixTransposeZ, testMatX 4x4- last to first
318 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
330 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
380 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
380 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
384 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
398 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
430 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
465 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
248 cycles, MatrixTransposeZ, transposeMatX
203 cycles, MatrixTransposeDD, transposeMatZ
225 cycles, MatrixTransposeDF, transposeMatZ
192 cycles, MatrixTransposeX, transposeMatX
255 cycles, MatrixTransposeZZ, transposeMatX
226 cycles, MatrixTransposeDDD, transposeMatZ
256 cycles, MatrixTransposeDFF, transposeMatZ
190 cycles, MatrixTransposeXX, transposeMatX
752 cycles, MatrixTransposeZ, transposeMatXX
668 cycles, MatrixTransposeDD, transposeMatZZ
727 cycles, MatrixTransposeDF, transposeMatZZ
619 cycles, MatrixTransposeX, transposeMatXX
830 cycles, MatrixTransposeZZ, transposeMatXX
718 cycles, MatrixTransposeDDD, transposeMatZZ
831 cycles, MatrixTransposeDFF, transposeMatZZ
621 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
190 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
192 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
203 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
225 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
226 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
248 cycles, MatrixTransposeZ, testMatX 4x4- last to first
255 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
256 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
619 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
621 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
668 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
718 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
727 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
752 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
830 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
831 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Hi all,
Thank you for posting your results (they are collected below). :t
For now, I want to point out how the matrices used in this test are defined. Immediately before each matrix name there is a BYTE (X matrices) or a DWORD (Z matrices) holding the number of lines, and before that, another one holding the number of columns.
The procedures ...X, ...Z, ...DD and ...DF use EDI (push edi / pop edi).
The procedures ...XX, ...ZZ, ...DDD and ...DFF don't use EDI, but are otherwise the same as the ones above.
The procedures ...X and ...XX take the address of a sequence of DWORDs (treated as REAL4 values) plus the number of lines and the number of columns. I am using them here only for test purposes. :icon14:
DEFINITIONS OF MATRICES
--------------------------------
MAXLINES_X equ 4 ; number of lines
MAXCOLUMNS_X equ 4 ; number of columns
MAXDWORDS_X equ MAXLINES_X * MAXCOLUMNS_X
;-------------------------------------------------------
; testMatX
;=======================================================
db MAXCOLUMNS_X ; <<< is BYTE
db MAXLINES_X ; <<< is BYTE
testMatX dd 11.0,12.0,13.0,14.0 ; line 1
dd 21.0,22.0,23.0,24.0 ; +16 line 2
dd 31.0,32.0,33.0,34.0 ; +32 line 3
dd 41.0,42.0,43.0,44.0 ; +48 line 4
;-------------------------------------------------------
; testMatZ
;=======================================================
ALIGN 16
dd ?
dd ?
dd MAXCOLUMNS_X ; <<< is DWORD
dd MAXLINES_X ; <<< is DWORD
testMatZ dd 11.1, 12.2, 13.3, 14.4 ; line 1
dd 21.1, 22.2, 23.3, 24.4 ; line 2
dd 31.1, 32.2, 33.3, 34.4 ; line 3
dd 41.1, 42.2, 43.3, 44.4 ; line 4
;=======================================================
MAXLINES_Z equ 7 ; number of lines
MAXCOLUMNS_Z equ 8 ; number of columns
MAXDWORDS_Z equ MAXLINES_Z * MAXCOLUMNS_Z
;-------------------------------------------------------
; testMatXX
;=======================================================
db MAXCOLUMNS_Z ; <<< is BYTE
db MAXLINES_Z ; <<< is BYTE
testMatXX dd 11.1, 12.2, 13.3, 14.4, 15.5, 16.6, 17.7, 18.8 ; line 1
dd 21.1, 22.2, 23.3, 24.4, 25.5, 26.6, 27.7, 28.8 ; line 2
dd 31.1, 32.2, 33.3, 34.4, 35.5, 36.6, 37.7, 38.8 ; line 3
dd 41.1, 42.2, 43.3, 44.4, 45.5, 46.6, 47.7, 48.8 ; line 4
dd 51.1, 52.2, 53.3, 54.4, 55.5, 56.6, 57.7, 58.8 ; line 5
dd 61.1, 62.2, 63.3, 64.4, 65.5, 66.6, 67.7, 68.8 ; line 6
dd 71.1, 72.2, 73.3, 74.4, 75.5, 76.6, 77.7, 78.8 ; line 7
;-------------------------------------------------------
; testMatZZ
;=======================================================
ALIGN 16
dd ?
dd ?
dd MAXCOLUMNS_Z ; <<< is DWORD
dd MAXLINES_Z ; <<< is DWORD
testMatZZ dd 11.1, 12.2, 13.3, 14.4, 15.5, 16.6, 17.7, 18.8 ; line 1
dd 21.1, 22.2, 23.3, 24.4, 25.5, 26.6, 27.7, 28.8 ; line 2
dd 31.1, 32.2, 33.3, 34.4, 35.5, 36.6, 37.7, 38.8 ; line 3
dd 41.1, 42.2, 43.3, 44.4, 45.5, 46.6, 47.7, 48.8 ; line 4
dd 51.1, 52.2, 53.3, 54.4, 55.5, 56.6, 57.7, 58.8 ; line 5
dd 61.1, 62.2, 63.3, 64.4, 65.5, 66.6, 67.7, 68.8 ; line 6
dd 71.1, 72.2, 73.3, 74.4, 75.5, 76.6, 77.7, 78.8 ; line 7
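To make the header layout concrete, here is a minimal sketch of a plain DWORD-copy transpose for a Z-type matrix. It is not one of the tested procedures: the name TransposeZSketch, its parameters pSrc and pDst, and the locals nLines and nCols are hypothetical, and it assumes the destination is a separate buffer declared with the same 8-byte header in front of it (lines at [address-4], columns at [address-8]).

```asm
.686
.model flat, stdcall
option casemap:none

.code
; TransposeZSketch (hypothetical name, for illustration only)
;   pSrc -> first element of a Z-type matrix:
;           number of lines at [pSrc-4], number of columns at [pSrc-8]
;   pDst -> first element of a destination buffer with the same header
;           layout and room for lines*columns DWORDs (assumption)
TransposeZSketch proc uses esi edi ebx pSrc:DWORD, pDst:DWORD
        LOCAL nLines:DWORD, nCols:DWORD

        mov  esi, pSrc
        mov  edi, pDst
        mov  ebx, [esi-4]       ; ebx = source lines
        mov  edx, [esi-8]       ; edx = source columns
        mov  [edi-4], edx       ; destination lines   = source columns
        mov  [edi-8], ebx       ; destination columns = source lines
        test ebx, ebx
        jz   done               ; nothing to copy
        test edx, edx
        jz   done
        mov  nLines, ebx        ; source lines still to process
        mov  nCols, edx
        shl  ebx, 2             ; ebx = destination line size in bytes
nextLine:
        mov  ecx, nCols         ; elements left in this source line
        mov  eax, edi           ; eax -> dst[0][i] for source line i
nextElement:
        mov  edx, [esi]         ; REAL4 moved as a raw DWORD
        mov  [eax], edx
        add  esi, 4             ; next element of the source line
        add  eax, ebx           ; one line down in the destination
        dec  ecx
        jnz  nextElement
        add  edi, 4             ; next destination column
        dec  nLines
        jnz  nextLine
done:
        ret
TransposeZSketch endp
end
```

A call would look like `invoke TransposeZSketch, offset testMatZZ, offset transposeMatZZ`, assuming transposeMatZZ is declared like testMatZZ with at least 56 DWORDs. For an X-type matrix the two header reads would become `movzx ebx, byte ptr [esi-1]` and `movzx edx, byte ptr [esi-2]`.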
All results:
RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
399 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
411 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
414 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
417 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
434 cycles, MatrixTransposeZ, testMatX 4x4- last to first
470 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
476 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
540 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
1156 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
1170 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
1311 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
1322 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
1326 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
1548 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1610 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
1620 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
FORTRANS:
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
221 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
222 cycles, MatrixTransposeZ, testMatX 4x4- last to first
249 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
249 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
251 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
282 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
286 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
337 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
722 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
725 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
828 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
868 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
872 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
944 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
952 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
1050 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Jochen:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
109 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
115 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
118 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
140 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
145 cycles, MatrixTransposeZ, testMatX 4x4- last to first
192 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
200 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
264 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
300 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
310 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
316 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
329 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
332 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
378 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
387 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
402 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
********** END **********
Felipe:
Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz (SSE4)
217 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
219 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
220 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
223 cycles, MatrixTransposeZ, testMatX 4x4- last to first
230 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
289 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
290 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
304 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
743 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
743 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
746 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
780 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
826 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
1016 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1021 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
1055 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
aw27:
Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz (SSE4)
68 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
73 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
84 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
87 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
87 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
94 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
97 cycles, MatrixTransposeZ, testMatX 4x4- last to first
103 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
225 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
231 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
243 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
249 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
255 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
284 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
286 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
289 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
LiaoMi:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
79 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
98 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
100 cycles, MatrixTransposeZ, testMatX 4x4- last to first
101 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
101 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
102 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
117 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
123 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
258 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
266 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
317 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
333 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
337 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
368 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
375 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
413 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
Siekmanski:
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
101 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
103 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
117 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
123 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
125 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
133 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
139 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
203 cycles, MatrixTransposeZ, testMatX 4x4- last to first
318 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
330 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
380 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
380 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
384 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
398 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
430 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
465 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
zedd151:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4) 1.6 Ghz
86 cycles, MatrixTransposeZ, testMatX 4x4- last to first
87 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
88 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
90 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
91 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
92 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
97 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
99 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
272 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
315 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
318 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
323 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
324 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
369 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
402 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
664 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
HSE:
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
190 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
192 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
203 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
225 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
226 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
248 cycles, MatrixTransposeZ, testMatX 4x4- last to first
255 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
256 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
619 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
621 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
668 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
718 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
727 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
752 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
830 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
831 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Some work was done on transposing matrices with SSE instructions here: http://masm32.com/board/index.php?topic=6140.0
A quick test shows that it is a few times faster, and the bigger the matrix, the faster it is. No wonder, of course.
Hi,
Windows 2000; "Is not a valid Win32 application." In the past,
some of these would work after reassembling.
Windows XP:
222 cycles, MatrixTransposeZ, transposeMatX
221 cycles, MatrixTransposeDD, transposeMatZ
249 cycles, MatrixTransposeDF, transposeMatZ
251 cycles, MatrixTransposeX, transposeMatX
282 cycles, MatrixTransposeZZ, transposeMatX
286 cycles, MatrixTransposeDDD, transposeMatZ
337 cycles, MatrixTransposeDFF, transposeMatZ
249 cycles, MatrixTransposeXX, transposeMatX
725 cycles, MatrixTransposeZ, transposeMatXX
722 cycles, MatrixTransposeDD, transposeMatZZ
828 cycles, MatrixTransposeDF, transposeMatZZ
872 cycles, MatrixTransposeX, transposeMatXX
952 cycles, MatrixTransposeZZ, transposeMatXX
944 cycles, MatrixTransposeDDD, transposeMatZZ
1050 cycles, MatrixTransposeDFF, transposeMatZZ
868 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
221 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
222 cycles, MatrixTransposeZ, testMatX 4x4- last to first
249 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
249 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
251 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
282 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
286 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
337 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
722 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
725 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
828 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
868 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
872 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
944 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
952 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
1050 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Cheers,
Steve N.
Hi,
Quote from: FORTRANS on May 07, 2018, 11:09:05 PM
Windows 2000; "Is not a valid Win32 application." In the past,
some of these would work after reassembling.
Well, tried that and it worked...
Win2k:
274 cycles, MatrixTransposeZ, transposeMatX
252 cycles, MatrixTransposeDD, transposeMatZ
258 cycles, MatrixTransposeDF, transposeMatZ
273 cycles, MatrixTransposeX, transposeMatX
288 cycles, MatrixTransposeZZ, transposeMatX
280 cycles, MatrixTransposeDDD, transposeMatZ
301 cycles, MatrixTransposeDFF, transposeMatZ
272 cycles, MatrixTransposeXX, transposeMatX
986 cycles, MatrixTransposeZ, transposeMatXX
963 cycles, MatrixTransposeDD, transposeMatZZ
943 cycles, MatrixTransposeDF, transposeMatZZ
1005 cycles, MatrixTransposeX, transposeMatXX
1048 cycles, MatrixTransposeZZ, transposeMatXX
1032 cycles, MatrixTransposeDDD, transposeMatZZ
1165 cycles, MatrixTransposeDFF, transposeMatZZ
999 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
(SSE1)
252 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
258 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
272 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
273 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
274 cycles, MatrixTransposeZ, testMatX 4x4- last to first
280 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
288 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
301 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
943 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
963 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
986 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
999 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
1005 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
1032 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1048 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
1165 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Win98:
904 cycles, MatrixTransposeZ, transposeMatX
640 cycles, MatrixTransposeDD, transposeMatZ
639 cycles, MatrixTransposeDF, transposeMatZ
914 cycles, MatrixTransposeX, transposeMatX
1014 cycles, MatrixTransposeZZ, transposeMatX
687 cycles, MatrixTransposeDDD, transposeMatZ
692 cycles, MatrixTransposeDFF, transposeMatZ
869 cycles, MatrixTransposeXX, transposeMatX
3474 cycles, MatrixTransposeZ, transposeMatXX
2164 cycles, MatrixTransposeDD, transposeMatZZ
2181 cycles, MatrixTransposeDF, transposeMatZZ
3278 cycles, MatrixTransposeX, transposeMatXX
3595 cycles, MatrixTransposeZZ, transposeMatXX
2322 cycles, MatrixTransposeDDD, transposeMatZZ
2361 cycles, MatrixTransposeDFF, transposeMatZZ
3137 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
639 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
640 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
687 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
692 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
869 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
904 cycles, MatrixTransposeZ, testMatX 4x4- last to first
914 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
1014 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
2164 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
2181 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
2322 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
2361 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
3137 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
3278 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
3474 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
3595 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
Regards,
Steve N.
Quote from: aw27 on May 07, 2018, 08:03:04 PM
There was some work done on matrix transposing using SSE instructions: http://masm32.com/board/index.php?topic=6140.0
A quick test shows that it is a few times faster, and the bigger the matrix, the faster it is. No wonder, of course.
Hi
Yes, in general, using SSE instructions and aligned data the code is faster.
This is the case.
:icon14:
EDIT: These are my results:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
180 cycles, MatrixTransposeMO, testMatZ 4x4
182 cycles, MatrixTransposeAW, testMatZ 4x4, Lin, Col
398 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
404 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
405 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
413 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
426 cycles, MatrixTransposeZ, testMatX 4x4- last to first
458 cycles, MatrixTransposeMO, testMatZZ 7x8
462 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
468 cycles, MatrixTransposeAW, testMatZZ 7x8, Lin, Col
487 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
528 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
1153 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
1171 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
1280 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
1294 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
1408 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
1531 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
1563 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1620 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Note: MatrixTransposeAW is your version.
MatrixTransposeMO is my modified version of your code, where the line and column counts
are stored behind the address in both matrices (so they are not passed as arguments).
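For readers who have not seen this kind of layout, here is a minimal C sketch of one possible reading of "behind the address": the two counts sit immediately before the first element, so a routine needs only the data pointer. This is an assumption for illustration, not the actual MatrixTransposeMO code; the names MatHeader, mat_alloc, mat_lines, mat_cols and mat_free are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: the line and column counts are stored right
   before the REAL4 data, so callers pass only the data pointer. */
typedef struct {
    int   lines;
    int   cols;
    float data[];          /* flexible array member: the matrix itself */
} MatHeader;

/* Step back from the data pointer to the header that precedes it. */
static const MatHeader *mat_header(const float *m) {
    return (const MatHeader *)((const char *)m - offsetof(MatHeader, data));
}

/* Allocate header + data and return the address of the first element. */
float *mat_alloc(int lines, int cols) {
    assert(lines > 0 && cols > 0);
    MatHeader *h = malloc(sizeof *h + (size_t)lines * (size_t)cols * sizeof(float));
    if (h == NULL) return NULL;
    h->lines = lines;
    h->cols  = cols;
    return h->data;
}

int mat_lines(const float *m) { return mat_header(m)->lines; }
int mat_cols(const float *m)  { return mat_header(m)->cols; }

/* Free the whole allocation, starting at the header. */
void mat_free(float *m) {
    if (m != NULL) free((char *)m - offsetof(MatHeader, data));
}
```

With such a layout a transpose routine could take just two pointers (source and destination), which would match the time-table entries that list MatrixTransposeMO without "Lin, Col".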
Hello Rui,
There are significant differences between your code results and my code results:
These are your code results:
97 cycles, MatrixTransposeZ, transposeMatX
103 cycles, MatrixTransposeDD, transposeMatZ
87 cycles, MatrixTransposeDF, transposeMatZ
73 cycles, MatrixTransposeX, transposeMatX
87 cycles, MatrixTransposeZZ, transposeMatX
94 cycles, MatrixTransposeDDD, transposeMatZ
84 cycles, MatrixTransposeDFF, transposeMatZ
68 cycles, MatrixTransposeXX, transposeMatX
249 cycles, MatrixTransposeZ, transposeMatXX
255 cycles, MatrixTransposeDD, transposeMatZZ
243 cycles, MatrixTransposeDF, transposeMatZZ
225 cycles, MatrixTransposeX, transposeMatXX
289 cycles, MatrixTransposeZZ, transposeMatXX
284 cycles, MatrixTransposeDDD, transposeMatZZ
286 cycles, MatrixTransposeDFF, transposeMatZZ
231 cycles, MatrixTransposeXX, transposeMatXX
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz (SSE4)
68 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
73 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
84 cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
87 cycles, MatrixTransposeZZ, testMatX 4x4- last to first
87 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
94 cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
97 cycles, MatrixTransposeZ, testMatX 4x4- last to first
103 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
225 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
231 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
243 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
249 cycles, MatrixTransposeZ, testMatXX 7x8- last to first
255 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
284 cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
286 cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
289 cycles, MatrixTransposeZZ, testMatXX 7x8- last to first
********** END **********
These are my code results (I reused your test setup):
32 cycles, MatrixTransposeAW, transposeMatX
81 cycles, MatrixTransposeAW, transposeMatXX
It takes 1/3 the number of cycles.
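As a rough check of that ratio, using only the cycle counts quoted in this post (the helper name cycle_ratio is made up for illustration):

```c
#include <assert.h>

/* Ratio of two cycle counts: below 1.0 means the first one is faster. */
double cycle_ratio(int cycles_a, int cycles_b) {
    assert(cycles_b > 0);
    return (double)cycles_a / (double)cycles_b;
}

/* Figures from the i5-7300HQ runs above:
     cycle_ratio(32, 97)   is about 0.33  (AW vs MatrixTransposeZ,  transposeMatX)
     cycle_ratio(81, 249)  is about 0.33  (AW vs MatrixTransposeZ,  transposeMatXX)
     cycle_ratio(32, 68)   is about 0.47  (AW vs MatrixTransposeXX, transposeMatX)
     cycle_ratio(81, 225)  is 0.36        (AW vs MatrixTransposeX,  transposeMatXX) */
```

So "1/3" holds against MatrixTransposeZ; against the fastest entries in the same table the ratio is closer to 0.36 to 0.47.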
Hello aw27, I think that your results are reasonable, particularly on your i5 (see all results in reply #7). Using SSE instructions and aligned data, the code is faster in general. If you cannot use SSE instructions ... But using your trans32.asm in the transposeTest folder, I got this on my Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3):
245 cycles, MatrixTransposeAW, transposeMatX ; 3* = 735
661 cycles, MatrixTransposeAW, transposeMatXX ; 3* =1983
Some previous results:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
398 cycles, MatrixTransposeXX, testMatX 4x4, Lin, Col - last to first
404 cycles, MatrixTransposeDD, testMatZ 4x4- last to first
405 cycles, MatrixTransposeX, testMatX 4x4, Lin, Col - last to first
413 cycles, MatrixTransposeDF, testMatZ 4x4- first to last
1153 cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
1171 cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
1280 cycles, MatrixTransposeXX, testMatXX 7x8, Lin, Col - last to first
1294 cycles, MatrixTransposeX, testMatXX 7x8, Lin, Col - last to first
Hi all, I wrote some procedures using SSE instructions (and Siekmanski's proc to transpose 4x4 :t ). This work is based on simple concepts and math for kids :biggrin: .
You may test them on your CPU. If you want, you may show your results. See you :icon14:
Some results for SSE12 using AW as a reference:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
68 cycles, MatrixTransposeSSE12, testMatX 4x4
187 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col ; +119 cycles
99 cycles, MatrixTransposeSSE12, testMatS 4x8
245 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col ; +146 cycles
119 cycles, MatrixTransposeSSE12, testMatR 8x4
257 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col ; +138 cycles
192 cycles, MatrixTransposeSSE12, testMatV 4x2
219 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col ; +27 cycles
194 cycles, MatrixTransposeSSE12, testMatY 2x4
204 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col ; +10 cycles
255 cycles, MatrixTransposeSSE12, testMatW 8x7
477 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col ; +222 cycles
376 cycles, MatrixTransposeSSE12, testMatZ 7x8
611 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col ; +235 cycles
478 cycles, MatrixTransposeSSE12, testMatT 7x7
552 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col ; +74 cycles
530 cycles, MatrixTransposeSSE12, testMatQ 12x12
794 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col ; +264 cycles
So you copied my work, but you mentioned it was based on Siekmanski's. I also used the Siekmanski SSE algo, although my original one was almost as fast. So where does the difference come from? Where is your source code?
Your results are obviously fabricated, even tested on a computer nobody uses anymore.
30 cycles, MatrixTransposeAW, transposeMatX
34 cycles, MatrixTransposeMO, transposeMatX
58 cycles, MatrixTransposeAW, transposeMatY
60 cycles, MatrixTransposeMO, transposeMatY
59 cycles, MatrixTransposeAW, transposeMatV
53 cycles, MatrixTransposeMO, transposeMatV
107 cycles, MatrixTransposeAW, transposeMatZ
105 cycles, MatrixTransposeMO, transposeMatZ
124 cycles, MatrixTransposeAW, transposeMatW
116 cycles, MatrixTransposeMO, transposeMatW
132 cycles, MatrixTransposeAW, transposeMatQ
135 cycles, MatrixTransposeMO, transposeMatQ
50 cycles, MatrixTransposeAW, transposeMatR
45 cycles, MatrixTransposeMO, transposeMatR
44 cycles, MatrixTransposeAW, transposeMatS
37 cycles, MatrixTransposeMO, transposeMatS
156 cycles, MatrixTransposeAW, transposeMatT
151 cycles, MatrixTransposeMO, transposeMatT
22 cycles, MatrixTransposeSSE14, transposeMatX
37 cycles, MatrixTransposeSSE14, transposeMatY
41 cycles, MatrixTransposeSSE14, transposeMatV
56 cycles, MatrixTransposeSSE14, transposeMatZ
58 cycles, MatrixTransposeSSE14, transposeMatW
125 cycles, MatrixTransposeSSE14, transposeMatQ
36 cycles, MatrixTransposeSSE14, transposeMatR
33 cycles, MatrixTransposeSSE14, transposeMatS
66 cycles, MatrixTransposeSSE14, transposeMatT
23 cycles, MatrixTransposeSSE15, transposeMatX
41 cycles, MatrixTransposeSSE15, transposeMatY
43 cycles, MatrixTransposeSSE15, transposeMatV
56 cycles, MatrixTransposeSSE15, transposeMatZ
61 cycles, MatrixTransposeSSE15, transposeMatW
123 cycles, MatrixTransposeSSE15, transposeMatQ
37 cycles, MatrixTransposeSSE15, transposeMatR
33 cycles, MatrixTransposeSSE15, transposeMatS
61 cycles, MatrixTransposeSSE15, transposeMatT
25 cycles, MatrixTransposeSSE16, transposeMatX
35 cycles, MatrixTransposeSSE16, transposeMatY
42 cycles, MatrixTransposeSSE16, transposeMatV
64 cycles, MatrixTransposeSSE16, transposeMatZ
68 cycles, MatrixTransposeSSE16, transposeMatW
139 cycles, MatrixTransposeSSE16, transposeMatQ
41 cycles, MatrixTransposeSSE16, transposeMatR
40 cycles, MatrixTransposeSSE16, transposeMatS
72 cycles, MatrixTransposeSSE16, transposeMatT
25 cycles, MatrixTransposeSSE17, transposeMatX
37 cycles, MatrixTransposeSSE17, transposeMatY
40 cycles, MatrixTransposeSSE17, transposeMatV
65 cycles, MatrixTransposeSSE17, transposeMatZ
67 cycles, MatrixTransposeSSE17, transposeMatW
153 cycles, MatrixTransposeSSE17, transposeMatQ
41 cycles, MatrixTransposeSSE17, transposeMatR
38 cycles, MatrixTransposeSSE17, transposeMatS
70 cycles, MatrixTransposeSSE17, transposeMatT
*** STOP. Press any key to show the Time Table ***
***** Time table *****
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)
22 cycles, MatrixTransposeSSE14, testMatX 4x4, Lin, Col
23 cycles, MatrixTransposeSSE15, testMatX 4x4, Lin, Col
25 cycles, MatrixTransposeSSE17, testMatX 4x4, Lin, Col
25 cycles, MatrixTransposeSSE16, testMatX 4x4, Lin, Col
30 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col
33 cycles, MatrixTransposeSSE15, testMatS 4x8, Lin, Col
33 cycles, MatrixTransposeSSE14, testMatS 4x8, Lin, Col
34 cycles, MatrixTransposeMO, testMatX 4x4
35 cycles, MatrixTransposeSSE16, testMatY 2x4, Lin, Col
36 cycles, MatrixTransposeSSE14, testMatR 8x4, Lin, Col
37 cycles, MatrixTransposeSSE15, testMatR 8x4, Lin, Col
37 cycles, MatrixTransposeMO, testMatS 4x8
37 cycles, MatrixTransposeSSE17, testMatY 2x4, Lin, Col
37 cycles, MatrixTransposeSSE14, testMatY 2x4, Lin, Col
38 cycles, MatrixTransposeSSE17, testMatS 4x8, Lin, Col
40 cycles, MatrixTransposeSSE17, testMatV 4x2, Lin, Col
40 cycles, MatrixTransposeSSE16, testMatS 4x8, Lin, Col
41 cycles, MatrixTransposeSSE16, testMatR 8x4, Lin, Col
41 cycles, MatrixTransposeSSE15, testMatY 2x4, Lin, Col
41 cycles, MatrixTransposeSSE14, testMatV 4x2, Lin, Col
41 cycles, MatrixTransposeSSE17, testMatR 8x4, Lin, Col
42 cycles, MatrixTransposeSSE16, testMatV 4x2, Lin, Col
43 cycles, MatrixTransposeSSE15, testMatV 4x2, Lin, Col
44 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col
45 cycles, MatrixTransposeMO, testMatR 8x4
50 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col
53 cycles, MatrixTransposeMO, testMatV 4x2
56 cycles, MatrixTransposeSSE15, testMatZ 7x8, Lin, Col
56 cycles, MatrixTransposeSSE14, testMatZ 7x8, Lin, Col
58 cycles, MatrixTransposeSSE14, testMatW 8x7, Lin, Col
58 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col
59 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col
60 cycles, MatrixTransposeMO, testMatY 2x4
61 cycles, MatrixTransposeSSE15, testMatW 8x7, Lin, Col
61 cycles, MatrixTransposeSSE15, testMatT 7x7, Lin, Col
64 cycles, MatrixTransposeSSE16, testMatZ 7x8, Lin, Col
65 cycles, MatrixTransposeSSE17, testMatZ 7x8, Lin, Col
66 cycles, MatrixTransposeSSE14, testMatT 7x7, Lin, Col
67 cycles, MatrixTransposeSSE17, testMatW 8x7, Lin, Col
68 cycles, MatrixTransposeSSE16, testMatW 8x7, Lin, Col
70 cycles, MatrixTransposeSSE17, testMatT 7x7, Lin, Col
72 cycles, MatrixTransposeSSE16, testMatT 7x7, Lin, Col
105 cycles, MatrixTransposeMO, testMatZ 7x8
107 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col
116 cycles, MatrixTransposeMO, testMatW 8x7
123 cycles, MatrixTransposeSSE15, testMatQ 12x12, Lin, Col
124 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col
125 cycles, MatrixTransposeSSE14, testMatQ 12x12, Lin, Col
132 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col
135 cycles, MatrixTransposeMO, testMatQ 12x12
139 cycles, MatrixTransposeSSE16, testMatQ 12x12, Lin, Col
151 cycles, MatrixTransposeMO, testMatT 7x7
153 cycles, MatrixTransposeSSE17, testMatQ 12x12, Lin, Col
156 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col
********** END **********
Quote from: aw27 on May 15, 2018, 09:33:54 AM
So you copied my work, but you mentioned it was based on Siekmanski's. I also used the Siekmanski SSE algo, although my original one was almost as fast. So where does the difference come from? Where is your source code?
Your results are obviously fabricated, even tested on a computer nobody uses anymore.
Hi aw27,
Sorry, but you are not right; I don't need to use any part of your algorithm. I don't think like you do and I didn't write my algos based on what you did. I know enough assembly and math to do what I set out to do.
The calculator does matrix transpose ...
and everything was written by me. All of it. Understanding what was done is a different matter. So it is correct to say that the SSE2 to SSE17 procedures use that block of
Siekmanski's code (but not as is). But is it the same one you use? You give the answer.
To answer your question "So where does the difference come from?" I would say: think about it again. Consider what you have, what you want to get, and the many different ways to solve that problem. I am still working on that issue, so I have a lot of work to do yet. That's all.
See you
:icon14:
Thank you
zedd151 :t
By the way, are your results fabricated, zedd151? So you get a lot of money ... ;)
Quote from: RuiLoureiro on May 15, 2018, 10:59:39 AM
By the way, are your results fabricated, zedd151? So you get a lot of money ... ;)
What??? I was just testing the performance of my new netbook. :(
nothing fabricated :icon_confused:
Quote from: zedd151 on May 15, 2018, 11:13:35 AM
Quote from: RuiLoureiro on May 15, 2018, 10:59:39 AM
By the way, are your results fabricated, zedd151? So you get a lot of money ... ;)
What??? I was just testing the performance of my new netbook. :(
nothing fabricated :icon_confused:
:t
Quote from: aw27 on May 15, 2018, 09:33:54 AM
Your results are obviously fabricated,...
Hi aw27,
About what you say above, I have a question for you: is that what you usually do?
I never do it, I never did it. Not here, not in any part of the world. I have been here, in this forum, since 2007.
Do you have any doubt?
:icon14:
TestTranspose_cyclesSSE10_13
34 cycles, MatrixTransposeAW, transposeMatX
34 cycles, MatrixTransposeMO, transposeMatX
40 cycles, MatrixTransposeAW, transposeMatY
44 cycles, MatrixTransposeMO, transposeMatY
42 cycles, MatrixTransposeAW, transposeMatV
43 cycles, MatrixTransposeMO, transposeMatV
85 cycles, MatrixTransposeAW, transposeMatZ
84 cycles, MatrixTransposeMO, transposeMatZ
91 cycles, MatrixTransposeAW, transposeMatW
90 cycles, MatrixTransposeMO, transposeMatW
117 cycles, MatrixTransposeAW, transposeMatQ
119 cycles, MatrixTransposeMO, transposeMatQ
48 cycles, MatrixTransposeAW, transposeMatR
47 cycles, MatrixTransposeMO, transposeMatR
55 cycles, MatrixTransposeAW, transposeMatS
42 cycles, MatrixTransposeMO, transposeMatS
133 cycles, MatrixTransposeAW, transposeMatT
116 cycles, MatrixTransposeMO, transposeMatT
25 cycles, MatrixTransposeSSE10, transposeMatX
35 cycles, MatrixTransposeSSE10, transposeMatY
40 cycles, MatrixTransposeSSE10, transposeMatV
57 cycles, MatrixTransposeSSE10, transposeMatZ
59 cycles, MatrixTransposeSSE10, transposeMatW
104 cycles, MatrixTransposeSSE10, transposeMatQ
36 cycles, MatrixTransposeSSE10, transposeMatR
34 cycles, MatrixTransposeSSE10, transposeMatS
58 cycles, MatrixTransposeSSE10, transposeMatT
25 cycles, MatrixTransposeSSE11, transposeMatX
35 cycles, MatrixTransposeSSE11, transposeMatY
41 cycles, MatrixTransposeSSE11, transposeMatV
55 cycles, MatrixTransposeSSE11, transposeMatZ
59 cycles, MatrixTransposeSSE11, transposeMatW
106 cycles, MatrixTransposeSSE11, transposeMatQ
35 cycles, MatrixTransposeSSE11, transposeMatR
34 cycles, MatrixTransposeSSE11, transposeMatS
58 cycles, MatrixTransposeSSE11, transposeMatT
26 cycles, MatrixTransposeSSE12, transposeMatX
47 cycles, MatrixTransposeSSE12, transposeMatY
60 cycles, MatrixTransposeSSE12, transposeMatV
92 cycles, MatrixTransposeSSE12, transposeMatZ
90 cycles, MatrixTransposeSSE12, transposeMatW
126 cycles, MatrixTransposeSSE12, transposeMatQ
41 cycles, MatrixTransposeSSE12, transposeMatR
45 cycles, MatrixTransposeSSE12, transposeMatS
106 cycles, MatrixTransposeSSE12, transposeMatT
38 cycles, MatrixTransposeSSE13, transposeMatX
64 cycles, MatrixTransposeSSE13, transposeMatY
71 cycles, MatrixTransposeSSE13, transposeMatV
74 cycles, MatrixTransposeSSE13, transposeMatZ
82 cycles, MatrixTransposeSSE13, transposeMatW
147 cycles, MatrixTransposeSSE13, transposeMatQ
57 cycles, MatrixTransposeSSE13, transposeMatR
47 cycles, MatrixTransposeSSE13, transposeMatS
91 cycles, MatrixTransposeSSE13, transposeMatT
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
25 cycles, MatrixTransposeSSE11, testMatX 4x4
25 cycles, MatrixTransposeSSE10, testMatX 4x4
26 cycles, MatrixTransposeSSE12, testMatX 4x4
34 cycles, MatrixTransposeSSE10, testMatS 4x8
34 cycles, MatrixTransposeSSE11, testMatS 4x8
34 cycles, MatrixTransposeMO, testMatX 4x4
34 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col
35 cycles, MatrixTransposeSSE11, testMatR 8x4
35 cycles, MatrixTransposeSSE11, testMatY 2x4
35 cycles, MatrixTransposeSSE10, testMatY 2x4
36 cycles, MatrixTransposeSSE10, testMatR 8x4
38 cycles, MatrixTransposeSSE13, testMatX 4x4
40 cycles, MatrixTransposeSSE10, testMatV 4x2
40 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col
41 cycles, MatrixTransposeSSE12, testMatR 8x4
41 cycles, MatrixTransposeSSE11, testMatV 4x2
42 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col
42 cycles, MatrixTransposeMO, testMatS 4x8
43 cycles, MatrixTransposeMO, testMatV 4x2
44 cycles, MatrixTransposeMO, testMatY 2x4
45 cycles, MatrixTransposeSSE12, testMatS 4x8
47 cycles, MatrixTransposeSSE13, testMatS 4x8
47 cycles, MatrixTransposeSSE12, testMatY 2x4
47 cycles, MatrixTransposeMO, testMatR 8x4
48 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col
55 cycles, MatrixTransposeSSE11, testMatZ 7x8
55 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col
57 cycles, MatrixTransposeSSE10, testMatZ 7x8
57 cycles, MatrixTransposeSSE13, testMatR 8x4
58 cycles, MatrixTransposeSSE11, testMatT 7x7
58 cycles, MatrixTransposeSSE10, testMatT 7x7
59 cycles, MatrixTransposeSSE11, testMatW 8x7
59 cycles, MatrixTransposeSSE10, testMatW 8x7
60 cycles, MatrixTransposeSSE12, testMatV 4x2
64 cycles, MatrixTransposeSSE13, testMatY 2x4
71 cycles, MatrixTransposeSSE13, testMatV 4x2
74 cycles, MatrixTransposeSSE13, testMatZ 7x8
82 cycles, MatrixTransposeSSE13, testMatW 8x7
84 cycles, MatrixTransposeMO, testMatZ 7x8
85 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col
90 cycles, MatrixTransposeSSE12, testMatW 8x7
90 cycles, MatrixTransposeMO, testMatW 8x7
91 cycles, MatrixTransposeSSE13, testMatT 7x7
91 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col
92 cycles, MatrixTransposeSSE12, testMatZ 7x8
104 cycles, MatrixTransposeSSE10, testMatQ 12x12
106 cycles, MatrixTransposeSSE12, testMatT 7x7
106 cycles, MatrixTransposeSSE11, testMatQ 12x12
116 cycles, MatrixTransposeMO, testMatT 7x7
117 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col
119 cycles, MatrixTransposeMO, testMatQ 12x12
126 cycles, MatrixTransposeSSE12, testMatQ 12x12
133 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col
147 cycles, MatrixTransposeSSE13, testMatQ 12x12
********** END **********
TestTranspose_cyclesSSE14_17
33 cycles, MatrixTransposeAW, transposeMatX
35 cycles, MatrixTransposeMO, transposeMatX
40 cycles, MatrixTransposeAW, transposeMatY
44 cycles, MatrixTransposeMO, transposeMatY
40 cycles, MatrixTransposeAW, transposeMatV
48 cycles, MatrixTransposeMO, transposeMatV
85 cycles, MatrixTransposeAW, transposeMatZ
84 cycles, MatrixTransposeMO, transposeMatZ
89 cycles, MatrixTransposeAW, transposeMatW
93 cycles, MatrixTransposeMO, transposeMatW
117 cycles, MatrixTransposeAW, transposeMatQ
116 cycles, MatrixTransposeMO, transposeMatQ
45 cycles, MatrixTransposeAW, transposeMatR
48 cycles, MatrixTransposeMO, transposeMatR
45 cycles, MatrixTransposeAW, transposeMatS
41 cycles, MatrixTransposeMO, transposeMatS
115 cycles, MatrixTransposeAW, transposeMatT
115 cycles, MatrixTransposeMO, transposeMatT
20 cycles, MatrixTransposeSSE14, transposeMatX
33 cycles, MatrixTransposeSSE14, transposeMatY
37 cycles, MatrixTransposeSSE14, transposeMatV
51 cycles, MatrixTransposeSSE14, transposeMatZ
54 cycles, MatrixTransposeSSE14, transposeMatW
100 cycles, MatrixTransposeSSE14, transposeMatQ
32 cycles, MatrixTransposeSSE14, transposeMatR
32 cycles, MatrixTransposeSSE14, transposeMatS
55 cycles, MatrixTransposeSSE14, transposeMatT
21 cycles, MatrixTransposeSSE15, transposeMatX
32 cycles, MatrixTransposeSSE15, transposeMatY
37 cycles, MatrixTransposeSSE15, transposeMatV
50 cycles, MatrixTransposeSSE15, transposeMatZ
54 cycles, MatrixTransposeSSE15, transposeMatW
97 cycles, MatrixTransposeSSE15, transposeMatQ
31 cycles, MatrixTransposeSSE15, transposeMatR
31 cycles, MatrixTransposeSSE15, transposeMatS
55 cycles, MatrixTransposeSSE15, transposeMatT
21 cycles, MatrixTransposeSSE16, transposeMatX
30 cycles, MatrixTransposeSSE16, transposeMatY
40 cycles, MatrixTransposeSSE16, transposeMatV
52 cycles, MatrixTransposeSSE16, transposeMatZ
57 cycles, MatrixTransposeSSE16, transposeMatW
106 cycles, MatrixTransposeSSE16, transposeMatQ
33 cycles, MatrixTransposeSSE16, transposeMatR
34 cycles, MatrixTransposeSSE16, transposeMatS
60 cycles, MatrixTransposeSSE16, transposeMatT
21 cycles, MatrixTransposeSSE17, transposeMatX
30 cycles, MatrixTransposeSSE17, transposeMatY
33 cycles, MatrixTransposeSSE17, transposeMatV
52 cycles, MatrixTransposeSSE17, transposeMatZ
57 cycles, MatrixTransposeSSE17, transposeMatW
108 cycles, MatrixTransposeSSE17, transposeMatQ
33 cycles, MatrixTransposeSSE17, transposeMatR
34 cycles, MatrixTransposeSSE17, transposeMatS
57 cycles, MatrixTransposeSSE17, transposeMatT
*** STOP. Press any key to show the Time Table ***
***** Time table *****
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
20 cycles, MatrixTransposeSSE14, testMatX 4x4, Lin, Col
21 cycles, MatrixTransposeSSE16, testMatX 4x4, Lin, Col
21 cycles, MatrixTransposeSSE15, testMatX 4x4, Lin, Col
21 cycles, MatrixTransposeSSE17, testMatX 4x4, Lin, Col
30 cycles, MatrixTransposeSSE16, testMatY 2x4, Lin, Col
30 cycles, MatrixTransposeSSE17, testMatY 2x4, Lin, Col
31 cycles, MatrixTransposeSSE15, testMatS 4x8, Lin, Col
31 cycles, MatrixTransposeSSE15, testMatR 8x4, Lin, Col
32 cycles, MatrixTransposeSSE14, testMatS 4x8, Lin, Col
32 cycles, MatrixTransposeSSE14, testMatR 8x4, Lin, Col
32 cycles, MatrixTransposeSSE15, testMatY 2x4, Lin, Col
33 cycles, MatrixTransposeSSE17, testMatR 8x4, Lin, Col
33 cycles, MatrixTransposeSSE17, testMatV 4x2, Lin, Col
33 cycles, MatrixTransposeSSE14, testMatY 2x4, Lin, Col
33 cycles, MatrixTransposeSSE16, testMatR 8x4, Lin, Col
33 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col
34 cycles, MatrixTransposeSSE17, testMatS 4x8, Lin, Col
34 cycles, MatrixTransposeSSE16, testMatS 4x8, Lin, Col
35 cycles, MatrixTransposeMO, testMatX 4x4
37 cycles, MatrixTransposeSSE15, testMatV 4x2, Lin, Col
37 cycles, MatrixTransposeSSE14, testMatV 4x2, Lin, Col
40 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col
40 cycles, MatrixTransposeSSE16, testMatV 4x2, Lin, Col
40 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col
41 cycles, MatrixTransposeMO, testMatS 4x8
44 cycles, MatrixTransposeMO, testMatY 2x4
45 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col
45 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col
48 cycles, MatrixTransposeMO, testMatV 4x2
48 cycles, MatrixTransposeMO, testMatR 8x4
50 cycles, MatrixTransposeSSE15, testMatZ 7x8, Lin, Col
51 cycles, MatrixTransposeSSE14, testMatZ 7x8, Lin, Col
52 cycles, MatrixTransposeSSE16, testMatZ 7x8, Lin, Col
52 cycles, MatrixTransposeSSE17, testMatZ 7x8, Lin, Col
54 cycles, MatrixTransposeSSE15, testMatW 8x7, Lin, Col
54 cycles, MatrixTransposeSSE14, testMatW 8x7, Lin, Col
55 cycles, MatrixTransposeSSE14, testMatT 7x7, Lin, Col
55 cycles, MatrixTransposeSSE15, testMatT 7x7, Lin, Col
57 cycles, MatrixTransposeSSE17, testMatT 7x7, Lin, Col
57 cycles, MatrixTransposeSSE17, testMatW 8x7, Lin, Col
57 cycles, MatrixTransposeSSE16, testMatW 8x7, Lin, Col
60 cycles, MatrixTransposeSSE16, testMatT 7x7, Lin, Col
84 cycles, MatrixTransposeMO, testMatZ 7x8
85 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col
89 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col
93 cycles, MatrixTransposeMO, testMatW 8x7
97 cycles, MatrixTransposeSSE15, testMatQ 12x12, Lin, Col
100 cycles, MatrixTransposeSSE14, testMatQ 12x12, Lin, Col
106 cycles, MatrixTransposeSSE16, testMatQ 12x12, Lin, Col
108 cycles, MatrixTransposeSSE17, testMatQ 12x12, Lin, Col
115 cycles, MatrixTransposeMO, testMatT 7x7
115 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col
116 cycles, MatrixTransposeMO, testMatQ 12x12
117 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col
********** END **********
Hi LiaoMi,
Thanks for your work :t
:icon14:
Hi all,
Here are all the results so far.
Note: the AW procedure is used as a reference (I don't know of any other). I am using SSE14 and SSE10, but you may build the list for all the others.
Good luck
RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
85 cycles, MatrixTransposeSSE14, testMatX 4x4, Lin, Col
186 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col ; +101 cycles
133 cycles, MatrixTransposeSSE14, testMatS 4x8, Lin, Col
233 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col ; +100 cycles
135 cycles, MatrixTransposeSSE14, testMatR 8x4, Lin, Col
256 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col ; +121 cycles
178 cycles, MatrixTransposeSSE14, testMatY 2x4, Lin, Col
203 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col ; +25 cycles
185 cycles, MatrixTransposeSSE14, testMatV 4x2, Lin, Col
219 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col ; +34 cycles
279 cycles, MatrixTransposeSSE14, testMatW 8x7, Lin, Col
465 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col ; +186 cycles
280 cycles, MatrixTransposeSSE14, testMatT 7x7, Lin, Col
564 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col ; +284 cycles
397 cycles, MatrixTransposeSSE14, testMatZ 7x8, Lin, Col
473 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col ; +76 cycles
560 cycles, MatrixTransposeSSE14, testMatQ 12x12, Lin, Col
814 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col ; +254 cycles
zedd151:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)
22 cycles, MatrixTransposeSSE14, testMatX 4x4, Lin, Col
30 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col ; +8 cycles
33 cycles, MatrixTransposeSSE14, testMatS 4x8, Lin, Col
44 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col ; +11 cycles
36 cycles, MatrixTransposeSSE14, testMatR 8x4, Lin, Col
50 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col ; +14 cycles
37 cycles, MatrixTransposeSSE14, testMatY 2x4, Lin, Col
58 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col ; +21 cycles
41 cycles, MatrixTransposeSSE14, testMatV 4x2, Lin, Col
59 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col ; +18 cycles
56 cycles, MatrixTransposeSSE14, testMatZ 7x8, Lin, Col
107 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col ; +51 cycles
58 cycles, MatrixTransposeSSE14, testMatW 8x7, Lin, Col
124 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col ; +66 cycles
66 cycles, MatrixTransposeSSE14, testMatT 7x7, Lin, Col
156 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col ; +90 cycles
125 cycles, MatrixTransposeSSE14, testMatQ 12x12, Lin, Col
132 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col ; +7 cycles
LiaoMi:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
20 cycles, MatrixTransposeSSE14, testMatX 4x4, Lin, Col
33 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col ; +13 cycles
32 cycles, MatrixTransposeSSE14, testMatS 4x8, Lin, Col
45 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col ; +13 cycles
32 cycles, MatrixTransposeSSE14, testMatR 8x4, Lin, Col
45 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col ; +13 cycles
33 cycles, MatrixTransposeSSE14, testMatY 2x4, Lin, Col
40 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col ; +7 cycles
37 cycles, MatrixTransposeSSE14, testMatV 4x2, Lin, Col
40 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col ; +3 cycles
51 cycles, MatrixTransposeSSE14, testMatZ 7x8, Lin, Col
85 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col ; +34 cycles
54 cycles, MatrixTransposeSSE14, testMatW 8x7, Lin, Col
89 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col ; +35 cycles
55 cycles, MatrixTransposeSSE14, testMatT 7x7, Lin, Col
115 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col ; +60 cycles
100 cycles, MatrixTransposeSSE14, testMatQ 12x12, Lin, Col
117 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col ; +17 cycles
RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
85 cycles, MatrixTransposeSSE10, testMatX 4x4
187 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col ; +102 cycles
136 cycles, MatrixTransposeSSE10, testMatS 4x8
245 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col ; +109 cycles
137 cycles, MatrixTransposeSSE10, testMatR 8x4
257 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col ; +120 cycles
184 cycles, MatrixTransposeSSE10, testMatY 2x4
204 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col ; +20 cycles
196 cycles, MatrixTransposeSSE10, testMatV 4x2
219 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col ; +23 cycles
280 cycles, MatrixTransposeSSE10, testMatW 8x7
477 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col ; +197 cycles
387 cycles, MatrixTransposeSSE10, testMatZ 7x8
611 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col ; +224 cycles
478 cycles, MatrixTransposeSSE10, testMatT 7x7
552 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col ; +74 cycles
554 cycles, MatrixTransposeSSE10, testMatQ 12x12
794 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col ; +240 cycles
LiaoMi:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
25 cycles, MatrixTransposeSSE10, testMatX 4x4
34 cycles, MatrixTransposeAW, testMatX 4x4, Lin, Col ; +9 cycles
34 cycles, MatrixTransposeSSE10, testMatS 4x8
55 cycles, MatrixTransposeAW, testMatS 4x8, Lin, Col ; +21 cycles
35 cycles, MatrixTransposeSSE10, testMatY 2x4
40 cycles, MatrixTransposeAW, testMatY 2x4, Lin, Col ; +5 cycles
36 cycles, MatrixTransposeSSE10, testMatR 8x4
48 cycles, MatrixTransposeAW, testMatR 8x4, Lin, Col ; +12 cycles
40 cycles, MatrixTransposeSSE10, testMatV 4x2
42 cycles, MatrixTransposeAW, testMatV 4x2, Lin, Col ; +2 cycles
57 cycles, MatrixTransposeSSE10, testMatZ 7x8
85 cycles, MatrixTransposeAW, testMatZ 7x8, Lin, Col ; +28 cycles
58 cycles, MatrixTransposeSSE10, testMatT 7x7
133 cycles, MatrixTransposeAW, testMatT 7x7, Lin, Col ; +75 cycles
59 cycles, MatrixTransposeSSE10, testMatW 8x7
91 cycles, MatrixTransposeAW, testMatW 8x7, Lin, Col ; +32 cycles
104 cycles, MatrixTransposeSSE10, testMatQ 12x12
117 cycles, MatrixTransposeAW, testMatQ 12x12, Lin, Col ; +13 cycles
Quote
Hi aw27,
About what you say above, I have a question for you: is that what you usually do?
I never do it, I never did it. Not here, not in any part of the world. I have been here, in this forum, since 2007.
Do you have any doubt?
Of course, I have every doubt about you. You are using my nick and code in tests for which you don't even supply the source code. You look a bit sneaky, so what are you hiding or what are you trying to prove?
Are you trying to prove that you are a smart little guy or do you want to compete with me?
Hi Rui,
The last results look really fast :t
Can't test on this dumbphone, though.
And of course, almost everybody here has full confidence in you, don't worry ;-)
Quote from: aw27 on May 16, 2018, 02:54:47 PM
Quote
Hi aw27,
About what you say above, I have a question for you: is that what you usually do?
I never do it, and I never did it. Not here, nor in any part of the world. I have been here, in this forum, since 2007.
Do you have any doubt?
Of course, I have all doubts about you. You are using my nick and code in tests for which you don't even supply the source code. You look a bit sneaky, so what are you hiding or what are you trying to prove?
Are you trying to prove that you are a smart little guy or do you want to compete with me?
:biggrin: https://www.youtube.com/watch?v=9LZ35Ar3r2k
Hi all,
When we are developing a new algorithm to solve some problem, we need to study the problem first. In the first post I showed algorithms that transpose one dword at a time. Now the problem is to transpose blocks of 4 lines x 4 columns at a time using SSE instructions.
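To make the 4x4-block idea concrete, here is a minimal sketch in C. It is not the MatrixTransposeSSE code (that source is not shown in this thread); it only illustrates the general technique. It assumes REAL4 (float) elements stored row-major, uses the `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>` when SSE is available, and the names `transpose4x4_block` and `transpose_blocked` are hypothetical. Lines and columns left over when a dimension is not a multiple of 4 are copied one element at a time, which is just one possible way to treat the edges.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Transpose one 4x4 block of floats.
   src_ld and dst_ld are the row strides (in elements) of src and dst. */
static void transpose4x4_block(const float *src, size_t src_ld,
                               float *dst, size_t dst_ld)
{
#if defined(__SSE__)
    /* Load 4 rows, shuffle them in registers, store 4 rows. */
    __m128 r0 = _mm_loadu_ps(src + 0 * src_ld);
    __m128 r1 = _mm_loadu_ps(src + 1 * src_ld);
    __m128 r2 = _mm_loadu_ps(src + 2 * src_ld);
    __m128 r3 = _mm_loadu_ps(src + 3 * src_ld);
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(dst + 0 * dst_ld, r0);
    _mm_storeu_ps(dst + 1 * dst_ld, r1);
    _mm_storeu_ps(dst + 2 * dst_ld, r2);
    _mm_storeu_ps(dst + 3 * dst_ld, r3);
#else
    /* Scalar fallback for targets without SSE. */
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 4; j++)
            dst[j * dst_ld + i] = src[i * src_ld + j];
#endif
}

/* dst (cols x rows) = transpose of src (rows x cols).
   Both are row-major and must not overlap. */
void transpose_blocked(const float *src, float *dst,
                       size_t rows, size_t cols)
{
    assert(src != dst);
    size_t rb = rows - rows % 4;   /* rows covered by full 4x4 blocks    */
    size_t cb = cols - cols % 4;   /* columns covered by full 4x4 blocks */

    for (size_t i = 0; i < rb; i += 4)
        for (size_t j = 0; j < cb; j += 4)
            transpose4x4_block(src + i * cols + j, cols,
                               dst + j * rows + i, rows);

    /* Leftover columns, for every row. */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = cb; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];

    /* Leftover rows, for the columns already handled in blocks. */
    for (size_t i = rb; i < rows; i++)
        for (size_t j = 0; j < cb; j++)
            dst[j * rows + i] = src[i * cols + j];
}
```

For a 7x8 matrix like testMatZ, this handles one row of two 4x4 blocks with SSE and the remaining three lines element by element.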
When we write the code that implements the algorithm, we need to run some tests to confirm that it follows the algorithm correctly. After this we need to run many more tests to confirm that it does what it should do. Here we may find that the algorithm does not handle some cases as it should.
So we should not show it before it is completely tested.
AFAIK, it is incorrect to use code or a proc written by member X without any identification. That is the reason why I used identification; there is no other reason behind it, and nothing is against anyone. When anyone shows test results, they are only the results of that set of tests. No more than this.
aw27,
«Of course, I have all doubts about you...»
About what you say above, it seems that the only thing you want is to "see" the algorithm that I haven't shown yet. But I suppose you know very well that anyone may solve a problem in many different ways. Your procedure doesn't follow the algorithm behind the SSE procedures that I am writing and testing. I also want to say that I don't need to use what I am writing. I do it to pass the time and because I like to study...
Cheers
Hi Rui,
Have you ever considered using the video card for Matrix Transpose calculations?
The results are very fast for any size of matrix and any data type.
One restriction: it can't be done in a console app.
Quote from: Siekmanski on May 19, 2018, 12:23:41 AM
Hi Rui,
Have you ever considered using the video card for Matrix Transpose calculations?
The results are very fast for any size of matrix and any data type.
One restriction: it can't be done in a console app.
Hi Siekmanski,
No. I only use Matrix Transpose calculations in my TheCalculator, but there the elements are REAL10
and the matrices only go up to 20x20 or 20x21, I am not sure now (time is not a problem there).
Let me say that I think you use the video card for Matrix Transpose calculations yourself,
so you have your own algorithm to do it. I say this because I read some topics
where you gave answers about this issue.
Thanks for all :t
Cheers
:t
create a matrix of pointers to the real10's
transpose the pointer matrix :biggrin:
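A rough sketch of this suggestion in C, under stated assumptions: `long double` stands in for REAL10 (its real size and precision depend on the compiler), and the names `make_pointer_matrix` and `transpose_pointers` are hypothetical. The values never move; only the table of pointers is rearranged.

```c
#include <assert.h>
#include <stddef.h>

/* Fill ptrs (rows*cols entries, row-major) with the addresses of the
   elements of data (rows x cols, row-major). */
void make_pointer_matrix(long double *data, long double **ptrs,
                         size_t rows, size_t cols)
{
    for (size_t k = 0; k < rows * cols; k++)
        ptrs[k] = &data[k];
}

/* out (cols x rows) receives the transposed pointer table of
   in (rows x cols). The two tables must not overlap. */
void transpose_pointers(long double *const *in, long double **out,
                        size_t rows, size_t cols)
{
    assert(in != out);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            out[j * rows + i] = in[i * cols + j];
}
```

With a=[1,2;3,4], the transposed table points at 1, 3, 2, 4 in that order while `a` itself is unchanged. The price is one extra pointer per element and one extra indirection on every access.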
Quote from: dedndave on May 19, 2018, 05:28:07 AM
create a matrix of pointers to the real10's
transpose the pointer matrix :biggrin:
Hi Dave,
How are you? I hope you are fine!
Well, it is well known that when we don't want to move a lot of array elements
we use an array of pointers: it is an array of dwords...
Your solution seems expensive: each matrix needs its own array of pointers.
If we define 100 matrices we need to define 200 arrays (100 for matrices + 100 for pointers).
I say "seems" because if a=[1,2;3,4] and we do a=a^t, where is a now? What are the elements of a? (In TheCalculator, when we write a matrix name "a" and press enter/compute, it shows "a".) TheCalculator uses 16 bytes for each real10, so...
Dave, do you still have the same Pentium 4 CPU? Do you remember why?
Good luck :t
hi Rui :biggrin:
Hi all,
Inside the folder SSE30_33_tests.zip you will find my results for a set of tests
of the SSE30 to SSE33 procedures, which transpose a matrix of any size
from 1x1 up to NxM. You may run the tests too and, if you want, post your results
if you have an i5 / i7 / AMD CPU or better. Your contribution may help
me understand what I should do next. I still have a very slow P4.
You may run all the tests, but I would most like to know the results for these:
TestTranspose97x97_100x100_cyclesSSE30_33_100000
TestTranspose250x256_evenlines_cyclesSSE30_33_10000
I am developing and testing new algorithms, SSE38_41, but
time is short, and I am trying to optimize some
cases, so I need more time and more tests. For example, I don't
know whether push/pop esi is better than a local variable. Do you know?
Thanks
Good luck :t
Note: 1000000/100000/10000 is the loop count used.
At the end of VerifyProcsFrom1x1_to_120x120_SSE30_33,
all 4 procs are tested using matrices from 1x1 up to 120x120
defined in the .data? segment. I wrote a general proc
for any size NxM, but it does not work for procedures that
don't use the dimensions behind the address.
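The kind of check described above can be sketched in C like this. It is not the VerifyProcsFrom1x1_to_120x120_SSE30_33 code, only an illustration: it assumes the dimensions are passed as arguments (not stored behind the address), and all names (`transpose_fn`, `transpose_reference`, `copy_without_transposing`, `verify_all_sizes`) are hypothetical. Every size from 1x1 up to max_dim x max_dim is compared against a simple reference, and a deliberately wrong routine serves as a negative control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void (*transpose_fn)(const float *src, float *dst,
                             size_t rows, size_t cols);

/* Simple element-by-element reference transpose. */
void transpose_reference(const float *src, float *dst,
                         size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];
}

/* Deliberately wrong routine: copies without transposing. */
void copy_without_transposing(const float *src, float *dst,
                              size_t rows, size_t cols)
{
    memcpy(dst, src, rows * cols * sizeof *src);
}

/* Returns how many sizes (rows x cols, each from 1 to max_dim) give a
   result different from the reference, or -1 if allocation fails. */
long verify_all_sizes(transpose_fn candidate, size_t max_dim)
{
    assert(candidate != NULL && max_dim > 0);
    size_t n = max_dim * max_dim;
    float *src  = malloc(n * sizeof *src);
    float *want = malloc(n * sizeof *want);
    float *got  = malloc(n * sizeof *got);
    long failures = 0;

    if (!src || !want || !got) {
        free(src); free(want); free(got);
        return -1;
    }
    /* Distinct non-zero values, so a missing write is also detected. */
    for (size_t k = 0; k < n; k++)
        src[k] = (float)(k + 1);

    for (size_t rows = 1; rows <= max_dim; rows++) {
        for (size_t cols = 1; cols <= max_dim; cols++) {
            size_t bytes = rows * cols * sizeof *got;
            memset(got, 0, bytes);
            transpose_reference(src, want, rows, cols);
            candidate(src, got, rows, cols);
            if (memcmp(want, got, bytes) != 0)
                failures++;
        }
    }
    free(src); free(want); free(got);
    return failures;
}
```

Calling `verify_all_sizes(my_proc, 120)` should return 0 for a correct routine. The wrong routine only passes for sizes with a single line or a single column, where copying and transposing give the same result.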
Sample for some types of matrices, like 256x256 (RE = REference):
***** Time table - LoopCount =10 000 *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
274723 cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
277606 cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
278301 cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
280608 cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
288570 cycles, MatrixTransposeRE, testMatYY 252x252, Lin, Col
276360 cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
277126 cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
278578 cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
289265 cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
291922 cycles, MatrixTransposeRE, testMatYA 252x250, Lin, Col
277665 cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
278713 cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
301969 cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
302056 cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
305275 cycles, MatrixTransposeRE, testMatWA 256x250, Lin, Col
277906 cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
280112 cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
281857 cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
283046 cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
300882 cycles, MatrixTransposeRE, testMatYC 252x256, Lin, Col
278085 cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
281408 cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
289149 cycles, MatrixTransposeRE, testMatWB 256x252, Lin, Col
303014 cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
303088 cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
280843 cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
282137 cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
282411 cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
295377 cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
298502 cycles, MatrixTransposeRE, testMatYB 252x254, Lin, Col
284465 cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
289484 cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
295322 cycles, MatrixTransposeRE, testMatWC 256x254, Lin, Col
307567 cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
307757 cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
287766 cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
287891 cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
307719 cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
307860 cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
308480 cycles, MatrixTransposeRE, testMatWW 256x256, Lin, Col
301073 cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
302764 cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
303633 cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
303680 cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
317857 cycles, MatrixTransposeRE, testMatXA 250x252, Lin, Col
302701 cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
305413 cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
308789 cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
309537 cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
311748 cycles, MatrixTransposeRE, testMatXX 250x250, Lin, Col
305303 cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
309247 cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
310500 cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
317925 cycles, MatrixTransposeRE, testMatXB 250x254, Lin, Col
316275 cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
309104 cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
309146 cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
309344 cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
315558 cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
320653 cycles, MatrixTransposeRE, testMatXC 250x256, Lin, Col
314387 cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
315688 cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
318761 cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
321526 cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
323225 cycles, MatrixTransposeRE, testMatZA 254x250, Lin, Col
315387 cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
315677 cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
316444 cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
322256 cycles, MatrixTransposeRE, testMatZB 254x252, Lin, Col
324936 cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
318743 cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
322824 cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
324874 cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
328921 cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
340988 cycles, MatrixTransposeRE, testMatZZ 254x254, Lin, Col
319148 cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
319973 cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
321797 cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
321990 cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
337617 cycles, MatrixTransposeRE, testMatZC 254x256, Lin, Col
Siekmanski:
***** Time table - LoopCount =10 000 ****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
169683 cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790 cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170152 cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170215 cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
171765 cycles, MatrixTransposeRE, testMatWA 256x250, Lin, Col
170144 cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170190 cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170553 cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
172547 cycles, MatrixTransposeRE, testMatWB 256x252, Lin, Col
177553 cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
172411 cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172939 cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005 cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
174693 cycles, MatrixTransposeRE, testMatWC 256x254, Lin, Col
175477 cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
173518 cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646 cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653 cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962 cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
176945 cycles, MatrixTransposeRE, testMatWW 256x256, Lin, Col
175354 cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175500 cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655 cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391 cycles, MatrixTransposeRE, testMatYA 252x250, Lin, Col
176458 cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
176490 cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542 cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618 cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639 cycles, MatrixTransposeRE, testMatYY 252x252, Lin, Col
176651 cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
178076 cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186 cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206 cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366 cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791 cycles, MatrixTransposeRE, testMatYB 252x254, Lin, Col
179118 cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137 cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194 cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201 cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252 cycles, MatrixTransposeRE, testMatYC 252x256, Lin, Col
181607 cycles, MatrixTransposeRE, testMatXX 250x250, Lin, Col ; <<<<<-----
182332 cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182494 cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530 cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
189825 cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
181717 cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995 cycles, MatrixTransposeRE, testMatXA 250x252, Lin, Col
182570 cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609 cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696 cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
182441 cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182963 cycles, MatrixTransposeRE, testMatZB 254x252, Lin, Col
183670 cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183767 cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898 cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
183673 cycles, MatrixTransposeRE, testMatZA 254x250, Lin, Col ; <<<<<-----
184208 cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249 cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322 cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480 cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
184616 cycles, MatrixTransposeRE, testMatXC 250x256, Lin, Col ; <<<<<-----
185553 cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185616 cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628 cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185849 cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
185145 cycles, MatrixTransposeRE, testMatXB 250x254, Lin, Col ; <<<<<-----
185595 cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601 cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185650 cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790 cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
185163 cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
186095 cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169 cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210 cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934 cycles, MatrixTransposeRE, testMatZC 254x256, Lin, Col
187866 cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890 cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986 cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069 cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178 cycles, MatrixTransposeRE, testMatZZ 254x254, Lin, Col
"TestTranspose97x97_100x100_cyclesSSE30_33_100000"
***** Time table - LoopCount =1 000 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
9267 cycles, MatrixTransposeSSE31, testMatYY 98x98, Lin, Col
9269 cycles, MatrixTransposeSSE30, testMatYY 98x98, Lin, Col
9610 cycles, MatrixTransposeSSE33, testMatYY 98x98, Lin, Col
9610 cycles, MatrixTransposeSSE32, testMatYY 98x98, Lin, Col
10171 cycles, MatrixTransposeRE, testMatXX 97x97, Lin, Col
10179 cycles, MatrixTransposeSSE31, testMatXX 97x97, Lin, Col
10240 cycles, MatrixTransposeSSE30, testMatXX 97x97, Lin, Col
10348 cycles, MatrixTransposeSSE33, testMatXX 97x97, Lin, Col
10366 cycles, MatrixTransposeSSE32, testMatXX 97x97, Lin, Col
10519 cycles, MatrixTransposeRE, testMatYY 98x98, Lin, Col
10674 cycles, MatrixTransposeSSE30, testMatZZ 99x99, Lin, Col
10684 cycles, MatrixTransposeSSE31, testMatZZ 99x99, Lin, Col
10714 cycles, MatrixTransposeSSE33, testMatZZ 99x99, Lin, Col
10717 cycles, MatrixTransposeSSE32, testMatZZ 99x99, Lin, Col
10999 cycles, MatrixTransposeSSE30, testMatWW 100x100, Lin, Col
11002 cycles, MatrixTransposeSSE31, testMatWW 100x100, Lin, Col
11049 cycles, MatrixTransposeSSE32, testMatWW 100x100, Lin, Col
11052 cycles, MatrixTransposeSSE33, testMatWW 100x100, Lin, Col
11192 cycles, MatrixTransposeRE, testMatWW 100x100, Lin, Col
11840 cycles, MatrixTransposeRE, testMatZZ 99x99, Lin, Col
********** END **********
"TestTranspose250x256_evenlines_cyclesSSE30_33_10000"
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
169683 cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790 cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170144 cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170152 cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170190 cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170215 cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
170553 cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
171765 cycles, MatrixTransposeRE, testMatWA 256x250, Lin, Col
172411 cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172547 cycles, MatrixTransposeRE, testMatWB 256x252, Lin, Col
172939 cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005 cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
173518 cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646 cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653 cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962 cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
174693 cycles, MatrixTransposeRE, testMatWC 256x254, Lin, Col
175354 cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175477 cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
175500 cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655 cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391 cycles, MatrixTransposeRE, testMatYA 252x250, Lin, Col
176458 cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
176490 cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542 cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618 cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639 cycles, MatrixTransposeRE, testMatYY 252x252, Lin, Col
176651 cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
176945 cycles, MatrixTransposeRE, testMatWW 256x256, Lin, Col
177553 cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
178076 cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186 cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206 cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366 cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791 cycles, MatrixTransposeRE, testMatYB 252x254, Lin, Col
179118 cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137 cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194 cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201 cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252 cycles, MatrixTransposeRE, testMatYC 252x256, Lin, Col
181607 cycles, MatrixTransposeRE, testMatXX 250x250, Lin, Col
181717 cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995 cycles, MatrixTransposeRE, testMatXA 250x252, Lin, Col
182332 cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182441 cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182494 cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530 cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
182570 cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609 cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696 cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
182963 cycles, MatrixTransposeRE, testMatZB 254x252, Lin, Col
183670 cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183673 cycles, MatrixTransposeRE, testMatZA 254x250, Lin, Col
183767 cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898 cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
184208 cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249 cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322 cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480 cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
184616 cycles, MatrixTransposeRE, testMatXC 250x256, Lin, Col
185145 cycles, MatrixTransposeRE, testMatXB 250x254, Lin, Col
185163 cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
185553 cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185595 cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601 cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185616 cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628 cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185650 cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790 cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
185849 cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
186095 cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169 cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210 cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934 cycles, MatrixTransposeRE, testMatZC 254x256, Lin, Col
187866 cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890 cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986 cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069 cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178 cycles, MatrixTransposeRE, testMatZZ 254x254, Lin, Col
189825 cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
********** END **********
Hi
If you have an i5 / i7 / AMD CPU or better,
would you mind posting the results for this:
TestTranspose506x512_evenlines_cyclesSSE30_33_10000
My results for big matrices like 512x512:
***** Time table - LoopCount =10 000 *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
1207440 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1343612 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1344018 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1561033 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col
1663892 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1228184 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1245623 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1383897 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1399765 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1610377 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
1300479 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1371443 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1374818 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1377963 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1519329 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col
1329759 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1387799 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1391123 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1969762 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1679268 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
1344344 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1373482 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1418670 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1446757 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1600959 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
1351207 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1353411 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1375846 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1383705 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1588341 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
1352139 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1366498 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1381257 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1446800 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1581571 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col
1364639 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1399896 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1401838 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1553876 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1558572 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
1408078 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1409122 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1416770 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1420956 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1625558 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
1411634 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1413983 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1419621 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1419933 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1606143 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
1443784 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1450609 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1548318 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1717637 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col
1974898 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1451095 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1460987 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1474059 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1479797 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1601902 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
1461182 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1466117 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1509165 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1565127 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1618744 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
1473912 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1474975 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1594094 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1595387 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
1903505 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1480526 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1483428 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1483558 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1632179 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
1659401 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1494118 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1505676 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1530651 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1542758 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1662796 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
Hi Rui!
***** Time table - LoopCount =10 000 *****
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
1582930 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1629882 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1634399 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1654561 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1657649 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1675287 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1679104 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1681008 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1681443 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1681677 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1684656 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1696919 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1698205 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1706813 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1708254 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1710365 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1718140 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1718243 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1723216 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1738273 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1743494 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1745981 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
1746192 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1750273 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1752060 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1760193 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col
1765761 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1770073 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1771423 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1782817 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1784885 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1789888 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1792618 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
1798966 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1799624 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
1804543 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1808388 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
1815006 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
1821494 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1821663 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
1832988 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1848994 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
1850935 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1851496 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1860365 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1863502 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
1869659 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1872352 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1876061 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1905205 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col
1905353 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
1905891 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1912748 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
2014807 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
2025312 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col
2027611 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
2130691 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col
2199452 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2208662 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
2218100 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2272197 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2284930 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2304169 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2309517 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2315637 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2318191 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2335750 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2349308 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2356218 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
2393212 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
2430160 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
2471007 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
TestTranspose506x512_evenlines_cyclesSSE30_33_10000
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
516016 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
516032 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col
518944 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
520085 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
521046 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
521454 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
521850 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
523831 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
529275 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
534200 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
536010 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
536133 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
536956 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
539305 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
553189 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
556279 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
561847 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
561946 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
563973 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
563994 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col
564207 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
564973 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
566778 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
567599 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
567824 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
569664 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
570366 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
577716 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
649385 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
697810 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
701489 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703127 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
703380 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
705732 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
707653 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
714643 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
717562 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
748249 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
750745 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
754049 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
759664 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
760467 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
784521 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
821192 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
825652 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
830018 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col
830026 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832190 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
832842 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
833705 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
835167 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
835822 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col
842265 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
842879 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
844842 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
********** END **********
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)
345439 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
346869 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
347355 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
349661 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
353445 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
355062 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356014 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
356298 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
356396 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
357701 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
359675 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
365421 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
366776 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
368150 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col
369632 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
373951 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
374200 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
379271 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
379788 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
381998 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
382001 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
382428 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
382806 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
384346 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
386417 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
387250 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
388765 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
389782 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
391379 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
394047 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
398618 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
398927 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
399796 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
415985 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col
436669 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
440358 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
449851 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
450205 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
454294 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
458092 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
460574 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
461186 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
462350 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
464766 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
472146 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
475473 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
475762 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
481264 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
487516 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
490776 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
770667 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
774841 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
778550 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
783743 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
784690 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col
789322 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col
792221 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
796554 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
********** END **********
Thanks all :t
These are the results (i7/AMD) sorted by matrix type:
mineiro:
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)
345439 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
347355 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
356396 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
369632 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
346869 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
349661 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
357701 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
368150 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col
353445 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
356014 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
365421 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
374200 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
355062 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356298 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
359675 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
366776 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
373951 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
379271 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
382001 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
389782 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
398927 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
379788 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
382428 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
387250 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
415985 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col
381998 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
384346 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
388765 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
394047 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
399796 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
382806 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
386417 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
391379 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
398618 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
436669 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
450205 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
461186 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
481264 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
440358 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
458092 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
475473 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
449851 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
464766 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
475762 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
490776 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
454294 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
460574 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
462350 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
472146 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
487516 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
770667 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
784690 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col
774841 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
789322 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col
778550 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
792221 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
783743 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
796554 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
siekmanski:
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
516016 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col ; <<<<<<----
520085 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
521850 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
536956 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
516032 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col ; <<<<<<----
518944 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
521454 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
523831 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
521046 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col ; <<<<<<----
529275 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
536010 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
534200 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col ; <<<<<<----
536133 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
539305 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
564207 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
577716 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
553189 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col ; <<<<<<----
556279 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
566778 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
561847 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col ; <<<<<<----
561946 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
569664 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
649385 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
563973 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col ; <<<<<<----
567599 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
570366 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
563994 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col ; <<<<<<----
564973 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
567824 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
697810 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
703127 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
714643 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
754049 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
701489 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703380 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
760467 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
705732 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
717562 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
750745 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
784521 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
707653 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
748249 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
759664 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
821192 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
830018 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col
825652 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
835822 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col
830026 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832842 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
835167 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
842879 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
832190 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
833705 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
842265 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
844842 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
HSE:
***** Time table - LoopCount =10 000 *****
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
1582930 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1657649 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1679104 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1905353 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
1629882 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1654561 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1718243 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1863502 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
1634399 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1804543 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1876061 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1905205 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col
1675287 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1738273 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1750273 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1815006 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
1850935 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1681008 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1743494 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1799624 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
1832988 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1872352 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1681443 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1718140 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1760193 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col
1784885 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1798966 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1681677 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1745981 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
1746192 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1851496 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1869659 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1684656 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1765761 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1771423 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1848994 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
1860365 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1696919 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1723216 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1752060 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
2014807 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
1698205 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1708254 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1782817 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1792618 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
1706813 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1789888 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1808388 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
1821494 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1905891 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1710365 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1770073 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1821663 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
1912748 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
2025312 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col ; <<<<<<----
2199452 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2304169 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2315637 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2471007 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
2027611 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col ; <<<<<<----
2218100 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2284930 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2335750 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2393212 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
2130691 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col ; <<<<<<----
2309517 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2318191 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2349308 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2430160 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
2208662 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col ; <<<<<<----
2272197 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2356218 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
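For anyone who wants to regroup their own time table the same way, here is a minimal Python sketch. It is not the tool used to produce the lists above; the function names `group_by_matrix` and `format_groups` are made up for illustration, and it assumes every result line has the form `<cycles> cycles, <procedure>, <matrix> <rows>x<cols>, Lin, Col`. Groups are ordered by their fastest entry, rows inside a group by cycle count, and a group is flagged with `; <<<<<<----` when MatrixTransposeRE is its fastest entry.

```python
import re
from collections import defaultdict

# One result line looks like:
# "345439 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col"
LINE_RE = re.compile(r"^\s*(\d+) cycles, (\w+), (\w+) (\d+x\d+)")


def group_by_matrix(lines):
    """Group result lines by test matrix and sort each group by cycles."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip headers, CPU names, separators
        cycles, proc, matrix, size = m.groups()
        groups[(matrix, size)].append((int(cycles), proc))
    # Order the groups by their fastest entry.
    ordered = sorted(groups.items(), key=lambda kv: min(kv[1]))
    return [(key, sorted(rows)) for key, rows in ordered]


def format_groups(grouped):
    """Rebuild the text lines; flag groups where MatrixTransposeRE wins."""
    out = []
    for (matrix, size), rows in grouped:
        for i, (cycles, proc) in enumerate(rows):
            mark = " ; <<<<<<----" if i == 0 and proc == "MatrixTransposeRE" else ""
            out.append(f"{cycles} cycles, {proc}, {matrix} {size}, Lin, Col{mark}")
    return out
```

Feed it the lines of a saved time table, for example `print("\n".join(format_groups(group_by_matrix(open("table.txt")))))`, where `table.txt` is a hypothetical file holding one pasted table.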
As always, I'm stylishly late. :P
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)
810005 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col
820058 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
821950 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
823983 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
828697 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
829394 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
830046 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
832485 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
832895 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
833612 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
834352 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
835091 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
839675 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
846346 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
846411 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
848844 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
849311 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
852962 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
855288 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
862895 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
871913 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
874990 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col
878483 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
878772 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
879291 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
884764 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
886302 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
887827 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
888715 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
889632 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
890132 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
891733 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
892603 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
898395 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
902633 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
902953 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
902953 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
902999 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
921149 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
965255 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1503461 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1508044 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1508102 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1539823 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col
1553754 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col
1589446 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1612950 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1625738 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1629810 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1631325 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
1632128 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1632552 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1632744 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
1633612 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1634083 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1638780 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1643499 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1681922 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1745465 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1753087 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
1754633 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759147 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1759762 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
1760467 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1761883 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1767771 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
1773351 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
********** END **********
:bgrin:
Although my processor speed isn't listed by the program, it is 1.60 GHz...
Hi
Thanks :t
It is not possible to add this to the previous set of results (the post would exceed 20000 characters).
Here are your results (AMD), sorted by matrix type:
zedd151:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)
810005 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col ; <<<<<<----
828697 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
829394 cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
830046 cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
832485 cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
820058 cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
833612 cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
839675 cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
846346 cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
849311 cycles, MatrixTransposeRE, testMatYC 508x512, Lin, Col
821950 cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
823983 cycles, MatrixTransposeRE, testMatYA 508x506, Lin, Col
832895 cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
834352 cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
835091 cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
846411 cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
848844 cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
852962 cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
855288 cycles, MatrixTransposeRE, testMatYB 508x510, Lin, Col
965255 cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
862895 cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
871913 cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
878772 cycles, MatrixTransposeRE, testMatXC 506x512, Lin, Col
879291 cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
887827 cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
874990 cycles, MatrixTransposeRE, testMatXX 506x506, Lin, Col ; <<<<<<----
884764 cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
886302 cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
891733 cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
892603 cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
878483 cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
889632 cycles, MatrixTransposeRE, testMatXB 506x510, Lin, Col
890132 cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
898395 cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
921149 cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
888715 cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
902633 cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
902953 cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
902953 cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
902999 cycles, MatrixTransposeRE, testMatXA 506x508, Lin, Col
1503461 cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1745465 cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541 cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1754633 cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759762 cycles, MatrixTransposeRE, testMatZB 510x508, Lin, Col
1508044 cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1631325 cycles, MatrixTransposeRE, testMatWC 512x510, Lin, Col
1632128 cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1638780 cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885 cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1508102 cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1760467 cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773 cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1767771 cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096 cycles, MatrixTransposeRE, testMatZZ 510x510, Lin, Col
1539823 cycles, MatrixTransposeRE, testMatWB 512x508, Lin, Col ; <<<<<<----
1625738 cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523 cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1632552 cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1633612 cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1553754 cycles, MatrixTransposeRE, testMatWA 512x506, Lin, Col ; <<<<<<----
1612950 cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458 cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238 cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925 cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1589446 cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1629810 cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1632744 cycles, MatrixTransposeRE, testMatWW 512x512, Lin, Col
1634083 cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1643499 cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1681922 cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885 cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024 cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089 cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1753087 cycles, MatrixTransposeRE, testMatZA 510x506, Lin, Col
1759147 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1761883 cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743 cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680 cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1773351 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col
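For anyone who wants to redo this kind of grouping on their own output, here is a minimal Python sketch. It is not the tool used for the list above; it only assumes the line format seen in these listings ("<cycles> cycles, <procedure>, <matrix> <rows>x<cols>, ..."), and the names `LINE_RE` and `group_by_matrix` are made up for illustration. Groups are ordered by their best time, which appears to be how the list above is arranged.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the listings in this thread:
#   "<cycles> cycles, <procedure>, <matrix> <rows>x<cols>, Lin, Col"
LINE_RE = re.compile(r"^\s*(\d+) cycles, (\w+), (\w+) (\d+x\d+)")

def group_by_matrix(lines):
    """Group timing lines by matrix; fastest entry first in each group."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip CPU name, banners such as "END", blank lines
        cycles, proc, matrix, size = m.groups()
        groups[(matrix, size)].append((int(cycles), proc))
    # Order the groups by their best time, and entries by cycle count.
    ordered = sorted(groups.items(), key=lambda kv: min(kv[1]))
    return [(key, sorted(entries)) for key, entries in ordered]

# A few lines copied from the results above, in arbitrary order.
sample = [
    "1773351 cycles, MatrixTransposeRE, testMatZC 510x512, Lin, Col",
    "810005 cycles, MatrixTransposeRE, testMatYY 508x508, Lin, Col",
    "828697 cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col",
    "1759147 cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col",
    "********** END **********",
]

for (matrix, size), entries in group_by_matrix(sample):
    for cycles, proc in entries:
        print(f"{cycles} cycles, {proc}, {matrix} {size}")
    print()  # blank line between matrix types
```

To use it on a real run, read the lines from the results text file instead of `sample`.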
Hi all,
Here is the new version, SSE46_49, which you can test; my results are below.
If you have an i5 / i7 / AMD CPU and you want to show me
your results, please ZIP them and post. Use only these (or whichever you want):
TestTranspose97_100_cyclesSSE46_49_1000000
TestTranspose506x512_evenlines_cyclesSSE46_49_1000
TestTranspose506x512_oddlines_cyclesSSE46_49_1000
Thank you
My little sample:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
40390 cycles, MatrixTransposeSSE49, testMatWW 100x100, Lin, Col
40493 cycles, MatrixTransposeSSE48, testMatWW 100x100, Lin, Col
42654 cycles, MatrixTransposeSSE46, testMatWW 100x100, Lin, Col
42794 cycles, MatrixTransposeSSE47, testMatWW 100x100, Lin, Col
43395 cycles, MatrixTransposeRE, testMatWW 100x100, Lin, Col
47565 cycles, MatrixTransposeSSE46, testMatYY 98x98, Lin, Col
47702 cycles, MatrixTransposeSSE49, testMatYY 98x98, Lin, Col
47770 cycles, MatrixTransposeSSE48, testMatYY 98x98, Lin, Col
47969 cycles, MatrixTransposeSSE47, testMatYY 98x98, Lin, Col
48718 cycles, MatrixTransposeRE, testMatYY 98x98, Lin, Col
72327 cycles, MatrixTransposeSSE49, testMatXX 97x97, Lin, Col
72434 cycles, MatrixTransposeSSE48, testMatXX 97x97, Lin, Col
72506 cycles, MatrixTransposeSSE46, testMatXX 97x97, Lin, Col
72532 cycles, MatrixTransposeSSE47, testMatXX 97x97, Lin, Col
77414 cycles, MatrixTransposeRE, testMatXX 97x97, Lin, Col
76113 cycles, MatrixTransposeSSE49, testMatZZ 99x99, Lin, Col
76201 cycles, MatrixTransposeSSE48, testMatZZ 99x99, Lin, Col
76228 cycles, MatrixTransposeSSE47, testMatZZ 99x99, Lin, Col
76241 cycles, MatrixTransposeSSE46, testMatZZ 99x99, Lin, Col
78145 cycles, MatrixTransposeRE, testMatZZ 99x99, Lin, Col
Rui, which one of the 16 exes should we test?
Quote from: jj2007 on June 06, 2018, 02:49:25 PM
Rui, which one of the 16 exes should we test?
These 3:
TestTranspose97_100_cyclesSSE46_49_1000000
TestTranspose506x512_evenlines_cyclesSSE46_49_1000
TestTranspose506x512_oddlines_cyclesSSE46_49_1000
Make a batch file?
Quote from: jj2007 on June 06, 2018, 04:57:03 PM
Make a batch file?
A batch file? I last wrote one in the 80s and I don't remember well how to do it now. Could you help me, or write it yourself? I have so many things in my head that there is no space left in there now! :biggrin:
EDIT: Jochen, yes, a batch file is a good solution to run each program and add the results to one text file. But I don't remember now how to do it. Would you help me?
Normally, it should look like this (save as test4Rui.bat):
TestTranspose97_100_cyclesSSE46_49.exe
TestTranspose506x512_evenlines_cyclesSSE46_49.exe
TestTranspose506x512_oddlines_cyclesSSE46_49.exe
pause
But it gives me 3x file not found, although I extracted the whole archive (16 exe) to the same folder. Are you sure these are the files we should test for you?
Quote from: jj2007 on June 06, 2018, 07:29:02 PM
Normally, it should look like this (save as test4Rui.bat):
TestTranspose97_100_cyclesSSE46_49_1000000.exe <<<<<<-----
TestTranspose506x512_evenlines_cyclesSSE46_49_1000.exe <<<<----
TestTranspose506x512_oddlines_cyclesSSE46_49_1000.exe <<<<<<<<----
pause
But it gives me 3x file not found, although I extracted the whole archive (16 exe) to the same folder. Are you sure these are the files we should test for you?
That is because the name ends in _1000000 for the first one and in _1000 for the other two.
The results could be sent to a text file, no?
Yes, something like this:
TestTranspose97_100_cyclesSSE46_49.exe > results.txt
TestTranspose506x512_evenlines_cyclesSSE46_49.exe >> results.txt
TestTranspose506x512_oddlines_cyclesSSE46_49.exe >> results.txt
If the programs ask for user input, I think it will be harder to do; if they only echo their results to the screen, it can be done like this.
The first ">" creates the file and overwrites its contents; each following ">>" appends its results to the end of that file. This works from MS-DOS up to the newer cmd versions.
Quote from: mineiro on June 06, 2018, 08:53:38 PM
Yes, something like this:
TestTranspose97_100_cyclesSSE46_49.exe > results.txt
TestTranspose506x512_evenlines_cyclesSSE46_49.exe >> results.txt
TestTranspose506x512_oddlines_cyclesSSE46_49.exe >> results.txt
If the programs ask for user input, I think it will be harder to do; if they only echo their results to the screen, it can be done like this.
The first ">" creates the file and overwrites its contents; each following ">>" appends its results to the end of that file. This works from MS-DOS up to the newer cmd versions.
Thanks,
The first one works, the others (>>) do not.
Quote from: jj2007 on June 06, 2018, 02:49:25 PM
Rui, which one of the 16 exes should we test?
Jochen, some of them are not timing tests (try them and see), and I would like to get the results of only the 3 I listed. If we cannot create a single file with all the tests from one batch file (it doesn't work here), I think I will need to create up to 4 batch files to upload the files. In any case your idea is good, and I will remove all the inkeys from the exe files. I will try to do it. By the way, that set of files represents a lot of hard work; with it we can see all the data organised.
Thanks
Hi all,
Here is the new version, SSE46_49, which you can test; my results are included.
I made some minor revisions in some macros.
If you have an i5 / i7 / AMD CPU and you want to show me
your results, please ZIP each .txt file and post.
If we use the batch file TestThis46_49.bat
we get:
resultsSSE46_49_506_512_evenlines.txt
resultsSSE46_49_506_512_oddlines.txt
resultsSSE46_49_97_100.txt
Thank you
Much better :t
The results.
Results:
results:
As always, CPU speed is 1.60 GHz.
:biggrin:
Hi all,
Here are all the results by matrix type, if you want to see them.
They are the results from your CPUs.
Thanks for your work.
EDIT: all plus results from LiaoMe
Results -
Hi all,
Would you mind testing my new SSE46 if you have an i5/i7 or AMD CPU? It is a .bat file and the output is a single .txt file.
You may see my results also. They are "much faster than quick" :P
I will probably post all the SSE46 work (.inc, .asm, .mac, .txt, etc.) in the Workshop next week.
The .txt file will contain all the information about this work.
Thanks :t
EDIT: here it is, I'm sorry.
There is only a TestThisAllSSE46.bat
Where are the executables?
Quote from: Siekmanski on June 14, 2018, 05:21:29 AM
There is only a TestThisAllSSE46.bat
Where are the executables?
Sorry, they are in the folder now. Now it is correct.
The results,
results: