The MASM Forum

Miscellaneous => Miscellaneous Projects => Topic started by: RuiLoureiro on May 07, 2018, 02:13:55 AM

Title: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 07, 2018, 02:13:55 AM
Hi all,
        Could you run TestTranspose_cycles1 and post your results here?
        (files: all .asm included and Timing.inc)
       
        Thanks  :t       
Note: Each element of the matrices used here is a REAL4 number.

EDIT: See all results in my reply #7 below
EDIT:  tests using SSE

HERE are my results
Quote
434 cycles, MatrixTransposeZ, transposeMatX

399 cycles, MatrixTransposeDD, transposeMatZ

411 cycles, MatrixTransposeDF, transposeMatZ

417 cycles, MatrixTransposeX, transposeMatX

476 cycles, MatrixTransposeZZ, transposeMatX

470 cycles, MatrixTransposeDDD, transposeMatZ

540 cycles, MatrixTransposeDFF, transposeMatZ

414 cycles, MatrixTransposeXX, transposeMatX

1311 cycles, MatrixTransposeZ, transposeMatXX

1170 cycles, MatrixTransposeDD, transposeMatZZ

1156 cycles, MatrixTransposeDF, transposeMatZZ

1326 cycles, MatrixTransposeX, transposeMatXX

1620 cycles, MatrixTransposeZZ, transposeMatXX

1548 cycles, MatrixTransposeDDD, transposeMatZZ

1610 cycles, MatrixTransposeDFF, transposeMatZZ

1322 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

399  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
411  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
414  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
417  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
434  cycles, MatrixTransposeZ,   testMatX 4x4- last to first
470  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
476  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
540  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last

1156  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
1170  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
1311  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
1322  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
1326  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
1548  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1610  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
1620  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********

Jochen  :t
For now (2 samples: P4, i5), MatrixTransposeDD seems to be the best.
Title: Re: Testing Transpose of a Matrix
Post by: jj2007 on May 07, 2018, 02:17:13 AM
Intel Core i5:

145 cycles, MatrixTransposeZ, transposeMatX

109 cycles, MatrixTransposeDD, transposeMatZ

115 cycles, MatrixTransposeDF, transposeMatZ

192 cycles, MatrixTransposeX, transposeMatX

264 cycles, MatrixTransposeZZ, transposeMatX

140 cycles, MatrixTransposeDDD, transposeMatZ

118 cycles, MatrixTransposeDFF, transposeMatZ

200 cycles, MatrixTransposeXX, transposeMatX

332 cycles, MatrixTransposeZ, transposeMatXX

316 cycles, MatrixTransposeDD, transposeMatZZ

329 cycles, MatrixTransposeDF, transposeMatZZ

300 cycles, MatrixTransposeX, transposeMatXX

378 cycles, MatrixTransposeZZ, transposeMatXX

402 cycles, MatrixTransposeDDD, transposeMatZZ

387 cycles, MatrixTransposeDFF, transposeMatZZ

310 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

109  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
115  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
118  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
140  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
145  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
192  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
200  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
264  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
300  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
310  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
316  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
329  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
332  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
378  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
387  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
402  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: zedd151 on May 07, 2018, 02:56:26 AM

86 cycles, MatrixTransposeZ, transposeMatX

87 cycles, MatrixTransposeDD, transposeMatZ

91 cycles, MatrixTransposeDF, transposeMatZ

88 cycles, MatrixTransposeX, transposeMatX

92 cycles, MatrixTransposeZZ, transposeMatX

90 cycles, MatrixTransposeDDD, transposeMatZ

99 cycles, MatrixTransposeDFF, transposeMatZ

97 cycles, MatrixTransposeXX, transposeMatX

402 cycles, MatrixTransposeZ, transposeMatXX

315 cycles, MatrixTransposeDD, transposeMatZZ

318 cycles, MatrixTransposeDF, transposeMatZZ

324 cycles, MatrixTransposeX, transposeMatXX

664 cycles, MatrixTransposeZZ, transposeMatXX

272 cycles, MatrixTransposeDDD, transposeMatZZ

323 cycles, MatrixTransposeDFF, transposeMatZZ

369 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4) 1.6 Ghz

86  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
87  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
88  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
90  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
91  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
92  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
97  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
99  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
272  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
315  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
318  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
323  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
324  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
369  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
402  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
664  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********


:biggrin:
Title: Re: Testing Transpose of a Matrix
Post by: felipe on May 07, 2018, 03:55:59 AM

223 cycles, MatrixTransposeZ, transposeMatX

217 cycles, MatrixTransposeDD, transposeMatZ

219 cycles, MatrixTransposeDF, transposeMatZ

220 cycles, MatrixTransposeX, transposeMatX

290 cycles, MatrixTransposeZZ, transposeMatX

289 cycles, MatrixTransposeDDD, transposeMatZ

304 cycles, MatrixTransposeDFF, transposeMatZ

230 cycles, MatrixTransposeXX, transposeMatX

743 cycles, MatrixTransposeZ, transposeMatXX

746 cycles, MatrixTransposeDD, transposeMatZZ

743 cycles, MatrixTransposeDF, transposeMatZZ

780 cycles, MatrixTransposeX, transposeMatXX

1021 cycles, MatrixTransposeZZ, transposeMatXX

1016 cycles, MatrixTransposeDDD, transposeMatZZ

1055 cycles, MatrixTransposeDFF, transposeMatZZ

826 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz (SSE4)

217  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
219  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
220  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
223  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
230  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
289  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
290  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
304  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
743  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
743  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
746  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
780  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
826  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
1016  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1021  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
1055  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********


i5 here too. Windows 8.1.  :icon14:
Title: Re: Testing Transpose of a Matrix
Post by: LiaoMi on May 07, 2018, 04:39:51 AM
100 cycles, MatrixTransposeZ, transposeMatX

101 cycles, MatrixTransposeDD, transposeMatZ

101 cycles, MatrixTransposeDF, transposeMatZ

79 cycles, MatrixTransposeX, transposeMatX

117 cycles, MatrixTransposeZZ, transposeMatX

98 cycles, MatrixTransposeDDD, transposeMatZ

123 cycles, MatrixTransposeDFF, transposeMatZ

102 cycles, MatrixTransposeXX, transposeMatX

368 cycles, MatrixTransposeZ, transposeMatXX

337 cycles, MatrixTransposeDD, transposeMatZZ

333 cycles, MatrixTransposeDF, transposeMatZZ

258 cycles, MatrixTransposeX, transposeMatXX

413 cycles, MatrixTransposeZZ, transposeMatXX

317 cycles, MatrixTransposeDDD, transposeMatZZ

375 cycles, MatrixTransposeDFF, transposeMatZZ

266 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

79  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
98  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
100  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
101  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
101  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
102  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
117  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
123  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
258  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
266  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
317  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
333  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
337  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
368  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
375  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
413  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on May 07, 2018, 05:20:18 AM
Intel i7-4930K, Win 8.1

203 cycles, MatrixTransposeZ, transposeMatX

125 cycles, MatrixTransposeDD, transposeMatZ

123 cycles, MatrixTransposeDF, transposeMatZ

101 cycles, MatrixTransposeX, transposeMatX

133 cycles, MatrixTransposeZZ, transposeMatX

117 cycles, MatrixTransposeDDD, transposeMatZ

139 cycles, MatrixTransposeDFF, transposeMatZ

103 cycles, MatrixTransposeXX, transposeMatX

384 cycles, MatrixTransposeZ, transposeMatXX

398 cycles, MatrixTransposeDD, transposeMatZZ

380 cycles, MatrixTransposeDF, transposeMatZZ

318 cycles, MatrixTransposeX, transposeMatXX

430 cycles, MatrixTransposeZZ, transposeMatXX

380 cycles, MatrixTransposeDDD, transposeMatZZ

465 cycles, MatrixTransposeDFF, transposeMatZZ

330 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

101  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
103  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
117  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
123  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
125  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
133  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
139  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
203  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
318  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
330  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
380  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
380  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
384  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
398  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
430  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
465  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: HSE on May 07, 2018, 05:35:33 AM
248 cycles, MatrixTransposeZ, transposeMatX

203 cycles, MatrixTransposeDD, transposeMatZ

225 cycles, MatrixTransposeDF, transposeMatZ

192 cycles, MatrixTransposeX, transposeMatX

255 cycles, MatrixTransposeZZ, transposeMatX

226 cycles, MatrixTransposeDDD, transposeMatZ

256 cycles, MatrixTransposeDFF, transposeMatZ

190 cycles, MatrixTransposeXX, transposeMatX

752 cycles, MatrixTransposeZ, transposeMatXX

668 cycles, MatrixTransposeDD, transposeMatZZ

727 cycles, MatrixTransposeDF, transposeMatZZ

619 cycles, MatrixTransposeX, transposeMatXX

830 cycles, MatrixTransposeZZ, transposeMatXX

718 cycles, MatrixTransposeDDD, transposeMatZZ

831 cycles, MatrixTransposeDFF, transposeMatZZ

621 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

190  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
192  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
203  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
225  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
226  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
248  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
255  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
256  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
619  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
621  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
668  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
718  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
727  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
752  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
830  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
831  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 07, 2018, 09:26:22 AM
Hi all,
        Thank you for showing your results (they are below). :t
       
        For now, I want to point out the definition of the matrices used
        in this test. Note that immediately before each name there is a BYTE (if X) or DWORD (if Z)
        holding the number of LINES and, before that, one holding the number of columns.

        The procedures ...X , ...Z , ...DD , ...DF       use EDI (push edi/pop edi).
        The procedures ...XX, ...ZZ, ...DDD, ...DFF don't use EDI, but are otherwise the same as above.

        In the procedures ...X and ...XX we need to pass the address of a sequence of DWORDs,
        treated as REAL4 values, plus the number of lines and the number of columns. I am
        using this here only for test purposes.
        :icon14:   
Quote
                    DEFINITIONS OF MATRICES
                     --------------------------------
MAXLINES_X      equ 4                 ; number of lines
MAXCOLUMNS_X    equ 4                 ; number of columns
MAXDWORDS_X     equ MAXLINES_X * MAXCOLUMNS_X
;-------------------------------------------------------
;                       testMatX
;=======================================================
                  db MAXCOLUMNS_X           ;           <<< is BYTE
                  db MAXLINES_X             ;           <<< is BYTE
testMatX      dd 11.0,12.0,13.0,14.0    ;     line 1
                  dd 21.0,22.0,23.0,24.0    ; +16 line 2
                  dd 31.0,32.0,33.0,34.0    ; +32 line 3
                  dd 41.0,42.0,43.0,44.0    ; +48 line 4
;-------------------------------------------------------
;                       testMatZ
;=======================================================
ALIGN 16
                  dd ?
                  dd ?
                  dd MAXCOLUMNS_X               ;           <<< is DWORD
                  dd MAXLINES_X                 ;           <<< is DWORD
testMatZ      dd 11.1, 12.2, 13.3, 14.4     ;   line 1
                  dd 21.1, 22.2, 23.3, 24.4     ;   line 2
                  dd 31.1, 32.2, 33.3, 34.4     ;   line 3
                  dd 41.1, 42.2, 43.3, 44.4     ;   line 4
;=======================================================
MAXLINES_Z      equ 7                 ; number of lines
MAXCOLUMNS_Z    equ 8                 ; number of columns
MAXDWORDS_Z     equ MAXLINES_Z * MAXCOLUMNS_Z
;-------------------------------------------------------
;                       testMatXX
;=======================================================
                  db MAXCOLUMNS_Z               ;          <<< is BYTE
                  db MAXLINES_Z                 ;          <<< is BYTE
testMatXX      dd 11.1, 12.2, 13.3, 14.4, 15.5, 16.6, 17.7, 18.8    ;   line 1
                  dd 21.1, 22.2, 23.3, 24.4, 25.5, 26.6, 27.7, 28.8    ;   line 2
                  dd 31.1, 32.2, 33.3, 34.4, 35.5, 36.6, 37.7, 38.8    ;   line 3
                  dd 41.1, 42.2, 43.3, 44.4, 45.5, 46.6, 47.7, 48.8    ;   line 4
                  dd 51.1, 52.2, 53.3, 54.4, 55.5, 56.6, 57.7, 58.8    ;   line 5
                  dd 61.1, 62.2, 63.3, 64.4, 65.5, 66.6, 67.7, 68.8    ;   line 6
                  dd 71.1, 72.2, 73.3, 74.4, 75.5, 76.6, 77.7, 78.8    ;   line 7
;-------------------------------------------------------
;                       testMatZZ
;=======================================================
ALIGN 16
                  dd ?
                  dd ?
                  dd MAXCOLUMNS_Z               ;           <<< is DWORD
                  dd MAXLINES_Z                 ;           <<< is DWORD
testMatZZ      dd 11.1, 12.2, 13.3, 14.4, 15.5, 16.6, 17.7, 18.8    ;   line 1
                  dd 21.1, 22.2, 23.3, 24.4, 25.5, 26.6, 27.7, 28.8    ;   line 2
                  dd 31.1, 32.2, 33.3, 34.4, 35.5, 36.6, 37.7, 38.8    ;   line 3
                  dd 41.1, 42.2, 43.3, 44.4, 45.5, 46.6, 47.7, 48.8    ;   line 4
                  dd 51.1, 52.2, 53.3, 54.4, 55.5, 56.6, 57.7, 58.8    ;   line 5
                  dd 61.1, 62.2, 63.3, 64.4, 65.5, 66.6, 67.7, 68.8    ;   line 6
                  dd 71.1, 72.2, 73.3, 74.4, 75.5, 76.6, 77.7, 78.8    ;   line 7
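To make the layout above easier to picture, here is a rough sketch in C (not MASM, and not the code from the attached .asm files). The names `MatZ`, `MAT_MAX`, `transpose_real4` and `matz_transpose` are made up for illustration. It assumes the transpose is written to a separate destination buffer and that the Z-type procedures read the dimensions from the header, which the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAT_MAX = 7 * 8 };   /* room for the largest test matrix (7x8) */

/* Hypothetical C model of a Z-type matrix: two DWORDs directly in
   front of the elements, columns first, then lines. */
typedef struct {
    uint32_t columns;        /* 8 bytes before the first element */
    uint32_t lines;          /* 4 bytes before the first element */
    float    data[MAT_MAX];  /* REAL4 elements, stored line by line */
} MatZ;

/* Interface in the style of the ...X procedures: address, lines, columns.
   Writes the transpose of src (lines x cols) into dst (cols x lines). */
void transpose_real4(const float *src, float *dst,
                     size_t lines, size_t cols)
{
    for (size_t i = 0; i < lines; ++i)
        for (size_t j = 0; j < cols; ++j)
            dst[j * lines + i] = src[i * cols + j];
}

/* Header-driven variant: the dimensions come from the fields stored in
   front of the data. Out-of-place is an assumption of this sketch, so
   src and dst must be different objects. */
void matz_transpose(const MatZ *src, MatZ *dst)
{
    assert(src != dst);
    assert((size_t)src->lines * src->columns <= MAT_MAX);
    dst->lines   = src->columns;   /* dimensions swap in the result */
    dst->columns = src->lines;
    transpose_real4(src->data, dst->data, src->lines, src->columns);
}
```

For the 7x8 test matrix, element (line i, column j) of the source ends up at (line j, column i) of an 8x7 result, so 18.8 (line 1, column 8) lands in line 8, column 1.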
All results
Quote
RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

399  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
411  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
414  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
417  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
434  cycles, MatrixTransposeZ,   testMatX 4x4- last to first
470  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
476  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
540  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last

1156  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
1170  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
1311  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
1322  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
1326  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
1548  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1610  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
1620  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********

FORTRANS:
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

221  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
222  cycles, MatrixTransposeZ,   testMatX 4x4- last to first
249  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
249  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
251  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
282  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
286  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
337  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last

722  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
725  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
828  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
868  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
872  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
944  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
952  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
1050  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Jochen:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

109  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
115  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
118  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
140  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
145  cycles, MatrixTransposeZ,   testMatX 4x4- last to first
192  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
200  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
264  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first

300  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
310  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
316  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
329  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
332  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
378  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
387  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
402  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
********** END **********
Felipe:
Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz (SSE4)

217  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
219  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
220  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
223  cycles, MatrixTransposeZ,   testMatX 4x4- last to first
230  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
289  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
290  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
304  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last

743  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
743  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
746  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
780  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
826  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
1016  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1021  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
1055  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
aw27:
Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz (SSE4)

68  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
73  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
84  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
87  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
87  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
94  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
97  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
103  cycles, MatrixTransposeDD, testMatZ 4x4- last to first

225  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
231  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
243  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
249  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
255  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
284  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
286  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
289  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********
LiaoMi:

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

79  cycles, MatrixTransposeX,     testMatX 4x4, Lin, Col - last to first
98  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
100  cycles, MatrixTransposeZ,    testMatX 4x4- last to first
101  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
101  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
102  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
117  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
123  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last

258  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
266  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
317  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
333  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
337  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
368  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
375  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
413  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********
Siekmanski:
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

101  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
103  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
117  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
123  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
125  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
133  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
139  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
203  cycles, MatrixTransposeZ,   testMatX 4x4- last to first

318  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
330  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
380  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
380  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
384  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
398  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
430  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
465  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
zedd151:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4) 1.6 Ghz

86  cycles, MatrixTransposeZ,    testMatX 4x4- last to first
87  cycles, MatrixTransposeDD,   testMatZ 4x4- last to first
88  cycles, MatrixTransposeX,    testMatX 4x4, Lin, Col - last to first
90  cycles, MatrixTransposeDDD,  testMatZ 4x4- last to first
91  cycles, MatrixTransposeDF,   testMatZ 4x4- first to last
92  cycles, MatrixTransposeZZ,   testMatX 4x4- last to first
97  cycles, MatrixTransposeXX,   testMatX 4x4, Lin, Col - last to first
99  cycles, MatrixTransposeDFF,  testMatZ 4x4- first to last

272  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
315  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
318  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
323  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
324  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
369  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
402  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
664  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********
HSE:
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

190  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
192  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
203  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
225  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
226  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
248  cycles, MatrixTransposeZ,   testMatX 4x4- last to first
255  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
256  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last

619  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
621  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
668  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
718  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
727  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
752  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
830  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
831  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: aw27 on May 07, 2018, 08:03:04 PM
There was some work done on matrix transposing using SSE instructions (http://masm32.com/board/index.php?topic=6140.0).
A quick test shows that it is a few times faster, and the bigger the matrix, the greater the gain. No wonder, of course.
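As a rough idea of what an SSE transpose looks like, here is a minimal C sketch for the 4x4 REAL4 case using compiler intrinsics. It is not the code from the linked topic. The function name `transpose4x4_sse` is made up, and it assumes an x86 compiler that provides `<xmmintrin.h>`.

```c
#include <assert.h>
#include <xmmintrin.h>   /* SSE intrinsics, including _MM_TRANSPOSE4_PS */

/* Transpose a 4x4 matrix of floats stored line by line.
   Unaligned loads/stores are used so any address works; with 16-byte
   aligned data, _mm_load_ps/_mm_store_ps could be used instead. */
void transpose4x4_sse(const float *src, float *dst)
{
    assert(src && dst);

    __m128 r0 = _mm_loadu_ps(src + 0);    /* line 1 */
    __m128 r1 = _mm_loadu_ps(src + 4);    /* line 2 */
    __m128 r2 = _mm_loadu_ps(src + 8);    /* line 3 */
    __m128 r3 = _mm_loadu_ps(src + 12);   /* line 4 */

    /* Shuffles the four registers so that r0..r3 hold the columns. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    _mm_storeu_ps(dst + 0,  r0);
    _mm_storeu_ps(dst + 4,  r1);
    _mm_storeu_ps(dst + 8,  r2);
    _mm_storeu_ps(dst + 12, r3);
}
```

The whole matrix is handled with four loads, a handful of shuffles and four stores, instead of sixteen separate element moves. That is one plausible reason for the speedup reported above.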
Title: Re: Testing Transpose of a Matrix
Post by: FORTRANS on May 07, 2018, 11:09:05 PM
Hi,

   Windows 2000; "Is not a valid Win32 application."  In the past,
some of these would work after reassembling.

   Windows XP:
222 cycles, MatrixTransposeZ, transposeMatX

221 cycles, MatrixTransposeDD, transposeMatZ

249 cycles, MatrixTransposeDF, transposeMatZ

251 cycles, MatrixTransposeX, transposeMatX

282 cycles, MatrixTransposeZZ, transposeMatX

286 cycles, MatrixTransposeDDD, transposeMatZ

337 cycles, MatrixTransposeDFF, transposeMatZ

249 cycles, MatrixTransposeXX, transposeMatX

725 cycles, MatrixTransposeZ, transposeMatXX

722 cycles, MatrixTransposeDD, transposeMatZZ

828 cycles, MatrixTransposeDF, transposeMatZZ

872 cycles, MatrixTransposeX, transposeMatXX

952 cycles, MatrixTransposeZZ, transposeMatXX

944 cycles, MatrixTransposeDDD, transposeMatZZ

1050 cycles, MatrixTransposeDFF, transposeMatZZ

868 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

221  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
222  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
249  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
249  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
251  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
282  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
286  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
337  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
722  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
725  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
828  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
868  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
872  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
944  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
952  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
1050  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********


Cheers,

Steve N.
Title: Re: Testing Transpose of a Matrix
Post by: FORTRANS on May 08, 2018, 01:45:58 AM
Hi,

Quote from: FORTRANS on May 07, 2018, 11:09:05 PM
   Windows 2000; "Is not a valid Win32 application."  In the past,
some of these would work after reassembling.

   Well, tried that and it worked...

Win2k:

274 cycles, MatrixTransposeZ, transposeMatX

252 cycles, MatrixTransposeDD, transposeMatZ

258 cycles, MatrixTransposeDF, transposeMatZ

273 cycles, MatrixTransposeX, transposeMatX

288 cycles, MatrixTransposeZZ, transposeMatX

280 cycles, MatrixTransposeDDD, transposeMatZ

301 cycles, MatrixTransposeDFF, transposeMatZ

272 cycles, MatrixTransposeXX, transposeMatX

986 cycles, MatrixTransposeZ, transposeMatXX

963 cycles, MatrixTransposeDD, transposeMatZZ

943 cycles, MatrixTransposeDF, transposeMatZZ

1005 cycles, MatrixTransposeX, transposeMatXX

1048 cycles, MatrixTransposeZZ, transposeMatXX

1032 cycles, MatrixTransposeDDD, transposeMatZZ

1165 cycles, MatrixTransposeDFF, transposeMatZZ

999 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

 (SSE1)

252  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
258  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
272  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
273  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
274  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
280  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
288  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
301  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
943  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
963  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
986  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
999  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
1005  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
1032  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1048  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
1165  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********


Win98:


904 cycles, MatrixTransposeZ, transposeMatX

640 cycles, MatrixTransposeDD, transposeMatZ

639 cycles, MatrixTransposeDF, transposeMatZ

914 cycles, MatrixTransposeX, transposeMatX

1014 cycles, MatrixTransposeZZ, transposeMatX

687 cycles, MatrixTransposeDDD, transposeMatZ

692 cycles, MatrixTransposeDFF, transposeMatZ

869 cycles, MatrixTransposeXX, transposeMatX

3474 cycles, MatrixTransposeZ, transposeMatXX

2164 cycles, MatrixTransposeDD, transposeMatZZ

2181 cycles, MatrixTransposeDF, transposeMatZZ

3278 cycles, MatrixTransposeX, transposeMatXX

3595 cycles, MatrixTransposeZZ, transposeMatXX

2322 cycles, MatrixTransposeDDD, transposeMatZZ

2361 cycles, MatrixTransposeDFF, transposeMatZZ

3137 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****


639  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
640  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
687  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
692  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
869  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
904  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
914  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
1014  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
2164  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
2181  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
2322  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
2361  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
3137  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
3278  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
3474  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
3595  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********


Regards,

Steve N.
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 08, 2018, 04:40:12 AM
Quote from: aw27 on May 07, 2018, 08:03:04 PM
There was some work done on matrix transposing using SSE instructions: http://masm32.com/board/index.php?topic=6140.0
A quick test shows that it is a few times faster and the bigger the matrix the faster it is. No wonder, of course.
Hi
   Yes, in general, code that uses SSE instructions and aligned data is faster.
   That is the case here.
:icon14:
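A minimal sketch of the idea, written in C with SSE intrinsics rather than MASM, and not the code from the linked topic: load the four rows of a 4x4 REAL4 matrix into XMM registers, shuffle them with the standard `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>`, and store them back. The name `transpose4x4` and the scalar fallback for builds without SSE are assumptions made for illustration. The aligned loads and stores require both buffers to be 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE1: __m128, _mm_load_ps, _mm_store_ps, _MM_TRANSPOSE4_PS */
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

/* Transpose a row-major 4x4 matrix of float (REAL4), out of place.
   Assumption: src and dst are 16-byte aligned and do not overlap.
   transpose4x4 is a hypothetical name, not a routine from the thread. */
void transpose4x4(const float *src, float *dst)
{
    assert(src != dst);
#if USE_SSE
    /* One aligned load per row. */
    __m128 r0 = _mm_load_ps(src + 0);
    __m128 r1 = _mm_load_ps(src + 4);
    __m128 r2 = _mm_load_ps(src + 8);
    __m128 r3 = _mm_load_ps(src + 12);

    /* In-register transpose built from unpack/move shuffles. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    /* r0..r3 now hold the columns of the source matrix. */
    _mm_store_ps(dst + 0,  r0);
    _mm_store_ps(dst + 4,  r1);
    _mm_store_ps(dst + 8,  r2);
    _mm_store_ps(dst + 12, r3);
#else
    /* Scalar fallback for targets without SSE. */
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 4; j++)
            dst[j * 4 + i] = src[i * 4 + j];
#endif
}
```

The SSE path does no per-element address arithmetic. The whole matrix moves with four loads and four stores, which is one plausible reason a 4x4 case benefits so much.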
EDIT:
These are my results:
Quote
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

180  cycles, MatrixTransposeMO,  testMatZ 4x4
182  cycles, MatrixTransposeAW,  testMatZ 4x4, Lin, Col
398  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
404  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
405  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
413  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last
426  cycles, MatrixTransposeZ,   testMatX 4x4- last to first

458  cycles, MatrixTransposeMO,  testMatZZ 7x8
462  cycles, MatrixTransposeZZ,  testMatX  4x4- last to first
468  cycles, MatrixTransposeAW,  testMatZZ 7x8, Lin, Col
487  cycles, MatrixTransposeDDD, testMatZ  4x4- last to first
528  cycles, MatrixTransposeDFF, testMatZ  4x4- first to last

1153  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
1171  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
1280  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
1294  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
1408  cycles, MatrixTransposeZ,   testMatXX 7x8- last to first
1531  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
1563  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
1620  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
********** END **********

Note: MatrixTransposeAW is your version.
         MatrixTransposeMO is my modified version of your code, where the line and column
         counts are stored just before the data address in both matrices.
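One possible reading of that layout, sketched in C rather than MASM: the caller passes only the address of the first element, and the routine finds the line and column counts in the two DWORDs stored immediately before it. The `Matrix` struct, the 64-element capacity, and the names `header_of` and `transpose_hdr` are all assumptions for illustration. Writing the swapped counts into the destination header is also a guess, and none of this is the actual MatrixTransposeMO code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 32-bit counts, then the REAL4 data. */
typedef struct {
    uint32_t lin;        /* number of lines (rows)          */
    uint32_t col;        /* number of columns               */
    float    data[64];   /* capacity chosen for this sketch */
} Matrix;

/* Recover the header from a pointer to the first element. */
static Matrix *header_of(const float *data)
{
    return (Matrix *)((char *)data - offsetof(Matrix, data));
}

/* Out-of-place transpose that takes only the two data addresses. */
void transpose_hdr(const float *src, float *dst)
{
    const Matrix *s = header_of(src);
    Matrix *d = header_of(dst);
    uint32_t lin = s->lin, col = s->col;

    assert(src != dst);
    assert((size_t)lin * col <= 64);

    /* Assumption: the routine also fills in the destination header. */
    d->lin = col;
    d->col = lin;

    for (uint32_t i = 0; i < lin; i++)
        for (uint32_t j = 0; j < col; j++)
            dst[(size_t)j * lin + i] = src[(size_t)i * col + j];
}
```

With this layout the call needs two arguments instead of four, which matches the time tables where the MO entries carry no "Lin, Col" tag.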
Title: Re: Testing Transpose of a Matrix
Post by: aw27 on May 08, 2018, 01:15:27 PM
Hello Rui,

There are significant differences between your code results and my code results:
These are your code results:
97 cycles, MatrixTransposeZ, transposeMatX

103 cycles, MatrixTransposeDD, transposeMatZ

87 cycles, MatrixTransposeDF, transposeMatZ

73 cycles, MatrixTransposeX, transposeMatX

87 cycles, MatrixTransposeZZ, transposeMatX

94 cycles, MatrixTransposeDDD, transposeMatZ

84 cycles, MatrixTransposeDFF, transposeMatZ

68 cycles, MatrixTransposeXX, transposeMatX

249 cycles, MatrixTransposeZ, transposeMatXX

255 cycles, MatrixTransposeDD, transposeMatZZ

243 cycles, MatrixTransposeDF, transposeMatZZ

225 cycles, MatrixTransposeX, transposeMatXX

289 cycles, MatrixTransposeZZ, transposeMatXX

284 cycles, MatrixTransposeDDD, transposeMatZZ

286 cycles, MatrixTransposeDFF, transposeMatZZ

231 cycles, MatrixTransposeXX, transposeMatXX

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz (SSE4)

68  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
73  cycles, MatrixTransposeX,  testMatX 4x4, Lin, Col - last to first
84  cycles, MatrixTransposeDFF, testMatZ 4x4- first to last
87  cycles, MatrixTransposeZZ,  testMatX 4x4- last to first
87  cycles, MatrixTransposeDF, testMatZ 4x4- first to last
94  cycles, MatrixTransposeDDD, testMatZ 4x4- last to first
97  cycles, MatrixTransposeZ,  testMatX 4x4- last to first
103  cycles, MatrixTransposeDD, testMatZ 4x4- last to first
225  cycles, MatrixTransposeX,  testMatXX 7x8, Lin, Col - last to first
231  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
243  cycles, MatrixTransposeDF, testMatZZ 7x8- first to last
249  cycles, MatrixTransposeZ,  testMatXX 7x8- last to first
255  cycles, MatrixTransposeDD, testMatZZ 7x8- last to first
284  cycles, MatrixTransposeDDD, testMatZZ 7x8- last to first
286  cycles, MatrixTransposeDFF, testMatZZ 7x8- first to last
289  cycles, MatrixTransposeZZ,  testMatXX 7x8- last to first
********** END **********

These are my code results (I reused your test setup):

32 cycles, MatrixTransposeAW, transposeMatX
81 cycles, MatrixTransposeAW, transposeMatXX

It takes 1/3 the number of cycles.
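To put numbers on the "1/3", here is a small C helper that turns two cycle counts into a speedup factor. The function name `speedup` is an assumption for illustration, and the inputs in the note below come straight from the tables above.

```c
#include <assert.h>

/* Speedup of a new routine over a reference one, from their cycle counts.
   speedup is a hypothetical helper name; a result of 3.0 means the new
   routine takes one third of the reference cycles. */
double speedup(unsigned ref_cycles, unsigned new_cycles)
{
    assert(new_cycles > 0);
    return (double)ref_cycles / (double)new_cycles;
}
```

For the 4x4 case, the factor against the best non-SSE entry is 68 / 32 = 2.125, and against the worst it is 103 / 32, about 3.2. For the 7x8 case the range is 225 / 81, about 2.8, to 289 / 81, about 3.6. So "1/3" is a fair summary against the slower routines and a little generous against the fastest ones.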
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 09, 2018, 01:29:54 AM
Hello aw27,
           I think that your results are reasonable, particularly on your i5 (see all of reply #7).
           Using SSE instructions and aligned data, the code is faster in general.
           If you cannot use SSE instructions ...
           
But using your trans32.asm from the transposeTest folder,
I got this on my Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3):
Quote
245 cycles, MatrixTransposeAW, transposeMatX         ; 3 * 245 = 735

661 cycles, MatrixTransposeAW, transposeMatXX       ; 3 * 661 = 1983
Some previous results:
Quote
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

398  cycles, MatrixTransposeXX,  testMatX 4x4, Lin, Col - last to first
404  cycles, MatrixTransposeDD,  testMatZ 4x4- last to first
405  cycles, MatrixTransposeX,   testMatX 4x4, Lin, Col - last to first
413  cycles, MatrixTransposeDF,  testMatZ 4x4- first to last

1153  cycles, MatrixTransposeDF,  testMatZZ 7x8- first to last
1171  cycles, MatrixTransposeDD,  testMatZZ 7x8- last to first
1280  cycles, MatrixTransposeXX,  testMatXX 7x8, Lin, Col - last to first
1294  cycles, MatrixTransposeX,   testMatXX 7x8, Lin, Col - last to first
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 15, 2018, 08:16:25 AM
Hi all,
        I wrote some procedures using SSE instructions (and Siekmanski's proc to transpose 4x4 :t ).
        This work is based on simple concepts and math for kids :biggrin: .
        You can test them on your CPU. If you want, you can post your results.
See you
:icon14:
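The source of the SSE procedures is not shown in this thread, so the following is only a generic sketch of one common way to extend a 4x4 kernel to sizes like 7x8 or 12x12: cover the matrix with full 4x4 tiles and handle the leftover edge rows and columns one element at a time. It is plain C, the names `tile4x4` and `transpose_blocked` are assumptions, and the scalar tile could be swapped for an SSE 4x4 transpose.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose one full 4x4 tile between two row-major matrices.
   src_stride and dst_stride are the row lengths, in elements. */
static void tile4x4(const float *src, size_t src_stride,
                    float *dst, size_t dst_stride)
{
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 4; j++)
            dst[j * dst_stride + i] = src[i * src_stride + j];
}

/* Out-of-place transpose of a rows x cols matrix of float (REAL4).
   dst must hold cols x rows elements and must not overlap src. */
void transpose_blocked(const float *src, float *dst, size_t rows, size_t cols)
{
    size_t r4 = rows - rows % 4;   /* rows covered by full tiles    */
    size_t c4 = cols - cols % 4;   /* columns covered by full tiles */

    assert(src != dst);

    /* Full 4x4 tiles: this is where an SSE kernel would go. */
    for (size_t i = 0; i < r4; i += 4)
        for (size_t j = 0; j < c4; j += 4)
            tile4x4(src + i * cols + j, cols, dst + j * rows + i, rows);

    /* Right edge: leftover columns, all rows. */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = c4; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];

    /* Bottom edge: leftover rows, columns already tiled above. */
    for (size_t i = r4; i < rows; i++)
        for (size_t j = 0; j < c4; j++)
            dst[j * rows + i] = src[i * cols + j];
}
```

In a scheme like this, sizes that are multiples of 4 (4x8, 8x4, 12x12) run entirely through the tile kernel, while 7x7 or 2x4 spend part or all of their time in the scalar edges.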
Some results for SSE12 using AW as a reference:         
Quote
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

68  cycles, MatrixTransposeSSE12,  testMatX 4x4
187  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +119 cycles

99  cycles, MatrixTransposeSSE12,  testMatS 4x8
245  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +146 cycles

119  cycles, MatrixTransposeSSE12,  testMatR 8x4
257  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +138 cycles

192  cycles, MatrixTransposeSSE12,  testMatV 4x2
219  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +27 cycles

194  cycles, MatrixTransposeSSE12,  testMatY 2x4
204  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +10 cycles

255  cycles, MatrixTransposeSSE12,  testMatW 8x7
477  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +222 cycles

376  cycles, MatrixTransposeSSE12,  testMatZ 7x8
611  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +235 cycles

478  cycles, MatrixTransposeSSE12,  testMatT 7x7
552  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +74 cycles

530  cycles, MatrixTransposeSSE12,  testMatQ 12x12
794  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +264 cycles
Title: Re: Testing Transpose of a Matrix
Post by: aw27 on May 15, 2018, 09:33:54 AM
So you copied my work like chinese do, but you mentioned it was based on Siekmanski's. I also used the Siekmanski SSE algo, although my original one was almost as fast. So where the difference comes from? Where is your source code?

Your results are obviously fabricated, even tested in a computer nobody uses anymore.
Title: Re: Testing Transpose of a Matrix
Post by: zedd151 on May 15, 2018, 09:40:19 AM


30 cycles, MatrixTransposeAW, transposeMatX

34 cycles, MatrixTransposeMO, transposeMatX

58 cycles, MatrixTransposeAW, transposeMatY

60 cycles, MatrixTransposeMO, transposeMatY

59 cycles, MatrixTransposeAW, transposeMatV

53 cycles, MatrixTransposeMO, transposeMatV

107 cycles, MatrixTransposeAW, transposeMatZ

105 cycles, MatrixTransposeMO, transposeMatZ

124 cycles, MatrixTransposeAW, transposeMatW

116 cycles, MatrixTransposeMO, transposeMatW

132 cycles, MatrixTransposeAW, transposeMatQ

135 cycles, MatrixTransposeMO, transposeMatQ

50 cycles, MatrixTransposeAW, transposeMatR

45 cycles, MatrixTransposeMO, transposeMatR

44 cycles, MatrixTransposeAW, transposeMatS

37 cycles, MatrixTransposeMO, transposeMatS

156 cycles, MatrixTransposeAW, transposeMatT

151 cycles, MatrixTransposeMO, transposeMatT

22 cycles, MatrixTransposeSSE14, transposeMatX

37 cycles, MatrixTransposeSSE14, transposeMatY

41 cycles, MatrixTransposeSSE14, transposeMatV

56 cycles, MatrixTransposeSSE14, transposeMatZ

58 cycles, MatrixTransposeSSE14, transposeMatW

125 cycles, MatrixTransposeSSE14, transposeMatQ

36 cycles, MatrixTransposeSSE14, transposeMatR

33 cycles, MatrixTransposeSSE14, transposeMatS

66 cycles, MatrixTransposeSSE14, transposeMatT

23 cycles, MatrixTransposeSSE15, transposeMatX

41 cycles, MatrixTransposeSSE15, transposeMatY

43 cycles, MatrixTransposeSSE15, transposeMatV

56 cycles, MatrixTransposeSSE15, transposeMatZ

61 cycles, MatrixTransposeSSE15, transposeMatW

123 cycles, MatrixTransposeSSE15, transposeMatQ

37 cycles, MatrixTransposeSSE15, transposeMatR

33 cycles, MatrixTransposeSSE15, transposeMatS

61 cycles, MatrixTransposeSSE15, transposeMatT

25 cycles, MatrixTransposeSSE16, transposeMatX

35 cycles, MatrixTransposeSSE16, transposeMatY

42 cycles, MatrixTransposeSSE16, transposeMatV

64 cycles, MatrixTransposeSSE16, transposeMatZ

68 cycles, MatrixTransposeSSE16, transposeMatW

139 cycles, MatrixTransposeSSE16, transposeMatQ

41 cycles, MatrixTransposeSSE16, transposeMatR

40 cycles, MatrixTransposeSSE16, transposeMatS

72 cycles, MatrixTransposeSSE16, transposeMatT

25 cycles, MatrixTransposeSSE17, transposeMatX

37 cycles, MatrixTransposeSSE17, transposeMatY

40 cycles, MatrixTransposeSSE17, transposeMatV

65 cycles, MatrixTransposeSSE17, transposeMatZ

67 cycles, MatrixTransposeSSE17, transposeMatW

153 cycles, MatrixTransposeSSE17, transposeMatQ

41 cycles, MatrixTransposeSSE17, transposeMatR

38 cycles, MatrixTransposeSSE17, transposeMatS

70 cycles, MatrixTransposeSSE17, transposeMatT

*** STOP. Press any key to show the Time Table ***

***** Time table *****

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

22  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
23  cycles, MatrixTransposeSSE15,  testMatX 4x4, Lin, Col
25  cycles, MatrixTransposeSSE17,  testMatX 4x4, Lin, Col
25  cycles, MatrixTransposeSSE16,  testMatX 4x4, Lin, Col
30  cycles, MatrixTransposeAW,  testMatX 4x4, Lin, Col
33  cycles, MatrixTransposeSSE15,  testMatS 4x8, Lin, Col
33  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
34  cycles, MatrixTransposeMO,  testMatX 4x4
35  cycles, MatrixTransposeSSE16,  testMatY 2x4, Lin, Col
36  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
37  cycles, MatrixTransposeSSE15,  testMatR 8x4, Lin, Col
37  cycles, MatrixTransposeMO,  testMatS 4x8
37  cycles, MatrixTransposeSSE17,  testMatY 2x4, Lin, Col
37  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
38  cycles, MatrixTransposeSSE17,  testMatS 4x8, Lin, Col
40  cycles, MatrixTransposeSSE17,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeSSE16,  testMatS 4x8, Lin, Col
41  cycles, MatrixTransposeSSE16,  testMatR 8x4, Lin, Col
41  cycles, MatrixTransposeSSE15,  testMatY 2x4, Lin, Col
41  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
41  cycles, MatrixTransposeSSE17,  testMatR 8x4, Lin, Col
42  cycles, MatrixTransposeSSE16,  testMatV 4x2, Lin, Col
43  cycles, MatrixTransposeSSE15,  testMatV 4x2, Lin, Col
44  cycles, MatrixTransposeAW,  testMatS 4x8, Lin, Col
45  cycles, MatrixTransposeMO,  testMatR 8x4
50  cycles, MatrixTransposeAW,  testMatR 8x4, Lin, Col
53  cycles, MatrixTransposeMO,  testMatV 4x2
56  cycles, MatrixTransposeSSE15,  testMatZ 7x8, Lin, Col
56  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
58  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
58  cycles, MatrixTransposeAW,  testMatY 2x4, Lin, Col
59  cycles, MatrixTransposeAW,  testMatV 4x2, Lin, Col
60  cycles, MatrixTransposeMO,  testMatY 2x4
61  cycles, MatrixTransposeSSE15,  testMatW 8x7, Lin, Col
61  cycles, MatrixTransposeSSE15,  testMatT 7x7, Lin, Col
64  cycles, MatrixTransposeSSE16,  testMatZ 7x8, Lin, Col
65  cycles, MatrixTransposeSSE17,  testMatZ 7x8, Lin, Col
66  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
67  cycles, MatrixTransposeSSE17,  testMatW 8x7, Lin, Col
68  cycles, MatrixTransposeSSE16,  testMatW 8x7, Lin, Col
70  cycles, MatrixTransposeSSE17,  testMatT 7x7, Lin, Col
72  cycles, MatrixTransposeSSE16,  testMatT 7x7, Lin, Col
105  cycles, MatrixTransposeMO,  testMatZ 7x8
107  cycles, MatrixTransposeAW,  testMatZ 7x8, Lin, Col
116  cycles, MatrixTransposeMO,  testMatW 8x7
123  cycles, MatrixTransposeSSE15,  testMatQ 12x12, Lin, Col
124  cycles, MatrixTransposeAW,  testMatW 8x7, Lin, Col
125  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
132  cycles, MatrixTransposeAW,  testMatQ 12x12, Lin, Col
135  cycles, MatrixTransposeMO,  testMatQ 12x12
139  cycles, MatrixTransposeSSE16,  testMatQ 12x12, Lin, Col
151  cycles, MatrixTransposeMO,  testMatT 7x7
153  cycles, MatrixTransposeSSE17,  testMatQ 12x12, Lin, Col
156  cycles, MatrixTransposeAW,  testMatT 7x7, Lin, Col
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 15, 2018, 10:59:39 AM
Quote from: aw27 on May 15, 2018, 09:33:54 AM
So you copied my work like chinese do, but you mentioned it was based on Siekmanski's. I also used the Siekmanski SSE algo, although my original one was almost as fast. So where the difference comes from? Where is your source code?

Your results are obviously fabricated, even tested in a computer nobody uses anymore.
Hi aw27,
            Sorry, but you are not right; I don't need to use any part of your algorithm. I don't think the way you do, and I didn't base my algos on what you did. I know enough assembly and math to do what I set out to do. The calculator does matrix transpose ... and everything in it was written by me. All of it. Understanding what was done is a different matter. So it is correct to say that the SSE2 to SSE17 procedures use that block of Siekmanski's code (but not as is). But is it the same one you use? You give the answer. To answer your question "So where the difference comes from?" I would say: think about it again. Think about what you have, what you want to get, and the many different ways to solve that problem.
I am still working on this, so I have a lot of work left to do. That's all.
See you
:icon14:   
Thank you zedd151  :t
By the way, are your results fabricated, zedd151? So you get a lot of money ...  ;)
Title: Re: Testing Transpose of a Matrix
Post by: zedd151 on May 15, 2018, 11:13:35 AM
Quote from: RuiLoureiro on May 15, 2018, 10:59:39 AM
By the way, are your results fabricated, zedd151? So you get a lot of money ...  ;)

What??? I was just testing the performance of my new netbook.   :(

nothing fabricated   :icon_confused:
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 15, 2018, 11:37:26 AM
Quote from: zedd151 on May 15, 2018, 11:13:35 AM
Quote from: RuiLoureiro on May 15, 2018, 10:59:39 AM
By the way, are your results fabricated, zedd151? So you get a lot of money ...  ;)

What??? I was just testing the performance of my new netbook.   :(

nothing fabricated   :icon_confused:
:t
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 16, 2018, 02:02:38 AM
Quote from: aw27 on May 15, 2018, 09:33:54 AM
Your results are obviously fabricated,...
Hi aw27,
              About what you say above, I have a question for you: is that what you usually do?
I never do it, and I never did it. Not here, and not anywhere else in the world. I have been here, on this forum, since 2007.
Do you have any doubt?
:icon14:
Title: Re: Testing Transpose of a Matrix
Post by: LiaoMi on May 16, 2018, 02:44:47 AM
TestTranspose_cyclesSSE10_13
34 cycles, MatrixTransposeAW, transposeMatX

34 cycles, MatrixTransposeMO, transposeMatX

40 cycles, MatrixTransposeAW, transposeMatY

44 cycles, MatrixTransposeMO, transposeMatY

42 cycles, MatrixTransposeAW, transposeMatV

43 cycles, MatrixTransposeMO, transposeMatV

85 cycles, MatrixTransposeAW, transposeMatZ

84 cycles, MatrixTransposeMO, transposeMatZ

91 cycles, MatrixTransposeAW, transposeMatW

90 cycles, MatrixTransposeMO, transposeMatW

117 cycles, MatrixTransposeAW, transposeMatQ

119 cycles, MatrixTransposeMO, transposeMatQ

48 cycles, MatrixTransposeAW, transposeMatR

47 cycles, MatrixTransposeMO, transposeMatR

55 cycles, MatrixTransposeAW, transposeMatS

42 cycles, MatrixTransposeMO, transposeMatS

133 cycles, MatrixTransposeAW, transposeMatT

116 cycles, MatrixTransposeMO, transposeMatT

25 cycles, MatrixTransposeSSE10, transposeMatX

35 cycles, MatrixTransposeSSE10, transposeMatY

40 cycles, MatrixTransposeSSE10, transposeMatV

57 cycles, MatrixTransposeSSE10, transposeMatZ

59 cycles, MatrixTransposeSSE10, transposeMatW

104 cycles, MatrixTransposeSSE10, transposeMatQ

36 cycles, MatrixTransposeSSE10, transposeMatR

34 cycles, MatrixTransposeSSE10, transposeMatS

58 cycles, MatrixTransposeSSE10, transposeMatT

25 cycles, MatrixTransposeSSE11, transposeMatX

35 cycles, MatrixTransposeSSE11, transposeMatY

41 cycles, MatrixTransposeSSE11, transposeMatV

55 cycles, MatrixTransposeSSE11, transposeMatZ

59 cycles, MatrixTransposeSSE11, transposeMatW

106 cycles, MatrixTransposeSSE11, transposeMatQ

35 cycles, MatrixTransposeSSE11, transposeMatR

34 cycles, MatrixTransposeSSE11, transposeMatS

58 cycles, MatrixTransposeSSE11, transposeMatT

26 cycles, MatrixTransposeSSE12, transposeMatX

47 cycles, MatrixTransposeSSE12, transposeMatY

60 cycles, MatrixTransposeSSE12, transposeMatV

92 cycles, MatrixTransposeSSE12, transposeMatZ

90 cycles, MatrixTransposeSSE12, transposeMatW

126 cycles, MatrixTransposeSSE12, transposeMatQ

41 cycles, MatrixTransposeSSE12, transposeMatR

45 cycles, MatrixTransposeSSE12, transposeMatS

106 cycles, MatrixTransposeSSE12, transposeMatT

38 cycles, MatrixTransposeSSE13, transposeMatX

64 cycles, MatrixTransposeSSE13, transposeMatY

71 cycles, MatrixTransposeSSE13, transposeMatV

74 cycles, MatrixTransposeSSE13, transposeMatZ

82 cycles, MatrixTransposeSSE13, transposeMatW

147 cycles, MatrixTransposeSSE13, transposeMatQ

57 cycles, MatrixTransposeSSE13, transposeMatR

47 cycles, MatrixTransposeSSE13, transposeMatS

91 cycles, MatrixTransposeSSE13, transposeMatT

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

25  cycles, MatrixTransposeSSE11,  testMatX 4x4
25  cycles, MatrixTransposeSSE10,  testMatX 4x4
26  cycles, MatrixTransposeSSE12,  testMatX 4x4
34  cycles, MatrixTransposeSSE10,  testMatS 4x8
34  cycles, MatrixTransposeSSE11,  testMatS 4x8
34  cycles, MatrixTransposeMO,  testMatX 4x4
34  cycles, MatrixTransposeAW,  testMatX 4x4, Lin, Col
35  cycles, MatrixTransposeSSE11,  testMatR 8x4
35  cycles, MatrixTransposeSSE11,  testMatY 2x4
35  cycles, MatrixTransposeSSE10,  testMatY 2x4
36  cycles, MatrixTransposeSSE10,  testMatR 8x4
38  cycles, MatrixTransposeSSE13,  testMatX 4x4
40  cycles, MatrixTransposeSSE10,  testMatV 4x2
40  cycles, MatrixTransposeAW,  testMatY 2x4, Lin, Col
41  cycles, MatrixTransposeSSE12,  testMatR 8x4
41  cycles, MatrixTransposeSSE11,  testMatV 4x2
42  cycles, MatrixTransposeAW,  testMatV 4x2, Lin, Col
42  cycles, MatrixTransposeMO,  testMatS 4x8
43  cycles, MatrixTransposeMO,  testMatV 4x2
44  cycles, MatrixTransposeMO,  testMatY 2x4
45  cycles, MatrixTransposeSSE12,  testMatS 4x8
47  cycles, MatrixTransposeSSE13,  testMatS 4x8
47  cycles, MatrixTransposeSSE12,  testMatY 2x4
47  cycles, MatrixTransposeMO,  testMatR 8x4
48  cycles, MatrixTransposeAW,  testMatR 8x4, Lin, Col
55  cycles, MatrixTransposeSSE11,  testMatZ 7x8
55  cycles, MatrixTransposeAW,  testMatS 4x8, Lin, Col
57  cycles, MatrixTransposeSSE10,  testMatZ 7x8
57  cycles, MatrixTransposeSSE13,  testMatR 8x4
58  cycles, MatrixTransposeSSE11,  testMatT 7x7
58  cycles, MatrixTransposeSSE10,  testMatT 7x7
59  cycles, MatrixTransposeSSE11,  testMatW 8x7
59  cycles, MatrixTransposeSSE10,  testMatW 8x7
60  cycles, MatrixTransposeSSE12,  testMatV 4x2
64  cycles, MatrixTransposeSSE13,  testMatY 2x4
71  cycles, MatrixTransposeSSE13,  testMatV 4x2
74  cycles, MatrixTransposeSSE13,  testMatZ 7x8
82  cycles, MatrixTransposeSSE13,  testMatW 8x7
84  cycles, MatrixTransposeMO,  testMatZ 7x8
85  cycles, MatrixTransposeAW,  testMatZ 7x8, Lin, Col
90  cycles, MatrixTransposeSSE12,  testMatW 8x7
90  cycles, MatrixTransposeMO,  testMatW 8x7
91  cycles, MatrixTransposeSSE13,  testMatT 7x7
91  cycles, MatrixTransposeAW,  testMatW 8x7, Lin, Col
92  cycles, MatrixTransposeSSE12,  testMatZ 7x8
104  cycles, MatrixTransposeSSE10,  testMatQ 12x12
106  cycles, MatrixTransposeSSE12,  testMatT 7x7
106  cycles, MatrixTransposeSSE11,  testMatQ 12x12
116  cycles, MatrixTransposeMO,  testMatT 7x7
117  cycles, MatrixTransposeAW,  testMatQ 12x12, Lin, Col
119  cycles, MatrixTransposeMO,  testMatQ 12x12
126  cycles, MatrixTransposeSSE12,  testMatQ 12x12
133  cycles, MatrixTransposeAW,  testMatT 7x7, Lin, Col
147  cycles, MatrixTransposeSSE13,  testMatQ 12x12
********** END **********


TestTranspose_cyclesSSE14_17
33 cycles, MatrixTransposeAW, transposeMatX

35 cycles, MatrixTransposeMO, transposeMatX

40 cycles, MatrixTransposeAW, transposeMatY

44 cycles, MatrixTransposeMO, transposeMatY

40 cycles, MatrixTransposeAW, transposeMatV

48 cycles, MatrixTransposeMO, transposeMatV

85 cycles, MatrixTransposeAW, transposeMatZ

84 cycles, MatrixTransposeMO, transposeMatZ

89 cycles, MatrixTransposeAW, transposeMatW

93 cycles, MatrixTransposeMO, transposeMatW

117 cycles, MatrixTransposeAW, transposeMatQ

116 cycles, MatrixTransposeMO, transposeMatQ

45 cycles, MatrixTransposeAW, transposeMatR

48 cycles, MatrixTransposeMO, transposeMatR

45 cycles, MatrixTransposeAW, transposeMatS

41 cycles, MatrixTransposeMO, transposeMatS

115 cycles, MatrixTransposeAW, transposeMatT

115 cycles, MatrixTransposeMO, transposeMatT

20 cycles, MatrixTransposeSSE14, transposeMatX

33 cycles, MatrixTransposeSSE14, transposeMatY

37 cycles, MatrixTransposeSSE14, transposeMatV

51 cycles, MatrixTransposeSSE14, transposeMatZ

54 cycles, MatrixTransposeSSE14, transposeMatW

100 cycles, MatrixTransposeSSE14, transposeMatQ

32 cycles, MatrixTransposeSSE14, transposeMatR

32 cycles, MatrixTransposeSSE14, transposeMatS

55 cycles, MatrixTransposeSSE14, transposeMatT

21 cycles, MatrixTransposeSSE15, transposeMatX

32 cycles, MatrixTransposeSSE15, transposeMatY

37 cycles, MatrixTransposeSSE15, transposeMatV

50 cycles, MatrixTransposeSSE15, transposeMatZ

54 cycles, MatrixTransposeSSE15, transposeMatW

97 cycles, MatrixTransposeSSE15, transposeMatQ

31 cycles, MatrixTransposeSSE15, transposeMatR

31 cycles, MatrixTransposeSSE15, transposeMatS

55 cycles, MatrixTransposeSSE15, transposeMatT

21 cycles, MatrixTransposeSSE16, transposeMatX

30 cycles, MatrixTransposeSSE16, transposeMatY

40 cycles, MatrixTransposeSSE16, transposeMatV

52 cycles, MatrixTransposeSSE16, transposeMatZ

57 cycles, MatrixTransposeSSE16, transposeMatW

106 cycles, MatrixTransposeSSE16, transposeMatQ

33 cycles, MatrixTransposeSSE16, transposeMatR

34 cycles, MatrixTransposeSSE16, transposeMatS

60 cycles, MatrixTransposeSSE16, transposeMatT

21 cycles, MatrixTransposeSSE17, transposeMatX

30 cycles, MatrixTransposeSSE17, transposeMatY

33 cycles, MatrixTransposeSSE17, transposeMatV

52 cycles, MatrixTransposeSSE17, transposeMatZ

57 cycles, MatrixTransposeSSE17, transposeMatW

108 cycles, MatrixTransposeSSE17, transposeMatQ

33 cycles, MatrixTransposeSSE17, transposeMatR

34 cycles, MatrixTransposeSSE17, transposeMatS

57 cycles, MatrixTransposeSSE17, transposeMatT

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

20  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
21  cycles, MatrixTransposeSSE16,  testMatX 4x4, Lin, Col
21  cycles, MatrixTransposeSSE15,  testMatX 4x4, Lin, Col
21  cycles, MatrixTransposeSSE17,  testMatX 4x4, Lin, Col
30  cycles, MatrixTransposeSSE16,  testMatY 2x4, Lin, Col
30  cycles, MatrixTransposeSSE17,  testMatY 2x4, Lin, Col
31  cycles, MatrixTransposeSSE15,  testMatS 4x8, Lin, Col
31  cycles, MatrixTransposeSSE15,  testMatR 8x4, Lin, Col
32  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
32  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
32  cycles, MatrixTransposeSSE15,  testMatY 2x4, Lin, Col
33  cycles, MatrixTransposeSSE17,  testMatR 8x4, Lin, Col
33  cycles, MatrixTransposeSSE17,  testMatV 4x2, Lin, Col
33  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
33  cycles, MatrixTransposeSSE16,  testMatR 8x4, Lin, Col
33  cycles, MatrixTransposeAW,  testMatX 4x4, Lin, Col
34  cycles, MatrixTransposeSSE17,  testMatS 4x8, Lin, Col
34  cycles, MatrixTransposeSSE16,  testMatS 4x8, Lin, Col
35  cycles, MatrixTransposeMO,  testMatX 4x4
37  cycles, MatrixTransposeSSE15,  testMatV 4x2, Lin, Col
37  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeAW,  testMatY 2x4, Lin, Col
40  cycles, MatrixTransposeSSE16,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeAW,  testMatV 4x2, Lin, Col
41  cycles, MatrixTransposeMO,  testMatS 4x8
44  cycles, MatrixTransposeMO,  testMatY 2x4
45  cycles, MatrixTransposeAW,  testMatR 8x4, Lin, Col
45  cycles, MatrixTransposeAW,  testMatS 4x8, Lin, Col
48  cycles, MatrixTransposeMO,  testMatV 4x2
48  cycles, MatrixTransposeMO,  testMatR 8x4
50  cycles, MatrixTransposeSSE15,  testMatZ 7x8, Lin, Col
51  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
52  cycles, MatrixTransposeSSE16,  testMatZ 7x8, Lin, Col
52  cycles, MatrixTransposeSSE17,  testMatZ 7x8, Lin, Col
54  cycles, MatrixTransposeSSE15,  testMatW 8x7, Lin, Col
54  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
55  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
55  cycles, MatrixTransposeSSE15,  testMatT 7x7, Lin, Col
57  cycles, MatrixTransposeSSE17,  testMatT 7x7, Lin, Col
57  cycles, MatrixTransposeSSE17,  testMatW 8x7, Lin, Col
57  cycles, MatrixTransposeSSE16,  testMatW 8x7, Lin, Col
60  cycles, MatrixTransposeSSE16,  testMatT 7x7, Lin, Col
84  cycles, MatrixTransposeMO,  testMatZ 7x8
85  cycles, MatrixTransposeAW,  testMatZ 7x8, Lin, Col
89  cycles, MatrixTransposeAW,  testMatW 8x7, Lin, Col
93  cycles, MatrixTransposeMO,  testMatW 8x7
97  cycles, MatrixTransposeSSE15,  testMatQ 12x12, Lin, Col
100  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
106  cycles, MatrixTransposeSSE16,  testMatQ 12x12, Lin, Col
108  cycles, MatrixTransposeSSE17,  testMatQ 12x12, Lin, Col
115  cycles, MatrixTransposeMO,  testMatT 7x7
115  cycles, MatrixTransposeAW,  testMatT 7x7, Lin, Col
116  cycles, MatrixTransposeMO,  testMatQ 12x12
117  cycles, MatrixTransposeAW,  testMatQ 12x12, Lin, Col
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 16, 2018, 05:17:31 AM
Hi LiaoMi,
               Thanks for your work  :t
:icon14:
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 16, 2018, 08:19:23 AM
Hi all,

HERE are all results so far:
Note: the AW procedure is used as a reference (I don't know of any other).
          I am using SSE14 and SSE10, but you can make the same list for all the others.

Good luck
Quote
RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

85  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
186  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +101 cycles

133  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
233  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +100 cycles

135  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
256  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +121 cycles

178  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
203  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +25 cycles

185  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
219  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +34 cycles

279  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
465  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +186 cycles

280  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
564  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +284 cycles

397  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
473  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +76 cycles

560  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
814  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +254 cycles

zedd151:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

22  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
30  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col  ; +8 cycles

33  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
44  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col  ; +11 cycles

36  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
50  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col  ; +14 cycles

37  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
58  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col  ; +21 cycles

41  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col 
59  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col  ; +18 cycles

56  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
107  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col  ; +51 cycles

58  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
124  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col  ; +66 cycles

66  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
156  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col  ; +90 cycles

125  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
132  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col ; +7 cycles

LiaoMi:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

20  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
33  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +13 cycles

32  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
45  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +13 cycles

32  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
45  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +13 cycles

33  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
40  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +7 cycles

37  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +3 cycles

51  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
85  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +34 cycles

54  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
89  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +35 cycles

55  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
115  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +60 cycles

100  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
117  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +17 cycles

RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

85  cycles, MatrixTransposeSSE10,  testMatX 4x4
187  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +102 cycles

136  cycles, MatrixTransposeSSE10,  testMatS 4x8
245  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +109 cycles

137  cycles, MatrixTransposeSSE10,  testMatR 8x4
257  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +120 cycles

184  cycles, MatrixTransposeSSE10,  testMatY 2x4
204  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +20   cycles

196  cycles, MatrixTransposeSSE10,  testMatV 4x2
219  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +23   cycles

280  cycles, MatrixTransposeSSE10,  testMatW 8x7
477  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +197 cycles

387  cycles, MatrixTransposeSSE10,  testMatZ 7x8
611  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +224 cycles

478  cycles, MatrixTransposeSSE10,  testMatT 7x7
552  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +74 cycles

554  cycles, MatrixTransposeSSE10,  testMatQ 12x12
794  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +240 cycles

LiaoMi:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

25  cycles, MatrixTransposeSSE10,  testMatX 4x4
34  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +9 cycles

34  cycles, MatrixTransposeSSE10,  testMatS 4x8
55  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +21 cycles

35  cycles, MatrixTransposeSSE10,  testMatY 2x4
40  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +5 cycles

36  cycles, MatrixTransposeSSE10,  testMatR 8x4
48  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +12 cycles

40  cycles, MatrixTransposeSSE10,  testMatV 4x2
42  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +2 cycles

57  cycles, MatrixTransposeSSE10,  testMatZ 7x8
85  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +28 cycles

58  cycles, MatrixTransposeSSE10,  testMatT 7x7
133  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +75 cycles

59  cycles, MatrixTransposeSSE10,  testMatW 8x7
91  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +32 cycles

104  cycles, MatrixTransposeSSE10,  testMatQ 12x12
117  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +13 cycles
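
The "+N cycles" notes above are absolute differences, which are hard to compare between a Pentium 4 and an i7. As a small worked example, the Python sketch below recomputes the difference for three of the 4x4 rows copied from the tables above and adds the AW/SSE14 ratio. The function name `compare` is hypothetical and only used for illustration.

```python
def compare(sse_cycles, aw_cycles):
    """Return (extra cycles needed by AW, AW cycles as a multiple of the SSE cycles)."""
    return aw_cycles - sse_cycles, aw_cycles / sse_cycles


# (CPU, matrix, SSE14 cycles, AW cycles), copied from the tables above
samples = [
    ("Pentium 4", "4x4", 85, 186),
    ("A6-9220e", "4x4", 22, 30),
    ("i7-4810MQ", "4x4", 20, 33),
]

if __name__ == "__main__":
    for cpu, size, sse, aw in samples:
        diff, ratio = compare(sse, aw)
        print(f"{cpu:10s} {size}: +{diff} cycles, AW/SSE14 = {ratio:.2f}")
```

The ratio puts all three machines on the same scale, while the raw difference mostly reflects how slow each CPU is overall.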
Title: Re: Testing Transpose of a Matrix
Post by: aw27 on May 16, 2018, 02:54:47 PM
Quote
Hi aw27,
About what you say above, I have a question for you: is that what you usually do?
I never do it, I never did it. Not here, not in any part of the world. I have been here, in this forum, since 2007.
Do you have any doubt?

Of course, I have all doubts about you. You are using my nick and code in tests for which you don't even supply the source code. You look a bit sneaky, so what are you hiding or what are you trying to prove?
Are you trying to prove that you are a smart little guy or do you want to compete with me?
Title: Re: Testing Transpose of a Matrix
Post by: jj2007 on May 16, 2018, 03:39:43 PM
Hi Rui,
The last results look really fast :t
Can't test on this dumbphone, though.
And of course, almost everybody here has full confidence in you, don't worry ;-)
Title: Re: Testing Transpose of a Matrix
Post by: LiaoMi on May 17, 2018, 12:35:43 AM
Quote from: aw27 on May 16, 2018, 02:54:47 PM
Quote
Hi aw27,
About what you say above, I have a question for you: is that what you usually do?
I never do it, I never did it. Not here, not in any part of the world. I have been here, in this forum, since 2007.
Do you have any doubt?

Of course, I have all doubts about you. You are using my nick and code in tests for which you don't even supply the source code. You look a bit sneaky, so what are you hiding or what are you trying to prove?
Are you trying to prove that you are a smart little guy or do you want to compete with me?

:biggrin: https://www.youtube.com/watch?v=9LZ35Ar3r2k
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 18, 2018, 11:35:30 PM
Hi all,
       When we develop a new algorithm to solve a problem, we need to study the problem first. In the first post I showed algorithms that transpose one dword at a time. Now the problem is to transpose blocks of 4 lines x 4 columns at a time using SSE instructions.
       When we write the code that implements the algorithm, we need some tests to confirm that it follows the algorithm correctly. After that we need many more tests to confirm that it does what it should do. Here we may find out that the algorithm does not handle some cases as it should.
       So we should not show it before it is completely tested.
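
To make the 4x4 block idea concrete, here is a minimal Python sketch that models one well-known SSE sequence (unpcklps/unpckhps followed by movlhps/movhlps) on 4-element lists. The SSE procedures tested in this thread are not shown, so this is only an assumption about how a single 4x4 tile of REAL4 values might be handled; the Python helper names are hypothetical stand-ins for the instructions of the same name.

```python
# Each "register" is modelled as a list of 4 values (one XMM register of REAL4).

def unpcklps(dst, src):
    """Interleave the two low elements of dst and src."""
    return [dst[0], src[0], dst[1], src[1]]


def unpckhps(dst, src):
    """Interleave the two high elements of dst and src."""
    return [dst[2], src[2], dst[3], src[3]]


def movlhps(dst, src):
    """Copy the low half of src into the high half of dst; low half of dst is kept."""
    return [dst[0], dst[1], src[0], src[1]]


def movhlps(dst, src):
    """Copy the high half of src into the low half of dst; high half of dst is kept."""
    return [src[2], src[3], dst[2], dst[3]]


def transpose4x4(r0, r1, r2, r3):
    """Transpose four rows a, b, c, d of a 4x4 tile using 8 shuffle steps."""
    t0 = unpcklps(r0, r1)   # a0 b0 a1 b1
    t1 = unpcklps(r2, r3)   # c0 d0 c1 d1
    t2 = unpckhps(r0, r1)   # a2 b2 a3 b3
    t3 = unpckhps(r2, r3)   # c2 d2 c3 d3
    return [
        movlhps(t0, t1),    # a0 b0 c0 d0
        movhlps(t1, t0),    # a1 b1 c1 d1
        movlhps(t2, t3),    # a2 b2 c2 d2
        movhlps(t3, t2),    # a3 b3 c3 d3
    ]


if __name__ == "__main__":
    tile = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    for row in transpose4x4(*tile):
        print(row)
```

In a real procedure each tile would be loaded from four source lines and stored to four destination lines; lines and columns left over when a dimension is not a multiple of 4 need separate handling.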

       As I see it, it is not correct to use code or a proc written by a member X without identifying it. That is the only reason why I used the identification; there is no other reason behind it, and nothing here is against anyone. When someone shows test results, they are only the results of that set of tests, no more than that.

aw27,
«Of course, I have all doubts about you...»
       About what you say above, it seems that the only thing you want is to "see" the algorithm that
       I haven't shown yet. But I suppose you know very well that anyone may solve a problem in many different ways. Your procedure doesn't follow the algorithm behind the SSE procedures that I am writing and testing. I also want to say that I don't need to use
what I am writing. I do it to pass the time and because I like to study...

Cheers
Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on May 19, 2018, 12:23:41 AM
Hi Rui,

Ever considered using the video card for matrix transpose calculations?
The results are very fast for any size of matrix and any data type.
One restriction: it can't be done in a console app.
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 19, 2018, 12:52:01 AM
Quote from: Siekmanski on May 19, 2018, 12:23:41 AM
Hi Rui,

Ever considered using the video card for matrix transpose calculations?
The results are very fast for any size of matrix and any data type.
One restriction: it can't be done in a console app.
Hi Siekmanski,
              No. I only use matrix transpose calculations in my TheCalculator, but there the elements are REAL10
              and all matrices go up to 20x20 only, or 20x21, I am not sure now (time is not a problem here).
              I take it that you use the video card for matrix transpose calculations,
              so you have your own algorithm to do it. I say this because I read some topics
              where you gave answers about this issue.
              Thanks for all  :t
Cheers
Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on May 19, 2018, 01:05:09 AM
 :t
Title: Re: Testing Transpose of a Matrix
Post by: dedndave on May 19, 2018, 05:28:07 AM
create a matrix of pointers to the real10's
transpose the pointer matrix   :biggrin:
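
A rough sketch of this idea in Python, using indices into a flat list in place of pointers (an assumption made for illustration, since Python has no raw pointers). All function names are hypothetical.

```python
def make_pointer_matrix(rows, cols):
    """Row-major table of 'pointers' (here: indices into the flat element storage)."""
    return [[r * cols + c for c in range(cols)] for r in range(rows)]


def transpose_pointers(ptrs):
    """Transpose only the pointer table; the elements themselves are not touched."""
    rows, cols = len(ptrs), len(ptrs[0])
    return [[ptrs[r][c] for r in range(rows)] for c in range(cols)]


def read_matrix(storage, ptrs):
    """Follow every pointer to build the matrix that the table describes."""
    return [[storage[p] for p in row] for row in ptrs]


if __name__ == "__main__":
    storage = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # a 2x3 matrix; stand-ins for the real10 values
    ptrs = make_pointer_matrix(2, 3)
    ptrs_t = transpose_pointers(ptrs)
    print(read_matrix(storage, ptrs_t))
```

The elements stay where they are; the price is one extra table of pointers per matrix.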
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on May 19, 2018, 07:38:35 AM
Quote from: dedndave on May 19, 2018, 05:28:07 AM
create a matrix of pointers to the real10's
transpose the pointer matrix   :biggrin:
Hi Dave,
             How are you ? I hope you are fine !

Well, it is well known that when we don't want to move a lot of array elements
we use an array of pointers: it is an array of dwords...
Your solution seems to be expensive: each matrix needs its own array of pointers.
If we define 100 matrices we need to define 200 arrays (100 for the matrices + 100 for the pointers).
It seems so because if a=[1,2;3,4] and we do a=a^t, where is a now? What are the elements of a?

(In TheCalculator, when we write a matrix name "a" and press enter/compute it shows "a")
TheCalculator uses 16 bytes for each real10 so...

            Dave, do you still have the same Pentium 4 CPU? Do you remember why?
Good luck :t  
             
Title: Re: Testing Transpose of a Matrix
Post by: dedndave on May 21, 2018, 03:17:42 AM
hi Rui  :biggrin:
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 02, 2018, 06:49:37 AM
Hi all,
        Inside SSE30_33_tests.zip are my results for a set of tests
        of the SSE30 to SSE33 procedures, which transpose a matrix of any size
        from 1x1 up to NxM. You may test them too and, if you want, post your results
        if you have an i5 / i7 / AMD CPU or better. Your contribution may help
        me to understand what I should do next. I still have a very slow P4.
        You may run all the tests, but I would like to know the results for these:

                TestTranspose97x97_100x100_cyclesSSE30_33_100000
                TestTranspose250x256_evenlines_cyclesSSE30_33_10000
        I am developing and testing new algorithms, SSE38_41, but
        time doesn't stretch... and I am trying to optimize some
        cases, so I need more time... and more tests. For example, I don't
        know if push/pop esi is better than a local variable... do you know?
       
        Thanks
        Good luck :t
       
Note: 1000000/100000/10000 is the loop counter used.
      At the end of VerifyProcsFrom1x1_to_120x120_SSE30_33
      all 4 procs are tested using matrices from 1x1 up to 120x120
      defined in the .data? segment. I wrote a general proc
      for any size NxM, but it does not work for procedures that
      don't use the dimensions behind the address.
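
The kind of check described in the note can be sketched as follows, in Python rather than MASM. It compares a blocked transpose (4x4 tiles plus the leftover lines and columns) against a plain element-by-element reference for every size from 1x1 up to a limit. The function names and the flat row-major layout are assumptions; the source of VerifyProcsFrom1x1_to_120x120_SSE30_33 is not shown in this thread.

```python
def transpose_reference(src, rows, cols):
    """Element-by-element transpose of a flat row-major rows x cols matrix."""
    dst = [0.0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            dst[c * rows + r] = src[r * cols + c]
    return dst


def transpose_blocked(src, rows, cols, tile=4):
    """Transpose tile x tile blocks; partial blocks at the right and bottom
    edges are handled by clamping the block bounds to the matrix size."""
    dst = [0.0] * (rows * cols)
    for r0 in range(0, rows, tile):
        r1 = min(r0 + tile, rows)
        for c0 in range(0, cols, tile):
            c1 = min(c0 + tile, cols)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    dst[c * rows + r] = src[r * cols + c]
    return dst


def verify_all_sizes(max_rows, max_cols):
    """Return the list of (rows, cols) sizes where the two procedures disagree."""
    failures = []
    for rows in range(1, max_rows + 1):
        for cols in range(1, max_cols + 1):
            src = [float(i) for i in range(rows * cols)]
            expected = transpose_reference(src, rows, cols)
            if transpose_blocked(src, rows, cols) != expected:
                failures.append((rows, cols))
    return failures


if __name__ == "__main__":
    print("sizes that failed:", verify_all_sizes(24, 24))
```

Filling each matrix with distinct values (here 0, 1, 2, ...) matters: a matrix of equal values would pass even if elements landed in the wrong place.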

Sample for some types of matrices, like 256x256 (RE = REference):

***** Time table - LoopCount =10 000 *****

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

274723  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
277606  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
278301  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
280608  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
288570  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col

276360  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
277126  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
278578  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
289265  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
291922  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col

277665  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
278713  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
301969  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
302056  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
305275  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col

277906  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
280112  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
281857  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
283046  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
300882  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col

278085  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
281408  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
289149  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
303014  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
303088  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col

280843  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
282137  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
282411  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
295377  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
298502  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col

284465  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
289484  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
295322  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
307567  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
307757  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col

287766  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
287891  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
307719  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
307860  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
308480  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col

301073  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
302764  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
303633  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
303680  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
317857  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col

302701  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
305413  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
308789  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
309537  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
311748  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col

305303  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
309247  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
310500  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
317925  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col
316275  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col

309104  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
309146  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
309344  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
315558  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
320653  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col

314387  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
315688  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
318761  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
321526  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
323225  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col

315387  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
315677  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
316444  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
322256  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
324936  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col

318743  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
322824  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
324874  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
328921  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
340988  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col

319148  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
319973  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
321797  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
321990  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
337617  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col


Siekmanski:

***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

169683  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170152  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170215  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
171765  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col

170144  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170190  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170553  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
172547  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
177553  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col

172411  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172939  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
174693  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
175477  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col

173518  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
176945  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col

175354  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175500  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col
176458  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col

176490  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col
176651  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col

178076  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col

179118  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col

181607  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col   ; <<<<<-----
182332  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182494  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
189825  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col

181717  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col
182570  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col

182441  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182963  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
183670  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183767  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col

183673  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col   ; <<<<<-----
184208  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col

184616  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col   ; <<<<<-----
185553  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185616  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185849  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col

185145  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col    ; <<<<<-----
185595  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185650  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col

185163  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
186095  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col

187866  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col

Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on June 02, 2018, 07:22:40 AM
"TestTranspose97x97_100x100_cyclesSSE30_33_100000"

***** Time table - LoopCount =1 000 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

9267  cycles, MatrixTransposeSSE31,  testMatYY 98x98, Lin, Col
9269  cycles, MatrixTransposeSSE30,  testMatYY 98x98, Lin, Col
9610  cycles, MatrixTransposeSSE33,  testMatYY 98x98, Lin, Col
9610  cycles, MatrixTransposeSSE32,  testMatYY 98x98, Lin, Col
10171  cycles, MatrixTransposeRE,     testMatXX 97x97, Lin, Col
10179  cycles, MatrixTransposeSSE31,  testMatXX 97x97, Lin, Col
10240  cycles, MatrixTransposeSSE30,  testMatXX 97x97, Lin, Col
10348  cycles, MatrixTransposeSSE33,  testMatXX 97x97, Lin, Col
10366  cycles, MatrixTransposeSSE32,  testMatXX 97x97, Lin, Col
10519  cycles, MatrixTransposeRE,     testMatYY 98x98, Lin, Col
10674  cycles, MatrixTransposeSSE30,  testMatZZ 99x99, Lin, Col
10684  cycles, MatrixTransposeSSE31,  testMatZZ 99x99, Lin, Col
10714  cycles, MatrixTransposeSSE33,  testMatZZ 99x99, Lin, Col
10717  cycles, MatrixTransposeSSE32,  testMatZZ 99x99, Lin, Col
10999  cycles, MatrixTransposeSSE30,  testMatWW 100x100, Lin, Col
11002  cycles, MatrixTransposeSSE31,  testMatWW 100x100, Lin, Col
11049  cycles, MatrixTransposeSSE32,  testMatWW 100x100, Lin, Col
11052  cycles, MatrixTransposeSSE33,  testMatWW 100x100, Lin, Col
11192  cycles, MatrixTransposeRE,     testMatWW 100x100, Lin, Col
11840  cycles, MatrixTransposeRE,     testMatZZ 99x99, Lin, Col
********** END **********


"TestTranspose250x256_evenlines_cyclesSSE30_33_10000"

***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

169683  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170144  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170152  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170190  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170215  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
170553  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
171765  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col
172411  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172547  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
172939  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
173518  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
174693  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
175354  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175477  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
175500  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col
176458  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
176490  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col
176651  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
176945  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col
177553  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
178076  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col
179118  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col
181607  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col
181717  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col
182332  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182441  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182494  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
182570  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
182963  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
183670  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183673  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col
183767  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
184208  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
184616  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col
185145  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col
185163  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
185553  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185595  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185616  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185650  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
185849  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
186095  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col
187866  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col
189825  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 02, 2018, 09:35:17 PM
Hi
        If you have an i5 / i7 / AMD CPU or better,
        would you mind posting the results for this:

            TestTranspose506x512_evenlines_cyclesSSE30_33_10000

My results for big matrices like 512x512:

***** Time table - LoopCount =10 000 *****

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

1207440  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1343612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1344018  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1561033  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
1663892  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col

1228184  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1245623  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1383897  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1399765  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1610377  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col

1300479  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1371443  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1374818  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1377963  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1519329  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col

1329759  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1387799  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1391123  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1969762  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1679268  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

1344344  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1373482  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1418670  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1446757  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1600959  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col

1351207  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1353411  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1375846  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1383705  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1588341  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col

1352139  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1366498  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1381257  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1446800  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1581571  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col

1364639  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1399896  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1401838  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1553876  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1558572  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col

1408078  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1409122  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1416770  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1420956  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1625558  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1411634  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1413983  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1419621  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1419933  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1606143  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col

1443784  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1450609  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1548318  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1717637  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
1974898  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col

1451095  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1460987  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1474059  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1479797  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1601902  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

1461182  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1466117  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1509165  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1565127  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1618744  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

1473912  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1474975  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1594094  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1595387  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
1903505  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col

1480526  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1483428  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1483558  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1632179  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1659401  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col

1494118  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1505676  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1530651  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1542758  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1662796  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

Title: Re: Testing Transpose of a Matrix
Post by: HSE on June 03, 2018, 12:05:44 AM
Hi Rui!

***** Time table - LoopCount =10 000 *****

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

1582930  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1629882  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1634399  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1654561  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1657649  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1675287  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1679104  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1681008  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1681443  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1681677  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1684656  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1696919  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1698205  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1706813  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1708254  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1710365  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1718140  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1718243  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1723216  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1738273  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1743494  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1745981  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1746192  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1750273  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1752060  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1760193  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
1765761  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1770073  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1771423  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1782817  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1784885  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1789888  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1792618  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
1798966  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1799624  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
1804543  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1808388  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
1815006  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1821494  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1821663  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1832988  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1848994  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1850935  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1851496  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1860365  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1863502  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
1869659  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1872352  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1876061  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1905205  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
1905353  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
1905891  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1912748  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
2014807  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
2025312  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
2027611  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
2130691  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
2199452  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2208662  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
2218100  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2272197  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2284930  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2304169  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2309517  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2315637  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2318191  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2335750  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2349308  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2356218  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
2393212  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
2430160  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
2471007  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on June 03, 2018, 01:00:25 AM
TestTranspose506x512_evenlines_cyclesSSE30_33_10000

***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

516016  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
516032  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
518944  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
520085  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
521046  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
521454  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
521850  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
523831  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
529275  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
534200  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
536010  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
536133  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
536956  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
539305  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
553189  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
556279  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
561847  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
561946  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
563973  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
563994  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
564207  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
564973  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
566778  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
567599  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
567824  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
569664  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
570366  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
577716  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
649385  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
697810  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
701489  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703127  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
703380  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
705732  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
707653  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
714643  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
717562  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
748249  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
750745  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
754049  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
759664  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
760467  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
784521  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
821192  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
825652  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
830018  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
830026  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832190  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
832842  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
833705  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
835167  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
835822  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
842265  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
842879  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
844842  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: mineiro on June 03, 2018, 01:35:41 AM

***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)

345439  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
346869  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
347355  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
349661  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
353445  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
355062  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356014  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
356298  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
356396  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
357701  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
359675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
365421  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
366776  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
368150  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
369632  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
373951  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
374200  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
379271  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
379788  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
381998  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
382001  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
382428  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
382806  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
384346  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
386417  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
387250  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
388765  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
389782  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
391379  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
394047  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
398618  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
398927  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
399796  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
415985  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
436669  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
440358  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
449851  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
450205  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
454294  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
458092  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
460574  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
461186  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
462350  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
464766  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
472146  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
475473  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
475762  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
481264  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
487516  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
490776  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
770667  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
774841  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
778550  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
783743  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
784690  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
789322  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
792221  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
796554  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
********** END **********
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 03, 2018, 07:40:10 AM
Thanks all!
These are the results (i7/AMD) sorted by matrix type:

mineiro:
***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)

345439  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
347355  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
356396  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
369632  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col

346869  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
349661  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
357701  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
368150  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col

353445  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
356014  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
365421  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
374200  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col

355062  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356298  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
359675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
366776  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
373951  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

379271  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
382001  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
389782  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
398927  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

379788  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
382428  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
387250  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
415985  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col

381998  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
384346  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
388765  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
394047  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
399796  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col

382806  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
386417  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
391379  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
398618  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col

436669  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
450205  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
461186  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
481264  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

440358  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
458092  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
475473  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

449851  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
464766  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
475762  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
490776  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col

454294  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
460574  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
462350  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
472146  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
487516  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

770667  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
784690  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col

774841  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
789322  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col

778550  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
792221  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col

783743  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
796554  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
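For anyone who wants to regroup a posted time table the same way (one block per matrix, fastest routine first, blocks ordered by their best time), here is a minimal sketch in Python. It is not part of the test program; the function names `group_by_matrix` and `format_groups` are made up for this example, and it assumes each result line has the `cycles, routine, matrix RxC` layout seen in the tables here.

```python
import re
from collections import defaultdict

# One result line is assumed to look like:
# "345439  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col"
LINE_RE = re.compile(r"^\s*(\d+)\s+cycles,\s*(\w+),\s*(\w+)\s+(\d+x\d+)")

def group_by_matrix(text):
    """Return [(matrix, size, [(cycles, routine), ...]), ...].

    Rows inside a group are sorted by ascending cycles; groups are
    ordered by their fastest entry. Lines that do not match are skipped.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            cycles, routine, matrix, size = m.groups()
            groups[(matrix, size)].append((int(cycles), routine))
    ordered = []
    for (matrix, size), rows in groups.items():
        rows.sort()  # ascending by cycles
        ordered.append((matrix, size, rows))
    ordered.sort(key=lambda g: g[2][0][0])  # by best time of each group
    return ordered

def format_groups(groups):
    """Render the groups in the same column layout as the time tables."""
    out = []
    for matrix, size, rows in groups:
        for cycles, routine in rows:
            out.append(f"{cycles}  cycles, {routine + ',':<21} {matrix} {size}, Lin, Col")
        out.append("")  # blank line between matrices
    return "\n".join(out)

if __name__ == "__main__":
    import sys
    # Usage: python regroup.py < timetable.txt
    print(format_groups(group_by_matrix(sys.stdin.read())))
```

Header lines such as the CPU name or the `***** Time table *****` banner do not match the pattern and are simply dropped, so they need to be copied over by hand.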
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
Siekmanski:
***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

516016  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col       ; <<<<<<----
520085  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
521850  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
536956  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col

516032  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col       ; <<<<<<----
518944  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
521454  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
523831  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col

521046  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col       ; <<<<<<----
529275  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
536010  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col

534200  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col       ; <<<<<<----
536133  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
539305  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
564207  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
577716  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col

553189  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col       ; <<<<<<----
556279  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
566778  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col

561847  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col       ; <<<<<<----
561946  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
569664  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
649385  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col

563973  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col       ; <<<<<<----
567599  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
570366  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col

563994  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col       ; <<<<<<----
564973  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
567824  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col

697810  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
703127  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
714643  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
754049  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

701489  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703380  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
760467  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

705732  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
717562  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
750745  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
784521  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col

707653  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
748249  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
759664  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

821192  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
830018  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col

825652  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
835822  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col

830026  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832842  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
835167  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
842879  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col

832190  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
833705  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
842265  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
844842  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
HSE:
***** Time table - LoopCount =10 000 *****

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

1582930  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1657649  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1679104  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1905353  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1629882  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1654561  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1718243  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1863502  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col

1634399  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1876061  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1804543  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1905205  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col

1675287  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1738273  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1750273  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1815006  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1850935  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col

1681008  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1743494  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1799624  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
1832988  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1872352  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col

1681443  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1718140  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1760193  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
1784885  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1798966  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col

1681677  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1745981  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1746192  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1851496  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1869659  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col

1684656  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1765761  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1771423  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1848994  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1860365  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col

1696919  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1723216  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1752060  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
2014807  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col

1698205  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1708254  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1782817  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1792618  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

1706813  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1789888  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1808388  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
1821494  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1905891  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col

1710365  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1770073  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1821663  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1912748  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col

2025312  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col       ; <<<<<<----
2199452  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2304169  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2315637  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2471007  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col

2027611  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col       ; <<<<<<----
2218100  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2284930  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2335750  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2393212  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col

2130691  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col       ; <<<<<<----
2309517  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2318191  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2349308  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2430160  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col

2208662  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col       ; <<<<<<----
2272197  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2356218  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
Title: Re: Testing Transpose of a Matrix
Post by: zedd151 on June 03, 2018, 04:10:29 PM
As always, I'm stylishly late.   :P


AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
820058  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
821950  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
823983  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
829394  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
830046  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
832485  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
832895  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
833612  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
834352  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
835091  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
839675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
846346  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
846411  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
848844  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
849311  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
852962  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
855288  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
862895  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
871913  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
874990  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
878483  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
878772  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
879291  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
884764  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
886302  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
887827  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
888715  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
889632  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
890132  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
891733  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
892603  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
898395  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
902633  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
902999  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
921149  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
965255  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1503461  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1508044  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1508102  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1539823  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
1553754  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
1589446  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1612950  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1625738  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1629810  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1631325  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
1632128  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1632552  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1632744  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
1633612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1634083  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1638780  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1643499  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1681922  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1745465  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1753087  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1754633  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759147  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1759762  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1760467  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1761883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1767771  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1773351  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
********** END **********


  :bgrin:

Although my processor speed isn't listed by the program, it is 1.60 GHz.
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 03, 2018, 06:00:52 PM
Hi
         Thanks  :t
          It is not possible to add this to the previous set (it would exceed 20000 characters).

          Here are your results (AMD), sorted by matrix type:
zedd151:

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col       ; <<<<<<----
828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
829394  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
830046  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
832485  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col

820058  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
833612  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
839675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
846346  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
849311  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

821950  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
823983  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
832895  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
834352  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
835091  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col

846411  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
848844  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
852962  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
855288  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
965255  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col

862895  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
871913  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
878772  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
879291  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
887827  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col

874990  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col       ; <<<<<<----
884764  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
886302  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
891733  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
892603  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col

878483  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
889632  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
890132  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
898395  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
921149  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col

888715  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
902633  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
902999  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1503461  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1745465  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1754633  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759762  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

1508044  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1631325  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
1632128  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1638780  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col

1508102  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1760467  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1767771  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

1539823  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col       ; <<<<<<----
1625738  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1632552  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1633612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col

1553754  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col       ; <<<<<<----
1612950  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col

1589446  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1629810  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1632744  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
1634083  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1643499  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col

1681922  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1753087  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

1759147  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1761883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1773351  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
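
For anyone who wants to regroup a raw listing without doing it by hand, here is a minimal sketch in Python. It is not part of the test package and not how the table above was produced. It assumes every result line has the form "N cycles, ProcName, testMatXX RxC, Lin, Col"; the names LINE and group_by_matrix are invented for this example.

```python
import re
from collections import defaultdict

# Assumed line layout: "<cycles>  cycles, <procedure>, <matrix> <rows>x<cols>, Lin, Col"
LINE = re.compile(r"^\s*(\d+)\s+cycles,\s*(\w+),\s*(testMat\w+)\s+(\d+x\d+)")


def group_by_matrix(lines):
    """Group timing lines by matrix, fastest procedure first in each group.

    Returns a list of ((matrix, size), [(cycles, procedure), ...]) pairs,
    ordered by the best time of each group. Lines that do not match the
    assumed layout are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue
        cycles, procedure, matrix, size = match.groups()
        groups[(matrix, size)].append((int(cycles), procedure))
    for rows in groups.values():
        rows.sort()  # ascending cycle count
    return sorted(groups.items(), key=lambda item: item[1][0][0])


# Four lines taken from the listing above.
sample = [
    "810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col",
    "820058  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col",
    "828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col",
    "833612  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col",
]

for (matrix, size), rows in group_by_matrix(sample):
    for cycles, procedure in rows:
        print(f"{cycles}  cycles, {procedure}, {matrix} {size}, Lin, Col")
    print()
```

Groups come out ordered by their fastest entry, which matches the order used in the table above.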
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 06, 2018, 08:08:19 AM
Hi all,
        Here is the new version, SSE46_49, which you can test; my results are included.
        If you have an i5 / i7 / AMD CPU and you want to show me
        your results, please ZIP them and post. Use only these (or whichever you want):

                TestTranspose97_100_cyclesSSE46_49_1000000
                TestTranspose506x512_evenlines_cyclesSSE46_49_1000
                TestTranspose506x512_oddlines_cyclesSSE46_49_1000
Thank you

My little sample:

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

40390  cycles, MatrixTransposeSSE49,  testMatWW 100x100, Lin, Col
40493  cycles, MatrixTransposeSSE48,  testMatWW 100x100, Lin, Col
42654  cycles, MatrixTransposeSSE46,  testMatWW 100x100, Lin, Col
42794  cycles, MatrixTransposeSSE47,  testMatWW 100x100, Lin, Col
43395  cycles, MatrixTransposeRE,     testMatWW 100x100, Lin, Col

47565  cycles, MatrixTransposeSSE46,  testMatYY 98x98, Lin, Col
47702  cycles, MatrixTransposeSSE49,  testMatYY 98x98, Lin, Col
47770  cycles, MatrixTransposeSSE48,  testMatYY 98x98, Lin, Col
47969  cycles, MatrixTransposeSSE47,  testMatYY 98x98, Lin, Col
48718  cycles, MatrixTransposeRE,     testMatYY 98x98, Lin, Col

72327  cycles, MatrixTransposeSSE49,  testMatXX 97x97, Lin, Col
72434  cycles, MatrixTransposeSSE48,  testMatXX 97x97, Lin, Col
72506  cycles, MatrixTransposeSSE46,  testMatXX 97x97, Lin, Col
72532  cycles, MatrixTransposeSSE47,  testMatXX 97x97, Lin, Col
77414  cycles, MatrixTransposeRE,     testMatXX 97x97, Lin, Col

76113  cycles, MatrixTransposeSSE49,  testMatZZ 99x99, Lin, Col
76201  cycles, MatrixTransposeSSE48,  testMatZZ 99x99, Lin, Col
76228  cycles, MatrixTransposeSSE47,  testMatZZ 99x99, Lin, Col
76241  cycles, MatrixTransposeSSE46,  testMatZZ 99x99, Lin, Col
78145  cycles, MatrixTransposeRE,     testMatZZ 99x99, Lin, Col
Title: Re: Testing Transpose of a Matrix
Post by: jj2007 on June 06, 2018, 02:49:25 PM
Rui, which one of the 16 exes should we test?
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 06, 2018, 04:05:38 PM
Quote from: jj2007 on June 06, 2018, 02:49:25 PM
Rui, which one of the 16 exes should we test?
These 3
                TestTranspose97_100_cyclesSSE46_49_1000000
                TestTranspose506x512_evenlines_cyclesSSE46_49_1000
                TestTranspose506x512_oddlines_cyclesSSE46_49_1000

Title: Re: Testing Transpose of a Matrix
Post by: jj2007 on June 06, 2018, 04:57:03 PM
Make a batch file?
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 06, 2018, 06:24:57 PM
Quote from: jj2007 on June 06, 2018, 04:57:03 PM
Make a batch file?
A batch file? I last used those in the 80s and I don't remember well how to do it now. Help me, or maybe you could do it.
Yes? I have so many things in my head that I don't have any more space there now! :biggrin:
          EDIT: Jochen, yes, a batch file is a good solution to run each program and add the results to one text file. But I don't remember now how to do it. Do you want to help me?
Title: Re: Testing Transpose of a Matrix
Post by: jj2007 on June 06, 2018, 07:29:02 PM
Normally, it should look like this (save as test4Rui.bat):
TestTranspose97_100_cyclesSSE46_49.exe
TestTranspose506x512_evenlines_cyclesSSE46_49.exe
TestTranspose506x512_oddlines_cyclesSSE46_49.exe
pause


But it gives me 3x file not found, although I extracted the whole archive (16 exe) to the same folder. Are you sure these are the files we should test for you?
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 06, 2018, 07:37:36 PM
Quote from: jj2007 on June 06, 2018, 07:29:02 PM
Normally, it should look like this (save as test4Rui.bat):
TestTranspose97_100_cyclesSSE46_49_1000000.exe      <<<<<<-----
TestTranspose506x512_evenlines_cyclesSSE46_49_1000.exe    <<<<----
TestTranspose506x512_oddlines_cyclesSSE46_49_1000.exe      <<<<<<<<----
pause


But it gives me 3x file not found, although I extracted the whole archive (16 exe) to the same folder. Are you sure these are the files we should test for you?
Because the name ends in _1000000 for the first and in _1000 for the other two.
             The results could be sent to a text file, no?
Title: Re: Testing Transpose of a Matrix
Post by: mineiro on June 06, 2018, 08:53:38 PM
Yes, something like this:

TestTranspose97_100_cyclesSSE46_49.exe > results.txt
TestTranspose506x512_evenlines_cyclesSSE46_49.exe >> results.txt
TestTranspose506x512_oddlines_cyclesSSE46_49.exe >> results.txt

But if the program asks for user input, I think it will be harder to do; if the program only echoes its results to the screen, it can be done like that.
The first ">" creates the file, overwriting its contents if it already exists; the ">>" appends the results to the end of that file. This works from MS-DOS through to the new cmd versions.
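
The overwrite-then-append behaviour of ">" and ">>" can also be reproduced outside a batch file. Below is a minimal sketch in Python, used here only for illustration since the thread itself uses a plain batch file. The helper name run_all is invented, and the three demo commands are stand-ins that just print one line each; they are not the real test executables.

```python
import os
import subprocess
import sys
import tempfile


def run_all(commands, out_path):
    """Run each command in order, collecting its stdout in one text file.

    The first command opens the file in "w" mode (like ">": create or
    overwrite); every later command opens it in "a" mode (like ">>": append).
    """
    for index, command in enumerate(commands):
        mode = "w" if index == 0 else "a"
        with open(out_path, mode) as out:
            subprocess.run(command, stdout=out, check=False)


# Stand-in commands (hypothetical): each one only prints a single line.
demo = [[sys.executable, "-c", f"print('test {n}')"] for n in (1, 2, 3)]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "results.txt")
    run_all(demo, path)
    with open(path) as results:
        print(results.read())
```

Running run_all a second time on the same file starts from an empty file again, because the first command always opens it in overwrite mode.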
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 07, 2018, 12:29:42 AM
Quote from: mineiro on June 06, 2018, 08:53:38 PM
Yes, something like this:

TestTranspose97_100_cyclesSSE46_49.exe > results.txt
TestTranspose506x512_evenlines_cyclesSSE46_49.exe >> results.txt
TestTranspose506x512_oddlines_cyclesSSE46_49.exe >> results.txt

But if the program asks for user input, I think it will be harder to do; if the program only echoes its results to the screen, it can be done like that.
The first ">" creates the file, overwriting its contents if it already exists; the ">>" appends the results to the end of that file. This works from MS-DOS through to the new cmd versions.
Thanks,
               The first works; the other (>>) does not.
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 07, 2018, 05:04:45 AM
Quote from: jj2007 on June 06, 2018, 02:49:25 PM
Rui, which one of the 16 exes should we test?
Jochen, some are not timing tests (try them and see), and I would like to get the results of only 3, as I wrote. If we cannot create a single file of all tests with a batch file (it doesn't work here), I think I need to create up to 4 batch files to upload the files. In any case your idea is good, and I will remove all inkeys from the exe files. I will try to do it. By the way, that set of files represents a lot of hard work; we can see all the data organised.

Thanks
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 07, 2018, 08:59:27 AM
Hi all,
        Here is the new version, SSE46_49, which you can test; my results are included.
        I made some minor revisions in some macros.
        If you have an i5 / i7 / AMD CPU and you want to show me
        your results, please ZIP each .txt file and post.
       
        If we use the batch file      TestThis46_49.bat                             
        we get:                     
                     resultsSSE46_49_506_512_evenlines.txt
                     resultsSSE46_49_506_512_oddlines.txt
                     resultsSSE46_49_97_100.txt
Thank you
Title: Re: Testing Transpose of a Matrix
Post by: jj2007 on June 07, 2018, 11:10:38 AM
Much better :t
Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on June 07, 2018, 08:10:54 PM
The results.
Title: Re: Testing Transpose of a Matrix
Post by: mineiro on June 08, 2018, 02:53:15 AM
Results:
Title: Re: Testing Transpose of a Matrix
Post by: zedd151 on June 08, 2018, 08:14:11 AM
results:

As always, the CPU speed is 1.60 GHz.

:biggrin:
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 09, 2018, 07:38:06 AM
Hi all,
        Here are all the results by matrix type, if you want to see them.
        They are the results from your CPUs.
Thanks for your work.


EDIT: all of them, plus the results from LiaoMi
Title: Re: Testing Transpose of a Matrix
Post by: LiaoMi on June 10, 2018, 08:09:21 AM
Results -
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 14, 2018, 04:05:57 AM
Hi all,
          Would you mind testing my new SSE46 if you have an i5/i7 or AMD CPU? It is a .bat file and the output is a single .txt file.
You can see my results as well. They are "much faster than quick"  :P
I will probably post all of the SSE46 work (.inc, .asm, .mac, .txt, etc.) in the workshop next week.
The .txt file will contain all the information about this work.
Thanks  :t


EDIT: here it is, I'm sorry
Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on June 14, 2018, 05:21:29 AM
There is only a TestThisAllSSE46.bat
Where are the executables?
Title: Re: Testing Transpose of a Matrix
Post by: RuiLoureiro on June 14, 2018, 07:06:19 AM
Quote from: Siekmanski on June 14, 2018, 05:21:29 AM
There is only a TestThisAllSSE46.bat
Where are the executables?
Sorry, they are in the folder. Now it is correct.
Title: Re: Testing Transpose of a Matrix
Post by: Siekmanski on June 14, 2018, 08:08:30 AM
The results,
Title: Re: Testing Transpose of a Matrix
Post by: zedd151 on June 16, 2018, 08:08:24 AM
results: