Testing Transpose of a Matrix

Started by RuiLoureiro, May 07, 2018, 02:13:55 AM


Siekmanski

Creative coders use backward thinking techniques as a strategy.

dedndave

create a matrix of pointers to the real10's
transpose the pointer matrix   :biggrin:
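Here is a minimal C sketch of Dave's suggestion. It assumes each REAL10 sits in a 16-byte cell, which is how Rui says TheCalculator stores it, and models that cell as a hypothetical `Cell16` struct. Only the matrix of pointers is transposed. Each move copies one pointer instead of a 16-byte cell, and the element storage is never touched. The names `Cell16`, `build_ptrs` and `transpose_ptrs` are illustrative and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte cell standing in for a REAL10 padded to 16 bytes.
   Here it is only an opaque 16-byte payload. */
typedef struct { uint64_t lo, hi; } Cell16;

/* Build a rows x cols (row-major) matrix of pointers into the element storage. */
static void build_ptrs(Cell16 *data, size_t rows, size_t cols, Cell16 **ptrs) {
    for (size_t i = 0; i < rows * cols; i++)
        ptrs[i] = &data[i];
}

/* Transpose only the pointer matrix: src is rows x cols, dst becomes cols x rows.
   Each move is one pointer (a dword on 32-bit) rather than a 16-byte cell. */
static void transpose_ptrs(Cell16 *const *src, size_t rows, size_t cols, Cell16 **dst) {
    assert((const void *)src != (const void *)dst); /* out-of-place only */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];
}
```

The trade-off is that every matrix then needs its own pointer matrix, and any code that reads the transposed matrix has to go through that pointer matrix.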

RuiLoureiro

Quote from: dedndave on May 19, 2018, 05:28:07 AM
create a matrix of pointers to the real10's
transpose the pointer matrix   :biggrin:
Hi Dave,
             How are you? I hope you are fine!

Well, it is well known that when we don't want to move a lot of array elements
we use an array of pointers: an array of dwords...
Your solution seems expensive: every matrix needs its own pointer matrix.
If we define 100 matrices, we need to define 200 arrays (100 for the matrices + 100 for the pointers).
There is also a problem: if a=[1,2;3,4] and we do a=a^t, where is a now? What are the elements of a?

(In TheCalculator, when we write a matrix name "a" and press enter/compute, it shows "a".)
TheCalculator uses 16 bytes for each real10, so...

            Dave, do you still have the same Pentium 4 CPU? Do you remember why?
Good luck :t
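To put numbers on Rui's cost argument: on a 32-bit target a pointer is a 4-byte dword. A pointer matrix therefore adds 4 bytes for every 16-byte cell, which is 25% more memory per matrix, plus a second array to manage. The alternative is to move the cells, so that after `a = a^t` the name `a` really holds the transpose.

Below is a minimal C sketch of that approach. It uses the same hypothetical 16-byte `Cell16` model and a hypothetical `Matrix` descriptor:
- A square matrix is swapped across the diagonal in place.
- A non-square matrix goes through a scratch buffer. A true in-place non-square transpose needs cycle-following, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint64_t lo, hi; } Cell16;                   /* hypothetical 16-byte REAL10 slot */
typedef struct { size_t rows, cols; Cell16 *cells; } Matrix;  /* hypothetical descriptor, row-major */

/* a = a^T: afterwards a->cells really holds the transposed elements and
   the stored dimensions are swapped. Returns 0 on success, or -1 if the
   scratch buffer for a non-square matrix cannot be allocated. */
static int transpose_in_place(Matrix *a) {
    assert(a != NULL && a->cells != NULL);
    size_t r = a->rows, c = a->cols;
    if (r == c) {
        /* Square: swap each (i,j) above the diagonal with (j,i). */
        for (size_t i = 0; i < r; i++)
            for (size_t j = i + 1; j < c; j++) {
                Cell16 tmp = a->cells[i * c + j];
                a->cells[i * c + j] = a->cells[j * c + i];
                a->cells[j * c + i] = tmp;
            }
    } else {
        /* Non-square: transpose into scratch, then copy back over a. */
        Cell16 *scratch = malloc(r * c * sizeof *scratch);
        if (scratch == NULL) return -1;
        for (size_t i = 0; i < r; i++)
            for (size_t j = 0; j < c; j++)
                scratch[j * r + i] = a->cells[i * c + j];
        memcpy(a->cells, scratch, r * c * sizeof *scratch);
        free(scratch);
    }
    a->rows = c;
    a->cols = r;
    return 0;
}
```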
             

dedndave


RuiLoureiro

Hi all,
        Inside SSE30_33_tests.zip you will find my results for a set of tests
        of the SSE30 to SSE33 procedures, which transpose a matrix of any size,
        from 1x1 up to NxM. You may run them too and, if you want, post your results
        if you have an i5 / i7 / AMD CPU or better. Your contributions may help
        me understand what I should do next. I still have a very slow P4.
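Each REAL10 occupies a full 16-byte slot, so one 128-bit SSE register holds exactly one element. An SSE transpose then amounts to a sequence of 16-byte loads and stores with transposed addressing.

The C sketch below illustrates that idea with the SSE2 intrinsics `_mm_loadu_si128` and `_mm_storeu_si128`, and falls back to `memcpy` on targets without SSE2. This is an assumption about the general approach, not a reconstruction of Rui's SSE30 to SSE33 procedures, whose code is not shown in the thread. `Cell16`, `move_cell` and `transpose_sse` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

typedef struct { uint64_t lo, hi; } Cell16;   /* hypothetical 16-byte REAL10 slot */

/* Copy one 16-byte cell: a single unaligned SSE2 load/store pair when available. */
static inline void move_cell(Cell16 *dst, const Cell16 *src) {
#ifdef HAVE_SSE2
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    memcpy(dst, src, sizeof *dst);
#endif
}

/* Out-of-place transpose: src is rows x cols (row-major), dst becomes cols x rows. */
static void transpose_sse(const Cell16 *src, Cell16 *dst, size_t rows, size_t cols) {
    assert(src != dst);
    for (size_t i = 0; i < rows; i++) {
        const Cell16 *row = src + i * cols;
        for (size_t j = 0; j < cols; j++)
            move_cell(dst + j * rows + i, row + j);
    }
}
```

If the cells are known to be 16-byte aligned, the aligned forms (`_mm_load_si128` / `_mm_store_si128`, i.e. MOVDQA) could be used instead.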
        You may run all the tests, but I would like to know the results for these:

                TestTranspose97x97_100x100_cyclesSSE30_33_100000
                TestTranspose250x256_evenlines_cyclesSSE30_33_10000
        I am developing and testing new algorithms (SSE38_41), but
        time doesn't stretch... and I am trying to optimize some
        cases, so I need more time... and more tests. For example, I don't
        know whether push/pop esi is better than a local variable... do you know?
       
        Thanks
        Good luck :t
       
Note: 1000000/100000/10000 is the loop count used.
      At the end of VerifyProcsFrom1x1_to_120x120_SSE30_33
      all 4 procs are tested using matrices from 1x1 up to 120x120
      defined in the .data? segment. I wrote a general proc
      for any size NxM, but it does not work for procedures that
      don't use the dimensions behind the address.
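One reading of "dimensions behind the address" is that each matrix keeps its row and column counts in a small header just before the first element. A procedure then needs only the element address.

The sketch below assumes a hypothetical 16-byte header (`MatHeader`) placed in front of row-major 16-byte cells. It pairs a transpose that reads those dimensions with an exhaustive check over every shape from 1x1 to n x n, in the spirit of VerifyProcsFrom1x1_to_120x120_SSE30_33. The layout and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t lo, hi; } Cell16;                 /* hypothetical 16-byte REAL10 slot */
typedef struct { uint32_t rows, cols, pad[2]; } MatHeader;  /* hypothetical 16-byte header */

/* Allocate header + cells; the caller only ever sees the address of the first cell. */
static Cell16 *mat_alloc(uint32_t rows, uint32_t cols) {
    MatHeader *h = malloc(sizeof *h + (size_t)rows * cols * sizeof(Cell16));
    if (h == NULL) return NULL;
    h->rows = rows;
    h->cols = cols;
    return (Cell16 *)(h + 1);
}

/* The dimensions live just "behind" the element address. */
static MatHeader *mat_header(Cell16 *m) { return (MatHeader *)m - 1; }
static void mat_free(Cell16 *m) { if (m != NULL) free(mat_header(m)); }

/* dst must already be allocated as (cols of src) x (rows of src). */
static void transpose_hdr(Cell16 *src, Cell16 *dst) {
    uint32_t r = mat_header(src)->rows, c = mat_header(src)->cols;
    assert(mat_header(dst)->rows == c && mat_header(dst)->cols == r);
    for (uint32_t i = 0; i < r; i++)
        for (uint32_t j = 0; j < c; j++)
            dst[(size_t)j * r + i] = src[(size_t)i * c + j];
}

/* Try every shape from 1x1 to n x n. Returns the number of wrong shapes,
   or -1 if an allocation fails. */
static int verify_all(uint32_t n) {
    int failures = 0;
    for (uint32_t r = 1; r <= n; r++)
        for (uint32_t c = 1; c <= n; c++) {
            Cell16 *a = mat_alloc(r, c), *t = mat_alloc(c, r);
            if (a == NULL || t == NULL) { mat_free(a); mat_free(t); return -1; }
            for (size_t k = 0; k < (size_t)r * c; k++) { a[k].lo = k; a[k].hi = ~(uint64_t)k; }
            transpose_hdr(a, t);
            int ok = 1;
            for (uint32_t i = 0; i < r && ok; i++)
                for (uint32_t j = 0; j < c && ok; j++) {
                    size_t k = (size_t)i * c + j;
                    const Cell16 *e = &t[(size_t)j * r + i];
                    ok = (e->lo == k && e->hi == ~(uint64_t)k);
                }
            failures += !ok;
            mat_free(a);
            mat_free(t);
        }
    return failures;
}
```

With this sketch, `verify_all(120)` checks all 14400 shapes up to 120x120 and should return 0. A procedure that ignores the header would need its dimensions passed explicitly, which is the situation the note describes.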

Sample results for several matrix shapes, like 256x256 (RE = REference):

***** Time table - LoopCount =10 000 *****

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

274723  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
277606  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
278301  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
280608  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
288570  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col

276360  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
277126  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
278578  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
289265  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
291922  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col

277665  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
278713  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
301969  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
302056  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
305275  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col

277906  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
280112  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
281857  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
283046  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
300882  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col

278085  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
281408  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
289149  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
303014  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
303088  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col

280843  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
282137  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
282411  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
295377  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
298502  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col

284465  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
289484  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
295322  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
307567  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
307757  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col

287766  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
287891  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
307719  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
307860  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
308480  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col

301073  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
302764  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
303633  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
303680  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
317857  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col

302701  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
305413  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
308789  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
309537  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
311748  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col

305303  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
309247  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
310500  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
317925  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col
316275  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col

309104  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
309146  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
309344  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
315558  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
320653  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col

314387  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
315688  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
318761  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
321526  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
323225  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col

315387  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
315677  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
316444  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
322256  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
324936  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col

318743  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
322824  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
324874  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
328921  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
340988  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col

319148  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
319973  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
321797  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
321990  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
337617  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col


Siekmanski:

***** Time table - LoopCount =10 000 ****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

169683  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170152  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170215  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
171765  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col

170144  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170190  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170553  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
172547  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
177553  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col

172411  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172939  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
174693  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
175477  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col

173518  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
176945  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col

175354  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175500  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col
176458  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col

176490  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col
176651  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col

178076  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col

179118  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col

181607  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col   ; <<<<<-----
182332  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182494  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
189825  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col

181717  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col
182570  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col

182441  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182963  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
183670  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183767  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col

183673  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col   ; <<<<<-----
184208  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col

184616  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col   ; <<<<<-----
185553  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185616  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185849  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col

185145  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col    ; <<<<<-----
185595  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185650  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col

185163  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
186095  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col

187866  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col


Siekmanski

"TestTranspose97x97_100x100_cyclesSSE30_33_100000"

***** Time table - LoopCount =1 000 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

9267  cycles, MatrixTransposeSSE31,  testMatYY 98x98, Lin, Col
9269  cycles, MatrixTransposeSSE30,  testMatYY 98x98, Lin, Col
9610  cycles, MatrixTransposeSSE33,  testMatYY 98x98, Lin, Col
9610  cycles, MatrixTransposeSSE32,  testMatYY 98x98, Lin, Col
10171  cycles, MatrixTransposeRE,     testMatXX 97x97, Lin, Col
10179  cycles, MatrixTransposeSSE31,  testMatXX 97x97, Lin, Col
10240  cycles, MatrixTransposeSSE30,  testMatXX 97x97, Lin, Col
10348  cycles, MatrixTransposeSSE33,  testMatXX 97x97, Lin, Col
10366  cycles, MatrixTransposeSSE32,  testMatXX 97x97, Lin, Col
10519  cycles, MatrixTransposeRE,     testMatYY 98x98, Lin, Col
10674  cycles, MatrixTransposeSSE30,  testMatZZ 99x99, Lin, Col
10684  cycles, MatrixTransposeSSE31,  testMatZZ 99x99, Lin, Col
10714  cycles, MatrixTransposeSSE33,  testMatZZ 99x99, Lin, Col
10717  cycles, MatrixTransposeSSE32,  testMatZZ 99x99, Lin, Col
10999  cycles, MatrixTransposeSSE30,  testMatWW 100x100, Lin, Col
11002  cycles, MatrixTransposeSSE31,  testMatWW 100x100, Lin, Col
11049  cycles, MatrixTransposeSSE32,  testMatWW 100x100, Lin, Col
11052  cycles, MatrixTransposeSSE33,  testMatWW 100x100, Lin, Col
11192  cycles, MatrixTransposeRE,     testMatWW 100x100, Lin, Col
11840  cycles, MatrixTransposeRE,     testMatZZ 99x99, Lin, Col
********** END **********
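The tables mix sizes from 97x97 to 256x256, so dividing each figure by the element count gives a size-independent number. This assumes each figure is the average cycles per call, which is how the loop-count note reads. On that basis, Siekmanski's i7 spends about 1.10 cycles per element on 100x100 (SSE30, 10999 cycles). On 256x256 it spends about 2.65 (SSE31, 173518 cycles), and the P4 needs about 4.39 on 256x256 (SSE31, 287766 cycles).

The helper name is hypothetical. The thread does not establish why the per-element cost grows with size; cache behaviour is only a plausible guess.

```c
#include <assert.h>

/* Average cycles per call divided by the number of cells transposed.
   Assumes the reported figure is already averaged over LoopCount calls. */
static double cycles_per_element(double cycles_per_call, unsigned rows, unsigned cols) {
    assert(rows > 0 && cols > 0);
    return cycles_per_call / ((double)rows * (double)cols);
}
```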


"TestTranspose250x256_evenlines_cyclesSSE30_33_10000"

***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

169683  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170144  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170152  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170190  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170215  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
170553  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
171765  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col
172411  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172547  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
172939  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
173518  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
174693  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
175354  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175477  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
175500  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col
176458  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
176490  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col
176651  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
176945  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col
177553  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
178076  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col
179118  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col
181607  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col
181717  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col
182332  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182441  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182494  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
182570  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
182963  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
183670  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183673  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col
183767  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
184208  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
184616  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col
185145  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col
185163  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
185553  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185595  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185616  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185650  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
185849  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
186095  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col
187866  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col
189825  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
********** END **********

RuiLoureiro

Hi,
        If you have an i5 / i7 / AMD CPU or better,
        would you mind posting the results for this:

            TestTranspose506x512_evenlines_cyclesSSE30_33_10000

My results for big matrices like 512x512:

***** Time table - LoopCount =10 000 *****

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

1207440  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1343612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1344018  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1561033  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
1663892  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col

1228184  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1245623  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1383897  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1399765  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1610377  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col

1300479  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1371443  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1374818  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1377963  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1519329  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col

1329759  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1387799  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1391123  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1969762  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1679268  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

1344344  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1373482  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1418670  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1446757  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1600959  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col

1351207  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1353411  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1375846  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1383705  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1588341  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col

1352139  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1366498  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1381257  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1446800  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1581571  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col

1364639  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1399896  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1401838  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1553876  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1558572  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col

1408078  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1409122  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1416770  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1420956  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1625558  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1411634  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1413983  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1419621  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1419933  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1606143  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col

1443784  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1450609  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1548318  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1717637  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
1974898  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col

1451095  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1460987  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1474059  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1479797  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1601902  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

1461182  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1466117  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1509165  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1565127  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1618744  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

1473912  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1474975  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1594094  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1595387  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
1903505  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col

1480526  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1483428  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1483558  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1632179  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1659401  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col

1494118  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1505676  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1530651  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1542758  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1662796  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
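For the big cases the sizes change the picture. A 512x512 matrix of 16-byte cells is 512*512*16 = 4 MiB. Walking down a source column strides 512*16 = 8192 bytes per element, and power-of-two strides like this can map many cells onto the same cache sets.

A common general technique, not something used or measured in this thread, is cache blocking: transpose small square tiles so that the source and destination rows of one tile tend to stay in cache. The C sketch below uses the same hypothetical 16-byte `Cell16` cell. The tile size is a tuning parameter; the value 8 in the test is arbitrary, not a measured optimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } Cell16;   /* hypothetical 16-byte REAL10 slot */

/* Cache-blocked out-of-place transpose: src is rows x cols (row-major),
   dst becomes cols x rows. Work proceeds in tile x tile blocks;
   blocks on the right and bottom edges are clipped to the matrix size. */
static void transpose_blocked(const Cell16 *src, Cell16 *dst,
                              size_t rows, size_t cols, size_t tile) {
    assert(tile > 0 && src != dst);
    for (size_t i0 = 0; i0 < rows; i0 += tile) {
        size_t i1 = (rows - i0 < tile) ? rows : i0 + tile;
        for (size_t j0 = 0; j0 < cols; j0 += tile) {
            size_t j1 = (cols - j0 < tile) ? cols : j0 + tile;
            for (size_t i = i0; i < i1; i++)
                for (size_t j = j0; j < j1; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

Whether tiling would beat the SSE30 to SSE33 procedures on the P4 or the i7 is untested here. It would need to be timed with the same harness and loop counts as the tables above.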


HSE

Hi Rui!

***** Time table - LoopCount =10 000 *****

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

1582930  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1629882  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1634399  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1654561  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1657649  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1675287  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1679104  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1681008  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1681443  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1681677  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1684656  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1696919  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1698205  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1706813  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1708254  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1710365  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1718140  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1718243  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1723216  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1738273  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1743494  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1745981  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1746192  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1750273  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1752060  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1760193  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
1765761  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1770073  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1771423  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1782817  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1784885  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1789888  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1792618  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
1798966  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1799624  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
1804543  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1808388  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
1815006  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1821494  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1821663  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1832988  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1848994  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1850935  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1851496  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1860365  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1863502  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
1869659  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1872352  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1876061  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1905205  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
1905353  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
1905891  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1912748  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
2014807  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
2025312  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
2027611  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
2130691  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
2199452  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2208662  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
2218100  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2272197  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2284930  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2304169  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2309517  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2315637  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2318191  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2335750  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2349308  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2356218  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
2393212  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
2430160  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
2471007  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
Equations in Assembly: SmplMath

Siekmanski

TestTranspose506x512_evenlines_cyclesSSE30_33_10000

***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

516016  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
516032  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
518944  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
520085  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
521046  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
521454  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
521850  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
523831  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
529275  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
534200  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
536010  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
536133  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
536956  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
539305  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
553189  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
556279  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
561847  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
561946  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
563973  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
563994  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
564207  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
564973  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
566778  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
567599  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
567824  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
569664  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
570366  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
577716  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
649385  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
697810  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
701489  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703127  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
703380  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
705732  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
707653  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
714643  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
717562  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
748249  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
750745  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
754049  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
759664  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
760467  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
784521  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
821192  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
825652  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
830018  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
830026  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832190  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
832842  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
833705  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
835167  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
835822  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
842265  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
842879  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
844842  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
********** END **********

mineiro


***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)

345439  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
346869  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
347355  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
349661  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
353445  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
355062  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356014  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
356298  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
356396  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
357701  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
359675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
365421  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
366776  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
368150  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
369632  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
373951  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
374200  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
379271  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
379788  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
381998  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
382001  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
382428  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
382806  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
384346  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
386417  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
387250  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
388765  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
389782  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
391379  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
394047  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
398618  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
398927  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
399796  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
415985  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
436669  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
440358  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
449851  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
450205  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
454294  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
458092  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
460574  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
461186  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
462350  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
464766  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
472146  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
475473  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
475762  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
481264  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
487516  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
490776  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
770667  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
774841  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
778550  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
783743  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
784690  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
789322  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
792221  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
796554  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
********** END **********
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

RuiLoureiro

Thanks all :t
:icon14:
These are the results (i7/AMD) sorted by matrix type:

mineiro:
***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)

345439  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
347355  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
356396  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
369632  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col

346869  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
349661  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
357701  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
368150  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col

353445  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
356014  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
365421  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
374200  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col

355062  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356298  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
359675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
366776  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
373951  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

379271  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
382001  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
389782  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
398927  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

379788  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
382428  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
387250  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
415985  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col

381998  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
384346  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
388765  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
394047  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
399796  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col

382806  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
386417  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
391379  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
398618  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col

436669  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
450205  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
461186  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
481264  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

440358  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
458092  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
475473  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

449851  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
464766  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
475762  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
490776  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col

454294  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
460574  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
462350  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
472146  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
487516  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

770667  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
784690  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col

774841  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
789322  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col

778550  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
792221  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col

783743  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
796554  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
siekmanski:
***** Time table - LoopCount =10 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

516016  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col       ; <<<<<<----
520085  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
521850  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
536956  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col

516032  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col       ; <<<<<<----
518944  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
521454  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
523831  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col

521046  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col       ; <<<<<<----
529275  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
536010  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col

534200  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col       ; <<<<<<----
536133  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
539305  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
564207  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
577716  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col

553189  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col       ; <<<<<<----
556279  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
566778  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col

561847  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col       ; <<<<<<----
561946  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
569664  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
649385  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col

563973  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col       ; <<<<<<----
567599  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
570366  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col

563994  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col       ; <<<<<<----
564973  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
567824  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col

697810  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
703127  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
714643  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
754049  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

701489  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703380  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
760467  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

705732  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
717562  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
750745  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
784521  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col

707653  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
748249  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
759664  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

821192  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
830018  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col

825652  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
835822  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col

830026  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832842  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
835167  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
842879  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col

832190  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
833705  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
842265  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
844842  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
HSE:
***** Time table - LoopCount =10 000 *****

AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

1582930  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1657649  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1679104  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1905353  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1629882  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1654561  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1718243  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1863502  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col

1634399  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1804543  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1876061  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1905205  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col

1675287  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1738273  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1750273  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1815006  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1850935  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col

1681008  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1743494  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1799624  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
1832988  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1872352  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col

1681443  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1718140  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1760193  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
1784885  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1798966  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col

1681677  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1745981  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1746192  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1851496  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1869659  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col

1684656  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1765761  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1771423  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1848994  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1860365  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col

1696919  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1723216  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1752060  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
2014807  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col

1698205  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1708254  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1782817  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1792618  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

1706813  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1789888  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1808388  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
1821494  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1905891  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col

1710365  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1770073  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1821663  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1912748  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col

2025312  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col       ; <<<<<<----
2199452  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2304169  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2315637  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2471007  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col

2027611  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col       ; <<<<<<----
2218100  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2284930  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2335750  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2393212  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col

2130691  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col       ; <<<<<<----
2309517  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2318191  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2349308  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2430160  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col

2208662  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col       ; <<<<<<----
2272197  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2356218  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
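For anyone who wants to regroup their own output the same way, here is a small sketch that does the grouping by script instead of by hand. It is a hypothetical helper, not part of the test package, and it assumes every timing line has the form `<cycles>  cycles, <procedure>, <matrix> <rows>x<cols>, Lin, Col` as printed above. Within each matrix the fastest procedure comes first, and the groups are ordered by their fastest time.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of the SSE30_33 test package.
# Assumed line format (as printed by the test program):
#   "<cycles>  cycles, <procedure>, <matrix> <rows>x<cols>, Lin, Col"
TIMING_LINE = re.compile(r"^\s*(\d+)\s+cycles,\s*(\w+),\s*(\w+)\s+\d+x\d+")


def group_by_matrix(lines):
    """Return [(matrix, [(cycles, procedure), ...]), ...].

    Each group is sorted fastest first, and the groups themselves are
    ordered by their fastest time. Lines that do not match the assumed
    format (headers, blank lines, END markers) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        match = TIMING_LINE.match(line)
        if match:
            cycles = int(match.group(1))
            procedure, matrix = match.group(2), match.group(3)
            groups[matrix].append((cycles, procedure))
    for entries in groups.values():
        entries.sort()  # ascending cycle count: fastest first
    return sorted(groups.items(), key=lambda item: item[1][0][0])
```

Feeding it the lines of one "Time table" (for example `group_by_matrix(open("results.txt"))`, file name hypothetical) gives the same per-matrix ordering as the lists above, and sorting numerically avoids accidentally leaving an entry out of place.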

zedd151

As always, I'm stylishly late.   :P


AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
820058  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
821950  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
823983  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
829394  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
830046  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
832485  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
832895  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
833612  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
834352  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
835091  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
839675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
846346  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
846411  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
848844  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
849311  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
852962  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
855288  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
862895  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
871913  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
874990  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
878483  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
878772  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
879291  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
884764  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
886302  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
887827  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
888715  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
889632  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
890132  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
891733  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
892603  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
898395  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
902633  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
902999  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
921149  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
965255  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1503461  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1508044  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1508102  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1539823  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
1553754  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
1589446  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1612950  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1625738  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1629810  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1631325  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
1632128  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1632552  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1632744  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
1633612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1634083  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1638780  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1643499  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1681922  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1745465  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1753087  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1754633  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759147  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1759762  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1760467  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1761883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1767771  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1773351  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
********** END **********


  :bgrin:

Although the program doesn't list my processor speed, it is 1.60 GHz.

RuiLoureiro

Hi
         Thanks  :t
          I can't add these to the previous set of results because the post would be longer than 20000 characters.

          Here are your results (AMD), grouped by test matrix and sorted by cycles within each group:
zedd151:

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col       ; <<<<<<----
828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
829394  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
830046  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
832485  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col

820058  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
833612  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
839675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
846346  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
849311  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

821950  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
823983  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
832895  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
834352  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
835091  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col

846411  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
848844  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
852962  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
855288  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
965255  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col

862895  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
871913  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
878772  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
879291  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
887827  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col

874990  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col       ; <<<<<<----
884764  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
886302  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
891733  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
892603  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col

878483  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
889632  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
890132  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
898395  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
921149  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col

888715  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
902633  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
902999  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1503461  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1745465  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1754633  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759762  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

1508044  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1631325  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
1632128  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1638780  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col

1508102  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1760467  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1767771  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

1539823  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col       ; <<<<<<----
1625738  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1632552  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1633612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col

1553754  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col       ; <<<<<<----
1612950  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col

1589446  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1629810  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1632744  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
1634083  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1643499  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col

1681922  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1753087  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

1759147  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1761883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1773351  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
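The regrouping above can be automated. Below is a minimal Python sketch that assumes the line format printed by the test programs (`cycles, procedure, matrix NxM, Lin, Col`). It groups lines by test matrix, sorts each group by cycles, orders the groups by their fastest entry, and adds the `; <<<<<<----` marker when MatrixTransposeRE is fastest. `group_results` and `format_groups` are hypothetical helpers. They are not part of the SSE30_33 or SSE46_49 packages.

```python
import re
from collections import defaultdict

# One result line as printed by the test programs, e.g.
# "810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col"
LINE_RE = re.compile(r"^\s*(\d+)\s+cycles,\s*(\w+),\s*(\w+)\s+(\d+x\d+)")


def group_results(text):
    """Group result lines by test matrix.

    Entries in a group are ordered by cycle count, and groups are ordered
    by their fastest entry. Lines that do not match (headers, END) are skipped.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            cycles, proc, matrix, size = m.groups()
            groups[(matrix, size)].append((int(cycles), proc))
    for entries in groups.values():
        entries.sort()  # by cycles, then by procedure name on ties
    return sorted(groups.items(), key=lambda item: item[1][0][0])


def format_groups(groups):
    """Render the groups as text in the thread's layout, flagging groups
    where MatrixTransposeRE is the fastest procedure."""
    out = []
    for (matrix, size), entries in groups:
        for k, (cycles, proc) in enumerate(entries):
            line = f"{cycles}  cycles, {proc + ',':<22}{matrix} {size}, Lin, Col"
            if k == 0 and proc == "MatrixTransposeRE":
                line += "       ; <<<<<<----"
            out.append(line)
        out.append("")  # blank line between groups
    return "\n".join(out)


SAMPLE = """\
1508044  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
1631325  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
********** END **********"""

print(format_groups(group_results(SAMPLE)))
```

Equal cycle counts are ordered by procedure name, so ties such as the two 902953 entries for testMatXA can come out in a different order than they do above.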

RuiLoureiro

#43
Hi all,
        Here is the new version, SSE46_49. You can use it to run the tests or to see my results.
        If you have an i5 / i7 / AMD CPU and want to show me
        your results, please ZIP them and post them. Use only these (or any others you like):

                TestTranspose97_100_cyclesSSE46_49_1000000
                TestTranspose506x512_evenlines_cyclesSSE46_49_1000
                TestTranspose506x512_oddlines_cyclesSSE46_49_1000
Thank you

A small sample of my results:

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

40390  cycles, MatrixTransposeSSE49,  testMatWW 100x100, Lin, Col
40493  cycles, MatrixTransposeSSE48,  testMatWW 100x100, Lin, Col
42654  cycles, MatrixTransposeSSE46,  testMatWW 100x100, Lin, Col
42794  cycles, MatrixTransposeSSE47,  testMatWW 100x100, Lin, Col
43395  cycles, MatrixTransposeRE,     testMatWW 100x100, Lin, Col

47565  cycles, MatrixTransposeSSE46,  testMatYY 98x98, Lin, Col
47702  cycles, MatrixTransposeSSE49,  testMatYY 98x98, Lin, Col
47770  cycles, MatrixTransposeSSE48,  testMatYY 98x98, Lin, Col
47969  cycles, MatrixTransposeSSE47,  testMatYY 98x98, Lin, Col
48718  cycles, MatrixTransposeRE,     testMatYY 98x98, Lin, Col

72327  cycles, MatrixTransposeSSE49,  testMatXX 97x97, Lin, Col
72434  cycles, MatrixTransposeSSE48,  testMatXX 97x97, Lin, Col
72506  cycles, MatrixTransposeSSE46,  testMatXX 97x97, Lin, Col
72532  cycles, MatrixTransposeSSE47,  testMatXX 97x97, Lin, Col
77414  cycles, MatrixTransposeRE,     testMatXX 97x97, Lin, Col

76113  cycles, MatrixTransposeSSE49,  testMatZZ 99x99, Lin, Col
76201  cycles, MatrixTransposeSSE48,  testMatZZ 99x99, Lin, Col
76228  cycles, MatrixTransposeSSE47,  testMatZZ 99x99, Lin, Col
76241  cycles, MatrixTransposeSSE46,  testMatZZ 99x99, Lin, Col
78145  cycles, MatrixTransposeRE,     testMatZZ 99x99, Lin, Col
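A scalar reference transpose is useful for checking that the SSE procedures produce correct output, including for odd sizes such as 97x97 and 99x99. Below is a minimal Python sketch. It assumes a row-major matrix of Lin rows and Col columns stored contiguously, which is presumably what the `Lin, Col` labels refer to. `transpose_flat` and its `block` tile size are hypothetical and are not taken from MatrixTransposeRE or SSE46_49. The sketch copies the matrix in square tiles, which is a common technique. The thread does not say whether the SSE procedures work this way.

```python
def transpose_flat(src, lin, col, block=4):
    """Transpose a row-major lin x col matrix stored in a flat list.

    Returns a new flat list holding the col x lin result, also row-major.
    Elements are copied in block x block tiles; the last tiles are clipped
    with min(), so sizes that are not a multiple of block need no special case.
    """
    if len(src) != lin * col:
        raise ValueError("src must hold exactly lin * col elements")
    dst = [None] * (lin * col)
    for i0 in range(0, lin, block):
        for j0 in range(0, col, block):
            for i in range(i0, min(i0 + block, lin)):
                for j in range(j0, min(j0 + block, col)):
                    # element (i, j) of src becomes element (j, i) of dst
                    dst[j * lin + i] = src[i * col + j]
    return dst


# The 2x3 matrix [1 2 3; 4 5 6] becomes the 3x2 matrix [1 4; 2 5; 3 6].
print(transpose_flat([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
```

To check an SSE result, transpose the same input with this function and compare the two outputs element by element.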

jj2007

Rui, which one of the 16 exes should we test?