### Topic: Testing Transpose of a Matrix

#### Siekmanski

• Member
• Posts: 1684
##### Re: Testing Transpose of a Matrix
« Reply #30 on: May 19, 2018, 01:05:09 AM »

Creative coders use backward thinking techniques as a strategy.

#### dedndave

• Member
• Posts: 8808
• Still using Abacus 2.0
##### Re: Testing Transpose of a Matrix
« Reply #31 on: May 19, 2018, 05:28:07 AM »
create a matrix of pointers to the real10's
transpose the pointer matrix

#### RuiLoureiro

• Member
• Posts: 819
##### Re: Testing Transpose of a Matrix
« Reply #32 on: May 19, 2018, 07:38:35 AM »
> create a matrix of pointers to the real10's
> transpose the pointer matrix
Hi Dave,
How are you? I hope you are well!

Well, it is well known that when we don't want to move many array elements around, we use an array of pointers, i.e. an array of dwords.
Your solution seems expensive, though: every matrix needs its own pointer matrix. If we define 100 matrices, we need to define 200 arrays (100 for the matrices + 100 for the pointers).
It seems so, because if a=[1,2;3,4] and we do a=a^t, where is a now? What are the elements of a?

(In TheCalculator, when we type a matrix name such as "a" and press Enter/Compute, it shows "a".)
TheCalculator uses 16 bytes for each real10, so...

Dave, do you still have the same Pentium 4 CPU? Do you remember why?
Good luck

« Last Edit: May 19, 2018, 09:02:21 AM by RuiLoureiro »

#### dedndave

• Member
• Posts: 8808
• Still using Abacus 2.0
##### Re: Testing Transpose of a Matrix
« Reply #33 on: May 21, 2018, 03:17:42 AM »
hi Rui

#### RuiLoureiro

• Member
• Posts: 819
##### Re: Testing Transpose of a Matrix
« Reply #34 on: June 02, 2018, 06:49:37 AM »
Hi all,
The archive SSE30_33_tests.zip contains my results from a set of tests of the SSE30 to SSE33 procedures, which transpose a matrix of any size from 1x1 up to NxM. You may also run the tests yourself and, if you have an i5 / i7 / AMD CPU or better, post your results. Your contribution may help me understand what I should do next; I still have a very slow P4.
You may run all the tests, but I would especially like to know the results for these:

- TestTranspose97x97_100x100_cyclesSSE30_33_100000
- TestTranspose250x256_evenlines_cyclesSSE30_33_10000

I am developing and testing new algorithms, SSE38_41, but time doesn't stretch... and I am trying to optimize some cases, so I need more time and more tests. For example, I don't know whether push/pop esi is better than a local variable... do you know?

Thanks
Good luck

Note: 1000000/100000/10000 is the loop counter used.
At the end of VerifyProcsFrom1x1_to_120x120_SSE30_33, all 4 procs are tested using matrices from 1x1 up to 120x120
defined in the .data? segment. I wrote a general proc for any size NxM, but it does not work for procedures
that don't use the dimensions stored behind the address.

Sample results for some types of matrices, around 256x256 (RE = REference):
```
***** Time table - LoopCount =10 000 *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

274723  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
277606  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
278301  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
280608  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
288570  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col

276360  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
277126  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
278578  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
289265  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
291922  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col

277665  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
278713  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
301969  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
302056  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
305275  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col

277906  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
280112  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
281857  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
283046  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
300882  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col

278085  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
281408  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
289149  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
303014  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
303088  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col

280843  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
282137  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
282411  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
295377  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
298502  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col

284465  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
289484  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
295322  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
307567  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
307757  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col

287766  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
287891  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
307719  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
307860  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
308480  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col

301073  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
302764  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
303633  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
303680  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
317857  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col

302701  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col
305413  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
308789  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
309537  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
311748  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col

305303  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
309247  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
310500  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
317925  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col
316275  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col

309104  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
309146  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
309344  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
315558  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
320653  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col

314387  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
315688  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
318761  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
321526  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
323225  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col

315387  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
315677  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
316444  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
322256  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
324936  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col

318743  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
322824  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
324874  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
328921  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
340988  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col

319148  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
319973  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
321797  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
321990  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
337617  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col
```
Siekmanski's results:
```
***** Time table - LoopCount =10 000 ****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

169683  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170152  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170215  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
171765  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col

170144  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170190  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170553  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
172547  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
177553  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col

172411  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172939  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
174693  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
175477  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col

173518  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
176945  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col

175354  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175500  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col
176458  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col

176490  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col
176651  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col

178076  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col

179118  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col

181607  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col   ; <<<<<-----
182332  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182494  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
189825  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col

181717  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col
182570  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col

182441  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182963  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
183670  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183767  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col

183673  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col   ; <<<<<-----
184208  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col

184616  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col   ; <<<<<-----
185553  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185616  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185849  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col

185145  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col    ; <<<<<-----
185595  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185650  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col

185163  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
186095  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col

187866  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col
```
« Last Edit: June 02, 2018, 08:12:20 AM by RuiLoureiro »

#### Siekmanski

• Member
• Posts: 1684
##### Re: Testing Transpose of a Matrix
« Reply #35 on: June 02, 2018, 07:22:40 AM »
"TestTranspose97x97_100x100_cyclesSSE30_33_100000"

```
***** Time table - LoopCount =1 000 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

9267  cycles, MatrixTransposeSSE31,  testMatYY 98x98, Lin, Col
9269  cycles, MatrixTransposeSSE30,  testMatYY 98x98, Lin, Col
9610  cycles, MatrixTransposeSSE33,  testMatYY 98x98, Lin, Col
9610  cycles, MatrixTransposeSSE32,  testMatYY 98x98, Lin, Col
10171  cycles, MatrixTransposeRE,     testMatXX 97x97, Lin, Col
10179  cycles, MatrixTransposeSSE31,  testMatXX 97x97, Lin, Col
10240  cycles, MatrixTransposeSSE30,  testMatXX 97x97, Lin, Col
10348  cycles, MatrixTransposeSSE33,  testMatXX 97x97, Lin, Col
10366  cycles, MatrixTransposeSSE32,  testMatXX 97x97, Lin, Col
10519  cycles, MatrixTransposeRE,     testMatYY 98x98, Lin, Col
10674  cycles, MatrixTransposeSSE30,  testMatZZ 99x99, Lin, Col
10684  cycles, MatrixTransposeSSE31,  testMatZZ 99x99, Lin, Col
10714  cycles, MatrixTransposeSSE33,  testMatZZ 99x99, Lin, Col
10717  cycles, MatrixTransposeSSE32,  testMatZZ 99x99, Lin, Col
10999  cycles, MatrixTransposeSSE30,  testMatWW 100x100, Lin, Col
11002  cycles, MatrixTransposeSSE31,  testMatWW 100x100, Lin, Col
11049  cycles, MatrixTransposeSSE32,  testMatWW 100x100, Lin, Col
11052  cycles, MatrixTransposeSSE33,  testMatWW 100x100, Lin, Col
11192  cycles, MatrixTransposeRE,     testMatWW 100x100, Lin, Col
11840  cycles, MatrixTransposeRE,     testMatZZ 99x99, Lin, Col

********** END **********
```
"TestTranspose250x256_evenlines_cyclesSSE30_33_10000"

```
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

169683  cycles, MatrixTransposeSSE30, testMatWA 256x250, Lin, Col
169790  cycles, MatrixTransposeSSE31, testMatWA 256x250, Lin, Col
170144  cycles, MatrixTransposeSSE30, testMatWB 256x252, Lin, Col
170152  cycles, MatrixTransposeSSE32, testMatWA 256x250, Lin, Col
170190  cycles, MatrixTransposeSSE33, testMatWB 256x252, Lin, Col
170215  cycles, MatrixTransposeSSE33, testMatWA 256x250, Lin, Col
170553  cycles, MatrixTransposeSSE32, testMatWB 256x252, Lin, Col
171765  cycles, MatrixTransposeRE,    testMatWA 256x250, Lin, Col
172411  cycles, MatrixTransposeSSE30, testMatWC 256x254, Lin, Col
172547  cycles, MatrixTransposeRE,    testMatWB 256x252, Lin, Col
172939  cycles, MatrixTransposeSSE32, testMatWC 256x254, Lin, Col
173005  cycles, MatrixTransposeSSE33, testMatWC 256x254, Lin, Col
173518  cycles, MatrixTransposeSSE31, testMatWW 256x256, Lin, Col
173646  cycles, MatrixTransposeSSE30, testMatWW 256x256, Lin, Col
173653  cycles, MatrixTransposeSSE33, testMatWW 256x256, Lin, Col
173962  cycles, MatrixTransposeSSE32, testMatWW 256x256, Lin, Col
174693  cycles, MatrixTransposeRE,    testMatWC 256x254, Lin, Col
175354  cycles, MatrixTransposeSSE30, testMatYA 252x250, Lin, Col
175477  cycles, MatrixTransposeSSE31, testMatWC 256x254, Lin, Col
175500  cycles, MatrixTransposeSSE31, testMatYA 252x250, Lin, Col
175655  cycles, MatrixTransposeSSE33, testMatYA 252x250, Lin, Col
176391  cycles, MatrixTransposeRE,    testMatYA 252x250, Lin, Col
176458  cycles, MatrixTransposeSSE32, testMatYA 252x250, Lin, Col
176490  cycles, MatrixTransposeSSE31, testMatYY 252x252, Lin, Col
176542  cycles, MatrixTransposeSSE33, testMatYY 252x252, Lin, Col
176618  cycles, MatrixTransposeSSE30, testMatYY 252x252, Lin, Col
176639  cycles, MatrixTransposeRE,    testMatYY 252x252, Lin, Col
176651  cycles, MatrixTransposeSSE32, testMatYY 252x252, Lin, Col
176945  cycles, MatrixTransposeRE,    testMatWW 256x256, Lin, Col
177553  cycles, MatrixTransposeSSE31, testMatWB 256x252, Lin, Col
178076  cycles, MatrixTransposeSSE30, testMatYB 252x254, Lin, Col
178186  cycles, MatrixTransposeSSE31, testMatYB 252x254, Lin, Col
178206  cycles, MatrixTransposeSSE32, testMatYB 252x254, Lin, Col
178366  cycles, MatrixTransposeSSE33, testMatYB 252x254, Lin, Col
178791  cycles, MatrixTransposeRE,    testMatYB 252x254, Lin, Col
179118  cycles, MatrixTransposeSSE31, testMatYC 252x256, Lin, Col
179137  cycles, MatrixTransposeSSE33, testMatYC 252x256, Lin, Col
179194  cycles, MatrixTransposeSSE32, testMatYC 252x256, Lin, Col
179201  cycles, MatrixTransposeSSE30, testMatYC 252x256, Lin, Col
179252  cycles, MatrixTransposeRE,    testMatYC 252x256, Lin, Col
181607  cycles, MatrixTransposeRE,    testMatXX 250x250, Lin, Col
181717  cycles, MatrixTransposeSSE33, testMatXA 250x252, Lin, Col
181995  cycles, MatrixTransposeRE,    testMatXA 250x252, Lin, Col
182332  cycles, MatrixTransposeSSE31, testMatXX 250x250, Lin, Col
182441  cycles, MatrixTransposeSSE33, testMatZB 254x252, Lin, Col
182494  cycles, MatrixTransposeSSE33, testMatXX 250x250, Lin, Col
182530  cycles, MatrixTransposeSSE32, testMatXX 250x250, Lin, Col
182570  cycles, MatrixTransposeSSE32, testMatXA 250x252, Lin, Col
182609  cycles, MatrixTransposeSSE30, testMatXA 250x252, Lin, Col
182696  cycles, MatrixTransposeSSE31, testMatXA 250x252, Lin, Col
182963  cycles, MatrixTransposeRE,    testMatZB 254x252, Lin, Col
183670  cycles, MatrixTransposeSSE31, testMatZB 254x252, Lin, Col
183673  cycles, MatrixTransposeRE,    testMatZA 254x250, Lin, Col
183767  cycles, MatrixTransposeSSE30, testMatZB 254x252, Lin, Col
183898  cycles, MatrixTransposeSSE32, testMatZB 254x252, Lin, Col
184208  cycles, MatrixTransposeSSE30, testMatZA 254x250, Lin, Col
184249  cycles, MatrixTransposeSSE32, testMatZA 254x250, Lin, Col
184322  cycles, MatrixTransposeSSE31, testMatZA 254x250, Lin, Col
184480  cycles, MatrixTransposeSSE33, testMatZA 254x250, Lin, Col
184616  cycles, MatrixTransposeRE,    testMatXC 250x256, Lin, Col
185145  cycles, MatrixTransposeRE,    testMatXB 250x254, Lin, Col
185163  cycles, MatrixTransposeSSE33, testMatZC 254x256, Lin, Col
185553  cycles, MatrixTransposeSSE30, testMatXC 250x256, Lin, Col
185595  cycles, MatrixTransposeSSE31, testMatXB 250x254, Lin, Col
185601  cycles, MatrixTransposeSSE30, testMatXB 250x254, Lin, Col
185616  cycles, MatrixTransposeSSE32, testMatXC 250x256, Lin, Col
185628  cycles, MatrixTransposeSSE33, testMatXC 250x256, Lin, Col
185650  cycles, MatrixTransposeSSE33, testMatXB 250x254, Lin, Col
185790  cycles, MatrixTransposeSSE32, testMatXB 250x254, Lin, Col
185849  cycles, MatrixTransposeSSE31, testMatXC 250x256, Lin, Col
186095  cycles, MatrixTransposeSSE31, testMatZC 254x256, Lin, Col
186169  cycles, MatrixTransposeSSE30, testMatZC 254x256, Lin, Col
186210  cycles, MatrixTransposeSSE32, testMatZC 254x256, Lin, Col
186934  cycles, MatrixTransposeRE,    testMatZC 254x256, Lin, Col
187866  cycles, MatrixTransposeSSE32, testMatZZ 254x254, Lin, Col
187890  cycles, MatrixTransposeSSE30, testMatZZ 254x254, Lin, Col
187986  cycles, MatrixTransposeSSE31, testMatZZ 254x254, Lin, Col
188069  cycles, MatrixTransposeSSE33, testMatZZ 254x254, Lin, Col
188178  cycles, MatrixTransposeRE,    testMatZZ 254x254, Lin, Col
189825  cycles, MatrixTransposeSSE30, testMatXX 250x250, Lin, Col

********** END **********
```

#### RuiLoureiro

• Member
• Posts: 819
##### Re: Testing Transpose of a Matrix
« Reply #36 on: June 02, 2018, 09:35:17 PM »
Hi,
If you have an i5 / i7 / AMD CPU or better,
would you mind posting the results for this:

TestTranspose506x512_evenlines_cyclesSSE30_33_10000

My results for big matrices, like 512x512:
```
***** Time table - LoopCount =10 000 *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

1207440  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1343612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1344018  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1561033  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
1663892  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col

1228184  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1245623  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1383897  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1399765  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1610377  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col

1300479  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1371443  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1374818  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1377963  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1519329  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col

1329759  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1387799  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1391123  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1969762  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1679268  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

1344344  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1373482  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1418670  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1446757  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1600959  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col

1351207  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1353411  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1375846  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1383705  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1588341  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col

1352139  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1366498  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1381257  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1446800  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1581571  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col

1364639  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1399896  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1401838  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1553876  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1558572  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col

1408078  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1409122  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1416770  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1420956  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1625558  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1411634  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1413983  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1419621  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1419933  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1606143  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col

1443784  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1450609  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1548318  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1717637  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
1974898  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col

1451095  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1460987  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1474059  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1479797  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1601902  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

1461182  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1466117  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1509165  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1565127  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1618744  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

1473912  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1474975  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1594094  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1595387  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
1903505  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col

1480526  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1483428  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1483558  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1632179  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1659401  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col

1494118  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1505676  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1530651  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1542758  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1662796  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
```

#### HSE

• Member
• Posts: 845
• <AMD>< 7-32>
##### Re: Testing Transpose of a Matrix
« Reply #37 on: June 03, 2018, 12:05:44 AM »
Hi Rui!

```
***** Time table - LoopCount =10 000 *****
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

1582930  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1629882  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1634399  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1654561  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1657649  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1675287  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1679104  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1681008  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1681443  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1681677  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1684656  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1696919  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1698205  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1706813  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1708254  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1710365  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1718140  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1718243  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1723216  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1738273  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1743494  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1745981  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1746192  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1750273  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1752060  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
1760193  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
1765761  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1770073  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1771423  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1782817  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1784885  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1789888  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1792618  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
1798966  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1799624  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
1804543  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1808388  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
1815006  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1821494  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1821663  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1832988  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1848994  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1850935  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1851496  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1860365  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1863502  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
1869659  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1872352  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1876061  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1905205  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
1905353  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
1905891  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1912748  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
2014807  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
2025312  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
2027611  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
2130691  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
2199452  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2208662  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
2218100  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2272197  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2284930  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2304169  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2309517  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2315637  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2318191  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2335750  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2349308  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2356218  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
2393212  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
2430160  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
2471007  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
```

#### Siekmanski

• Member
• Posts: 1684
##### Re: Testing Transpose of a Matrix
« Reply #38 on: June 03, 2018, 01:00:25 AM »
TestTranspose506x512_evenlines_cyclesSSE30_33_10000

Code: [Select]
`***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
516016  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
516032  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
518944  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
520085  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
521046  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
521454  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
521850  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
523831  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
529275  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
534200  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
536010  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
536133  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
536956  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
539305  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
553189  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
556279  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
561847  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
561946  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
563973  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
563994  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
564207  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
564973  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
566778  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
567599  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
567824  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
569664  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
570366  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
577716  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
649385  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
697810  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
701489  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703127  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
703380  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
705732  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
707653  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
714643  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
717562  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
748249  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
750745  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
754049  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
759664  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
760467  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
784521  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
821192  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
825652  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
830018  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
830026  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832190  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
832842  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
833705  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
835167  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
835822  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
842265  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
842879  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
844842  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
********** END **********`
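The tables time transposes of rectangular matrices; judging by the names, testMatYC 508x512 has 508 lines and 512 columns. For anyone who wants a portable baseline to compare against, here is a minimal C sketch of an out-of-place transpose: first element by element, then in square tiles (cache blocking, a standard technique, not necessarily what MatrixTransposeRE or SSE30 to SSE33 do; their code is not shown here). The element type `double`, the function names `transpose_naive`/`transpose_blocked`, and the tile size `TILE` are assumptions for illustration only.

```c
#include <stddef.h>

#define TILE 16 /* hypothetical tile edge in elements; not tuned for any CPU */

/* Reference transpose of a row-major rows x cols matrix of doubles.
   Element (r, c) of src lands at (c, r) of dst, so dst is cols x rows. */
void transpose_naive(const double *src, double *dst, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
}

/* Same result, but walks the matrix in TILE x TILE blocks so the strided
   writes into dst stay within a small working set. Edge tiles are clipped,
   so any size (e.g. 506 x 512) is handled. */
void transpose_blocked(const double *src, double *dst, size_t rows, size_t cols)
{
    for (size_t rb = 0; rb < rows; rb += TILE) {
        size_t rend = (rb + TILE < rows) ? rb + TILE : rows;
        for (size_t cb = 0; cb < cols; cb += TILE) {
            size_t cend = (cb + TILE < cols) ? cb + TILE : cols;
            for (size_t r = rb; r < rend; ++r)
                for (size_t c = cb; c < cend; ++c)
                    dst[c * rows + r] = src[r * cols + c];
        }
    }
}
```

Whether tiling pays off has to be measured on each CPU. With power-of-two strides such as 512, strided accesses tend to map onto the same cache sets, which is one commonly cited reason such sizes run slower; that it explains the testMatW results here is only a guess.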

#### mineiro

• Member
• Posts: 450
##### Re: Testing Transpose of a Matrix
« Reply #39 on: June 03, 2018, 01:35:41 AM »
Code: [Select]
`***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)
345439  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
346869  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
347355  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
349661  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
353445  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
355062  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356014  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
356298  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
356396  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
357701  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
359675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
365421  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
366776  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
368150  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
369632  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
373951  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
374200  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
379271  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
379788  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
381998  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
382001  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
382428  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
382806  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
384346  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
386417  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
387250  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
388765  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
389782  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
391379  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
394047  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
398618  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
398927  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
399796  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
415985  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
436669  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
440358  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
449851  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
450205  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
454294  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
458092  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
460574  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
461186  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
462350  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
464766  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
472146  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
475473  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
475762  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
481264  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
487516  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
490776  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
770667  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
774841  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
778550  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
783743  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
784690  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
789322  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
792221  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
796554  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
********** END **********`
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

#### RuiLoureiro

• Member
• Posts: 819
##### Re: Testing Transpose of a Matrix
« Reply #40 on: June 03, 2018, 07:40:10 AM »
Thanks all

These are the results (i7/AMD) sorted by matrix type:
Code: [Select]
`mineiro:
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (SSE4)
345439  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
345581  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
347355  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
356396  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
369632  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
346869  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
347229  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
349661  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
357701  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
368150  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
353445  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
353570  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
356014  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
365421  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
374200  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
355062  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
356298  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
359675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
366776  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
373951  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
379271  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
382001  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
389782  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
390197  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
398927  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
379788  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
382428  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
387250  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
388306  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
415985  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
381998  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
384346  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
388765  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
394047  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
399796  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
382806  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
386417  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
386421  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
391379  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
398618  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
436669  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
450205  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
451526  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
461186  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
481264  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
440358  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
446288  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
447876  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
458092  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
475473  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
449851  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
464766  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
466883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
475762  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
490776  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
454294  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
460574  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
462350  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
472146  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
487516  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
770667  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
771167  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
771268  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
784690  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
787297  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
774841  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
775229  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
775562  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
789322  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
792116  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
778550  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
778918  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
779108  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
792221  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
794690  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
783743  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
784066  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
784178  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
796554  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
797408  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
siekmanski:
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
516016  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col       ; <<<<<<----
520085  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
521001  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
536956  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
521850  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
516032  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col       ; <<<<<<----
518944  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
519159  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
521454  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
523831  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
521046  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col       ; <<<<<<----
529275  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
532225  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
533638  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
536010  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
534200  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col       ; <<<<<<----
536133  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
539305  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
564207  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
577716  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
553189  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col       ; <<<<<<----
556279  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
556556  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
558625  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
566778  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
561847  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col       ; <<<<<<----
561946  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
562192  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
569664  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
649385  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
563973  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col       ; <<<<<<----
567599  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
567628  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
570366  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
570429  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
563994  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col       ; <<<<<<----
564973  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
567824  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
568741  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
569236  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
697810  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
697820  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
703127  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
714643  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
754049  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
701489  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
703018  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
703380  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
703681  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
760467  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
705732  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
707101  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
717562  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
750745  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
784521  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
707653  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
707703  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
711017  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
748249  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
759664  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
821192  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
821220  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
822507  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
824443  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
830018  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
825652  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
826155  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
828776  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
828826  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
835822  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
830026  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
832842  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
833257  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
835167  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
842879  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
832190  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
832262  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
833705  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
842265  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
844842  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
HSE:
***** Time table - LoopCount =10 000 *****
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
1582930  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
1589090  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
1657649  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
1679104  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
1905353  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
1629882  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
1654561  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
1655602  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
1718243  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
1863502  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
1634399  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
1634981  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
1876061  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
1905205  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
1804543  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
1675287  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1738273  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1750273  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1815006  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1850935  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1681008  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
1743494  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
1799624  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
1832988  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
1872352  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
1681443  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
1718140  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
1760193  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
1784885  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
1798966  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
1681677  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1745981  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1746192  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1851496  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1869659  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1684656  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1765761  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1771423  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1848994  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
1860365  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1696919  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
1723216  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
1727195  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
1752060  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
2014807  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
1698205  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
1706127  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
1792618  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
1708254  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
1782817  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
1706813  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
1789888  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
1808388  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
1821494  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
1905891  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1710365  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1770073  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1821663  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1828442  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1912748  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
2025312  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col       ; <<<<<<----
2199452  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
2304169  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
2315637  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
2471007  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
2027611  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col       ; <<<<<<----
2218100  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
2284930  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
2335750  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
2393212  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
2130691  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col       ; <<<<<<----
2309517  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
2318191  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
2349308  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
2430160  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
2208662  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col       ; <<<<<<----
2272197  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
2356218  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
2367015  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
2384230  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col`
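The regrouped tables above put each matrix's five results together. A similar view can be produced automatically: sort all lines by cycle count, then emit each matrix's results as one block, ordering the blocks by each matrix's fastest time. This matches the block order of the mineiro list, although the hand-sorted lists do not always keep the same order inside a block (see the siekmanski testMatYC group). The C sketch below is only an illustration; the `Result` struct and `group_by_matrix` function are hypothetical names, not part of the test program.

```c
#include <stdlib.h>
#include <string.h>

/* One line of the time table (hypothetical layout). */
typedef struct {
    long cycles;
    const char *proc;    /* e.g. "MatrixTransposeSSE31" */
    const char *matrix;  /* e.g. "testMatYA" */
} Result;

/* qsort comparator: ascending cycle count. qsort is not stable, so
   lines with equal cycle counts end up in an unspecified order. */
static int by_cycles(const void *a, const void *b)
{
    long x = ((const Result *)a)->cycles;
    long y = ((const Result *)b)->cycles;
    return (x > y) - (x < y);
}

/* Sorts r[0..n) in place by cycles, then writes to out[0..n) so that all
   results for one matrix are adjacent. Blocks appear in order of each
   matrix's fastest result; inside a block, fastest first. */
void group_by_matrix(Result *r, size_t n, Result *out)
{
    size_t k = 0;
    qsort(r, n, sizeof *r, by_cycles);
    for (size_t i = 0; i < n; ++i) {
        int seen = 0;
        for (size_t j = 0; j < i; ++j)
            if (strcmp(r[j].matrix, r[i].matrix) == 0) { seen = 1; break; }
        if (seen)
            continue;
        /* r[i] is this matrix's fastest line: copy its whole block. */
        for (size_t j = i; j < n; ++j)
            if (strcmp(r[j].matrix, r[i].matrix) == 0)
                out[k++] = r[j];
    }
}
```

The quadratic scan is deliberate: each table has only 80 lines, so simplicity wins over a hash map.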

#### zedd151

• Member
• Posts: 850
##### Re: Testing Transpose of a Matrix
« Reply #41 on: June 03, 2018, 04:10:29 PM »
As always, I'm stylishly late.

Code: [Select]
`AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)
810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col
820058  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
821950  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
823983  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
829394  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
830046  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
832485  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col
832895  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
833612  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
834352  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
835091  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col
839675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
846346  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
846411  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
848844  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
849311  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col
852962  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
855288  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
862895  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
871913  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
874990  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col
878483  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
878772  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
879291  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
884764  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
886302  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
887827  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col
888715  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
889632  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
890132  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
891733  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
892603  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col
898395  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
902633  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
902953  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
902999  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col
921149  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col
965255  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col
1503461  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1508044  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1508102  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1539823  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col
1553754  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col
1589446  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1612950  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col
1625738  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1629810  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1631325  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
1632128  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1632552  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1632744  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
1633612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col
1634083  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1638780  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col
1643499  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col
1681922  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1745465  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1753087  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col
1754633  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759147  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1759762  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col
1760467  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1761883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1767771  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col
1773351  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
********** END **********`

Although my processor speed isn't listed by the program, it is 1.60 GHz.
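As a rough conversion, assuming each reported figure is the cycle count for one transpose call and that the counter ticks at the stated 1.60 GHz, the 1503461-cycle MatrixTransposeSSE31 run on testMatZB (510x508) corresponds to about 0.94 ms of wall time:

$$
t = \frac{1\,503\,461\ \text{cycles}}{1.60 \times 10^{9}\ \text{cycles/s}} \approx 9.40 \times 10^{-4}\ \text{s} \approx 0.94\ \text{ms}
$$

By the same arithmetic, the 890132-cycle MatrixTransposeSSE31 run on testMatXB (506x510) takes about 0.56 ms.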

#### RuiLoureiro

• Member
• Posts: 819
##### Re: Testing Transpose of a Matrix
« Reply #42 on: June 03, 2018, 06:00:52 PM »
Hi,
Thanks!
I could not add this to the previous set because the post would exceed the 20000-character limit.

Here are your results (AMD), sorted by matrix type:
zedd151:
Code: [Select]
```
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

 810005  cycles, MatrixTransposeRE,    testMatYY 508x508, Lin, Col       ; <<<<<<----
 828697  cycles, MatrixTransposeSSE31, testMatYY 508x508, Lin, Col
 829394  cycles, MatrixTransposeSSE32, testMatYY 508x508, Lin, Col
 830046  cycles, MatrixTransposeSSE33, testMatYY 508x508, Lin, Col
 832485  cycles, MatrixTransposeSSE30, testMatYY 508x508, Lin, Col

 820058  cycles, MatrixTransposeSSE30, testMatYC 508x512, Lin, Col
 833612  cycles, MatrixTransposeSSE31, testMatYC 508x512, Lin, Col
 839675  cycles, MatrixTransposeSSE32, testMatYC 508x512, Lin, Col
 846346  cycles, MatrixTransposeSSE33, testMatYC 508x512, Lin, Col
 849311  cycles, MatrixTransposeRE,    testMatYC 508x512, Lin, Col

 821950  cycles, MatrixTransposeSSE30, testMatYA 508x506, Lin, Col
 823983  cycles, MatrixTransposeRE,    testMatYA 508x506, Lin, Col
 832895  cycles, MatrixTransposeSSE33, testMatYA 508x506, Lin, Col
 834352  cycles, MatrixTransposeSSE32, testMatYA 508x506, Lin, Col
 835091  cycles, MatrixTransposeSSE31, testMatYA 508x506, Lin, Col

 846411  cycles, MatrixTransposeSSE30, testMatYB 508x510, Lin, Col
 848844  cycles, MatrixTransposeSSE32, testMatYB 508x510, Lin, Col
 852962  cycles, MatrixTransposeSSE31, testMatYB 508x510, Lin, Col
 855288  cycles, MatrixTransposeRE,    testMatYB 508x510, Lin, Col
 965255  cycles, MatrixTransposeSSE33, testMatYB 508x510, Lin, Col

 862895  cycles, MatrixTransposeSSE33, testMatXC 506x512, Lin, Col
 871913  cycles, MatrixTransposeSSE31, testMatXC 506x512, Lin, Col
 878772  cycles, MatrixTransposeRE,    testMatXC 506x512, Lin, Col
 879291  cycles, MatrixTransposeSSE32, testMatXC 506x512, Lin, Col
 887827  cycles, MatrixTransposeSSE30, testMatXC 506x512, Lin, Col

 874990  cycles, MatrixTransposeRE,    testMatXX 506x506, Lin, Col       ; <<<<<<----
 884764  cycles, MatrixTransposeSSE32, testMatXX 506x506, Lin, Col
 886302  cycles, MatrixTransposeSSE33, testMatXX 506x506, Lin, Col
 891733  cycles, MatrixTransposeSSE30, testMatXX 506x506, Lin, Col
 892603  cycles, MatrixTransposeSSE31, testMatXX 506x506, Lin, Col

 878483  cycles, MatrixTransposeSSE33, testMatXB 506x510, Lin, Col
 889632  cycles, MatrixTransposeRE,    testMatXB 506x510, Lin, Col
 890132  cycles, MatrixTransposeSSE31, testMatXB 506x510, Lin, Col
 898395  cycles, MatrixTransposeSSE30, testMatXB 506x510, Lin, Col
 921149  cycles, MatrixTransposeSSE32, testMatXB 506x510, Lin, Col

 888715  cycles, MatrixTransposeSSE33, testMatXA 506x508, Lin, Col
 902633  cycles, MatrixTransposeSSE30, testMatXA 506x508, Lin, Col
 902953  cycles, MatrixTransposeSSE32, testMatXA 506x508, Lin, Col
 902953  cycles, MatrixTransposeSSE31, testMatXA 506x508, Lin, Col
 902999  cycles, MatrixTransposeRE,    testMatXA 506x508, Lin, Col

1503461  cycles, MatrixTransposeSSE31, testMatZB 510x508, Lin, Col
1745465  cycles, MatrixTransposeSSE32, testMatZB 510x508, Lin, Col
1751541  cycles, MatrixTransposeSSE30, testMatZB 510x508, Lin, Col
1754633  cycles, MatrixTransposeSSE33, testMatZB 510x508, Lin, Col
1759762  cycles, MatrixTransposeRE,    testMatZB 510x508, Lin, Col

1508044  cycles, MatrixTransposeSSE30, testMatWC 512x510, Lin, Col
1631325  cycles, MatrixTransposeRE,    testMatWC 512x510, Lin, Col
1632128  cycles, MatrixTransposeSSE32, testMatWC 512x510, Lin, Col
1638780  cycles, MatrixTransposeSSE33, testMatWC 512x510, Lin, Col
1640885  cycles, MatrixTransposeSSE31, testMatWC 512x510, Lin, Col

1508102  cycles, MatrixTransposeSSE32, testMatZZ 510x510, Lin, Col
1760467  cycles, MatrixTransposeSSE33, testMatZZ 510x510, Lin, Col
1761773  cycles, MatrixTransposeSSE31, testMatZZ 510x510, Lin, Col
1767771  cycles, MatrixTransposeSSE30, testMatZZ 510x510, Lin, Col
1768096  cycles, MatrixTransposeRE,    testMatZZ 510x510, Lin, Col

1539823  cycles, MatrixTransposeRE,    testMatWB 512x508, Lin, Col       ; <<<<<<----
1625738  cycles, MatrixTransposeSSE31, testMatWB 512x508, Lin, Col
1628523  cycles, MatrixTransposeSSE32, testMatWB 512x508, Lin, Col
1632552  cycles, MatrixTransposeSSE33, testMatWB 512x508, Lin, Col
1633612  cycles, MatrixTransposeSSE30, testMatWB 512x508, Lin, Col

1553754  cycles, MatrixTransposeRE,    testMatWA 512x506, Lin, Col       ; <<<<<<----
1612950  cycles, MatrixTransposeSSE32, testMatWA 512x506, Lin, Col
1617458  cycles, MatrixTransposeSSE30, testMatWA 512x506, Lin, Col
1618238  cycles, MatrixTransposeSSE33, testMatWA 512x506, Lin, Col
1619925  cycles, MatrixTransposeSSE31, testMatWA 512x506, Lin, Col

1589446  cycles, MatrixTransposeSSE30, testMatWW 512x512, Lin, Col
1629810  cycles, MatrixTransposeSSE31, testMatWW 512x512, Lin, Col
1632744  cycles, MatrixTransposeRE,    testMatWW 512x512, Lin, Col
1634083  cycles, MatrixTransposeSSE33, testMatWW 512x512, Lin, Col
1643499  cycles, MatrixTransposeSSE32, testMatWW 512x512, Lin, Col

1681922  cycles, MatrixTransposeSSE30, testMatZA 510x506, Lin, Col
1708885  cycles, MatrixTransposeSSE32, testMatZA 510x506, Lin, Col
1709024  cycles, MatrixTransposeSSE33, testMatZA 510x506, Lin, Col
1729089  cycles, MatrixTransposeSSE31, testMatZA 510x506, Lin, Col
1753087  cycles, MatrixTransposeRE,    testMatZA 510x506, Lin, Col

1759147  cycles, MatrixTransposeSSE30, testMatZC 510x512, Lin, Col
1761883  cycles, MatrixTransposeSSE32, testMatZC 510x512, Lin, Col
1763743  cycles, MatrixTransposeSSE33, testMatZC 510x512, Lin, Col
1765680  cycles, MatrixTransposeSSE31, testMatZC 510x512, Lin, Col
1773351  cycles, MatrixTransposeRE,    testMatZC 510x512, Lin, Col
```

#### RuiLoureiro

• Member
• Posts: 819
##### Re: Testing Transpose of a Matrix
« Reply #43 on: June 06, 2018, 08:08:19 AM »
Hi all,
Here is the new version, SSE46_49, so you can test it and see my results.
If you have an i5, i7 or AMD CPU and want to show me your results, please zip them and post them.
Use only these programs (or whichever ones you prefer):

TestTranspose97_100_cyclesSSE46_49_1000000
TestTranspose506x512_evenlines_cyclesSSE46_49_1000
TestTranspose506x512_oddlines_cyclesSSE46_49_1000
Thank you

My little sample:
Code: [Select]
```
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

40390  cycles, MatrixTransposeSSE49,  testMatWW 100x100, Lin, Col
40493  cycles, MatrixTransposeSSE48,  testMatWW 100x100, Lin, Col
42654  cycles, MatrixTransposeSSE46,  testMatWW 100x100, Lin, Col
42794  cycles, MatrixTransposeSSE47,  testMatWW 100x100, Lin, Col
43395  cycles, MatrixTransposeRE,     testMatWW 100x100, Lin, Col

47565  cycles, MatrixTransposeSSE46,  testMatYY 98x98, Lin, Col
47702  cycles, MatrixTransposeSSE49,  testMatYY 98x98, Lin, Col
47770  cycles, MatrixTransposeSSE48,  testMatYY 98x98, Lin, Col
47969  cycles, MatrixTransposeSSE47,  testMatYY 98x98, Lin, Col
48718  cycles, MatrixTransposeRE,     testMatYY 98x98, Lin, Col

72327  cycles, MatrixTransposeSSE49,  testMatXX 97x97, Lin, Col
72434  cycles, MatrixTransposeSSE48,  testMatXX 97x97, Lin, Col
72506  cycles, MatrixTransposeSSE46,  testMatXX 97x97, Lin, Col
72532  cycles, MatrixTransposeSSE47,  testMatXX 97x97, Lin, Col
77414  cycles, MatrixTransposeRE,     testMatXX 97x97, Lin, Col

76113  cycles, MatrixTransposeSSE49,  testMatZZ 99x99, Lin, Col
76201  cycles, MatrixTransposeSSE48,  testMatZZ 99x99, Lin, Col
76228  cycles, MatrixTransposeSSE47,  testMatZZ 99x99, Lin, Col
76241  cycles, MatrixTransposeSSE46,  testMatZZ 99x99, Lin, Col
78145  cycles, MatrixTransposeRE,     testMatZZ 99x99, Lin, Col
```
« Last Edit: June 07, 2018, 02:03:43 AM by RuiLoureiro »

#### jj2007

• Member
• Posts: 8845
• Assembler is fun ;-)
##### Re: Testing Transpose of a Matrix
« Reply #44 on: June 06, 2018, 02:49:25 PM »
Rui, which one of the 16 exes should we test?