Testing Transpose of a Matrix

Started by RuiLoureiro, May 07, 2018, 02:13:55 AM

aw27

So you copied my work like Chinese do, but you mentioned it was based on Siekmanski's. I also used the Siekmanski SSE algo, although my original one was almost as fast. So where does the difference come from? Where is your source code?

Your results are obviously fabricated, even tested on a computer nobody uses anymore.

zedd151



30 cycles, MatrixTransposeAW, transposeMatX

34 cycles, MatrixTransposeMO, transposeMatX

58 cycles, MatrixTransposeAW, transposeMatY

60 cycles, MatrixTransposeMO, transposeMatY

59 cycles, MatrixTransposeAW, transposeMatV

53 cycles, MatrixTransposeMO, transposeMatV

107 cycles, MatrixTransposeAW, transposeMatZ

105 cycles, MatrixTransposeMO, transposeMatZ

124 cycles, MatrixTransposeAW, transposeMatW

116 cycles, MatrixTransposeMO, transposeMatW

132 cycles, MatrixTransposeAW, transposeMatQ

135 cycles, MatrixTransposeMO, transposeMatQ

50 cycles, MatrixTransposeAW, transposeMatR

45 cycles, MatrixTransposeMO, transposeMatR

44 cycles, MatrixTransposeAW, transposeMatS

37 cycles, MatrixTransposeMO, transposeMatS

156 cycles, MatrixTransposeAW, transposeMatT

151 cycles, MatrixTransposeMO, transposeMatT

22 cycles, MatrixTransposeSSE14, transposeMatX

37 cycles, MatrixTransposeSSE14, transposeMatY

41 cycles, MatrixTransposeSSE14, transposeMatV

56 cycles, MatrixTransposeSSE14, transposeMatZ

58 cycles, MatrixTransposeSSE14, transposeMatW

125 cycles, MatrixTransposeSSE14, transposeMatQ

36 cycles, MatrixTransposeSSE14, transposeMatR

33 cycles, MatrixTransposeSSE14, transposeMatS

66 cycles, MatrixTransposeSSE14, transposeMatT

23 cycles, MatrixTransposeSSE15, transposeMatX

41 cycles, MatrixTransposeSSE15, transposeMatY

43 cycles, MatrixTransposeSSE15, transposeMatV

56 cycles, MatrixTransposeSSE15, transposeMatZ

61 cycles, MatrixTransposeSSE15, transposeMatW

123 cycles, MatrixTransposeSSE15, transposeMatQ

37 cycles, MatrixTransposeSSE15, transposeMatR

33 cycles, MatrixTransposeSSE15, transposeMatS

61 cycles, MatrixTransposeSSE15, transposeMatT

25 cycles, MatrixTransposeSSE16, transposeMatX

35 cycles, MatrixTransposeSSE16, transposeMatY

42 cycles, MatrixTransposeSSE16, transposeMatV

64 cycles, MatrixTransposeSSE16, transposeMatZ

68 cycles, MatrixTransposeSSE16, transposeMatW

139 cycles, MatrixTransposeSSE16, transposeMatQ

41 cycles, MatrixTransposeSSE16, transposeMatR

40 cycles, MatrixTransposeSSE16, transposeMatS

72 cycles, MatrixTransposeSSE16, transposeMatT

25 cycles, MatrixTransposeSSE17, transposeMatX

37 cycles, MatrixTransposeSSE17, transposeMatY

40 cycles, MatrixTransposeSSE17, transposeMatV

65 cycles, MatrixTransposeSSE17, transposeMatZ

67 cycles, MatrixTransposeSSE17, transposeMatW

153 cycles, MatrixTransposeSSE17, transposeMatQ

41 cycles, MatrixTransposeSSE17, transposeMatR

38 cycles, MatrixTransposeSSE17, transposeMatS

70 cycles, MatrixTransposeSSE17, transposeMatT

*** STOP. Press any key to show the Time Table ***

***** Time table *****

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

22  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
23  cycles, MatrixTransposeSSE15,  testMatX 4x4, Lin, Col
25  cycles, MatrixTransposeSSE17,  testMatX 4x4, Lin, Col
25  cycles, MatrixTransposeSSE16,  testMatX 4x4, Lin, Col
30  cycles, MatrixTransposeAW,  testMatX 4x4, Lin, Col
33  cycles, MatrixTransposeSSE15,  testMatS 4x8, Lin, Col
33  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
34  cycles, MatrixTransposeMO,  testMatX 4x4
35  cycles, MatrixTransposeSSE16,  testMatY 2x4, Lin, Col
36  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
37  cycles, MatrixTransposeSSE15,  testMatR 8x4, Lin, Col
37  cycles, MatrixTransposeMO,  testMatS 4x8
37  cycles, MatrixTransposeSSE17,  testMatY 2x4, Lin, Col
37  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
38  cycles, MatrixTransposeSSE17,  testMatS 4x8, Lin, Col
40  cycles, MatrixTransposeSSE17,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeSSE16,  testMatS 4x8, Lin, Col
41  cycles, MatrixTransposeSSE16,  testMatR 8x4, Lin, Col
41  cycles, MatrixTransposeSSE15,  testMatY 2x4, Lin, Col
41  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
41  cycles, MatrixTransposeSSE17,  testMatR 8x4, Lin, Col
42  cycles, MatrixTransposeSSE16,  testMatV 4x2, Lin, Col
43  cycles, MatrixTransposeSSE15,  testMatV 4x2, Lin, Col
44  cycles, MatrixTransposeAW,  testMatS 4x8, Lin, Col
45  cycles, MatrixTransposeMO,  testMatR 8x4
50  cycles, MatrixTransposeAW,  testMatR 8x4, Lin, Col
53  cycles, MatrixTransposeMO,  testMatV 4x2
56  cycles, MatrixTransposeSSE15,  testMatZ 7x8, Lin, Col
56  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
58  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
58  cycles, MatrixTransposeAW,  testMatY 2x4, Lin, Col
59  cycles, MatrixTransposeAW,  testMatV 4x2, Lin, Col
60  cycles, MatrixTransposeMO,  testMatY 2x4
61  cycles, MatrixTransposeSSE15,  testMatW 8x7, Lin, Col
61  cycles, MatrixTransposeSSE15,  testMatT 7x7, Lin, Col
64  cycles, MatrixTransposeSSE16,  testMatZ 7x8, Lin, Col
65  cycles, MatrixTransposeSSE17,  testMatZ 7x8, Lin, Col
66  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
67  cycles, MatrixTransposeSSE17,  testMatW 8x7, Lin, Col
68  cycles, MatrixTransposeSSE16,  testMatW 8x7, Lin, Col
70  cycles, MatrixTransposeSSE17,  testMatT 7x7, Lin, Col
72  cycles, MatrixTransposeSSE16,  testMatT 7x7, Lin, Col
105  cycles, MatrixTransposeMO,  testMatZ 7x8
107  cycles, MatrixTransposeAW,  testMatZ 7x8, Lin, Col
116  cycles, MatrixTransposeMO,  testMatW 8x7
123  cycles, MatrixTransposeSSE15,  testMatQ 12x12, Lin, Col
124  cycles, MatrixTransposeAW,  testMatW 8x7, Lin, Col
125  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
132  cycles, MatrixTransposeAW,  testMatQ 12x12, Lin, Col
135  cycles, MatrixTransposeMO,  testMatQ 12x12
139  cycles, MatrixTransposeSSE16,  testMatQ 12x12, Lin, Col
151  cycles, MatrixTransposeMO,  testMatT 7x7
153  cycles, MatrixTransposeSSE17,  testMatQ 12x12, Lin, Col
156  cycles, MatrixTransposeAW,  testMatT 7x7, Lin, Col
********** END **********

RuiLoureiro

Quote from: aw27 on May 15, 2018, 09:33:54 AM
So you copied my work like Chinese do, but you mentioned it was based on Siekmanski's. I also used the Siekmanski SSE algo, although my original one was almost as fast. So where does the difference come from? Where is your source code?

Your results are obviously fabricated, even tested on a computer nobody uses anymore.
Hi aw27,
            Sorry, but you are not right: I don't need to use any part of your algorithm. I don't think the way you do, and I didn't base my algos on what you did. I know enough assembly and math to do what I set out to do. The calculator does matrix transpose ... and everything in it was written by me. All of it. Understanding what was done is a different matter. So it is correct to say that the SSE2 to SSE17 procedures use that block of Siekmanski's code (but not as is). But is it the same one you use? You give the answer. To answer your question "So where does the difference come from?", I would say: think about it again. Consider what you have, what you want to get, and the many different ways to solve that problem.
I am still working on this, so I have a lot of work left to do. That's all.
See you
:icon14:   
Thank you zedd151  :t
By the way, are your results fabricated, zedd151? So you get a lot of money ...  ;)

zedd151

Quote from: RuiLoureiro on May 15, 2018, 10:59:39 AM
By the way, are your results fabricated, zedd151? So you get a lot of money ...  ;)

What??? I was just testing the performance of my new netbook.   :(

nothing fabricated   :icon_confused:

RuiLoureiro

Quote from: zedd151 on May 15, 2018, 11:13:35 AM
Quote from: RuiLoureiro on May 15, 2018, 10:59:39 AM
By the way, are your results fabricated, zedd151? So you get a lot of money ...  ;)

What??? I was just testing the performance of my new netbook.   :(

nothing fabricated   :icon_confused:
:t

RuiLoureiro

Quote from: aw27 on May 15, 2018, 09:33:54 AM
Your results are obviously fabricated,...
Hi aw27,
              About what you say above, I have a question for you: is that what you usually do?
I never do it and I never did it. Not here, not anywhere in the world. I have been on this forum since 2007.
Do you have any doubt?
:icon14:

LiaoMi

TestTranspose_cyclesSSE10_13
34 cycles, MatrixTransposeAW, transposeMatX

34 cycles, MatrixTransposeMO, transposeMatX

40 cycles, MatrixTransposeAW, transposeMatY

44 cycles, MatrixTransposeMO, transposeMatY

42 cycles, MatrixTransposeAW, transposeMatV

43 cycles, MatrixTransposeMO, transposeMatV

85 cycles, MatrixTransposeAW, transposeMatZ

84 cycles, MatrixTransposeMO, transposeMatZ

91 cycles, MatrixTransposeAW, transposeMatW

90 cycles, MatrixTransposeMO, transposeMatW

117 cycles, MatrixTransposeAW, transposeMatQ

119 cycles, MatrixTransposeMO, transposeMatQ

48 cycles, MatrixTransposeAW, transposeMatR

47 cycles, MatrixTransposeMO, transposeMatR

55 cycles, MatrixTransposeAW, transposeMatS

42 cycles, MatrixTransposeMO, transposeMatS

133 cycles, MatrixTransposeAW, transposeMatT

116 cycles, MatrixTransposeMO, transposeMatT

25 cycles, MatrixTransposeSSE10, transposeMatX

35 cycles, MatrixTransposeSSE10, transposeMatY

40 cycles, MatrixTransposeSSE10, transposeMatV

57 cycles, MatrixTransposeSSE10, transposeMatZ

59 cycles, MatrixTransposeSSE10, transposeMatW

104 cycles, MatrixTransposeSSE10, transposeMatQ

36 cycles, MatrixTransposeSSE10, transposeMatR

34 cycles, MatrixTransposeSSE10, transposeMatS

58 cycles, MatrixTransposeSSE10, transposeMatT

25 cycles, MatrixTransposeSSE11, transposeMatX

35 cycles, MatrixTransposeSSE11, transposeMatY

41 cycles, MatrixTransposeSSE11, transposeMatV

55 cycles, MatrixTransposeSSE11, transposeMatZ

59 cycles, MatrixTransposeSSE11, transposeMatW

106 cycles, MatrixTransposeSSE11, transposeMatQ

35 cycles, MatrixTransposeSSE11, transposeMatR

34 cycles, MatrixTransposeSSE11, transposeMatS

58 cycles, MatrixTransposeSSE11, transposeMatT

26 cycles, MatrixTransposeSSE12, transposeMatX

47 cycles, MatrixTransposeSSE12, transposeMatY

60 cycles, MatrixTransposeSSE12, transposeMatV

92 cycles, MatrixTransposeSSE12, transposeMatZ

90 cycles, MatrixTransposeSSE12, transposeMatW

126 cycles, MatrixTransposeSSE12, transposeMatQ

41 cycles, MatrixTransposeSSE12, transposeMatR

45 cycles, MatrixTransposeSSE12, transposeMatS

106 cycles, MatrixTransposeSSE12, transposeMatT

38 cycles, MatrixTransposeSSE13, transposeMatX

64 cycles, MatrixTransposeSSE13, transposeMatY

71 cycles, MatrixTransposeSSE13, transposeMatV

74 cycles, MatrixTransposeSSE13, transposeMatZ

82 cycles, MatrixTransposeSSE13, transposeMatW

147 cycles, MatrixTransposeSSE13, transposeMatQ

57 cycles, MatrixTransposeSSE13, transposeMatR

47 cycles, MatrixTransposeSSE13, transposeMatS

91 cycles, MatrixTransposeSSE13, transposeMatT

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

25  cycles, MatrixTransposeSSE11,  testMatX 4x4
25  cycles, MatrixTransposeSSE10,  testMatX 4x4
26  cycles, MatrixTransposeSSE12,  testMatX 4x4
34  cycles, MatrixTransposeSSE10,  testMatS 4x8
34  cycles, MatrixTransposeSSE11,  testMatS 4x8
34  cycles, MatrixTransposeMO,  testMatX 4x4
34  cycles, MatrixTransposeAW,  testMatX 4x4, Lin, Col
35  cycles, MatrixTransposeSSE11,  testMatR 8x4
35  cycles, MatrixTransposeSSE11,  testMatY 2x4
35  cycles, MatrixTransposeSSE10,  testMatY 2x4
36  cycles, MatrixTransposeSSE10,  testMatR 8x4
38  cycles, MatrixTransposeSSE13,  testMatX 4x4
40  cycles, MatrixTransposeSSE10,  testMatV 4x2
40  cycles, MatrixTransposeAW,  testMatY 2x4, Lin, Col
41  cycles, MatrixTransposeSSE12,  testMatR 8x4
41  cycles, MatrixTransposeSSE11,  testMatV 4x2
42  cycles, MatrixTransposeAW,  testMatV 4x2, Lin, Col
42  cycles, MatrixTransposeMO,  testMatS 4x8
43  cycles, MatrixTransposeMO,  testMatV 4x2
44  cycles, MatrixTransposeMO,  testMatY 2x4
45  cycles, MatrixTransposeSSE12,  testMatS 4x8
47  cycles, MatrixTransposeSSE13,  testMatS 4x8
47  cycles, MatrixTransposeSSE12,  testMatY 2x4
47  cycles, MatrixTransposeMO,  testMatR 8x4
48  cycles, MatrixTransposeAW,  testMatR 8x4, Lin, Col
55  cycles, MatrixTransposeSSE11,  testMatZ 7x8
55  cycles, MatrixTransposeAW,  testMatS 4x8, Lin, Col
57  cycles, MatrixTransposeSSE10,  testMatZ 7x8
57  cycles, MatrixTransposeSSE13,  testMatR 8x4
58  cycles, MatrixTransposeSSE11,  testMatT 7x7
58  cycles, MatrixTransposeSSE10,  testMatT 7x7
59  cycles, MatrixTransposeSSE11,  testMatW 8x7
59  cycles, MatrixTransposeSSE10,  testMatW 8x7
60  cycles, MatrixTransposeSSE12,  testMatV 4x2
64  cycles, MatrixTransposeSSE13,  testMatY 2x4
71  cycles, MatrixTransposeSSE13,  testMatV 4x2
74  cycles, MatrixTransposeSSE13,  testMatZ 7x8
82  cycles, MatrixTransposeSSE13,  testMatW 8x7
84  cycles, MatrixTransposeMO,  testMatZ 7x8
85  cycles, MatrixTransposeAW,  testMatZ 7x8, Lin, Col
90  cycles, MatrixTransposeSSE12,  testMatW 8x7
90  cycles, MatrixTransposeMO,  testMatW 8x7
91  cycles, MatrixTransposeSSE13,  testMatT 7x7
91  cycles, MatrixTransposeAW,  testMatW 8x7, Lin, Col
92  cycles, MatrixTransposeSSE12,  testMatZ 7x8
104  cycles, MatrixTransposeSSE10,  testMatQ 12x12
106  cycles, MatrixTransposeSSE12,  testMatT 7x7
106  cycles, MatrixTransposeSSE11,  testMatQ 12x12
116  cycles, MatrixTransposeMO,  testMatT 7x7
117  cycles, MatrixTransposeAW,  testMatQ 12x12, Lin, Col
119  cycles, MatrixTransposeMO,  testMatQ 12x12
126  cycles, MatrixTransposeSSE12,  testMatQ 12x12
133  cycles, MatrixTransposeAW,  testMatT 7x7, Lin, Col
147  cycles, MatrixTransposeSSE13,  testMatQ 12x12
********** END **********


TestTranspose_cyclesSSE14_17
33 cycles, MatrixTransposeAW, transposeMatX

35 cycles, MatrixTransposeMO, transposeMatX

40 cycles, MatrixTransposeAW, transposeMatY

44 cycles, MatrixTransposeMO, transposeMatY

40 cycles, MatrixTransposeAW, transposeMatV

48 cycles, MatrixTransposeMO, transposeMatV

85 cycles, MatrixTransposeAW, transposeMatZ

84 cycles, MatrixTransposeMO, transposeMatZ

89 cycles, MatrixTransposeAW, transposeMatW

93 cycles, MatrixTransposeMO, transposeMatW

117 cycles, MatrixTransposeAW, transposeMatQ

116 cycles, MatrixTransposeMO, transposeMatQ

45 cycles, MatrixTransposeAW, transposeMatR

48 cycles, MatrixTransposeMO, transposeMatR

45 cycles, MatrixTransposeAW, transposeMatS

41 cycles, MatrixTransposeMO, transposeMatS

115 cycles, MatrixTransposeAW, transposeMatT

115 cycles, MatrixTransposeMO, transposeMatT

20 cycles, MatrixTransposeSSE14, transposeMatX

33 cycles, MatrixTransposeSSE14, transposeMatY

37 cycles, MatrixTransposeSSE14, transposeMatV

51 cycles, MatrixTransposeSSE14, transposeMatZ

54 cycles, MatrixTransposeSSE14, transposeMatW

100 cycles, MatrixTransposeSSE14, transposeMatQ

32 cycles, MatrixTransposeSSE14, transposeMatR

32 cycles, MatrixTransposeSSE14, transposeMatS

55 cycles, MatrixTransposeSSE14, transposeMatT

21 cycles, MatrixTransposeSSE15, transposeMatX

32 cycles, MatrixTransposeSSE15, transposeMatY

37 cycles, MatrixTransposeSSE15, transposeMatV

50 cycles, MatrixTransposeSSE15, transposeMatZ

54 cycles, MatrixTransposeSSE15, transposeMatW

97 cycles, MatrixTransposeSSE15, transposeMatQ

31 cycles, MatrixTransposeSSE15, transposeMatR

31 cycles, MatrixTransposeSSE15, transposeMatS

55 cycles, MatrixTransposeSSE15, transposeMatT

21 cycles, MatrixTransposeSSE16, transposeMatX

30 cycles, MatrixTransposeSSE16, transposeMatY

40 cycles, MatrixTransposeSSE16, transposeMatV

52 cycles, MatrixTransposeSSE16, transposeMatZ

57 cycles, MatrixTransposeSSE16, transposeMatW

106 cycles, MatrixTransposeSSE16, transposeMatQ

33 cycles, MatrixTransposeSSE16, transposeMatR

34 cycles, MatrixTransposeSSE16, transposeMatS

60 cycles, MatrixTransposeSSE16, transposeMatT

21 cycles, MatrixTransposeSSE17, transposeMatX

30 cycles, MatrixTransposeSSE17, transposeMatY

33 cycles, MatrixTransposeSSE17, transposeMatV

52 cycles, MatrixTransposeSSE17, transposeMatZ

57 cycles, MatrixTransposeSSE17, transposeMatW

108 cycles, MatrixTransposeSSE17, transposeMatQ

33 cycles, MatrixTransposeSSE17, transposeMatR

34 cycles, MatrixTransposeSSE17, transposeMatS

57 cycles, MatrixTransposeSSE17, transposeMatT

*** STOP. Press any key to show the Time Table ***

***** Time table *****

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

20  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
21  cycles, MatrixTransposeSSE16,  testMatX 4x4, Lin, Col
21  cycles, MatrixTransposeSSE15,  testMatX 4x4, Lin, Col
21  cycles, MatrixTransposeSSE17,  testMatX 4x4, Lin, Col
30  cycles, MatrixTransposeSSE16,  testMatY 2x4, Lin, Col
30  cycles, MatrixTransposeSSE17,  testMatY 2x4, Lin, Col
31  cycles, MatrixTransposeSSE15,  testMatS 4x8, Lin, Col
31  cycles, MatrixTransposeSSE15,  testMatR 8x4, Lin, Col
32  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
32  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
32  cycles, MatrixTransposeSSE15,  testMatY 2x4, Lin, Col
33  cycles, MatrixTransposeSSE17,  testMatR 8x4, Lin, Col
33  cycles, MatrixTransposeSSE17,  testMatV 4x2, Lin, Col
33  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
33  cycles, MatrixTransposeSSE16,  testMatR 8x4, Lin, Col
33  cycles, MatrixTransposeAW,  testMatX 4x4, Lin, Col
34  cycles, MatrixTransposeSSE17,  testMatS 4x8, Lin, Col
34  cycles, MatrixTransposeSSE16,  testMatS 4x8, Lin, Col
35  cycles, MatrixTransposeMO,  testMatX 4x4
37  cycles, MatrixTransposeSSE15,  testMatV 4x2, Lin, Col
37  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeAW,  testMatY 2x4, Lin, Col
40  cycles, MatrixTransposeSSE16,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeAW,  testMatV 4x2, Lin, Col
41  cycles, MatrixTransposeMO,  testMatS 4x8
44  cycles, MatrixTransposeMO,  testMatY 2x4
45  cycles, MatrixTransposeAW,  testMatR 8x4, Lin, Col
45  cycles, MatrixTransposeAW,  testMatS 4x8, Lin, Col
48  cycles, MatrixTransposeMO,  testMatV 4x2
48  cycles, MatrixTransposeMO,  testMatR 8x4
50  cycles, MatrixTransposeSSE15,  testMatZ 7x8, Lin, Col
51  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
52  cycles, MatrixTransposeSSE16,  testMatZ 7x8, Lin, Col
52  cycles, MatrixTransposeSSE17,  testMatZ 7x8, Lin, Col
54  cycles, MatrixTransposeSSE15,  testMatW 8x7, Lin, Col
54  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
55  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
55  cycles, MatrixTransposeSSE15,  testMatT 7x7, Lin, Col
57  cycles, MatrixTransposeSSE17,  testMatT 7x7, Lin, Col
57  cycles, MatrixTransposeSSE17,  testMatW 8x7, Lin, Col
57  cycles, MatrixTransposeSSE16,  testMatW 8x7, Lin, Col
60  cycles, MatrixTransposeSSE16,  testMatT 7x7, Lin, Col
84  cycles, MatrixTransposeMO,  testMatZ 7x8
85  cycles, MatrixTransposeAW,  testMatZ 7x8, Lin, Col
89  cycles, MatrixTransposeAW,  testMatW 8x7, Lin, Col
93  cycles, MatrixTransposeMO,  testMatW 8x7
97  cycles, MatrixTransposeSSE15,  testMatQ 12x12, Lin, Col
100  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
106  cycles, MatrixTransposeSSE16,  testMatQ 12x12, Lin, Col
108  cycles, MatrixTransposeSSE17,  testMatQ 12x12, Lin, Col
115  cycles, MatrixTransposeMO,  testMatT 7x7
115  cycles, MatrixTransposeAW,  testMatT 7x7, Lin, Col
116  cycles, MatrixTransposeMO,  testMatQ 12x12
117  cycles, MatrixTransposeAW,  testMatQ 12x12, Lin, Col
********** END **********

RuiLoureiro

Hi LiaoMi,
               Thanks for your work  :t
:icon14:

RuiLoureiro

Hi all,

Here are all the results so far.
Note: the AW procedure is used as the reference (I don't know of any other).
          I am comparing against SSE14 and SSE10 here, but you can make the same list for all the others.

Good luck
Quote
RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

85  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
186  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +101 cycles

133  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
233  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +100 cycles

135  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
256  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +121 cycles

178  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
203  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +25 cycles

185  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
219  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +34 cycles

279  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
465  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +186 cycles

280  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
564  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +284 cycles

397  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
473  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +76 cycles

560  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
814  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +254 cycles

zedd151:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

22  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
30  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col  ; +8 cycles

33  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
44  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col  ; +11 cycles

36  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
50  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col  ; +14 cycles

37  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
58  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col  ; +21 cycles

41  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col 
59  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col  ; +18 cycles

56  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
107  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col  ; +51 cycles

58  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
124  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col  ; +66 cycles

66  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
156  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col  ; +90 cycles

125  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
132  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col ; +7 cycles

LiaoMi:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

20  cycles, MatrixTransposeSSE14,  testMatX 4x4, Lin, Col
33  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +13 cycles

32  cycles, MatrixTransposeSSE14,  testMatS 4x8, Lin, Col
45  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +13 cycles

32  cycles, MatrixTransposeSSE14,  testMatR 8x4, Lin, Col
45  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +13 cycles

33  cycles, MatrixTransposeSSE14,  testMatY 2x4, Lin, Col
40  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +7 cycles

37  cycles, MatrixTransposeSSE14,  testMatV 4x2, Lin, Col
40  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +3 cycles

51  cycles, MatrixTransposeSSE14,  testMatZ 7x8, Lin, Col
85  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +34 cycles

54  cycles, MatrixTransposeSSE14,  testMatW 8x7, Lin, Col
89  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +35 cycles

55  cycles, MatrixTransposeSSE14,  testMatT 7x7, Lin, Col
115  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +60 cycles

100  cycles, MatrixTransposeSSE14,  testMatQ 12x12, Lin, Col
117  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +17 cycles

RuiLoureiro:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)

85  cycles, MatrixTransposeSSE10,  testMatX 4x4
187  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +102 cycles

136  cycles, MatrixTransposeSSE10,  testMatS 4x8
245  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +109 cycles

137  cycles, MatrixTransposeSSE10,  testMatR 8x4
257  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +120 cycles

184  cycles, MatrixTransposeSSE10,  testMatY 2x4
204  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +20   cycles

196  cycles, MatrixTransposeSSE10,  testMatV 4x2
219  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +23   cycles

280  cycles, MatrixTransposeSSE10,  testMatW 8x7
477  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +197 cycles

387  cycles, MatrixTransposeSSE10,  testMatZ 7x8
611  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +224 cycles

478  cycles, MatrixTransposeSSE10,  testMatT 7x7
552  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +74 cycles

554  cycles, MatrixTransposeSSE10,  testMatQ 12x12
794  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +240 cycles

LiaoMi:
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

25  cycles, MatrixTransposeSSE10,  testMatX 4x4
34  cycles, MatrixTransposeAW,     testMatX 4x4, Lin, Col      ; +9 cycles

34  cycles, MatrixTransposeSSE10,  testMatS 4x8
55  cycles, MatrixTransposeAW,     testMatS 4x8, Lin, Col      ; +21 cycles

35  cycles, MatrixTransposeSSE10,  testMatY 2x4
40  cycles, MatrixTransposeAW,     testMatY 2x4, Lin, Col      ; +5 cycles

36  cycles, MatrixTransposeSSE10,  testMatR 8x4
48  cycles, MatrixTransposeAW,     testMatR 8x4, Lin, Col      ; +12 cycles

40  cycles, MatrixTransposeSSE10,  testMatV 4x2
42  cycles, MatrixTransposeAW,     testMatV 4x2, Lin, Col      ; +2 cycles

57  cycles, MatrixTransposeSSE10,  testMatZ 7x8
85  cycles, MatrixTransposeAW,     testMatZ 7x8, Lin, Col      ; +28 cycles

58  cycles, MatrixTransposeSSE10,  testMatT 7x7
133  cycles, MatrixTransposeAW,     testMatT 7x7, Lin, Col      ; +75 cycles

59  cycles, MatrixTransposeSSE10,  testMatW 8x7
91  cycles, MatrixTransposeAW,     testMatW 8x7, Lin, Col      ; +32 cycles

104  cycles, MatrixTransposeSSE10,  testMatQ 12x12
117  cycles, MatrixTransposeAW,     testMatQ 12x12, Lin, Col    ; +13 cycles

aw27

Quote
Hi aw27,
About what you say above, I have a question for you: is that what you usually do?
I never do it and I never did it. Not here, not anywhere in the world. I have been on this forum since 2007.
Do you have any doubt?

Of course, I have all doubts about you. You are using my nick and code in tests for which you don't even supply the source code. You look a bit sneaky, so what are you hiding or what are you trying to prove?
Are you trying to prove that you are a smart little guy or do you want to compete with me?

jj2007

Hi Rui,
The last results look really fast :t
Can't test on this dumbphone, though.
And of course, almost everybody here has full confidence in you, don't worry ;-)

LiaoMi

Quote from: aw27 on May 16, 2018, 02:54:47 PM
Quote
Hi aw27,
About what you say above, I have a question for you: is that what you usually do?
I never do it and I never did it. Not here, not anywhere in the world. I have been on this forum since 2007.
Do you have any doubt?

Of course, I have all doubts about you. You are using my nick and code in tests for which you don't even supply the source code. You look a bit sneaky, so what are you hiding or what are you trying to prove?
Are you trying to prove that you are a smart little guy or do you want to compete with me?

:biggrin: https://www.youtube.com/watch?v=9LZ35Ar3r2k

RuiLoureiro

Hi all,
       When we develop a new algorithm to solve a problem, we need to study the problem first. In the first post I showed algorithms that transpose one dword at a time. Now the problem is to transpose blocks of 4 lines x 4 columns at a time using SSE instructions.
       When we write the code that implements the algorithm, we need some tests to confirm that it follows the algorithm correctly. After that we need many more tests to confirm that it does what it should. Here we may find that the algorithm does not handle some cases as it should, so we should not show it before it is completely tested.

       As I see it, it is wrong to use code or a proc written by member X without identifying it. That is the only reason I used the identification; there is no other reason behind it, and nothing here is against anyone. When anyone shows test results, they are only the results of that set of tests. No more than that.

aw27,
«Of course, I have all doubts about you...»
       About what you say above: it seems the only thing you want is to "see" the algorithm that I haven't shown yet. But I suppose you know very well that anyone may solve any problem in several different ways. Your procedure doesn't follow the algorithm behind the SSE procedures that I am writing and testing. I also want to say that I don't need to use what I am writing. I do it to pass the time and because I like to study...

Cheers

Siekmanski

Hi Rui,

Ever considered using the video card for matrix transpose calculations?
The results are very fast for matrices of any size and data type.
One restriction: it can't be done in a console app.
Creative coders use backward thinking techniques as a strategy.

RuiLoureiro

Quote from: Siekmanski on May 19, 2018, 12:23:41 AM
Hi Rui,

Ever considered using the video card for matrix transpose calculations?
The results are very fast for matrices of any size and data type.
One restriction: it can't be done in a console app.
Hi Siekmanski,
              No. I only use matrix transpose calculations in my TheCalculator, but there the elements are REAL10 and the matrices go only up to 20x20, or 20x21, I am not sure now (time is not a problem here).
              I assume you use the video card for matrix transpose calculations yourself, so you have your own algorithm for it. I say this because I read some topics where you gave answers on this subject.
              Thanks for everything  :t
Cheers