Hi all,

I wrote 8 versions of a small procedure to transpose NxN matrices:

...SSE38XA, ...SSE38XB, ...SSE38XC, ..., ...SSE38XI
...SSE38YA, ...SSE38YB, ...SSE38YC, ..., ...SSE38YI

The differences between them are identified in the test results. I want to know which is the best way to save and restore the registers:

push esi, edi - pop edi, esi ?
Or mov LocalVarX, esi + mov LocalVarY, edi - mov esi, LocalVarX + mov edi, LocalVarY ?
Or mov LocalVarX, esi + mov edx, edi - mov esi, LocalVarX + mov edi, edx ?

If you have an i5/i7/AMD CPU, would you mind posting your results, please? Thanks :t
Note: The prog starts to test all cases up to 120x120.
EDIT: The question here is the way we save the registers, not the proc that transposes a matrix.
The proc is only a way to spend time. See reply #9
My sample:
***** Time table - LoopCount =10 000 *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
...
1233385 cycles, MatrixTransposeSSE38YB, TestMatWA 500x500 -LOCAL var+var- ups B
1235908 cycles, MatrixTransposeSSE38XB, TestMatWA 500x500 -push esi+edi- ups B
1282101 cycles, MatrixTransposeSSE38YI, TestMatWA 500x500 -LOCAL var,edx- ups I
1282982 cycles, MatrixTransposeSSE38XI, TestMatWA 500x500 -push esi,edx- ups I
1383583 cycles, MatrixTransposeSSE38XA, TestMatWA 500x500 -push esi+edi- ups A
1395110 cycles, MatrixTransposeSSE38YA, TestMatWA 500x500 -LOCAL var+var- ups A
1448037 cycles, MatrixTransposeSSE38XC, TestMatWA 500x500 -push esi+edi- ups C
1629964 cycles, MatrixTransposeSSE38YC, TestMatWA 500x500 -LOCAL var+var- ups C
1263363 cycles, MatrixTransposeSSE38XB, TestMatWW 512x512 -push esi+edi- ups B
1268120 cycles, MatrixTransposeSSE38YB, TestMatWW 512x512 -LOCAL var+var- ups B
1548259 cycles, MatrixTransposeSSE38YA, TestMatWW 512x512 -LOCAL var+var- ups A
1549101 cycles, MatrixTransposeSSE38XA, TestMatWW 512x512 -push esi+edi- ups A
1566116 cycles, MatrixTransposeSSE38XC, TestMatWW 512x512 -push esi+edi- ups C
1590263 cycles, MatrixTransposeSSE38YC, TestMatWW 512x512 -LOCAL var+var- ups C
1392685 cycles, MatrixTransposeSSE38YI, TestMatWC 512x512 -LOCAL var,edx- ups I
1400638 cycles, MatrixTransposeSSE38XI, TestMatWW 512x512 -push esi,edx- ups I
1284184 cycles, MatrixTransposeSSE38XB, TestMatWB 504x504 -push esi+edi- ups B
1319583 cycles, MatrixTransposeSSE38YI, TestMatWB 504x504 -LOCAL var,edx- ups I
1320214 cycles, MatrixTransposeSSE38YB, TestMatWB 504x504 -LOCAL var+var- ups B
1324752 cycles, MatrixTransposeSSE38XI, TestMatWB 504x504 -push esi,edx- ups I
1416786 cycles, MatrixTransposeSSE38YC, TestMatWB 504x504 -LOCAL var+var- ups C
1428926 cycles, MatrixTransposeSSE38XA, TestMatWB 504x504 -push esi+edi- ups A
1435277 cycles, MatrixTransposeSSE38YA, TestMatWB 504x504 -LOCAL var+var- ups A
1439214 cycles, MatrixTransposeSSE38XC, TestMatWB 504x504 -push esi+edi- ups C
1333058 cycles, MatrixTransposeSSE38XB, TestMatWC 508x508 -push esi+edi- ups B
1337661 cycles, MatrixTransposeSSE38YB, TestMatWC 508x508 -LOCAL var+var- ups B
1354388 cycles, MatrixTransposeSSE38XI, TestMatWC 508x508 -push esi,edx- ups I
1358452 cycles, MatrixTransposeSSE38YI, TestMatWZ 508x508 -LOCAL var,edx- ups I
1459252 cycles, MatrixTransposeSSE38YA, TestMatWC 508x508 -LOCAL var+var- ups A
1468728 cycles, MatrixTransposeSSE38XA, TestMatWC 508x508 -push esi+edi- ups A
1483957 cycles, MatrixTransposeSSE38XC, TestMatWC 508x508 -push esi+edi- ups C
1530930 cycles, MatrixTransposeSSE38YC, TestMatWC 508x508 -LOCAL var+var- ups C
:t
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
26 cycles, MatrixTransposeSSE38XB, TestMatXA 4x4 -push esi+edi- ups B
27 cycles, MatrixTransposeSSE38YA, TestMatXA 4x4 -LOCAL var+var- ups A
27 cycles, MatrixTransposeSSE38XI, TestMatXA 4x4 -push esi,edx- ups I
27 cycles, MatrixTransposeSSE38YI, TestMatXA 4x4 -LOCAL var,edx- ups I
28 cycles, MatrixTransposeSSE38XC, TestMatXA 4x4 -push esi+edi- ups C
28 cycles, MatrixTransposeSSE38YB, TestMatXA 4x4 -LOCAL var+var- ups B
29 cycles, MatrixTransposeSSE38YC, TestMatXA 4x4 -LOCAL var+var- ups C
57 cycles, MatrixTransposeSSE38XB, TestMatXB 8x8 -push esi+edi- ups B
60 cycles, MatrixTransposeSSE38YB, TestMatXB 8x8 -LOCAL var+var- ups B
63 cycles, MatrixTransposeSSE38YA, TestMatXB 8x8 -LOCAL var+var- ups A
65 cycles, MatrixTransposeSSE38YC, TestMatXB 8x8 -LOCAL var+var- ups C
67 cycles, MatrixTransposeSSE38XC, TestMatXB 8x8 -push esi+edi- ups C
69 cycles, MatrixTransposeSSE38YI, TestMatXB 8x8 -LOCAL var,edx- ups I
70 cycles, MatrixTransposeSSE38XI, TestMatXB 8x8 -push esi,edx- ups I
103 cycles, MatrixTransposeSSE38XA, TestMatXA 4x4 -push esi+edi- ups A
120 cycles, MatrixTransposeSSE38XB, TestMatXC 12x12 -push esi+edi- ups B
121 cycles, MatrixTransposeSSE38XI, TestMatXC 12x12 -push esi,edx- ups I
123 cycles, MatrixTransposeSSE38XC, TestMatXC 12x12 -push esi+edi- ups C
124 cycles, MatrixTransposeSSE38YC, TestMatXC 12x12 -LOCAL var+var- ups C
124 cycles, MatrixTransposeSSE38YA, TestMatXC 12x12 -LOCAL var+var- ups A
125 cycles, MatrixTransposeSSE38YB, TestMatXC 12x12 -LOCAL var+var- ups B
156 cycles, MatrixTransposeSSE38YI, TestMatXC 12x12 -LOCAL var,edx- ups I
217 cycles, MatrixTransposeSSE38XA, TestMatXB 8x8 -push esi+edi- ups A
315 cycles, MatrixTransposeSSE38XB, TestMatXD 20x20 -push esi+edi- ups B
337 cycles, MatrixTransposeSSE38YB, TestMatXD 20x20 -LOCAL var+var- ups B
343 cycles, MatrixTransposeSSE38YA, TestMatXD 20x20 -LOCAL var+var- ups A
345 cycles, MatrixTransposeSSE38YC, TestMatXD 20x20 -LOCAL var+var- ups C
347 cycles, MatrixTransposeSSE38XC, TestMatXD 20x20 -push esi+edi- ups C
412 cycles, MatrixTransposeSSE38XA, TestMatXC 12x12 -push esi+edi- ups A
566 cycles, MatrixTransposeSSE38XI, TestMatXD 20x20 -push esi,edx- ups I
573 cycles, MatrixTransposeSSE38YI, TestMatXD 20x20 -LOCAL var,edx- ups I
1163 cycles, MatrixTransposeSSE38XA, TestMatXD 20x20 -push esi+edi- ups A
9056 cycles, MatrixTransposeSSE38XB, TestMatYA 100x100 -push esi+edi- ups B
9156 cycles, MatrixTransposeSSE38YB, TestMatYA 100x100 -LOCAL var+var- ups B
9370 cycles, MatrixTransposeSSE38XI, TestMatYA 100x100 -push esi,edx- ups I
9393 cycles, MatrixTransposeSSE38YI, TestMatYA 100x100 -LOCAL var,edx- ups I
9748 cycles, MatrixTransposeSSE38XC, TestMatYA 100x100 -push esi+edi- ups C
9775 cycles, MatrixTransposeSSE38YA, TestMatYA 100x100 -LOCAL var+var- ups A
9950 cycles, MatrixTransposeSSE38YC, TestMatYA 100x100 -LOCAL var+var- ups C
15266 cycles, MatrixTransposeSSE38YB, TestMatYY 120x120 -LOCAL var+var- ups B
15385 cycles, MatrixTransposeSSE38XB, TestMatYY 120x120 -push esi+edi- ups B
15690 cycles, MatrixTransposeSSE38XI, TestMatYY 120x120 -push esi,edx- ups I
15841 cycles, MatrixTransposeSSE38XB, TestMatYC 132x132 -push esi+edi- ups B
15863 cycles, MatrixTransposeSSE38YB, TestMatYC 132x132 -LOCAL var+var- ups B
15960 cycles, MatrixTransposeSSE38YI, TestMatYY 120x120 -LOCAL var,edx- ups I
16103 cycles, MatrixTransposeSSE38YA, TestMatYY 120x120 -LOCAL var+var- ups A
16118 cycles, MatrixTransposeSSE38YC, TestMatYY 120x120 -LOCAL var+var- ups C
16134 cycles, MatrixTransposeSSE38XC, TestMatYY 120x120 -push esi+edi- ups C
16416 cycles, MatrixTransposeSSE38XA, TestMatYY 120x120 -push esi+edi- ups A
16883 cycles, MatrixTransposeSSE38XA, TestMatYA 100x100 -push esi+edi- ups A
16941 cycles, MatrixTransposeSSE38XA, TestMatYC 132x132 -push esi+edi- ups A
16979 cycles, MatrixTransposeSSE38XC, TestMatYC 132x132 -push esi+edi- ups C
17038 cycles, MatrixTransposeSSE38YA, TestMatYC 132x132 -LOCAL var+var- ups A
17368 cycles, MatrixTransposeSSE38XI, TestMatYC 132x132 -push esi,edx- ups I
17421 cycles, MatrixTransposeSSE38YI, TestMatYC 132x132 -LOCAL var,edx- ups I
17567 cycles, MatrixTransposeSSE38YC, TestMatYC 132x132 -LOCAL var+var- ups C
24206 cycles, MatrixTransposeSSE38YB, TestMatYB 128x128 -LOCAL var+var- ups B
24368 cycles, MatrixTransposeSSE38XB, TestMatYB 128x128 -push esi+edi- ups B
24619 cycles, MatrixTransposeSSE38XI, TestMatYB 128x128 -push esi,edx- ups I
24639 cycles, MatrixTransposeSSE38YI, TestMatYB 128x128 -LOCAL var,edx- ups I
25893 cycles, MatrixTransposeSSE38XA, TestMatYB 128x128 -push esi+edi- ups A
26104 cycles, MatrixTransposeSSE38XC, TestMatYB 128x128 -push esi+edi- ups C
26120 cycles, MatrixTransposeSSE38YA, TestMatYB 128x128 -LOCAL var+var- ups A
26638 cycles, MatrixTransposeSSE38YC, TestMatYB 128x128 -LOCAL var+var- ups C
118597 cycles, MatrixTransposeSSE38XI, TestMatZB 260x260 -push esi,edx- ups I
118983 cycles, MatrixTransposeSSE38YI, TestMatZB 260x260 -LOCAL var,edx- ups I
119003 cycles, MatrixTransposeSSE38YB, TestMatZB 260x260 -LOCAL var+var- ups B
119393 cycles, MatrixTransposeSSE38XB, TestMatZB 260x260 -push esi+edi- ups B
122371 cycles, MatrixTransposeSSE38YC, TestMatZB 260x260 -LOCAL var+var- ups C
122542 cycles, MatrixTransposeSSE38XC, TestMatZB 260x260 -push esi+edi- ups C
122872 cycles, MatrixTransposeSSE38YA, TestMatZB 260x260 -LOCAL var+var- ups A
123103 cycles, MatrixTransposeSSE38XA, TestMatZB 260x260 -push esi+edi- ups A
126953 cycles, MatrixTransposeSSE38YI, TestMatZC 268x268 -LOCAL var,edx- ups I
126974 cycles, MatrixTransposeSSE38XB, TestMatZC 268x268 -push esi+edi- ups B
126977 cycles, MatrixTransposeSSE38XI, TestMatZC 268x268 -push esi,edx- ups I
127359 cycles, MatrixTransposeSSE38YB, TestMatZC 268x268 -LOCAL var+var- ups B
130655 cycles, MatrixTransposeSSE38XB, TestMatZZ 264x264 -push esi+edi- ups B
130727 cycles, MatrixTransposeSSE38YB, TestMatZZ 264x264 -LOCAL var+var- ups B
131699 cycles, MatrixTransposeSSE38XI, TestMatZZ 264x264 -push esi,edx- ups I
131864 cycles, MatrixTransposeSSE38YC, TestMatZC 268x268 -LOCAL var+var- ups C
131980 cycles, MatrixTransposeSSE38YI, TestMatZZ 264x264 -LOCAL var,edx- ups I
132072 cycles, MatrixTransposeSSE38XC, TestMatZC 268x268 -push esi+edi- ups C
132254 cycles, MatrixTransposeSSE38YA, TestMatZC 268x268 -LOCAL var+var- ups A
132316 cycles, MatrixTransposeSSE38XA, TestMatZC 268x268 -push esi+edi- ups A
134342 cycles, MatrixTransposeSSE38XA, TestMatZZ 264x264 -push esi+edi- ups A
134931 cycles, MatrixTransposeSSE38XC, TestMatZZ 264x264 -push esi+edi- ups C
134959 cycles, MatrixTransposeSSE38YC, TestMatZZ 264x264 -LOCAL var+var- ups C
135762 cycles, MatrixTransposeSSE38YA, TestMatZZ 264x264 -LOCAL var+var- ups A
140127 cycles, MatrixTransposeSSE38YB, TestMatZA 256x256 -LOCAL var+var- ups B
140296 cycles, MatrixTransposeSSE38XB, TestMatZA 256x256 -push esi+edi- ups B
141620 cycles, MatrixTransposeSSE38YI, TestMatZA 256x256 -LOCAL var,edx- ups I
141762 cycles, MatrixTransposeSSE38XI, TestMatZA 256x256 -push esi,edx- ups I
169636 cycles, MatrixTransposeSSE38XC, TestMatZA 256x256 -push esi+edi- ups C
169722 cycles, MatrixTransposeSSE38YA, TestMatZA 256x256 -LOCAL var+var- ups A
169767 cycles, MatrixTransposeSSE38YC, TestMatZA 256x256 -LOCAL var+var- ups C
170001 cycles, MatrixTransposeSSE38XA, TestMatZA 256x256 -push esi+edi- ups A
378139 cycles, MatrixTransposeSSE38XB, TestMatWA 500x500 -push esi+edi- ups B
381761 cycles, MatrixTransposeSSE38YI, TestMatWA 500x500 -LOCAL var,edx- ups I
384240 cycles, MatrixTransposeSSE38YB, TestMatWA 500x500 -LOCAL var+var- ups B
386579 cycles, MatrixTransposeSSE38XB, TestMatWB 504x504 -push esi+edi- ups B
387787 cycles, MatrixTransposeSSE38YB, TestMatWB 504x504 -LOCAL var+var- ups B
389193 cycles, MatrixTransposeSSE38XI, TestMatWA 500x500 -push esi,edx- ups I
389826 cycles, MatrixTransposeSSE38YI, TestMatWB 504x504 -LOCAL var,edx- ups I
390552 cycles, MatrixTransposeSSE38XI, TestMatWB 504x504 -push esi,edx- ups I
393505 cycles, MatrixTransposeSSE38YB, TestMatWC 508x508 -LOCAL var+var- ups B
393692 cycles, MatrixTransposeSSE38XB, TestMatWC 508x508 -push esi+edi- ups B
397996 cycles, MatrixTransposeSSE38YI, TestMatWZ 508x508 -LOCAL var,edx- ups I
399456 cycles, MatrixTransposeSSE38XI, TestMatWC 508x508 -push esi,edx- ups I
480503 cycles, MatrixTransposeSSE38YC, TestMatWB 504x504 -LOCAL var+var- ups C
480817 cycles, MatrixTransposeSSE38YA, TestMatWB 504x504 -LOCAL var+var- ups A
480847 cycles, MatrixTransposeSSE38XC, TestMatWB 504x504 -push esi+edi- ups C
480883 cycles, MatrixTransposeSSE38XA, TestMatWB 504x504 -push esi+edi- ups A
513839 cycles, MatrixTransposeSSE38XA, TestMatWA 500x500 -push esi+edi- ups A
513988 cycles, MatrixTransposeSSE38YA, TestMatWA 500x500 -LOCAL var+var- ups A
514119 cycles, MatrixTransposeSSE38YC, TestMatWA 500x500 -LOCAL var+var- ups C
514560 cycles, MatrixTransposeSSE38XC, TestMatWA 500x500 -push esi+edi- ups C
608451 cycles, MatrixTransposeSSE38YC, TestMatWC 508x508 -LOCAL var+var- ups C
609226 cycles, MatrixTransposeSSE38XC, TestMatWC 508x508 -push esi+edi- ups C
610219 cycles, MatrixTransposeSSE38YA, TestMatWC 508x508 -LOCAL var+var- ups A
612253 cycles, MatrixTransposeSSE38XA, TestMatWC 508x508 -push esi+edi- ups A
673824 cycles, MatrixTransposeSSE38XB, TestMatWW 512x512 -push esi+edi- ups B
677774 cycles, MatrixTransposeSSE38YB, TestMatWW 512x512 -LOCAL var+var- ups B
686959 cycles, MatrixTransposeSSE38XI, TestMatWW 512x512 -push esi,edx- ups I
687090 cycles, MatrixTransposeSSE38YI, TestMatWC 512x512 -LOCAL var,edx- ups I
929165 cycles, MatrixTransposeSSE38YC, TestMatWW 512x512 -LOCAL var+var- ups C
930455 cycles, MatrixTransposeSSE38XA, TestMatWW 512x512 -push esi+edi- ups A
931261 cycles, MatrixTransposeSSE38XC, TestMatWW 512x512 -push esi+edi- ups C
943437 cycles, MatrixTransposeSSE38YA, TestMatWW 512x512 -LOCAL var+var- ups A
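To put the push/pop versus LOCAL question in numbers, here is a minimal Python sketch that computes the relative gap between the XB (push esi+edi) and YB (LOCAL var+var) lines of the i5-2450M table above. The cycle counts are copied from that table; the dictionary layout and the helper name `relative_gap` are only illustrative assumptions.

```python
# Cycle counts copied from the i5-2450M table above (XB = push/pop, YB = LOCAL).
results = {
    "500x500": {"push": 378139, "local": 384240},
    "504x504": {"push": 386579, "local": 387787},
    "508x508": {"push": 393692, "local": 393505},
    "512x512": {"push": 673824, "local": 677774},
}


def relative_gap(push_cycles, local_cycles):
    """Percentage by which the LOCAL variant is slower (positive)
    or faster (negative) than the push/pop variant."""
    return 100.0 * (local_cycles - push_cycles) / push_cycles


gaps = {size: relative_gap(r["push"], r["local"]) for size, r in results.items()}

for size, gap in gaps.items():
    print(f"{size}: LOCAL vs push/pop = {gap:+.2f}%")
```

For these four sizes the gap stays below 2% in either direction, which is small next to the difference between the A/C and B/I groups in the same table.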
Okay, I'll bite. :P
Results are in the attached zip file.
CPU speed: 1.60 GHz
What I find odd is that when doing cycle counts, the counts should remain very similar. My CPU speed is a little less than half that of the average CPU, e.g. ~3.40 GHz. I know that wall-clock timings would be doubled or worse for me, but cycle counts shouldn't be affected by CPU speed.
Of course I could be mistaken about the reason; it could be a cache size issue... :icon_confused:
:t
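The distinction between cycle counts and elapsed time made above can be sketched in a few lines of Python. This is only an illustration: it assumes the cycle counter ticks at the nominal clock rate, which the test program does not report, and the function name `cycles_to_microseconds` is made up for the example.

```python
def cycles_to_microseconds(cycles, clock_ghz):
    """Convert a cycle count to microseconds.

    Assumes the counter ticks at the nominal clock rate `clock_ghz`
    (an assumption for this sketch, not something the test reports).
    """
    return cycles / (clock_ghz * 1000.0)


# 512x512, MatrixTransposeSSE38XB, taken from the i5-2450M results.
cycles = 673824

t_fast = cycles_to_microseconds(cycles, 2.50)  # nominal 2.50 GHz
t_slow = cycles_to_microseconds(cycles, 1.60)  # same cycle count at 1.60 GHz

print(f"{cycles} cycles at 2.50 GHz: {t_fast:.1f} us")
print(f"{cycles} cycles at 1.60 GHz: {t_slow:.1f} us")
print(f"time ratio: {t_slow / t_fast:.4f}")
```

With an identical cycle count the slower clock only stretches the elapsed time by the ratio of the two frequencies. A difference in the cycle counts themselves has to come from something else, such as the cache.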
Because this information is only useful (IMO) if we see it sorted by matrix type, here are the results (only up to 20000 characters). (hutch, my apologies for taking up this space, if that is a problem.)
Jochen:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
26 cycles, MatrixTransposeSSE38XB, TestMatXA 4x4 -push esi+edi- ups B
27 cycles, MatrixTransposeSSE38YA, TestMatXA 4x4 -LOCAL var+var- ups A
27 cycles, MatrixTransposeSSE38XI, TestMatXA 4x4 -push esi,edx- ups I
27 cycles, MatrixTransposeSSE38YI, TestMatXA 4x4 -LOCAL var,edx- ups I
28 cycles, MatrixTransposeSSE38XC, TestMatXA 4x4 -push esi+edi- ups C
28 cycles, MatrixTransposeSSE38YB, TestMatXA 4x4 -LOCAL var+var- ups B
29 cycles, MatrixTransposeSSE38YC, TestMatXA 4x4 -LOCAL var+var- ups C
103 cycles, MatrixTransposeSSE38XA, TestMatXA 4x4 -push esi+edi- ups A
57 cycles, MatrixTransposeSSE38XB, TestMatXB 8x8 -push esi+edi- ups B
60 cycles, MatrixTransposeSSE38YB, TestMatXB 8x8 -LOCAL var+var- ups B
63 cycles, MatrixTransposeSSE38YA, TestMatXB 8x8 -LOCAL var+var- ups A
65 cycles, MatrixTransposeSSE38YC, TestMatXB 8x8 -LOCAL var+var- ups C
67 cycles, MatrixTransposeSSE38XC, TestMatXB 8x8 -push esi+edi- ups C
69 cycles, MatrixTransposeSSE38YI, TestMatXB 8x8 -LOCAL var,edx- ups I
70 cycles, MatrixTransposeSSE38XI, TestMatXB 8x8 -push esi,edx- ups I
217 cycles, MatrixTransposeSSE38XA, TestMatXB 8x8 -push esi+edi- ups A
120 cycles, MatrixTransposeSSE38XB, TestMatXC 12x12 -push esi+edi- ups B
121 cycles, MatrixTransposeSSE38XI, TestMatXC 12x12 -push esi,edx- ups I
123 cycles, MatrixTransposeSSE38XC, TestMatXC 12x12 -push esi+edi- ups C
124 cycles, MatrixTransposeSSE38YC, TestMatXC 12x12 -LOCAL var+var- ups C
124 cycles, MatrixTransposeSSE38YA, TestMatXC 12x12 -LOCAL var+var- ups A
125 cycles, MatrixTransposeSSE38YB, TestMatXC 12x12 -LOCAL var+var- ups B
156 cycles, MatrixTransposeSSE38YI, TestMatXC 12x12 -LOCAL var,edx- ups I
412 cycles, MatrixTransposeSSE38XA, TestMatXC 12x12 -push esi+edi- ups A
315 cycles, MatrixTransposeSSE38XB, TestMatXD 20x20 -push esi+edi- ups B
337 cycles, MatrixTransposeSSE38YB, TestMatXD 20x20 -LOCAL var+var- ups B
343 cycles, MatrixTransposeSSE38YA, TestMatXD 20x20 -LOCAL var+var- ups A
345 cycles, MatrixTransposeSSE38YC, TestMatXD 20x20 -LOCAL var+var- ups C
347 cycles, MatrixTransposeSSE38XC, TestMatXD 20x20 -push esi+edi- ups C
566 cycles, MatrixTransposeSSE38XI, TestMatXD 20x20 -push esi,edx- ups I
573 cycles, MatrixTransposeSSE38YI, TestMatXD 20x20 -LOCAL var,edx- ups I
1163 cycles, MatrixTransposeSSE38XA, TestMatXD 20x20 -push esi+edi- ups A
9056 cycles, MatrixTransposeSSE38XB, TestMatYA 100x100 -push esi+edi- ups B
9156 cycles, MatrixTransposeSSE38YB, TestMatYA 100x100 -LOCAL var+var- ups B
9370 cycles, MatrixTransposeSSE38XI, TestMatYA 100x100 -push esi,edx- ups I
9393 cycles, MatrixTransposeSSE38YI, TestMatYA 100x100 -LOCAL var,edx- ups I
9748 cycles, MatrixTransposeSSE38XC, TestMatYA 100x100 -push esi+edi- ups C
9775 cycles, MatrixTransposeSSE38YA, TestMatYA 100x100 -LOCAL var+var- ups A
9950 cycles, MatrixTransposeSSE38YC, TestMatYA 100x100 -LOCAL var+var- ups C
16883 cycles, MatrixTransposeSSE38XA, TestMatYA 100x100 -push esi+edi- ups A
15266 cycles, MatrixTransposeSSE38YB, TestMatYY 120x120 -LOCAL var+var- ups B
15385 cycles, MatrixTransposeSSE38XB, TestMatYY 120x120 -push esi+edi- ups B
15690 cycles, MatrixTransposeSSE38XI, TestMatYY 120x120 -push esi,edx- ups I
15960 cycles, MatrixTransposeSSE38YI, TestMatYY 120x120 -LOCAL var,edx- ups I
16103 cycles, MatrixTransposeSSE38YA, TestMatYY 120x120 -LOCAL var+var- ups A
16118 cycles, MatrixTransposeSSE38YC, TestMatYY 120x120 -LOCAL var+var- ups C
16134 cycles, MatrixTransposeSSE38XC, TestMatYY 120x120 -push esi+edi- ups C
16416 cycles, MatrixTransposeSSE38XA, TestMatYY 120x120 -push esi+edi- ups A
15841 cycles, MatrixTransposeSSE38XB, TestMatYC 132x132 -push esi+edi- ups B
15863 cycles, MatrixTransposeSSE38YB, TestMatYC 132x132 -LOCAL var+var- ups B
16941 cycles, MatrixTransposeSSE38XA, TestMatYC 132x132 -push esi+edi- ups A
16979 cycles, MatrixTransposeSSE38XC, TestMatYC 132x132 -push esi+edi- ups C
17038 cycles, MatrixTransposeSSE38YA, TestMatYC 132x132 -LOCAL var+var- ups A
17368 cycles, MatrixTransposeSSE38XI, TestMatYC 132x132 -push esi,edx- ups I
17421 cycles, MatrixTransposeSSE38YI, TestMatYC 132x132 -LOCAL var,edx- ups I
17567 cycles, MatrixTransposeSSE38YC, TestMatYC 132x132 -LOCAL var+var- ups C
24206 cycles, MatrixTransposeSSE38YB, TestMatYB 128x128 -LOCAL var+var- ups B
24368 cycles, MatrixTransposeSSE38XB, TestMatYB 128x128 -push esi+edi- ups B
24619 cycles, MatrixTransposeSSE38XI, TestMatYB 128x128 -push esi,edx- ups I
24639 cycles, MatrixTransposeSSE38YI, TestMatYB 128x128 -LOCAL var,edx- ups I
25893 cycles, MatrixTransposeSSE38XA, TestMatYB 128x128 -push esi+edi- ups A
26104 cycles, MatrixTransposeSSE38XC, TestMatYB 128x128 -push esi+edi- ups C
26120 cycles, MatrixTransposeSSE38YA, TestMatYB 128x128 -LOCAL var+var- ups A
26638 cycles, MatrixTransposeSSE38YC, TestMatYB 128x128 -LOCAL var+var- ups C
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
118597 cycles, MatrixTransposeSSE38XI, TestMatZB 260x260 -push esi,edx- ups I
118983 cycles, MatrixTransposeSSE38YI, TestMatZB 260x260 -LOCAL var,edx- ups I
119003 cycles, MatrixTransposeSSE38YB, TestMatZB 260x260 -LOCAL var+var- ups B
119393 cycles, MatrixTransposeSSE38XB, TestMatZB 260x260 -push esi+edi- ups B
122371 cycles, MatrixTransposeSSE38YC, TestMatZB 260x260 -LOCAL var+var- ups C
122542 cycles, MatrixTransposeSSE38XC, TestMatZB 260x260 -push esi+edi- ups C
122872 cycles, MatrixTransposeSSE38YA, TestMatZB 260x260 -LOCAL var+var- ups A
123103 cycles, MatrixTransposeSSE38XA, TestMatZB 260x260 -push esi+edi- ups A
126953 cycles, MatrixTransposeSSE38YI, TestMatZC 268x268 -LOCAL var,edx- ups I
126974 cycles, MatrixTransposeSSE38XB, TestMatZC 268x268 -push esi+edi- ups B
126977 cycles, MatrixTransposeSSE38XI, TestMatZC 268x268 -push esi,edx- ups I
127359 cycles, MatrixTransposeSSE38YB, TestMatZC 268x268 -LOCAL var+var- ups B
131864 cycles, MatrixTransposeSSE38YC, TestMatZC 268x268 -LOCAL var+var- ups C
132072 cycles, MatrixTransposeSSE38XC, TestMatZC 268x268 -push esi+edi- ups C
132254 cycles, MatrixTransposeSSE38YA, TestMatZC 268x268 -LOCAL var+var- ups A
132316 cycles, MatrixTransposeSSE38XA, TestMatZC 268x268 -push esi+edi- ups A
130655 cycles, MatrixTransposeSSE38XB, TestMatZZ 264x264 -push esi+edi- ups B
130727 cycles, MatrixTransposeSSE38YB, TestMatZZ 264x264 -LOCAL var+var- ups B
131699 cycles, MatrixTransposeSSE38XI, TestMatZZ 264x264 -push esi,edx- ups I
131980 cycles, MatrixTransposeSSE38YI, TestMatZZ 264x264 -LOCAL var,edx- ups I
134342 cycles, MatrixTransposeSSE38XA, TestMatZZ 264x264 -push esi+edi- ups A
134931 cycles, MatrixTransposeSSE38XC, TestMatZZ 264x264 -push esi+edi- ups C
134959 cycles, MatrixTransposeSSE38YC, TestMatZZ 264x264 -LOCAL var+var- ups C
135762 cycles, MatrixTransposeSSE38YA, TestMatZZ 264x264 -LOCAL var+var- ups A
140127 cycles, MatrixTransposeSSE38YB, TestMatZA 256x256 -LOCAL var+var- ups B
140296 cycles, MatrixTransposeSSE38XB, TestMatZA 256x256 -push esi+edi- ups B
141620 cycles, MatrixTransposeSSE38YI, TestMatZA 256x256 -LOCAL var,edx- ups I
141762 cycles, MatrixTransposeSSE38XI, TestMatZA 256x256 -push esi,edx- ups I
169636 cycles, MatrixTransposeSSE38XC, TestMatZA 256x256 -push esi+edi- ups C
169722 cycles, MatrixTransposeSSE38YA, TestMatZA 256x256 -LOCAL var+var- ups A
169767 cycles, MatrixTransposeSSE38YC, TestMatZA 256x256 -LOCAL var+var- ups C
170001 cycles, MatrixTransposeSSE38XA, TestMatZA 256x256 -push esi+edi- ups A
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
378139 cycles, MatrixTransposeSSE38XB, TestMatWA 500x500 -push esi+edi- ups B
381761 cycles, MatrixTransposeSSE38YI, TestMatWA 500x500 -LOCAL var,edx- ups I
384240 cycles, MatrixTransposeSSE38YB, TestMatWA 500x500 -LOCAL var+var- ups B
389193 cycles, MatrixTransposeSSE38XI, TestMatWA 500x500 -push esi,edx- ups I
513839 cycles, MatrixTransposeSSE38XA, TestMatWA 500x500 -push esi+edi- ups A
513988 cycles, MatrixTransposeSSE38YA, TestMatWA 500x500 -LOCAL var+var- ups A
514119 cycles, MatrixTransposeSSE38YC, TestMatWA 500x500 -LOCAL var+var- ups C
514560 cycles, MatrixTransposeSSE38XC, TestMatWA 500x500 -push esi+edi- ups C
386579 cycles, MatrixTransposeSSE38XB, TestMatWB 504x504 -push esi+edi- ups B
387787 cycles, MatrixTransposeSSE38YB, TestMatWB 504x504 -LOCAL var+var- ups B
389826 cycles, MatrixTransposeSSE38YI, TestMatWB 504x504 -LOCAL var,edx- ups I
390552 cycles, MatrixTransposeSSE38XI, TestMatWB 504x504 -push esi,edx- ups I
480503 cycles, MatrixTransposeSSE38YC, TestMatWB 504x504 -LOCAL var+var- ups C
480817 cycles, MatrixTransposeSSE38YA, TestMatWB 504x504 -LOCAL var+var- ups A
480847 cycles, MatrixTransposeSSE38XC, TestMatWB 504x504 -push esi+edi- ups C
480883 cycles, MatrixTransposeSSE38XA, TestMatWB 504x504 -push esi+edi- ups A
393505 cycles, MatrixTransposeSSE38YB, TestMatWC 508x508 -LOCAL var+var- ups B
393692 cycles, MatrixTransposeSSE38XB, TestMatWC 508x508 -push esi+edi- ups B
397996 cycles, MatrixTransposeSSE38YI, TestMatWZ 508x508 -LOCAL var,edx- ups I
399456 cycles, MatrixTransposeSSE38XI, TestMatWC 508x508 -push esi,edx- ups I
608451 cycles, MatrixTransposeSSE38YC, TestMatWC 508x508 -LOCAL var+var- ups C
609226 cycles, MatrixTransposeSSE38XC, TestMatWC 508x508 -push esi+edi- ups C
610219 cycles, MatrixTransposeSSE38YA, TestMatWC 508x508 -LOCAL var+var- ups A
612253 cycles, MatrixTransposeSSE38XA, TestMatWC 508x508 -push esi+edi- ups A
673824 cycles, MatrixTransposeSSE38XB, TestMatWW 512x512 -push esi+edi- ups B
677774 cycles, MatrixTransposeSSE38YB, TestMatWW 512x512 -LOCAL var+var- ups B
686959 cycles, MatrixTransposeSSE38XI, TestMatWW 512x512 -push esi,edx- ups I
687090 cycles, MatrixTransposeSSE38YI, TestMatWC 512x512 -LOCAL var,edx- ups I
929165 cycles, MatrixTransposeSSE38YC, TestMatWW 512x512 -LOCAL var+var- ups C
930455 cycles, MatrixTransposeSSE38XA, TestMatWW 512x512 -push esi+edi- ups A
931261 cycles, MatrixTransposeSSE38XC, TestMatWW 512x512 -push esi+edi- ups C
943437 cycles, MatrixTransposeSSE38YA, TestMatWW 512x512 -LOCAL var+var- ups A
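For anyone who wants to regroup their own output the same way, here is a rough Python sketch that parses result lines and orders them by matrix size, then by cycle count. The regular expression is inferred from the line format shown above, and the names `parse_line` and `sort_by_matrix` are hypothetical. Note that it orders the groups by size, which is one possible choice and not necessarily the group order used in the table above.

```python
import re

# Pattern inferred from lines such as:
# "26 cycles, MatrixTransposeSSE38XB, TestMatXA 4x4 -push esi+edi- ups B"
LINE_RE = re.compile(
    r"^\s*(\d+) cycles, (\w+), (\w+) (\d+)x(\d+) -(.+?)- ups (\w)\s*$"
)


def parse_line(line):
    """Return a dict for one result line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    cycles, proc, matrix, rows, _cols, save, ups = m.groups()
    return {
        "cycles": int(cycles),
        "proc": proc,
        "matrix": matrix,
        "n": int(rows),
        "save": save,
        "ups": ups,
    }


def sort_by_matrix(lines):
    """Keep parseable lines; order by matrix size, then by cycle count."""
    rows = [r for r in (parse_line(l) for l in lines) if r is not None]
    return sorted(rows, key=lambda r: (r["n"], r["cycles"]))


sample = [
    "57 cycles, MatrixTransposeSSE38XB, TestMatXB 8x8 -push esi+edi- ups B",
    "103 cycles, MatrixTransposeSSE38XA, TestMatXA 4x4 -push esi+edi- ups A",
    "26 cycles, MatrixTransposeSSE38XB, TestMatXA 4x4 -push esi+edi- ups B",
    "; separator lines are skipped",
]

ordered = sort_by_matrix(sample)
for r in ordered:
    print(f'{r["n"]}x{r["n"]}  {r["cycles"]:>8}  {r["proc"]}  ({r["save"]})')
```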
Hello sir, I can't get all the results, I suppose because of some Wine configuration in console mode. This is what I get; I changed the font size to 8 to fit more results. I can't redirect the results to a text file because the program asks for user input (hit a key). I'm attaching the results because I hit the 20000-character maximum on this board.
***** Time table - LoopCount =10 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
18 cycles, MatrixTransposeSSE38XI, TestMatXA 4x4 -push esi,edx- ups I
19 cycles, MatrixTransposeSSE38YI, TestMatXA 4x4 -LOCAL var,edx- ups I
20 cycles, MatrixTransposeSSE38YA, TestMatXA 4x4 -LOCAL var+var- ups A
22 cycles, MatrixTransposeSSE38XC, TestMatXA 4x4 -push esi+edi- ups C
29 cycles, MatrixTransposeSSE38YC, TestMatXA 4x4 -LOCAL var+var- ups C
30 cycles, MatrixTransposeSSE38YB, TestMatXA 4x4 -LOCAL var+var- ups B
54 cycles, MatrixTransposeSSE38XC, TestMatXB 8x8 -push esi+edi- ups C
56 cycles, MatrixTransposeSSE38XB, TestMatXA 4x4 -push esi+edi- ups B
62 cycles, MatrixTransposeSSE38YC, TestMatXB 8x8 -LOCAL var+var- ups C
62 cycles, MatrixTransposeSSE38YB, TestMatXB 8x8 -LOCAL var+var- ups B
63 cycles, MatrixTransposeSSE38YI, TestMatXB 8x8 -LOCAL var,edx- ups I
63 cycles, MatrixTransposeSSE38XI, TestMatXB 8x8 -push esi,edx- ups I
85 cycles, MatrixTransposeSSE38XA, TestMatXA 4x4 -push esi+edi- ups A
115 cycles, MatrixTransposeSSE38XC, TestMatXC 12x12 -push esi+edi- ups C
116 cycles, MatrixTransposeSSE38YA, TestMatXC 12x12 -LOCAL var+var- ups A
117 cycles, MatrixTransposeSSE38YB, TestMatXC 12x12 -LOCAL var+var- ups B
123 cycles, MatrixTransposeSSE38YC, TestMatXC 12x12 -LOCAL var+var- ups C
127 cycles, MatrixTransposeSSE38XI, TestMatXC 12x12 -push esi,edx- ups I
136 cycles, MatrixTransposeSSE38YI, TestMatXC 12x12 -LOCAL var,edx- ups I
145 cycles, MatrixTransposeSSE38XA, TestMatXB 8x8 -push esi+edi- ups A
150 cycles, MatrixTransposeSSE38XB, TestMatXB 8x8 -push esi+edi- ups B
178 cycles, MatrixTransposeSSE38YA, TestMatXB 8x8 -LOCAL var+var- ups A
260 cycles, MatrixTransposeSSE38YB, TestMatXD 20x20 -LOCAL var+var- ups B
273 cycles, MatrixTransposeSSE38XC, TestMatXD 20x20 -push esi+edi- ups C
277 cycles, MatrixTransposeSSE38YA, TestMatXD 20x20 -LOCAL var+var- ups A
281 cycles, MatrixTransposeSSE38YC, TestMatXD 20x20 -LOCAL var+var- ups C
304 cycles, MatrixTransposeSSE38YI, TestMatXD 20x20 -LOCAL var,edx- ups I
308 cycles, MatrixTransposeSSE38XI, TestMatXD 20x20 -push esi,edx- ups I
308 cycles, MatrixTransposeSSE38XA, TestMatXC 12x12 -push esi+edi- ups A
313 cycles, MatrixTransposeSSE38XB, TestMatXC 12x12 -push esi+edi- ups B
616 cycles, MatrixTransposeSSE38XA, TestMatXD 20x20 -push esi+edi- ups A
720 cycles, MatrixTransposeSSE38XB, TestMatXD 20x20 -push esi+edi- ups B
10932 cycles, MatrixTransposeSSE38YI, TestMatYA 100x100 -LOCAL var,edx- ups I
10935 cycles, MatrixTransposeSSE38XI, TestMatYA 100x100 -push esi,edx- ups I
11083 cycles, MatrixTransposeSSE38YB, TestMatYA 100x100 -LOCAL var+var- ups B
11809 cycles, MatrixTransposeSSE38YA, TestMatYA 100x100 -LOCAL var+var- ups A
11810 cycles, MatrixTransposeSSE38YC, TestMatYA 100x100 -LOCAL var+var- ups C
11884 cycles, MatrixTransposeSSE38XC, TestMatYA 100x100 -push esi+edi- ups C
17345 cycles, MatrixTransposeSSE38YB, TestMatYY 120x120 -LOCAL var+var- ups B
17347 cycles, MatrixTransposeSSE38XB, TestMatYY 120x120 -push esi+edi- ups B
17626 cycles, MatrixTransposeSSE38XA, TestMatYA 100x100 -push esi+edi- ups A
17904 cycles, MatrixTransposeSSE38YI, TestMatYY 120x120 -LOCAL var,edx- ups I
18008 cycles, MatrixTransposeSSE38XB, TestMatYA 100x100 -push esi+edi- ups B
18012 cycles, MatrixTransposeSSE38XI, TestMatYY 120x120 -push esi,edx- ups I
18024 cycles, MatrixTransposeSSE38YA, TestMatYY 120x120 -LOCAL var+var- ups A
18097 cycles, MatrixTransposeSSE38XA, TestMatYY 120x120 -push esi+edi- ups A
18119 cycles, MatrixTransposeSSE38XC, TestMatYY 120x120 -push esi+edi- ups C
18153 cycles, MatrixTransposeSSE38YC, TestMatYY 120x120 -LOCAL var+var- ups C
19222 cycles, MatrixTransposeSSE38XI, TestMatYC 132x132 -push esi,edx- ups I
19262 cycles, MatrixTransposeSSE38YB, TestMatYC 132x132 -LOCAL var+var- ups B
19291 cycles, MatrixTransposeSSE38YI, TestMatYC 132x132 -LOCAL var,edx- ups I
19311 cycles, MatrixTransposeSSE38XB, TestMatYC 132x132 -push esi+edi- ups B
20528 cycles, MatrixTransposeSSE38XC, TestMatYC 132x132 -push esi+edi- ups C
20538 cycles, MatrixTransposeSSE38YC, TestMatYC 132x132 -LOCAL var+var- ups C
20586 cycles, MatrixTransposeSSE38YA, TestMatYC 132x132 -LOCAL var+var- ups A
20604 cycles, MatrixTransposeSSE38XA, TestMatYC 132x132 -push esi+edi- ups A
28122 cycles, MatrixTransposeSSE38YB, TestMatYB 128x128 -LOCAL var+var- ups B
28304 cycles, MatrixTransposeSSE38XB, TestMatYB 128x128 -push esi+edi- ups B
28641 cycles, MatrixTransposeSSE38XI, TestMatYB 128x128 -push esi,edx- ups I
28750 cycles, MatrixTransposeSSE38YI, TestMatYB 128x128 -LOCAL var,edx- ups I
31052 cycles, MatrixTransposeSSE38YC, TestMatYB 128x128 -LOCAL var+var- ups C
31103 cycles, MatrixTransposeSSE38YA, TestMatYB 128x128 -LOCAL var+var- ups A
31142 cycles, MatrixTransposeSSE38XA, TestMatYB 128x128 -push esi+edi- ups A
31158 cycles, MatrixTransposeSSE38XC, TestMatYB 128x128 -push esi+edi- ups C
171916 cycles, MatrixTransposeSSE38XB, TestMatZA 256x256 -push esi+edi- ups B
172156 cycles, MatrixTransposeSSE38YB, TestMatZA 256x256 -LOCAL var+var- ups B
173787 cycles, MatrixTransposeSSE38XI, TestMatZA 256x256 -push esi,edx- ups I
173918 cycles, MatrixTransposeSSE38YI, TestMatZA 256x256 -LOCAL var,edx- ups I
188429 cycles, MatrixTransposeSSE38YB, TestMatZB 260x260 -LOCAL var+var- ups B
188530 cycles, MatrixTransposeSSE38XB, TestMatZB 260x260 -push esi+edi- ups B
188702 cycles, MatrixTransposeSSE38XI, TestMatZB 260x260 -push esi,edx- ups I
188740 cycles, MatrixTransposeSSE38YI, TestMatZB 260x260 -LOCAL var,edx- ups I
193331 cycles, MatrixTransposeSSE38XC, TestMatZB 260x260 -push esi+edi- ups C
193569 cycles, MatrixTransposeSSE38YC, TestMatZB 260x260 -LOCAL var+var- ups C
193777 cycles, MatrixTransposeSSE38XA, TestMatZB 260x260 -push esi+edi- ups A
194257 cycles, MatrixTransposeSSE38YA, TestMatZB 260x260 -LOCAL var+var- ups A
196732 cycles, MatrixTransposeSSE38XB, TestMatZZ 264x264 -push esi+edi- ups B
196883 cycles, MatrixTransposeSSE38YB, TestMatZZ 264x264 -LOCAL var+var- ups B
200864 cycles, MatrixTransposeSSE38YI, TestMatZZ 264x264 -LOCAL var,edx- ups I
200891 cycles, MatrixTransposeSSE38XI, TestMatZZ 264x264 -push esi,edx- ups I
201147 cycles, MatrixTransposeSSE38XB, TestMatZC 268x268 -push esi+edi- ups B
201184 cycles, MatrixTransposeSSE38YB, TestMatZC 268x268 -LOCAL var+var- ups B
201439 cycles, MatrixTransposeSSE38XI, TestMatZC 268x268 -push esi,edx- ups I
201640 cycles, MatrixTransposeSSE38YI, TestMatZC 268x268 -LOCAL var,edx- ups I
203106 cycles, MatrixTransposeSSE38YA, TestMatZZ 264x264 -LOCAL var+var- ups A
203159 cycles, MatrixTransposeSSE38XA, TestMatZZ 264x264 -push esi+edi- ups A
203764 cycles, MatrixTransposeSSE38YC, TestMatZZ 264x264 -LOCAL var+var- ups C
203870 cycles, MatrixTransposeSSE38XC, TestMatZZ 264x264 -push esi+edi- ups C
208185 cycles, MatrixTransposeSSE38YC, TestMatZC 268x268 -LOCAL var+var- ups C
208212 cycles, MatrixTransposeSSE38XC, TestMatZC 268x268 -push esi+edi- ups C
208660 cycles, MatrixTransposeSSE38XA, TestMatZC 268x268 -push esi+edi- ups A
209059 cycles, MatrixTransposeSSE38YA, TestMatZC 268x268 -LOCAL var+var- ups A
236710 cycles, MatrixTransposeSSE38XA, TestMatZA 256x256 -push esi+edi- ups A
236904 cycles, MatrixTransposeSSE38YC, TestMatZA 256x256 -LOCAL var+var- ups C
236960 cycles, MatrixTransposeSSE38YA, TestMatZA 256x256 -LOCAL var+var- ups A
236968 cycles, MatrixTransposeSSE38XC, TestMatZA 256x256 -push esi+edi- ups C
471095 cycles, MatrixTransposeSSE38YB, TestMatWB 504x504 -LOCAL var+var- ups B
471488 cycles, MatrixTransposeSSE38XB, TestMatWB 504x504 -push esi+edi- ups B
478281 cycles, MatrixTransposeSSE38YI, TestMatWB 504x504 -LOCAL var,edx- ups I
478743 cycles, MatrixTransposeSSE38XI, TestMatWB 504x504 -push esi,edx- ups I
504164 cycles, MatrixTransposeSSE38YB, TestMatWC 508x508 -LOCAL var+var- ups B
504648 cycles, MatrixTransposeSSE38XB, TestMatWA 500x500 -push esi+edi- ups B
505222 cycles, MatrixTransposeSSE38XB, TestMatWC 508x508 -push esi+edi- ups B
505627 cycles, MatrixTransposeSSE38YB, TestMatWA 500x500 -LOCAL var+var- ups B
509811 cycles, MatrixTransposeSSE38XI, TestMatWA 500x500 -push esi,edx- ups I
510504 cycles, MatrixTransposeSSE38YI, TestMatWA 500x500 -LOCAL var,edx- ups I
512446 cycles, MatrixTransposeSSE38XI, TestMatWC 508x508 -push esi,edx- ups I
512484 cycles, MatrixTransposeSSE38YI, TestMatWZ 508x508 -LOCAL var,edx- ups I
681134 cycles, MatrixTransposeSSE38XA, TestMatWB 504x504 -push esi+edi- ups A
681323 cycles, MatrixTransposeSSE38XC, TestMatWB 504x504 -push esi+edi- ups C
681479 cycles, MatrixTransposeSSE38YC, TestMatWB 504x504 -LOCAL var+var- ups C
682633 cycles, MatrixTransposeSSE38YA, TestMatWB 504x504 -LOCAL var+var- ups A
792368 cycles, MatrixTransposeSSE38YC, TestMatWA 500x500 -LOCAL var+var- ups C
792495 cycles, MatrixTransposeSSE38XA, TestMatWA 500x500 -push esi+edi- ups A
793080 cycles, MatrixTransposeSSE38XC, TestMatWA 500x500 -push esi+edi- ups C
793390 cycles, MatrixTransposeSSE38YA, TestMatWA 500x500 -LOCAL var+var- ups A
829152 cycles, MatrixTransposeSSE38XB, TestMatWW 512x512 -push esi+edi- ups B
829328 cycles, MatrixTransposeSSE38YB, TestMatWW 512x512 -LOCAL var+var- ups B
838242 cycles, MatrixTransposeSSE38XI, TestMatWW 512x512 -push esi,edx- ups I
838321 cycles, MatrixTransposeSSE38YI, TestMatWC 512x512 -LOCAL var,edx- ups I
979318 cycles, MatrixTransposeSSE38XA, TestMatWC 508x508 -push esi+edi- ups A
980942 cycles, MatrixTransposeSSE38XC, TestMatWC 508x508 -push esi+edi- ups C
981519 cycles, MatrixTransposeSSE38YC, TestMatWC 508x508 -LOCAL var+var- ups C
981731 cycles, MatrixTransposeSSE38YA, TestMatWC 508x508 -LOCAL var+var- ups A
1318720 cycles, MatrixTransposeSSE38XC, TestMatWW 512x512 -push esi+edi- ups C
1319166 cycles, MatrixTransposeSSE38YC, TestMatWW 512x512 -LOCAL var+var- ups C
1321178 cycles, MatrixTransposeSSE38XA, TestMatWW 512x512 -push esi+edi- ups A
1321444 cycles, MatrixTransposeSSE38YA, TestMatWW 512x512 -LOCAL var+var- ups A
********** END **********
Rui,
Could you spare us these massive blocks of tables, put them in a ZIP file.
Quote from: hutch-- on June 04, 2018, 04:46:16 AM
Rui,
Could you spare us these massive blocks of tables, put them in a ZIP file.
Hutch,
It is here, sorted by matrix type (4x4 takes little time ... 512x512 consumes much more time).
Do you have any comment about it? What's wrong ?
From one basic procedure i wrote 8 different cases. As i have 4 procs, we have 32 different procedures to do the task. And so far i have written many other different cases for the same thing.
I had a look at the zip file and the results look fine and it saves dumping large amounts of data like that directly into the forum. The problem with dumping large tables into the forum is it makes the topics unreadable. Also note that the Campus is for people who are learning assembler, it is not the place for advanced mathematics which should be posted in a more specialised forum.
Quote from: hutch-- on June 04, 2018, 11:08:12 AM
I had a look at the zip file and the results look fine and it saves dumping large amounts of data like that directly into the forum. The problem with dumping large tables into the forum is it makes the topics unreadable. Also note that the Campus is for people who are learning assembler, it is not the place for advanced mathematics which should be posted in a more specialised forum.
The main problem i posted here has nothing to do with advanced mathematics, as you "are seeing" it,
simply because the main problem here is not to transpose a matrix. That was used only as time-consuming code.
I will try to explain in more detail, but remember that the title is: "solutions to save 1/2 CPU registers inside a procedure" (we may define local variables and we may use them).
I have a set of code inside a procedure where i defined a set of LOCAL variables, the same in all cases. The code where i want to test the solutions to save/restore registers may be described like this:
Quote
; esi and edi are pointers already defined
; start here and do something - 5 instructions.
loopA:
; here we need to save esi and edi pointers <<<<<---- this is the first problem HERE
loopB: ; <<< macro starts here
; here we need to save esi <<<<<---- this is the second problem HERE
; here i put a time-consuming code <<<<--- this is not the problem
; here we need to restore esi
; do something
; while ecx is not the end go to loopB ; <<< macro ends here
; here we need to restore edi and esi
; do something
; while ebx is not the end goto loopA
From this, i wrote 4 cases XA, XB, XC, XI where the macros are A, B, C, I and another 4 cases YA, YB, YC, YI where the macros are also A, B, C, I.
Cases X: the solution is push esi + push edi, but case I is push esi + mov edx, edi.
Cases Y: the solution is mov LOCAL, esi + mov LOCAL, edi, but case I is mov LOCAL, esi + mov edx, edi.
It is much more a problem of assembly than of math itself, because we may replace the time-consuming code with something else and we still have the same main problem. Because the test needs code that consumes anything from very little time up to as much time as possible, using operations with matrices seems to me a good way to test what i want to test (IMO). Now, you realize what the problem is ? Is it what you call an advanced mathematics problem ? If it is, where Hutch ? Regards :biggrin:
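To make the cases above concrete, here is a minimal MASM32-style sketch of the three ways to keep esi/edi across the outer loop. It is only an illustration under assumptions: the proc names (SaveWithPush, SaveWithLocals, SaveWithLocalAndEdx), the parameters and the locals savedEsi/savedEdi are hypothetical and not taken from the attached source, the pointers are simply stepped by 4 bytes per pass, and the time-consuming inner code is reduced to a comment.

```asm
; Hypothetical sketch, NOT the code from the attachment.
    .686
    .model flat, stdcall
    option casemap:none
    .code

; Cases X: both pointers are saved on the stack
SaveWithPush proc uses esi edi ebx pSrc:DWORD, pDst:DWORD, nRows:DWORD
    mov esi, pSrc
    mov edi, pDst
    mov ebx, nRows
    test ebx, ebx
    jz done                 ; nothing to do for 0 rows
  loopA:
    push esi                ; save both pointers
    push edi
    ; ... time-consuming inner code, free to change esi and edi ...
    pop edi                 ; restore in reverse order
    pop esi
    add esi, 4              ; assumed stepping, for illustration only
    add edi, 4
    dec ebx
    jnz loopA
  done:
    ret
SaveWithPush endp

; Cases Y: both pointers are saved in LOCAL variables
SaveWithLocals proc uses esi edi ebx pSrc:DWORD, pDst:DWORD, nRows:DWORD
    LOCAL savedEsi:DWORD
    LOCAL savedEdi:DWORD
    mov esi, pSrc
    mov edi, pDst
    mov ebx, nRows
    test ebx, ebx
    jz done
  loopA:
    mov savedEsi, esi       ; save both pointers in the stack frame
    mov savedEdi, edi
    ; ... time-consuming inner code, free to change esi and edi ...
    mov esi, savedEsi       ; restore
    mov edi, savedEdi
    add esi, 4
    add edi, 4
    dec ebx
    jnz loopA
  done:
    ret
SaveWithLocals endp

; Case YI: esi goes to a LOCAL, edi is parked in edx
SaveWithLocalAndEdx proc uses esi edi ebx pSrc:DWORD, pDst:DWORD, nRows:DWORD
    LOCAL savedEsi:DWORD
    mov esi, pSrc
    mov edi, pDst
    mov ebx, nRows
    test ebx, ebx
    jz done
  loopA:
    mov savedEsi, esi       ; save esi in memory
    mov edx, edi            ; keep edi in a spare register
    ; ... time-consuming inner code, must NOT change edx ...
    mov esi, savedEsi
    mov edi, edx
    add esi, 4
    add edi, 4
    dec ebx
    jnz loopA
  done:
    ret
SaveWithLocalAndEdx endp

    end
```

The edx variant is only valid when nothing in the inner code (or anything it calls) changes edx. Case XI mixes the first and third forms: push esi + mov edx, edi.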
Hi all :biggrin:
Thanks all for your work in posting your results (Jochen, zedd151, Siekmanski).
You have all results sorted by matrix type so you may see which solution is better.
In many cases it seems to be push esi+push edi. For me that was something of a surprise.
But it is what i usually do.
:t
mineiro,
your results are not good.
Thanks also
Rui,
Quote
Now, you realize what the problem is ? Is it what you call an advanced mathematics problem ?
If it is, where Hutch ?
Be warned that if you start giving me lip you will get R_SOULED out the door faster than Halley's comet. I am not a free kick for anyone who thinks they can make a pest of themselves. I have asked you to stop dumping mountains of data in the forum, especially the Campus, as it makes the forum unreadable for other members.
I have had to in the past move much of what you have posted due to the mess and I have even deleted some of it as you refuse to co-operate when asked. No matter what, I will solve any problem you may initiate, so do us all a favour and treat other people as you would want yourself to be treated.
Hi Rui,
Your setup is really a bit "bloated" - can't you organise the results in a way that it produces only a few averages? What I usually do to get reliable timings is cut off the worst 10% (slow due to interrupts...), and take the average of the 90%.
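The trimming Jochen describes could be sketched like this in MASM. It is only an illustration under assumptions: TrimmedAverage is a hypothetical helper (not part of Timing.inc or MasmBasic), it expects the cycle counts as DWORDs already sorted in ascending order, and it expects at least one sample.

```asm
; Hypothetical helper: average of the fastest 90% of the timings.
    .686
    .model flat, stdcall
    option casemap:none
    .code

; pTimings -> nTimings DWORD cycle counts, sorted ascending (assumption)
; nTimings must be >= 1 (assumption)
; returns the average of the kept samples in eax
TrimmedAverage proc uses esi ebx pTimings:DWORD, nTimings:DWORD
    mov eax, nTimings
    xor edx, edx
    mov ecx, 10
    div ecx                 ; eax = nTimings / 10 = samples to drop
    mov ecx, nTimings
    sub ecx, eax            ; ecx = samples to keep (at least 1)
    mov ebx, ecx            ; remember the divisor
    mov esi, pTimings
    xor eax, eax            ; edx:eax = 64-bit running sum
    xor edx, edx
  sumLoop:
    add eax, [esi]          ; add the next (fast) timing
    adc edx, 0              ; carry into the high dword
    add esi, 4
    dec ecx
    jnz sumLoop
    div ebx                 ; eax = sum / kept samples
    ret
TrimmedAverage endp

    end
```

Because the array is sorted ascending, the slowest 10% (the runs disturbed by interrupts) sit at the end and are simply never added to the sum.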
Hi Jochen,
I will try to think about it in the next problem.
This problem seems to be done.
Thanks :t
EDIT: Jochen, I am using a little prog ShowCpu in the file Timing.inc to identify the CPU
but zedd151 says that it doesn't identify the frequency of his CPU (AMD ?). If i am right
that prog was written by you. Do you have a more recent version for these new cases ?
Timing.inc is elsewhere here, i think in the folder Converter8 at least.
Hutch,
I started this topic writing "... a little procedure to transpose some matrix NxN..."
which is not the main problem here. So you are right in what you said.
My apologies.
Regards
Quote from: RuiLoureiro on June 05, 2018, 12:56:26 AM
zedd151 says that it doesn't identify the frequency of his CPU
The PrintCpu macro shows the brand string supplied by the CPU through cpuid. The real frequency may differ from that (although it shouldn't...).
Quote from: RuiLoureiro
but zedd151 says that it doesn't identify the frequency of his CPU (AMD ?)
Yup, if you notice what it does say about my cpu, notice how long the description is...
Quote
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)
A little "over-the-top" description if I say so myself.
They could have done without the "5 COMPUTE CORES 2C+3G" part. My cpu has 2 cores, no matter how they try to inflate the description to make it sound better, even magical. 8)
Quote from: jj2007
The PrintCpu macro shows the brand string supplied by the CPU through cpuid. The real frequency may differ from that (although it shouldn't...).
The specs list my cpu speed as 1.60 GHz. It fluctuates a little, but nothing to write home about. It IS a little netbook after all.
Hi zedd151,
As you are seeing i am trying to solve the CPU identification problem and i am sure that Jochen may solve it, he is the author of that code and i trust him.
Thanks for your work.
Quote from: RuiLoureiro
As you are seeing i am trying to solve the CPU identification problem and i am sure that Jochen may solve it
Quote from: JJ
The PrintCpu macro shows the brand string supplied by the CPU through cpuid
Therefore he may not be able to solve my case. The string for my cpu apparently is longer than anticipated by cpuid. Still there may be a workaround for these special cases. I'm always a special case it seems. :P
This is what is displayed in 'System' info...
I know there's an option in the bios configurations regarding a max number used with the cpuid instruction, but i don't know now (i don't want to reboot now :biggrin:) if it has something to do with the "big string identification" issue... :idea:
:biggrin:
zedd151,
If you say "he may not be able to solve my case" the case seems to be solved.
What can we do ?
Regards
:icon14:
:biggrin:
Rui
LOL
:P
Next time I'll trade this one in for one with a cpu with a short name. (i286) lol
Quote from: zedd151 on June 05, 2018, 07:38:22 AM
The string for my cpu apparently is longer than anticipated by cpuid.
The PrintCpu (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1385) macro has a 124 byte buffer, your string is only about 50 bytes long. I'd love to see in a debugger what's actually happening there...
From the manuals:
The brand string is architecturally defined to be 48 byte long: the first 47 bytes contain ASCII characters and the 48th byte is defined to be null (0).
Aha! Maybe.
"AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)" is 54 bytes long.
Possibly the cpu speed is overwritten by "(SSE4)" ???
Anyway, I'll run the testbed through x32dbg to see what's happening there, when I get a few minutes to spare.
***** a little while later ******
I used a testbed with smaller code and found that "AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G " is written first. I did not find 1.60 GHz written anywhere. The (SSE4) was appended later in the console window. The reason I did not use the testbed from this thread is that it takes an eternity to run in a debugger.
JJ could you write a small piece that only displays my cpu info, as in the testbed in this thread?
Would make debugging a lot easier with less code to wade through. I will post those results from x32dbg as well as screen shots of the debugging.
Quote from: zedd151 on June 05, 2018, 09:05:38 AM
JJ could you write a small piece that only displays my cpu info, as in the testbed in this thread?
I suspect that your CPU doesn't follow the specs saying the brand string is up to 48 bytes long, including the zero delimiter. In fact, the PrintCpu (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1385) macro is designed to yield 48 characters. The attached variant allows up to 112 bytes, but I wonder if it works with old CPUs.
Here is the relevant part:
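sub esp, 124                ; create a buffer for the brand string
mov edi, esp                ; point edi to it
xor ebp, ebp                ; ebp = leaf counter, starting at 0
.Repeat
  lea eax, [ebp+80000002h]  ; select extended leaf 80000002h + counter
  db 0Fh, 0A2h              ; cpuid 80000002h-80000004h
  stosd                     ; store eax, the first 4 bytes of this leaf
  mov eax, ebx
  stosd                     ; store ebx
  mov eax, ecx
  stosd                     ; store ecx
  mov eax, edx
  stosd                     ; store edx, 16 bytes per leaf in total
  inc ebp                   ; next leaf
.Until ebp>=3+pcup          ; 3 leaves normally, pcup adds extra leaves for the test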
pcup is for testing your case. The first string (pcup=0) is obtained with 3x4 stosd, i.e. 48 bytes, the second (pcup=2) with 5x4 stosd, i.e. 80 bytes. We'll see how that changes the output. Note that when I wrote "from the manuals" above, it was probably an Intel manual 8)
From the AMD manual
Quote
The three extended functions from Fn8000_0002 to Fn8000_0004 are initialized to and return a null terminated
ASCII string up to 48 characters in length corresponding to the processor name. (The 48 character maximum
includes the null character.)
Quote
The processor name string must be programmed by the BIOS during system initialization.
Anyway, how can you get more than 48 bytes from 3 CPUID calls returning 16 bytes each?
Thanks jochen.
Here are my results with the test piece you posted:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G
Then the program hangs until a key is pressed (enter) then disappears.
Is that what was supposed to happen, or was there more if I waited?
*---------------------------
I will look into this a little more later. I am getting ready for work at the moment.
I'll give another crack to try and determine if there's anything I can do on my end.
ps. sorry for the hijack, Rui
Quote from: zedd151 on June 05, 2018, 09:05:38 AM
"AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G (SSE4)" is 54 bytes long.
Possibly the cpu speed is overwritten by "(SSE4)" ???
OK, now I understand the problem: Your brand string is actually 45 bytes long. You didn't find the frequency, and the reason is simply that your cpu doesn't list it in the brand string. Some do, others don't 8)
The SSExx stuff is added later and has nothing to do with the brand string.
@sinsi: See above,
.Until ebp>=3+pcup was just a test if the AMD has additional bits & pieces. It doesn't.
I just looked at the source snippet
Inkey Str$("\n\nYour puter has run %3f hours since the last boot, give it a break!", Timer/3600000
LMAO! I can't wait for that at the present time. :biggrin:
Quote from: jj2007
... your cpu doesn't list it in the brand string. Some do, others don't 8)
Wtf. I'm always coming up short in one way or another. :P
Well for the price I paid for this humble box, I can't complain too much. 8)
I have a notion to write a strongly worded letter to Dell Computers. :badgrin:
edit to add:
Quote from: jochen
The SSExx stuff is added later and has nothing to do with the brand string.
as verified by my initial debugging
OK, mystery solved, and apologies to Rui for hijacking his thread :icon14:
Quote from: jj2007 on June 05, 2018, 06:03:31 PM
OK, mystery solved, and apologies to Rui for hijacking his thread :icon14:
Same goes for me. :P
zedd151,
At the end my friend Jochen solved your problem as i told you. :biggrin:
Quote from: RuiLoureiro on June 05, 2018, 10:50:39 PM
zedd151,
At the end my friend Jochen solved your problem as i told you. :biggrin:
He didn't fix it though. :P
edit to reflect Rui's edit :icon_cool:
Quote from: zedd151 on June 05, 2018, 11:10:12 PM
Quote from: RuiLoureiro on June 05, 2018, 10:50:39 PM
zedd151,
At the end my friend Jochen solved your problem as i told you. :biggrin:
He didn't fix it though. :P
But you got the answer: when a fix is not possible, the case is solved.