:biggrin: Hi all
Here are 3 versions of a procedure that multiplies any MxN matrix by any NxK matrix using SSE instructions, plus 1 version that uses the FPU.
Quote
VERSION 6:
    PROCEDURE: MultiplyMxN_NxK_v6SSE
    FILE:      multiplySSEMxN_MxK_v6.inc
    MACROS:    multiplyMxN_MxK_v6A.mac    <<-- to solve all cases A
               multiplyMxN_MxK_v6B.mac    <<-- to solve all cases B
               basicmulMxN_MxK_v6.mac
VERSION 5:
    PROCEDURE: MultiplyMxN_NxK_v5SSE
    FILE:      multiplySSEMxN_MxK_v5.inc
    MACROS:    multiplyMxN_MxK_v5A.mac
               multiplyMxN_MxK_v5B.mac
               basicmulMxN_MxK_v5.mac
VERSION 4:
    PROCEDURE: MultiplyMxN_NxK_v4SSE
    FILE:      multiplySSEMxN_MxK_v4.inc
    MACROS:    multiplyMxN_MxK_v4A.mac
               multiplyMxN_MxK_v4B.mac
               basicmulMxN_MxK_v4.mac
VERSION FPU:
    PROCEDURE: MultiplyMxN_NxK_v1FPU
    FILE:      multiplyFPUMxN_MxK_v1.inc
DOCUMENTATION: TEXT_ABOUT_MULTIPLY_SSE_REAL4.txt
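MATRIX DEFINITION: Every matrix (called matrixX here) must be defined like this:
    ALIGN 16
            dd ?
            dd ?
            dd N                ; <<--- number of columns
            dd M                ; <<--- number of lines (rows)
    matrixX dd (M*N) dup (?)
The four dwords before the label take 16 bytes, so the data at matrixX is itself 16-byte aligned.
To allocate memory for a matrix instead, see the file AllocMemory.inc.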
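For anyone who wants to check results against a plain reference, here is a minimal C sketch of the same layout. It is not the code from the .inc files. The names MatrixHeader, matrix_alloc, matrix_rows, matrix_cols, matrix_free and matrix_multiply_ref are made up for illustration, and it assumes the M*N values are stored line by line (row-major), which the post does not state explicitly. It also assumes a C11 compiler for aligned_alloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C mirror of the header in front of matrixX:
   two unused dwords, then the column count, then the line count. */
typedef struct {
    uint32_t reserved[2]; /* the two "dd ?" slots; their use is not stated in the post */
    uint32_t cols;        /* N, number of columns */
    uint32_t rows;        /* M, number of lines   */
} MatrixHeader;

_Static_assert(sizeof(MatrixHeader) == 16, "header must be exactly 16 bytes");

/* Allocate header + data on a 16-byte boundary and return a pointer
   to the data, playing the role of the matrixX label. */
static float *matrix_alloc(uint32_t rows, uint32_t cols)
{
    assert(rows > 0 && cols > 0);
    size_t bytes = sizeof(MatrixHeader) + (size_t)rows * cols * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u; /* aligned_alloc wants a multiple of 16 */
    MatrixHeader *h = aligned_alloc(16, bytes);
    if (h == NULL) return NULL;
    h->reserved[0] = 0;
    h->reserved[1] = 0;
    h->cols = cols;
    h->rows = rows;
    return (float *)(h + 1);
}

/* The dimensions sit just below the data pointer. */
static uint32_t matrix_cols(const float *m) { return ((const MatrixHeader *)m - 1)->cols; }
static uint32_t matrix_rows(const float *m) { return ((const MatrixHeader *)m - 1)->rows; }

static void matrix_free(float *m)
{
    if (m != NULL) free((MatrixHeader *)m - 1);
}

/* Scalar reference: C (MxK) = A (MxN) * B (NxK), row-major assumed.
   Returns 0 on success, -1 if the dimensions do not fit. */
static int matrix_multiply_ref(const float *a, const float *b, float *c)
{
    uint32_t m = matrix_rows(a), n = matrix_cols(a), k = matrix_cols(b);
    if (matrix_rows(b) != n || matrix_rows(c) != m || matrix_cols(c) != k)
        return -1;
    for (uint32_t i = 0; i < m; i++) {
        for (uint32_t j = 0; j < k; j++) {
            float sum = 0.0f;
            for (uint32_t p = 0; p < n; p++)
                sum += a[i * n + p] * b[p * k + j];
            c[i * k + j] = sum;
        }
    }
    return 0;
}
```

Because the dimensions travel with the data, a multiply routine only needs the three data pointers and can read M, N and K from the headers, as matrix_multiply_ref does here.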
VERIFY SSE PROCEDURES: Use multiplyMxN_MxK_v6.exe/asm, multiplyMxN_MxK_v5.exe/asm
or multiplyMxN_MxK_v4.exe/asm.
Please test it on your CPU (i5/i7/AMD).
Run ExecuteTestmultiplyMxN_MxK_SSEv6.bat and post the file ResultsmultiplyMxN_MxK_v6.txt.
Particular note: I started this work from an example given by Siekmanski :t
Good luck
RuiLoureiro
EDIT: replace the FPU procedure ...
Hi Rui,
Here are the results from my machine.
Hi RuiLoureiro,
my test results ..
Results for my Core i5.
AMD A6-3500
Hi all
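Interesting results for multiplying a 1x6 vector by a 6x6 matrix by lines, v2 (Multiply1x6Real4By6x6v2).
Note: multiply by lines means this: 1x6 * (6x6)^t, i.e. the vector is multiplied by the transpose of the 6x6 matrix, so each element of the result is the dot product of the vector with one line of the matrix.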
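To make "by lines" concrete, here is a small scalar C sketch. It is not the SSE code of Multiply1x6Real4By6x6v2. The function names are hypothetical and the matrix is assumed to be stored line by line. The first function computes 1x6 * (6x6)^t, the second the ordinary 1x6 * 6x6 product for comparison.

```c
#include <assert.h>
#include <stddef.h>

enum { DIM = 6 };

/* "By lines": r = x * Y^t. Element j of the result is the dot product
   of x with line j of Y, so every inner loop reads contiguous memory. */
static void multiply_1x6_by_6x6_lines(const float *x, const float *y, float *r)
{
    assert(x != NULL && y != NULL && r != NULL);
    for (size_t j = 0; j < DIM; j++) {
        const float *line = &y[j * DIM];
        float sum = 0.0f;
        for (size_t i = 0; i < DIM; i++)
            sum += x[i] * line[i];
        r[j] = sum;
    }
}

/* Ordinary product: r = x * Y. Element j uses column j of Y,
   so the inner loop steps through memory with a stride of DIM floats. */
static void multiply_1x6_by_6x6_columns(const float *x, const float *y, float *r)
{
    assert(x != NULL && y != NULL && r != NULL);
    for (size_t j = 0; j < DIM; j++) {
        float sum = 0.0f;
        for (size_t i = 0; i < DIM; i++)
            sum += x[i] * y[i * DIM + j];
        r[j] = sum;
    }
}
```

Calling multiply_1x6_by_6x6_lines on Y gives the same result as calling multiply_1x6_by_6x6_columns on the transpose of Y. So if the 6x6 matrix is kept transposed in memory, the by-lines routine delivers the ordinary product.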
Thank you all :t
Quote
LiaoMi: :t :t
AMD Ryzen 7 1700 Eight-Core Processor (SSE4)
24 cycles, Multiply1x6Real4By6x6v2, MatrixX1x6 * MatrixY6x6    <<<< He he.. even better !!!
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
27 cycles, Multiply1x6Real4By6x6v2, MatrixX1x6 * MatrixY6x6
Siekmanski:
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
42 cycles, Multiply1x6Real4By6x6v2, MatrixX1x6 * MatrixY6x6
Jochen:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
46 cycles, Multiply1x6Real4By6x6v2, MatrixX1x6 * MatrixY6x6
HSE:
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
59 cycles, Multiply1x6Real4By6x6v2, MatrixX1x6 * MatrixY6x6
AMD Ryzen 7 1700 Eight-Core Processor (SSE4)