I have a task where the memory copy cannot be controlled to SSE alignment.
The example has two memory copy techniques: the old rep movsb method as a reference, and the following for unaligned SSE.
movdqu xmm0, [rcx+r10]
movntdq [rdx+r10], xmm0
I have stabilised the timings by running a dummy run before the timed run, and on my old Haswell the unaligned SSE version runs in about 4.7 seconds for a 50 gig copy. As a reference, the rep movsb version runs in about 6.7 seconds for the same 50 gig.
I have not run the two tests together, so that one does not affect the other. If you have time, run the SSE version, then switch to the commented-out rep movsb version.
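In C terms, the two instructions above map directly onto SSE2 intrinsics. The sketch below is only illustrative (the function name and loop shape are mine, not from the test program): it assumes len is a multiple of 16 and that the destination is 16-byte aligned, since movntdq requires an aligned store, while the source may be unaligned.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128 */
#include <stddef.h>

/* Unaligned load / non-temporal store copy, 16 bytes per pass.
   dst must be 16-byte aligned (movntdq faults otherwise); src may
   have any alignment; len is assumed to be a multiple of 16. */
static void copy_nt(void *dst, const void *src, size_t len)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    for (size_t i = 0; i < len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i)); /* movdqu  */
        _mm_stream_si128((__m128i *)(d + i), v);               /* movntdq */
    }
    _mm_sfence(); /* drain the streaming stores before dst is read */
}
```

As noted further down the thread, the 64-bit heap allocator appears to deliver 16-byte aligned buffers, so the aligned-store side of this holds in the intended use.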
This is the result on my machine:
I don't have all the include files and tools; if you can release just the executable of the rep movsb version, I can run it here.
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
wine umc.exe
--------------------------------
50 gig copy in 3338 milliseconds
--------------------------------
i3-10100 not so fast :biggrin:
xmmcopyu:
--------------------------------
50 gig copy in 7531 milliseconds
--------------------------------
ByteCopy:
--------------------------------
50 gig copy in 10563 milliseconds
--------------------------------
Thanks guys, all of these results are very useful to me.
I added the rep movsd version as a zip file.
--------------------------------
50 gig copy in 6625 milliseconds rep movsb
--------------------------------
--------------------------------
50 gig copy in 4578 milliseconds movdqu xmm0, [rcx+r10] : movntdq [rdx+r10], xmm0
--------------------------------
What I am chasing is the ratio between the two, as the SSE version will be used to copy memory that originated from an MMF written to by a 32-bit app.
umcmovsb:
--------------------------------
50 gig copy in 6015 milliseconds
--------------------------------
Press any key to continue...
umc:
--------------------------------
50 gig copy in 4719 milliseconds
--------------------------------
Press any key to continue...
umcmovsb:
--------------------------------
50 gig copy in 9547 milliseconds
--------------------------------
Press any key to continue...
AMD Ryzen 3700X
umcmovsb:
--------------------------------
50 gig copy in 5522 milliseconds
--------------------------------
umc:
--------------------------------
50 gig copy in 2902 milliseconds
--------------------------------
AMD Ryzen 9 5950X 16-Core Processor
umc:
--------------------------------
50 gig copy in 2781 milliseconds
--------------------------------
umcmovsb:
--------------------------------
50 gig copy in 2563 milliseconds
--------------------------------
Quote from: Siekmanski on December 07, 2021, 03:32:51 AM
AMD Ryzen 9 5950X 16-Core Processor
umc:
--------------------------------
50 gig copy in 2781 milliseconds
--------------------------------
umcmovsb:
--------------------------------
50 gig copy in 2563 milliseconds
--------------------------------
Well, the result for movsb is surprising. :thumbsup:
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
wine umc.exe
--------------------------------
50 gig copy in 3384 milliseconds
--------------------------------
wine umcmovsb.exe
--------------------------------
50 gig copy in 3450 milliseconds
--------------------------------
--------------------------------
50 gig copy in 9064 milliseconds
--------------------------------
--------------------------------
50 gig copy in 10343 milliseconds
--------------------------------
Thanks all. It seems that, across a wide range of different hardware, the SSE2 version is faster in every instance, and that is useful for the task I have in mind. :biggrin:
Are you sure it's unaligned? My debugger says halloc() delivers a 16-byte aligned buffer. I also wonder whether lodsd would be faster than lodsb.
What I have to do is load data from a 32-bit app via a memory-mapped file into a 64-bit app, which uses HeapAlloc() to store the data. The input source from the 32-bit side can be rough, byte-aligned string data or anything else that will fit into the memory-mapped file size. If alignment was going to be fully controlled at both ends, I would use the faster aligned SSE2 instructions.
"rep movsb" is usually faster than "rep movsd" which seems to be Intel special case circuitry and I have not seen examples of "rep lodsb" being faster so I have used "rep movsb" as a reference to compare the SSE2 version and across multiple CPUs that the folks here have tested on, the SSE2 version is always faster.
I already have prototypes of the task up and running using a 1 GB memory-mapped file as the data transfer window. The idea is for a 32-bit app to be able to store multiple 1 GB blocks in the 64-bit "container" and work on any one of them at a time. I had used "rep movsb" for the unaligned data transfer and it worked OK, but as you start using larger blocks of memory, speed starts to matter.
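For quick experiments, the rep movsb reference technique can also be reproduced from C with GCC/Clang inline assembly on x86 (a hedged, non-portable sketch; the wrapper name is mine). The instruction itself updates RDI, RSI and RCX, which the constraints reflect:

```c
#include <stddef.h>

/* Plain rep movsb: RDI = dst, RSI = src, RCX = count, all advanced
   in place by the instruction. x86 GCC/Clang inline asm only. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     : /* no other inputs */
                     : "memory");
}
```

On recent Intel CPUs this benefits from the fast-string (ERMSB) microcode path, which is presumably the "special case circuitry" referred to above.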
It would be interesting to test the smaller SSE movups/movaps, because if they are the same speed as the SSE2 moves, more instructions can fit in the cache.
Quote from: hutch-- on December 06, 2021, 08:34:55 PM
I have a task where the memory copy cannot be controlled to SSE alignment.
The example has two memory copy techniques: the old rep movsb method as a reference, and the following for unaligned SSE.
movdqu xmm0, [rcx+r10]
movntdq [rdx+r10], xmm0
I have stabilised the timings by running a dummy run before the timed run, and on my old Haswell the unaligned SSE version runs in about 4.7 seconds for a 50 gig copy. As a reference, the rep movsb version runs in about 6.7 seconds for the same 50 gig.
I have not run the two tests together, so that one does not affect the other. If you have time, run the SSE version, then switch to the commented-out rep movsb version.
Hi Hutch,
i7-11800h
--------------------------------
50 gig copy in 3500 milliseconds
--------------------------------
Press any key to continue...
rep movsd
--------------------------------
50 gig copy in 3750 milliseconds
--------------------------------
Press any key to continue...
Quote from: hutch-- on December 07, 2021, 11:27:06 AM
"rep movsb" is usually faster than "rep movsd" which seems to be Intel special case circuitry
Not on my machine, at least with 32-bit code...
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
64674 cycles for 100 * rep movsb
64365 cycles for 100 * rep movsd
206043 cycles for 100 * movlps qword ptr [esi+8*ecx]
122243 cycles for 100 * movaps xmm0, oword ptr [esi]
195049 cycles for 100 * movntdq xmm0, oword ptr [esi]
65058 cycles for 100 * rep movsb
63966 cycles for 100 * rep movsd
206036 cycles for 100 * movlps qword ptr [esi+8*ecx]
122348 cycles for 100 * movaps xmm0, oword ptr [esi]
193151 cycles for 100 * movntdq xmm0, oword ptr [esi]
65376 cycles for 100 * rep movsb
64353 cycles for 100 * rep movsd
206087 cycles for 100 * movlps qword ptr [esi+8*ecx]
122278 cycles for 100 * movaps xmm0, oword ptr [esi]
193349 cycles for 100 * movntdq xmm0, oword ptr [esi]
65125 cycles for 100 * rep movsb
63895 cycles for 100 * rep movsd
205977 cycles for 100 * movlps qword ptr [esi+8*ecx]
121872 cycles for 100 * movaps xmm0, oword ptr [esi]
193156 cycles for 100 * movntdq xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
35 bytes for movntdq xmm0, oword ptr [esi]
I get much the same on this old Haswell. I have usually used a combination of rep movsd with a rep movsb tail, but rep movsb alone is close enough in speed to the rep movsd version and does not suffer from the switch from DWORD to BYTE.
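That DWORD-bulk-plus-byte-tail combination looks roughly like this as a hedged C sketch (x86 GCC/Clang inline asm; the wrapper and the split are mine, purely to illustrate the DWORD-to-BYTE switch being discussed; note AT&T syntax spells the DWORD string move "movsl"):

```c
#include <stddef.h>

/* Copy n/4 DWORDs with rep movsd, then the 0..3 leftover bytes with
   rep movsb. The "+D"/"+S" constraints write the advanced pointers
   back, so the second instruction continues where the first stopped. */
static void copy_movsd_movsb(void *dst, const void *src, size_t n)
{
    size_t dwords = n >> 2;  /* whole DWORDs */
    size_t tail   = n & 3;   /* leftover bytes */
    __asm__ volatile("rep movsl"   /* rep movsd in Intel syntax */
                     : "+D"(dst), "+S"(src), "+c"(dwords) :: "memory");
    __asm__ volatile("rep movsb"   /* byte tail */
                     : "+D"(dst), "+S"(src), "+c"(tail) :: "memory");
}
```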
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
64649 cycles for 100 * rep movsb
64474 cycles for 100 * rep movsd
158890 cycles for 100 * movlps qword ptr [esi+8*ecx]
80167 cycles for 100 * movaps xmm0, oword ptr [esi]
63923 cycles for 100 * rep movsb
65043 cycles for 100 * rep movsd
158972 cycles for 100 * movlps qword ptr [esi+8*ecx]
82787 cycles for 100 * movaps xmm0, oword ptr [esi]
65293 cycles for 100 * rep movsb
66105 cycles for 100 * rep movsd
158830 cycles for 100 * movlps qword ptr [esi+8*ecx]
81359 cycles for 100 * movaps xmm0, oword ptr [esi]
64850 cycles for 100 * rep movsb
66085 cycles for 100 * rep movsd
159823 cycles for 100 * movlps qword ptr [esi+8*ecx]
81326 cycles for 100 * movaps xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
28 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
--- ok ---
Thanks LiaoMi, interesting result in that they are much closer than earlier hardware. Looks like a nice fast box.
New machine. I added movntdq
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
57654 cycles for 100 * rep movsb
65892 cycles for 100 * rep movsd
112954 cycles for 100 * movlps qword ptr [esi+8*ecx]
58152 cycles for 100 * movaps xmm0, oword ptr [esi]
129800 cycles for 100 * movntdq xmm0, oword ptr [esi]
59723 cycles for 100 * rep movsb
59356 cycles for 100 * rep movsd
113875 cycles for 100 * movlps qword ptr [esi+8*ecx]
57518 cycles for 100 * movaps xmm0, oword ptr [esi]
130509 cycles for 100 * movntdq xmm0, oword ptr [esi]
59061 cycles for 100 * rep movsb
63768 cycles for 100 * rep movsd
112908 cycles for 100 * movlps qword ptr [esi+8*ecx]
57839 cycles for 100 * movaps xmm0, oword ptr [esi]
132310 cycles for 100 * movntdq xmm0, oword ptr [esi]
59031 cycles for 100 * rep movsb
58619 cycles for 100 * rep movsd
129052 cycles for 100 * movlps qword ptr [esi+8*ecx]
57675 cycles for 100 * movaps xmm0, oword ptr [esi]
131438 cycles for 100 * movntdq xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
35 bytes for movntdq xmm0, oword ptr [esi]
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
62494 cycles for 100 * rep movsb
63721 cycles for 100 * rep movsd
120658 cycles for 100 * movlps qword ptr [esi+8*ecx]
58538 cycles for 100 * movaps xmm0, oword ptr [esi]
129730 cycles for 100 * movntdq xmm0, oword ptr [esi]
63098 cycles for 100 * rep movsb
63117 cycles for 100 * rep movsd
119950 cycles for 100 * movlps qword ptr [esi+8*ecx]
58756 cycles for 100 * movaps xmm0, oword ptr [esi]
129155 cycles for 100 * movntdq xmm0, oword ptr [esi]
62565 cycles for 100 * rep movsb
62759 cycles for 100 * rep movsd
119914 cycles for 100 * movlps qword ptr [esi+8*ecx]
57471 cycles for 100 * movaps xmm0, oword ptr [esi]
126619 cycles for 100 * movntdq xmm0, oword ptr [esi]
63080 cycles for 100 * rep movsb
62948 cycles for 100 * rep movsd
119847 cycles for 100 * movlps qword ptr [esi+8*ecx]
57513 cycles for 100 * movaps xmm0, oword ptr [esi]
123895 cycles for 100 * movntdq xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
35 bytes for movntdq xmm0, oword ptr [esi]
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
umc.exe
--------------------------------
50 gig copy in 4453 milliseconds
--------------------------------
umcmovsb.exe
--------------------------------
50 gig copy in 6469 milliseconds
--------------------------------
--------------------------------
50 gig copy in 7547 milliseconds
--------------------------------
Press any key to continue...
I have 20 GB of memory installed, turbo 3.1 GHz.
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
31040 cycles for 100 * rep movsb
31143 cycles for 100 * rep movsd
117139 cycles for 100 * movlps qword ptr [esi+8*ecx]
72688 cycles for 100 * movaps xmm0, oword ptr [esi]
109706 cycles for 100 * movntdq xmm0, oword ptr [esi]
31381 cycles for 100 * rep movsb
31663 cycles for 100 * rep movsd
116001 cycles for 100 * movlps qword ptr [esi+8*ecx]
71727 cycles for 100 * movaps xmm0, oword ptr [esi]
110933 cycles for 100 * movntdq xmm0, oword ptr [esi]
31644 cycles for 100 * rep movsb
37560 cycles for 100 * rep movsd
114454 cycles for 100 * movlps qword ptr [esi+8*ecx]
72541 cycles for 100 * movaps xmm0, oword ptr [esi]
124899 cycles for 100 * movntdq xmm0, oword ptr [esi]
31097 cycles for 100 * rep movsb
31010 cycles for 100 * rep movsd
115056 cycles for 100 * movlps qword ptr [esi+8*ecx]
72463 cycles for 100 * movaps xmm0, oword ptr [esi]
109951 cycles for 100 * movntdq xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
35 bytes for movntdq xmm0, oword ptr [esi]
Quote from: jj2007 on December 07, 2021, 09:05:54 PM
New machine. I added movntdq
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
13916 cycles for 100 * rep movsb
16152 cycles for 100 * rep movsd
106432 cycles for 100 * movlps qword ptr [esi+8*ecx]
42050 cycles for 100 * movaps xmm0, oword ptr [esi]
59500 cycles for 100 * movntdq xmm0, oword ptr [esi]
16298 cycles for 100 * rep movsb
15607 cycles for 100 * rep movsd
109919 cycles for 100 * movlps qword ptr [esi+8*ecx]
41897 cycles for 100 * movaps xmm0, oword ptr [esi]
58949 cycles for 100 * movntdq xmm0, oword ptr [esi]
15691 cycles for 100 * rep movsb
16515 cycles for 100 * rep movsd
108793 cycles for 100 * movlps qword ptr [esi+8*ecx]
41640 cycles for 100 * movaps xmm0, oword ptr [esi]
101036 cycles for 100 * movntdq xmm0, oword ptr [esi]
17390 cycles for 100 * rep movsb
16209 cycles for 100 * rep movsd
117106 cycles for 100 * movlps qword ptr [esi+8*ecx]
42124 cycles for 100 * movaps xmm0, oword ptr [esi]
60058 cycles for 100 * movntdq xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
35 bytes for movntdq xmm0, oword ptr [esi]
--- ok ---
Quote from: LiaoMi on December 09, 2021, 05:10:05 AM
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
13916 cycles for 100 * rep movsb
...
59500 cycles for 100 * movntdq xmm0, oword ptr [esi]
That looks odd, and I wondered whether my counts were correct. But I can't find an error... how can movntdq be so slow?
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
35139 cycles for 100 * rep movsb
36189 cycles for 100 * rep movsd
161839 cycles for 100 * movlps qword ptr [esi+8*ecx]
82736 cycles for 100 * movaps xmm0, oword ptr [esi]
173215 cycles for 100 * movntdq xmm0, oword ptr [esi]
35248 cycles for 100 * rep movsb
36580 cycles for 100 * rep movsd
160325 cycles for 100 * movlps qword ptr [esi+8*ecx]
82958 cycles for 100 * movaps xmm0, oword ptr [esi]
174700 cycles for 100 * movntdq xmm0, oword ptr [esi]
35392 cycles for 100 * rep movsb
36231 cycles for 100 * rep movsd
160691 cycles for 100 * movlps qword ptr [esi+8*ecx]
83033 cycles for 100 * movaps xmm0, oword ptr [esi]
174148 cycles for 100 * movntdq xmm0, oword ptr [esi]
35310 cycles for 100 * rep movsb
36172 cycles for 100 * rep movsd
162454 cycles for 100 * movlps qword ptr [esi+8*ecx]
83124 cycles for 100 * movaps xmm0, oword ptr [esi]
173325 cycles for 100 * movntdq xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
35 bytes for movntdq xmm0, oword ptr [esi]
--- ok ---
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
22808 cycles for 100 * rep movsb
22940 cycles for 100 * rep movsd
82232 cycles for 100 * movlps qword ptr [esi+8*ecx]
55785 cycles for 100 * movaps xmm0, oword ptr [esi]
148033 cycles for 100 * movntdq xmm0, oword ptr [esi]
22471 cycles for 100 * rep movsb
22846 cycles for 100 * rep movsd
82406 cycles for 100 * movlps qword ptr [esi+8*ecx]
57255 cycles for 100 * movaps xmm0, oword ptr [esi]
151683 cycles for 100 * movntdq xmm0, oword ptr [esi]
22507 cycles for 100 * rep movsb
23157 cycles for 100 * rep movsd
82990 cycles for 100 * movlps qword ptr [esi+8*ecx]
55098 cycles for 100 * movaps xmm0, oword ptr [esi]
144060 cycles for 100 * movntdq xmm0, oword ptr [esi]
22462 cycles for 100 * rep movsb
22567 cycles for 100 * rep movsd
82398 cycles for 100 * movlps qword ptr [esi+8*ecx]
54862 cycles for 100 * movaps xmm0, oword ptr [esi]
142853 cycles for 100 * movntdq xmm0, oword ptr [esi]
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
35 bytes for movntdq xmm0, oword ptr [esi]
--- ok ---
Summary table, one column per machine (left to right: AMD Athlon Gold 3150U, Intel i3-4005U, Intel i3-10110U, Intel i5-7200U, Intel i7-11800H; all SSE4):

cycles for 100 *                Athlon 3150U  i3-4005U  i3-10110U  i5-7200U  i7-11800H
rep movsb                              57654     35139      22808     31040      13916
rep movsd                              65892     36189      22940     31143      16152
movlps qword ptr [esi+8*ecx]          112954    161839      82232    117139     106432
movaps xmm0, oword ptr [esi]           58152     82736      55785     72688      42050
movntdq xmm0, oword ptr [esi]         129800    173215     148033    109706      59500
rep movsb                              59723     35248      22471     31381      16298
rep movsd                              59356     36580      22846     31663      15607
movlps qword ptr [esi+8*ecx]          113875    160325      82406    116001     109919
movaps xmm0, oword ptr [esi]           57518     82958      57255     71727      41897
movntdq xmm0, oword ptr [esi]         130509    174700     151683    110933      58949
rep movsb                              59061     35392      22507     31644      15691
rep movsd                              63768     36231      23157     37560      16515
movlps qword ptr [esi+8*ecx]          112908    160691      82990    114454     108793
movaps xmm0, oword ptr [esi]           57839     83033      55098     72541      41640
movntdq xmm0, oword ptr [esi]         132310    174148     144060    124899     101036
rep movsb                              59031     35310      22462     31097      17390
rep movsd                              58619     36172      22567     31010      16209
movlps qword ptr [esi+8*ecx]          129052    162454      82398    115056     117106
movaps xmm0, oword ptr [esi]           57675     83124      54862     72463      42124
movntdq xmm0, oword ptr [esi]         131438    173325     142853    109951      60058

bytes for rep movsb                       19        19         19        19         19
bytes for rep movsd                       19        19         19        19         19
bytes for movlps qword ptr [esi+8*ecx]    29        29         29        29         29
bytes for movaps xmm0, oword ptr [esi]    34        34         34        34         34
bytes for movntdq xmm0, oword ptr [esi]   35        35         35        35         35
Sub ImportValues()
    ' Imports one benchmark result file (as posted above) into the
    ' next free column of the active sheet.
    Dim Row As Long, Col As Long, Pos As Long
    Dim nFileNro As Integer, nClk As Double
    Dim sFileName As String, sTextRow As String

    Row = 1                                 ' title row
    Col = 2                                 ' first value column
    With ActiveSheet                        ' find the first empty column
        While .Cells(Row, Col) <> Empty
            Debug.Print .Cells(Row, Col)
            Col = Col + 1
        Wend
    End With

    With Application.FileDialog(msoFileDialogFilePicker)
        .AllowMultiSelect = False
        .Filters.Add "Text Files", "*.txt", 1
        .Show
        sFileName = .SelectedItems.Item(1)
    End With

    nFileNro = FreeFile
    Row = 0                                 ' start
    Open sFileName For Input As #nFileNro
    Do While Not EOF(nFileNro)
        Line Input #nFileNro, sTextRow
        If Row = 0 Then ActiveSheet.Cells(1, Col) = sTextRow ' title = CPU name
        Row = Row + 1
        If Row >= 3 Then
            Pos = InStr(1, sTextRow, Chr(9)) ' position of tab, if any
            If Pos = 0 Then Pos = 8          ' no tab: label starts at column 8
            If Left(sTextRow, 1) <> "" Then  ' skip empty lines
                If Left(sTextRow, 1) = "?" Then ActiveSheet.Cells(Row, Col) = "??" Else nClk = Val(sTextRow)
                If nClk >= 0 Then ActiveSheet.Cells(Row, Col) = nClk
                ActiveSheet.Cells(Row, 1) = Mid(sTextRow, Pos) ' instruction label
            End If
        End If
        If Row = 37 Then Exit Do            ' stop after the last data row
    Loop
    Close #nFileNro
End Sub
Quote from: jj2007 on December 09, 2021, 06:53:58 AM
Quote from: LiaoMi on December 09, 2021, 05:10:05 AM
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
13916 cycles for 100 * rep movsb
...
59500 cycles for 100 * movntdq xmm0, oword ptr [esi]
That looks odd, and I wondered whether my counts were correct. But I can't find an error... how can movntdq be so slow?
Hi jj2007,
I would also like to know what is the reason for such slowdowns :undecided:
Random slow downs with AVX2 code - https://community.intel.com/t5/Intel-ISA-Extensions/Random-slow-downs-with-AVX2-code/m-p/1084764
Depending on CPU, 10x is approximately the difference between L1 cache and L3 cache latency. Is your thread pinned?
Jim Dempsey
How L1 and L2 CPU Caches Work, and Why They're an Essential Part of Modern Chips - https://www.extremetech.com/extreme/188776-how-l1-and-l2-cpu-caches-work-and-why-theyre-an-essential-part-of-modern-chips
Reducing Memory Access Times with Caches - https://developers.redhat.com/blog/2016/03/01/reducing-memory-access-times-with-caches#
What is a "cache-friendly" code? - https://stackoverflow.com/questions/16699247/what-is-a-cache-friendly-code
CS 201 Writing Cache-Friendly Code - Portland State University - http://web.cecs.pdx.edu/~jrb/cs201/lectures/cache.friendly.code.pdf
Very slow performance of VMOVNTDQ instruction - https://community.intel.com/t5/Intel-ISA-Extensions/Very-slow-performance-of-VMOVNTDQ-instruction/td-p/941697
Thanks for the link. The poor performance of AVX instructions intermixed with SSE is a well-known issue. Because the hardware must save and restore the upper context of the YMM registers, it incurs a penalty of a few dozen cycles. AVX 128-bit instructions automatically zero the upper half of the YMM registers; that is not the case when you use legacy SSE instructions, because they have no "knowledge" of the wider 256-bit registers. You can use Intel SDE to detect the penalty of an AVX-to-SSE transition.
AVX transition penalties and OS support - https://community.intel.com/t5/Intel-ISA-Extensions/AVX-transition-penalties-and-OS-support/m-p/931977
Intel Avoiding AVX-SSE Transition Penalties - https://web.archive.org/web/20160409073240/software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf
What Every Programmer Should Know About Memory - https://www.akkadia.org/drepper/cpumemory.pdf
Few dozens of cycles... fascinating :rolleyes:
Intel Manual
66 0F E7 /r
MOVNTDQ m128, xmm1
A V/V SSE2 Move packed integer values in xmm1 to m128 using nontemporal hint.
Quote from: hutch-- on December 10, 2021, 10:33:46 PM
Intel Manual
66 0F E7 /r
MOVNTDQ m128, xmm1
A V/V SSE2 Move packed integer values in xmm1 to m128 using nontemporal hint.
Hi Hutch,
exactly :thup: :thup: :thup: :thup: Thanks!
What is the meaning of "non temporal" memory accesses in x86 - https://stackoverflow.com/questions/37070/what-is-the-meaning-of-non-temporal-memory-accesses-in-x86
When are x86 LFENCE, SFENCE and MFENCE instructions required? - https://stackoverflow.com/questions/27595595/when-are-x86-lfence-sfence-and-mfence-instructions-required
The "non temporal" phrase means lacking temporal locality. Caches exploit two kinds of locality - spatial and temporal - and by using a non-temporal instruction you're signaling to the processor that you don't expect the data item to be used in the near future.
Notes on "non-temporal" (aka "streaming") stores - https://sites.utexas.edu/jdm4372/2018/01/01/notes-on-non-temporal-aka-streaming-stores/
Optimizing Cache Usage With Nontemporal Accesses - https://vgatherps.github.io/2018-09-02-nontemporal/
/* Excerpt from the test.c linked below; cache_line, large_buffer,
   mfence() and rdtscp() are defined elsewhere in that project. */
void force_nt_store(cache_line *a) {
    __m128i zeros = {0, 0}; // chosen to use zeroing idiom
    __asm volatile("movntdq %0, (%1)\n\t"
#if BYTES > 16
                   "movntdq %0, 16(%1)\n\t"
#endif
#if BYTES > 32
                   "movntdq %0, 32(%1)\n\t"
#endif
#if BYTES > 48
                   "movntdq %0, 48(%1)"
#endif
                   :
                   : "x" (zeros), "r" (&a->vec_val)
                   : "memory");
}

uint64_t run_timer_loop(void) {
    mfence();
    uint64_t start = rdtscp();
    for (int i = 0; i < 32; i++) {
        force_nt_store(&large_buffer[i]);
    }
    mfence();
    uint64_t end = rdtscp();
    return end - start; // return the elapsed TSC count
}
nontemporal_stores - https://github.com/vgatherps/nontemporal_stores/blob/master/basic_write_allocate/test.c
movntdq + mfence - Example
https://www.felixcloutier.com/x86/mfence
https://www.felixcloutier.com/x86/lfence
https://www.felixcloutier.com/x86/sfence
.686
.model flat,C
.xmm
.code
;------------------------------------------------------------------------------
; VOID *
; InternalMemCopyMem (
; IN VOID *Destination,
; IN VOID *Source,
; IN UINTN Count
; );
;------------------------------------------------------------------------------
InternalMemCopyMem PROC USES esi edi
mov esi, [esp + 16] ; esi <- Source
mov edi, [esp + 12] ; edi <- Destination
mov edx, [esp + 20] ; edx <- Count
lea eax, [esi + edx - 1] ; eax <- End of Source
cmp esi, edi
jae @F
cmp eax, edi ; Overlapped?
jae @CopyBackward ; Copy backward if overlapped
@@:
xor ecx, ecx
sub ecx, edi
and ecx, 15 ; ecx + edi aligns on 16-byte boundary
jz @F
cmp ecx, edx
cmova ecx, edx
sub edx, ecx ; edx <- remaining bytes to copy
rep movsb
@@:
mov ecx, edx
and edx, 15
shr ecx, 4 ; ecx <- # of DQwords to copy
jz @CopyBytes
add esp, -16
movdqu [esp], xmm0 ; save xmm0
@@:
movdqu xmm0, [esi] ; esi may not be 16-bytes aligned
movntdq [edi], xmm0 ; edi should be 16-bytes aligned
add esi, 16
add edi, 16
loop @B
mfence
movdqu xmm0, [esp] ; restore xmm0
add esp, 16 ; stack cleanup
jmp @CopyBytes
@CopyBackward:
mov esi, eax ; esi <- Last byte in Source
lea edi, [edi + edx - 1] ; edi <- Last byte in Destination
std
@CopyBytes:
mov ecx, edx
rep movsb
cld
mov eax, [esp + 12] ; eax <- Destination as return value
ret
InternalMemCopyMem ENDP
END
.686
.model flat,C
.xmm
.code
;------------------------------------------------------------------------------
; VOID *
; EFIAPI
; InternalMemSetMem (
; IN VOID *Buffer,
; IN UINTN Count,
; IN UINT8 Value
; );
;------------------------------------------------------------------------------
InternalMemSetMem PROC USES edi
mov edx, [esp + 12] ; edx <- Count
mov edi, [esp + 8] ; edi <- Buffer
mov al, [esp + 16] ; al <- Value
xor ecx, ecx
sub ecx, edi
and ecx, 15 ; ecx + edi aligns on 16-byte boundary
jz @F
cmp ecx, edx
cmova ecx, edx
sub edx, ecx
rep stosb
@@:
mov ecx, edx
and edx, 15
shr ecx, 4 ; ecx <- # of DQwords to set
jz @SetBytes
mov ah, al ; ax <- Value | (Value << 8)
add esp, -16
movdqu [esp], xmm0 ; save xmm0
movd xmm0, eax
pshuflw xmm0, xmm0, 0 ; xmm0[0..63] <- Value repeats 8 times
movlhps xmm0, xmm0 ; xmm0 <- Value repeats 16 times
@@:
movntdq [edi], xmm0 ; edi should be 16-byte aligned
add edi, 16
loop @B
mfence
movdqu xmm0, [esp] ; restore xmm0
add esp, 16 ; stack cleanup
@SetBytes:
mov ecx, edx
rep stosb
mov eax, [esp + 8] ; eax <- Buffer as return value
ret
InternalMemSetMem ENDP
END
.686
.model flat,C
.xmm
.code
;------------------------------------------------------------------------------
; VOID *
; EFIAPI
; InternalMemZeroMem (
; IN VOID *Buffer,
; IN UINTN Count
; );
;------------------------------------------------------------------------------
InternalMemZeroMem PROC USES edi
mov edi, [esp + 8]
mov edx, [esp + 12]
xor ecx, ecx
sub ecx, edi
xor eax, eax
and ecx, 15
jz @F
cmp ecx, edx
cmova ecx, edx
sub edx, ecx
rep stosb
@@:
mov ecx, edx
and edx, 15
shr ecx, 4
jz @ZeroBytes
pxor xmm0, xmm0
@@:
movntdq [edi], xmm0
add edi, 16
loop @B
mfence
@ZeroBytes:
mov ecx, edx
rep stosb
mov eax, [esp + 8]
ret
InternalMemZeroMem ENDP
END
"cpuid" before "rdtsc" - https://newbedev.com/cpuid-before-rdtsc
It's to prevent out-of-order execution. From a link that has now disappeared from the web (but which was fortuitously copied here before it disappeared), this text is from an article entitled "Performance monitoring" by one John Eckerdal:
The Pentium Pro and Pentium II processors support out-of-order execution: instructions may be executed in a different order than you programmed them. This can be a source of errors if not taken care of.
To prevent this, the programmer must serialize the instruction queue. This can be done by inserting a serializing instruction, like the CPUID instruction, before the RDTSC instruction.
Two reasons:
As paxdiablo says, when the CPU sees a CPUID opcode it makes sure all the previous instructions are executed, then the CPUID is taken, before any subsequent instructions execute. Without such an instruction, the CPU execution pipeline may end up executing RDTSC before the instruction(s) you'd like to time.
A significant proportion of machines fail to synchronise the TSC registers across cores. If you want to read it from the horse's mouth, knock yourself out at http://msdn.microsoft.com/en-us/library/ee417693%28VS.85%29.aspx. So, when measuring an interval between TSC readings, unless they're taken on the same core, you'll have an effectively random but possibly constant (see below) interval introduced - it can easily be several seconds (yes, seconds) even soon after bootup. This effectively reflects how long the BIOS was running on a single core before kicking off the others, plus - if you've any nasty power-saving options on - increasing drift caused by cores running at different frequencies or shutting down again.
So, if you haven't nailed the threads reading TSC registers to the same core, you'll need to build some kind of cross-core delta table and know the core id (which is returned by CPUID) of each TSC sample in order to compensate for this offset. That's another reason you can see CPUID alongside RDTSC, and indeed a reason why, with the newer RDTSCP, many OSes store core id numbers into the extra TSC_AUX[31:0] data returned. (Available from Core i7 and Athlon 64 X2, RDTSCP is a much better option in all respects - the OS normally gives you the core id as mentioned, it is atomic with the TSC read, and it prevents instruction reordering.)
CPUID is serializing, preventing out-of-order execution of RDTSC.
These days you can safely use LFENCE instead. It's documented as serializing on the instruction stream (but not stores to memory) on Intel CPUs, and now also on AMD after their microcode update for Spectre.
https://hadibrais.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instruction/ explains more about LFENCE.
See also https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf for a way to use RDTSCP that keeps CPUID (or LFENCE) out of the timed region:
LFENCE ; (or CPUID) Don't start the timed region until everything above has executed
RDTSC ; EDX:EAX = timestamp
mov ebx, eax ; low 32 bits of start time
code under test
RDTSCP ; built-in one way barrier stops it from running early
LFENCE ; (or CPUID) still use a barrier after to prevent anything weird
sub eax, ebx ; low 32 bits of end-start
I have generally found that the combination of CPUID and RDTSC stabilise timings and improves the accuracy of benchmarking.
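The serialized timing pattern quoted above can be sketched in C with compiler intrinsics (hedged: the function and the toy workload are mine; __rdtsc, __rdtscp and _mm_lfence are the GCC/Clang x86 intrinsics):

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, __rdtscp, _mm_lfence */

/* Time a small workload with the LFENCE/RDTSC ... RDTSCP/LFENCE
   pattern from the Intel benchmarking paper linked above. */
static uint64_t time_small_loop(void)
{
    unsigned aux;                   /* receives TSC_AUX (core id) */
    volatile uint64_t sink = 0;     /* keeps the loop from being removed */
    _mm_lfence();                   /* don't start the region early */
    uint64_t start = __rdtsc();
    for (int i = 0; i < 1000; i++)  /* code under test */
        sink += (uint64_t)i;
    uint64_t end = __rdtscp(&aux);  /* one-way barrier: won't run early */
    _mm_lfence();                   /* nothing later leaks into the region */
    (void)sink;
    return end - start;
}
```

A dummy call first, as described at the top of the thread, still helps: the first run warms caches and branch predictors, so only the repeat runs are comparable.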
@Hutch: I didn't know before why it was so :thumbsup:
movntps + sfence - Example
xorps macro XMMReg1, XMMReg2
db 0FH, 057H, 0C0H + (XMMReg1 * 8) + XMMReg2
endm
movntps macro GeneralReg, Offset, XMMReg
db 0FH, 02BH, 040H + (XmmReg * 8) + GeneralReg, Offset
endm
sfence macro
db 0FH, 0AEH, 0F8H
endm
movaps_load macro XMMReg, GeneralReg
db 0FH, 028H, (XMMReg * 8) + 4, (4 * 8) + GeneralReg
endm
movaps_store macro GeneralReg, XMMReg
db 0FH, 029H, (XMMReg * 8) + 4, (4 * 8) + GeneralReg
endm
;
; Register Definitions (for instruction macros).
;
rEAX equ 0
rECX equ 1
rEDX equ 2
rEBX equ 3
rESP equ 4
rEBP equ 5
rESI equ 6
rEDI equ 7
Test Proc
sti ; reenable context switching
movaps_store rESP, 0 ; save xmm0
mov ecx, Dest
call XMMZeroPage ; zero MEM
movaps_load 0, rESP ; restore xmm0
ret ; return to caller
Test ENDP
XMMZeroPage Proc
xorps 0, 0 ; zero xmm0 (128 bits)
mov eax, SIZE ; Number of Iterations
inner:
movntps rECX, 0, 0 ; store bytes 0 - 15
movntps rECX, 16, 0 ; 16 - 31
movntps rECX, 32, 0 ; 32 - 47
movntps rECX, 48, 0 ; 48 - 63
add ecx, 64 ; increment base
dec eax ; decrement loop count
jnz short inner
; Force all stores to complete before any other
; stores from this processor.
sfence
ifndef SFENCE_IS_NOT_BUSTED
; the next uncached write to this processor's apic
; may fail unless the store pipes have drained. sfence by
; itself is not enough. Force drainage now by doing an
; interlocked exchange.
xchg [esp-4], eax
endif
ret
XMMZeroPage ENDP
Intel memory ordering, fence instructions, and atomic operations - https://peeterjoot.wordpress.com/2009/12/04/intel-memory-ordering-fence-instructions-and-atomic-operations/
MFENCE and LFENCE micro-architectural implementation (Patent) - https://patents.google.com/patent/US6678810B1/en or https://patentimages.storage.googleapis.com/d4/fd/41/fd35729a18a3cd/US6678810.pdf
MFENCE and LFENCE micro-architectural implementation method and system - https://patents.google.com/patent/US6651151B2/en or https://patentimages.storage.googleapis.com/fe/41/a3/ddea1fb5732c17/US6651151.pdf
Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?
x86 fence instructions can be briefly described as follows:
MFENCE prevents any later loads or stores from becoming globally observable before any earlier loads or stores. It drains the store buffer before later loads can execute.
LFENCE blocks instruction dispatch (Intel's terminology) until all earlier instructions retire. This is currently implemented by draining the ROB (ReOrder Buffer) before later instructions can issue into the back-end.
SFENCE only orders stores against other stores, i.e. prevents NT stores from committing from the store buffer ahead of SFENCE itself. But otherwise SFENCE is just like a plain store that moves through the store buffer. Think of it like putting a divider on a grocery-store checkout conveyor belt that stops NT stores from getting grabbed early. It does not necessarily force the store buffer to be drained before it retires from the ROB, so putting LFENCE after it doesn't add up to MFENCE.
A "serializing instruction" like CPUID (and IRET, etc) drains everything (ROB, store buffer) before later instructions can issue into the back-end. MFENCE + LFENCE would also do that, but true serializing instructions might also have other effects, I don't know.
Memory Reordering Caught in the Act - https://preshing.com/20120515/memory-reordering-caught-in-the-act/
Does the Intel Memory Model make SFENCE and LFENCE redundant? - https://stackoverflow.com/questions/32705169/does-the-intel-memory-model-make-sfence-and-lfence-redundant/32705560#32705560
If I am timing something really critical, I use the API SleepEx() to pause the thread for about 100 ms to try and get the start of a time slice.
Quote from: hutch-- on December 11, 2021, 02:16:41 AM
I have generally found that the combination of CPUID and RDTSC stabilise timings and improves the accuracy of benchmarking.
\Masm32\macros\timers.asm
xor eax, eax ;; Use same CPUID input value for each call
cpuid ;; Flush pipe & wait for pending ops to finish
rdtsc ;; Read Time Stamp Counter
Michael Webster :thumbsup:
Quote from: hutch-- on December 10, 2021, 10:33:46 PM
Intel Manual
66 0F E7 /r
MOVNTDQ m128, xmm1
A V/V SSE2 Move packed integer values in xmm1 to m128 using nontemporal hint.
the old advice was that for system RAM -> PCI Express -> GPU VRAM transfers it is faster, but I forgot to time the different alternatives when ddraw blends lots of circles; using movaps looked very fast though
note the 66 prefix: most packed SSE2 instructions are one byte bigger than their SSE counterparts, so I wonder how many more instructions fit in a 64-byte cache line with the SSE versions instead?
magnus,
I think you have missed something here. The mnemonic "movntdq" is designed to be used in conjunction with an instruction like "movdqa" or "movdqu": the latter two load memory through the cache, while "movntdq" writes back to memory bypassing the cache. The reduction in cache pollution generally yields an improvement in performance.
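As a sketch of that pairing, here is my own C-intrinsics illustration, not code from this thread; the `nt_copy` name and the alignment constraints in the comments are my assumptions:

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>

/* Copy 'bytes' (a multiple of 16) from a possibly-unaligned 'src' to a
   16-byte-aligned 'dst'. The load goes through the cache (movdqu); the
   store bypasses it (movntdq), reducing cache pollution on big copies. */
static void nt_copy(void *dst, const void *src, size_t bytes)
{
    char *d = (char *)dst;
    const char *s = (const char *)src;
    for (size_t i = 0; i < bytes; i += 16) {
        __m128i x = _mm_loadu_si128((const __m128i *)(s + i)); /* movdqu  */
        _mm_stream_si128((__m128i *)(d + i), x);               /* movntdq */
    }
    _mm_sfence(); /* make the NT stores visible before later stores */
}
```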
Quote from: hutch-- on December 12, 2021, 02:05:43 PM"movntdq" is designed to be used in conjunction with an instruction like either "movdqa" or "movdqu"
I had movaps before, but even with movdqa or movdqu it won't become any faster. Mysterious :rolleyes:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
63929 cycles for 100 * rep movsb
95770 cycles for 100 * rep movsd
209628 cycles for 100 * movlps qword ptr [esi+8*ecx]
120512 cycles for 100 * movaps xmm0, oword ptr [esi]
176887 cycles for 100 * movdqa + movntdq
175955 cycles for 100 * movdqu + movntdq
65768 cycles for 100 * rep movsb
64697 cycles for 100 * rep movsd
206155 cycles for 100 * movlps qword ptr [esi+8*ecx]
122034 cycles for 100 * movaps xmm0, oword ptr [esi]
174827 cycles for 100 * movdqa + movntdq
176240 cycles for 100 * movdqu + movntdq
65109 cycles for 100 * rep movsb
64308 cycles for 100 * rep movsd
208594 cycles for 100 * movlps qword ptr [esi+8*ecx]
120838 cycles for 100 * movaps xmm0, oword ptr [esi]
176082 cycles for 100 * movdqa + movntdq
176391 cycles for 100 * movdqu + movntdq
65057 cycles for 100 * rep movsb
64755 cycles for 100 * rep movsd
206689 cycles for 100 * movlps qword ptr [esi+8*ecx]
121700 cycles for 100 * movaps xmm0, oword ptr [esi]
175600 cycles for 100 * movdqa + movntdq
176981 cycles for 100 * movdqu + movntdq
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
36 bytes for movdqa + movntdq
36 bytes for movdqu + movntdq
Quote from: jj2007 on December 12, 2021, 08:45:19 PM
Quote from: hutch-- on December 12, 2021, 02:05:43 PM"movntdq" is designed to be used in conjunction with an instruction like either "movdqa" or "movdqu"
I had movaps before, but even with movdqa or movdqu it won't become any faster. Mysterious :rolleyes:
Hi jj2007,
please add two more examples from here http://masm32.com/board/index.php?topic=9691.msg106286#msg106286
"movntdq + mfence"
@@:
movdqu xmm0, [esi] ; esi may not be 16-bytes aligned
movntdq [edi], xmm0 ; edi should be 16-bytes aligned
add esi, 16
add edi, 16
loop @B
mfence
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
17005 cycles for 100 * rep movsb
16922 cycles for 100 * rep movsd
106248 cycles for 100 * movlps qword ptr [esi+8*ecx]
41768 cycles for 100 * movaps xmm0, oword ptr [esi]
56037 cycles for 100 * movdqa + movntdq
55746 cycles for 100 * movdqu + movntdq
16797 cycles for 100 * rep movsb
17090 cycles for 100 * rep movsd
105885 cycles for 100 * movlps qword ptr [esi+8*ecx]
42111 cycles for 100 * movaps xmm0, oword ptr [esi]
56001 cycles for 100 * movdqa + movntdq
56026 cycles for 100 * movdqu + movntdq
17075 cycles for 100 * rep movsb
16702 cycles for 100 * rep movsd
107414 cycles for 100 * movlps qword ptr [esi+8*ecx]
41896 cycles for 100 * movaps xmm0, oword ptr [esi]
56205 cycles for 100 * movdqa + movntdq
56293 cycles for 100 * movdqu + movntdq
16736 cycles for 100 * rep movsb
17064 cycles for 100 * rep movsd
105788 cycles for 100 * movlps qword ptr [esi+8*ecx]
41915 cycles for 100 * movaps xmm0, oword ptr [esi]
56349 cycles for 100 * movdqa + movntdq
56819 cycles for 100 * movdqu + movntdq
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
36 bytes for movdqa + movntdq
36 bytes for movdqu + movntdq
--- ok ---
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
62547 cycles for 100 * rep movsb
63471 cycles for 100 * rep movsd
119633 cycles for 100 * movlps qword ptr [esi+8*ecx]
60383 cycles for 100 * movaps xmm0, oword ptr [esi]
120757 cycles for 100 * movdqa + movntdq
115172 cycles for 100 * movdqu + movntdq
63334 cycles for 100 * rep movsb
62718 cycles for 100 * rep movsd
118873 cycles for 100 * movlps qword ptr [esi+8*ecx]
60457 cycles for 100 * movaps xmm0, oword ptr [esi]
112820 cycles for 100 * movdqa + movntdq
116539 cycles for 100 * movdqu + movntdq
62664 cycles for 100 * rep movsb
63786 cycles for 100 * rep movsd
119998 cycles for 100 * movlps qword ptr [esi+8*ecx]
57309 cycles for 100 * movaps xmm0, oword ptr [esi]
118881 cycles for 100 * movdqa + movntdq
112190 cycles for 100 * movdqu + movntdq
63090 cycles for 100 * rep movsb
63073 cycles for 100 * rep movsd
118692 cycles for 100 * movlps qword ptr [esi+8*ecx]
59713 cycles for 100 * movaps xmm0, oword ptr [esi]
117861 cycles for 100 * movdqa + movntdq
117263 cycles for 100 * movdqu + movntdq
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
36 bytes for movdqa + movntdq
36 bytes for movdqu + movntdq
Quote from: LiaoMi on December 12, 2021, 11:31:29 PMplease add two more examples from here http://masm32.com/board/index.php?topic=9691.msg106286#msg106286
"movntdq + mfence"
@@:
movdqu xmm0, [esi] ; esi may not be 16-bytes aligned
movntdq [edi], xmm0 ; edi should be 16-bytes aligned
add esi, 16
add edi, 16
loop @B
mfence
I'm not impressed...
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
65904 cycles for 100 * rep movsb
70679 cycles for 100 * rep movsd
207177 cycles for 100 * movlps qword ptr [esi+8*ecx]
121524 cycles for 100 * movaps xmm0, oword ptr [esi]
191206 cycles for 100 * movdqa + movntdq
194912 cycles for 100 * movdqu + movntdq
197640 cycles for 100 * movdqu + movntdq + mfence
66396 cycles for 100 * rep movsb
64295 cycles for 100 * rep movsd
207218 cycles for 100 * movlps qword ptr [esi+8*ecx]
121237 cycles for 100 * movaps xmm0, oword ptr [esi]
192188 cycles for 100 * movdqa + movntdq
193955 cycles for 100 * movdqu + movntdq
195811 cycles for 100 * movdqu + movntdq + mfence
65465 cycles for 100 * rep movsb
63888 cycles for 100 * rep movsd
209074 cycles for 100 * movlps qword ptr [esi+8*ecx]
122465 cycles for 100 * movaps xmm0, oword ptr [esi]
190494 cycles for 100 * movdqa + movntdq
192326 cycles for 100 * movdqu + movntdq
198034 cycles for 100 * movdqu + movntdq + mfence
65560 cycles for 100 * rep movsb
65119 cycles for 100 * rep movsd
206794 cycles for 100 * movlps qword ptr [esi+8*ecx]
121545 cycles for 100 * movaps xmm0, oword ptr [esi]
191100 cycles for 100 * movdqa + movntdq
196902 cycles for 100 * movdqu + movntdq
197136 cycles for 100 * movdqu + movntdq + mfence
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
36 bytes for movdqa + movntdq
36 bytes for movdqu + movntdq
39 bytes for movdqu + movntdq + mfence
JJ,
Keep in mind that the sample size being looped will affect the timing of most of the combinations. If you run a gigabyte sample, you get rid of those effects. The other factor is that different hardware will give different results.
Quote from: jj2007 on December 13, 2021, 12:03:37 AM
Quote from: LiaoMi on December 12, 2021, 11:31:29 PMplease add two more examples from here http://masm32.com/board/index.php?topic=9691.msg106286#msg106286
"movntdq + mfence"
I'm not impressed...
Very curious results :rolleyes:
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
17782 cycles for 100 * rep movsb
17554 cycles for 100 * rep movsd
115012 cycles for 100 * movlps qword ptr [esi+8*ecx]
52006 cycles for 100 * movaps xmm0, oword ptr [esi]
58101 cycles for 100 * movdqa + movntdq
57415 cycles for 100 * movdqu + movntdq
73437 cycles for 100 * movdqu + movntdq + mfence
18073 cycles for 100 * rep movsb
17701 cycles for 100 * rep movsd
110545 cycles for 100 * movlps qword ptr [esi+8*ecx]
42643 cycles for 100 * movaps xmm0, oword ptr [esi]
56827 cycles for 100 * movdqa + movntdq
58362 cycles for 100 * movdqu + movntdq
72001 cycles for 100 * movdqu + movntdq + mfence
19436 cycles for 100 * rep movsb
17883 cycles for 100 * rep movsd
107491 cycles for 100 * movlps qword ptr [esi+8*ecx]
43259 cycles for 100 * movaps xmm0, oword ptr [esi]
56876 cycles for 100 * movdqa + movntdq
57166 cycles for 100 * movdqu + movntdq
74082 cycles for 100 * movdqu + movntdq + mfence
18036 cycles for 100 * rep movsb
18419 cycles for 100 * rep movsd
106922 cycles for 100 * movlps qword ptr [esi+8*ecx]
42377 cycles for 100 * movaps xmm0, oword ptr [esi]
58547 cycles for 100 * movdqa + movntdq
57797 cycles for 100 * movdqu + movntdq
74547 cycles for 100 * movdqu + movntdq + mfence
19 bytes for rep movsb
19 bytes for rep movsd
29 bytes for movlps qword ptr [esi+8*ecx]
34 bytes for movaps xmm0, oword ptr [esi]
36 bytes for movdqa + movntdq
36 bytes for movdqu + movntdq
39 bytes for movdqu + movntdq + mfence
--- ok ---
Here are two test pieces for 32-bit memory copy: an unaligned SSE2 copy and a normal rep movsb copy. In every instance the SSE2 version is faster, to the extent that the test pieces don't need to be stabilised or run at a higher priority. It is the same source for both, but each has been saved as a separate exe file so that there is no cache interaction between the two.
SSE2 Copy
--------
843 ms
--------
Press any key to continue ...
ByteCopy
--------
1219 ms
--------
Press any key to continue ...
Quote from: hutch-- on December 13, 2021, 01:39:30 PM
SSE2 Copy
--------
640 ms
--------
Press any key to continue ...
ByteCopy
--------
719 ms
--------
Press any key to continue ...
Similar for me. The interesting bit, though: if you use movups instead of movnt, rep movsb is faster.
My test pieces were made with a shorter copy but more iterations. Which implies that they used the cache, and then, apparently, movnt has no advantage.
I think that is normally the case with SSE mnemonics: they are designed for streaming and apparently have a wind-up cost that works against short-duration loop code. In the past, the advice on SSE code was to more or less forget integer-code optimisation and set up the SSE code so it did the work. I have never yet got any gain out of SSE code by unrolling it, so it really is a different system built into the CPU.
This much is clear: over time the unaligned mnemonics have come a lot closer in speed to the fully aligned ones, and there is little gain in using the aligned instructions unless your code design requires it.
ByteCopy
--------
1266 ms
--------
SSE2
--------
875 ms
--------
bytecopy.exe
--------
695 ms
--------
sse2copy.exe
--------
680 ms
--------
Quote from: jj2007 on December 13, 2021, 07:58:53 PM
Similar for me. The interesting bit, though: if you use movups instead of movnt, rep movsb is faster.
I did some tests here: when I align data to 64, rep movs(bd) execution time decreased by roughly 40%. The other functions' results remain unchanged.
align 16
db 1
align 16
db 1
align 16
db 1
align 16
somestring db "Hello, this is a simple string intended for testing string algos. It has 100 characters without zero"
REPEAT 99
db "Hello, this is a simple string intended for testing string algos. It has 100 characters without zero"
ENDM
I am interested in timings with ddraw or SDL: for someone with PCI Express, does system RAM -> VRAM still have the advantage with movntdqa, and is reading from VRAM still ~100x slower?
I can't test with a laptop with shared memory.
I wonder if the DX loadtexturefrommemory API uses movntdqa?
I changed my testbed to one big allocation (0.5GB), in order to minimise cache use. Now movntdq shines, of course, but rep movs is only about 15% slower:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
++++++++-+++8 of 20 tests valid,
315842 kCycles for 1 * rep movsb
241232 kCycles for 1 * rep movsd
304695 kCycles for 1 * movlps qword ptr [esi+8*ecx]
302014 kCycles for 1 * movaps xmm0, oword ptr [esi]
208018 kCycles for 1 * movdqa + movntdq
207876 kCycles for 1 * movdqu + movntdq
207752 kCycles for 1 * movdqu + movntdq + mfence
249181 kCycles for 1 * rep movsb
239809 kCycles for 1 * rep movsd
304868 kCycles for 1 * movlps qword ptr [esi+8*ecx]
301253 kCycles for 1 * movaps xmm0, oword ptr [esi]
207931 kCycles for 1 * movdqa + movntdq
208272 kCycles for 1 * movdqu + movntdq
207503 kCycles for 1 * movdqu + movntdq + mfence
249727 kCycles for 1 * rep movsb
241799 kCycles for 1 * rep movsd
303516 kCycles for 1 * movlps qword ptr [esi+8*ecx]
301728 kCycles for 1 * movaps xmm0, oword ptr [esi]
207608 kCycles for 1 * movdqa + movntdq
208094 kCycles for 1 * movdqu + movntdq
208854 kCycles for 1 * movdqu + movntdq + mfence
248574 kCycles for 1 * rep movsb
240836 kCycles for 1 * rep movsd
304675 kCycles for 1 * movlps qword ptr [esi+8*ecx]
301674 kCycles for 1 * movaps xmm0, oword ptr [esi]
208379 kCycles for 1 * movdqa + movntdq
207882 kCycles for 1 * movdqu + movntdq
207742 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
+-++++++++++++++++++
261718 kCycles for 1 * rep movsb
238384 kCycles for 1 * rep movsd
269729 kCycles for 1 * movlps qword ptr [esi+8*ecx]
236182 kCycles for 1 * movaps xmm0, oword ptr [esi]
156614 kCycles for 1 * movdqa + movntdq
156042 kCycles for 1 * movdqu + movntdq
156594 kCycles for 1 * movdqu + movntdq + mfence
236577 kCycles for 1 * rep movsb
236369 kCycles for 1 * rep movsd
270908 kCycles for 1 * movlps qword ptr [esi+8*ecx]
236626 kCycles for 1 * movaps xmm0, oword ptr [esi]
156354 kCycles for 1 * movdqa + movntdq
155998 kCycles for 1 * movdqu + movntdq
156243 kCycles for 1 * movdqu + movntdq + mfence
235567 kCycles for 1 * rep movsb
236734 kCycles for 1 * rep movsd
276802 kCycles for 1 * movlps qword ptr [esi+8*ecx]
236012 kCycles for 1 * movaps xmm0, oword ptr [esi]
156233 kCycles for 1 * movdqa + movntdq
156445 kCycles for 1 * movdqu + movntdq
156944 kCycles for 1 * movdqu + movntdq + mfence
237039 kCycles for 1 * rep movsb
238233 kCycles for 1 * rep movsd
270702 kCycles for 1 * movlps qword ptr [esi+8*ecx]
236677 kCycles for 1 * movaps xmm0, oword ptr [esi]
155610 kCycles for 1 * movdqa + movntdq
156935 kCycles for 1 * movdqu + movntdq
156013 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
AMD Ryzen 9 5950X 16-Core Processor (SSE4)
-----------9 of 20 tests valid,
103297 kCycles for 1 * rep movsb
83453 kCycles for 1 * rep movsd
151305 kCycles for 1 * movlps qword ptr [esi+8*ecx]
140797 kCycles for 1 * movaps xmm0, oword ptr [esi]
85881 kCycles for 1 * movdqa + movntdq
87420 kCycles for 1 * movdqu + movntdq
85314 kCycles for 1 * movdqu + movntdq + mfence
82482 kCycles for 1 * rep movsb
81107 kCycles for 1 * rep movsd
148720 kCycles for 1 * movlps qword ptr [esi+8*ecx]
140735 kCycles for 1 * movaps xmm0, oword ptr [esi]
87417 kCycles for 1 * movdqa + movntdq
85541 kCycles for 1 * movdqu + movntdq
86765 kCycles for 1 * movdqu + movntdq + mfence
83181 kCycles for 1 * rep movsb
81348 kCycles for 1 * rep movsd
149105 kCycles for 1 * movlps qword ptr [esi+8*ecx]
141740 kCycles for 1 * movaps xmm0, oword ptr [esi]
85743 kCycles for 1 * movdqa + movntdq
86256 kCycles for 1 * movdqu + movntdq
87608 kCycles for 1 * movdqu + movntdq + mfence
81210 kCycles for 1 * rep movsb
81663 kCycles for 1 * rep movsd
150942 kCycles for 1 * movlps qword ptr [esi+8*ecx]
140339 kCycles for 1 * movaps xmm0, oword ptr [esi]
86239 kCycles for 1 * movdqa + movntdq
87931 kCycles for 1 * movdqu + movntdq
85408 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
Hi Jochen,
What does the number of valid tests mean?
Hi Jochen,
Two systems. Tests valid?
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
+++-+++++++++++5 of 20 tests valid,
352139 kCycles for 1 * rep movsb
237663 kCycles for 1 * rep movsd
287198 kCycles for 1 * movlps qword ptr [esi+8*ecx]
279630 kCycles for 1 * movaps xmm0, oword ptr [esi]
181083 kCycles for 1 * movdqa + movntdq
181125 kCycles for 1 * movdqu + movntdq
180909 kCycles for 1 * movdqu + movntdq + mfence
193980 kCycles for 1 * rep movsb
214331 kCycles for 1 * rep movsd
278425 kCycles for 1 * movlps qword ptr [esi+8*ecx]
274866 kCycles for 1 * movaps xmm0, oword ptr [esi]
179814 kCycles for 1 * movdqa + movntdq
179520 kCycles for 1 * movdqu + movntdq
179469 kCycles for 1 * movdqu + movntdq + mfence
192139 kCycles for 1 * rep movsb
210664 kCycles for 1 * rep movsd
279349 kCycles for 1 * movlps qword ptr [esi+8*ecx]
274878 kCycles for 1 * movaps xmm0, oword ptr [esi]
180115 kCycles for 1 * movdqa + movntdq
179785 kCycles for 1 * movdqu + movntdq
179769 kCycles for 1 * movdqu + movntdq + mfence
191809 kCycles for 1 * rep movsb
216878 kCycles for 1 * rep movsd
277053 kCycles for 1 * movlps qword ptr [esi+8*ecx]
277263 kCycles for 1 * movaps xmm0, oword ptr [esi]
178899 kCycles for 1 * movdqa + movntdq
179083 kCycles for 1 * movdqu + movntdq
178847 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
--- ok ---
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
+++++15 of 20 tests valid,
241261 kCycles for 1 * rep movsb
201928 kCycles for 1 * rep movsd
250473 kCycles for 1 * movlps qword ptr [esi+8*ecx]
245765 kCycles for 1 * movaps xmm0, oword ptr [esi]
182061 kCycles for 1 * movdqa + movntdq
187806 kCycles for 1 * movdqu + movntdq
197382 kCycles for 1 * movdqu + movntdq + mfence
224850 kCycles for 1 * rep movsb
202630 kCycles for 1 * rep movsd
234536 kCycles for 1 * movlps qword ptr [esi+8*ecx]
228211 kCycles for 1 * movaps xmm0, oword ptr [esi]
191152 kCycles for 1 * movdqa + movntdq
188628 kCycles for 1 * movdqu + movntdq
185565 kCycles for 1 * movdqu + movntdq + mfence
206426 kCycles for 1 * rep movsb
206008 kCycles for 1 * rep movsd
233301 kCycles for 1 * movlps qword ptr [esi+8*ecx]
229024 kCycles for 1 * movaps xmm0, oword ptr [esi]
181524 kCycles for 1 * movdqa + movntdq
198103 kCycles for 1 * movdqu + movntdq
177373 kCycles for 1 * movdqu + movntdq + mfence
199886 kCycles for 1 * rep movsb
200050 kCycles for 1 * rep movsd
233793 kCycles for 1 * movlps qword ptr [esi+8*ecx]
228413 kCycles for 1 * movaps xmm0, oword ptr [esi]
177392 kCycles for 1 * movdqa + movntdq
175842 kCycles for 1 * movdqu + movntdq
175220 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
--- ok ---
Regards,
Steve N.
Quote from: jj2007 on December 14, 2021, 03:54:57 AM
I changed my testbed to one big allocation (0.5GB), in order to minimise cache use. Now movntdq shines, of course, but rep movs is only about 15% slower:
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
++++---++-+--+++-++1 of 20 tests valid,
111187 kCycles for 1 * rep movsb
86590 kCycles for 1 * rep movsd
131239 kCycles for 1 * movlps qword ptr [esi+8*ecx]
119079 kCycles for 1 * movaps xmm0, oword ptr [esi]
99788 kCycles for 1 * movdqa + movntdq
89096 kCycles for 1 * movdqu + movntdq
92749 kCycles for 1 * movdqu + movntdq + mfence
99740 kCycles for 1 * rep movsb
92438 kCycles for 1 * rep movsd
119977 kCycles for 1 * movlps qword ptr [esi+8*ecx]
111659 kCycles for 1 * movaps xmm0, oword ptr [esi]
76366 kCycles for 1 * movdqa + movntdq
79162 kCycles for 1 * movdqu + movntdq
77279 kCycles for 1 * movdqu + movntdq + mfence
89597 kCycles for 1 * rep movsb
85665 kCycles for 1 * rep movsd
125051 kCycles for 1 * movlps qword ptr [esi+8*ecx]
111892 kCycles for 1 * movaps xmm0, oword ptr [esi]
76149 kCycles for 1 * movdqa + movntdq
76483 kCycles for 1 * movdqu + movntdq
76167 kCycles for 1 * movdqu + movntdq + mfence
86964 kCycles for 1 * rep movsb
85324 kCycles for 1 * rep movsd
121596 kCycles for 1 * movlps qword ptr [esi+8*ecx]
111769 kCycles for 1 * movaps xmm0, oword ptr [esi]
76968 kCycles for 1 * movdqa + movntdq
75970 kCycles for 1 * movdqu + movntdq
75677 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
--- ok ---
Quote from: jj2007 on December 14, 2021, 03:54:57 AM
I changed my testbed to one big allocation (0.5GB), in order to minimise cache use. Now movntdq shines, of course, but rep movs is only about 15% slower:
Quote from: mineiro on December 14, 2021, 01:24:01 AM
I do some tests here, when I align data to 64, rep movs(bd) execution decreased by a 50% 40% ratio. Others functions results remains unchanged.
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
++-++--++++9 of 20 tests valid,
242232 kCycles for 1 * rep movsb
215571 kCycles for 1 * rep movsd
227473 kCycles for 1 * movlps qword ptr [esi+8*ecx]
230151 kCycles for 1 * movaps xmm0, oword ptr [esi]
146063 kCycles for 1 * movdqa + movntdq
146242 kCycles for 1 * movdqu + movntdq
146090 kCycles for 1 * movdqu + movntdq + mfence
208745 kCycles for 1 * rep movsb
214643 kCycles for 1 * rep movsd
227536 kCycles for 1 * movlps qword ptr [esi+8*ecx]
230180 kCycles for 1 * movaps xmm0, oword ptr [esi]
146452 kCycles for 1 * movdqa + movntdq
146127 kCycles for 1 * movdqu + movntdq
146534 kCycles for 1 * movdqu + movntdq + mfence
208675 kCycles for 1 * rep movsb
213978 kCycles for 1 * rep movsd
227433 kCycles for 1 * movlps qword ptr [esi+8*ecx]
230045 kCycles for 1 * movaps xmm0, oword ptr [esi]
146253 kCycles for 1 * movdqa + movntdq
146041 kCycles for 1 * movdqu + movntdq
146361 kCycles for 1 * movdqu + movntdq + mfence
208670 kCycles for 1 * rep movsb
216371 kCycles for 1 * rep movsd
227677 kCycles for 1 * movlps qword ptr [esi+8*ecx]
229852 kCycles for 1 * movaps xmm0, oword ptr [esi]
146070 kCycles for 1 * movdqa + movntdq
146349 kCycles for 1 * movdqu + movntdq
146292 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
Thanks to everybody :thup:
I see a pretty consistent pattern: rep movs* faster than movlps and movaps, but movntdq faster than rep movs*. Makes sense :thumbsup:
Hi,
I tried it on two older machines. One locked up ( kinda / sorta ),
one ran okay.
Genuine Intel(R) CPU T2400 @ 1.83GHz (SSE3)
++++++++++++++6 of 20 tests valid,
916706 kCycles for 1 * rep movsb
822971 kCycles for 1 * rep movsd
1043451 kCycles for 1 * movlps qword ptr [esi+8*ecx]
904357 kCycles for 1 * movaps xmm0, oword ptr [esi]
708260 kCycles for 1 * movdqa + movntdq
709339 kCycles for 1 * movdqu + movntdq
683374 kCycles for 1 * movdqu + movntdq + mfence
825066 kCycles for 1 * rep movsb
834842 kCycles for 1 * rep movsd
1048805 kCycles for 1 * movlps qword ptr [esi+8*ecx]
904001 kCycles for 1 * movaps xmm0, oword ptr [esi]
677020 kCycles for 1 * movdqa + movntdq
684107 kCycles for 1 * movdqu + movntdq
683326 kCycles for 1 * movdqu + movntdq + mfence
820533 kCycles for 1 * rep movsb
820807 kCycles for 1 * rep movsd
1033703 kCycles for 1 * movlps qword ptr [esi+8*ecx]
899210 kCycles for 1 * movaps xmm0, oword ptr [esi]
677950 kCycles for 1 * movdqa + movntdq
685388 kCycles for 1 * movdqu + movntdq
682889 kCycles for 1 * movdqu + movntdq + mfence
819905 kCycles for 1 * rep movsb
820346 kCycles for 1 * rep movsd
1033769 kCycles for 1 * movlps qword ptr [esi+8*ecx]
899579 kCycles for 1 * movaps xmm0, oword ptr [esi]
676660 kCycles for 1 * movdqa + movntdq
685048 kCycles for 1 * movdqu + movntdq
682723 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
--- ok ---
Steve N.
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4) AMD Ryzen 9 5950X 16-Core Processor (SSE4) Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4) Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4) Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4) Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4) 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4) Genuine Intel(R) CPU T2400 @ 1.83GHz (SSE3)
kCycles for 1 * rep movsb 261718 103297 352139 241261 315842 242232 111187 916706
kCycles for 1 * rep movsd 238384 83453 237663 201928 241232 215571 86590 822971
kCycles for 1 * movlps qword ptr [esi+8*ecx 269729 151305 287198 250473 304695 227473 131239 1043451
kCycles for 1 * movaps xmm0, oword ptr [esi 236182 140797 279630 245765 302014 230151 119079 904357
kCycles for 1 * movdqa + movntdq 156614 85881 181083 182061 208018 146063 99788 708260
kCycles for 1 * movdqu + movntdq 156042 87420 181125 187806 207876 146242 89096 709339
kCycles for 1 * movdqu + movntdq + mfence 156594 85314 180909 197382 207752 146090 92749 683374
kCycles for 1 * rep movsb 236577 82482 193980 224850 249181 208745 99740 825066
kCycles for 1 * rep movsd 236369 81107 214331 202630 239809 214643 92438 834842
kCycles for 1 * movlps qword ptr [esi+8*ecx 270908 148720 278425 234536 304868 227536 119977 1048805
kCycles for 1 * movaps xmm0, oword ptr [esi 236626 140735 274866 228211 301253 230180 111659 904001
kCycles for 1 * movdqa + movntdq 156354 87417 179814 191152 207931 146452 76366 677020
kCycles for 1 * movdqu + movntdq 155998 85541 179520 188628 208272 146127 79162 684107
kCycles for 1 * movdqu + movntdq + mfence 156243 86765 179469 185565 207503 146534 77279 683326
kCycles for 1 * rep movsb 235567 83181 192139 206426 249727 208675 89597 820533
kCycles for 1 * rep movsd 236734 81348 210664 206008 241799 213978 85665 820807
kCycles for 1 * movlps qword ptr [esi+8*ecx 276802 149105 279349 233301 303516 227433 125051 1033703
kCycles for 1 * movaps xmm0, oword ptr [esi 236012 141740 274878 229024 301728 230045 111892 899210
kCycles for 1 * movdqa + movntdq 156233 85743 180115 181524 207608 146253 76149 677950
kCycles for 1 * movdqu + movntdq 156445 86256 179785 198103 208094 146041 76483 685388
kCycles for 1 * movdqu + movntdq + mfence 156944 87608 179769 177373 208854 146361 76167 682889
kCycles for 1 * rep movsb 237039 81210 191809 199886 248574 208670 86964 819905
kCycles for 1 * rep movsd 238233 81663 216878 200050 240836 216371 85324 820346
kCycles for 1 * movlps qword ptr [esi+8*ecx 270702 150942 277053 233793 304675 227677 121596 1033769
kCycles for 1 * movaps xmm0, oword ptr [esi 236677 140339 277263 228413 301674 229852 111769 899579
kCycles for 1 * movdqa + movntdq 155610 86239 178899 177392 208379 146070 76968 676660
kCycles for 1 * movdqu + movntdq 156935 87931 179083 175842 207882 146349 75970 685048
kCycles for 1 * movdqu + movntdq + mfence 156013 85408 178847 175220 207742 146292 75677 682723
Wow, you put a lot of work into this one, Timo :thumbsup:
Do you have it as a tab-delimited or csv file?
Thanks, Timo, here are the averages - extremely consistent:
All 9 CPUs
276 rep movsb
264 rep movsd
330 movlps qword ptr [esi+8*ecx]
304 movaps xmm0, oword ptr [esi]
216 movdqa + movntdq
217 movdqu + movntdq
216 movdqu + movntdq + mfence
AMD only
165 rep movsb
160 rep movsd
211 movlps qword ptr [esi+8*ecx]
189 movaps xmm0, oword ptr [esi]
121 movdqa + movntdq
122 movdqu + movntdq
121 movdqu + movntdq + mfence
Hi,
Using Timo's data I dropped the first data value in each run
(they looked about 10% too large) and normalized the averages
for each processor.
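The procedure described above (skip the warm-up-inflated first sample, average the rest per method, then divide by the per-CPU minimum so the fastest method reads 1.00) can be sketched in C. This is my own illustration, not code from the thread; the function and parameter names are invented:

```c
#include <stddef.h>
#include <assert.h>

/* runs is nruns x nmethods, row-major: runs[r*nmethods + m] is the
   kCycles reading for method m in run r.  Run 0 (the warm-up) is
   skipped; out[m] ends up as avg(method m) / avg(fastest method),
   so the fastest method normalizes to 1.00. */
static void normalize(const double *runs, size_t nruns, size_t nmethods,
                      double *out)
{
    double min = 0.0;
    for (size_t m = 0; m < nmethods; m++) {
        double sum = 0.0;
        for (size_t r = 1; r < nruns; r++)   /* r = 0 looked ~10% high */
            sum += runs[r * nmethods + m];
        out[m] = sum / (double)(nruns - 1);
        if (m == 0 || out[m] < min)
            min = out[m];
    }
    for (size_t m = 0; m < nmethods; m++)
        out[m] /= min;
}
```

Feeding it one CPU's column of Timo's table would reproduce the 1.00-based ratios shown below.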
MASM Forum example AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
AMD Ryzen 9 5950X 16-Core Processor (SSE4)
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
Genuine Intel(R) CPU T2400 @ 1.83GHz (SSE3)
kCycles for 1 * rep movsb 1.51 1.00 1.07 1.15 1.20 1.43 1.15 1.20
kCycles for 1 * rep movsd 1.52 1.00 1.22 1.11 1.16 1.47 1.09 1.21
kCycles for 1 * movlps qword ptr [esi+8*ecx] 1.74 1.83 1.56 1.30 1.46 1.56 1.55 1.52
kCycles for 1 * movaps xmm0, oword ptr [esi] 1.51 1.72 1.54 1.27 1.45 1.57 1.42 1.32
kCycles for 1 * movdqa + movntdq 1.00 1.05 1.00 1.00 1.00 1.00 1.03 1.00
kCycles for 1 * movdqu + movntdq 1.00 1.06 1.00 1.02 1.00 1.00 1.00 1.01
kCycles for 1 * movdqu + movntdq + mfence 1.00 1.05 1.00 1.00 1.00 1.00 1.00 1.00
Cheers,
Steve N.
What does "invalid" mean?
Jochen, if it's cache-size dependent, wouldn't it be good to use some API that shows the cache size?
A Celeron has a smaller cache, a Xeon a bigger one, and newer-generation CPUs with many cores have bigger caches still.
Reminds me of that movie where you are called an "invalid" if you are natural born, compared to the genetically improved.
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
+++++++-+11 of 20 tests valid,
212547 kCycles for 1 * rep movsb
179930 kCycles for 1 * rep movsd
220533 kCycles for 1 * movlps qword ptr [esi+8*ecx]
202840 kCycles for 1 * movaps xmm0, oword ptr [esi]
191639 kCycles for 1 * movdqa + movntdq
184883 kCycles for 1 * movdqu + movntdq
170484 kCycles for 1 * movdqu + movntdq + mfence
178210 kCycles for 1 * rep movsb
178902 kCycles for 1 * rep movsd
218612 kCycles for 1 * movlps qword ptr [esi+8*ecx]
202293 kCycles for 1 * movaps xmm0, oword ptr [esi]
152501 kCycles for 1 * movdqa + movntdq
151305 kCycles for 1 * movdqu + movntdq
151683 kCycles for 1 * movdqu + movntdq + mfence
178193 kCycles for 1 * rep movsb
178480 kCycles for 1 * rep movsd
219341 kCycles for 1 * movlps qword ptr [esi+8*ecx]
207351 kCycles for 1 * movaps xmm0, oword ptr [esi]
158671 kCycles for 1 * movdqa + movntdq
151322 kCycles for 1 * movdqu + movntdq
152135 kCycles for 1 * movdqu + movntdq + mfence
178037 kCycles for 1 * rep movsb
178500 kCycles for 1 * rep movsd
218100 kCycles for 1 * movlps qword ptr [esi+8*ecx]
202977 kCycles for 1 * movaps xmm0, oword ptr [esi]
152367 kCycles for 1 * movdqa + movntdq
151737 kCycles for 1 * movdqu + movntdq
151928 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
-
Thanks, Steve N., for the tiled title idea :thumbsup:
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
AMD Ryzen 9 5950X 16-Core Processor (SSE4)
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
kCycles for 1 * rep movsd 238384 83453 237663 201928 241232 179930 215571 86590
kCycles for 1 * movlps qword ptr [esi+8*ecx] 269729 151305 287198 250473 304695 220533 227473 131239
kCycles for 1 * movaps xmm0, oword ptr [esi] 236182 140797 279630 245765 302014 202840 230151 119079
kCycles for 1 * movdqa + movntdq 156614 85881 181083 182061 208018 191639 146063 99788
kCycles for 1 * movdqu + movntdq 156042 87420 181125 187806 207876 184883 146242 89096
kCycles for 1 * movdqu + movntdq + mfence 156594 85314 180909 197382 207752 170484 146090 92749
Excel .prn file looks better
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
AMD Ryzen 9 5950X 16-Core Processor (SSE4)
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
kCycles for 1 * rep movsd                    83453 2,86 1,00 2,85 2,42 2,89 2,16 2,58 1,04
kCycles for 1 * movlps qword ptr [esi+8*ecx] 131239 2,06 1,15 2,19 1,91 2,32 1,68 1,73 1,00
kCycles for 1 * movaps xmm0, oword ptr [esi] 119079 1,98 1,18 2,35 2,06 2,54 1,70 1,93 1,00
kCycles for 1 * movdqa + movntdq             85881 1,82 1,00 2,11 2,12 2,42 2,23 1,70 1,16
kCycles for 1 * movdqu + movntdq             87420 1,78 1,00 2,07 2,15 2,38 2,11 1,67 1,02
kCycles for 1 * movdqu + movntdq + mfence    85314 1,84 1,00 2,12 2,31 2,44 2,00 1,71 1,09
Sub ImportValues()
    Row = 1 ' check title
    Col = 2 ' first value
    sTab = Chr(9)
    With ActiveSheet
        While .Cells(Row, Col) <> Empty
            'Debug.Print .Cells(Row, Col)
            Row = Row + 1
            Col = Col + 1
        Wend
        .Cells(Row, Col).EntireRow.Insert
    End With
    With Application.FileDialog(msoFileDialogFilePicker)
        .AllowMultiSelect = False
        .Filters.Add "Text Files", "*.txt", 1
        .Show
        sFileName = .SelectedItems.Item(1)
    End With
    nFileNro = FreeFile
    nRow = 0 ' start
    Open sFileName For Input As #nFileNro
    Line Input #nFileNro, sTextRow
    ActiveSheet.Cells(Row, Col) = sTextRow ' title
    Do While Not EOF(nFileNro)
        Line Input #nFileNro, sTextRow
        'Debug.Print sTextRow
        Row = Row + 1 ' table
        nRow = nRow + 1 ' file
        If nRow >= 3 Then
            Pos = InStr(1, sTextRow, Chr(9)) ' is tab in line
            If Pos = 0 Then Pos = 8 ' no tab
            If Left(sTextRow, 1) <> "" Then
                If Left(sTextRow, 1) = "?" Then ActiveSheet.Cells(Row, Col) = "??" Else nClk = Val(sTextRow)
                'Debug.Print nClk
                If nClk >= 0 Then ActiveSheet.Cells(Row, Col) = nClk
                ActiveSheet.Cells(Row, 1) = Mid(sTextRow, Pos)
            End If
        End If
        If nRow = 37 Then Exit Do
    Loop
    Close #nFileNro
End Sub
EDIT: RepMovsd05GB_Results_1.csv
:biggrin:
Hi Timo,
I cheated: I downloaded LibreOffice, set the defaults for Excel and Word, and BINGO, I can now open your "csv" files. :thumbsup:
Quote from: daydreamer on December 15, 2021, 05:23:23 AM
Jochen if its cachesize dependent,wouldnt it be good to use some API that shows cachesize?
Good idea :thumbsup: Which one? Do you have some code?
If we just collect L3 info
How to Check Processor Cache Memory in Windows 10 (https://www.techbout.com/check-processor-cache-memory-windows-10-48655/)
AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
AMD Ryzen 9 5950X 16-Core Processor (SSE4)
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
4 64 3 4 3 3 15 24
Something I have just tested is that if you use movdqu for both read and write to make the copy algo fully unaligned, it is no faster than rep movsb.
Quote from: hutch-- on December 15, 2021, 10:48:36 PM
Something I have just tested is that if you use movdqu for both read and write to make the copy algo fully unaligned, it is no faster than rep movsb.
movdqu was an attempt to speed up unaligned moves, but AFAIK it now behaves the same as movups, and the latter is no longer slower than movaps on aligned data: three effectively equivalent instructions.
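The movdqu/movntdq pair the thread is benchmarking can be sketched with SSE2 intrinsics in C. This is only an illustration, not the poster's code; the name xmmcopyu is borrowed from the thread. Note that movntdq (_mm_stream_si128) still requires a 16-byte aligned destination, so only the source may be unaligned:

```c
#include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <assert.h>

/* Unaligned-load / non-temporal-store copy, like
       movdqu  xmm0, [rcx+r10]
       movntdq [rdx+r10], xmm0
   dst must be 16-byte aligned (movntdq faults otherwise);
   src may have any alignment; len must be a multiple of 16. */
static void xmmcopyu(void *dst, const void *src, size_t len)
{
    char *d = (char *)dst;
    const char *s = (const char *)src;
    for (size_t i = 0; i < len; i += 16) {
        __m128i x = _mm_loadu_si128((const __m128i *)(s + i)); /* movdqu  */
        _mm_stream_si128((__m128i *)(d + i), x);               /* movntdq */
    }
    _mm_sfence(); /* order the non-temporal stores before later reads */
}
```

Swapping the store for _mm_storeu_si128 (movdqu on both sides) drops the non-temporal hint, so every store goes through the cache; that may help explain hutch's observation that the fully unaligned version is no faster than rep movsb on large buffers.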
Quote from: hutch-- on December 15, 2021, 04:16:16 PM
:biggrin:
Hi Timo,
I cheated: I downloaded LibreOffice, set the defaults for Excel and Word, and BINGO, I can now open your "csv" files. :thumbsup:
A fix for Libre Office vba
Option VBASupport 1
Sub ImportValues()
    Row = 1 ' check title
    Col = 2 ' first value
    sTab = Chr(9)
    With ActiveSheet
        While .Cells(Row, Col).Text <> ""
            'Debug.Print .Cells(Row, Col)
            Col = Col + 1
        Wend
    End With
    With Application.FileDialog(msoFileDialogFilePicker)
        '.AllowMultiSelect = False
        '.Filters.Add "Text Files", "*.txt", 1
        .Show
        sFileName = .SelectedItems.Item(1)
    End With
    nFileNro = FreeFile
    Row = 0 ' start
    Open sFileName For Input As #nFileNro
    Do While Not EOF(nFileNro)
        Line Input #nFileNro, sTextRow
        'Debug.Print sTextRow
        If Row = 0 Then ActiveSheet.Cells(1, Col) = sTextRow ' title
        Row = Row + 1
        If Row >= 3 Then
            Pos = InStr(1, sTextRow, Chr(9)) ' is tab in line
            If Pos = 0 Then Pos = 8 ' no tab
            If Left(sTextRow, 1) <> "" Then
                If Left(sTextRow, 1) = "?" Then ActiveSheet.Cells(Row, Col) = "??" Else nClk = Val(sTextRow)
                'Debug.Print nClk
                If nClk >= 0 Then ActiveSheet.Cells(Row, Col) = nClk
                ActiveSheet.Cells(Row, 1) = Mid(sTextRow, Pos)
            End If
        End If
        If Row = 37 Then Exit Do
    Loop
    Close #nFileNro
End Sub
Question: what format do users want for the tables?
Quote from: TimoVJL on December 15, 2021, 08:32:35 PM
If we just collect L3 info
include \masm32\MasmBasic\MasmBasic.inc
CACHE_DESCRIPTOR STRUCT
    Level BYTE ?
    Associativity BYTE ?
    LineSize WORD ?
    _Size DWORD ?
    _Type dd ? ; PROCESSOR_CACHE_TYPE enum
CACHE_DESCRIPTOR ENDS
SYSTEM_LOGICAL_PROCESSOR_INFORMATION STRUCT
    NodeNumber DWORD ?
    Cache CACHE_DESCRIPTOR <>
    Reserved ULONGLONG 2 dup(?)
SYSTEM_LOGICAL_PROCESSOR_INFORMATION ENDS
Init
Print cfm$("\n\nWmic:\n"), Launch$("wmic cpu get L2CacheSize, L3CacheSize") ; the simple solution
Dll "Kernel32"
Declare void GetLogicalProcessorInformation, 2
Let edi=New$(400)
ClearLastError
push 400
GetLogicalProcessorInformation(edi, esp)
pop edx
pinfo equ [edi.SYSTEM_LOGICAL_PROCESSOR_INFORMATION]
deb 1, "GetLogicalProcessorInformation output:", eax, pinfo.NodeNumber, b:pinfo.Cache.Level, b:pinfo.Cache.LineSize, b:pinfo.Cache._Size, b:pinfo.Cache._Type, $Err$()
EndOfCode
Output:
Wmic:
L2CacheSize L3CacheSize
256 3072
GetLogicalProcessorInformation output:
eax 1
pinfo.NodeNumber 3
b:pinfo.Cache.Level 00000000
b:pinfo.Cache.LineSize 0000000000000000
b:pinfo.Cache._Size 00000000000000000000000000000001
b:pinfo.Cache._Type 00000000000000000000000000000000
$Err$() Operazione completata.__
WMIC works, but GetLogicalProcessorInformation returns rubbish :sad:
Quote from: jj2007 on December 16, 2021, 01:17:53 AM
WMIC works, but GetLogicalProcessorInformation returns rubbish :sad:
---------------------------
GetLogicalProcessorInformation output:
---------------------------
eax 0
pinfo.NodeNumber 0
b:pinfo.Cache.Level 00000000
b:pinfo.Cache.LineSize 0000000000000000
b:pinfo.Cache._Size 00000000000000000000000000000000
b:pinfo.Cache._Type 00000000000000000000000000000000
$Err$() The data area passed to a system call is too small.__
---------------------------
OK Cancel
---------------------------
I've seen this error, but 400 bytes works on Win7-64. Try version 2, with a 1000-byte buffer
Quote from: jj2007 on December 16, 2021, 03:57:49 AM
I've seen this error, but 400 bytes works on Win7-64. Try version 2, with a 1000-byte buffer
The program closes right after starting; the message remains the same under the debugger.
It works on my new machine, Win10:
Wmic:
L2CacheSize L3CacheSize
1024 4096
GetLogicalProcessorInformation output:
eax 1
pinfo.NodeNumber 3
b:pinfo.Cache.Level 00000000
b:pinfo.Cache.LineSize 0000000000000000
b:pinfo.Cache._Size 00000000000000000000000000000001
b:pinfo.Cache._Type 00000000000000000000000000000000
$Err$() Operazione completata.__
Hi JJ!
Could it be a problem with WOW64?
What happens with the program in 64 bits?
LATER:
Apparently the structure is different:
typedef struct _SYSTEM_LOGICAL_PROCESSOR_INFORMATION {
    ULONG_PTR ProcessorMask;
    LOGICAL_PROCESSOR_RELATIONSHIP Relationship;
    union {
        struct {
            BYTE Flags;
        } ProcessorCore;
        struct {
            DWORD NodeNumber;
        } NumaNode;
        CACHE_DESCRIPTOR Cache;
        ULONGLONG Reserved[2];
    } DUMMYUNIONNAME;
} SYSTEM_LOGICAL_PROCESSOR_INFORMATION, *PSYSTEM_LOGICAL_PROCESSOR_INFORMATION;
Quote from: HSE on December 16, 2021, 11:26:56 AM
Can be a problem with wow64?
It does not crash on my Win7-64 and Win10 machines.
Quote: Apparently the structure is different:
I'm afraid my C skills are not sufficient to translate it correctly to MASM syntax :sad:
Quote from: jj2007 on December 16, 2021, 12:40:02 PM
I'm afraid my C skills are not sufficient to translate it correctly to MASM syntax :sad:
:biggrin: :biggrin: I hope somebody can.
Modified code from here:
https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getlogicalprocessorinformation
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 262144
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 262144
processorL3CacheSize: 3145728
GetLogicalProcessorInformation results:
Number of NUMA nodes: 1
Number of physical processor packages: 1
Number of processor cores: 2
Number of logical processors: 4
Number of processor L1/L2/L3 caches: 4/2/1
EDIT:
32-bit
SYSTEM_LOGICAL_PROCESSOR_INFORMATION 24 18h bytes
Relationship +4h 4h
Cache.Level +8h 1h
Cache.Size +Ch 4h
64-bit
SYSTEM_LOGICAL_PROCESSOR_INFORMATION 32 20h bytes
Relationship +8h 4h
Cache.Level +10h 1h
Cache.Size +14h 4h
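Those 24h/20h-style offsets can be double-checked in plain C with offsetof. This is a sketch with stand-in typedefs so it compiles without windows.h (the anonymous union mirrors DUMMYUNIONNAME; enum members are assumed int-sized, as on MSVC and GCC):

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <assert.h>

/* Stand-ins for the Windows types, so no windows.h is needed */
typedef uint8_t   BYTE;
typedef uint16_t  WORD;
typedef uint32_t  DWORD;
typedef uint64_t  ULONGLONG;
typedef uintptr_t ULONG_PTR;                      /* 4 or 8 bytes   */
typedef int       LOGICAL_PROCESSOR_RELATIONSHIP; /* enum -> int    */
typedef int       PROCESSOR_CACHE_TYPE;           /* enum -> int    */

typedef struct {
    BYTE  Level;
    BYTE  Associativity;
    WORD  LineSize;
    DWORD Size;
    PROCESSOR_CACHE_TYPE Type;
} CACHE_DESCRIPTOR;

typedef struct {
    ULONG_PTR ProcessorMask;
    LOGICAL_PROCESSOR_RELATIONSHIP Relationship;
    union {                        /* DUMMYUNIONNAME, anonymous in C11 */
        struct { BYTE Flags; } ProcessorCore;
        struct { DWORD NodeNumber; } NumaNode;
        CACHE_DESCRIPTOR Cache;
        ULONGLONG Reserved[2];     /* forces 8-byte union alignment */
    };
} SYSTEM_LOGICAL_PROCESSOR_INFORMATION;
```

On a 64-bit build, sizeof gives 20h and offsetof yields Relationship +8h, Cache.Level +10h, Cache.Size +14h, matching the table above; a 32-bit build gives the 18h/+4h/+8h/+Ch layout.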
Thanks Timo :thumbsup:
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 262144
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 262144
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 262144
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 262144
processorL3CacheSize: 6291456
GetLogicalProcessorInformation results:
Number of NUMA nodes: 1
Number of physical processor packages: 1
Number of processor cores: 4
Number of logical processors: 8
Number of processor L1/L2/L3 caches: 8/4/1
Biterider's structure translation is:
SYSTEM_LOGICAL_PROCESSOR_INFORMATION struct
    ProcessorMask ULONG_PTR ?
    Relationship LOGICAL_PROCESSOR_RELATIONSHIP ?
    union
        struct ProcessorCore
            Flags BYTE ?
        ends
        struct NumaNode
            NodeNumber DWORD ?
        ends
        Cache CACHE_DESCRIPTOR <>
        Reserved ULONGLONG 2 dup (?)
    ends
SYSTEM_LOGICAL_PROCESSOR_INFORMATION ends
@Greenhorn, please give AMD Ryzen 7 3700X results.
These are interesting:
AMD Ryzen™ 5 5600G L3 16 MB
AMD Ryzen™ 7 5700G L3 16 MB
AMD Ryzen 7 3700X
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
processorL1CacheSize: 32768
processorL1CacheSize: 32768
processorL2CacheSize: 524288
processorL3CacheSize: 33554432
GetLogicalProcessorInformation results:
Number of NUMA nodes: 1
Number of physical processor packages: 1
Number of processor cores: 8
Number of logical processors: 16
Number of processor L1/L2/L3 caches: 32/16/16
I was after this test
http://masm32.com/board/index.php?topic=9691.msg106349#msg106349
EDIT: AMD Ryzen 5 3400G with Radeon Vega Graphics (SSE4)
AMD Ryzen 7 3700X 8-Core Processor (SSE4)
AMD Ryzen 9 5950X 16-Core Processor (SSE4)
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (SSE4)
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
kCycles for 1 * rep movsd                    83453 2,86 2,32 1,00 2,85 2,42 2,89 2,16 2,58 1,04
kCycles for 1 * movlps qword ptr [esi+8*ecx] 131239 2,06 1,31 1,15 2,19 1,91 2,32 1,68 1,73 1,00
kCycles for 1 * movaps xmm0, oword ptr [esi] 119079 1,98 1,40 1,18 2,35 2,06 2,54 1,70 1,93 1,00
kCycles for 1 * movdqa + movntdq             85881 1,82 1,17 1,00 2,11 2,12 2,42 2,23 1,70 1,16
kCycles for 1 * movdqu + movntdq             87420 1,78 1,16 1,00 2,07 2,15 2,38 2,11 1,67 1,02
kCycles for 1 * movdqu + movntdq + mfence    85314 1,84 1,18 1,00 2,12 2,31 2,44 2,00 1,71 1,09
Ah, OK ...
AMD Ryzen 7 3700X 8-Core Processor (SSE4)
++++++++-++9 of 20 tests valid,
252562 kCycles for 1 * rep movsb
193705 kCycles for 1 * rep movsd
172087 kCycles for 1 * movlps qword ptr [esi+8*ecx]
167271 kCycles for 1 * movaps xmm0, oword ptr [esi]
100892 kCycles for 1 * movdqa + movntdq
101051 kCycles for 1 * movdqu + movntdq
100887 kCycles for 1 * movdqu + movntdq + mfence
191904 kCycles for 1 * rep movsb
192633 kCycles for 1 * rep movsd
171663 kCycles for 1 * movlps qword ptr [esi+8*ecx]
163020 kCycles for 1 * movaps xmm0, oword ptr [esi]
100933 kCycles for 1 * movdqa + movntdq
100811 kCycles for 1 * movdqu + movntdq
101287 kCycles for 1 * movdqu + movntdq + mfence
193081 kCycles for 1 * rep movsb
192589 kCycles for 1 * rep movsd
171437 kCycles for 1 * movlps qword ptr [esi+8*ecx]
163055 kCycles for 1 * movaps xmm0, oword ptr [esi]
100982 kCycles for 1 * movdqa + movntdq
100927 kCycles for 1 * movdqu + movntdq
101013 kCycles for 1 * movdqu + movntdq + mfence
191832 kCycles for 1 * rep movsb
192769 kCycles for 1 * rep movsd
171349 kCycles for 1 * movlps qword ptr [esi+8*ecx]
163022 kCycles for 1 * movaps xmm0, oword ptr [esi]
100896 kCycles for 1 * movdqa + movntdq
100825 kCycles for 1 * movdqu + movntdq
101005 kCycles for 1 * movdqu + movntdq + mfence
21 bytes for rep movsb
21 bytes for rep movsd
37 bytes for movlps qword ptr [esi+8*ecx]
42 bytes for movaps xmm0, oword ptr [esi]
44 bytes for movdqa + movntdq
44 bytes for movdqu + movntdq
47 bytes for movdqu + movntdq + mfence
--- ok ---