The MASM Forum

General => The Laboratory => Topic started by: hutch-- on July 08, 2016, 09:23:16 PM

Title: Win64 memory copy benchmark.
Post by: hutch-- on July 08, 2016, 09:23:16 PM
Attached is a benchmark for three 64-bit memory copy algos. One uses REP MOVSQ; the other two are, respectively, an aligned and an unaligned XMM version. The two XMM versions do not have a tail trimmer for uneven byte counts, as this is not what I am testing. It is a normal window app with the tests being run from the menu.
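
For readers who cannot run the attachment, here is a minimal C sketch of what an aligned and an unaligned 16-byte XMM copy loop can look like. It is not the benchmark's code: the function names are made up, SSE2 intrinsics stand in for the hand-written MASM, and which move instructions the attachment really uses is not stated in this thread. Like the versions described above, it assumes the byte count is a multiple of 16 and has no tail handling.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics, available on any x86-64 compiler */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. Both pointers must be 16-byte aligned and n must be
   a multiple of 16; there is deliberately no tail handling. */
void xmm_copy_aligned(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
    assert((n & 15u) == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    /* Aligned 16-byte load and store; faults if a pointer is misaligned. */
    for (size_t i = 0; i < n / 16; i++)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
}

/* Hypothetical name. Same loop, but the pointers may have any alignment;
   n must still be a multiple of 16. */
void xmm_copy_unaligned(void *dst, const void *src, size_t n)
{
    assert((n & 15u) == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    /* Unaligned 16-byte load and store. */
    for (size_t i = 0; i < n / 16; i++)
        _mm_storeu_si128(d + i, _mm_loadu_si128(s + i));
}
```

The only difference between the two loops is the load/store pair, which is the variable an aligned-versus-unaligned comparison isolates.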
Title: Re: Win64 memory copy benchmark.
Post by: mineiro on July 09, 2016, 06:14:28 AM
I was not able to run your example: it needs too much memory and my machine has only 1GB. To avoid swapping I tried to assemble the source code, but no luck; I received these errors:
mrm macro gives: error A2108: use of register assumed to ERROR (when filling wc structure)
movq mm7, lParam and movq lParam, mm7 gives: error A2222: x87 and MMX instructions disallowed; legacy FP state not saved in Win64
Used Microsoft (R) Macro Assembler (AMD64) Version 8.00.40310.39

I have done some tests, moving source/destination to rsp and using pop/push, but rep movsq is quicker on my machine. My test was done in an ideal situation, so source, destination and size are all multiples of 16.
I think the only change needed is a head and a tail around rep movsq to deal with unaligned data.
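
To make the head/tail idea concrete, here is a rough C sketch, not code from this thread. The function name is hypothetical, and the choice to align the destination (rather than the source) in the head is an assumption. The qword loop in the middle is the part that REP MOVSQ would do in assembly, with the count set to the number of whole qwords.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes as a byte-wise head, a qword body
   and a byte-wise tail. The regions must not overlap. */
void *copy_head_body_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    assert(d != NULL && s != NULL);

    /* Head: copy single bytes until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: whole qwords (the REP MOVSQ part in assembly).
       The 8-byte memcpy calls keep a possibly unaligned source legal in C. */
    for (size_t q = n / 8; q > 0; q--) {
        uint64_t tmp;
        memcpy(&tmp, s, sizeof tmp);
        memcpy(d, &tmp, sizeof tmp);
        d += 8;
        s += 8;
    }

    /* Tail: the remaining 0 to 7 bytes. */
    for (size_t r = n & 7u; r > 0; r--)
        *d++ = *s++;

    return dst;
}
```

When source, destination and size are all multiples of 16, as in the test above, the head and tail loops do nothing and only the qword body runs.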

Used serialized rdtsc.
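
For anyone wanting to reproduce the timing, one common way to fence RDTSC looks roughly like the C sketch below. It is an assumption, not the code used for the numbers in this post: the helper names are hypothetical, and LFENCE is used here where other code often uses CPUID as the serializing instruction. It builds only for x86/x86-64 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtsc and _mm_lfence on MSVC */
#else
#include <x86intrin.h>   /* __rdtsc and _mm_lfence on GCC/Clang */
#endif

/* Hypothetical helper: read the time-stamp counter with LFENCE on both
   sides, so the read is not reordered with the code being timed
   (documented LFENCE behaviour on Intel; assumed for other vendors). */
uint64_t fenced_rdtsc(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical helper: TSC ticks spent in fn(arg), taken as the
   difference between two fenced reads. */
uint64_t ticks_for(void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    uint64_t start = fenced_rdtsc();
    fn(arg);
    uint64_t stop = fenced_rdtsc();
    return stop - start;
}
```

The result is in TSC ticks, not seconds, so numbers from different machines are only comparable relative to each other.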


rep movsq
time: 0 1219200426
time: 0 592314840
time: 0 577661148
time: 0 574660377
time: 1 592705800
time: 0 593109693
mov rsp,rsi
shr rcx,4
@again:
pop qword ptr [rdi]
pop qword ptr [rdi+8]
add rdi,2*8
sub rcx,1
jnz @again
time: 0 732068685
time: 0 761895693
time: 0 744006060
time: 0 748483425
time: 1 755325054
time: 0 755273367

When I have time I'll check the XMM version.
Title: Re: Win64 memory copy benchmark.
Post by: jj2007 on July 09, 2016, 06:44:07 AM
There is an old memcpy thread somewhere, here is a bit about movlps etc (http://www.masmforum.com/board/index.php?topic=11567.msg86891#msg86891).
Title: Re: Win64 memory copy benchmark.
Post by: Siekmanski on July 10, 2016, 03:12:23 PM
A thread on Code Project....
Apex memmove - the fastest memcpy/memmove on x86/x64 ... EVER, written in C

http://www.codeproject.com/Articles/1110153/Apex-memmove-the-fastest-memcpy-memmove-on-x-x-EVE
Title: Re: Win64 memory copy benchmark.
Post by: jj2007 on July 10, 2016, 03:36:46 PM
Quote: 8) An optimized assembler version of these algorithms WILL be faster (I know because I have built assembler versions)

So that EVER refers to non-assembler code?  ::)
Title: Re: Win64 memory copy benchmark.
Post by: Siekmanski on July 10, 2016, 03:47:38 PM
Funny article....

In late 2013, my OCD took over and I became totally obsessed with writing the fastest memcpy/memmove function in the world; which took over my work and life.
I became so obsessed that I wrote 80,000 lines of code in over 140 variations of memmove, mostly copies with small variations and tweaks.


Title: Re: Win64 memory copy benchmark.
Post by: hutch-- on July 10, 2016, 06:30:02 PM
From motor racing, "when the flag drops, the bullsh*t stops". Let's see what it clocks like.  :biggrin:
Title: Re: Win64 memory copy benchmark.
Post by: Jokaste on November 15, 2017, 04:56:49 AM
Many years ago I had a 750GSX Suzuki Inazuma, a real pleasure.
Driving between the cars in Paris!

Here are my results:

1-10 000
2-10 655
3-9 655

In the order of the menu

Infos on my cpu

Socket 1         ID = 0
   Number of cores      2 (max 2)
   Number of threads   2 (max 2)
   Name         Intel Mobile Core 2 Duo T6570
   Codename      Penryn
   Specification      Intel(R) Core(TM)2 Duo CPU     T6570  @ 2.10GHz
   Package (platform ID)   Socket P (478) (0x7)
   CPUID         6.7.A
   Extended CPUID      6.17
   Core Stepping      R0
   Technology      45 nm
   Core Speed      2094.6 MHz
   Multiplier x Bus Speed   10.5 x 199.5 MHz
   Rated Bus speed      798.0 MHz
   Stock frequency      2100 MHz
   Instructions sets   MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, EM64T, VT-x
   L1 Data cache      2 x 32 KBytes, 8-way set associative, 64-byte line size
   L1 Instruction cache   2 x 32 KBytes, 8-way set associative, 64-byte line size
   L2 cache      2048 KBytes, 8-way set associative, 64-byte line size
   Max CPUID level      0000000Dh
   Max CPUID ext. level   80000008h
   Cache descriptor   Level 1, D, 32 KB, 1 thread(s)
   Cache descriptor   Level 1, I, 32 KB, 1 thread(s)
   Cache descriptor   Level 2, U, 2 MB, 2 thread(s)
   FID/VID Control      yes
   FID range      6.0x - 10.5x
   Max VID         1.150 V
Title: Re: Win64 memory copy benchmark.
Post by: Raistlin on November 15, 2017, 04:17:10 PM
....and Symantec Endpoint protection strikes again = SONAR.Heur.RGC!g171
Quote: Reputation was not used in this detection.
and deletes the exe <sniff> - g0d I hate AV scanners that think they know viruses....
Sorry @hutch--  can't run at work - will need to do so from home.
Title: Re: Win64 memory copy benchmark.
Post by: hutch-- on November 15, 2017, 04:55:32 PM
It's a malicious plot, there is not enough code in the benchmark to fit a virus into, let alone a trojan.
Title: Re: Win64 memory copy benchmark.
Post by: jj2007 on November 15, 2017, 07:57:30 PM
Quote from: Raistlin on November 15, 2017, 04:17:10 PM
Reputation was not used in this detection.

I think the correct wording should be "Symantec reputation was damaged in this detection" ;)

Apparently, you can fumble something (source (https://www.symantec.com/connect/forums/symantec-endpoint-protection-syslog-message-field-explanation)):
Quote: This is a Heuristic (SONAR) detection. This is likely coming from the fact that in your SONAR policy under System Change events you have the options for 'DNS Change detected' and 'Host file change detected' set to Log. Check the policy to verify