Win64 memory copy benchmark.

Started by hutch--, July 08, 2016, 09:23:16 PM


hutch--

Attached is a benchmark for three 64-bit memory copy algos. One uses REP MOVSQ; the other two are an aligned and an unaligned XMM version, respectively. The two XMM versions do not have a tail trimmer for uneven byte counts, as this is not what I am testing. It is a normal window app with the tests being run from the menu.
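For readers who want to see the difference between the aligned and unaligned XMM variants described above, here is a minimal sketch in C using SSE2 intrinsics. It is not the benchmark's code: the function names are hypothetical, an x86-64 compiler is assumed, and the post does not say which instructions the benchmark uses (these intrinsics typically compile to MOVDQA and MOVDQU). Like the benchmark's XMM versions, neither routine handles a tail, so the byte count is assumed to be a multiple of 16.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86-64 compiler */
#include <stddef.h>

/* Hypothetical names, not taken from the attached benchmark.
   Both routines assume count is a multiple of 16 and skip tail handling. */

/* Aligned variant: dst and src must both be 16-byte aligned,
   otherwise the aligned load/store faults. */
void xmm_copy_aligned(void *dst, const void *src, size_t count)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < count / 16; i++)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
}

/* Unaligned variant: works for any addresses. */
void xmm_copy_unaligned(void *dst, const void *src, size_t count)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < count / 16; i++)
        _mm_storeu_si128(d + i, _mm_loadu_si128(s + i));
}
```

The only difference between the two loops is the load/store pair, which is why a benchmark can compare them directly on the same buffers.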

mineiro

I was not able to run your example because it needs too much memory; my machine has only 1 GB. To avoid swapping, I tried to assemble the source code instead, but had no luck and received these errors:
mrm macro gives: error A2108: use of register assumed to ERROR (when filling wc structure)
movq mm7, lParam and movq lParam, mm7 gives: error A2222: x87 and MMX instructions disallowed; legacy FP state not saved in Win64
Used Microsoft (R) Macro Assembler (AMD64) Version 8.00.40310.39

I have done some tests, moving source/destination into rsp and using pop/push, but rep movsq is quicker on my machine. My test was done in an ideal situation, so source, destination and size are all multiples of 16.
I think the only change needed is a head and a tail around rep movsq to deal with unaligned data.
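The head/tail idea can be sketched in portable C. This is only an illustration under my own assumptions: the function name is hypothetical, the buffers must not overlap, and the middle loop stands in for what rep movsq does with rcx = count / 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the benchmark: copies count bytes from
   src to dst (non-overlapping) in three phases. */
void copy_head_body_tail(void *dst, const void *src, size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is 8-byte aligned. */
    while (count > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        count--;
    }

    /* Body: one qword per iteration, the part rep movsq would do.
       The fixed-size memcpy keeps a possibly unaligned source load legal C. */
    while (count >= 8) {
        uint64_t q;
        memcpy(&q, s, 8);
        memcpy(d, &q, 8);
        d += 8;
        s += 8;
        count -= 8;
    }

    /* Tail: the remaining 0..7 bytes. */
    while (count > 0) {
        *d++ = *s++;
        count--;
    }
}
```

Aligning on the destination rather than the source is a design choice made for this sketch; the source may still be unaligned in the body loop.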

Timings were taken with serialized rdtsc.


rep movsq
time: 0 1219200426
time: 0 592314840
time: 0 577661148
time: 0 574660377
time: 1 592705800
time: 0 593109693
pop/push version:
    mov rsp,rsi
    shr rcx,4
@again:
    pop qword ptr [rdi]
    pop qword ptr [rdi+8]
    add rdi,2*8
    sub rcx,1
    jnz @again
time: 0 732068685
time: 0 761895693
time: 0 744006060
time: 0 748483425
time: 1 755325054
time: 0 755273367

When I have time I'll check the XMM versions.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

jj2007


Siekmanski

A thread on Code Project....
Apex memmove - the fastest memcpy/memmove on x86/x64 ... EVER, written in C

http://www.codeproject.com/Articles/1110153/Apex-memmove-the-fastest-memcpy-memmove-on-x-x-EVE
Creative coders use backward thinking techniques as a strategy.

jj2007

Quote: 8) An optimized assembler version of these algorithms WILL be faster (I know because I have built assembler versions)

So that EVER refers to non-assembler code?  ::)

Siekmanski

Funny article....

In late 2013, my OCD took over and I became totally obsessed with writing the fastest memcpy/memmove function in the world; which took over my work and life.
I became so obsessed that I wrote 80,000 lines of code in over 140 variations of memmove, mostly copies with small variations and tweaks.



hutch--

From motor racing, "when the flag drops, the bullsh*t stops". Let's see what it clocks like.  :biggrin:

Jokaste

Many years ago I had a 750GSX Suzuki Inazuma, a real pleasure.
Driving between the cars in Paris!

Here are my results:

1-10 000
2-10 655
3-9 655

In the order of the menu

Info on my CPU:

Socket 1         ID = 0
   Number of cores      2 (max 2)
   Number of threads   2 (max 2)
   Name         Intel Mobile Core 2 Duo T6570
   Codename      Penryn
   Specification      Intel(R) Core(TM)2 Duo CPU     T6570  @ 2.10GHz
   Package (platform ID)   Socket P (478) (0x7)
   CPUID         6.7.A
   Extended CPUID      6.17
   Core Stepping      R0
   Technology      45 nm
   Core Speed      2094.6 MHz
   Multiplier x Bus Speed   10.5 x 199.5 MHz
   Rated Bus speed      798.0 MHz
   Stock frequency      2100 MHz
   Instructions sets   MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, EM64T, VT-x
   L1 Data cache      2 x 32 KBytes, 8-way set associative, 64-byte line size
   L1 Instruction cache   2 x 32 KBytes, 8-way set associative, 64-byte line size
   L2 cache      2048 KBytes, 8-way set associative, 64-byte line size
   Max CPUID level      0000000Dh
   Max CPUID ext. level   80000008h
   Cache descriptor   Level 1, D, 32 KB, 1 thread(s)
   Cache descriptor   Level 1, I, 32 KB, 1 thread(s)
   Cache descriptor   Level 2, U, 2 MB, 2 thread(s)
   FID/VID Control      yes
   FID range      6.0x - 10.5x
   Max VID         1.150 V
Kenavo
---------------------------
Grincheux / Jokaste

Raistlin

....and Symantec Endpoint Protection strikes again = SONAR.Heur.RGC!g171
Quote: Reputation was not used in this detection.
and deletes the exe <sniff> - g0d I hate AV scanners that think they know viruses....
Sorry @hutch--, can't run it at work - will need to do so from home.
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...

hutch--

It's a malicious plot; there is not enough code in the benchmark to fit a virus into, let alone a trojan.

jj2007

Quote from: Raistlin on November 15, 2017, 04:17:10 PM
Reputation was not used in this detection.

I think the correct wording should be "Symantec reputation was damaged in this detection" ;)

Apparently, you can fumble something (source):
Quote: This is a Heuristic (SONAR) detection. This is likely coming from the fact that in your SONAR policy under System Change events you have the options for 'DNS Change detected' and 'Host file change detected' set to Log. Check the policy to verify