Topic: Win64 memory copy benchmark.

hutch--

  • Administrator
  • Member
  • Posts: 4813
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Win64 memory copy benchmark.
« on: July 08, 2016, 09:23:16 PM »
Attached is a benchmark for three 64-bit memory copy algorithms: one using REP MOVSQ, the other two an aligned and an unaligned XMM version. The two XMM versions have no tail trimmer for byte counts that are not a multiple of 16, as that is not what I am testing. It is a normal windowed app, with the tests run from the menu.
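The attachment itself is not reproduced here. As a rough guide to what the two XMM variants do, here is a minimal C sketch using SSE2 intrinsics; it is an assumption, not hutch's MASM code, and the function names are hypothetical. It takes "uneven byte counts" to mean counts that are not a multiple of 16, so both loops require a multiple of 16, and the aligned loop also requires 16-byte-aligned pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Aligned variant: src and dst must be 16-byte aligned and the byte count
   a multiple of 16 (no tail trimmer, as in the benchmark). Uses aligned
   16-byte loads/stores (MOVDQA or an equivalent the compiler picks). */
static void copy_xmm_aligned(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
    assert(bytes % 16 == 0);
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
}

/* Unaligned variant: same loop with unaligned loads/stores (MOVDQU or an
   equivalent), so any pointer alignment is accepted; the byte count must
   still be a multiple of 16. */
static void copy_xmm_unaligned(void *dst, const void *src, size_t bytes)
{
    assert(bytes % 16 == 0);
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t i = 0; i < bytes; i += 16)
        _mm_storeu_si128((__m128i *)(d + i),
                         _mm_loadu_si128((const __m128i *)(s + i)));
}
```

The two loops differ only in the load/store flavour; how much the unaligned form costs on a given CPU is exactly what a benchmark like this measures.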
« Last Edit: July 09, 2016, 04:10:22 PM by hutch-- »
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

mineiro

  • Member
  • Posts: 365
Re: Win64 memory copy benchmark.
« Reply #1 on: July 09, 2016, 06:14:28 AM »
I was not able to run your example: it needs a lot of memory and my machine has only 1 GB. To avoid swapping I tried to assemble the source code, but had no luck and received these errors:
The mrm macro gives: error A2108: use of register assumed to ERROR (when filling the wc structure)
movq mm7, lParam and movq lParam, mm7 give: error A2222: x87 and MMX instructions disallowed; legacy FP state not saved in Win64
I used Microsoft (R) Macro Assembler (AMD64) Version 8.00.40310.39.

I have done some tests moving the source/destination into rsp and using pop/push, but rep movsq is faster on my machine. My tests were done under ideal conditions: source, destination and size are all multiples of 16.
I think the only change needed is a head and tail around rep movsq to deal with unaligned data.
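The head/tail change suggested above could look like the following C sketch. It is an illustration under stated assumptions, not code from the thread: the names are hypothetical, REP MOVSQ is issued through GCC/Clang inline assembly on x86-64 (other compilers fall back to a plain loop), and it aligns the destination to 8 bytes while leaving the source possibly misaligned, which is one choice among several.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `qwords` 8-byte words with REP MOVSQ (GCC/Clang, x86-64).
   Fallback for other compilers: a plain byte loop (assumption). */
static void rep_movsq(unsigned char *dst, const unsigned char *src, size_t qwords)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* RDI = dst, RSI = src, RCX = count; the ABI guarantees DF = 0. */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    for (size_t i = 0; i < qwords * 8; i++)
        dst[i] = src[i];
#endif
}

/* Head: byte-copy until dst is 8-byte aligned.
   Body: REP MOVSQ for the whole qwords.
   Tail: byte-copy the 0..7 bytes left over. */
static void copy_movsq_head_tail(void *dst, const void *src, size_t bytes)
{
    assert(bytes == 0 || (dst != NULL && src != NULL));
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    size_t head = (8 - ((uintptr_t)d & 7)) & 7;   /* bytes until dst is aligned */
    if (head > bytes)
        head = bytes;
    for (size_t i = 0; i < head; i++)
        *d++ = *s++;
    bytes -= head;

    size_t qwords = bytes / 8;
    rep_movsq(d, s, qwords);
    d += qwords * 8;
    s += qwords * 8;

    for (size_t i = 0; i < (bytes & 7); i++)      /* tail */
        d[i] = s[i];
}
```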

I used serialized rdtsc.

Code:
rep movsq
time: 0 1219200426
time: 0 592314840
time: 0 577661148
time: 0 574660377
time: 1 592705800
time: 0 593109693
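mov rsp,rsi             ; source pointer into RSP (RSP must be saved before and restored after; not shown)
shr rcx,4               ; byte count / 16 = loop count (16 bytes per pass)
@again:
pop qword ptr [rdi]     ; [rdi] = qword at [rsp], rsp += 8
pop qword ptr [rdi+8]   ; [rdi+8] = next qword, rsp += 8
add rdi,2*8             ; advance destination by 16 bytes
sub rcx,1
jnz @again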
time: 0 732068685
time: 0 761895693
time: 0 744006060
time: 0 748483425
time: 1 755325054
time: 0 755273367
When I have time I'll check the XMM version.
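The timings above were taken with serialized rdtsc. The sketch below shows one way to do that from C; it is an assumption, not mineiro's code. It uses the fence sequence Intel's SDM describes for RDTSC (MFENCE;LFENCE before, LFENCE after) rather than the classic CPUID. It also makes one untimed warm-up call: the first rep movsq run above (1219200426) is about twice the later ones, while the pop series run afterwards shows no such outlier; the cause (first-touch page faults, cold caches) is a guess. tsc_serialized, copy_fn and best_ticks are hypothetical names, and keeping the minimum instead of an average is an assumed choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>      /* memcpy matches copy_fn and serves as a baseline */
#include <emmintrin.h>   /* _mm_mfence, _mm_lfence */
#if defined(_MSC_VER)
#include <intrin.h>      /* __rdtsc */
#else
#include <x86intrin.h>   /* __rdtsc */
#endif

/* Read the TSC with fences around it: MFENCE;LFENCE so earlier
   instructions complete and their stores are globally visible first,
   LFENCE afterwards so later instructions do not start before the read. */
static inline uint64_t tsc_serialized(void)
{
    _mm_mfence();
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Same shape as memcpy, so memcpy itself can be timed as a baseline. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t bytes);

/* One untimed warm-up call, then the smallest tick count of `runs`
   timed calls. */
static uint64_t best_ticks(copy_fn fn, void *dst, const void *src,
                           size_t bytes, int runs)
{
    assert(runs > 0);
    fn(dst, src, bytes);                 /* warm-up, not timed */
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < runs; r++) {
        uint64_t t0 = tsc_serialized();
        fn(dst, src, bytes);
        uint64_t t1 = tsc_serialized();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

For example, `best_ticks(memcpy, dst, src, size, 5)` gives a baseline to compare the hand-written routines against, once they are wrapped to match copy_fn.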
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

jj2007

  • Member
  • Posts: 7558
  • Assembler is fun ;-)
    • MasmBasic
Re: Win64 memory copy benchmark.
« Reply #2 on: July 09, 2016, 06:44:07 AM »
There is an old memcpy thread somewhere; here is a bit about movlps etc.
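The referenced material is not reproduced here, so the following is only an illustration of the MOVLPS/MOVHPS idea, not its code: each 16-byte chunk is moved as two 8-byte halves. On older CPUs, where unaligned 16-byte loads (MOVUPS/MOVDQU) were slow, this split was a common workaround; on current CPUs it is typically no longer an advantage. The function name is hypothetical, and the compiler may emit different instructions than the intrinsics suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>   /* SSE: _mm_loadl_pi, _mm_loadh_pi, _mm_storel_pi, _mm_storeh_pi */

/* Copies 16 bytes per pass as two 8-byte halves (MOVLPS/MOVHPS style).
   __m64 is used only as the pointer type of the 8-byte memory operand;
   no MMX register is involved. No tail handling: bytes % 16 must be 0. */
static void copy_movlps_movhps(void *dst, const void *src, size_t bytes)
{
    assert(bytes % 16 == 0);
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t i = 0; i < bytes; i += 16) {
        __m128 x = _mm_setzero_ps();
        x = _mm_loadl_pi(x, (const __m64 *)(s + i));      /* low 8 bytes  */
        x = _mm_loadh_pi(x, (const __m64 *)(s + i + 8));  /* high 8 bytes */
        _mm_storel_pi((__m64 *)(d + i), x);
        _mm_storeh_pi((__m64 *)(d + i + 8), x);
    }
}
```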

Siekmanski

  • Member
  • Posts: 1094
Re: Win64 memory copy benchmark.
« Reply #3 on: July 10, 2016, 03:12:23 PM »
An article on CodeProject...
Apex memmove - the fastest memcpy/memmove on x86/x64 ... EVER, written in C

http://www.codeproject.com/Articles/1110153/Apex-memmove-the-fastest-memcpy-memmove-on-x-x-EVE
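The article's title pairs memcpy with memmove; the only behavioural difference is that memmove must cope with overlapping buffers. The classic way to handle that is a direction check, sketched below as a plain byte loop. This is not the Apex code (that is in the linked article); simple_memmove is a hypothetical name, and a fast version would move larger blocks in each direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void *simple_memmove(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Unsigned distance from src to dst. If it is >= n, dst starts before
       src (the subtraction wraps to a huge value) or at/after src + n, so a
       forward copy never overwrites a source byte before reading it. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst lies inside [src, src + n): copy from the end backward. */
        for (size_t i = n; i != 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```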

jj2007

Re: Win64 memory copy benchmark.
« Reply #4 on: July 10, 2016, 03:36:46 PM »
Quote
8) An optimized assembler version of these algorithms WILL be faster (I know because I have built assembler versions)

So that EVER refers to non-assembler code?  ::)

Siekmanski

Re: Win64 memory copy benchmark.
« Reply #5 on: July 10, 2016, 03:47:38 PM »
Funny article....

Quote
In late 2013, my OCD took over and I became totally obsessed with writing the fastest memcpy/memmove function in the world; which took over my work and life.
I became so obsessed that I wrote 80,000 lines of code in over 140 variations of memmove, mostly copies with small variations and tweaks.


hutch--

Re: Win64 memory copy benchmark.
« Reply #6 on: July 10, 2016, 06:30:02 PM »
From motor racing: "when the flag drops, the bullsh*t stops". Let's see what it clocks like.  :biggrin: