The MASM Forum

Specialised Projects => Assembler/Compiler Technology => Topic started by: GabrielRavier on January 04, 2019, 06:32:04 AM

Title: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: GabrielRavier on January 04, 2019, 06:32:04 AM
So I'm pretty sure this has been done before, but I just made a routine for checking supported CPU extensions and wanted to share it (I did it for both 32-bit and 64-bit, btw). It covers most cpuid-checking needs, so it could be a nice addition to m32lib (if someone finds a way to check for OS SSE support without using OS-specific code), but I dunno.
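
For anyone who hasn't done this kind of check before, here is a rough C++ sketch of the idea, not the posted routine itself. It assumes GCC or Clang (the `<cpuid.h>` helper header) and reads a handful of feature bits as documented by Intel for CPUID leaf 1 and leaf 7. The `CpuFeatures` struct and `queryCpuFeatures` name are made up for illustration. It only reports what the CPU advertises; it does not tell you whether the OS has enabled SSE state saving, which is the part that is hard to do portably from user mode.

```cpp
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  // GCC/Clang helper header (assumption: not building with MSVC)
#endif

// Hypothetical container for the bits we care about (not from the posted code).
struct CpuFeatures {
    bool mmx;
    bool sse;
    bool sse2;
    bool sse3;
    bool erms;  // "enhanced rep movsb/stosb"
};

inline CpuFeatures queryCpuFeatures() {
    CpuFeatures f = {false, false, false, false, false};
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    // Leaf 1: basic feature flags in EDX and ECX.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.mmx  = ((edx >> 23) & 1u) != 0;
        f.sse  = ((edx >> 25) & 1u) != 0;
        f.sse2 = ((edx >> 26) & 1u) != 0;
        f.sse3 = ((ecx >> 0) & 1u) != 0;
    }

    // Leaf 7, sub-leaf 0: structured extended features; ERMS is EBX bit 9.
    if (__get_cpuid_max(0, nullptr) >= 7) {
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        f.erms = ((ebx >> 9) & 1u) != 0;
    }
#endif
    return f;  // all false on non-x86 targets
}
```

A caller could use the result to pick a copy routine once at start-up, for example preferring an SSE2 path when `sse2` is set and falling back to a plain `rep movs` loop otherwise.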

Also, I made a version of MemCopy that is faster according to my tests, but I'm not sure whether it'll work out the same on older/newer CPUs (I'm using an Intel Atom, so...).

I built the benchmark into test.exe (for processors that support prefetcht0 and prefetchw) and testGeneric.exe (for generic processors). The buffers are prefetched before each function runs, for fairness (this avoids a first-access penalty).
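
As a rough illustration of that "warm up first, then time" idea, here is a minimal C++ sketch. It is not the posted testAndBench.cpp: it swaps the prefetch instructions for one untimed call that touches both buffers, and it uses `std::memcpy` as a stand-in for the routines under test. The `CopyFn` type, `referenceCopy` and `timeCopy` are hypothetical names; the `(src, dest, len)` argument order follows the `checkSpeed` signature that appears in the compiler output later in this thread.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical function-pointer type for a copy routine: (src, dest, len).
typedef void (*CopyFn)(const void *src, void *dest, std::size_t len);

// Stand-in for the routine being measured.
inline void referenceCopy(const void *src, void *dest, std::size_t len) {
    std::memcpy(dest, src, len);
}

// Written after the timed loop so the copies are not trivially dead code.
static volatile unsigned char g_sink;

// Returns the elapsed time in nanoseconds for `times` copies of `len` bytes.
// Assumes len > 0.
inline double timeCopy(CopyFn fn, std::size_t len, std::size_t times) {
    std::vector<unsigned char> src(len, 0xAB);
    std::vector<unsigned char> dest(len, 0);

    // Warm-up: one untimed call so the first timed iteration does not pay
    // for cold caches or untouched pages.
    fn(src.data(), dest.data(), len);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < times; ++i) {
        fn(src.data(), dest.data(), len);
    }
    const auto stop = std::chrono::steady_clock::now();

    g_sink = dest[len - 1];
    return std::chrono::duration<double, std::nano>(stop - start).count();
}
```

Calling `timeCopy(referenceCopy, 4096, 262144)` would mirror one line of the results below. An optimising compiler can still reorder or shrink such a loop, so treat numbers from a sketch like this as indicative only.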

Here are the results I got from running test.exe personally :
Time for copying a 4 bytes buffer 1048576 times with MemCopy : 44.452479 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopy : 61.110484 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopy : 63.367342 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopy : 74.863050 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopy : 131.158446 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopy : 88.957096 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopy : 93.991052 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopy : 95.598755 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopy : 116.077777 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopy : 480.714723 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopy : 458.975207 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopy : 472.920285 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopy : 454.860839 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyOld : 51.145703 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyOld : 55.698953 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyOld : 66.732289 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyOld : 87.198133 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyOld : 196.921873 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyOld : 100.182944 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyOld : 93.870732 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyOld : 128.469582 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyOld : 133.113931 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyOld : 624.098245 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyOld : 645.946050 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyOld : 646.588329 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyOld : 694.009272 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 46.868617 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 60.114121 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 75.788368 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 97.346831 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 151.026701 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyEnhancedMovsb : 94.024856 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyEnhancedMovsb : 102.438083 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyEnhancedMovsb : 97.960463 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyEnhancedMovsb : 104.919537 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyEnhancedMovsb : 459.434141 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyEnhancedMovsb : 487.153557 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyEnhancedMovsb : 476.152880 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyEnhancedMovsb : 564.040847 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyi386 : 46.789550 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyi386 : 58.844459 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyi386 : 68.172690 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyi386 : 122.686205 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyi386 : 344.957192 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyi386 : 283.119632 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyi386 : 358.778512 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyi386 : 373.480460 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyi386 : 385.105655 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyi386 : 474.762325 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyi386 : 493.823291 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyi386 : 497.946254 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyi386 : 491.078851 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopySSE2 : 31.322138 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE2 : 58.345418 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE2 : 35.237692 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE2 : 43.822231 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE2 : 103.727224 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE2 : 75.665183 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE2 : 107.753931 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE2 : 161.375360 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE2 : 165.980748 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE2 : 478.368485 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE2 : 492.367993 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE2 : 471.664947 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE2 : 455.557548 ms
They give the advantage to my version (all of them actually), but the results may depend on your processor.

PS : My laptop's model name is "Intel(R) Atom(TM) CPU  Z3735F @ 1.33GHz"
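
Since the repeat count changes with the buffer size, the raw millisecond figures are hard to compare across lines. One way to read them is to convert each line to a throughput. The helper below is only a sketch (the name `throughputGiBps` is made up); it takes the three numbers printed on a result line.

```cpp
#include <cstddef>

// Converts one benchmark line (buffer size, repeat count, elapsed ms) to GiB/s.
inline double throughputGiBps(std::size_t bufferBytes, std::size_t times, double elapsedMs) {
    // Multiply as doubles so the product cannot overflow a 32-bit size_t.
    const double totalBytes = static_cast<double>(bufferBytes) * static_cast<double>(times);
    const double seconds = elapsedMs / 1000.0;
    return totalBytes / seconds / (1024.0 * 1024.0 * 1024.0);
}
```

For the 67108864-byte rows above, 16 copies move exactly 1 GiB, so MemCopy at 454.860839 ms works out to roughly 2.2 GiB/s and MemCopyOld at 694.009272 ms to roughly 1.44 GiB/s. The 4-byte MemCopy row comes out at only about 0.09 GiB/s, which suggests that per-call overhead rather than memory bandwidth dominates at that size.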

PPS : I did the tests on my desktop, and these were the results :
Time for copying a 4 bytes buffer 1048576 times with MemCopy : 24.449344 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopy : 21.069343 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopy : 21.471565 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopy : 32.346237 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopy : 53.173801 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopy : 24.087122 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopy : 20.555565 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopy : 34.885349 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopy : 44.295131 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopy : 57.171581 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopy : 139.411173 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopy : 136.476061 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopy : 172.107632 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyOld : 34.299126 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyOld : 36.353349 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyOld : 35.475571 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyOld : 40.656018 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyOld : 55.192025 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyOld : 25.744456 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyOld : 22.593343 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyOld : 40.232018 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyOld : 52.340468 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyOld : 86.588038 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyOld : 172.878744 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyOld : 158.407182 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyOld : 195.133420 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 23.062232 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 23.546677 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 21.424010 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 29.988902 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 41.529796 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyEnhancedMovsb : 24.010233 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyEnhancedMovsb : 17.541786 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyEnhancedMovsb : 34.875571 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyEnhancedMovsb : 57.376026 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyEnhancedMovsb : 60.679138 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyEnhancedMovsb : 138.152061 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyEnhancedMovsb : 144.422286 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyEnhancedMovsb : 169.700075 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyi386 : 27.191123 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyi386 : 27.028456 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyi386 : 91.000929 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyi386 : 97.632932 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyi386 : 188.278306 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyi386 : 146.560065 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyi386 : 133.082281 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyi386 : 139.013840 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyi386 : 151.046289 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyi386 : 178.404079 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyi386 : 238.504106 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyi386 : 224.072988 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyi386 : 206.355647 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopySSE2 : 9.419115 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE2 : 19.596898 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE2 : 14.753340 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE2 : 21.884010 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE2 : 37.837795 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE2 : 37.735128 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE2 : 45.438242 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE2 : 53.232468 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE2 : 71.419143 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE2 : 71.099143 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE2 : 118.952053 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE2 : 169.598298 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE2 : 145.193842 ms

(Model name : "Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz", it's a Skylake)

Updated link below.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: jj2007 on January 04, 2019, 12:27:30 PM
There seems to be a little problem with your setup. I get some error messages.

MsgBox: The procedure entry point _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcj could not be located in the dynamic link library libstdc++-6.dll.

C:\Masm32\MasmBasic\Members\GabrielRavier\MemCopy>uasm32 -elf MemCopy.asm
"uasm32" is not recognized as an internal or external command,
operable program or batch file.

C:\Masm32\MasmBasic\Members\GabrielRavier\MemCopy>g++ testAndBench.cpp *.obj -Ofast -o testGeneric.exe
testAndBench.cpp: In function 'std::string getPrettyTime(double)':
testAndBench.cpp:22:10: error: 'to_string' is not a member of 'std'
   return std::to_string(x) + " ns";
          ^
testAndBench.cpp:24:10: error: 'to_string' is not a member of 'std'
   return std::to_string(x / 1000.0) + " us";
          ^
testAndBench.cpp:26:10: error: 'to_string' is not a member of 'std'
   return std::to_string(x / 1000000.0) + " ms";
          ^
testAndBench.cpp:28:10: error: 'to_string' is not a member of 'std'
   return std::to_string(x / 1000000000.0) + " s";
          ^
testAndBench.cpp: At global scope:
testAndBench.cpp:31:36: error: variable or field 'checkSpeed' declared void
static inline void checkSpeed(std::function<void(const void *src, void *dest, size_t len)> fn, const std::string& funcName)
                                    ^
testAndBench.cpp:31:31: error: 'function' is not a member of 'std'
static inline void checkSpeed(std::function<void(const void *src, void *dest, size_t len)> fn, const std::string& funcName)
                               ^
testAndBench.cpp:31:89: error: expression list treated as compound expression in functional cast [-fpermissive]
static inline void checkSpeed(std::function<void(const void *src, void *dest, size_t len)> fn, const std::string& funcName)
                                                                                         ^
testAndBench.cpp:31:45: error: expected primary-expression before 'void'
static inline void checkSpeed(std::function<void(const void *src, void *dest, size_t len)> fn, const std::string& funcName)
                                             ^
testAndBench.cpp:31:96: error: expected primary-expression before 'const'
static inline void checkSpeed(std::function<void(const void *src, void *dest, size_t len)> fn, const std::string& funcName)
                                                                                                ^

C:\Masm32\MasmBasic\Members\GabrielRavier\MemCopy>test


\Masm32\MasmBasic\Members\GabrielRavier\MemCopy>pause
Press any key to continue . . .
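
A note on the `to_string` and `std::function` errors above: both are C++11 library features, so they may simply come from an older GCC that defaults to C++98 (in which case `-std=c++11` plus `#include <functional>` is worth trying), and some older MinGW builds were also known to lack `std::to_string` entirely. As a hedged workaround sketch, `getPrettyTime` can be written with a string stream so it builds in either mode. The unit thresholds here are assumptions, since only the four `return` lines of the original function are visible in the error output.

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Formats a duration given in nanoseconds, without std::to_string.
// Assumed thresholds: switch units at 1e3, 1e6 and 1e9 ns.
inline std::string getPrettyTime(double ns) {
    double value = ns;
    const char *unit = " ns";
    if (ns >= 1e9) {
        value = ns / 1000000000.0;
        unit = " s";
    } else if (ns >= 1e6) {
        value = ns / 1000000.0;
        unit = " ms";
    } else if (ns >= 1e3) {
        value = ns / 1000.0;
        unit = " us";
    }

    // std::fixed with precision 6 matches the "%f" format std::to_string uses.
    std::ostringstream out;
    out << std::fixed << std::setprecision(6) << value << unit;
    return out.str();
}
```

With this version, `getPrettyTime(44452479.0)` yields `"44.452479 ms"`, the same shape as the timings posted in this thread.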
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: mabdelouahab on January 04, 2019, 06:05:46 PM

D:\GabrielRavier\MemCopy>uasm64 -elf MemCopy.asm
UASM v2.47, Nov 17 2018, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

MemCopy.asm: 292 lines, 3 passes, 104 ms, 0 warnings, 0 errors

D:\GabrielRavier\MemCopy>g++ testAndBench.cpp *.obj -Ofast -o testGeneric.exe
testAndBench.cpp: In function 'std::__cxx11::string getPrettyTime(double)':
testAndBench.cpp:22:15: error: 'to_string' is not a member of 'std'
   return std::to_string(x) + " ns";
               ^~~~~~~~~
testAndBench.cpp:24:15: error: 'to_string' is not a member of 'std'
   return std::to_string(x / 1000.0) + " us";
               ^~~~~~~~~
testAndBench.cpp:26:15: error: 'to_string' is not a member of 'std'
   return std::to_string(x / 1000000.0) + " ms";
               ^~~~~~~~~
testAndBench.cpp:28:15: error: 'to_string' is not a member of 'std'
   return std::to_string(x / 1000000000.0) + " s";
               ^~~~~~~~~
testAndBench.cpp: In function 'void checkSpeed(std::function<void(const void*, void*, unsigned int)>, const string&)':
testAndBench.cpp:37:14: error: '_mm_malloc' was not declared in this scope
  void *src = _mm_malloc(times, 4096);
              ^~~~~~~~~~
testAndBench.cpp:37:14: note: suggested alternative: 'malloc'
  void *src = _mm_malloc(times, 4096);
              ^~~~~~~~~~
              malloc
testAndBench.cpp:65:2: error: '_mm_free' was not declared in this scope
  _mm_free(src);
  ^~~~~~~~
testAndBench.cpp:65:2: note: suggested alternative: 'free'
  _mm_free(src);
  ^~~~~~~~
  free

D:\GabrielRavier\MemCopy>test
Time for copying a 4 bytes buffer 1048576 times with MemCopy : 31.295746 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopy : 31.543785 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopy : 29.100596 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopy : 40.846978 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopy : 62.377236 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopy : 31.699451 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopy : 25.811506 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopy : 63.905673 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopy : 76.539438 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopy : 111.273097 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopy : 178.439207 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopy : 205.424624 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopy : 219.887039 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyOld : 37.317547 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyOld : 38.263090 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyOld : 39.626025 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyOld : 61.442812 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyOld : 73.802878 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyOld : 30.791114 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyOld : 33.373291 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyOld : 60.752578 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyOld : 82.619400 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyOld : 100.879813 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyOld : 191.411247 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyOld : 214.801801 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyOld : 261.172791 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 31.620335 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 37.460383 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 42.744908 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 40.146908 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyEnhancedMovsb : 51.349314 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyEnhancedMovsb : 33.891608 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyEnhancedMovsb : 47.169420 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyEnhancedMovsb : 64.225986 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyEnhancedMovsb : 72.444648 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyEnhancedMovsb : 101.316021 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyEnhancedMovsb : 184.333995 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyEnhancedMovsb : 192.885371 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyEnhancedMovsb : 183.406840 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyi386 : 44.384535 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyi386 : 46.543334 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyi386 : 77.371653 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyi386 : 107.397266 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyi386 : 232.297996 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyi386 : 167.903087 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyi386 : 183.645471 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyi386 : 181.284819 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyi386 : 220.098728 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyi386 : 257.070303 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyi386 : 372.105047 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyi386 : 338.575235 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyi386 : 296.396969 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopySSE2 : 16.196981 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE2 : 27.015781 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE2 : 13.963770 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE2 : 26.062967 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE2 : 62.496124 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE2 : 40.519822 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE2 : 53.056082 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE2 : 69.197468 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE2 : 103.931555 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE2 : 127.382409 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE2 : 243.064622 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE2 : 250.542586 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE2 : 238.975819 ms

D:\GabrielRavier\MemCopy>pause
Press any key to continue...
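
On the `_mm_malloc` / `_mm_free` errors above: those names are normally declared by the SSE intrinsics headers, so a missing `#include <xmmintrin.h>` (or `<mm_malloc.h>` on GCC) is a likely cause. Note also that the usual signature is `_mm_malloc(size, alignment)`, in that order. The sketch below shows the pattern on an x86 target; `allocPageAligned` and `freePageAligned` are hypothetical wrapper names, and the 4096-byte alignment is taken from the call shown in the error output.

```cpp
#include <cstddef>
#include <cstring>
#include <xmmintrin.h>  // declares _mm_malloc/_mm_free on GCC and Clang for x86

// Hypothetical helper: allocate `bytes` aligned to a 4096-byte boundary
// and zero the block so every page has been touched once.
inline void *allocPageAligned(std::size_t bytes) {
    void *p = _mm_malloc(bytes, 4096);  // arguments are (size, alignment)
    if (p != nullptr) {
        std::memset(p, 0, bytes);
    }
    return p;
}

// Memory from _mm_malloc must be released with _mm_free, not free().
inline void freePageAligned(void *p) {
    _mm_free(p);
}
```

A benchmark could then obtain its source and destination buffers with `allocPageAligned(len)` and release them with `freePageAligned` once the timing loop is done.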
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: sinsi on January 04, 2019, 06:28:40 PM
Don't know what to make of this...
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: anta40 on January 04, 2019, 08:53:21 PM
@sinsi
Those executables are compiled with MinGW GCC (or any Windows-based GCC), and not MSVC.
You have to put those missing DLLs on your PATH.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: sinsi on January 04, 2019, 09:07:52 PM
So I need to download yet another bundle of DLLs...
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: GabrielRavier on January 05, 2019, 12:40:36 AM
So I redid it and did some minor optimisations, and I static-linked the benchmarks. Also here are the results with my new version :

Time for copying a 4 bytes buffer 1048576 times with MemCopy : 35.182689 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopy : 59.306258 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopy : 32.661700 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopy : 44.464510 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopy : 108.360114 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopy : 85.267858 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopy : 153.619882 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopy : 189.894617 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopy : 183.061019 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopy : 546.795566 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopy : 486.046614 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopy : 516.154094 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopy : 488.190600 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyOld : 52.372967 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyOld : 55.247467 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyOld : 70.919422 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyOld : 93.756714 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyOld : 145.245043 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyOld : 92.522576 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyOld : 92.089424 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyOld : 98.831350 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyOld : 102.596217 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyOld : 495.940349 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyOld : 500.097115 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyOld : 490.218278 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyOld : 509.411596 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyMovsb : 44.105843 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyMovsb : 57.386869 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyMovsb : 61.867354 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyMovsb : 75.803837 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyMovsb : 136.926926 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyMovsb : 89.881840 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyMovsb : 91.602415 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyMovsb : 112.802211 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyMovsb : 101.470367 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyMovsb : 467.121438 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyMovsb : 485.370531 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyMovsb : 480.826449 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyMovsb : 497.012342 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyMMX : 44.777342 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyMMX : 58.447976 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyMMX : 63.314630 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyMMX : 77.885372 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyMMX : 136.109897 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyMMX : 90.121907 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyMMX : 91.031182 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyMMX : 96.202646 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyMMX : 113.304690 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyMMX : 547.688225 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyMMX : 600.812331 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyMMX : 575.628800 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyMMX : 516.925288 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopySSE2 : 42.884309 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE2 : 46.917891 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE2 : 59.337198 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE2 : 78.568903 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE2 : 134.707883 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE2 : 89.822253 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE2 : 91.426519 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE2 : 104.468624 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE2 : 100.475722 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE2 : 482.827770 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE2 : 497.123495 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE2 : 500.413958 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE2 : 505.475415 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopySSE3 : 34.539264 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE3 : 61.922930 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE3 : 34.128457 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE3 : 45.019701 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE3 : 107.394117 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE3 : 75.795243 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE3 : 105.266173 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE3 : 161.711110 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE3 : 183.807002 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE3 : 498.366227 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE3 : 503.326846 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE3 : 477.680943 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE3 : 519.981986 ms
(results obtained from the "Intel(R) Atom(TM) CPU  Z3735F @ 1.33GHz")

Time for copying a 4 bytes buffer 1048576 times with MemCopy : 25.812900 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopy : 35.336016 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopy : 20.049787 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopy : 34.450238 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopy : 74.273811 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopy : 56.771136 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopy : 66.258252 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopy : 84.325815 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopy : 107.936048 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopy : 136.734283 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopy : 155.900958 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopy : 180.522302 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopy : 201.216978 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyOld : 48.137799 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyOld : 48.920466 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyOld : 48.750688 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyOld : 59.917360 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyOld : 77.199590 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyOld : 39.291129 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyOld : 31.921348 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyOld : 59.326249 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyOld : 78.816924 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyOld : 91.456485 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyOld : 183.091637 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyOld : 189.532084 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyOld : 175.897856 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyMovsb : 40.374685 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyMovsb : 39.613351 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyMovsb : 39.205795 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyMovsb : 49.474689 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyMovsb : 67.285363 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyMovsb : 37.818239 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyMovsb : 30.727125 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyMovsb : 67.174697 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyMovsb : 78.568035 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyMovsb : 91.831596 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyMovsb : 151.131178 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyMovsb : 177.215190 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyMovsb : 176.230745 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopyMMX : 39.534240 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyMMX : 41.761352 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyMMX : 41.074685 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyMMX : 51.195134 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyMMX : 77.297368 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyMMX : 37.091572 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyMMX : 31.980903 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyMMX : 58.433359 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyMMX : 80.916925 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyMMX : 105.096491 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyMMX : 184.864971 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyMMX : 199.928978 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyMMX : 289.877018 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopySSE2 : 43.720464 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE2 : 43.648464 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE2 : 44.151575 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE2 : 54.133802 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE2 : 74.762700 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE2 : 39.173795 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE2 : 40.303129 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE2 : 63.283584 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE2 : 76.670701 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE2 : 98.220933 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE2 : 183.777859 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE2 : 184.112971 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE2 : 180.509858 ms
Time for copying a 4 bytes buffer 1048576 times with MemCopySSE3 : 19.439564 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE3 : 31.614681 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE3 : 19.926231 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE3 : 32.207570 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE3 : 73.421810 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE3 : 55.933803 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE3 : 63.564473 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE3 : 91.400930 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE3 : 109.192049 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE3 : 110.914272 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE3 : 177.417412 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE3 : 176.539190 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE3 : 189.168084 ms
(results obtained from the "Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz")
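Since the runs use different buffer sizes and repeat counts, the raw millisecond figures are hard to compare directly. A minimal sketch in C of reducing one result line to a throughput figure, assuming the reported time covers all repetitions of the copy. The helper name `throughput_mib_per_s` is made up for illustration and is not part of the posted benchmark:

```c
#include <stdint.h>

/* Hypothetical helper (not from the posted benchmark): converts one result
   line into MiB/s, assuming `ms` is the total time for all `count` copies. */
double throughput_mib_per_s(uint64_t buffer_bytes, uint64_t count, double ms)
{
    if (ms <= 0.0)
        return 0.0;                       /* avoid dividing by zero */
    double total_mib = (double)(buffer_bytes * count) / (1024.0 * 1024.0);
    return total_mib / (ms / 1000.0);     /* MiB per second */
}
```

For the last MemCopySSE3 line above (67108864 bytes, 16 times, 189.168084 ms) this works out to roughly 5413 MiB/s. For the small sizes, call and loop overhead probably dominates, so the figure says little about raw copy speed there.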
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: GabrielRavier on January 05, 2019, 12:46:04 AM
The forum didn't accept the statically linked exes because they are larger than 500 KB, so here is a second post with the InstrSet exe.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: sinsi on January 05, 2019, 08:09:24 AM
i7 4790

Time for copying a 4 bytes buffer 1048576 times with MemCopy : 7.139067 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopy : 12.916387 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopy : 7.162398 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopy : 11.958670 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopy : 24.539884 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopy : 20.084761 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopy : 22.172335 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopy : 35.122581 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopy : 56.796994 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopy : 56.276594 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopy : 80.263074 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopy : 200.598076 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopy : 180.521282 ms

Time for copying a 4 bytes buffer 1048576 times with MemCopyOld : 16.056712 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyOld : 16.186456 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyOld : 16.667022 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyOld : 30.704731 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyOld : 31.118717 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyOld : 14.280694 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyOld : 11.444814 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyOld : 34.919998 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyOld : 39.555512 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyOld : 45.918958 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyOld : 55.817937 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyOld : 124.654691 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyOld : 134.993264 ms

Time for copying a 4 bytes buffer 1048576 times with MemCopyMovsb : 13.692008 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyMovsb : 14.391660 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyMovsb : 14.449134 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyMovsb : 17.832160 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyMovsb : 24.368315 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyMovsb : 12.345342 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyMovsb : 11.421198 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyMovsb : 35.207370 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyMovsb : 39.480966 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyMovsb : 46.140320 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyMovsb : 55.560440 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyMovsb : 119.289934 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyMovsb : 123.316276 ms

Time for copying a 4 bytes buffer 1048576 times with MemCopyMMX : 14.433201 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopyMMX : 14.937098 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopyMMX : 15.034122 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopyMMX : 17.679938 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopyMMX : 24.222637 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopyMMX : 12.705837 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopyMMX : 11.536147 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopyMMX : 35.059132 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopyMMX : 39.389917 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopyMMX : 46.315873 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopyMMX : 67.282099 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopyMMX : 119.061459 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopyMMX : 124.101286 ms

Time for copying a 4 bytes buffer 1048576 times with MemCopySSE2 : 13.553728 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE2 : 13.594131 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE2 : 13.848213 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE2 : 28.220525 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE2 : 29.097722 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE2 : 13.812932 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE2 : 11.802180 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE2 : 35.060270 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE2 : 39.731065 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE2 : 46.484882 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE2 : 55.324283 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE2 : 123.071014 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE2 : 133.120507 ms

Time for copying a 4 bytes buffer 1048576 times with MemCopySSE3 : 7.073341 ms
Time for copying a 16 bytes buffer 1048576 times with MemCopySSE3 : 16.620644 ms
Time for copying a 64 bytes buffer 1048576 times with MemCopySSE3 : 7.663166 ms
Time for copying a 256 bytes buffer 1048576 times with MemCopySSE3 : 11.898065 ms
Time for copying a 1024 bytes buffer 1048576 times with MemCopySSE3 : 24.770351 ms
Time for copying a 4096 bytes buffer 262144 times with MemCopySSE3 : 20.326039 ms
Time for copying a 16384 bytes buffer 65536 times with MemCopySSE3 : 22.218713 ms
Time for copying a 65536 bytes buffer 16384 times with MemCopySSE3 : 34.908333 ms
Time for copying a 262144 bytes buffer 4096 times with MemCopySSE3 : 57.010390 ms
Time for copying a 1048576 bytes buffer 1024 times with MemCopySSE3 : 57.179968 ms
Time for copying a 4194304 bytes buffer 256 times with MemCopySSE3 : 73.096408 ms
Time for copying a 16777216 bytes buffer 64 times with MemCopySSE3 : 177.693937 ms
Time for copying a 67108864 bytes buffer 16 times with MemCopySSE3 : 175.750902 ms

Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 05, 2019, 08:55:23 PM
We can use Visual Studio and link statically to get an .exe of 218 KB (versus 2550 KB). Of course, with VS you do not need to link statically.
It will also be faster, of course.

Note: I am not commenting on any aspect of the algorithm or the modus operandi itself.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: TimoVJL on January 06, 2019, 03:03:27 AM
With Pelles C 9  ;)

EDIT: TestAndBenchPOCmsvcrt.zip use msvcrt.dll
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 06, 2019, 11:36:33 PM
Well, I know Pelles is not bad, but it is really not faster  :eusa_naughty: than VS.
I built it based on Timo's modified code and concluded that a 6 KB .exe is enough to at least match Pelles.  :t


Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 07, 2019, 03:30:50 AM
We can further reduce it to 4608 bytes (and still keep the Manifest which is another 512 bytes). :t
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: TimoVJL on January 07, 2019, 05:05:37 AM
Sure, with Pelles C, 3 584 bytes ;)
Same code with msvc 4 096 bytes.

But size and speed are not important for the host code that calls the functions.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 07, 2019, 06:59:47 AM
OK, Visual Studio 3456 bytes, including the manifest of course.  :biggrin:
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 07, 2019, 07:02:32 AM
Even better, 3328 bytes  :t
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 11, 2019, 12:39:12 AM
Building the .asm listing with MASM and then linking with polink saves 160 bytes (no Rich header in polink).
Result: 3040 bytes
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: TimoVJL on January 11, 2019, 01:02:13 AM
Didn't work in Windows 7 :(
/ALIGN:16 works in Windows XP and Windows 10.
Quote: The application was unable to start correctly (0x00000018). Click OK to close the application.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 11, 2019, 06:26:13 AM
Windows 7 dislikes executables built with the /ALIGN option. I was not expecting that; maybe I am missing something, but right now I cannot figure out what.
/ALIGN:4096 works though.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: hutch-- on January 11, 2019, 10:55:32 AM
If it is what I think it is, a PE file should have the default /align:512. It varies with OS versions, but 512 works on all versions.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 11, 2019, 11:02:01 PM
The "good" news is that this 3040 byte program with /ALIGN:16 works in Windows 7 32-bit and Windows 8.1 32-bit (but not 64-bit). The "bad" news is that probably nobody in the World is using Windows 7 32-bit and Windows 8.1 32-bit.
Since it works in Windows 10 64-bit, but does not work on 64-bit editions of XP, 7 and 8, we may assume that it was a WoW64 bug that was finally corrected.
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: TimoVJL on January 11, 2019, 11:13:58 PM
Feature?
Quote:
Offset (PE32/PE32+): 32/32, Size: 4, Field: SectionAlignment
   The alignment (in bytes) of sections when they are loaded into memory. It must be greater than or equal to FileAlignment. The default is the page size for the architecture.
Offset (PE32/PE32+): 36/36, Size: 4, Field: FileAlignment
   The alignment factor (in bytes) that is used to align the raw data of sections in the image file. The value should be a power of 2 between 512 and 64 K, inclusive. The default is 512. If the SectionAlignment is less than the architecture's page size, then FileAlignment must match SectionAlignment.
A TestAndBench3 example (3 584 bytes) compiled with msvc 2010 SP1.
TestAndBench3src1.zip an example (3 072 bytes) using MulDiv
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: GabrielRavier on January 11, 2019, 11:52:34 PM
I'm using Win10 32-bit lol
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: aw27 on January 12, 2019, 12:59:27 AM
OK, the default /ALIGN is 4 KB (https://docs.microsoft.com/en-us/cpp/build/reference/align-section-alignment?view=vs-2017).
If we set it to less than 4 KB, /FILEALIGN must have the same value.
However, the /FILEALIGN range is from 512 bytes to 64 KB (the default is 512 bytes).
This appears to imply that /ALIGN can never be less than 512 bytes; otherwise /FILEALIGN would need to be less than 512 bytes to match it, and would then be out of range.
Clear as water, or maybe not.
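The rules quoted above can be written down as a small check. This is only a sketch of the documented constraints on SectionAlignment and FileAlignment, assuming a 4096-byte page size; the function name `pe_alignment_documented_ok` is made up for illustration. It does not model what any particular Windows loader actually accepts, which, as this thread shows, can be more lenient or differ between versions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u  /* assumption: x86/x64 page size */

static bool is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

/* Hypothetical helper: true if the pair satisfies the documented rules
   quoted in this thread. Says nothing about real loader behaviour. */
bool pe_alignment_documented_ok(uint32_t section_align, uint32_t file_align)
{
    /* FileAlignment: power of 2 between 512 and 64 K, inclusive */
    if (!is_pow2(file_align) || file_align < 512u || file_align > 65536u)
        return false;
    /* SectionAlignment must be >= FileAlignment */
    if (section_align < file_align)
        return false;
    /* Below the page size, both values must match */
    if (section_align < PAGE_SIZE_ASSUMED && file_align != section_align)
        return false;
    return true;
}
```

With these rules, (4096, 512) and (512, 512) pass, while (16, 16) fails because a FileAlignment of 16 is below the documented minimum of 512.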
Title: Re: Made a routine for checking cpuid stuff and a maybe a better MemCopy
Post by: TimoVJL on January 12, 2019, 10:39:54 PM
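MulDiv simplifies the code, but only within a limited range (its arguments are 32-bit ints):
/* C version */
long long llFreq, llTime, llStart;
QueryPerformanceFrequency((LARGE_INTEGER *)&llFreq);
QueryPerformanceCounter((LARGE_INTEGER *)&llStart);
...
QueryPerformanceCounter((LARGE_INTEGER *)&llTime);
/* elapsed ticks -> ms; the tick delta and the frequency must each fit in an int */
llTime = MulDiv((int)(llTime - llStart), 1000, (int)llFreq);

; MASM version: only the low DWORDs are used, hence the limited range
LOCAL llFreq:QWORD, llTime:QWORD, llStart:QWORD
invoke QueryPerformanceFrequency, addr llFreq
invoke QueryPerformanceCounter, addr llStart
...
invoke QueryPerformanceCounter, addr llTime
mov eax, DWORD PTR llTime
;mov ecx, DWORD PTR llTime + 4
sub eax, DWORD PTR llStart
;sbb ecx, DWORD PTR llStart + 4
invoke MulDiv, eax, 1000, DWORD PTR llFreq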

EDIT: using FPU
double dTime = (ll2-ll1) * 1000.0 / llf;
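For comparison, here is a sketch of the same ticks-to-milliseconds conversion in plain 64-bit integer arithmetic, which avoids both the 32-bit limit of MulDiv and the FPU. It is portable C with no Windows calls; the name `ticks_to_ms` is made up for illustration, and the counter values are assumed to come from something like QueryPerformanceCounter. Unlike MulDiv, which rounds to the nearest integer, this version truncates.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed milliseconds from two counter readings and
   the counter frequency (ticks per second), using 64-bit integers only. */
int64_t ticks_to_ms(int64_t start, int64_t end, int64_t freq)
{
    if (freq <= 0)
        return -1;                        /* invalid frequency */
    int64_t ticks = end - start;
    int64_t whole = ticks / freq;         /* whole seconds */
    int64_t rem   = ticks % freq;         /* leftover ticks, less than freq */
    /* Splitting keeps rem * 1000 small, so it cannot overflow for any
       realistic counter frequency. Result is truncated, not rounded. */
    return whole * 1000 + (rem * 1000) / freq;
}
```

With a 10 MHz counter, a delta of 5,000,000,000 ticks (too large for a 32-bit int) gives 500000 ms.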