Recent Posts

Pages: [1] 2 3 ... 10
1
The Laboratory / Re: Benchmark testing different types of registers.
« Last post by nidud on July 22, 2018, 11:25:22 PM »
Push/pop and memory.

Code:
repeat 10
    push rsi
    push rdi
    push rbx
    push r12
    pop r12
    pop rbx
    pop rdi
    pop rsi
    endm

Code:
repeat 10
    mov [rsp+8+0],rsi
    mov [rsp+8+8],rdi
    mov [rsp+8+16],rbx
    mov [rsp+8+24],r12
    mov r12,[rsp+8+24]
    mov rbx,[rsp+8+16]
    mov rdi,[rsp+8+8]
    mov rsi,[rsp+8+0]
    endm

Code:
    90064 cycles, rep(5000), code(241) 0.asm: reg
   162206 cycles, rep(5000), code(321) 1.asm: mmx
   162061 cycles, rep(5000), code(401) 2.asm: xmm
   207300 cycles, rep(5000), code(101) 3.asm: push/pop
   208158 cycles, rep(5000), code(401) 4.asm: mem
2
The Laboratory / Re: Benchmark testing different types of registers.
« Last post by nidud on July 22, 2018, 11:15:42 PM »
A simple move test: reg,reg versus reg,xmm should be 1:2. reg,mmx seems to be the same as reg,xmm.

Code:
repeat 10
    mov rax,rsi
    mov rcx,rdi
    mov rdx,rbx
    mov r8,r12
    mov rsi,rax
    mov rdi,rcx
    mov rbx,rdx
    mov r12,r8
    endm

Code:
repeat 10
    movd mm0,rsi
    movd mm1,rdi
    movd mm2,rbx
    movd mm3,r12
    movd rsi,mm0
    movd rdi,mm1
    movd rbx,mm2
    movd r12,mm3
    endm

Code:
repeat 10
    movd xmm0,rsi
    movd xmm1,rdi
    movd xmm2,rbx
    movd xmm3,r12
    movd rsi,xmm0
    movd rdi,xmm1
    movd rbx,xmm2
    movd r12,xmm3
    endm

Code:
Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz (AVX2)
----------------------------------------------
-- test(1)
    87160 cycles, rep(5000), code(241) 0.asm: reg
   161717 cycles, rep(5000), code(321) 1.asm: mmx
   161386 cycles, rep(5000), code(401) 2.asm: xmm
-- test(2)
    84711 cycles, rep(5000), code(241) 0.asm: reg
   161860 cycles, rep(5000), code(321) 1.asm: mmx
   161873 cycles, rep(5000), code(401) 2.asm: xmm
-- test(3)
    84742 cycles, rep(5000), code(241) 0.asm: reg
   162381 cycles, rep(5000), code(321) 1.asm: mmx
   161896 cycles, rep(5000), code(401) 2.asm: xmm

total [1 .. 3], 1++
   256613 cycles 0.asm: reg
   485155 cycles 2.asm: xmm
   485958 cycles 1.asm: mmx
hit any key to continue...
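To put the expected 1:2 next to the measured totals, here is a small arithmetic sketch in Python. The cycle counts are copied from the totals above; the variable names are only illustrative.

```python
# Total cycles over the three runs, copied from the listing above.
totals = {"reg": 256613, "xmm": 485155, "mmx": 485958}

# Cost of each variant relative to the plain reg,reg moves.
ratios = {name: cycles / totals["reg"] for name, cycles in totals.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Both the xmm and mmx variants come out at roughly 1.9 times the reg,reg figure, so slightly under the nominal 1:2, and the two differ from each other by well under one percent.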
3
The Laboratory / Re: Benchmark testing different types of registers.
« Last post by jj2007 on July 22, 2018, 11:10:36 PM »
"We must live on different planets"
Same planet, different source and library. I have a loop that calls a proc one billion times; the proc is just the bare minimum, no rep movsb etc.

"One of the tests I did was the following: write the regs to the stack and, for the little it's worth, it did not time with much variation from the others."

Thanks for confirming my results above. So using mmx, xmm or just local variables produces roughly the same results.
4
The Laboratory / Re: Benchmark testing different types of registers.
« Last post by hutch-- on July 22, 2018, 10:45:11 PM »
One of the tests I did was the following: write the regs to the stack and, for the little it's worth, it did not time with much variation from the others. I would not recommend it for production code, but it works OK.

  Warm up lap
  copy1 1906 int regs
  copy2 1891 xmm regs
  copy3 1907 mmx regs
  copy4 1907 manual stack

  That's all folks ....


The algo.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 NOSTACKFRAME

 copy4 proc

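    sub rsp, 32             ; reserve 32 bytes of stack

    mov [rsp+8], rsi        ; save rsi inside the reserved area, not below rsp
    mov [rsp+16], rdi       ; save rdi

    mov rsi, rcx            ; source
    mov rdi, rdx            ; destination
    mov rcx, r8             ; byte count
    rep movsb

    mov rsi, [rsp+8]        ; restore rsi
    mov rdi, [rsp+16]       ; restore rdi

    add rsp, 32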

    ret

 copy4 endp

 STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
5
The Laboratory / Re: Benchmark testing different types of registers.
« Last post by hutch-- on July 22, 2018, 10:16:39 PM »
JJ,

We must live on different planets. This is how I build the test piece, and this is the disassembly of the proc that uses the XMM registers to preserve RSI and RDI.


Microsoft (R) Macro Assembler (x64) Version 14.10.24930.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: bmcopy.asm
 Volume in drive K is disk3_k
 Volume Serial Number is 68C7-4DBB

 Directory of K:\asm64\copytest\copybm

21/07/2018  11:28 PM             2,958 bmcopy.asm
22/07/2018  10:12 PM             3,072 bmcopy.exe
22/07/2018  10:12 PM             4,234 bmcopy.obj
21/07/2018  11:30 PM             3,949 bmcopy.zip
               4 File(s)         14,213 bytes
               0 Dir(s)  724,183,744,512 bytes free
Press any key to continue . . .

sub_1400012d6   proc
.text:00000001400012d6 66480F6EC6                 movq xmm0, rsi
.text:00000001400012db 66480F6ECF                 movq xmm1, rdi
.text:00000001400012e0 488BF1                     mov rsi, rcx
.text:00000001400012e3 488BFA                     mov rdi, rdx
.text:00000001400012e6 498BC8                     mov rcx, r8
.text:00000001400012e9 F3A4                       rep movsb byte ptr [rdi], byte ptr [rsi]
.text:00000001400012eb 66480F7EC6                 movq rsi, xmm0
.text:00000001400012f0 66480F7ECF                 movq rdi, xmm1
.text:00000001400012f5 C3                         ret
sub_1400012d6   endp
6
The Workshop / Re: Fibonacci numbers: the nature's numbers...
« Last post by Siekmanski on July 22, 2018, 07:59:32 PM »
I think, the operation goes like this:

Temp = Source
Source = Destination    ; Exchange
Destination = Destination + Temp ; Add

 :biggrin:
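The three steps above can be tried out in a short Python sketch. This is only an illustration: the function name `exchange_add` and the starting values are my assumptions, not something from the thread.

```python
def exchange_add(destination, source):
    """Temp = Source; Source = Destination; Destination = Destination + Temp."""
    temp = source
    source = destination                  # exchange
    destination = destination + temp      # add
    return destination, source

# Assumed seed values: destination = 1, source = 0.
destination, source = 1, 0
sequence = []
for _ in range(10):
    sequence.append(destination)
    destination, source = exchange_add(destination, source)

print(sequence)
```

With those seed values, each application moves one step along the Fibonacci sequence.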
7
The Laboratory / Re: Benchmark testing different types of registers.
« Last post by jj2007 on July 22, 2018, 07:17:30 PM »
I get build errors (error A2070: invalid instruction operands) for movq mm0, rsi.
But movd mm0, rsi works, and it actually copies a QWORD. This is ML64 8)

Code:
This code was assembled with ml64 in 64-bit format
Ticks for save_mmx: 2356
Ticks for save_xmm: 2324
Ticks for save_xmm_local: 2324
Ticks for save_local: 2325

The first two are "naked" procs; 3 and 4 have a stack frame. 3 saves rsi, rdi and rbx to xmm regs; 4 saves them to local qwords. This is for my Core i5. Example of loop design (@debug is an empty equate, align 16 for loops and procs):
Code:
  mov r12, loops  ; 1,000,000,000
  mov r13, rv(GetTickCount)
  align 16
@@: mov ecx, 123
    @debug
    call save_mmx
    dec r12
    jns @B
  jinvoke GetTickCount
  sub rax, r13
  Print Str$("Ticks for save_mmx: %i\n", rax)

Example mmx proc:
Code:
0000000140001010   | 48 0F 6E C6                      | movd mm0,rsi                            |
0000000140001014   | 48 0F 6E CF                      | movd mm1,rdi                            |
0000000140001018   | 48 0F 6E D3                      | movd mm2,rbx                            |
000000014000101C   | 48 FF CE                         | dec rsi                                 |
000000014000101F   | 48 FF CF                         | dec rdi                                 |
0000000140001022   | 48 FF CB                         | dec rbx                                 |
0000000140001025   | 48 0F 7E C6                      | movd rsi,mm0                            |
0000000140001029   | 48 0F 7E CF                      | movd rdi,mm1                            |
000000014000102D   | 48 0F 7E D3                      | movd rbx,mm2                            |
0000000140001031   | C3                               | ret                                     |
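For scale, the tick counts above can be converted to time per call. A rough Python sketch, arithmetic only: it assumes the ticks are GetTickCount milliseconds and uses the 1,000,000,000 loop count from the comment in the loop. The variable names are illustrative.

```python
LOOPS = 1_000_000_000  # from the "mov r12, loops" comment

# Tick counts (milliseconds) as posted above.
ticks_ms = {
    "save_mmx": 2356,
    "save_xmm": 2324,
    "save_xmm_local": 2324,
    "save_local": 2325,
}

# Milliseconds for the whole loop -> nanoseconds per call.
# The figure includes call/ret and the loop overhead itself.
ns_per_call = {name: ms * 1_000_000 / LOOPS for name, ms in ticks_ms.items()}
spread = max(ns_per_call.values()) - min(ns_per_call.values())

for name, ns in ns_per_call.items():
    print(f"{name}: {ns:.3f} ns per call")
print(f"spread: {spread:.3f} ns")
```

The whole spread is 32 ms over a billion calls. GetTickCount typically has a resolution of 10 to 16 ms, so differences of a tick or two between the variants are probably within the timer's noise.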
8
The Laboratory / Re: Automated Plugin Builder plugin for qe
« Last post by zedd151 on July 22, 2018, 07:01:40 PM »
I have trimmed the code for the Plugin Builder, and fixed a few minor bugs.
 
The latest version, 5.0.1 is linked in the first post of this thread.
 
Questions, comments, bug reports welcome; criticism and other flak are unavoidable.  :P
9
The Laboratory / Re: Benchmark testing different types of registers.
« Last post by AW on July 22, 2018, 06:36:50 PM »
Hutch,
The variance of the whole test absorbs the differences you want to detect.
For this case, I would prefer qWord's x64 conversion of Michael Webster's code timing macros
(although neither is fully compliant with Agner Fog's thoughts, namely with respect to alignment) and test only the relevant instructions.
This is not a brown-noser comment, take it or leave it.  :t
10
The Workshop / Re: Screen Capture for 32 Bit machines
« Last post by zedd151 on July 22, 2018, 06:02:12 PM »
"Z, it would be nice to have a screen capture tool that captures to video..."

I'm sure that would be nice, but unfortunately time is a huge factor. Not enough time to do everything.
 
I made this little tool for a specific purpose for myself, and posted it for anyone else that might find it useful.
 
Anyone can use the source code and modify it to their heart's content, but at present I don't have a lot of time
to do much more with it, as I am working on another project which is taking up most of my leisure (coding) time.