Topic: Libraries vs Win API

Manos

  • Guest
Libraries vs Win API
« on: May 07, 2013, 01:24:32 AM »
 :idea:
Because my previous thread "C/C++ vs Assembler" became very long and drifted off topic, I am starting a new one.

I never believed that the Win API was slower than the crt or other libraries.
My previous thread gave me the chance to check that.
Therefore, I decided to test other functions as well.

I wrote a simple function, named strCopy.
This is the code:

strCopy proc pDest:DWORD, pSource:DWORD
   mov ecx, pSource   
    mov edx, pDest      
    copyLoop:
       mov al, byte ptr [ecx]   
       inc ecx         
       mov byte ptr [edx],al
       inc edx         
       cmp al, 0      
       jnz copyLoop
      mov al, byte ptr 0
   ret
strCopy endp


The above function takes advantage of the processor's two pipelines.

I tested this against szCopy from Hutch's library, the lstrcpy Win API function, and the crt strcpy.
Here are my source and the averaged results:

.data

szText            db "abcdefghijklmnopqrstuvwxyz123456789", 0
szBuffer         db 64 dup (0)      ; 64 zero bytes (the count goes before dup)

includelib \MSVCRT.LIB

strcpy PROTO C :DWORD, :DWORD
strCopy PROTO pDest:DWORD, pSource:DWORD

;......................................
LOCAL dwTime   :DWORD

invoke GetTickCount
      mov dwTime, eax
      push esi
      xor esi, esi
      TestLoop:
         invoke lstrcpy, addr szBuffer, addr szText
      ;   invoke strcpy, addr szBuffer, addr szText
      ;   invoke szCopy, addr szText, addr szBuffer
      ;   invoke strCopy, addr szBuffer, addr szText
         inc esi
         cmp esi, 10000000
         jb TestLoop
    
    pop esi   
       
    invoke GetTickCount
    sub eax, dwTime
   PrintDec eax


Results (average):
 
lstrcpy (API)  Ticks 570
szCopy (Hutch lib)  Ticks 515
strCopy (mine)   Ticks 359
strcpy (crt)   Ticks 172

Manos.









jj2007

  • Member
  • *****
  • Posts: 10547
  • Assembler is fun ;-)
    • MasmBasic
Re: Libraries vs Win API
« Reply #1 on: May 07, 2013, 02:03:03 AM »
Nothing beats rep movsd. If you need inspiration, try this and this thread.

Or, even better, the Code location sensitivity of timings thread, but beware, it's advanced stuff. Unfortunately the thread no longer has the attachments, but here they are below.

RuiLoureiro

  • Member
  • ****
  • Posts: 819
Re: Libraries vs Win API
« Reply #2 on: May 07, 2013, 02:30:03 AM »
Nothing beats rep movsd.
              Jochen, I don't get that on my P4  :icon14:
              Well, I tested rep movsb.

jj2007

  • Member
  • *****
  • Posts: 10547
  • Assembler is fun ;-)
    • MasmBasic
Re: Libraries vs Win API
« Reply #3 on: May 07, 2013, 03:25:52 AM »
              Jochen, I don't get that on my P4  :icon14:

No 5, MemCoP4 is best for you ;-)

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7542
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Libraries vs Win API
« Reply #4 on: May 08, 2013, 10:53:51 PM »
Manos,

Here is a quick optimisation for your original algo, 1 less instruction per iteration and unrolled by 2.

Code:
IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    StrCpy2 PROTO src:DWORD,dst:DWORD
    strCopy PROTO pDest:DWORD, pSource:DWORD

    .data
    align 4
      item db "The game is done, I've won I've won quote she and whistled thrice",0
    align 4
      buff db "                                                                     "

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    push esi

  REPEAT 8

    invoke GetTickCount
    push eax

    mov esi, 10000000
  @@:
    invoke strCopy, ADDR buff,ADDR item
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Original",13,10

    invoke GetTickCount
    push eax

    mov esi, 10000000
  @@:
    invoke StrCpy2, ADDR item,ADDR buff
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Modified",13,10

  ENDM

    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

strCopy proc pDest:DWORD, pSource:DWORD
   mov ecx, pSource   
    mov edx, pDest     
    copyLoop:
       mov al, byte ptr [ecx]   
       inc ecx         
       mov byte ptr [edx],al
       inc edx         
       cmp al, 0     
       jnz copyLoop
      mov al, byte ptr 0
   ret
strCopy endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

StrCpy2 proc src:DWORD,dst:DWORD

    mov ecx, [esp+4]    ; src
    mov edx, [esp+8]    ; dst
    push esi
    mov esi, -1

  @@:
    add esi, 1
    movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    test eax, eax
    jz @F

    add esi, 1
    movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    test eax, eax
    jnz @B

  @@:

    pop esi
    ret 8

StrCpy2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

Timing on my Core2 Quad.


531 Original
344 Modified
531 Original
344 Modified
516 Original
359 Modified
516 Original
343 Modified
532 Original
343 Modified
532 Original
343 Modified
516 Original
359 Modified
516 Original
344 Modified
Press any key to continue ...
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

Vortex

  • Member
  • *****
  • Posts: 2337
Re: Libraries vs Win API
« Reply #5 on: May 09, 2013, 02:59:52 AM »
Pentium IV, 3.2 Ghz :

Code:
1000 Original
844 Modified
984 Original
844 Modified
984 Original
844 Modified
969 Original
859 Modified
969 Original
843 Modified
985 Original
844 Modified
984 Original
844 Modified
984 Original
844 Modified
Press any key to continue ...

Manos

  • Guest
Re: Libraries vs Win API
« Reply #6 on: May 09, 2013, 03:57:48 AM »
Steve,

your function is much faster than my own.

Results, (average):

strCopy (mine) 360 ticks
StrCpy2 (your) 270 ticks


This is because you replaced the two pointer increments with a single index
increment and repeated the loop body twice in the function.

I wrote a new one, named strCopyNew, using the same trick,
and the results are identical to those of your function.

strCopyNew proc pDest:DWORD, pSource:DWORD
   mov ecx, pSource   
    mov edx, pDest
   push esi
   mov esi, -1
    @@:
      add esi, 1
       movzx eax, byte ptr [ecx + esi]   
       mov byte ptr [edx + esi], al
       test eax, eax
       jz @F
   
      add esi, 1
       movzx eax, byte ptr [ecx + esi]   
       mov byte ptr [edx + esi], al
       test eax, eax
       jnz @B
   
   @@:   
   pop esi
   ret
strCopyNew endp


Results, (average):
strCopyNew 270 ticks

Manos.
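
For readers who want to try the same trick outside MASM, here is a minimal C sketch of it, not a translation of anyone's library routine. One shared index stands in for the two pointer increments, and the loop body is written out twice. The name str_copy_unrolled2 is made up for this illustration, and an optimising compiler is free to rearrange the loop, so timings will not map one-to-one onto the assembler versions above.

```c
#include <stddef.h>

/* Hypothetical C counterpart of the unrolled copy loop: a single index
   serves both source and destination, and the body is repeated twice
   per pass. Copies the terminating zero and returns dst. */
char *str_copy_unrolled2(char *dst, const char *src)
{
    size_t i = 0;
    for (;;) {
        char c = src[i];            /* first copy of the body */
        dst[i] = c;
        if (c == '\0') break;
        ++i;

        c = src[i];                 /* second copy of the body */
        dst[i] = c;
        if (c == '\0') break;
        ++i;
    }
    return dst;
}
```

As with the routines in this thread, the caller has to make sure the destination buffer is large enough for the string plus its terminator.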

RuiLoureiro

  • Member
  • ****
  • Posts: 819
Re: Libraries vs Win API
« Reply #7 on: May 09, 2013, 04:17:01 AM »
Manos,
Here is a quick optimisation for your original algo, 1 less instruction per iteration and unrolled by 2.
              Hutch, here is another optimisation: 1 less instruction per iteration

Results on my P4
Quote
1016 Original
844 Modified
781 Modified again
969 Original
844 Modified
796 Modified again
985 Original
844 Modified
765 Modified again
1000 Original
844 Modified
781 Modified again
985 Original
859 Modified
781 Modified again
1000 Original
828 Modified
782 Modified again
984 Original
859 Modified
766 Modified again
984 Original
844 Modified
781 Modified again
Press any key to continue ...

Code:
IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    StrCpy3 PROTO src:DWORD,dst:DWORD
    StrCpy2 PROTO src:DWORD,dst:DWORD
    strCopy PROTO pDest:DWORD, pSource:DWORD

    .data
    align 4
      item db "The game is done, I've won I've won quote she and whistled thrice",0
    align 4
      buff db "                                                                     "

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    push esi

  REPEAT 8

    invoke GetTickCount
    push eax

    mov esi, 10000000
  @@:
    invoke strCopy, ADDR buff,ADDR item
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Original",13,10
;----------------------------------------
    invoke GetTickCount
    push eax
    mov esi, 10000000
  @@:
    invoke StrCpy2, ADDR item,ADDR buff
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Modified",13,10

;----------------------------------------
    invoke GetTickCount
    push eax
    mov esi, 10000000
  @@:
    invoke StrCpy3, ADDR item,ADDR buff
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Modified again",13,10

  ENDM

    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

strCopy proc pDest:DWORD, pSource:DWORD
   mov ecx, pSource   
    mov edx, pDest     
    copyLoop:
       mov al, byte ptr [ecx]   
       inc ecx         
       mov byte ptr [edx],al
       inc edx         
       cmp al, 0     
       jnz copyLoop
      mov al, byte ptr 0
   ret
strCopy endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

StrCpy2 proc src:DWORD,dst:DWORD

    mov ecx, [esp+4]    ; src
    mov edx, [esp+8]    ; dst
    push esi
    mov esi, -1

  @@:
    add esi, 1
    movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    test eax, eax
    jz @F

    add esi, 1
    movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    test eax, eax
    jnz @B

  @@:

    pop esi
    ret 8

StrCpy2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

StrCpy3 proc src:DWORD,dst:DWORD

    mov ecx, [esp+4]    ; src
    mov edx, [esp+8]    ; dst
    push esi
    mov esi, -1

  @@:
    add esi, 1
    movzx eax, WORD PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    or   al, al
    jz @F

    add esi, 1
    ;movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], ah
    or  ah, ah
    jnz @B

  @@:

    pop esi
    ret 8

StrCpy3 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

RuiLoureiro

  • Member
  • ****
  • Posts: 819
Re: Libraries vs Win API
« Reply #8 on: May 09, 2013, 04:41:43 AM »
Here is another optimisation

Quote
1015 Original
829 Modified
750 Modified again
671 Modified again 2
954 Original
828 Modified
750 Modified again
672 Modified again 2
953 Original
812 Modified
766 Modified again
703 Modified again 2
937 Original
829 Modified
750 Modified again
703 Modified again 2
937 Original
813 Modified
750 Modified again
672 Modified again 2
937 Original
828 Modified
750 Modified again
656 Modified again 2
938 Original
812 Modified
750 Modified again
656 Modified again 2
969 Original
813 Modified
765 Modified again
656 Modified again 2
Press any key to continue ...

Code:
IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    StrCpy4 PROTO src:DWORD,dst:DWORD
    StrCpy3 PROTO src:DWORD,dst:DWORD
    StrCpy2 PROTO src:DWORD,dst:DWORD
    strCopy PROTO pDest:DWORD, pSource:DWORD

    .data
    align 4
      item db "The game is done, I've won I've won quote she and whistled thrice",0
    align 4
      buff db "                                                                     "

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    push esi

    invoke StrCpy3, ADDR item,ADDR buff
    print   addr buff,13,10
    invoke StrCpy4, ADDR item,ADDR buff
    print   addr buff,13,10


  REPEAT 8

    invoke GetTickCount
    push eax

    mov esi, 10000000
  @@:
    invoke strCopy, ADDR buff,ADDR item
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Original",13,10
;----------------------------------------
    invoke GetTickCount
    push eax
    mov esi, 10000000
  @@:
    invoke StrCpy2, ADDR item,ADDR buff
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Modified",13,10

;----------------------------------------
    invoke GetTickCount
    push eax
    mov esi, 10000000
  @@:
    invoke StrCpy3, ADDR item,ADDR buff
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Modified again",13,10

;----------------------------------------
    invoke GetTickCount
    push eax
    mov esi, 10000000
  @@:
    invoke StrCpy4, ADDR item,ADDR buff
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print ustr$(eax)," Modified again 2",13,10

  ENDM

    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

strCopy proc pDest:DWORD, pSource:DWORD
   mov ecx, pSource   
    mov edx, pDest     
    copyLoop:
       mov al, byte ptr [ecx]   
       inc ecx         
       mov byte ptr [edx],al
       inc edx         
       cmp al, 0     
       jnz copyLoop
      mov al, byte ptr 0
   ret
strCopy endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

StrCpy2 proc src:DWORD,dst:DWORD

    mov ecx, [esp+4]    ; src
    mov edx, [esp+8]    ; dst
    push esi
    mov esi, -1

  @@:
    add esi, 1
    movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    test eax, eax
    jz @F

    add esi, 1
    movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    test eax, eax
    jnz @B

  @@:

    pop esi
    ret 8

StrCpy2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

StrCpy3 proc src:DWORD,dst:DWORD

    mov ecx, [esp+4]    ; src
    mov edx, [esp+8]    ; dst
    push esi
    mov esi, -1

  @@:
    add esi, 1
    movzx eax, WORD PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    or   al, al
    jz @F

    add esi, 1
    ;movzx eax, BYTE PTR [ecx+esi]
    mov BYTE PTR [edx+esi], ah
    or  ah, ah
    jnz @B

  @@:

    pop esi
    ret 8

StrCpy3 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

StrCpy4 proc src:DWORD,dst:DWORD
    mov ecx, [esp+4]    ; src
    mov edx, [esp+8]    ; dst
    push esi
    xor     esi, esi
  @@:
    movzx eax, WORD PTR [ecx+esi]
    mov BYTE PTR [edx+esi], al
    or   al, al
    jz @F

    mov BYTE PTR [edx+esi+1], ah
    add esi, 2

    or  ah, ah
    jnz @B

  @@:

    pop esi
    ret 8

StrCpy4 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

StrCpy5 proc src:DWORD,dst:DWORD

    mov ecx, [esp+4]    ; src
    mov edx, [esp+8]    ; dst
    push esi
    mov esi, -1

  @@:
    add esi, 1
    mov eax, [ecx+esi]
    mov BYTE PTR [edx+esi], al
    or   al, al
    jz   @F

    add esi, 1
    mov BYTE PTR [edx+esi], ah
    or  ah, ah
    jz   @F

    shr eax, 16
    add esi, 1
    mov BYTE PTR [edx+esi], al
    or  al, al
    jz  @F

    add esi, 1
    mov BYTE PTR [edx+esi], ah
    or  ah, ah
    jnz @B

  @@:

    pop esi
    ret 8

StrCpy5 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

jj2007

  • Member
  • *****
  • Posts: 10547
  • Assembler is fun ;-)
    • MasmBasic
Re: Libraries vs Win API
« Reply #9 on: May 09, 2013, 06:27:50 AM »
Celeron M:
1016 Original
891 Modified
890 Modified again
891 Modified again 2
890 Modified again J
672 MasmBasic copy


"Modified again J" is a dword load plus bswap:
  @@:
    inc esi
    mov eax, [ecx+esi]
    mov BYTE PTR [edx+esi], al
    test al, al
    jz @F

    inc esi
    mov BYTE PTR [edx+esi], ah
    test ah, ah
    jz @F
   
    bswap eax
    inc esi
    mov BYTE PTR [edx+esi], ah
    test ah, ah
    jz @F

    inc esi
    mov BYTE PTR [edx+esi], al
    test al, al
    jnz @B


test al, al instead of or al, al is a lot faster on my CPU.
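
The byte order in the "Modified again J" loop (ah stored third, al stored fourth) can be checked with a small C model. This is only a sketch: load_le32 imitates the little-endian dword load, bswap32 is a portable stand-in for the bswap instruction, and nth_byte_via_bswap is a made-up helper that picks each source byte from the same register half the loop uses.

```c
#include <stdint.h>

/* Models a 32-bit little-endian load, as `mov eax, [ecx+esi]` does on x86. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Portable stand-in for the bswap instruction: reverses the four bytes. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns source byte n (0..3) the way the loop picks it up:
   bytes 0 and 1 from al/ah before the swap, bytes 2 and 3 from ah/al after. */
unsigned char nth_byte_via_bswap(const unsigned char *p, int n)
{
    uint32_t eax = load_le32(p);
    if (n == 0) return (unsigned char)(eax & 0xFFu);          /* al */
    if (n == 1) return (unsigned char)((eax >> 8) & 0xFFu);   /* ah */
    eax = bswap32(eax);
    if (n == 2) return (unsigned char)((eax >> 8) & 0xFFu);   /* ah after bswap */
    return (unsigned char)(eax & 0xFFu);                      /* al after bswap */
}
```

After the swap the third source byte sits in ah and the fourth in al, which is why the stores appear in that order.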

RuiLoureiro

  • Member
  • ****
  • Posts: 819
Re: Libraries vs Win API
« Reply #10 on: May 09, 2013, 07:02:13 AM »
Jochen,
             or is faster on my P4.
             I tried StrCpy5 but it is not faster! But you found bswap! ;)

Manos

  • Guest
Re: Libraries vs Win API
« Reply #11 on: May 09, 2013, 07:50:51 AM »
I tested Hutch's new function again, along with the crt function and the API.

.data

szText            db "abcdefghijklmnopqrstuvwxyz123456789", 0
szBuffer         db 64 dup (0)

includelib \MSVCRT.LIB

StrCpy2 PROTO src:DWORD,dst:DWORD
strcpy PROTO C :DWORD, :DWORD

Results:
lstrcpy   (API)         563
StrCpy2 (Hutch)    297
strcpy     (crt)          172


Manos.




qWord

  • Member
  • *****
  • Posts: 1473
  • The base type of a type is the type itself
    • SmplMath macros
Re: Libraries vs Win API
« Reply #12 on: May 09, 2013, 08:34:40 AM »
I'm curious whether anyone is interested in a more statistical approach to measurement, where the timing is taken for a single call to the corresponding code and several milliseconds are waited between calls, to make sure that the cache is not involved as it is with the loop-x-thousand-times method. Especially for memory-intensive functions like MemCopy or table-based methods, this might be a better approach for measuring the speed...
MREAL macros - when you need floating point arithmetic while assembling!
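
A rough C sketch of that single-call approach, under several assumptions: it uses the POSIX calls clock_gettime and nanosleep as stand-ins for whatever timer and delay a Windows build would use, the function name median_single_call_ns is invented for this example, and the median is just one reasonable way to summarise the samples. Whether the pause really pushes the data out of the cache depends on what else is running, and for a copy this short the timer overhead itself is a large part of each sample.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef char *(*copy_fn)(char *dst, const char *src);

static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a, y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Times `samples` individual calls of `fn`, sleeping `pause_ms` milliseconds
   before each one, and returns the (upper) median duration in nanoseconds,
   or -1 if the sample buffer cannot be allocated. */
long long median_single_call_ns(copy_fn fn, char *dst, const char *src,
                                int samples, long pause_ms)
{
    assert(samples > 0 && pause_ms >= 0);
    long long *ns = malloc((size_t)samples * sizeof *ns);
    if (ns == NULL) return -1;

    struct timespec pause = { pause_ms / 1000, (pause_ms % 1000) * 1000000L };
    for (int i = 0; i < samples; ++i) {
        struct timespec t0, t1;
        nanosleep(&pause, NULL);               /* idle between calls */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(dst, src);                          /* exactly one timed call */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        ns[i] = (t1.tv_sec - t0.tv_sec) * 1000000000LL
              + (t1.tv_nsec - t0.tv_nsec);
    }
    qsort(ns, (size_t)samples, sizeof *ns, cmp_ll);
    long long median = ns[samples / 2];
    free(ns);
    return median;
}
```

A call such as median_single_call_ns(strcpy, buffer, text, 100, 100) would take 100 samples with a 100 ms pause before each.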

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7542
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Libraries vs Win API
« Reply #13 on: May 09, 2013, 11:56:15 AM »
qWord is right here: stabilising the timings with a 100 ms delay gives a more realistic result, even though it does not change much. I did a quick play with the unroll rate and found 3 was very slightly faster, while 4 and higher made no difference. Changing the proc alignment slowed it down, and aligning the first label also slowed it down.

My own interest in these simple byte copy routines is how fast they are the first time, a factor I call "attack", as opposed to streamed tests, since it is not uncommon to perform this operation in the middle of a much more complex algorithm where the call overhead is a problem. Now, while a 4 byte copy will usually be faster, it runs into alignment problems with string data, which you cannot guarantee to be 4 byte aligned, whereas the simple byte level copy is insensitive to alignment.
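
One common way around the alignment problem, sketched here in C as an assumption rather than as anything posted in this thread, is to copy single bytes until the source address is 4 byte aligned and only then switch to 4 byte loads. The names has_zero_byte and str_copy_aligned4 are invented for the example. Note that the aligned load can read up to 3 bytes past the terminator; it stays inside the same aligned dword, so it cannot cross a page, but strictly speaking it reads outside the string, so the source buffer is assumed to be padded.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in v is zero (well-known bit trick). */
static int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Copies src to dst including the terminator and returns dst.
   Assumes the source buffer extends to the end of the aligned dword
   that holds the terminator. */
char *str_copy_aligned4(char *dst, const char *src)
{
    char *d = dst;

    /* Byte copy until the source address is 4 byte aligned. */
    while (((uintptr_t)src & 3u) != 0) {
        if ((*d++ = *src++) == '\0') return dst;
    }

    /* Aligned 4 byte loads; stop before the dword that holds the zero. */
    for (;;) {
        uint32_t v;
        memcpy(&v, src, sizeof v);
        if (has_zero_byte(v)) break;
        memcpy(d, &v, sizeof v);
        src += 4;
        d += 4;
    }

    /* Finish the last 1 to 4 bytes, terminator included, one at a time. */
    while ((*d++ = *src++) != '\0') { }
    return dst;
}
```

The extra head and tail loops are exactly the kind of call overhead mentioned above, so for short strings the plain byte loop may still come out ahead.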

Manos

  • Guest
Re: Libraries vs Win API
« Reply #14 on: May 09, 2013, 06:36:36 PM »
I'm curious whether anyone is interested in a more statistical approach to measurement, where the timing is taken for a single call to the corresponding code and several milliseconds are waited between calls, to make sure that the cache is not involved as it is with the loop-x-thousand-times method?
If my poor English does not mislead me, you mean that I took only one measurement for each case.
Let me assure you that I know how to take measurements.
I am a physicist, and the first thing I was taught at university is that we must take many measurements and average them.
If you look at my first post in this thread, you'll see that I refer to averages.
If I wrote out all my measurements here, they would fill two pages.

I have run a new test with a string of double the length.
Here are the results:

szText   db "abcdefghijklmnopqrstuvwxyz123456789abcdefghijklmnopqrstuvwxyz123456789", 0
szBuffer         db 128 dup (0)

lstrcpy  (API)         883
StrCpy2  (Hutch)  383
strcpy   (crt)           330


Manos.