Topic: Slight modification to StrLen in the masm32 library - 9% speed increase

zedd151

  • Member
  • ****
  • Posts: 871
Yeah, I know, strlen again.

I just wanted to try my hand at a simple optimization. I have been trying to learn new things.

I've been tinkering with some optimizations and reading some of Agner's material.


Code: [Select]

; **********************************************************************
;                     Modified version of StrLen
; **********************************************************************

    ; changed from align 4         <--------------------------<
    align 16
    OPTION PROLOGUE:NONE
    OPTION EPILOGUE:NONE
   
    StrLenz proc item:DWORD
        mov eax, [esp+4]            ; get pointer to string
        lea edx, [eax+3]            ; pointer+3 used in the end
        push ebp
        push edi
        mov ebp, 80808080h
    align 16 ; added align 16 here  <--------------------------<
    @@: 
   
    ; changed from repeat 3         <--------------------------<
   
    REPEAT 7
        mov edi, [eax]              ; read first 4 bytes
        add eax, 4                  ; increment pointer
        lea ecx, [edi-01010101h]    ; subtract 1 from each byte
        not edi                     ; invert all bytes
        and ecx, edi                ; and these two
        and ecx, ebp
        jnz nxt
    ENDM
        mov edi, [eax]              ; read first 4 bytes
       
        ;changed from add eax, 4    <--------------------------<
       
        add eax, 8                  ; 8 increment DWORD pointer
        lea ecx, [edi-01010101h]    ; subtract 1 from each byte
        not edi                     ; invert all bytes
        and ecx, edi                ; and these two
        and ecx, ebp
        jz  @B                      ; no zero bytes, continue loop
    nxt:
        test ecx, 00008080h         ; test first two bytes
        jnz @F
        shr ecx, 16                 ; not in the first 2 bytes
        add eax, 2
    @@:
        shl cl, 1                   ; use carry flag to avoid branch
        sbb eax, edx                ; compute length
        pop edi
        pop ebp
        ret 4
    StrLenz endp
   
    OPTION PROLOGUE:PrologueDef
    OPTION EPILOGUE:EpilogueDef

; **********************************************************************
;                     end Modified version of StrLen
; **********************************************************************
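
For readers who have not met the trick inside the REPEAT block, here is a small Python model of the four instructions that test a DWORD for a zero byte (lea / not / and / and). It is only a sketch for illustration: the names `zero_byte_mask` and `first_zero_index` are made up here and are not part of the masm32 library.

```python
MASK32 = 0xFFFFFFFF

def zero_byte_mask(dword):
    """Model of: lea ecx,[edi-01010101h] / not edi / and ecx,edi / and ecx,80808080h."""
    minus_ones = (dword - 0x01010101) & MASK32  # subtract 1 from each byte (borrows ripple upward)
    inverted = ~dword & MASK32                  # invert all bits
    # Bit 7 of a byte survives when that byte was zero. Bytes ABOVE a zero
    # byte can be flagged falsely because of the borrow, so only the lowest
    # flagged byte is trustworthy.
    return minus_ones & inverted & 0x80808080

def first_zero_index(dword):
    """Index (0..3, lowest address first) of the first zero byte, or None."""
    mask = zero_byte_mask(dword)
    for i in range(4):
        if mask & (0x80 << (8 * i)):
            return i
    return None

print(hex(zero_byte_mask(0x64636261)))   # "abcd": no zero byte, mask is 0
print(first_zero_index(0x00636261))      # "abc" + terminator: index 3
print(hex(zero_byte_mask(0x41410100)))   # byte 0 is zero, byte 1 (01h) is a false hit
```

The false hits are harmless for a string length routine because the code after `nxt:` only looks for the lowest flagged byte: first the `test ecx, 00008080h`, then the carry from `shl cl, 1`.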


Unrolling further didn't seem to offer much of an advantage.

Here are my results:

Code: [Select]

Genuine Intel(R) CPU           T2060  @ 1.60GHz (SSE3)
10243 cycles - StrLen
10271 cycles - StrLen
10253 cycles - StrLen
10279 cycles - StrLen
10248 cycles - StrLen
10277 cycles - StrLen
10262 cycles - StrLen
10330 cycles - StrLen
10242 cycles - StrLen
10215 cycles - StrLen

9305 cycles - StrLen modified
9335 cycles - StrLen modified
9335 cycles - StrLen modified
9308 cycles - StrLen modified
9336 cycles - StrLen modified
9345 cycles - StrLen modified
9312 cycles - StrLen modified
9345 cycles - StrLen modified
9348 cycles - StrLen modified
9305 cycles - StrLen modified

--- ok ---
« Last Edit: July 20, 2018, 04:04:44 AM by zedd151 »
I'm not always the sharpest knife in the drawer, but I have my moments.  :P

dedndave

  • Member
  • *****
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #1 on: September 13, 2015, 05:08:04 AM »
prescott w/htt
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
5103 cycles - StrLen
5120 cycles - StrLen
5637 cycles - StrLen
5105 cycles - StrLen
5105 cycles - StrLen
5105 cycles - StrLen
5109 cycles - StrLen
5099 cycles - StrLen
5125 cycles - StrLen
5122 cycles - StrLen

4542 cycles - StrLen modified
4542 cycles - StrLen modified
4545 cycles - StrLen modified
4535 cycles - StrLen modified
4938 cycles - StrLen modified
4907 cycles - StrLen modified
4546 cycles - StrLen modified
4552 cycles - StrLen modified
4553 cycles - StrLen modified
4536 cycles - StrLen modified

i think we've done this before   :biggrin:
but - it's ok that you want to play with speed tests, etc

my favorite was always the (32-bit code) 64-bit binary to ASCII string routines (signed or unsigned)

zedd151

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #2 on: September 13, 2015, 05:59:39 AM »
Oh yeah, I have seen the ones here and the old forum, just checking out how to optimize.

By the way, I changed the testbed in this post: I removed Sleep and replaced it with a countdown timer.

snippet of changed testbed

Code: [Select]
        align 16
        repeat rct
        mov eax, 1000000  ; replaced sleep with this simple register based countdown timer
        @@:
        dec eax
        jnz @b
        nops 32                   
;        invoke Sleep, 500   ; <----- removed the Sleep api, to reduce external calls

        counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
        invoke StrLen, testtext
        counter_end
        print str$(eax), 20h, "cycles - StrLen", 13, 10
        endm
        print chr$(13, 10)

The full testbed is attached.

Code: [Select]
Genuine Intel(R) CPU           T2060  @ 1.60GHz (SSE3)
6484 cycles - StrLen
5277 cycles - StrLen
5279 cycles - StrLen
5311 cycles - StrLen
5275 cycles - StrLen
5284 cycles - StrLen
5276 cycles - StrLen
5274 cycles - StrLen
5283 cycles - StrLen
5284 cycles - StrLen

4793 cycles - StrLen modified
4806 cycles - StrLen modified
4800 cycles - StrLen modified
4798 cycles - StrLen modified
4796 cycles - StrLen modified
4799 cycles - StrLen modified
4794 cycles - StrLen modified
4794 cycles - StrLen modified
4797 cycles - StrLen modified
4797 cycles - StrLen modified

--- ok ---

In the first test something was way off, roughly doubling the clock counts. I believe this test to be more or less accurate.

Here's the new testbed using eax instead of Sleep...

edit = add snippet
« Last Edit: July 20, 2018, 04:05:08 AM by zedd151 »

dedndave

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #3 on: September 13, 2015, 10:49:17 AM »
prescott w/htt
Code: [Select]
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
5523 cycles - StrLen
5342 cycles - StrLen
5316 cycles - StrLen
5327 cycles - StrLen
5356 cycles - StrLen
5358 cycles - StrLen
6238 cycles - StrLen
5387 cycles - StrLen
5339 cycles - StrLen
5256 cycles - StrLen

4835 cycles - StrLen modified
4676 cycles - StrLen modified
4664 cycles - StrLen modified
4696 cycles - StrLen modified
4821 cycles - StrLen modified
4683 cycles - StrLen modified
4735 cycles - StrLen modified
4817 cycles - StrLen modified
4667 cycles - StrLen modified
4763 cycles - StrLen modified

zedd151

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #4 on: September 13, 2015, 02:22:29 PM »
OOppss!

BIG flaw in my logic!

Code: [Select]
        ;changed from add eax, 4    <--------------------------<
        add eax, 8                  ; 8 increment DWORD pointer

Should NOT have changed this bit, changing it back to 'add eax, 4'
results in only a 1% gain. Too modest a gain to be worth bothering for.

so Nevermind.

I'll go crawl back under my rock.

Well this is odd:

Code: [Select]

Genuine Intel(R) CPU           T2060  @ 1.60GHz (SSE3)
773683 cycles - StrLen
768370 cycles - StrLen
766730 cycles - StrLen
765312 cycles - StrLen
764213 cycles - StrLen
770029 cycles - StrLen
772046 cycles - StrLen
774694 cycles - StrLen
774080 cycles - StrLen
786441 cycles - StrLen

849788 length - windows.inc - from StrLen orig

784046 cycles - StrLen modified
810693 cycles - StrLen modified
794416 cycles - StrLen modified
816771 cycles - StrLen modified
804008 cycles - StrLen modified
855046 cycles - StrLen modified
770992 cycles - StrLen modified
795870 cycles - StrLen modified
782002 cycles - StrLen modified
782005 cycles - StrLen modified

849788 length - windows.inc - from StrLenz mod
--- ok ---

Looking at the code, it should NOT be add eax, 8

But the result from actually running the modified version DOES give the same
result as the original. But it is also SLOWER.
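
One way to see how the `add eax, 8` version can return the right length and still be wrong is to model the pointer arithmetic. The sketch below is Python rather than MASM so it can be run anywhere. `zero_flags`, `model_strlen` and `make_buf` are made-up names, and the model mirrors only the address stepping and tail code of the posted routine, not its timing.

```python
def zero_flags(dword):
    """Bit 7 of each byte is set when that byte is zero (the lowest hit is exact)."""
    return (dword - 0x01010101) & ~dword & 0x80808080

def model_strlen(buf, last_step):
    """8 DWORD tests per pass: the first 7 advance by 4 bytes, the last one
    advances by last_step (4 = library version, 8 = the posted modification)."""
    pos = 0                                   # models eax - string start
    while True:
        for k in range(8):
            dword = int.from_bytes(buf[pos:pos + 4], "little")
            pos += 4 if k < 7 else last_step  # pointer is bumped before the test
            flags = zero_flags(dword)
            if flags:                         # label nxt:
                if not flags & 0x8080:        # zero is not in the first two bytes
                    flags >>= 16
                    pos += 2
                carry = 1 if flags & 0x80 else 0   # shl cl, 1
                return pos - 3 - carry        # sbb eax, edx (edx = start + 3)

def make_buf(n):
    """n non-zero bytes, the terminator, then padding so an overrun is visible."""
    return b"x" * n + b"\0" + b"y" * 64 + b"\0" * 8

# Terminator positions (0..71) for which the modified loop reports a wrong length.
wrong = [n for n in range(72) if model_strlen(make_buf(n), 8) != n]
print(wrong)
```

In this model the modified loop walks the string in 36-byte strides. The length is right when the terminator sits at offset 0..27 of a stride, it comes out 4 too large at offsets 28..31 (the pointer was bumped by 8 before the tail code ran), and offsets 32..35 are never read at all. 849788 mod 36 is 8, which would explain why the windows.inc run happened to agree with the original. Reading only 8 of every 9 DWORDs may also account for much of the gain seen in the first tests.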

I'll go back to my plugin thread now. :(

edit = add result from running the modded version of StrLen & last results

dedndave

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #5 on: September 13, 2015, 03:39:51 PM »
before you run tests, some verification that it works is a good idea   :P

to get a good test, you need to compare algos with different string lengths
personally, i rarely use StrLen for strings longer than, say, 1 Kb
but, the algo should work for any practical length

for really long strings, i take care to keep track of the length, rather than just measure it
that doesn't always apply - but, whenever possible
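
The "verify before you time" advice can be sketched as a sweep over every string length up to some limit, comparing a candidate routine against a trivially correct reference. This is a generic Python illustration, not code from the thread: `find_mismatches`, `byte_loop_len` and the deliberately wrong `broken_len` are hypothetical names.

```python
def byte_loop_len(buf):
    """Reference: plain byte-at-a-time scan for the terminator."""
    n = 0
    while buf[n] != 0:
        n += 1
    return n

def find_mismatches(candidate, max_len):
    """Return every length 0..max_len for which candidate disagrees with the reference."""
    bad = []
    for n in range(max_len + 1):
        # n non-zero bytes followed by zero padding
        buf = bytes(65 + (i % 26) for i in range(n)) + b"\0" * 16
        if candidate(buf) != byte_loop_len(buf):
            bad.append(n)
    return bad

def broken_len(buf):
    """Hypothetical bug for demonstration: rounds the length down to a multiple of 4."""
    return byte_loop_len(buf) & ~3

print(find_mismatches(byte_loop_len, 64))  # empty list: reference agrees with itself
print(find_mismatches(broken_len, 8))      # lengths that expose the bug
```

A single test string only exercises one terminator position, which is why a routine with a position-dependent bug can pass it. Sweeping the length covers every position within the unrolled block.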

zedd151

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #6 on: September 13, 2015, 04:37:39 PM »
Code: [Select]
before you run tests, some verification that it works is a good idea  :dazzled:


I'm such a noob.  ::)

While I was in pursuit of a more reliable testbed, I came across that and said hmmm, "I think I'm onto something"

Oh, well. Live and learn. If you don't forget anyway.

Any thoughts about my replacing the call to Sleep, with a register based timer?
The idea was to limit unnecessary external calls if possible.

dedndave

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #7 on: September 13, 2015, 05:21:29 PM »
honestly, i didn't look at the code

but, i can tell you that this horse has been pretty well beaten - lol
Michael beat it pretty well when writing the timing macros
then, Jochen and I (and others, including Hutch) beat it some more
i think that horse is dead

jj2007

  • Member
  • *****
  • Posts: 10543
  • Assembler is fun ;-)
    • MasmBasic
Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #8 on: September 13, 2015, 05:54:53 PM »
i think that horse is dead

Its offspring is pretty much alive, though :biggrin:

I've included MasmBasic Len() into your testbed. The cycle counts do not look convincing, though. Reminds me of the times when we tried to tame the P4... ::)

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
3483 cycles - StrLen
4547 cycles - StrLen
12083 cycles - StrLen
12028 cycles - StrLen
9701 cycles - StrLen
9695 cycles - StrLen
12040 cycles - StrLen
9697 cycles - StrLen
12042 cycles - StrLen
9697 cycles - StrLen

2610 cycles - MB Len()
2530 cycles - MB Len()
2614 cycles - MB Len()
2530 cycles - MB Len()
2611 cycles - MB Len()
2531 cycles - MB Len()
2609 cycles - MB Len()
2546 cycles - MB Len()
2610 cycles - MB Len()
2531 cycles - MB Len()

9110 cycles - StrLen modified
9039 cycles - StrLen modified
6752 cycles - StrLen modified
9047 cycles - StrLen modified
9067 cycles - StrLen modified
11005 cycles - StrLen modified
11086 cycles - StrLen modified
9060 cycles - StrLen modified
9084 cycles - StrLen modified
9053 cycles - StrLen modified

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7537
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #9 on: September 13, 2015, 07:01:12 PM »
 :biggrin:

You know how it is folks, algo tweaking is character building.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

jj2007

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #10 on: September 13, 2015, 07:39:31 PM »
Here is a more stable version.

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
results:
6919    bytes for StrLen
6919    bytes for StrLenz
6919    bytes for Len()

3588 cycles - StrLen
3302 cycles - StrLen
3297 cycles - StrLen
3309 cycles - StrLen
3281 cycles - StrLen
3267 cycles - StrLen
3255 cycles - StrLen
3263 cycles - StrLen
3416 cycles - StrLen
3253 cycles - StrLen

650 cycles - MB Len()
656 cycles - MB Len()
653 cycles - MB Len()
661 cycles - MB Len()
655 cycles - MB Len()
667 cycles - MB Len()
653 cycles - MB Len()
653 cycles - MB Len()
655 cycles - MB Len()
653 cycles - MB Len()

3092 cycles - StrLen modified
3064 cycles - StrLen modified
3075 cycles - StrLen modified
3052 cycles - StrLen modified
3058 cycles - StrLen modified
3053 cycles - StrLen modified
3057 cycles - StrLen modified
3063 cycles - StrLen modified
3069 cycles - StrLen modified
3051 cycles - StrLen modified

zedd151

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #11 on: September 13, 2015, 08:28:08 PM »
Here is a more stable version, but only stable on small algos.

timer results displayed in microseconds :shock:

Stability is a big problem, I know. I have read through much of the material
when Michael was developing the timer/counter macros on the old forum.

I started thinking the stability problem was only related to my aging laptop.
It is good to know that I have a lot of company regarding the stability issue.

Anyway, here is the latest...

Code: [Select]
    ; counter/timer testbed for small algos


    .nolist
        include \masm32\include\masm32rt.inc
    .686
    .XMM
    .MMX
        include \masm32\macros\timers.asm
        LoadFilex       proto :dword    ; function to Load and Read Text File

    .data
        align 16                        ; align 16
        testtext        dd 0            ; lpInputTextFile
       

        ; CTRX_COUNT  - larger values for smaller algos
                      ; inversely proportional to algo size
                     
        ; TIMER_COUNT - 1,000 for milliseconds (large algo)
                      ; 1,000,000 for microseconds (small algo)
       
        CTRX_COUNT  = 1000000           ; used internally by begin_counter macro
                             
        TIMER_COUNT = 1000000           ; set to display microseconds

        SLEEP_TIME  = 1000              ; # times the reg counter loops - 'spinup' time
       
        loopctr    dd 10                ; how many times the counter is run and displayed
        looptmr    dd 10                ; how many times the timer is run and displayed

        ;----------------------------------------------------------------------
        ;------------------------- macros for cleaner code --------------------
        ;----------------------------------------------------------------------
       
        display_cpu macro
            push 1                     
            call ShowCpu                ; to display cpu information
        endm       

        z_init macro zfile              ; initialization macro
            invoke GetCurrentProcess
            invoke SetProcessAffinityMask,eax,1
        endm
       
        zcount_begin macro timey        ; replace 'Sleep', begin counter
            mov eax, timey
            @@:
            dec eax
            jnz @b
            counter_begin CTRX_COUNT, HIGH_PRIORITY_CLASS
        endm
       
        zcount_end macro ztext          ; end counter, display results
            align 16
            counter_end
            print str$(eax), ztext, 13, 10
        endm

        ztimer_begin macro timey        ; replace 'Sleep', begin timer
            mov eax, timey
            @@:
            dec eax
            jnz @b
            timer_begin TIMER_COUNT, HIGH_PRIORITY_CLASS
        endm
       
        ztimer_end macro ztext          ; end timer, display results
            timer_end
            print str$(eax), ztext, 13, 10
        endm

        print_result macro text2        ; for string functions, display string length
            print str$(text2), " - bytes length",13, 10, 13, 10
        endm

    .code
   
    start:
        z_init                          ; initialize
        fn LoadFilex, "testbed.txt"     ; load test text file
        mov testtext, esi
       
        display_cpu                     ; display cpu information
    ; --------------------- test 1 ------------------------
        align 16                        ; better than align 4 or align 8 in many instances
        topt1:                          ; loop top, for how many tests will run
        zcount_begin SLEEP_TIME         ; begin counter, after running reg timer
           
        ; user code here for testing    ------------------------------------------------------
        ; user code here for testing    ------------------------------------------------------
        invoke StrLen, testtext         ; get text length
        ; user code here for testing    ------------------------------------------------------
        ; user code here for testing    ------------------------------------------------------
           
           
        zcount_end " cycles - <test name>"   ; end counter, display results
        dec loopctr   
        cmp loopctr, 0                  ; compare to see if all tests were run
        jnz topt1                       ; else, jmp to top of loop
       
        invoke StrLen, testtext         ; get text length
        print_result eax                ; print results
    ; --------------------- test 2 ------------------------
        align 16                        ; a tad better than align 4 or align 8 in many instances
        topt2:                          ; loop top, for how many tests will run
            ztimer_begin SLEEP_TIME     ; begin timer, after running reg countdown
           
        ; user code here for testing    ------------------------------------------------------
        ; user code here for testing    ------------------------------------------------------
        invoke StrLen, testtext         ; get text length
        ; user code here for testing    ------------------------------------------------------
        ; user code here for testing    ------------------------------------------------------
           
            ztimer_end " us - <test name>"
        dec looptmr   
        cmp looptmr, 0                  ; compare to see if all tests were run
        jnz topt2                       ; else, jmp to top of loop
       
        invoke StrLen, testtext         ; get text length
        print_result eax                ; print results
    ; -------------------- end tests ----------------------
   
    invoke GlobalFree, testtext         ; free the memory allocated by LoadFilex
    inkey chr$("-- done! --", 13, 10)   ; display finished message

    exit
   

    ShowCpu proc                        ; original ShowCpu function (I don't know who authored it)
        pushad
        sub esp, 80
        mov edi, esp
        xor ebp, ebp
        .Repeat
            lea eax, [ebp+80000002h]
            db 0Fh, 0A2h
            stosd
            mov eax, ebx
            stosd
            mov eax, ecx
            stosd
            mov eax, edx
            stosd
            inc ebp
        .Until ebp>=3
        push 1
        pop eax
        db 0Fh, 0A2h
        xor ebx, ebx
        xor esi, esi
        bt edx, 25
        adc ebx, esi
        bt edx, 26
        adc ebx, esi
        bt ecx, esi
        adc ebx, esi
        bt ecx, 9
        adc ebx, esi
        dec dword ptr [esp+4+32+80]
        .if Zero?
            mov edi, esp
            .Repeat
            .Break .if byte ptr [edi]!=32
                inc edi
            .Until 0
            .if byte ptr [edi]<32
                print chr$("pre-P4")
            .else
                print edi
            .endif
            .if ebx
                print chr$(32, 40, "SSE")
                print str$(ebx), 41, 13, 10
            .endif
        .endif
        add esp, 80
        mov [esp+32-4], ebx
        ifdef MbBufferInit
        call MbBufferInit
        endif
        popad
        ret 4
    ShowCpu endp
   
    LoadFilex proc lpName:dword             ; my own proc to load and read file into memory
    local hFile :dword, fl :dword, bRead :dword, hMem$ :dword
        invoke CreateFile, lpName, 80000000h, 0, 0, 3, 80h, 0
        mov hFile, eax
        invoke GetFileSize, hFile, 0
        inc eax                             ; to ensure the file is zero terminated
        mov fl, eax
        invoke GlobalAlloc, GPTR, fl
        mov hMem$, eax
        invoke ReadFile, hFile, hMem$, fl, addr bRead, 0
        invoke CloseHandle, hFile
        mov esi, hMem$                      ; returns ptr to memory in esi
        mov ecx, fl                         ; and filesize in ecx
        ret
    LoadFilex endp

   
    end start

For smaller algos, that take little time to run, it seems pretty stable.
but the problems with stability increase with algo size/time.

edit = forgot to add the latest results

Code: [Select]
Genuine Intel(R) CPU           T2060  @ 1.60GHz (SSE3)
1185 cycles - <test name>
1081 cycles - <test name>
1078 cycles - <test name>
1072 cycles - <test name>
1072 cycles - <test name>
1073 cycles - <test name>
1072 cycles - <test name>
1073 cycles - <test name>
1072 cycles - <test name>
1072 cycles - <test name>
1200 - bytes length

644 us - <test name>
644 us - <test name>
644 us - <test name>
644 us - <test name>
644 us - <test name>
644 us - <test name>
644 us - <test name>
645 us - <test name>
643 us - <test name>
643 us - <test name>
1200 - bytes length

-- done! --
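
A quick sanity check on the two sets of numbers is to convert the cycle count to time and compare. The sketch below rests on two assumptions that are not stated in the thread: that the T2060 runs at its nominal 1.60 GHz during the test, and that timer_end reports the total elapsed milliseconds for the whole TIMER_COUNT loop. The function names are made up for illustration.

```python
CPU_HZ = 1.60e9  # assumption: nominal clock of the T2060

def cycles_to_ns(cycles, hz=CPU_HZ):
    """Time for one call, in nanoseconds, from a per-call cycle count."""
    return cycles / hz * 1e9

def total_ms_to_ns_per_call(total_ms, calls):
    """Per-call time in nanoseconds when total_ms covers `calls` iterations."""
    return total_ms * 1e6 / calls

print(cycles_to_ns(1072))                       # about 670 ns per call
print(total_ms_to_ns_per_call(644, 1_000_000))  # 644 ns per call
```

Under those assumptions the two measurements agree to within a few percent, which suggests that with TIMER_COUNT = 1000000 the displayed figure is closer to nanoseconds per call than microseconds.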
« Last Edit: July 20, 2018, 04:06:01 AM by zedd151 »

zedd151

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #12 on: September 13, 2015, 08:41:05 PM »
@jj

See post #4 regarding the incorrectly changed 'add eax, 4'

After changing it back to add eax, 4 it is not much faster; some tests even showed it to be slower.

So scrap the original intent of this thread. And I believe I have a pretty stable testbed now, at least for
small algo testing.

NOW, I will proceed to  *try* to make it into a plugin. :D

edited for clarification.

hutch--

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #13 on: September 13, 2015, 09:06:17 PM »
This is still my favourite string length algo, not because it's the fastest under isolated conditions but because you can inline plug it into any other algo that needs a string length.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    sslen PROTO :DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    mov edx, rv(sslen,"1234567890")

    print str$(edx),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

sslen proc pstr:DWORD

    mov eax, [esp+4]
    sub eax, 1

  lp:
    add eax, 1
    cmp BYTE PTR [eax], 0
    jne lp

    sub eax, [esp+4]

    ret 4

sslen endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

zedd151

Re: Slight modification to StrLen in the masm32 library - 9% speed increase
« Reply #14 on: September 13, 2015, 09:25:19 PM »
I had forgotten about the time slice issue, so to start with a new time slice,
I put 'Sleep' back into the mix (invoke Sleep, 1).

results:
Code: [Select]
Genuine Intel(R) CPU           T2060  @ 1.60GHz (SSE3)
1078 cycles - <test name>
1073 cycles - <test name>
1077 cycles - <test name>
1072 cycles - <test name>
1073 cycles - <test name>
1072 cycles - <test name>
1071 cycles - <test name>
1072 cycles - <test name>
1072 cycles - <test name>
1071 cycles - <test name>
1200 - bytes length

646 µs - <test name>
644 µs - <test name>
645 µs - <test name>
645 µs - <test name>
650 µs - <test name>
644 µs - <test name>

I think I'll take it back out. Works fine without it for small algos.

Hiya hutch. Yes I know there are faster string length algos. But thanks.

Just experimenting now, to get a stable counter/timer for my plugin project.

Test results with the current testbed for "sslen"

Code: [Select]
Genuine Intel(R) CPU           T2060  @ 1.60GHz (SSE3)
2439 cycles - <test name>
2428 cycles - <test name>
2428 cycles - <test name>
2426 cycles - <test name>
2428 cycles - <test name>
2428 cycles - <test name>
2428 cycles - <test name>
2426 cycles - <test name>
2428 cycles - <test name>
2427 cycles - <test name>
1200 - bytes length

1520 µs - <test name>
1522 µs - <test name>
1521 µs - <test name>
1522 µs - <test name>
1521 µs - <test name>
1520 µs - <test name>
1520 µs - <test name>
1522 µs - <test name>
1527 µs - <test name>
1522 µs - <test name>
1200 - bytes length