The fastest way to fill a dword array with string values

Started by frktons, December 09, 2012, 02:49:23 AM

dedndave

you could add a little test to see if the CPU supports SSSE3
if it does not, skip that test and say "Frktons II Step requires SSSE3 support" and go on to the next test
    mov     eax,1
    cpuid
    test    ch,2
    jz      no_ssse3_support


note: .586 or higher required to assemble "cpuid" without hard-coding
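
A minimal sketch of how that check could gate one benchmark, assuming the MASM32 SDK is installed at \masm32. RunFrktonsII is a hypothetical placeholder for the timed SSSE3 code, not a routine from this thread. The comment next to .586 shows the hard-coded alternative mentioned in the note (0Fh,0A2h is the CPUID opcode).

```asm
; Sketch only. Assumes the MASM32 SDK at \masm32.
; RunFrktonsII is a hypothetical stand-in for the SSSE3 test.
include \masm32\include\masm32rt.inc
.586                            ; lets ML assemble "cpuid" directly
                                ; (otherwise hard-code it: db 0Fh,0A2h)
.code

RunFrktonsII proc               ; hypothetical: the timed SSSE3 code goes here
        ret
RunFrktonsII endp

start:
        mov     eax,1           ; CPUID leaf 1 = feature flags
        cpuid                   ; overwrites EAX, EBX, ECX and EDX
        test    ch,2            ; ECX bit 9 = SSSE3
        jz      no_ssse3_support
        call    RunFrktonsII
        jmp     next_test

no_ssse3_support:
        print   "Frktons II Step requires SSSE3 support",13,10

next_test:
        exit                    ; the remaining tests would go here instead
end start
```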

frktons

Quote from: dedndave on December 10, 2012, 10:50:09 PM
you could add a little test to see if the CPU supports SSSE3

Good idea, my dear Master, I'm going to test it. By the way, one of my PCs doesn't
have SSSE3, if I remember correctly.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

 :t
------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------------------
2270    cycles for Dedndave code - 5 GPRs
1940    cycles for Frktons I Step / 2 GPRs
1950    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
1165    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2242    cycles for Dedndave code - 5 GPRs
1962    cycles for Frktons I Step / 2 GPRs
1949    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
1176    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2248    cycles for Dedndave code - 5 GPRs
1939    cycles for Frktons I Step / 2 GPRs
1956    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
1161    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2268    cycles for Dedndave code - 5 GPRs
1947    cycles for Frktons I Step / 2 GPRs
1987    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
1191    cycles for Jochen / 5 XMM
------------------------------------------------------------------------

--- ok ---

dedndave

here's a little routine i wrote just for you, Frank   :biggrin:
CpuFeatures PROC

;call once during program initialization
;store the value returned in EAX (AL, actually) for feature verification
;
;0 = no extended features present
;1 = MMX
;2 = SSE
;3 = SSE2
;4 = SSE3
;5 = SSSE3
;6 = SSE4.1

        push    ebx          ;cpuid overwrites EBX, which callers expect preserved
        mov     eax,1
        cpuid
        pop     ebx
        bswap   edx          ;byte swap: MMX (bit 23) -> bit 15, SSE1 (bit 25) -> bit 1, SSE2 (bit 26) -> bit 2
        xor     eax,eax
        test    dh,80h       ;MMX
        jz      CpuF00

        inc     eax
        test    dl,2         ;SSE1
        jz      CpuF00

        inc     eax
        test    dl,4         ;SSE2
        jz      CpuF00

        inc     eax
        test    cl,1         ;SSE3 (ECX bit 0)
        jz      CpuF00

        inc     eax
        test    ch,2         ;SSSE3 (ECX bit 9)
        jz      CpuF00

        inc     eax
        test    ecx,80000h   ;SSE4.1 (ECX bit 19)
        jz      CpuF00

        inc     eax

CpuF00: ret

CpuFeatures ENDP


        .DATA?

bFeatures db ?

        .CODE

Start:  call    CpuFeatures
        mov     bFeatures,al
;
;
;


now, if you want to see if they have SSSE3....
        cmp     bFeatures,5
        jb      no_ssse3_support
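
The same byte can also pick a code path at run time instead of only skipping a test. The following is a rough sketch rather than anything from the test bed: FillGPR, FillMMX, FillSSE2 and FillSSSE3 are hypothetical names for the fill routines, and bFeatures is assumed to hold the value stored at Start.

```asm
; Sketch only: FillGPR, FillMMX, FillSSE2 and FillSSSE3 are hypothetical
; stand-ins for the fill routines; bFeatures is the byte stored at Start.

PickFill PROC                   ; returns the address of the routine in EAX

        mov     eax,offset FillSSSE3
        cmp     bFeatures,5     ; 5 or higher = SSSE3 available
        jae     PickDone

        mov     eax,offset FillSSE2
        cmp     bFeatures,3     ; 3 or higher = SSE2 available
        jae     PickDone

        mov     eax,offset FillMMX
        cmp     bFeatures,1     ; 1 or higher = MMX available
        jae     PickDone

        mov     eax,offset FillGPR   ; plain general purpose registers

PickDone:
        ret

PickFill ENDP
```

A caller would do "call PickFill" once, keep the address, and then "call eax" (or call through the saved dword) each time the array has to be filled.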

frktons

Quote from: dedndave on December 10, 2012, 11:40:52 PM
here's a little routine i wrote just for you, Frank   :biggrin:


Thanks Dave, and here is something I wrote for you  :biggrin:

Quote
----------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
----------------------------------------------------------------------
1915    cycles for Dedndave code - 5 GPRs
1837    cycles for Frktons I Step / 2 GPRs
1893    cycles for Frktons I Step / 4 GPRs
1318    cycles for Frktons II Step / 5 MMX
1365    cycles for Frktons II Step / 5 MMX without SSSE3
836     cycles for Jochen / 5 XMM
----------------------------------------------------------------------
1913    cycles for Dedndave code - 5 GPRs
1892    cycles for Frktons I Step / 2 GPRs
1893    cycles for Frktons I Step / 4 GPRs
1317    cycles for Frktons II Step / 5 MMX
1366    cycles for Frktons II Step / 5 MMX without SSSE3
836     cycles for Jochen / 5 XMM
----------------------------------------------------------------------

dedndave

------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------------------
2250    cycles for Dedndave code - 5 GPRs
1943    cycles for Frktons I Step / 2 GPRs
1960    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
2148    cycles for Frktons II Step / 5 MMX without SSSE3
1161    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2242    cycles for Dedndave code - 5 GPRs
1952    cycles for Frktons I Step / 2 GPRs
1971    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
2268    cycles for Frktons II Step / 5 MMX without SSSE3
1161    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2271    cycles for Dedndave code - 5 GPRs
1950    cycles for Frktons I Step / 2 GPRs
1965    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
2149    cycles for Frktons II Step / 5 MMX without SSSE3
1166    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2240    cycles for Dedndave code - 5 GPRs
1941    cycles for Frktons I Step / 2 GPRs
1958    cycles for Frktons I Step / 4 GPRs
Frktons II Step requires a PC with SSSE3
2144    cycles for Frktons II Step / 5 MMX without SSSE3
1166    cycles for Jochen / 5 XMM
------------------------------------------------------------------------

frktons

Some improvements before the final release:
Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
1925    cycles for Dedndave code - 5 GPRs
1882    cycles for Frktons I Step / 2 GPRs
1941    cycles for Frktons I Step / 4 GPRs
1896    cycles for Frktons I Step / 4 GPRs - no external tab
1096    cycles for Frktons II Step / 5 MMX with SSSE3
1206    cycles for Frktons II Step / 5 MMX without SSSE3
836     cycles for Jochen / 5 XMM
------------------------------------------------------------------------
1917    cycles for Dedndave code - 5 GPRs
1882    cycles for Frktons I Step / 2 GPRs
1917    cycles for Frktons I Step / 4 GPRs
1893    cycles for Frktons I Step / 4 GPRs - no external tab
1091    cycles for Frktons II Step / 5 MMX with SSSE3
1206    cycles for Frktons II Step / 5 MMX without SSSE3
836     cycles for Jochen / 5 XMM
------------------------------------------------------------------------

--- ok ---

Getting close to the target.  :t

dedndave

prescott w/htt
------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------------------
2379    cycles for Dedndave code - 5 GPRs
1947    cycles for Frktons I Step / 2 GPRs
1961    cycles for Frktons I Step / 4 GPRs
1986    cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
2460    cycles for Frktons II Step / 5 MMX without SSSE3
1152    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2251    cycles for Dedndave code - 5 GPRs
1966    cycles for Frktons I Step / 2 GPRs
1961    cycles for Frktons I Step / 4 GPRs
1978    cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
2456    cycles for Frktons II Step / 5 MMX without SSSE3
1151    cycles for Jochen / 5 XMM
------------------------------------------------------------------------

MichaelW

P4 Northwood:

------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2
------------------------------------------------------------------------
2275    cycles for Dedndave code - 5 GPRs
1979    cycles for Frktons I Step / 2 GPRs
1996    cycles for Frktons I Step / 4 GPRs
2034    cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
2970    cycles for Frktons II Step / 5 MMX without SSSE3
896     cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2330    cycles for Dedndave code - 5 GPRs
1970    cycles for Frktons I Step / 2 GPRs
1997    cycles for Frktons I Step / 4 GPRs
2734    cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
2937    cycles for Frktons II Step / 5 MMX without SSSE3
905     cycles for Jochen / 5 XMM
------------------------------------------------------------------------
Well Microsoft, here's another nice mess you've gotten us into.

frktons

It looks like the MMX code is slow on pre-Core Duo CPUs: roughly 2450 to 2970 cycles on the two P4s above versus about 1200 on my Core 2.

We'll see the XMM one in action next.  :P

frktons

I've almost finished the study. Here is the fastest code
I could come up with so far, but there is still room to optimize.
Quote
------------------------------------------------------------------------
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
1950    cycles for Dedndave code - 5 GPRs
1924    cycles for Frktons I Step / 2 GPRs
1871    cycles for Frktons I Step / 4 GPRs
1888    cycles for Frktons I Step / 4 GPRs - no external tab
1079    cycles for Frktons II Step / 5 MMX with SSSE3
1199    cycles for Frktons II Step / 5 MMX without SSSE3
801     cycles for Frktons III Step / XMM/MMX with SSE2
831     cycles for Jochen / 5 XMM
------------------------------------------------------------------------
1916    cycles for Dedndave code - 5 GPRs
1930    cycles for Frktons I Step / 2 GPRs
1872    cycles for Frktons I Step / 4 GPRs
1915    cycles for Frktons I Step / 4 GPRs - no external tab
1083    cycles for Frktons II Step / 5 MMX with SSSE3
1209    cycles for Frktons II Step / 5 MMX without SSSE3
796     cycles for Frktons III Step / XMM/MMX with SSE2
831     cycles for Jochen / 5 XMM
------------------------------------------------------------------------

--- ok ---

dedndave

prescott w/htt
------------------------------------------------------------------------
Intel(R) Pentium(R) 4 CPU 3.00GHz

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------------------
2266    cycles for Dedndave code - 5 GPRs
1976    cycles for Frktons I Step / 2 GPRs
1994    cycles for Frktons I Step / 4 GPRs
2011    cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
2459    cycles for Frktons II Step / 5 MMX without SSSE3
1141    cycles for Frktons III Step / XMM/MMX with SSE2
1183    cycles for Jochen / 5 XMM
------------------------------------------------------------------------
2257    cycles for Dedndave code - 5 GPRs
1983    cycles for Frktons I Step / 2 GPRs
1973    cycles for Frktons I Step / 4 GPRs
1996    cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
2462    cycles for Frktons II Step / 5 MMX without SSSE3
1143    cycles for Frktons III Step / XMM/MMX with SSE2
1184    cycles for Jochen / 5 XMM
------------------------------------------------------------------------

sinsi


------------------------------------------------------------
AMD Phenom(tm) II X6 1100T Processor

Instructions: MMX, SSE1, SSE2, SSE3
------------------------------------------------------------
948     cycles for Dedndave code - 5 GPRs
813     cycles for Frktons I Step / 2 GPRs
838     cycles for Frktons I Step / 4 GPRs
995     cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
1089    cycles for Frktons II Step / 5 MMX without SSSE3
747     cycles for Frktons III Step / XMM/MMX with SSE2
666     cycles for Jochen / 5 XMM
------------------------------------------------------------
950     cycles for Dedndave code - 5 GPRs
823     cycles for Frktons I Step / 2 GPRs
845     cycles for Frktons I Step / 4 GPRs
996     cycles for Frktons I Step / 4 GPRs - no external tab
Frktons II Step requires a PC with SSSE3
1084    cycles for Frktons II Step / 5 MMX without SSSE3
749     cycles for Frktons III Step / XMM/MMX with SSE2
665     cycles for Jochen / 5 XMM
------------------------------------------------------------

--- ok ---

jj2007

Frktons I Step / 2 GPRs and Frktons I Step / 4 GPRs are remarkably fast, but you should have a look at the output.