MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords

Started by frktons, November 25, 2012, 02:48:06 AM

frktons

Quote from: nidud on November 27, 2012, 11:12:38 AM
Seems to be possible to compare the low 8 bytes:
COMISD dest,source

The destination operand is an XMM register.
The source can be either an XMM register or a memory location.

The flags are set according to the following rules:
Result          ZF  PF  CF
Unordered        1   1   1
Greater than     0   0   0
Less than        0   0   1
Equal            1   0   0


Maybe it's possible to shift (or rotate) the regs and then compare the high 8 bytes?

There are probably many ways to do it in more than one step.
I'm looking for a single SIMD instruction for the task, something like PTEST,
but available at the SSE3 level.
I'll do some more checking and see.
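
A note on the COMISD idea quoted above: COMISD treats the low 8 bytes of each operand as one double-precision value, so it answers a floating-point question, not a bitwise one. The C sketch below uses SSE2 intrinsics, and the helper names are made up for illustration. It shows one consequence: +0.0 and -0.0 differ in their bits yet compare as equal. NaN bit patterns are another case to watch, since they give the "unordered" result.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare the low 8 bytes the way COMISD sees them,
   i.e. as a single double-precision value. Returns 1 when "equal". */
static int low8_equal_as_double(uint64_t a, uint64_t b)
{
    double da, db;
    memcpy(&da, &a, sizeof da);   /* reinterpret the raw bits as a double */
    memcpy(&db, &b, sizeof db);
    return _mm_comieq_sd(_mm_set_sd(da), _mm_set_sd(db));  /* COMISD */
}

/* Hypothetical helper: compare the same 8 bytes as raw bits. */
static int low8_equal_as_bits(uint64_t a, uint64_t b)
{
    return a == b;
}
```

With a = 0 and b = 8000000000000000h (the bit patterns of +0.0 and -0.0), the first helper reports equal while the second reports different, so COMISD is only a safe byte comparison when the data cannot contain such patterns.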

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

jj2007

Quote from: frktons on November 27, 2012, 09:37:57 AM
Let's assume I use:


   PCMPEQD xmm0,xmm1


considering this and the others don't affect the flags,
how do I jmp somewhere after the test?
If they are equal or not, what tells me that?

psubd xmm0, xmm1
pmovmskb eax, xmm0   ; set byte mask in eax
test eax, eax
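
For the original PCMPEQD question, a commonly used pattern is to keep the compare and read its result through PMOVMSKB: every equal dword becomes all ones, so the 16-bit byte mask is 0FFFFh exactly when all four dwords match. The sketch below expresses that in C with SSE2 intrinsics; the function name is an assumption made for illustration, not something from the thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns 1 when all four dwords of a and b match. */
static int dwords_all_equal(const uint32_t a[4], const uint32_t b[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi32(va, vb);  /* PCMPEQD: equal dwords -> FFFFFFFFh */
    int mask   = _mm_movemask_epi8(eq);    /* PMOVMSKB: top bit of each byte */
    return mask == 0xFFFF;                 /* all 16 bytes flagged => all equal */
}
```

In assembly the same idea would be `pmovmskb eax, xmm0` followed by `cmp eax, 0FFFFh` and a `jne` to the "not equal" label.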

frktons

Quote from: jj2007 on November 27, 2012, 11:27:06 AM
Quote from: frktons on November 27, 2012, 09:37:57 AM
Let's assume I use:


   PCMPEQD xmm0,xmm1


considering this and the others don't affect the flags,
how do I jmp somewhere after the test?
If they are equal or not, what tells me that?

psubd xmm0, xmm1
pmovmskb eax, xmm0   ; set byte mask in eax
test eax, eax

Thanks Jochen, I'll arrange a new algo to test with your
suggestion.

frktons

I wrote a new CheckDestX PROC to use Jochen's suggestion:

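; -----------------------------------------------------------------------------------------------
; CheckDestX: compare the 4096-byte buffer Dest against the dword pattern 32323232h
; and copy 8 bytes of TestOK or TestERR into Check.
; Caveat: PMOVMSKB collects only the top bit of each byte, so after PSUBD a
; nonzero difference whose bytes are all below 80h still yields a zero mask.
CheckDestX proc

    lea eax, Dest               ; eax -> buffer to verify
    mov ebx, 32323232h          ; expected dword pattern (ebx is not preserved here)
    mov ecx, (4096/16)          ; number of 16-byte blocks

    movd xmm0, ebx
    pshufd xmm0, xmm0, 0        ; broadcast the pattern to all four dwords

@@:
    movdqa xmm1, [eax]          ; load one block (needs 16-byte alignment)
    psubd xmm1, xmm0            ; matching dwords become zero
    pmovmskb edx, xmm1          ; set byte mask in edx
    test edx, edx
    jne CheckErr

    add eax, 16                 ; next block
    dec ecx
    jnz @B

CheckOK:
    lea eax, Check
    movq mm0, qword ptr TestOK
    movq qword ptr [eax], mm0
    jmp  EndCheck

CheckErr:
    lea eax, Check
    movq mm0, qword ptr TestERR
    movq qword ptr [eax], mm0

EndCheck:
    ret

CheckDestX endp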


It gives the same results as the CheckDest PROC and
is probably quite fast, but I haven't tested its performance yet.
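
One thing worth checking before timing it: PMOVMSKB only looks at the top bit of each byte, so the PSUBD variant can miss small differences such as a single 33h where 32h is expected. The C sketch below (SSE2 intrinsics, function names invented for illustration) puts the PSUBD check next to a PCMPEQD-based one so the two can be compared on the same buffer. It is a sketch of the idea, not a drop-in replacement for the proc above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* PSUBD + PMOVMSKB + TEST, as in CheckDestX.
   Returns 1 when no difference is detected. */
static int check_psubd(const uint8_t *buf, size_t len, uint32_t pattern)
{
    __m128i pat = _mm_set1_epi32((int)pattern);
    for (size_t i = 0; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i d = _mm_sub_epi32(v, pat);      /* PSUBD */
        if (_mm_movemask_epi8(d) != 0)          /* sees only bit 7 of each byte */
            return 0;
    }
    return 1;
}

/* PCMPEQD + PMOVMSKB + CMP 0FFFFh.
   Returns 1 only when every dword equals the pattern. */
static int check_pcmpeqd(const uint8_t *buf, size_t len, uint32_t pattern)
{
    __m128i pat = _mm_set1_epi32((int)pattern);
    for (size_t i = 0; i + 16 <= len; i += 16) {
        __m128i v  = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq = _mm_cmpeq_epi32(v, pat);   /* PCMPEQD */
        if (_mm_movemask_epi8(eq) != 0xFFFF)    /* any unequal dword clears bits */
            return 0;
    }
    return 1;
}
```

On a 4096-byte buffer filled with 32h both functions report success. After changing one byte to 33h the difference in that dword is 00000001h, which has no byte with its top bit set, so `check_psubd` still reports success while `check_pcmpeqd` reports the error.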

But I'm still not satisfied with the CPUID results:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
---------------------------------------------------------
13876   cycles for MOV AX - Test OK
8740    cycles for LEA - Test OK
4131    cycles for MMX/PUNPCKLBW - Test OK
3153    cycles for XMM/PSHUFB - I shot - Test OK
2376    cycles for XMM/PSHUFB - II shot - Test OK
12336   STOSB - Test OK
---------------------------------------------------------
9242    cycles for MOV AX - Test OK
8731    cycles for LEA - Test OK
4131    cycles for MMX/PUNPCKLBW - Test OK
3153    cycles for XMM/PSHUFB - I shot - Test OK
2376    cycles for XMM/PSHUFB - II shot - Test OK
12330   STOSB - Test OK
---------------------------------------------------------

--- ok ---


This time I've used PrintCpu and the MasmBasic include,
but the results are still not accurate. My PC has SSSE3
capability, not SSE4.

Only Alex's code, which I used a couple of years ago, gives
a more accurate result:

┌─────────────────────────────────────────────────────────────[27-Nov-2012 at 10:57 GMT]─┐
│OS  : Microsoft Windows 7 Ultimate Edition, 64-bit Service Pack 1 (build 7601)          │
│CPU : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 2 logical core(s) with SSSE3           │


I've read the thread about the CPUID code, but didn't find anything new.
Should I still use Alex's code, or is there a more accurate routine for modern
CPUs?


dedndave

CPU's may have changed a lot
but, operating systems change at a slower rate   :biggrin:
i have a p4, which supports SSE3, running XP
XP does not support AVX instructions, nor does vista, as far as i know

our CPUID programs don't have to be updated very often, either - lol
while we might detect AVX support on a CPU (pretty easy),
it is another thing to judge the level of support offered by the OS (not so easy)

i would guess 97% of the ibm-compatible pc's in use today probably support SSE2
if you go any higher than SSE2, it might be a good idea to provide a fallback routine
it depends on what range of platforms you want your program to run on
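
The CPU-versus-OS distinction for AVX can be sketched as follows. The usual recipe is: CPUID function 1 must report both AVX (ECX bit 28) and OSXSAVE (ECX bit 27), and then XGETBV must show that the OS has enabled the XMM and YMM state (XCR0 bits 1 and 2). The C below assumes a GCC or Clang toolchain, since `<cpuid.h>` and the inline-assembly syntax are specific to those compilers, and the function name is invented for illustration.

```c
#include <cpuid.h>   /* GCC/Clang CPUID helper (toolchain assumption) */
#include <stdint.h>

/* Hypothetical helper: returns 1 when the CPU reports AVX and the OS
   has enabled saving of the YMM registers, else 0. */
static int avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    const unsigned int OSXSAVE = 1u << 27;  /* OS uses XSAVE/XRSTOR */
    const unsigned int AVX     = 1u << 28;  /* CPU implements AVX   */

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if ((ecx & (OSXSAVE | AVX)) != (OSXSAVE | AVX))
        return 0;

    /* XGETBV with ECX = 0 reads XCR0; it is only legal once OSXSAVE is set. */
    uint32_t lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x6u) == 0x6u;  /* bit 1 = XMM state, bit 2 = YMM state */
}
```

A program would call this once at startup and fall back to an SSE2 routine when it returns 0.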

frktons

Quote from: dedndave on November 27, 2012, 10:16:07 PM
CPU's may have changed a lot
but, operating systems change at a slower rate   :biggrin:
i have a p4, which supports SSE3, running XP
XP does not support AVX instructions, nor does vista, as far as i know

our CPUID programs don't have to be updated very often, either - lol
while we might detect AVX support on a CPU (pretty easy),
it is another thing to judge the level of support offered by the OS (not so easy)

i would guess 97% of the ibm-compatible pc's in use today probably support SSE2
if you go any higher than SSE2, it might be a good idea to provide a fallback routine
it depends on what range of platforms you want your program to run on

Yes Dave, the reasoning is quite fair.
I'm talking about the incorrect data shown by old routines
while we have newer routines, like Alex's, that are more
accurate, even if they don't go above SSE4.x.
Jochen's library is quite up to date and uses many SSE opcodes [I imagine],
but the PrintCpu macro [I think] should be updated to be more
accurate; it doesn't matter if it doesn't cover the latest AVX code or the like.

Well it is just my opinion, of course. Even the CPUID utility that Intel gives us
http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=7838
doesn't show that my PC has SSSE3 capabilities, but at least it doesn't say I have
SSE4.

dedndave

oh - i see what you mean
well - there have been a few that report erroneously
but, to programmatically determine if a specific extension is supported is pretty easy
i.e., i wouldn't use "Alex's" or "Jochen's" or even "Dave's" routine
their purpose is to identify the CPU and capabilities, primarily for forum comparisons

that is a different function than identifying extension support for a program to select routines
what you want to do is actually much simpler   :t

dedndave

;               0_1 values come from CPUID function 1
;               8_1 values come from CPUID function 80000001h
;
;                Source        Description
;
;                0_1edx:23     MMX
;                8_1edx:22     MMX+    (AMD only)
;                8_1edx:31     3DNow!  (AMD only)
;                8_1edx:30     3DNow!+ (AMD only)
;                0_1edx:25     SSE
;                0_1edx:26     SSE2
;                0_1ecx:00     SSE3
;                0_1ecx:09     SSSE3
;                0_1ecx:19     SSE4.1
;                0_1ecx:20     SSE4.2  (Intel first; later AMD CPUs report it too)
;                8_1ecx:06     SSE4a   (AMD only)
;                8_1ecx:11     SSE5    (AMD only) - this became one of the AVX feature bits


you can get most of what you want to know by examining ECX and EDX after this...
        mov     eax,1
        cpuid

for example, ECX bit 0 will be 1 if SSE3 is supported
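
The bit positions from the table can be read out in a few lines. The sketch below does it in C using the `<cpuid.h>` helper that ships with GCC and Clang (a toolchain assumption); the struct and function names are hypothetical and only serve the example.

```c
#include <cpuid.h>   /* GCC/Clang CPUID helper (toolchain assumption) */

/* Hypothetical result record: one 0/1 flag per extension. */
struct simd_level {
    int sse2, sse3, ssse3, sse41, sse42;
};

/* Query CPUID function 1 and decode the feature bits listed above. */
static struct simd_level query_simd(void)
{
    struct simd_level s = {0, 0, 0, 0, 0};
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        s.sse2  = (edx >> 26) & 1;  /* 0_1edx:26 */
        s.sse3  = (ecx >> 0)  & 1;  /* 0_1ecx:00 */
        s.ssse3 = (ecx >> 9)  & 1;  /* 0_1ecx:09 */
        s.sse41 = (ecx >> 19) & 1;  /* 0_1ecx:19 */
        s.sse42 = (ecx >> 20) & 1;  /* 0_1ecx:20 */
    }
    return s;
}
```

A program that wants to pick between, say, a PSHUFB routine and an SSE2 fallback would test `ssse3` once at startup and store the chosen routine's address.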

frktons

Thanks Dave.

CPUID is still unknown territory for me; I've never explored those bit fields.
Your introduction to the matter looks interesting, I'll give it a try.  :t

dedndave

i updated it a little Frank - you may want to reload the page   :P

oh - and you have to use .586 or higher  to use CPUID   :t

frktons

Quote from: jj2007 on November 27, 2012, 11:27:06 AM

psubd xmm0, xmm1
pmovmskb eax, xmm0 ; set byte mask in eax
test eax, eax



This code is a little bit faster on my Core 2 duo:

    psubd xmm1, xmm0
    pmovmskb edx, xmm1   ; set byte mask in dx
    cmp   dx, 0 



nidud

deleted

frktons

Well nidud  :t

this seems to work as well as psubd, at the same performance.
So we have a couple of alternatives, at least.