bsr eax, eax with eax=0

Started by jj2007, September 03, 2013, 12:34:10 AM


jj2007

The documentation for bsr states "If no set bit is found, the contents of the destination operand are undefined". Apparently, this means in practice that the destination register remains unchanged; there is a hint, however, that some early Intel CPUs behaved differently. I am particularly interested in the bsr eax, eax with eax=0 case. And of course, if somebody has a link to more detailed documentation, even better ;-)

Another source states "with a null source, lzcnt will return a null value, while bsr will leave the target unmodified"

See also Bsf/Bsr behavior with zero source (Chess programming).
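For comparison, here is a minimal sketch of how the two instructions relate. It assumes a CPU that reports LZCNT support and an assembler recent enough to know the mnemonic; the proc name CompareScans is made up for illustration. Note that per the Intel and AMD manuals, lzcnt with a zero source is fully defined: it writes the operand size (32 here) and sets the carry flag, while bsr only guarantees ZF=1.

```asm
.686
.model flat, stdcall
option casemap:none

.code
; CompareScans (hypothetical name): returns the bsr result in eax
; and the lzcnt result in edx for the same source value.
CompareScans proc src:DWORD
    mov ecx, src
    bsr eax, ecx        ; src=00010000h: eax=16, ZF=0
                        ; src=0: ZF=1, eax undefined per the manual
    lzcnt edx, ecx      ; src=00010000h: edx=15 (= 31-16)
                        ; src=0: edx=32, CF=1
    ret
CompareScans endp

end
```

On a CPU without LZCNT support the same encoding executes as a plain bsr, so the feature bit should be checked before relying on the lzcnt result.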

AMD Athlon(tm) Dual Core Processor 4450B (MMX, SSE, SSE2, SSE3)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso


Attached a simple testbed:
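include \masm32\MasmBasic\MasmBasic.inc        ; download
        Init
        PrintCpu                ; print the CPU brand string and features
        xor eax, eax            ; case 1: destination and source are the same zeroed register
        xor edx, edx
        xor ecx, ecx
        bsr eax, eax
        bsr edx, edx
        bsr ecx, ecx
        deb 4, "bsr reg, samereg(0)", eax, edx, ecx, flags
        mov eax, 12345678h      ; case 2: destination holds a marker value
        mov edx, eax
        mov ecx, eax
        xor esi, esi            ; the source is a different, zeroed register
        bsr eax, esi
        bsr edx, esi
        bsr ecx, esi
        deb 4, "bsr reg(12345678), otherreg(0)", x:eax, x:edx, x:ecx, flags
        Exit
end start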

dedndave

Intel(R) Pentium(R) 4 CPU 3.00GHz (MMX, SSE, SSE2, SSE3)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso

jj2007

Thanks, Dave.
For AMD it seems clear: destination unaffected. For Intel, the docu (Intel® 64 and IA-32 Architectures Software Developer's Manual, May 2012) says:
Quote:
Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined.

Most probably, the bold part could be written as "If NO most significant 1 bit is found, nothing is stored in the destination operand."

Gunther

Jochen,

your results:


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso

c:\tmp10>


Gunther
You have to know the facts before you can distort them.

jj2007

Grazie :icon14:

It's a pity that the effective behaviour is not properly documented by Intel. It means an extra jump (see oqTEST=? in the other thread - MasmBasic Ocmp.1 means "with extra jump" )...

Gunther

Jochen,

Quote from: jj2007 on September 03, 2013, 02:23:33 AM
Grazie :icon14:

It's a pity that the effective behaviour is not properly documented by Intel. It means an extra jump (see oqTEST=? in the other thread - MasmBasic Ocmp.1 means "with extra jump" )...

yes, that's right. Did you check it inside the AMD manuals?

Gunther

jj2007

Quote from: Gunther on September 03, 2013, 03:10:57 AM
Did you check it inside the AMD manuals?

No, I didn't check, but other sources say "unmodified".

FORTRANS

Hi,

   Made a program to test some older CPUs.


Pentium

bsr reg, samereg(0)
EAX     00000000
EDX     00000000
ECX     00000000
OV SF ZF AC PF CF
  0  0  1  1  1  0


bsr reg(12345678), otherreg(0)
EAX     12345678
EDX     12345678
ECX     12345678
OV SF ZF AC PF CF
  0  0  1  1  1  0

P-III

bsr reg, samereg(0)
EAX     00000000
EDX     00000000
ECX     00000000
OV SF ZF AC PF CF
  0  0  1  0  1  0

bsr reg(12345678), otherreg(0)
EAX     12345678
EDX     12345678
ECX     12345678
OV SF ZF AC PF CF
  0  0  1  0  1  0

P-MMX

bsr reg, samereg(0)
EAX     00000000
EDX     00000000
ECX     00000000
OV SF ZF AC PF CF
  0  0  1  1  1  0

bsr reg(12345678), otherreg(0)
EAX     12345678
EDX     12345678
ECX     12345678
OV SF ZF AC PF CF
  0  0  1  1  1  0

Mobile Intel(R) Celeron(R) processor     600MHz (MMX, SSE, SSE2)

bsr reg, samereg(0)
eax 0
edx 0
ecx 0
flags: cZso

bsr reg(12345678), otherreg(0)
x:eax 12345678
x:edx 12345678
x:ecx 12345678
flags: cZso


Regards,

Steve N.

Edit:

   Fixed flags as pointed out.

SRN

jj2007

Quote from: FORTRANS on September 03, 2013, 04:44:15 AM
   Made a program to test some older CPUs.

Thanks, Steve. Zero flag should be set, though, after a bsr reg32, zeroreg32 - or do I misunderstand something?

OV SF ZF AC PF CF
  1  0  0  1  1  1

FORTRANS

Hi,

   No.  You are correct.  A programming error.  I will update
the flags after I fix the error(s).

Thanks,

Steve

MichaelW

I created a 16-bit DOS executable, and changed the code to preserve the flags on the stack for each BSR and display them separately.

.model small,c
.386
include support.asm
.stack
.data
.code
.startup
    xor eax, eax
    xor edx, edx
    xor ecx, ecx
    bsr eax, eax
    pushf
    bsr edx, edx
    pushf
    bsr ecx, ecx
    pushf
    print "bsr reg, samereg(0):",NL
    print dword$(eax),chr$(9)
    print dword$(edx),chr$(9)
    print dword$(ecx),NL
    popf
    call dumpflags
    popf
    call dumpflags
    popf
    call dumpflags
    mov eax, 12345678h
    mov edx, eax
    mov ecx, eax
    xor esi, esi
    bsr eax, esi
    pushf
    bsr edx, esi
    pushf
    bsr ecx, esi
    pushf
    print "bsr reg(12345678), otherreg(0):",NL
    print hexdword$(eax),"h",chr$(9)
    print hexdword$(edx),"h",chr$(9)
    print hexdword$(ecx),"h",NL
    popf
    call dumpflags
    popf
    call dumpflags
    popf
    call dumpflags
    print NL
    call waitkey
.exit
end


Results running on my P4 Northwood system under Windows XP, my P3 system under Windows XP, my P2 system under Windows ME, and my old IBM SLC2-66 system under MS-DOS 6.22:

bsr reg, samereg(0):
0       0       0
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
bsr reg(12345678), otherreg(0):
12345678h       12345678h       12345678h
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC


Unfortunately, my AMD-K5 system is down.
Well Microsoft, here's another nice mess you've gotten us into.

jj2007

Thanks, Steve and Michael.

So basically, it is true what the sites linked above say: destination register is not undefined but rather unchanged.

Unfortunately, relying on that undocumented behaviour would not be good programming practice...
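A minimal sketch of a variant that relies only on the documented behaviour, i.e. on ZF being set for a zero source. The proc name SafeBsr and the choice of -1 as the result for zero input are assumptions made for illustration; cmovz replaces the extra jump but needs a P6-class CPU or later.

```asm
.686                        ; cmovz needs a P6-class CPU or later
.model flat, stdcall
option casemap:none

.code
; SafeBsr (hypothetical helper): returns the index of the highest
; set bit of src in eax, or -1 if src is zero.
SafeBsr proc src:DWORD
    mov edx, -1             ; result chosen here for the zero case
    bsr eax, src            ; ZF=1 if src is zero; eax is then undefined per the manual
    cmovz eax, edx          ; overwrite eax only in the zero case
    ret
SafeBsr endp

end
```

On older CPUs the cmovz line can be replaced by a jnz over a mov eax, edx, which is the extra jump mentioned earlier in the thread.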

Antariy

Well, I said the same on this subject - it just seems illogical to trash the reg if there's "no operation" to be done.


Intel(R) Celeron(R) CPU 2.13GHz (MMX, SSE, SSE2, SSE3)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso



And I think this construction

      and   ecx, 07FFFh
      or ecx, 1      ; make sure there is no zero input

      bsr   ecx, ecx

is superfluous.

INTEL 80386 PROGRAMMER'S REFERENCE MANUAL (1986)

Description

BSR scans the bits in the second word or doubleword operand from the most
significant bit to the least significant bit. The ZF flag is cleared if the
bits are all 0; otherwise, ZF is set and the destination register is loaded
with the bit index of the first set bit found when scanning in the reverse
direction.


Not a word about touching the destination register.

Intel's instruction set reference (2008)

Description
Searches the source operand (second operand) for the most significant set bit (1 bit).
If a most significant 1 bit is found, its bit index is stored in the destination operand
(first operand). The source operand can be a register or a memory location; the
destination operand is a register. The bit index is an unsigned offset from bit 0 of the
source operand. If the content source operand is 0, the content of the destination
operand is undefined.



This is a very clear definition ::)

Antariy

Probably

xor ecx,0ffffh
jz @zero
and ecx,7fffh


is still better, because a forward jump is usually predicted as "not taken", so this instruction should take less time than or ecx, 1, which explicitly changes the register and so breaks the prediction. Also, if the XOR yields zero, there is no need to process further.

nidud

deleted