The MASM Forum

General => The Laboratory => Topic started by: jj2007 on September 03, 2013, 12:34:10 AM

Title: bsr eax, eax with eax=0
Post by: jj2007 on September 03, 2013, 12:34:10 AM
The documentation for bsr states "If no set bit is found, the contents of the destination operand are undefined". Apparently, in practice this means that the destination register remains unchanged; there is a hint (http://semipublic.comp-arch.net/wiki/Bit_Scanning_Instructions), however, that some early Intel CPUs behaved differently. I am particularly interested in the case bsr eax, eax with eax=0. And of course, if somebody has a link to more detailed documentation, even better ;-)

Another source (http://code.google.com/p/corkami/wiki/x86oddities) states "with a null source, lzcnt will return a null value, while bsr will leave the target unmodified"

See also Bsf/Bsr behavior with zero source (Chess programming) (http://chessprogramming.wikispaces.com/BitScan#Processor%20Instructions%20for%20Bitscans-x86-Bsf/Bsr%20behavior%20with%20zero%20source).
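To make the difference between "undefined" and "unchanged" concrete, here is a minimal C model of 32-bit BSR. It assumes the observed behaviour reported in this thread (a zero source leaves the destination untouched), which Intel does not document. `bsr32_model` is a hypothetical name used only for illustration; it returns ZF as a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of 32-bit BSR (illustration only).
 * Returns ZF. If src != 0, stores the index of the highest set bit in
 * *dest and clears ZF. If src == 0, sets ZF and leaves *dest untouched:
 * that is the behaviour observed in this thread, NOT what Intel
 * documents (Intel only says the destination is "undefined"). */
static bool bsr32_model(uint32_t *dest, uint32_t src)
{
    assert(dest != NULL);
    if (src == 0)
        return true;                  /* ZF = 1, *dest unchanged */
    unsigned index = 31;
    while ((src & (UINT32_C(1) << index)) == 0)
        index--;                      /* scan from bit 31 downwards */
    *dest = index;
    return false;                     /* ZF = 0 */
}
```

With this model, "bsr reg, samereg(0)" leaves 0 in the register and "bsr reg(12345678), otherreg(0)" leaves 12345678h, matching the results posted below.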

AMD Athlon(tm) Dual Core Processor 4450B (MMX, SSE, SSE2, SSE3)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso


Attached a simple testbed:
include \masm32\MasmBasic\MasmBasic.inc        ; download (http://masm32.com/board/index.php?topic=94.0)
        Init
        PrintCpu
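        xor eax, eax                ; clear the three test registers
        xor edx, edx
        xor ecx, ecx
        bsr eax, eax                ; zero source, same register as destination
        bsr edx, edx
        bsr ecx, ecx
        deb 4, "bsr reg, samereg(0)", eax, edx, ecx, flags
        mov eax, 12345678h          ; nonzero marker value in the destinations
        mov edx, eax
        mov ecx, eax
        xor esi, esi                ; zero source in a different register
        bsr eax, esi
        bsr edx, esi
        bsr ecx, esi
        deb 4, "bsr reg(12345678), otherreg(0)", x:eax, x:edx, x:ecx, flags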
        Exit
end start
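For readers without MasmBasic, the two testbed cases can be sketched in C with GCC/Clang extended inline assembly. This assumes an x86 or x86-64 target and a GCC-compatible compiler; `hw_bsr32`, `hw_bsr32_same` and the guard macro `HAVE_X86_BSR` are hypothetical names. On other compilers or targets the helpers are simply not compiled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_BSR 1

/* "bsr reg(12345678), otherreg(0)": dest and src are separate operands.
 * dest is "+r" (read-write), so its old value is already in the register
 * when BSR runs; SETZ captures ZF right after the instruction. */
static bool hw_bsr32(uint32_t *dest, uint32_t src)
{
    uint32_t d = *dest;
    unsigned char zf;
    __asm__("bsrl %2, %0\n\t"
            "setz %1"
            : "+r"(d), "=q"(zf)
            : "rm"(src)
            : "cc");
    *dest = d;
    return zf != 0;
}

/* "bsr reg, samereg(0)": operand %0 is both source and destination. */
static bool hw_bsr32_same(uint32_t *reg)
{
    uint32_t r = *reg;
    unsigned char zf;
    __asm__("bsrl %0, %0\n\t"
            "setz %1"
            : "+r"(r), "=q"(zf)
            :
            : "cc");
    *reg = r;
    return zf != 0;
}
#else
#define HAVE_X86_BSR 0
#endif
```

On the CPUs reported in this thread, both helpers would return true (ZF set) and leave the register as it was when the source is 0; nothing in Intel's documentation guarantees that.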
Post by: dedndave on September 03, 2013, 12:58:02 AM
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (MMX, SSE, SSE2, SSE3)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso
Post by: jj2007 on September 03, 2013, 01:41:02 AM
Thanks, Dave.
For AMD it seems clear: destination unaffected. For Intel, the documentation (Intel® 64 and IA-32 Architectures Software Developer’s Manual, May 2012) says:
Quote
Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined

Most probably, the final sentence could be written as "If NO most significant 1 bit is found, nothing is stored in the destination operand."
Post by: Gunther on September 03, 2013, 02:08:38 AM
Jochen,

your results:

Code:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso


Gunther
Post by: jj2007 on September 03, 2013, 02:23:33 AM
Grazie

It's a pity that the actual behaviour is not properly documented by Intel. It means an extra jump (see oqTEST=? in the other thread: MasmBasic Ocmp.1 means "with extra jump" (http://masm32.com/board/index.php?topic=2222.msg24033#new))...
Post by: Gunther on September 03, 2013, 03:10:57 AM
Jochen,

Quote
Grazie

It's a pity that the actual behaviour is not properly documented by Intel. It means an extra jump (see oqTEST=? in the other thread: MasmBasic Ocmp.1 means "with extra jump" (http://masm32.com/board/index.php?topic=2222.msg24033#new))...

yes, that's right. Did you check it inside the AMD manuals?

Gunther
Post by: jj2007 on September 03, 2013, 03:24:17 AM
Quote
Did you check it inside the AMD manuals?

No, I didn't check, but other sources say "unmodified".
Post by: FORTRANS on September 03, 2013, 04:44:15 AM
Hi,

   Made a program to test some older CPUs.

Code:
Pentium

bsr reg, samereg(0)
EAX     00000000
EDX     00000000
ECX     00000000
 OV SF ZF AC PF CF
  0  0  1  1  1  0


bsr reg(12345678), otherreg(0)
EAX     12345678
EDX     12345678
ECX     12345678
 OV SF ZF AC PF CF
  0  0  1  1  1  0

P-III

bsr reg, samereg(0)
EAX     00000000
EDX     00000000
ECX     00000000
 OV SF ZF AC PF CF
  0  0  1  0  1  0

bsr reg(12345678), otherreg(0)
EAX     12345678
EDX     12345678
ECX     12345678
 OV SF ZF AC PF CF
  0  0  1  0  1  0

P-MMX

bsr reg, samereg(0)
EAX     00000000
EDX     00000000
ECX     00000000
 OV SF ZF AC PF CF
  0  0  1  1  1  0

bsr reg(12345678), otherreg(0)
EAX     12345678
EDX     12345678
ECX     12345678
 OV SF ZF AC PF CF
  0  0  1  1  1  0

Mobile Intel(R) Celeron(R) processor     600MHz (MMX, SSE, SSE2)

bsr reg, samereg(0)
eax 0
edx 0
ecx 0
flags: cZso

bsr reg(12345678), otherreg(0)
x:eax 12345678
x:edx 12345678
x:ecx 12345678
flags: cZso

Regards,

Steve N.

Edit:

   Fixed flags as pointed out.

SRN
Post by: jj2007 on September 03, 2013, 05:28:08 AM
Quote
   Made a program to test some older CPUs.

Thanks, Steve. Zero flag should be set, though, after a bsr reg32, zeroreg32 - or do I misunderstand something?

OV SF ZF AC PF CF
  1  0  0  1  1  1
Post by: FORTRANS on September 03, 2013, 05:35:00 AM
Hi,

   No.  You are correct.  A programming error.  I will update
the flags after I fix the error(s).

Thanks,

Steve
Post by: MichaelW on September 03, 2013, 06:16:25 AM
I created a 16-bit DOS executable, and changed the code to preserve the flags on the stack for each BSR and display them separately.
Code:
.model small,c
.386
include support.asm
.stack
.data
.code
.startup
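    xor eax, eax                ; zero the three test registers
    xor edx, edx
    xor ecx, ecx
    bsr eax, eax                ; zero source, same register as destination
    pushf                       ; save the flags produced by this BSR
    bsr edx, edx
    pushf
    bsr ecx, ecx
    pushf
    print "bsr reg, samereg(0):",NL
    print dword$(eax),chr$(9)
    print dword$(edx),chr$(9)
    print dword$(ecx),NL
    popf                        ; flags come back in reverse order:
    call dumpflags              ; ECX's BSR first,
    popf
    call dumpflags              ; then EDX's,
    popf
    call dumpflags              ; then EAX's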
    mov eax, 12345678h
    mov edx, eax
    mov ecx, eax
    xor esi, esi
    bsr eax, esi
    pushf
    bsr edx, esi
    pushf
    bsr ecx, esi
    pushf
    print "bsr reg(12345678), otherreg(0):",NL
    print hexdword$(eax),"h",chr$(9)
    print hexdword$(edx),"h",chr$(9)
    print hexdword$(ecx),"h",NL
    popf
    call dumpflags
    popf
    call dumpflags
    popf
    call dumpflags
    print NL
    call waitkey
.exit
end

Results from my P4 Northwood system under Windows XP, my P3 system under Windows XP, my P2 system under Windows ME, and my old IBM SLC2-66 system under MS-DOS 6.22:
Code:
bsr reg, samereg(0):
0       0       0
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
bsr reg(12345678), otherreg(0):
12345678h       12345678h       12345678h
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC

Unfortunately, my AMD-K5 system is down.
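The output above uses the DEBUG-style flag mnemonics (NV/OV, UP/DN, EI/DI, PL/NG, ZR/NZ, NA/AC, PE/PO, NC/CY). As a reading aid, here is a small C sketch that turns a 16-bit FLAGS value into that notation, using the standard bit positions OF=11, DF=10, IF=9, SF=7, ZF=6, AF=4, PF=2, CF=0. `format_flags` is a hypothetical helper, not the `dumpflags` routine from MichaelW's support.asm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One entry per flag, in the order DEBUG prints them. */
struct flag_name {
    unsigned bit;          /* bit position in FLAGS */
    const char *set;       /* mnemonic when the bit is 1 */
    const char *clear;     /* mnemonic when the bit is 0 */
};

static const struct flag_name flag_names[] = {
    {11, "OV", "NV"}, {10, "DN", "UP"}, {9, "EI", "DI"}, {7, "NG", "PL"},
    {6,  "ZR", "NZ"}, {4,  "AC", "NA"}, {2, "PE", "PO"}, {0, "CY", "NC"},
};

/* Writes e.g. "NV UP EI PL ZR NA PE NC" into out:
 * 8 two-letter mnemonics + 7 spaces + NUL = 24 bytes. */
static void format_flags(uint16_t flags, char out[24])
{
    size_t count = sizeof flag_names / sizeof flag_names[0];
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        unsigned bit_set = (flags >> flag_names[i].bit) & 1u;
        if (i > 0)
            strcat(out, " ");
        strcat(out, bit_set ? flag_names[i].set : flag_names[i].clear);
    }
}
```

For example, 0x0246 (ZF, PF, IF and the always-set bit 1) formats as NV UP EI PL ZR NA PE NC, the line printed above, and 0x0056 (ZF, AF, PF) formats as NV UP DI PL ZR AC PE NC, which corresponds to Steve's Pentium table.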
Post by: jj2007 on September 03, 2013, 06:45:53 AM
Thanks, Steve and Michael.

So basically, what the sites linked above say is true: in practice the destination register is not undefined but rather unchanged.

Unfortunately, relying on that undocumented behaviour would not be good programming practice...
Post by: Antariy on September 03, 2013, 10:50:40 AM
Well, I said the same on this subject: it just seems illogical to trash the register if there's "no operation" to be done.

Code:
Intel(R) Celeron(R) CPU 2.13GHz (MMX, SSE, SSE2, SSE3)

bsr reg, samereg(0)
eax             0
edx             0
ecx             0
flags:          cZso

bsr reg(12345678), otherreg(0)
x:eax           12345678
x:edx           12345678
x:ecx           12345678
flags:          cZso


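And I think the or ecx, 1 in this construction

      and   ecx, 07FFFh
      or    ecx, 1      ; make sure there is no zero input
      bsr   ecx, ecx

is superfluous: with a zero input, bsr ecx, ecx leaves ecx at 0 anyway, which is the same result the or ecx, 1 guard produces.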
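The argument can be checked exhaustively with a small C sketch. `guarded` models and/or/bsr, and `unguarded` models and/bsr under the assumption (measured in this thread, not documented by Intel) that a zero source leaves ECX unchanged. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit; the caller guarantees x != 0. */
static unsigned highest_bit(uint32_t x)
{
    assert(x != 0);
    unsigned i = 31;
    while (((x >> i) & 1u) == 0)
        i--;
    return i;
}

/* and ecx, 07FFFh / or ecx, 1 / bsr ecx, ecx: BSR never sees zero. */
static unsigned guarded(uint32_t ecx)
{
    ecx &= 0x7FFFu;
    ecx |= 1u;
    return highest_bit(ecx);
}

/* and ecx, 07FFFh / bsr ecx, ecx: for a zero source we ASSUME the
 * observed behaviour, i.e. ECX keeps its old value, which is 0. */
static unsigned unguarded(uint32_t ecx)
{
    ecx &= 0x7FFFu;
    if (ecx == 0)
        return ecx;
    return highest_bit(ecx);
}

/* Returns 1 if both sequences agree for every 16-bit input. */
static int guard_is_redundant(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        if (guarded(v) != unguarded(v))
            return 0;
    return 1;
}
```

Setting bit 0 never changes the highest set bit of a nonzero value, and for a zero value both versions end with 0, so the two agree only because of the undocumented "unchanged" behaviour.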

INTEL 80386 PROGRAMMER'S REFERENCE MANUAL (1986)

Description

BSR scans the bits in the second word or doubleword operand from the most
significant bit to the least significant bit. The ZF flag is cleared if the
bits are all 0; otherwise, ZF is set and the destination register is loaded
with the bit index of the first set bit found when scanning in the reverse
direction.


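Not a word about touching the destination register. (This 1986 text even has the ZF sense reversed: all the tests in this thread show ZF set when the source is 0.)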

Intel's instruction set reference (2008)

Description
Searches the source operand (second operand) for the most significant set bit (1 bit).
If a most significant 1 bit is found, its bit index is stored in the destination operand
(first operand). The source operand can be a register or a memory location; the
destination operand is a register. The bit index is an unsigned offset from bit 0 of the
source operand. If the content source operand is 0, the content of the destination
operand is undefined.



This is a very clear definition ::)
Post by: Antariy on September 03, 2013, 12:12:08 PM
Probably

xor ecx,0ffffh
jz @zero
and ecx,7fffh


is still better: a forward jump is usually predicted as not taken, so this sequence should take less time than or ecx,1, which explicitly changes the register and so breaks the prediction. Also, if the XOR gives zero, there is no need to process further.
Post by: nidud on September 03, 2013, 05:20:52 PM
Quote
Probably

xor ecx,0ffffh
jz @zero
and ecx,7fffh


is still better: a forward jump is usually predicted as not taken, so this sequence should take less time than or ecx,1, which explicitly changes the register and so breaks the prediction. Also, if the XOR gives zero, there is no need to process further.

It is better, but testing and modifying the flags on Intel is painfully slow: XOR modifies the flags and Jxx tests them.
Code:
not ecx
and ecx,7FFFh

and this (until proven otherwise) is paranoia:
Quote
However, as long as Intel docs explicitly state content undefined, it is recommend to don't rely on a pre-initialized content of that target register, if the source is zero.

Quote
Most probably, the final sentence could be written as "If NO most significant 1 bit is found, nothing is stored in the destination operand."
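The two masking sequences discussed in these posts can be compared in C. `xor_variant` models Antariy's xor/jz/and and `not_variant` models nidud's not/and; both names are hypothetical. The sketch only checks the values they produce, not the claims about flag-related timing. It also shows that the jz fires only when ECX was exactly 0FFFFh, so the masked value can still be zero (for example with 7FFFh), in which case both sequences still hand a zero source to bsr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* xor ecx, 0ffffh / jz @zero / and ecx, 7fffh
 * Returns false when the jz would be taken; otherwise stores the
 * masked value in *out. */
static bool xor_variant(uint32_t ecx, uint32_t *out)
{
    ecx ^= 0xFFFFu;
    if (ecx == 0)
        return false;           /* jz @zero: ECX was exactly 0FFFFh */
    *out = ecx & 0x7FFFu;
    return true;
}

/* not ecx / and ecx, 7FFFh: same masked value, no flag-based branch. */
static uint32_t not_variant(uint32_t ecx)
{
    return ~ecx & 0x7FFFu;
}
```

In the low 15 bits, XOR with 0FFFFh and NOT flip the same bits, so whenever the jump is not taken both variants produce the same value.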
Post by: dedndave on September 04, 2013, 12:08:15 AM
i read it this way:
if they say "undefined", assume nothing - lol

the way we typically use BSF/BSR, we branch on ZF to handle the zero-source case
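A minimal C sketch of that branch-on-zero pattern, assuming GCC or Clang and a 32-bit `unsigned int`. GCC documents `__builtin_clz` as undefined for a zero argument, much like BSR's destination, so the zero check has to come first. `highest_set_bit` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns false for x == 0 (the "ZF set" path) and leaves *index alone;
 * otherwise stores the index of the highest set bit and returns true.
 * __builtin_clz(0) is undefined, so it is only called for x != 0. */
static bool highest_set_bit(uint32_t x, unsigned *index)
{
    if (x == 0)
        return false;
    *index = 31u - (unsigned)__builtin_clz(x);
    return true;
}
```

The caller branches on the return value exactly as assembly code would branch on ZF after bsr, so nothing depends on what the CPU leaves in the destination.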