The documentation for bsr states "If no set bit is found, the contents of the destination operand are undefined". Apparently, this means in practice that the destination register remains unchanged; there is a hint (http://semipublic.comp-arch.net/wiki/Bit_Scanning_Instructions), however, that some early Intel CPUs behaved differently. I am particularly interested in the bsr eax, eax case with eax=0. And of course, if somebody has a link to more detailed documentation, even better ;-)
Another source (http://code.google.com/p/corkami/wiki/x86oddities) states "with a null source, lzcnt will return a null value, while bsr will leave the target unmodified"
See also Bsf/Bsr behavior with zero source (Chess programming) (http://chessprogramming.wikispaces.com/BitScan#Processor%20Instructions%20for%20Bitscans-x86-Bsf/Bsr%20behavior%20with%20zero%20source).
AMD Athlon(tm) Dual Core Processor 4450B (MMX, SSE, SSE2, SSE3)
bsr reg, samereg(0)
eax 0
edx 0
ecx 0
flags: cZso
bsr reg(12345678), otherreg(0)
x:eax 12345678
x:edx 12345678
x:ecx 12345678
flags: cZso
Attached a simple testbed:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
Init
PrintCpu
xor eax, eax
xor edx, edx
xor ecx, ecx
bsr eax, eax
bsr edx, edx
bsr ecx, ecx
deb 4, "bsr reg, samereg(0)", eax, edx, ecx, flags
mov eax, 12345678h
mov edx, eax
mov ecx, eax
xor esi, esi
bsr eax, esi
bsr edx, esi
bsr ecx, esi
deb 4, "bsr reg(12345678), otherreg(0)", x:eax, x:edx, x:ecx, flags
Exit
end start
Intel(R) Pentium(R) 4 CPU 3.00GHz (MMX, SSE, SSE2, SSE3)
bsr reg, samereg(0)
eax 0
edx 0
ecx 0
flags: cZso
bsr reg(12345678), otherreg(0)
x:eax 12345678
x:edx 12345678
x:ecx 12345678
flags: cZso
Thanks, Dave.
For AMD it seems clear: destination unaffected. For Intel, the documentation (Intel® 64 and IA-32 Architectures Software Developer's Manual, May 2012) says:
Quote: "Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined."
Most probably, the final sentence could be written as:
"If NO most significant 1 bit is found, nothing is stored in the destination operand."
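For reference, the behaviour reported by the test runs in this thread can be written down as a small C model. This is only a sketch of what was observed (destination untouched and ZF set when the source is zero), not Intel's documented contract; the name bsr_model and its pointer-based interface are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical software model of the OBSERVED bsr behaviour.
   Returns the resulting ZF value (1 = source was zero). */
static int bsr_model(uint32_t *dest, uint32_t src)
{
    if (src == 0)
        return 1;              /* ZF = 1, *dest deliberately left untouched */

    uint32_t index = 31;
    while ((src & (UINT32_C(1) << index)) == 0)
        --index;               /* walk down from bit 31 to the highest set bit */

    *dest = index;             /* bit index, counted from bit 0 */
    return 0;                  /* ZF = 0 */
}
```

Calling bsr_model(&d, 0) with d = 12345678h leaves d alone and returns 1, which mirrors the "bsr reg(12345678), otherreg(0)" runs posted here.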
Jochen,
your results:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
bsr reg, samereg(0)
eax 0
edx 0
ecx 0
flags: cZso
bsr reg(12345678), otherreg(0)
x:eax 12345678
x:edx 12345678
x:ecx 12345678
flags: cZso
Gunther
Grazie!
It's a pity that the actual behaviour is not properly documented by Intel. It means an extra jump (see oqTEST=? in the other thread - MasmBasic Ocmp.1 means "with extra jump" (http://masm32.com/board/index.php?topic=2222.msg24033#new))...
Jochen,
Quote from: jj2007 on September 03, 2013, 02:23:33 AM
Grazie!
It's a pity that the actual behaviour is not properly documented by Intel. It means an extra jump (see oqTEST=? in the other thread - MasmBasic Ocmp.1 means "with extra jump" (http://masm32.com/board/index.php?topic=2222.msg24033#new))...
yes, that's right. Did you check it inside the AMD manuals?
Gunther
Quote from: Gunther on September 03, 2013, 03:10:57 AM
Did you check it inside the AMD manuals?
No, I didn't check, but other sources say "unmodified".
Hi,
Made a program to test some older CPUs.
Pentium
bsr reg, samereg(0)
EAX 00000000
EDX 00000000
ECX 00000000
OV SF ZF AC PF CF
0 0 1 1 1 0
bsr reg(12345678), otherreg(0)
EAX 12345678
EDX 12345678
ECX 12345678
OV SF ZF AC PF CF
0 0 1 1 1 0
P-III
bsr reg, samereg(0)
EAX 00000000
EDX 00000000
ECX 00000000
OV SF ZF AC PF CF
0 0 1 0 1 0
bsr reg(12345678), otherreg(0)
EAX 12345678
EDX 12345678
ECX 12345678
OV SF ZF AC PF CF
0 0 1 0 1 0
P-MMX
bsr reg, samereg(0)
EAX 00000000
EDX 00000000
ECX 00000000
OV SF ZF AC PF CF
0 0 1 1 1 0
bsr reg(12345678), otherreg(0)
EAX 12345678
EDX 12345678
ECX 12345678
OV SF ZF AC PF CF
0 0 1 1 1 0
Mobile Intel(R) Celeron(R) processor 600MHz (MMX, SSE, SSE2)
bsr reg, samereg(0)
eax 0
edx 0
ecx 0
flags: cZso
bsr reg(12345678), otherreg(0)
x:eax 12345678
x:edx 12345678
x:ecx 12345678
flags: cZso
Regards,
Steve N.
Edit:
Fixed flags as pointed out.
SRN
Quote from: FORTRANS on September 03, 2013, 04:44:15 AM
Made a program to test some older CPU's.
Thanks, Steve. Zero flag should be set, though, after a bsr reg32, zeroreg32 - or do I misunderstand something?
OV SF ZF AC PF CF
1 0 0 1 1 1
Hi,
No. You are correct. A programming error. I will update the flags after I fix the error(s).
Thanks,
Steve
I created a 16-bit DOS executable, and changed the code to preserve the flags on the stack for each BSR and display them separately.
.model small,c
.386
include support.asm
.stack
.data
.code
.startup
xor eax, eax
xor edx, edx
xor ecx, ecx
bsr eax, eax
pushf
bsr edx, edx
pushf
bsr ecx, ecx
pushf
print "bsr reg, samereg(0):",NL
print dword$(eax),chr$(9)
print dword$(edx),chr$(9)
print dword$(ecx),NL
popf
call dumpflags
popf
call dumpflags
popf
call dumpflags
mov eax, 12345678h
mov edx, eax
mov ecx, eax
xor esi, esi
bsr eax, esi
pushf
bsr edx, esi
pushf
bsr ecx, esi
pushf
print "bsr reg(12345678), otherreg(0):",NL
print hexdword$(eax),"h",chr$(9)
print hexdword$(edx),"h",chr$(9)
print hexdword$(ecx),"h",NL
popf
call dumpflags
popf
call dumpflags
popf
call dumpflags
print NL
call waitkey
.exit
end
Results running on my P4 Northwood system under Windows XP, my P3 system under WindowsXP, my P2 system under Windows ME, and my old IBM SLC2-66 system under MS-DOS 6.22:
bsr reg, samereg(0):
0 0 0
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
bsr reg(12345678), otherreg(0):
12345678h 12345678h 12345678h
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
NV UP EI PL ZR NA PE NC
Unfortunately, my AMD-K5 system is down.
Thanks, Steve and Michael.
So basically, it is true what the sites linked above say: destination register is not undefined but rather unchanged.
Unfortunately, relying on that undocumented behaviour would not be good programming practice...
Well, I said the same on this subject - it just seems illogical to trash the reg if there's "no operation" to be done.
Intel(R) Celeron(R) CPU 2.13GHz (MMX, SSE, SSE2, SSE3)
bsr reg, samereg(0)
eax 0
edx 0
ecx 0
flags: cZso
bsr reg(12345678), otherreg(0)
x:eax 12345678
x:edx 12345678
x:ecx 12345678
flags: cZso
And I think this construction
and ecx, 07FFFh
or ecx, 1 ; make sure there is no zero input
bsr ecx, ecx
is superfluous.
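To see why the or ecx, 1 looks redundant, here is a rough C comparison of the sequence with and without it. The second function assumes the unchanged-destination behaviour observed in this thread, which Intel does not document; all function names are hypothetical.

```c
#include <stdint.h>

/* Index of the highest set bit; only meaningful for x != 0. */
static uint32_t highest_bit(uint32_t x)
{
    uint32_t index = 0;
    while (x >>= 1)
        ++index;
    return index;
}

/* and ecx, 07FFFh / or ecx, 1 / bsr ecx, ecx */
static uint32_t scan_with_or(uint32_t x)
{
    x &= 0x7FFFu;
    x |= 1u;                   /* guarantees a non-zero source */
    return highest_bit(x);
}

/* and ecx, 07FFFh / bsr ecx, ecx
   ASSUMPTION: a zero source leaves ecx (here still 0) unchanged. */
static uint32_t scan_without_or(uint32_t x)
{
    x &= 0x7FFFu;
    if (x == 0)
        return x;              /* models "destination unchanged" */
    return highest_bit(x);
}
```

Under that assumption both versions return the same value for every input, including 0 for a zero input. Without the assumption, only the version with the OR is covered by the documentation.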
INTEL 80386 PROGRAMMER'S REFERENCE MANUAL (1986)
Description
BSR scans the bits in the second word or doubleword operand from the most
significant bit to the least significant bit. The ZF flag is cleared if the
bits are all 0; otherwise, ZF is set and the destination register is loaded
with the bit index of the first set bit found when scanning in the reverse
direction.
Not a word about touching the destination register.
Intel's instruction set reference (2008)
Description
Searches the source operand (second operand) for the most significant set bit (1 bit).
If a most significant 1 bit is found, its bit index is stored in the destination operand
(first operand). The source operand can be a register or a memory location; the
destination operand is a register. The bit index is an unsigned offset from bit 0 of the
source operand. If the content source operand is 0, the content of the destination
operand is undefined.
This is a very clear definition ::)
Probably
xor ecx,0ffffh
jz @zero
and ecx,7fffh
is still better, because a forward jump is usually predicted as "not taken", so this instruction should take less time than or ecx, 1, which explicitly changes the register and so breaks the prediction. Also, once the XOR yields zero there is no need to process any further.
i read it this way:
if they say "undefined", assume nothing - lol
the way we typically use BSF/BSR, we branch on cases according to the ZF to handle the issue
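The same "branch on ZF" habit, sketched in C for anyone who wants it spelled out. Nothing here relies on what the destination holds after a zero source; find_highest_bit and bit_length are hypothetical names, and the boolean return value stands in for the ZF test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns false for x == 0 (the ZF = 1 case) and does not touch *index then. */
static bool find_highest_bit(uint32_t x, uint32_t *index)
{
    if (x == 0)
        return false;

    uint32_t i = 0;
    while (x >>= 1)
        ++i;
    *index = i;
    return true;
}

/* Example caller: number of bits needed to represent x. */
static int bit_length(uint32_t x)
{
    uint32_t index;
    if (!find_highest_bit(x, &index))
        return 0;              /* zero input handled on its own branch */
    return (int)index + 1;
}
```

The caller never reads index on the zero path, so it stays correct whether the hardware leaves the register unchanged or not.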