Just a warning to all.
I had a function which had always been working. I modified it to align a short loop (added ALIGN OWORD). The function fails to function correctly (did not cause an error - just bad results which caused other errors - quite difficult to determine just what was happening). The data was 8 DWORDS which were all 0 except most significant which was 40000000h, and I was subtracting a single DWORD with a value of 1. I expected to get 7 DWORDS of FFFFFFFFh and the most significant DWORD with 3FFFFFFFh:
This is the source code:
;*******************************************************************************
;
; SUB a LONG_NUMB value from another. This is a routine called to subtract a
; long number from another after all scaling and validations have been done.
;
; esi has the source OFFSET of the data.
; edi has the already scaled OFFSET of the destination data.
; ecx has the DWORD count to subtract.
;
; Returns nothing with edi pointing to the last word modified. Note that this
; may not be the highest DWORD in the destination if a short source is
; subtracted from a long destination and there are no borrows past the last
; DWORD in the destination or source. This is primarily a check for a size
; extension due to a final borrow.
;
;*******************************************************************************
ALIGN OWORD
SUBIt PROC PRIVATE USES eax ebx esi
pushfd
clc
;
; subtract all the value DWORDS.
;
ALIGN OWORD
DoAll:
mov ebx,[esi]
lea esi,[esi + (SIZEOF DWORD)]
sbb [edi],ebx
lea edi,[edi + (SIZEOF DWORD)]
dec ecx
jnz DoAll
jnc Exit
mov eax,0
;
; Propogate the borrow.
;
ALIGN OWORD
Again:
sbb [edi],eax
lea edi,[edi + (SIZEOF DWORD)]
jc Again
;
; Exit SUBIt, adjust the destination pointer to the last modified DWORD.
;
Exit:
sub edi,(SIZEOF DWORD)
popfd
test dTestZero,0
ret
SUBIt ENDP
This is the generated code from MASM 9.0 (Visual Studio "Disassembly" tab):
ALIGN OWORD
SUBIt PROC PRIVATE USES eax ebx esi
00404FC0 push eax
00404FC1 push ebx
00404FC2 push esi
pushfd
00404FC3 pushfd
clc
00404FC4 clc
00404FC5 lea esp,[esp]
00404FCC lea esp,[esp]
;
; subtract all the value DWORDS.
;
ALIGN OWORD
DoAll:
mov ebx,[esi]
00404FD0 mov ebx,dword ptr [esi]
lea esi,[esi + (SIZEOF DWORD)]
00404FD2 lea esi,[esi+4]
sbb [edi],ebx
00404FD5 sbb dword ptr [edi],ebx
lea edi,[edi + (SIZEOF DWORD)]
00404FD7 lea edi,[edi+4]
dec ecx
00404FDA dec ecx
jnz DoAll
00404FDB jne DoAll (404FD0h)
jnc Exit
00404FDD jae Exit (404FF7h)
mov eax,0
00404FDF mov eax,0
00404FE4 lea esp,[esp] ************************** alignment inserted instructions
00404FEB add eax,0 ****************************** think this might reset the carry Flag?
;
; Propogate the borrow.
;
ALIGN OWORD ***************************************** just added this to help speed up the code
Again:
sbb [edi],eax
00404FF0 sbb dword ptr [edi],eax **************** and cause the SBB here to fail?
lea edi,[edi + (SIZEOF DWORD)]
00404FF2 lea edi,[edi+4]
jc Again
00404FF5 jb Again (404FF0h)
;
; Exit SUBIt, adjust the destination pointer to the last modified DWORD.
;
Exit:
sub edi,(SIZEOF DWORD)
00404FF7 sub edi,4
popfd
00404FFA popfd
test dTestZero,0
00404FFB test dword ptr [dTestZero (40C6BCh)],0
ret
00405005 pop esi
00405006 pop ebx
00405007 pop eax
00405008 ret
00405009 lea esp,[esp]
SUBIt ENDP
Beware!
Dave.
Yes, add eax,0 instruction resets all flags according to its operation, so the code logic break there, Dave. Trying to resort the instructions so that the required loops will start at paragraph alignment might help.
Alex,
Oh, I already know how to do that, and I did it. Here:
ALIGN OWORD
SUBIt PROC PRIVATE USES eax ebx esi
pushfd
xor eax,eax ************************************ moved this up here
clc
ALIGN OWORD
;
; Subtract all the value DWORDS.
;
DoAll:
mov ebx,[esi]
lea esi,[esi + (SIZEOF DWORD)]
sbb [edi],ebx
lea edi,[edi + (SIZEOF DWORD)]
dec ecx
jnz DoAll
jc Again *************************************** jump here if you need SBB
jmp Exit *************************************** otherwise exit
*************************************** don't care what MASM inserts here
ALIGN OWORD
;
; Propogate the borrow.
;
Again:
sbb [edi],eax
lea edi,[edi + (SIZEOF DWORD)]
jc Again
;
; Exit SUBIt, adjust the destination pointer to the last modified DWORD.
;
Exit:
sub edi,(SIZEOF DWORD)
popfd
test dTestZero,0
ret
SUBIt ENDP
The problem was that there was no error, just not calculated correctly. When working with bignums, it is hard to visualize what kind of numbers you should be getting. You have no idea where the process is breaking down. In this case, the number was filled with 6 DWORDS of 0 when it should have had 6 DWORDS of FFF..., stuffed into the middle of the number with the end DWORDS having the correct value. Just stepping through with VS it appeared that the SBB in the second loop was just not working correctly (I had no idea whether or not my CPU was failing, but to always get this error, and still run Windows, the CPU must be working correctly). I had to start trying to figure out where the Carry flag was in the Flags Reg, but Carry had to be set because I got here following a skipped jnc Exit. I finally started looking at the disassembly window and found that MS trash that was causing the error.
Dave.
no easy way to fix it - writing a macro to align means you need the current address :(
on the bright side.....
your loop is small - a SHORT branch at the end
from my experience, if the branch is SHORT, alignment doesn't make much difference
what you might do - align the PROC
then, insert NOPS (or whatever) to align the loop
Dave,
As I told Alex, I already fixed it. The problem was trying to find what had happened. I even ran some timing tests which I have zipped and attached here, and unaligned loops do make a difference. In the zip, BaseTest.txt is the base aligned timing, and BaseTest.txt is the unaligned timing. As I measured it, it was 22% longer.
Dave.
Dave,
I think you already know this comment, MASM does not have bugs, it just has features you need to understand. Something I have always like about MASM is that its a bad mannered cantankerous old pig that is more likely to bite your hand than hold it, has a macro engine that borders on the unintelligible and is something that is basically a boiler room code production tool for the kiddies at Microsoft. :biggrin:
Steve,
I agree.
Dave.
I'd rather call it a feature than a bug - even the "reduced" precision has 27 digits, far beyond what a REAL10 can handle, let alone a REAL8:
.code ; fast sinus implementation; inspired by lolengine/remez (http://lolengine.net/wiki/doc/maths/remez/tutorial-fixing-parameters)
align 8 ; see also Fast and accurate sine/cosine (http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648) (devmaster Nick)
if 1 ; these choke with ml 8 ... 10 only - ML 6.15 and JWasm are OK:
a3 REAL8 -1.666665709650470145824129400050267289858e-1;
a5 REAL8 8.333017291562218127986291618761571373087e-3;
a7 REAL8 -1.980661520135080504411629636078917643846e-4;
a9 REAL8 2.600054767890361277123254766503271638682e-6;
else ; fine for all versions
a3 REAL8 -1.66666570965047014582412940e-1;
a5 REAL8 8.33301729156221812798629161e-3;
a7 REAL8 -1.98066152013508050441162963e-4;
a9 REAL8 2.60005476789036127712325476e-6;
endif