Hi guys,
I'm updating a FputoString function and have some doubts about the scientific output format. In normal printf, it seems that a floating-point 0 can be represented as 0.00000e+00, 0.00000e-00, 0.000000, etc., right?
My question is: is this format for representing 0 really necessary? Wouldn't it be simpler to represent it as a single 0?
What's the purpose of using formats such as 0.00000e+00, 0.000, etc.?
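For reference, here is a minimal sketch of how printf-style formatting treats zero. It uses Python's % operator, which follows the C printf conventions for %e, %f and %g; the precision values are only examples, not anything taken from the function being discussed.

```python
# printf-style formatting of zero (Python's % operator mirrors C's printf here)
zero = 0.0

sci = "%.5e" % zero      # scientific notation with 5 decimals
fixed = "%f" % zero      # fixed notation, default 6 decimals
short = "%g" % zero      # shortest form, trailing zeros removed
neg = "%.5e" % -0.0      # the sign of negative zero is kept

print(sci, fixed, short, neg)
```

As far as I know, the exponent of a zero value is always printed with a plus sign (e+00), and %g already collapses zero to a bare 0.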
Quote from: guga on May 23, 2025, 10:17:45 PM
Hi Guys
I`m updating a FputoString function and have some doubts about the scientific output format. In normal print it seems that when dealing with Floating points, 0 can be represented as: 0.00000e+00, 0.00000e-00, or 0.000000 etc etc, right ?
My question is...this format to represent 0 is really necessary ? Shouldn´t it be simpler to represent it as a single 0 ?
What´s the purpose of using formats such as: 0.00000e+00, 0.000 etc etc ?
A real4 zero has the same encoding as a dword zero (all bits clear), so if you want to, you could use a conditional jump and print "0" when you detect a zero.
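A small sketch of that check, written in Python rather than assembly so it can be run anywhere. The helper names are made up for illustration. Note that negative zero differs from the dword 0 only in the sign bit, so masking that bit off catches both zeros.

```python
import struct

def real4_bits(x: float) -> int:
    """Return the 32-bit pattern of x stored as an IEEE 754 single (REAL4)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def is_zero_real4(bits: int) -> bool:
    """True for +0.0 and -0.0: every bit except the sign bit is clear."""
    return (bits & 0x7FFFFFFF) == 0

print(hex(real4_bits(0.0)))    # same pattern as the integer 0
print(hex(real4_bits(-0.0)))   # only the sign bit is set
```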
Hi,
Quote from: guga on May 23, 2025, 10:17:45 PM
What´s the purpose of using formats such as: 0.00000e+00, 0.000 etc etc ?
The first thing that comes to mind is output alignment. Or "pretty
printing".
Regards,
Steve N.
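To illustrate the alignment point, a minimal sketch (again in Python's printf-style formatting; the width of 10 and precision of 3 are arbitrary choices): a fixed format keeps the decimal points in one column, while the shortest form gives a ragged column.

```python
values = [0.0, 1.5, -123.25, 0.001]

# Fixed precision and width: the decimal points line up in a column.
aligned = ["%10.3f" % v for v in values]

# Shortest form: compact, but the column is ragged.
compact = ["%g" % v for v in values]

for a, c in zip(aligned, compact):
    print(a, "|", c)
```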
QuoteShouldn´t it be simpler to represent it as a single 0 ?
That was the philosophy for the FpuFLtoA function of the Fpulib, although it did include the possibility of front padding with spaces for potential "alignment" of the decimal delimiter when specified.
If you can have a logical reason to offer some other different option in your own functions, it would be entirely up to you (and be entirely acceptable as always).
Thanks guys
Hi Raymond, thanks. I was thinking about removing this and keeping just 0, but maybe it would be better to add an optional flag in case someone wants this kind of output. I don't see any use for it, but maybe it will be useful for others.
Quote from: guga on May 23, 2025, 10:17:45 PM
Hi Guys
I`m updating a FputoString function and have some doubts about the scientific output format. In normal printf it seems that when dealing with Floating points, 0 can be represented as: 0.00000e+00, 0.00000e-00, or 0.000000 etc etc, right ?
My question is...this format to represent 0 is really necessary ? Shouldn´t it be simpler to represent it as a single 0 ?
What´s the purpose of using formats such as: 0.00000e+00, 0.000 etc etc ?
False precision is all I can think of.
Heh; I'm always amused by seeing things on Microsoft Learn where they give constants like "0x00000001", where a simple "1" would suffice.
Hi David. Yes, I agree. I usually opt for the simplest way. I'm updating another function that I created specifically for these conversions, but I got confused when I looked at the format that the M$ printf function produces for floating-point values. I found it very strange. Unfortunately, it seems that some people use this, so I'm going to do as Raymond suggested and implement it as an option. The problem is doing it in a way that doesn't affect the function too much (I mean, the processing speed). The worst thing is that I had already finished the function, but I looked at this M$ function to see if I could adapt mine to other ways of representing FPU values, and I came across this weird format of 0.000e+00, etc.
Quote from: guga on May 24, 2025, 11:44:38 AM
Hi David. Yes, I agree. I usually opt for the simplest way. I'm updating another function that I created specifically for these conversions, but I got confused when I looked at the format that the M$ printf function exported when it came to FPU. I found it very strange. Unfortunately, it seems that some people use this, so I'm going to try to do as Raymond suggested and implement this as an option. The problem is trying to do it in a way that doesn't change the functionality too much (I mean, the processing speed).
Again, I have to ask:
WHY? Who the hell cares how many more microseconds your conversion routine takes?
Think about it: you're converting a binary value to a visible format. Not some other internal binary format.
Which means that, by necessity, this is going to be used for some kind of output display, where speed is not of the essence. Not for converting into some other format where speed is important, like processing a huge database or some such.
Worst case, you could probably use sprintf() to accomplish this formatting. (I use wsprintf() all the time, but it has no floating-point format specifiers.)
Just code the damn thing and be done with it.
QuoteWhich means that, by necessity, this is going to be used for some kind of output display, where speed is not of the essence. Not for converting into some other value where speed is important, like processing a huge database or some such.
Indeed. It makes sense.
I agree with David, because afterwards the print or the SendMessage to a GUI control takes milliseconds.
If you write to a file or capture some output it might need to be a specific format when it gets parsed?
That's probably why printf etc differentiate between decimal and scientific output.
99.9% it won't matter.
Then again, the FPU can have -0 and +0 :badgrin:
Quote from: sinsi on May 25, 2025, 03:19:43 PM
If you write to a file or capture some output it might need to be a specific format when it gets parsed?
That's probably why printf etc differentiate between decimal and scientific output.
99.9% it won't matter.
Then again, the FPU can have -0 and +0 :badgrin:
The latest calculator I made had support for showing Unicode infinity and negative infinity.
Quote from: sinsi on May 25, 2025, 03:19:43 PM
Then again, the FPU can have -0 and +0 :badgrin:
True, but do we really need to display that?
I mean, when does it matter to us which zero the FPU is giving us?
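For what it's worth, here is a small sketch of one place where the sign of zero is observable. It uses Python's math module, which as far as I know follows the C library behaviour for atan2; whether a display routine should care is a separate question.

```python
import math

pos, neg = 0.0, -0.0

print(pos == neg)                  # the two zeros compare equal
print(math.copysign(1.0, neg))     # but the sign bit can still be read back
print(math.atan2(pos, -1.0))       # angle just above the negative x axis
print(math.atan2(neg, -1.0))       # angle just below the negative x axis
```

The last two results differ in sign, which is the usual argument for keeping the sign of zero in numeric code even if it is never shown to the user.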
Quote from: NoCforMe on May 26, 2025, 04:36:58 AM
Quote from: sinsi on May 25, 2025, 03:19:43 PM
Then again, the FPU can have -0 and +0 :badgrin:
True, but do we really need to display that?
I mean, when does it matter to us which zero the FPU is giving us?
Yeah, I saw that too. But it's useless IMHO (unless we are working with imaginary numbers, I suppose). Allowing the function to output only +Infinite or -Infinite is enough.
Unicode Character "∞" (U+221E)
For those who want to use it in a GUI control, use it with invoke SendMessageW.
Quote from: daydreamer on May 26, 2025, 07:15:06 AM
Unicode Character "∞" (U+221E)
For those who want to use it in gui control,use with invoke sendwmessage
I think "+INF" and "-INF" would be far safer bets than trying to display Unicode characters.
Quote from: NoCforMe on May 26, 2025, 08:27:25 AM
Quote from: daydreamer on May 26, 2025, 07:15:06 AM
Unicode Character "∞" (U+221E)
For those who want to use it in gui control,use with invoke sendwmessage
I think "+INF" and "-INF" would be far safer bets than trying to display Unicode characters.
Indeed. Currently, the function outputs these messages in case of errors:
- +INFINITE
- INDEFINITE
- -INFINITE
- QNAN
- SNAN
- Special INDEFINITE +INFINITE
- Special INDEFINITE -INFINITE
- Special INDEFINITE QNAN
- Special INDEFINITE SNAN
- Unknown FPU error
And for internal debugging, I added these while I'm testing for valid numbers:
- Valid Normal Negative
- Valid Normal Positive
- Valid Subnormal Negative
- Valid Subnormal Positive
- Valid Zero Number
- Denormalized
I believe this covers all common FPU errors. Not sure if there are additional error modes to output.
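As a rough illustration of how such categories fall out of the bit pattern, here is a sketch that classifies a raw 64-bit double. This is an assumption-laden simplification: the real function works on 80-bit tenbyte values, which have an explicit integer bit and therefore a few extra cases, and the label strings below are only borrowed from the lists above.

```python
import struct

def double_bits(x: float) -> int:
    """Raw 64-bit pattern of a Python float (IEEE 754 double)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def classify(bits: int) -> str:
    """Classify a raw IEEE 754 double bit pattern (labels are illustrative)."""
    negative = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if mantissa == 0:
            return "-INFINITE" if negative else "+INFINITE"
        # The top mantissa bit distinguishes quiet from signaling NaNs.
        return "QNAN" if mantissa >> 51 else "SNAN"
    sign = "Negative" if negative else "Positive"
    if exponent == 0:
        return "Valid Zero Number" if mantissa == 0 else "Valid Subnormal " + sign
    return "Valid Normal " + sign
```

The classifier takes the integer bit pattern rather than a float so that a signaling NaN can be tested without the value ever passing through a floating-point register.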
And when the function identifies a positive or negative subnormal value, it tries to normalize it by multiplying the input (which is always converted to a tenbyte first) by 2^64 and then immediately multiplying by the reciprocal, 1/(2^64), which is the same as dividing by 2^64 again.
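A quick sketch of what scaling by a power of two does to a subnormal, using 64-bit doubles as a stand-in for the tenbyte format (so the numbers differ from the FPU case, and this is not the author's routine). In doubles at least, scaling back down returns the same subnormal, so the scale factor has to be carried into the exponent instead.

```python
import math
import sys

SCALE = 2.0 ** 64            # exact power of two, so scaling adds no rounding error

x = 5e-324                   # smallest positive subnormal double
scaled = x * SCALE           # now a normal number with the same significant bits

mantissa, exp2 = math.frexp(scaled)
true_exp2 = exp2 - 64        # undo the scale factor when reporting the exponent

restored = scaled / SCALE    # scaling back returns the identical subnormal
print(scaled >= sys.float_info.min, true_exp2, restored == x)
```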
Quote from: guga on May 26, 2025, 09:15:42 AM
Indeed. Currently, the function output this messages in case of errors:
Are those "quiet" and "signaling" NANs? Do users really want to be informed of internal FPU conditions at that level of detail?
Have you checked with the FPU-meister here, Raymond F? (I don't know enough about it to be of much help.) He should be able to tell us what information here is relevant and what isn't.
Hi David. Yeah, I'm using a variation of his function to check for the categories. I'm just trying to output as many as possible, so it can be used in debuggers for additional information, for example.
RosAsm uses this information in the debugger (in a similar way to OllyDbg or IDA Pro), but I'm updating the function so that it is more portable and others can use it as well.
This is the older FloatToAscii routine based on Raymond's FPU library. Btw, I'll try porting this to MASM once I'm done.
; Flags:
[REGULAR 0 SCIENTIFIC 1]
Proc FloatToAscii:
Arguments @Source, @Destination, @Decimal, @Flag
Local @temporary, @eSize, @oldcw, @truncw, @stword
Structure @BCD 12, @bcdstr 0
fclex ;clear exception flags on FPU
; Get the specified number of decimals for result (MAX = 15):
On D@Decimal > 0F, mov D@Decimal 0F
; The FPU will be initialized only if the source parameter is not taken
; from the FPU itself (D@ Source <> &NULL):
.If D@Source = &NULL
fld st0 ;copy it to preserve the original value
.Else
mov eax D@Source
If eax > 0400_000
finit | fld T$eax
; Check first if value on FPU is valid or equal to zero:
ftst ;test value on FPU
fstsw W@stword ;get result
test W@stword 04000 ;check it for zero or NAN
jz L0> ;continue if valid non-zero
test W@stword 0100 ;now check it for NAN
jnz L1> ;Src is NAN or infinity - cannot convert
; Here: Value to be converted = 0
mov eax D@Destination | mov W$eax '0' ; Write '0', 0 szstring
mov eax &TRUE | finit | ExitP
Else
L1: finit | mov eax &FALSE | ExitP
End_If
.End_If
; Get the size of the number:
L0: fld st0 ;copy it
fabs ;insures a positive value
fld1 | fldl2t
fdivp ST1 ST0 ;->1/[log2(10)]
fxch | fyl2x ;->[log2(Src)]/[log2(10)] = log10(Src)
fstcw W@oldcw ;get current control word
mov ax W@oldcw
or ax 0C00 ;code it for truncating
mov W@truncw ax
fldcw W@truncw ;change rounding code of FPU to truncate
fist D@eSize ;store characteristic of logarithm
fldcw W@oldcw ;load back the former control word
ftst ;test logarithm for its sign
fstsw W@stword ;get result
test W@stword 0100 ;check if negative
jz L0>
dec D@eSize
L0: On D@eSize > 15, mov D@Flag SCIENTIFIC
; Multiply the number by a power of 10 to generate a 16-digit integer:
L0: fstp st0 ;get rid of the logarithm
mov eax 15
sub eax D@eSize ;exponent required to get a 16-digit integer
jz L0> ;no need if already a 16-digit integer
mov D@temporary eax
fild D@temporary
fldl2t | fmulp ST1 ST0 ;->log2(10)*exponent
fld st0 | frndint | fxch
fsub st0 st1 ;keeps only the fractional part on the FPU
f2xm1 ;->2^(fractional part)-1
fld1
faddp ST1 ST0 ;add 1 back
fscale ;re-adjust the exponent part of the REAL number
fxch
fstp st0
fmulp ST1 ST0 ;->16-digit integer
L0: fbstp T@bcdstr ;transfer it as a 16-digit packed decimal
fstsw W@stword ;retrieve exception flags from FPU
test W@stword 1 ;test for invalid operation
jnz L1<< ;clean-up and return error
; Unpack bcd, the 10 bytes returned by the FPU being in the little-endian style:
push ecx, esi, edi
lea esi D@bcdstr+9
mov edi D@Destination
mov al B$esi ;sign byte
dec esi | dec esi
If al = 080
mov al minusSign ;insert sign if negative number
Else
mov al Space ;insert space if positive number
End_If
stosb
...If D@Flag = REGULAR
; Verify number of decimals required vs maximum allowed:
mov eax 15 | sub eax D@eSize
cmp eax D@Decimal | jae L0>
mov D@Decimal eax
; ;check for integer digits:
L0: mov ecx D@eSize
or ecx ecx ;is it negative
jns L3>
; Insert required leading 0 before decimal digits:
mov ax '0o' | stosw
neg ecx
cmp ecx D@Decimal | jbe L0>
jmp L8>>
L0: dec ecx | jz L0>
stosb | jmp L0<
L0:
mov ecx D@Decimal | inc ecx
add ecx D@eSize | jg L4>
jmp L8>>
; Do integer digits:
L3: inc ecx
L0: movzx eax B$esi | dec esi | ror ax 4 | ror ah 4
add ax '00' | stosw | sub ecx 2 | jg L0<
jz L0>
dec edi
L0: cmp D@Decimal 0 | jz L8>>
mov al pointSign | stosb
If ecx <> 0
mov al ah | stosb
mov ecx D@Decimal | dec ecx | jz L8>>
Else
mov ecx D@Decimal
End_If
; Do decimal digits:
L4: movzx eax B$esi
dec esi
ror ax 4 | ror ah 4 | add ax 03030 | stosw
sub ecx 2 | jg L4<
jz L1>
dec edi
L1: jmp L8>>
; scientific notation
...Else
mov ecx D@Decimal | inc ecx
movzx eax B$esi | dec esi
ror ax 4 | ror ah 4 | add ax '00' | stosb
mov al pointSign | stosb
mov al ah | stosb
sub ecx 2 | jz L7>
jns L0>
dec edi | jmp L7>
L0: movzx eax B$esi
dec esi
ror ax 4 | ror ah 4
add ax '00' | stosw | sub ecx 2 | jg L0<
jz L7>
dec edi
L7: mov al 'E' | stosb
mov al plusSign, ecx D@eSize | or ecx ecx | jns L0>
mov al minusSign | neg ecx
L0: stosb
; Note: the absolute value of the size could not exceed 4931
mov eax ecx
mov cl 100
div cl ;->thousands & hundreds in AL, tens & units in AH
push eax
and eax 0FF ;keep only the thousands & hundreds
mov cl 10
div cl ;->thousands in AL, hundreds in AH
add ax '00' ;convert to characters
stosw ;insert them
pop eax
shr eax 8 ;get the tens & units in AL
div cl ;tens in AL, units in AH
add ax '00' ;convert to characters
stosw ;insert them
...End_If
L8: mov B$edi Space ;string terminating character
pop edi, esi, ecx
finit | mov eax D@eSize
EndP
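For readers who don't follow RosAsm syntax, the core idea of the routine above (take log10 to find the decimal exponent, scale to a 16-digit integer, then place the decimal point) can be sketched in Python. This is a simplified illustration under my own assumptions, not a port: the function name is made up, the decimals are truncated rather than rounded, and very large or very small exponents would overflow the scale factor.

```python
import math

def float_to_sci(x: float, decimals: int = 6) -> str:
    """Scientific notation via a 16-digit integer, like the FBSTP approach."""
    if x == 0.0:
        return "0"
    sign = "-" if x < 0 else " "
    x = abs(x)
    exp10 = math.floor(math.log10(x))          # characteristic of log10(x)
    scaled = round(x * 10.0 ** (15 - exp10))   # aim for a 16-digit integer
    if scaled >= 10 ** 16:                     # rounding produced 17 digits
        scaled //= 10
        exp10 += 1
    elif scaled < 10 ** 15:                    # log10 estimate was one too high
        scaled = round(x * 10.0 ** (16 - exp10))
        exp10 -= 1
    digits = str(scaled)
    decimals = max(1, min(decimals, 15))       # between 1 and 15 decimals
    esign = "-" if exp10 < 0 else "+"
    return "%s%s.%sE%s%04d" % (sign, digits[0], digits[1:1 + decimals],
                               esign, abs(exp10))

print(float_to_sci(1234.5, 4))
print(float_to_sci(-0.5, 2))
```

The four-digit exponent mirrors the layout of the routine above; a leading space stands in for the sign of positive numbers, as in the original.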
And this is the one I'm currently updating.
Proc FloatToString:
Arguments @Float80Pointer, @InputFlag, @DestinationPointer, @TruncateBytes, @AddSubNormalMsg
Local @ExponentSize, @ControlWord, @FPUStatusHandle, @tempdw, @extra10x, @FPUMode
Structure @TmpStringBuff 128, @pTempoAsciiFpuDis 0, @pBCDtempoDis 64, @pTmpInputDis 96
Uses esi, edi, edx, ecx, ebx
; @FPUStatusHandle = 0 Default
; @FPUStatusHandle = 1 Increased. Positive exponent value
; @FPUStatusHandle = 2 Decreased. Negative exponent value
call 'RosMem.FastZeroMem' D@TmpStringBuff, 128
mov D@ExponentSize 0, D@FPUStatusHandle 0, D@extra10x 0 D@FPUMode SpecialFPU_PosValid; D@IsNegative &FALSE,
mov edi D@DestinationPointer, eax D@Float80Pointer
; Always work with a copy of the input to prevent it from being changed, especially when the number is subnormal.
lea ebx D@pTmpInputDis
finit | fclex | fstcw W@ControlWord ; Save FPU control word, clear exceptions, reset FPU
If D@InputFlag = FPU_STR_REAL4_INT
fild F$eax
Else_if D@InputFlag = FPU_STR_REAL4_FLOAT
fld F$eax
Else_if D@InputFlag = FPU_STR_REAL8_INT
fild R$eax
Else_if D@InputFlag = FPU_STR_REAL8_FLOAT
fld R$eax
Else
fld T$eax
End_If
fstp T$ebx
fldcw W@ControlWord | fwait ; Restore FPU control word
; Check for zero (positive or negative)
.If_and D$ebx = 0, D$ebx+4 = 0
If_Or W$ebx+8 = 0, W$ebx+8 = 08000
;mov W$ebx+8 0 ; no longer needed since we are working only with a copy
mov B$edi '0', B$edi+1 0
mov eax D@FPUMode
ExitP
End_If
.End_If
; Handle sign and special number categories
call RealTenFPUNumberCategory ebx
mov D@FPUMode eax
If eax >= SpecialFPU_QNAN ; do we have any special FPU being used ? Yes, display the proper message and exit
mov ebx eax
call WriteFPUErrorMsg eax, edi
mov eax ebx
ExitP
End_If
Test_If B$ebx+9 0_80
mov B$edi '-' | inc edi
xor B$ebx+9 0_80
Test_End
; Special handling for subnormal numbers
.If_Or D@FPUMode = SpecialFPU_PosSubNormal, D@FPUMode = SpecialFPU_NegSubNormal
finit | fclex | fstcw W@ControlWord ; Save FPU control word, clear exceptions, reset FPU
fld T$ebx ; Load the subnormal number into ST(0) (e.g., X)
fld T$Float_NormalizationFactor ; Load 2^60 into ST(0), pushing X to ST(1)
fmulp ST1 ST0 ; Compute X * 2^60. Result (normalized) is in ST(0). ST(1) is popped.
; FPU Stack: ST(0) = X * 2^60
fld T$Float_DenormalizationFactor ; Load 1/2^60 into ST(0), pushing (X * 2^60) to ST(1)
fmulp ST1, ST0 ; Compute (X * 2^60) * (1/2^60). Result (original X, now normalized) is in ST(0).
; FPU Stack: ST(0) = X (normalized)
fstp T$ebx ; Store the normalized value back to the local copy (ebx points to @pTmpInputDis, not to @Float80Pointer)
; NO ADJUSTMENT TO D@ExponentSize IS NEEDED HERE,
; because the number's actual mathematical value hasn't changed.
; Its internal representation is just optimized.
fldcw W@ControlWord | fwait ; Restore FPU control word
.End_If
; extract the exponent. 1e4933
finit | fclex | fstcw W@ControlWord
fld T$ebx
call GetExponentFromST0 &FPU_EXCEPTION_INVALIDOPERATION__&FPU_EXCEPTION_DENORMALIZED__&FPU_EXCEPTION_ZERODIV__&FPU_EXCEPTION_OVERFLOW__&FPU_EXCEPTION_UNDERFLOW__&FPU_EXCEPTION_PRECISION__&FPU_PRECISION_64BITS
mov D@ExponentSize eax
ffree ST0
.If D@ExponentSize < FPU_ROUND
fld T$ebx
fld st0 | frndint | fcomp st1 | fstsw ax
Test_If ax &FPU_EXCEPTION_STACKFAULT
lea ecx D@pBCDtempoDis
fbstp T$ecx ; -> TBYTE containing the packed digits
fwait
lea eax D@pTempoAsciiFpuDis
lea ecx D@pBCDtempoDis
call FloatToBCD_SSE ecx, eax
mov eax FPU_MAXDIGITS+1 | mov ecx D@ExponentSize | sub eax ecx | inc ecx
lea esi D@pTempoAsciiFpuDis | add esi eax
If B$esi = '0'
inc esi | dec ecx
End_If
call 'FastCRT.StrncpyEx' edi, esi, ecx | add edi eax
xor eax eax
jmp L9>>
Test_End
ffree ST0
.End_If
; Necessary for FPU 80 Bits. If it is 0, the correct is only 0 and not 0.e+XXXXX.
If D@ExponentSize = 080000000
mov D@ExponentSize 0
Else_If D@ExponentSize = 0
mov D@ExponentSize 0
End_If
; multiply the number by the power of 10 to generate required integer and store it as BCD
; We need to extract here all the exponents of a given number and multiply the result by the power of FPU_MAXDIGITS+1 (1e17)
; So, if our number is 4.256879e9, the result must be 4.256879e17. If we have 3e-2 the result is 3e17.
; If we have 0.1 the result is 1e17 and so on.
; If we have as a result a power of 1e16. It means that we need to decrease the iExp by 1, because the original
; exponential value is wrong.
; This result will be stored in ST0
..If D@ExponentSize <s 0
mov eax D@ExponentSize | neg eax | add eax FPU_MAXDIGITS ; always add MaxDigits-1
mov edx D@ExponentSize | lea edx D$eax+edx
.If eax > 4932
mov edx eax | sub edx 4932 | sub edx FPU_MAXDIGITS
mov D@extra10x edx | add D@extra10x FPU_MAXDIGITS
mov eax 4932
If D@extra10x >= FPU_MAXDIGITS
inc D@ExponentSize
End_If
.Else_If edx >= FPU_MAXDIGITS
inc D@ExponentSize
.End_If
..Else_If D@ExponentSize > 0
mov eax FPU_MAXDIGITS+1 | sub eax D@ExponentSize
..Else ; Exponent size = 0
mov eax FPU_MAXDIGITS+1
..End_If
mov D@tempdw eax
; Apply scaling
fild D@tempdw
call ST0PowerOf10
fld T$ebx | fmulp ST0 ST1
If D@extra10x > 0
; Calculate the exponential value of the extra bytes
fild F@extra10x
call ST0PowerOf10
fmulp ST0 ST1 ; and multiply by it so that we get XXe17 or XXe16
End_If
; now we must get the power of FPU_MAXDIGITS+1. In this case, we will get the value 1e17.
fstp T$FloatToStr_ScaledValue ; Store the scaled number to memory (this pops ST0)
Fpu_If T$FloatToStr_ScaledValue < T$FloatToStr_Reference
fld T$FloatToStr_ScaledValue ; Reload the scaled value into ST0
fmul R$FloatToStr_Ten ; Perform the multiplication
dec D@ExponentSize
fstp T$FloatToStr_ScaledValue ; Store the result back, or use it directly if this is the last FPU operation
Fpu_End_If
; Now, get the FPU status *after* the macro has executed its comparison and potentially fmul.
; This will capture the FPU status word including any exception flags from the fmul.
; If FPURoundFix needs the comparison result (CF, ZF, PF), this still won't work perfectly
; because the SAHF already transferred them to EFLAGS, and we need to get the full FSW.
; However, if FPURoundFix mostly cares about *exceptions* (invalid, underflow, overflow), this might suffice.
fstsw ax | fwait | mov D@FPUStatusHandle eax ; save exception flags. This is the best place to capture all exceptions
; including the ones related to the comparisons
fld T$FloatToStr_ScaledValue ; Reload the final scaled value for fbstp.
; Final conversion to BCD
lea ecx D@pBCDtempoDis
fbstp T$ecx ; ->TBYTE containing the packed digits
fwait
lea ecx D@pBCDtempoDis
lea eax D@pTempoAsciiFpuDis
call FloatToBCD_SSE ecx, eax
; Adjust the exponent when exceptions occur and try to fix the rounding whenever possible
lea eax D@pTempoAsciiFpuDis
call FPURoundFix eax, D@FPUStatusHandle D@ExponentSize, D@TruncateBytes
mov D@ExponentSize eax
lea esi D@pTempoAsciiFpuDis | mov ecx D@ExponentSize
inc ecx
..If_And ecx <= FPU_ROUND, ecx > 0
mov eax 0
While B$esi <= ' ' | inc esi | End_While
While B$esi = '0'
inc esi
; In rare cases where ecx = 0-1, it may happen that we have only '0' in esi.
; So while we are cleaning it, if everything is '0', we set edi to one single '0' to avoid
; an empty string.
If B$esi = 0
mov B$edi '0' | inc edi
jmp L9>>
End_If
End_While
C_call 'FastCRT.FormatStr' edi, {'%s' 0}, esi | add edi D@ExponentSize | inc edi
add esi D@ExponentSize | inc esi
.If B$esi <> 0
mov B$edi '.' | inc edi
C_call 'FastCRT.FormatStr' edi, {'%s' 0}, esi | add edi eax
While B$edi-1 = '0' | mov B$edi-1 0 | dec edi | End_While
If B$edi-1 = '.'
dec edi
End_If
.End_If
..Else
While B$esi <= ' ' | inc esi | End_While
.While B$esi = '0'
inc esi
If B$esi = 0
mov B$edi '0' | inc edi
jmp L9>>
End_If
.End_While
.If B$esi <> 0
movsb | mov B$edi '.' | inc edi
C_call 'FastCRT.FormatStr' edi, {'%s' 0}, esi | add edi eax
; Clean last Zeros at the end of the Number String.
While B$edi-1 = '0' | mov B$edi-1 0 | dec edi | End_While
If B$edi-1 = '.'
dec edi
End_If
mov B$edi 'e' | mov eax D@ExponentSize
mov B$edi+1 '+'
Test_If eax 0_8000_0000
neg eax | mov B$edi+1 '-'
Test_End
inc edi | inc edi
C_call 'FastCRT.FormatStr' edi, {'%d' 0}, eax | add edi eax
.Else
mov B$edi '0' | inc edi
.End_If
..End_If
; For developers only:
; Uncomment these functions if you want to analyse the exception modes of the FPU data.
;call TestingFPUExceptions D@FPUStatusHandle ; Control Word exceptions
;call TestingFPUStatusRegister D@FPUStatusHandle ; Status registers involved in the operation
L9:
mov B$edi 0
fldcw W@ControlWord | fwait
.If D@AddSubNormalMsg = &TRUE
If_Or D@FPUMode = SpecialFPU_PosSubNormal, D@FPUMode = SpecialFPU_NegSubNormal
call 'FastCRT.StrCpy' edi, {B$ " (Denormalized)", 0}
End_If
.End_If
mov eax D@FPUMode
EndP
The new version will include the output formats that are lacking in RosAsm, so one can output the numbers in a wide range of formats as needed.
The goal is to create a general-purpose function and export it from a DLL, and also to add it as an extension to a FormatString function I created for RosAsm (a variation of ws_printf). I'm almost done. I just need to review the code, clean it up, and create the routines to handle non-scientific outputs such as 0.000121234, 12344.5645, etc.
Don't forget to also handle the common case of SSE/SSE2 denormals and such. There is a high probability that floating-point code runs with SSE/SSE2 instead, even if you don't write SSE/SSE2 code yourself, because a math function you invoke from a library might be coded in SSE/SSE2.
I am curious whether Unicode in a GUI control works on all OSes in use today (Win7, Win8, Win10, Win11), or whether the earliest Win7 lacks Unicode capabilities?
Quote from: daydreamer on May 26, 2025, 03:41:55 PM
dont forget to also include the common usage of SSE/SSE2 denormal and such,which is high probability fp code is run with SSE/SSE2 instead,even if you dont write SSE/SSE2 code yourself,invoking math function from a library might be coded in SSE/SSE2
I am curious if it works on all OSes used today :win7,win8,win10,win11 with unicode in a gui control,or functionality on earliest win7 lack unicode caps ?
Hi Daydreamer
SSE/SSE2 denormals I'm handling specifically in the pow, log and exp functions. For the FPU it's not necessary, since I'm focusing on precision as well, and the FPU grants that when dealing with 80 bits. The helper functions I created that use SSE inside FPUtoString are for the BCD conversion, the dwtoAscii conversion, and copying the error messages to the output.
The older FloatToBCD inside RosAsm was this:
; new on 20/05/2025
Proc FloatToBCD:
Arguments @Input, @Output
Uses esi, edi, ecx, eax
mov esi D@Input
add esi 9
mov edi D@Output
; The first byte of the BCD will always be 0, so we need to skip it to properly
; achieve a better result when we are dealing with negative exponent values or other
; non-integer values that may result in more than one zero byte at the start. Example:
; Sometimes when we are analysing the number 1, it can have the following starting bytes:
; 00999999999999999 . So the first byte is ignored, leaving this in edi: 09999999999.
; In this example it is the number 0.999999999999999e-6, which is in fact 1e-6
mov ecx 10
If B$esi = 0
dec ecx
dec esi
End_If
Do
; Process two nibbles at once
mov al B$esi | mov ah al | shr al 4 | and ah 0F
add ax '00'
mov W$edi ax
add edi 2
dec esi
dec ecx
Repeat_Until_Zero
mov B$edi 0
EndP
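The unpacking step can be sketched in a few lines of Python for anyone who wants to check results against the assembly. It assumes the 10-byte layout that FBSTP stores (bytes 0 to 8 hold 18 digits, least significant first, and the top bit of byte 9 is the sign); the function name is made up for this example.

```python
def unpack_bcd(tbyte: bytes) -> str:
    """Convert a 10-byte packed BCD value (FBSTP layout) to a decimal string."""
    if len(tbyte) != 10:
        raise ValueError("expected 10 bytes")
    sign = "-" if tbyte[9] & 0x80 else ""
    digits = []
    # Bytes 0..8 hold 18 digits, least significant byte first.
    for byte in reversed(tbyte[:9]):
        digits.append(chr(ord("0") + (byte >> 4)))     # high nibble
        digits.append(chr(ord("0") + (byte & 0x0F)))   # low nibble
    text = "".join(digits).lstrip("0") or "0"
    return sign + text

packed = bytes([0x34, 0x12, 0, 0, 0, 0, 0, 0, 0, 0x80])
print(unpack_bcd(packed))
```

Unlike the routine above, this sketch strips all leading zeros, which is only one of several reasonable choices.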
But I updated it again to use SSE2:
; equates for pshufd and friends
[SSE_INVERT_DWORDS 27] ; invert the order of dwords
[SSE_ROTATE_64BITS 78] ; the same things and result as SSE_SWAP_QWORDS, SSE_ROTATE_RIGHT_64BITS, SSE_ROTATE_LEFT_64BITS, SSE_ROTATE_64BITS
; global variables used:
[<16 SSE2_NibbleMask: B$ 0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F]
[<16 SSE2_AsciiZero: B$ '0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0']
Proc FloatToBCD_SSE:
Arguments @Input, @Output
Structure @XmmPreserve 128, @XMMReg0Dis 0, @XMMReg1Dis 16, @XMMReg2Dis 32, @XMMReg3Dis 48
Uses esi, edi, ecx, eax
; Preserve XMM registers efficiently
movdqu X@XMMReg0Dis XMM0
movdqu X@XMMReg1Dis XMM1
movdqu X@XMMReg2Dis XMM2
movdqu X@XMMReg3Dis XMM3
mov esi D@Input
mov edi D@Output
; Check for leading zero byte
mov ecx 2
If B$esi+9 = 0
dec ecx
End_If
add esi ecx
; Prepare SSE2 constants
movdqa xmm3 X$SSE2_NibbleMask ; 0x0F0F0F0F...
movdqa xmm2 X$SSE2_AsciiZero ; '00' repeated
; Process 8 bytes at a time (SSE2 can do 16, but we're limited by BCD format)
movq xmm0 Q$esi ; Load bytes [esi+ecx..esi]
MOVDQA xmm1 xmm0 ; xmm1 for lower nibbles
PAND xmm1 xmm3 ; Isolate lower nibbles
POR xmm1 xmm2 ; Convert lower nibbles to ASCII
PSRLW xmm0 4 ; Shift upper nibbles to lower positions
PAND xmm0 xmm3 ; Isolate shifted upper nibbles
POR xmm0 xmm2 ; Convert upper nibbles to ASCII
;SSE_SWAP_D_HI_LOW xmm2 xmm0
; --- Interleave the digits (Upper, Lower, Upper, Lower...) ---
; The result for the first 16 digits will be in xmm0
PUNPCKLBW xmm0 xmm1 ; xmm0 now has [U0, L0, U1, L1, U2, L2, U3, L3, U4, L4, U5, L5, U6, L6, U7, L7]
PSHUFHW xmm0 xmm0 SSE_INVERT_DWORDS ; Reverse 16-bit word order in high 64 bits of xmm0
PSHUFLW xmm0 xmm0 SSE_INVERT_DWORDS ; Reverse 16-bit word order in low 64 bits of xmm0
PSHUFD xmm0 xmm0 SSE_ROTATE_64BITS ; Swap the two 64-bit halves, completing the reversal of all eight words.
; Store 16 ASCII characters
movdqu X$edi xmm0
add edi 16
; process remaining:
sub esi ecx
Do
; Process two nibbles at once
mov al B$esi | mov ah al | shr al 4 | and ah 0F
add ax '00'
mov W$edi ax
add edi 2
dec esi
dec ecx
Repeat_Until_Zero
mov B$edi 0
; Restore XMM registers
movdqu XMM0 X@XMMReg0Dis
movdqu XMM1 X@XMMReg1Dis
movdqu XMM2 X@XMMReg2Dis
movdqu XMM3 X@XMMReg3Dis
EndP
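The mask-and-OR trick in the SSE2 version can be modelled with plain integers, which may help when checking the shuffle logic. The sketch below treats 8 BCD bytes as one 64-bit value, so it is only an analogy for the XMM code, and the names are chosen to echo the constants above rather than taken from any library.

```python
NIBBLE_MASK = 0x0F0F0F0F0F0F0F0F   # same role as SSE2_NibbleMask
ASCII_ZERO = 0x3030303030303030    # same role as SSE2_AsciiZero

def bcd8_to_ascii(chunk: bytes) -> str:
    """Turn 8 packed-BCD bytes (least significant first) into 16 ASCII digits."""
    if len(chunk) != 8:
        raise ValueError("expected 8 bytes")
    value = int.from_bytes(chunk, "little")
    low = ((value & NIBBLE_MASK) | ASCII_ZERO).to_bytes(8, "little")
    high = (((value >> 4) & NIBBLE_MASK) | ASCII_ZERO).to_bytes(8, "little")
    out = bytearray()
    # Interleave high/low digits, walking from the most significant byte down.
    for i in range(7, -1, -1):
        out.append(high[i])
        out.append(low[i])
    return out.decode("ascii")

print(bcd8_to_ascii(bytes([0x34, 0x12, 0, 0, 0, 0, 0, 0])))
```

The loop at the end does the job of the unpack and shuffle instructions: interleave the two digit streams and reverse the byte order so the most significant digit comes first.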
I'm using just a few SSE2 instructions to optimize the whole function a little bit. It's not strictly necessary, but optimizing it won't hurt, especially if the function is to be used on large databases where someone needs to convert FPU values to ASCII strings. For example, I'll have to use it anyway on a large set of 130.000 variables I'll use in the pow function. So it won't hurt to make the function a bit faster.
Guga, great to code BCD using SSE2 :thumbsup:
Also because in x64 mode the special x86 BCD opcodes (DAA, DAS, AAA, AAS, AAM, AAD) won't work.
But the best performance when producing big ASCII files comes from:
1: allocating a big enough buffer in memory, which is fastest, and converting all numbers to ASCII text there; code that concatenates strings such as "3.14159 " + "1.4215 " + ... is a lot faster than print/fprint
2: block reading/writing the whole buffer to files, which is a lot faster than calling fprint for 130.000 numbers
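A minimal sketch of those two points, assuming Python and a made-up output format: all numbers are formatted into one in-memory buffer, and the file is written with a single call.

```python
import io

def numbers_to_text(values, fmt="%.6g"):
    """Format all values into one in-memory buffer, separated by spaces."""
    buf = io.StringIO()
    for v in values:
        buf.write(fmt % v)
        buf.write(" ")
    return buf.getvalue()

def write_numbers(path, values):
    """Write the whole buffer with a single call instead of one per number."""
    with open(path, "w", encoding="ascii") as f:
        f.write(numbers_to_text(values))

print(numbers_to_text([3.14159, 1.4215]))
```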
Hi Daydreamer
Tks :thumbsup: :thumbsup: I didn't know about that problem in x64 related to BCD opcodes. I thought it could be done easily. When I ported this BCD routine to SSE2 for x86, it was hell, because I wasn't able to reverse the order of all 18 to 20 ASCII chars at once (I could use pshufb, but that is an SSSE3 opcode, and for now I can't try those yet). Fortunately, I remembered an old test in another function that swapped words, and I gave it a last try to see if that could be adapted to work as expected :)
About optimizing big ASCII files: yeah, I'm aware that keeping such a large dataset in memory is faster. But I haven't tested it yet on the pow function. My original idea is still to use a small table of variables in a few blocks (3 perhaps) of only 128, 256 or 512 Real8 values, as I did when I succeeded in porting the M$ function. The problem is that both (mine and M$'s) start losing precision after the 15th to 16th digit, and I have no idea how M$ scaled this thing. I mean, I couldn't recreate the exact math equations they used to produce the values in those tables.
That's why I tried a new table regardless of the size. The problem is that, as you saw, it may lose performance, and it will occupy a large space in the data section, resulting in an even larger DLL (the math functions will be used in a DLL, btw).
On the other hand, if I build them in memory, I'll have to precalculate the values of all the tables before passing them to the pow (or log, ln, etc.) function, and that will kill performance and make the code slow. Alternatively, to bypass this performance issue, I can simply add a Generate Table function (or an initialization/setup function, or whatever it ends up being called) and inform users that if they want to use the math functions, such initialization functions must be called first.
Or, since the functions will be exported from a DLL, I can call the initialization functions at DLL startup (right after DLL_PROCESS_ATTACH), and if someone wants to use pow, log, etc., they won't lose speed, since the tables will already have been generated in memory.
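The "generate the tables once, then reuse them" idea might look like the following sketch. Everything here is hypothetical: the table size, the function names and the coarse table lookup are placeholders for illustration, not the tables discussed above.

```python
import math

_TABLE_SIZE = 256      # hypothetical table size
_table = None          # filled on first use

def init_tables():
    """Build the lookup table once; safe to call repeatedly."""
    global _table
    if _table is None:
        _table = [2.0 ** (i / _TABLE_SIZE) for i in range(_TABLE_SIZE)]
    return _table

def exp2_table(x: float) -> float:
    """2**x using the table entry for the fractional part (coarse, for illustration)."""
    table = init_tables()
    whole = math.floor(x)
    index = int((x - whole) * _TABLE_SIZE)
    return math.ldexp(table[index], whole)

print(exp2_table(3.0), exp2_table(-1.0))
```

Calling init_tables() from the library's startup code corresponds to the DLL_PROCESS_ATTACH option; leaving it to the first call of exp2_table() corresponds to lazy initialization.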
I'll have to test all of this to see which is better. Ideally it would be better to use just smaller tables but, no matter what math equation I try, I haven't yet succeeded in making the result precise up to the 17th digit and matching the value from WolframAlpha. The only way I have found so far is with those larger tables.
Quote from: guga on May 27, 2025, 06:51:53 PM
Hi Daydreamer
Tks :thumbsup: :thumbsup: I didn´t knew that problem in x64 related to bcd opcodes. I thought it could be done easily. When i ported this BCD routine to SSE2 for x86, it was a hell, because i wasn´t being able to reverse the order of all 18 to 20 ascii chars at once (i could use pshufb, but for now, i can´t try using SSE4 opcodes yet).
This is all very interesting, but tell us this:
Who the hell even uses BCD anymore? I'm pretty sure that format went the way of the dinosaurs with "big iron" IBM mainframes and such.
Do you know of even one instance of someone actually using that nowadays?
Sheesh, the amount of effort that people expend here on things that nobody[1] is ever going to use ...
[1] For certain values of "nobody".
Hi,
Quote from: guga on May 26, 2025, 06:25:53 AM
Quote from: NoCforMe on May 26, 2025, 04:36:58 AM
Quote from: sinsi on May 25, 2025, 03:19:43 PM
Then again, the FPU can have -0 and +0 :badgrin:
True, but do we really need to display that?
I mean, when does it matter to us which zero the FPU is giving us?
yeah. I saw that too. But it´s useless IMHO (Unless we are working with Imaginary numbers, i suppose). Allowing the function output only +Infinite or -Infinite is enough.
Any normal usage of imaginary numbers does not have the concept of
a "signed" zero. Plus and minus zeroes are an artifact of the internal
representation of a floating point number used by the FPU. At least those
made by Intel.
Cheers,
Steve N.
Good. So we're paring down the number of cases that actually need to be presented to the user here.
That should make your life a little bit easier, @guga.
Quote from: NoCforMe on May 28, 2025, 05:01:37 AM
Quote from: guga on May 27, 2025, 06:51:53 PM
Hi Daydreamer
Tks :thumbsup: :thumbsup: I didn´t knew that problem in x64 related to bcd opcodes. I thought it could be done easily. When i ported this BCD routine to SSE2 for x86, it was a hell, because i wasn´t being able to reverse the order of all 18 to 20 ascii chars at once (i could use pshufb, but for now, i can´t try using SSE4 opcodes yet).
This is all very interesting, but tell us this:
Who the hell even uses BCD anymore?
I'm pretty sure that format went the way of the dinosaurs with "big iron" IBM mainframes and such.
Do you know of even one instance of someone actually using that nowadays?
Sheesh, the amount of effort that people expend here on things that nobody[1] is never going to use ...
[1] For certain values of "nobody".
Hi David.
The BCD part of the code is necessary to unpack the values stored by the FBSTP opcode.
https://c9x.me/x86/html/file_module_x86_id_83.html
https://masm32.com/masmcode/rayfil/BCDtut.html
About other uses, I don't know if it is used in other ways besides these conversion operations with FPU.
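To make the "unpack" step concrete: FBSTP stores a ten-byte packed BCD value, with 18 decimal digits in bytes 0 to 8 (two digits per byte, least significant byte first) and the sign in bit 7 of byte 9. The following is only a plain C sketch of unpacking that layout into ASCII, not the code from Fpulib or from this thread; the function name and signature are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack an 80-bit packed-BCD value (the layout FBSTP stores) into a
   NUL-terminated decimal string, dropping leading zeros.
   `out` must hold at least 20 bytes: sign + 18 digits + NUL.
   Returns the string length. Name and signature are illustrative only. */
static size_t bcd80_to_ascii(const uint8_t bcd[10], char *out)
{
    char *p = out;
    int started = 0;
    int i;

    if (bcd[9] & 0x80)               /* bit 7 of the top byte is the sign */
        *p++ = '-';

    /* Bytes 8..0 hold the digits, most significant pair in byte 8. */
    for (i = 8; i >= 0; --i) {
        int hi = bcd[i] >> 4;        /* upper nibble: higher digit */
        int lo = bcd[i] & 0x0F;      /* lower nibble: lower digit  */
        if (hi || started) { *p++ = (char)('0' + hi); started = 1; }
        if (lo || started) { *p++ = (char)('0' + lo); started = 1; }
    }
    if (!started)                    /* all digits were zero */
        *p++ = '0';
    *p = '\0';
    return (size_t)(p - out);
}
```

The byte-reversal that was painful to do with SSE2 corresponds to the descending loop here: the most significant digits sit at the highest address.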
Quote
Good. So we're paring down the number of cases that actually need to be presented to the user here.
That should make your life a little bit easier, @guga.
Yes. I don't see the purpose of forcing the function to output +0 or -0. It already exports the proper error messages and message codes related to the FPU.
Today I finished all the main representations and their corresponding flags. The user now has full control of the rounding mode and the number of digits to output, e.g. rounding numbers such as 1.99999965858 to 2.0, truncating the output to a certain number of digits, or padding the trailing digits with 0 (if someone needs this for alignment). It works for both scientific and non-scientific format.
Next I'll finish the cases of negative exponents with a magnitude below 18 (e.g. 1.5e-16), so they can be represented as 0.00000000000000015. Then I'll clean up the code and port it to MASM.
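The negative-exponent case mentioned above boils down to writing "0.", then (-exponent - 1) zeros, then the significant digits. A small C sketch of that step, assuming the digits and the decimal exponent have already been extracted; the function name and parameters are hypothetical and not taken from guga's routine:

```c
#include <stddef.h>
#include <string.h>

/* digits: significant decimal digits, e.g. "15" for 1.5
   exp10 : decimal exponent of the first digit, e.g. -16 for 1.5e-16
   out   : receives "0." + leading zeros + digits; cap is its capacity
   Returns the length written, or 0 if exp10 >= 0 or the buffer is too small.
   Illustrative helper only. */
static size_t sci_to_fixed_neg(const char *digits, int exp10,
                               char *out, size_t cap)
{
    size_t nd = strlen(digits);
    size_t zeros, need;

    if (exp10 >= 0)
        return 0;                     /* only negative exponents handled */

    zeros = (size_t)(-exp10 - 1);     /* zeros between "0." and the digits */
    need  = 2 + zeros + nd + 1;       /* "0." + zeros + digits + NUL */
    if (need > cap)
        return 0;

    out[0] = '0';
    out[1] = '.';
    memset(out + 2, '0', zeros);
    memcpy(out + 2 + zeros, digits, nd);
    out[2 + zeros + nd] = '\0';
    return need - 1;
}
```

With digits "15" and exponent -16 this writes 15 zeros after the decimal point followed by "15".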
Later I'll adapt a function I created to mimic the behavior of ws_printf (and printf), to allow input such as printf("%f", myFloat), while still giving the user full control over the output.
A general function that converts FPU values to strings is helpful for a wide range of things people create, such as showing values in controls (edit controls, static controls, etc.) or using them in huge tables. I believe a single general function is easier than forcing people to write their own FPU conversion routine for each need.
Quote from: guga on May 28, 2025, 08:35:56 AM
The BCD part of the code is necessary to unpack the values stored by the FBSTP opcode.
Yes, but again, who is going to use FBSTP? That would indicate someone who is doing computations in BCD; who does that nowadays?
The only use case I can think of is someone either running a COBOL program or using a COBOL emulator, which might use BCD as a numeric storage format.
... unless you plan on advertising your FPU functions as "capable of handling all FPU data types" ...
Take a look at Raymond's FpuFLtoA here (https://masm32.com/masmcode/rayfil/downloads/Fpulib2_341.zip). I don't know of any faster way to convert these packed values stored in a TenByte than his method (which I adapted to work with SSE2 only in this specific part of the code).
Quote
... unless you plan on advertising your FPU functions as "capable of handling all FPU data types" ...
I did that already. It handles all FPU types and exports the error messages accordingly (except for +0 and -0, which are useless, IMHO; after all, all they represent is a kind of +INF and -INF, which the function already handles). The idea is to make a general function that is as simple to use as possible, yet able to output all the types of messages (and values) needed to work with the FPU.
Quote from: guga on May 28, 2025, 10:06:06 AM
Take a look at Raymond's FpuFLtoA here (https://masm32.com/masmcode/rayfil/downloads/Fpulib2_341.zip). I don't know of any faster way to convert these packed values stored in a TenByte than his method (which I adapted to work with SSE2 only in this specific part of the code).
So again, at the risk of sounding like a broken record[1]:
Who cares how fast the conversions are?
[1] Probably not meaningful to those who've never heard a 78 rpm record with a crack ...
No problem, but... why would I take Raymond's function and make it slower? What's the point of taking a function he built around an opcode made for this type of conversion (fbstp) and making the result dozens of times slower? If you don't use fbstp, the alternative would be to convert each byte to decimal with opcodes like div, looping up to 10 times to transform each byte into decimal ASCII. Do you see how much slower this would make the function? Why would I ruin his code like this, if there is already a specific opcode that does the conversion all at once?
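For comparison, the div-based alternative being argued against looks roughly like this when written in C. This is a generic sketch of the repeated-division technique, not Raymond's or guga's code, and the function name is made up; it makes no claim about actual timings, it only shows that the loop needs one division per output digit.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an unsigned integer to decimal ASCII by repeated division by 10.
   `out` must hold at least 21 bytes (20 digits + NUL).
   Returns the number of digits written. Illustrative helper only. */
static size_t u64_to_ascii_div(uint64_t v, char *out)
{
    char tmp[20];                 /* digits come out least significant first */
    size_t n = 0;
    size_t i;

    do {
        tmp[n++] = (char)('0' + (int)(v % 10));  /* one division per digit */
        v /= 10;
    } while (v != 0);

    for (i = 0; i < n; ++i)       /* reverse into the output buffer */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

An 18-digit value takes 18 trips through the loop plus a reversal pass, which is the per-digit work the single FBSTP store is meant to replace.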
The function I created (adapted from Raymond's) is for general use. I can't assume that someone will use it solely and exclusively to put text (Float ascii, in fact) in an edit box. If the function is for general use, it has to be effective for other uses, especially if the user uses it for large databases, for example. Otherwise, it would be easier to just use printf to generate the float ascii.
The core of the function is already done. Why would I change it and make it slower? There is no harm in making the function faster for general uses.
I'm not saying, obviously, that you should make it slower. That would be exceedingly stupid.
I'm just saying you shouldn't obsess over how fast it is. Of course you're going to code it in an efficient way that'll probably be at least as fast as Raymond's code (which BTW isn't particularly fast) or faster.
Besides, with your 0.1% of the share of the assembler "market", I'm not sure your efforts make all that much difference anyway.
https://www.youtube.com/watch?v=kw-U6smcLzk
Hi all,
I wrote this routine 7 years ago because I needed 256 real4 values at once at 60 Hz in real time on my screen.
This SIMD routine is 28 times faster than sprintf for real4 values on my old PC.
Edit: latest version
align 4
Real4_2_ASCII proc Real4string:DWORD,floatnumber:REAL4
mov ecx,Real4string
mov eax,floatnumber
test eax,eax
je message_PosZero
cmp eax,080000000h
je message_NegZero
; check floating-point exceptions
cmp eax,07F7FFFFFh
ja TestExceptions
ProcessReal4:
mov byte ptr [ecx],020h ; Write " " to the string
test eax,80000000h ; Check the sign bit
jz No_SignBit
mov byte ptr [ecx],02dh ; write "-" character to the string
and floatnumber,7FFFFFFFh ; Make it an absolute value
and eax,7FFFFFFFh ; Remove the sign bit
No_SignBit:
; Fast Log10(x)-1 routine to calculate the number of digits
shr eax,23 ; Get the 8bit exponent
sub eax,127 ; Adjust for the exponent bias
cvtsi2ss xmm0,eax ; Convert int32 to real4
mulss xmm0,Log10_2 ; Approximate Log10(x) == Log10(2) * exponent bits == 0.30102999566398119 * exponent bits
addss xmm0,PowersOfTen[37*4] ; Add one to get the approximated number of digits from the floating point value
cvtss2si eax,xmm0 ; Convert real4 to int32
mov ecx,eax ; Save approximated number of digits
mov edx,38+1 ; Highest possible number of digits + 1
add eax,edx ; Get the Power Of Ten offset for the digits rounding check
; Now do the check to get the exact rounded number of digits from the floating point value
; We can do this by comparing it to the closest Power Of Ten below the floating point value
movss xmm0,floatnumber
comiss xmm0,PowersOfTen[eax*4-4]
jc ExactLog10xMin1 ; Is it below the closest Power Of Ten?
cmp ecx,edx ; It is above, also check the approximated number of digits
je ExactLog10xMin1 ; Is it not above the highest possible number of digits skip adjustment
dec edx ; Adjust the number of digits by subtracting one
ExactLog10xMin1: ; Now we are almost done getting the exact number of digits
; There is one exception, the lowest Power Of Ten check value is out of range ( 1.0E+39 )
; See the last added value in the PowersOfTen table, it's used for the out of range check
sub edx,ecx ; edx holds the offsets for the PowersOfTen table and the scientific notation string table
mulss xmm0,PowersOfTen[edx*4] ; Get the calculated Power Of Ten value and multiply it with the floating point value
comiss xmm0,PowersOfTen[38*4] ; Compare to 10.0
jnc ExactNumDigits ; Is it below 10.0?
inc edx ; It is below, adjust the offset for the scientific notation string
mulss xmm0,PowersOfTen[38*4] ; Adjust decimal position ( it also solves the out of range issue )
ExactNumDigits: ; At this point we have the exact number of digits from the floating point value
mulss xmm0,PowersOfTen[42*4] ; Get the 7 significant digits from the range -1.175494E-38 to 3.402823E+38
cvtss2si eax,xmm0 ; We want a Natural Number
cvtsi2ss xmm0,eax ; So, remove the digits after the decimal point
shufps xmm0,xmm0,0 ; Splat...., make 4 copies from the real4 number
movaps xmm1,xmm0 ; Copy to a total of 8 copies
mulps xmm0,dividers ; Produce base10 numbers
mulps xmm1,dividers+16 ; Produce base10 numbers
movaps xmm2,xmm0 ; Copy them
movaps xmm3,xmm1 ; Copy them
mulps xmm2,div10 ; Nullify least significant base10 numbers
mulps xmm3,div10 ; Nullify least significant base10 numbers
cvttps2dq xmm0,xmm0 ; Truncate remaining fractions
cvttps2dq xmm1,xmm1 ; Truncate remaining fractions
cvtdq2ps xmm0,xmm0 ; Convert back to real4
cvtdq2ps xmm1,xmm1 ; Convert back to real4
cvttps2dq xmm2,xmm2 ; Truncate remaining fractions
cvttps2dq xmm3,xmm3 ; Truncate remaining fractions
cvtdq2ps xmm2,xmm2 ; Convert back to real4
cvtdq2ps xmm3,xmm3 ; Convert back to real4
mulps xmm2,mul10 ; Move them back in the correct base10 position
mulps xmm3,mul10 ; Move them back in the correct base10 position
subps xmm0,xmm2 ; Subtract to get the extracted digits
subps xmm1,xmm3 ; Subtract to get the extracted digits
cvttps2dq xmm0,xmm0 ; Convert back to int32
cvttps2dq xmm1,xmm1 ; Convert back to int32
shufps xmm0,xmm0,11100001b ; Swap the 2 first digits, the first digit is always zero
; Now we can write the decimal point for free ( no more memory swaps )
; Using a prepared ASCIIconverter constant
packssdw xmm0,xmm1 ; Pack 8 x 32bit to 8 x 16bit ( signed but, we are within the limit )
packuswb xmm0,xmm0 ; Pack 8 x 16bit to 16 x 8bit unsigned
movq xmm1,ASCIIconverterE ; Prepared to insert a decimal point and convert to ASCII in one go
paddb xmm1,xmm0 ; Convert the number to ASCII
mov edx,Scientific_sz[edx*4]; Get the 4 byte scientific notation string
mov ecx,Real4string
movq qword ptr [ecx+1],xmm1 ; Write the 7 significant digits
mov [ecx+9],edx ; Write the scientific notation string
; mov byte ptr [ecx+13],0 ; Terminate the string ( not needed we are inside the 16 bytes )
ret
TestExceptions:
cmp eax,07F800000h
je message_Inf
cmp eax,07F800001h
je message_SNaN
cmp eax,07FBFFFFFh
je message_SNaN
cmp eax,07FC00000h
je message_QNaN
cmp eax,07FFFFFFFh
je message_QNaN
cmp eax,0FFC00001h
je message_QnegNaN
cmp eax,0FFBFFFFFh
je message_SnegNaN
cmp eax,0FF800001h
je message_SnegNaN
cmp eax,0FFC00000h
je message_Indeterm
cmp eax,0FF800000h
je message_NegInf
cmp eax,0FFFFFFFFh
je message_QnegNaN
jmp ProcessReal4 ; No exceptions found, proceed...
message_QnegNaN:
movaps xmm0,oword ptr szQnegNaN
movaps oword ptr [ecx],xmm0
ret
message_SnegNaN:
movaps xmm0,oword ptr szSnegNaN
movaps oword ptr [ecx],xmm0
ret
message_Indeterm:
movaps xmm0,oword ptr szIndeterm
movaps oword ptr [ecx],xmm0
ret
message_NegInf:
movaps xmm0,oword ptr szNegInf
movaps oword ptr [ecx],xmm0
ret
;message_NegNorm:
; movaps xmm0,oword ptr szNegNorm
; movaps oword ptr [ecx],xmm0
; ret
;message_Norm:
; movaps xmm0,oword ptr szNorm
; movaps oword ptr [ecx],xmm0
; ret
message_Inf:
movaps xmm0,oword ptr szInf
movaps oword ptr [ecx],xmm0
ret
message_SNaN:
movaps xmm0,oword ptr szSNaN
movaps oword ptr [ecx],xmm0
ret
message_QNaN:
movaps xmm0,oword ptr szQNaN
movaps oword ptr [ecx],xmm0
ret
message_PosZero:
movaps xmm0,oword ptr szPosZero
movaps oword ptr [ecx],xmm0
ret
message_NegZero:
movaps xmm0,oword ptr szNegZero
movaps oword ptr [ecx],xmm0
ret
Real4_2_ASCII endp
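The "Fast Log10(x)-1" step in the routine above rests on the fact that the binary exponent times log10(2) approximates the decimal exponent, and the estimate can then be corrected with one comparison against a power of ten. The C sketch below illustrates that idea only; it is not a translation of Real4_2_ASCII, it replaces the PowersOfTen table with a small loop, and it assumes a normal, finite, non-zero input. The function names are made up.

```c
#include <stdint.h>
#include <string.h>

/* 10^n by repeated multiply/divide; stands in for a lookup table. */
static double pow10_int(int n)
{
    double r = 1.0;
    for (; n > 0; --n) r *= 10.0;
    for (; n < 0; ++n) r /= 10.0;
    return r;
}

/* floor(log10(|x|)) for a normal, finite, non-zero float.
   Step 1: estimate from the 8-bit exponent field.
   Step 2: correct by at most one with a single comparison. */
static int decimal_exponent(float x)
{
    uint32_t bits;
    int e2, est;
    double t;

    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;               /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);       /* x is now |x| */

    e2  = (int)(bits >> 23) - 127;     /* unbiased binary exponent */
    t   = e2 * 0.30102999566398119;    /* e2 * log10(2) */
    est = (int)t;
    if (t < est)                       /* make the truncation a floor */
        --est;

    /* 2^e2 <= |x| < 2^(e2+1), so the estimate is never too high
       and is too low by at most one. */
    if ((double)x >= pow10_int(est + 1))
        ++est;
    return est;
}
```

For example, 10.0f has binary exponent 3, which gives an estimate of 0; the comparison against 10^1 then bumps it to 1.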
Tks a lot, Siekmanski.
I'll take a look.
Quote
64 bit exp: 4 float: 1.000000e+004 power10: 1.000000e+002 result: 1000000.000000
64 bit exp: 0 float: 1.000000e-045
mxcsr_register: 1FA2h exp: 0 mantissa: 00000001h
PowerOfTen: 1.401298e-045 00000001h
SIMD Real4 to ASCII conversion by Siekmanski 2018.
1000000 calls per Run for the Cycle counter and the Routine timer.
AMD Ryzen 5 2400G with Radeon Vega Graphics
Routine timers running now....
Real4_2_ASCII Cycles: 86 RoutineTime: 0.021150000 seconds
sprintf Cycles: 1960 RoutineTime: 0.554756400 seconds
Result Real4_2_ASCII: 3.402823e+38
Result sprintf : 3.402823e+038
Press any key to continue...
Btw, this article (https://randomascii.wordpress.com/2012/01/11/tricks-with-the-floating-point-format/) is really interesting.
Great code Siekmanski :thumbsup: