The MASM Forum

General => The Laboratory => Topic started by: raymond on January 31, 2025, 02:07:46 PM

Title: BINARY INTEGER SQUARE ROOTS
Post by: raymond on January 31, 2025, 02:07:46 PM
Some testing has been performed lately to compare the speed of different procedures to extract square roots. Since one of the purposes was to search for prime numbers, and prime numbers are integers, this prompted me to revive an old procedure to extract the 'integer' square root of 'integer' numbers with 'integer' code instructions, and compare its speed to the general use of the FPU.

Although it may be three times slower than the FPU for the largest 31-bit integers, some may be interested in seeing this simple algo. It uses only the EAX, ECX and EDX registers (and it could be easily adapted to the 64-bit system by using the three equivalent 64-bit registers, as long as the bsr and bts instructions are still in use).

This is an adaptation of a procedure which was described in an IBM user's manual for their mechanical calculator marketed in the mid-50s. Not being a mathematician, I don't know where they may have obtained such a procedure nor can I explain why it works. I would assume that it may have been developed by the ancients to use with an "abacus".

;this "getsqr" procedure would be called with the target integer already in the EAX register
;the programmer would be responsible to check it's validity
;'negative' integers would simply be handled as positive 32-bit integers

getsqr:
      bsr   edx,eax     ;get position number of most significant digit in EAX
      xor   ecx,ecx
      bts   ecx,edx     ;ECX to be used for trailing bit position
      shr   edx,1 ;check if position was in even or odd number bit
      .if   CARRY?      ;if EDX was odd, the target has even number of significant bits
            shr   ecx,1 ;shift ECX, it must start at the next even position
      .endif
      xor   edx,edx     ;EDX to be used for sqr

sq1:
      add   edx,ecx
      .if   edx > eax
            sub   edx,ecx
            jmp   sq2
      .endif
      sub   eax,edx
      add   edx,ecx
sq2:
      shr   edx,1
      shr   ecx,1
      jc    sq3         ;last integer bit of root has been set
      shr   ecx,1       ;shift under next even position
      jmp   sq1

sq3:

;at this point, the EDX register would contain the truncated square root result.
;if a rounded result would be preferred, the following code must then be used
; .if eax > edx
; inc edx
; .endif

      ret               ;sqr answer is returned in the EDX register
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: Biterider on January 31, 2025, 05:48:07 PM
Hi raymond
Very interesting  :thumbsup:
Thank you.

Regards, Biterider
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: HSE on January 31, 2025, 09:56:59 PM
Hi Raymond!

Quote from: raymond on January 31, 2025, 02:07:46 PMand it could be easily adapted to the 64-bit system by using the three equivalent 64-bit registers, as long as the bsr and bts instructions are still in use

That is correct  :thumbsup:

Regards, HSE
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jj2007 on February 01, 2025, 01:28:52 AM
Quote from: raymond on January 31, 2025, 02:07:46 PMNot being a mathematician

I always thought you were the only mathematician in this forum, Raymond :cool:
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: daydreamer on February 01, 2025, 04:24:58 AM
Thanks, great raymond  :thumbsup:
It Shows similarity to a division proc based on how human calculate many digits with pen & paper in school
Héctor 64 bit conversion :thumbsup:

Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: raymond on March 06, 2025, 10:58:42 AM
Just thought of a detail which I had not mentioned in my original post. Using the given code, regardless if you modify it or not to get a "rounded" square root, the EAX register would not be altered. Then, upon return, if EAX=0, the result in the EDX register would be the exact square root.

Without the rounding code, the resulting content of the EAX register would be equivalent to the remainder of a division of the target number by the truncated square root.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: ognil on March 06, 2025, 12:13:49 PM
Hi,
To compute the square root of an integer using SSE instructions in MASM64 assembly, follow these steps:

Load the integer into a general-purpose register (e.g., RAX).
Convert the integer to a double-precision float using CVTSI2SD.
Compute the square root using the SQRTSD instruction.
Result is stored in an XMM register as a double-precision float.

Here's the code example:

.data
num QWORD 25  ; Example integer (64-bit)
.code
main proc
    ; Load the integer into RAX
    mov rax, num
    ; Convert RAX to double-precision float in XMM0
    cvtsi2sd xmm0, rax
    ; Compute square root of XMM0
    sqrtsd xmm0, xmm0
    ; The result is now in XMM0 as a double-precision float
    ; Optionally convert back to integer (truncates decimal part)
    ; cvttsd2si rax, xmm0
    ret
main endp

Explanation:
cvtsi2sd xmm0, rax : Converts the 64-bit integer in RAX to a double-precision float in XMM0.
sqrtsd xmm0, xmm0 : Computes the square root of the value in XMM0, storing the result back in XMM0.

Optional Conversion : Use cvttsd2si to convert the result back to an integer if needed, but note this truncates the decimal part.
This approach handles 64-bit integers accurately using double-precision floating-point operations.
For 32-bit integers, the code remains valid as the upper bits are zero-extended. :biggrin:
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 06, 2025, 05:16:55 PM
Hey, since we're now posting totally trivial code examples (yours boils down to really just one instruction: sqrtsd xmm0, xmm0), I'll offer my contribution:

    fsqrt
    fistp  integerSqrt

See how simple that is?
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jj2007 on March 06, 2025, 08:19:56 PM
Quote from: ognil on March 06, 2025, 12:13:49 PMCompute the square root using the SQRTSD instruction.

Eleven years ago, in the Lab (https://masm32.com/board/index.php?topic=3125.msg32376#msg32376)

Quote from: NoCforMe on March 06, 2025, 05:16:55 PMyours boils down to really just one instruction: sqrtsd xmm0, xmm0

Right.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: ognil on March 07, 2025, 02:07:44 AM
QuoteEleven years ago, in the Lab

Тhanks jj, :thumbsup:
I saw MichaelW's post and was immediately disappointed and lost interest in it.
For me, posting code without a detailed explanation of each line and instruction has zero value due to the lack of additional time and desire to study it. The same goes for the macros that are used.
This is another example of bad programming practice that is widely used in this forum. :sad:
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 07, 2025, 06:02:54 AM
Quote from: ognil on March 07, 2025, 02:07:44 AM
QuoteEleven years ago, in the Lab

Тhanks jj, :thumbsup:

You completely missed the point here.
Your example was completely trivial. Therefore not helpful.
The discussion was about finding integer square roots without using the FPU (and therefore without using xmm instructions either).
Seems to have flown right over your head ...
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: zedd151 on March 07, 2025, 06:41:40 AM
Quote from: NoCforMe on March 07, 2025, 06:02:54 AMYou completely missed the point here.
...
Seems to have flown right over your head ...

It was just another opportunity for him to take shots at the members here:
Quote from: ognil on March 07, 2025, 02:07:44 AMThis is another example of bad programming practice that is widely used in this forum. :sad:

:rolleyes:  :rolleyes:
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 07, 2025, 06:54:03 AM
Quote from: zedd151 on March 07, 2025, 06:41:40 AM
Quote from: NoCforMe on March 07, 2025, 06:02:54 AMYou completely missed the point here.
...
Seems to have flown right over your head ...

It was just another opportunity for him to take shots at the members here:
Quote from: ognil on March 07, 2025, 02:07:44 AMThis is another example of bad programming practice that is widely used in this forum. :sad:

:rolleyes:  :rolleyes:


Plus I'm sure it came from some "AI" source ...
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: ognil on March 07, 2025, 07:41:17 AM
QuoteYou completely missed the point here.
Your example was completely trivial. Therefore not helpful.
The discussion was about finding integer square roots without using the FPU (and therefore without using xmm instructions either).
Seems to have flown right over your head ...

Thank you  NoCforMe for controlling your emotions, :badgrin:
To completely calm down, this code is specially created for you. :biggrin:

To compute the integer square root of a 64-bit integer without using SSE or FPU instructions in 64 assembly, we can use a binary search approach.
The algorithm repeatedly narrows down the range of possible square roots by comparing the square of the midpoint with the target number. Here's the step-by-step explanation and the corresponding assembly code:

Approach
Binary Search : The algorithm maintains a range [low, high] and iteratively computes the midpoint mid of this range.
Square Check : For each midpoint mid, compute mid^2 and compare it with the target number n.
Adjust Range : If mid^2 is less than or equal to n, update the lower bound to mid + 1 and keep track of the largest valid mid. If mid^2 exceeds n, adjust the upper bound to mid - 1.
Termination : The loop continues until the range is exhausted, at which point the largest valid mid is the integer square root.

Solutionisqrt:
    push    rbx              ; Save callee-saved register
    mov     rcx, rdi        ; rcx = n (input)
    xor     rbx, rbx        ; rbx = result (initially 0)
    mov     rdi, 0          ; rdi = low (initially 0)
    mov     rsi, rcx        ; rsi = high (initially n)
loop_start:
 ; Calculate mid = low + (high - low) / 2
    mov     rax, rsi        ; rax = high
    sub     rax, rdi        ; rax = high - low
    shr     rax, 1          ; rax = (high - low) / 2
    add     rax, rdi        ; rax = mid
 ; Compute mid^2 and check against n
    mov     r8, rax          ; r8 = mid
    mul     rax              ; rdx:rax = mid^2
    cmp     rdx, 0          ; Check if high 64 bits are zero
    jne     mid_too_big      ; If not, mid^2 > n
    cmp     rax, rcx            ; Compare low 64 bits with n
    jg      mid_too_big      ; If mid^2 > n
; Update result and adjust low
    mov     rbx, r8 ; Update result to mid
    add     r8, 1              ; mid + 1
    mov     rdi, r8             ; low = mid + 1
    jmp     next_iter
mid_too_big:
; Adjust high
    sub     r8, 1               ; mid - 1
    mov     rsi, r8             ; high = mid - 1
next_iter:
; Check if low <= high
    cmp     rdi, rsi
    jle     loop_start
; Return result
    mov     rax, rbx            ; Move result to rax
    pop     rbx                 ; Restore callee-saved register
    ret

Explanation
Initialization : The input number n is stored in rcx, and the initial range [low, high] is set to [0, n].
Midpoint Calculation : The midpoint mid is computed as low + (high - low) / 2 to avoid overflow.
Square Check : The square of mid is calculated using the mul instruction, which produces a 128-bit result in rdx:rax. If the high 64 bits (rdx) are non-zero, mid^2 exceeds n. Otherwise, compare the low 64 bits (rax) with n.
Adjust Range : Depending on whether mid^2 is within bounds, adjust low or high and update the result if a valid midpoint is found.
Termination : The loop exits when low exceeds high, and the result is returned in rax.
This approach efficiently narrows down the possible values using binary search, ensuring logarithmic time complexity relative to the input size. :smiley: 
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jj2007 on March 07, 2025, 08:19:22 AM
Now I am bit insecure whether Ognil is really Lingo, because that code works indeed :cool:
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: zedd151 on March 07, 2025, 08:24:47 AM
Quote from: jj2007 on March 07, 2025, 08:19:22 AMNow I am bit insecure whether Ognil is really Lingo, because that code works indeed :cool:
include \masm64\include64\masm64rt.inc

.data

.code

start proc

mov rdi, 1000000
call isqrt

    invoke ExitProcess, 0
start endp

align 16
isqrt:
    push    rbx                 ; Save callee-saved register
    mov     rcx, rdi            ; rcx = n (input)   ; input number passed in rdi
    xor     rbx, rbx            ; rbx = result (initially 0)
    mov     rdi, 0              ; rdi = low (initially 0)
    mov     rsi, rcx            ; rsi = high (initially n)
loop_start:
 ; Calculate mid = low + (high - low) / 2
    mov     rax, rsi            ; rax = high
    sub     rax, rdi            ; rax = high - low
    shr     rax, 1              ; rax = (high - low) / 2
    add     rax, rdi           ; rax = mid
 ; Compute mid^2 and check against n
    mov     r8, rax             ; r8 = mid
    mul     rax                 ; rdx:rax = mid^2
    cmp     rdx, 0              ; Check if high 64 bits are zero
    jne     mid_too_big         ; If not, mid^2 > n
    cmp     rax, rcx               ; Compare low 64 bits with n
    jg      mid_too_big         ; If mid^2 > n
; Update result and adjust low
    mov     rbx, r8        ; Update result to mid
    add     r8, 1                  ; mid + 1
    mov     rdi, r8             ; low = mid + 1
    jmp     next_iter
mid_too_big:
; Adjust high
    sub     r8, 1               ; mid - 1
    mov     rsi, r8             ; high = mid - 1
next_iter:
; Check if low <= high
    cmp     rdi, rsi
    jle     loop_start
; Return result
    mov     rax, rbx            ; Move result to rax
    pop     rbx                 ; Restore callee-saved register
    ret

end

  :biggrin:
I tested it too.
He forgot to mention that the input number is passed in rdi. I input 1000000, results of course 1000
I only slapped this together, ran and and inspected the results in rax in the debugger x64dbg.
If you want an interface or some kind of display, maybe he can provide it.  :tongue:

edit: I had the isqrt function within start procedure, I fixed that little issue.  :rolleyes:
I was wondering why there was a LEAVE before the return there.   :biggrin:
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 07, 2025, 09:15:13 AM
Quote from: jj2007 on March 07, 2025, 08:19:22 AMNow I am bit insecure whether Ognil is really Lingo, because that code works indeed :cool:

For that you can thank "AI", not Lingo/Ognil.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: ognil on March 07, 2025, 10:51:25 AM
Thanks guys for the compliments, :badgrin:

but I don't think Raymond, who started this post with interesting code (unlike you), will like them very much... :sad:   
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 07, 2025, 11:36:06 AM
You know, you really are a total ass, whoever you are.
I'm pretty sure I speak for the congregation on this.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: zedd151 on March 07, 2025, 12:49:42 PM
Quote from: ognil on March 07, 2025, 10:51:25 AMThanks guys for the compliments,
but I don't think Raymond, who started this post with interesting code (unlike you), will like them very much... :sad: 
And I do not think that MichaelW, who has not even been around here (since November 02, 2018) to defend himself against folks like you, will much like your comments either.

Quote from: ognil on March 07, 2025, 02:07:44 AMI saw MichaelW's post and was immediately disappointed and lost interest in it.
...
This is another example of bad programming practice that is widely used in this forum. :sad:

Conversely, raymond still frequents the forum almost daily and can speak for himself.

Apologies to raymond, for the nonsense that has been going on here in this thread by myself and others.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jack on March 07, 2025, 01:00:30 PM
the code posted by ognil makes me think that it's original target was Linux, in Linux rdi is the first integer argument to a function, whereas in Windows it's rcx
also the code fails to preserve the non-volatile registers rdi and rsi
also, raymond's code 2 times faster
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 07, 2025, 01:07:22 PM
Quote from: jack on March 07, 2025, 01:00:30 PMthe code posted by ognil makes me think that it's original target was Linux, in Linux rdi is the first integer argument to a function, whereas in Windows it's rcx
also the code fails to preserve the non-volatile registers rdi and rsi
also, raymond's code 2 times faster

So whaddya expect from "AI"?
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: daydreamer on March 07, 2025, 07:23:36 PM
@Zedd
Sqrt(1000000) = 1000,that's kinda approximation I want to try with string reducing approximately by halved zeros after first digit

for 2 to sqrt(n)


Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jj2007 on March 07, 2025, 08:22:49 PM
Quote from: ognil on March 07, 2025, 10:51:25 AMThanks guys for the compliments, :badgrin:

Just for fun, I wrote a testbed to get some timings for your isqrt algo:

This program was assembled with UAsm64 in 64-bit format.
875 ticks for isqrt, result=1234567
797 ticks for isqrt, result=9876543
 31 ticks for fsqrt, result=1234567
 32 ticks for fsqrt, result=9876543

This program was assembled with ml64 in 64-bit format.
907 ticks for isqrt, result=1234567
828 ticks for isqrt, result=9876543
 15 ticks for fsqrt, result=1234567
 32 ticks for fsqrt, result=9876543
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jack on March 07, 2025, 10:01:38 PM
I made a simple test in FreeBasic, I replaced labels with numbers otherwise it may give a duplicate symbol error, also replaced rsi with r9 which is volatile
hope you guys won't mind
' compile with fbc64
#cmdline "-asm intel -w all -arch native -gen gcc -v"

Function asmsqrt Naked Cdecl(Byval n As Ulongint) As Ulongint
    Asm
        Push    rbx                 ' Save callee-saved register
        Push    rdi                 ' Save callee-saved register
        'mov     rcx, rdi           ' rcx = n (input)
        'n is already in register rcx
        Xor     rbx, rbx            ' rbx = result (initially 0)
        mov     rdi, 0              ' rdi = low (initially 0)
        mov     r9, rcx         ' r9 = high (initially n)
    0:
     ' Calculate mid = low + (high - low) / 2
        mov     rax, r9         ' rax = high
        Sub     rax, rdi            ' rax = high - low
        Shr     rax, 1              ' rax = (high - low) / 2
        Add     rax, rdi        ' rax = mid
     ' Compute mid^2 and check against n
        mov     r8, rax             ' r8 = mid
        mul     rax                 ' rdx:rax = mid^2
        cmp     rdx, 0              ' Check if high 64 bits are zero
        jne     1f      ' If not, mid^2 > n
        cmp     rax, rcx            ' Compare low 64 bits with n
        jg      1f      ' If mid^2 > n
    ' Update result and adjust low
        mov     rbx, r8     ' Update result to mid
        Add     r8, 1               ' mid + 1
        mov     rdi, r8             ' low = mid + 1
        jmp     2f
    1:
    ' Adjust high
        Sub     r8, 1               ' mid - 1
        mov     r9, r8             ' high = mid - 1
    2:
    ' Check if low <= high
        cmp     rdi, r9
        jle     0b
    ' Return result
        mov     rax, rbx            ' Move result to rax
        Pop     rdi                 ' Restore callee-saved register
        Pop     rbx                 ' Restore callee-saved register
        ret
    End Asm
End Function

Dim As Ulongint n, root
Dim As Longint sum
Dim As Double t

t=Timer
sum=0

For n=1 To 10000000
    root=asmsqrt(n)
    'root=fix(sqr(n))
    sum+=root
Next
t=timer-t
Print sum
Print t
Print "Press return to end"
Sleep
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: TimoVJL on March 07, 2025, 10:36:45 PM
Is isqrt for CPUs like ARM, that don't have FPU ?
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: daydreamer on March 07, 2025, 11:14:53 PM
Quote from: TimoVJL on March 07, 2025, 10:36:45 PMIs isqrt for CPUs like ARM, that don't have FPU ?
Maybe arduino and 8 bit cpus that is used in embedded?
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: TimoVJL on March 07, 2025, 11:53:11 PM
Quote from: daydreamer on March 07, 2025, 11:14:53 PM
Quote from: TimoVJL on March 07, 2025, 10:36:45 PMIs isqrt for CPUs like ARM, that don't have FPU ?
Maybe arduino and 8 bit cpus that is used in embedded?
Do you even know, what is ARM CPU family ?
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: ognil on March 08, 2025, 03:41:19 AM
QuoteJust for fun, I wrote a testbed to get some timings for your isqrt algo:

Thank you jj for the compliment and recognition of my code for actively participating in your pleasures. :badgrin:

I became a C++ guy from university with two projects implemented in my bank, related to transferring all S.W.I.F.T. daily transactions to the bank's general ledger.
Therefore, it is important for me that the program runs without errors, since the hardware takes care of the speed.

With the availability of modern hardware, optimizing the code has long been a thing of the past and is now just a distant memory for old people. :sad:
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 08, 2025, 07:04:05 AM
Quote from: ognil on March 08, 2025, 03:41:19 AMWith the availability of modern hardware, optimizing the code has long been a thing of the past and is now just a distant memory for old people. :sad:

Hah! What you seem to have missed--what sailed right over your head--is that your "modern hardware" code (using XMM) was a couple orders of magnitude SLOWER than legacy (FPU) code, as demonstrated by JJ's timings above.

Do you even bother reading replies here? One gets the feeling that you're like a "write-only" bot.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jj2007 on March 08, 2025, 08:31:11 AM
One more, using FastMath (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1526)

282 ms for the FPU
608 ms for FastMath
X      Y fpu          Y FastMath
990.0  31.46427        31.46427
991.0  31.48015        31.48015
992.0  31.49603        31.49603
993.0  31.51190        31.51190
994.0  31.52777        31.52777
995.0  31.54362        31.54362
996.0  31.55947        31.55947
997.0  31.57531        31.57531
998.0  31.59114        31.59114
999.0  31.60696        31.60696
1000.0  31.62278        31.62278

The fpu is pretty fast :biggrin:
Twice as fast as the FastMath version, and a factor 33 faster than Ognil/Lingo's version.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: ognil on March 08, 2025, 08:44:54 AM
NoCforMe,

I don't know what you're taking to calm your nerves, but maybe they need to increase your dosage because your nervous depression doesn't allow you to differentiate between hardware (something like NVMe SSD) and software (something like XMM). :sad:

JJ's "speed tests" are always manipulated depending on his current emotional state and are practically completely useless and unnecessary to anyone.

I'm not Lingo and I've already said that I don't care about optimizing code speed.

I suggest you stop here and go for a walk in nature for a while.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: jj2007 on March 08, 2025, 09:52:25 AM
Quote from: ognil on March 08, 2025, 08:44:54 AMJJ's "speed tests" are always manipulated

Prove it, or I'll call you a liar :biggrin:

Your code is hopelessly slow, but it's a nice intellectual exercise, so I liked it. And I don't care if it's AI or genuine ;-)
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: NoCforMe on March 08, 2025, 10:57:18 AM
Quote from: ognil on March 08, 2025, 08:44:54 AMyour nervous depression doesn't allow you to differentiate between hardware (something like NVMe SSD) and software (something like XMM)

What???

That's provably, laughably false: XMM refers to a set of 128-bit registers (therefore nothing like software. (And technically they fall under the heading of "SSE".)

Your credibility here is slipping fast, man.
Title: Re: BINARY INTEGER SQUARE ROOTS
Post by: ognil on March 08, 2025, 11:32:24 AM
NoCforMe,

I agree with you that you, jj and zed are the greatest programmers here and everyone else on the forum (including me) should take your advice as law and bow down to the elegance and speed of your code.

Now please calm down and take a walk. :thumbsup: