BINARY INTEGER SQUARE ROOTS

raymond · January 31, 2025, 02:07:46 PM

Some testing has been performed lately to compare the speed of different procedures to extract square roots. Since one of the purposes was to search for prime numbers, and prime numbers are integers, this prompted me to revive an old procedure to extract the 'integer' square root of 'integer' numbers with 'integer' code instructions, and compare its speed to the general use of the FPU.

Although it may be three times slower than the FPU for the largest 31-bit integers, some may be interested in seeing this simple algo. It uses only the EAX, ECX and EDX registers (and it could be easily adapted to the 64-bit system by using the three equivalent 64-bit registers, as long as the bsr and bts instructions are still in use).

This is an adaptation of a procedure which was described in an IBM user's manual for their mechanical calculator marketed in the mid-50s. Not being a mathematician, I don't know where they may have obtained such a procedure nor can I explain why it works. I would assume that it may have been developed by the ancients to use with an "abacus".

Code Select

;this "getsqr" procedure would be called with the target integer already in the EAX register
;the programmer would be responsible to check it's validity
;'negative' integers would simply be handled as positive 32-bit integers

getsqr:
      bsr   edx,eax     ;get position number of most significant digit in EAX
      xor   ecx,ecx
      bts   ecx,edx     ;ECX to be used for trailing bit position
      shr   edx,1	;check if position was in even or odd number bit
      .if   CARRY?      ;if EDX was odd, the target has even number of significant bits 
            shr   ecx,1 ;shift ECX, it must start at the next even position
      .endif
      xor   edx,edx     ;EDX to be used for sqr

sq1:
      add   edx,ecx
      .if   edx > eax
            sub   edx,ecx
            jmp   sq2
      .endif
      sub   eax,edx
      add   edx,ecx
sq2:
      shr   edx,1
      shr   ecx,1
      jc    sq3         ;last integer bit of root has been set
      shr   ecx,1       ;shift under next even position
      jmp   sq1

sq3:

;at this point, the EDX register would contain the truncated square root result.
;if a rounded result would be preferred, the following code must then be used
;	.if	eax > edx
;		inc	edx
;	.endif

      ret               ;sqr answer is returned in the EDX register

Biterider · January 31, 2025, 05:48:07 PM

Hi raymond
Very interesting

Thank you.

Regards, Biterider

HSE · January 31, 2025, 09:56:59 PM

Hi Raymond!

Quote from: raymond on January 31, 2025, 02:07:46 PMand it could be easily adapted to the 64-bit system by using the three equivalent 64-bit registers, as long as the bsr and bts instructions are still in use

That is correct

Regards, HSE

jj2007 · February 01, 2025, 01:28:52 AM

Quote from: raymond on January 31, 2025, 02:07:46 PMNot being a mathematician

I always thought you were the only mathematician in this forum, Raymond

daydreamer · February 01, 2025, 04:24:58 AM

Thanks, great raymond

It Shows similarity to a division proc based on how human calculate many digits with pen & paper in school
Héctor 64 bit conversion

raymond · March 06, 2025, 10:58:42 AM

Just thought of a detail which I had not mentioned in my original post. Using the given code, regardless if you modify it or not to get a "rounded" square root, the EAX register would not be altered. Then, upon return, if EAX=0, the result in the EDX register would be the exact square root.

Without the rounding code, the resulting content of the EAX register would be equivalent to the remainder of a division of the target number by the truncated square root.

ognil · March 06, 2025, 12:13:49 PM

Hi,
To compute the square root of an integer using SSE instructions in MASM64 assembly, follow these steps:

Load the integer into a general-purpose register (e.g., RAX).
Convert the integer to a double-precision float using CVTSI2SD.
Compute the square root using the SQRTSD instruction.
Result is stored in an XMM register as a double-precision float.

Here's the code example:

Code Select

.data
num QWORD 25  ; Example integer (64-bit)
.code
main proc
    ; Load the integer into RAX
    mov rax, num
    ; Convert RAX to double-precision float in XMM0
    cvtsi2sd xmm0, rax
    ; Compute square root of XMM0
    sqrtsd xmm0, xmm0
    ; The result is now in XMM0 as a double-precision float
    ; Optionally convert back to integer (truncates decimal part)
    ; cvttsd2si rax, xmm0
    ret
main endp

Explanation:
cvtsi2sd xmm0, rax : Converts the 64-bit integer in RAX to a double-precision float in XMM0.
sqrtsd xmm0, xmm0 : Computes the square root of the value in XMM0, storing the result back in XMM0.

Optional Conversion : Use cvttsd2si to convert the result back to an integer if needed, but note this truncates the decimal part.
This approach handles 64-bit integers accurately using double-precision floating-point operations.
For 32-bit integers, the code remains valid as the upper bits are zero-extended.

NoCforMe · March 06, 2025, 05:16:55 PM

Hey, since we're now posting totally trivial code examples (yours boils down to really just one instruction: sqrtsd xmm0, xmm0), I'll offer my contribution:

Code Select

    fsqrt
    fistp  integerSqrt

See how simple that is?

jj2007 · March 06, 2025, 08:19:56 PM

Quote from: ognil on March 06, 2025, 12:13:49 PMCompute the square root using the SQRTSD instruction.

Eleven years ago, in the Lab

Quote from: NoCforMe on March 06, 2025, 05:16:55 PMyours boils down to really just one instruction: sqrtsd xmm0, xmm0

Right.

ognil · March 07, 2025, 02:07:44 AM

QuoteEleven years ago, in the Lab

Тhanks jj,

I saw MichaelW's post and was immediately disappointed and lost interest in it.
For me, posting code without a detailed explanation of each line and instruction has zero value due to the lack of additional time and desire to study it. The same goes for the macros that are used.
This is another example of bad programming practice that is widely used in this forum.

NoCforMe · March 07, 2025, 06:02:54 AM

Quote from: ognil on March 07, 2025, 02:07:44 AM
QuoteEleven years ago, in the Lab

Тhanks jj,

You completely missed the point here.
Your example was completely trivial. Therefore not helpful.
The discussion was about finding integer square roots without using the FPU (and therefore without using xmm instructions either).
Seems to have flown right over your head ...

zedd · March 07, 2025, 06:41:40 AM

Quote from: NoCforMe on March 07, 2025, 06:02:54 AMYou completely missed the point here.
...
Seems to have flown right over your head ...

It was just another opportunity for him to take shots at the members here:

Quote from: ognil on March 07, 2025, 02:07:44 AMThis is another example of bad programming practice that is widely used in this forum.

NoCforMe · March 07, 2025, 06:54:03 AM

Quote from: zedd151 on March 07, 2025, 06:41:40 AM
Quote from: NoCforMe on March 07, 2025, 06:02:54 AMYou completely missed the point here.
...
Seems to have flown right over your head ...

It was just another opportunity for him to take shots at the members here:
Quote from: ognil on March 07, 2025, 02:07:44 AMThis is another example of bad programming practice that is widely used in this forum.

Plus I'm sure it came from some "AI" source ...

ognil · March 07, 2025, 07:41:17 AM

QuoteYou completely missed the point here.
Your example was completely trivial. Therefore not helpful.
The discussion was about finding integer square roots without using the FPU (and therefore without using xmm instructions either).
Seems to have flown right over your head ...

Thank you NoCforMe for controlling your emotions,

To completely calm down, this code is specially created for you.

To compute the integer square root of a 64-bit integer without using SSE or FPU instructions in 64 assembly, we can use a binary search approach.
The algorithm repeatedly narrows down the range of possible square roots by comparing the square of the midpoint with the target number. Here's the step-by-step explanation and the corresponding assembly code:

Approach
Binary Search : The algorithm maintains a range [low, high] and iteratively computes the midpoint mid of this range.
Square Check : For each midpoint mid, compute mid^2 and compare it with the target number n.
Adjust Range : If mid^2 is less than or equal to n, update the lower bound to mid + 1 and keep track of the largest valid mid. If mid^2 exceeds n, adjust the upper bound to mid - 1.
Termination : The loop continues until the range is exhausted, at which point the largest valid mid is the integer square root.

Solution

Code Select

isqrt:
    push    rbx             	; Save callee-saved register
    mov     rcx, rdi        	; rcx = n (input)
    xor     rbx, rbx        	; rbx = result (initially 0)
    mov     rdi, 0          	; rdi = low (initially 0)
    mov     rsi, rcx        	; rsi = high (initially n)
loop_start:
 ; Calculate mid = low + (high - low) / 2
    mov     rax, rsi        	; rax = high
    sub     rax, rdi        	; rax = high - low
    shr     rax, 1          	; rax = (high - low) / 2
    add     rax, rdi       	; rax = mid
 ; Compute mid^2 and check against n
    mov     r8, rax         	; r8 = mid
    mul     rax             	; rdx:rax = mid^2
    cmp     rdx, 0          	; Check if high 64 bits are zero
    jne     mid_too_big     	; If not, mid^2 > n
    cmp     rax, rcx           	; Compare low 64 bits with n
    jg      mid_too_big     	; If mid^2 > n
; Update result and adjust low
    mov     rbx, r8		; Update result to mid
    add     r8, 1              	; mid + 1
    mov     rdi, r8             ; low = mid + 1
    jmp     next_iter
mid_too_big:
; Adjust high
    sub     r8, 1               ; mid - 1
    mov     rsi, r8             ; high = mid - 1
next_iter:
; Check if low <= high
    cmp     rdi, rsi
    jle     loop_start
; Return result
    mov     rax, rbx            ; Move result to rax
    pop     rbx                 ; Restore callee-saved register
    ret

Explanation
Initialization : The input number n is stored in rcx, and the initial range [low, high] is set to [0, n].
Midpoint Calculation : The midpoint mid is computed as low + (high - low) / 2 to avoid overflow.
Square Check : The square of mid is calculated using the mul instruction, which produces a 128-bit result in rdx:rax. If the high 64 bits (rdx) are non-zero, mid^2 exceeds n. Otherwise, compare the low 64 bits (rax) with n.
Adjust Range : Depending on whether mid^2 is within bounds, adjust low or high and update the result if a valid midpoint is found.
Termination : The loop exits when low exceeds high, and the result is returned in rax.
This approach efficiently narrows down the possible values using binary search, ensuring logarithmic time complexity relative to the input size.

jj2007 · March 07, 2025, 08:19:22 AM

Now I am bit insecure whether Ognil is really Lingo, because that code works indeed

The MASM Forum

News:

BINARY INTEGER SQUARE ROOTS

raymond

Biterider

HSE

jj2007

daydreamer

raymond

ognil

NoCforMe

jj2007

ognil

NoCforMe

zedd

NoCforMe

ognil

jj2007