Ciao,

**jj2007**!

The new year is coming soon, so here's a little flood.

It would seem that nothing could be simpler than fsqrt, but we'll go the other way.

1) The best-known integer algorithm for calculating the square root of a number is striking in its simplicity

```c
unsigned sqrt_cpu_int(unsigned X)
{
    int x = (int)X;              /* signed copy, so the < 0 test below works */
    unsigned div = 1, result = 0;
    while (x > 0)
    {
        x -= div;                /* subtract successive odd numbers 1, 3, 5, ... */
        div += 2;
        result += x < 0 ? 0 : 1;
    }
    return result;
}
```

The disadvantage of this algorithm is that the number of iterations grows with X (roughly as √X).

2) Calculation of the square root via a Taylor series expansion.

Let *X* be any number; *f(X)* a function of *X*; *a* a known number close to *X*; and *f(a)* a known value of the function. We expand *f(X)* in a Taylor series:

*f(X) = f(a) + (X−a)∙f′(a) + (X−a)²∙f″(a)/2! + … + (X−a)ⁿ∙f⁽ⁿ⁾(a)/n!*

Now let *X* be the number from which to extract the root. Then *f(X) = √X*; *a* is a known number close to *X*; *f(a) = √a* is a known number close to *√X*; and

*f(X) = √a + (X−a)/(2√a) + … = (2a + X − a)/(2√a) = (a + X)/(2√a)*

The value *√X* can be found by choosing *√a* and then calculating *f(X)*. Then *f(X)²* is compared with the original number *X*. If the accuracy is insufficient, *a* is replaced by *f(X)²*, *√a* by *f(X)*, and the calculation is repeated.

3) The search for an integer square root by Newton's method begins with some value *g*_{0}, the initial estimate of *√X*. Then a series of refinements of the root value is performed by the formula *g*_{n+1} = (*g*_{n} + X/*g*_{n})/2. To reduce the number of iterations, a good initial value for *g*_{0} can be selected in the first step:

```c
unsigned sqrt_newton(unsigned x)
{
    unsigned x1;
    int a, g0, g1;
    if (x <= 1) return x;
    a = 1;
    x1 = x - 1;
    if (x1 > 65535) { a += 8; x1 >>= 16; }
    if (x1 > 255)   { a += 4; x1 >>= 8; }
    if (x1 > 15)    { a += 2; x1 >>= 4; }
    if (x1 > 3)     { a += 1; }
    g0 = 1 << a;                 /* g0 = 2^a, an estimate >= sqrt(x) */
    g1 = (g0 + (x >> a)) >> 1;
    while (g1 < g0)              /* repeat while the approximations strictly decrease */
    {
        g0 = g1;
        g1 = (g0 + (x / g0)) >> 1;
    }
    return g0;
}
```

4) Since division is a fairly slow operation, we can abandon it and calculate the square root using multiplication instead. A 32-bit number X has a 16-bit square root; at each step, one bit of the root value is refined. For speed, adding or subtracting the value of the current bit is done "without jumps":

```asm
        mov  ebx, 4000h         ; bit to decide next
        mov  ecx, 8000h         ; current root estimate
        mov  eax, 40000000h     ; estimate squared
@@:     cmp  eax, X
        je   @f                 ; premature exit from the loop
        sbb  edx, edx           ; if CF = 1 then edx = 0FFFFFFFFh, otherwise edx = 0
        sub  ecx, ebx
        and  edx, ebx           ; if CF = 1 then edx = ebx, otherwise edx = 0
        lea  ecx, [ecx + edx*2] ; if CF = 1 then ecx = ecx + ebx, otherwise ecx = ecx - ebx
        shr  ebx, 1             ; move to the next bit
        mov  eax, ecx
        mul  eax                ; we get "eax squared"
        test ebx, ebx
        jnz  @b
@@:     mov  eax, ecx           ; eax = sqrt(X)
```

It is not clear how: neither division nor multiplication, yet it works! And the algorithm was already described in 1945, in John von Neumann's "First Draft of a Report on the EDVAC".

```asm
        mov  ebp, X
        mov  ecx, 40000000h     ; highest power of 4 in 32 bits
        xor  eax, eax           ; eax = root, built up bit by bit
@@:     mov  ebx, eax
        or   ebx, ecx           ; ebx = trial value: root | bit
        shr  eax, 1
        cmp  ebp, ebx           ; CF = 1 if the trial does not fit
        sbb  edx, edx
        not  edx                ; edx = 0FFFFFFFFh if ebp >= trial, otherwise 0
        mov  edi, edx
        and  edx, ebx           ; edx = trial or 0
        and  edi, ecx           ; edi = bit or 0
        sub  ebp, edx           ; subtract the trial if it fits
        or   eax, edi           ; and set the corresponding root bit
        shr  ecx, 2             ; next bit pair
        test ecx, ecx
        jnz  @b
        ;eax = sqrt(X)
```

Or

```asm
        mov  ebp, X
        inc  ebp                ; ebp = X + 1 (note: wraps for X = 0FFFFFFFFh)
        bsr  ecx, ebp           ; index of the highest set bit
        and  ecx, -2            ; round down to an even bit position
        mov  ebx, 1
        shl  ebx, cl            ; start at the highest power of 4 <= X+1, to reduce the number of iterations
        xor  eax, eax           ; eax = root, built up bit by bit
@@:     lea  ecx, [eax + ebx]   ; ecx = trial value: root + bit
        shr  eax, 1
        cmp  ecx, ebp           ; CF = 1 if trial < X+1, i.e. trial <= X
        sbb  edx, edx           ; edx = 0FFFFFFFFh if the trial fits, otherwise 0
        mov  edi, edx
        and  edx, ecx           ; edx = trial or 0
        sub  ebp, edx           ; subtract the trial if it fits
        and  edi, ebx           ; edi = bit or 0
        or   eax, edi           ; and set the corresponding root bit
        shr  ebx, 2             ; next bit pair; shr sets ZF
        jnz  @b
```

Maybe these algorithms will be faster than fsqrt with its 5703 cycles for 100?