The MASM Forum

General => The Campus => Topic started by: Mikl__ on November 24, 2021, 04:16:02 AM

Title: idiv algorithm
Post by: Mikl__ on November 24, 2021, 04:16:02 AM
Hi, All!
I need to divide +117 by -13 in binary. Dividing +117 by +13 is simple: I use subtractions and shifts,
(https://wasm.in/attachments/02-jpg.7101/)
but how do I divide 01110101b by 11110011b in binary so that the result is 11110111b, using only elementary operations (shifts, addition, subtraction, logical AND, OR, XOR)?
I understand that I should look for literature on the Intel 8080 or on microcontrollers, but so far nothing is at hand...
Title: Re: idiv algorithm
Post by: nidud on November 24, 2021, 05:01:27 AM
deleted
Title: Re: idiv algorithm
Post by: mineiro on November 24, 2021, 07:00:32 AM
Two's complement (not + 1) (neg).
At the circuit level a computer does not subtract, it only adds. So how do you perform a subtraction by doing an addition?
The two's complement idea was in limbo for a long time until a use was found for it (from an electronics point of view).

There is also one's complement, which uses only "not".
In both cases, exceptions should be handled.

In electronics they teach us "half adder" and "full adder". They don't teach us "subtraction".
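
To make the "subtraction by addition" point concrete, here is a minimal Python sketch. It is not from the post: the 8-bit width and the helper names `neg` and `sub` are assumptions chosen for illustration. It subtracts using only NOT and ADD.

```python
BITS = 8                      # assumed register width for the illustration
MASK = (1 << BITS) - 1        # 0xFF keeps every result inside 8 bits

def neg(x):
    """Two's complement: invert every bit (not), then add 1."""
    return ((x ^ MASK) + 1) & MASK

def sub(a, b):
    """a - b computed as a + (-b); the carry out of bit 7 is discarded."""
    return (a + neg(b)) & MASK

print(format(neg(13), "08b"))   # 11110011, the 8-bit pattern for -13
print(sub(117, 13))             # 104
```

One of the exceptions mentioned above shows up directly: `neg(128)` returns 128 again, because -128 has no positive counterpart in 8 bits.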
Title: Re: idiv algorithm
Post by: hutch-- on November 24, 2021, 08:42:13 AM
 I am driven by sheer laziness when I use SSE2 32 and 64 bit arithmetic, it's easy, fast and reliable.  :biggrin:
Title: Re: idiv algorithm
Post by: mineiro on November 24, 2021, 09:22:09 AM
This code needs a bit more refinement; with a bit more work it can deal with big numbers.

;117/-13 (also try 253/7, 253/-7)
mov rdi,8000000000000000h       ;sign mask
mov rax,117         ;dividend (must be non-negative, only the divisor's sign is handled)
mov rbx,-13         ;divisor
xor rsi,rsi         ;quotient
xor r8,r8           ;r8 = 1 when the quotient must be negated
                    ;remainder will be rax
test rbx,rdi        ;sign mask: leftmost bit 1 means negative, 0 means positive
jz @F
neg rbx             ;two's complement, rbx = |divisor|
inc r8              ;remember that the divisor was negative
@@:
.if rax < rbx       ;2/5, 0/1    ;quotient = 0, remainder = dividend
    jmp @F
.endif
.if rbx == 0        ;N/0        ;handle exception
    xor eax,eax
    div rsi         ;forced exception, division by 0 (rsi is still 0)
.endif
;--------------------
bsr rcx,rax         ;index of highest set bit of dividend
inc rcx             ;bit length of dividend
bsr rdx,rbx
inc rdx             ;bit length of divisor
sub rcx,rdx         ;rcx = alignment shift        2/2 -> 0, 3/2 -> 0
shl rbx,cl          ;align divisor with dividend

.repeat
    shl rsi,1
    .if rax >= rbx
        sub rax,rbx         ;rax turns into remainder
        or rsi,1
    .endif
    shr rbx,1
    dec rcx
.until rcx == -1

@@:
test r8,r8
jz @F
neg rsi             ;divisor was negative: negate the quotient
@@:
;RESULT
;quotient = -9         ;rsi
;remainder = 0         ;rax
;(-9)*(-13)+0=117
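
For readers who want to step through the loop without an assembler, here is a rough Python model of the listing above. It is a sketch under assumptions: the function name `divide` is made up, the dividend is taken to be non-negative (as in the assembly), Python's `bit_length()` stands in for `bsr` + `inc`, and the quotient is negated at the end when the divisor was negative.

```python
def divide(dividend, divisor):
    """Shift-and-subtract division; dividend >= 0, divisor may be negative."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    negate = divisor < 0
    if negate:
        divisor = -divisor                 # neg rbx
    quotient = 0
    if dividend >= divisor:
        # difference of bit lengths = how far the divisor is shifted left
        shift = dividend.bit_length() - divisor.bit_length()
        divisor <<= shift                  # shl rbx,cl
        for _ in range(shift + 1):         # same count as looping until rcx == -1
            quotient <<= 1
            if dividend >= divisor:
                dividend -= divisor        # dividend turns into the remainder
                quotient |= 1
            divisor >>= 1
    if negate:
        quotient = -quotient
    return quotient, dividend

print(divide(117, -13))   # (-9, 0)
print(divide(253, 7))     # (36, 1)
```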
Title: Re: idiv algorithm
Post by: daydreamer on November 24, 2021, 11:42:37 PM
Look at old free computer magazines; there was a multiply article for the 6502, but it's just shifts and adds?
Maybe search for 8-bit embedded code?
@mineiro
Maybe it is possible to make a subtract circuit by thinking outside the box, like the rcpss instruction uses a 1024 LUT and a gfx card pixel shader has 1-cycle cosine and sine, probably with a built-in LUT.
Neg xdelta and neg ydelta are perfect for a breakout game, just bounce the ball inside a square.

Title: Re: idiv algorithm
Post by: mineiro on November 25, 2021, 04:29:05 AM
Quote from: daydreamer on November 24, 2021, 11:42:37 PM
... just shifts and adds?
@mineiro
... probably builtin LUT

A left shift is arithmetically an add of a number to itself, which is a multiplication by 2.
1+1=10 or (1b shl 1)
10+10=100 or (10b shl 1)
101+101=1010 or (101b shl 1) or (101*2)
...
Yes, a precalculated lookup table (LUT) is an option, but it is harder to scale to big inputs because it needs more memory to hold the results. Like computing the factorial of a number.
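
The shift examples can be checked with a few lines of Python. This is only an illustration; the helper name `shl_by_adding` is hypothetical.

```python
def shl_by_adding(x, n):
    """Shift x left by n bits using only addition (each x + x doubles x)."""
    for _ in range(n):
        x = x + x
    return x

print(bin(shl_by_adding(0b101, 1)))    # 0b1010
print(shl_by_adding(5, 3) == 5 << 3)   # True
```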

Title: Re: idiv algorithm
Post by: daydreamer on November 25, 2021, 02:42:32 PM
something like this

mov ecx,numberofbits ;loop count = width of the multiplier in bits (e.g. 32)
xor edx,edx          ;zero result, will hold ebx*eax
@@L1:shr ebx,1       ;shift bits 0,1,2,3,... of the multiplier into the carry flag
jnc @@L2             ;bit is 0: jump over the add, bit is 1: add
add edx,eax
@@L2:sal eax,1       ;multiplicand now * 10b,100b,1000b,10000b...
dec ecx
jne @@L1

Given the clock cycle counts on modern CPUs, there are probably circuits that test many bits in parallel to achieve such a low number of cycles.
Actually I have factorial LUTs for my trig Taylor series: 3!, 5!, 7!, 9! for sine and 2!, 4!, 6!, 8! for cosine. Storing 99! in a LUT is a big gain vs doing lots of muls to calculate it.
Divide is more complex, testing equal or less and then subtracting; the last time I did it I ended up in an endless loop.
Still, the school multiplication table 0-100 easily fits into memory.
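
Here is a small Python model of the shift-and-add multiply loop above, for experimenting. It is a sketch under assumptions: the function name `shift_add_mul` and the `bits` parameter are made up, and the add is taken to happen when the tested multiplier bit is 1.

```python
def shift_add_mul(a, b, bits=32):
    """Multiply a*b modulo 2**bits by testing the multiplier one bit at a time."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = 0
    for _ in range(bits):
        if b & 1:                  # lowest multiplier bit is 1: add the multiplicand
            result = (result + a) & mask
        b >>= 1                    # move on to the next multiplier bit
        a = (a << 1) & mask        # multiplicand * 2
    return result

print(shift_add_mul(13, 9))                     # 117
print(hex(shift_add_mul(-13 & 0xFF, 247, 8)))   # 0x75 (117 after 8-bit wraparound)
```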
Title: Re: idiv algorithm
Post by: mikeburr on December 01, 2021, 04:12:31 AM
In 64 bit, multiplying by 4EC4EC4EC4EC4EC5h is the same as dividing by 13 (when the dividend is an exact multiple of 13) ... but what are you going to do with the remainder ???
regards mike b
Title: Re: idiv algorithm
Post by: mikeburr on December 02, 2021, 02:46:24 AM
PS: the complement is B13B13B13B13B13Bh, btw.
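
The two constants can be checked in Python. This sketch assumes Python 3.8 or later for `pow(13, -1, M)`; the variable names are made up. The constant is the multiplicative inverse of 13 modulo 2^64, so the multiply gives the quotient only when the remainder is zero.

```python
M = 1 << 64                              # arithmetic modulo 2**64, like a 64-bit register

inv13 = pow(13, -1, M)                   # multiplicative inverse of 13 mod 2**64
print(hex(inv13))                        # 0x4ec4ec4ec4ec4ec5
print((117 * inv13) % M)                 # 9, exact because 13 divides 117
print(hex((118 * inv13) % M))            # 0x4ec4ec4ec4ec4ece, not a usable quotient

inv_neg13 = (M - inv13) % M              # two's complement of inv13
print(hex(inv_neg13))                    # 0xb13b13b13b13b13b
print((117 * inv_neg13) % M == M - 9)    # True: the 64-bit pattern for -9
```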
Title: Re: idiv algorithm
Post by: Mikl__ on December 05, 2021, 12:32:19 PM
Why do I have to shift +117 to the left by 9 bits and add +117 to get the correct result? Is this some kind of witchcraft?
+117/-13=-9
As 8-bit values: -13=243, -9=247, and 243*247=60021=117*513 ?
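
The numbers in this question can be reproduced in Python. The reading offered here is an interpretation, not something stated in the thread: 243 and 247 are the 8-bit patterns of -13 and -9, so their product equals (-13)*(-9) = 117 plus a multiple of 256, and only the low 8 bits survive in an 8-bit result. For these particular values 256 - 13 - 9 happens to equal 2*117, which is where the factor 513 comes from. The helper name `to_unsigned` is made up.

```python
BITS = 8
MOD = 1 << BITS                     # 256

def to_unsigned(x):
    """8-bit two's complement pattern of a signed value."""
    return x & (MOD - 1)

d, q = to_unsigned(-13), to_unsigned(-9)
print(d, q)                         # 243 247
print(d * q)                        # 60021
print(d * q % MOD)                  # 117: only the low 8 bits are kept
print(d * q == 117 + (MOD - 13 - 9) * MOD)   # True: the rest is a multiple of 256
```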
Title: Re: idiv algorithm
Post by: mineiro on December 05, 2021, 09:54:31 PM
The number 117 fits into 7 bits (log2 117 = 6.87036472, so 7 bits).
The number 13 fits into 4 bits (log2 13 = 3.700439718, so 4 bits).
We can get the integer part of log2 with the bsr instruction (bit length = bsr + 1).
Take the larger of 4 and 7; we need one more bit for the sign. So we can work with 8-bit groups.
The leftmost bit of an N-bit group is the sign bit.
The values of the dividend and divisor must be aligned: the leftmost 1 bit of the divisor needs to be aligned with the leftmost 1 bit of the dividend, with both numbers made positive.

Transform -13 into +13 within the N-bit group.
117     01110101    dividend
13      00001101    divisor

Align both numbers: shift the divisor left until its leftmost 1 bit lines up with the leftmost 1 bit of the dividend.
The divisor was shifted left by 3 bit positions. We need this value 3 (align_position) to know when to stop the division process.
        01110101    dividend (remainder)
        01101000    divisor

quotient=0
.repeat
    quotient = quotient *2                  ;shl
    .if dividend >= divisor                 ;>=
        dividend= dividend - divisor        ;sub
        inc quotient                        ;inc
    .endif
    divisor = divisor / 2                   ;shr
    align_position = align_position -1      ;dec
.until align_position == -1

align
117     01110101    dividend (remainder)
104     01101000    divisor
        0           quotient
        3           times to go (align_position)

Start process:
quotient=0  align_position=3
117 >= 104? yes, quotient *2, subtract, increase quotient, divisor/2, align_position-1       quotient=0*2+1  align_position=2
13 >= 52? no, quotient *2, divisor/2, align_position-1                                       quotient=2  align_position=1
13 >= 26? no, quotient *2, divisor/2, align_position-1                                       quotient=4  align_position=0
13 >= 13? yes, quotient *2, subtract, increase quotient, divisor/2, align_position-1         quotient=8+1 align_position=-1
0 (remainder)

Quotient=9, remainder (dividend) = 0. Because the divisor was negative, we need to transform the quotient into a negative number.
+9=00001001
-9=11110110+1=11110111
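
To connect this back to the original question (01110101b divided by 11110011b giving 11110111b), here is a Python sketch that works directly on 8-bit patterns. It is an illustration under assumptions, not the method from the posts: the names `neg8`, `udiv8` and `idiv8` are made up, the unsigned core is a fixed 8-iteration restoring division rather than the aligned loop described above, and the rounding follows the truncate-toward-zero convention of x86 idiv, with the remainder taking the sign of the dividend.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def neg8(x):
    return ((x ^ MASK) + 1) & MASK          # not, then +1

def udiv8(n, d):
    """Restoring division: bring dividend bits down one at a time, MSB first."""
    q = r = 0
    for i in range(BITS - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)       # shift the next dividend bit into r
        q <<= 1
        if r >= d:
            r -= d
            q |= 1
    return q, r

def idiv8(n, d):
    """Signed divide of 8-bit patterns; the quotient truncates toward zero."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    n_neg, d_neg = n & SIGN, d & SIGN
    q, r = udiv8(neg8(n) if n_neg else n, neg8(d) if d_neg else d)
    if n_neg ^ d_neg:
        q = neg8(q)                         # signs differ: negative quotient
    if n_neg:
        r = neg8(r)                         # remainder takes the dividend's sign
    return q, r

q, r = idiv8(0b01110101, 0b11110011)        # +117 / -13
print(format(q, "08b"), r)                  # 11110111 0
```

The one case this sketch does not handle is -128 / -1: the true quotient +128 does not fit in 8 signed bits, so the result silently wraps, whereas x86 idiv raises a divide error for a quotient that overflows.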
Title: Re: idiv algorithm
Post by: daydreamer on December 06, 2021, 01:21:07 AM
Would 1/x reciprocal LUT + multiply be faster?
1/117
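
As a rough illustration of the reciprocal LUT idea, here is a Python sketch for 8-bit operands. Everything in it is an assumption made for the example: the 16-bit scaling, the table name `RECIP` and the function name `div_by_lut`. It says nothing about speed, which would have to be measured on real hardware; it only shows that one table lookup, one multiply and one shift can reproduce the integer quotient. The table itself is built once using ordinary division.

```python
SHIFT = 16
# Rounded-up scaled reciprocals: RECIP[d] = ceil(2**16 / d) for d = 1..255.
# Index 0 is a placeholder because division by zero is not supported.
RECIP = [0] + [((1 << SHIFT) + d - 1) // d for d in range(1, 256)]

def div_by_lut(n, d):
    """n // d for 0 <= n <= 255 and 1 <= d <= 255: one multiply and one shift."""
    return (n * RECIP[d]) >> SHIFT

print(div_by_lut(117, 13))   # 9
print(div_by_lut(253, 7))    # 36
```

The remainder, if needed, can be recovered as n - q*d with one more multiply and a subtract.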
Title: Re: idiv algorithm
Post by: mikeburr on February 19, 2022, 08:14:07 AM
The two's complement of a number doesn't usually give its 64-bit inverse, I'm afraid; it takes a bit more working out than that. In fact it's quite involved, but you can create a table of them, as I did for the first 30000 primes, and incorporate them in a 64-bit program to extract factors of numbers up to about 10^11, I think. This is faster than division (but not by as much as I was hoping, I suspect because of the way it is unrolled during processing), although the algorithm is slightly more complicated (but not much).
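
The post does not say how the table of inverses was used, so the following Python sketch is only one well-known possibility, with made-up names (`make_entry`, `divides`, `table`) and a five-entry stand-in for the prime table. For an odd divisor p, a 64-bit value n is a multiple of p exactly when n times the inverse of p (modulo 2^64) is no larger than (2^64 - 1) // p, so a factor test needs one multiply and one compare instead of a division. Python 3.8 or later is assumed for `pow(p, -1, M)`.

```python
M = 1 << 64

def make_entry(p):
    """Table entry for an odd divisor p: (inverse mod 2**64, largest valid quotient)."""
    return pow(p, -1, M), (M - 1) // p

def divides(n, entry):
    """True if p divides n (0 <= n < 2**64), using one multiply and one compare."""
    inv, limit = entry
    return (n * inv) % M <= limit

table = {p: make_entry(p) for p in (3, 5, 7, 11, 13)}   # tiny stand-in for a prime table
print([p for p in table if divides(117, table[p])])     # [3, 13]
```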
regards mike b