idiv algorithm

Mikl__ · November 24, 2021, 04:16:02 AM

Hi, All!
It required to binary divide +117 by -13. If I divide +117 by +13, it is simple, I will used subtractions and shifts,

but how to binary divide 01110101b by 11110011b so that the result is 11110111 using elementary operations (shifts, addition, subtraction, logic AND, OR, XOR)?
I understand that you need to look for literature on intel8080 or microcontrollers and look there, but so far nothing is at hand...

nidud · November 24, 2021, 05:01:27 AM

deleted

mineiro · November 24, 2021, 07:00:32 AM

Two's complement (not + 1) (neg).
A computer never subtracts, only do additions. So, how to perform subtraction by doing addition?
The two-complement idea was in limbo for a long time until they found a use for it. (from eletronics point of view).

There is a one-complement as well, which uses only "not".
In both cases, exceptions should be handled.

In electronics they teach us "half adder" and "full adder". They don't teach us "subtraction".

hutch-- · November 24, 2021, 08:42:13 AM

I am driven by sheer laziness when I use SSE2 32 and 64 bit arithmetic, its easy, fast and reliable.

mineiro · November 24, 2021, 09:22:09 AM

This code need a bit more refinement, with a bit more work can deal with big numbers.

Code Select


;253/7, 253/-7        ;prime numbers
mov rdi,8000000000000000h       ;sign mask
mov rax,117 ;0      ;dividend
mov rbx,-13 ;0      ;divisor
xor rsi,rsi         ;quotient
                    ;remainder will be rax
test rbx,rdi        ;sign mask, if left most digit is 1 means negative number, if 0 means positive number
jz @F
neg rbx             ;two complement
@@:
.if rax < rbx       ;2/5, 0/1    ;quotient = 0, remainder = dividend
    jmp @F
.endif
.if rbx == 0        ;N/0        ;handle exception
    xor eax,eax
    div rsi         ;forced exception, division by 0
.endif
;--------------------
bsr rcx,rax         ;counting rotations
inc rcx
bsr rdx,rbx
inc rdx
sub rcx,rdx         ;rcx= final shift, magnitude        2/2 = 0, 3/2 = 0
shl rbx,cl          ;align numbers

.repeat
    shl rsi,1
    .if rax >= rbx
        sub rax,rbx         ;rax turns into remainder
        or rsi,1
    .endif
    shr rbx,1
    dec rcx
.until rcx == -1

@@:
;RESULT
;quotient = 36         ;rsi
;remainder = 1         ;rax
;36*7+1=253

daydreamer · November 24, 2021, 11:42:37 PM

Look at old free computer magazines,there was multiply article, on 6502 but it's just shifts and adds?
Maybe search for 8bit embedded code?
@miniero
Maybe possible make subtract circuit thinking out of the box,like rcpss instruction uses 1024 LUT and gfx card pixel shader has 1cycle cosine and sine,probably builtin LUT
Neg xdelta and neg ydelta perfect for breakout game,just bounce ball inside square

mineiro · November 25, 2021, 04:29:05 AM

Quote from: daydreamer on November 24, 2021, 11:42:37 PM
... just shifts and adds?
@miniero
... probably builtin LUT

shifts are arithmetical adds or can be multiplications.
1+1=10 or (1b shl 1)
10+10=100 or (10b shl 1)
101+101=1010 or (101b shl 1) or (101*2)
...
Yes, a transform precalculated table (LUT) is an option, but will be more harder to be expanded (big data) because need more memory to hold results. Like perform a number factorial!.

daydreamer · November 25, 2021, 02:42:32 PM

Quote from: mineiro on November 25, 2021, 04:29:05 AM
Quote from: daydreamer on November 24, 2021, 11:42:37 PM
... just shifts and adds?
@miniero
... probably builtin LUT

shifts are arithmetical adds or can be multiplications.
1+1=10 or (1b shl 1)
10+10=100 or (10b shl 1)
101+101=1010 or (101b shl 1) or (101*2)
...
Yes, a transform precalculated table (LUT) is an option, but will be more harder to be expanded (big data) because need more memory to hold results. Like perform a number factorial!.

something like this

Code Select


mov ecx,numberofbits
xor edx,edx ;zero result,ebx*eax 
@@L1:sar ebx,1 ; check bits 0,1,2,3,4,5,6,7,8...in loop
jcc @@L2 ;if mul by zero jump over,else mul by 1
add edx,eax
@@L2:sal eax,1 ;now * 10,100,1000,10000,10000...
dec ecx
jne @@L1

with the clock cycles on modern cpus,probably circuits that tests many bits in parallel to achive so low number of cycles,probably
actually I have factorial LUTs for my trigo taylor series,for 3!,5!,7!,9! for sine,2!,4!,6!,8! for cosine,storing 99! in LUT is big gain vs doing lots of muls to calculate it
divide is more complex testing equal or less:subtract,last time I did it I ended up in some endless loop
still schools multiply table 0-100 easily fit into memory

mikeburr · December 01, 2021, 04:12:31 AM

in 64 bit .. multiply by 4EC4EC4EC4EC4EC5 == divide by 13 ... but what are you going to do with the remainder ???
regards mike b

mikeburr · December 02, 2021, 02:46:24 AM

ps the complement is B13B13B13B13B13B btw

Mikl__ · December 05, 2021, 12:32:19 PM

Why do I have to shift +117 to the left by 9 digits and add +117 to get the correct result? Is this some kind of witchcraft?
+117/-13=-9
-13=243 -9=247 243*247=60021=117*513 ?

mineiro · December 05, 2021, 09:54:31 PM

Number 117 fits into 7 bits (log2 117= 6,87036472 bits, so 7 bits).
Number 13 fits into 4 bits (log2 13= 3,700439718 bits, so 4 bits).
We can do log2 by using instruction bsr.
Select biggest from 4 and 7, we need one more bit to be sign bit. So, we can deal with 8 bits group.
The left most bit into N bits group it's a sign bit.
Values of dividend and divisor should be aligned. The left most bit 1 of dividend need be aligned with left most bit 1 of divisor. Both having same positive signal.

Transform -13 to +13 using N bits group.
117 01110101 dividend
13 00001101 divisor

Align both numbers, shift left most bit 1 of divisor to the left most bit 1 of dividend.
Divisor was shift left by 3 bits positions. We need this 3 value (align_position), to know how to stop division process.
01110101 dividend (remainder)
01101000 divisor

quotient=0
.repeat
quotient = quotient *2 ;shl
.if dividend >= divisor ;>=
dividend= dividend - divisor ;sub
inc quotient ;inc
.endif
divisor = divisor / 2 ;shr
align_position = align_position -1 ;dec
.until align_position != -1

align
117 01110101 dividend (remainder)
104 01101000 divisor
0 quotient
3 times to go (align_position)

Start process:
quotient=0 align_position=3
117 >= 104? yes, quotient *2, subtract, increase quotient, divisor/2, align_position-1 quotient=0*2+1 align_position=2
13 >= 52? no, quotient *2, divisor/2, align_position-1 quotient=2 align_position=1
13 >= 26? no, quotient *2, divisor/2, align_position-1 quotient=4 align_position=0
13 >= 13? yes, quotient *2, subtract, increase quotient, divisor/2, align_position-1 quotient=8+1 align_position=-1
0 (remainder)

Quotient=9, remainder(dividend) = 0. Because divisor was negative, we need transform quotient into negative.
+9=00001001
-9=11110110+1=11110111

daydreamer · December 06, 2021, 01:21:07 AM

Would 1/x reciprocal LUT + multiply be faster?
1/117

mikeburr · February 19, 2022, 08:14:07 AM

the twos complement of a number doesnt usually give its 64 bit inverse... im afraid it takes a bit more working out than that .. in fact its quite involved but you can create a table of them as i did for the first 30000 primes and incorporated them in a 64 bit program to extract factors of numbers up to about 10^11 i think .. this is faster than division [but not as much as i was hoping because of the way it is unrolled during processing i suspect ] although the algo is slightly more complicated [ but not much]
regards mike b

The MASM Forum

News:

idiv algorithm

Mikl__

nidud

mineiro

hutch--

mineiro

daydreamer

mineiro

daydreamer

mikeburr

mikeburr

Mikl__

mineiro

daydreamer

mikeburr