idiv algorithm

Started by Mikl__, November 24, 2021, 04:16:02 AM

Mikl__

Hi, All!
I need to divide +117 by -13 in binary. Dividing +117 by +13 is simple: I would use subtractions and shifts.

But how do I divide 01110101b by 11110011b so that the result is 11110111b, using only elementary operations (shifts, addition, subtraction, logical AND, OR, XOR)?
I understand that I should look for literature on the Intel 8080 or on microcontrollers, but so far I have nothing at hand...

nidud

deleted

mineiro

Two's complement: invert the bits and add 1 (not, then + 1), which is what neg does.
At the hardware level a computer does not subtract, it only adds. So how do you perform a subtraction by doing an addition? You add the two's complement of the number you want to subtract.
The two's complement idea was in limbo for a long time until a use was found for it (from an electronics point of view).

There is a ones' complement as well, which uses only "not".
In both cases, the special cases should be handled.

In electronics they teach us the "half adder" and the "full adder". They don't teach us a "subtractor".
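
Editor's note: the idea of subtracting by adding the two's complement can be sketched in a few lines. This is only a model of an 8-bit register in Python, not code from the thread; the names BITS, MASK, neg and sub are made up for the illustration.

```python
BITS = 8                      # assumed register width for the illustration
MASK = (1 << BITS) - 1        # 0xFF, keeps results inside 8 bits

def neg(x):
    """Two's complement: invert all bits (not), then add 1."""
    return ((x ^ MASK) + 1) & MASK

def sub(a, b):
    """a - b computed as a + (-b); the carry out of bit 7 is discarded."""
    return (a + neg(b)) & MASK

print(format(neg(13), "08b"))   # 11110011, the 8-bit pattern of -13
print(sub(117, 13))             # 104
```

The pattern 11110011b is the divisor from the first post, and 117 - 13 = 104 comes out of a plain addition.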
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

hutch--

 I am driven by sheer laziness when I use SSE2 32 and 64 bit arithmetic: it's easy, fast and reliable.  :biggrin:

mineiro

This code needs a bit more refinement; with a little more work it can deal with big numbers.

;117/-13            ;example from the first post; the dividend is assumed to be non-negative
mov rdi,8000000000000000h       ;sign mask
mov rax,117         ;dividend
mov rbx,-13         ;divisor
xor rsi,rsi         ;quotient
xor r8,r8           ;r8 = 1 when the quotient must be negated
                    ;remainder will be rax
test rbx,rdi        ;sign mask: leftmost bit 1 means negative number, 0 means positive number
jz @F
neg rbx             ;two's complement, the divisor is now positive
inc r8              ;remember that the divisor was negative
@@:
.if rax < rbx       ;2/5, 0/1    ;quotient = 0, remainder = dividend
    jmp @F
.endif
.if rbx == 0        ;N/0        ;handle exception
    xor eax,eax
    div rsi         ;forced exception, division by 0 (rsi is still 0 here)
.endif
;--------------------
bsr rcx,rax         ;bit length of the dividend
inc rcx
bsr rdx,rbx         ;bit length of the divisor
inc rdx
sub rcx,rdx         ;rcx = shift count that aligns the divisor (0 for 2/2 and for 3/2)
shl rbx,cl          ;align numbers

.repeat
    shl rsi,1
    .if rax >= rbx
        sub rax,rbx         ;rax turns into remainder
        or rsi,1
    .endif
    shr rbx,1
    dec rcx
.until rcx == -1

@@:
.if r8 != 0
    neg rsi         ;the divisor was negative, so the quotient is negative
.endif
;RESULT
;quotient = -9         ;rsi
;remainder = 0         ;rax
;(-9)*(-13)+0=117

daydreamer

Look at old free computer magazines: there was an article on multiplication for the 6502, but that is just shifts and adds?
Maybe search for 8-bit embedded code?
@mineiro
Maybe it is possible to make a subtract circuit by thinking outside the box, like the rcpss instruction, which uses a 1024 LUT, and a graphics card pixel shader, which has 1-cycle cosine and sine, probably from a built-in LUT.
Negating xdelta and ydelta is perfect for a Breakout game: just bounce the ball inside a square.

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

mineiro

Quote from: daydreamer on November 24, 2021, 11:42:37 PM
... just shifts and adds?
@miniero
... probably builtin LUT

A left shift is an addition of a number to itself, that is, a multiplication by 2.
1+1=10 or (1b shl 1)
10+10=100 or (10b shl 1)
101+101=1010 or (101b shl 1) or (101b*2)
...
Yes, a precalculated lookup table (LUT) is an option, but it is harder to scale to big data because it needs more memory to hold the results, like a table of factorials.


daydreamer

Quote from: mineiro on November 25, 2021, 04:29:05 AM
A left shift is an addition of a number to itself, that is, a multiplication by 2.
1+1=10 or (1b shl 1)
10+10=100 or (10b shl 1)
101+101=1010 or (101b shl 1) or (101b*2)
...
Yes, a precalculated lookup table (LUT) is an option, but it is harder to scale to big data because it needs more memory to hold the results, like a table of factorials.

Something like this:

mov ecx,numberofbits
xor edx,edx ;zero result, edx = ebx*eax
@@L1:shr ebx,1 ;shift bits 0,1,2,3,4,5,6,7,8... into the carry flag, one per loop
jnc @@L2 ;bit is 0: skip the add (mul by 0), else add (mul by 1)
add edx,eax
@@L2:shl eax,1 ;eax now * 10b,100b,1000b,10000b...
dec ecx
jne @@L1
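
Editor's note: the same shift-and-add loop, modelled in Python so it can be tried without an assembler. This is a sketch under the assumption of unsigned operands and a result truncated to the register width; shift_add_mul and its bits parameter are names invented here.

```python
def shift_add_mul(a, b, bits=32):
    """Multiply a by b with shifts and adds only, keeping the low `bits` bits."""
    mask = (1 << bits) - 1
    result = 0
    for _ in range(bits):
        if b & 1:                     # lowest bit of b set: add the shifted a
            result = (result + a) & mask
        b >>= 1                       # next bit of b (shr ebx,1)
        a = (a << 1) & mask           # a times 2 (shl eax,1)
    return result

print(shift_add_mul(13, 9))            # 117
print(shift_add_mul(243, 247, bits=8)) # 117, the 8-bit patterns of -13 and -9
```

With bits=8 the truncation is what makes the product of the two negative bit patterns come out as +117.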

With the clock cycles on modern CPUs, there are probably circuits that test many bits in parallel to achieve such a low number of cycles.
Actually I have factorial LUTs for my trig Taylor series: 3!, 5!, 7!, 9! for sine and 2!, 4!, 6!, 8! for cosine. Storing 99! in a LUT is a big gain versus doing lots of muls to calculate it.
Division is more complex: test for equal or less, then subtract. The last time I tried it I ended up in an endless loop.
Still, the school multiplication table (0-100) easily fits into memory.

mikeburr

in 64 bit .. multiply by 4EC4EC4EC4EC4EC5 == divide by 13 ... but what are you going to do with the remainder ???
regards mike b

mikeburr

ps the complement is B13B13B13B13B13B  btw
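
Editor's note: a small check of the two constants, written as a Python sketch rather than as anything posted in the thread. It relies on the identity 13 * 4EC4EC4EC4EC4EC5h = 2^66 + 1, which the code asserts. The function names exact_div13 and divmod13 are hypothetical, and the shift by 66 is one possible way to deal with the remainder, not necessarily what mikeburr had in mind.

```python
M = 0x4EC4EC4EC4EC4EC5          # constant from the post
MASK64 = (1 << 64) - 1
NEG_M = (-M) & MASK64           # two's complement of M in 64 bits

assert 13 * M == (1 << 66) + 1  # so 13 * M == 1 (mod 2**64)

def exact_div13(n):
    """Low 64 bits of n * M. Equals n / 13 only when n is a multiple of 13."""
    return (n * M) & MASK64

def divmod13(n):
    """Quotient and remainder for an unsigned 64-bit n.

    (n * M) >> 66 is the high half of the 128-bit product shifted right by 2.
    """
    q = (n * M) >> 66
    return q, n - 13 * q

print(hex(NEG_M))               # 0xb13b13b13b13b13b
print(exact_div13(117))         # 9
print(divmod13(253))            # (19, 6)
print(hex((117 * NEG_M) & MASK64))  # 0xfffffffffffffff7, the 64-bit pattern of -9
```

Multiplying by the complement B13B13B13B13B13Bh therefore acts as an exact division by -13 for multiples of 13.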

Mikl__

Why do I have to shift +117 left by 9 bits and add +117 to get the correct result? Is this some kind of witchcraft?
+117/-13=-9
-13=243    -9=247   243*247=60021=117*513 ?
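
Editor's note: one way to look at this, sketched in Python. An 8-bit register keeps only the low 8 bits of a result, so 243*247 only has to equal 117 modulo 256. The full product is (256-13)*(256-9) = 256*(256-13-9) + 13*9, and here 256-13-9 = 234 happens to be 2*117, which is where the factor 513 = 512+1 comes from. The helper name to_u8 and the second pair of values are assumptions added for the illustration.

```python
def to_u8(x):
    """Keep only the low 8 bits, as an 8-bit register would."""
    return x & 0xFF

a = to_u8(-13)                  # 243
b = to_u8(-9)                   # 247
full = a * b                    # 60021, needs 16 bits
print(a, b, full)               # 243 247 60021
print(to_u8(full))              # 117
print(full == 256 * (256 - 13 - 9) + 13 * 9)   # True

# Another pair: -3 * -5. The low byte is still the true product 15,
# but the full product is not a multiple of 15 this time.
c, d = to_u8(-3), to_u8(-5)     # 253, 251
print(to_u8(c * d), (c * d) % 15)   # 15 8
```

So the match with 117*513 looks like a coincidence of these particular numbers, while the low byte being correct is general.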

mineiro

The number 117 fits into 7 bits (log2 117 = 6.87036472, so 7 bits).
The number 13 fits into 4 bits (log2 13 = 3.700439718, so 4 bits).
We can get the integer part of log2 with the bsr instruction.
Take the larger of 4 and 7; we need one more bit for the sign. So we can work with a group of 8 bits.
The leftmost bit of an N-bit group is the sign bit.
The values of dividend and divisor must be aligned: the leftmost 1 bit of the divisor is shifted to line up with the leftmost 1 bit of the dividend. Both numbers must be positive.

Transform -13 into +13 using the N-bit group.
117     01110101    dividend
13      00001101    divisor

Align both numbers: shift the leftmost 1 bit of the divisor to the position of the leftmost 1 bit of the dividend.
The divisor was shifted left by 3 bit positions. We need this value 3 (align_position) to know when to stop the division process.
        01110101    dividend (remainder)
        01101000    divisor

quotient=0
.repeat
    quotient = quotient *2                  ;shl
    .if dividend >= divisor                 ;>=
        dividend= dividend - divisor        ;sub
        inc quotient                        ;inc
    .endif
    divisor = divisor / 2                   ;shr
    align_position = align_position -1      ;dec
.until align_position == -1

align
117     01110101    dividend (remainder)
104     01101000    divisor
        0           quotient
        3           times to go (align_position)

Start process:
quotient=0  align_position=3
117 >= 104? yes, quotient *2, subtract, increase quotient, divisor/2, align_position-1       quotient=0*2+1  align_position=2
13 >= 52? no, quotient *2, divisor/2, align_position-1                                       quotient=2  align_position=1
13 >= 26? no, quotient *2, divisor/2, align_position-1                                       quotient=4  align_position=0
13 >= 13? yes, quotient *2, subtract, increase quotient, divisor/2, align_position-1         quotient=8+1 align_position=-1
0 (remainder)

Quotient=9, remainder (dividend) = 0. Because the divisor was negative, we need to make the quotient negative.
+9=00001001
-9=11110110+1=11110111
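
Editor's note: the whole walk-through can be put together as one routine. The following Python sketch models 8-bit registers and follows the steps above (make both operands positive, align, shift and subtract, fix the sign). Handling a negative dividend and giving the remainder the sign of the dividend, as x86 idiv does, are additions not covered in the post. The names idiv8 and neg are invented here, and overflow (-128 / -1) is not detected.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def neg(x):
    """Two's complement of an 8-bit pattern."""
    return ((x ^ MASK) + 1) & MASK

def idiv8(dividend, divisor):
    """Signed division of two 8-bit patterns, returns (quotient, remainder) patterns."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    neg_q = bool((dividend ^ divisor) & SIGN)   # signs differ: negative quotient
    neg_r = bool(dividend & SIGN)               # remainder follows the dividend
    a = neg(dividend) if dividend & SIGN else dividend
    b = neg(divisor) if divisor & SIGN else divisor
    q = 0
    if a >= b:
        shift = a.bit_length() - b.bit_length() # same role as the two bsr results
        b <<= shift                             # align the divisor
        for _ in range(shift + 1):
            q <<= 1
            if a >= b:
                a -= b                          # a turns into the remainder
                q |= 1
            b >>= 1
    return (neg(q) if neg_q else q), (neg(a) if neg_r else a)

q, r = idiv8(0b01110101, 0b11110011)            # +117 / -13
print(format(q, "08b"), r)                      # 11110111 0
```

The result 11110111b is the pattern asked for in the first post.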

daydreamer

Would 1/x reciprocal LUT + multiply be faster?
1/117

mikeburr

The two's complement of a number doesn't usually give its 64-bit inverse, I'm afraid; it takes a bit more working out than that. In fact it's quite involved, but you can create a table of them, as I did for the first 30000 primes, and I incorporated them in a 64-bit program to extract factors of numbers up to about 10^11, I think. This is faster than division (but not by as much as I was hoping, because of the way it is unrolled during processing, I suspect), although the algo is slightly more complicated (but not much).
regards mike b
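
Editor's note: the program described here is not shown in the thread, so the following is only a guess at the general technique, sketched in Python. It computes the inverse of an odd number modulo 2^64 with Newton's iteration and uses it for a division-free divisibility test. The names inverse_mod_2_64 and divides are hypothetical.

```python
MASK64 = (1 << 64) - 1

def inverse_mod_2_64(p):
    """Multiplicative inverse of an odd p modulo 2**64 (Newton's iteration)."""
    if p % 2 == 0:
        raise ValueError("only odd numbers have an inverse modulo 2**64")
    x = p                         # correct to 3 bits: p*p == 1 (mod 8) for odd p
    for _ in range(5):            # correct bits double: 6, 12, 24, 48, 96
        x = (x * (2 - p * x)) & MASK64
    return x

def divides(p, inv, n):
    """True when the odd p divides the unsigned 64-bit n, using one multiply."""
    return ((n * inv) & MASK64) <= MASK64 // p

inv13 = inverse_mod_2_64(13)
print(hex(inv13))                 # 0x4ec4ec4ec4ec4ec5
print(divides(13, inv13, 117), divides(13, inv13, 118))   # True False
```

A table of such inverses, one per prime, would let a factoring loop replace each trial division by a multiply and a compare.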