News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

64/64-bit division in 32-bit code

Started by NoCforMe, August 22, 2023, 08:02:41 AM

Previous topic - Next topic

NoCforMe

OK, here's yet another implementation of a 64-by-64-bit division routine for 32-bit (x86) code. Tested, works. I got this from, of all places, a Wikipedia article on division methods. I really hate Wikipedia, for many reasons, and usually try to avoid it if at all possible. But for technical stuff they're sometimes not bad.

Anyhow, the article included the following method () in pseudocode:

Q := 0                  -- Initialize quotient and remainder to zero
R := 0                     
for i := n − 1 .. 0 do  -- Where n is number of bits in N
  R := R << 1           -- Left-shift R by 1 bit
  R(0) := N(i)          -- Set the least-significant bit of R equal to bit i of the numerator
  if R ≥ D then
    R := R − D
    Q(i) := 1
  end
end

where N is the dividend, D is the divisor and R is the remainder.

Here's my implementation. It's reasonably fast, fast enough for my purposes. (Anyone want to race this against other methods?) My previous attempt was a successive-subtraction divider, which worked but it was s-o-o s-l-o-w as to be unusable (unless you want a division function that doubles as a delay timer!).

Ironically, since I ran out of registers here, this would be an ideal candidate for 64-bit code, except that it would be totally unnecessary there. I like this as a 32-bit function. (I pass the dividend in EDX:EAX and the divisor in a small structure.)

;============================================
; Div64x64 (&divStruct)
;
; 64-bit by 64-bit successive-subtraction division.
; Quotient and remainder are 64 bits.
;
; On entry,
; EDX:EAX = dividend
; divisor in divStruct
;
; Returns:
; EDX:EAX = quotient
; remainder in divStruct
;============================================

Div64x64 PROC divStructPtr:DWORD
LOCAL divisorLo:DWORD, divisorHi:DWORD, dividendLo:DWORD, dividendHi:DWORD
LOCAL bitIndex:DWORD

; If dividend is zero, skip all this stuff, just return zero for quotient & remainder:
MOV ECX, EAX
OR ECX, EDX
JNZ @F
MOV EDX, divStructPtr
MOV [EDX].DIVSTRUCT.remainderLo, 0
MOV [EDX].DIVSTRUCT.remainderHi, 0
XOR EAX, EAX
XOR EDX, EDX
RET

@@: PUSH EBX
PUSH ESI
PUSH EDI

MOV dividendLo, EAX
MOV dividendHi, EDX

MOV ECX, divStructPtr
MOV EAX, [ECX].DIVSTRUCT.divisorLo
MOV divisorLo, EAX
MOV EAX, [ECX].DIVSTRUCT.divisorHi
MOV divisorHi, EAX

XOR ESI, ESI
XOR EDI, EDI ;EDI:ESI = quotient
XOR ECX, ECX
XOR EBX, EBX ;EBX:ECX = remainder
XOR EAX, EAX ;Clear bit carrier.

MOV bitIndex, 63

; Left-shift R:
dvloop: SHLD EBX, ECX, 1
SHL ECX, 1

; R(0) = N(i):
PUSH ECX
MOV EDX, 1 ;"Seed" for bitmask.
MOV ECX, bitIndex
CMP ECX, 32
JB doLow1
SUB ECX, 32
SHL EDX, CL ;EDX now is mask for the bit we want.
TEST dividendHi, EDX ;Is N[i] set?
JMP @F

doLow1: SHL EDX, CL
TEST dividendLo, EDX ;Is N[i] set?
@@: SETNZ AL
POP ECX
AND ECX, DWORD PTR NOT 1 ;Clear bit[0].
OR ECX, EAX ;Set in bit[0] (or not).

; If R >= D:
CMP EBX, divisorHi
JB next
JA @F

; Must be equal--need to compare low DWORDs:
CMP ECX, divisorLo
JB next

; R = R - D:
@@: SUB ECX, divisorLo
SBB EBX, divisorHi

; Q[i] = 1:
PUSH ECX
MOV EDX, 1
MOV ECX, bitIndex
CMP ECX, 32
JB doLow2
SUB ECX, 32
SHL EDX, CL ;EDX now is mask for the bit we want.
OR EDI, EDX
JMP SHORT @F

doLow2: SHL EDX, CL
OR ESI, EDX
@@: POP ECX

next: DEC bitIndex
JNS dvloop ;Make sure we travel through loop on 0.

MOV EDX, divStructPtr
MOV [EDX].DIVSTRUCT.remainderLo, ECX
MOV [EDX].DIVSTRUCT.remainderHi, EBX
MOV EAX, ESI
MOV EDX, EDI

POP EDI
POP ESI
POP EBX
RET

Div64x64 ENDP
Assembly language programming should be fun. That's why I do it.

jj2007

I give it a try, but the numbers don't convince me. What am I doing wrong?

  mov divstruct.divisorLo, 123
  mov eax, 123456789
  cdq
  push edx
  push eax
  invoke Div64x64, addr divstruct ; expected: 123456789/123=1003713.7
  deb 4, "Out div64", divstruct.remainderLo, divstruct.remainderHi, divstruct.divisorLo, divstruct.divisorHi
  fild qword ptr [esp]
  fild qword ptr divstruct.divisorLo
  fdiv
  fistp qword ptr divstruct.remainderLo
  deb 4, "Out fpu", divstruct.remainderLo, divstruct.remainderHi, divstruct.divisorLo, divstruct.divisorHi

Out div64
divstruct.remainderLo   90
divstruct.remainderHi   0
divstruct.divisorLo     123
divstruct.divisorHi     0

Out fpu
divstruct.remainderLo   1003714
divstruct.remainderHi   0
divstruct.divisorLo     123
divstruct.divisorHi     0

NoCforMe

Looks like you're not setting .divisorHi in the divstruct, so you're getting random garbage there. And the quotient comes back in EDX:EAX, not in the structure.
Assembly language programming should be fun. That's why I do it.

jj2007

Quote from: NoCforMe on August 22, 2023, 03:12:14 PMLooks like you're not setting .divisorHi in the divstruct, so you're getting random garbage there.

divisorHi is zero, not a problem here.

QuoteAnd the quotient comes back in EDX:EAX, not in the structure.

That was the missing information, now it works, thanks:

Print Str$("Result: %i\n", edx::eax)
Output: Result: 1003713

jj2007

The fpu wins, as usual :azn:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Dividing 123456789 by 123, 5000000 times
85 ms   for fpu fdiv, result=1003714
1146 ms for Div64x64, result=1003713
89 ms   for fpu fdiv, result=1003714
1137 ms for Div64x64, result=1003713
87 ms   for fpu fdiv, result=1003714
1154 ms for Div64x64, result=1003713

NoCforMe

Thanks for those sobering results. Speed isn't an issue in my application, but it's always a nice thing to have.

So if a guy wanted to use the FPU in 32-bit code, could he do this to load up the FPU?

fild qword ptr [some 64-bit var]

Never used qwords before in X86.

Then there's the problem of getting the remainder ...
Assembly language programming should be fun. That's why I do it.

jj2007

Quote from: NoCforMe on August 23, 2023, 09:00:37 AMSo if a guy wanted to use the FPU in 32-bit code, could he do this to load up the FPU?

fild qword ptr [some 64-bit var]

Yes. The fild instruction works for DWORDs, QWORDs and even WORDs.

QuoteThen there's the problem of getting the remainder ...

frndint is your friend.

daydreamer

David you been too late,this was solution to solve division on old z80,6502 without division opcode,there was asm code articles in computer magazine for multiply proc
Fastest 64 bit div might be special case where you can use sse2 64 bit shift /2,/4/6,/8
After seen 32 bit example program 2^x ,i made sse2 64bit shift 2^x
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

NoCforMe

Quote from: daydreamer on August 23, 2023, 04:25:43 PMDavid you been too late,this was solution to solve division on old z80,6502 without division opcode

Oh, I know. This is part of my hobby, reinventing wheels that were invented decades ago. I'd actually like to do some Z80 programming someday; something about that incredibly underutilized machine just seems so attractive. (Love that context-switching mechanism to handle interrupts!) I did write a divide routine for another uprocessor, the SX-28 which I used to make a couple projects including a (very accurate) camera shutter-speed tester.
Assembly language programming should be fun. That's why I do it.

daydreamer

If you keep develop it to work with bigger numbers = bigintiger division that can be used test very high prime numbers

My mindset is trained old-school Algos/cheating with luts and asm,so using SSE/fpu/gp asm is my skill doing things,sorry can't erase that to replace with pixelshader skill,demos bruteforcing in real-time with help of very fast gpus, also mindset need to understand the inverse ways of per pixel coding,instead of used to solve things with loop y and innerloop x and write pixels

Dos 16 bit is much more challenging than being spoiled by 3+ghz 64bit cpu,64GB memory, fast gpu
So mother of inventions when you dont have things is no more,but MHz cpu without SIMD opcodes ,MB memory and no gpu, so it motivate do my best to run program faster

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jack

Quote from: NoCforMe on August 22, 2023, 08:02:41 AMOK, here's yet another implementation of a 64-by-64-bit division routine for 32-bit (x86) code. Tested, works. I got this from, of all places, a Wikipedia article on division methods. I really hate Wikipedia, for many reasons, and usually try to avoid it if at all possible. But for technical stuff they're sometimes not bad.

Anyhow, the article included the following method () in pseudocode:

Q := 0                  -- Initialize quotient and remainder to zero
R := 0                     
for i := n − 1 .. 0 do  -- Where n is number of bits in N
  R := R << 1           -- Left-shift R by 1 bit
  R(0) := N(i)          -- Set the least-significant bit of R equal to bit i of the numerator
  if R ≥ D then
    R := R − D
    Q(i) := 1
  end
end

where N is the dividend, D is the divisor and R is the remainder.
R := R << 1 that algorithm is just a shift and subtract method, right?
what's different from your previous attempt was a successive-subtraction divider

jack

I think that it should be possible to implement a shift and subtract division by shifting 4 bits at a time
here's some public domain C code that might be of interest https://github.com/glitchub/arith64/tree/master

NoCforMe

Quote from: jack on August 25, 2023, 08:57:33 PMwhat's different from your previous attempt was a successive-subtraction divider

Oh, that one was a really dumbass routine that simply subtracted the divisor from the dividend repeatedly until it went too far, yielding the quotient (and remainder). It works, but gawd is it slow! This one's much faster. I actually implemented a division routine that way for a microprocessor (SX-28) that had no multiply or divide instructions, which was acceptable in that application.
Assembly language programming should be fun. That's why I do it.

stoo23

#13
:smiley:
QuoteOh, I know. This is part of my hobby, reinventing wheels that were invented decades ago. I'd actually like to do some Z80 programming someday; something about that incredibly underutilized machine just seems so attractive.

you should see if you can find an old SORD M68K  :smiley:
https://www.sord.co.nz/sord-history.html
https://www.sord.co.nz/collection-sord-m68.html

We acquired one for some engineering development work we were doing waaay back, around the time IBM released it's First 'PC' !!

The SORD, had Both a Z80 A AND a Motorola 68000 on board. When using the 68000, the Z80, handled all the I/O functions.
It came with it's own Colour Graphics Programming language and ran an Excellent Compilable BASIC for the Z80.

It was an Excellent machine and realistically far more advanced than Any of the Early PC's

In an Era when people were always suggesting the Japanese made great hardware but Not software, the SORD BASIC and a SORD Database/Spreadsheet application called PIPS and the SORD Word Processor, were 3 very good examples that disproved that theory  :smiley:

NoCforMe

Interesting; I had no idea such a system existed. Was this mainly a NZ/Aus. thing? I don't think this was ever sold widely in the US.
Assembly language programming should be fun. That's why I do it.