64/64-bit division in 32-bit code

NoCforMe · August 22, 2023, 08:02:41 AM

OK, here's yet another implementation of a 64-by-64-bit division routine for 32-bit (x86) code. Tested, works. I got this from, of all places, a Wikipedia article on division methods. I really hate Wikipedia, for many reasons, and usually try to avoid it if at all possible. But for technical stuff they're sometimes not bad.

Anyhow, the article included the following method () in pseudocode:

Code Select

Q := 0                  -- Initialize quotient and remainder to zero
R := 0                     
for i := n − 1 .. 0 do  -- Where n is number of bits in N
  R := R << 1           -- Left-shift R by 1 bit
  R(0) := N(i)          -- Set the least-significant bit of R equal to bit i of the numerator
  if R ≥ D then
    R := R − D
    Q(i) := 1
  end
end

where N is the dividend, D is the divisor and R is the remainder.

Here's my implementation. It's reasonably fast, fast enough for my purposes. (Anyone want to race this against other methods?) My previous attempt was a successive-subtraction divider, which worked but it was s-o-o s-l-o-w as to be unusable (unless you want a division function that doubles as a delay timer!).

Ironically, since I ran out of registers here, this would be an ideal candidate for 64-bit code, except that it would be totally unnecessary there. I like this as a 32-bit function. (I pass the dividend in EDX:EAX and the divisor in a small structure.)

Code Select

;============================================
; Div64x64 (&divStruct)
;
; 64-bit by 64-bit successive-subtraction division.
; Quotient and remainder are 64 bits.
;
; On entry,
;	EDX:EAX = dividend
;	divisor in divStruct
;
; Returns:
;	EDX:EAX = quotient
;	remainder in divStruct
;============================================

Div64x64	PROC divStructPtr:DWORD
	LOCAL	divisorLo:DWORD, divisorHi:DWORD, dividendLo:DWORD, dividendHi:DWORD
	LOCAL	bitIndex:DWORD

; If dividend is zero, skip all this stuff, just return zero for quotient & remainder:
	MOV	ECX, EAX
	OR	ECX, EDX
	JNZ	@F
	MOV	EDX, divStructPtr
	MOV	[EDX].DIVSTRUCT.remainderLo, 0
	MOV	[EDX].DIVSTRUCT.remainderHi, 0
	XOR	EAX, EAX
	XOR	EDX, EDX
	RET

@@:	PUSH	EBX
	PUSH	ESI
	PUSH	EDI

	MOV	dividendLo, EAX
	MOV	dividendHi, EDX

	MOV	ECX, divStructPtr
	MOV	EAX, [ECX].DIVSTRUCT.divisorLo
	MOV	divisorLo, EAX
	MOV	EAX, [ECX].DIVSTRUCT.divisorHi
	MOV	divisorHi, EAX

	XOR	ESI, ESI
	XOR	EDI, EDI		;EDI:ESI = quotient
	XOR	ECX, ECX
	XOR	EBX, EBX		;EBX:ECX = remainder
	XOR	EAX, EAX		;Clear bit carrier.

	MOV	bitIndex, 63

; Left-shift R:
dvloop:	SHLD	EBX, ECX, 1
	SHL	ECX, 1

; R(0) = N(i):
	PUSH	ECX
	MOV	EDX, 1			;"Seed" for bitmask.
	MOV	ECX, bitIndex
	CMP	ECX, 32
	JB	doLow1
	SUB	ECX, 32
	SHL	EDX, CL			;EDX now is mask for the bit we want.
	TEST	dividendHi, EDX		;Is N[i] set?
	JMP	@F

doLow1:	SHL	EDX, CL
	TEST	dividendLo, EDX		;Is N[i] set?
@@:	SETNZ	AL
	POP	ECX
	AND	ECX, DWORD PTR NOT 1	;Clear bit[0].
	OR	ECX, EAX		;Set in bit[0] (or not).

; If R >= D:
	CMP	EBX, divisorHi
	JB	next
	JA	@F

; Must be equal--need to compare low DWORDs:
	CMP	ECX, divisorLo
	JB	next

; R = R - D:
@@:	SUB	ECX, divisorLo
	SBB	EBX, divisorHi

; Q[i] = 1:
	PUSH	ECX
	MOV	EDX, 1
	MOV	ECX, bitIndex
	CMP	ECX, 32
	JB	doLow2
	SUB	ECX, 32
	SHL	EDX, CL			;EDX now is mask for the bit we want.
	OR	EDI, EDX
	JMP	SHORT @F

doLow2:	SHL	EDX, CL
	OR	ESI, EDX
@@:	POP	ECX

next:	DEC	bitIndex
	JNS	dvloop			;Make sure we travel through loop on 0.

	MOV	EDX, divStructPtr
	MOV	[EDX].DIVSTRUCT.remainderLo, ECX
	MOV	[EDX].DIVSTRUCT.remainderHi, EBX
	MOV	EAX, ESI
	MOV	EDX, EDI

	POP	EDI
	POP	ESI
	POP	EBX
	RET

Div64x64	ENDP

jj2007 · August 22, 2023, 08:55:49 AM

I give it a try, but the numbers don't convince me. What am I doing wrong?

Code Select

  mov divstruct.divisorLo, 123
  mov eax, 123456789
  cdq
  push edx
  push eax
  invoke Div64x64, addr divstruct	; expected: 123456789/123=1003713.7
  deb 4, "Out div64", divstruct.remainderLo, divstruct.remainderHi, divstruct.divisorLo, divstruct.divisorHi
  fild qword ptr [esp]
  fild qword ptr divstruct.divisorLo
  fdiv
  fistp qword ptr divstruct.remainderLo
  deb 4, "Out fpu", divstruct.remainderLo, divstruct.remainderHi, divstruct.divisorLo, divstruct.divisorHi

Code Select

Out div64
divstruct.remainderLo   90
divstruct.remainderHi   0
divstruct.divisorLo     123
divstruct.divisorHi     0

Out fpu
divstruct.remainderLo   1003714
divstruct.remainderHi   0
divstruct.divisorLo     123
divstruct.divisorHi     0

NoCforMe · August 22, 2023, 03:12:14 PM

Looks like you're not setting .divisorHi in the divstruct, so you're getting random garbage there. And the quotient comes back in EDX:EAX, not in the structure.

jj2007 · August 22, 2023, 08:16:38 PM

Quote from: NoCforMe on August 22, 2023, 03:12:14 PMLooks like you're not setting .divisorHi in the divstruct, so you're getting random garbage there.

divisorHi is zero, not a problem here.

QuoteAnd the quotient comes back in EDX:EAX, not in the structure.

That was the missing information, now it works, thanks:

Code Select

Print Str$("Result: %i\n", edx::eax)
Output: Result: 1003713

jj2007 · August 22, 2023, 11:31:45 PM

The fpu wins, as usual

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Dividing 123456789 by 123, 5000000 times
85 ms   for fpu fdiv, result=1003714
1146 ms for Div64x64, result=1003713
89 ms   for fpu fdiv, result=1003714
1137 ms for Div64x64, result=1003713
87 ms   for fpu fdiv, result=1003714
1154 ms for Div64x64, result=1003713

NoCforMe · August 23, 2023, 09:00:37 AM

Thanks for those sobering results. Speed isn't an issue in my application, but it's always a nice thing to have.

So if a guy wanted to use the FPU in 32-bit code, could he do this to load up the FPU?

Code Select

fild qword ptr [some 64-bit var]

Never used qwords before in X86.

Then there's the problem of getting the remainder ...

jj2007 · August 23, 2023, 09:28:10 AM

Quote from: NoCforMe on August 23, 2023, 09:00:37 AMSo if a guy wanted to use the FPU in 32-bit code, could he do this to load up the FPU?

Code Select Expand
fild qword ptr [some 64-bit var]

Yes. The fild instruction works for DWORDs, QWORDs and even WORDs.

QuoteThen there's the problem of getting the remainder ...

frndint is your friend.

daydreamer · August 23, 2023, 04:25:43 PM

David you been too late,this was solution to solve division on old z80,6502 without division opcode,there was asm code articles in computer magazine for multiply proc
Fastest 64 bit div might be special case where you can use sse2 64 bit shift /2,/4/6,/8
After seen 32 bit example program 2^x ,i made sse2 64bit shift 2^x

NoCforMe · August 23, 2023, 04:44:51 PM

Quote from: daydreamer on August 23, 2023, 04:25:43 PMDavid you been too late,this was solution to solve division on old z80,6502 without division opcode

Oh, I know. This is part of my hobby, reinventing wheels that were invented decades ago. I'd actually like to do some Z80 programming someday; something about that incredibly underutilized machine just seems so attractive. (Love that context-switching mechanism to handle interrupts!) I did write a divide routine for another uprocessor, the SX-28 which I used to make a couple projects including a (very accurate) camera shutter-speed tester.

daydreamer · August 23, 2023, 05:43:50 PM

If you keep develop it to work with bigger numbers = bigintiger division that can be used test very high prime numbers

My mindset is trained old-school Algos/cheating with luts and asm,so using SSE/fpu/gp asm is my skill doing things,sorry can't erase that to replace with pixelshader skill,demos bruteforcing in real-time with help of very fast gpus, also mindset need to understand the inverse ways of per pixel coding,instead of used to solve things with loop y and innerloop x and write pixels

Dos 16 bit is much more challenging than being spoiled by 3+ghz 64bit cpu,64GB memory, fast gpu
So mother of inventions when you dont have things is no more,but MHz cpu without SIMD opcodes ,MB memory and no gpu, so it motivate do my best to run program faster

jack · August 25, 2023, 08:57:33 PM

Quote from: NoCforMe on August 22, 2023, 08:02:41 AMOK, here's yet another implementation of a 64-by-64-bit division routine for 32-bit (x86) code. Tested, works. I got this from, of all places, a Wikipedia article on division methods. I really hate Wikipedia, for many reasons, and usually try to avoid it if at all possible. But for technical stuff they're sometimes not bad.

Anyhow, the article included the following method () in pseudocode:

Code Select Expand
Q := 0 -- Initialize quotient and remainder to zero R := 0 for i := n − 1 .. 0 do -- Where n is number of bits in N R := R << 1 -- Left-shift R by 1 bit R(0) := N(i) -- Set the least-significant bit of R equal to bit i of the numerator if R ≥ D then R := R − D Q(i) := 1 end end
where N is the dividend, D is the divisor and R is the remainder.

R := R << 1 that algorithm is just a shift and subtract method, right?
what's different from your previous attempt was a successive-subtraction divider

jack · August 25, 2023, 10:19:52 PM

I think that it should be possible to implement a shift and subtract division by shifting 4 bits at a time
here's some public domain C code that might be of interest https://github.com/glitchub/arith64/tree/master

NoCforMe · August 26, 2023, 12:04:26 PM

Quote from: jack on August 25, 2023, 08:57:33 PMwhat's different from your previous attempt was a successive-subtraction divider

Oh, that one was a really dumbass routine that simply subtracted the divisor from the dividend repeatedly until it went too far, yielding the quotient (and remainder). It works, but gawd is it slow! This one's much faster. I actually implemented a division routine that way for a microprocessor (SX-28) that had no multiply or divide instructions, which was acceptable in that application.

stoo23 · August 26, 2023, 12:42:27 PM

QuoteOh, I know. This is part of my hobby, reinventing wheels that were invented decades ago. I'd actually like to do some Z80 programming someday; something about that incredibly underutilized machine just seems so attractive.

you should see if you can find an old SORD M68K

https://www.sord.co.nz/sord-history.html
https://www.sord.co.nz/collection-sord-m68.html

We acquired one for some engineering development work we were doing waaay back, around the time IBM released it's First 'PC' !!

The SORD, had Both a Z80 A AND a Motorola 68000 on board. When using the 68000, the Z80, handled all the I/O functions.
It came with it's own Colour Graphics Programming language and ran an Excellent Compilable BASIC for the Z80.

It was an Excellent machine and realistically far more advanced than Any of the Early PC's

In an Era when people were always suggesting the Japanese made great hardware but Not software, the SORD BASIC and a SORD Database/Spreadsheet application called PIPS and the SORD Word Processor, were 3 very good examples that disproved that theory

NoCforMe · August 27, 2023, 04:31:33 AM

Interesting; I had no idea such a system existed. Was this mainly a NZ/Aus. thing? I don't think this was ever sold widely in the US.

The MASM Forum

News:

64/64-bit division in 32-bit code

NoCforMe

jj2007

NoCforMe

jj2007

jj2007

NoCforMe

jj2007

daydreamer

NoCforMe

daydreamer

jack

jack

NoCforMe

stoo23

NoCforMe