The MASM Forum

General => The Workshop => Topic started by: jj2007 on May 01, 2022, 10:55:56 PM

Title: Division by multiplying with a magic number
Post by: jj2007 on May 01, 2022, 10:55:56 PM
I had hoped to divide by 10 with SIMD instructions, but somehow it fails. Do our math experts have an explanation?

include \masm32\include\masm32rt.inc
.686p
.xmm
.code
MyQ dq 123456789
MyQD dd 123456789
Magic dq 0CCCCCCCCCCCCCCCDh
MagicD dd 0CCCCCCCDh

start:
  mov eax, MyQD
  mov ecx, MagicD
  mul ecx ; multiply DWORDs, result is 64 bits
  shr edx, 3
  print str$(edx), " result magic D", 13, 10 ; 12345678
  movlps xmm0, MyQ
  movlps xmm1, Magic
  pclmullqlqdq xmm0, xmm1 ; multiply QWORDs, result is 128 bits
  movhlps xmm0, xmm0 ; move high QWORD to lo QWORD
  psrlq xmm0, 3 ; see shr edx, 3
  movd eax, xmm0
  inkey str$(eax), " result magic Q", 13, 10  ; garbage
  exit
end start
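
For reference, the DWORD path works because 0CCCCCCCDh is ceil(2^35 / 10): the high DWORD of the product (edx) shifted right by 3 is floor(n * magic / 2^35), which equals n / 10 for every unsigned 32-bit n. A minimal C sketch of the same arithmetic, with a made-up function name (div10_u32) used purely for illustration:

```c
#include <stdint.h>

/* Unsigned divide by 10 via multiply-and-shift.
   0xCCCCCCCD == ceil(2^35 / 10). Shifting the 64-bit product right
   by 35 corresponds to "take edx, then shr edx, 3" in the assembly. */
uint32_t div10_u32(uint32_t n)
{
    uint64_t product = (uint64_t)n * 0xCCCCCCCDu;  /* mul ecx -> edx:eax */
    return (uint32_t)(product >> 35);              /* edx >> 3 */
}
```

Calling div10_u32(123456789) gives 12345678, matching the "result magic D" line.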
Title: Re: Division by multiplying with a magic number
Post by: InfiniteLoop on May 02, 2022, 02:54:58 AM
pclmullqlqdq

It's "carry-less" multiplication. It's a useless instruction for this purpose.
You need mulx for 64-bit x 64-bit ==> 128-bit, or a complicated series of AVX2 instructions using shifts, or 64-bit double manipulation, or an AVX-512 multiply.
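
To illustrate what "carry-less" means, here is a small software model in C. It is only a sketch for 32-bit inputs, not the actual 64 x 64 instruction, and the names clmul32 and mul32 are hypothetical. The loop is the usual shift-and-add multiplication, except that partial products are combined with XOR, so no carries propagate between bit positions:

```c
#include <stdint.h>

/* Carry-less multiply of two 32-bit values, 64-bit result.
   Partial products are combined with XOR instead of ADD. */
uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 32; i++) {
        if ((b >> i) & 1u)
            r ^= (uint64_t)a << i;
    }
    return r;
}

/* Ordinary integer multiply, for comparison. */
uint64_t mul32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

The two agree only when no carries would occur. For example, clmul32(3, 3) is 5 (binary 11 XOR 110 = 101), while mul32(3, 3) is 9. That is why the magic-number trick yields garbage with a carry-less product.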
Title: Re: Division by multiplying with a magic number
Post by: Biterider on May 02, 2022, 03:52:16 AM
Hi
Yes, pclmullqlqdq is not the instruction you are looking for. It performs a polynomial (carry-less) multiplication over GF(2).
As mentioned by InfiniteLoop, for this implementation you need (64bit) x (64bit) = high64(128bit).
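
One way to get that high QWORD without 64-bit registers is to assemble it from four 32 x 32 -> 64 partial products, each of which maps to a plain 32-bit mul. The following C sketch shows the idea. It is an assumption-laden illustration rather than code from the linked answer, and the names mulhi64 and div10_u64 are made up:

```c
#include <stdint.h>

/* High 64 bits of a 64 x 64 -> 128-bit unsigned product,
   built from four 32 x 32 -> 64 partial products. */
uint64_t mulhi64(uint64_t a, uint64_t b)
{
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

    uint64_t p00 = (uint64_t)a0 * b0;
    uint64_t p01 = (uint64_t)a0 * b1;
    uint64_t p10 = (uint64_t)a1 * b0;
    uint64_t p11 = (uint64_t)a1 * b1;

    /* Sum everything that lands in bits 32..63; the sum is below 2^34,
       so it cannot overflow, and its upper part is the carry into bit 64. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    return p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Unsigned 64-bit divide by 10: 0xCCCCCCCCCCCCCCCD == ceil(2^67 / 10),
   so take the high QWORD of the product and shift right by 3. */
uint64_t div10_u64(uint64_t n)
{
    return mulhi64(n, 0xCCCCCCCCCCCCCCCDULL) >> 3;
}
```

With n = 123456789, div10_u64 returns 12345678, the value the QWORD path in the first post was meant to print.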

This article may help: https://stackoverflow.com/questions/28868367/getting-the-high-part-of-64-bit-integer-multiplication/50958815#50958815

Interesting to note that C intrinsics fall back on a simple mul instruction for a 128-bit (RDX::RAX) multiplication.

Biterider
Title: Re: Division by multiplying with a magic number
Post by: jj2007 on May 02, 2022, 06:14:58 AM
Thanks, InfiniteLoop and Biterider :thumbsup:

Quote from: Biterider on May 02, 2022, 03:52:16 AM
Interesting to note that C intrinsics fall back on a simple mul instruction for a 128-bit (RDX::RAX) multiplication

I know, this works fine, but I need it for my 32-bit library. Bad luck :sad: