Division by multiplying with a magic number

Started by jj2007, May 01, 2022, 10:55:56 PM

jj2007

I had hoped to divide by 10 with SIMD instructions, but somehow it fails. Do our math experts have an explanation?

include \masm32\include\masm32rt.inc
.686p
.xmm
.code
MyQ dq 123456789
MyQD dd 123456789
Magic dq 0CCCCCCCCCCCCCCCDh
MagicD dd 0CCCCCCCDh

start:
  mov eax, MyQD
  mov ecx, MagicD
  mul ecx ; multiply DWORDs, result is 64 bits
  shr edx, 3
  print str$(edx), " result magic D", 13, 10 ; 12345678
  movlps xmm0, MyQ
  movlps xmm1, Magic
  pclmullqlqdq xmm0, xmm1 ; multiply QWORDs, result is 128 bits
  movhlps xmm0, xmm0 ; move high QWORD to lo QWORD
  psrlq xmm0, 3 ; see shr edx, 3
  movd eax, xmm0
  inkey str$(eax), " result magic Q", 13, 10  ; garbage
  exit
end start
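
For readers who want to check the arithmetic of the working DWORD path outside the assembler, here is a minimal C sketch. The function name div10_magic is made up for illustration; the constant and the shift are the ones from the listing above (mul gives a 64-bit product in EDX:EAX, and shr edx, 3 amounts to shifting the whole product right by 32 + 3 bits).

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division by 10 via the magic
   number 0CCCCCCCDh, mirroring "mul ecx" followed by "shr edx, 3". */
static uint32_t div10_magic(uint32_t n)
{
    /* 32 x 32 -> 64-bit product, like EDX:EAX after mul */
    uint64_t product = (uint64_t)n * 0xCCCCCCCDu;

    /* Taking EDX drops the low 32 bits; shr edx, 3 drops 3 more */
    return (uint32_t)(product >> 35);
}
```

Calling div10_magic(123456789) yields 12345678, the same value the "result magic D" line prints.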

InfiniteLoop

pclmullqlqdq

It's "carry-less" multiplication. It's a useless instruction.
You need mulx for 64-bit x 64-bit ==> 128-bit or a complicated series of AVX2 using shifts or 64-bit double manipulation or AVX512 multiply.
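
To illustrate what "carry-less" means, here is a small software model in C. It is only a sketch: clmul32 is a made-up name, it works on 32-bit inputs rather than the 64-bit lanes the real instruction uses, and it is not how the hardware is implemented. The point is that the shifted partial products are combined with XOR instead of ADD, so the result differs from the integer product whenever an addition would have carried.

```c
#include <stdint.h>

/* Hypothetical software model of a carry-less multiply (32 x 32 -> 64).
   Partial products are XORed together, so no carries propagate. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t result = 0;
    for (int bit = 0; bit < 32; ++bit) {
        if ((b >> bit) & 1u)
            result ^= (uint64_t)a << bit;   /* XOR where an integer multiply would ADD */
    }
    return result;
}
```

For example, clmul32(3, 3) is 5 (binary 011 XOR 110), while the ordinary integer product is 9. That is why the magic-number trick produces garbage with this instruction.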

Biterider

Hi
Yes, pclmullqlqdq is not the instruction you are looking for. It performs a polynomial multiplication over GF(2).
As mentioned by InfiniteLoop, for this implementation you need (64bit) x (64bit) = high64(128bit).

This article may help https://stackoverflow.com/questions/28868367/getting-the-high-part-of-64-bit-integer-multiplication/50958815#50958815.

Interesting to note that C intrinsics fall back on a simple mul instruction for a 128-bit (RDX::RAX) multiplication.

Biterider

jj2007

Thanks, InfiniteLoop and Biterider :thumbsup:

Quote from: Biterider on May 02, 2022, 03:52:16 AM
Interesting to note that C intrinsics fall back on a simple mul instruction for a 128-bit (RDX::RAX) multiplication

I know, this works fine, but I need it for my 32-bit library. Bad luck :sad:
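
One possible route for 32-bit code, sketched here in C rather than MASM and not taken from any library in this thread: build the high 64 bits of the 64 x 64 product from four 32 x 32 -> 64 multiplies, then shift right by 3 as in the DWORD version. The names mulhi_u64 and div10_u64 are hypothetical. In 32-bit assembly each of the four products would correspond to one plain mul.

```c
#include <stdint.h>

/* Hypothetical helper: high 64 bits of an unsigned 64 x 64-bit product,
   assembled from four 32 x 32 -> 64 partial products. */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;   /* contributes to bits 0..63   */
    uint64_t hi_lo = a_hi * b_lo;   /* contributes to bits 32..95  */
    uint64_t lo_hi = a_lo * b_hi;   /* contributes to bits 32..95  */
    uint64_t hi_hi = a_hi * b_hi;   /* contributes to bits 64..127 */

    /* Everything that lands in bits 32..63; three values below 2^32
       cannot overflow 64 bits. The upper half is the carry into bit 64. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
}

/* Hypothetical helper: unsigned 64-bit division by 10 with the QWORD
   magic number from the first post, followed by the same shift of 3. */
static uint64_t div10_u64(uint64_t n)
{
    return mulhi_u64(n, 0xCCCCCCCCCCCCCCCDull) >> 3;
}
```

With this, div10_u64(123456789) gives 12345678, matching the DWORD result.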