Print Page - Modular multiplicative inverse

Title: Modular multiplicative inverse
Post by: mabdelouahab on September 09, 2019, 07:31:44 AM

What is the rule to get the multiplicative inverse of X (8 bits)?
What is the condition that is contrived in 02BH to be the multiplicative inverse for 015H?

QuoteThis affine transformation is the sum of multiple rotations of the byte as a vector, where addition is the XOR operation:

s = b ⊕ ( b ⋘ 1 ) ⊕ ( b ⋘ 2 ) ⊕ ( b ⋘ 3 ) ⊕ ( b ⋘ 4 ) ⊕ 63

where b represents the multiplicative inverse, ....

Rijndael S-box (https://en.wikipedia.org/wiki/Rijndael_S-box)

Title: Re: Modular multiplicative inverse
Post by: mabdelouahab on September 09, 2019, 05:39:56 PM

Quotea ≡ b (mod n) means :
- a and b have the same remainder when divided by n
- a = K.n + b
- (a - b) is multiple of n

How it was applied to the Rijndael table?
017h≡05Fh? or 06Dh≡093h? ...

Code Select

.data		;00 , 01   02   03   04   05   06   07   08   09   0A   0B   0C   0D   0E   0F		
MulInv	DB	001h,001h,08Dh,0F6h,0CBh,052h,07Bh,0D1h,0E8h,04Fh,029h,0C0h,0B0h,0E1h,0E5h,0C7h	;00
	DB	074h,0B4h,0AAh,04Bh,099h,02Bh,060h,05Fh,058h,03Fh,0FDh,0CCh,0FFh,040h,0EEh,0B2h	;01
	DB	03Ah,06Eh,05Ah,0F1h,055h,04Dh,0A8h,0C9h,0C1h,00Ah,098h,015h,030h,044h,0A2h,0C2h	;02
	DB	02Ch,045h,092h,06Ch,0F3h,039h,066h,042h,0F2h,035h,020h,06Fh,077h,0BBh,059h,019h	;03
	DB	01Dh,0FEh,037h,067h,02Dh,031h,0F5h,069h,0A7h,064h,0ABh,013h,054h,025h,0E9h,009h	;04
	DB	0EDh,05Ch,005h,0CAh,04Ch,024h,087h,0BFh,018h,03Eh,022h,0F0h,051h,0ECh,061h,017h	;05
	DB	016h,05Eh,0AFh,0D3h,049h,0A6h,036h,043h,0F4h,047h,091h,0DFh,033h,093h,021h,03Bh	;06
	DB	079h,0B7h,097h,085h,010h,0B5h,0BAh,03Ch,0B6h,070h,0D0h,006h,0A1h,0FAh,081h,082h	;07
	DB	083h,07Eh,07Fh,080h,096h,073h,0BEh,056h,09Bh,09Eh,095h,0D9h,0F7h,002h,0B9h,0A4h	;08
	DB	0DEh,06Ah,032h,06Dh,0D8h,08Ah,084h,072h,02Ah,014h,09Fh,088h,0F9h,0DCh,089h,09Ah	;09	
	DB	0FBh,07Ch,02Eh,0C3h,08Fh,0B8h,065h,048h,026h,0C8h,012h,04Ah,0CEh,0E7h,0D2h,062h	;0A
	DB	00Ch,0E0h,01Fh,0EFh,011h,075h,078h,071h,0A5h,08Eh,076h,03Dh,0BDh,0BCh,086h,057h	;0B
	DB	00Bh,028h,02Fh,0A3h,0DAh,0D4h,0E4h,00Fh,0A9h,027h,053h,004h,01Bh,0FCh,0ACh,0E6h	;0C
	DB	07Ah,007h,0AEh,063h,0C5h,0DBh,0E2h,0EAh,094h,08Bh,0C4h,0D5h,09Dh,0F8h,090h,06Bh	;0D	
	DB	0B1h,00Dh,0D6h,0EBh,0C6h,00Eh,0CFh,0ADh,008h,04Eh,0D7h,0E3h,05Dh,050h,01Eh,0B3h	;0E
	DB	05Bh,023h,038h,034h,068h,046h,003h,08Ch,0DDh,09Ch,07Dh,0A0h,0CDh,01Ah,041h,01Ch	;0F

Title: Re: Modular multiplicative inverse
Post by: LiaoMi on September 09, 2019, 06:46:14 PM

Hi mabdelouahab,

4.2 Multiplication
5.1.1 SubBytes()Transformation
http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf)

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 10, 2019, 12:09:01 AM

Your question appears to be: How is the Multiplicative Inverse Table produced for the Rijndael characteristic 2 finite field which uses the following polynomial for multiplication: x^8 + x^4 + x^3 + x + 1?

There are a few ways, but you can try the Euclid-Wallis Algorithm, mentioned here on the chapter Simple algebraic field extensions (https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm).
If you follow the mentioned example where a= x^6+x^4+x+1 (binary 0101 0011 = 53h) you will get the inverse as x^7 + x^6 + x^3 + x (binary 1100 1010 = CAh). Confirm in the table that it is correct!

If you don't understand, I can't explain better because the person that explained it to me will not be available in the short term. :biggrin:

Title: Re: Modular multiplicative inverse
Post by: manuel on September 10, 2019, 03:47:28 AM

Translated by Google sorry.
I think you want to calculate the multiplicative inverse of x (Mod 255) you need to calculate
MCDE with the values of x, 255. If the mcde (x, 255) is different from 1, then x has no inverse
If the mcde (x, 255) == 1 then the algorithm also returns the inverse.
I have the function written in C # for the BigInteger class. If you want it, you tell me.

Title: Re: Modular multiplicative inverse
Post by: manuel on September 10, 2019, 04:05:45 AM

Mcd(2Bh,FFh) = 1 OK. The condition inv(2B) = ACh
Mcd(15h,FFh) = 3 != 1 No inverse exist.

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 10, 2019, 04:17:48 AM

Quote from: manuel on September 10, 2019, 04:05:45 AM
Mcd(2Bh,FFh) = 1 OK. The condition inv(2B) = ACh
Mcd(15h,FFh) = 3 != 1 No inverse exist.

We are talking about a different thing,we are talking about the Finite Field GF(2^8). Things are much more complicated.
Introduction to Finite Fields.
http://www.cs.utsa.edu/~wagner/laws/FFM.html

Title: Re: Modular multiplicative inverse
Post by: mabdelouahab on September 10, 2019, 06:54:30 AM

Quote from: LiaoMi on September 09, 2019, 06:46:14 PM
Hi mabdelouahab,

4.2 Multiplication
5.1.1 SubBytes()Transformation
http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf)

Hi LiaoMi
"For the AES algorithm, this irreducible polynomial is m(x) = x^8 + x^4 + x^3 + x +1 or {01}{1b} in hexadecimal notation."
These explain a lot, Thanks

Quote from: AW on September 10, 2019, 12:09:01 AM
If you don't understand, I can't explain better because the person that explained it to me will not be available in the short term. :biggrin:

I think we will not need your friend, because we have LiaoMi :biggrin:
But the algorithm did not work with me، It gives me in the first step : quotient=x +1 , but the author confirms that the first step : quotient=x^2 +1

"Moreover, div is an auxiliary function that computes the quotient of the Euclidean division."

Code Select

function inverse(a, p)
    t := 0;     newt := 1;    
    r := p;     newr := a;    
    while newr ≠ 0
        quotient := r div newr
        (r, newr) := (newr, r - quotient * newr)
        (t, newt) := (newt, t - quotient * newt) 
    if degree(r) > 0 then 
        return "Either p is not irreducible or a is a multiple of p"
    return (1/r) * t

Code Select

inversefunc proc a;,p
	LOCAL _t,_newt,_r,_newr,_quotient
	mov _t,0
	mov _newt,1
	mov eax,011BH	;p in my case p Always equal 011bh
	mov _r,eax
	mov eax,a
	mov _newr,eax
@@:
	xor edx,edx
	mov eax,_r
	mov ecx,_newr
	div ecx
	mov _quotient,eax	;        quotient := r div newr
	mov edx,_newr
	mov ebx,_r
	mov _r,edx			; 		r:= newr
	mul ecx
	sub ebx,eax
	mov _newr,ebx		;		newr:=r - quotient * newr
	mov eax,_quotient
	imul _newt
	mov ebx,_t
	mov edx,_newt
	mov _t,edx			;		t:=newt
	sub ebx,eax
	mov _newt,ebx		; 		newt:=t - quotient * newt)
	cmp _newr,0
	jne @B
	mov eax,_t
	ret

inversefunc endp

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 10, 2019, 01:20:47 PM

@mabdelouahab
You can't divide polynomials that way! :badgrin:
Check with Liao Mi, if you don't believe. :biggrin:

Here is another example of using the Euclid-Wallis Algorithm (https://math.stackexchange.com/questions/529156/extended-euclidean-algorithm-in-gf28), to build the table.
We want the inverse of x^5+1 (which is 10 0001 = 21h). The result is x^6+x^5+x^3+x^2+x (which means 110 1110 = 6Eh)
Confirm in the table that the result is correct. I did, and can tel you it is! :thumbsup:

Title: Re: Modular multiplicative inverse
Post by: mabdelouahab on September 13, 2019, 01:31:30 AM

Final code

Code Select

include masm32rt.inc
include GF28.inc
		v=053h
.code
Start:
	invoke gf28GetMultiplicativeInverse,v
	printf("Multiplicative inverse of {%xh}=%Xh\n",v,eax)
	inkey
	exit

end Start

GF28.inc:

Code Select

	tPolyType struct
		coeff	db	16	dup(0)
	tPolyType ends
.data
 	gf28IrreduciblePolynomial db 	1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0 ; =11Bh

.code
	bytetopoly proc b  :byte	,p :ptr poly8
			mov esi,p
			dec esi
			xor eax,eax
			mov al,b
			mov ecx,-1
		bytetopolystrt:
			inc cl
			cmp cl,8
			je bytetopolyend__
			inc esi
			bt eax,ecx
			jnc bytetopolyz00
				mov byte ptr [esi],1
				jmp bytetopolystrt
		bytetopolyz00:
				mov byte ptr [esi],0
				jmp bytetopolystrt
		bytetopolyend__:
		ret
	bytetopoly endp
	wordtopoly proc b  :word	,p :ptr poly16
			mov esi,p
			dec esi
			xor eax,eax
			mov ax,b
			mov ecx,-1
		wordtopolystrt:
			inc cl
			cmp cl,16
			je wordtopolyend__
			inc esi
			bt eax,ecx
			jnc wordtopolyz00
				mov byte ptr [esi],1
				jmp wordtopolystrt
		wordtopolyz00:
				mov byte ptr [esi],0
				jmp wordtopolystrt
		wordtopolyend__:
		ret
	wordtopoly endp
	polytobyte proc pb :ptr byte,p :ptr poly8
		mov esi,p
		mov edi,pb
		mov byte ptr [edi],0
		xor ecx,ecx
		mov eax,1
	polytobytestrt:
		cmp byte ptr [esi],1
		jne @F
		or byte ptr [edi],al
	@@:
		inc esi
		shl al,1
		inc cl
		cmp cl,8
		jne polytobytestrt
		ret
	polytobyte endp
	polytoword proc pb :ptr word,p :ptr poly16
		mov esi,p
		mov edi,pb
		mov word ptr [edi],0
		xor ecx,ecx
		mov eax,1
	polytowordstrt:
		cmp byte ptr [esi],1
		jne @F
		or word ptr [edi],ax
	@@:
		inc esi
		shl ax,1
		inc cl
		cmp cl,16
		jne polytowordstrt
		ret
	polytoword endp	
	gf28resetpoly proc ptPolyType,deg
		mov edi,ptPolyType
		xor ecx,ecx
	@@: mov byte ptr [edi+ecx],0
		inc ecx
		cmp ecx,deg
		jne @B
		ret
	gf28resetpoly endp
		
	gf28MovPolyToPoly proc pdes,psrc,l
		mov edi,psrc
		mov esi,pdes
		xor ecx,ecx
	@@:
		mov al,[edi+ecx]
		mov [esi+ecx],al
		inc ecx
		cmp ecx,l
		jne @B	
		ret
	gf28MovPolyToPoly endp

	gf28GetDegreePoly proc uses esi edi ecx edx pfx
	LOCAL d,notzero_
		mov d,0
		mov notzero_,0
		mov esi,pfx
	
		xor ecx,ecx
	@@:	
		.if byte ptr [esi+ecx] !=0
			mov notzero_,1
			mov d,ecx
		.endif
		inc ecx
		cmp ecx,15
		jne @B
		mov eax,d
		.if notzero_
			inc eax
		.endif
		ret
	
	gf28GetDegreePoly endp
	
	gf28AddPoly proc pfxgx:ptr,pfx:ptr,pgx:ptr,l
		LOCAL tmppoly:tPolyType
		invoke gf28resetpoly,addr tmppoly,16

		lea esi,tmppoly
		mov edi,pfx
		mov edx,pgx 
		xor ecx,ecx
		
	@@:	mov al,[edx+ecx]
		xor al,[edi+ecx]
		mov [esi+ecx],al
		inc ecx
		cmp ecx,l
		jne @B
		
		invoke gf28MovPolyToPoly,pfxgx,addr tmppoly,16	
		
		ret
	gf28AddPoly endp

	gf28mulPoly proc pfxgx:ptr,pfx:ptr,pgx:ptr
		LOCAL indx1,indx2
		LOCAL tmppoly:tPolyType
		invoke gf28resetpoly,addr tmppoly,16
		lea esi,tmppoly
		mov edi,pfx
		mov indx1,0
	gf28mulPolyf:	
			mov edx,pgx 
			mov indx2,0
			gf28mulPolyg:	
				;*************************
				mov al,[edx]
				and al,[edi]
				mov ecx,indx1
				add ecx,indx2
				xor al,[esi+ecx]
				mov [esi+ecx],al				
				;*************************		
			inc edx
			inc indx2
			cmp indx2,8
			jne gf28mulPolyg
		inc edi
		inc indx1
		cmp indx1,8
		jne gf28mulPolyf
		
		lea edi,gf28IrreduciblePolynomial
	
	gf28mulPolymod:
		invoke gf28GetDegreePoly,addr tmppoly
		
		sub eax,9
		mov indx1,eax
		.if sdword ptr eax < 0
			invoke gf28MovPolyToPoly,pfxgx,addr tmppoly,16	
			ret
		.endif
		xor ecx,ecx
	@@:	
		mov edx,indx1
		add edx,ecx		
		mov al,[esi+edx]
		xor al,[edi+ecx]
		mov [esi+edx],al
		inc ecx
		cmp ecx,9
		jne @B
		jmp gf28mulPolymod
		
		ret
	gf28mulPoly endp
	gf28divpoly proc  pfxgx:ptr,pfx:ptr,pgx:ptr,pmod:ptr
		LOCAL indx1,indx2
		LOCAL tmppolymod:tPolyType
		LOCAL tmppolyqot:tPolyType
		.if pfxgx==0 && pmod==0 
			ret
		.endif
		.if pfxgx != 0
			invoke gf28resetpoly, addr tmppolyqot,16
			lea esi,tmppolyqot			
		.endif
		invoke gf28resetpoly,addr tmppolymod,16
		lea ebx,tmppolymod
		mov edi,pfx
		xor ecx,ecx
	@@:
		mov al,[edi+ecx]
		mov [ebx+ecx],al
		inc ecx
		cmp ecx,9
		jne @B	
		
		mov edi,ebx
		mov edx,pgx 
	
		mov indx2,rv(gf28GetDegreePoly,pgx)
	gf28divmod:
		mov indx1,rv(gf28GetDegreePoly,edi)
		sub eax,indx2
		mov indx1,eax
		.if (sdword ptr eax < 0) 
			.if pmod != 0
				invoke gf28MovPolyToPoly,pmod,addr tmppolymod,9
			.endif
			.if pfxgx != 0
				invoke gf28MovPolyToPoly,pfxgx,addr tmppolyqot,9
			.endif
			ret
		.endif
		.if pfxgx !=0
			mov byte ptr [esi+eax],1
		.endif
		push edx
		push ebx		
		push edi
		pop edi
		pop ebx
		pop edx
	
		xor ecx,ecx
	@@:	
		mov ebx,indx1
		add ebx,ecx
		mov al,[edi+ebx]
		xor al,[edx+ecx]
		mov [edi+ebx],al
		inc ecx
		cmp ecx,indx2
		jne @B
		jmp gf28divmod
		
		ret

	gf28divpoly endp
	
	gf28invPoly proc pinv,pfx
			LOCAL _t		:tPolyType
			LOCAL _r		:tPolyType
			LOCAL _newt		:tPolyType
			LOCAL _newr 	:tPolyType
			LOCAL _quotient	:tPolyType
			LOCAL _tmp		:tPolyType
			
		invoke gf28MovPolyToPoly, addr _r,addr gf28IrreduciblePolynomial,16	; r := p; invoke wordtopoly,011bh,addr _r       
		invoke wordtopoly,0h,addr _t				;    t := 0;       
		invoke wordtopoly,1h,addr _newt				;    newt := 1;    
		invoke gf28MovPolyToPoly, addr _newr,pfx,16	;    newr := a
		
		@@:
			invoke gf28GetDegreePoly,addr _newr
			.if eax == 0
				invoke gf28MovPolyToPoly, pinv,addr _t,16
				ret
			.endif
			invoke gf28MovPolyToPoly, addr _tmp,addr _newr,16	    
			invoke gf28divpoly,addr _quotient,addr _r,addr _newr,addr _newr
			invoke gf28MovPolyToPoly, addr _r,addr _tmp,16	;    r := _newr;    
	
			invoke gf28mulPoly,addr _tmp,addr _newt,addr _quotient
			invoke gf28AddPoly,addr _tmp,addr _tmp,addr _t,8
			invoke gf28MovPolyToPoly, addr _t,addr _newt ,8	;    t := _newt; ;   
			invoke gf28MovPolyToPoly,addr _newt, addr _tmp,8   
			jmp @B	; loop
		ret
	gf28invPoly endp
	gf28GetMultiplicativeInverse proc a:byte
		LOCAL _tmpPoly		:tPolyType
		LOCAL _tmpInvPoly	:tPolyType
		LOCAL _byteInverse	:byte
		
		invoke gf28resetpoly,addr _tmpPoly,16
		invoke bytetopoly,a,addr _tmpPoly
		invoke gf28invPoly,addr _tmpInvPoly,addr _tmpPoly
		invoke polytobyte,addr _byteInverse,addr _tmpInvPoly
		xor eax,eax
		mov al,_byteInverse
		ret
	gf28GetMultiplicativeInverse endp

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 13, 2019, 03:58:43 AM

It works but is a confusing mess, it appears like you converted it from some HLL.
All you need are 3 (three) ASM functions. Polynomial divide, GF(2^8) multiply and GF(2^8) add.

Code Select


.model flat, C

.code

polyDivision proc uses esi edi num:dword, den:dword, quocPtr:ptr, remPtr:ptr
	LOCAL remainder
	LOCAL term
	
	mov eax, num
	mov remainder, eax
	bsr esi, remainder
	bsr edi, den
	xor edx, edx 
	
	.while esi>=edi 
		mov ecx, esi
		sub ecx, edi 
		mov eax, den
		shl eax, cl
		mov term, eax
		mov eax, 1
		shl eax, cl
		xor edx, eax
		mov eax, remainder
		xor eax, term
		mov remainder, eax

		bsr esi, remainder
		bsr edi, den	
	.endw
	mov eax, quocPtr
	mov dword ptr [eax], edx
	mov eax, remPtr
	mov ecx, remainder
	mov dword ptr [eax], ecx
	
	ret
polyDivision endp

polyMultiply proc mult1:byte, mult2:byte, res : ptr
	LOCAL result : byte
	mov result, 0
	.while (mult1 != 0) && (mult2 !=0)
		movzx eax,mult2
		and al, 1
		.if al !=0
			movzx eax, result
			movzx ecx, mult1 
			xor eax, ecx
			mov result, al
		.endif
		movzx eax, mult1
		and eax, 80h
		.if al != 0
			movzx eax, mult1
			shl eax, 1
			xor eax, 11Bh
			mov mult1, al
		.else
			mov al, mult1
			shl al, 1
			mov mult1, al
		.endif
		mov al, mult2
		shr al,1
		mov mult2, al
	.endw
	mov eax, res
	mov cl, result
	mov byte ptr [eax], cl 
	ret
polyMultiply endp

polyAdd proc add1:byte, add2:byte
	movzx eax, add1
	movzx ecx, add2
	xor eax, ecx
	ret
polyAdd endp

end

When you use the above procedures with the following C you can make fast a beautiful table:

Code Select


#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

extern
void polyDivision(unsigned num, unsigned den, unsigned* polyDivision, unsigned* remPtr);
void polyMultiply(unsigned char mult1, unsigned char mult2, unsigned char* res);
unsigned char polyAdd(unsigned char add1, unsigned char add2);

int main()
{
	int i, j;
	int t, newt, r, newr, quotient, remain;
	unsigned char tempByte, tempNew, printValue, tempNewr;

	for (i = 0; i < 16; i++)
		printf("\t%.2x", i);

	printf("\n\n");

	for (i = 0; i < 16; i++)
	{
		printf("%.2x\t", i);
		for (j = 0; j < 16; j++)
		{
			tempNewr = i * 16 + j;
			newr = tempNewr;
			r = 0x11B;
			t = 0;
			newt = 1;
			if (newr>1)
			{
				while (newr != 1)
				{
					polyDivision(r, newr, &quotient, &remain);
					r = newr;
					newr = remain;
					polyMultiply(quotient, newt, &tempByte);
					tempNew = polyAdd(t, tempByte);
					t = newt;
					newt = tempNew;
				}
			}
			if (tempNewr < 2)
				printValue = tempNewr;
			else
				printValue = tempNew;
			printf("%.2x\t", printValue);

		}
		printf("\n");
	}

	return 0;
}

Output:

Code Select


        00      01      02      03      04      05      06      07      08      09      0a      0b      0c      0d     0e       0f

00      00      01      8d      f6      cb      52      7b      d1      e8      4f      29      c0      b0      e1     e5       c7
01      74      b4      aa      4b      99      2b      60      5f      58      3f      fd      cc      ff      40     ee       b2
02      3a      6e      5a      f1      55      4d      a8      c9      c1      0a      98      15      30      44     a2       c2
03      2c      45      92      6c      f3      39      66      42      f2      35      20      6f      77      bb     59       19
04      1d      fe      37      67      2d      31      f5      69      a7      64      ab      13      54      25     e9       09
05      ed      5c      05      ca      4c      24      87      bf      18      3e      22      f0      51      ec     61       17
06      16      5e      af      d3      49      a6      36      43      f4      47      91      df      33      93     21       3b
07      79      b7      97      85      10      b5      ba      3c      b6      70      d0      06      a1      fa     81       82
08      83      7e      7f      80      96      73      be      56      9b      9e      95      d9      f7      02     b9       a4
09      de      6a      32      6d      d8      8a      84      72      2a      14      9f      88      f9      dc     89       9a
0a      fb      7c      2e      c3      8f      b8      65      48      26      c8      12      4a      ce      e7     d2       62
0b      0c      e0      1f      ef      11      75      78      71      a5      8e      76      3d      bd      bc     86       57
0c      0b      28      2f      a3      da      d4      e4      0f      a9      27      53      04      1b      fc     ac       e6
0d      7a      07      ae      63      c5      db      e2      ea      94      8b      c4      d5      9d      f8     90       6b
0e      b1      0d      d6      eb      c6      0e      cf      ad      08      4e      d7      e3      5d      50     1e       b3
0f      5b      23      38      34      68      46      03      8c      dd      9c      7d      a0      cd      1a     41       1c

Title: Re: Modular multiplicative inverse
Post by: mabdelouahab on September 13, 2019, 04:51:22 AM

:thumbsup:

Title: Re: Modular multiplicative inverse
Post by: nidud on September 13, 2019, 07:26:16 AM

deleted

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 13, 2019, 08:38:47 AM

Quote from: nidud on September 13, 2019, 07:26:16 AM
A reduced version with less confusing mess:

At least you understood the mess enough to roll up your sleeves and spend ten minutes looking for a way to save 2 milliseconds. :thumbsup:

Quote
Bear metal is shorter and faster.

Do you mean beer metal? :rolleyes:

Title: Re: Modular multiplicative inverse
Post by: HSE on September 13, 2019, 09:53:20 AM

Bare :biggrin:

Title: Re: Modular multiplicative inverse
Post by: nidud on September 13, 2019, 10:28:51 AM

deleted

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 13, 2019, 04:56:30 PM

Quote from: nidud on September 13, 2019, 10:28:51 AM
why are you here?

I am here mostly to have fun with people that publish performance results but don't show how they got them.

In particular, I would like to see how you used RDTSC (http://www.forwardscattering.org/post/15) to measure the clock cycles :skrewy:

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 14, 2019, 02:45:01 AM

:badgrin:

Interesting that when I replace your highly optimized polyDivision with my messy polyDivision the performance difference is about 13%. (60 versus 68 cycles, no chance to cook bitcoins :sad: with this difference)
Cycles polynomial Division: 60 (53898/896) versus Cycles polynomial Division: 68 (61178/896)

I am yet to discover where you obtained 100106 cycles for the original. and 63363 cycles for the optimized. but I am going to think the whole weekend about this and may be able to find out. The likely reason might be related to the bear metal (it does happen :thumbsup:).

Yes, I know I have disregarded some practices like flushing the queue to guarantee serial execution of RDTSC, but this was just to illustrate that your conclusions are delirious. Sure, and I believe as well that these tests are mostly meaningless for modern processors but this is another discussion.

Code Select


#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

extern
void polyDivision(unsigned num, unsigned den, unsigned* polyDivision, unsigned* remPtr);
int polyDivision2(unsigned num, unsigned den, unsigned* polyDivision);
void polyMultiply(unsigned char mult1, unsigned char mult2, unsigned char* res);
unsigned char polyAdd(unsigned char add1, unsigned char add2);

int main()
{
	int i, j;
	int t, newt, r, newr, quotient, remain;
	unsigned char tempByte, tempNew, printValue, tempNewr;
	unsigned long long a, b;
	int counter = 0;
	unsigned long long accumDif = 0;

	for (i = 0; i < 16; i++)
		printf("\t%.2x", i);

	printf("\n\n");

	for (i = 0; i < 16; i++)
	{
		printf("%.2x\t", i);
		for (j = 0; j < 16; j++)
		{
			tempNewr = i * 16 + j;
			newr = tempNewr;
			r = 0x11B;
			t = 0;
			newt = 1;
			if (newr>1)
			{
				while (newr != 1)
				{
					counter++; // NEW
					a=__rdtsc(); // NEW
					polyDivision(r, newr, &quotient, &remain);
					//remain = polyDivision2(r, newr, &quotient); // NEW
					b = __rdtsc(); // NEW
					accumDif += b - a; // NEW
					r = newr;
					newr = remain;
					polyMultiply(quotient, newt, &tempByte);
					tempNew = polyAdd(t, tempByte);
					t = newt;
					newt = tempNew;
				}
			}
			if (tempNewr < 2)
				printValue = tempNewr;
			else
				printValue = tempNew;
			printf("%.2x\t", printValue);

		}
		printf("\n");
	}
	
	printf("Cycles polynomial Division: %d (%lld/%d)", (int) (accumDif / counter), accumDif, counter); // NEW

	return 0;
}

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 14, 2019, 06:08:51 PM

:skrewy:

And for the GF(2^8) multiply the results are "Cycles GF(2^8) multiply: 105 (94234/896)" for the original and "Cycles GF(2^8) multiply: 90 (80948/896)" for the super-ultra-hyper optimized version from Nidud. A decent, but not impressive, 11.7% improvements. :greenclp:
But I am still waiting for your report explaining where you obtained the 100392 cycles versus 37749 cycles improvement (266% improvement :badgrin:). I guess I will have to wait a long long time. :sad:

Code Select


#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

extern
void polyDivision(unsigned num, unsigned den, unsigned* polyDivision, unsigned* remPtr);
int polyDivision2(unsigned num, unsigned den, unsigned* polyDivision);
void polyMultiply(unsigned char mult1, unsigned char mult2, unsigned char* res);
unsigned char polyMultiply2(unsigned char mult1, unsigned char mult2);
unsigned char polyAdd(unsigned char add1, unsigned char add2);

int main()
{
	int i, j;
	int t, newt, r, newr, quotient, remain;
	unsigned char tempByte, tempNew, printValue, tempNewr;
	unsigned long long a, b;
	int counter = 0;
	unsigned long long accumDif = 0;

	for (i = 0; i < 16; i++)
		printf("\t%.2x", i);

	printf("\n\n");

	for (i = 0; i < 16; i++)
	{
		printf("%.2x\t", i);
		for (j = 0; j < 16; j++)
		{
			tempNewr = i * 16 + j;
			newr = tempNewr;
			r = 0x11B;
			t = 0;
			newt = 1;
			if (newr>1)
			{
				while (newr != 1)
				{
					counter++;
					//a=__rdtsc();
					polyDivision(r, newr, &quotient, &remain);
					//remain = polyDivision2(r, newr, &quotient);
					//b = __rdtsc();
					//accumDif += b - a;
					r = newr;
					newr = remain;
					a = __rdtsc();
					//polyMultiply(quotient, newt, &tempByte);
					tempByte = polyMultiply2(quotient, newt); //NEW
					b = __rdtsc();
					accumDif += b - a;
					tempNew = polyAdd(t, tempByte);
					t = newt;
					newt = tempNew;
				}
			}
			if (tempNewr < 2)
				printValue = tempNewr;
			else
				printValue = tempNew;
			printf("%.2x\t", printValue);

		}
		printf("\n");
	}
	
	//printf("Cycles polynomial Division: %d (%lld/%d)", (int) (accumDif / counter), accumDif, counter);
	printf("Cycles GF(2^8) multiply: %d (%lld/%d)", (int)(accumDif / counter), accumDif, counter);

	return 0;
}

Title: Re: Modular multiplicative inverse
Post by: nidud on September 14, 2019, 09:33:31 PM

deleted

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 15, 2019, 12:33:54 AM

It took me more than a reasonable amount of time to build what you attached here because if there is something I dislike in ASMC is that it is not user friendly. It is made by one ultra nerd for other ultra nerds.

Now, your test does not make sense. You picked up a set of 3 ASM instructions, repeated them 50000 times and showed that if we replace that with just one instruction we would save a lot of cycles! Thank you my friend!
But in the whole procedure that does weight much as I have demonstrated. Of course, it is a positive contribution and I would probably find it if I was set to optimize the code.

Have a nice day. :skrewy:

Title: Re: Modular multiplicative inverse
Post by: nidud on September 17, 2019, 03:36:49 AM

deleted

Title: Re: Modular multiplicative inverse
Post by: aw27 on September 17, 2019, 04:39:29 AM

You are in a state of reality denial and I will, of course, not feed the hallucination.

Quote
It's a masm compatible assembler so it shouldn't be that complicated.
asmc \masm32\m32lib\*.asm
ml -c \masm32\m32lib\*.asm

Try to build what you dropped here like that then. :badgrin:
The 3 *.asm files need to be built to .bin, a trickery that only JWASM descendents have, and the .cmd file is not a file to be run by cmd.exe :rolleyes: it is an ASM file. WTF! :skrewy:

Title: Re: Modular multiplicative inverse
Post by: HSE on September 17, 2019, 04:56:21 AM

I don't knew that CMD can make that! Interesting :thumbsup:

Here making .bin with UASM32:

Code Select

\nidud\shr>uasm32 -q -bin ?.asm

\nidud\shr>asmc -q -pe "\nidud\shr\timeit.cmd"

\nidud\shr>timeit.exe
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
----------------------------------------------
-- test(0)
   380926 cycles, rep(1000), code(410) 0.asm: mov al,m | shr al,1 | mov m,al
   353402 cycles, rep(1000), code(160) 1.asm: shr m,1
    50546 cycles, rep(1000), code(113) 2.asm: shr al,1
-- test(1)
   362377 cycles, rep(1000), code(410) 0.asm: mov al,m | shr al,1 | mov m,al
   358679 cycles, rep(1000), code(160) 1.asm: shr m,1
    48543 cycles, rep(1000), code(113) 2.asm: shr al,1
-- test(2)
   351191 cycles, rep(1000), code(410) 0.asm: mov al,m | shr al,1 | mov m,al
   358392 cycles, rep(1000), code(160) 1.asm: shr m,1
    49500 cycles, rep(1000), code(113) 2.asm: shr al,1

total [0 .. 2], 1++
   148589 cycles 2.asm: shr al,1
  1070473 cycles 1.asm: shr m,1
  1094494 cycles 0.asm: mov al,m | shr al,1 | mov m,al
hit any key to continue...

The MASM Forum

General => The Campus => Topic started by: mabdelouahab on September 09, 2019, 07:31:44 AM