Hello all,
I'm interested in trying to create the K constants of SHA256 using FPU.
In an RFC https://tools.ietf.org/html/rfc4634
It states the K constants are derived from cubic roots of first 64 prime numbers.
SHA-224 and SHA-256 use the same sequence of sixty-four constant
32-bit words, K0, K1, ..., K63. These words represent the first
thirty-two bits of the fractional parts of the cube roots of the
first sixty-four prime numbers. In hex, these constant words are as
follows (from left to right):
428a2f98 71374491 b5c0fbcf e9b5dba5
3956c25b 59f111f1 923f82a4 ab1c5ed5
d807aa98 12835b01 243185be 550c7dc3
72be5d74 80deb1fe 9bdc06a7 c19bf174
e49b69c1 efbe4786 0fc19dc6 240ca1cc
2de92c6f 4a7484aa 5cb0a9dc 76f988da
983e5152 a831c66d b00327c8 bf597fc7
c6e00bf3 d5a79147 06ca6351 14292967
27b70a85 2e1b2138 4d2c6dfc 53380d13
650a7354 766a0abb 81c2c92e 92722c85
a2bfe8a1 a81a664b c24b8b70 c76c51a3
d192e819 d6990624 f40e3585 106aa070
19a4c116 1e376c08 2748774c 34b0bcb5
A program to demonstrate this is here -> http://www.di-mgt.com.au/hexcuberoots.c.html
I would like to create these numbers using FPU and have found some cubic root code written originally by Raymond, the author of FPULIB but cannot seem to get values outlined in document. What am I doing wrong here?
.data
number dd 2
dd 0
.code
public cube
cube:
push offset number
call CubeLogA
mov eax, [number]
mov ebx, [number+4]
invoke ExitProcess, eax
CubeLogA proc uses eax ecx edx x:DWORD
.data
recip3 dt 0.333333333333333333
.code
mov eax, [x]
finit
fld recip3
fild dword ptr[eax]
fabs
fyl2x ; ->log2(Src1)*exponent
fld st(0) ; copy the logarithm
frndint ; keep only the characteristic
fsub st(1),st ; keeps only the mantissa
fxch ; get the mantissa on top
f2xm1 ; ->2^(mantissa)-1
fld1
fadd ; add 1 back
fscale ; scale it with the characteristic
fstp st(1) ; copy result over and "pop" it
fstp dword ptr[eax]
ret
CubeLogA endp
end
First things first: Benvenuto al forum, Lorenzo :icon14:
"Keeps the mantissa" should be "keep the fraction", right?
Code looks so far OK, but it would be better if you posted the whole code for testing. One open question here is precision: the FPU offers real4, real8 and real10, and the fractions are different for each of them.
grazie :biggrin:
Even for the h constants which are sqaure root of first 8 prime numbers, I use REAL10 type but the 64-bit fractional part doesn't look like that provided by specifications.
push 2
fild dword ptr[esp]
fsqrt
fstp real10 ptr [number]
The first 32-bits of prime number 2 is F9DE6800h but SHA-256 uses 6A09E667h so is it perhaps that more precision is required or I'm not extracting fractional part properly?
I don't really have any program at moment, I just wanted to see how these values are created on FPU.
(http://www.ee.nthu.edu.tw/jcliao/mic97/chap13R/F13_23.JPG)
what that picture doesn't tell you....
REAL4 and REAL8 floats use an implied leading 1
REAL10 floats use an explicit leading 1
Read a bit more about this and seem to make little progress.
.data
x dq 0
pwr dq 4294967296
.code
; --------------------
push 2 ; 1st prime number
fild dword ptr[esp]
fsqrt
fld1 ; load 1
fsub ; subtract from sqrt (2)
lea eax, [pwr] ; load 2^32
fild qword ptr[eax]
fmul ; multiply fractional part by 2^32
fistp qword ptr[x] ; store integer
The above code gives 6A09E668h which is +1 than what used by sha-256
Then to get integer of cubic root fractional part, something similar
.data
recip3 dt 0.333333333333333333
pwr dq 4294967296
.code
mov eax, [x]
finit
fld recip3
fild qword ptr[eax]
fabs
fyl2x ; ->log2(Src1)*exponent
fld st(0) ; copy the logarithm
frndint ; keep only the characteristic
fsub st(1), st ; keeps only the mantissa
fxch ; get the mantissa on top
f2xm1 ; ->2^(mantissa)-1
lea ebx, [pwr]
fild qword ptr [ebx]
fmul
fscale ; scale it with the characteristic
fstp st(1) ; copy result over and "pop" it
fistp qword ptr[eax]
ret
I'm not sure why values in SHA-256 spec are truncated by 1.
Quote from: dedndave on January 30, 2015, 07:08:03 PM
REAL4 and REAL8 floats use an implied leading 1
REAL10 floats use an explicit leading 1
let me clarify that a little bit....
for REAL4 and REAL 8 formats, the leading 1 bit is not stored in the fraction bits
it's assumed to be 1 (except for a value of zero, maybe a couple others)
for REAL10 format, the leading 1 is stored at the highest order fraction bit
for most valid REAL10 values, the high order fraction bit will be 1
So it's only 63 bits I should be reading?
Mask out the last bit with 0EFFFFFFFh ?
It seems the number was being rounded up instead of down when converted to integer with FISTP so this now gives same result.
; round down
fstcw [ctrl_wrd]
mov dx, [ctrl_wrd]
or dx, 00400h
and dx, 0f7ffh
mov [ctrl_wrd], dx
fldcw [ctrl_wrd]
...or add the one to convert a REAL4/REAL8 fraction to something meaningful
when you create a REAL4 or REAL8 from numeric data, you have to remove the leading 1
You must remove the integer part of the cube roots:
c = Pi1/3
Ki = (c-⌊c⌋)·232
I made an attempt to duplicate the sequence in C using doubles, the pow and floor functions, and shifts, and the result works for only a few of the low primes, specifically 2, 3, and 13.
#include <stdlib.h>
#include <stdio.h>
#include <conio.h>
#include <math.h>
/*
2 428a2f98
3 71374491
5 b5c0fbcf
7 e9b5dba5
11 3956c25b
13 59f111f1
17 923f82a4
19 ab1c5ed5
23 d807aa98
29 12835b01
31 243185be
37 550c7dc3
41 72be5d74
43 80deb1fe
47 9bdc06a7
53 c19bf174
59 e49b69c1
73 efbe4786
79 0fc19dc6
83 240ca1cc
*/
int __cdecl main(void)
{
double x=11,y=(double)1/3, r;
r = pow(x,y);
printf("%.15f\t%.15f\t%.15f\n",r,floor(r),r*r*r);
r -= floor(r);
printf("%.15f\n\n",r);
__asm
{
mov eax, r
mov edx, r+4
mov cl, 22
shrd eax, edx, cl
mov r, eax
mov r+4, edx
}
printf("%.16I64x\n",*(__int64 *)(&r));
//printf("%.16I64x\n",*(unsigned __int64 *)(&r));
_getch();
return(0);
}
The straight forward solution does work as expected:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
int main(void)
{
int i;
double d;
unsigned int u;
unsigned int const dP[64] = {
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
151, 157, 163, 167, 173, 179, 181, 191, 193, 197,
199, 211, 223, 227, 229, 233, 239, 241, 251, 257,
263, 269, 271, 277, 281, 283, 293, 307, 311 };
for (i = 0; i < 64; ++i) {
d = pow(dP[i], 1.0 / 3.0);
u = (unsigned int) floor((d - floor(d)) * pow(2, 32)); // the outer floor is redundant
printf("K%02i = 0x%8x\n", i, u);
}
getchar();
return EXIT_SUCCESS;
}
There was one floor function missing in my previous post:
c = Pi1/3
Ki = ⌊(c-⌊c⌋)·232⌋
I got something working for me eventually and wanted smallest code possible rather than for speed. Here's final result.
.686
.xmm
.model flat, stdcall
.nolist
.nocref
WIN32_LEAN_AND_MEAN equ 1
include windows.inc
include stdio.inc
includelib kernel32.lib
includelib msvcrt.lib
.list
.cref
.data
; first 64 prime numbers
primes dw 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
dw 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131
dw 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223
dw 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311
primes_len equ $-primes
hex_fmt db "%08X ", 0
newline db 13, 10, 0
.code
public cube
cube:
.code
call init_fpu
lea esi, [primes]
xor edi, edi
print_primes:
mov ecx, edi
and ecx, 3
jnz skip_nl
invoke printf, addr newline
skip_nl:
movzx eax, word ptr[esi+edi*2]
call cbr2int
invoke printf, addr hex_fmt, eax
inc edi
cmp edi, primes_len / sizeof (word)
jne print_primes
invoke ExitProcess, eax
; get cubic root of number and return 32-bit fractional part as integer
cbr2int:
.data
recip3 dt 0.333333333333333333
.code
push 1
push 0
fild qword ptr[esp] ; load 2^32
fld recip3
push eax ; load integer
fild dword ptr[esp]
push eax
fabs
fyl2x ; ->log2(Src1)*exponent
fld st(0) ; copy the logarithm
frndint ; keep only the characteristic
fsub st(1), st ; keeps only the mantissa
fxch ; get the mantissa on top
f2xm1 ; ->2^(mantissa)-1
fscale
fstp st(1) ; copy result over and "pop" it
fmul ; * 2^32
fistp qword ptr[esp] ; save integer
pop eax
add esp, 3*4 ; release 2^32 on stack
ret
init_fpu:
; round numbers down
push eax
fstcw [esp] ; store control word
pop eax
or ax, 00400H ; set round down bit
and ax, 0F7FFH ; clear bit
push eax
fldcw [esp] ; load control word
pop eax
ret
sqrt2int:
; get square root of number in eax and return 32-bit fractional part as integer
push 1
push 0
fild qword ptr[esp] ; load 2^32
push eax
fild dword ptr[esp] ; load integer
push eax
fsqrt
fld1 ; load 1
fsub ; subtract to get fractional part
fmul ; multiply fractional part by 2^32
frndint
fistp qword ptr[esp]
pop eax
add esp, 3*4 ; release 2^32 on stack
ret
end
output of above
428A2F98 71374491 B5C0FBCF E9B5DBA5
3956C25B 59F111F1 923F82A4 AB1C5ED5
D807AA98 12835B01 243185BE 550C7DC3
72BE5D74 80DEB1FE 9BDC06A7 C19BF174
E49B69C1 EFBE4786 0FC19DC6 240CA1CC
2DE92C6F 4A7484AA 5CB0A9DC 76F988DA
983E5152 A831C66D B00327C8 BF597FC7
C6E00BF3 D5A79147 06CA6351 14292967
27B70A85 2E1B2138 4D2C6DFC 53380D13
650A7354 766A0ABB 81C2C92E 92722C85
A2BFE8A1 A81A664B C24B8B70 C76C51A3
D192E819 D6990624 F40E3585 106AA070
19A4C116 1E376C08 2748774C 34B0BCB5
391C0CB3 4ED8AA4A 5B9CCA4F 682E6FF3
748F82EE 78A5636F 84C87814 8CC70208
90BEFFFA A4506CEB BEF9A3F7 C67178F2
which matches the values used in SHA-256
I tried to create the 64-bit values just out of interest but didn't figure it out.
Congrats, Lorenzo, really nice job :t
Your code triggered some odd errors related to Windows.inc - are you using a non-Masm32 version, or did you modify inc files?
I attach a version that runs with standard Masm32 includes.
Using JWASM and win32inc files.