News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Obtaining Mantissa / Fractional part of cubic number

Started by lorenzo, January 30, 2015, 03:51:05 PM

Previous topic - Next topic

lorenzo

Hello all,

I'm interested in trying to create the K constants of SHA256 using FPU.
In an RFC https://tools.ietf.org/html/rfc4634

It states the K constants are derived from cubic roots of first 64 prime numbers.


   SHA-224 and SHA-256 use the same sequence of sixty-four constant
   32-bit words, K0, K1, ..., K63.  These words represent the first
   thirty-two bits of the fractional parts of the cube roots of the
   first sixty-four prime numbers.  In hex, these constant words are as
   follows (from left to right):

        428a2f98 71374491 b5c0fbcf e9b5dba5
        3956c25b 59f111f1 923f82a4 ab1c5ed5
        d807aa98 12835b01 243185be 550c7dc3
        72be5d74 80deb1fe 9bdc06a7 c19bf174
        e49b69c1 efbe4786 0fc19dc6 240ca1cc
        2de92c6f 4a7484aa 5cb0a9dc 76f988da
        983e5152 a831c66d b00327c8 bf597fc7
        c6e00bf3 d5a79147 06ca6351 14292967
        27b70a85 2e1b2138 4d2c6dfc 53380d13
        650a7354 766a0abb 81c2c92e 92722c85
        a2bfe8a1 a81a664b c24b8b70 c76c51a3
        d192e819 d6990624 f40e3585 106aa070
        19a4c116 1e376c08 2748774c 34b0bcb5


A program to demonstrate this is here -> http://www.di-mgt.com.au/hexcuberoots.c.html

I would like to create these numbers using FPU and have found some cubic root code written originally by Raymond, the author of FPULIB but cannot seem to get values outlined in document. What am I doing wrong here?


.data
number dd 2
       dd 0
.code
public cube
cube:
    push offset number
    call CubeLogA
   
    mov    eax, [number]
    mov    ebx, [number+4]
   
    invoke ExitProcess, eax
   
CubeLogA proc uses eax ecx edx x:DWORD
.data
recip3      dt    0.333333333333333333
.code
    mov   eax, [x]
    finit
    fld   recip3
    fild  dword ptr[eax]
    fabs
    fyl2x                   ; ->log2(Src1)*exponent
    fld   st(0)             ; copy the logarithm
    frndint                 ; keep only the characteristic
    fsub  st(1),st          ; keeps only the mantissa
    fxch                    ; get the mantissa on top
    f2xm1                   ; ->2^(mantissa)-1
    fld1
    fadd                    ; add 1 back
    fscale                  ; scale it with the characteristic
    fstp  st(1)             ; copy result over and "pop" it
    fstp  dword ptr[eax]
    ret
CubeLogA endp
   
    end

jj2007

First things first: Benvenuto al forum, Lorenzo :icon14:

"Keeps the mantissa" should be "keep the fraction", right?
Code looks so far OK, but it would be better if you posted the whole code for testing. One open question here is precision: the FPU offers real4, real8 and real10, and the fractions are different for each of them.

lorenzo

 grazie :biggrin:

Even for the h constants which are sqaure root of first 8 prime numbers, I use REAL10 type but the 64-bit fractional part doesn't look like that provided by specifications.


    push    2
    fild    dword ptr[esp]
    fsqrt
    fstp   real10 ptr [number]


The first 32-bits of prime number 2 is F9DE6800h but SHA-256 uses 6A09E667h so is it perhaps that more precision is required or I'm not extracting fractional part properly?

I don't really have any program at moment, I just wanted to see how these values are created on FPU.

dedndave



what that picture doesn't tell you....

REAL4 and REAL8 floats use an implied leading 1
REAL10 floats use an explicit leading 1

lorenzo

Read a bit more about this and seem to make little progress.


.data
x   dq 0
pwr dq 4294967296
.code
    ; --------------------
    push 2                  ; 1st prime number
    fild dword ptr[esp]
    fsqrt
    fld1                    ; load 1
    fsub                    ; subtract from sqrt (2)
    lea eax, [pwr]          ; load 2^32
    fild qword ptr[eax]
    fmul                    ; multiply fractional part by 2^32
    fistp qword ptr[x]      ; store integer


The above code gives 6A09E668h which is +1 than what used by sha-256
Then to get integer of cubic root fractional part, something similar


.data
recip3 dt 0.333333333333333333
pwr    dq 4294967296
.code
    mov   eax, [x]
    finit
    fld   recip3
    fild  qword ptr[eax]
    fabs
    fyl2x                   ; ->log2(Src1)*exponent
    fld   st(0)             ; copy the logarithm
    frndint                 ; keep only the characteristic
    fsub  st(1), st         ; keeps only the mantissa
    fxch                    ; get the mantissa on top
    f2xm1                   ; ->2^(mantissa)-1
    lea   ebx, [pwr]
    fild  qword ptr [ebx]
    fmul
    fscale                  ; scale it with the characteristic
    fstp  st(1)             ; copy result over and "pop" it
    fistp  qword ptr[eax]
    ret


I'm not sure why values in SHA-256 spec are truncated by 1.

dedndave

Quote from: dedndave on January 30, 2015, 07:08:03 PM
REAL4 and REAL8 floats use an implied leading 1
REAL10 floats use an explicit leading 1

let me clarify that a little bit....

for REAL4 and REAL 8 formats, the leading 1 bit is not stored in the fraction bits
it's assumed to be 1 (except for a value of zero, maybe a couple others)

for REAL10 format, the leading 1 is stored at the highest order fraction bit
for most valid REAL10 values, the high order fraction bit will be 1

lorenzo

So it's only 63 bits I should be reading?
Mask out the last bit with 0EFFFFFFFh ?

lorenzo

It seems the number was being rounded up instead of down when converted to integer with FISTP so this now gives same result.


    ; round down
    fstcw [ctrl_wrd]
    mov dx, [ctrl_wrd]
    or  dx, 00400h
    and dx, 0f7ffh
    mov [ctrl_wrd], dx
    fldcw [ctrl_wrd]


dedndave

...or add the one to convert a REAL4/REAL8 fraction to something meaningful

when you create a REAL4 or REAL8 from numeric data, you have to remove the leading 1

qWord

You must remove the integer part of the cube roots:

c = Pi1/3
Ki = (c-⌊c⌋)·232
MREAL macros - when you need floating point arithmetic while assembling!

MichaelW

I made an attempt to duplicate the sequence in C using doubles, the pow and floor functions, and shifts, and the result works for only a few of the low primes, specifically 2, 3, and 13.

#include <stdlib.h>
#include <stdio.h>
#include <conio.h>
#include <math.h>
/*
2  428a2f98
3  71374491
5  b5c0fbcf
7  e9b5dba5
11 3956c25b
13 59f111f1
17 923f82a4
19 ab1c5ed5
23 d807aa98
29 12835b01
31 243185be
37 550c7dc3
41 72be5d74
43 80deb1fe
47 9bdc06a7
53 c19bf174
59 e49b69c1
73 efbe4786
79 0fc19dc6
83 240ca1cc
*/

int __cdecl main(void)

    double x=11,y=(double)1/3, r;
    r = pow(x,y);
    printf("%.15f\t%.15f\t%.15f\n",r,floor(r),r*r*r);   
    r -= floor(r);
    printf("%.15f\n\n",r);
    __asm
    {
        mov eax, r
        mov edx, r+4
        mov cl, 22
        shrd eax, edx, cl
        mov r, eax
        mov r+4, edx
    }
    printf("%.16I64x\n",*(__int64 *)(&r));
    //printf("%.16I64x\n",*(unsigned __int64 *)(&r));

    _getch();
    return(0);
}



Well Microsoft, here's another nice mess you've gotten us into.

qWord

The straight forward solution does work as expected:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main(void)
{
   int i;
   double d;
   unsigned int u;

   unsigned int const dP[64] = {
      2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
      43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
      101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
      151, 157, 163, 167, 173, 179, 181, 191, 193, 197,
      199, 211, 223, 227, 229, 233, 239, 241, 251, 257,
      263, 269, 271, 277, 281, 283, 293, 307, 311 };

   for (i = 0; i < 64; ++i) {
      d = pow(dP[i], 1.0 / 3.0);
      u = (unsigned int) floor((d - floor(d)) * pow(2, 32));    // the outer floor is redundant
      printf("K%02i = 0x%8x\n", i, u);
   }

   getchar();
   return EXIT_SUCCESS;
}


There was one floor function missing in my previous post:

c = Pi1/3
Ki = ⌊(c-⌊c⌋)·232

MREAL macros - when you need floating point arithmetic while assembling!

lorenzo

I got something working for me eventually and wanted smallest code possible rather than for speed. Here's final result.




.686
.xmm
.model flat, stdcall
 
.nolist
.nocref

WIN32_LEAN_AND_MEAN equ 1

include windows.inc
include stdio.inc

includelib kernel32.lib
includelib msvcrt.lib

.list
.cref

.data

; first 64 prime numbers
primes dw 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
       dw 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131
       dw 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223
       dw 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311
primes_len equ $-primes

hex_fmt db "%08X ", 0
newline db 13, 10, 0

.code

public cube
cube:
.code
    call  init_fpu
    lea   esi, [primes]
    xor   edi, edi
print_primes:
    mov   ecx, edi
    and   ecx, 3
    jnz   skip_nl
    invoke printf, addr newline
skip_nl:
    movzx eax, word ptr[esi+edi*2]
    call  cbr2int

    invoke printf, addr hex_fmt, eax
    inc   edi
    cmp   edi, primes_len / sizeof (word)
    jne   print_primes
   
    invoke ExitProcess, eax
   
    ; get cubic root of number and return 32-bit fractional part as integer
cbr2int:
.data
recip3 dt 0.333333333333333333
.code
    push   1
    push   0
    fild   qword ptr[esp]   ; load 2^32
    fld    recip3
    push   eax              ; load integer
    fild   dword ptr[esp]
    push   eax
    fabs
    fyl2x                   ; ->log2(Src1)*exponent
    fld    st(0)            ; copy the logarithm
    frndint                 ; keep only the characteristic
    fsub   st(1), st        ; keeps only the mantissa
    fxch                    ; get the mantissa on top
    f2xm1                   ; ->2^(mantissa)-1
    fscale
    fstp   st(1)            ; copy result over and "pop" it
    fmul                    ; * 2^32
    fistp  qword ptr[esp]   ; save integer
    pop    eax
    add    esp, 3*4         ; release 2^32 on stack
    ret

init_fpu:
    ; round numbers down
    push   eax
    fstcw  [esp]            ; store control word
    pop    eax
    or     ax, 00400H       ; set round down bit
    and    ax, 0F7FFH       ; clear bit
    push   eax
    fldcw  [esp]            ; load control word
    pop    eax
    ret
   
sqrt2int:
    ; get square root of number in eax and return 32-bit fractional part as integer
    push   1
    push   0
    fild   qword ptr[esp]   ; load 2^32
    push   eax
    fild   dword ptr[esp]   ; load integer
    push   eax
    fsqrt
    fld1                    ; load 1
    fsub                    ; subtract to get fractional part
    fmul                    ; multiply fractional part by 2^32
    frndint
    fistp  qword ptr[esp]
    pop    eax
    add    esp, 3*4         ; release 2^32 on stack
    ret
   
    end


output of above

428A2F98 71374491 B5C0FBCF E9B5DBA5
3956C25B 59F111F1 923F82A4 AB1C5ED5
D807AA98 12835B01 243185BE 550C7DC3
72BE5D74 80DEB1FE 9BDC06A7 C19BF174
E49B69C1 EFBE4786 0FC19DC6 240CA1CC
2DE92C6F 4A7484AA 5CB0A9DC 76F988DA
983E5152 A831C66D B00327C8 BF597FC7
C6E00BF3 D5A79147 06CA6351 14292967
27B70A85 2E1B2138 4D2C6DFC 53380D13
650A7354 766A0ABB 81C2C92E 92722C85
A2BFE8A1 A81A664B C24B8B70 C76C51A3
D192E819 D6990624 F40E3585 106AA070
19A4C116 1E376C08 2748774C 34B0BCB5
391C0CB3 4ED8AA4A 5B9CCA4F 682E6FF3
748F82EE 78A5636F 84C87814 8CC70208
90BEFFFA A4506CEB BEF9A3F7 C67178F2


which matches the values used in SHA-256
I tried to create the 64-bit values just out of interest but didn't figure it out.

jj2007

Congrats, Lorenzo, really nice job :t

Your code triggered some odd errors related to Windows.inc - are you using a non-Masm32 version, or did you modify inc files?
I attach a version that runs with standard Masm32 includes.

lorenzo