Using rounding control registers for SIMD instructions

Started by jj2007, July 25, 2023, 08:00:12 PM

jj2007

.DATA
OldControlReg    dd ?                  ; normally 01F80h
NewControlReg    dd 101111110000000b   ; first two digits 10 are bits 14+13
; bit positions:    432109876543210
.CODE
  stmxcsr OldControlReg                ; save current setting
  ldmxcsr NewControlReg                ; change to new setting

  movlps xmm0, FP8(1234567890.49)      ; do some calculations
  mulsd xmm0, FP8(0.00277777778)       ; just an arbitrary example
  movaps xmm1, xmm0
  roundsd xmm1, xmm0, 8+2              ; 8 suppresses the precision exception; try 8+0, 8+1, 8+2, 8+3
  subsd xmm0, xmm1
  mulsd xmm0, FP8(360.0)

  ldmxcsr OldControlReg                ; restore previous setting
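
For those who prefer to check the bit arithmetic outside the assembler, here is a minimal C sketch of how the new control value is put together. It assumes only what the snippet above shows: the default value is 01F80h and the rounding control (RC) field sits in bits 14+13. The helper names mxcsr_with_rc and mxcsr_rc are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* The RC field of MXCSR occupies bits 14:13. */
#define MXCSR_RC_SHIFT 13
#define MXCSR_RC_MASK  (3u << MXCSR_RC_SHIFT)

/* Hypothetical helper: return mxcsr with its RC field replaced by rc (0..3). */
uint32_t mxcsr_with_rc(uint32_t mxcsr, unsigned rc)
{
    assert(rc <= 3);                      /* RC is a two-bit field */
    return (mxcsr & ~MXCSR_RC_MASK) | ((uint32_t)rc << MXCSR_RC_SHIFT);
}

/* Hypothetical helper: extract the RC field (0..3) from an MXCSR value. */
unsigned mxcsr_rc(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_RC_MASK) >> MXCSR_RC_SHIFT;
}
```

Feeding in the default 01F80h with RC = 10b gives 05F80h, which is the 101111110000000b stored in NewControlReg. Other RC values can be tried the same way before pasting the result into the .DATA section.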

See Cloutier for the roundsd documentation:
Quote
Rounding Mode | RC Field Setting | Description
Round to nearest (even) | 00B | Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (i.e., the integer value with the least-significant bit of zero).
Round down (toward −∞) | 01B | Rounded result is closest to but no greater than the infinitely precise result.
Round up (toward +∞) | 10B | Rounded result is closest to but no less than the infinitely precise result.
Round toward zero (Truncate) | 11B | Rounded result is closest to but no greater in absolute value than the infinitely precise result.
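
To see what the four modes do to a concrete value, here is a small portable C emulation. It is only a sketch: it does not touch mxcsr or execute roundsd, the function name round_with_rc is invented for this example, it only handles values well inside the 64-bit integer range, and it does not preserve the sign of zero the way the hardware does.

```c
#include <assert.h>

/* RC encodings as listed in the quoted table. */
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNC = 3 };

/* Hypothetical helper: round x to an integral value under mode rc.
   Emulation only; assumes |x| < 4e18 so the cast to long long is defined. */
double round_with_rc(double x, int rc)
{
    assert(rc >= 0 && rc <= 3);
    assert(x > -4.0e18 && x < 4.0e18);

    long long t = (long long)x;           /* a C cast truncates toward zero */
    double r = (double)t;
    double lo = (r > x) ? r - 1.0 : r;    /* largest integer <= x */

    switch (rc) {
    case RC_DOWN:  return lo;                      /* toward -infinity */
    case RC_UP:    return (r < x) ? r + 1.0 : r;   /* toward +infinity */
    case RC_TRUNC: return r;                       /* toward zero */
    default: {                                     /* nearest, ties to even */
        double frac = x - lo;                      /* in [0, 1) */
        if (frac > 0.5) return lo + 1.0;
        if (frac < 0.5) return lo;
        return ((long long)lo % 2 == 0) ? lo : lo + 1.0;
    }
    }
}
```

With 2.5 as input the four modes give 2, 2, 3 and 2; with -2.5 they give -2, -3, -2 and -2. The tie cases are where "nearest (even)" differs from the schoolbook rule of always rounding halves away from zero.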

This part is a bit cryptic:
IF (imm[2] = '1)
    THEN // rounding mode is determined by MXCSR.RC
        DEST[63:0] := ConvertDPFPToInteger_M(SRC[63:0]);
    ELSE // rounding mode is determined by IMM8.RC
        DEST[63:0] := ConvertDPFPToInteger_Imm(SRC[63:0]);

It seems to mean that if you set bit 2 of roundsd's imm8, the rounding mode is taken from the RC field of the mxcsr register, and that with bit 2 clear, bits 1:0 of imm8 decide. Please check yourself, I am not sure :cool:
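
Read literally, the pseudocode above boils down to the following selection logic, sketched here in C. The function names are hypothetical; the bit positions are the ones used in this thread (imm8 bit 3 = suppress the precision exception, imm8 bit 2 = use mxcsr, imm8 bits 1:0 = rounding mode, mxcsr bits 14+13 = RC). The assert reflects that the upper 16 bits of mxcsr are reserved and must be zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: which RC value (0..3) roundsd would apply
   for a given imm8 and a given MXCSR content. */
unsigned roundsd_effective_rc(uint8_t imm8, uint32_t mxcsr)
{
    assert((mxcsr >> 16) == 0);           /* bits 31:16 of MXCSR are reserved */
    if (imm8 & 4u)                        /* imm8[2] = 1: use MXCSR.RC */
        return (mxcsr >> 13) & 3u;
    return imm8 & 3u;                     /* imm8[2] = 0: use imm8[1:0] */
}

/* Hypothetical helper: nonzero if imm8[3] is set, i.e. the precision
   (inexact) exception is suppressed. */
int roundsd_suppresses_precision(uint8_t imm8)
{
    return (imm8 & 8u) != 0;
}
```

Under this reading, the 8+2 in the snippet makes roundsd round up regardless of what was loaded with ldmxcsr, while 8+4 would make it follow NewControlReg. The mulsd and subsd instructions have no imm8, so they always follow mxcsr.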

If all that is too simple for your taste, try Intel :smiley: