Bezier Spline - want to expand <sub> and <sup>

Started by dedndave, May 29, 2013, 08:08:07 PM


qWord

Quote from: dedndave on June 04, 2013, 02:04:12 AM
for example, i change the value of a pointer while the FPU is writing to that location
with FPU code, the CPU and FPU execute different streams
The streams are synchronized as needed; otherwise there would be clear statements in Intel's and AMD's documentation (you might read them in this context). Unless you have to support antiques like the 286/386, you can safely remove all WAITs (assuming masked exceptions).
MREAL macros - when you need floating point arithmetic while assembling!

dedndave

ok, thanks qWord   :t

i was under the impression it was required when accessing things like the control word
of course - my code does not do that - but i was wondering if that were still true

dedndave

ok - i see that Raymond uses FWAIT, as a precautionary measure

i know - i should RTFM
but, i have spent the last 3 days reading about bezier splines - lol
my eyes are getting very old   :(

MichaelW

Per the Intel manual:
Quote
The FNSTSW AX form of the instruction is used primarily in conditional branching...

When the FNSTSW AX instruction is executed, the AX register is updated before the processor executes any further instructions. The status stored in the AX register is thus guaranteed to be from the completion of the prior FPU instruction.

But there is no such statement for the instruction form where the destination is a memory location and no such statement for FSTCW/FNSTCW, where the destination must be a memory location.
Well Microsoft, here's another nice mess you've gotten us into.

qWord

I've seen that there are some unnecessary DIVs in the function GetFirstControlPoints(). As an example, the following modification uses scalar SSE2 instructions to do the same and needs only one division per loop iteration.
GetFirstControlPoints PROC uKnotQty:UINT,lpKnotArray:LPVOID,lpResArray:LPVOID

;subroutine for deBoorBezierSpline

;EBX = size of X, Y, Rhs, or Tmp array in bytes
;only EBX and EBP are preserved

;-----------------------------------------

_lpResArray   TEXTEQU <dword ptr [ebp+20]>  ;pointer to Res (result) array
_lpKnotArray  TEXTEQU <dword ptr [ebp+16]>  ;pointer to Knot array
_uKnotQty     TEXTEQU <dword ptr [ebp+12]>  ;knot point qty
;                                [ebp+8]    ;RETurn address
;                                [ebp+4]    ;saved EBX contents
;                                [ebp]      ;saved EBP contents

;-----------------------------------------

        push    ebx
        push    ebp
        mov     ebp,esp
        mov     eax,_lpKnotArray
        mov     edx,_uKnotQty
        and     esp,-8
        lea     edi,[eax+2*ebx]
        sub     edx,2             ;EDX = control pair qty - 1
        add     edi,ebx           ;EDI = pointer to last knot point element
        lea     esi,[esp-8]       ;ESI = pointer to last Rhs element
        sub     esp,ebx
        lea     ecx,[edi-24]      ;ECX = pointer to second-to-last knot point element

;Rhs#[n!-1!]=(8!*Knot![n!-1!].X+Knot![n!].X)/2.0#

        mov     eax,[ecx]
        shl     eax,3
        add     eax,[edi]
       
        movsd xmm6,r8_1_0 ; xmm6 = 1.0
        movsd xmm7,r8_r2_0 ; xmm7 = 1/b = 0.5
       
        cvtsi2sd xmm1,eax
        mulsd xmm1,xmm7
        movsd real8 ptr [esi],xmm1
       
        sub     ecx,24
        sub     edi,24
        add     edx,-1

        lea     esi,[esi-8]
        jz      gfcps1

;for(i!=1!;i!<n!-1!;++i!)
;    Rhs#[i!]=4!*Knot![i!].X+2!*Knot![i!+1!].X

gfcps0: mov     eax,[ecx]
        shl     eax,1
        add     eax,[edi]
        shl     eax,1
       
        cvtsi2sd xmm0,eax
        movsd real8 ptr [esi],xmm0
       
        sub     ecx,24
        sub     edi,24
        add     edx,-1

        lea     esi,[esi-8]
        jnz     gfcps0

;Rhs#[0!]=Knot![0!].X+2!*Knot![1!].X

gfcps1: mov     eax,[edi]
        shl     eax,1
        add     eax,[ecx]
       
        cvtsi2sd xmm0,eax
        movsd real8 ptr [esi],xmm0

;EBX = X/Y/Rhs/Tmp array bytes
;EDX = 0
;ESI = pointer to Rhs array

        mov     ecx,_uKnotQty
        sub     esp,ebx
        add     ecx,-1            ;ECX = de Boor control pair qty
        mov     edi,_lpResArray   ;EDI = pointer to Res array
        add     ecx,-1            ;ECX = de Boor control pair qty - 1
        mov     ebx,esp           ;EBX = pointer to Tmp array
        mov     edx,ecx           ;EDX = de Boor control pair qty - 1

;double b#=2.0#
;Res#[0!]=Rhs#[0!]/b#

        movsd xmm0,real8 ptr [esi]
        mulsd xmm0,xmm7
        movsd real8 ptr [edi],xmm0

        add     ebx,8
        add     esi,8

; for (int i = 1; i < n; i++) // Decomposition and forward substitution.
; {
; tmp[i] = 1 / b;
; b = (i < n - 1 ? 4.0 : 3.5) - tmp[i];
; x[i] = (rhs[i] - x[i - 1]) / b;
; }

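gfcps2: movsd real8 ptr [ebx],xmm7      ; tmp[i] = 1/b

        cmp     edx,1                   ; last iteration?
        jnz     gfcps3

        movsd xmm0,r8_3_5               ; yes: b starts from 3.5
        jmp short gfcps4

gfcps3: movsd xmm0,r8_4_0               ; no: b starts from 4.0

gfcps4: movsd xmm4,real8 ptr [esi]      ; xmm4 = rhs[i]
        subsd xmm0,xmm7                 ; xmm0 = b = (4.0 or 3.5) - tmp[i]
        movapd xmm7,xmm6                ; xmm7 = 1.0
        subsd xmm4,real8 ptr [edi]      ; xmm4 = rhs[i] - x[i-1]
        divsd xmm7,xmm0                 ; xmm7 = 1/b
        add     edi,8
        mulsd xmm4,xmm7                 ; xmm4 = (rhs[i] - x[i-1]) / b
        movsd real8 ptr [edi],xmm4      ; store x[i]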
        add     ebx,8
        add     esi,8
        add     edx,-1
        jnz     gfcps2

;EBX = pointer to end of Tmp array
;ECX = de Boor control pair qty - 1
;EDI = pointer to last element of Res array

        sub     edi,8
        sub     ebx,8       

; for (int i = 1; i < n; i++)
; x[n - i - 1] -= tmp[n - i] * x[n - i]; // Backsubstitution.

gfcps5: movsd xmm0,real8 ptr [edi]      ; xmm0 = x[n-i-1]
        movsd xmm1,real8 ptr [edi+8]    ; xmm1 = x[n-i]
        mulsd xmm1,real8 ptr [ebx]      ; xmm1 = tmp[n-i] * x[n-i]
        subsd xmm0,xmm1
        movsd real8 ptr [edi],xmm0      ; x[n-i-1] -= tmp[n-i] * x[n-i]

        sub     ebx,8
        sub     edi,8
        add     ecx,-1
        jnz     gfcps5

        leave
        pop     ebx
        ret     12

GetFirstControlPoints ENDP
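
For cross-checking the listing, here is a minimal reference sketch in Python of the arithmetic described by the pseudocode comments. It is not a translation of the register usage: the function name `first_control_points` and the plain list input (one coordinate, X or Y, per call) are assumptions made for illustration. It keeps the reciprocal of b between iterations, so each pass of the forward loop needs one division, as in the SSE2 version.

```python
def first_control_points(knots):
    """Solve the tridiagonal system for one coordinate (X or Y) of the
    first Bezier control points, following the pseudocode comments."""
    n = len(knots) - 1              # number of curve segments
    assert n >= 2, "this sketch needs at least three knot points"

    # Right-hand side, as in the ;Rhs comments of the listing.
    rhs = [0.0] * n
    rhs[0] = knots[0] + 2 * knots[1]
    for i in range(1, n - 1):
        rhs[i] = 4 * knots[i] + 2 * knots[i + 1]
    rhs[n - 1] = (8 * knots[n - 1] + knots[n]) / 2.0

    x = [0.0] * n
    tmp = [0.0] * n
    inv_b = 0.5                     # 1/b with b = 2.0
    x[0] = rhs[0] * inv_b
    for i in range(1, n):           # decomposition and forward substitution
        tmp[i] = inv_b
        b = (4.0 if i < n - 1 else 3.5) - tmp[i]
        inv_b = 1.0 / b             # the only division in this iteration
        x[i] = (rhs[i] - x[i - 1]) * inv_b
    for i in range(1, n):           # back substitution
        x[n - i - 1] -= tmp[n - i] * x[n - i]
    return x
```

For equally spaced collinear knots such as `[0, 3, 6, 9]` the result is approximately `[1.0, 4.0, 7.0]`, i.e. each first control point sits one third of the way to the next knot, which is a handy sanity check for the assembly output.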

dedndave

i think the whole thing could be done with SSE, rather than FPU
but, that's for someone who knows SSE   :redface:
i may have to learn it, sooner than later - lol

qWord

Quote from: dedndave on June 04, 2013, 08:22:01 AM
but, that's for someone who knows SSE   :redface:
i may have to learn it, sooner than later - lol
Come on... at least the scalar stuff is extremely simple. The basic instructions:

type identifier (type ID):
    ss = scalar single = REAL4 = low DWORD of corresponding XMM register
    sd = scalar double = REAL8 = low QWORD of corresponding XMM register
    For the conversion instructions:
    si = scalar integer = SDWORD = GPR or memory location

instructions:
    mov<type ID> {xmmReg, memory location}, {memory location, xmmReg}  ; for mem -> reg: zero extension
    e.g.:
        movsd xmm0,REAL8 ptr [eax] ; move scalar double from [eax] to xmm0. the upper 8 bytes are zeroed.

arithmetic:
    add/sub/mul/div/sqrt<type ID> xmmReg, {xmmReg,memory location}
    e.g.:
        sqrtss xmm0,REAL4 ptr [eax]  ; single precision: xmm0 = sqrt(REAL4 ptr [eax])
        sqrtsd xmm0,xmm1               ; double precision: xmm0 = sqrt(xmm1)

conversion:
    cvt<type ID src>2<type ID dest> {xmmReg, 32bit GPR}, {xmmReg, memory location, 32bit GPR}   ; the destination is always a register
    e.g.:
        cvtsi2ss xmm0,eax     ; eax -> REAL4 in xmm0
        cvtsd2si eax,xmm0     ; REAL8 in xmm0 -> eax
        cvtss2sd xmm0,xmm0 ; convert scalar single to scalar double in xmm0

compare:
    comi<type ID> xmmReg, {xmmReg,memory location}
    sets the flags as if comparing unsigned values (use ja/jb,...)

The above instructions do not need aligned operands.

dedndave

thanks for the simple tutorial, qWord   :t

is that SSE, SSE2 ?

qWord

Quote from: dedndave on June 04, 2013, 10:21:30 AM
thanks for the simple tutorial, qWord   :t

is that SSE, SSE2 ?
both.
The double precision stuff was introduced with SSE2.

jj2007


Gunther

Hi qWord,

thank you for the good refresher.  :t It's always a good idea to read your posts very attentively.

Gunther

You have to know the facts before you can distort them.

dedndave

i think there is a better way to create the de Boor control points
if you look at the images above, you will see that the Bezier curve exceeds the data minima and maxima

this can be overcome by setting the first derivative term to 0 for knot points where
both adjacent knot points are either above or below the center knot point

let me play with it   :P
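
A minimal sketch of that idea in Python, assuming the first-derivative terms are already available as a list with one slope per knot point. The names `is_local_extremum` and `flatten_extrema` are hypothetical, and where the slopes come from (the de Boor solve or anything else) is left open. A knot counts as an extremum only when both neighbours are strictly above or strictly below it.

```python
def is_local_extremum(prev_y, cur_y, next_y):
    """True when both neighbours lie strictly above or strictly below cur_y."""
    return (prev_y - cur_y) * (next_y - cur_y) > 0


def flatten_extrema(ys, slopes):
    """Return a copy of slopes with the first derivative forced to 0
    at every interior knot that is a local minimum or maximum."""
    assert len(ys) == len(slopes), "one slope per knot point expected"
    out = list(slopes)
    for i in range(1, len(ys) - 1):   # end points have only one neighbour
        if is_local_extremum(ys[i - 1], ys[i], ys[i + 1]):
            out[i] = 0.0
    return out
```

With `ys = [0, 2, 1, 3, 3]` the knots at index 1 (a maximum) and index 2 (a minimum) get a zero slope, while index 3 is left alone because its right neighbour is equal rather than below. Whether this alone keeps every segment inside the data range is not established in the thread.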

johnsa

Hey,

Just caught up on this thread now. Is there a reason you're using this particular type of spline in favour of other alternatives?
For splines that pass through all of their control points (i.e. data points) I normally use a Catmull-Rom spline, and granted I only skimmed the pages, but it seems to me the basis functions and implementation are a lot simpler.

John

dedndave

hi John
yah - i looked at the Catmull-Rom splines, as well
they suffer the same issue - the curve exceeds the data minima and maxima
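
The overshoot is easy to reproduce with a small Python sketch, assuming the common uniform Catmull-Rom basis (tension 0.5) and one coordinate at a time. The helper names `catmull_rom` and `overshoot` are made up for this example.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom segment between p1 and p2 for 0 <= t <= 1."""
    return 0.5 * (2 * p1
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t)


def overshoot(p0, p1, p2, p3, steps=100):
    """How far the sampled segment rises above the larger of its two
    end points (0.0 when it stays at or below them)."""
    assert steps >= 1, "need at least one sampling step"
    peak = max(catmull_rom(p0, p1, p2, p3, k / steps) for k in range(steps + 1))
    return max(0.0, peak - max(p1, p2))
```

For the data values 0, 1, 1, 0 the middle segment peaks at 1.125 at t = 0.5, so the curve exceeds the data maximum of 1 even though it passes through every point.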

also - there is simpler math to do a point-by-point Bezier spline
at least, it seems easier than using PolyBezier - lol

Catmull-Rom looks like it would be nice in a graphics program,
where you want to let the user create an odd-shaped curve
as you said - the curve passes through all the points

still, i like to have options
i would like to find a way to create de Boor control points so the minima and maxima are not exceeded
if for no other reason, a learning process   :P

dedndave

well - the first problem pops right out there - lol

the routine doesn't know we are plotting a mathematical function
it simply sees a series of points and solves for the control points, in the order they are listed
when it finds a control point, it looks at the previously solved point for "direction"

for example, i can rotate the point data and get a reasonable curve



the routine doesn't know whether the terms minima and maxima apply to the x axis or the y axis
furthermore, it doesn't know that i am using constant uni-directional steps in one axis