Bezier Spline - want to expand <sub> and <sup>

qWord · June 04, 2013, 02:51:09 AM

Quote from: dedndave on June 04, 2013, 02:04:12 AM
for example, i change the value of a pointer while the FPU is writing to that location
with FPU code, the CPU and FPU execute different streams

the streams are synchronized as needed - otherwise there would be clear statements to the contrary in Intel's and AMD's documentation (you might read them in this context). Unless you have to support antiques like the 286/386, you can safely remove all WAITs (assuming masked exceptions).

dedndave · June 04, 2013, 03:06:38 AM

ok, thanks qWord :t

i was under the impression it was required when accessing things like the control word
of course - my code does not do that - but i was wondering if that were still true

dedndave · June 04, 2013, 03:21:27 AM

ok - i see that Raymond uses FWAIT, as a precautionary measure

i know - i should RTFM
but, i have spent the last 3 days reading about bezier splines - lol
my eyes are getting very old :(

MichaelW · June 04, 2013, 04:22:31 AM

Per the Intel manual:

Quote
The FNSTSW AX form of the instruction is used primarily in conditional branching...

When the FNSTSW AX instruction is executed, the AX register is updated before the processor executes any further instructions. The status stored in the AX register is thus guaranteed to be from the completion of the prior FPU instruction.

But there is no such statement for the instruction form where the destination is a memory location and no such statement for FSTCW/FNSTCW, where the destination must be a memory location.

qWord · June 04, 2013, 05:23:18 AM

I've seen that there are some unnecessary DIVs in the function GetFirstControlPoints(). As an example, the following modification uses scalar SSE2 instructions to do the same and needs only one division.

Code Select

GetFirstControlPoints PROC uKnotQty:UINT,lpKnotArray:LPVOID,lpResArray:LPVOID

;subroutine for deBoorBezierSpline

;EBX = size of X, Y, Rhs, or Tmp array in bytes
;only EBX and EBP are preserved

;-----------------------------------------

_lpResArray   TEXTEQU <dword ptr [ebp+20]>  ;pointer to Res (result) array
_lpKnotArray  TEXTEQU <dword ptr [ebp+16]>  ;pointer to Knot array
_uKnotQty     TEXTEQU <dword ptr [ebp+12]>  ;knot point qty
;                                [ebp+8]    ;RETurn address
;                                [ebp+4]    ;saved EBX contents
;                                [ebp]      ;saved EBP contents

;-----------------------------------------

        push    ebx
        push    ebp
        mov     ebp,esp
        mov     eax,_lpKnotArray
        mov     edx,_uKnotQty
        and     esp,-8
        lea     edi,[eax+2*ebx]
        sub     edx,2             ;EDX = control pair qty - 1
        add     edi,ebx           ;EDI = pointer to last knot point element
        lea     esi,[esp-8]       ;ESI = pointer to last Rhs element
        sub     esp,ebx
        lea     ecx,[edi-24]      ;ECX = pointer to second-to-last knot point element

;Rhs#[n!-1!]=(8!*Knot![n!-1!].X+Knot![n!].X)/2.0#

        mov     eax,[ecx]
        shl     eax,3
        add     eax,[edi]
        
        movsd   xmm6,r8_1_0       ;xmm6 = 1.0
        movsd   xmm7,r8_r2_0      ;xmm7 = 1/b = 0.5

        cvtsi2sd xmm1,eax
        mulsd   xmm1,xmm7
        movsd   real8 ptr [esi],xmm1
        
        sub     ecx,24
        sub     edi,24
        add     edx,-1

        lea     esi,[esi-8]
        jz      gfcps1

;for(i!=1!;i!<n!-1!;++i!)
;    Rhs#[i!]=4!*Knot![i!].X+2!*Knot![i!+1!].X

gfcps0: mov     eax,[ecx]
        shl     eax,1
        add     eax,[edi]
        shl     eax,1
        
        cvtsi2sd xmm0,eax
        movsd real8 ptr [esi],xmm0
        
        sub     ecx,24
        sub     edi,24
        add     edx,-1

        lea     esi,[esi-8]
        jnz     gfcps0

;Rhs#[0!]=Knot![0!].X+2!*Knot![1!].X

gfcps1: mov     eax,[edi]
        shl     eax,1
        add     eax,[ecx]
        
        cvtsi2sd xmm0,eax
        movsd real8 ptr [esi],xmm0

;EBX = X/Y/Rhs/Tmp array bytes
;EDX = 0
;ESI = pointer to Rhs array

        mov     ecx,_uKnotQty
        sub     esp,ebx
        add     ecx,-1            ;ECX = de Boor control pair qty
        mov     edi,_lpResArray   ;EDI = pointer to Res array
        add     ecx,-1            ;ECX = de Boor control pair qty - 1
        mov     ebx,esp           ;EBX = pointer to Tmp array
        mov     edx,ecx           ;EDX = de Boor control pair qty - 1

;double b#=2.0#
;Res#[0!]=Rhs#[0!]/b#

        movsd xmm0,real8 ptr [esi]
        mulsd xmm0,xmm7
        movsd real8 ptr [edi],xmm0

        add     ebx,8
        add     esi,8

;			for (int i = 1; i < n; i++) // Decomposition and forward substitution.
;			{
;				tmp[i] = 1 / b;
;				b = (i < n - 1 ? 4.0 : 3.5) - tmp[i];
;				x[i] = (rhs[i] - x[i - 1]) / b;
;			}

gfcps2: movsd   real8 ptr [ebx],xmm7      ;tmp[i] = 1/b

        cmp     edx,1
        jnz     gfcps3

        movsd   xmm0,r8_3_5
        jmp short gfcps4

gfcps3: movsd   xmm0,r8_4_0

gfcps4: movsd   xmm4,real8 ptr [esi]
        subsd   xmm0,xmm7                 ;b = (4.0 or 3.5) - tmp[i]
        movapd  xmm7,xmm6
        subsd   xmm4,real8 ptr [edi]      ;rhs[i] - x[i-1]
        divsd   xmm7,xmm0                 ;xmm7 = 1/b (the only division)
        add     edi,8
        mulsd   xmm4,xmm7                 ;x[i] = (rhs[i] - x[i-1]) * (1/b)
        movsd   real8 ptr [edi],xmm4
        add     ebx,8
        add     esi,8
        add     edx,-1
        jnz     gfcps2

;EBX = pointer to end of Tmp array
;ECX = de Boor control pair qty - 1
;EDI = pointer to last element of Res array

        sub     edi,8
        sub     ebx,8        

;			for (int i = 1; i < n; i++)
;				x[n - i - 1] -= tmp[n - i] * x[n - i]; // Backsubstitution.

gfcps5: movsd   xmm0,real8 ptr [edi]
        movsd   xmm1,real8 ptr [edi+8]
        mulsd   xmm1,real8 ptr [ebx]
        subsd   xmm0,xmm1
        movsd   real8 ptr [edi],xmm0

        sub     ebx,8
        sub     edi,8
        add     ecx,-1
        jnz     gfcps5

        leave
        pop     ebx
        ret     12

GetFirstControlPoints ENDP
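
For anyone checking the assembly against the high-level algorithm, the comments in the routine translate to roughly the following Python. This is only a sketch pieced together from those comments: the names `build_rhs` and `get_first_control_points`, the plain lists, and the error check are assumptions for illustration, and it handles one coordinate (X) at a time.

```python
def build_rhs(knots):
    """Right-hand side for one coordinate, following the Rhs comments.

    knots: coordinate values of the knot points; n = len(knots) - 1.
    """
    n = len(knots) - 1
    if n < 2:
        # Assumption: the two-knot case is handled elsewhere.
        raise ValueError("need at least 3 knot values")
    rhs = [0.0] * n
    rhs[0] = knots[0] + 2.0 * knots[1]
    for i in range(1, n - 1):
        rhs[i] = 4.0 * knots[i] + 2.0 * knots[i + 1]
    rhs[n - 1] = (8.0 * knots[n - 1] + knots[n]) / 2.0
    return rhs


def get_first_control_points(rhs):
    """Solve the tridiagonal system for the first control points."""
    n = len(rhs)
    x = [0.0] * n      # solution (Res array)
    tmp = [0.0] * n    # workspace (Tmp array)
    b = 2.0
    x[0] = rhs[0] / b
    # Decomposition and forward substitution.
    for i in range(1, n):
        tmp[i] = 1.0 / b
        b = (4.0 if i < n - 1 else 3.5) - tmp[i]
        x[i] = (rhs[i] - x[i - 1]) / b
    # Back substitution.
    for i in range(1, n):
        x[n - i - 1] -= tmp[n - i] * x[n - i]
    return x
```

For evenly spaced collinear knot values 0, 3, 6, 9 this gives control values of 1, 4, 7 (up to rounding), i.e. one third of the way along each segment, which is a handy sanity check for the assembly version.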

dedndave · June 04, 2013, 08:22:01 AM

i think the whole thing could be done with SSE, rather than FPU
but, that's for someone who knows SSE :redface:
i may have to learn it, sooner than later - lol

qWord · June 04, 2013, 10:19:51 AM

Quote from: dedndave on June 04, 2013, 08:22:01 AM
but, that's for someone who knows SSE :redface:
i may have to learn it, sooner than later - lol

come on... at least the scalar stuff is extremely simple. The basic instructions:

type identifier (type ID):
ss = scalar single = REAL4 = low DWORD of corresponding XMM register
sd = scalar double = REAL8 = low QWORD of corresponding XMM register
For the conversion instructions:
si = scalar integer = SDWORD = GPR or memory location

instructions:
mov<type ID> {xmmReg, memory location}, {memory location, xmmReg} ; for mem -> reg: zero extension
e.g.:
movsd xmm0,REAL8 ptr [eax] ; move scalar double from [eax] to xmm0. the upper 8 bytes are zeroed.

arithmetic:
add/sub/mul/div/sqrt<type ID> xmmReg, {xmmReg,memory location}
e.g.:
sqrtss xmm0,REAL4 ptr [eax] ; single precision: xmm0 = sqrt(REAL4 ptr [eax])
sqrtsd xmm0,xmm1 ; double precision: xmm0 = sqrt(xmm1)

conversion:
cvt<type ID src>2<type ID dest> {xmmReg, 32bit GPR}, {32bit GPR or integer memory location, memory location, xmmReg} ; the destination is always a register
e.g.:
cvtsi2ss xmm0,eax ; eax -> REAL4 in xmm0
cvtsd2si eax,xmm0 ; REAL8 in xmm0 -> eax
cvtss2sd xmm0,xmm0 ; convert scalar single to scalar double in xmm0

compare:
comi<type ID> xmmReg, {xmmReg,memory location}
sets the flags as if comparing unsigned values (so use ja/jb,...)

The above instructions do not need aligned memory operands.
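
To make the note about the flags concrete: after `comiss`/`comisd` the result lands in ZF, PF and CF (the same bits an unsigned `cmp` drives), which is why `ja`/`jb`/`je` are the right jumps, and an unordered compare (a NaN operand) sets all three. The small Python model below only illustrates that mapping; `comisd_flags` is a made-up helper name, not an instruction or a library call.

```python
import math


def comisd_flags(a, b):
    """Model of ZF/PF/CF after `comisd a, b` (a is the first operand)."""
    if math.isnan(a) or math.isnan(b):
        return {"ZF": 1, "PF": 1, "CF": 1}   # unordered: test with jp first
    if a > b:
        return {"ZF": 0, "PF": 0, "CF": 0}   # ja is taken
    if a < b:
        return {"ZF": 0, "PF": 0, "CF": 1}   # jb is taken
    return {"ZF": 1, "PF": 0, "CF": 0}       # je is taken
```

Because the unordered case also sets ZF and CF, a `jb` or `je` placed right after the compare would be taken for NaN as well, so code that may see NaNs usually checks `jp` before anything else.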

dedndave · June 04, 2013, 10:21:30 AM

thanks for the simple tutorial, qWord :t

is that SSE, SSE2 ?

qWord · June 04, 2013, 10:26:27 AM

Quote from: dedndave on June 04, 2013, 10:21:30 AM
thanks for the simple tutorial, qWord :t

is that SSE, SSE2 ?

both.
The double precision stuff was introduced with SSE2.

jj2007 · June 04, 2013, 03:33:25 PM

Quote from: dedndave on June 04, 2013, 10:21:30 AM
thanks for the simple tutorial, qWord :t

well done indeed :t

Gunther · June 04, 2013, 07:25:35 PM

Hi qWord,

thank you for the good refresher. :t It's always a good idea to read your posts very attentively.

Gunther

dedndave · June 04, 2013, 10:13:18 PM

i think there is a better way to create the de Boor control points
if you look at the images above, you will see that the Bezier curve exceeds the data minima and maxima

this can be overcome by setting the first derivative term to 0 for knot points where
both adjacent knot points are either above or below the center knot point

let me play with it :P
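
A rough sketch of that idea, in Python for brevity: find the knots whose two neighbours both lie above or both lie below them, since those are the places where the first derivative term would be set to 0. The name `local_extrema` and the assumption that only the Y values matter (X stepping steadily in one direction) are illustrative and not part of the routine.

```python
def local_extrema(ys):
    """Indices i where ys[i-1] and ys[i+1] are both above or both below ys[i]."""
    found = []
    for i in range(1, len(ys) - 1):
        left = ys[i - 1] - ys[i]
        right = ys[i + 1] - ys[i]
        # Same sign on both sides means the centre knot is a peak or a valley.
        if left * right > 0:
            found.append(i)
    return found
```

At those knots, giving the two neighbouring Bezier control points the same Y as the knot makes the tangent horizontal there. Whether that alone keeps the whole curve inside the data range would still need checking.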

johnsa · June 04, 2013, 10:43:06 PM

Hey,

Just caught up on this thread now. Is there a reason you're using this particular type of spline in favour of other alternatives?
For splines that pass through all of their control points (i.e. data points) I normally use a Catmull-Rom spline, and granted I only skimmed over the pages, but it seems to me its basis functions and implementation are a lot simpler.

John

dedndave · June 04, 2013, 10:58:53 PM

hi John
yah - i looked at the Catmull-Rom splines, as well
they suffer the same issue - the curve exceeds the data minima and maxima

also - there is simpler math to do a point-by-point Bezier spline
at least, it seems easier than using PolyBezier - lol
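
For reference, the point-by-point math is just the cubic Bernstein form evaluated at a parameter t in [0, 1], once per coordinate. A minimal Python sketch (the name `bezier_point` and the tuple-based points are assumptions for illustration):

```python
def bezier_point(p0, p1, p2, p3, t):
    """Point on the cubic Bezier segment p0..p3 at parameter t (0 <= t <= 1)."""
    u = 1.0 - t
    w0 = u * u * u          # weight of the start knot
    w1 = 3.0 * u * u * t    # weight of the first control point
    w2 = 3.0 * u * t * t    # weight of the second control point
    w3 = t * t * t          # weight of the end knot
    return tuple(w0 * a + w1 * b + w2 * c + w3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))
```

Stepping t from 0 to 1 in small increments and joining the returned points with short lines draws one segment; p0 and p3 are the knots, p1 and p2 the two control points between them.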

Catmull-Rom looks like it would be nice in a graphics program,
where you want to let the user create an odd-shaped curve
as you said - the curve passes through all the points

still, i like to have options
i would like to find a way to create de Boor control points so the minima and maxima are not exceeded
if for no other reason, a learning process :P
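
The overshoot is easy to reproduce with a single Catmull-Rom segment. The sketch below is in Python and works on one coordinate; `catmull_rom` is an illustrative name, and the uniform parameterisation is an assumption, since the thread does not say which variant was tried.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom value between p1 (t = 0) and p2 (t = 1)."""
    return 0.5 * (2.0 * p1
                  + (p2 - p0) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
                  + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t * t * t)
```

With data values 0, 1, 1, 0 the segment between the two 1s reaches 1.125 at t = 0.5, which is above the data maximum of 1 even though the curve passes exactly through every point.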

dedndave · June 04, 2013, 11:24:00 PM

well - the first problem pops right out there - lol

the routine doesn't know we are plotting a mathematical function
it simply sees a series of points and solves for the control points, in the order they are listed
when it finds a control point, it looks at the previously solved point for "direction"

for example, i can rotate the point data and get a reasonable curve

the routine doesn't know whether the terms minima and maxima apply to the x axis or the y axis
furthermore, it doesn't know that i am using constant uni-directional steps in one axis

The MASM Forum
