Recent Posts

1
Game Development / Re: Lonewolff's ASM game creation thread
« Last post by Lonewolff on Today at 08:36:56 PM »
Thanks man, might be worth a look at  :t

Just finished optimising the hell out of my matrix routines. I am quite pleased with the results.

Code: [Select]
20000000 iterations

XMMatrixIdentity
        DirectXMath  578 ms
        C++          125 ms
        ASM          109 ms
XMMatrixPerspectiveFovLH
        DirectXMath  3000 ms
        C++          2172 ms
        ASM          1594 ms
XMMatrixLookAtLH
        DirectXMath  9343 ms
        C++          2781 ms
        ASM          1641 ms
XMMatrixTranspose
        DirectXMath  719 ms
        C++          125 ms
        ASM          125 ms

So, I am killing Microsoft's own implementation, and this is just using ordinary FPU instructions.
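
For anyone who wants to reproduce this kind of comparison, here is a minimal C++ sketch of a timing loop. It is only an illustration under assumptions: Mat4, mat_identity and time_ms are hypothetical names, and this is not the harness that produced the numbers above.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical 4x4 matrix type used only for this sketch.
using Mat4 = std::array<std::array<float, 4>, 4>;

// Writes the identity matrix into 'out'.
inline void mat_identity(Mat4& out) {
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r][c] = (r == c) ? 1.0f : 0.0f;
}

// Calls fn() 'iterations' times and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double time_ms(std::size_t iterations, Fn fn) {
    assert(iterations > 0);  // an empty run would measure nothing
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `time_ms(20000000, [&] { mat_identity(m); })` would time the identity routine. Note that an optimising compiler may drop work whose result is never read, so the timed routine's output should be used afterwards.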
2
The Campus / Re: Odd issue
« Last post by Lonewolff on Today at 08:25:40 PM »
I tracked down where the fault is. Still doesn't make much sense to me though.

It is to do with the loop

Code: [Select]
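myloop:
    invoke MatrixTranspose, addr testMat    ; transposes testMat in place

    mov eax, ddCounter                      ; load the iteration counter
    inc eax
    mov ddCounter, eax                      ; store the incremented counter
    cmp eax, ddCount
    jl myloop                               ; repeat while ddCounter < ddCount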

If I comment out the loop code the MatrixTranspose function works correctly.

I know that a procedure called with 'invoke' returns its result in EAX, but EAX is immediately overwritten on the next line.

So I am still a little confused as to why this should affect the MatrixTranspose procedure.

Even if I change EAX to ECX the fault remains.


[edit]
Arghhh.... Worked it out.

It is an absolute noob bug.

Because I am testing performance in a loop, I am continuously transposing the same matrix. It just so happens that, because I am working with an even number of iterations, I end up transposing the matrix back to what it was to begin with. ROFL.

Man, what a stupid mistake :P
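
A minimal C++ sketch of why the bug stayed hidden, assuming an in-place 4x4 transpose. Mat4, transpose_in_place and transpose_repeatedly are illustrative names, not the MatrixTranspose routine from this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical 4x4 matrix type used only for this sketch.
using Mat4 = std::array<std::array<float, 4>, 4>;

// Transposes m in place by swapping each element above the
// diagonal with its mirror image below the diagonal.
inline void transpose_in_place(Mat4& m) {
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = r + 1; c < 4; ++c)
            std::swap(m[r][c], m[c][r]);
}

// Applies the in-place transpose 'count' times to a copy of m,
// mimicking a timing loop that keeps hitting the same matrix.
inline Mat4 transpose_repeatedly(Mat4 m, unsigned count) {
    for (unsigned i = 0; i < count; ++i) transpose_in_place(m);
    return m;
}

// Even counts restore the input; odd counts leave it transposed.
inline void check_even_odd() {
    const Mat4 original = {{{{1.f, 2.f, 3.f, 4.f}},
                            {{5.f, 6.f, 7.f, 8.f}},
                            {{9.f, 10.f, 11.f, 12.f}},
                            {{13.f, 14.f, 15.f, 16.f}}}};
    assert(transpose_repeatedly(original, 2) == original);
    assert(transpose_repeatedly(original, 3) != original);
    assert(transpose_repeatedly(original, 1)[0][1] == 5.f);
}
```

Transposing is its own inverse, so any even iteration count hands back the starting matrix. Benchmarking against a fresh copy, or checking the result after a single call, avoids the trap.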

3
MasmBasic & the RichMasm IDE / Floor & Ceil
« Last post by jj2007 on Today at 05:24:34 PM »
include \masm32\MasmBasic\MasmBasic.inc         ; download
  SetGlobals MyR4:REAL4, MyW:WORD=-123, MyDw:DWORD=-123
  Init
  PrintLine cfm$("number\tFloor(number)\tCeil(number)")
  PrintLine Str$("%4f", MyW), Str$("\t%5f", Floor(MyW)v), Str$("  \t%5f", Ceil(MyW)v), " (WORD size integer)"
  PrintLine Str$("%4f", MyDw), Str$("\t%5f", Floor(MyDw)v), Str$("  \t%5f", Ceil(MyDw)v), " (DWORD size integer)"
  For_ ecx=0 To 29
        Rand(-99.9, +99.9, MyR4)
        PrintLine Str$("%4f ", MyR4), Str$("\t%5f", Floor(MyR4)v), Str$("  \t%5f", Ceil(MyR4)v)
  Next
EndOfCode


Output:
Code: [Select]
number  Floor(number)   Ceil(number)
-123.0  -123.00         -123.00 (WORD size integer)
-123.0  -123.00         -123.00 (DWORD size integer)
-9.629  -10.0000        -9.0000
-85.50  -86.000         -85.000
-72.59  -73.000         -72.000
72.67   72.000          73.000
31.14   31.000          32.000
64.07   64.000          65.000
-55.86  -56.000         -55.000
57.42   57.000          58.000
-95.26  -96.000         -95.000
3.386   3.0000          4.0000
12.49   12.000          13.000
-27.00  -28.000         -27.000
15.36   15.000          16.000
35.19   35.000          36.000
-49.67  -50.000         -49.000
-70.70  -71.000         -70.000
17.82   17.000          18.000
-7.807  -8.0000         -7.0000
37.56   37.000          38.000
95.73   95.000          96.000
27.80   27.000          28.000
-67.03  -68.000         -67.000
68.09   68.000          69.000
-75.14  -76.000         -75.000
-53.27  -54.000         -53.000
55.16   55.000          56.000
75.63   75.000          76.000
90.65   90.000          91.000
88.01   88.000          89.000
-21.09  -22.000         -21.000

Not included in the current (December '17) release, therefore attached as FloorCeil.inc; you may add it to your MasmBasic.inc.

Note the red v after the number in Str$("format", Numberv): Floor() and Ceil() always leave their result in ST(0) for further processing, so if you just print the value, it needs to be popped from the FPU via fstp st. That's what the v does. Of course, you can also pop it directly into another variable, specified as the second argument:

  Floor(123.456, MyDw)
  Print Str$("dw(123.456)=%i", MyDw)


The first argument can be an immediate (interpreted as double) or a DWORD, WORD, REAL4 ... REAL10 variable.
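
For comparison, the same rounding rules can be sketched in C++ with std::floor and std::ceil. This is not how MasmBasic implements Floor() and Ceil(); FloorCeil, floor_ceil and check_rows are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result pair for this sketch.
struct FloorCeil {
    double floor_value;  // largest integer <= x
    double ceil_value;   // smallest integer >= x
};

// Rounds x toward negative infinity and toward positive infinity.
inline FloorCeil floor_ceil(double x) {
    return {std::floor(x), std::ceil(x)};
}

// Re-checks a few rows taken from the output listing above.
inline void check_rows() {
    // Integer inputs: floor and ceil are both the input itself.
    assert(floor_ceil(-123.0).floor_value == -123.0);
    assert(floor_ceil(-123.0).ceil_value == -123.0);
    // Negative fraction: floor moves away from zero, ceil toward it.
    assert(floor_ceil(-9.629).floor_value == -10.0);
    assert(floor_ceil(-9.629).ceil_value == -9.0);
    // Positive fraction: floor moves toward zero, ceil away from it.
    assert(floor_ceil(72.67).floor_value == 72.0);
    assert(floor_ceil(72.67).ceil_value == 73.0);
}
```

This also explains the -27.00 row in the listing: a floor of -28 means the random value was presumably slightly below -27 and only looks like an integer because the printout is rounded to four digits.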
4
The Campus / Re: Floating point arithmetic question
« Last post by Lonewolff on Today at 04:52:40 PM »
So I stumbled across something hey?  :lol:

Heaps faster all round for real4's to do two fld's. :t

Maybe a caching thing in the CPU itself? Knows it already has the value there so just re-uses it perhaps?
5
The Campus / Re: Floating point arithmetic question
« Last post by jj2007 on Today at 04:45:39 PM »
What happened here? Are the cycle counts really that much lower than on the other machines?

No idea how this can run in 11 cycles (not including the loop overhead, though) ::)
Code: [Select]
  mov ebx, 99           ; loop 100x
  align 4
  .Repeat
        fld MyR4
        fld MyR4
        fstp st
        fstp st
        dec ebx
  .Until Sign?
6
The Campus / Re: Floating point arithmetic question
« Last post by HSE on Today at 10:17:25 AM »

Code: [Select]
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

12      cycles for 100 * fld Real4 mem, mem
11      cycles for 100 * fld Real4 mem, st
12      cycles for 100 * fld Real8 mem, mem
11      cycles for 100 * fld Real8 mem, st
612     cycles for 100 * fld Real10 mem, mem
318     cycles for 100 * fld Real10 mem, st

12      cycles for 100 * fld Real4 mem, mem
10      cycles for 100 * fld Real4 mem, st
11      cycles for 100 * fld Real8 mem, mem
11      cycles for 100 * fld Real8 mem, st
612     cycles for 100 * fld Real10 mem, mem
317     cycles for 100 * fld Real10 mem, st

13      cycles for 100 * fld Real4 mem, mem
9       cycles for 100 * fld Real4 mem, st
12      cycles for 100 * fld Real8 mem, mem
10      cycles for 100 * fld Real8 mem, st
612     cycles for 100 * fld Real10 mem, mem
319     cycles for 100 * fld Real10 mem, st

16      bytes for fld Real4 mem, mem
12      bytes for fld Real4 mem, st
16      bytes for fld Real8 mem, mem
12      bytes for fld Real8 mem, st
16      bytes for fld Real10 mem, mem
12      bytes for fld Real10 mem, st


--- ok ---
What happened here? Are the cycle counts really that much lower than on the other machines?
7
The Campus / Re: Floating point arithmetic question
« Last post by Siekmanski on Today at 09:56:12 AM »
Code: [Select]
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

168     cycles for 100 * fld Real4 mem, mem
266     cycles for 100 * fld Real4 mem, st
168     cycles for 100 * fld Real8 mem, mem
272     cycles for 100 * fld Real8 mem, st
373     cycles for 100 * fld Real10 mem, mem
373     cycles for 100 * fld Real10 mem, st

167     cycles for 100 * fld Real4 mem, mem
268     cycles for 100 * fld Real4 mem, st
168     cycles for 100 * fld Real8 mem, mem
268     cycles for 100 * fld Real8 mem, st
373     cycles for 100 * fld Real10 mem, mem
372     cycles for 100 * fld Real10 mem, st

167     cycles for 100 * fld Real4 mem, mem
266     cycles for 100 * fld Real4 mem, st
168     cycles for 100 * fld Real8 mem, mem
267     cycles for 100 * fld Real8 mem, st
373     cycles for 100 * fld Real10 mem, mem
373     cycles for 100 * fld Real10 mem, st

16      bytes for fld Real4 mem, mem
12      bytes for fld Real4 mem, st
16      bytes for fld Real8 mem, mem
12      bytes for fld Real8 mem, st
16      bytes for fld Real10 mem, mem
12      bytes for fld Real10 mem, st


--- ok ---
8
The Campus / Re: Floating point arithmetic question
« Last post by jj2007 on Today at 09:42:08 AM »
Weird ::)
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

170     cycles for 100 * fld Real4 mem, mem
269     cycles for 100 * fld Real4 mem, st
169     cycles for 100 * fld Real8 mem, mem
270     cycles for 100 * fld Real8 mem, st
372     cycles for 100 * fld Real10 mem, mem
372     cycles for 100 * fld Real10 mem, st

169     cycles for 100 * fld Real4 mem, mem
267     cycles for 100 * fld Real4 mem, st
169     cycles for 100 * fld Real8 mem, mem
267     cycles for 100 * fld Real8 mem, st
372     cycles for 100 * fld Real10 mem, mem
374     cycles for 100 * fld Real10 mem, st

169     cycles for 100 * fld Real4 mem, mem
269     cycles for 100 * fld Real4 mem, st
169     cycles for 100 * fld Real8 mem, mem
267     cycles for 100 * fld Real8 mem, st
373     cycles for 100 * fld Real10 mem, mem
373     cycles for 100 * fld Real10 mem, st

16      bytes for fld Real4 mem, mem
12      bytes for fld Real4 mem, st
16      bytes for fld Real8 mem, mem
12      bytes for fld Real8 mem, st
16      bytes for fld Real10 mem, mem
12      bytes for fld Real10 mem, st
Code: [Select]
Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz (SSE4)

166     cycles for 100 * fld Real4 mem, mem
165     cycles for 100 * fld Real4 mem, st
175     cycles for 100 * fld Real8 mem, mem
168     cycles for 100 * fld Real8 mem, st
1029    cycles for 100 * fld Real10 mem, mem
596     cycles for 100 * fld Real10 mem, st

163     cycles for 100 * fld Real4 mem, mem
163     cycles for 100 * fld Real4 mem, st
163     cycles for 100 * fld Real8 mem, mem
168     cycles for 100 * fld Real8 mem, st
1041    cycles for 100 * fld Real10 mem, mem
602     cycles for 100 * fld Real10 mem, st

170     cycles for 100 * fld Real4 mem, mem
164     cycles for 100 * fld Real4 mem, st
174     cycles for 100 * fld Real8 mem, mem
169     cycles for 100 * fld Real8 mem, st
1056    cycles for 100 * fld Real10 mem, mem
611     cycles for 100 * fld Real10 mem, st
10
The Campus / Re: Parsing a string for NULL terminator
« Last post by jj2007 on Today at 09:33:04 AM »
Thanks, Alex & José - I am so glad to have two real fans now ;)