GENERAL PROTECTION FAULT - Why?

Started by frktons, December 14, 2012, 05:30:07 AM


hutch--

You guys worry me with your register usage. If you properly comply with the Intel ABI (preserve EBX, ESI, EDI, EBP and ESP, and trash only EAX, ECX and EDX) you will never get into trouble. Do it wrong and you will keep getting mysterious crashes. (Voice crying in the wilderness etc ....)

Dave, the main reason to build a deviant branching mechanism is speed, yet you are using an antique and really SSSLLLOOOOOOOWWWWWWWWW "lodsd" instruction that will kill any speed advantage of a dispatcher. This is DOS-era thinking, yet Intel's manuals, from the early Pentiums up to current i3/5/7 hardware, recommend against using LODS, MOVS etc ... without the REP(E) prefix.

dedndave

in the grander scheme of things, i doubt the difference between
LODSD | JMP EAX
and
ADD ESI,4 | JMP [ESI]
is nearly as big as you think   :P

if you put any kind of code in the little module, it will likely overshadow the difference
but, if you like, i can whip out Michael's timing macro and we can test it   :biggrin:

jj2007

Quote from: dedndave on December 14, 2012, 01:26:54 PM
in the grander scheme of things, ...

It's definitely a bit shorter ;-)

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles

?    cycles for 100 * lodsd & jmp
?    cycles for 100 * add esi & jmp

30      bytes for lodsd & jmp
46      bytes for add esi & jmp

dedndave

that's not the test we were talking about
there has to be some "significant" code in each module
nonetheless, here are my results...
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
+19 of 20 tests valid, loop overhead is approx. 243/100 cycles

28529   cycles for 100 * lodsd & jmp
19006   cycles for 100 * add esi & jmp

28495   cycles for 100 * lodsd & jmp
18970   cycles for 100 * add esi & jmp

28514   cycles for 100 * lodsd & jmp
18981   cycles for 100 * add esi & jmp


i am going to call that 29 vs 19 cycles
if you put 500 cycles of code in each section, it becomes rather insignificant
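
A quick check of that arithmetic, as a minimal sketch. It assumes the rounded 29 and 19 cycle figures are the cost of one dispatch and that the 500 cycles of module code cost the same in both variants; the names are made up for illustration.

```python
# Rounded per-dispatch costs from the Pentium 4 run above (cycles).
LODSD_JMP = 29      # lodsd & jmp eax
ADD_JMP = 19        # add esi,4 & jmp [esi]

def relative_gap(payload_cycles):
    """Fraction by which the lodsd variant is slower, given payload per module."""
    slow = LODSD_JMP + payload_cycles
    fast = ADD_JMP + payload_cycles
    return (slow - fast) / fast

bare_gap = relative_gap(0)        # dispatch only, no module code
loaded_gap = relative_gap(500)    # 500 cycles of code in each module

print(f"no payload:  {bare_gap:.1%}")
print(f"500 cycles:  {loaded_gap:.1%}")
```

Under those assumptions the gap shrinks from roughly 53% with empty modules to about 2% once each module does 500 cycles of work.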

japheth

Quote from: jj2007 on December 14, 2012, 02:13:58 PM
It's definitely a bit shorter ;-)

But it's also cheating, because "ADD ESI,4 | JMP [ESI]" is not even vaguely equivalent to "LODSD | JMP EAX". Perhaps you meant "ADD ESI,4 | JMP [ESI-4]"?



jj2007

Quote from: japheth on December 14, 2012, 04:58:27 PM
Quote from: jj2007 on December 14, 2012, 02:13:58 PM
It's definitely a bit shorter ;-)

But it's also cheating, because "ADD ESI,4 | JMP [ESI]" is not even vaguely equivalent to "LODSD | JMP EAX". Perhaps you meant "ADD ESI,4 | JMP [ESI-4]"?

It is, it is, just read the posts before writing. Don't be scared, it's plain assembler, not MasmBasic :icon_mrgreen:

And surprisingly, on my cheap Intel CPU it's even faster...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
2033    cycles for 100 * lodsd & jmp
2132    cycles for 100 * add esi & jmp


On AMD instead, it's 12 vs 9 cycles for the five jumps:

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
12744   cycles for 100 * lodsd & jmp
9169    cycles for 100 * add esi & jmp

japheth

Quote from: jj2007 on December 14, 2012, 05:21:45 PM
It is, it is, ...

It is what? It is ... cheating? - or: It is ... equivalent? Try to express yourself clearly!

Quote
... just read the posts before writing.

Ok, I finally did - but it didn't help.

Quote
Don't be scared, it's plain assembler, not MasmBasic :icon_mrgreen:

Ok .. but it still looks awe.......some.

jj2007

Quote from: japheth on December 14, 2012, 04:58:27 PM
"ADD ESI,4 | JMP [ESI]" is not even vaguely equivalent to "LODSD | JMP EAX"

Quote from: jj2007 on December 14, 2012, 05:21:45 PM
It is, it is, ...

Quote from: japheth on December 15, 2012, 01:21:56 AM
It is what? It is ... cheating? - or: It is ... equivalent? Try to express yourself clearly!

I am so sorry! What I meant is: "ADD ESI,4 | JMP [ESI]" is not even vaguely fully equivalent to "LODSD | JMP EAX" in the context of the ten lines of plain assembly code posted above by Dave. Apologies for not expressing myself clearly - I'm not a native English speaker, y'know ;-)

dedndave

the difference is that EAX gets destroyed   :P
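
For anyone following the equivalence argument, here is a toy model of the register bookkeeping. It is not x86 and it is not Dave's code, which does not appear in this part of the thread. The table address and the handler addresses are invented for the example.

```python
# Toy model of the dispatch sequences: memory is a dict of address -> dword.
# Each function returns (jump target, new ESI, new EAX).

def lodsd_jmp_eax(mem, esi, eax):
    """LODSD | JMP EAX: load the dword at [ESI] into EAX, ESI += 4, jump to EAX."""
    eax = mem[esi]
    esi += 4
    return eax, esi, eax            # EAX now holds the target: it is destroyed

def add_jmp_esi(mem, esi, eax):
    """ADD ESI,4 | JMP [ESI]: advance first, then jump through the new slot."""
    esi += 4
    return mem[esi], esi, eax       # EAX is left untouched

def add_jmp_esi_minus_4(mem, esi, eax):
    """ADD ESI,4 | JMP [ESI-4]: advance, then jump through the old slot."""
    esi += 4
    return mem[esi - 4], esi, eax   # EAX is left untouched

TABLE = 0x1000                      # hypothetical jump table address
mem = {TABLE + 4 * i: 0x401000 + 0x10 * i for i in range(4)}

via_lodsd = lodsd_jmp_eax(mem, TABLE, 0)
via_add = add_jmp_esi(mem, TABLE, 0)
via_add_m4 = add_jmp_esi_minus_4(mem, TABLE, 0)
via_add_biased = add_jmp_esi(mem, TABLE - 4, 0)   # ESI started one slot early

for name, (target, esi, eax) in [("lodsd", via_lodsd), ("add", via_add),
                                 ("add [esi-4]", via_add_m4),
                                 ("add, esi-4 start", via_add_biased)]:
    print(f"{name:<18} target={target:#x} esi={esi:#x} eax={eax:#x}")
```

In this model, starting from the same ESI, `ADD ESI,4 | JMP [ESI]` lands on the next table entry, while the `[ESI-4]` form picks the same entry as LODSD. The plain `[ESI]` form only selects the same targets if ESI starts one slot before the table, and in both ADD variants EAX survives where LODSD overwrites it.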

frktons

Quote from: dedndave on December 15, 2012, 02:51:15 AM
the difference is that EAX gets destroyed   :P

How beautiful. Let's destroy something together  :P

Quote
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
loop overhead is approx. 187/100 cycles

3153    cycles for 100 * lodsd & jmp
3000    cycles for 100 * add esi & jmp

3151    cycles for 100 * lodsd & jmp
3000    cycles for 100 * add esi & jmp

3149    cycles for 100 * lodsd & jmp
3003    cycles for 100 * add esi & jmp

30      bytes for lodsd & jmp
46      bytes for add esi & jmp


--- ok ---

It's about 5% difference. Not that much I suppose.
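
To put the "about 5%" next to the other machines in the thread, here is a small sketch that computes how much slower `lodsd & jmp` is than `add esi & jmp` on each CPU. It takes the first run from each post and ignores the reported loop overhead, so treat the percentages as rough.

```python
# (lodsd & jmp, add esi & jmp) cycles for 100 iterations, first run of each post.
results = {
    "Pentium 4 3.00GHz": (28529, 19006),
    "Celeron M 420":     (2033, 2132),
    "Athlon 4450B":      (12744, 9169),
    "Core 2 6600":       (3153, 3000),
}

def percent_slower(lodsd, add_esi):
    """How much slower the lodsd variant is, in percent (negative = faster)."""
    return (lodsd - add_esi) / add_esi * 100

gaps = {cpu: percent_slower(*pair) for cpu, pair in results.items()}

for cpu, gap in gaps.items():
    print(f"{cpu:<20}{gap:+6.1f}%")
```

On these numbers the Core 2 gap is close to 5%, the Pentium 4 and Athlon gaps are far larger, and on the Celeron M the lodsd variant comes out slightly faster.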