GENERAL PROTECTION FAULT - Why?

hutch-- · December 14, 2012, 12:28:33 PM

You guys worry me with your register usage, if you properly comply with the Intel ABI (preserve EBX ESI EDI EBP and ESP and trash EAX ECX and EDX) you will never get into trouble. Do it wrong and you will keep getting mysterious crashes. (Voice crying in the wilderness etc ....)

Dave, the main reason why you make a deviant branching mechanism is for speed, yet you are using an antique and really SSSLLLOOOOOOOWWWWWWWWW "lodsd" instruction that will kill any speed advantage with a dispatcher. This is DOS-era thinking, yet Intel's manuals, from the early Pentiums up to current i3/i5/i7 hardware, recommend against using LODS, MOVS etc ... without the REP(E) prefix.

dedndave · December 14, 2012, 01:26:54 PM

in the grander scheme of things, i doubt the difference between
LODSD | JMP EAX
and
ADD ESI,4 | JMP [ESI]
is nearly as big as you think :P

if you put any kind of code in the little module, it will likely overshadow the difference
but, if you like, i can whip out Michael's timing macro and we can test it

jj2007 · December 14, 2012, 02:13:58 PM

Quote from: dedndave on December 14, 2012, 01:26:54 PM
in the grander scheme of things, ...

It's definitely a bit shorter ;-)

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles

? cycles for 100 * lodsd & jmp
? cycles for 100 * add esi & jmp

30 bytes for lodsd & jmp
46 bytes for add esi & jmp

dedndave · December 14, 2012, 02:55:52 PM

that's not the test we were talking about
there has to be some "significant" code in each module
nonetheless, here are my results...

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
+19 of 20 tests valid, loop overhead is approx. 243/100 cycles

28529   cycles for 100 * lodsd & jmp
19006   cycles for 100 * add esi & jmp

28495   cycles for 100 * lodsd & jmp
18970   cycles for 100 * add esi & jmp

28514   cycles for 100 * lodsd & jmp
18981   cycles for 100 * add esi & jmp

i am going to call that 29 vs 19 cycles
if you put 500 cycles of code in each section, it becomes rather insignificant
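
To put a rough number on that, here is a minimal Python sketch. It takes the 29 and 19 cycle figures above at face value and assumes the 500-cycle payload is a fixed cost added equally to both variants; the function name is only for illustration.

```python
def relative_overhead(slow_cycles, fast_cycles, payload_cycles=0):
    """Fraction by which the slower variant exceeds the faster one
    once the same payload (assumed constant) is added to both."""
    slow = slow_cycles + payload_cycles
    fast = fast_cycles + payload_cycles
    return (slow - fast) / fast

bare = relative_overhead(29, 19)                          # dispatch only
loaded = relative_overhead(29, 19, payload_cycles=500)    # with payload

print(f"dispatch only:          {bare:.1%}")    # 52.6%
print(f"with 500-cycle payload: {loaded:.1%}")  # 1.9%
```

Under those assumptions the 10-cycle gap shrinks from roughly half of the dispatch cost to about 2% of the total.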

japheth · December 14, 2012, 04:58:27 PM

Quote from: jj2007 on December 14, 2012, 02:13:58 PM
It's definitely a bit shorter ;-)

But it's also cheating, because "ADD ESI,4 | JMP [ESI]" is not even vaguely equivalent to "LODSD | JMP EAX". Perhaps you meant "ADD ESI,4 | JMP [ESI-4]"?

jj2007 · December 14, 2012, 05:21:45 PM

Quote from: japheth on December 14, 2012, 04:58:27 PM
Quote from: jj2007 on December 14, 2012, 02:13:58 PM
It's definitely a bit shorter ;-)

But it's also cheating, because "ADD ESI,4 | JMP [ESI]" is not even vaguely equivalent to "LODSD | JMP EAX". Perhaps you meant "ADD ESI,4 | JMP [ESI-4]"?

It is, it is, just read the posts before writing. Don't be scared, it's plain assembler, not MasmBasic :icon_mrgreen:

And surprisingly, on my cheap Intel CPU it's even faster...

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
2033 cycles for 100 * lodsd & jmp
2132 cycles for 100 * add esi & jmp

On AMD instead, it's 12 vs 9 cycles for the five jumps:

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
12744 cycles for 100 * lodsd & jmp
9169 cycles for 100 * add esi & jmp

japheth · December 15, 2012, 01:21:56 AM

Quote from: jj2007 on December 14, 2012, 05:21:45 PM
It is, it is, ...

It is what? It is ... cheating? - or: It is ... equivalent? Try to express yourself clearly!

Quote
... just read the posts before writing.

Ok, I finally did - but it didn't help.

Quote
Don't be scared, it's plain assembler, not MasmBasic :icon_mrgreen:

Ok .. but it still looks awe.......some.

jj2007 · December 15, 2012, 01:27:28 AM

Quote from: japheth on December 14, 2012, 04:58:27 PM
"ADD ESI,4 | JMP [ESI]" is not even vaguely equivalent to "LODSD | JMP EAX"

Quote from: jj2007 on December 14, 2012, 05:21:45 PM
It is, it is, ...

Quote from: japheth on December 15, 2012, 01:21:56 AM
It is what? It is ... cheating? - or: It is ... equivalent? Try to express yourself clearly!

I am so sorry! What I meant is: "ADD ESI,4 | JMP [ESI]" is ~~not even vaguely~~ fully equivalent to "LODSD | JMP EAX" in the context of the ten lines of plain assembly code posted above by Dave. Apologies for not expressing myself clearly - I'm not a native English speaker, y'know ;-)

dedndave · December 15, 2012, 02:51:15 AM

the difference is that EAX gets destroyed :P
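
For anyone following the argument, a small Python model of the register effects may help. It is a sketch, not a timing test: the table address, the handler addresses and the initial EAX value are made up, and it assumes the direction flag is clear so LODSD increments ESI.

```python
from dataclasses import dataclass

BASE = 0x1000                            # hypothetical address of the jump table
TABLE = [0x401000, 0x401020, 0x401040]   # hypothetical handler addresses
OLD_EAX = 0xDEADBEEF                     # whatever EAX held before the dispatch

@dataclass
class Regs:
    esi: int
    eax: int = OLD_EAX

def dword_at(addr):
    """Return the table entry stored at byte address addr."""
    return TABLE[(addr - BASE) // 4]

def lodsd_jmp_eax(r):
    r.eax = dword_at(r.esi)      # LODSD: EAX = [ESI] ...
    r.esi += 4                   # ... then ESI += 4
    return r.eax                 # JMP EAX

def add_jmp_esi(r):
    r.esi += 4                   # ADD ESI,4
    return dword_at(r.esi)       # JMP [ESI]

def add_jmp_esi_minus_4(r):
    r.esi += 4                   # ADD ESI,4
    return dword_at(r.esi - 4)   # JMP [ESI-4]

def run(seq, start_esi):
    """Return (jump target, ESI afterwards, whether EAX survived)."""
    r = Regs(esi=start_esi)
    target = seq(r)
    return target, r.esi, r.eax == OLD_EAX

for name, seq, start in [
    ("LODSD | JMP EAX          ", lodsd_jmp_eax, BASE),
    ("ADD ESI,4 | JMP [ESI]    ", add_jmp_esi, BASE),
    ("ADD ESI,4 | JMP [ESI-4]  ", add_jmp_esi_minus_4, BASE),
    ("ADD ESI,4 | JMP [ESI] (*)", add_jmp_esi, BASE - 4),  # ESI starts one entry early
]:
    target, esi, eax_kept = run(seq, start)
    print(f"{name} target={target:#x} esi={esi:#x} eax_kept={eax_kept}")
```

From the same starting ESI, "ADD ESI,4 | JMP [ESI]" lands on the next table entry, while "ADD ESI,4 | JMP [ESI-4]" lands on the same entry as LODSD. The two forms in dispute only pick the same handler if ESI starts one entry earlier, as in the line marked (*). In every ADD variant EAX keeps its old value, whereas LODSD overwrites it.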

frktons · December 15, 2012, 04:49:34 AM

Quote from: dedndave on December 15, 2012, 02:51:15 AM
the difference is that EAX gets destroyed :P

How beautiful. Let's destroy something together :P

Quote
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz (SSE4)
loop overhead is approx. 187/100 cycles

3153 cycles for 100 * lodsd & jmp
3000 cycles for 100 * add esi & jmp

3151 cycles for 100 * lodsd & jmp
3000 cycles for 100 * add esi & jmp

3149 cycles for 100 * lodsd & jmp
3003 cycles for 100 * add esi & jmp

30 bytes for lodsd & jmp
46 bytes for add esi & jmp

--- ok ---

It's about 5% difference. Not that much I suppose.
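
For comparison, the same calculation can be run over the figures posted for each CPU in this thread. A minimal Python sketch, using only the first run reported per CPU and ignoring loop overhead (both are simplifying assumptions):

```python
# (CPU, cycles for "lodsd & jmp", cycles for "add esi & jmp"), as posted above
results = [
    ("Pentium 4 3.00GHz",      28529, 19006),
    ("Celeron M 420",           2033,  2132),
    ("Athlon Dual Core 4450B", 12744,  9169),
    ("Core 2 6600",             3153,  3000),
]

def lodsd_penalty(lodsd, add_esi):
    """Relative cost of the lodsd version versus the add esi version.
    Positive means lodsd is slower, negative means it is faster."""
    return (lodsd - add_esi) / add_esi

for cpu, lodsd, add_esi in results:
    print(f"{cpu:24s} {lodsd_penalty(lodsd, add_esi):+.1%}")
```

On these numbers the penalty ranges from about +50% on the Pentium 4 to slightly negative on the Celeron M, so the answer depends heavily on the CPU.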
