Pelle's Forum

Started by dedndave, December 28, 2012, 04:42:08 AM

Gunther

Quote from: jj2007 on January 05, 2013, 06:37:07 AM
Hi folks,

I took the liberty to post something on Pelle's Forum, in response to

Quote: some really optimized compilers are known to be 1:1 with assembler (i.e. Intel C++), and even better sometimes

;)

oh yes, the "really optimized compilers" will do our job. Really cool response, Jochen.  :t

Gunther
You have to know the facts before you can distort them.

ragdog

Quote: .NET Framework to "make easier" programming.

Yes, many people now code in RAD Studio XE (Delphi), .NET or Qt.
Recently I helped a friend with Win programming; he is a good coder (Delphi)
but has no experience with Win programming  ::)

OK, I know that big projects in Win programming without a framework are not efficient.

I recently found the C++ source of a good tool, but it is written with a framework:
no APIs or anything  :redface:

But I love proper craft  :biggrin:

MichaelW

The general slowness of the MSVCRT functions can be partially explained by the need to run on older processors. For my test I used the Microsoft strcmp source from the PSDK, compiled with the range of optimizations provided with the VC++ Toolkit 2003 compiler.

Windows 2000 SP4, P6:

1105 cycles, crt_strcmp
882 cycles, strcmp_gb
883 cycles, strcmp_g3
882 cycles, strcmp_g4
882 cycles, strcmp_g5
882 cycles, strcmp_g6
1098 cycles, strcmp_g7
1098 cycles, strcmp_g7_sse2

1106 cycles, crt_strcmp
883 cycles, strcmp_gb
883 cycles, strcmp_g3
883 cycles, strcmp_g4
883 cycles, strcmp_g5
883 cycles, strcmp_g6
1098 cycles, strcmp_g7
1098 cycles, strcmp_g7_sse2

1106 cycles, crt_strcmp
884 cycles, strcmp_gb
883 cycles, strcmp_g3
883 cycles, strcmp_g4
883 cycles, strcmp_g5
883 cycles, strcmp_g6
1097 cycles, strcmp_g7
1097 cycles, strcmp_g7_sse2


Windows XP SP3, P4 Northwood:

633 cycles, crt_strcmp
1318 cycles, strcmp_gb
1316 cycles, strcmp_g3
1316 cycles, strcmp_g4
1316 cycles, strcmp_g5
1319 cycles, strcmp_g6
893 cycles, strcmp_g7
911 cycles, strcmp_g7_sse2

619 cycles, crt_strcmp
1317 cycles, strcmp_gb
1316 cycles, strcmp_g3
1316 cycles, strcmp_g4
1316 cycles, strcmp_g5
1316 cycles, strcmp_g6
904 cycles, strcmp_g7
914 cycles, strcmp_g7_sse2

626 cycles, crt_strcmp
1317 cycles, strcmp_gb
1315 cycles, strcmp_g3
1316 cycles, strcmp_g4
1316 cycles, strcmp_g5
1316 cycles, strcmp_g6
904 cycles, strcmp_g7
915 cycles, strcmp_g7_sse2


Note how much lower the cycle count is for the XP SP3 MSVCRT, and that this is running on a processor with a lower IPC than the P3.

The relevant parts of the code-generation options:

/G3 optimize for 80386
/G4 optimize for 80486
/G5 optimize for Pentium
/G6 optimize for PPro, P-II, P-III
/G7 optimize for Pentium 4 or Athlon
/GB optimize for blended model (default)

/arch:<SSE|SSE2> minimum CPU architecture requirements, one of:
    SSE - enable use of instructions available with SSE enabled CPUs
    SSE2 - enable use of instructions available with SSE2 enabled CPUs


The SSE2 option had no effect.
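
For readers who want to try something similar, here is a minimal sketch of how such a count might be taken in C. It is not the harness used for the numbers above: `read_ticks`, `min_ticks` and `sink` are hypothetical names, the sketch assumes the `__rdtsc()` intrinsic is available on x86 (falling back to `clock()` elsewhere), and it reports raw counter ticks for a whole batch of calls rather than calibrated cycles per call.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_TSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#define HAVE_TSC 1
#else
#define HAVE_TSC 0
#endif

/* Read a tick counter: the time-stamp counter on x86, clock() elsewhere. */
static uint64_t read_ticks(void)
{
#if HAVE_TSC
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}

typedef int (*cmp_fn)(const char *, const char *);

/* Volatile sink so the compiler cannot discard the calls being timed. */
static volatile int sink;

/* Time `calls` invocations of fn(a, b), repeat that `repeats` times,
   and return the smallest tick count seen for one batch. */
static uint64_t min_ticks(cmp_fn fn, const char *a, const char *b,
                          int calls, int repeats)
{
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < repeats; ++r) {
        uint64_t start = read_ticks();
        for (int i = 0; i < calls; ++i)
            sink = fn(a, b);
        uint64_t elapsed = read_ticks() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

A call such as `min_ticks(strcmp, s1, s2, 1000, 10)` gives one number per variant. Taking the minimum over several batches is one common way to reduce noise from interrupts and cache warm-up.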

Well Microsoft, here's another nice mess you've gotten us into.

Vortex

XP SP3, Pentium IV 3.2 GHz:

1341 cycles, strcmp_g3
1341 cycles, strcmp_g4
1342 cycles, strcmp_g5
1342 cycles, strcmp_g6
922 cycles, strcmp_g7
922 cycles, strcmp_g7_sse2

701 cycles, crt_strcmp
1341 cycles, strcmp_gb
1340 cycles, strcmp_g3
1341 cycles, strcmp_g4
1341 cycles, strcmp_g5
1341 cycles, strcmp_g6
921 cycles, strcmp_g7
923 cycles, strcmp_g7_sse2

701 cycles, crt_strcmp
1341 cycles, strcmp_gb
1340 cycles, strcmp_g3
1341 cycles, strcmp_g4
1341 cycles, strcmp_g5
1341 cycles, strcmp_g6
921 cycles, strcmp_g7
921 cycles, strcmp_g7_sse2

Press any key to continue ...

jj2007

Quote from: MichaelW on January 05, 2013, 07:37:01 PM
The general slowness of the MSVCRT functions can be partially explained...
...
The SSE2 option had no effect.

That is an understatement ;-)

Innermost loop of strcmp_g7_sse2
004059D6   84C9         test cl, cl
004059D8   74 14        jz short 004059EE
004059DA   8A4E 01      mov cl, [esi+1]
004059DD   0FB642 01    movzx eax, byte ptr [edx+1]
004059E1   83C6 01      add esi, 1
004059E4   83C2 01      add edx, 1
004059E7   0FB6F9       movzx edi, cl
004059EA   2BC7         sub eax, edi
004059EC   74 E8        jz short 004059D6
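
In C terms, the loop above does roughly the following: load one byte from each string, subtract, and keep going while the difference is zero and the terminator has not been reached. The sketch below is written for illustration only, under the hypothetical name `strcmp_bytewise`; it is not the PSDK source that was compiled for the test.

```c
/* Byte-at-a-time string comparison (illustrative sketch, hypothetical name).
   Returns -1, 0 or 1, comparing bytes as unsigned char. */
int strcmp_bytewise(const char *s1, const char *s2)
{
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;
    int diff;

    /* One byte per iteration: stop at the first difference or at the
       terminating zero, as in the disassembled loop. */
    while ((diff = *p - *q) == 0 && *p != 0) {
        ++p;
        ++q;
    }
    return (diff > 0) - (diff < 0);
}
```

With one byte handled per iteration there is nothing for SSE2 to vectorise unless the compiler restructures the loop, which would fit with the identical timings for strcmp_g7 and strcmp_g7_sse2.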


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
616 cycles, crt_strcmp
886 cycles, strcmp_gb
885 cycles, strcmp_g3
886 cycles, strcmp_g4
885 cycles, strcmp_g5
886 cycles, strcmp_g6
889 cycles, strcmp_g7
890 cycles, strcmp_g7_sse2
119 cycles, StringsDiffer

616 cycles, crt_strcmp
886 cycles, strcmp_gb
886 cycles, strcmp_g3
885 cycles, strcmp_g4
886 cycles, strcmp_g5
885 cycles, strcmp_g6
889 cycles, strcmp_g7
888 cycles, strcmp_g7_sse2
120 cycles, StringsDiffer

616 cycles, crt_strcmp
885 cycles, strcmp_gb
886 cycles, strcmp_g3
888 cycles, strcmp_g4
888 cycles, strcmp_g5
887 cycles, strcmp_g6
888 cycles, strcmp_g7
889 cycles, strcmp_g7_sse2
119 cycles, StringsDiffer


Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
430 cycles, crt_strcmp
474 cycles, strcmp_gb
471 cycles, strcmp_g3
474 cycles, strcmp_g4
470 cycles, strcmp_g5
477 cycles, strcmp_g6
475 cycles, strcmp_g7
470 cycles, strcmp_g7_sse2
64 cycles, StringsDiffer

430 cycles, crt_strcmp
471 cycles, strcmp_gb
473 cycles, strcmp_g3
471 cycles, strcmp_g4
473 cycles, strcmp_g5
470 cycles, strcmp_g6
470 cycles, strcmp_g7
471 cycles, strcmp_g7_sse2
65 cycles, StringsDiffer

430 cycles, crt_strcmp
472 cycles, strcmp_gb
471 cycles, strcmp_g3
472 cycles, strcmp_g4
471 cycles, strcmp_g5
474 cycles, strcmp_g6
470 cycles, strcmp_g7
469 cycles, strcmp_g7_sse2
64 cycles, StringsDiffer



Gunther

My test results:

strings 1+2 are not equal
strings 1+1 are equal
strings 1+2 are not equal
strings 1+1 are equal

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE3, SSSE3, SSE4.1, SSE4.2, AVX)

Press any key to continue ...
421 cycles, crt_strcmp
406 cycles, strcmp_gb
408 cycles, strcmp_g3
406 cycles, strcmp_g4
410 cycles, strcmp_g5
406 cycles, strcmp_g6
408 cycles, strcmp_g7
407 cycles, strcmp_g7_sse2
49 cycles, StringsDiffer

376 cycles, crt_strcmp
407 cycles, strcmp_gb
407 cycles, strcmp_g3
408 cycles, strcmp_g4
406 cycles, strcmp_g5
407 cycles, strcmp_g6
407 cycles, strcmp_g7
408 cycles, strcmp_g7_sse2
49 cycles, StringsDiffer

376 cycles, crt_strcmp
406 cycles, strcmp_gb
407 cycles, strcmp_g3
408 cycles, strcmp_g4
407 cycles, strcmp_g5
406 cycles, strcmp_g6
407 cycles, strcmp_g7
408 cycles, strcmp_g7_sse2
49 cycles, StringsDiffer


jj2007

Hi Gunther,

You should try the /*8 compiler switch :biggrin:

Gunther

Hi Jochen,

Quote from: jj2007 on January 06, 2013, 04:17:16 AM
You should try the /*8 compiler switch :biggrin:

evil to him who evil thinks.

jj2007

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
451 cycles, crt_strcmp
674 cycles, strcmp_gb
676 cycles, strcmp_g3
677 cycles, strcmp_g4
675 cycles, strcmp_g5
674 cycles, strcmp_g6
675 cycles, strcmp_g7
675 cycles, strcmp_g7_sse2
140 cycles, StringsDiffer

hutch--

I have always found some humour in compiler/assembler comparisons and I have seen it from both directions yet even a cursory grasp of both shows they are very different animals. In particular Pelle's toolset collection does enough clever things to be a viable part of an experienced programmer's toolkit. Pelle has always kept his tools very up to date in terms of specification but as is the case with most programming languages, some of the baggage associated with them could be best described as very ordinary.

I have seen my share of assembler code that was absolute chyte, just as I have seen my share of C code that was not worth the disk space it occupied, yet when either is written by an "artist" who actually understands the language in real detail, you get very good results. The two languages have very different strengths: true, C is portable, and that is part of the language design, whereas assembler is highly configurable on a specific platform at the expense of portability.

The solution to making a language popular is for there to be enough good quality code available and to make it accessible to folks who want to write in that language form. It can be watered down, compromised and made to look like trash if it's badly done (the one that always made me laugh was VBX files for C programming long ago, the worst of VB technology repackaged for people who could not write decent C code in the first place).

If enough people wrote enough decent C code that was properly interoperable with assembler code without the overhead of C runtime libraries it would be useful to both assembler programmers and C programmers.

I keep in mind a particular Sedgewick hybrid sort that truly defied any meaningful optimisation to get it faster. A particularly good algo that was a premium performer in C and even very complex optimisation would not make an assembler version faster.

Every tool has its place, the trick is to know its place and how to use it to get optimum results.

Gunther

Quote from: hutch-- on January 11, 2013, 05:10:54 PM
If enough people wrote enough decent C code that was properly interoperable with assembler code without the overhead of C runtime libraries it would be useful to both assembler programmers and C programmers. ... Every tool has its place, the trick is to know its place and how to use it to get optimum results.

Amen to that.  :t That's exactly the point.

jj2007

Hi folks,

I am a bit stuck with polib, maybe somebody has a bright idea:
- I need GSL for windows (Download the latest windows GSL binary 32 bits)
- I'd like to convert \Masm32\PellesC\GSL\bin\libgsl-0.dll to a static library, \Masm32\PellesC\GSL\lib\libgsl-0.lib
- polib /MACHINE:X86 /NOUND /out:../GSL/lib/libgsl-0.lib ../GSL/bin/libgsl-0.dll creates that library
- but PellesC complains it can't find anything inside there...
- PeView reveals that the members are all decorated, like _imp_gsl_matrix_alloc, _imp_matrix_set etc
- so I hoped /NOUND would help, but it doesn't. No complaints but they are still there.

PellesC compiles it but then I get
POLINK: error: Unresolved external symbol '_gsl_matrix_alloc'.
POLINK: error: Unresolved external symbol '_gsl_matrix_set'.

I am sure it's an absolute noob error, but I am stuck. Any idea?
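
For context on the names in those errors, here is a small C sketch of the usual 32-bit x86 naming convention in Microsoft-style toolchains, stated as an assumption rather than a diagnosis of the polib problem: a cdecl C function `name` is referenced as `_name`, and the import pointer for it is conventionally `__imp__name` (PeView displayed the names above with fewer underscores). The helper names `decorate_cdecl` and `import_symbol` are hypothetical.

```c
#include <stdio.h>

/* Build the cdecl-decorated symbol for a C function on 32-bit x86:
   a single leading underscore. Returns 0 on success, -1 if `out`
   is too small to hold the result. */
int decorate_cdecl(const char *name, char *out, size_t cap)
{
    int n = snprintf(out, cap, "_%s", name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Build the conventional import-pointer symbol for the same function:
   the "__imp_" prefix followed by the decorated name. */
int import_symbol(const char *name, char *out, size_t cap)
{
    int n = snprintf(out, cap, "__imp__%s", name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

So a plain call to `gsl_matrix_alloc` makes the linker look for `_gsl_matrix_alloc`, which is the symbol POLINK reports as unresolved; a library that only carries the import-pointer form of the name would not satisfy it.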

dedndave

http://vortex.masmcode.com/

use Erol's dll2inc, then Erol's def2lib

dll2inc libgsl-0.dll
notice that libgsl-0 has dependencies in libgslcblas-0
they both need to be present

def2lib libgsl-0.def

it might work   :P
the one i created is about 1 gb

notice Erol's little cinvoke.inc also

jj2007

Quote from: dedndave on January 19, 2013, 05:21:05 AM
the one i created is about 1 gb

You want to scare me :eusa_naughty:

The lib is 1 MB, and it works - thanks :t

dedndave

lol
oops - i read the size wrong   :biggrin:

Erol is da man