Author Topic: Pelle's Forum  (Read 29935 times)

Gunther

  • Member
  • *****
  • Posts: 4086
  • Forgive your enemies, but never forget their names
Re: Pelle's Forum
« Reply #15 on: January 05, 2013, 12:43:52 PM »
Hi folks,

I took the liberty to post something on Pelle's Forum, in response to

Quote
some really optimized compilers are known to be 1:1 with assembler (i.e. Intel C++), and even better sometimes

 ;)

oh yes, the "really optimized compilers" will do our job. Really cool response, Jochen.  :t

Gunther
Get your facts first, and then you can distort them.

ragdog

  • Member
  • ****
  • Posts: 609
Re: Pelle's Forum
« Reply #16 on: January 05, 2013, 06:54:52 PM »
Quote
.NET Framework to "make easier" programming.

Yes, many people now code in RAD Studio XE (Delphi), .NET or Qt.
Recently I helped a friend with Win programming; he is a good coder (Delphi)
but has no experience with Win programming  ::)

OK, I know that big Win programming projects without a framework are not efficient.

I recently found the C++ source of a good tool, but it is written with a framework.
No APIs or anything  :redface:

But I love proper craftsmanship  :biggrin:

MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1196
Re: Pelle's Forum
« Reply #17 on: January 05, 2013, 07:37:01 PM »
The general slowness of the MSVCRT functions can be partially explained by the need to run on older processors. For my test I used the Microsoft strcmp source from the PSDK, compiled with the range of optimizations provided with the VC++ Toolkit 2003 compiler.

Windows 2000 SP4, P6:
Code: [Select]
1105 cycles, crt_strcmp
882 cycles, strcmp_gb
883 cycles, strcmp_g3
882 cycles, strcmp_g4
882 cycles, strcmp_g5
882 cycles, strcmp_g6
1098 cycles, strcmp_g7
1098 cycles, strcmp_g7_sse2

1106 cycles, crt_strcmp
883 cycles, strcmp_gb
883 cycles, strcmp_g3
883 cycles, strcmp_g4
883 cycles, strcmp_g5
883 cycles, strcmp_g6
1098 cycles, strcmp_g7
1098 cycles, strcmp_g7_sse2

1106 cycles, crt_strcmp
884 cycles, strcmp_gb
883 cycles, strcmp_g3
883 cycles, strcmp_g4
883 cycles, strcmp_g5
883 cycles, strcmp_g6
1097 cycles, strcmp_g7
1097 cycles, strcmp_g7_sse2

Windows XP SP3, P4 Northwood:
Code: [Select]
633 cycles, crt_strcmp
1318 cycles, strcmp_gb
1316 cycles, strcmp_g3
1316 cycles, strcmp_g4
1316 cycles, strcmp_g5
1319 cycles, strcmp_g6
893 cycles, strcmp_g7
911 cycles, strcmp_g7_sse2

619 cycles, crt_strcmp
1317 cycles, strcmp_gb
1316 cycles, strcmp_g3
1316 cycles, strcmp_g4
1316 cycles, strcmp_g5
1316 cycles, strcmp_g6
904 cycles, strcmp_g7
914 cycles, strcmp_g7_sse2

626 cycles, crt_strcmp
1317 cycles, strcmp_gb
1315 cycles, strcmp_g3
1316 cycles, strcmp_g4
1316 cycles, strcmp_g5
1316 cycles, strcmp_g6
904 cycles, strcmp_g7
915 cycles, strcmp_g7_sse2

Note how much lower the cycle count is for the XP SP3 MSVCRT, and that this is running on a processor with a lower IPC than the P3.

The relevant parts of the code-generation options:
Code: [Select]
/G3 optimize for 80386
/G4 optimize for 80486
/G5 optimize for Pentium
/G6 optimize for PPro, P-II, P-III
/G7 optimize for Pentium 4 or Athlon
/GB optimize for blended model (default)

/arch:<SSE|SSE2> minimum CPU architecture requirements, one of:
    SSE - enable use of instructions available with SSE enabled CPUs
    SSE2 - enable use of instructions available with SSE2 enabled CPUs

The SSE2 option had no effect.
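
For anyone who wants to try a similar comparison without the original test program (which is not shown in this thread), here is a minimal sketch of a timing loop in C. It is not the harness used for the numbers above: it swaps the cycle counter for the standard `clock()` function, which is much coarser, and the names `cmp_fn` and `ns_per_call` are made up for illustration.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Any function with the strcmp signature can be timed. */
typedef int (*cmp_fn)(const char *, const char *);

/* Hypothetical helper: average processor time per call, in nanoseconds.
   Uses clock(), so it needs a large iteration count to be meaningful. */
static double ns_per_call(cmp_fn fn, const char *a, const char *b, long iterations)
{
    cmp_fn volatile target = fn;  /* volatile pointer: the call cannot be hoisted or inlined */
    volatile int sink = 0;        /* volatile store: the result is not discarded */
    clock_t start, stop;
    long i;

    assert(fn != NULL && a != NULL && b != NULL && iterations > 0);

    start = clock();
    for (i = 0; i < iterations; ++i)
        sink = target(a, b);
    stop = clock();
    (void)sink;

    return 1e9 * (double)(stop - start) / CLOCKS_PER_SEC / (double)iterations;
}
```

Calling `ns_per_call(strcmp, s1, s2, 1000000)` once per variant gives per-call figures that can be compared with each other, though in nanoseconds rather than cycles and with the loop overhead included.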

« Last Edit: January 05, 2013, 09:15:54 PM by MichaelW »
Well Microsoft, here’s another nice mess you’ve gotten us into.

Vortex

  • Member
  • *****
  • Posts: 2697
Re: Pelle's Forum
« Reply #18 on: January 05, 2013, 09:28:46 PM »
XP SP3, Pentium IV 3.2 GHz:

Code: [Select]
1341 cycles, strcmp_g3
1341 cycles, strcmp_g4
1342 cycles, strcmp_g5
1342 cycles, strcmp_g6
922 cycles, strcmp_g7
922 cycles, strcmp_g7_sse2

701 cycles, crt_strcmp
1341 cycles, strcmp_gb
1340 cycles, strcmp_g3
1341 cycles, strcmp_g4
1341 cycles, strcmp_g5
1341 cycles, strcmp_g6
921 cycles, strcmp_g7
923 cycles, strcmp_g7_sse2

701 cycles, crt_strcmp
1341 cycles, strcmp_gb
1340 cycles, strcmp_g3
1341 cycles, strcmp_g4
1341 cycles, strcmp_g5
1341 cycles, strcmp_g6
921 cycles, strcmp_g7
921 cycles, strcmp_g7_sse2

Press any key to continue ...

jj2007

  • Member
  • *****
  • Posts: 12949
  • Assembler is fun ;-)
    • MasmBasic
Re: Pelle's Forum
« Reply #19 on: January 05, 2013, 10:17:26 PM »
The general slowness of the MSVCRT functions can be partially explained...
...
The SSE2 option had no effect.

That is an understatement ;-)

Innermost loop of strcmp_g7_sse2
004059D6     │> ┌84C9              ┌test cl, cl
004059D8     │. │74 14             │jz short 004059EE
004059DA     │. │8A4E 01           │mov cl, [esi+1]
004059DD     │. │0FB642 01         │movzx eax, byte ptr [edx+1]
004059E1     │. │83C6 01           │add esi, 1
004059E4     │. │83C2 01           │add edx, 1
004059E7     │. │0FB6F9            │movzx edi, cl
004059EA     │. │2BC7              │sub eax, edi
004059EC     │. └74 E8             └jz short 004059D6
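
For readers who do not read disassembly every day: the loop above handles one character per iteration, with two loads, one subtract and two branches each time round. A rough C equivalent might look like the sketch below. It is written from the listing, not copied from the Microsoft source, and the name `strcmp_bytewise` is made up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name: a byte-at-a-time comparison in the spirit of the
   disassembled loop. Returns -1, 0 or 1. */
static int strcmp_bytewise(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    int diff;

    assert(a != NULL && b != NULL);

    /* Stop at the first differing byte or at the terminating zero. */
    while ((diff = *p - *q) == 0 && *q != 0) {
        ++p;
        ++q;
    }

    /* Normalise the difference to its sign. */
    return (diff > 0) - (diff < 0);
}
```

With this, `strcmp_bytewise("abc", "abd")` is negative and `strcmp_bytewise("abc", "abc")` is zero. Nothing in the loop uses SSE2, which fits the observation that the switch made no difference.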


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
616 cycles, crt_strcmp
886 cycles, strcmp_gb
885 cycles, strcmp_g3
886 cycles, strcmp_g4
885 cycles, strcmp_g5
886 cycles, strcmp_g6
889 cycles, strcmp_g7
890 cycles, strcmp_g7_sse2
119 cycles, StringsDiffer

616 cycles, crt_strcmp
886 cycles, strcmp_gb
886 cycles, strcmp_g3
885 cycles, strcmp_g4
886 cycles, strcmp_g5
885 cycles, strcmp_g6
889 cycles, strcmp_g7
888 cycles, strcmp_g7_sse2
120 cycles, StringsDiffer

616 cycles, crt_strcmp
885 cycles, strcmp_gb
886 cycles, strcmp_g3
888 cycles, strcmp_g4
888 cycles, strcmp_g5
887 cycles, strcmp_g6
888 cycles, strcmp_g7
889 cycles, strcmp_g7_sse2
119 cycles, StringsDiffer


Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
430 cycles, crt_strcmp
474 cycles, strcmp_gb
471 cycles, strcmp_g3
474 cycles, strcmp_g4
470 cycles, strcmp_g5
477 cycles, strcmp_g6
475 cycles, strcmp_g7
470 cycles, strcmp_g7_sse2
64 cycles, StringsDiffer

430 cycles, crt_strcmp
471 cycles, strcmp_gb
473 cycles, strcmp_g3
471 cycles, strcmp_g4
473 cycles, strcmp_g5
470 cycles, strcmp_g6
470 cycles, strcmp_g7
471 cycles, strcmp_g7_sse2
65 cycles, StringsDiffer

430 cycles, crt_strcmp
472 cycles, strcmp_gb
471 cycles, strcmp_g3
472 cycles, strcmp_g4
471 cycles, strcmp_g5
474 cycles, strcmp_g6
470 cycles, strcmp_g7
469 cycles, strcmp_g7_sse2
64 cycles, StringsDiffer



Gunther

  • Member
  • *****
  • Posts: 4086
  • Forgive your enemies, but never forget their names
Re: Pelle's Forum
« Reply #20 on: January 06, 2013, 04:08:14 AM »
My test results:
Code: [Select]
strings 1+2 are not equal
strings 1+1 are equal
strings 1+2 are not equal
strings 1+1 are equal

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE3, SSSE3, SSE4.1, SSE4.2, AVX)

Press any key to continue ...
421 cycles, crt_strcmp
406 cycles, strcmp_gb
408 cycles, strcmp_g3
406 cycles, strcmp_g4
410 cycles, strcmp_g5
406 cycles, strcmp_g6
408 cycles, strcmp_g7
407 cycles, strcmp_g7_sse2
49 cycles, StringsDiffer
376 cycles, crt_strcmp
407 cycles, strcmp_gb
407 cycles, strcmp_g3
408 cycles, strcmp_g4
406 cycles, strcmp_g5
407 cycles, strcmp_g6
407 cycles, strcmp_g7
408 cycles, strcmp_g7_sse2
49 cycles, StringsDiffer
376 cycles, crt_strcmp
406 cycles, strcmp_gb
407 cycles, strcmp_g3
408 cycles, strcmp_g4
407 cycles, strcmp_g5
406 cycles, strcmp_g6
407 cycles, strcmp_g7
408 cycles, strcmp_g7_sse2
49 cycles, StringsDiffer

Gunther
Get your facts first, and then you can distort them.

jj2007

  • Member
  • *****
  • Posts: 12949
  • Assembler is fun ;-)
    • MasmBasic
Re: Pelle's Forum
« Reply #21 on: January 06, 2013, 04:17:16 AM »
Hi Gunther,

You should try the /*8 compiler switch :biggrin:

Gunther

  • Member
  • *****
  • Posts: 4086
  • Forgive your enemies, but never forget their names
Re: Pelle's Forum
« Reply #22 on: January 06, 2013, 04:25:04 AM »
Hi Jochen,

You should try the /*8 compiler switch :biggrin:

evil to him who evil thinks.

Gunther
Get your facts first, and then you can distort them.

jj2007

  • Member
  • *****
  • Posts: 12949
  • Assembler is fun ;-)
    • MasmBasic
Re: Pelle's Forum
« Reply #23 on: January 07, 2013, 06:18:59 PM »
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
451 cycles, crt_strcmp
674 cycles, strcmp_gb
676 cycles, strcmp_g3
677 cycles, strcmp_g4
675 cycles, strcmp_g5
674 cycles, strcmp_g6
675 cycles, strcmp_g7
675 cycles, strcmp_g7_sse2
140 cycles, StringsDiffer

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 9748
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Pelle's Forum
« Reply #24 on: January 11, 2013, 05:10:54 PM »
I have always found some humour in compiler/assembler comparisons and I have seen it from both directions yet even a cursory grasp of both shows they are very different animals. In particular Pelle's toolset collection does enough clever things to be a viable part of an experienced programmer's toolkit. Pelle has always kept his tools very up to date in terms of specification but as is the case with most programming languages, some of the baggage associated with them could be best described as very ordinary.

I have seen my share of assembler code that was absolute chyte, just as I have seen my share of C code that was not worth the disk space it occupied. Yet when either is written by an "artist" who actually understands the language in real detail, you get very good results. The two languages have very different strengths: true, C is portable, and that is part of the language design, whereas assembler is highly configurable on a specific platform at the expense of portability.

The solution to making a language popular is for there to be enough good quality code available and to make it accessible to folks who want to write in that language form. It can be watered down, compromised and made to look like trash if it's badly done (the one that always made me laugh was VBX files for C programming long ago, the worst of VB technology repackaged for people who could not write decent C code in the first place).

If enough people wrote enough decent C code that was properly interoperable with assembler code without the overhead of C runtime libraries it would be useful to both assembler programmers and C programmers.

I keep in mind a particular Sedgewick hybrid sort that truly defied any meaningful optimisation to get it faster. It was a particularly good algo that was a premium performer in C, and even very complex optimisation would not make an assembler version faster.

Every tool has its place, the trick is to know its place and how to use it to get optimum results.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

Gunther

  • Member
  • *****
  • Posts: 4086
  • Forgive your enemies, but never forget their names
Re: Pelle's Forum
« Reply #25 on: January 11, 2013, 10:39:17 PM »
If enough people wrote enough decent C code that was properly interoperable with assembler code without the overhead of C runtime libraries it would be useful to both assembler programmers and C programmers. ... Every tool has its place, the trick is to know its place and how to use it to get optimum results.

Amen to that.  :t That's exactly the point.

Gunther
Get your facts first, and then you can distort them.

jj2007

  • Member
  • *****
  • Posts: 12949
  • Assembler is fun ;-)
    • MasmBasic
Re: Pelle's Forum
« Reply #26 on: January 19, 2013, 04:08:05 AM »
Hi folks,

I am a bit stuck with polib, maybe somebody has a bright idea:
- I need GSL for windows (Download the latest windows GSL binary 32 bits)
- I'd like to convert \Masm32\PellesC\GSL\bin\libgsl-0.dll to a static library, \Masm32\PellesC\GSL\lib\libgsl-0.lib
- polib /MACHINE:X86 /NOUND /out:../GSL/lib/libgsl-0.lib ../GSL/bin/libgsl-0.dll creates that library
- but PellesC complains it can't find anything inside there...
- PeView reveals that the members are all decorated, like _imp_gsl_matrix_alloc, _imp_gsl_matrix_set etc
- so I hoped /NOUND would help, but it doesn't. No complaints but they are still there.

PellesC compiles it but then I get
POLINK: error: Unresolved external symbol '_gsl_matrix_alloc'.
POLINK: error: Unresolved external symbol '_gsl_matrix_set'.

I am sure it's an absolute noob error, but I am stuck. Any idea?
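
To make the naming issue concrete, here is a small sketch in C of how the two linker-level names for a 32-bit `__cdecl` function are usually built. It assumes the common MSVC-style convention: a leading underscore for a direct reference, and an `__imp__` prefix for the import-table pointer. The PeView names quoted above are written a little differently, and whether polib follows exactly this spelling is an assumption here. The helper name `decorate_cdecl` is made up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: writes the direct symbol ("_name") and the
   import-pointer symbol ("__imp__name") for a 32-bit __cdecl function.
   Returns 0 on success, -1 if either buffer is too small. */
static int decorate_cdecl(const char *name,
                          char *symbol, size_t symbol_size,
                          char *import_symbol, size_t import_size)
{
    size_t len;

    assert(name != NULL && symbol != NULL && import_symbol != NULL);

    len = strlen(name);
    /* "_" + name + NUL needs len + 2; "__imp__" + name + NUL needs len + 8. */
    if (len + 2 > symbol_size || len + 8 > import_size)
        return -1;

    symbol[0] = '_';
    memcpy(symbol + 1, name, len + 1);         /* copies the NUL as well */

    memcpy(import_symbol, "__imp__", 7);
    memcpy(import_symbol + 7, name, len + 1);  /* copies the NUL as well */

    return 0;
}
```

For `gsl_matrix_alloc` this yields `_gsl_matrix_alloc`, the name POLINK reports as unresolved, and `__imp__gsl_matrix_alloc`. A library that only carries the second form would leave a direct call to the first form unresolved, which is at least consistent with the errors shown.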

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: Pelle's Forum
« Reply #27 on: January 19, 2013, 05:21:05 AM »
http://vortex.masmcode.com/

use Erol's dll2inc, then Erol's def2lib

dll2inc libgsl-0.dll
notice that libgsl-0 has dependencies in libgslcblas-0
they both need to be present

def2lib libgsl-0.def

it might work   :P
the one i created is about 1 gb

notice Erol's little cinvoke.inc also

jj2007

  • Member
  • *****
  • Posts: 12949
  • Assembler is fun ;-)
    • MasmBasic
Re: Pelle's Forum
« Reply #28 on: January 19, 2013, 05:48:43 AM »
the one i created is about 1 gb

You want to scare me :eusa_naughty:

The lib is 1 MB, and it works - thanks :t

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: Pelle's Forum
« Reply #29 on: January 19, 2013, 05:52:30 AM »
lol
oops - i read the size wrong   :biggrin:

Erol is da man