Hi all.
It is true that code written in assembly is faster than code written in any high-level language.
Some time ago I read in this forum that some code was faster when written in C/C++ than in assembly.
Why?
The answer is below.
Recently I read the following on the Web:
..........................
I wrote this function in C++, assembly (in-line) and assembly (MASM).
Here is the C++ Code:
char cppToUpper(char c)
{
    if (c > 122 || c < 97)
        return c;
    else
        return c - 32;
}
Here is the inline assembly Code:
char cToUpper(int c)
{
    //cout << cLowerLimit;
    _asm
    {
        mov eax, c   // Copy the character into EAX
        // Test the upper limit
        cmp eax, 122 // Compare the character to 122
        ja End       // Jump to the end if above -- the character is too high to be a lowercase letter
        // Test the lower limit
        cmp eax, 97  // Compare the character to 97
        jb End       // Jump to the end if below -- the character is too low to be a lowercase letter
        // Now the operation begins
        sub eax, 32  // Subtract 32 from the character in the register
    End:
        // mov result, al // Move the character in the register into the result variable
    }
}
And here is the function in pure assembly language:
.686
.model flat, stdcall
option casemap:none

.code
cUpperCase2 proc cValue:DWORD
    mov eax, cValue
    cmp eax, 122
    ja  TEnd
    cmp eax, 97
    jb  TEnd
    sub eax, 32
TEnd:
    ret
cUpperCase2 endp
end
Now, here is what the C++ function disassembles to:
char cppToUpper(char c)
{
01271680 push ebp
01271681 mov ebp,esp
01271683 sub esp,0C0h
01271689 push ebx
0127168A push esi
0127168B push edi
0127168C lea edi,[ebp-0C0h]
01271692 mov ecx,30h
01271697 mov eax,0CCCCCCCCh
0127169C rep stos dword ptr es:[edi]
if (c > 122 || c < 97 )
0127169E movsx eax,byte ptr [c]
012716A2 cmp eax,7Ah
012716A5 jg cppToUpper+30h (12716B0h)
012716A7 movsx eax,byte ptr [c]
012716AB cmp eax,61h
012716AE jge cppToUpper+37h (12716B7h)
return c;
012716B0 mov al,byte ptr [c]
012716B3 jmp cppToUpper+3Eh (12716BEh)
012716B5 jmp cppToUpper+3Eh (12716BEh)
else return c - 32;
012716B7 movsx eax,byte ptr [c]
012716BB sub eax,20h
}
012716BE pop edi
012716BF pop esi
012716C0 pop ebx
012716C1 mov esp,ebp
012716C3 pop ebp
012716C4 ret
ALRIGHT HERE'S the question. Why is the C++ code considerably faster even though it compiles to far more instructions than my assembly language code uses? 48 "ticks" expire when executing the pure assembly language function 10,000,000 times (I'll put this stuff at the very bottom); 0 ticks when executing it in C++, and 16 when using inline assembly?
I am impressed that I was even able to get it to work in assembly but perplexed at the performance results. I'll put the main() function below along with the efficiency timing stuff for your reference.
Any ideas? I am just trying to learn a little assembly because I am curious about how computers actually work.
#include "stdafx.h"
#include <iostream>
#include <string>
#include "windows.h"
#include "time.h"
using namespace std;
extern "C" int _stdcall cUpperCase2(char c);
class stopwatch
{
public:
stopwatch() : start(clock()){} //start counting time
~stopwatch();
private:
clock_t start;
};
stopwatch::~stopwatch()
{
clock_t total = clock()-start; //get elapsed time
cout<<"total of ticks for this activity: "<<total<<endl;
cout <<"in seconds: "<< double(total)/CLK_TCK <<endl; // cast before dividing, or the integer division truncates to 0
}
void main()
{
bool bAgain = true;
while (bAgain)
{
// unsigned long lTimeNow = t_time;
char c = 'a';
char d = '!';
char e;
//cout << "A lowercase character will be converted to Uppercase:" << endl;
//cin >> c;
{
stopwatch watch;
for (int i=0; i < 10000000; i++)
{
e= cUpperCase2(c);
e= cUpperCase2(d);
}
}
cout << "That was the external function written in assembler." << endl;
{
stopwatch watch;
for (int i=0; i < 10000000; i++)
{
// cout << cppToUpper(c);
//cout << cppToUpper(d);
e= cppToUpper(c);
e= cppToUpper(d);
}
}
cout << "That was C++\n";
{
stopwatch watch;
for (int i=0; i < 10000000; i++)
{
e= cToUpper(c);
e= cToUpper(d);
}
}
cout << "That was in line assembler\n";
cout << "Enter a letter and hit enter to exit (a will repeat) . . ." << endl;
cin >> c;
//return 0;
if (c != 'a')
bAgain = false;
}
}
My answer is that the C/C++ compiler never actually executes the above loop when calling the cppToUpper function.
This is because the compiler knows the result at compile time and substitutes it directly, without executing the loop.
This is called optimization.
But if, inside the above loop, you put the following right after the call e = cToUpper(d):
if (i > 10000000)
    break;
then the C/C++ build will execute the loop and the result is different.
The conclusion is that sometimes C/C++ is faster.
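Manos's point about dead-code elimination can be sketched in C. The names below (to_upper_c, run_toupper_bench) are mine, not from the original posts; the idea is that when the loop's input depends on a runtime value and its result is observable, the optimizer can no longer fold the whole benchmark into a constant.

```c
/* The same branchy to-upper logic as the C++ version quoted earlier. */
char to_upper_c(char c)
{
    if (c > 122 || c < 97)
        return c;
    return (char)(c - 32);
}

/* Accumulate every result into a "sink" that is returned to the caller.
   Because the seed arrives at runtime and the sink is observable, the
   optimizer can no longer precompute the whole loop at compile time. */
unsigned run_toupper_bench(int seed)
{
    unsigned sink = 0;
    for (int i = 0; i < 10000000; i++)
        sink += (unsigned char)to_upper_c((char)('a' + ((seed + i) & 7)));
    return sink;
}
```

Calling run_toupper_bench(argc) from main and printing the result is usually enough; feeding it a compile-time constant and discarding the return value reinstates exactly the optimization Manos describes.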
Manos.
For fairness, please use a release build, turn on all optimizations, and use a realistic function/algorithm with real-world data. If the function's input depends on some runtime input (e.g. command line or user input), the compiler can't remove the function call as it did in your test. Also note that the compiler may inline your code.
Quote from: Manos on May 04, 2013, 04:11:50 AM
These days I read in the Web the follow:
In the DaniWeb (http://www.daniweb.com/software-development/assembly/threads/155136/assembly-vs.-c-performance)? Rarely seen so many confused people in one thread :P
When comparing reasonable C++ and assembly code, they are often equally fast.
If the C++ code is more than 5% faster, go and check if it does not eliminate some steps by guessing that two constants can be condensed into one (compilers can be clever 8))
No problem - do the same in assembler, and you are back on par.
If, on the other hand, you believe the C++ code is not fast enough, then disassemble the innermost loop and trim it with hand-made assembly. Depending on the quality of the initial code, improvements between 10% and a factor of 10 are always possible. Search the Laboratory for the word timings to see what's feasible. Many good algos have been written "against" the C Runtime Library, which is probably one of the libraries that have been "beaten to death" by M$ programmers to tickle the last bit of performance out of Windows. And voilà, assembly is still often a factor of 2 or 3 faster. Ask Lingo (http://www.masmforum.com/board/index.php?topic=13701.msg107596#msg107596) if you can find him ;)
The main premise of this topic is not entirely correct, because what is really being discussed is C code, shown with its prologue and epilogue and so on, which assembly discards. The underlying opinion is correct: that assembly is more efficient than C is proven by experience, but C++ compilers often have a higher level of optimisation. And the underscore before the asm keyword shows that the results are system-dependent; that is a property of the compiler, not of the language.
Hoping this isn't going to turn into a flame war about different languages, seen way too many of those. :dazzled:
You tend to do comparisons like this by writing the task in both languages and comparing them in a benchmark; combining the two formats in one compiler does justice to neither. The inline assembler messes up the compiler optimisation, and the C compiler generally uses registers in non-standard ways, which increases the overhead of calling an inline assembler routine.
Microsoft have long had the solution: write your C/C++ in a C compiler and write your assembler code with an assembler, then LINK them together, and you get the best of both worlds, not the worst.
The next factor is that it's easy to write both lousy C and lousy assembler; if you are going to make comparisons you need to benchmark both and locally optimise each one. THEN do the comparison.
i suspect that, for some rare cases, you can implement some things faster in assembler
also, in C, some things are easier, like COM and .NET, etc - and C is more maintainable
otherwise - it's a good design/bad design thing, as Hutch says
use the right tool for the job
now, in my case, i am not very proficient in C
and - i don't write code for a living, either
i prefer assembler and write in assembler
This code eliminates the C++ stuff, adds some function test code (currently commented out), adds a test of the CRT function, adds a naked function, and removes the prologue and epilogue from the external assembly procedure.
//=============================================================================
#include <windows.h>
#include <conio.h>
#include <stdio.h>
#include "counter_c.c"
//=============================================================================
// These for safety on single-core systems.
#define PP HIGH_PRIORITY_CLASS
#define TP THREAD_PRIORITY_NORMAL
// These for multi-core systems.
//#define PP REALTIME_PRIORITY_CLASS
//#define TP THREAD_PRIORITY_TIME_CRITICAL
#define LOOPS 10000000
//=============================================================================
int c_toupper(int c)
{
    if (c > 122 || c < 97)
        return c;
    else
        return c - 32;
}

char ia_toupper(int c)
{
    __asm
    {
        mov eax, c
        cmp eax, 122
        ja  end
        cmp eax, 97
        jb  end
        sub eax, 32
      end:
    }
}

__declspec(naked) int nk_toupper(int c)
{
    __asm
    {
        mov eax, [esp+4]
        cmp eax, 122
        ja  end
        cmp eax, 97
        jb  end
        sub eax, 32
      end:
        ret
    }
}
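For comparison, a branch-free variant is possible in portable C. This is a sketch I am adding, not part of the original test code, and the name branchless_toupper is invented; it exploits unsigned wrap-around so that one compare covers both range limits:

```c
/* Branch-free ASCII to-upper (illustrative sketch).
   (unsigned)(c - 'a') is below 26 only when 'a' <= c <= 'z',
   because values below 'a' wrap around to very large numbers. */
int branchless_toupper(int c)
{
    unsigned int is_lower = (unsigned int)(c - 'a') < 26u;
    return c - (int)(is_lower << 5);  /* subtract 32 only for lowercase */
}
```

On out-of-order CPUs this kind of code may beat the branchy versions when the input mix defeats the branch predictor, which could be one reason cycle counts for logically identical functions differ so much between CPU generations in the results posted in this thread.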
//----------------------------------------------------------------------------
// This is necessary to prevent the optimizer from breaking the counter code.
//----------------------------------------------------------------------------
#pragma optimize("",off)
int asm_toupper(int c);
void main(void)
{
    int i, c;
    /*
    for(i=0;i<200;i++)
    {
        c = rand() >> 8;
        printf("%c",toupper(c));
        printf("%c",c_toupper(c));
        printf("%c",ia_toupper(c));
        printf("%c",nk_toupper(c));
        printf("%c",asm_toupper(c));
        printf("\n");
    }
    */
    SetProcessAffinityMask(GetCurrentProcess(),1);
    Sleep(5000);
    for(i=0;i<4;i++)
    {
        counter_begin(1,LOOPS,PP,TP);
        counter_end(1)
        printf( "%d cycles, empty\n", counter_cycles );

        counter_begin(2,LOOPS,PP,TP);
        c = toupper(95);
        c = toupper(110);
        c = toupper(125);
        counter_end(2)
        printf( "%d cycles, toupper\n", counter_cycles );

        counter_begin(3,LOOPS,PP,TP);
        c = c_toupper(95);
        c = c_toupper(110);
        c = c_toupper(125);
        counter_end(3)
        printf( "%d cycles, c_toupper\n", counter_cycles );

        counter_begin(4,LOOPS,PP,TP);
        c = ia_toupper(95);
        c = ia_toupper(110);
        c = ia_toupper(125);
        counter_end(4)
        printf( "%d cycles, ia_toupper\n", counter_cycles );

        counter_begin(5,LOOPS,PP,TP);
        c = nk_toupper(95);
        c = nk_toupper(110);
        c = nk_toupper(125);
        counter_end(5)
        printf( "%d cycles, nk_toupper\n", counter_cycles );

        counter_begin(6,LOOPS,PP,TP);
        c = asm_toupper(95);
        c = asm_toupper(110);
        c = asm_toupper(125);
        counter_end(6)
        printf( "%d cycles, asm_toupper\n\n", counter_cycles );
    }
    getch();
}
#pragma optimize("",on)
Results on my P3:
0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
26 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper
0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
27 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper
0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
27 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper
0 cycles, empty
49 cycles, toupper
20 cycles, c_toupper
27 cycles, ia_toupper
20 cycles, nk_toupper
20 cycles, asm_toupper
This is the result on my Core2 quad. (3 gig)
0 cycles, empty
31 cycles, toupper
52 cycles, c_toupper
63 cycles, ia_toupper
39 cycles, nk_toupper
30 cycles, asm_toupper
0 cycles, empty
31 cycles, toupper
44 cycles, c_toupper
63 cycles, ia_toupper
32 cycles, nk_toupper
31 cycles, asm_toupper
0 cycles, empty
31 cycles, toupper
48 cycles, c_toupper
63 cycles, ia_toupper
41 cycles, nk_toupper
29 cycles, asm_toupper
0 cycles, empty
31 cycles, toupper
57 cycles, c_toupper
61 cycles, ia_toupper
41 cycles, nk_toupper
30 cycles, asm_toupper
Some people have not understood the spirit of my words.
Assembly is the best for small programs and for writing APIs, libraries and drivers.
But if you attempt to write a big program like my IDE, you will spend a ton of time.
And a question: why are MASM, JWASM, POASM, NASM and Windows written in the C language?
In my MSDN for VStudio 6.0, Microsoft writes:
C Language Reference
The C language is a general-purpose programming language known for its efficiency, economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.
And a few words to jj2007:
You have no right to call Microsoft "M$".
If you don't like Microsoft, turn to Linux.
Manos.
Hi Manos,
Quote from: Manos on May 04, 2013, 06:27:15 PM
In my MSDN for VStudio 6.0, Microsoft writes:
C Language Reference
The C language is a general-purpose programming language known for its efficiency, economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.
And is it right just because Microsoft wrote it? Do you really believe that?
Quote from: Manos on May 04, 2013, 06:27:15 PM
And a question: Why MASM, JWASM, POASM, NASM and Windows are written in C language ?
The answer is easy: they are written in C for better maintenance, but that has nothing to do with performance. There are on the other hand good assemblers (FASM, SolAsm) which are written in assembly language. Furthermore, there are compilers written in assembly language, too. You should, for example, have a look into that thread (http://masm32.com/board/index.php?topic=964.0), especially reply #6.
Gunther
Quote from: Gunther on May 04, 2013, 07:30:26 PMQuote from: Manos on May 04, 2013, 06:27:15 PM
In my MSDN for VStudio 6.0, Microsoft writes:
C Language Reference
The C language is a general-purpose programming language known for its efficiency, economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.
And is it right just because Microsoft wrote it? Do you really believe that?
So, it's wrong because MS wrote it?
I would accept it with a few modifications:
QuoteThe C language is a general-purpose programming language known for its efficiency(?), economy, and portability. While these characteristics make it a good choice for almost any kind of programming, C has proven especially useful in systems programming because it facilitates writing fast, compact programs that are readily adaptable to other systems. Well-written C programs are often as fast as assembly-language programs, and they are typically easier for programmers to read and maintain.
Quote from: Manos on May 04, 2013, 06:27:15 PM
And a few words to jj2007:
You have no right to call Microsoft "M$".
of course he has the right!
the assembler program is probably smaller than the hll one :P
Hi qWord,
Quote from: qWord on May 04, 2013, 09:48:56 PM
So, it's wrong because MS wrote it?
I would say: yes, because you made a lot of changes to the original statement to accept it, didn't you? 8)
Quote from: qWord on May 04, 2013, 09:48:56 PM
of course he has the right!
Without any doubt. :t
Gunther
You do know Microsoft is not a religion, right? Really, one could say anything about MS and not be burned alive on a stack of motherboards.
Quote from: dedndave on May 04, 2013, 10:34:25 PM
the assembler program is probably smaller than the hll one :P
And if the C program was done in plain, straightforward C it was probably easier and faster to code and less likely to contain bugs/errors.
Quote from: anunitu on May 04, 2013, 11:27:21 PM
You do know Microsoft is not a religion right?..really one could say anything about MS and not be burned alive on a stack if motherboards.
A serious person never speaks ironically about other persons, companies or their work.
If someone does not like a company or a project, they should simply not use it.
But some people sometimes behave like juniors in high school.
This forum is supposed to be about talking about programming, not about irony or attacks.
Manos.
It's basically the case that what you are familiar with will in part dictate what you program in. The last rewrite of my editor was completely in MASM and I saved more than 2 weeks in writing time over the earlier version. It is just on 200k of assembler code for the bare editor; then there is the code for the dedicated DLLs, which is a mix of HLL and assembler, and the scripting engine is almost exclusively assembler, using the dynamic strings of basic for technical reasons.
That none of it is all that big is because it IS written in assembler, and much of the speed gain in writing better code came from writing without the irritations of higher-level languages. It's typically a case of being front-end unfriendly but very back-end friendly.
The main gain in writing C is portability, but only if you are using fully portable libraries with no API code. Once you are committed to a specific operating system with Windows API functions, MASM code is just as fast to write as any higher-level language that uses the same functions, and it is free of the irritations and restrictions of higher-level languages.
VC98 is antique junk and the assumptions from that era are no longer sound, even if they were back then; the odd bits of C that I compile these days are usually done in VC2003, as it had both a better compiler and a better linker, with a greater range of libraries.
First link in the signature field of this message.
Pure hardcore C++ written to be compilable for both 32 bit and 64 bit platforms without change of single line of code.
The question of the EXE file size in this case is a question of laziness, and, yes, written in assembly with the same functionality it would not be much smaller. With the EXE aligned by section size, it would not be smaller at all.
But yes, it is not an especially complex program, and, what is more important, it uses pretty linear logic built on integer arithmetic.
In the field of code which is easily "serializable" - simple integer math, control-flow structures, call tables, jump tables and similar constructs, which have a direct translation from the HLL to machine code - the code generated by an optimizing C compiler is not slower and not "bigger" than the best possible ASM code. But FPU code, tricky and/or sophisticated integer code, SSE code and so on: that is still the kingdom of ASM.
Sometimes the question of "what language to use: C or ASM" is just a matter of design, of development time for the given project and, of course, of laziness :greensml:
Jochen, :t
Quote from: Manos on May 04, 2013, 06:27:15 PM
And a question: Why MASM, JWASM, POASM, NASM and Windows are written in C language ?
Probably this has something to do with the UNIX culture.
In the 1970s, those Bell Labs folks (Dennis Ritchie, Ken Thompson, et al.) needed a higher-level language than asm to rewrite UNIX. Hence, C was born.
They used C not only for the UNIX kernel, but also for compiler, assembler, editor, etc.
And since these UNIX systems came with a C compiler included, C became widespread. Everybody started to write in C (http://www.youtube.com/watch?v=1S1fISh-pag).
Now why are those things written in C?
- C is well known
- C is portable
- C provides a little higher abstraction than asm
- C can be compiled into efficient machine code (given a smart compiler)
- C is not that complicated (compared to C++, for example)
I think C is fine for doing system programming. I don't buy the old UNIX hackers' mindset, though, which is to write everything in C. If people asked me to write applications, I would choose languages "higher" than C, like Delphi or Java, for example.
I've always found it odd to try to compare a HLL to assembly language -- since any HLL, interpreted or compiled, will in the end result in execution of assembly language instructions, it is obvious that no HLL can ever be faster than optimal assembly language.
What this fails to take into account, is of course that it requires considerable knowledge, skill, and usually time, to get even a near-optimal solution in assembly language. Also, once you have reached a near-optimal solution, even small changes in the specification, can result in large amounts of work to change the code to solve the new problem in a near-optimal way.
There are few situations where the extra effort is worth it; usually inner loops in time critical code. Elsewhere, I believe spending a fraction of the time writing, debugging, and maintaining the code in a HLL makes more sense :t.
Quote from: Jibz on May 05, 2013, 07:00:18 AMit requires considerable knowledge, skill, and usually time, to get even a near-optimal solution in assembly language.
Yes, it requires an effort, once, to get a library function working, but so what? Over the years, we have had a lot of fun picking a CRT function, e.g. strcmp, and giving it a thorough beating, see here (http://masm32.com/board/index.php?topic=1167.msg11925#msg11925). And why is a factor of five faster than the CRT only "near-optimal"?
Now that may be an extreme example, but over time we have seen many CRT functions replaced by assembler equivalents that did the job in less than half the time.
So much for speed and innermost loops. But what about a real-life app? Does it really take longer to write invoke CreateWindowEx instead of CreateWindowEx();, with three extra chars that will ruin your fingers over the years?
Sure it will take more time if you mean "pure" assembler - Hutch maintains a nice demo at \Masm32\examples\exampl07\slickhuh
But only masochists use pure assembler, reasonable coders use macros mixed with pure assembler. For example, loading a text file from disk and shuffling it into a string array costs me one line.
One.
Handling arrays, whether "double" aka REAL8 or REAL4 or any integer size, can be easier with assembly macros than with most HLLs including C/C++
qWord has written a marvelous simple math library (http://sourceforge.net/projects/smplmath/) that makes arithmetics as easy as in any other HLL.
We have rejecting and non-rejecting loops, C and Basic-style for loops.
We have a Switch/Case macro, and it has
a lot more power (http://masm32.com/board/index.php?topic=1185.0) than its C equivalent.
We have several macros for SEH, e.g. mine (http://masm32.com/board/index.php?topic=185.0) - but almost nobody uses them. Because we control our code so tightly that we would not allow a release version throwing exceptions.
Which brings me to my last point: quality. HLL must be better, right? That's the theory; the practice is that assembler forces the coder to reflect thoroughly on each and every line, and that produces better quality than code that relies on "my compiler knows how to do that".
Of course, C/C++ is everywhere, and most of the big commercial apps are written in that language. That is why I am not impressed; I see too much bullshit produced by apparently huge teams of programmers in respected big software companies. MS Word forgetting to redraw, for example. And every time I shut my puter down instead of hibernating it, on reboot Adobe pops up and solemnly swears that this time the known security and performance problems are finally solved. IMHO they will be solved when Adobe goes back to BASIC.
[edit: fixed a garbled phrase]
I am a moderate here; I am of the view that Microsoft have it right in providing BOTH CL.EXE and ML.EXE with a compatible object module format, so that you can link them together and get the best (or worst) of both worlds in an application. What I am not a fan of is inline assembler in an optimising compiler, as it makes a mess of both capacities; to this extent, one of the few things I approve of in 64-bit code is the removal of inline assembler, so that you MUST write a separate module for assembler code.
Now having cast my eye over a lot of C and assembler code in my time, one thing you CAN guarantee is that badly written code in either performs badly; equally, well written code in either performs well enough in most instances, and an appropriate level of competence in both is necessary to perform any sensible comparison.
Then there is the readability issue, and I have seen enough of both to say that much of both is unreadable. Bad assembler looks something like an assembler dump, and while I am practiced at reading this stuff, it is no joy to work on. With badly written C I have seen some of the most appalling messes, with the author trying to be profound by bundling a whole heap of functions together with non-symmetrical brace indentation, and to read or fix it you have to carefully pick it apart and re-edit it to make sense of it.
My own C code is still written in 1990 K&R style so I can read it, but I write very little C these days; that's what I have MASM for. In a forum of this type, where there are a large number of people literate in multiple languages, a topic of this type is little better than noise, as most members know the difference and use the tool of their choice for the task they have in mind.
For folks who are happy to use mixed languages, there are viable options using both:
C + asm
asm + C
asm
C
It's easy enough to write a separate module in C and then link it into a MASM app, as well as the other way around. I remember some years ago finding a massive collection of sort algos written in C, so I compiled each one into a module and tested them in a MASM test piece. The few that were any good I converted directly to MASM, mainly for the practice of optimising the C code to get a bit more pace out of it.
I will second Hutch's proposal.
(a bit kidding now)
try {Jochen, I agree with all sentences of your speech,
}except {Quote from: jj2007 on May 05, 2013, 08:58:54 AM
We have several macros for SEH, e.g. mine (http://masm32.com/board/index.php?topic=185.0) - but almost nobody uses them. Because we control our code so tightly that we would not allow a release version throwing exceptions.
}I'll say that it mostly means that "almost nobody" writes the kind of code which really needs these things, nor bothers with any exception handling beyond showing their own message about the exception and exiting, avoiding the final "Send all your data to you know where" crash-report dialog from the OS.
Really, there are many tasks which may require handling of planned exceptions, and that would be a flaw neither in the program design nor in the programmer's competency; it would instead be part of the design, the right feature.
As a simple example: catching an in-page I/O exception (C0000006H) when one is reading a huge sparse memory-mapped file from a disk that does not have enough free space to hold the file if it were not sparse.
Quote from: Antariy on May 05, 2013, 12:43:24 PM
Quote from: jj2007 on May 05, 2013, 08:58:54 AM
We have several macros for SEH, e.g. mine (http://masm32.com/board/index.php?topic=185.0) - but almost nobody uses them. Because we control our code so tightly that we would not allow a release version throwing exceptions.
As a simple example: catching an in-page I/O exception (C0000006H) when one is reading a huge sparse memory-mapped file from a disk that does not have enough free space to hold the file if it were not sparse.
Alex,
You are right, of course SEH has its legit uses. Thanks for the example.
But in 99% of all cases you can live without SEH, or limit it to a subproc. The OS calls them exceptions because they should be exceptional events.
However, it seems to be extremely widespread in C/C++ programming to a) install an SEH right at the beginning and b) deliberately raise exceptions for little errors that could have been avoided by simple error checking, or fixed in more benign ways.
Somebody will surely pop up with a good theory why computer science requires that there must be a mov fs:[0], esp at line 7 of the disassembly, but a) all of my BASIC compilers could live comfortably without SEH, and b) my suspicion is that users of products of certain respected big software companies didn't like those ugly message boxes, so the marketing department asked for SEH... ;-)
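The "simple error checking" advocated above might look like the following in plain C. This is my sketch, and the function name parse_long is invented; instead of raising an exception for a malformed number, the failure is reported through the return value:

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a base-10 long with explicit error checking (illustrative sketch).
   Returns 0 on success and fills *out; returns -1 on any failure:
   overflow, empty input, or trailing garbage. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno == ERANGE || end == s || *end != '\0')
        return -1;
    *out = v;
    return 0;
}
```

The caller decides locally how to recover, with no stack unwinding and no fs:[0] handler chain involved.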
Jochen, now I agree with you totally :biggrin:
I tested three functions that do the same work, using MASM:
1) The Win API lstrlen.
2) Hutch's library StrLen.
3) The strlen from the MSVCRT.LIB that I have on my system (WinXP).
Following are the source and the results:
includelib \MSVCRT.LIB
strlen PROTO C pstr:DWORD

; (excerpt from inside the timing proc)
    LOCAL dwTime:DWORD

    invoke GetTickCount
    mov dwTime, eax
    push esi
    xor esi, esi
TestLoop:
    ; invoke lstrlen, addr szText
    ; invoke StrLen, addr szText
    invoke strlen, addr szText
    inc esi
    cmp esi, 10000000
    jb TestLoop
    pop esi
    invoke GetTickCount
    sub eax, dwTime
    PrintDec eax
Results:
lstrlen 328 ticks
StrLen 94 ticks
strlen 94 ticks
Manos.
Ok, once more...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 17/10 cycles
2510 cycles for 10 * lstrlen
972 cycles for 10 * StrLen
1450 cycles for 10 * crt_strlen
371 cycles for 10 * Len
2512 cycles for 10 * lstrlen
972 cycles for 10 * StrLen
1515 cycles for 10 * crt_strlen
371 cycles for 10 * Len
2510 cycles for 10 * lstrlen
971 cycles for 10 * StrLen
1405 cycles for 10 * crt_strlen
371 cycles for 10 * Len
Jochen,
you gave the right answer. :t All these HLL vs Assembly discussions (I've seen a lot over the years) are a bit fruitless. Here are my results:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 28/10 cycles
1940 cycles for 10 * lstrlen
544 cycles for 10 * StrLen
1367 cycles for 10 * crt_strlen
169 cycles for 10 * Len
1987 cycles for 10 * lstrlen
570 cycles for 10 * StrLen
784 cycles for 10 * crt_strlen
172 cycles for 10 * Len
2558 cycles for 10 * lstrlen
557 cycles for 10 * StrLen
1367 cycles for 10 * crt_strlen
174 cycles for 10 * Len
100 = eax lstrlen
100 = eax StrLen
100 = eax crt_strlen
100 = eax Len
--- ok ---
Gunther
Quote from: Gunther on May 05, 2013, 10:34:28 PM
Jochen,
you gave the right answer. :t All these HLL vs Assembly discussions (I've seen a lot over the years) are a bit fruitless. Here are my results:
--- ok ---
Which crt_strlen did you use?
Manos.
This is the version of "strlen" in my XP copy of MSVCRT. It does not look like compiler-generated code, and it appeared about 2 years after Agner Fog designed his StrLen algo, which is in the MASM32 library.
strlen:
    mov ecx, [esp+4]
    test ecx, 3
    jz lbl1
lbl0:
    mov al, [ecx]
    inc ecx
    test al, al
    jz lbl2
    test ecx, 3
    jnz lbl0
    add eax, 0
lbl1:
    mov eax, [ecx]
    mov edx, 7EFEFEFFh
    add edx, eax
    xor eax, 0FFFFFFFFh
    xor eax, edx
    add ecx, 4
    test eax, 81010100h
    jz lbl1
    mov eax, [ecx-4]
    test al, al
    jz lbl5
    test ah, ah
    jz lbl4
    test eax, 0FF0000h
    jz lbl3
    test eax, 0FF000000h
    jz lbl2
    jmp lbl1
lbl2:
    lea eax, [ecx-1]
    mov ecx, [esp+4]
    sub eax, ecx
    ret
lbl3:
    lea eax, [ecx-2]
    mov ecx, [esp+4]
    sub eax, ecx
    ret
lbl4:
    lea eax, [ecx-3]
    mov ecx, [esp+4]
    sub eax, ecx
    ret
lbl5:
    lea eax, [ecx-4]
    mov ecx, [esp+4]
    sub eax, ecx
    ret
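The magic constants 7EFEFEFFh and 81010100h above implement a word-at-a-time zero-byte test: adding 7EFEFEFFh propagates a carry out of any byte that is neither zero nor 80h-or-above, and the xor/mask step exposes the bytes where that carry was missing. The test never misses a zero byte; it can fire spuriously on bytes >= 80h, which is why the code re-checks byte by byte. A C rendering of the same idea (my sketch, names invented; note that reading 4 aligned bytes at a time can touch bytes past the terminator, which is safe at page granularity but formally undefined in C):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* strlen using the same candidate test as the MSVCRT code above.
   Illustrative sketch: aligned 4-byte reads, then a byte-wise check
   whenever the word *might* contain a zero byte. */
size_t strlen_words(const char *str)
{
    const char *p = str;

    /* Byte loop until the pointer is 4-byte aligned. */
    while (((uintptr_t)p & 3) != 0) {
        if (*p == '\0')
            return (size_t)(p - str);
        p++;
    }

    for (;;) {
        uint32_t v;
        /* memcpy avoids strict-aliasing issues; it compiles to one load. */
        memcpy(&v, p, sizeof v);
        /* Nonzero means "some byte may be zero": no false negatives,
           false positives only for bytes >= 0x80. */
        if ((((v + 0x7EFEFEFFu) ^ ~v) & 0x81010100u) != 0) {
            for (int i = 0; i < 4; i++)
                if (p[i] == '\0')
                    return (size_t)(p + i - str);
        }
        p += 4;
    }
}
```

A byte like 0xE9 in the input triggers the false-positive path and simply falls through to the next word, mirroring the `jmp lbl1` in the assembly.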
Manos,
Quote from: Manos on May 05, 2013, 10:41:42 PM
Which crt_strlen you used ?
Manos.
Jochen has the source included. I had to repeat the tests under my 32-bit XP running in VirtualPC. Here are the results:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
+19 of 20 tests valid, loop overhead is approx. 37/10 cycles
1855 cycles for 10 * lstrlen
548 cycles for 10 * StrLen
608 cycles for 10 * crt_strlen
165 cycles for 10 * Len
2492 cycles for 10 * lstrlen
555 cycles for 10 * StrLen
594 cycles for 10 * crt_strlen
164 cycles for 10 * Len
2483 cycles for 10 * lstrlen
547 cycles for 10 * StrLen
606 cycles for 10 * crt_strlen
452 cycles for 10 * Len
100 = eax lstrlen
100 = eax StrLen
100 = eax crt_strlen
100 = eax Len
--- ok ---
Gunther
Quote from: hutch-- on May 05, 2013, 10:51:10 PM
This is the version of "strlen" in my XP version of MSVCRT. It does not look like compiler-generated code, and it dates from about 2 years after Agner Fog designed his StrLen algo, which is in the MASM32 library.
Steve,
where is your strlen version?
I searched for it in your library today, before making my last post for testing, but could not find it.
The following is the MS CRT version for Intel:
page ,132
title strlen - return the length of a null-terminated string
;***
;strlen.asm - contains strlen() routine
;
; Copyright (c) 1985-1997, Microsoft Corporation. All rights reserved.
;
;Purpose:
; strlen returns the length of a null-terminated string,
; not including the null byte itself.
;
;*******************************************************************************
.xlist
include cruntime.inc
.list
page
;***
;strlen - return the length of a null-terminated string
;
;Purpose:
; Finds the length in bytes of the given string, not including
; the final null character.
;
; Algorithm:
; int strlen (const char * str)
; {
; int length = 0;
;
; while( *str++ )
; ++length;
;
; return( length );
; }
;
;Entry:
; const char * str - string whose length is to be computed
;
;Exit:
; EAX = length of the string "str", exclusive of the final null byte
;
;Uses:
; EAX, ECX, EDX
;
;Exceptions:
;
;*******************************************************************************
CODESEG
public strlen
strlen proc
.FPO ( 0, 1, 0, 0, 0, 0 )
string equ [esp + 4]
mov ecx,string ; ecx -> string
test ecx,3 ; test if string is aligned on 32 bits
je short main_loop
str_misaligned:
; simple byte loop until string is aligned
mov al,byte ptr [ecx]
inc ecx
test al,al
je short byte_3
test ecx,3
jne short str_misaligned
add eax,dword ptr 0 ; 5 byte nop to align label below
align 16 ; should be redundant
main_loop:
mov eax,dword ptr [ecx] ; read 4 bytes
mov edx,7efefeffh
add edx,eax
xor eax,-1
xor eax,edx
add ecx,4
test eax,81010100h
je short main_loop
; found zero byte in the loop
mov eax,[ecx - 4]
test al,al ; is it byte 0
je short byte_0
test ah,ah ; is it byte 1
je short byte_1
test eax,00ff0000h ; is it byte 2
je short byte_2
test eax,0ff000000h ; is it byte 3
je short byte_3
jmp short main_loop ; taken if bits 24-30 are clear and bit
; 31 is set
byte_3:
lea eax,[ecx - 1]
mov ecx,string
sub eax,ecx
ret
byte_2:
lea eax,[ecx - 2]
mov ecx,string
sub eax,ecx
ret
byte_1:
lea eax,[ecx - 3]
mov ecx,string
sub eax,ecx
ret
byte_0:
lea eax,[ecx - 4]
mov ecx,string
sub eax,ecx
ret
strlen endp
end
Manos.
P.S.
Below my name, on the left of your forum, it says:
Manos
New Member.
But I have been a member since 2004, one of the first.
It would be better to write:
Old Member !!!
Manos,
Same place it's always been: strlen.asm in the m32lib directory. It is the version written in 1996 by Agner Fog.
Quote from: hutch-- on May 05, 2013, 10:51:10 PM
This is the version of "strlen" in my XP version of MSVCRT.
OK, included as crt_strlen2 (identical to Manos' "MS crt version for Intel"):
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 17/10 cycles
2677 cycles for 10 * lstrlen
973 cycles for 10 * StrLen
1484 cycles for 10 * crt_strlen
372 cycles for 10 * Len
1348 cycles for 10 * crt_strlen2
2682 cycles for 10 * lstrlen
973 cycles for 10 * StrLen
1464 cycles for 10 * crt_strlen
372 cycles for 10 * Len
1346 cycles for 10 * crt_strlen2
Manos,
Quote from: Manos on May 06, 2013, 01:08:34 AM
P.S.
Below my name, on the left of your forum, it says:
Manos
New Member.
But I have been a member since 2004, one of the first.
It would be better to write:
Old Member !!!
that's right, but it has to do with the number of posts you've made. It's the forum software.
Gunther
Quote from: Gunther on May 06, 2013, 03:29:06 AM
that's right, but it has to do with the number of posts you've made. It's the forum software.
Gunther
Yes, you are right, but in some forums the administrator can change this.
For example, in my forum I have this in the admin control panel:
Manage groups
From this panel you can administer all your usergroups. You can delete, create and edit existing groups. Furthermore, you may choose group leaders, toggle open/hidden/closed group status and set the group name and description.
User defined groups
These are groups created by you or another admin on this board. You can manage memberships as well as edit group properties or even delete the group.
Manos.
Manos,
Quote from: Manos on May 06, 2013, 04:22:14 AM
Yes, you are right, but in some forums the administrator can change this.
that's clear. On the other hand, writing some more posts and being active in the forum isn't so hard. We're a very lively forum, and it's always good to have experienced and hard-working coders like you on our side. :t
Gunther
OK, here is a test with more or less sophisticated integer code (some of us remember this piece :biggrin:)
The C code is in the Axhex2dw.c file; it is linked with an ASM source containing the timing testbed:
INT_PTR __stdcall Axhex2dw_C(char* ptc){
INT_PTR result=0;
while(*ptc){
result=((result<<4)+(*ptc&0xF)+((*ptc>>6)*9));
++ptc;
}
return result;
}
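The expression `(*ptc & 0xF) + ((*ptc >> 6) * 9)` is the whole hex-digit trick: '0'..'9' (30h..39h) have bit 6 clear and carry their value in the low nibble, while 'A'..'F' (41h..46h) and 'a'..'f' (61h..66h) have bit 6 set and a low nibble of 1..6, so adding 9 yields 10..15. A tiny sketch of just that step (my own illustration):

```c
/* Maps '0'-'9', 'A'-'F' and 'a'-'f' to 0..15. Anything else produces
   garbage, matching the "does not check input" contract of Axhex2dw. */
int hexdigit(char c)
{
    return (c & 0xF) + ((c >> 6) * 9);
}
```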
The C code was built with MSVC10 with different optimization settings; the .OBJ file included in the archive is the one with maximal optimization for performance.
OK, here is the disassembly for the C code optimized to be small:
_Axhex2dw_C@4:
00000000: 8B 54 24 04 mov edx,dword ptr [esp+4]
00000004: 8A 0A mov cl,byte ptr [edx]
00000006: 33 C0 xor eax,eax
00000008: 84 C9 test cl,cl
0000000A: 74 1E je 0000002A
0000000C: 56 push esi
0000000D: 0F BE C9 movsx ecx,cl
00000010: 8B F1 mov esi,ecx
00000012: C1 FE 06 sar esi,6
00000015: 6B F6 09 imul esi,esi,9
00000018: 83 E1 0F and ecx,0Fh
0000001B: 03 F1 add esi,ecx
0000001D: C1 E0 04 shl eax,4
00000020: 03 C6 add eax,esi
00000022: 42 inc edx
00000023: 8A 0A mov cl,byte ptr [edx]
00000025: 84 C9 test cl,cl
00000027: 75 E4 jne 0000000D
00000029: 5E pop esi
0000002A: C2 04 00 ret 4
45 bytes long.
The timings:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
43 cycles for Small 1
44 cycles for Small 2
43 cycles for Small 3
46 cycles for Small 3.1
43 cycles for Small 4
100 cycles for C version
51 cycles for Small 1
44 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
99 cycles for C version
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
46 cycles for Small 3.1
43 cycles for Small 4
99 cycles for C version
--- ok ---
Axhex2dw1 (the "Small 1") is 69 bytes long, Axhex2dw2 (the "Small 2") is 48 bytes long.
OK, now the test with maximum performance optimization for C code.
The disassembly:
_Axhex2dw_C@4:
00000000: 56 push esi
00000001: 8B 74 24 08 mov esi,dword ptr [esp+8]
00000005: 8A 0E mov cl,byte ptr [esi]
00000007: 33 C0 xor eax,eax
00000009: 84 C9 test cl,cl
0000000B: 74 20 je 0000002D
0000000D: 8D 49 00 lea ecx,[ecx]
00000010: 0F BE C9 movsx ecx,cl
00000013: 8B D1 mov edx,ecx
00000015: C1 FA 06 sar edx,6
00000018: 83 E1 0F and ecx,0Fh
0000001B: 8D 14 D2 lea edx,[edx+edx*8]
0000001E: 03 D1 add edx,ecx
00000020: 8A 4E 01 mov cl,byte ptr [esi+1]
00000023: 46 inc esi
00000024: C1 E0 04 shl eax,4
00000027: 03 C2 add eax,edx
00000029: 84 C9 test cl,cl
0000002B: 75 E3 jne 00000010
0000002D: 5E pop esi
0000002E: C2 04 00 ret 4
49 bytes long. You can see that the compiler used LEA to multiply EDX by 9. The algo in C was intentionally written so that an optimizing compiler would produce, at first look, code similar to the handwritten version - i.e. it was already speed-optimized in the HLL, and you can see that the inner loop logic is the same as in the handwritten code. So this time the timings for the C code should be very good, but...
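The `lea edx,[edx+edx*8]` instruction exploits the x86 addressing modes: base + index*8 computes EDX*9 in one instruction, without an IMUL and without touching the flags. The identity it encodes, as a sanity check (my own illustration):

```c
/* lea edx,[edx+edx*8]  ==  edx * 9: scaled-index addressing as arithmetic.
   Valid scales are 1, 2, 4, 8, so LEA covers multipliers like 3, 5, 9. */
unsigned int mul9_lea(unsigned int x)
{
    return x + x * 8;   /* base + index*scale, scale = 8 */
}
```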
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
46 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
51 cycles for Small 4
79 cycles for C version
43 cycles for Small 1
44 cycles for Small 2
43 cycles for Small 3
50 cycles for Small 3.1
43 cycles for Small 4
86 cycles for C version
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
46 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
--- ok ---
The logic is the same (of course, the algo is the same), but the implementation is different. So, as I said in my first message in the thread, sophisticated algos (even this one, which is not really all that sophisticated) are not something that even an optimizing compiler produces well. Small things - grabbing a char and then promoting it to a dword with two instructions instead of one, the compiler's strange fear of reading the same byte twice from a memory reference, etc. - make the algo slower. Still, for a program, the compiler "writes" very good code, we should agree with that :biggrin: There is really big and hard human work behind the short description "Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86" :eusa_clap:
It will be very interesting to see the results on different machines, since this time it is also a test of "which machines does Microsoft optimize its compiler for?" - i.e. on which machines VC's code performs better - so we can say "on this machine Windows (Word/Photoshop/etc.) works faster than on that one (at equal CPU frequency)! Just because it uses MSVC" :lol:
You don't have to be a genius to know that this forum is DIFFERENT from the last one that was hosted in the UK. After a lot of work I have made an archived version of the old forum available, but everybody who is a member of this forum started with a zero post count, me included as the administrator. I did not write the software, nor do I care who likes it or not; it does the job, and it is not going to be modified to suit a quirk of the few folks who have not done the work of building a new forum and archiving the old one.
The advice has already been given by more active members: make some more posts.
Quote from: Antariy on May 06, 2013, 01:23:07 PM
It will be very interesting to see the results on different machines
Hi Alex,
On paper we have the same machine but results are different:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
41 cycles for Small 1
38 cycles for Small 2
41 cycles for Small 3
41 cycles for Small 3.1
41 cycles for Small 4
75 cycles for C version
41 cycles for Small 1
49 cycles for Small 2
41 cycles for Small 3
41 cycles for Small 3.1
41 cycles for Small 4
50 cycles for C version
41 cycles for Small 1
38 cycles for Small 2
53 cycles for Small 3
41 cycles for Small 3.1
41 cycles for Small 4
51 cycles for C version
The 75 cycles peak is not an accident; it's there for every run I tried.
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
39 cycles for Small 1
42 cycles for Small 2
39 cycles for Small 3
39 cycles for Small 3.1
39 cycles for Small 4
44 cycles for C version
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
15 cycles for Small 1
16 cycles for Small 2
16 cycles for Small 3
39 cycles for Small 3.1
14 cycles for Small 4
26 cycles for C version
17 cycles for Small 1
16 cycles for Small 2
16 cycles for Small 3
16 cycles for Small 3.1
19 cycles for Small 4
26 cycles for C version
16 cycles for Small 1
17 cycles for Small 2
18 cycles for Small 3
19 cycles for Small 3.1
18 cycles for Small 4
26 cycles for C version
--- ok ---
I wonder how much influence the OS has on these timings too - for instance, whether running 32-bit code on a 64-bit OS has a penalty?
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
22 cycles for Small 1
23 cycles for Small 2
24 cycles for Small 3
24 cycles for Small 3.1
24 cycles for Small 4
28 cycles for C version
23 cycles for Small 1
24 cycles for Small 2
20 cycles for Small 3
19 cycles for Small 3.1
19 cycles for Small 4
34 cycles for C version
20 cycles for Small 1
20 cycles for Small 2
20 cycles for Small 3
20 cycles for Small 3.1
20 cycles for Small 4
33 cycles for C version
Hi, Jochen :biggrin:
I think it's maybe because of the first-time pass - the code was probably not yet in the code cache, and in the following tests of the same run your CPU already had its "food" ready. It seems that the first time through, code with superfluous instructions and poorly arranged logic takes much longer to decode and execute than on later passes, when it runs straight from the code cache. Speaking of a "real life"(tm) app, this means such code is crazily unoptimal: in a real app it will not be called millions of times in a row, so the best code is the code that also decodes fast. Then again, what do +/- 30 clocks mean if the proc is called, say, once a second... that's the true reason compilers are so popular for "general programming". But, again, the entire proggie consists of these small bits, and if every one of them is a bit faster, the entire proggie is faster... well, philosophy starts here :greensml:
It's also interesting how the different tweaks of the assembly versions of Axhex2dw perform. "Small 1" (you remember it, of course :biggrin:) still looks more or less stable in the timings. And, even being longer than the C version in terms of code size, it is still faster.
Hi, habran, thanks for testing it :t
Hi, John, thank you, too :biggrin: As for the influence of the OS - I think you're right: switching to a 32-bit execution context has a penalty under a 64-bit OS, as it does for 16-bit apps under a 32-bit OS (but since my x64 experience is small, I cannot say it with 100% certainty).
There is an old rule that occasionally applies: with a good enough algorithm, the coding does not matter as much. I have an example in mind, a hybrid sort of Robert Sedgewick's, originally written in C, and it was genuinely fast. I used it as an algorithm to test a tool I was designing and converted it directly to unoptimised assembler, removed the stack frame, dropped the instruction count, inlined all of the satellite functions - and it would not go faster than the C original.
I got a shorter, better-optimised instruction sequence and it stubbornly refused to go faster. It does not happen all that often, but it was interesting to see: basically a perfect algorithm that was highly insensitive to coding technique.
Quote from: Antariy on May 06, 2013, 01:23:07 PM
OK, now the test with maximum performance optimization for C code.
The disassembly:
_Axhex2dw_C@4:
00000000: 56 push esi
00000001: 8B 74 24 08 mov esi,dword ptr [esp+8]
00000005: 8A 0E mov cl,byte ptr [esi]
00000007: 33 C0 xor eax,eax
00000009: 84 C9 test cl,cl
0000000B: 74 20 je 0000002D
0000000D: 8D 49 00 lea ecx,[ecx]
00000010: 0F BE C9 movsx ecx,cl
00000013: 8B D1 mov edx,ecx
00000015: C1 FA 06 sar edx,6
00000018: 83 E1 0F and ecx,0Fh
0000001B: 8D 14 D2 lea edx,[edx+edx*8]
0000001E: 03 D1 add edx,ecx
00000020: 8A 4E 01 mov cl,byte ptr [esi+1]
00000023: 46 inc esi
00000024: C1 E0 04 shl eax,4
00000027: 03 C2 add eax,edx
00000029: 84 C9 test cl,cl
0000002B: 75 E3 jne 00000010
0000002D: 5E pop esi
0000002E: C2 04 00 ret 4
49 bytes long.
Hi Alex,
I invested "considerable knowledge, skill, and half an hour of my precious time, to get a near-optimal solution in assembly language".
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
39 cycles for Small 1
42 cycles for Small 2
39 cycles for Small 3
39 cycles for Small 3.1
39 cycles for Small 4
46 cycles for C version
38 cycles for C mod JJ
I hope a 17% improvement on the optimising C compiler qualifies as a "near-optimal solution" ;-)
Axhex2dw_CJ proc src ; old C original modified by JJ
push esi
mov esi, dword ptr [esp+8]
movsx ecx, byte ptr [esi]
xor eax, eax
jmp @go
; test ecx, ecx
; je bye
@@: ; movsx ecx, cl ; now superfluous
mov edx, ecx
sar edx, 6
and ecx, 0Fh
lea edx, [edx+edx*8]
add edx, ecx
movsx ecx, byte ptr [esi+1]
shl eax, 4
inc esi
add eax, edx
@go: test ecx, ecx
jne @B
bye: pop esi
ret 4
Axhex2dw_CJ endp
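The key change in Axhex2dw_CJ is loop rotation: the next character is sign-extended once, right before the loop-closing test, so the `movsx` inside the body becomes superfluous and the loop is entered by jumping straight to the test. In C terms the restructuring looks roughly like this (my own sketch, not JJ's code):

```c
/* Loop rotated a la Axhex2dw_CJ: the char is promoted exactly once per
   iteration, just before the closing test, instead of at the top of the
   loop body. */
unsigned int hex2dw_rotated(const char *s)
{
    unsigned int r = 0;
    int c = *s;                        /* movsx ecx, byte ptr [esi] */
    while (c) {                        /* @go: test ecx,ecx / jne @B */
        r = (r << 4) + (unsigned int)((c & 0xF) + ((c >> 6) * 9));
        c = *++s;                      /* movsx ecx, byte ptr [esi+1] / inc esi */
    }
    return r;
}
```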
Quote from: hutch-- on May 06, 2013, 01:43:35 PM
I did not write the software, nor do I care who likes it or not; it does the job, and it is not going to be modified to suit a quirk of the few folks who have not done the work of building a new forum and archiving the old one.
The advice has already been given by more active members: make some more posts.
I know how forum software works, and I know how to program forum software with VBasic and Java scripts.
When I wrote:
P.S.
Below my name, on the left of your forum, it says:
Manos
New Member.
But I have been a member since 2004, one of the first.
It would be better to write:
Old Member !!!
I was just joking.
But some people did not understand the spirit of my words.
If I wanted the self-exaltation of seeing my name with stars,
I could post good morning, good afternoon and good night in this forum every day.
Manos.
P.S.
Your old forum in the U.K. was much faster.
Quote from: hutch-- on May 06, 2013, 05:25:12 PM
I have an example in mind, a hybrid sort of Robert Sedgewick originally written in C and it was genuinely fast.
Is the code listing available somewhere on the internet?
Or is it available in his book, i.e., Algorithms in C?
Hi Jochen :t
In that post I also noted that at "first look" the compiler does a good job, but some small bits seem to be ancient assumptions in its optimization techniques. Probably, as Hutch said, this is just a case where the algo is at the edge of its performance - well, its creation grew in front of your eyes, you remember :biggrin: So this is a very good example of Human vs compiler: the same algo => the same inner loop logic in HLL and ASM => the same inner loop code in the compiler's output => and STILL the Human can improve on the compiler's work for a particular task and/or hardware. And you have just demonstrated it :t
For my CPU the timings almost do not change (incredible!) - I ran it multiple times; these timings are the smallest:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
43 cycles for Small 1
44 cycles for Small 2
43 cycles for Small 3
46 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
43 cycles for Small 1
44 cycles for Small 2
37 cycles for Small 3
47 cycles for Small 3.1
46 cycles for Small 4
79 cycles for C version
72 cycles for C mod JJ
43 cycles for Small 1
44 cycles for Small 2
43 cycles for Small 3
46 cycles for Small 3.1
46 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
But your CPU obviously does not like the longer, superfluous code generated by the compiler; it still likes the Human's code :biggrin: It will be interesting to see how more modern CPUs run it.
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
19 cycles for Small 1
10 cycles for Small 2
39 cycles for Small 3
38 cycles for Small 3.1
19 cycles for Small 4
26 cycles for C version
19 cycles for C mod JJ
17 cycles for Small 1
17 cycles for Small 2
18 cycles for Small 3
18 cycles for Small 3.1
18 cycles for Small 4
26 cycles for C version
20 cycles for C mod JJ
16 cycles for Small 1
19 cycles for Small 2
19 cycles for Small 3
18 cycles for Small 3.1
18 cycles for Small 4
25 cycles for C version
19 cycles for C mod JJ
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
Behold and see...
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
17 cycles for Small 1
18 cycles for Small 2
19 cycles for Small 3
18 cycles for Small 3.1
19 cycles for Small 4
25 cycles for C version
19 cycles for C mod JJ
18 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
11 hsz2dw2 (unrolled 4 times)
17 cycles for Small 1
23 cycles for Small 2
16 cycles for Small 3
17 cycles for Small 3.1
23 cycles for Small 4
24 cycles for C version
19 cycles for C mod JJ
16 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
13 hsz2dw2 (unrolled 4 times)
20 cycles for Small 1
15 cycles for Small 2
19 cycles for Small 3
19 cycles for Small 3.1
19 cycles for Small 4
26 cycles for C version
19 cycles for C mod JJ
16 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
13 hsz2dw2 (unrolled 4 times)
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
unsigned int hsz2dw2(char* psz){
const static unsigned char lut[256] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,0
,10,11,12,13,14,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,10,11,12,13,14,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 };
register unsigned int c;
register unsigned int r = 0;
register unsigned char* p = (unsigned char*) psz;
while (1) {
    if (!(c = *p))
        break;
    r <<= 4;
    r += lut[c];
    p++;
    if (!(c = *p))
        break;
    r <<= 4;
    r += lut[c];
    p++;
    if (!(c = *p))
        break;
    r <<= 4;
    r += lut[c];
    p++;
    if (!(c = *p))
        break;
    r <<= 4;
    r += lut[c];
    p++;
}
return r;
}
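qWord's 256-byte table is written out literally; the same mapping can be built at run time, which makes it easier to see what the LUT encodes (a sketch of my own for illustration, not qWord's code - his table is fully initialized at compile time, while this one is rebuilt cheaply on each call):

```c
/* Builds the same 256-entry table hsz2dw2 hardcodes, then parses a hex
   string with it: one table load per character, no arithmetic on the char. */
unsigned int hex2dw_lut(const char *psz)
{
    static unsigned char lut[256];
    for (int i = '0'; i <= '9'; i++) lut[i] = (unsigned char)(i - '0');
    for (int i = 'A'; i <= 'F'; i++) lut[i] = (unsigned char)(i - 'A' + 10);
    for (int i = 'a'; i <= 'f'; i++) lut[i] = (unsigned char)(i - 'a' + 10);

    unsigned int r = 0;
    for (const unsigned char *p = (const unsigned char *)psz; *p; p++)
        r = (r << 4) + lut[*p];
    return r;
}
```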
:biggrin:
Refuses to optimise for AMD ;-)
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
39 cycles for Small 1
42 cycles for Small 2
39 cycles for Small 3
39 cycles for Small 3.1
39 cycles for Small 4
46 cycles for C version
38 cycles for C mod JJ
47 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
43 hsz2dw2 (unrolled 4 times)
:biggrin:
> I Know how forums software works and I know to program forum software with VBasic and Java scripts.
You would be surprised just how badly it would run on this 64-bit Unix server. The forum is written in PHP, not VBScript and Java, and NO, the forum software will not be modified.
Hi,
Three more data points for you.
Cheers,
(SSE1)
61 cycles for Small 1
57 cycles for Small 2
59 cycles for Small 3
60 cycles for Small 3.1
60 cycles for Small 4
71 cycles for C version
65 cycles for C mod JJ
57 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
28 hsz2dw2 (unrolled 4 times)
60 cycles for Small 1
59 cycles for Small 2
59 cycles for Small 3
60 cycles for Small 3.1
62 cycles for Small 4
71 cycles for C version
63 cycles for C mod JJ
55 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
29 hsz2dw2 (unrolled 4 times)
60 cycles for Small 1
60 cycles for Small 2
59 cycles for Small 3
60 cycles for Small 3.1
60 cycles for Small 4
71 cycles for C version
63 cycles for C mod JJ
55 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
28 hsz2dw2 (unrolled 4 times)
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
120 cycles for Small 1
143 cycles for Small 2
125 cycles for Small 3
132 cycles for Small 3.1
131 cycles for Small 4
117 cycles for C version
107 cycles for C mod JJ
101 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
100 hsz2dw2 (unrolled 4 times)
119 cycles for Small 1
141 cycles for Small 2
136 cycles for Small 3
125 cycles for Small 3.1
123 cycles for Small 4
117 cycles for C version
106 cycles for C mod JJ
102 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
100 hsz2dw2 (unrolled 4 times)
117 cycles for Small 1
139 cycles for Small 2
130 cycles for Small 3
125 cycles for Small 3.1
127 cycles for Small 4
117 cycles for C version
106 cycles for C mod JJ
102 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
100 hsz2dw2 (unrolled 4 times)
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
41 cycles for Small 1
38 cycles for Small 2
45 cycles for Small 3
41 cycles for Small 3.1
41 cycles for Small 4
51 cycles for C version
56 cycles for C mod JJ
31 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
24 hsz2dw2 (unrolled 4 times)
41 cycles for Small 1
38 cycles for Small 2
41 cycles for Small 3
41 cycles for Small 3.1
41 cycles for Small 4
51 cycles for C version
50 cycles for C mod JJ
31 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
24 hsz2dw2 (unrolled 4 times)
41 cycles for Small 1
38 cycles for Small 2
41 cycles for Small 3
41 cycles for Small 3.1
41 cycles for Small 4
51 cycles for C version
50 cycles for C mod JJ
31 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
24 hsz2dw2 (unrolled 4 times)
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
Hi qWord :t
Yeah, unrolling it is the way to make it faster, but the tested algo is not unrolled - that's the point. It's "classic", more or less small, looped code; these characteristics are intentional - this was not a contest but rather a test :biggrin:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
40 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
31 hsz2dw2 (unrolled 4 times)
43 cycles for Small 1
45 cycles for Small 2
66 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
37 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
31 hsz2dw2 (unrolled 4 times)
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
79 cycles for C version
79 cycles for C mod JJ
37 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
31 hsz2dw2 (unrolled 4 times)
48 bytes for Axhex2dw_C2
43 bytes for Axhex2dw_CJ
ABCDEF01 returned
--- ok ---
Actually, the testbed I posted was a trimmed version of one I made some time ago... I will post it now - it contains the procs from the earlier contest here (I used Axhex2dw just because it is the fastest (at least till now) of the tested hex2dw procs with these characteristics: case-insensitive, does not check input, looped (i.e. one loop iteration per digit - not unrolled at all); and it looks like it is copyrighted by me, at least no one has disputed the rights for ~3 years :lol:)
OK, here are the timings for the attached archive (it is the old testbed):
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
25 cycles for Fast version
27 cycles for Fast version under AMD
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
28 cycles for MMX 1
28 cycles for MMX 2
32 cycles for SSE1
Other's Versions:
48 cycles for Axhex2dw improved by Hutch (1)
83 cycles for Axhex2dw improved by Hutch (2)
28 cycles for Lingo's SSE version
24 cycles for Lingo's BIG integer version
23 cycles for Jochen's WORD-Indexed version
27 cycles for Dave's version (with minor changes)
25 cycles for Fast version
27 cycles for Fast version under AMD
43 cycles for Small 1
45 cycles for Small 2
43 cycles for Small 3
59 cycles for Small 3.1
43 cycles for Small 4
28 cycles for MMX 1
28 cycles for MMX 2
30 cycles for SSE1
Other's Versions:
48 cycles for Axhex2dw improved by Hutch (1)
83 cycles for Axhex2dw improved by Hutch (2)
28 cycles for Lingo's SSE version
24 cycles for Lingo's BIG integer version
23 cycles for Jochen's WORD-Indexed version
27 cycles for Dave's version (with minor changes)
25 cycles for Fast version
30 cycles for Fast version under AMD
43 cycles for Small 1
116 cycles for Small 2
43 cycles for Small 3
47 cycles for Small 3.1
43 cycles for Small 4
28 cycles for MMX 1
28 cycles for MMX 2
46 cycles for SSE1
Other's Versions:
48 cycles for Axhex2dw improved by Hutch (1)
83 cycles for Axhex2dw improved by Hutch (2)
28 cycles for Lingo's SSE version
24 cycles for Lingo's BIG integer version
23 cycles for Jochen's WORD-Indexed version
27 cycles for Dave's version (with minor changes)
==========
Codesizes:
Axhex2dw_Unrolled: 396
Axhex2dw_Unrolled_AMD: 396
Axhex2dw1 - 1: 69
Axhex2dw2 - 2: 48
Axhex2dw3 - 3: 57
Axhex2dw3_1 - 3.1: 56
Axhex2dw3 - 4: 61
Axhex2dw_MMX: 128
Axhex2dw_MMX2: 160
Axhex2dw_SSE: 160
Alex_Short_Hutch: 59
Axhex2dw_Hutch2: 54
Hex2dwLingoSSE: 160
lingo_htodw: 1950
ax_jj_htodw: 174
krbhtodw: 547
--- ok ---
krbhtodw - the author is Dave (KeepingRealBusy), with minor changes made with his permission. It's the most universal proc: it checks the input and can process "ignorant chars". It's lookup-table based.
The fastest GPR code is Jochen's (jj2007) ax_jj_htodw - a word-indexed lookup table.
All versions not listed under "Other's Versions" are mine, but when posting in this thread I excluded every non-GPR, unrolled and/or lookup-table based version. Well, new CPUs have been released since then, and maybe it's interesting to test all these procs again :biggrin:
BTW: Jochen's code is 174 bytes long AND its lookup table is initialized once and does not take space in the EXE, so it's not only the fastest but also the smallest of the unrolled versions (the size includes the hex2dw code plus the table-initialization code).
OK - I obviously missed the "spirit" of this thread...
Hi Alex,
here are the timings for your 32Alex's_hex2dw.exe:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
26 cycles for Fast version
29 cycles for Fast version under AMD
50 cycles for Small 1
49 cycles for Small 2
51 cycles for Small 3
54 cycles for Small 3.1
51 cycles for Small 4
14 cycles for MMX 1
15 cycles for MMX 2
15 cycles for SSE1
Other's Versions:
54 cycles for Axhex2dw improved by Hutch (1)
62 cycles for Axhex2dw improved by Hutch (2)
10 cycles for Lingo's SSE version
19 cycles for Lingo's BIG integer version
17 cycles for Jochen's WORD-Indexed version
35 cycles for Dave's version (with minor changes)
29 cycles for Fast version
32 cycles for Fast version under AMD
48 cycles for Small 1
52 cycles for Small 2
54 cycles for Small 3
54 cycles for Small 3.1
50 cycles for Small 4
11 cycles for MMX 1
15 cycles for MMX 2
15 cycles for SSE1
Other's Versions:
65 cycles for Axhex2dw improved by Hutch (1)
60 cycles for Axhex2dw improved by Hutch (2)
10 cycles for Lingo's SSE version
19 cycles for Lingo's BIG integer version
17 cycles for Jochen's WORD-Indexed version
34 cycles for Dave's version (with minor changes)
29 cycles for Fast version
31 cycles for Fast version under AMD
48 cycles for Small 1
53 cycles for Small 2
54 cycles for Small 3
54 cycles for Small 3.1
54 cycles for Small 4
14 cycles for MMX 1
15 cycles for MMX 2
15 cycles for SSE1
Other's Versions:
55 cycles for Axhex2dw improved by Hutch (1)
62 cycles for Axhex2dw improved by Hutch (2)
10 cycles for Lingo's SSE version
19 cycles for Lingo's BIG integer version
17 cycles for Jochen's WORD-Indexed version
34 cycles for Dave's version (with minor changes)
==========
Codesizes:
Axhex2dw_Unrolled: 396
Axhex2dw_Unrolled_AMD: 396
Axhex2dw1 - 1: 69
Axhex2dw2 - 2: 48
Axhex2dw3 - 3: 57
Axhex2dw3_1 - 3.1: 56
Axhex2dw3 - 4: 61
Axhex2dw_MMX: 128
Axhex2dw_MMX2: 160
Axhex2dw_SSE: 160
Alex_Short_Hutch: 59
Axhex2dw_Hutch2: 54
Hex2dwLingoSSE: 160
lingo_htodw: 1950
ax_jj_htodw: 174
krbhtodw: 547
--- ok ---
Quote from: qWord on May 07, 2013, 12:53:42 AM
OK - I obviously missed the "spirit" of this thread...
:biggrin: as if the subject has never come up before
(http://rationalwiki.org/w/images/7/75/Deadhorse.gif)
Quote from: qWord on May 07, 2013, 12:53:42 AM
OK - I obviously missed the "spirit" of this thread...
Hey, your code was actually quite good. Even if your 'piler refuses to optimise for my AMD ;)
Quote from: dedndave on May 07, 2013, 01:16:18 AM
:biggrin: as if the subject has never come up before
(http://rationalwiki.org/w/images/7/75/Deadhorse.gif)
oh yes, it's a very new topic. :lol: :lol: :lol:
Gunther
Quote from: qWord on May 06, 2013, 09:33:07 PM
Behold and see...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
41 cycles for Small 1
38 cycles for Small 2
40 cycles for Small 3
42 cycles for Small 3.1
41 cycles for Small 4
50 cycles for C version
53 cycles for C mod JJ
58 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
24 hsz2dw2 (unrolled 4 times)
41 cycles for Small 1
39 cycles for Small 2
41 cycles for Small 3
57 cycles for Small 3.1
23 cycles for Small 4
51 cycles for C version
53 cycles for C mod JJ
31 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
24 hsz2dw2 (unrolled 4 times)
43 cycles for Small 1
38 cycles for Small 2
41 cycles for Small 3
42 cycles for Small 3.1
41 cycles for Small 4
50 cycles for C version
51 cycles for C mod JJ
31 hsz2dw (Microsoft 32-Bit C/C++ optimization compiler v16.00.40219.01)
24 hsz2dw2 (unrolled 4 times)
Well, a LUT is difficult to beat ;-)
:t
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz (SSE4)
7 cycles for Fast version
8 cycles for Fast version under AMD
16 cycles for Small 1
18 cycles for Small 2
19 cycles for Small 3
19 cycles for Small 3.1
19 cycles for Small 4
4 cycles for MMX 1
5 cycles for MMX 2
4 cycles for SSE1
Other's Versions:
23 cycles for Axhex2dw improved by Hutch (1)
22 cycles for Axhex2dw improved by Hutch (2)
3 cycles for Lingo's SSE version
7 cycles for Lingo's BIG integer version
21 cycles for Jochen's WORD-Indexed version
12 cycles for Dave's version (with minor changes)
11 cycles for Fast version
14 cycles for Fast version under AMD
17 cycles for Small 1
19 cycles for Small 2
18 cycles for Small 3
19 cycles for Small 3.1
18 cycles for Small 4
5 cycles for MMX 1
4 cycles for MMX 2
4 cycles for SSE1
Other's Versions:
22 cycles for Axhex2dw improved by Hutch (1)
21 cycles for Axhex2dw improved by Hutch (2)
2 cycles for Lingo's SSE version
7 cycles for Lingo's BIG integer version
5 cycles for Jochen's WORD-Indexed version
12 cycles for Dave's version (with minor changes)
10 cycles for Fast version
13 cycles for Fast version under AMD
17 cycles for Small 1
17 cycles for Small 2
18 cycles for Small 3
19 cycles for Small 3.1
19 cycles for Small 4
5 cycles for MMX 1
6 cycles for MMX 2
5 cycles for SSE1
Other's Versions:
23 cycles for Axhex2dw improved by Hutch (1)
22 cycles for Axhex2dw improved by Hutch (2)
4 cycles for Lingo's SSE version
7 cycles for Lingo's BIG integer version
6 cycles for Jochen's WORD-Indexed version
11 cycles for Dave's version (with minor changes)
==========
Codesizes:
Axhex2dw_Unrolled: 396
Axhex2dw_Unrolled_AMD: 396
Axhex2dw1 - 1: 69
Axhex2dw2 - 2: 48
Axhex2dw3 - 3: 57
Axhex2dw3_1 - 3.1: 56
Axhex2dw3 - 4: 61
Axhex2dw_MMX: 128
Axhex2dw_MMX2: 160
Axhex2dw_SSE: 160
Alex_Short_Hutch: 59
Axhex2dw_Hutch2: 54
Hex2dwLingoSSE: 160
lingo_htodw: 1950
ax_jj_htodw: 174
krbhtodw: 547
--- ok ---
Jochen,
Quote from: jj2007 on May 07, 2013, 03:31:40 AM
Well, a LUT is difficult to beat ;-)
that's true, but it's old wisdom.
Gunther
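For readers following along, here is a minimal sketch of the table-lookup idea behind "a LUT is difficult to beat". This is my own illustration, not code from the thread (the names `hexlut` and `hex2dw` are assumptions): one table load per character replaces the compare-and-branch range checks of the small versions.

```c
#include <stdint.h>

/* 256-entry lookup table: hex digit value, or 0xFF for "not a hex digit". */
static uint8_t hexlut[256];

static void init_hexlut(void)
{
    for (int i = 0; i < 256; i++)
        hexlut[i] = 0xFF;                     /* invalid marker */
    for (int c = '0'; c <= '9'; c++)
        hexlut[c] = (uint8_t)(c - '0');
    for (int c = 'A'; c <= 'F'; c++)
        hexlut[c] = (uint8_t)(c - 'A' + 10);
    for (int c = 'a'; c <= 'f'; c++)
        hexlut[c] = (uint8_t)(c - 'a' + 10);
}

/* Convert a hex string to a DWORD; stops at the first non-hex character. */
static uint32_t hex2dw(const char *s)
{
    uint32_t val = 0;
    uint8_t digit;
    while ((digit = hexlut[(uint8_t)*s]) != 0xFF) {
        val = (val << 4) | digit;             /* one load, no range compares per digit */
        s++;
    }
    return val;
}
```

After `init_hexlut()`, `hex2dw("1A2B")` yields 0x1A2B. Jochen's WORD-indexed version presumably pushes the same idea further by indexing a larger table two characters at a time, trading table size for fewer iterations.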
Quote from: qWord on May 07, 2013, 12:53:42 AM
OK - I obviously missed the "spirit" of this thread...
No, it's OK, your code is good - fast and quite readable, really :t Thanks for posting it :biggrin: It was an informative test, too. So now in this strange thread (@Dave - :biggrin:) we have rolled and unrolled C versions (and you have provided both) and ASM versions of the code, mixed into some crazy testbeds :biggrin:
Quote from: habran on May 07, 2013, 05:43:20 AM
:t
Thank you,
habran :t
If your OS is 32 bit, then it really seems like the best idea to run 32 bit proggies under a 32 bit OS :biggrin:
It is 64 bit Win 7 :biggrin:
IMO there is no penalty for running 32 bit on 64, but 64 is certainly faster because of 64 bit programming :t
That was my assumption, just because your timing results seem to be smaller than others' with the same CPU :biggrin:
It is Toshiba Qosmio 16 gig ram laptop with 2.3 gig i7 and 64 bit Windows 7 Home with AVX :t
I think that qWord has got the same one
It is a great toy :bgrin:
Hi habran,
Quote from: habran on May 07, 2013, 02:18:51 PM
It is Toshiba Qosmio 16 gig ram laptop with 2.3 gig i7 and 64 bit Windows 7 Home with AVX :t
I think that qWord has got the same one
It is a great toy :bgrin:
it's probably the Ivy Bridge, isn't it?
Gunther
Yes Gunther, the Ivy Bridge it is :biggrin:
Quote
I already posted this before
here are specifications:
Intel® Core™ i7-3610QM Processor
(6M Cache, up to 3.30 GHz)
Specifications
Essentials
Status Launched
Launch Date Q2'12
Processor Number i7-3610QM
# of Cores 4
# of Threads 8
Clock Speed 2.3 GHz
Max Turbo Frequency 3.3 GHz
Intel® Smart Cache 6 MB
Bus/Core Ratio 23
DMI 5 GT/s
Instruction Set 64-bit
Instruction Set Extensions AVX
Embedded Options Available No
Lithography 22 nm
Max TDP 45 W
Recommended Customer Price TRAY: $378.00
Hi habran,
Quote from: habran on May 07, 2013, 11:20:17 PM
Quote
I already posted this before
here are specifications:
Intel® Core™ i7-3610QM Processor
(6M Cache, up to 3.30 GHz)
Specifications
Essentials
Status Launched
Launch Date Q2'12
Processor Number i7-3610QM
# of Cores 4
# of Threads 8
Clock Speed 2.3 GHz
Max Turbo Frequency 3.3 GHz
Intel® Smart Cache 6 MB
Bus/Core Ratio 23
DMI 5 GT/s
Instruction Set 64-bit
Instruction Set Extensions AVX
Embedded Options Available No
Lithography 22 nm
Max TDP 45 W
Recommended Customer Price TRAY: $378.00
an excellent machine. Runs Windows 64 as the only OS? How did you manage the new "BIOS"?
Gunther
Thanks Gunther
I did not have to touch anything
I just installed my tools and copied my projects :biggrin:
Hi habran,
Quote from: habran on May 08, 2013, 08:44:43 AM
I did not have to touch anything
I just installed my tools and copied my projects :biggrin:
so, do you have an EFI drive, too? Or is your hard disk not over 2.2 TB in size?
Gunther
Gunther, I have C drive of 685 GB and D drive 931 GB
for my purpose it is more than enough :t
habran,
Quote from: habran on May 09, 2013, 05:58:31 AM
Gunther, I have C drive of 685 GB and D drive 931 GB
for my purpose it is more than enough :t
Okay, you can work with the original BIOS. My disk is over 2 TB and I had to deal with EFI. Installing different operating systems isn't pure joy.
But we shouldn't discuss our hardware equipment any longer, because the thread has another goal.
Gunther
If I were you I would change it to two or three smaller drives :bgrin:
Hi habran,
Quote from: habran on May 09, 2013, 07:01:57 AM
If I were you I would change it to two or three smaller drives :bgrin:
good proposal, and that is exactly what I've done. That's when the trouble started. Here is a source for further reading. (https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface)
Gunther
what trouble? :icon_eek:
I've read about UEFI and it pisses me off, WTF is that :icon13:
I don't understand why we would need that crap :dazzled:
your machine is 3.4 gig and it's supposed to be lightning fast
you pay big money to get the best thing and then you get some crap :(
that is not tolerable
I stopped buying desktops, I find laptops more suitable for everything
and if I need to go somewhere I can take it with me easily, together with my mobile
internet connection
I hate sitting at the desk
with the laptop I can enjoy the comfort of a recliner
I put a board over the chair and the laptop on it, and a cup of long black ;)
Hi habran,
Quote from: habran on May 09, 2013, 11:26:14 AM
what trouble? :icon_eek:
I've read about UEFI and it pisses me off, WTF is that :icon13:
I don't understand why we would need that crap :dazzled:
The point is: if you've got a machine (no matter whether desktop or laptop) with a hard disk over 2.2 TB, you can't manage it with the old master boot record. In that case you need UEFI with GPT. It has advantages: you no longer need logical drives, because every drive is primary. UEFI leaves a dummy MBR on your disk. But there are disadvantages, too. For example, Windows XP and Windows 7 (32 bit) are not EFI aware (very simply: no appropriate drivers), so you can't install these systems, for example, alongside Windows 8. Moreover, if you would like to run Windows 8 and Linux in parallel, you'll need an EFI aware boot manager. It took me a week to clear up those questions. I've now got Windows 7 and Linux installed (both 64 bit) and the 32 bit versions as virtual machines. My boot manager is GRUB 2.
Gunther
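As a side note, the 2.2 TB figure follows directly from the MBR partition format: a partition's start and length are stored as 32-bit sector counts, and a traditional sector is 512 bytes. A quick sanity check (my own arithmetic, not from the thread; the function name is mine):

```c
#include <stdint.h>

/* Largest byte count an MBR partition entry can describe:
   a 32-bit LBA sector count times the traditional 512-byte sector. */
static uint64_t mbr_limit_bytes(void)
{
    const uint64_t max_sectors = 4294967296ULL;  /* 2^32 sectors */
    const uint64_t sector_size = 512;            /* bytes per sector */
    return max_sectors * sector_size;            /* 2,199,023,255,552 bytes */
}
```

That product is about 2.2 TB in decimal units (exactly 2 TiB), which is why disks above that size need GPT: its partition entries use 64-bit LBAs instead.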
Windows 8 sucks :(
I am happy with Windows 7, 64 bit
If I want to buy a new machine I will probably wait for Windows 9 ;)
thanks for clarification Gunther :t
i did manage to find a GPT driver from Paragon that will work under XP :P
the question i have is....
when it comes to hard drives, how big is too big ?
personally, i think if you exceed 2 Tb, then you have too many eggs in one basket, anyways - lol
let's face it - when the drive crashes, how much stuff do you want to lose
better to have (3) 1 Tb drives that work and 1 that doesn't
than to have a 4 Tb drive that doesn't
Quote from: dedndave on May 09, 2013, 10:21:40 PM
personally, i think if you exceed 2 Tb, then you have too many eggs in one basket, anyways - lol
I agree with you totally :t
Quote from: dedndave on May 09, 2013, 10:21:40 PM
better to have (3) 1 Tb drives that work and 1 that doesn't
than to have a 4 Tb drive that doesn't
I agree with you :t
Hi Erol,
Quote from: Vortex on May 10, 2013, 03:23:46 AM
Quote from: dedndave on May 09, 2013, 10:21:40 PM
better to have (3) 1 Tb drives that work and 1 that doesn't
than to have a 4 Tb drive that doesn't
I agree with you :t
the point is: the 4 TB hard drive will work with UEFI and GPT. Unfortunately, that's the future, because Intel, AMD, Microsoft, Hewlett-Packard and the other big players are "fans" of that idea.
Gunther
*&^%$#@!~ :(
Hi Gunther,
Dave's approach is very logical. A high capacity hard drive can be a big risk. Splitting the data across multiple drives is safer.
Hi Erol,
Quote from: Vortex on May 10, 2013, 06:02:07 AM
Dave's approach is very logical. A high capacity hard drive can be a big risk. Splitting the data across multiple drives is safer.
no doubt about it. But will it be possible in the future to buy smaller hard disks? My desktop PC has a 2.2 TB hard disk. I needed it because I want to learn about the new Ivy Bridge architecture. So, what now?
Gunther
I have had the solution for years: multi-partition machines with 4 hard disks. My now old Core2 quad has 2 x 1 TB drives and 2 x 2 TB drives split into 12 partitions; the first 2 have 259 gig partitions and the last 2 have 500 gig partitions. As a safety margin I keep another XP machine that will read any of the disks from the quad if it ever goes bang, and I can also read the disks on one of the old Win2000 machines.
If the i7 64 bit box ever goes bang I am in trouble, as they have a different disk format that a 32 bit OS cannot read.