I am a FASM user who has been playing around with MASM for a couple of days.
Just one question...
I need for some reason a very large buffer to load a file from disk and process it.
No problem, but ...
It takes an awfully long time to assemble the asm file, and the created exe file is actually too large for what it does.
Is there a way in MASM to simply reserve the needed bytes, like I can do in FASM using e.g.
the_buffer rb 5000000
Thank you in advance !
Hi clamicun,
First things first: Welcome to the forum :icon14:
Your problem is a well-known Masm bug with dup; below is a simple example. It will take over 2 seconds to assemble on an i5.
Workarounds:
a) use heap buffers instead, see halloc and hfree in \Masm32\help\hlhelp.chm, Macro categories/Memory Allocation
b) use JWasm, it's a perfect Masm clone and does not have this bug, and is much better and faster anyway:
- direct download link (http://sourceforge.net/projects/jwasm/files/latest/download)
- the only file that you need is JWasm.exe; extract to \Masm32\bin, then either
1. rename ML.exe to ML_old.exe, then JWasm.exe to ML.exe or
2. instruct your IDE to use JWasm.exe instead of ML.exe *)
include \masm32\include\masm32rt.inc
.data?
somebuffer db 500000 dup(?)
.code
start: print "ok"
exit
end start
*) For example, the MasmBasic IDE (http://masm32.com/board/index.php?topic=94.0) will simply tell you "You need \Masm32\bin\JWasm.exe", i.e. no need to rename ml.exe etc
i seem to recall a couple other work-arounds
one is to use ORG
buffer db ? ;create a 100004 byte buffer
ORG buffer+100000
db 4 dup(?)
another is to use several smaller defines
i forget what the size is that MASM starts to screw up, but let's say it's 64 KB
buffer db 65536 dup(?)
db 65536 dup(?)
db 65536 dup(?)
another would be to acquire MASM version 8.x
as i recall, the bug is fixed :biggrin:
Thank you guys !
jj2007 - yes ok. I installed JWASM. Compiles the file in 3 seconds. MASM needs 35 !
The exe file size does not change, of course: 900 KB. FASM creates a 30 KB file.
PS.
This verification routine is absolutely boring.
With FASM you usually put your RB stuff at the end so it doesn't add to the EXE size.
With MASM you put it into a special section - .data? (uninitialized data). This section doesn't actually exist in the EXE image.
Quote from: clamicun on November 30, 2014, 03:35:07 PM
jj2007 - yes ok. I installed JWASM. Compiles the file in 3 seconds. MASM needs 35 !
3 seconds?? Must be a fairly big file. How many lines? FASM source, and compiles neatly with JWasm? Sounds like a miracle ;-)
My biggest source, with 16,000+ lines, takes exactly one second to build on an i5.
Quote
The exe file size does not change, of course: 900 KB. FASM creates a 30 KB file.
As Sinsi rightly wrote, just add a "?":
.data?
mybuffer db 800000 dup(?)
Quote
This verification routine is absolutely boring.
The anti-troll thing? You need to pass it only once :biggrin:
Here are two very useful links for you: old (http://www.masmforum.com/board/index.php?action=search;advanced) and new (http://masm32.com/board/index.php?action=search;advanced;search=) forum search.
Hi clamicun,
welcome to the forum.
Gunther
Thank you guys a lot !
sinsi - I could have come to that conclusion myself. Now the exefile is compiled in mseconds and the size is 24 KB !!
Your dog is marvelous - I had a couple of dachshunds in my life.
jj2007 - This jWasm compiler is lightyears ahead of ml.exe
Gunter hi - you from Germany like myself ?
-------------------------------------
One more question please...
Can someone please give me an explanation on this construction - I do not get it.
What sort of handle is it ?
mov edi, lparam
mov eax, [edi.NMHDR].hwndFrom ??
Thank you guys.
PS:
The verification routine is getting disgusting - no way to jump it ?
WM_NOTIFY is typically sent from a child (usually a control, like a button or edit box, etc)
http://msdn.microsoft.com/en-us/library/windows/desktop/bb775583%28v=vs.85%29.aspx
if you notice on that page, the NMHDR structure is a link...
http://msdn.microsoft.com/en-us/library/windows/desktop/bb775514%28v=vs.85%29.aspx
to understand better how it works, it might be best to look up the specific control type
you haven't told us what type of control it is
Clamicum,
Quote from: clamicun on December 01, 2014, 04:21:18 AM
Gunter hi - you from Germany like myself ?
yes indeed, I stay in Berlin.
Gunther
Quote from: clamicun on November 30, 2014, 10:02:03 AM
I need for some reason a very large buffer to load a file from disk and process it.
While this has nothing to do with your original question, using a memory-mapped file would probably be both easier and faster (depending on your needs) than allocating a static buffer and reading the whole thing in at once. Or, failing that, reading it in as 4K chunks into two separate buffers and swapping between which one you are processing.
-r
Quote from: clamicun on December 01, 2014, 04:21:18 AM
Your dog is marvelous - I had a couple of dachshunds in my life.
My doberman takes offence (well she would if she was still alive :biggrin:)
> well she would if she was still alive
Is that why she was so good looking ? :biggrin:
sinsi
I apologize to your late doberman.
But very good that you did not cut her ears.
Dachshunds and dobermans (dobermen ??) are in fact quite similar - except for the size.
----------
dedndave
Yes I see - hwndFrom is defined in the NMHDR structure
redskull
Indeed - chunking makes it a bit faster - mseconds ?
Nice day to you all
PS.
No one will answer about this verification routine ?
Quote from: clamicun on December 01, 2014, 11:42:03 PM
Chunking makes it a bit faster - mseconds ?
Depends on many factors. For example, reading 960kB of \Masm32\include\Windows.inc completely into a heapalloc'ed buffer takes about one millisecond on my 500 Euro notebook. Surely that can be made much faster with memory-mapped files ;-)
In short: check whether it's worth investing your time. But learning never hurts, and memory-mapped files are an interesting exercise anyway.
Quote
No one will answer about this verification routine ?
For Christmas you'll be given free access as a present. Maybe Hutch will even play St. Nicholas :biggrin:
Quote from: clamicun on December 01, 2014, 11:42:03 PM
Indeed - chunking makes it a bit faster - mseconds ?
As was said, a lot depends on how you are using it, but allocating a huge buffer and filling it is usually the worst possible way. Hard drives read by sector (512 bytes) and Windows organizes them by cluster (4096 bytes), so if you do a 5MB blocking read, your program is going to grind to a halt while the drive spins around, transfers each sector, and then searches around for the next cluster, and does this over and over until the buffer is full.
If you are accessing the data in a sequential manner (start with byte 1, and process until the end), then all this searching time is wasted. By establishing two buffers of the cluster size, you start processing the data after the first cluster is read in, while windows goes about filling the next one in the background. In a perfect world you never stall for data except for the first read, and you essentially eliminate all your loading time entirely.
In the other usage pattern, you need to access any given byte of the file at any given time, so there is no way around having the whole thing loaded. But by memory-mapping the file, windows doesn't load the appropriate 4K chunk until you try and access it, which at the very least will improve responsiveness, since the user doesn't have to sit through an awful loading screen while you read up every byte off the disk. Or, even better, since the only chunks that get loaded are the ones that get read, if a particular use case only touches one cluster, you never end up loading the other twelve hundred at all. Plus you free up that memory for your program to use in its working set, since windows no longer has to worry about paging out the buffer.
Plus, as an aside, this probably shouldn't be a static allocation anyway, since it seems far fetched that the only kind of file your program will need to process is one that's always going to be exactly 5 million bytes. You are either wasting space if the file is smaller or missing functionality if the file is bigger (or, if you are in control of the external files and are going to ensure they are always said size, you are imposing an unnecessary limitation on yourself).
-r
Quote from: redskull on December 02, 2014, 08:15:42 AM
if you do a 5MB blocking read, your program is going to grind to a halt while the drive spins around, transfers each sector, and then searches around for the next cluster, and does this over and over until the buffer is full.
If you are accessing the data in a sequential manner (start with byte 1, and process until the end), then all this searching time is wasted. By establishing two buffers of the cluster size, you start processing the data after the first cluster is read in,
Red,
Your theory is OK, and I really feel we should make it a nice timing project in the lab. Problems I see:
- disk cache *): after the first test, your 5MB will come in rather quickly (and is an uncached read realistic?)
- the handling of overlaps between the two buffers, i.e. checking for each byte whether you need to switch buffers takes time
- if your processing is not read-only, you'll need a copy of the two buffers
- re memory-mapped files: fine, could eliminate the need to check for overlaps, but what if you need to write into your buffer but not into your file? My Recall() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1172), for example, does that; same for the line tokeniser recently made by Hutch.
Again: to be tested :biggrin:
*) see http://stackoverflow.com/questions/7405868/how-to-invalidate-the-file-system-cache/8113490#8113490 for a method that might flush the disk cache maintained by the OS; it won't affect the hard drive's cache, though - and that is the one that smooths over the physical access problems.
i think it would be very difficult to get meaningful results
every machine/OS/setup/etc will yield different results - lol
Quote from: sinsi on November 30, 2014, 04:25:48 PM
With MASM you put it into a special section - .data? (uninitialized data). This section doesn't actually exist in the EXE image.
To think I've missed this, all these years...
So Sinsi.. you're saying that the .data? section is not in the compiled exe file.
Where would it be kept?
Would it be recorded in the PE header then, and the space allocated when the exe is loaded ?
:icon_exclaim:
of course it doesn't exist - at least, not byte-by-byte
there is a table entry that gives a size and offset - that's all that's needed (and a name - .bss i think)
it is uninitialized :biggrin:
no need to make the EXE larger with a bunch of 0's
In the section header there are two fields, VirtualSize (memory needed for the section) and SizeOfRawData (in the exe image).
When a section contains only uninitialized data, SizeOfRawData (and PointerToRawData) will be zero i.e. not in the image.
Hi sinsi,
Quote from: sinsi on December 02, 2014, 08:20:54 PM
In the section header there are two fields, VirtualSize (memory needed for the section) and SizeOfRawData (in the exe image).
When a section contains only uninitialized data, SizeOfRawData (and PointerToRawData) will be zero i.e. not in the image.
I think that's true for the EXE image. But if the EXE is loaded into the RAM, the necessary space is allocated.
Gunther
Hi Gunther, that's what VirtualSize is. As an added bonus the memory will be zeroed, no need to initialize variables to 0.
Hi sinsi,
Quote from: sinsi on December 02, 2014, 09:53:16 PM
As an added bonus the memory will be zeroed, no need to initialize variables to 0.
good to know. Is that feature documented by MS?
Gunther
Quote from: Gunther on December 03, 2014, 04:25:04 AM
Is that feature documented by MS?
Search old & new forum for
uninitiali documented :biggrin:
Hint: You can safely assume that tons of professional and not-so-professional software would crash into your face if the .bss was not initialised to zero.
Well this is nice to know...
Was playing around (testing) with a 4MB buffer...
My original EXE was just over 1MB with the buffer in the .data? section, but just under 4MB with it in the .data section.
(This could explain bloatware that does so little for its payload ??)
This can save me farting around with the global heap for a lot of known buffer sizes
Currently I use the heap to the tune of 80MBs worth of buffers and all the management is giving me a headache
:)
Quote from: jj2007 on December 03, 2014, 04:36:45 AM
Hint: You can safely assume that tons of professional and not-so-professional software would crash into your face if the .bss was not initialised to zero.
This has always been my gripe with non-assembler programs (pro or non-pro) - they never initialise to zero.
As I always say to the family... 'Never assume or think - make sure!!'
:)
Jochen,
Quote from: jj2007 on December 03, 2014, 04:36:45 AM
Hint: You can safely assume that tons of professional and not-so-professional software would crash into your face if the .bss was not initialised to zero.
I know that it's safe for the ELF format. I assume that MS made the specification here (http://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx), but I'm not sure.
Gunther
Quote from: Gunther on December 03, 2014, 07:29:42 AM
I assume that MS made the specification here (http://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx), but I'm not sure.
see Chapter 3: Section Table --> section headers --> members VirtualSize and SizeOfRawData
Jochen and I had this discussion some time ago
i don't recall where we found it, but it is documented
Couldn't find it either, Dave, but qWord's link is crystal clear:
Quote
VirtualSize
The total size of the section when loaded into memory. If this value is greater than SizeOfRawData, the section is zero-padded
SizeOfRawData
The size of the section (for object files) or the size of the initialized data on disk (for image files). For executable images, this must be a multiple of FileAlignment from the optional header. If this is less than VirtualSize, the remainder of the section is zero-filled
Section Data
Initialized data for a section consists of simple blocks of bytes. However, for sections that contain all zeros, the section data need not be included.
The data for each section is located at the file offset that was given by the PointerToRawData field in the section header. The size of this data in the file is indicated by the SizeOfRawData field. If SizeOfRawData is less than VirtualSize, the remainder is padded with zeros
.bss = "Blank Space Section"
It is necessarily zero initialised.
C programs have always relied on this fact, so even if it wasn't officially, it would have to be de facto.
bss = block started by symbol
bss = better save space
Jokes aside, there are sources (http://en.wikipedia.org/wiki/.bss#BSS_in_C) that claim the C language itself somehow initialises global data. A quick test with Pelles C shows it's the OS, not C.
- Launch Olly with the exe produced by the snippet below
- get the address (for me, it was 4090C4)
- relaunch and poke a nice string into that memory before the code starts running
- hit F9 and let C do its initialisation stuff
- the poked string is still there :biggrin:
#include <stdio.h>

char buffer[400000];   /* char, not char*: we poke a text string into it */

int main(int argc, char* argv[]) {
    _asm int 3;               /* break so we can inspect before the CRT runs */
    _asm lea ecx, buffer;     /* get the address of the buffer here */
    printf("The string we poked: [%s]\n", buffer);
    return 0;
}
BSS = Block Started by Symbol
To be fair, in DOS the C runtime would fill bss with zeros before main().
Hi sinsi,
Quote from: sinsi on December 03, 2014, 07:18:28 PM
BSS = Block Started by Symbol
To be fair, in DOS the C runtime would fill bss with zeros before main().
right. But the root of BSS is the assembly language. It was a pseudo-operation in the United Aircraft Symbolic Assembly Program (UA-SAP), the standard assembler for the IBM 704, developed in the 1950s.
Gunther
For all of the gravity of this discussion, the bottom line is: if you need more than a trivial amount of memory, dynamically allocate it, otherwise your application will sit in memory occupying far more than it may need, all of the time. With dynamic allocation you can turn it on and off; with BSS memory you are stuck with it until the app terminates. It is really lousy code design to set big blocks of BSS memory when with dynamic allocation, you can routinely allocate into the gigabyte range.
Quote from: hutch-- on December 03, 2014, 08:55:18 PM
It is really lousy code design to set big blocks of BSS memory when with dynamic allocation, you can routinely allocate into the gigabyte range.
Amen to that. :t
Gunther
Quote from: hutch-- on December 03, 2014, 08:55:18 PM
It is really lousy code design to set big blocks of BSS memory when with dynamic allocation, you can routinely allocate into the gigabyte range.
Normally, with intermittent memory requirements, I'd agree with you...
On my lot I need the larger buffers for 'immediate data' which is referenced continuously. Doing it via heap allocation means reloading and recalculating every time a button is pressed.
I had to weigh speed against memory allocation and pick the easiest/fastest way of doing it.
:)
under DOS, the BSS segment might have been filled with whatever garbage previously filled the area
compiler code may have cleared it for you, but ASM programmers didn't necessarily have that startup code
Van,
Just step back 1 branch of your call tree, allocate and deallocate in the same scope OR use a GLOBAL handle for the memory and you won't have either problem.
jj2007 and dedndave.
With PellesC it's possible to use it without the CRT, like this:
#include <stdio.h>
void __stdcall ExitProcess(int);
#pragma comment(lib, "msvcrt.lib")
char buffer[400000];
//int main(int argc, char* argv[])
void __cdecl mainCRTStartup(void)
{
printf("The string we poked: [%s]\n", buffer);
ExitProcess(0);
}
and it generates this code:
mainCRTStartup
[401000] push ebp
[401001] mov ebp,esp
[401003] push buffer
[401008] push 402000
[40100D] call _printf
[401012] add esp,8
[401015] push 0
[401017] call _ExitProcess@4
[40101C] pop ebp
[40101D] ret
for x64:
mainCRTStartup
[40001000] sub rsp,28
[40001004] lea rdx,[buffer]
[4000100B] lea rcx,[0000000140003010]
[40001012] call printf
[40001017] mov ecx,0
[4000101C] call ExitProcess
[40001021] add rsp,28
[40001025] ret
Quote from: TWell on December 04, 2014, 09:08:14 PM
With PellesC it's possible to use it without CRT
The point is actually that even WITH the usual compiler overheads, nobody fumbles with global buffers except the OS itself:
- compile my code above
- load it into Olly
- run it once until the breakpoint, just to see where that buffer is located
- reload with Ctrl F2
- put the address into eax, e.g. 4090C4
- right-click on eax and click "Follow in dump"
- double-click the cell under Hex dump in the lower left corner
- in the "Edit data" popup, write "Assembler is better" into the upper edit box and click OK
At this point, your code has not yet started. Only the OS has done its job so far, but the CRT and its overhead have not yet done ANYTHING.
Now you run the code with F9, and you arrive at the point where printf gets its argument. If the theory is correct that the C compiler zeroes the .bss segment, then there should be zero bytes in the buffer. But instead (tested with Pelles C and VC Express), there is the text you poked in before the program even started.
My point was to show that the OS hands out zeroed memory, as a good multiuser OS should.
Quote from: TWell on December 05, 2014, 04:12:11 AM
My point was to show that the OS hands out zeroed memory, as a good multiuser OS should.
We agree on that point, but proving it is difficult. You might find zero bytes in the global buffer because the OS gave you, by accident, a memory area that was already zeroed. Maybe one in a million times the OS gives you garbage - how could you prove that? So I chose the other way: to prove (successfully) that the C compiler does nothing to the memory, i.e. it relies on the memory being zeroed by the OS.
Quote from: hutch-- on December 04, 2014, 06:48:07 PM
Van,
Just step back 1 branch of your call tree, allocate and deallocate in the same scope OR use a GLOBAL handle for the memory and you won't have either problem.
Yup.. I've looked at that, but it doesn't make much difference as I have to keep the buffers alive for the whole duration of the applicaton. Essentially it's either using the .data? or the heap thing.
I suppose it would be more manageable picking up errors using the heap, than it being out of one's control using .data? section.
I'll try a balance of both - the larger buffers in the heap.
:biggrin:
Quote from: jj2007 on December 05, 2014, 04:42:53 AM
We agree on that point, but proving it is difficult. You might find zero bytes in the global buffer because the OS gave you, by accident, a memory area that was already zeroed. Maybe one in a million times the OS gives you garbage - how could you prove that? So I chose the other way: to prove (successfully) that the C compiler does nothing to the memory, i.e. it relies on the memory being zeroed by the OS.
The fact that C programs rely on the memory being zeroed is proof that it will be zeroed.
If it wasn't purposely zeroed by the OS, the majority of software would have problems (aside from the numerous pre-existing bugs :P)
It is well known that the majority of software relies on this, and so it is necessarily enforced by the OS.
Making assumptions based on "we don't know for sure this one thing might not be done once every million times" is ridiculous. If you write software that relies on the underlying OS, you have to make certain assumptions of its competence and trust those assumptions, otherwise you're going to be reimplementing the OS in every single program.
Quote from: Tedd on December 05, 2014, 11:53:10 PM
The fact that C programs rely on the memory being zeroed is proof that it will be zeroed.
That was exactly my point, thanks for confirming it :icon14:
If there is one thing I have learnt writing Windows code (since Win 3.0), it is not to trust assumptions about the future intent of the OS developer. Time after time I have seen these assumptions broken: Win95 assumptions were broken with the introduction of UNICODE in later OS versions, as the core guts of the OS are different. I am currently seeing code that was developed on 32-bit XP with fully documented API calls break on 64-bit Win7; one of the more common cases is SendMessage() calls failing due to timing issues, where you replace them with PostMessage() to get the necessary time lag.
Now bring this approach back to the distinction between initialised and uninitialised memory. The distinction is there for a reason: one you write values to in the .DATA section; the other you allocate space for and initialise in runtime code to whatever you want. Trusting and hoping that BSS memory will always be zero is a formula for a future OS making your code go BANG. Your advantage in using uninitialised memory is the disk space saving, not a pseudo zero-initialised .DATA section. If you want zero-filled memory, allocate it, and if you need to re-use it zero-filled, use a simple algo to zero it again.
I have yet to see the point of writing sloppy dangerous code.
Quote from: jj2007 on December 05, 2014, 04:42:53 AM
We agree on that point, but proving it is difficult.
Well, yes and no. Any common-criteria certified operating system has to give out initialized pages as a basic security measure; if pages were uninitialized, you would end up with the data from the most recently ended process, potentially receiving all manner of fun plain-text passwords and other sensitive data. Windows has a whole class of "demand zero" page faults to handle such an occurrence (as opposed to when pages are read from disk, for example, and can be initialized to their contents), as well as a background thread that zeroes pages during downtime.
That being said, nothing says that it has to be initialized to zero other than convention. The MMU on Windows 10, for example, could give out pages initialized to 1, or 2, or BAAD F00D, or whatever else it wanted if they felt so inclined. So no, depending on it being initialized specifically to zero is never a good idea.
-r
Quote from: redskull on December 06, 2014, 07:28:53 AM
The MMU on Windows 10, for example, could give out pages initialized to 1, or 2, or BAAD F00D
SCARY!! Indeed, that would be much more fun than the year 2000 bug :greensml:
(and Pelles C, Microsoft Visual C and probably a bunch of other compilers will have to be rewritten, because currently they do rely on uninitialised memory being zeroed by the OS...)
If you use VirtualAlloc (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx) it's guaranteed zero-filled.
Quote
Memory allocated by this function is automatically initialized to zero, unless MEM_RESET is specified.
Quote
SCARY!! Indeed, that would be much more fun than the year 2000 bug :greensml:
(and Pelles C, Microsoft Visual C and probably a bunch of other compilers will have to be rewritten, because currently they do rely on uninitialised memory being zeroed by the OS...)
Well, I don't think it's scary at all...
I always 'zip' my variables/buffers (.data(?) or heap) before I use them, as a matter of course.
Never had a problem
:)
Hi,
Quote from: redskull on December 06, 2014, 07:28:53 AM
That being said, nothing says that it has to be initialized to zero other than convention. The MMU on Windows 10, for example, could give out pages initialized to 1, or 2, or BAAD F00D, or whatever else it wanted if they felt so inclined. So no, depending on it being initialized specifically to zero is never a good idea.
Yeah, I worked on a mainframe where uninitialized memory (in a FORTRAN program) was set to an illegal value for a floating-point number. So if you did not actually specify something, your program died. Sets up some programming habits that last a long time.
Regards,
Steve N.