ML.exe super-slow with large BYTE values?

Started by MtheK, September 28, 2013, 03:01:46 AM


sinsi

If you are looking at that much data, why not 64-bit? Even file mapping is limited to a dword map length in 32-bit Windows, so you still have the problem.

Gunther

Hi Dave,

sinsi is right: 64-bit is the way to go for your problem.

Gunther
You have to know the facts before you can distort them.

dedndave

Quote from: sinsi on September 29, 2013, 04:05:16 PM
Even file mapping is limited to a dword map length in 32-bit Windows, so you still have the problem.

a dword supports more address space than you can actually use, of course   :P

but - file mapping, in this case, would be similar to the paging system you are using
with the major exception that you can control which portions of the file are visible at any given time
you can create numerous views to a file at once   :t

i think you mentioned you'd like 3 views at any given time - that should work nicely
and - you don't have to hog a lot of memory - make them a comfortable size
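
For anyone who hasn't used the API, here is a minimal MASM32 sketch of mapping a single view, assuming a hypothetical file name ("highdata.bin") and a 16 MB view starting 4 GB into the file; error handling and the actual record processing are left out:

    include \masm32\include\masm32rt.inc

    .data
      hFile  dd 0
      hMap   dd 0
      pView  dd 0

    .code
start:
    ; open the existing data file for reading
    invoke CreateFile, chr$("highdata.bin"), GENERIC_READ, FILE_SHARE_READ, \
                       NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
    mov hFile, eax

    ; create a read-only mapping object covering the whole file
    ; (0,0 for the maximum size means "use the current file size")
    invoke CreateFileMapping, hFile, NULL, PAGE_READONLY, 0, 0, NULL
    mov hMap, eax

    ; map a 16 MB view starting 4 GB into the file
    ; dwFileOffsetHigh = 1, dwFileOffsetLow = 0 -> offset 100000000h
    invoke MapViewOfFile, hMap, FILE_MAP_READ, 1, 0, 16*1024*1024
    mov pView, eax

    ; ... work with the bytes at [pView] ...

    invoke UnmapViewOfFile, pView
    invoke CloseHandle, hMap
    invoke CloseHandle, hFile
    invoke ExitProcess, 0

end start

The offset handed to MapViewOfFile has to be a multiple of the 64 KB allocation granularity, and a view reserves address space but only uses physical memory for the pages that actually get touched, so three or four comfortable-sized views cost very little real memory.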

KeepingRealBusy

To all,

Thank you very much for the suggestions.

I would like to avoid the 64-bit conversion (a whole new world to contend with and learn; at least I know to start with JWASM).

The following is a copy of my internal documentation describing my working process (the initial files I worked with) with actual times, some guesses about different processing options and their expected times, and finally a description of a potential memory-mapped process with several questions. Anyone care to make some guesses about the answers to those questions? Also, feel free to discuss the different options I have made guesses about.


;
;   A problem developed while processing the High Data files because of the file
;   size. The Low Data files were all about the same size (3,758,103,552 BYTES)
;   and when split into 512 parts, they were 7,340,046 BYTES each. When further
;   split into 4 parts, each part was 1,835,011 BYTES. The input buffer was
;   16,429,056 BYTES, and the 512 buffers were each 8,208,384 BYTES, and 1/4 of
;   the input buffer was 4,107,264 BYTES. Each 512 split size would fit in a
;   split buffer, and then when split 4 ways, each part would fit in the 1/4
;   sized input buffer, so compacting the 4 parts to a single buffer was a
;   simple move operation, and everything fit in the buffers.
;
;   Not so lucky with the High Data files. The total size was the same (the same
;   number of records), but the file sizes varied from 5,080,753,211 BYTES to
;   2,608,592,091 BYTES. There was no way to split 5 GB into memory contained
;   buffers since only 3.5 GB of memory could be allocated. There were several
;   things that could be done to process such large files:
;
;       As a baseline from processing the Low Data files: read the input, split
;       to 512 buffers, read all buffers and split each 4 ways, move the pieces
;       to an output buffer, write the combined pieces to the output file.
;       Overall, this is 1 read, move all records, move all records, move all
;       records (compact 4 pieces), and finally 1 write. This is 2 I/O's and 3
;       moves.
;
;       Accumulated times:
;
;       Time:     6:38:05.558 I/O time.     (2 I/O's)
;       Time:     1:19:09.954 Process time. (3 moves)
;
;       Time:     3:19:02.799 I/O time.     (1 I/O)
;       Time:     0:26:10.333 Process time. (1 move)
;
;       The new method for processing High Data files: read the input, write out
;       the 512 buffers as they fill, read back the data to split it 4 ways,
;       compact the 4 pieces, then write out the 4 pieces with the preface index
;       DWORDS. Overall, this is 1 read, move all records, 1 write, 1 read,
;       move all records, move all records (compact 4 pieces), and finally 1
;       write. This is 4 I/O's and 3 moves. I don't believe that Windows could
;       cache the intermediate writes and then the later reads (even with
;       unbuffered I/O it seems to do this whether you want it or not),
;       especially since the total data size is 5 GB, processed 4 times.
;
;       Guesstimate times:
;
;       Time:    13:16:11.196 I/O time.     (4 I/O's)
;       Time:     1:19:09.954 Process time. (3 moves)
;
;       Another method for processing High Data files: read the input block,
;       split the block in 4 steps (128 files at a time, 4 buffers per file),
;       for each of the 128 buffer quads, compact the 4 pieces, write out the 4
;       pieces with the preface index DWORDS, then repeat 4 times. Overall,
;       this is 4 reads, move all records (compact 4 pieces), and finally 1
;       write. This is 5 I/O's and 1 move. Again, I don't believe that caching
;       would be able to reduce this because of the total data size.
;
;       Guesstimate times:
;
;       Time:    16:35:49.995 I/O time.     (5 I/O's)
;       Time:     0:26:10.333 Process time. (1 move)
;
;       Now for an attempt to do this with memory mapped files. If I worked with
;       records only (no big buffers in my process), then I would have to read
;       all of the 7 BYTE records, one by one, and then move the 6 BYTE records,
;       one by one, into 4*131072 memory mapped files. First question: can
;       Windows handle 524,289 memory mapped files (2,147,487,744 BYTES for 4096
;       BYTE buffers) all at once? How efficiently can it move 7 BYTE records or
;       6 BYTE records? Once the data is read and/or written, the memory mapped
;       I/O seems to be done as unbuffered I/O (in blocks of 4096 BYTES), so that
;       should be the same as my unbuffered I/O. If instead, I created my
;       internal input buffer as 4096*7, and my split buffers as 524,288*4096*3,
;       I would need 6,442,450,944 BYTES, but I can only get 3.5 GB of memory,
;       so that option is out. I could get 1/4 of that but would have to do two
;       splits instead of just one split (more data movement, more I/O). How
;       would that affect memory mapped file processing in Windows, where
;       Windows has only 6 GB of real memory (the 131,073 4096 BYTE Windows
;       buffers would only take 536,875,008 BYTES, my 131,073 buffers would be
;       1,610,612,736 BYTES)?
;
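
Not an answer to the questions, but for comparison, here is a rough MASM32 sketch of walking one of the 5 GB High files as a succession of views, with the 64-bit file offset carried in a DWORD pair. It continues from the earlier sketch (same includes, hMap filled in by CreateFileMapping); the 16 MB view size is an assumption, and clamping of the final partial view plus all error checking are omitted:

VIEW_SIZE equ 16 * 1024 * 1024          ; a multiple of the 64 KB granularity

    .data
      hMap     dd 0                     ; mapping handle from CreateFileMapping
      pView    dd 0
      ofsLow   dd 0                     ; low DWORD of the current file offset
      ofsHigh  dd 0                     ; high DWORD of the current file offset
      sizeLow  dd 0                     ; low DWORD of the file size
      sizeHigh dd 0                     ; high DWORD of the file size

    .code
next_view:
    ; map the next 16 MB window of the file
    invoke MapViewOfFile, hMap, FILE_MAP_READ, ofsHigh, ofsLow, VIEW_SIZE
    mov pView, eax

    ; ... split the records in this window ...

    invoke UnmapViewOfFile, pView

    ; advance the 64-bit offset by one view
    mov eax, ofsLow
    mov edx, ofsHigh
    add eax, VIEW_SIZE
    adc edx, 0
    mov ofsLow, eax
    mov ofsHigh, edx

    ; loop until the offset reaches the file size
    cmp edx, sizeHigh
    jb  next_view
    ja  all_done
    cmp eax, sizeLow
    jb  next_view
all_done:

On the 524,289 views question: as far as I know, every view MapViewOfFile hands back starts on a 64 KB allocation boundary, so even a 4096 BYTE view effectively costs 64 KB of address space, and half a million of them will not fit in a 32-bit address space; a handful of larger sliding views like the above is the safer bet.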


Dave.

KeepingRealBusy

To all,

OBTW, there are 256 Low input files, each being split into 512 output files, with each output file internally split into 4 blocks with a preface index (already done); then the same is repeated for the High input files (still to be done). There are 0x80000000*64 low 7 BYTE input records and 0x80000000*64 low 6 BYTE output records, and the same number of high records to be done.
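
For scale, taking those counts at face value: 0x80000000 * 64 = 137,438,953,472 records per set, so

    137,438,953,472 * 7 BYTES = 962,072,674,304 BYTES (about 896 GiB) of input records
    137,438,953,472 * 6 BYTES = 824,633,720,832 BYTES (about 768 GiB) of output records

per Low or High set, which lines up with the 256 input files of roughly 3.75 GB each.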

Dave.