I have recently converted ENT to handle huge files. With such files, it became too slow, so I re-implemented it in MASM.
I created several test files to exercise the various ENT options for folding upper case to lower case (but as ISO), and used an early ENT executable as a test file (117 MB; it would not change as I added debugging code to ENT, so the base test case stayed constant across executable versions). I built a large batch check to run ENT and MASMENT against all of the test files with all of the applicable options, then ran my DIFF on each ENT/MASM output pair to see whether the results matched. I found and fixed various trivial errors, mainly line spacing, until the files compared with no differences other than the start and end times.
I was looking at the results of running against ENT.EXE as a data file. There were differences in the Monte Carlo PI simulation value compared with PI. I printed several hundred lines of the intermediate values used to compute the MCPI in both the ENT and MASMENT runs. The problem was that ENT was coming up with more points inside the circle than my MASM version (both programs reported the same number of input bytes). I toyed with this for a while, and finally decided that the problem might be rounding errors (it did not occur with the very short test files, but was present with the 117 MB ENT.EXE test file). I even went so far as to change the ENT code to use my MASM code for the calculations (enclosing it in an __asm { } block in the C program and commenting out the C code). The problem still persisted.
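For reference, here is a sketch of the Monte Carlo PI calculation as I understand ENT does it: consume the input six bytes at a time, build two 24-bit coordinates (three big-endian bytes each), and count the points that fall inside the circle of radius 256^3 - 1. The radius constant and byte ordering are my reading of the published ent.c, so check them against your copy. One point worth noting: if the inside-the-circle test is done entirely in integers, as below, rounding cannot enter into it at all, since x*x + y*y and the squared radius are compared exactly.

```python
RADIUS = 256 ** 3 - 1          # largest 24-bit coordinate value
IN_CIRCLE = RADIUS * RADIUS    # squared radius, exact integer

def monte_carlo_pi(data: bytes) -> tuple[float, int, int]:
    """Return (pi estimate, coordinate pairs used, points inside the circle)."""
    pairs = inside = 0
    # spare bytes (fewer than 6 at the end) are simply discarded
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")   # first 3 bytes  -> x
        y = int.from_bytes(data[i + 3:i + 6], "big")  # next 3 bytes -> y
        pairs += 1
        if x * x + y * y <= IN_CIRCLE:
            inside += 1
    return (4.0 * inside / pairs if pairs else 0.0), pairs, inside
```

On uniformly random input the estimate approaches PI; on an executable, with its long runs of zero bytes, it will be biased, which is expected for both programs - the point is only that both programs should agree pair for pair.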
I was not sure about C I/O, especially when using a C executable as a data file, and especially since the supplied source files came up with warnings about an "unknown publisher". I did not know whether any of my problems were due to C reading the file differently (it was opened as binary read, and also forced to binary in case the input came from a console redirect). I changed the test file to my MASMENT executable (33 MB). That produced matching output, but the file was much smaller than the 117 MB ENT executable. So I concatenated the MASMENT executable 4 times to create a 131 MB file. This larger file also showed the count differences. Note that the input data BYTE counts matched, and the C code under test was by then a copy of my MASM code, but it still came up mismatched.
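One way to take the C I/O question off the table is to read the same file strictly as binary from an independent tool and compare the length and a simple checksum against what both programs see. A minimal sketch (the function name and the 1 MB chunk size are my own choices):

```python
import hashlib

def file_fingerprint(path: str) -> tuple[int, str]:
    """Length in bytes and MD5 of a file read strictly in binary mode."""
    h = hashlib.md5()
    n = 0
    with open(path, "rb") as f:   # "rb" matters on Windows: no CR/LF or Ctrl-Z games
        for chunk in iter(lambda: f.read(1 << 20), b""):
            n += len(chunk)
            h.update(chunk)
    return n, h.hexdigest()
```

If both executables report the same byte count and this fingerprint agrees with the file on disk, the input side is almost certainly clean and the difference is downstream in the calculation.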
I guess I will have to create a file in the Monte Carlo function, write out the bytes used for each coordinate pair (just 6-BYTE blocks from the file, 3 BYTES for each x or y coordinate), write out any spare bytes (fewer than 6) at the end of the input processing, and then close the file. Then I can use FC to compare the data against the test file to see where the files start differing. I don't know what else I can do - the intermediate data is 4 lines for each coordinate pair, times 131 MB / 6 pairs - far too much to look through manually.
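Rather than scanning FC output by eye, a small script can report just the first offset at which the two dump files disagree (file names here are placeholders):

```python
def first_difference(path_a: str, path_b: str, chunk: int = 1 << 20) -> int:
    """Byte offset of the first mismatch, or -1 if the files are identical.
    A length difference counts as a mismatch at the shorter file's end."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk), fb.read(chunk)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                return offset + min(len(a), len(b))
            if not a:        # both at EOF with no mismatch
                return -1
            offset += len(a)
```

Dividing the returned offset by 6 gives the index of the first coordinate pair that differs, so only the few intermediate-value lines around that pair need to be examined by hand.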
Any thoughts?
Dave.