Author Topic: Maximum size for Readfile/Writefile (ex) Functions  (Read 3664 times)

TWell

  • Member
  • ****
  • Posts: 748
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #15 on: January 04, 2017, 11:21:43 PM »
The Win32 API ReadFile()/WriteFile() and the RTL stream functions are different stories; the latter are about file streams.


nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #16 on: January 05, 2017, 12:02:59 AM »
Quote
The Win32 API ReadFile()/WriteFile() and the RTL stream functions are different stories; the latter are about file streams.

True, but most software that does I/O handling uses these libraries. In addition, most sizeable files (sound, video, compressed archives, and text-based files like code, scripts and HTML) are constructed for streaming. This means that few (if any) applications need large I/O buffers for reading.

So if you have to, it's possible, but that doesn't make it a recommendable approach.

K_F

  • Member
  • *****
  • Posts: 1288
  • Anybody out there?
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #17 on: January 05, 2017, 02:57:30 AM »
Quote
My only recommendation is to ensure you set the linker option /LARGEADDRESSAWARE so you get the most Win32 memory. It matters less on Win7 64, but on Win 10 64, with more chyte loaded into Win32 memory by the OS, you may not have enough memory by default to handle large allocations.

Thanks Hutch, ..gentlemen  :biggrin:
I can break it up into 4 data segments, but would prefer the larrrrge buffer method = less hassle.
I'll try out what you say there
 :t
'Sire, Sire!... the peasants are Revolting !!!'
'Yes, they are.. aren't they....'

adeyblue

  • Member
  • **
  • Posts: 89
    • Airesoft
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #18 on: January 05, 2017, 05:34:22 AM »
This works just fine (x64 only). I guessed it'd be faster with NO_BUFFERING, but it still takes forever on my laptop (1030850 ms!).
Code: [Select]
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <mmsystem.h>   // timeGetTime(); link with winmm.lib
#include <stdio.h>

int main()
{
    // the first > 4GB file I found, change it obvs
    PCWSTR pName = L"C:\\Users\\Adrian\\VirtualBox VMs\\7x64\\Snapshots\\{1d8218b0-3169-406d-9d03-e9b82c60af74}.vdi";
    HANDLE hFile = CreateFileW(
        pName,
        GENERIC_READ,
        0,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_NO_BUFFERING,
        NULL
    );
    if(hFile == INVALID_HANDLE_VALUE)
    {
        return printf("CreateFile error = %lu\n", GetLastError());
    }
    DWORD size = MAXDWORD - 4095;
    PVOID pData = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
    if(!pData)
    {
        DWORD err = GetLastError();
        CloseHandle(hFile);
        return printf("VirtualAlloc error = %lu\n", err);
    }
    DWORD bytesRead = 0;
    DWORD preTime = timeGetTime();
    BOOL ret = ReadFile(hFile, pData, size, &bytesRead, NULL);
    DWORD postTime = timeGetTime();
    DWORD err = GetLastError();
    VirtualFree(pData, 0, MEM_RELEASE);
    CloseHandle(hFile);
    // DWORD is unsigned long on Windows, so use the l length modifier
    printf(
        "ReadFile returned %d, asked for %#lx, read = %#lx, lastError = %lu, took %lu ms\n",
        ret,
        size,
        bytesRead,
        err,
        (postTime - preTime)
    );
}

nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #19 on: January 05, 2017, 07:26:37 AM »
Quote
This works just fine (x64 only). I guessed it'd be faster with NO_BUFFERING, but it still takes forever on my laptop (1030850 ms!).

Could you try this one:
Code: [Select]
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <mmsystem.h>   // timeGetTime(); link with winmm.lib
#include <stdio.h>

#define BUFSIZE 512

int main()
{
    // the first > 4GB file I found, change it obvs
    PCWSTR pName = L"C:\\Users\\Adrian\\VirtualBox VMs\\7x64\\Snapshots\\{1d8218b0-3169-406d-9d03-e9b82c60af74}.vdi";
    HANDLE hFile = CreateFileW(
        pName,
        GENERIC_READ,
        0,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_NO_BUFFERING,
        NULL
    );
    if(hFile == INVALID_HANDLE_VALUE)
    {
        return printf("CreateFile error = %lu\n", GetLastError());
    }
    DWORD size = 0;
    DWORD bytesRead = 0;
    char  buffer[BUFSIZE];
    DWORD preTime = timeGetTime();
    while (ReadFile(hFile, buffer, BUFSIZE, &bytesRead, NULL) != FALSE) {
        size += bytesRead;
        if (bytesRead != BUFSIZE)
            break;
    }
    DWORD postTime = timeGetTime();
    DWORD err = GetLastError();
    CloseHandle(hFile);
    printf("read = %#lx, lastError = %lu, took %lu ms\n",
        size, err, (postTime - preTime) );
}

TWell

  • Member
  • ****
  • Posts: 748
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #20 on: January 05, 2017, 07:43:37 AM »
Better to test it with a smaller file first.

500Mb 5884ms/175959ms ;)

EDIT:
Another test set:
Code: [Select]
read once
1. read = 0x27508800, lastError = 0, took 5756 ms FILE_FLAG_NO_BUFFERING
2. read = 0x27508800, lastError = 0, took 6480 ms
block read size 512
3. read = 0x27508800, lastError = 0, took 11610 ms
4. read = 0x27508800, lastError = 0, took 162148 ms FILE_FLAG_NO_BUFFERING
changes to original code:
Code: [Select]
    DWORD size2 = 0;
    DWORD bytesRead = 0;
    char  *p;
    p = pData;
    DWORD preTime = timeGetTime();
    while (ReadFile(hFile, p, BUFSIZE, &bytesRead, NULL) != FALSE) {
        size2 += bytesRead;
        p += bytesRead;
        if (bytesRead != BUFSIZE)
            break;
    }
    DWORD postTime = timeGetTime();
« Last Edit: January 05, 2017, 08:29:32 PM by TWell »

nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #21 on: January 05, 2017, 08:21:29 AM »
witchcraft!

jj2007

  • Member
  • *****
  • Posts: 7559
  • Assembler is fun ;-)
    • MasmBasic
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #22 on: January 05, 2017, 11:25:39 AM »
Test with a 1.3 GB file:

AdeyBlue:
read = 0x4f9c3d54, lastError = 0, took 16629 ms

Nidud:
read = 0x4f9c3d54, lastError = 0, took 438779 ms  // with FILE_FLAG_NO_BUFFERING
read = 0x4f9c3d54, lastError = 0, took 596296 ms  // with FILE_FLAG_NO_BUFFERING
read = 0x4f9c3d54, lastError = 0, took 15600 ms    // with flag 0

MasmBasic:
Reading took 16431 ms for 0x4F9C3D54 bytes

In AdeyBlue's version, I had to replace DWORD size = MAXDWORD - 4095; with DWORD size = GetFileSize(hFile, 0)+4096;, and use 0 instead of FILE_FLAG_NO_BUFFERING to make it work.
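
One plausible reason the flag had to go: with FILE_FLAG_NO_BUFFERING, the number of bytes requested from ReadFile must be a multiple of the volume sector size (and the buffer must be suitably aligned), and GetFileSize()+4096 is generally not such a multiple. A minimal sketch of rounding a size up to a sector multiple; the helper name `round_up_to_sector` is made up for illustration, and the sector size is assumed to be a power of two that you would normally query from the volume rather than hard-code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the posts above): round `size` up to the
   next multiple of `sector`. Assumes `sector` is a power of two; 512 and
   4096 are common values, but the real one should be queried. */
static uint64_t round_up_to_sector(uint64_t size, uint64_t sector)
{
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return (size + sector - 1) & ~(sector - 1);
}
```

For the 0x4f9c3d54-byte test file and an assumed 4096-byte sector, this gives 0x4f9c4000 as the size to request.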

Nidud's version with flag 0 is the fastest. However, this does not take into account that in AdeyBlue's and the MasmBasic versions a 1.3 GB buffer is available for analysis, while in Nidud's version the contents of the 2.5 Million little buffers are gone.

For example, my file is a multiple concatenation of Bible.txt, and the 'analysis' might consist in counting certain keywords:
Reading took 16510 ms for 0x4F9C3D54 bytes, 990 occurrences of 'Jehovah' were found

This is the source used:

include \masm32\MasmBasic\MasmBasic.inc      ; download
  Init
  NanoTimer()
  Let esi=FileRead$("\Masm32\MasmBasic\AscUser\BibleBig.txt")
  Inkey Str$("Reading took %i ms", NanoTimer(ms)), " for 0x", Hex$(Len(esi)), " bytes", Str$(", %i occurrences of 'Jehovah' were found", Count(esi, "Jehovah"))
EndOfCode


Doing the same with streaming, i.e. with 2.5 Million little buffers, might be slightly more complex, given that 'Jehovah' could occur everywhere, for example as JehoBAADF00D at the end of a little buffer. But the software industry certainly has solutions for such problems.
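
One common way to handle a keyword split across two little buffers is to carry the last few bytes of each chunk over into the next search window. A minimal sketch in portable C, assuming stdio's `fread` rather than ReadFile; `count_keyword`, `CHUNK` and `MAXKEY` are made-up names for illustration and do not come from any post in this thread:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK  4096   /* illustrative chunk size (assumption) */
#define MAXKEY 64     /* longest keyword this sketch accepts (assumption) */

/* Count occurrences of `key` in stream `f`, reading CHUNK bytes at a time.
   The last strlen(key)-1 bytes of each window are carried over, so a match
   that straddles a chunk boundary is still found, and found only once. */
static unsigned long count_keyword(FILE *f, const char *key)
{
    size_t klen = strlen(key);
    assert(klen > 0 && klen <= MAXKEY);

    char window[MAXKEY + CHUNK];
    size_t carry = 0;          /* bytes kept from the previous window */
    unsigned long hits = 0;
    size_t got;

    while ((got = fread(window + carry, 1, CHUNK, f)) > 0) {
        size_t len = carry + got;

        /* Plain scan; every start position with a full key in view. */
        for (size_t i = 0; i + klen <= len; i++)
            if (memcmp(window + i, key, klen) == 0)
                hits++;

        /* Keep the tail that could still be the start of a match. */
        carry = (len < klen - 1) ? len : klen - 1;
        memmove(window, window + len - carry, carry);
    }
    return hits;
}
```

Called as `count_keyword(f, "Jehovah")` on a file opened with `fopen(path, "rb")`, it never holds more than `CHUNK + MAXKEY` bytes at a time. Whether it beats the one-big-buffer version is a separate question that only a timing run can answer.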

nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #23 on: January 06, 2017, 03:14:43 AM »
Well, it's difficult to do a real "live" test on this issue, since the purpose of the whole exercise relates more to other applications than to the test case itself. The main reason for using a stream is multitasking, where both hardware (CPU cache) and software (file cache) are designed to favor small buffers.

So in general terms, a simple read test as done here will exercise the file cache but not so much the CPU cache. The latter will hit you when you start using the data you read, and trashing the CPU cache will also have a negative effect on other applications (and thereby on energy consumption).

Playing along with the system means using a standard page-aligned I/O buffer:
Code: [Select]
_aligned_malloc(_MAXIOBUF, _PAGESIZE_)

This should in theory also give some benefit to the application and not only to the system as a whole.
Code: [Select]
;
; build: asmc -pe ReadTest.asm
;
.x64
.model flat, fastcall
option win64:3

_PAGESIZE_ equ 0x1000
_MAXIOBUF equ 0x4000
GENERIC_READ equ 0x80000000
OPEN_EXISTING equ 3

option dllimport:<msvcrt>
exit proto :qword
printf proto :ptr byte, :vararg
malloc proto :qword
_aligned_malloc proto :qword, :dword
free proto :qword
__getmainargs proto :ptr, :ptr, :ptr, :ptr, :ptr
option dllimport:<kernel32>
CreateFileA proto :qword, :qword, :qword, :qword, :qword, :qword, :qword
ReadFile proto :qword, :qword, :qword, :qword, :qword
CloseHandle proto :qword
GetFileSizeEx proto :qword, :qword
GetTickCount proto
GetLastError proto
option dllimport:NONE

.code

read_stream proc uses rsi rdi rbx FileName:ptr sbyte

    local lpNumberOfBytes:qword

    printf("read_stream " )

    .ifd CreateFileA(
            FileName,
            GENERIC_READ,
            0,
            0,
            OPEN_EXISTING,
            0,
            0 ) == -1

        printf("CreateFile error = %lu\n", GetLastError())
        xor rax,rax

    .else
        mov rdi,rax
        xor rbx,rbx
        mov lpNumberOfBytes,rbx
        mov rsi,_aligned_malloc(_MAXIOBUF, _PAGESIZE_)
        .while ReadFile(rdi, rsi, _MAXIOBUF, addr lpNumberOfBytes, 0)

            add rbx,lpNumberOfBytes
            .break .if lpNumberOfBytes < _MAXIOBUF
        .endw
        CloseHandle(rdi)
        mov rax,rbx
    .endif

    ret
read_stream endp

read_buffer proc FileName:ptr sbyte

    local lpNumberOfBytes:qword

    printf("read_buffer " )

    .ifd CreateFileA(
            FileName,
            GENERIC_READ,
            0,
            0,
            OPEN_EXISTING,
            0,
            0 ) == -1

        printf("CreateFile error = %lu\n", GetLastError())
        xor rax,rax

    .else
        mov rdi,rax
        GetFileSizeEx(rdi, addr lpNumberOfBytes)
        mov rbx,lpNumberOfBytes
        mov lpNumberOfBytes,0
        mov rsi,malloc(rbx)
        .if rax
            ReadFile(rdi, rsi, rbx, addr lpNumberOfBytes, 0)
            mov rbx,lpNumberOfBytes
        .else
            printf("No Memory\n")
            xor rbx,rbx
        .endif
        free(rsi)
        CloseHandle(rdi)
        mov rax,rbx
    .endif
    ret

read_buffer endp


main proc argc:dword, argv:qword

    mov rsi,argv
    mov edi,argc
    .if edi == 3

        lodsq
        lodsq
        mov rdi,rax
        mov r12,GetTickCount()
        lodsq
        .if byte ptr [rax+1] == 's'
            mov rsi,read_stream(rdi)
        .else
            mov rsi,read_buffer(rdi)
        .endif
        sub GetTickCount(),r12
        printf("= %d, TickCount(%d)\n", rsi, rax )

    .else
        printf("USAGE ReadTest <file_name> </[s|b]>\n" )

    .endif

    xor eax,eax
    ret

main endp

mainCRTStartup PROC
    local argc:dword,
          argv:qword,
          p128[64]

    lea rax,p128
    __getmainargs( addr argc, addr argv, rax, rax, rax )

    exit( main( argc, argv ) )

mainCRTStartup ENDP

    END mainCRTStartup

The results for the 130M and 430M files:
Code: [Select]
read_stream = 132185377, TickCount(62)
read_buffer = 132185377, TickCount(140)

read_stream = 438777856, TickCount(202)
read_buffer = 438777856, TickCount(374)

jj2007

  • Member
  • *****
  • Posts: 7559
  • Assembler is fun ;-)
    • MasmBasic
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #24 on: January 06, 2017, 04:49:18 AM »
Quote
The main reason for using a stream is multitasking, where both hardware (CPU cache) and software (file cache) are designed to favor small buffers.

Can you explain why the CPU cache benefits from streaming instead of reading everything in one big buffer?

nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #25 on: January 06, 2017, 06:18:33 AM »
I do that from time to time, but you seem to have a very selfish and narrow view of the whole concept.

http://masm32.com/board/index.php?topic=4111.msg43709#msg43709

If the buffer you use is larger than the CPU cache, you basically evict everything in there. I assume from your question that the thinking is that flushing the CPU cache at the beginning won't matter, because you fill it up with data that benefits this particular algo later. The problem is that other active parts of the system also use the cache to store their cache lines, hence the narrow thinking. So this creates problems not only for other applications but also for the operating system itself, which your algo ultimately depends on.

It may seem tempting to use this approach, but when it comes to implementing it in a real application you will end up with an array of technical problems in addition to the ones already mentioned. In other words, advising people to grab and use as much memory as possible as a strategy is not very good advice.

Quote
Doing the same with streaming, i.e. with 2.5 Million little buffers, might be slightly more complex,

There is only one small buffer, which occupies a very small part of the CPU cache.
 
Quote
given that 'Jehovah' could occur everywhere,

 :biggrin:

Quote
for example as JehoBAADF00D at the end of a little buffer. But the software industry certainly has solutions for such problems.

Yes. Text is not parsed in chunks but in lines. Each line is fetched by a byte-by-byte pull from the small buffer. When the end of the little buffer is reached, "Jeho" has already been moved to the line buffer (a small stack buffer); the little buffer is refilled and the byte-by-byte pull continues.
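
The line-based scheme described above can be sketched in portable C roughly as follows. This is only an illustration under assumptions: it uses stdio's `fread` in place of ReadFile, and `struct reader`, `pull_byte`, `read_line` and `IOBUF` are hypothetical names, not code from an actual application:

```c
#include <assert.h>
#include <stdio.h>

#define IOBUF 512   /* size of the "little buffer" (illustrative) */

/* Hypothetical reader state: one small I/O buffer that is refilled
   whenever it runs empty. */
struct reader {
    FILE *f;
    unsigned char buf[IOBUF];
    size_t pos, len;
};

/* Pull one byte; refill the little buffer when it is exhausted. */
static int pull_byte(struct reader *r)
{
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, IOBUF, r->f);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return r->buf[r->pos++];
}

/* Copy the next line (without '\n') into `line`, NUL-terminated.
   Returns its length, or -1 at end of file. Lines longer than
   cap-1 bytes are split across calls. */
static long read_line(struct reader *r, char *line, size_t cap)
{
    assert(cap >= 2);
    size_t n = 0;
    int c = EOF;
    while (n + 1 < cap && (c = pull_byte(r)) != EOF && c != '\n')
        line[n++] = (char)c;
    line[n] = '\0';
    return (n == 0 && c == EOF) ? -1 : (long)n;
}
```

A word that straddles a refill simply ends up contiguous in `line`, so a per-line search never sees the buffer boundary. Set up the state with `struct reader r = { .f = f };` and call `read_line(&r, line, sizeof line)` until it returns -1.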

jj2007

  • Member
  • *****
  • Posts: 7559
  • Assembler is fun ;-)
    • MasmBasic
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #26 on: January 06, 2017, 10:15:01 AM »
Quote
Yes. Text is not parsed in chunks but in lines. Each line is fetched by a byte-by-byte pull from the small buffer. When the end of the little buffer is reached, "Jeho" has already been moved to the line buffer (a small stack buffer); the little buffer is refilled and the byte-by-byte pull continues.

Do you have an idea how slow that would be...?

nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #27 on: January 06, 2017, 10:59:32 AM »
Yes.

nidud

  • Member
  • *****
  • Posts: 1371
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #28 on: January 06, 2017, 12:15:13 PM »
Simple test case:
Code: [Select]
;
; build: asmc -pe stream.asm
;
.x64
.model flat, fastcall
option win64:3
option dllimport:<msvcrt>
exit proto :qword
printf proto :ptr, :vararg
fprintf proto :ptr, :ptr, :vararg
fopen proto :ptr, :ptr
fclose proto :ptr
fgets proto :ptr, :dword, :ptr
perror proto :ptr

__getmainargs proto :ptr, :ptr, :ptr, :ptr, :ptr
option dllimport:<kernel32>
GetTickCount proto
option dllimport:NONE

.code

stream proc uses rsi rdi rbx FileName:ptr sbyte
    local buffer[256]:byte

    .if fopen(FileName, "rt")

        mov rsi,rax
        .if fopen("result.txt", "wt+")

            mov rdi,rax
            xor rbx,rbx
            lea r12,buffer
            .while fgets(r12, 256, rsi)

                inc rbx
                fprintf(rdi, "%d %s", rbx, r12)
            .endw
            fclose(rdi)
        .else
            perror("result.txt")
        .endif
        fclose(rsi)
    .else
        perror(FileName)
    .endif
    ret

stream endp

main proc argc:dword, argv:qword

    mov rsi,argv
    mov edi,argc
    .if edi == 2
        lodsq
        lodsq
        mov rdi,rax
        mov rsi,GetTickCount()
        stream( rdi )
        sub GetTickCount(),rsi

        printf("TickCount(%d)\n", rax)
    .else

        printf("USAGE stream <file_name>\n")
    .endif
    xor eax,eax
    ret

main endp

mainCRTStartup PROC
    local argc:dword,
          argv:qword,
          p128[64]

    lea rax,p128
    __getmainargs( addr argc, addr argv, rax, rax, rax )
    exit( main( argc, argv ) )

mainCRTStartup ENDP

    END mainCRTStartup

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4815
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #29 on: January 06, 2017, 01:29:02 PM »
If I have it right, the reason for a page sized buffer is to work at a driver level in ring0 where you are really getting down to the bare bones of hardware without being at the mercy of task switching by the OS. Where ring3 application level code may be slower using a small buffer due to OS privilege access restrictions, functions that are passed directly to low level drivers in ring0 are not.

Now what would be interesting would be a test where you set the access priority in ring3 to either above normal or time critical to see if the file IO is faster when using large buffers.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin: