Author Topic: Maximum size for Readfile/Writefile (ex) Functions  (Read 3574 times)

nidud

  • Member
  • *****
  • Posts: 1370
    • https://github.com/nidud/asmc
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #30 on: January 06, 2017, 03:34:48 PM »
Well, if you load everything into memory first, like a game console, you don't need no stenking female multitask file IO crap :lol:

nidud

« Reply #31 on: January 07, 2017, 01:23:31 AM »
PC games are also more or less constructed as single-task applications, so the drivers they use may give some hints on how this is achieved. Using the Alt-Tab thing while playing usually doesn't end well. I'm not sure whether they depend on fast file IO or just load everything into memory, but I will assume they're not too concerned about using large buffers.

So it's like getting stuck in a traffic jam with a muscle car, seeing all the girls flying by on their mopeds; however, if you have the road to yourself  :biggrin:

Be interesting to see how the simple stream sample above plays out compared to the large buffer approach.

nidud

« Reply #32 on: January 08, 2017, 04:21:44 AM »
Quote
using a buffer above 16K is a bit over the top for many reasons so that should be avoided if possible.

I can't see a single good reason why one shouldn't fill a buffer in one go. Of course, 1.6GB as in the Win7 example is pretty fat, and you should not rely on it to work on all systems, but 16k as the maximum? Ridiculous.

The main reason for using a stream is multitasking, where both hardware (CPU cache) and software (file cache) are constructed to benefit small buffers.

Can you explain why the CPU cache benefits from streaming instead of reading everything in one big buffer?

Well, it's difficult to do a real "live" test on this issue, since the purpose of the whole exercise is more related to other applications and not only the test case itself. The main reason for using a stream is multitasking, where both hardware (CPU cache) and software (file cache) are constructed to benefit small buffers.

So in general terms a simple read test as done here will play on the file cache but not so much on the CPU cache. The latter will hit you when you start using the data you read, and trashing the CPU cache will also have a negative effect on other applications (and thereby energy consumption) as well.

Using a standard page-aligned IO-buffer should in theory also give some benefit to the application and not only to the system as a whole.

Code: [Select]
read_stream = 132185377, TickCount(62)
read_buffer = 132185377, TickCount(140)

This result is somewhat surprising. I would have thought it would improve the result of the 512 buffer test, or at best be on par with the buffer read, so the theory that it also gives some benefit to the application and not only to the system as a whole seems to be correct.

Yes. Text is not parsed in chunks but in lines. Each line is fetched by a byte-by-byte pull from the small buffer. When the end of the little buffer is reached, "Jeho" has already been moved to the line buffer (a small stack buffer); the little buffer is refilled and the byte-by-byte pull continues.
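
For illustration, the byte-by-byte pull described above could look roughly like this in C. This is a minimal sketch, not the actual runtime code: `Stream`, `stream_getc` and `stream_gets` are hypothetical names, and the 16-byte `SMALL_BUF` is an assumption chosen only so that a line gets split across refills.

```c
#include <stdio.h>
#include <string.h>

#define SMALL_BUF 16            /* assumption: tiny on purpose, to force refills */

typedef struct {
    FILE  *fp;
    size_t pos, len;            /* read position and fill level of buf */
    char   buf[SMALL_BUF];
} Stream;

/* Pull one byte; refill the small buffer from the file when it runs dry. */
int stream_getc(Stream *s)
{
    if (s->pos == s->len) {
        s->len = fread(s->buf, 1, sizeof s->buf, s->fp);
        s->pos = 0;
        if (s->len == 0)
            return EOF;
    }
    return (unsigned char)s->buf[s->pos++];
}

/* Fetch one line (newline included) into line[]; returns 0 at end of file.
   A line that straddles a refill simply keeps growing in line[]. */
int stream_gets(Stream *s, char *line, size_t size)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;
    while (n + 1 < size && (c = stream_getc(s)) != EOF) {
        line[n++] = (char)c;
        if (c == '\n')
            break;
    }
    line[n] = '\0';
    return n > 0;
}
```

The caller only ever sees complete lines; the refill happens inside `stream_getc`, which is why the cost per byte is one comparison plus an occasional read.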

Do you have an idea how slow that would be...?

My answer to this question was yes, but compared to what?

Be interesting to see how the simple stream sample above plays out compared to the large buffer approach.

I was hoping for a MasmBasic test here using large buffers and fast string parsing  :biggrin:

jj2007

  • Member
  • *****
  • Posts: 7540
  • Assembler is fun ;-)
    • MasmBasic
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #33 on: January 08, 2017, 04:49:20 AM »
I was hoping for a MasmBasic test here using large buffers and fast string parsing  :biggrin:

You got it in reply #22. I am hoping for an example with small buffers doing the same: count reliably the occurrence of a specific word in a gigabyte textfile. You can start with something simpler, like counting the lines of the file, or the average line length. Whatever suits you.

(hint: line count is easier because you don't need to work with overlapping buffers, as in the example where somelongword sits right on the boundary of the buffer)
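
A minimal C sketch of the simpler task, assuming a fixed 4096-byte chunk limit and a hypothetical helper name `count_lines`. It shows why no overlap handling is needed for a line count: the search target is a single byte.

```c
#include <stdio.h>
#include <string.h>

/* Count '\n' bytes by reading the file in chunks of chunk_size bytes
   (at most 4096 here). A newline is a single byte, so it can never
   straddle a chunk boundary and no overlap handling is needed. */
long count_lines(FILE *fp, size_t chunk_size)
{
    char buf[4096];
    long lines = 0;
    size_t n;

    if (chunk_size == 0 || chunk_size > sizeof buf)
        chunk_size = sizeof buf;

    while ((n = fread(buf, 1, chunk_size, fp)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            ++lines;
            ++p;                /* continue behind the newline just found */
        }
    }
    return lines;
}
```

The result does not depend on the chunk size, which makes it a convenient baseline before moving on to multi-byte search words.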

nidud

« Reply #34 on: January 08, 2017, 05:50:28 AM »
 :biggrin:

Sounds good.

The aim was to test the standard IO functions provided by the OS, both input and output, so what you're suggesting covers the input part, but it also includes parsing (i.e. using the buffer). If you could provide output as well (assuming this also uses a large buffer), that would be good.

However, you need to produce binaries (or at least source), use GetTickCount() for the timing, and we also need a common text-file for the source. I suggest the following:
Code: [Select]
for %%q in (\masm32\m32lib\*.asm \masm32\include\*.inc) do type %%q >> s1.txt
for %%q in (1 2 3 4 5 6 7 8 9) do type s1.txt >> source.txt
del s1.txt

This will create a 40 MB text file, and I suggest using "offset" as the search word. That gives around 2500 hits.

nidud

« Reply #35 on: January 08, 2017, 06:23:12 AM »
Binaries for both Input and Output
The hit count for "offset" was reduced to 459, using strstr(), which is case-sensitive.
However, all bytes are still being pulled, so it's not an issue.

I_stream.asm
Code: [Select]
;
; build: asmc -pe I_stream.asm
;
.x64
.model flat, fastcall
option win64:3
option dllimport:<msvcrt>
exit proto :qword
printf proto :ptr, :vararg
fprintf proto :ptr, :ptr, :vararg
fopen proto :ptr, :ptr
fclose proto :ptr
fgets proto :ptr, :dword, :ptr
perror proto :ptr
strstr proto :ptr, :ptr
option dllimport:<kernel32>
GetTickCount proto
option dllimport:NONE

.code

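main proc
    local buffer[256]:byte              ; line buffer for fgets

    .if fopen("source.txt", "rt")

        mov rsi,rax                     ; rsi = input FILE*
        xor rdi,rdi                     ; rdi = hit counter
        lea r12,buffer
        mov rbx,GetTickCount()          ; start time

        .while fgets(r12, 256, rsi)     ; one line (or line fragment) per pass

            .if strstr(r12, "offset")   ; case-sensitive, at most one hit per line

                inc rdi
            .endif
        .endw
        fclose(rsi)
        sub GetTickCount(),rbx          ; rax = elapsed ticks
        printf("Hits(%d), TickCount(%d)\n", rdi, rax)
    .else
        perror("source.txt")
    .endif
    exit(0)

main endp

END main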

IO_stream.asm
Code: [Select]
;
; build: asmc -pe IO_stream.asm
;
.x64
.model flat, fastcall
option win64:3
option dllimport:<msvcrt>
exit proto :qword
printf proto :ptr, :vararg
fprintf proto :ptr, :ptr, :vararg
fopen proto :ptr, :ptr
fclose proto :ptr
fgets proto :ptr, :dword, :ptr
perror proto :ptr
option dllimport:<kernel32>
GetTickCount proto
option dllimport:NONE

.code

main proc
    local buffer[256]:byte              ; line buffer for fgets

    .if fopen("source.txt", "rt")

        mov rsi,rax                     ; rsi = input FILE*
        .if fopen("result.txt", "wt+")

            mov rdi,rax                 ; rdi = output FILE*
            mov r13,GetTickCount()      ; start time
            xor rbx,rbx                 ; rbx = line counter
            lea r12,buffer
            .while fgets(r12, 256, rsi)

                inc rbx
                fprintf(rdi, "%d %s", rbx, r12) ; write "<number> <line>"
            .endw
            fclose(rdi)
            fclose(rsi)
            sub GetTickCount(),r13      ; rax = elapsed ticks
            printf("TickCount(%d)\n", rax)
        .else
            fclose(rsi)
            perror("result.txt")
        .endif
    .else
        perror("source.txt")
    .endif
    exit(0)

main endp

END main

EDIT: buffer 128 to 256...
« Last Edit: January 08, 2017, 10:13:52 AM by nidud »

jj2007

« Reply #36 on: January 08, 2017, 09:33:33 AM »
Interesting results - fgets is surprisingly fast. When I try to build your source, I get some errors:

Code: [Select]
C:\Masm32\MasmBasic\Timings>\Masm32\Bin\asmc -pe I_stream.asm
Doszip Macro Assembler Version 2.21
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.

 Assembling: I_stream.asm
I_stream.asm(32) : error A2008: syntax error : )
I_stream.asm(42) : error A2008: syntax error : ),rbx

C:\Masm32\MasmBasic\Timings>\Masm32\Bin\asmc -pe IO_stream.asm
Doszip Macro Assembler Version 2.21
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.

 Assembling: IO_stream.asm
IO_stream.asm(31) : error A2008: syntax error : )
IO_stream.asm(41) : error A2008: syntax error : ),r13

nidud

« Reply #37 on: January 08, 2017, 10:11:01 AM »
The byte pulling used in fgets is usually slow, so I didn't expect it to be very fast, but I guess it depends on the DLL version used.

It appears the line buffer in the source is too small, so it should be increased to 256 (the hit count is wrong).
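
One way a short line buffer can skew such a count, sketched in C. The helper name `count_fgets_hits` and the 512-byte cap are assumptions for illustration, and whether this is exactly what happened to the hit count above is not confirmed here. When a line is longer than the buffer, fgets() returns it in pieces, so a word lying across the split is not seen by strstr().

```c
#include <stdio.h>
#include <string.h>

/* Count fgets() reads that contain word, using a line buffer of bufsize
   bytes (capped at 512). When a line is longer than the buffer, fgets()
   returns it in pieces, so a word lying across the split is missed. */
long count_fgets_hits(FILE *fp, const char *word, int bufsize)
{
    char line[512];
    long hits = 0;

    if (bufsize < 2 || bufsize > (int)sizeof line)
        bufsize = (int)sizeof line;

    while (fgets(line, bufsize, fp))
        if (strstr(line, word))
            ++hits;
    return hits;
}
```

With the single line `    mov eax, offset table`, a 256-byte buffer finds the word, while a 16-byte buffer splits the line into `    mov eax, of` and `fset table` and finds nothing.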

And here's the latest version of asmc

jj2007

« Reply #38 on: January 08, 2017, 10:57:35 AM »
Your I_Stream result: Hits(495), TickCount(437)

Code: [Select]
Method A: reading lines
876 ms for reading
39 ms for counting 2250 occurrences of 'offset'
36 ms for counting 2250 occurrences of 'offset'
38 ms for counting 2250 occurrences of 'offset'
36 ms for counting 2250 occurrences of 'offset'
38 ms for counting 2250 occurrences of 'offset'

Method B: counting in the whole buffer
27 ms for reading
39 ms for counting 513 occurrences of 'offset'
40 ms for counting 513 occurrences of 'offset'
40 ms for counting 513 occurrences of 'offset'
42 ms for counting 513 occurrences of 'offset'
41 ms for counting 513 occurrences of 'offset'

Method A:
Code: [Select]
For_ ct=0 To ecx-1
.if Instr_(FAST, L$(ct), "offset", 2)
Let o$(edi)=L$(ct)
inc edi
.endif
Next

Method B:
Code: [Select]
mov ecx, Count(esi, "offset")

Very divergent results. In short: My method A is slow for the reading part, i.e. the translation of a file into an array of strings. Everything else is faster.

The different results (2250 instead of 495) are because of mode 2 in Instr_() - there are many occurrences of Offset.

For case-sensitive Instr_(), both your and my line-oriented versions get 495 hits. The simpler Count() method yields 513 hits because some lines have more than one offset.
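
The gap between the line-oriented count (495) and the whole-buffer count (513) can be reproduced with a small C sketch. Both helper names are hypothetical, the search is case-sensitive, and the word is assumed not to contain a newline.

```c
#include <string.h>

/* Number of non-overlapping occurrences of word in text. */
long count_occurrences(const char *text, const char *word)
{
    size_t wlen = strlen(word);
    long n = 0;

    if (wlen == 0)
        return 0;
    for (const char *p = text; (p = strstr(p, word)) != NULL; p += wlen)
        ++n;
    return n;
}

/* Number of lines in text that contain word at least once. */
long count_matching_lines(const char *text, const char *word)
{
    long n = 0;
    const char *p = text;

    if (*word == '\0')
        return 0;
    while ((p = strstr(p, word)) != NULL) {
        ++n;
        p = strchr(p, '\n');    /* skip the rest of this line */
        if (p == NULL)
            break;
    }
    return n;
}
```

For a text such as `mov ecx, offset b + offset c` on one line, the first function reports two hits and the second reports one, which is the same kind of difference as between the two methods above.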

TWell

  • Member
  • ****
  • Posts: 748
Re: Maximum size for Readfile/Writefile (ex) Functions
« Reply #39 on: January 08, 2017, 09:21:14 PM »
I_stream
Hits(459), TickCount(1187)

msvcrt strstr in GlobalAlloc buffer
Hits(468), TickCount(203)

nidud

« Reply #40 on: January 09, 2017, 03:22:20 AM »
Testing pure reading using chunks/large buffers
Code: [Select]
  15 ReadFile(_MAXIOBUF) - stream
  31 ReadFile(GetFileSizeEx())
  47 FileRead$()

Parsing of lines is as expected rather slow
Code: [Select]
296 fgets(asmc_lib)
312 fgets(msvcrt)

Reusing the buffers should in theory benefit small buffers but the penalty is not as bad as suspected. However, the small buffer is repeating this scan for each read-chunk so it does indeed get some help from the system.
Code: [Select]
include \masm32\MasmBasic\MasmBasic.inc
  Init
  mov ebx,GetTickCount()
  Let esi=FileRead$("source.txt") ; zero terminated?
  mov edx,Len(esi)   ; ?
  REPEAT 3
    mov edi,esi
    xor eax,eax
    mov ecx,edx
    repnz scasb
  ENDM
  sub GetTickCount(),ebx
  printf("%4d FileRead$()\n", eax)
EndOfCode

Code: [Select]
mov r12,rax
mov lpNumberOfBytes,0
mov rsi,_aligned_malloc(_MAXIOBUF, _PAGESIZE_)
mov rbx,GetTickCount()
.while ReadFile(r12, rsi, _MAXIOBUF, addr lpNumberOfBytes, 0)
    .break .if lpNumberOfBytes < _MAXIOBUF
    REPEAT 3
        mov rdi,rsi
        xor eax,eax
        mov ecx,_MAXIOBUF
        repnz scasb
    ENDM
.endw
sub GetTickCount(),rbx
printf("%4d ReadFile(_MAXIOBUF) - stream\n", rax)

Code: [Select]
109 ReadFile(_MAXIOBUF) - stream
125 ReadFile(GetFileSizeEx())
140 FileRead$()

As for fast algos, these can also be applied to a stream, and in most cases where files are the input you have to use one. I have spent some time on this subject writing search algos that span multiple drives. This forces you to stream the input, given that there will always be files larger than available memory.

The method used is a fast algo in cases where the file is less than or equal to the buffer size. If more input is needed for compare or compression, what's left at the end of the buffer is copied to the front: read(buffer+rest, size-rest).
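
The copy-to-front step can be sketched in C as follows. This is an illustration under assumptions, not the code used in the tests above: `count_in_stream`, `MAX_CHUNK` and the plain memcmp scan are all hypothetical stand-ins for whatever fast search is actually used.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CHUNK 4096   /* assumption: illustrative size, not the thread's _MAXIOBUF */

/* Count non-overlapping occurrences of word in a stream read through one
   fixed buffer of chunk bytes. The unscanned tail of each pass (the "rest",
   always shorter than the word) is copied to the front, and the next read
   fills in behind it: read(buffer+rest, size-rest). Returns -1 if the word
   is empty or does not fit into the buffer. */
long count_in_stream(FILE *fp, const char *word, size_t chunk)
{
    char buf[MAX_CHUNK];
    size_t wlen = strlen(word);
    size_t rest = 0, n;
    long hits = 0;

    if (chunk == 0 || chunk > MAX_CHUNK)
        chunk = MAX_CHUNK;
    if (wlen == 0 || wlen > chunk)
        return -1;

    while ((n = fread(buf + rest, 1, chunk - rest, fp)) > 0) {
        size_t have = rest + n;
        size_t i = 0;

        while (i + wlen <= have) {
            if (memcmp(buf + i, word, wlen) == 0) {
                ++hits;
                i += wlen;
            } else {
                ++i;
            }
        }
        rest = have - i;            /* tail that could still start a match */
        memmove(buf, buf + i, rest);
    }
    return hits;
}
```

Because the tail is carried over rather than rescanned from the file, a word sitting right on a buffer boundary is found exactly once, and the count is the same for an 8-byte buffer as for a 4096-byte one.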

Case-sensitive options also force a byte-pull, so you end up with similar solutions but with a binary search directly on the input without the line fetch. In pure text mode the fgets() approach is more practical and thus still used in assemblers and similar applications.