So just a few general questions:
1. Let's say I have some value I want to store temporarily in a register. Which register is "safe"? For instance, I make two function calls and both return in eax, so I need to store the value of the first return somewhere else. Perhaps this is in a loop, so ecx is also in use, etc.
2. Now let's say that this value is an array of bytes, a string. Can I just "mov" the 32-bit string into a register?
3. Do all PROCs need a prototype? For instance this:
abc PROC
push 0
ret
abc ENDP
END abc
4. Does x86 have a "jump and link"? For instance, MIPS (RISC) allows you to jump to a label and then jump back to where you left off. Or should this be done exclusively using procedures?
Thanks again for all the help!
Quote from: Alek on December 03, 2018, 11:53:12 AM
1. Let's say I have some value I want to store temporarily in a register. Which register is "safe"? For instance, I make two function calls and both return in eax, so I need to store the value of the first return somewhere else. Perhaps this is in a loop, so ecx is also in use, etc.
This is too general a question, so you should put it into context, e.g. whether it is a Windows API function, etc.
Quote from: Alek on December 03, 2018, 11:53:12 AM
2. Now let's say that this value is an array of bytes, a string. Can I just "mov" the 32-bit string into a register?
Yes, if the string really is only 32 bits (4 bytes) long, of course!
Quote from: Alek on December 03, 2018, 11:53:12 AM
3. Do all PROCs need a prototype? For instance this:
abc PROC
push 0
ret
abc ENDP
END abc
No.
Quote from: Alek on December 03, 2018, 11:53:12 AM
4. Does x86 have a "jump and link"? For instance, MIPS (RISC) allows you to jump to a label and then jump back to where you left off. Or should this be done exclusively using procedures?
You don't need procedures, but there is no "jump and link" instruction as such, so you can do it with 2 jumps. :idea: The closest x86 equivalent is CALL, which pushes the return address onto the stack instead of saving it in a link register; RET pops it to jump back.
So, in context for question 1: let's say I have the results of 10 different operations and I want to store each of them separately. Their values can require all 32 bits. What do we do?
For question 3, I have "call abc" in my "main PROC", however I get "The current stack frame was not found in a loaded module. Source cannot be shown for this location." I'm using AsmDude, which tells me (without running the application) that "abc" is an undefined label. So how would I call this procedure?
Alek,
Register preservation rules are part of the 32-bit Intel ABI (Application Binary Interface), which divides the registers into volatile and non-volatile registers.
Volatile registers: EAX ECX EDX
Non-volatile registers: EBX ESI EDI
Stack frame (base pointer and stack pointer) registers: EBP ESP
The volatile registers can be modified freely, but you need to know that any procedure you call can also modify them.
Non-volatile registers must be preserved if they are used in a procedure.
The EBP register can be used if it is first preserved AND you are writing a procedure that has no stack frame.
The ESP register is the stack pointer, and you need to know exactly what you are doing before you try to modify it.
Integer return values are always in EAX (EDX:EAX for 64-bit integers).
Floating-point and SSE return values use different registers: ST(0) on the FPU stack for floating point, XMM0 for SSE.
With normal procedures that use a stack frame, you have 6 registers available (EAX ECX EDX EBX ESI EDI), because EBP and ESP are used to set up the stack frame. If you have more values than that to keep (such as your 10 results), store the extra ones in LOCAL variables or other memory.
If you need more than the 3 volatile registers, you must preserve the 3 non-volatile registers:
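myproc proc arg1:DWORD, arg2:DWORD  ; example procedure with two DWORD arguments
push ebx                            ; preserve the non-volatile registers you use
push esi
push edi
; all of your procedure code
pop edi                             ; restore them in reverse order
pop esi
pop ebx
ret                                 ; with STDCALL, MASM emits the epilogue and RET 8 (2 args x 4 bytes)
myproc endp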
Question 2
No, you must determine the ADDRESS of the string and load the ADDRESS into a register or 32 bit variable.
Question 3
No, you don't have to use a prototype, but you cannot use "invoke" on a procedure unless it has a prototype or is defined earlier in the source file. You can always use the normal PUSH/CALL notation to call the procedure manually.
Thanks for the detailed response. One more final question for tonight (if you're up for it):
So you said that I must determine the address of the string and load it into a register. Here's an example of what I specifically want to do:
I have this in my code (.data section):
message db "Hello World",13,10
However, I want to reuse it because I'll use it in a console procedure which always uses this address for the message parameter.
1. Is it safe to overwrite the data of "message" and continually use its address?
2. Is this approach a "good" or "bad" practice?
Thanks again to both hutch-- and Felipe.
There are a couple of ways.
.data
TextMsg db "Howdy Folks",13,10,0 ; must have the terminating "0"
pTxt dd TextMsg
.code
etc ....
If it is passed in an "invoke" call, you can use the "ADDR TextMsg" notation, or pass the pointer variable pTxt directly.
If you need to load the address into a register you can use "mov eax, OFFSET TextMsg".
You can also use "lea eax, TextMsg".
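There is nothing wrong with reusing an address, but it is done differently so that you are safe with the length of any new text you load. Writing longer text over a fixed string such as "message" would overrun whatever data follows it. If you need GLOBAL scope, allocate a buffer in the uninitialised data section.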
.data?
buffer db 260 dup (?)
.code
etc ....
You can also use LOCAL memory if the buffer is only needed in a procedure.
LOCAL TxtBuff[260]:BYTE
LOCAL ptxt :DWORD
lea eax, TxtBuff
mov ptxt, eax
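As a rough C analogue of the global-buffer approach (C rather than MASM so it can be compiled and tested on its own), the sketch below keeps one fixed 260-byte buffer and overwrites it with new text at the same address each time. "set_message" is a hypothetical helper, not a MASM32 function; it uses snprintf so that text longer than the buffer is truncated rather than overrunning the data that follows it.

```c
#include <stdio.h>

#define MSG_BUFFER_SIZE 260

/* Global, zero-initialised buffer: the C counterpart of "buffer db 260 dup (?)" in .data? */
static char msg_buffer[MSG_BUFFER_SIZE];

/* Hypothetical helper: overwrite the shared buffer with new text.
   snprintf never writes more than MSG_BUFFER_SIZE bytes and always adds the
   terminating zero, so a long string is truncated instead of overflowing.
   Returns the same address every time, like passing ADDR buffer. */
const char *set_message(const char *text)
{
    snprintf(msg_buffer, sizeof msg_buffer, "%s", text);
    return msg_buffer;
}
```

Usage, assuming a console program: puts(set_message("Hello World")); followed by puts(set_message("Goodbye")); prints both lines, and both calls return the same address.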
Fantastic, I'll try using everything you mentioned.
Alright, here's a better idea of what I'm trying to do.
.data
lpBuffer2 db 256 dup (?)
.code
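;mov dword ptr lpBuffer2, "Hello World!"   ; what I'd like to write (too long for one DWORD)
mov dword ptr lpBuffer2 + 8, "!dlr"         ; stored in memory as "rld!"
mov dword ptr lpBuffer2 + 4, "oW o"         ; stored in memory as "o Wo"
mov dword ptr lpBuffer2, "lleH"             ; stored in memory as "Hell"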
mov nNumberOfCharsToWrite, 12
call print_message
Assume that print_message works, which it does. Is there any way to move all the characters over in one chunk, like I have in the commented line?
It looks like you are trying to load string memory from registers with the characters in reverse order. It can be done, but it is by no means the only way to load string data. The characters end up reversed because x86 is little-endian: the lowest byte of a DWORD is stored at the lowest address. String data is normally written left to right, and you would normally stream it to the start address 1 character at a time. You see a bit of stuff done like this with the CPUID instruction when you want processor ID data, but that is hardware based, not code.
Try to tell us what you are doing and there may be a better way to do it.
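To see why the constants are written backwards, here is a small C sketch of little-endian packing. As far as I know, MASM treats a character constant such as "lleH" as a number whose first character is the most significant byte, so that character is stored last in memory. The hypothetical helpers "pack_le" and "store_le" below build the value whose bytes land in reading order and write it out lowest byte first, independent of the host CPU.

```c
#include <stdint.h>

/* Hypothetical helper: build the 32-bit value whose little-endian byte
   layout is s[0], s[1], s[2], s[3]. The first character is the LOW byte. */
uint32_t pack_le(const char s[4])
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}

/* Hypothetical helper: store a 32-bit value the way x86 does, lowest byte
   at the lowest address (what "mov dword ptr [mem], eax" does). */
void store_le(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xFF);
    dst[1] = (unsigned char)((v >> 8) & 0xFF);
    dst[2] = (unsigned char)((v >> 16) & 0xFF);
    dst[3] = (unsigned char)((v >> 24) & 0xFF);
}
```

For example, pack_le("Hell") is 0x6C6C6548. Reading its bytes from most to least significant gives 6C 6C 65 48, i.e. "lleH", which is the order you have to write in the DWORD immediate.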
Well, I want to try to use strings like I normally would in a higher-level language.
For instance, this is the way most people write strings in assembly right now (C++ pseudo-code):
std::string a = "hello world";
std::cout << a << std::endl;
The way I want to write my string output in assembly code:
std::cout << "hello world" << std::endl;
As you can see, the variable "a" would be defined in the data section. In the second example, the string is defined inline where it is used. Why the two versions? It gives more flexibility; perhaps there's some string I want to create that cannot be defined ahead of time.
Example (again):
1. Program asks for username
2. Program accepts username and combines it with string
3. Program can now output "Alek, hello world".
Alternatively, you could output each part separately in different calls, but from what I know those high-level Windows API calls are slow, and it would be faster to combine the string ahead of time. Of course this is only one reason, not counting "clarity" as a preference (to me it would make more sense coming from a higher-level language like C/C++).
Let me see, it sounds like you want to store text in an assembler module and call it from what looks like C++. With either ASCII or UNICODE the fastest way is to place the string data in the initialised data section and set pointers to each string.
.data
txt1 db "This is a test",0 ; the text
ptxt1 dq txt1 ; the pointer to it
If you have a large number of text items you could use a QWORD array of pointers in allocated memory (HeapAlloc or GlobalAlloc). With either you would just pass the address. ASCII data is byte aligned; Unicode data is at least 2-byte aligned.
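A minimal C sketch of the pointer-table idea, assuming malloc as a stand-in for HeapAlloc/GlobalAlloc: the strings stay where they are in static data, and only their addresses go into the allocated array (on a 64-bit build each slot is 8 bytes, like a QWORD). "make_text_table" is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: copy n string addresses into a newly allocated array.
   Only the pointers are stored; the text itself stays wherever it was defined.
   The caller owns the array and releases it with free().
   Returns NULL if the allocation fails. */
const char **make_text_table(const char *const *items, size_t n)
{
    const char **table = malloc(n * sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        table[i] = items[i];            /* store the address, not a copy of the text */
    return table;
}
```

Usage: build a local array of addresses such as { txt1, txt2 }, call make_text_table(items, 2), pass the returned address wherever the table is needed, and free it when done.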
I had a look at your page and it would be simplified with the install64.zip file. The problem is I must keep adding to the project so it will keep changing.
Re: the project name, it would require a complete rebuild to be able to use "MASM64", which I don't have the life and time for, so it makes more sense to call it "MASM32 64 bit version".
Hi Alek,
You can use the Masm32 library functions to receive keyboard input and write it to the console :
include \masm32\include\masm32rt.inc
BUFFER_SIZE = 128
.data
string1 db 'Please type your name : ',13,10,0
string2 db ', hello world.',0
.data?
buffer db BUFFER_SIZE dup(?)
.code
start:
invoke StdOut,ADDR string1
invoke StdIn,ADDR buffer,BUFFER_SIZE
invoke szCatStr,ADDR buffer,ADDR string2
invoke StdOut,ADDR buffer
invoke ExitProcess,0
END start
Another option is to use the functions from the MS C run-time library :
include \masm32\include\masm32rt.inc
BUFFER_SIZE = 128
.data
format db '%s',0
string1 db 'Please type your name : ',13,10,0
string2 db '%s, hello world.',0
.data?
buffer db BUFFER_SIZE dup(?)
.code
start:
invoke crt_printf,ADDR string1
invoke crt_scanf,ADDR format,ADDR buffer
invoke crt_printf,ADDR string2,ADDR buffer
invoke ExitProcess,0
END start
Hi Alek,
I don't know if this answers your question properly, but give it a try. It builds OK with the 64-bit version.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.data
; titl db "MessageBox",0
; pttl dq titl
; ifmt1 db "Array:",13,10 ; control string
; db "mas3[0] = %d"
; db " mas3[8] = %d",0
; ifmt1 db "Array:",13,10,"mas3[0] = %d"," mas3[8] = %d",0
; pcst dq ifmt1 ; pointer to control string
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
; invoke MessageBox,0,pcst,"MessageBox",MB_OK
rcall MessageBox,0,chr$("Array:",13,10,"mas3[0] = %d"," mas3[8] = %d"),"MessageBox",MB_OK
invoke ExitProcess,0
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
This is the batch file to build it.
@echo off
set appname=alex
if exist %appname%.obj del %appname%.obj
if exist %appname%.exe del %appname%.exe
\masm32\bin64\ml64.exe /c %appname%.asm
\masm32\bin64\polink.exe /SUBSYSTEM:WINDOWS /MACHINE:X64 /ENTRY:entry_point /nologo /LARGEADDRESSAWARE %appname%.obj
dir %appname%.*
pause
Hey guys sorry for the late reply (in the process of moving states!), I'll try all these suggestions here soon. Thanks again for all the help - will report back :biggrin:
Alright I had some time to go through the examples you guys had for me.
1. The first one which you posted hutch (Reply #12), wouldn't ptxt1 be the same as "offset txt1"? I could be mistaken.
2. Vortex - this is VERY close to what I was looking for. The MS C runtime example however doesn't look like the strings get combined - instead you are just printing them off together. The first example is close though.
-It looks like the stdin call moves whatever is in the input to the buffer
-string2 is then appended to the buffer
Cool, but can we do this WITHOUT making the stdin calls? For instance:
.data
buffer db BUFFER_SIZE dup(?)
.code
mov offset buffer, offset ["hello world"]
Also do you have the official docs on szCatStr?
3. The third example by hutch was a bit beyond me right now, I think mainly because of the UI stuff (I think it's UI). I'll have to check that out again once I get a bit better.
Hi Alek,
txt1 db "This is a test",0 ; the text
ptxt1 dq txt1 ; the pointer to it
Yes, in effect: the QWORD variable ptxt1 holds the same address that OFFSET txt1 gives you. As long as you put the pointer variable after the data, MASM will load the address of the string data into the QWORD variable. It gives you a tidy single variable that can be used with mnemonics or API/procedure calls. OFFSET still works in some contexts in 64-bit MASM, but it is not the same as in 32-bit. Linking with /LARGEADDRESSAWARE in Win64 causes problems with OFFSET, whereas LEA or the method above have no problems at all.
You will find "szCatStr proc src:QWORD, lpAdd:QWORD" in the "m64lib" directory. It's useful enough for a single text append, but there is a better, faster, macro-driven version called "mcat" where each append does not need to scan the length. This is straight out of the help file.
mcat
Add a number of strings to a pre-allocated buffer.
This can be either LOCAL or in the .DATA? section and
must be large enough to hold all of the characters.
mcat pBuffer,"String 1",ptxt1,str$(number),"More Text"
Limits are either text line length or 24 arguments, whichever comes first.
The third example is a bit messy, but it shows that assembler code can be written in many different ways. Unlike at least some high-level languages, there is no "One True Way" but many different ways to do things, and that is part of the flexibility of assembler coding. You obviously come from a C background, and that will be useful to you, but be careful not to let it become a straitjacket on the style of assembler you want to write.
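For comparison, here is a C sketch of the same idea as mcat, not the macro itself: one call appends several strings to a pre-allocated buffer while keeping a running write position, so nothing is rescanned. "cat_strings" is a hypothetical name; the argument list must end with a null pointer, and the output is truncated (but still zero terminated) if the buffer is too small.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: append every char* argument to buf, in order,
   until a NULL argument is reached. Returns the number of characters
   written, excluding the terminator. */
size_t cat_strings(char *buf, size_t size, ...)
{
    size_t pos = 0;                     /* running write position */
    va_list args;
    const char *s;

    if (size == 0)
        return 0;                       /* no room even for the terminator */
    va_start(args, size);
    while ((s = va_arg(args, char *)) != NULL) {
        while (*s != '\0' && pos + 1 < size)
            buf[pos++] = *s++;          /* copy a byte, advance the write position */
    }
    va_end(args);
    buf[pos] = '\0';                    /* always terminate */
    return pos;
}
```

Usage: cat_strings(buf, sizeof buf, "String 1", "More Text", (char *)NULL); the cast on the final NULL matters because a plain NULL passed through "..." is not guaranteed to be a pointer.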
Here is a test piece for you. I have attached it as a zip file. Two different methods of copying text data from one buffer to another. The first with a stack frame, the second without. Note that there are many ways to do things like this.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.data
text db "The time has come the walrus said,"
db " to speak of many things.",13,10,13,10,0 ; split line technique
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL buf1[128]:BYTE ; local 128 byte buffer
LOCAL ptr1 :QWORD ; pointer for source text
LOCAL pout :QWORD ; pointer for output buffer
lea rax, text ; load buffer address
mov ptr1, rax ; store it in LOCAL pointer
lea rax, buf1 ; get the output buffer address
mov pout, rax ; store it in LOCAL pointer
invoke cpytxt,ptr1,pout ; call the procedure
conout pout,lf ; display with console output
waitkey ; stop so you can see the results
.exit ; macro to exit process
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
CopyText proc psrc:QWORD,pdst:QWORD
mov r11, psrc ; load source address
mov r10, pdst ; load destination address
mov r9, -1 ; set up the loop increment
lbl:
add r9, 1 ; start at 0, increment offset
movzx rax, BYTE PTR [r11+r9] ; copy BYTE into a register
mov BYTE PTR [r10+r9], al ; copy the byte to output buffer
test rax, rax ; test if RAX is terminator
jnz lbl ; loop back if not
ret ; return to caller
CopyText endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
cpytxt proc
mov r11, -1 ; set up the loop increment
lbl:
add r11, 1 ; start at 0, increment offset
movzx rax, BYTE PTR [rcx+r11] ; copy BYTE into a register
mov BYTE PTR [rdx+r11], al ; copy the byte to output buffer
test rax, rax ; test if RAX is terminator
jnz lbl ; loop back if not
ret ; return to caller
cpytxt endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
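The byte loop in CopyText and cpytxt translates almost line for line into C. This sketch ("copy_text" is a hypothetical name) copies one byte, then stops once the byte just copied was the zero terminator, like the test rax, rax / jnz lbl pair. Like the assembler version, it does no bounds checking, so the destination must be large enough.

```c
#include <stddef.h>

/* Copy a zero-terminated string byte by byte, terminator included. */
void copy_text(const char *src, char *dst)
{
    size_t i = 0;                       /* offset into both buffers (r9 / r11) */
    char c;

    do {
        c = src[i];                     /* movzx rax, BYTE PTR [src+offset] */
        dst[i] = c;                     /* mov BYTE PTR [dst+offset], al */
        i++;                            /* add r9, 1 (done before the load in the asm) */
    } while (c != '\0');                /* test rax, rax / jnz lbl */
}
```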
What an optimizing C compiler does:
//#include <string.h>
#include <stdio.h>  /* for puts */
char * strcpy(char * dst, const char * src);
#pragma intrinsic(strcpy)
char *s = "test string";
int __cdecl main(void)
{
char buf[260];
strcpy(buf, s);
puts(buf);
return 0;
}
main:
[0000000000000000] 4881EC38010000 sub rsp,138h
[0000000000000007] 488B0500000000 mov rax,qword ptr [s]
[000000000000000E] 488D542420 lea rdx,[rsp+20h]
[0000000000000013] 482BD0 sub rdx,rax
[0000000000000016] 66660F1F840000000000 nop [rax+rax+0]
[0000000000000020] 0FB608 movzx ecx,byte ptr [rax]
[0000000000000023] 880C02 mov byte ptr [rdx+rax],cl
[0000000000000026] 488D4001 lea rax,[rax+1]
[000000000000002A] 84C9 test cl,cl
[000000000000002C] 75F2 jne 0000000000000020
[000000000000002E] 488D4C2420 lea rcx,[rsp+20h]
[0000000000000033] E800000000 call puts
[0000000000000038] 33C0 xor eax,eax
[000000000000003A] 4881C438010000 add rsp,138h
[0000000000000041] C3 ret
Strange, I always heard that Microsoft VS could not optimize better than other compilers. :badgrin:
#define _CRT_SECURE_NO_WARNINGS
#include <string.h>
#include <stdio.h>
char s[] = "test string";
int main()
{
00007FF719CF1000 sub rsp,138h
char buf[260];
strcpy(buf, s);
00007FF719CF1007 xor eax,eax
char buf[260];
strcpy(buf, s);
00007FF719CF1009 lea rdx,[s (07FF719CF3038h)]
00007FF719CF1010 movzx ecx,byte ptr [rax+rdx]
00007FF719CF1014 mov byte ptr buf[rax],cl
00007FF719CF1018 lea rax,[rax+1]
00007FF719CF101C test cl,cl
00007FF719CF101E jne main+10h (07FF719CF1010h)
puts(buf);
00007FF719CF1020 lea rcx,[buf]
00007FF719CF1025 call qword ptr [__imp_puts (07FF719CF2150h)]
return 0;
00007FF719CF102B xor eax,eax
}
00007FF719CF102D add rsp,138h
00007FF719CF1034 ret
OOPS, sorry it can do a little better: :t
#define _CRT_SECURE_NO_WARNINGS
#include <string.h>
#include <stdio.h>
char s[] = "test string";
int main()
{
00007FF7A4E51000 sub rsp,138h
char buf[260];
strcpy(buf, s);
00007FF7A4E51007 xor ecx,ecx
00007FF7A4E51009 lea rax,[s (07FF7A4E53020h)]
00007FF7A4E51010 mov al,byte ptr [rcx+rax]
00007FF7A4E51013 inc rcx
00007FF7A4E51016 mov byte ptr [rsp+rcx+1Fh],al
00007FF7A4E5101A test al,al
00007FF7A4E5101C jne main+9h (07FF7A4E51009h)
puts(buf);
00007FF7A4E5101E lea rcx,[buf]
00007FF7A4E51023 call qword ptr [__imp_puts (07FF7A4E520F0h)]
return 0;
00007FF7A4E51029 xor eax,eax
}
00007FF7A4E5102B add rsp,138h
00007FF7A4E51032 ret
The humorous part is that any of these algos, when even vaguely optimised, will hit the memory wall.
Quote from: hutch-- on December 09, 2018, 12:16:47 AM
The humorous part is that any of these algos, when even vaguely optimised, will hit the memory wall.
We need to keep watching that matter closely, but technology always finds solutions in due time. :icon_rolleyes:
Hi Alek,
Quote from: Alek on December 08, 2018, 11:50:03 AM
2. Vortex - this is VERY close to what I was looking for. The MS C runtime example however doesn't look like the strings get combined - instead you are just printing them off together. The first example is close though.
-It looks like the stdin call moves whatever is in the input to the buffer
-string2 is then appended to the buffer
Cool, but can we do this WITHOUT making the stdin calls? For instance:
.data
buffer db BUFFER_SIZE dup(?)
.code
mov offset buffer, offset ["hello world"]
Also do you have the official docs on szCatStr?
About the masm32 library functions :
\masm32\help\masmlib.chm -> Zero Terminated String Functions -> szCatStr
Here is the new C run-time example, in case you prefer the traditional strcat function :
include \masm32\include\masm32rt.inc
BUFFER_SIZE = 128
.data
format db '%s',0
string1 db 'Please type your name : ',13,10,0
string2 db ', hello world.',0
.data?
buffer db BUFFER_SIZE dup(?)
.code
start:
invoke crt_printf,ADDR string1
invoke crt_scanf,ADDR format,ADDR buffer
invoke crt_strcat,ADDR buffer,ADDR string2
invoke crt_printf,ADDR format,ADDR buffer
invoke ExitProcess,0
END start
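To build the combined string without any stdin call, as asked above, the same strcpy/strcat steps work with text that is already in memory. A minimal C sketch; "greet" is a hypothetical helper, and the length check is an addition that the assembler example does not make.

```c
#include <string.h>

#define BUFFER_SIZE 128

/* Hypothetical helper: write "<name>, hello world." into buffer, which must
   hold BUFFER_SIZE bytes. Returns 0 on success, -1 if the text would not fit. */
int greet(char *buffer, const char *name)
{
    static const char suffix[] = ", hello world.";

    if (strlen(name) + strlen(suffix) + 1 > BUFFER_SIZE)
        return -1;                      /* would not fit: leave the buffer alone */
    strcpy(buffer, name);               /* name goes at the start of the buffer */
    strcat(buffer, suffix);             /* strcat finds the terminator, then appends */
    return 0;
}
```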
The CTXT macro probably does what you wish; its definition can be found in the file \masm32\macros\macros.asm
include \masm32\include\masm32rt.inc
BUFFER_SIZE = 128
.data?
buffer db BUFFER_SIZE dup(?)
.code
start:
invoke StdOut,CTXT("Please type your name",13,10)
invoke StdIn,ADDR buffer,BUFFER_SIZE
invoke szCatStr,ADDR buffer,CTXT(", hello world.")
invoke StdOut,ADDR buffer
invoke ExitProcess,0
END start
Quote from: hutch-- on December 08, 2018, 05:16:48 PM
Here is a test piece for you. I have attached it as a zip file. Two different methods of copying text data from one buffer to another. The first with a stack frame, the second without. Note that there are many ways to do things like this.
<edited to save space>
Insane, and incredibly well documented too. This is awesome!
@Vortex: During this I also realized I could just call C functions, like strcpy. Any differences between this and szCatStr besides performance between the two functions? Is there a performance overhead of calling C functions?
Quote from: Alek on December 09, 2018, 09:55:17 AM
Is there a performance overhead of calling C functions?
Not really. If you are worried about performance, check the Faster Memcopy (http://masm32.com/board/index.php?topic=4067.0) thread.
Quote from: hutch-- on December 08, 2018, 05:16:48 PM
Note that there are many ways to do things like this.
Here is an extremely fast algo:
mov ecx, (sizeof somestring)/8
.Repeat
dec ecx
fld real8 ptr somestring[8*ecx]
fstp real8 ptr somedest[8*ecx]
.Until Zero?
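jj2007's loop moves 8 bytes per iteration through the FPU. Two things follow from the code as written: bytes past the last full 8-byte chunk are not copied (the count is sizeof/8, rounded down), and with fewer than 8 bytes ECX starts at 0, so DEC wraps it and the loop runs far past the buffer. A portable C sketch of the same chunked idea, swapping memcpy through a uint64_t for the FPU registers and handling the leftover bytes ("copy_qwords" is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes in 8-byte chunks, walking backwards like the .Repeat loop,
   then copy the 0..7 leftover bytes one at a time. memcpy through a uint64_t
   avoids alignment and aliasing problems; optimizing compilers typically turn
   it into a single 64-bit load and store. */
void copy_qwords(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t chunks = n / 8;
    size_t tail = chunks * 8;

    while (chunks > 0) {                /* checked before decrementing: safe when n < 8 */
        uint64_t q;
        chunks--;
        memcpy(&q, s + 8 * chunks, 8);
        memcpy(d + 8 * chunks, &q, 8);
    }
    for (size_t i = tail; i < n; i++)   /* bytes past the last full chunk */
        d[i] = s[i];
}
```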
Quote from: jj2007 on December 09, 2018, 11:34:36 AM
Quote from: Alek on December 09, 2018, 09:55:17 AM
Is there a performance overhead of calling C functions?
Not really. If you are worried about performance, check the Faster Memcopy (http://masm32.com/board/index.php?topic=4067.0) thread.
I'm not too concerned; I was just curious why someone would want to use szCatStr over strcpy.
Quote from: Alek on December 09, 2018, 11:37:52 AM
why someone would want to use szCatStr over strcpy
Check the docs, they don't do the same thing: szCatStr appends to the end of an existing string (like strcat), while strcpy overwrites the destination. There is also szMultiCat, btw; the Zero Terminated String Functions section in MasmLib.chm is a fascinating one 8)
Hi Alek,
Quote
@Vortex: During this I also realized I could just call C functions, like strcpy. Any differences between this and szCatStr besides performance between the two functions? Is there a performance overhead of calling C functions?
I don't think that there would be a dramatic difference in general. The C functions are good for practical programming.
Erol is right; you will tend to find that timing variations are usually hardware based, as different processors yield different times. The "szCatStr" proc scans the length, then writes to the end of the buffer. It is no slouch and it's handy for a few appends, but if you are doing a high count of appends, there are better ways to do it where you don't have to repeatedly scan the string length. The "szappend" proc is a lot more efficient with streamed appends, as it uses a current location pointer to update the next write address.
Quote from: hutch-- on December 09, 2018, 08:48:33 PM
The "szappend" is a lot more efficient with streamed appends as it uses a current location pointer to update the next write address.
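The "current location pointer" idea can be sketched in C as a function that returns the address of the terminator it just wrote, so the next append starts there instead of rescanning from the start (POSIX stpcpy behaves the same way). "append_at" is a hypothetical name, and like szCatStr it trusts the caller to have made the buffer large enough.

```c
/* Hypothetical helper: copy src to pos and return the address of the new
   zero terminator, ready to be used as the next write position. */
char *append_at(char *pos, const char *src)
{
    while ((*pos = *src) != '\0') {     /* copy a byte; stop after copying the terminator */
        pos++;
        src++;
    }
    return pos;                         /* address of the terminator just written */
}
```

Usage: start with char *p = buf; then p = append_at(p, "one"); p = append_at(p, ", two"); and so on. Each call costs only the length of the text being added, whereas a strcat-style append rescans everything already in the buffer.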
Is it faster than szMultiCat?
They are slightly different animals. The multicat algo takes a number of arguments and for that count it is fast enough, but if you are stream-appending a very large count to a buffer, the append algo avoids the length scan: it returns the end position of the last write, and the next write starts there.