I stumbled over this because one of my little tools accumulated working space over time, so I investigated:
include \masm32\MasmBasic\MasmBasic.inc
.code
MbwRec2 proc
push ecx
call MbBufferGet ; we need a slot in the circular buffer
mov ecx, [esp+8] ; source string
push 0 ; 2 args for MbBufferFix
push eax
MemState("leaked kBytes in: %i\n")
jecxz @F ; return a Null$
invoke MultiByteToWideChar, CP_UTF8, 0, ecx, -1, eax, MbBufSize/8
dec eax
shl eax, 1 ; returned chars, we need bytes
add [esp], eax
@@:
MemState("leaked kBytes out: %i\n")
call MbBufferFix
pop ecx
retn 4
MbwRec2 endp
Init
push 999 ; 1000 iterations
.Repeat
push Chr$("Hello World, how are you today?")
call MbwRec2
dec stack
.Until Sign?
EndOfCode
The MemState (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1368) macro is a simple way to check if memory got lost between two points in code; it prints only if there is a loss. For the 1,000 iterations above it spits out this:
leaked kBytes out: 4
leaked kBytes in: 20
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
That is 4*17=68 kBytes lost, a significant leak, especially since the converted string is plain ASCII and just 31 bytes long.
Tested on Win7-64, WinXP and Win10, source and exe attached.
P.S.: When testing with 10 million iterations and 1024-byte strings, it turns out that after a while the API stops allocating more, at about 500 kBytes; weird logic.
Quote
...
MBToWC called with eax=000000000044d804 and ecx=00000000005aa3d8
MBToWC called with eax=000000000044e01c and ecx=00000000005aa808
MBToWC called with eax=000000000044e834 and ecx=00000000005aa3d8
MBToWC called with eax=000000000044f04c and ecx=00000000005aa808
Nah, it's fine. The working set is bound to have a new page added every time the output string (eax) is in a different page. Also, WinDbg breakpoint commands are great.
FYI, MultiByteToWideChar doesn't actually use any allocation functions at all: heap, virtual, anything. It's quite safe.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#include <psapi.h>
#pragma comment(lib, "psapi.lib")
ULONG totalKBLeaked = 0;
HANDLE hCurProcess = LongToHandle(-1);
void PrintWSChanges()
{
PSAPI_WS_WATCH_INFORMATION wwi[100] = {0};
PSAPI_WS_WATCH_INFORMATION* pWwi = wwi;
// the first time through the printf code pages will show up
if(GetWsChanges(hCurProcess, wwi, sizeof(wwi)))
{
for(; pWwi->FaultingPc; ++pWwi)
{
printf("\tPage at %p added by %p\n", pWwi->FaultingVa, pWwi->FaultingPc);
}
}
}
void __stdcall MbwRec2(const char* pStr, PWSTR pOutBuffer, int outLen)
{
PROCESS_MEMORY_COUNTERS pmc1 = {}, pmc2 = {};
GetProcessMemoryInfo(hCurProcess, &pmc1, sizeof(pmc1));
MultiByteToWideChar(CP_UTF8, 0, pStr, -1, pOutBuffer, outLen);
GetProcessMemoryInfo(hCurProcess, &pmc2, sizeof(pmc2));
ULONG kbLeaked = (pmc2.WorkingSetSize - pmc1.WorkingSetSize) / 1024;
totalKBLeaked += kbLeaked;
if(kbLeaked)
{
printf("Diff between working sets is %lukb\n", kbLeaked);
PrintWSChanges();
}
}
int main(int argc, wchar_t** argv)
{
int loops = 100000;
WCHAR buffer[200];
PWSTR pOutBuffer = buffer;
DWORD bufSize = ARRAYSIZE(buffer) / 2;
InitializeProcessForWsWatch(hCurProcess);
for(int i = 1; i <= loops; ++i)
{
// allocate a new page every 10,000 loops
if((i % 10000) == 0)
{
pOutBuffer = (PWSTR)VirtualAlloc(NULL, 0x1000, MEM_COMMIT, PAGE_READWRITE);
bufSize = 0x1000 / 2;
}
MbwRec2("Hello World, how are you today?", pOutBuffer, bufSize);
}
printf("Total KB Leaked in %d loops=%lu\n", loops, totalKBLeaked);
}
Diff between working sets is 4kb
Page at 001D0001 added by 776E79D6
Page at 58CBABB1 added by 58CBABB0
Page at 58CB53A1 added by 58CB53A0
Page at 58CB2EF1 added by 58CB2EF0
Page at 58D23A61 added by 58D23A60
Page at 58D64EF1 added by 58D64EF0
Page at 58CF8EA1 added by 58CF8EA0
Page at 58CF9001 added by 58CF8FFF
Page at 58C6E8AD added by 58CF91F9
Page at 58CFA035 added by 58CF925D
Page at 58CD11E1 added by 58CD11E0
Page at 58CEB5A1 added by 58CEB5A0
Page at 58CB6081 added by 58CB6080
Page at 58D22CE1 added by 58D22CE0
Page at 58D27A71 added by 58D27A70
Page at 739A6255 added by 739A6254
Page at 739CDCD1 added by 739CDCD0
Diff between working sets is 4kb
Page at 001E0001 added by 776E79D6
Diff between working sets is 4kb
Page at 00270001 added by 776E79D6
Diff between working sets is 4kb
Page at 00280001 added by 776E79D6
Diff between working sets is 4kb
Page at 00290001 added by 776E79D6
Diff between working sets is 4kb
Page at 003A0001 added by 776E79D6
Diff between working sets is 4kb
Page at 003B0001 added by 776E79D6
Diff between working sets is 4kb
Page at 003C0001 added by 776E79D6
Diff between working sets is 4kb
Page at 003D0001 added by 776E79D6
Diff between working sets is 4kb
Page at 003E0001 added by 776E79D6
Total KB Leaked in 100000 loops=40
In this case 776E79D6 is the part of RtlUTF8ToUnicode that writes the output string and it predictably shows up the first time a new page is written to.
Quote from: adeyblue on September 28, 2017, 11:27:43 AM
The working set is bound to have a new page added every time the output string (eax) is in a different page.
Thanks for looking into this, adeyblue. Both input and output strings are always in the .data? section, and even if I write something to that buffer, it won't change the behaviour of MultiByteToWideChar: it adds 4k every now and then until the total bytes added reach about 500k. Since half a megabyte is not a big deal, I don't consider it a real problem, though.
Btw, compliments: Your code built "as is" with MSVC; Pelles C threw loads of errors, though, while GCC had only one complaint:
Tmp.cpp: In function 'int main(int, wchar_t**)':
Tmp.cpp:44:37: error: 'ARRAYSIZE' was not declared in this scope
DWORD bufSize = ARRAYSIZE(buffer) / 2;
I think your 500k is the default heap allocation limit (presumably it's the process heap). 4096 is the basic page size, as you all know, but from the stuff I did it seemed to preempt large numbers of heap allocations by grabbing large and increasing blocks of memory (I was putting over 10 million records in compressed form in there). Blocks started (this is from recollection) at about 0100000h to about 0500000h, depending on whether it had a heavy lunch! Because I had lists with child lists, they weren't compacted in the same page, the child pages being blatted from about 00400000 upwards. The parent lists were quite obvious in memory, being of the form 1F1000 or 3F1000, the last 1000 being the heap management, I think, though I have seen this at the front in some blocks, where data always starts at 07D0 after the start of the page. This is a bit vague and only a DIM recollection by a DIM person...
I used in-program displays (like old-fashioned Cobol programmers would, I suppose) and Procwin to debug the programs and would like to thank them [Japeth?] profusely as it was so much easier than Olly or Soft ice, though I love the IDA Free because it shows you your code in logical blocks. I would quite like to write something similar, but presumably that requires a lot more experience of using GDI than I have at the moment, or completely controlling the creation of items on the window manually.
Regards, Mike b
Thanks, Mike. To me it seems that it keeps wasting little blocks until 500k are reached, then it becomes reasonable and reuses them somehow. My leak problem with that application is solved, it runs without problems. Same for the map viewer, it is pretty stable after a while, meaning there is no leak.
Quote from: mikeburr on October 08, 2017, 03:46:46 AM
Procwin to debug the programs and would like to thank them [Japeth?] profusely as it was so much easier than Olly or Soft ice
You should give the deb macro (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1019) a chance. I like Olly a lot and use it frequently, but even more I use deb, because it is often a lot faster to find a bug with deb, especially if you have a console.
All you need is MasmBasic (http://masm32.com/board/index.php?topic=94.0) and replacing the include stuff with a one-liner:
; instead of include \masm32\include\masm32rt.inc:
include \masm32\MasmBasic\MasmBasic.inc
.data
MyArray dd 25, 18, 23, 17, 9, 2, 6
HelloW$ db "Hello World", 0
.code
start:
xor ebx, ebx ; set two non-volatile
xor esi, esi ; registers to zero
.Repeat
add esi, MyArray[4*ebx]
; deb 4, "in the loop", esi
inc ebx
.Until ebx>=lengthof MyArray
MsgBox 0, cat$(str$(esi), " is the sum"), offset HelloW$, MB_OK
exit
end start
You may be the victim of the lazy flush characteristic of allocated memory, in much the same way as with disk IO, where data stays in cache for a while after the file is closed.