News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

MultiByteToWideChar leaks

Started by jj2007, September 27, 2017, 07:35:01 PM

Previous topic - Next topic

jj2007

I stumbled over this because one of my little tools accumulated working space over time, so I investigated:
include \masm32\MasmBasic\MasmBasic.inc
.code
MbwRec2 proc
  push ecx
  call MbBufferGet ; we need a slot in the circular buffer
  mov ecx, [esp+8] ; source string
  push 0 ; 2 args for MbBufferFix
  push eax
  MemState("leaked kBytes in: %i\n")
  jecxz @F ; return a Null$
  invoke MultiByteToWideChar, CP_UTF8, 0, ecx, -1, eax, MbBufSize/8
  dec eax
  shl eax, 1 ; returned chars, we need bytes
  add [esp], eax
@@:
  MemState("leaked kBytes out: %i\n")
  call MbBufferFix
  pop ecx
  retn 4
MbwRec2 endp
  Init
  push 999 ; 1000 iterations
  .Repeat
push Chr$("Hello World, how are you today?")
call MbwRec2
dec stack
  .Until Sign?
EndOfCode


The MemState macro is a simple way to check if memory got lost between two points in code; it prints only if there is a loss. For the 1,000 iterations above it spits out this:leaked kBytes out: 4
leaked kBytes in: 20
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4
leaked kBytes out: 4


That is 4*17=68 kBytes lost, a significant leak, especially since the converted string is plain Ascii and just 21 bytes long.

Tested on Win7-64, WinXP and Win10, source and exe attached.

P.S.: When testing with 10 Mio iterations with 1024 byte strings, it turns out that after a while the API stops allocating more, at about 500kBytes; weird logic ::)

adeyblue

Quote
...
MBToWC called with eax=000000000044d804 and ecx=00000000005aa3d8
MBToWC called with eax=000000000044e01c and ecx=00000000005aa808
MBToWC called with eax=000000000044e834 and ecx=00000000005aa3d8
MBToWC called with eax=000000000044f04c and ecx=00000000005aa808
Nah, it's fine. The working set is bound to have a new page added every time the output string (eax) is in a different page. Also WinDbg breakpoint command are great.

FYI MultiByteToWideChar doesn't actually use any allocation functions at all, heap, virtual, anything. It's quite safe

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#include <psapi.h>
#pragma comment(lib, "psapi.lib")

ULONG totalKBLeaked = 0;
HANDLE hCurProcess = LongToHandle(-1);

void PrintWSChanges()
{
    PSAPI_WS_WATCH_INFORMATION wwi[100] = {0};
    PSAPI_WS_WATCH_INFORMATION* pWwi = wwi;
    // the first time through the printf code pages will show up
    if(GetWsChanges(hCurProcess, wwi, sizeof(wwi)))
    {
        for(; pWwi->FaultingPc; ++pWwi)
        {
            printf("\tPage at %p added by %p\n", pWwi->FaultingVa, pWwi->FaultingPc);
        }
    }
}

void __stdcall MbwRec2(const char* pStr, PWSTR pOutBuffer, int outLen)
{
    PROCESS_MEMORY_COUNTERS pmc1 = {}, pmc2 = {};
    GetProcessMemoryInfo(hCurProcess, &pmc1, sizeof(pmc1));
    MultiByteToWideChar(CP_UTF8, 0, pStr, -1, pOutBuffer, outLen);
    GetProcessMemoryInfo(hCurProcess, &pmc2, sizeof(pmc2));
    ULONG kbLeaked = (pmc2.WorkingSetSize - pmc1.WorkingSetSize) / 1024;
    totalKBLeaked += kbLeaked;
    if(kbLeaked)
    {
        printf("Diff between working sets is %lukb\n", kbLeaked);
        PrintWSChanges();
    }
}

int main(int argc, wchar_t** argv)
{
    int loops = 100000;
    WCHAR buffer[200];
    PWSTR pOutBuffer = buffer;
    DWORD bufSize = ARRAYSIZE(buffer) / 2;
    InitializeProcessForWsWatch(hCurProcess);
    for(int i = 1; i <= loops; ++i)
    {
        // allocate a new page every 10,000 loops
        if((i % 10000) == 0)
        {
            pOutBuffer = (PWSTR)VirtualAlloc(NULL, 0x1000, MEM_COMMIT, PAGE_READWRITE);
            bufSize = 0x1000 / 2;
        }
        MbwRec2("Hello World, how are you today?", pOutBuffer, bufSize);
    }
    printf("Total KB Leaked in %d loops=%lu\n", loops, totalKBLeaked);
}


Diff between working sets is 4kb
        Page at 001D0001 added by 776E79D6
        Page at 58CBABB1 added by 58CBABB0
        Page at 58CB53A1 added by 58CB53A0
        Page at 58CB2EF1 added by 58CB2EF0
        Page at 58D23A61 added by 58D23A60
        Page at 58D64EF1 added by 58D64EF0
        Page at 58CF8EA1 added by 58CF8EA0
        Page at 58CF9001 added by 58CF8FFF
        Page at 58C6E8AD added by 58CF91F9
        Page at 58CFA035 added by 58CF925D
        Page at 58CD11E1 added by 58CD11E0
        Page at 58CEB5A1 added by 58CEB5A0
        Page at 58CB6081 added by 58CB6080
        Page at 58D22CE1 added by 58D22CE0
        Page at 58D27A71 added by 58D27A70
        Page at 739A6255 added by 739A6254
        Page at 739CDCD1 added by 739CDCD0
Diff between working sets is 4kb
        Page at 001E0001 added by 776E79D6
Diff between working sets is 4kb
        Page at 00270001 added by 776E79D6
Diff between working sets is 4kb
        Page at 00280001 added by 776E79D6
Diff between working sets is 4kb
        Page at 00290001 added by 776E79D6
Diff between working sets is 4kb
        Page at 003A0001 added by 776E79D6
Diff between working sets is 4kb
        Page at 003B0001 added by 776E79D6
Diff between working sets is 4kb
        Page at 003C0001 added by 776E79D6
Diff between working sets is 4kb
        Page at 003D0001 added by 776E79D6
Diff between working sets is 4kb
        Page at 003E0001 added by 776E79D6
Total KB Leaked in 100000 loops=40

In this case 776E79D6 is the part of RtlUTF8ToUnicode that writes the output string and it predictably shows up the first time a new page is written to.

jj2007

Quote from: adeyblue on September 28, 2017, 11:27:43 AMThe working set is bound to have a new page added every time the output string (eax) is in a different page.

Thanks for looking into this, adeyblue. Both input and output string are always in the .data? section, and even if I write something to that buffer, it won't change the behaviour of MultiByteToWideChar: It adds 4k every now and then until the total bytes added reach about 500k. Since half a megabyte is not a big deal, I don't consider it a real problem, though.

Btw, compliments: Your code built "as is" with MSVC; Pelles C threw loads of errors, though, while GCC had only one complaint:Tmp.cpp: In function 'int main(int, wchar_t**)':
Tmp.cpp:44:37: error: 'ARRAYSIZE' was not declared in this scope
     DWORD bufSize = ARRAYSIZE(buffer) / 2;

mikeburr

i think your 500k is the default heap allocation limit [ presumably its the process heap ]   4096 is the basic page as you all know but from the stuff i did it seemed to preempt large numbers of heap allocations by grabbing large and increasing blocks of memory [ i was putting over 10 million records in compressed form in there]  Blocks  started [ this is from recollection: at about 0100000h to about 0500000h depending n whether it had a heavy lunch !]  because i had lists with child lists they werent  compacted in the  same page  the child pages being blatted from about 00400000 upwards .. the parent lists were quite obvious in memory being of the form 1F1000  or 3F1000   the last 1000 being the heap management i think though i have seen this at the front in some blocks where data always starts at 07D0 after the start of the page . This is a bit vague and only a DIM recollection by a DIM person ... i used in program displays [ like old fashioned Cobol programmers would i suppose} and Procwin to debug the programs and would like to thank them [Japeth?] profusely as it was so much easier than Olly or Soft ice  though i love the IDA free because it shows you your code in logical blocks and would quite like to write something similar but presumably requires a lot more experience of using GDI than i have at the moment   or completely controlling the creation of items on the window manually
regards Mike b

jj2007

Thanks, Mike. To me it seems that it keeps wasting little blocks until 500k are reached, then it becomes reasonable and reuses them somehow. My leak problem with that application is solved, it runs without problems. Same for the map viewer, it is pretty stable after a while, meaning there is no leak.

Quote from: mikeburr on October 08, 2017, 03:46:46 AMProcwin to debug the programs and would like to thank them [Japeth?] profusely as it was so much easier than Olly or Soft ice

You should give the deb macro a chance. I like Olly a lot and use it frequently, but even more I use deb, because it is often a lot faster to find a bug with deb, especially if you have a console.

All you need is MasmBasic and replacing the include stuff with a one-liner:

; instead of include \masm32\include\masm32rt.inc:
include \masm32\MasmBasic\MasmBasic.inc

.data
MyArray dd 25, 18, 23, 17, 9, 2, 6
HelloW$ db "Hello World", 0

.code
start:
  xor ebx, ebx ; set two non-volatile
  xor esi, esi ; registers to zero
  .Repeat
add esi, MyArray[4*ebx]
; deb 4, "in the loop", esi
inc ebx
  .Until ebx>=lengthof MyArray
  MsgBox 0, cat$(str$(esi), " is the sum"), offset HelloW$, MB_OK
  exit

end start

hutch--

You may be the victim of the lazy flush characteristic of allocated memory in much the same way as disk IO where it stays in cache for a while after the file is closed.