How to use thread local storage

Started by markallyn, February 06, 2018, 05:59:36 AM


markallyn

Hello everyone,

I've been trying to write a program that uses thread local storage.  The code below is an UNSUCCESSFUL effort to do so.  As commented, the code assembles and links, but it refuses to load.  This is connected with the _tls_used public symbol, which (ironically) I do not know how to code.  There may be other problems too, but this is certainly one of them.

Quote
;=======================================
;masmtls.asm
;This code will assemble and link.  It will not load at runtime. 

include \masm32\include64\masm64rt.inc


externdef __imp_GetCommandLineA:PROC
externdef __imp_MessageBoxA:PROC
   
PUBLIC   _tls_index
PUBLIC   _tls_used

IMAGE_TLS_DIRECTORY STRUCT
StartAddressOfRawData    QWORD 1
EndAddressOfRawData    QWORD 1
AddressOfIndex    QWORD 1
AddressOfCallBacks    QWORD 1
SizeOfZeroFill   DWORD 1
Characteristics    DWORD 1
IMAGE_TLS_DIRECTORY ENDS

;=======================================
.code
        main   PROC   
   push rdi ;save rdi
   sub rsp, 32 ; mandatory top 4 stack elements to be used by APIs
   
   call [__imp_GetCommandLineA]
   
   xor r9, r9
   lea r8, [label2]
   mov rdx, rax
   xor rcx, rcx
   
   call [__imp_MessageBoxA]
   
   
   add rsp, 32
   pop rdi
   ret
main   ENDP
   
   
tlsfunc    PROC
   push rdi
   sub rsp, 32
   
   
   xor r9, r9
   lea r8, [msg]
   lea rdx, [tlsmsg]
   xor rcx, rcx
   
   call [__imp_MessageBoxA]
   
   
   add rsp, 32
   pop rdi

   ret
tlsfunc   ENDP

;===============================================

.data      

msg   BYTE "Hello World!",0   ; the string to print, 10=line feed
tlsmsg   BYTE "I'm the TLS callback function.",0
label2 BYTE "Command line is",0

_tls_index QWORD 0
array_tls_index QWORD _tls_index, 0
array_tls_func QWORD tlsfunc, 0

;===============================================
;THIS IS WHAT I DON'T UNDERSTAND

_tls_used:   

StartAddressOfRawData IMAGE_TLS_DIRECTORY  <0>
EndAddressOfRawData IMAGE_TLS_DIRECTORY <0>
AddressOfIndex IMAGE_TLS_DIRECTORY <array_tls_index>
AddressOfCallBacks IMAGE_TLS_DIRECTORY <array_tls_func>
SizeOfZeroFill IMAGE_TLS_DIRECTORY <0>
Characteristics IMAGE_TLS_DIRECTORY <0>

END

I am pretty much a novice at masm coding, as you experts will see immediately.  Perhaps I'm way over my head with this problem, but I've spent considerable effort to make the program work and there isn't much on the Forum or the Web that provides guidance.

Thanks,
Mark Allyn

aw27

There are indeed many errors, namely you forgot how to initialize a structure and that _tls_used is already a variable used by the C runtime.
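
A minimal sketch of the structure-initialization point, using a hypothetical two-field structure (PAIR, fieldA, fieldB and the variable names are illustrative and not from the posted code). In MASM, a line of the form "name TYPE <...>" declares a complete variable of that structure type, so writing one such line per field creates several whole structures instead of filling in one. A field left out of the initializer takes the default from the STRUCT definition, which in the posted code is 1 rather than 0.

```asm
; Hypothetical structure, used only to illustrate the syntax.
PAIR STRUCT
    fieldA  QWORD 1         ; 1 is the default when the initializer omits this field
    fieldB  QWORD 1
PAIR ENDS

.data
; Two separate 16-byte PAIR variables, not two fields of one variable:
onlyA   PAIR <0>            ; fieldA = 0, fieldB = 1 (default)
onlyB   PAIR <7>            ; fieldA = 7, fieldB = 1 (default)

; One 16-byte PAIR variable with both fields set, in declaration order:
whole   PAIR <0, 7>         ; fieldA = 0, fieldB = 7
END
```

Applied to the posted code, this would mean that only the first 40 bytes after _tls_used are read as the TLS directory, and most of its fields hold the default value 1 instead of a valid address, which is one plausible reason for the load failure.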

PS: Note that I am not treating you like a 75 y.o. member, I answer this way to everybody.

jj2007

Mark,

As you noted, there are almost no examples around, but there is a short Accessing C __declspec(thread) from x64 MASM (ml64.exe) in the old forum, which I found searching for thread local storage here.

Sorry that I can't help you more, but I never used TLS, let alone in 64-bit land... keep us posted.

markallyn

Good afternoon/evening AW27 and JJ:

aw27:  I note with thanks that you treat me like everyone else.  That's just as it should be.  BTW, I had noticed this quite some time ago... You are very consistent. 

Thanks indeed for the tips.

JJ:  I will follow the link.  I will post what I learn when I have the thing up and running.  I would prefer nowadays (with encouragement from Hutch!) to post the actual code and a zip file as well.

Regards,
Mark

markallyn

aw27 and JJ:

Update:  aw27 was dead-on right about my incorrectly setting up and initializing the IMAGE_TLS_DIRECTORY struct.  I went about re-doing the STRUCT and made progress as a result.  The program now loads.

However, it doesn't start correctly: it jumps to a bad location instead of jumping to main.

Progress, but not finality.

Regards,
Mark

markallyn

aw27 and JJ:

OK.  Fiddling around and getting rid of push/pop rdi (see commented code) in the two functions made all the difference.  Don't know how they crept in....

Quote
;=======================================
;masmtls.asm
;This code will assemble and link.  It will load at runtime.
;It runs.  Critical change was getting rid of the push/pop rdi instructions in
;the two functions.  Also fixing up the creation and initializing of the STRUCT
;as suggested by aw27 this day, February 5, 2018.

include \masm32\include64\masm64rt.inc


externdef __imp_GetCommandLineA:PROC
externdef __imp_MessageBoxA:PROC
   
PUBLIC   _tls_index
PUBLIC   _tls_used

IMAGE_TLS_DIRECTORY STRUCT
StartAddressOfRawData    QWORD ?
EndAddressOfRawData    QWORD ?
AddressOfIndex    QWORD ?
AddressOfCallBacks    QWORD ?
SizeOfZeroFill   DWORD ?
Characteristics    DWORD ?
IMAGE_TLS_DIRECTORY ENDS

;=======================================
.code
        main   PROC   
   ;push rdi
   ;sub rsp, 48 ; mandatory top 4 stack elements to be used by APIs
   
   call __imp_GetCommandLineA
   
   xor r9, r9
   lea r8, [label2]
   mov rdx, rax
   xor rcx, rcx
   
   call __imp_MessageBoxA
   
   
   ;add rsp, 48
   ;pop rdi
   ret
main   ENDP
   
   
tlsfunc    PROC
   ;push rdi
   ;sub rsp, 48
   
   
   xor r9, r9
   lea r8, [msg]
   lea rdx, [tlsmsg]
   xor rcx, rcx
   
   call __imp_MessageBoxA
   
   
   ;add rsp, 48
   ;pop rdi

   ret
tlsfunc   ENDP

;===============================================

.data      

msg   BYTE "Hello World!",0   
tlsmsg   BYTE "I'm the TLS callback function.",0
label2  BYTE "Command line is",0

_tls_index QWORD 0
array_tls_index QWORD _tls_index, 0
array_tls_func QWORD tlsfunc, 0

;===============================================
;THIS IS WHAT the struct initialization now looks like.  Thanks to aw27.

_tls_used   IMAGE_TLS_DIRECTORY <0,0,array_tls_index,array_tls_func,0,0>   



END

So, it runs as hoped.  The moral of this little tale:  I still have a great deal to learn, and not a lot of time to do so.
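
One detail the posted tlsfunc does not use: the callback type declared in winnt.h (PIMAGE_TLS_CALLBACK) receives three arguments, a module handle, a reason code and a reserved pointer, and the loader invokes it for thread attach and detach as well as for process start-up. The sketch below is a hedged variant that acts only on process attach. It assumes plain ml64 without masm64rt.inc (so no prologue is generated automatically), and the names tlscap and tls_done are illustrative.

```asm
; Plain ml64 sketch (no masm64rt.inc, so the stack is adjusted by hand).
extern MessageBoxA:PROC

DLL_PROCESS_ATTACH EQU 1        ; value from winnt.h

.data
tlscap  BYTE "Hello World!",0
tlsmsg  BYTE "I'm the TLS callback function.",0

.code
; void NTAPI tlsfunc(PVOID DllHandle, DWORD Reason, PVOID Reserved)
;   rcx = DllHandle, edx = Reason, r8 = Reserved
tlsfunc PROC
    cmp  edx, DLL_PROCESS_ATTACH
    jne  tls_done               ; ignore thread attach/detach and process detach
    sub  rsp, 40                ; 32 bytes shadow space + 8 to realign RSP to 16
    xor  ecx, ecx               ; hWnd = NULL
    lea  rdx, tlsmsg            ; lpText
    lea  r8, tlscap             ; lpCaption
    xor  r9d, r9d               ; uType = MB_OK
    call MessageBoxA
    add  rsp, 40
tls_done:
    ret
tlsfunc ENDP
END
```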

Regards,
Mark Allyn

jj2007

Congrats, Mark!

Push+pop is a no-no in 64-bit land, I also had to learn that. It is because the memory below rsp gets used by some sorts of stack frame - "shadow space". A shadowy concept imho ;-)

markallyn

Good evening, JJ:

Thanks for the congrats.  When an expert like yourself offers this kind of encouragement it is much, much appreciated by novices such as I.  Tomorrow I shall post a further modification that makes the code shorter. 

Regards,
Mark

sinsi

>Push+pop is a no-no in 64-bit land

If you do your own stack adjustments then push and pop are perfectly OK.
Windows eleven is better :biggrin:

aw27

It is not a question of pushes and pops.
Hutch's STACKFRAME macro, activated through masm64rt.inc, made the stack adjustment for you. So when you push an odd number of 8-byte values, as you did, the stack becomes misaligned, and that caused the problem.
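
The arithmetic behind this can be sketched in plain ml64. The macro itself is not reproduced here: the first SUB merely stands in for a generated prologue that has already aligned RSP, and the procedure name demo is hypothetical. The Win64 convention expects RSP to be a multiple of 16 at every CALL, with 32 bytes of shadow space at the top of the stack.

```asm
; Plain ml64 sketch: the first SUB stands in for a macro-generated prologue.
extern GetCommandLineA:PROC

.code
demo PROC
                        ; on entry:     RSP = 16n + 8  (CALL pushed the return address)
    sub  rsp, 40        ; "prologue":   RSP = 16n - 32 (a multiple of 16)
    push rdi            ; 8-byte push:  RSP = 16n - 40 (no longer a multiple of 16)
    sub  rsp, 40        ; 8 bytes padding + 32 bytes shadow space:
                        ;               RSP = 16n - 80 (aligned again)
    call GetCommandLineA
    add  rsp, 40        ; drop padding and shadow space
    pop  rdi
    add  rsp, 40        ; undo the stand-in prologue
    ret
demo ENDP
END
```

Without the second SUB, the call would be made with a misaligned RSP, and the saved rdi would sit inside the area the callee is allowed to use as shadow space.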

markallyn

Hello everyone,

I tested aw27's suggestion regarding the misaligned stack.  I did so by uncommenting the push rdi instruction in the calling program shown above and inserting and rsp, -16 immediately afterwards.

In other words:
Quote
push rdi
and rsp, -16

The program ran correctly.  Without the second instruction in the snippet the program crashes as before.
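
For reference, and rsp, -16 clears the low four bits of RSP, rounding it down to the nearest multiple of 16. The original value is lost in the process, so hand-written code usually keeps a copy to restore from. The sketch below assumes plain ml64 without masm64rt.inc; keeping the copy in rbp and the procedure name aligned_call are choices made for illustration.

```asm
; Plain ml64 sketch: force alignment, then restore the exact original RSP.
extern GetCommandLineA:PROC

.code
aligned_call PROC
    push rbp
    mov  rbp, rsp       ; keep the exact incoming RSP
    and  rsp, -16       ; -16 = ...FFF0h: clear the low 4 bits, round down to 16
    sub  rsp, 32        ; shadow space; subtracting a multiple of 16 keeps alignment
    call GetCommandLineA
    mov  rsp, rbp       ; undo the AND and the SUB in one step
    pop  rbp
    ret
aligned_call ENDP
END
```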

Regards,
Mark Allyn

markallyn

Hello everyone,

One of the puzzling aspects of this little TLS episode is where to find the structure definition of IMAGE_TLS_DIRECTORY.  Nowadays it may be found in WINNT.H, but as far as I can tell it is not in Win64.inc.  So, I had to construct my own definition as shown in the code.  But surely there must be a better way to go about it.  It occurred to me that one could convert the winnt.h file to a winnt.inc file by, for example, using Japheth's h2incx.exe.

Perhaps a bigger question for a beginner like me is why any of the .h files in \masm32\include are in the package at all.  Unless they are there for reference only? Or to be converted into .inc files as needed?

Regards to all,
Mark

hutch--

Mark,

Unless you have added more, the only .H file in the include directory is "resource.h" which is the form that RC.EXE requires when compiling resources. RE: Japheth's h2inc utility, long ago Microsoft stopped using their own earlier version because the conversions were very poor, Japheth's version is not much better and requires a massive amount of editing to fix all of the incompatible results.

Both the 32 and 64 bit versions of MASM are very picky about the include files and will squawk hundreds of errors if you have any errors in the include file.

Thread local storage is basically something you use with a current Microsoft C compiler, it's a lot of messing around for no real gain at a lower level. Dynamic memory allocation is a better system and you don't have to mess around with TLS at all.

markallyn

Good afternoon/morning/whatever, Hutch,

Let's see.  You are 16 hours ahead of me and it's about 6:30 PM here, so it really should be good morning, Hutch.

Squawks is for sure.  I tried to include winnt.h and there was a blizzard of them.

And, it occurred to me after posting that even if I successfully converted winnt.h to winnt.inc there would probably be definition conflicts between that file and others in the include64 directory, although I didn't check this.

And, it also occurred to me that the .h files would in fact be quite useful if you wanted to write C code and call .asm functions with them. 

As for the utility of TLS callbacks, I have no doubt you are absolutely correct that they don't buy much that couldn't be gained by other means.  Since I am in my current state of beautiful idleness (i.e. I don't have to earn a living), I don't mind exploring and learning via failed experiments.  And The Campus is generally such a congenial place to learn from experts that I am a happy guy.

Regards,
Mark

hutch--

Mark,

Something you need to get the swing of is how the STACKFRAME macros work. The entry point is always correctly aligned and any procedure that you call that uses a stack frame is also correctly aligned. When you call a procedure that has no stack frame, it uses the existing alignment from the calling procedure so it is also always correctly aligned. Now you can use PUSH / POP in certain circumstances when the code is only working in 64 bit integer instructions but if you use LOCAL variables with the alignment set to SSE or AVX you mess up the alignment of that data.

In most instances you are better off using LOCAL variables and using the MOV mnemonic rather than using PUSH / POP as it is generally faster AND does not modify the alignment. The Win64 ABI is a complicated mess but it is also more efficient than the old STDCALL used in Win32 that used the stack to pass arguments. The alignment is for performance reasons and using 4 registers for the 1st 4 arguments is more efficient than 4 stack pushes. As there are a large number of API functions and locally written procedures that use 4 or fewer arguments, register passing using the normal FASTCALL has less overhead than the format used in Win32.
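
A small hand-written sketch of the MOV-based approach, assuming plain ml64 without the STACKFRAME macro (the procedure name show_box, the strings and the slot offset are illustrative). The stack is adjusted once, the register is saved into a fixed slot with MOV, and the four arguments travel in rcx, rdx, r8 and r9, so RSP never moves between the prologue and the call.

```asm
; Plain ml64 sketch: one stack adjustment, MOV-based save, register arguments.
extern MessageBoxA:PROC

.data
boxTitle BYTE "Hello World!",0
boxText  BYTE "Saved rdi with MOV, not PUSH.",0

.code
show_box PROC
    sub  rsp, 40            ; 32 bytes shadow space + one 8-byte save slot;
                            ; entry RSP = 16n + 8, so 16n - 32 is a multiple of 16
    mov  [rsp+32], rdi      ; save rdi above the shadow space; RSP does not move
    xor  ecx, ecx           ; argument 1: hWnd = NULL
    lea  rdx, boxText       ; argument 2: lpText
    lea  r8, boxTitle       ; argument 3: lpCaption
    xor  r9d, r9d           ; argument 4: uType = MB_OK
    call MessageBoxA
    mov  rdi, [rsp+32]      ; restore rdi
    add  rsp, 40
    ret
show_box ENDP
END
```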

It is no joy to digest, but the STACKFRAME macro and its variants show how it all works, and probably the only really safe way to write general purpose code in Win64 is to automate the calling method. The type of code that was around when I first started on Win64 was full of highly unreliable stack twiddling techniques, and the whole purpose of designing the STACKFRAME macro was to get reliable code without the hassles of stack twiddling.