The MASM Forum

General => The Campus => Topic started by: TBRANSO1 on January 24, 2019, 03:44:48 PM

Title: Amazed or Mortified?
Post by: TBRANSO1 on January 24, 2019, 03:44:48 PM
Well, I just produced the most useless threading program.  I did it for practice.  I can't help but be amazed at the depth of the MASM32 library, but it's kind of scary that we can produce this with such ease in assembly.

VOILA!!!
For the beginners in the crowd... this is a program demonstrating kernel threads, and how to modify global variables without synchronization tools like mutexes or semaphores.  This uses the Thread Local Storage API: one global index gives every thread its own private slot, so each thread gets its own "global" for use and doesn't mess with the other threads' use of theirs.


include \masm32\include\masm32rt.inc

; ExitProcess, ExitThread and the Tls* functions are already
; prototyped by masm32rt.inc, so only our own procedures go here.
CommonFunc PROTO
ThreadFunction PROTO :DWORD
ExitError PROTO :LPSTR

.CONST
threadcount EQU 4

.DATA
tlsAllcFailedMsg DB "TlsAlloc failed!, error code: ", 0
dwCreateThdMsg DB "Create thread error, error code: ", 0
tlsValErrMsg DB "TlsGetValue error, error code: ", 0
tlsSetValErrMsg DB "TlsSetValue error, error code: ", 0

.DATA?
dwTlsIndex DWORD ?

.CODE
start:
    call main
    inkey
    push 0
    call ExitProcess

; Reads the calling thread's slot and prints it.
CommonFunc PROC
    LOCAL lpvData:LPVOID

    push dwTlsIndex
    call TlsGetValue
    mov lpvData, eax
    .IF lpvData == 0
        ; 0 is also a legal stored value, so it only counts as a
        ; failure when GetLastError reports an error
        INVOKE GetLastError
        .IF eax != 0
            INVOKE ExitError, addr tlsValErrMsg
        .ENDIF
    .ENDIF

    INVOKE GetCurrentThreadId

    printf("Common: thread %d: lpvData = %lx\n", eax, lpvData)

    INVOKE Sleep, 1000
    ret                         ; without this we fall into ThreadFunction
CommonFunc ENDP

; Thread entry point: CreateThread passes one parameter (unused here).
ThreadFunction PROC lpParam:DWORD
    LOCAL lpvData:LPVOID

    push 0100h                  ; uBytes: 256 bytes
    push LPTR                   ; uFlags: fixed, zero-initialised
    call LocalAlloc
    mov lpvData, eax
    INVOKE TlsSetValue, dwTlsIndex, lpvData
    .IF (!eax)
        INVOKE ExitError, addr tlsSetValErrMsg
    .ENDIF

    call CommonFunc

    INVOKE TlsGetValue, dwTlsIndex
    .IF eax != 0
        INVOKE LocalFree, eax   ; free this thread's block
    .ENDIF

    xor eax, eax
    ret
ThreadFunction ENDP

ExitError PROC lpszMessage:LPSTR
    INVOKE GetLastError
    printf("%s %d\n", lpszMessage, eax)   ; print the code itself, not a str$ pointer
    push 0
    call ExitProcess
ExitError ENDP

main PROC
    LOCAL IDThread:DWORD
    LOCAL hThread[threadcount]:HANDLE
    LOCAL i:DWORD

    INVOKE TlsAlloc
    mov dwTlsIndex, eax         ; keep the index for every thread to use
    .IF eax == TLS_OUT_OF_INDEXES
        INVOKE ExitError, addr tlsAllcFailedMsg
    .ENDIF

    mov i, 0
    .WHILE i < threadcount
        INVOKE CreateThread, 0, 0, addr ThreadFunction, 0, 0, addr IDThread
        .IF eax == 0
            INVOKE ExitError, addr dwCreateThdMsg
        .ENDIF
        mov ecx, i
        mov [hThread+ecx*4], eax    ; slots 0..threadcount-1
        inc i                       ; increment after storing the handle
    .ENDW

    INVOKE WaitForMultipleObjects, threadcount, addr hThread, TRUE, INFINITE

    mov i, 0
    .WHILE i < threadcount
        mov ecx, i
        INVOKE CloseHandle, [hThread+ecx*4]
        inc i
    .ENDW

    INVOKE TlsFree, dwTlsIndex

    xor eax, eax
    ret
main ENDP
end start


8)
Title: Re: Amazed or Mortified?
Post by: hutch-- on January 25, 2019, 06:25:49 AM
After a while you will get used to it, you can do truly wicked things with MASM.  :biggrin:
Title: Re: Amazed or Mortified?
Post by: zedd151 on January 25, 2019, 06:31:00 AM
I never did really understand Thread Local Storage, but this thread (pardon the pun) sparked my interest in doing a little more research, to better my understanding of how it works.
Title: Re: Amazed or Mortified?
Post by: AK_AK on January 26, 2019, 12:31:04 PM
Feel the power, it puts a 30-30 to shame !  :badgrin:

i was amazed when i first learned of TLS, let alone how it could be used; there is so much potential there.
just remember, the stronger you are, the greater strides you should take to show your gentle side.
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 26, 2019, 03:43:49 PM
Jeesh, I didn't know that Thread Local Storage would turn so many people on.  :shock:
Title: Re: Amazed or Mortified?
Post by: jj2007 on January 26, 2019, 08:17:38 PM
There is a simple reason: Our algos are so blazing fast that we never need multiple threads. And in the rare cases we need TLS, we just roll our own  8)
Title: Re: Amazed or Mortified?
Post by: hutch-- on January 26, 2019, 11:53:47 PM
I know the environment that TLS was designed for but I have yet to see the reason to use it with assembler code in Windows. Any memory you allocate in a thread has its own separate handle so there is no chance of overlap (this is what protected mode is for). As long as you do not try to use memory common to all threads, i.e. GLOBAL allocation, I have yet to see the gain of TLS.

Even if you have some reason to use GLOBAL memory, there are methods of access control so that two or more threads don't clash in their memory access.
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 27, 2019, 01:45:40 AM
Hutch,

Since I got your attention...

If I wanted to use mostly the crt libraries, what is the smallest set of includes I need, instead of the more general masm32rt.inc?

I ask b/c it seems if I don't use the general blanket masm32rt.inc, I get crt link errors all over the place.

If you understand what I am asking?
Title: Re: Amazed or Mortified?
Post by: hutch-- on January 27, 2019, 01:58:00 AM
I don't know exactly what you are doing but if you have a look at the MSVCRT.INC file in MASM32, it prefixes "crt_" to the list of function names (so printf becomes crt_printf). You will get errors if you try to use the original MSVCRT function names. The problem is a naming clash, so the prefix solved the problem. What I don't know is if you need prototypes for API calls for MSVCRT.
Title: Re: Amazed or Mortified?
Post by: HSE on January 27, 2019, 06:37:27 AM
Most libraries are very modular. If you don't use a function, that function is not added to your program.
Prototypes are already declared when masm32rt.inc is included.
Title: Re: Amazed or Mortified?
Post by: AK_AK on January 27, 2019, 08:29:11 AM
i think it is somewhere in the MSDN but i recall the TLS being described as a way to [Xpass dataX] make data commonly available
[X with a thread along with the call for execution.X] among threads of a parent process
a backdoor of a sort that allows threads to [Xpass information to a separate thread.X] share a common pool

https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-local-storage

here is a gem, if you are using multiple threads, and they must access the same data set, there is less overhead than synching threads to pass data.
the threads use a common "structure" and can read, write and modify with no burden of lock/synch/release heuristics, thus faster.

The TLS comes in when a data structure can be used locally in the thread, and results or changes can be reflected ASYNCHRONOUSLY in the common data structure


there is a way that a thread could be created under the parent process of a "target" thread and be used to manipulate the data.
this is what i found intriguing about TLS, and gave me concern about the potential for abuse.

more about TLS here :

https://software.intel.com/en-us/articles/use-thread-local-storage-to-reduce-synchronization
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 27, 2019, 10:52:31 AM
Quote from: AK_AK on January 27, 2019, 08:29:11 AM
i think it is somewhere in the MSDN but i recall the TLS being described as a way to [Xpass dataX] make data commonly available
[X with a thread along with the call for execution.X] among threads of a parent process
a backdoor of a sort that allows threads to [Xpass information to a separate thread.X] share a common pool

Isn't that one hell of a definition?  :dazzled:
Title: Re: Amazed or Mortified?
Post by: AK_AK on January 27, 2019, 11:23:46 AM
i edited the comment but wanted to be fair and included my mistakes so [XblahblahblahX] is junk,

let's try this: if you have a bunch of data and it doesn't matter what time the data is changed, or if the changes are in proper order, then you can use TLS to share data. that means only the change in data is important, and threads using it are communicating by way of a post-it note mechanism.

otherwise multithreading will need to lock the data, change it then unlock it so other threads can access the data.

a real world example, would be:  interest calculations on multiple bank accounts, or stock picks in a real time stock trading application.
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 27, 2019, 11:35:03 AM
Quote from: AK_AK on January 27, 2019, 11:23:46 AM
lets try this: if you have a bunch of data and it doesn't matter what time the data is changed, or if the changes are in proper order, then you can use TLS to share data . that means only the change in data is important, and threads using it are communicating by way of a post-it note mechanism.

a real world example, would be:  interest calculations on multiple bank accounts, or stock picks in a real time stock trading application.

I like the post-it note explanation.  :icon14:
Title: Re: Amazed or Mortified?
Post by: hutch-- on January 27, 2019, 11:36:38 AM
There are a few tricks when creating multiple threads with the same data. With CreateThread() you pass a structure full of whatever you need the thread to have, the caller then runs a spinlock that waits for a reply from the newly created thread and when the reply is received, it then creates the next one. Without this the structure is overwritten by the following thread creation.
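
A minimal sketch of that handshake in C. It assumes POSIX threads and C11 atomics as stand-ins for CreateThread() and the spinlock, so it builds outside Windows; ThreadArgs, worker and run_handshake are hypothetical names, not anything from MASM32.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define THREAD_COUNT 4

/* Hypothetical parameter block; the field names are illustrative. */
typedef struct {
    int id;            /* whatever the new thread needs */
    atomic_int taken;  /* child sets this to 1 once it has copied the block */
} ThreadArgs;

static int results[THREAD_COUNT];

static void *worker(void *param)
{
    ThreadArgs *shared = param;
    int my_id = shared->id;           /* copy first ... */
    atomic_store(&shared->taken, 1);  /* ... then reply to the creator */
    results[my_id] = my_id * 10;      /* from here on, use only the private copy */
    return NULL;
}

/* Starts THREAD_COUNT workers from one reused block. Returns the sum of
   their results, or -1 if a thread could not be created. */
int run_handshake(void)
{
    pthread_t threads[THREAD_COUNT];
    ThreadArgs args = {0};            /* one block, reused for every thread */
    int i, sum = 0, started = 0;

    for (i = 0; i < THREAD_COUNT; i++) {
        args.id = i;
        atomic_store(&args.taken, 0);
        if (pthread_create(&threads[i], NULL, worker, &args) != 0)
            break;
        started++;
        while (atomic_load(&args.taken) == 0)
            sched_yield();            /* spin until the child has replied */
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    if (started != THREAD_COUNT)
        return -1;
    for (i = 0; i < THREAD_COUNT; i++)
        sum += results[i];
    return sum;
}
```

If the wait loop is removed, the creator can overwrite args.id before the previous child has read it, which is the overwrite described above.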
Title: Re: Amazed or Mortified?
Post by: AK_AK on January 27, 2019, 11:49:37 AM
...if i understand you [Hutch] correctly, you are describing why, with pure assembler, there is no need for TLS [ TLS is win32 programming ]
going through the literature, straddling between MASM and win32 programming is a way to make something simple into something unnecessarily complicated.

What i found interesting is what seems to be a bunch of WIN32 functions [such as TLS] that are stop gaps for very exclusive shortcomings of the win32 language.

pure MASM seems to not have this problem. The condition of mentally straddling both languages came about for me when i began writing MASM code but making calls to the windows API functions from an assembler routine.

Perhaps this may explain why im so scatter brained about it for the time being?
Title: Re: Amazed or Mortified?
Post by: hutch-- on January 27, 2019, 11:59:21 AM
Its more the case that the reference material is all over the place and it caters for a variety of different languages where assembler code is relatively free of many of the mechanisms of higher level languages. You lost the familiarity of the high level language to gain the access to many other things. It is often the case that something that is useful in one language simply has no value in another.

In assembler you are free of most of these mechanisms but have to construct your own. It is more work but it gives you the control you need to do different things. Much high level code that you have to deal with to interact with the OS is like shoveling sewerage where the real deal is architectural freedom and algorithms written in pure mnemonics when you are chasing speed and/or power.
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 27, 2019, 12:41:51 PM
Quote from: hutch-- on January 27, 2019, 11:36:38 AM
There are a few tricks when creating multiple threads with the same data. With CreateThread() you pass a structure full of whatever you need the thread to have, the caller then runs a spinlock that waits for a reply from the newly created thread and when the reply is received, it then creates the next one. Without this the structure is overwritten by the following thread creation.

I will tackle this exercise next, sounds interesting.

Lots of gems in this thread by you guys.

Hutch,

As for your libraries, they are extensive.  Since I created the Powershell script to search for functions, I have found mostly what I am looking for... the cool thing is that the returns from the script give me the library, line numbers where the functions are mentioned.

I'm a noob to Win32 API, and Windows programming.  The naming conventions were a tough nut to crack at first, but I'm getting it down.  I've got a pdf copy of Petzold's famous book, so I need to go through that and learn more.

The confusing thing about all of the Microsoft DLLs is that there are so many DLLs, and each one from each generation seems to change the data types and functions a little bit.  And I realize that since the Windows OS has been a target of malware for the past 4 decades, MS has always had to create new functions to deal with security risks and the new era of multiprocessor, concurrent and parallel programming.  It's hard to distinguish what was written in the DOS era, Win32 era, NT era, and now the x64 UWP era. The Linux/Unix libraries are a little more codified and packaged tightly, so dealing with POSIX is a lot easier. But, I am still learning architecturally where they are in the hierarchy, and what they do, where they belong, etc... but now I've got a better understanding that your libraries are Kernel focused, which is cool... no need to carry extra baggage around, right?  :exclaim:

Quote
Perhaps this may explain why im so scatter brained about it for the time being?

me, too
Title: Re: Amazed or Mortified?
Post by: AK_AK on January 27, 2019, 01:07:59 PM
here is something i recently found at the end of some breadcrumbs.


https://en.wikibooks.org/wiki/Windows_Programming

and there is this fellow, who has some advanced topics that i would recommend for later:

https://www.agner.org/optimize/

some of the links in that site might be dead, but the meat and bones are there, and its a good piece, in my opinion to include in your  programming/development resources.

thnx again Hutch;Tbrans et.al.
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 27, 2019, 02:13:34 PM
Quote from: AK_AK on January 27, 2019, 01:07:59 PM
here is something i recently found at the end of some breadcrumbs.


https://en.wikibooks.org/wiki/Windows_Programming

and there is this fellow, who has some advanced topics that i would recommend for later:

https://www.agner.org/optimize/

some of the links in that site might be dead, but the meat and bones are there, and its a good piece, in my opinion to include in your  programming/development resources.

thnx again Hutch;Tbrans et.al.

Yes, I have those links.  The hours of the day are the limit.  There are a lot of resources devoted to MS code, just takes time and practice to digest it all.

I have seen the Agner site, too highfalutin for me right now.  I have just barely got to the intermediate level in the use of the six 32-bit registers, and am getting started on the 64-bit... trying to use the FPU, SSE, ST, XMM is next.  I'm not a math person, so I don't have the imagination or ability to think of code to use those fancy registers.  I'd just like to be able to compute a very long Fibonacci or Factorial series beyond the limits of 64 bits, like in Python or the Boost libraries, where you can get a number a page long.

Title: Re: Amazed or Mortified?
Post by: jj2007 on January 27, 2019, 02:51:18 PM
Quote from: TBRANSO1 on January 27, 2019, 12:41:51 PM
The confusing thing about all of the Microsoft DLLs is that there are so many DLLs, and each one from each generation seems to change the data types and functions a little bit.

You can do all your coding with a 20 year old Win32.hlp file. Try that with any other OS 8) 
Quote
The TlsAlloc function allocates a thread local storage (TLS) index. Any thread of the process can subsequently use this index to store and retrieve values that are local to the thread.

DWORD TlsAlloc(VOID)


Parameters

This function has no parameters.

Return Values

If the function succeeds, the return value is a TLS index.
If the function fails, the return value is 0xFFFFFFFF. To get extended error information, call GetLastError.

Remarks

The threads of the process can use the TLS index in subsequent calls to the TlsFree, TlsSetValue, or TlsGetValue functions.
TLS indexes are typically allocated during process or dynamic-link library (DLL) initialization. Once allocated, each thread of the process can use a TLS index to access its own TLS storage slot. To store a value in its slot, a thread specifies the index in a call to TlsSetValue. The thread specifies the same index in a subsequent call to TlsGetValue, to retrieve the stored value.

The constant TLS_MINIMUM_AVAILABLE defines the minimum number of TLS indexes available in each process. This minimum is guaranteed to be at least 64 for all systems.
TLS indexes are not valid across process boundaries. A DLL cannot assume that an index assigned in one process is valid in another process.
A DLL might use TlsAlloc, TlsSetValue, TlsGetValue, and TlsFree as follows: ...
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 27, 2019, 03:32:17 PM
@Jochen,

Yeah, after looking into this... I got the drift of the underpinnings of the TLS.

The TLS is an array data structure with slots of a size to contain the global data of the programmer's choosing (or it can be done automagically).  Each thread then gets the data, plays with it in a local variable, and sets it back.  It's just getting access to the slot for that time; I'm sure behind the scenes the array data structure is a synchronized queue.

I think the structure is interesting, as it allows different functions anywhere to access the global indexed structure and pull out the data, work on it, and store it back. It seems like a monitor in HLL of sorts... the information is not hanging its nuts out in the breeze for everyone to kick them.  :icon_redface:  how rude of me, sorry... LOL
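
A small sketch of the slot behaviour in C, assuming POSIX thread keys as a stand-in for the Win32 calls (pthread_key_create for TlsAlloc, pthread_setspecific for TlsSetValue, pthread_getspecific for TlsGetValue); slot_key, thread_body and run_slots are hypothetical names. Each slot holds one pointer-sized value per thread, and a thread only ever sees its own slot, so the sketch uses no lock or queue.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_key_t slot_key;   /* plays the role of dwTlsIndex */

static void *thread_body(void *param)
{
    /* Store a value in this thread's slot (compare TlsSetValue). */
    pthread_setspecific(slot_key, param);
    /* Any function running on this thread reads the same value back
       (compare TlsGetValue); other threads' slots are untouched. */
    return pthread_getspecific(slot_key);
}

/* Returns main_value*100 + first_thread_value*10 + second_thread_value,
   or -1 on failure. */
int run_slots(void)
{
    pthread_t threads[2];
    void *seen[2] = { NULL, NULL };
    int i, started = 0, main_value;

    if (pthread_key_create(&slot_key, NULL) != 0)
        return -1;
    pthread_setspecific(slot_key, (void *)(intptr_t)99);  /* main thread's slot */

    for (i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, thread_body,
                           (void *)(intptr_t)(i + 1)) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], &seen[i]);

    /* The main thread's slot was not changed by the other threads. */
    main_value = (int)(intptr_t)pthread_getspecific(slot_key);
    pthread_key_delete(slot_key);
    if (started != 2)
        return -1;
    return main_value * 100 + (int)(intptr_t)seen[0] * 10 + (int)(intptr_t)seen[1];
}
```

The same key reads back 99 on the main thread, 1 on the first worker and 2 on the second, which is the "each thread gets its own global" effect of the MASM program at the top of the thread.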


Title: Re: Amazed or Mortified?
Post by: AK_AK on January 28, 2019, 11:03:26 AM
That sounds like you have an understanding to the extent required to be able to use this tool for a purpose.
good stuff TBRANS.

Im wondering if you have encountered the reason DLLs exist in the first place? there is a bare minimum set you can get away with.
That is the place where the WIN API lurks and sleeps until called on.
You can even roll your own DLLs, and this was an industry standard at the time, also a source of security compromise and a major frustration known as DLL Hell.  basically DLLs are a library of executable routines.

I have read, that DLLs have to do with page limits in the CPU.
you want the main routine to be small enough, to fit in the CPU cache, and call out to DLL segments that will fit in cache as well.
if you smashed it all together you dont fit in the CPU, and you have a massive performance penalty.

Sound good?
Title: Re: Amazed or Mortified?
Post by: felipe on January 28, 2019, 01:52:57 PM
you will never have all what you want in the cache memory. dlls are good to make the executable size smaller and to make changes in the routines without affecting the program that use them... :idea:
Title: Re: Amazed or Mortified?
Post by: hutch-- on January 28, 2019, 02:16:24 PM
I think you need to know where DLLs came from. Long long ago Windows would run in 2 megabytes of ram and that meant extremely efficient code to do that. A DLL is a very good technique for reducing memory demands in that you can load, use then remove the DLL so that you don't have a pile of dead code in memory that is not being used most of the time. In the DOS days you could shell out to another program, get it to do something then shut it down but the level of inter-communication was very poor, generally done with temp disk files.

DLL hell comes from elsewhere, when Microsoft started to release different versions of a particular DLL, you could get bitten by not having the right version but if you wrote a DLL for a specific task, it was not affected by Microsoft's stuffups. They are still a very efficient method of extending an application where you use the "Load on call" technique, perform the task required then release the DLL and all of its resources. It also allows you to share a capacity between apps so that all of them don't have to have a pile of dead code that is barely ever used.
Title: Re: Amazed or Mortified?
Post by: AK_AK on January 28, 2019, 04:27:54 PM
..Yes, i remember the time when every software manufacturer and their mother would write a custom DLL for their applications, and version mismatch was a repeated issue [DLL HELL].  The use of a DLL was a way to avoid reinventing a wheel or a screwdriver or a can opener.  you just pack the things [routines] into a library and they are available for repetitive use by many applications.  DLLs had the luxury of being large, but each individual routine or resource was optimized in size to fit a particular memory space.

If the code calling the DLL routine was kept small enough to fit in the CPU cache, there was a speed boost, as the cache is fast compared to RAM.
if the DLL routine was also small enough to fit in the cache, or in a cache page, then the speed boost was preserved.
when a piece of code is too big to fit entirely in cache, time is consumed rewriting the CPU cache.
SO.. a cache-sized piece of code that often calls on cache-sized pieces of code is supposed to be faster than a piece of code that is bloated with identical routines peppered throughout the main routine.

I forget where this was; i think it was an optimization technique from optimization for IA32 programming. The overall goal was supposed to be keeping your code running in the CPU cache, and not spending time rewriting the cache.
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 30, 2019, 02:41:13 AM
I was on Bytepointer. I happened to be poking around, and found a document related to the PE and the Linker.  In the 500 or more pages, I was just zipping through it, since most of it was beyond my grasp at the moment, but I saw TLS and stopped.

So, this is the way the TLS works if we're using it.  When it finds that the developer is calling upon the TLSIndex call and TlsAlloc, the linker creates a special section below the .data section called .tls. It is here that the special data structure offset is placed, along with the pointer to the heap where it is located.  The number of slots is determined by the developer or by default.  So, this is where the indices and data are placed for threads that are using this function.

Title: Re: Amazed or Mortified?
Post by: TimoVJL on January 30, 2019, 03:35:47 AM
#include <stdio.h>

__declspec(thread) int tls_i = 1;
int __cdecl main(void)
{
    printf("tls_i: %d\n", tls_i);
    return 0;
}
_main:
00000000  55                       push ebp
00000001  8BEC                     mov ebp, esp
00000003  A100000000               mov eax, dword ptr [__tls_index]
00000008  648B0D00000000           mov ecx, dword ptr fs:[__tls_array]
0000000F  8B1481                   mov edx, dword ptr [ecx+eax*4]
00000012  8B8200000000             mov eax, dword ptr [edx+_tls_i]
00000018  50                       push eax
00000019  6800000000               push $SG4295
0000001E  E800000000               call _printf
00000023  83C408                   add esp, 8h
00000026  33C0                     xor eax, eax
00000028  5D                       pop ebp
00000029  C3                       ret
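
Reading the listing: the loads at 00000003 to 00000012 fetch this module's TLS index, then the current thread's array of TLS block pointers from fs:[__tls_array], then the block itself, and finally the variable inside that block (the 8B82 encoding is a load relative to edx). The C below is only a single-threaded model of that chain; FakeThread, read_tls_int and demo_lookup are hypothetical names, and the struct is not the real thread environment block layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-thread data reached through fs:. */
typedef struct {
    void **tls_array;   /* one TLS block pointer per module */
} FakeThread;

/* Mirrors the three dependent loads in the listing. */
static int read_tls_int(const FakeThread *thread, unsigned tls_index, size_t offset)
{
    void **array = thread->tls_array;        /* load at 00000008 */
    char *block = array[tls_index];          /* load at 0000000F */
    int value;
    memcpy(&value, block + offset, sizeof value);  /* load at 00000012 */
    return value;
}

/* Two modelled threads, each with its own copy of the block holding tls_i. */
int demo_lookup(void)
{
    int block_a[1] = { 1 };                  /* thread A's copy, initial value 1 */
    int block_b[1] = { 1 };                  /* thread B's copy, initial value 1 */
    void *array_a[1] = { block_a };
    void *array_b[1] = { block_b };
    FakeThread a = { array_a };
    FakeThread b = { array_b };

    block_a[0] = 42;                         /* "thread A" changes its tls_i */

    /* Same index and offset, different thread, different storage. */
    return read_tls_int(&a, 0, 0) * 100 + read_tls_int(&b, 0, 0);
}
```

Changing the value through one thread's block leaves the other thread's copy at its initial value, since each thread's array points at a different block.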
Title: Re: Amazed or Mortified?
Post by: TBRANSO1 on January 30, 2019, 04:12:35 AM
Quote from: TimoVJL on January 30, 2019, 03:35:47 AM

00000008  648B0D00000000           mov ecx, dword ptr fs:[__tls_array]


Right there! Unraveled the mystery.

Further, I read that this convention is agreed upon by all compiler / assembler writers, and platform agnostic, so it works on every platform OS with their respective Kernel functions.

Title: Re: Amazed or Mortified?
Post by: LordAdef on February 01, 2019, 08:47:19 AM
What a nice thread!!!
Very educational.
I intend to do some threading in the future, for my little game. But really... I am way below my limit in performance, and doing GDI stuff instead of DirectX AND doing a sleep to keep my cpu from frying.
I am porting to DirectX (with the help of Marinus) and later see what I can do with some side threads. Just for the kick. Sometimes I wonder if the hassle is worth it.
Soon I will bother you guys :eusa_dance:
Title: Re: Amazed or Mortified?
Post by: aw27 on February 01, 2019, 09:23:19 PM
While knowledge about TLS is indeed very educational, we can, in many cases, do without it. First, we can and must use local variables as much as possible - they are 100% thread safe and disappear when the thread terminates. Second, we can work with synchronization objects like mutexes to access a single non-threaded global variable in the cases where the values are to be accumulated. Finally, we can use an indexed global variable with an index per thread.
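
A short sketch of the first and last options in C, assuming POSIX threads in place of CreateThread so that it builds outside Windows; per_thread_total, accumulate and run_indexed_global are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

#define THREAD_COUNT 4

/* Indexed global: one element per thread, so no two threads share a cell. */
static long per_thread_total[THREAD_COUNT];

static void *accumulate(void *param)
{
    intptr_t slot = (intptr_t)param;  /* this thread's index */
    long local = 0;                   /* local variable: private to the thread */
    int n;

    for (n = 1; n <= 100; n++)
        local += n;
    per_thread_total[slot] = local;   /* publish into this thread's own cell */
    return NULL;
}

/* Returns the combined total, or -1 if a thread could not be created. */
long run_indexed_global(void)
{
    pthread_t threads[THREAD_COUNT];
    long sum = 0;
    int i, started = 0;

    for (i = 0; i < THREAD_COUNT; i++) {
        if (pthread_create(&threads[i], NULL, accumulate,
                           (void *)(intptr_t)i) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    if (started != THREAD_COUNT)
        return -1;

    /* Safe to read without a mutex: every writer has been joined. */
    for (i = 0; i < THREAD_COUNT; i++)
        sum += per_thread_total[i];
    return sum;
}
```

The accumulation across threads happens once, after the joins, so the mutex from the second option is only needed when threads must add into a single shared variable while still running.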
Title: Re: Amazed or Mortified?
Post by: hutch-- on February 01, 2019, 09:42:30 PM
There is a technique that I use from time to time with multi-threaded code that calls the same thread more than once, something you would use to handle multiple internet connections. Create a structure in each thread as it is started, having passed any required information to the start of the thread, and then pass the address of that structure to any stage up the call tree so that the data is accessible from any point in that thread. It works effectively like a GLOBAL but is created on the stack when the thread starts.

Any memory you allocate within this system is fully independent of any other and you can create as many threads as you have memory and thread handles to deal with. I am sure that TLS has its uses but I have not found one yet.
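
A minimal sketch of that per-thread structure in C, again assuming POSIX threads in place of CreateThread; ThreadContext, handle_chunk, serve, connection_thread and run_contexts are hypothetical names chosen for the illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-thread state; lives on the owning thread's stack. */
typedef struct {
    int conn_id;
    int bytes_handled;
} ThreadContext;

/* Deeper levels of the call tree receive the address of the context. */
static void handle_chunk(ThreadContext *ctx, int nbytes)
{
    ctx->bytes_handled += nbytes;
}

static void serve(ThreadContext *ctx)
{
    handle_chunk(ctx, 100);
    handle_chunk(ctx, 28);
}

static void *connection_thread(void *param)
{
    ThreadContext ctx;                      /* created when the thread starts */
    ctx.conn_id = (int)(intptr_t)param;
    ctx.bytes_handled = 0;
    serve(&ctx);                            /* pass the address, not a global */
    return (void *)(intptr_t)(ctx.conn_id * 1000 + ctx.bytes_handled);
}

/* Runs two "connections"; returns the sum of their packed results,
   or -1 if a thread could not be created. */
int run_contexts(void)
{
    pthread_t threads[2];
    void *result;
    int i, started = 0, total = 0;

    for (i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, connection_thread,
                           (void *)(intptr_t)(i + 1)) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], &result);
        total += (int)(intptr_t)result;
    }
    return started == 2 ? total : -1;
}
```

Both threads run the same functions, yet neither touches the other's counters, because each ThreadContext sits on its own thread's stack.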