The MASM Forum

Projects => ObjAsm => Topic started by: Biterider on April 06, 2022, 06:20:31 AM

Title: Coroutines
Post by: Biterider on April 06, 2022, 06:20:31 AM
Hi
I want to start a discussion about coroutines.
They're a relatively old concept, before pre-emptive multitasking was implemented, but some people claim that they're still useful and that they can only be implemented using assembler. That last statement caught my attention and I took a closer look at this technique.
One of the challenges I see is the context switches and their high cost in terms of speed. I found some APIs that did the hard work so that at the end a testbed can be created to validate the whole concept.
What bothers me the most is that I don't see any advantage over well-prepared code that is broken up into short routines that are called one after the other. This is a much cleaner approach in my opinion without all the complexity of managing an execution context.
Maybe I'm missing something, so I'm interested in your opinion and experience on this topic.

Biterider
Title: Re: Coroutines
Post by: jj2007 on April 06, 2022, 07:18:26 AM
Hi Biterider,

That sounds way more interesting than the endless covid and Ucraine threads, but perhaps you should give us some meat to put our teeth into: what exactly do you mean with "coroutines"? How are they different from simple threads on the one and Switch... Case... Endsw constructs on the other hand?
Title: Re: Coroutines
Post by: Biterider on April 06, 2022, 03:36:08 PM
Hi
Sorry... I was so focused on the topic that I forgot to give you more context.  :biggrin:

Coroutines are a superclass of regular routines. The concept comes from the time of cooperative multitasking. The core idea is to stop a longer-running routine at certain points in the code, switch context, and pass control to another routine. Sometime later, the context of the first routine is restored and control is transferred back to the point where it was interrupted.
This process continues until the routine reaches its end.

Wikipedia has a very good article on this https://en.wikipedia.org/wiki/Coroutine (https://en.wikipedia.org/wiki/Coroutine)  :icon_idea:

Biterider
Title: Re: Coroutines
Post by: jj2007 on April 06, 2022, 06:33:52 PM
I've seen the article, but I can't see the big difference to CreateThread & friends...
Title: Re: Coroutines
Post by: Biterider on April 06, 2022, 08:11:27 PM
Hi JJ
That's my point too.
The difference that one is pre-emptive and the other cooperative.
The first has a known overhead of switching from one thread to another, while the second should be less expensive depending on your implementation, and it is always single-threaded. This reduces some synch. complexity and an speedup in some cases.

When you look at how it can be implemented, you immediately concluded that you need a strategy for preserving and restoring context. Luckily, there are some APIs that do this work for you. Without having tested it, I suspect that this is not a lightweight thing.

Biterider
Title: Re: Coroutines
Post by: jj2007 on April 06, 2022, 08:45:48 PM
My standard procedure in Gui applications is:
- CreateThread...
- Threadproc proc
  ... do work ...
  when ready, SendMessage hMainWindow, WM_THREADHASDONEITSWORK, result1, result2
  ret
  ThreadProc endp
- Switch uMsg
  Case WM_THREADHASDONEITSWORK
      react

So what's the added value of a coroutine?
Title: Re: Coroutines
Post by: Biterider on April 06, 2022, 08:59:44 PM
Hi JJ
That's my question :biggrin:
What are the benefits of coroutines?

Are they easier to program at the expense of execution speed?
Are they faster than preemptive multitasking? Always or just in some situations?

Before I go any deeper, I want to understand the goals properly...

Biterider

Title: Re: Coroutines
Post by: HSE on April 06, 2022, 09:44:08 PM
Hi!!

From most complete ignorance, I have the idea that OS can run a Thread in a different core. But if coroutines don't interact with OS they are monocore by force. No?
Title: Re: Coroutines
Post by: jj2007 on April 06, 2022, 09:44:42 PM
It could just be HLL fake. A name that doesn't mean much, a name for things we do all the time in Assembly. But to understand, we would need a real life example...
Title: Re: Coroutines
Post by: HSE on April 06, 2022, 10:30:39 PM
Look like Kotlin coroutines have little to do with Knuth coroutines:
Quote
The CoroutineDispatcher that is designed for offloading blocking IO tasks to a shared pool of threads.

This HLL coroutines are some kind of big architecture.
Title: Re: Coroutines
Post by: jj2007 on April 06, 2022, 10:43:32 PM
designed for offloading blocking IO tasks to a shared pool of threads

Now explain what is the difference to CreateThread. It's not a "shared" "pool", whatever the philosophical content of these wise words might be (the opposite would be "unshared non-pooled single thread"), but otherwise it looks like the exact definition of a thread's purpose. Offload blocking IO tasks, yeah, that's it. Download a web page without making the application unresponsive, for example. Sounds less professional than "offloading blocking IO tasks to a shared pool of threads", but it's trivial, and trivial tasks do not need that aura of professionalism imho.
Title: Re: Coroutines
Post by: Biterider on April 07, 2022, 05:02:15 AM
Hi
@HSE: IMHO, Kotlin uses a completely different approach than Donald Knuth's original definition. Reading about this topic, I noticed that the actual implementation varies a lot from language to language.
@JJ: you have to think about when the term coroutine was coined and what the technological status was at that time.

I checked the MS page that describes the so-called fibers. They come very close to what Kunth intended. Just replace the API names with Kunth's and you have a pretty close match.
https://docs.microsoft.com/en-us/windows/win32/procthread/fibers (https://docs.microsoft.com/en-us/windows/win32/procthread/fibers)

Biterider
Title: Re: Coroutines
Post by: jj2007 on April 07, 2022, 05:42:28 AM
Quote
In general, fibers do not provide advantages over a well-designed multithreaded application.

Says Micros**t :tongue:

Actually, I had read about fibers years ago, but since virtually nobody uses them, I lost interest.
Title: Re: Coroutines
Post by: FORTRANS on April 07, 2022, 07:33:18 AM
Hi,

   After reviewing my notes, I found a program I wrote to
implement a coroutine after reading about them in Knuth's
book.  So a routine that gets user input and updates the
upper display.  If a request from the user to go to the bottom
of the screen is seen, a jump to the other routine is made.
Which gets user input and updates the lower screen until a
request to go to the upper display is seen.

   Unfortunately, the comments in the program state that it
didn't really get the intended job done.  Supporting Knuth's
comment that it is difficult to find a good, small example of
coroutines.  But it is easy to write a really clunky one.

Cheers,

Steve N.
Title: Re: Coroutines
Post by: jj2007 on April 07, 2022, 07:51:20 AM
Here is an explanation with some examples (https://www.boost.org/doc/libs/1_68_0/libs/coroutine/doc/html/coroutine/intro.html), although I have no clue what they really want to say or to achieve :tongue:

(https://www.boost.org/doc/libs/1_68_0/libs/coroutine/doc/images/foo_bar.png)
Title: Re: Coroutines
Post by: Biterider on April 07, 2022, 03:44:15 PM
Hi
This is a nice example that graphically shows what is intended. The output clarifies what is happening.
You habe 2 procedures foo() and bar(). The former is interrupted at a fixed point (after cout<<"a") and the same thread executes bar() up to its own first fixed interruption point (cout<<"1"). At that point, execution is transfered back to foo(), which continues from the point at which it left. This process continues until the end of the coroutines is reached.

This is a nice example that graphically shows what is intended. The output clarifies what is happening.
You have 2 procedures foo() and bar(). The former breaks at a fixed point (after cout<<"a"), and the same thread executes bar() up to its own first breakpoint (cout<<"1"). At that point, execution is transferred back to foo(), which continues from the point at which it left. this process continues until the end of the coroutines is reached.

If you look at the explanation from the Boost library (link above), it shows the different possibilities it offers and a timing chart showing the flow of operations.

Biterider
Title: Re: Coroutines
Post by: Biterider on April 07, 2022, 06:29:41 PM
Hi
I wrote a small program to check how fibers work.
They do exactly what Knuth describes.

The only "strange" behavior is that they don't "return". The last activity of a fiber is switching back to the main fiber using SwitchToFiber.
An example can be seen here https://docs.microsoft.com/en-us/windows/win32/procthread/using-fibers (https://docs.microsoft.com/en-us/windows/win32/procthread/using-fibers).
I also checked that the executing thread doesn't change!

Next I'll see how much overhead is associated with switching contexts...

Biterider
Title: Re: Coroutines
Post by: Biterider on April 08, 2022, 03:05:28 AM
Very good reading on this topic
https://graphitemaster.github.io/fibers/ (https://graphitemaster.github.io/fibers/)

Biterider
Title: Re: Coroutines
Post by: jj2007 on April 08, 2022, 04:12:23 AM
Quote
fibers yield themselves to allow another fiber to run

Now what is the big difference between "yield" and return or jmp?
Title: Re: Coroutines
Post by: Biterider on April 08, 2022, 04:15:11 AM
Hi
I looked at the x64 SwitchToFiber API implementation to see how expensive it is.
I reduced the code and replaced the omitted parts with ellipses.

Code: [Select]
mov         rdx,qword ptr gs:[30h]    <--- Saving some internal information
 mov         rax,qword ptr [rdx+20h] 
 mov         r8,qword ptr [rcx+20h] 
 mov         qword ptr [rdx+1478h],r8 
 mov         qword ptr [rdx+20h],rcx 
 mov         r8,qword ptr [rdx+10h] 
 mov         qword ptr [rax+18h],r8 
 ...
 lea         r8,[rax+30h] 
 mov         qword ptr [r8+90h],rbx   <--- Storing registers
 mov         qword ptr [r8+0A0h],rbp 
 mov         qword ptr [r8+0A8h],rsi 
 mov         qword ptr [r8+0B0h],rdi 
 ...
 movaps      xmmword ptr [r8+200h],xmm6 
 movaps      xmmword ptr [r8+210h],xmm7 
 movaps      xmmword ptr [r8+220h],xmm8 
 ...
 stmxcsr     dword ptr [r8+34h] 
 fnclex 
 wait 
 fnstcw      word ptr [r8+100h] 
 mov         r9,qword ptr [rsp]     
 mov         qword ptr [r8+0F8h],r9 
 mov         qword ptr [r8+98h],rsp 
 mov         r8,qword ptr [rcx+10h] 
 mov         qword ptr [rdx+8],r8 
 ...
 rdsspq      rdx                       <--- begin Shadow Stack manipulation
 mov         r9,qword ptr [rcx+528h] 
 rstorssp    qword ptr [r9] 
 saveprevssp 
 sub         rdx,8 
 mov         qword ptr [rax+528h],rdx 
 lea         r8,[rcx+30h] 
 mov         rbx,qword ptr [r8+90h]     <--- restoring destination register content
 mov         rbp,qword ptr [r8+0A0h] 
 mov         rsi,qword ptr [r8+0A8h] 
 mov         rdi,qword ptr [r8+0B0h] 
 ...
 movaps      xmm6,xmmword ptr [r8+200h] 
 movaps      xmm7,xmmword ptr [r8+210h] 
 movaps      xmm8,xmmword ptr [r8+220h]
...
 ldmxcsr     dword ptr [r8+34h] 
 fldcw       word ptr [r8+100h] 
 mov         rsp,qword ptr [r8+98h]    <---- !!! magic happens here
 ret 

As you can see, context switching is very time consuming and requires some extra shadow stack manipulation (stuff for another thread). Check the last line before "ret", it does the trick. It restores the previous stack and returns (sort of) to the original caller. In reality, the operating system performs some additional checks and detours and then returns.

So far for this analysis.

Biterider