Author Topic: Coroutines  (Read 1187 times)

Biterider

  • Moderator
  • Member
  • *****
  • Posts: 941
  • ObjAsm Developer
    • ObjAsm
Re: Coroutines
« Reply #15 on: April 07, 2022, 03:44:15 PM »
Hi
This is a nice example that graphically shows what is intended. The output clarifies what is happening.
You have 2 procedures, foo() and bar(). The former is interrupted at a fixed point (after cout<<"a") and the same thread executes bar() up to its own first fixed interruption point (cout<<"1"). At that point, execution is transferred back to foo(), which continues from the point at which it left off. This process continues until the end of the coroutines is reached.

If you look at the explanation from the Boost library (link above), it shows the different possibilities it offers and a timing chart showing the flow of operations.
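The interleaving of foo() and bar() described above can be sketched without Boost. This is only a minimal sketch, assuming a POSIX system that provides <ucontext.h> (getcontext/makecontext/swapcontext); it is not the Boost implementation. The names run_interleaved, prepare, out and the 64 KiB stack size are made up for the illustration, and the output is collected in a string instead of being sent to cout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <ucontext.h>

// Hypothetical names for this sketch; nothing here comes from Boost.
static ucontext_t main_ctx, foo_ctx, bar_ctx;
static std::string out;   // stands in for cout

static void foo() {
    out += 'a'; swapcontext(&foo_ctx, &bar_ctx);   // suspend foo, run bar
    out += 'b'; swapcontext(&foo_ctx, &bar_ctx);   // resumed here, suspend again
    out += 'c'; swapcontext(&foo_ctx, &bar_ctx);
}

static void bar() {
    out += '1'; swapcontext(&bar_ctx, &foo_ctx);   // suspend bar, resume foo
    out += '2'; swapcontext(&bar_ctx, &foo_ctx);
    out += '3'; swapcontext(&bar_ctx, &main_ctx);  // done: hand control back to the caller
}

// Give a context its own stack and entry point.
static void prepare(ucontext_t& ctx, char* stack, std::size_t size, void (*entry)()) {
    int rc = getcontext(&ctx);
    assert(rc == 0);
    (void)rc;
    ctx.uc_stack.ss_sp = stack;
    ctx.uc_stack.ss_size = size;
    ctx.uc_link = &main_ctx;      // safety net if an entry function ever returns
    makecontext(&ctx, entry, 0);
}

std::string run_interleaved() {
    static char foo_stack[64 * 1024];
    static char bar_stack[64 * 1024];
    out.clear();
    prepare(foo_ctx, foo_stack, sizeof foo_stack, foo);
    prepare(bar_ctx, bar_stack, sizeof bar_stack, bar);
    swapcontext(&main_ctx, &foo_ctx);   // start foo; we continue here after bar's last switch
    return out;
}
```

Calling run_interleaved() from main() and printing the result shows the two procedures taking turns on the same thread, each one continuing exactly where it left off.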

Biterider

Biterider

Re: Coroutines
« Reply #16 on: April 07, 2022, 06:29:41 PM »
Hi
I wrote a small program to check how fibers work.
They do exactly what Knuth describes.

The only "strange" behavior is that they don't "return". The last activity of a fiber is switching back to the main fiber using SwitchToFiber.
An example can be seen here: https://docs.microsoft.com/en-us/windows/win32/procthread/using-fibers
I also checked that the executing thread doesn't change!
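For readers who want to reproduce these two observations, here is a rough sketch. It is not my test program: it assumes a POSIX system and uses <ucontext.h> as a stand-in for the fiber API, where swapcontext plays the role of SwitchToFiber (on Windows the calls would be ConvertThreadToFiber, CreateFiber and SwitchToFiber). The names worker, main_fiber, worker_fiber and fiber_runs_on_calling_thread are hypothetical.

```cpp
#include <cassert>
#include <thread>
#include <ucontext.h>

// Hypothetical names; swapcontext stands in for SwitchToFiber.
static ucontext_t main_fiber, worker_fiber;
static std::thread::id worker_thread_id;
static int steps_done = 0;

static void worker() {
    worker_thread_id = std::this_thread::get_id();  // which thread executes the fiber?
    ++steps_done;
    swapcontext(&worker_fiber, &main_fiber);        // yield to the main fiber
    ++steps_done;
    // The fiber does not "return": its last action is an explicit switch back.
    swapcontext(&worker_fiber, &main_fiber);
    // Never reached in this sketch, because nobody resumes the worker again.
}

bool fiber_runs_on_calling_thread() {
    static char stack[64 * 1024];
    steps_done = 0;
    int rc = getcontext(&worker_fiber);
    assert(rc == 0);
    (void)rc;
    worker_fiber.uc_stack.ss_sp = stack;
    worker_fiber.uc_stack.ss_size = sizeof stack;
    worker_fiber.uc_link = nullptr;  // with no successor, returning from worker() would end the thread
    makecontext(&worker_fiber, worker, 0);

    swapcontext(&main_fiber, &worker_fiber);  // first run, up to the yield
    swapcontext(&main_fiber, &worker_fiber);  // resume, up to the final switch back
    return steps_done == 2 && worker_thread_id == std::this_thread::get_id();
}
```

The function returns true when both halves of worker() ran and the thread id seen inside the fiber equals the id of the calling thread.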

Next I'll see how much overhead is associated with switching contexts...

Biterider

Biterider

Re: Coroutines
« Reply #17 on: April 08, 2022, 03:05:28 AM »
Very good reading on this topic
https://graphitemaster.github.io/fibers/

Biterider

jj2007

  • Member
  • *****
  • Posts: 12694
  • Assembler is fun ;-)
    • MasmBasic
Re: Coroutines
« Reply #18 on: April 08, 2022, 04:12:23 AM »
Quote
fibers yield themselves to allow another fiber to run

Now what is the big difference between "yield" and return or jmp?

Biterider

Re: Coroutines
« Reply #19 on: April 08, 2022, 04:15:11 AM »
Hi
I looked at the x64 SwitchToFiber API implementation to see how expensive it is.
I reduced the code and replaced the omitted parts with ellipses.

Code:
 mov         rdx,qword ptr gs:[30h]        ; saving some internal information
 mov         rax,qword ptr [rdx+20h]
 mov         r8,qword ptr [rcx+20h]
 mov         qword ptr [rdx+1478h],r8
 mov         qword ptr [rdx+20h],rcx
 mov         r8,qword ptr [rdx+10h]
 mov         qword ptr [rax+18h],r8
 ...
 lea         r8,[rax+30h]
 mov         qword ptr [r8+90h],rbx        ; storing registers
 mov         qword ptr [r8+0A0h],rbp
 mov         qword ptr [r8+0A8h],rsi
 mov         qword ptr [r8+0B0h],rdi
 ...
 movaps      xmmword ptr [r8+200h],xmm6
 movaps      xmmword ptr [r8+210h],xmm7
 movaps      xmmword ptr [r8+220h],xmm8
 ...
 stmxcsr     dword ptr [r8+34h]
 fnclex
 wait
 fnstcw      word ptr [r8+100h]
 mov         r9,qword ptr [rsp]
 mov         qword ptr [r8+0F8h],r9
 mov         qword ptr [r8+98h],rsp
 mov         r8,qword ptr [rcx+10h]
 mov         qword ptr [rdx+8],r8
 ...
 rdsspq      rdx                           ; begin shadow stack manipulation
 mov         r9,qword ptr [rcx+528h]
 rstorssp    qword ptr [r9]
 saveprevssp
 sub         rdx,8
 mov         qword ptr [rax+528h],rdx
 lea         r8,[rcx+30h]
 mov         rbx,qword ptr [r8+90h]        ; restoring destination register content
 mov         rbp,qword ptr [r8+0A0h]
 mov         rsi,qword ptr [r8+0A8h]
 mov         rdi,qword ptr [r8+0B0h]
 ...
 movaps      xmm6,xmmword ptr [r8+200h]
 movaps      xmm7,xmmword ptr [r8+210h]
 movaps      xmm8,xmmword ptr [r8+220h]
 ...
 ldmxcsr     dword ptr [r8+34h]
 fldcw       word ptr [r8+100h]
 mov         rsp,qword ptr [r8+98h]        ; !!! magic happens here
 ret

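As you can see, context switching is quite costly: the non-volatile registers shown above, MXCSR and the FPU control word are stored for the current fiber and reloaded for the destination fiber, and some extra shadow stack manipulation is required (a topic for another thread). Look at the last line before "ret": it does the trick. It loads the stack pointer saved for the destination fiber, so the "ret" returns (sort of) to that fiber's original caller. In reality, the operating system performs some additional checks and detours and then returns.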

So much for this analysis.

Biterider