Coroutines

Started by Biterider, April 06, 2022, 06:20:31 AM

Biterider

Hi
This is a nice example that graphically shows what is intended. The output clarifies what is happening.
You have 2 procedures, foo() and bar(). The former is interrupted at a fixed point (after cout<<"a"), and the same thread executes bar() up to its own first fixed interruption point (cout<<"1"). At that point, execution is transferred back to foo(), which continues from the point at which it left off. This process continues until the end of the coroutines is reached.

If you look at the explanation from the Boost library (link above), it shows the different possibilities it offers and a timing chart showing the flow of operations.

Biterider

Biterider

Hi
I wrote a small program to check how fibers work.
They do exactly what Knuth describes.

The only "strange" behavior is that they don't "return". The last activity of a fiber is switching back to the main fiber using SwitchToFiber.
An example can be seen here: https://docs.microsoft.com/en-us/windows/win32/procthread/using-fibers
I also checked that the executing thread doesn't change!

Next I'll see how much overhead is associated with switching contexts...

Biterider


jj2007

Quote: fibers yield themselves to allow another fiber to run

Now what is the big difference between "yield" and return or jmp?

Biterider

Hi
I looked at the x64 SwitchToFiber API implementation to see how expensive it is.
I reduced the code and replaced the omitted parts with ellipses.

mov         rdx,qword ptr gs:[30h]    <--- Saving some internal information
mov         rax,qword ptr [rdx+20h] 
mov         r8,qword ptr [rcx+20h] 
mov         qword ptr [rdx+1478h],r8 
mov         qword ptr [rdx+20h],rcx 
mov         r8,qword ptr [rdx+10h] 
mov         qword ptr [rax+18h],r8 
...
lea         r8,[rax+30h] 
mov         qword ptr [r8+90h],rbx   <--- Storing registers
mov         qword ptr [r8+0A0h],rbp 
mov         qword ptr [r8+0A8h],rsi 
mov         qword ptr [r8+0B0h],rdi 
...
movaps      xmmword ptr [r8+200h],xmm6 
movaps      xmmword ptr [r8+210h],xmm7 
movaps      xmmword ptr [r8+220h],xmm8 
...
stmxcsr     dword ptr [r8+34h] 
fnclex 
wait 
fnstcw      word ptr [r8+100h] 
mov         r9,qword ptr [rsp]     
mov         qword ptr [r8+0F8h],r9 
mov         qword ptr [r8+98h],rsp 
mov         r8,qword ptr [rcx+10h] 
mov         qword ptr [rdx+8],r8 
...
rdsspq      rdx                       <--- begin Shadow Stack manipulation
mov         r9,qword ptr [rcx+528h] 
rstorssp    qword ptr [r9] 
saveprevssp 
sub         rdx,8 
mov         qword ptr [rax+528h],rdx 
lea         r8,[rcx+30h] 
mov         rbx,qword ptr [r8+90h]     <--- restoring destination register content
mov         rbp,qword ptr [r8+0A0h] 
mov         rsi,qword ptr [r8+0A8h] 
mov         rdi,qword ptr [r8+0B0h] 
...
movaps      xmm6,xmmword ptr [r8+200h] 
movaps      xmm7,xmmword ptr [r8+210h] 
movaps      xmm8,xmmword ptr [r8+220h]
...
ldmxcsr     dword ptr [r8+34h] 
fldcw       word ptr [r8+100h] 
mov         rsp,qword ptr [r8+98h]    <---- !!! magic happens here
ret 


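As you can see, context switching is quite time-consuming: the registers shown above, MXCSR and the FPU control word are saved for the current fiber and reloaded for the destination fiber, and some extra shadow stack manipulation is required (a topic for another thread). Check the last line before "ret"; it does the trick. It loads the stack pointer saved for the destination fiber, so "ret" returns (sort of) to that fiber's original caller. In reality, the operating system performs some additional checks and detours and then returns.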

So much for this analysis.

Biterider