OK, after some more (and more) reading :( I think I've got the main idea:
Statement: IOCP is guaranteed to queue completion packets in FIFO order. However, in multi-threaded scenarios (due to thread contention and scheduling),
the order in which the dequeued overlapped operations are actually processed is almost guaranteed to be out of order [for socket operations, not necessarily for file operations].
For a confusing explanation see:
(firstly) https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx
(secondly) https://stackoverflow.com/questions/27955812/why-are-i-o-completion-port-packets-queued-in-fifo-order-if-they-may-be-dequeued
(thirdly) https://www.apriorit.com/dev-blog/412-win-api-programming-iocp

Comment re: Statement (above), sockets context:
This is not a problem if we keep only a single outstanding read and a single outstanding write per socket, AND handle each socket on a single worker thread (end of story).
However, this does not really take advantage of the IOCP architecture, and you end up with performance roughly equal to event-based asynchronous I/O or asynchronous procedure calls (APCs).
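As a minimal sketch of the "single read & write per socket" rule, assuming portable C11 atomics in place of the Win32 Interlocked* functions: each socket gets a flag per direction that a thread must win before posting the next read (or write), and that is cleared only when the completion for that operation has been dequeued. The names conn_state, try_begin_op and end_op are hypothetical; the actual WSARecv/WSASend and GetQueuedCompletionStatus calls are only indicated in comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-socket state: at most one read and one write in flight. */
typedef struct {
    atomic_bool read_in_flight;
    atomic_bool write_in_flight;
} conn_state;

static void conn_init(conn_state *c) {
    atomic_init(&c->read_in_flight, false);
    atomic_init(&c->write_in_flight, false);
}

/* Returns true if the caller won the right to post the next operation of
   this kind (e.g. call WSARecv); false if one is already outstanding. */
static bool try_begin_op(atomic_bool *in_flight) {
    bool expected = false;
    return atomic_compare_exchange_strong(in_flight, &expected, true);
}

/* Called by the worker thread after GetQueuedCompletionStatus reports the
   operation finished; only then may the next one of that kind be posted. */
static void end_op(atomic_bool *in_flight) {
    bool was_in_flight = atomic_exchange(in_flight, false);
    assert(was_in_flight); /* ending an op that never started is a bug */
}
```

Because only one completion per socket and direction can ever be queued, only one worker thread can be handling it at any moment, so ordering is trivially preserved, at the cost of the throughput described above.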
For some metrics see:
http://www.winsocketdotnetworkprogramming.com/winsock2programming/winsock2advancedscalableapp6a.html

Other opinions saying the same:
https://stackoverflow.com/questions/32053666/how-to-ensure-thread-safe-in-iocp-receive

Alternative 1
Specifically, a single read and multiple writes, without directly sequencing read/write operations, is possible. This would increase performance
moderately over event-based I/O (APC). A single shared "pending_write_list" (a linked list) is used across all threads, protected by a critical
section, and ALL the writes are processed by a single worker thread at a time. Unfortunately, in practice this means only one thread ends up
handling the write queue, especially when LARGE outbound writes are needed, while the other threads sit idle as the queue is being emptied.
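A minimal sketch of Alternative 1's shared pending_write_list, assuming portable C11 with an atomic_flag spinlock standing in for the Win32 CRITICAL_SECTION. The struct and function names (write_buf, pwl_push, pwl_pop) and the "draining" flag are hypothetical; they show how the first thread to queue a write becomes the single drainer while every other thread just appends and leaves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outbound buffer; a real one would embed the OVERLAPPED/WSABUF. */
typedef struct write_buf {
    struct write_buf *next;
    int socket_id;              /* hypothetical socket identifier */
    size_t len;
} write_buf;

/* The shared pending_write_list, used by all worker threads. */
typedef struct {
    atomic_flag lock;           /* stands in for the CRITICAL_SECTION */
    bool draining;              /* true while one thread owns the drain */
    write_buf *head, *tail;     /* FIFO linked list */
} pending_write_list;

#define PENDING_WRITE_LIST_INIT { ATOMIC_FLAG_INIT, false, NULL, NULL }

static void pwl_lock(pending_write_list *l) {
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void pwl_unlock(pending_write_list *l) {
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Any worker thread may queue a write. Returns true if the caller must
   become the drainer (nobody else is draining); otherwise it just returns
   and the thread that is already draining will send this buffer too. */
static bool pwl_push(pending_write_list *l, write_buf *b) {
    bool become_drainer;
    b->next = NULL;
    pwl_lock(l);
    if (l->tail)
        l->tail->next = b;
    else
        l->head = b;
    l->tail = b;
    become_drainer = !l->draining;
    l->draining = true;
    pwl_unlock(l);
    return become_drainer;
}

/* Drainer only: take the next queued write, or NULL when the list is empty.
   Returning NULL also gives up the drainer role under the same lock, so a
   concurrent pwl_push can never be left without a drainer. */
static write_buf *pwl_pop(pending_write_list *l) {
    write_buf *b;
    pwl_lock(l);
    assert(l->draining);
    b = l->head;
    if (b) {
        l->head = b->next;
        if (!l->head)
            l->tail = NULL;
    } else {
        l->draining = false;
    }
    pwl_unlock(l);
    return b;
}
```

A worker that gets true from pwl_push would loop `while ((b = pwl_pop(&list)) != NULL) { /* post WSASend for b */ }`. Every write queued while it loops is sent by that same thread, which is the "one thread empties the queue while the others sit idle" behaviour described above.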
For a relevant explanation see:
https://accu.org/index.php/journals/1956
For a code example, refer again to the convoluted and erroneous:
http://www.winsocketdotnetworkprogramming.com/winsock2programming/winsock2advancedscalableapp6b.html

Alternative 2 (possibly close to optimal)
Assign read and write sequence numbers atomically per socket operation (i.e. on your buffer object containing the OVERLAPPED structure).
Enqueue these multiple read or write buffers per socket in independent pending_read and pending_write lists (shared across threads).
Sort the read and write lists per socket, by sequence number ascending within each list. Per single worker thread, process a threshold
(0 to maximum tolerance) of read and write ops. The thread dequeues an entry from a list only if, for that socket, the entry with sequence number + 1
is also in the list, OR sequence number + 1 is the next assignable sequence number for that operation on that socket.
Otherwise it moves on to the next socket in the list, since outstanding reads or writes have not yet arrived (or a socket error occurred).
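A minimal sketch of Alternative 2 for one socket and one direction (reads or writes), again in portable C11 with an atomic_flag spinlock standing in for a critical section. The names sock_seq, ss_assign_locked, ss_complete and ss_take_ready are hypothetical. Two assumptions go beyond the text: the sequence number is assigned while holding the same lock used to post WSARecv/WSASend, so sequence order matches posting order; and, as the post says, only one worker thread processes a given socket's list at a time. The in-order check is the common "next expected sequence number" rule (dequeue the head only when its number equals the last processed + 1), swapped in for the exact condition described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer object; a real one would also hold the OVERLAPPED. */
typedef struct seq_buf {
    struct seq_buf *next;
    uint64_t seq;                /* assigned when the op is posted */
} seq_buf;

/* Hypothetical per-socket, per-direction sequencing state. */
typedef struct {
    uint64_t next_assign;        /* next sequence number to hand out */
    uint64_t next_expected;      /* next sequence number to process */
    seq_buf *pending;            /* completed buffers, sorted ascending */
    atomic_flag lock;            /* stands in for a CRITICAL_SECTION */
} sock_seq;

#define SOCK_SEQ_INIT { 0, 0, NULL, ATOMIC_FLAG_INIT }

static void ss_lock(sock_seq *s) {
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void ss_unlock(sock_seq *s) {
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Caller holds the lock and should post WSARecv/WSASend for b before
   unlocking, so that sequence order equals posting order. */
static void ss_assign_locked(sock_seq *s, seq_buf *b) {
    b->seq = s->next_assign++;
}

/* A completion for b was dequeued (possibly out of order): insert it into
   the pending list, keeping the list sorted by sequence number. */
static void ss_complete(sock_seq *s, seq_buf *b) {
    seq_buf **pp;
    ss_lock(s);
    assert(b->seq >= s->next_expected && b->seq < s->next_assign);
    for (pp = &s->pending; *pp && (*pp)->seq < b->seq; pp = &(*pp)->next)
        ;
    b->next = *pp;
    *pp = b;
    ss_unlock(s);
}

/* Take up to max_ops buffers that are next in sequence. Stops at a gap (an
   earlier op has not completed yet); 0 means "move on to the next socket". */
static size_t ss_take_ready(sock_seq *s, seq_buf **out, size_t max_ops) {
    size_t n = 0;
    ss_lock(s);
    while (n < max_ops && s->pending && s->pending->seq == s->next_expected) {
        out[n++] = s->pending;
        s->pending = s->pending->next;
        s->next_expected++;
    }
    ss_unlock(s);
    return n;
}
```

With max_ops as the threshold, a worker calls ss_take_ready for each socket in turn and processes the returned buffers in order; a return of 0 means an earlier operation is still outstanding, so it skips to the next socket.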
Any comments or corrections?