You're welcome.
I assembled that program on Linux with debug information, then disassembled it with two different tools, radare2 and objdump. I'm attaching both disassemblies in case they are useful.
PS: I disabled the drawing in the source code, because printing each screen was taking a long time.
The Linux calling convention (System V AMD64) passes the first six integer or pointer arguments of a call in these registers, in order:
rdi rsi rdx rcx r8 r9
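In case it helps when reading the disassembly, here is a minimal C sketch of that register order. It is only an illustration and not taken from the program above; the names ARG_REGS, arg_register and sum6 are hypothetical. It covers integer and pointer arguments only: floating-point arguments go in xmm0 to xmm7, a seventh or later integer argument goes on the stack, and the integer return value comes back in rax.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in order, under the System V AMD64
   ABI used by Linux on x86-64. */
static const char *const ARG_REGS[6] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

/* Hypothetical helper: name of the register that holds integer argument
   number `index` (0-based), or NULL when that argument goes on the stack. */
const char *arg_register(size_t index)
{
    return index < 6 ? ARG_REGS[index] : NULL;
}

/* Hypothetical six-argument function: a..f arrive in rdi, rsi, rdx, rcx,
   r8, r9 respectively, and the sum is returned in rax. */
long sum6(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;
}
```

Compiling a function like sum6 with gcc and disassembling it with objdump -d should show those registers being read in that order. Note that a raw syscall instruction uses r10 in place of rcx, so the list above applies to ordinary function calls.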
The program result:
1945 elementos ;elements
qtd movimentacoes = 6249 ;moves
qtd comparacoes = 8420 ;comparisons