In digital electronics class I had a task to wire up several NAND gates to first make a 4-bit adder.
The second task was to add more NAND gates that perform a 4-bit negation, so the adder changes into a subtractor.
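To make that exercise concrete, here is a small Python sketch of the same idea built only from a NAND function. It assumes the "negation" was done the usual two's complement way (invert every bit of B and feed a 1 into the first carry-in); the function names and the 9-gate full adder layout are my own choices, not the original assignment.

```python
def nand(a, b):
    """The only primitive gate used; inputs and output are 0 or 1."""
    return 1 - (a & b)

def xor(a, b):
    """XOR built from four NAND gates."""
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a, b, cin):
    """One-bit full adder built from nine NAND gates."""
    n1 = nand(a, b)
    s1 = nand(nand(a, n1), nand(b, n1))        # a XOR b
    n2 = nand(s1, cin)
    total = nand(nand(s1, n2), nand(cin, n2))  # s1 XOR cin
    cout = nand(n1, n2)                        # (a AND b) OR (s1 AND cin)
    return total, cout

def add_sub4(a, b, subtract):
    """4-bit ripple-carry adder; subtract=1 turns it into a subtractor.

    With subtract=1 each bit of b is inverted (XOR with 1) and the
    initial carry-in is 1, which together compute a + (-b).
    """
    carry = subtract
    result = 0
    for i in range(4):
        a_bit = (a >> i) & 1
        b_bit = xor((b >> i) & 1, subtract)
        s, carry = full_adder(a_bit, b_bit, carry)
        result |= s << i
    return result, carry

print(add_sub4(5, 3, 0))  # (8, 0)
print(add_sub4(5, 3, 1))  # (2, 1)
```

In subtract mode the carry-out works as a "no borrow" flag: it is 1 when a >= b.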
I think that kind of low-level skill, combined with porting asm snippets/macros to microcode, is what is needed
to be able to make new instructions.
Thanks to the very big chips we have nowadays, GPU pixel shaders already have instructions that were only dreamed of earlier,
such as trig functions that reportedly take about 1 cycle thanks to LUTs (lookup tables) in hardware.
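As a rough software illustration of the LUT idea, the sketch below precomputes a sine table and interpolates linearly between neighbouring entries. The table size, the names and the interpolation step are assumptions for this example; I am not claiming this is how any particular GPU implements it.

```python
import math

TABLE_SIZE = 256  # assumed size; real hardware tables may differ

# One full period of sine, with one extra entry so index + 1 is always valid.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]

def lut_sin(x):
    """Approximate sin(x) with a table lookup plus linear interpolation."""
    turns = (x / (2 * math.pi)) % 1.0        # reduce the angle to one period
    pos = turns * TABLE_SIZE
    index = min(int(pos), TABLE_SIZE - 1)    # guard against pos == TABLE_SIZE
    frac = pos - index
    lo = SINE_TABLE[index]
    hi = SINE_TABLE[index + 1]
    return lo + (hi - lo) * frac

print(abs(lut_sin(1.0) - math.sin(1.0)) < 1e-4)  # True
```

The point is that all the expensive math happens once when the table is filled; each later call is just an index, two reads and one multiply-add, which is the kind of work hardware can do very quickly.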
And newer pixel shader versions come with more instructions, like the earlier changes in x86: 286, 386, 486, 586, 686, and the earliest x64.
A mem-to-mem copy, if not directly supported on the main CPU, could be performed by the GPU if you use VRAM instead,
using one of the APIs that let you create a graphics surface in VRAM and switch it between locked (CPU can read/write) and unlocked.
About the 32 registers: there are already more than that in x64:
16 GP regs + control regs, the RIP reg, 16 XMM/YMM regs + control reg, 8 FPU regs + control regs.
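A quick tally of just the counts given above, as a sanity check. The control and status registers are left out on purpose, since no number is given for them here.

```python
# Register counts exactly as listed in the post; control/status registers
# are not included because the post gives no count for them.
x64_registers = {
    "general purpose": 16,
    "RIP": 1,
    "XMM/YMM": 16,
    "x87 FPU": 8,
}

total = sum(x64_registers.values())
print(total, total > 32)  # 41 True
```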