Writing assembler by hand versus using a compiler can make a difference in benchmarks. Whether that difference is noticeable or negligible depends on a few factors. The first is the hardware: you can run the calculation on the CPU, the GPU, or an APU, and you can use the instruction sets that hardware supports to gain precision, efficiency and speed. The second is the compiler being used. Modern compilers are constantly updated with newer code-generation methods. Alternatively, you could write the code by hand, which puts you in control of what happens and how it is done. Either way, you can still run performance tests, whether you wrote the code by hand or let a compiler do the dirty work. That way you can find where the bottlenecks are (if any) and work on that section of the code (or on the compiler's assembler output).
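To make the "run performance tests" point concrete, here is a minimal sketch of timing two versions of the same computation. It is in Python only to keep it short; the same idea applies to hand-written assembler versus compiler output. The function names (`sum_squares_loop`, `sum_squares_builtin`, `best_time`) and the workload are made up for illustration, and the numbers you get will depend on your machine, so treat this as one possible approach rather than a definitive benchmark.

```python
import timeit


def sum_squares_loop(n):
    """Hand-written loop: every step is spelled out explicitly."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    """Same computation, delegated to the built-in sum()."""
    return sum(i * i for i in range(n))


def best_time(func, n, repeats=5, number=20):
    """Return the lowest time in seconds taken by `number` calls of func(n)."""
    timer = timeit.Timer(lambda: func(n))
    # Taking the minimum over several repeats reduces noise from other processes.
    return min(timer.repeat(repeat=repeats, number=number))


if __name__ == "__main__":
    n = 100_000  # hypothetical workload size
    # Both versions must agree before timing them means anything.
    assert sum_squares_loop(n) == sum_squares_builtin(n)
    for func in (sum_squares_loop, sum_squares_builtin):
        print(f"{func.__name__}: {best_time(func, n):.4f} s")
```

Checking that both versions return the same result before timing them matters: a fast version that computes the wrong thing is not an optimization. Once you know which version is slower, that is the section to dig into.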
I will offer an opinion on this, though. I'd say performance mostly comes down to the hardware, computer and software design. That means the storage and processing technologies used; the throughput of the hardware, cables and connectors; and even the operating system being used (its multitasking abilities, memory management capabilities, etc.). [I feel that I may be missing something. Any others?]
There are just many factors to this. It all comes down to proper design.