Agner Fog on using FPU and xmm regs in Win7-64: (http://www.agner.org/optimize/calling_conventions.pdf)
Quote: However, a public discussion forum quotes the following answers from Microsoft engineers
regarding this issue: "From: Program Manager in Visual C++ Group, Sent: Thursday, May
26, 2005 10:38 AM. It does preserve the state. It's the DDK page that has stale information,
which I've requested it to be changed. Let them know that the OS does preserve state of
x87 and MMX registers on context switches." and "From: Software Engineer in Windows
Kernel Group, Sent: Thursday, May 26, 2005 11:06 AM. For user threads the state of legacy
floating point is preserved at context switch. But it is not true for kernel threads.
Kernel mode drivers can not use legacy floating point instructions." (www.planetamd64.com/index.php?showtopic=3458&st=100).
The issue has finally been resolved with the long overdue publication of a more detailed ABI
for x64 Windows in the form of a document entitled "x64 Software Conventions", well hidden
in the bin directory (not the help directory) of some compiler packages. This document says:
"The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across
context switches. There is no explicit calling convention for these registers. The use of
these registers is strictly prohibited in kernel mode code." The same text has later appeared
at the Microsoft website (msdn2.microsoft.com/en-us/library/a32tsf7t(VS.80).aspx).
My tests indicate that these registers are saved correctly during task switches and thread
switches in 64-bit mode, even in an early beta version of x64 Windows.
I like the red part. It somehow implies that the very latest version of the Windows kernel uses, well, "legacy floating point instructions" :biggrin:
I think they want to speed up context switches in kernel land. Maybe they also want to keep the slow transcendental functions out of kernel code to improve its interruptibility.
The question is which kind of driver needs FPU stuff? Basic FP-Arithmetic is still available through SSEx.
From memory Microsoft abandoned FPU code some time ago for 64 bit versions, over time SSE will probably do the job if they extend the maths to 128 bit. FPU code can still handle numbers in the 80 bit range but it would seem that Intel also want to shift most maths to SSE rather than the now ancient FPU.
Quote from: hutch-- on June 14, 2012, 12:57:51 AM
From memory Microsoft abandoned FPU code some time ago for 64 bit versions, over time SSE will probably do the job if they extend the maths to 128 bit.
Only kernel code is affected. User mode applications can still use the FPU.
Quote from: hutch-- on June 14, 2012, 12:57:51 AM
From memory Microsoft abandoned FPU code some time ago for 64 bit versions, over time SSE will probably do the job if they extend the maths to 128 bit. FPU code can still handle numbers in the 80 bit range but it would seem that Intel also want to shift most maths to SSE rather than the now ancient FPU.
maybe that implies the future intentions of intel (assuming that intel and ms collaborate)
they may intend to phase it out over the next few generations of processors
> Kernel mode drivers can not use legacy floating point instructions
They say "don't use them", not: "if you use them, preserve them". Who would be affected by "wrong" FPU values if not Kernel code itself? Or do I misunderstand something completely? Kernel-wise I am a noob...
may have something to do with handling FPU exceptions
i thought you were an expert on that stuff :biggrin:
For kernel threads, the FPU registers and status are not saved when a context switch occurs. That means the whole FPU contents can change from one instruction to the next if a context switch happens between them.
OK, got it, thanks :t
This is mind-boggling,...
As if kernel-mode programming wasn't confusing enough already,...
I've done some tests checking how much it costs to save & restore the xmm regs that Win7-64 so mercilessly trashes:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
89 cycles for fxsave
83 cycles for fxrstor
152 cycles for fsave
113 cycles for frstor
89 cycles for fxsave
83 cycles for fxrstor
152 cycles for fsave
113 cycles for frstor
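
For anyone who wants to repeat this kind of measurement without the original test program, here is a minimal sketch in C. It is not the code that produced the numbers above. It assumes GCC or Clang on x86/x86-64 (inline asm syntax, x86intrin.h), and the names fx_area, fx_save, fx_restore, fx_saved_mxcsr and fx_avg_cycles are made up for illustration. Note that rdtsc counts reference ticks rather than core clock cycles on newer CPUs, and the loop and call overhead is included, so treat the result as a rough figure only.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>   /* __rdtsc, _mm_getcsr (GCC/Clang) */

/* FXSAVE/FXRSTOR use a 512-byte memory area that must be 16-byte aligned. */
typedef struct {
    _Alignas(16) uint8_t bytes[512];
} fx_area;

static_assert(sizeof(fx_area) == 512, "FXSAVE area must be 512 bytes");

/* Store the x87/MMX/SSE state into *a. */
void fx_save(fx_area *a)
{
    __asm__ volatile("fxsave %0" : "=m"(*a));
}

/* Reload the x87/MMX/SSE state from *a.
   Only pass an area that was filled by fx_save. */
void fx_restore(const fx_area *a)
{
    __asm__ volatile("fxrstor %0" : : "m"(*a));
}

/* MXCSR is stored at byte offset 24 of the FXSAVE image. */
uint32_t fx_saved_mxcsr(const fx_area *a)
{
    uint32_t v;
    memcpy(&v, a->bytes + 24, sizeof v);
    return v;
}

/* Average time-stamp ticks for one fxsave + fxrstor pair. */
uint64_t fx_avg_cycles(unsigned iterations)
{
    fx_area area;
    uint64_t start = __rdtsc();
    for (unsigned i = 0; i < iterations; i++) {
        fx_save(&area);
        fx_restore(&area);
    }
    return (__rdtsc() - start) / iterations;
}
```

Calling fx_avg_cycles(1000000) a few times and taking the lowest value gives a number that can be compared loosely with the fxsave and fxrstor lines above, keeping in mind that it times the pair together.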
172 cycles on my puter. Looks like a lot but effectively they are needed only around some probably utterly slow Windows API calls.
prescott w/htt - XP MCE2005 SP3
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
162 cycles for fxsave
243 cycles for fxrstor
530 cycles for fsave
576 cycles for frstor
158 cycles for fxsave
243 cycles for fxrstor
528 cycles for fsave
578 cycles for frstor
P4 Northwood w/ht XP SP3
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE2)
87 cycles for fxsave
207 cycles for fxrstor
443 cycles for fsave
526 cycles for frstor
84 cycles for fxsave
202 cycles for fxrstor
434 cycles for fsave
529 cycles for frstor
Interesting to see the drop in IPC for the Prescott compared to the Northwood.
I had similar timing results back when I had both the Northwood and a Prescott as dev boxes. The 2.8 gig Northwood was usually faster than the 3 gig Prescott and had noticeably less lag. Apparently the Prescott has a much longer pipeline.
Getting faster...
Interesting that fsave/frstor is always much slower. The x variants save 512 bytes to memory.
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
38 cycles for fxsave
73 cycles for fxrstor
166 cycles for fsave
130 cycles for frstor
38 cycles for fxsave
73 cycles for fxrstor
167 cycles for fsave
130 cycles for frstor
Hi,
For some reason, it crashed on my P-MMX. (Right,
like I expected an error message? <g>) It did run
on my Sony, so the CPUID code is not what locked it
up earlier. (Presumably.) Though there was an odd
pause the first time I ran this one.
pre-P4 (SSE1)
61 cycles for fxsave
63 cycles for fxrstor
108 cycles for fsave
86 cycles for frstor
61 cycles for fxsave
63 cycles for fxrstor
108 cycles for fsave
86 cycles for frstor
--- ok --- Mobile Intel(R) Celeron(R) processor 600MHz (SSE2)
61 cycles for fxsave
63 cycles for fxrstor
116 cycles for fsave
92 cycles for frstor
61 cycles for fxsave
63 cycles for fxrstor
117 cycles for fsave
93 cycles for frstor
--- ok ---
Cheers,
Steve N.