Failing to Fail | OS/2 Museum
The other day I was going over various versions of the venerable DOS/16M DOS extender from Rational Systems (later Tenberry Software). The DOS/16M development kit comes with a utility called PMINFO.EXE which is meant to give the user some idea about the performance of a system running in protected mode.
I know that the utility has trouble on faster CPUs and I expected it to fail roughly like this:
But running the utility on an older laptop with an Intel Haswell processor, I instead got this:
Rather than cleanly exiting after catching a floating-point division by zero, the program crashed with a general protection fault. That looks like a bug, but why would it be happening? And where is the bug?
To get a better sense of the problem, I used the Instant-D debugger shipped with DOS/16M. There I could see the faulting code:
It didn’t take me too long to determine that the code is part of the floating-point exception handling logic, and that it comes from the EMOEM.OBJ file shipped with DOS/16M.
Now, the EMOEM module is provided in source form with many Microsoft compilers, including Microsoft C 5.1 and 6.0 (one of those was likely used to build PMINFO.EXE). But the crashing code fragment is not in the code provided by Microsoft. So why is it there and what is it supposed to do?
It took me a little while to understand what the code is doing, but once I did, it was obvious why it’s there. The problem the code is trying to solve is caused by the fact that the x87 environment differs between real and protected mode. The original real-mode only format used by the 8087 stores a linear 20-bit address of the floating-point instruction (because the 8087 doesn’t know what the original segmented 16:16 address was!) plus 11 bits of the FPU instruction opcode (5 bits are always the ESC opcode). The 20-bit linear address and the 11-bit opcode are stored in two consecutive 16-bit words, with one bit left unused.
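The packing described above can be illustrated with a short sketch. The exact bit positions are an assumption taken from Intel’s documented 16-bit real-mode environment layout (low 16 address bits in the first word; address bits 19-16 in bits 15-12 of the second word, bit 11 unused, opcode in bits 10-0), and the function names are hypothetical.

```python
def pack_real_mode_fip(linear_addr, opcode11):
    """Pack a 20-bit linear address and an 11-bit opcode into two 16-bit words.

    Assumed layout: word_lo = address bits 15-0;
    word_hi = address bits 19-16 in bits 15-12, bit 11 unused, opcode in bits 10-0.
    """
    if not 0 <= linear_addr < (1 << 20):
        raise ValueError("linear address must fit in 20 bits")
    if not 0 <= opcode11 < (1 << 11):
        raise ValueError("opcode must fit in 11 bits")
    word_lo = linear_addr & 0xFFFF
    word_hi = ((linear_addr >> 16) << 12) | opcode11
    return word_lo, word_hi


def unpack_real_mode_fip(word_lo, word_hi):
    """Recover (linear_addr, opcode11) from the two packed words."""
    linear_addr = ((word_hi >> 12) << 16) | word_lo
    opcode11 = word_hi & 0x07FF
    return linear_addr, opcode11
```

The two words together hold 20 + 11 = 31 bits of information, which is why exactly one bit is left over.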
In protected mode on the 80287, a 20-bit linear address isn’t enough. Intel changed the x87 environment format to store the full 16:16 segmented address, and the FPU opcode is no longer stored.
DOS/16M was designed to work with compilers producing real-mode DOS code. Hence libraries shipped with those compilers expect the original 8087 environment format when handling floating-point exceptions. But because DOS/16M applications really run in protected mode, the FPU will be storing the x87 environment in the newer, protected-mode format.
The extra code in the DOS/16M EMOEM.OBJ is clearly meant to read the opcode from the saved CS:IP address, possibly skip one byte of a prefix, and then modify the saved environment, writing the 11 opcode bits right where real-mode exception handling code expects to find them. (Note that the code makes no attempt to provide a 20-bit linear address, since that wouldn’t work anyway.)
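As a rough illustration of that fix-up logic (not the actual DOS/16M code, which is 16-bit assembly and was not available for this sketch), the following Python models the two steps: extracting the 11-bit opcode from the instruction bytes, skipping at most one prefix byte, and merging it into a 16-bit environment word. The function names are hypothetical, and preserving the upper five bits of the word is an assumption.

```python
ESC_FIRST, ESC_LAST = 0xD8, 0xDF  # x87 (ESC) instructions start with these bytes


def is_esc(byte):
    """True if the byte is the first byte of an x87 instruction."""
    return ESC_FIRST <= byte <= ESC_LAST


def fopcode_from_bytes(code):
    """Return the 11-bit x87 opcode found at the start of `code`, or None.

    At most one prefix byte is skipped. The opcode is the low 3 bits of
    the ESC byte followed by the 8 bits of the next byte.
    """
    pos = 0
    if len(code) > 0 and not is_esc(code[0]):
        pos = 1  # assume a single prefix byte precedes the ESC opcode
    if len(code) < pos + 2 or not is_esc(code[pos]):
        return None
    return ((code[pos] & 0x07) << 8) | code[pos + 1]


def patch_opcode_word(word, opcode11):
    """Write the opcode into bits 10-0 of a 16-bit word (upper bits kept: an assumption)."""
    return (word & 0xF800) | (opcode11 & 0x07FF)
```

For example, `fopcode_from_bytes(bytes([0xD8, 0xF1]))` handles a register-form FDIV, while a leading segment override such as `0x26` is stepped over before decoding. The important point is the input: the bytes must be fetched through the saved CS:IP, so a bogus CS value makes the whole scheme fall apart.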
So why does this code not work on my Haswell laptop? Because the CPU is not quite backward compatible.
99% Backward Compatibility
The original 8087 always kept the FPU environment up to date, including the FPU opcode as well as instruction and data addresses. That reflected the internal working of the 8087.
The 287 already changed things from a software perspective, which was a result of the different interface between the CPU and FPU. On the 8087, the saved instruction address points to the ESC opcode. On the 287 and later, it points to any prefixes that might precede the ESC opcode. This change was clearly an improvement, and although it had the potential to upset existing floating-point exception handlers, in practice it probably didn’t, because most FPU instructions that are likely to fault (division, multiplication, transcendental instructions) aren’t used with prefixes anyway.
The FXSAVE instruction added in the later Pentium II models subtly changed how the processor saves the last FP instruction opcode and code/data addresses. Rather than saving those data items every time, they’re only saved when there is a pending floating-point exception. This reflects the actual usage, since only FP exception handlers are likely to need this information.
In the P4 microarchitecture, Intel added a (presumably) performance optimization called “fopcode compatibility mode”. Bit 2 in the multi-purpose IA32_MISC_ENABLE MSR determines whether the CPU tracks the FP opcode (aka fopcode) for every instruction as before, or whether it’s updated only upon encountering an exception. Newer Intel CPUs no longer support constant updating of the FP opcode at all and only update it when exceptions occur.
None of that is a problem for PMINFO.EXE. But the next step that Intel took to reduce x87 backward compatibility actually is.
In the Haswell and later CPUs, Intel introduced a new CPUID bit. When (in Intel parlance) CPUID.(EAX=07H,ECX=0H):EBX[bit 13] is set, the processor still tracks the last FP instruction code and data addresses, but no longer saves segment register values; that is, the code and data segment values are always stored as zeroes.
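Testing for this behavior comes down to a single bit test. The sketch below assumes the EBX value returned by CPUID leaf 7, subleaf 0 has already been obtained by some other means (plain Python cannot execute CPUID itself); the function and constant names are hypothetical.

```python
FPU_CS_DS_DEPRECATED_BIT = 13  # CPUID.(EAX=07H,ECX=0H):EBX[bit 13]


def fpu_segments_not_saved(cpuid7_ebx):
    """Return True if the CPU reports that FPU CS and DS are stored as zero.

    `cpuid7_ebx` is the raw 32-bit EBX value from CPUID leaf 7, subleaf 0.
    """
    return bool((cpuid7_ebx >> FPU_CS_DS_DEPRECATED_BIT) & 1)
```

A protected-mode exception handler that depends on the saved segment values could use such a check to refuse to run, or to fall back to emulation, instead of dereferencing a zero selector.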
This doesn’t affect real-mode code (since only linear addresses are stored), but it does impact segmented protected-mode exception handlers, such as the one in DOS/16M.
While earlier changes, such as not always tracking the last FP opcode, are easily visible to software, they don’t cause trouble in practice. But not saving the segment registers really does upset legacy off-the-shelf software. Not often, but it does. PMINFO.EXE is one of the victims, but far from the only one.
Possible Workarounds?
Working around the deficient CPUs is quite difficult. A naive approach would be to intercept the #MF (math fault) exception and record the current CS and DS, but that would be only sometimes correct.
The reason why the FPU separately tracks the instruction and data pointers is that, historically, the FPU was a completely separate chip running in parallel with the CPU. Math exceptions were reported asynchronously via the interrupt controller. The CPU could be doing more or less anything when the math interrupt arrived; the FPU itself had to provide the instruction pointer so that the math error handler could find out what actually faulted.
Even on modern CPUs where everything is one piece of silicon and floating-point errors are reported via #MF exceptions, the problem remains. The #MF exception is reported at some point after the instruction which caused it, specifically on the next floating-point instruction or a WAIT instruction. But such an instruction could be executed in a different segment, or in a multi-tasking OS, in a different task.
That’s exactly the case with the DOS/16M PMINFO.EXE. The #MF exception is triggered on a WAIT instruction in a floating-point emulator segment, which is different from the segment containing the instruction that caused the FP exception.
The upshot is that by the time the #MF happens, it’s too late to record the code and data segment values. The only possibility might be to force math instruction emulation with the CR0.EM bit, and track the current code and data pointers, but that would be quite intrusive and slow. At that point it would be simpler to just run the legacy code through software emulation.
Fortunately the impact of this problem is fairly limited. It’s rare for software to handle math exceptions during normal operation; more often than not, math exceptions cause a fatal error, and in such cases the practical difference between terminating a program due to a math fault versus a general protection fault isn’t significant. While failing to fail properly is annoying, the program still fails either way.
There is a possible workaround that users may apply in some cases. Once upon a time, Microsoft provided a package called WINFLOAT.EXE described in KB article Q97265. Said package includes a utility called HIDE87.COM which hides a math co-processor from Windows 3.x applications, and possibly from some DOS applications. This forces the software emulation built into Windows to be used, avoiding the deficiency of newer Intel CPUs.
Note that the WINFLOAT package can be used to get some sense of whether math exception handling works at all in a given setup. Here it’s not working (as expected) on a Haswell CPU:
For comparison, here it is running on a non-crippled CPU:
To date, AMD processors provide better backward compatibility and don’t suffer from this particular problem.
Addendum: Same Symptom, Different Cause
Around 2013, users of several virtualization products (VMware, VirtualBox, KVM, XP mode in Windows 7) complained of crashes in WIN87EM.DLL and similar. The symptom was identical: a math fault handler crashing because the code segment of a faulting FPU instruction was zero. Such reports can be found here, here, or here.
But the cause was quite different. It specifically affected 64-bit hypervisors running 32-bit or 16-bit guest software. In the course of normal operation, a hypervisor often needs to save and restore the FPU state, using FXSAVE/FXRSTOR or similar instructions.
These instructions can all save the FPU state in different formats; the two relevant formats are 64-bit, with 64-bit offsets and no segments, or 32-bit, with a 16-bit segment and a 32-bit offset.
A hypervisor can save the state twice, once in 32-bit and once in 64-bit format. That way it’s possible to recover both the segments and the 64-bit offsets. But when restoring state, the hypervisor is faced with a binary choice: either restore the 64-bit format, zeroing the segment registers, or restore the 32-bit format, keeping the segment values but zeroing any high bits of 64-bit offsets.
It should now be apparent that if a 64-bit hypervisor only uses the 64-bit form of the FPU save/restore instructions, the segment register contents stored in the FPU state will be lost after saving and restoring the FPU state. Depending on the hypervisor and guest combination, this loss may be rare and unpredictable, or it may happen with 100% reproducibility.
Hypervisors were fixed to selectively save and restore either 32-bit or 64-bit state. One possible approach is as follows: Save the 64-bit FPU state. If the high DWORD of either the code or data pointer is non-zero, keep this state and restore 64-bit state again. Otherwise save the FPU state again in 32-bit format, and restore it as 32-bit. This approach works well in practice and adapts to the software running in the guest.
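The decision step of that approach can be sketched as follows. This is only an illustration operating on a saved memory image, not code from any particular hypervisor; it assumes the 64-bit FXSAVE layout in which the instruction pointer occupies bytes 8-15 and the data pointer bytes 16-23, and the function name is hypothetical.

```python
import struct


def pick_restore_format(fxsave64_image):
    """Decide which format to use when restoring the saved FPU state.

    `fxsave64_image` is a memory image written by the 64-bit form of FXSAVE.
    Returns "64" if either pointer needs more than 32 bits, otherwise "32",
    in which case the state should be saved again in 32-bit format so that
    the segment values are preserved.
    """
    # Assumed layout: 64-bit instruction pointer at offset 8, data pointer at offset 16.
    fip, fdp = struct.unpack_from("<QQ", fxsave64_image, 8)
    if (fip >> 32) != 0 or (fdp >> 32) != 0:
        return "64"
    return "32"
```

A 16-bit or 32-bit guest never produces pointers with non-zero upper halves, so it always ends up on the 32-bit path and keeps its segment values, while a 64-bit guest using high addresses keeps its full offsets.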
As usual, the devil is in the details.