Operating system maintenance looks easy. By today's standards, the code base for an operating system is small, the code is as close to bug-free as you are likely to see in the wild, and maintenance requests usually look like minor tweaks. Most of the time the coding part of OS work actually is easy. The trap is the design work and serious thought that must precede even the slightest change. Design mistakes can stay with the OS forever.
They say that motorcycle riders go through three stages: fear, over-confidence, and finally respectful care. The same stages apply to OS programmers. Motorcyclists in the over-confident stage may be injured or killed. Programmers can create a subtle defect that is fantastically expensive to remove.
I wrote this paper to help educate the programmers at Microware who write OS code. As far as I know, the rules in this paper are well understood by OS programmers, but unwritten. People learn them by painful experience, or by having a grizzled OS veteran explain: "If we ship what you've built this will happen, then that will happen, and then all of our lives will become horrible."
This was written for people maintaining OS-9, but I believe it applies equally well to any operating system. These policies should be taken as precepts by people designing, enhancing, or maintaining an operating system.
Much system software ignores these precepts. That greatly simplifies work on the software, but by some definitions, the result is not an operating system.
The precepts refer repeatedly to processes. A process is the traditional combination of a protection domain and at least one thread. In the context of the precepts, the thread aspect of a process becomes nearly insignificant; the protection-domain aspect remains. System software with many threads and one protection domain probably is not an operating system by my definition, and will have to invent its own precepts. System software that offers unconventional combinations of threads and protection domains, like Psyche or VxWorks/AE, probably does qualify as an operating system and should be able to map these precepts onto its own space by replacing every instance of process with realm, protection domain, or the equivalent term of its choice.
- The OS Must Protect Itself
No action of a non-super-user process can crash the OS: no sequence of instructions, no system call, no application error. The operating system must protect itself from user actions, whether they are accidental or malicious.
Time does not crash the operating system. The operating system must not leak resources, and it must recover all resources when processes terminate whether intentionally or unintentionally.
- The OS Must Not Attack Processes
The interface between the operating system and user code must be fully defined by the manual. The OS cannot allow itself to change user registers, memory, I/O paths, or other resources without permission unless the process experiences a program exception.
- The OS Must Defend Processes Against Attack
The operating system must build strong walls between processes and enforce those walls to the best of its ability. Only privileged (super-user) processes may be permitted to cross protection boundaries; those processes are considered part of the OS and are subject to these precepts.
- An OS Must Hide the Details of I/O, and Other Facilities That Are Not Accessible in User State
The OS hides I/O devices and processor artifacts. This lets users write code that can be moved from machine to machine without changes.
- Maintain User-State Compatibility at Almost Any Cost
The interface between the operating system and user state code is defined by an ABI and list of system calls. The interface may be upgraded, but old interfaces must be supported forever.
- Don't Force Code into System State
It should be rare that a process needs to become super user or execute in system state. If something cannot be done in user state, and it does not compromise another precept, the service should be added to the OS.
- Look for Indirect Problems
This is a warning that defines the scope of the first four precepts. You have to look for subtle feature interactions, and expect that programs will do unexpected or irrational things.
- Never Rely on Behavior You Do Not Enforce
This is an important corollary of precepts II and V. If the OS permits a behavior it must expect and support that behavior. The OS cannot just say that a certain field must be zero (for future expansion) on a certain system call. It must check that field and return an error if it is not zero. If it doesn't check, precept V will force it always to accept any value in the field without complaint, and render the field useless for future expansion.
First, some examples of operating systems: OS-9, UNIX, LynxOS, VMS, QNX, Windows NT, Windows 95/98, and MVS. Examples of lower forms of system software include: Ariel, VxWorks, PSOS, Nucleus, and MacOS prior to release X. Perhaps Windows 95/98 belongs in both categories. So long as it has a DOS box, an accessible BIOS, and 16-bit legacy APIs, it cannot offer reliable protection.
The boundaries of the operating system extend to include all privileged software. Device drivers, file managers, and super user programs all have to be designed and coded with the same paranoid discipline that characterizes the kernel itself. Consider the UNIX mail system. It was written so that part of the program that received mail had to run in super user state to manage mail files. Unfortunately, the program was designed as a robust utility, not an OS component, so the mail system became a vulnerability. A cracker tricked it into giving him access to a shell with super user privileges. Then he owned the system. The moral is that if you have special privileges, or the ability to get them, you must follow the precepts.
The aforementioned eight precepts are goals and guidelines. No operating system does a perfect job of following them. Some shortcomings result from compromises between the principles; some from compromises with general good software metrics, like size and performance; some from clashes with marketing goals; and many of them probably happen just because the programmers haven't had time to perfect the software.
The OS Must Protect Itself
This is harder than it sounds. Any decent operating system will run robustly if it is treated nicely, but an operating system is most important when it is cruelly abused. One of the measures for the quality of an operating system is the limit it places on tolerable abuse. Most operating systems try to tolerate anything user-state software can do. Really robust system software also tolerates abuse from the underlying hardware: interrupt showers, defective memory, missing devices, and power failures.
Even surviving abuse by application software is difficult. One simple, but nasty, benchmark is crashme, a program that attempts to execute random data. Every time the operating system catches a fault, the program throws control back to a random location in the random data. This random instruction stream does things to the processor and the kernel that no programmer would consider.
Not allowing time to crash the operating system means that the OS has to track and reclaim resources. This makes process termination a major function for the operating system. It needs to free resources cleanly and in the right order. There are a couple of problems here.
What about persistent resources? Those are supposed to survive beyond the termination of the process that created them, so the operating system should not free them on termination. Processes that fail to free persistent resources that will no longer be used create a resource leak that the operating system cannot close. How does the OS deal with running out of resources? Generally, the operating system tries to limp along, but future requests for the exhausted resource must fail.
How about locks? A process can terminate while holding a lock. Should the OS release it? If it doesn't, processes could wait for that lock forever. If it does, it could release a waiting process right into a half-updated data structure that the lock was guarding. The OS could set up a recovery procedure that can be executed in case the holder of a lock exits. OS-9 just supports a mechanism for defining special types of lock with special protection. It doesn't provide any lock recovery as a standard service.
The OS Must Not Attack Processes
This doesn't mean the operating system lets a process do anything it wants. It only bars gratuitous interference with processes. Actions like unexpectedly changing applications' register values, closing paths, or reclaiming a process' memory are not acceptable behavior.
Generally, violations of this principle are coding bugs in the operating system, not design flaws, but they cause problems in application code that are hard to debug. The canonical problem is an interrupt service routine that corrupts a register. The OS saves and restores state for the ISR, but it may not save all state; for instance, if the OS does not save and restore floating-point registers and an ISR changes one, the effect on the executing application can be disastrous.
The OS Must Defend Processes Against Attack
This is a major reason for processes. A process is a protection domain enforced by the operating system. The OS provides formal communication mechanisms for processes, but prevents any informal communication. The simple example is that processes may not write into each other's memory. The OS can only enforce this rule if the hardware provides an MMU, but it needs to do more than just maintain hardware memory protection. OS services that access process memory need to run under the caller's protection domain, or the OS needs to enforce protection in software. For instance, process A should not be able to direct the OS to read from a file into process B's memory.
Memory is not the only resource. The operating system should control access to open I/O paths, files, locks, and most other resources that it allocates.
The OS Must Hide Details of I/O
This is the basic job of any system software. It takes complex hardware and presents a simpler or more useful abstraction of it.
Most operating systems present a generally UNIX-like abstraction with processes and I/O that is largely device independent, but there is variation in the details, and in the scope of the abstraction. Certainly, the operating system should hide details of the device like the characteristics of the supporting hardware. A quality job will allow most programs to run without even knowing the general class of I/O device; the exact same binary image will write to a disk file or a network socket.
Some operating systems are following Plan 9 in using the file name space for all resources. That is a clever trick. The operating system's I/O abstraction is designed to let a program run without knowing whether its I/O is connected to a terminal, a disk file, or a network connection. It is a short (but profound) step to extend the abstraction to cover things that are not I/O; for example, the process table or system configuration constants.
Maintain User-State Compatibility
Operating systems are binary entities (or collections of entities), and they run applications that are presented to them as binary entities. The operating system is not compiled or linked with the applications. This means that the application and the operating system are not bound to one another. It is quite likely that an application will encounter a version of the operating system with which it has never been tested. That is normal. The operating system is expected to run any correct software that ran on an earlier version of itself.
It is fair to insist that Version 99.1 of an operating system must run binaries that ran correctly on Version 1.0. It is not fair to insist that any software that runs on Version 2.1 should also run on 2.0. The upgrade between those versions may have fixed a bug that keeps the software from running correctly. Depending on the numbering conventions, it may have added a feature that the application uses. Requiring complete compatibility in both directions would stop all OS development, even bug fixes.
There is a large and interesting gray area that surrounds incorrect binaries. If a binary executed correctly even though it was incorrect (e.g., it took advantage of a bug in the OS), a case can be made for causing it to break in a future release, but this is not a foregone conclusion. The default for OS engineers should be to perpetuate any bug that could possibly have been seen as a feature, or offer a mode that old modules can use to perpetuate the bug.
Don't Force Code into System State
If you go inside an operating system, you find something very like a threaded kernel. There is dangerous power and no protection. Although an operating system may offer convenient ways to enhance it, applications should make every effort to stay out of system state.
This does not mean that an OS must provide every conceivable service. If there is a way to accomplish a task without adding a service to the OS, that is generally sufficient. Even if there is no way to accomplish some task outside the operating system, the OS designer has to be convinced that the need is important and widespread.
"Creeping featurism" results when an OS designer gets permissive about new features. The OS gets big, complex, and hard to test and debug. For the desktop, server, or mainframe, big and complex is typical and so are bugs. Embedded systems are sensitive to performance and footprint issues, and must resist features they don't need. Consequently, operating systems for embedded systems are small, modular, and configurable.
I can see little advantage in pulling graphics and web browsing into the operating system. Other than Windows, every protected operating system I can think of has placed graphics in subsystems (like the X Window System, OPEN LOOK, etc.). If the GUI is outside the OS, the operating system is not harmed by problems in the graphics subsystem.
Look for Indirect Problems
The stack is a nice example of an indirect problem.
The operating system's ABI is defined at each documented interface between the operating system and the application. That includes system calls, the initial state of processes and threads, and the state at the beginning of a signal intercept routine. Between system calls, the only restriction on an application is that it execute only legal user-state instructions.
The programmer might decide to point the register that is used as the stack pointer at some ROM, or perhaps at another process' memory. The processor allows it, and as long as the application doesn't try to access the stack, all will be well.
The indirect problem appears if somebody working on the kernel decides that it would be convenient to store something on the user's stack. The (perfectly legal) program with the stack pointer to ROM will cause the OS to take a memory fault. At best, it will crash the application. At worst, it will bring down the whole system. The (also legal) program that points the stack into someone else's RAM will only hurt the other application, but that violates precept II or III, depending on whether you blame the process with the bad stack pointer or the operating system for the attack.
At first, you might think the OS designer should have prevented this problem by stating in the ABI that the user stack pointer must always point to legally accessible RAM. The problem is that an OS cannot rely on rules that it does not enforce (precept VIII). If it says the stack pointer must always be good, it has to check it at every instruction, or else not depend on the value of the stack pointer.
Why Does It Matter?
An operating system that follows these eight precepts gives programmers a safe place to run their code. There is no interference from the OS or other processes, and the programmer can't harm anything outside the process.
These have been the unwritten laws of OS designers and programmers. OS users have also had a vague notion of them: to a user, they show up as quality. Nothing looks as good as a protected OS.
Peter Dibble can be reached at firstname.lastname@example.org.