When a computer starts up, it loads an operating system (OS), a program for managing the hardware and running other programs. In OS terminology, a running program is called a process, and it is the OS's responsibility to keep processes from interfering with one another, even as they run on the same system at the same time. It's also the OS's job to provide an interface to the hardware for the processes: the OS retains direct control of the hardware devices, and processes interact with the devices only via so-called system calls provided by the OS. The OS also provides a file system, which abstracts over the storage devices so that processes can read and write files without concern for how exactly the data gets stored. Lastly, the OS provides a user interface for users to run programs and manage the file system.
A device driver is a plug-in module of the OS that handles the management of a particular I/O device. Some standardized devices may function with a generic driver. For example, a USB mouse may perform all common mouse functionality with a driver written for generic USB mice. Many devices, however, require a more specific driver. On my system, for example, the generic graphics driver provided by Windows offers only bare-minimum functionality: to run high resolutions and play games, I must install the driver provided by AMD for my Radeon graphics card.
A primary purpose of modern operating systems is to allow multiple processes to run concurrently, meaning at the same time. The problem, of course, is that each CPU core can only execute the code of one process at a time, and the OS's own code can't run on a core at the same time as any process. The solution is to have each CPU core alternate between the open processes, and alternate running processes with running OS code. So here, if we have two CPU cores and three open processes A, B, and C, notice that each process only runs on one core at a time: at no point does, say, process B run simultaneously on both cores. Also notice that OS code runs on each core in between each process. What's happening is that a portion of the OS called the scheduler runs after each process to decide what OS work, if any, should be done, and which process should run next. The question then is: how does the currently running process get interrupted? Left on its own, a running process would continue indefinitely. When any hardware interrupt is triggered, however, the interrupt handler passes control to the scheduler rather than handing the processor core back to the interrupted process. The scheduler then decides what OS code to run, if any, and which process should run next. Laid out in full, this scheme, called pre-emptive multitasking, works like this:
- first, the CPU receives some hardware interrupt
- then the CPU stores the program counter so that the interrupted code can resume later
- the interrupt invokes the appropriate handler
- the handler itself saves the state of the other CPU registers so that the interrupted process can be resumed later
- the handler does whatever business the interrupting device needs
- the handler then invokes the scheduler
- the scheduler selects a process to run
- the scheduler then restores the CPU registers to the state they were in when that process was last running, so that the process may continue
- finally the scheduler jumps execution to that process
You may now be wondering two things. First, what if no interrupt is triggered by any device for a long time? That would allow the currently running process to hog the CPU, when generally we want each process to get at least a little CPU time on a regular basis. A video game, for example, typically can't go without CPU time for more than a fraction of a second, so it would be no good if some other process ran without interruption for a second or more. To ensure that the scheduler gets to run regularly, whether any I/O devices need attention or not, a clock device on the mainboard is configured to send an interrupt on a regular basis, say once every 10 or 20 milliseconds. Thus the system guarantees that the scheduler gets the opportunity to change the running process on each core at least several times a second. The next thing you might wonder is how the scheduler chooses which process to run next. Using the simplest algorithm, round-robin, the scheduler simply runs each process in turn, one after the other. While this ensures that every process gets run on a regular basis, the more sophisticated algorithms used by Windows, Linux, and other modern operating systems attempt to take into account which processes need processor time more than others.
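The round-robin idea can be sketched in a few lines of Python. This is a toy model with made-up process names, not a real OS scheduler:

```python
from collections import deque

class RoundRobinScheduler:
    """Toy round-robin scheduler: runs each process in turn."""

    def __init__(self, processes):
        self.ready = deque(processes)

    def pick_next(self):
        # Take the process at the front, then rotate it to the back
        # so every process gets its turn in order.
        p = self.ready[0]
        self.ready.rotate(-1)
        return p

sched = RoundRobinScheduler(["A", "B", "C"])
print([sched.pick_next() for _ in range(6)])  # ['A', 'B', 'C', 'A', 'B', 'C']
```

A real scheduler would also weigh process priorities and blocked states, but the cycling behavior is the same.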
Processes not only share the CPU cores, they of course must also share the system memory. It's the OS's job to regulate the processes' use of memory, ensuring that each process doesn't interfere with the portions of memory used by other processes and by the OS itself. Here, for example, the processes A, B, and C have been allocated their own portions of system memory. While the OS may access any portion of memory as it chooses (because the OS is supposed to be in charge of the system), each process can only access its own portion of memory. As we'll explain shortly, this restriction is enforced by the hardware, making it impossible for a process to muck with addresses outside of its own portion of memory. However, we need a loophole in this restriction, because processes must be able to invoke certain routines at fixed addresses in the OS's portion of memory. These routines, called system calls, are the means by which processes initiate requests to the operating system. System calls provide functionality for things like reading and writing files, or sending and receiving data over the network.
To invoke a system call, a process must use a specific CPU instruction, usually called syscall, in which the process specifies a system call number.
When this instruction is executed, the processor looks in the system call table for the address of the routine corresponding to the number and jumps execution to that address. Because the operating system controls the system call table, a process can only jump execution to addresses of the operating system's choosing.
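The table lookup can be modeled as a toy dispatch in Python. The numbers are loosely styled after Linux x86-64 syscall numbers, and the handlers and their return values are invented for illustration:

```python
# Toy "kernel" routines; the values they return are made up.
def sys_getpid():
    return 1234  # hypothetical process ID

def sys_time():
    return 1_700_000_000  # hypothetical clock value

# The system call table: numbers chosen by the OS map to routines
# chosen by the OS. A process supplies only the number and arguments.
SYSCALL_TABLE = {39: sys_getpid, 201: sys_time}

def syscall(number, *args):
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        raise OSError("invalid system call number")
    return handler(*args)

print(syscall(39))  # dispatches through the table to sys_getpid: 1234
```

The key point mirrored here is that the process never supplies an address, only a number, so it can only reach entry points the OS placed in the table.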
So aside from this loophole, how do the operating system and hardware restrict a process to accessing only its own portion of memory? Well, first off, the CPU actually runs in two different privilege levels. When OS code runs, the CPU is put into a privilege level that allows access to the I/O devices and any address of memory. When a process runs, however, the CPU is put into a privilege level that triggers a hardware exception when the code attempts to directly access the I/O devices or addresses not allowed for that process. Processes are supposed to directly touch only their own memory, not anything else in the system. Now, to understand how the CPU knows which addresses are allowed for each process, we first have to look at how a process uses memory.
Each process uses a portion of its memory for a stack, for a heap, and for storing the process's code itself, in a section confusingly called the text section (even though the code is in binary form). The code section is straightforward: the binary instructions are stored in a contiguous chunk of memory and never modified for the duration of the process. The stack and heap, though, are both used to store data. The difference is that the stack stores the local variables used by the process, and the heap stores everything else.
Looking at the stack first: the stack is a contiguous chunk of memory that starts out unused. When the first function is called, let's call it main, its local variables are stored on the stack in a grouping called a frame. When the main function itself invokes another function, let's call it cat, the local variables of cat are stored in another frame on top, along with the size of the frame and the return address (the address to jump back to when execution returns from cat). Likewise, if cat calls another function, dog, then dog's local variables, the size of its frame, and the return address into cat are stored in another frame on top. Notice that as we add frames, we have to keep track of the top of the stack, because that's where we add a frame when the next function is called. Many CPUs, including x86 CPUs, have a specific register for storing this address, usually called the stack pointer. When a function returns, the frame size is used to adjust the stack pointer back down to the last frame, and execution jumps back to the return address. So when dog returns, execution returns to cat, and the stack pointer will point to the top of cat's frame. Notice that we don't have to actually delete any frames, because the space a frame occupies will simply get overwritten by subsequent frames as needed. Also notice that the first frame is special: the program ends when the first function returns, so the first frame doesn't need to store its size or any return address, because there's nothing to return to. Now, the diagram here suggests that the stack grows from the bottom, but in many cases the stack frames start at high memory addresses and grow downwards. This in fact is the case with x86 CPUs (though this choice is mostly arbitrary).
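The frame bookkeeping described above can be sketched as a toy model in Python. The function names follow the example, and the return addresses are hypothetical:

```python
# Toy call stack: each frame stores the function's locals and the
# return address to jump back to. (The frame size is implicit here,
# since Python lists track their own length.)
stack = []

def call(fn_name, local_vars, return_addr):
    """Push a new frame on top of the stack."""
    stack.append({"fn": fn_name, "locals": local_vars, "ret": return_addr})

def ret():
    """Pop the top frame; its space is simply abandoned, not erased."""
    frame = stack.pop()
    return frame["ret"]  # execution jumps back to this address

call("main", {"x": 1}, None)   # first frame: nothing to return to
call("cat", {"y": 2}, 0x4010)  # hypothetical return address into main
call("dog", {"z": 3}, 0x4880)  # hypothetical return address into cat

print(hex(ret()))  # dog's frame is popped; execution resumes in cat: 0x4880
```

A real stack is just a region of raw memory plus the stack pointer register, but the push-a-frame/pop-a-frame discipline is the same.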
In some systems, the size of the stack space is tracked with another pointer, usually called the stack boundary, kept in another CPU register. In CPUs with this register, when the stack pointer runs past the stack boundary, a hardware exception is triggered, and the exception handler may increase the stack space by moving the stack boundary (second image). However, the exception handler may decide at some point that the stack has grown too large and may simply refuse, and instead terminate the process. Generally, a process's stack should only get so big: a megabyte or two at the high end. When stacks grow past this size, it's generally a sign of an underlying programming error that should be corrected, not accommodated. The most common cause of an overly large stack is an overly long chain of recursive function calls. When a program exceeds its available stack space, the error is called a stack overflow. When a stack overflow occurs on a PC, the OS usually terminates the offending process. In very simple computers, however, such as in embedded systems, the stack size is not necessarily monitored with a stack boundary, and so when a program consumes more stack space than it should, the stack may poke into parts of memory used for other data or code, likely causing unpredictable bugs.
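As an aside, many language runtimes impose a limit of their own that plays a role analogous to the stack boundary. Python, for example, stops a runaway recursion with a RecursionError rather than letting its call stack grow without bound:

```python
# A runaway recursion: each call adds a frame that is never popped.
def recurse(depth=0):
    return recurse(depth + 1)

try:
    recurse()
except RecursionError:
    # The interpreter refuses to grow its call stack past a set limit,
    # much as an OS exception handler may refuse to move the stack boundary.
    print("stack limit reached")
```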
The common arrangement in PCs is to store the stack at the top of a process's address space and the text (the code of the process) at the bottom. All the remaining space in between is available for the heap. Unlike the stack and text, however, no heap space exists when the process starts executing. Instead, the process must explicitly request chunks of heap storage from the OS with a system call. In the call, the process specifies what size of contiguous chunk it wants, but the OS decides where to locate these chunks in the address space, and the chunk locations are not necessarily adjacent.
When a process is done with a chunk of heap, it should give the chunk back to the OS with a deallocation system call. It is the responsibility of the OS to keep track of which portions of the address space are free for future allocations.
But notice that as a process allocates and deallocates chunks of memory, the address space can become more and more fragmented, effectively shrinking the size of the heap chunks which the OS can allocate, because each chunk must be contiguous. Here, for example, the largest heap chunk which the OS could allocate is considerably smaller than the amount of free space remaining. Good allocation algorithms can minimize this fragmentation, but the problem can't be avoided entirely. This partly explains why you should deallocate chunks of heap when you no longer need them: by deallocating, you free up areas in the address space so that they can be allocated again later. The broader reason to deallocate, of course, is that your process might simply run out of address space at some point. Even if your process only needs a modest amount of heap memory at any one time, if it runs long enough without properly deallocating heap memory, it may eventually run out of address space, at which point new allocations will fail, likely requiring the process to terminate prematurely. So failing to properly deallocate unneeded heap memory is generally regarded as a bug called a memory leak, because the memory available to your program effectively dwindles away over time.
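Fragmentation is easy to demonstrate with a toy first-fit allocator in Python. This is a simplified sketch over a made-up 100-byte address space; real allocators also coalesce adjacent free chunks, which this one deliberately omits:

```python
class Heap:
    """Toy first-fit allocator over a small address space."""

    def __init__(self, size):
        self.free = [(0, size)]  # list of (start, length) free runs

    def alloc(self, n):
        # First fit: take the first free run large enough.
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    self.free.pop(i)
                else:
                    self.free[i] = (start + n, length - n)
                return start
        raise MemoryError("no contiguous chunk large enough")

    def dealloc(self, start, n):
        self.free.append((start, n))  # no coalescing, for simplicity

heap = Heap(100)
a = heap.alloc(40)
b = heap.alloc(20)
c = heap.alloc(40)
heap.dealloc(a, 40)  # free the first and last chunks,
heap.dealloc(c, 40)  # leaving b stranded in the middle

total_free = sum(length for _, length in heap.free)
largest = max(length for _, length in heap.free)
print(total_free, largest)  # 80 40: 80 bytes free, largest chunk only 40
```

Even though 80 bytes are free, a request for 50 contiguous bytes would fail, which is exactly the fragmentation problem described above.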
The addresses used by a process do not actually refer directly to bytes of system memory. Instead, chunks of the process's address space are mapped by the operating system to chunks of system memory, but not necessarily contiguously or in the same order. Here, for example, the stack is mapped to one area of RAM in the middle, the code section is mapped to another, non-adjacent area of RAM above it, and the portions of heap are mapped to non-adjacent parts of RAM in a seemingly random order. When the OS runs a process, it lists these address mappings in a table, and as the process runs, the CPU consults this table to translate from process addresses to addresses of actual RAM. For example, if the chunk of process address space starting at address 0 is mapped to byte 0xFFFF0000 of RAM, then address 5 of the process address space translates to byte 0xFFFF0005 of RAM. Be clear that each process has its own complete address space and that the OS keeps a separate memory table for each process. Effectively, then, a process can be located by the OS in any part of RAM, and each process can only access its own memory, not the memory of other processes or memory used by the OS. When a process attempts to use a part of its address space which is not mapped to actual RAM in the process's memory table, the CPU triggers a hardware exception, and the OS then typically aborts the process with an error message complaining about a page fault, because the mapped chunks of memory are called pages. Each page is usually a set size, which depends upon the CPU. 32-bit x86 processors, for example, usually use 4-kilobyte pages, so in fact a more realistic diagram would show that the stack, heap, and code portions of a process's address space are most likely not mapped as whole units, e.g. each page of the stack may be mapped to different, non-adjacent pages of RAM, and in no particular order.
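The translation step can be sketched in Python. The page-table entries here are invented for illustration (real page tables live in hardware-defined structures), but the arithmetic is the standard split of an address into a page number and an offset:

```python
PAGE_SIZE = 4096  # 4-kilobyte pages, as on 32-bit x86

# Toy page table: virtual page number -> physical page frame number.
# The mappings are made up; note frame 0xFFFF0 places virtual page 0
# at physical byte 0xFFFF0000, matching the example above.
page_table = {0: 0xFFFF0, 1: 0x003A2, 2: 0x1C001}

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # split into page number + offset
    if vpn not in page_table:
        raise RuntimeError("page fault")    # the OS would handle or abort
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(5)))  # 0xffff0005, as in the example above
```

The offset within a page is never changed by translation; only the page number is looked up and replaced.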
To free up valuable RAM, the OS may decide to swap out pages of a process to storage, usually a hard drive. Here, for example, these pages of heap memory are not currently mapped to any part of RAM; instead, their data has been temporarily copied out to a hard drive, and in the process's memory table these heap pages have been marked as swapped. An attempt by the process to access an address in a swapped page will trigger an exception, at which point the OS will copy the swapped page back to RAM and adjust the memory table accordingly before allowing the process to proceed. Thanks to swapping, the total memory used by all processes may actually exceed the capacity of RAM in the system. Swapping pages in and out of storage is of course relatively slow, but better that the system occasionally goes a bit slow to swap pages than that it simply ceases to function by running out of memory. With swapping, the processes can use as much memory space as the system has free storage. In practice, the swapped pages in a typical PC at any moment will rarely exceed more than a gigabyte or two of storage, but most pages used by most processes don't get used very frequently, so they might as well sit in swap space most of the time.
In its life cycle, a process transitions through a few major states. After the OS does all the business it needs at the time of process creation, the process transitions into the waiting state. The sense of "waiting" here is waiting to be selected by the scheduler. When the scheduler selects the process to run, it then of course enters the running state. When the scheduler selects a different process to run on the same core, this process is placed back into the waiting state. A process typically goes back and forth many times between waiting and running until the process ends, at which point it enters its final state: terminated.
There is at least one more important state: blocked. In the blocked state, the process is waiting for some external event in the system before it can proceed, rather than waiting to be scheduled, so it is in neither the running state nor the so-called waiting state. Most commonly, the blocked state is triggered when a process invokes certain system calls, such as for reading files. Reading a file often blocks the process because most storage devices, such as hard drives, are relatively much slower than the operations of the CPU, and often a program cannot do anything useful until it gets the data it needs from a file. In such cases, the process might as well relinquish the CPU core it was using and take itself out of the waiting pool, allowing other processes to run while it waits. Once the operating system finishes retrieving the requested data from storage, it unblocks the process, putting it back into the waiting state so that the scheduler will consider it again for execution. So don't get confused: both the blocked and waiting states involve waiting, but only in the waiting state will the scheduler select the process to run. In the blocked state, the process waits until the OS puts it back in the waiting state. There are several reasons to block and unblock a process, but the most common reason is that the process has to wait for some slow device in the system.
As we've mentioned, device drivers handle the business of how exactly to talk to an I/O device, and that includes storage devices like hard drives. However, operating systems provide an extra layer of abstraction for storage devices called the file system, which presents storage space as a hierarchy of directories and files stored in those directories. When your program uses a hard drive, for example, you don't want to have to concern yourself with the details of moving heads and spinning platters. You just want to read and write data in contiguous units called files, and you want to have those files organized into directories. The file system provides an abstraction allowing programmers to read and write data from any kind of storage in the same way, whether a hard drive, an optical disc, a flash drive, or whatever. The storage area of each drive is divided into one or more contiguous chunks called partitions. Notice that some areas of the drive may be left blank (unformatted), such as the gap between the second and third partitions of this first hard drive. Most commonly, though, a drive is formatted to have just one partition occupying its entire storage area. Still, creating multiple partitions serves some niche use cases, such as installing multiple operating systems on a single drive.
In most partition formats used today, each file and directory within a partition is known by an identifier number unique within that partition. So here we have a partition with a file 35, and so we can have no other files within that partition with the ID 35, nor any directories with the ID 35. A file is simply a logically contiguous chunk of data: a sequential series of bytes. What the bytes of a file represent is entirely up to the program which writes the file. Notice, though, I said that a file is logically contiguous. When bytes are read and written by a program, the program sees them as a contiguous, sequential series, but a file stored on disk may actually be stored non-contiguously and out of order. It's the responsibility of the file system to ensure that the logical order gets reconstructed when the file data is sent to a program. A directory, quite simply, is a list of files and other directories on the partition. The directory associates the ID numbers of files or directories with names, names which must be unique amongst all other files or directories listed in that same directory. So within a directory, you can have a file and a directory both named, say, Albert, but you can't have more than one file named Albert or more than one directory named Albert. When a partition is newly created, it starts out with no files and no directories except for one special directory called the root directory, which cannot be deleted.
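The "file as a byte sequence" view is easy to see in Python. The file name albert here is just an example echoing the naming discussion above, written to a temporary directory:

```python
import os
import tempfile

# A file is just a sequence of bytes; what they mean is up to the program.
path = os.path.join(tempfile.mkdtemp(), "albert")

with open(path, "wb") as f:
    f.write(bytes([72, 105, 33]))  # three arbitrary byte values

with open(path, "rb") as f:
    data = f.read()  # the file system hands the bytes back in logical order

print(list(data))  # [72, 105, 33], however the bytes actually sit on disk
```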
In Windows, each partition is assigned a drive letter, denoted with a colon suffix: for example, C:, H:, D:, etc. A file path is a string of text denoting the location on the system of a file or directory. In Windows, the root directories on these drives are known by the paths C:\, H:\, and D:\ respectively. The path C:\Adam\Nixon refers to a file or directory named Nixon, listed in a directory Adam, itself listed in the root directory on the C partition. The path H:\Taylor\Polk\Hayes refers to a file or directory named Hayes, listed in a directory Polk, listed in a directory Taylor, listed in the root directory on the H partition. The path D:\Garfield refers to a file or directory named Garfield, listed in the root directory on the D partition. While the preferred convention in Windows file paths is to use backslashes, forward slashes work just as well. In UNIX, however, file paths must use forward slashes.
The other major difference in UNIX is that partitions are not assigned drive letters. Instead, one partition is mounted at root, meaning that the path / refers to the root directory on that partition. Each additional partition is then made accessible by mounting it to some directory on some other, already mounted partition. Here, partition 2 is mounted at root, partition 1 is mounted to the directory /banana on partition 2, and in turn partition 3 is mounted to /banana/apple on partition 1. So be clear that /banana becomes synonymous with the root directory of partition 1, such that /banana/apple must refer to a directory apple in the root of partition 1. With the partitions mounted like this, the path /banana/Adam/Nixon now refers to a file or directory named Nixon, listed in the directory Adam, listed in the root of partition 1. The path /Taylor/Polk/Hayes now refers to a file or directory named Hayes, listed in the directory Polk, listed in the directory Taylor, listed in the root of partition 2. The path /banana/apple/Garfield refers to a file or directory named Garfield, listed in the root directory of partition 3. UNIX systems generally require that a directory already exist before it can be used as a mount point. If the mounted directory lists any files or directories, those entries are effectively obscured by the mounting: when we mount partition 1 to /banana on partition 2, we can no longer access the contents of that directory, because /banana now resolves to the root directory of partition 1.
IPC (inter-process communication) is an umbrella term for any mechanism provided by the OS that facilitates communication between processes. The simplest kind of IPC is files: files can be read and written by multiple processes, and so can serve as channels of communication between them. Other mechanisms include pipes, networking sockets, signals, and shared memory, and we'll discuss some of these in the unit on UNIX system calls.
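As a tiny sketch of one such mechanism, here is a pipe created and used within a single Python process. In real use, the two ends would typically be shared between a parent process and a forked child, but the one-way byte channel behaves the same either way:

```python
import os

# A pipe is a one-way byte channel: whatever is written to the
# write end comes out of the read end, in order.
read_fd, write_fd = os.pipe()

os.write(write_fd, b"hello")  # the sending end
msg = os.read(read_fd, 5)     # the receiving end; blocks until data arrives

os.close(write_fd)
os.close(read_fd)

print(msg)  # b'hello'
```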