Processes and threads have one goal: getting a computer to do more than one thing at a time. To do that, the processor (or processors) must switch smoothly among several tasks, which requires application programs designed to share the computer's resources.
That is why programmers need to split what programs do into processes and threads.
Every program running on a computer uses at least one process. That process consists of an address space (the part of the computer's memory where the program is running) and a flow of control (a way to know which part of the program the processor is running at any instant). In other words, a process is a place to work and a way to keep track of what a program is doing. When several programs are running at the same time, each has its own address space and flow of control.
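To serve multiple users, a process may need to fork, or make a copy of itself, to create a child process. Like its parent process, the child process has its own address space and flow of control. In many setups, terminating a parent process also brings down the child processes it has launched (for example, when a terminal session ends), but this is not automatic everywhere: on Unix, an orphaned child is normally adopted by a system process and keeps running unless it is signaled to stop.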
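As a rough illustration of a child getting its own address space, here is a minimal Python sketch. It assumes a POSIX system (`os.fork` is not available on Windows), and the function name `fork_demo` and the value 42 are made up for the example.

```python
import os

def fork_demo():
    """Fork a child, let it change a variable, and report what the parent sees.

    Hypothetical helper for illustration; POSIX only.
    """
    counter = 0
    pid = os.fork()  # after this call there are two processes running this code
    if pid == 0:
        # Child process: this assignment touches only the child's copy of memory.
        counter = 42
        # Leave immediately, reporting the child's value as its exit code.
        os._exit(counter)
    # Parent process: wait for the child to finish and collect its exit code.
    _, status = os.waitpid(pid, 0)
    return counter, os.WEXITSTATUS(status)

if __name__ == "__main__":
    parent_value, child_code = fork_demo()
    print("parent still sees", parent_value, "while the child exited with", child_code)
```

The parent's `counter` is unchanged because the child worked on its own copy, which is the isolation between address spaces described above.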
A multitasking operating system, such as Unix or Windows, switches among the running processes, giving CPU time to each in turn. If a computer has multiple CPUs, each process may be specifically assigned to one of the CPUs.
That's fine for simple programs. But a complex modern application, such as a word processor or spreadsheet, may actually look to the operating system like several different programs, with almost continuous switching and communication among them.
That's a problem, because it takes time to switch among processes. Modern CPUs include memory management units (MMUs) that prevent any process from overrunning another's address space. Moving from one process to another, called context switching, means reprogramming the MMU to point to a different address space, plus saving and restoring process information.
The operating system handles the details of the context switch, but it all takes time. And because each process is isolated from the others, communicating between processes requires special mechanisms such as signals and pipes. Like context switches, interprocess communications require processor time.
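To make the idea of a pipe concrete, the following sketch has a child process send a reply to its parent through one. It assumes a POSIX system, and `child_reply` is a hypothetical name chosen for the example, not a standard function.

```python
import os

def child_reply(message: bytes) -> bytes:
    """Fork a child that sends an upper-cased copy of `message` back over a pipe.

    Hypothetical helper for illustration; POSIX only.
    """
    read_fd, write_fd = os.pipe()  # a one-way channel managed by the operating system
    pid = os.fork()
    if pid == 0:
        # Child: it only writes, so close the read end it does not need.
        os.close(read_fd)
        try:
            os.write(write_fd, message.upper())
        finally:
            os._exit(0)  # exiting also closes the child's write end
    # Parent: it only reads, so close its copy of the write end.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # empty read means every write end has been closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return b"".join(chunks)

if __name__ == "__main__":
    print(child_reply(b"hello from the child"))
```

Even this small exchange involves several system calls, which hints at why communication between isolated processes costs processor time.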
All that time adds up when many programs are running at once or when many users each require several processes running at the same time. The more processes running, the greater the percentage of time the CPU and operating system will spend doing expensive context switches.
With enough processes to run, a server might eventually spend almost all of its time switching among processes and never do any real work.
Threading Through
To avoid that problem, programmers can use threads. A thread is like a child process, except that all the threads associated with a given process share the same address space.
For example, when there are many users for the same program, a programmer can write the application so that a new thread is created for each user.
Each thread has its own flow of control, but it shares the same address space and most data with all other threads running in the same process. As far as each user can tell, the program appears to be running just for them.
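The thread-per-user pattern can be sketched in a few lines of Python. The names `serve_users` and `handle` are hypothetical, and a real server would do far more work in each thread than build a greeting.

```python
import threading

def serve_users(users):
    """Start one thread per user; every thread writes into the same dictionary.

    Hypothetical helper for illustration only.
    """
    greetings = {}  # lives in the process's single address space, visible to all threads

    def handle(user):
        # Each thread has its own flow of control but stores its result in shared memory.
        greetings[user] = f"hello, {user}"

    threads = [threading.Thread(target=handle, args=(user,)) for user in users]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every user's thread has finished
    return greetings

if __name__ == "__main__":
    print(serve_users(["ann", "bob", "cho"]))
```

No pipe or other interprocess mechanism is needed here: the threads hand back their results simply by writing to a shared object.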
The advantage? It takes much less CPU time to switch among threads than between processes, because there's no need to switch address spaces. In addition, because they share address space, threads in a process can communicate more easily with one another.
If the program is running on a computer with multiple processors, a single-threaded process can run on only one CPU at a time, while a threaded program can spread its threads across all available processors. So moving a threaded program to a multiprocessor server should generally make it run faster, provided its threads can actually work in parallel.
The downside? Programs using threads are harder to write and debug. Not all programming libraries are designed for use with threads, and not all legacy code works well alongside threaded code. Some programming tools also make threaded code harder to design and test.
Thread-related bugs can also be more difficult to find. Threads in a process can interfere with one another's data. The operating system may limit how many threads can perform operations, such as reading and writing data, at the same time. Scheduling different threads to avoid conflicts can be a nightmare.
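One common way to keep threads from interfering with shared data is a lock, which lets only one thread at a time run a critical section. The sketch below is a minimal example under that assumption; `count_with_lock` and its parameters are hypothetical names.

```python
import threading

def count_with_lock(num_threads=4, increments=10_000):
    """Have several threads increment one shared counter, guarded by a lock.

    Hypothetical helper for illustration only.
    """
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            # "total += 1" is a read, an add, and a write. The lock ensures no
            # other thread can slip in between those steps and lose an update.
            with lock:
                total += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total

if __name__ == "__main__":
    print(count_with_lock())
```

The lock makes the result predictable, but it also forces the threads to take turns, which is one reason scheduling threads to avoid conflicts is a balancing act.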
Still, as complex, shared code and multiprocessor servers become more common, threads will continue to speed up multitasking.