Deadlock Avoidance
Introduction
We focus a great deal of attention on the four necessary conditions for
deadlock, because these form the basis for deadlock prevention. If
we can ensure that any of the four necessary conditions will never
be met, we have created a system where deadlock (involving those
resources) is impossible. There are, however, common situations
in which:
- mutual exclusion is fundamental.
- hold and block are inevitable.
- preemption is unacceptable.
- the resource dependency networks are imponderable.
For many such cases, deadlock avoidance may be an easy
and effective solution.
Consider the following deadlock situation:
- main memory is exhausted.
- we need to swap some processes out to secondary storage
to free up memory.
- swapping processes out involves the creation of new
I/O requests descriptors, which must be allocated from main memory.
The problem, in this case, is not a particular resource dependency
graph, but merely that we exhausted a critical resource. There are
many similar situations where:
- some process will free up resources when it completes.
- but the process needs more resources in order to complete.
In situations like these, it is common to keep track of free
resources, and refuse to grant requests that would put the
system into a dangerously resource-depleted state. Making
such case-by-case decisions to avoid deadlocks is called
deadlock avoidance.
Reservations
Declining to grant requests that would put the system into an unsafely
resource-depleted state is enough to prevent deadlock. But the failure
of a random allocation request (in mid-operation) might be difficult to
gracefully handle. For this reason, it is common to ask processes to
reserve their resources before they actually need them.
Consider the sbrk(2) system call. It does not actually
allocate any more memory to the process. It requests the operating
system to change the size of the data segment in the process' virtual
address space. The actual memory assignments will not happen until
the process begins referencing those newly authorized pages.
If we can determine that the requested reservation would over-tax
memory, we can return an error from the sbrk system call,
which the process can then decide how to handle. If, however,
we waited until a page was referenced before we decided we did not
have sufficient memory, we might have no alternative but to kill
the process.
This approach is not limited to memory. For example ...
- we could refuse to create new files when file system
space gets low.
- we could refuse to create new processes if we found
ourselves thrashing due to pressure on main memory.
- we could refuse to create or bind sockets when network
traffic saturates our service level agreement.
Unlike malloc(3), these other operations do not tell
us how much new resource the process wants to consume. But
in each of these cases there is a request (to which we can
return an error) before we reach actual resource exhaustion.
And it is this failable request that gives us the opportunity to
consider, and avoid a resource-exhaustion deadlock.
Over-Booking
In most situations, it is unlikely that all clients will simultaneously
request their maximum resource reservations. For this reason, it is
often considered relatively safe to grant somewhat more
reservations than we actually have the resources to fulfill.
The reward for over-booking is that we can get more work done
with the same resources. The danger is that there might a
demand that we cannot gracefully handle.
- Air lines do this all the time ... which is why they
occasionally offer cash and free trips to anyone who
is willing/able to take a later flight.
- Required network bandwidth is routinely estimated
based on the expected traffic distribution. Such a
network might be able to handle 25% more traffic
while still maintaining its Service Level Agreements
99.9% of the time.
- In operating systems, the notion of killing random
processes is so abhorrent that most operating systems
simply refuse to over-book. In fact, it is common
to under-book (e.g. reserve the last 10% for
emergency/super-user use).
Dealing with Rejection
What should a process do when some resource allocation request
(e.g. a call to malloc(3)) fails?
- A simple program might log an error message and exit.
- A stubborn program might continue retrying the request
(in hope that the problem is transient).
- A more robust program might return errors for requests that cannot
be processed (for want of resources), but continue trying to serve
new requests (in the hope that the problem is transient).
- A more civic-minded program might attempt to reduce its
resource use (and therefore the number of requests it can serve).
There are many possible responses, and different responses make sense in
different situations. But the key here is that, since the allocation
request failed with a clean error, the process has the opportunity to
try to manage the situation in the most graceful possible way.