Scribes: Moiz Mian, Sakib Shaikh, Mohammad Poswal, Syed Moizullah
Lecture date: Monday, January 23
Quick Links
Modularity & Abstraction
Mechanisms for Modularity
Other Software Engineering Issues:
Software Collection: In a large system, we can have so many functions that it is too hard to
remember everything at once.
Complexity is the problem
Modularity is the key
Modularity: Breaks the big problem into smaller more manageable pieces
Abstraction: The pieces have "nice" boundaries between each other
Lets say we have
So T = BL ~ L2
If we have M modules, there are B/M bugs per
module, which gives us L/M lines per module. Therefore, T=L2/M and we
can 'decrease' our running time.
Note- this is a very simplistic calculation. Obviously if we made one
module per line it would be really impractical.
Some metrics to think about:
Depending on the application, some metrics may be more important
than others. For example, our word counting app did not need any extra
flexibility because it was designed for only the English professor's
machine. However, simplicity was a factor for the professor because he
or she only wanted to have to boot up the computer and have the app do
everything else.
How to make the program modular?
Lets look at the read_sector function as an example
void read_sector (int s, char* buf)
This is our original
function. It assumes that we will pass in an address of the start of a
sector of memory and it will store the result in buf
. This code will
have good performance because it does exactly what it was intended for
and nothing more. But what if we don't necessarily want to read from
the start of a sector?
void read_data(int o, int n, char* buf)o
is the bytecount from start of the disk. n
is the number of bytes to
read, starting from o
. This function is more modular because it no longer assumes 512 as the
size of the input to be read. Flexibility is increased, but this assumes that the read will be succesful, and that is of course
not always the case.
int read_data_err(...)
This function returns the number of
bytes read or -1 if there was an error. This code is more robust, but
what if there is more than one device? We need more flexibility.
int read_device_data_err(int d, int o, char *buf, int n)
d
is the device number, which is unique for each device. However, the problem with this one is that it assumes we have direct
access to the place on the disk, but we could handle serial devices
like keyboards if we get rid of the bytecount variable o
. This code is a little more flexible, but still not enough.
int read_and_allocate(int d, int o, char **p, int *n)
read_and_allocate()
stores to p
and n
, and returns how many bytes
were
read. However, this is not as simple as having separate functions for
doing reading and writing. This function also has the same problem as
above with serial
devices, so its kind of a step backward.
int read_any_device(int d, int n, char *buf)
This function will read from an device, as long as it has the device
number. Now we have a modular function that doesn't try to do too much,
is able to accept a wide variety of input devices, and can tolerate
errors. Let's compare this to one of the standard POSIX read functions:
int lseek(int d, int o)
This can read from any direct access device, and has even fewer
arguments, making it simpler to use. However, it probably does more
work than our original read_sector()
does to read 512 bytes from a
sector of the disk, and so it will not have as good performance. This
hit in performance comes at the expense of all the other positive
modular characteristics we have gained. The documentation for lseek()
can be found here.
Some faults with using device numbers:
Here's how the problem was solved:
int open(char *name, int flags)
This function assigns a file descriptor to a device as a level of
abstraction. The documentation for open()
can be found here.
ssize_t read(int fd, char *buf, size_t bufsize)
And this function reads the device as if it is a file, referring to its
file descriptor (returned from open()
). It reads the input into a buffer of size bufsize. The documentation for read() can be found here.
int
- signed 32-bitsize_t
- unsigned 64-bitssize_t
- signed 64-bit (used for reporting errors as negative
numbers)Function Call Modularity as implemented in UNIX program data
With this model we can think of the API (Application Programming Interface) as a contract between caller and callee. There is also and ABI (Application Binary Interface)
Lets look at the ABI of an example function. The ABI for read()
is enforced by machine code, but it is very
complicated to use as an example. Instead, let's look at the factorial
function:
int fact(int n)
{if(n)return n*fact(n-1); else return 1;}
This function is assuming that n
will be positive and small enough that
there's no overflow.
fact: |
pushl |
%ebp |
|
|
movl |
$1, %eax |
|
|
movl |
%esp, %ebp |
|
|
subl |
$8, %esp |
|
|
movl |
%ebx, -4(%ebp) |
|
|
movl |
8(%ebp), %ebx |
|
|
testl |
%ebx, %ebx |
|
|
jne |
L5 |
|
L1: |
movl |
-4(%ebp), %ebx |
|
|
movl |
%ebp, %esp |
|
|
popl |
%ebp |
|
|
ret |
|
The ABI makes the 0th argument of fact()
the return address, then
grows the stack and does the work, then cleans up.
L5: |
leal |
-1(%ebx), %eax |
Subtract 1 |
|
movl |
%eax, %esp |
|
|
call |
fact |
Pushes return address onto stack |
|
imull |
%ebx, %eax |
|
|
jnp |
L1 |
The initial function call: fact(12)
pushl |
$12 |
||
call |
fact |
Calls the function | |
addl |
$4, %esp |
Put answer in %esp |
Now we have nice modularity between function calls, but what can go wrong here?