Virtual Memory and Distributed Systems

Lecture: Wednesday, March 3, 2013

Scribes: Ze (Albert) Yuan and Jacob Lewe

Contents

  1. General Memory
    1. Caching Policy
    2. RAM and Virtual Memory (choosing victims)
    3. Paging to Hard Disk
  2. Distributed Systems
    1. Remote Procedure Call
    2. Problems with Remote Procedure Call
    3. Example of RPC protocol

General Memory

Caching Policy

Memory Hierarchy

  1. registers ← want to use these mostly
  2. L1 cache
  3. L2 cache
  4. RAM *
  5. disk or SSD *
  6. backup (slow disk, tape, network disk)

*we will look at these

RAM and Virtual Memory

Which part of disk should we put in RAM?

  1. Suppose we have random accesses to virtual memory (lives on disk if too big)
    1. we choose small parts of the disk to bring into RAM; when we need room for more, which page should we write back out to disk? (the victim)
    2. pick the first page, since it doesn’t matter (you can pick any page)
    3. however, (1) is rare in practice – you typically see Locality of Reference
  2. Suppose we have Locality of Reference
    1. lots of heuristics
    2. simple one: FIFO == pick page that has been in RAM longest
      • a way to check how well FIFO works is to run it on lots of applications: get their reference strings (the list of pages in the order each was needed); a simulator sketch appears after this list
      • suppose we have 5 virtual pages, 3 physical pages
        reference string: 0 1 2 3 0 1 4 0 1 2 3 4
      • pull 0 1 2 into memory: physical memory full
        replace 0 with 3, replace 1 with 0, etc.
        (use * for page faults)

          ref:  0   1   2   3   0   1   4   0   1   2   3   4
          A:    0*  0   0   3*  3   3   4*  4   4   4   4   4
          B:    ?   1*  1   1   0*  0   0   0   0   2*  2   2
          C:    ?   ?   2*  2   2   1*  1   1   1   1   3*  3
        9 page faults

      • buy 1 more physical page, we get:
          ref:  0   1   2   3   0   1   4   0   1   2   3   4
          A:    0*  0   0   0   0   0   4*  4   4   4   3*  3
          B:    ?   1*  1   1   1   1   1   0*  0   0   0   4*
          C:    ?   ?   2*  2   2   2   2   2   1*  1   1   1
          D:    ?   ?   ?   3*  3   3   3   3   3   2*  2   2
        10 page faults!
        Belady’s anomaly: more memory, yet more page faults, so the program runs slower (happens very rarely)

    3. if you have an oracle that knows the future reference string, what’s the best policy? (evict the page whose next use is farthest in the future)
        ref:  0   1   2   3   0   1   4   0   1   2   3   4
        A:    0*  0   0   0   0   0   0   0   0   2*  2   2
        B:    ?   1*  1   1   1   1   1   1   1   1   3*  3
        C:    ?   ?   2*  3*  3   3   4*  4   4   4   4   4
      7 page faults
      register allocation actually has something like an oracle: the compiler can see the whole future reference string!

    4. approximation to an oracle: LEAST RECENTLY USED (LRU)
        ref:  0   1   2   3   0   1   4   0   1   2   3   4
        A:    0*  0   0   3*  3   3   4*  4   4   2*  2   2
        B:    ?   1*  1   1   0*  0   0   0   0   0   3*  3
        C:    ?   ?   2*  2   2   1*  1   1   1   1   1   4*
      10 page faults! LRU doesn't work well with this reference string
      However, LRU usually works better than FIFO
      • mechanism for LRU?
        1. Use hardware to store a clock (timestamp) value for each page, using N bits → hardware designers don't want to do this
        2. Way to approximate it without hardware help? Use N=1: initially mark all pages inaccessible; accesses then page fault → the handler sets those pages back to rw and sets their bit to 1
          - choose one of the pages whose bit is still 0 as the next victim (it means that page wasn't used recently)
          - (reset the bits every so often so "recently" stays meaningful)
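
As a rough illustration (not from the lecture notes; only the reference string and frame count come from the example above), here is a small C sketch that replays the reference string through FIFO and LRU replacement with 3 physical pages and counts page faults, reproducing the 9 (FIFO) and 10 (LRU) fault counts from the tables:

/* Simulate page replacement on a reference string and count faults.
 * age[] records load time for FIFO, or last-use time for LRU;
 * the victim is always the frame with the smallest age. */
#include <stdio.h>

#define NFRAMES 3
#define EMPTY  -1

static int simulate(const int *refs, int nrefs, int lru) {
    int frames[NFRAMES];
    int age[NFRAMES];
    int faults = 0;

    for (int i = 0; i < NFRAMES; i++)
        frames[i] = EMPTY;

    for (int t = 0; t < nrefs; t++) {
        int page = refs[t], hit = -1, victim = 0;

        for (int i = 0; i < NFRAMES; i++)
            if (frames[i] == page)
                hit = i;

        if (hit >= 0) {            /* page already resident */
            if (lru)
                age[hit] = t;      /* LRU refreshes the age on every use */
            continue;
        }

        faults++;
        for (int i = 0; i < NFRAMES; i++) {   /* pick victim: oldest entry */
            if (frames[i] == EMPTY) { victim = i; break; }
            if (age[i] < age[victim]) victim = i;
        }
        frames[victim] = page;
        age[victim] = t;           /* FIFO only updates the age on load */
    }
    return faults;
}

int main(void) {
    int refs[] = {0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4};
    int n = sizeof refs / sizeof refs[0];
    printf("FIFO: %d page faults\n", simulate(refs, n, 0));
    printf("LRU:  %d page faults\n", simulate(refs, n, 1));
    return 0;
}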

Paging to Hard Disk

Data near the read/write head is relatively cheap to access ⇒ we want to order disk requests to minimize head movement (seek time).

Disk Scheduling Algorithms

  1. Shortest Seek Time First (SSTF) → tempting to do work where head already is (h): cheapest
    + maximizes throughput
    – starvation possible: while serving requests near the head, new nearby requests keep arriving, so requests far from the head may wait forever
  2. Elevator algorithm: SSTF, but stay in the same direction if possible (see the sketch after this list)
    + no starvation
    – not fair to ends of the disk (middle has least average wait time)
  3. Circular Elevator Algorithm: at end of disk seek to lowest numbered request
    + fair
    - additional time to seek from end to beginning
  4. SSTF + FCFS: keep a request queue, take a chunk of it and serve that chunk in SSTF order, then take the next chunk
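
As a sketch of the elevator idea (item 2 above), the following C fragment orders a set of pending cylinder requests by sweeping upward from the current head position and then back down; the request numbers and head position are made-up values for illustration, not from the lecture:

/* Order pending cylinder requests with the elevator algorithm: keep
 * moving in the current direction, serving the nearest request, then
 * reverse when there is nothing left in that direction. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* Print requests in elevator order, starting at head and moving upward first. */
static void elevator(int head, int *reqs, int n) {
    qsort(reqs, n, sizeof *reqs, cmp_int);

    int first_above = n;                       /* index of first request >= head */
    for (int i = 0; i < n; i++)
        if (reqs[i] >= head) { first_above = i; break; }

    for (int i = first_above; i < n; i++)      /* sweep up */
        printf("%d ", reqs[i]);
    for (int i = first_above - 1; i >= 0; i--) /* then sweep back down */
        printf("%d ", reqs[i]);
    printf("\n");
}

int main(void) {
    int reqs[] = {98, 183, 37, 122, 14, 124, 65, 67};  /* hypothetical queue */
    elevator(53, reqs, sizeof reqs / sizeof reqs[0]);
    return 0;
}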

Distributed Systems

- For roughly the first half of the course, we've been talking about virtualization and using virtualization to build systems

- Another approach is to use distribution.

[Diagram of distributed relationship]

- The caller (client) and callee (server) don't share an address space

+ A distributed system provides hard modularity for free

- Data transfer is less efficient

- Hard modularity means that you can no longer pass arguments by reference (there is no shared address space)

- Example: you can no longer do "read(fd, buf, bufsize)" directly between client and server, because buf points into the caller's address space

- The caller and callee can disagree about fundamental formats such as:

floating point formats, integer formats, big-endian vs. little-endian byte order, or integer widths

- Another important thing to consider is that the network protocol must specify the data representation on the wire:

e.g., 32-bit big-endian two's complement for integers (and similarly an agreed-upon format for floating point)

- A process called marshalling and unmarshalling is used to resolve differences in data representation
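
For illustration (this is not the lecture's code, and the length-prefixed layout is just an assumed wire format), marshalling a string into big-endian bytes and unmarshalling it back might look like:

/* Marshal a string into a byte buffer as a big-endian 32-bit length
 * followed by the string bytes, and unmarshal it again. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl / ntohl */

/* Returns the number of bytes written into buf. */
static size_t marshal_string(unsigned char *buf, const char *s) {
    uint32_t len = (uint32_t)strlen(s);
    uint32_t wire_len = htonl(len);          /* host order -> big-endian */
    memcpy(buf, &wire_len, sizeof wire_len);
    memcpy(buf + sizeof wire_len, s, len);
    return sizeof wire_len + len;
}

/* Reads a string marshalled by marshal_string; out must be big enough. */
static size_t unmarshal_string(const unsigned char *buf, char *out) {
    uint32_t wire_len;
    memcpy(&wire_len, buf, sizeof wire_len);
    uint32_t len = ntohl(wire_len);          /* big-endian -> host order */
    memcpy(out, buf + sizeof wire_len, len);
    out[len] = '\0';
    return sizeof wire_len + len;
}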

[Diagram of distributed relationship]

- A common client/server pattern to do this is Remote Procedure Call (RPC)

Remote Procedure Call

- A Remote Procedure call looks like a function call at the client

e.g., we call time_t tod("Casablanca")

- We want our tod function to look like:

time_t tod(char *loc) {
    time_t some_time;
    // fancy code to calculate the time at loc
    return some_time;
}

- However the actual code will look more like:

client code: {
    package("Casablanca", hd);   // marshal the argument into a request buffer
    setup_IP(hd, serveraddr);    // fill in the server's network address
    send(hd);                    // ship the request over the network
    receive();                   // wait for the server's response
}

- The server code will also be similar to this

- Doing this is tedious and error prone, especially if you have lots of functions

- Luckily this process can be automated by programs that handle all of the marshalling and unmarshalling, generating client and server stubs roughly like the sketch below
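
A minimal sketch of what such a generated client stub for tod() might roughly do, assuming a UDP transport, the length-prefixed wire format from the marshalling sketch above, and a reply that is a single big-endian 32-bit timestamp (none of this is the lecture's actual code, and error handling is omitted):

#include <arpa/inet.h>    /* htonl, ntohl */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

time_t tod(const char *loc, const struct sockaddr_in *server) {
    unsigned char req[512], resp[16];
    uint32_t len = htonl((uint32_t)strlen(loc));

    /* marshal the argument: big-endian length, then the string bytes */
    memcpy(req, &len, sizeof len);
    memcpy(req + sizeof len, loc, strlen(loc));

    /* send the request and wait for the reply */
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    sendto(s, req, sizeof len + strlen(loc), 0,
           (const struct sockaddr *)server, sizeof *server);
    recvfrom(s, resp, sizeof resp, 0, NULL, NULL);
    close(s);

    /* unmarshal the reply: a big-endian 32-bit timestamp */
    uint32_t t_net;
    memcpy(&t_net, resp, sizeof t_net);
    return (time_t)ntohl(t_net);
}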

What Can Go Wrong with RPC

- Messages can become corrupted (RAM suffers from this too); this is handled by adding a checksum to the packet (a sketch appears below)

- Messages can be lost or dropped

- Messages can be duplicated
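
As a sketch of the checksum idea from the first bullet above (an illustration, not the lecture's code; real protocols such as IP, UDP, and TCP use a 16-bit ones'-complement sum, but the principle is the same):

/* Sum every byte of the payload; the sender appends the result to the
 * packet, and the receiver recomputes it and drops the packet on a
 * mismatch. */
#include <stddef.h>
#include <stdint.h>

static uint32_t checksum(const unsigned char *data, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}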

- Suppose the client sends unlink("/~/foo/hw5.txt") and never receives a response. Then the options are:

1) Keep trying (e.g., with a 100 ms timeout); this implements At Least Once RPC (fine if the request is idempotent)

2) Return an error; this implements At Most Once RPC

3) Hybrid: try a couple of times, then return an error (a sketch of this appears below)

4) Just try once; this aims at Exactly Once RPC, but a single try can't guarantee it, since that one message may still be lost
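
A minimal sketch of the retry options above, assuming a connected UDP socket and the POSIX socket API (rpc_call and MAX_TRIES are hypothetical names, and real code would check for errors): resend on a 100 ms timeout and give up after a few tries:

#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

#define MAX_TRIES 5

/* Returns bytes received, or -1 if every try timed out. */
static ssize_t rpc_call(int sock, const void *req, size_t reqlen,
                        void *resp, size_t resplen) {
    struct timeval timeout = { 0, 100 * 1000 };     /* 100 ms */
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof timeout);

    for (int try = 0; try < MAX_TRIES; try++) {
        send(sock, req, reqlen, 0);                 /* (re)send the request */
        ssize_t n = recv(sock, resp, resplen, 0);
        if (n >= 0)
            return n;                               /* got a response */
        /* timed out: fall through and retry */
    }
    return -1;                                      /* give up, report an error */
}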

Example of an RPC protocol

Example: X Window Protocol

"pixrect on wheels"

- The X Window protocol basically draws pixels to a screen using RPC-style requests

ex: client: draw_pixel(x,y,r,g,b)

- The setup for Professor Eggert's X server looks like this:

[Diagram: Professor Eggert's X server setup]

- So what happens if we want to draw lots of pixels at once?

- Well, we could send each pixel request one at a time and wait for a response before sending the next, but this causes too much delay

- Instead we want to send lots of requests simultaneously

[Diagram: request pipelining]

- We'll get lots of responses back; they could even arrive out of order, but this doesn't matter
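
A rough sketch of request pipelining (this is not the real X protocol or API; struct draw_req and the one-byte acknowledgements are assumptions for illustration): write all the requests first, then read the responses:

/* Send a whole batch of draw-pixel requests before reading any
 * responses, instead of doing one request/response round trip per pixel. */
#include <unistd.h>

struct draw_req { int x, y, r, g, b; };   /* hypothetical wire format */

static void draw_pixels_pipelined(int sock, const struct draw_req *reqs, int n) {
    char ack;

    for (int i = 0; i < n; i++)            /* 1. fire off every request */
        write(sock, &reqs[i], sizeof reqs[i]);

    for (int i = 0; i < n; i++)            /* 2. then collect the replies; */
        read(sock, &ack, sizeof ack);      /*    which ack matches which request doesn't matter */
}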
