The Project FAQ (Thanks to Dr. Mason Chang for providing the following project information):

More information will be added as the class progresses.

 

Parsing the constant pool / .class file:

  1. My fread is weird even though I haven't reached the end of file or I get an ferror! - Make sure you open the file in "binary" mode. Add a b to the open mode. - Read the first line of the paragraph.
  2. What are the file operations on C/C++? Google fopen, fread, fclose, ferror, feof.
  3. The print and other memory operations? printf, scanf, memcpy, memset, sprintf, malloc
  4. I'm getting a crazy big number / negative number when reading a byte! A byte, when casted to an int, will sign extend the value. So if the byte value read is greater than 127, you need to chop off the bottom 8 bits. You can do this by bitwise ANDING the value with 0xFF.
  5. What's the function to get the size of a data type? sizeof
  6. Do you have an example? Sure!
    Consider the beginning of a .out file:

         Magic           CP Entries       E1   E2        NameLen1            NameChars          E3    E4      NameLen2
    FF EE CD AB | 1E 00 00 00 | 00 | 02 | 06 00 00 00 |3C 69 6E 69 74 3E | 00 | 02 | 07 00 00 00
    * | Denotes a logical separation of data as defined in the .out file spec.

    First parse the magic numbers as an integer. Then parse the constant pool size as an integer. The constant pool size refers to the number of ENTRIES in the constant pool, not the actual byte length of the constant pool. Then for each entry, the constant pool is either NONE (0), an integer (1), or a method (2). So the FIRST entry in the constant pool (E1), is the type NONE. The SECOND entry of the constant pool (E2) is a method (2). Then we parse a method constant pool entry.

    The method constant pool entry starts with an integer that tells you how many bytes a method name is. Parse that integer (NameLen1). Then we parse (6) * the number of bytes which is in (NameChars). We have finished parsing ONE method constant pool entry.

    Now we parse the THIRD entry (E3) of the constant pool which is NONE. The fourth entry of the constant pool (E4) is a method (2), so we parse another method using the same technique we used in the previous paragraph.
  7. When will I see a CP_INT? You need a really big constant integer. For example, int x = 323878237; This should turn into the Java bytecode ldc, which then references the constant pool.

 

JVM Interpreter:

  1. How do I know when to call a method? Just do a name compare. If the method you need to call is the correct method name, call it. We can assume all method names are unique.
  2. How do I implement system.out.println or the print function? Whenever you see the method name println or printInt or whatever your print function is, call the C method printf.

 

Building SSA:

  1. What are the three steps again!? 1) Create Basic Blocks. 2) Link Basic Blocks with successors / predecessors. 3) Create Instructions. To do steps 1 and 2 DO NOT require you to implement Instructions yet.
  2. How do I know what pc represents which block? You need a MAP that maps program counter locations in the bytecode (pc) to the basic block. This PC location represents the BEGINNING of the block, not the RANGE of a block. For example, if the first basic block represents PC 0 - 9, your map should only contain one entry for pc 0 -> Basic Block 0.
  3. What are the instruction types I need? Instruction, BinaryInstruction, ConditionInstruction, ConstantInt, PhiInstruction, CallInstruction, UnconditionalBranch, ConditionalBranch, ReturnInstruction. If you do arrays, you need a NewArrayInstruction, SetArrayElementInstruction, and GetArrayElementInstruction. If you do methods, you need a ParameterInstruction for method parameters.
  4. How should I represent everything? A Method object has a List<BasicBlock*>. Each BasicBlock is an object. A BasicBlock has a List<BasicBlock*> for it's successors, and another list for it's predecessors. Each BasicBlock has a List<Instruction*>. Each Instruction points to its operands.
  5. What's in a PhiInstruction? A phi should have a List<BasicBlock*> and a List<Instruction*>. Phi's really have an operand that represents a pair [BasicBlock*, Instruction*], that says if control flow comes from a basic block BB1, the phi represents this Instruction I1. A nice trick is to map each element in the List<BasicBlock*> and List<Instruction*> at the same index. So if a Phi ( [BB2, ConstantValueInstruction(42]), the 0 element of the List<BasicBlock*> can point to BB2 and the 0 element of the List<Instruction*> can point to the ConstantValueInstruction(42).
  6. I'm getting loop headers that have forward GOTOs! Make sure you build your .java files with javac from the command line, not eclipse.
  7. What are these state objects? The state keeps track of what instruction each local variable points to at this current point in time. You will have MULTIPLE state objects. You always operate on one, that you keep updating. However, the begin states and end states of each basic block are CLONED states. If you have one basic block, you have three state objects: The current one you modify, and the begin/end states which are independent states. They may POINT to the same instruction, but they are different state objects.

 

Register Allocator:

  1. What's the basic block order? You need to build your blocks in reverse post order. A sketch of the algorithm is located on wikipedia under topological sorting. Use the version that uses the depth first search. The only kicker is that when you add a node to the list, add it to the BEGINNING of the list, rather than the end of the list. 
  2. What order do I register allocate in? You iterate backwards from the Reverse post order list.
  3. What are the four steps? 1) Get blocks in reverse post order. 2) Build live ranges. 3) Build interference graph. 4) Allocate registers.
  4. Which registers do we NEVER use again? ESP, EBP
  5. How are phis treated? The OPERANDS of a phi, get their live ranges added up to the current phi instruction id. The Operands of the phi DO NOT get added into the live set.

 

Assembler:

  1. How do I read the table of doom / assemble something! - The mod r/m table is freakishly confusing. The whole point of the mod r/m table is to build 1 byte, 8 bits total. It is segmented into three parts. The top 2 bits represent the "MOD" or how to access the register. The next 3 bits represent the destination register and is a value of 0 - 7. The bottom 3 bits represent the source register and is a value of 0 - 7. So the mod r/m byte looks like:

    MOD BIT    dst reg      src register
        00             000         000 

    Let's say we wanted to move EBX to EDX. The opcode for mov is 0x8b. Since we are using the actual registers rather than referring to the value IN the register, the MOD bit is 0b11 (or 3 in decimal). EBX is 3 (0b011) in the table and is our src register. EDX is register 2 (0b010). So in total you need to write two bytes:

    0x8b; -- The mov opcode
    0xd3 = 0b11 010 011 (MOD 11, dst EDX, src EBX)

    You can build it by lots of bit shifting. Ala ( (3 << 6) | (2 << 3) | (3) ). Shift the number "3" six bits left, or with ("2" shift three bits left") or with "3". 

    0b = binary. 0x = hex. 
  2. What x86 instructions should we use? add, sub, idiv, imul, mov, call, push, pop, return, jump, cmp, jcc. Remember, idiv and imul put the result in certain registers. idiv puts the quotient in EAX, and the remainder in EDX. imul can sometimes put certain results in certain registers if you want. jccs must occur immediatley after a cmp.
  3. What am I looking for in the intel manual? How do I know which opcode to use? Look for instructions that use R32/MODRM requests. For example, add, use 0x3 which has the destination (left value) saying ADD r32, r/m32. CMP should use 3b. (CMP r32, r/m32 - the r32 should be the left. Do not use CMP r/m32, r32).
  4. How do I debug this with GDB? Lookup info registers, display /i $pc, si, x commands. Info registers gives you the values in registers. display /i $pc single steps x86 and si. Checkout examining memory here. Or the overall using gdb. GDB is painful.
  5. What happens between a call? You need to save all the registers, perform the call, then restore them. You can do this by pushing all the registers onto the stack, make the call, then pop them off the stack again.
  6. How do I call the print method? Create a C method that calls into printf and takes an integer. Get the address of that method, move it into EAX, and perform a call indirect near. 0xFF /2. The 2 here means the mod/rm bit for the destination register is set to 2. So: (3 << 6) | (2 << 3) | EAX; (EAX here is where we stored the address of printInt, but you can choose any register.
  7. How do I make debugging easier? Between each instruction you generate x86 for, you can insert a nop. The nop opcode does nothing. It will help you deliniate which instructions emitted what x86 code.
  8. How do I resolve Phis? When you reach a jump instruction, check which block you are jumping to. If the target block has phis, get the phi instructions for the target block. Insert moves in your current block from the phi operand register to the phi instruction register.
  9. I'm getting a permission denied when I try to execute jit compiled code - Make sure you mark the page as executable. This is mprotect on OSX/Linux and VirtualProtect on Windows. Include <sys/mman.h> on Unix/ OSX and "Windows.h" on Windows.
  10. What do I do for a constant int instruction? Constants are known as "immediates" in assembly. So look for mov r32, immi32. 0xb8 + rd. The +rd means 0xb8 + the destination register. So let's say you wanted to move the number 10 into register EBX (011 - 3). Remember, the Registers.h file has the registers in the order as the table of doom. So you need to do 0xb8 + EBX (0xb8 + 3) in one byte. Then 32 bit integer for the number 10:
    byte1: 0xb8 + 3.
    byte2 - 6 = 0x0000000A.
  11. What does it mean opcode +rd? Like push/pop instructions? The rd stands for the destination register. So if you want to push EDX, EDX is 010 (2), so the opcode is opcode + 2. Checkout question #11 for a concrete example.
  12. How do you deal with this CDECL stuff? Here is a really good tutorial on it.
  13. How do I generate jump offsets? Your jcc / jmp requires a 4 byte offset. This offset is from the location of the jump instruction TO the memory address of the jump target. So let's say your jump instruction exists at memory location 0x100. And you wanted to jump to address 0x400. You don't fill the 4 byte offset with 0x400. You have to do 0x400 - 0x100 = 0x300. 
  14. Which jump and return instructions should we use? Use the jcc/jmp/ret NEAR instructions, not the FAR ones. 

 

General C/C++:

  1. Where do I get containers (lists, vectors, hashmaps) in C++? You can roll your own by making wrappers around the vector class, or use the std containers here.
  2. What's the syntax to cast a void* to a function pointer? 
    void* compiledCode = assembler.assemble(method); // the location of jit compiled code
    int (*fp)(); // Declare a function pointer to a method that returns an int and takes no parameters
    fp = (int (*)())(compiledCode); // cast the void* to the function pointer
    fp(); // Actually execute it

    Heres more on C and C++ function pointers
  3. I'm making a template class but it says the class doesn't exist! Template classes must be defined in the header file, not .cpp file.
  4. What's the order for C++ include files? .cpp Files should always only include other header files. .h files should NEVER include other header files unless they are standard files (<iostream>, etc). If you are getting a type not defined, you have to forward declare it. In the header file, just type "class SomethingElse" and have pointers to those structures. Here's more info. Another student found this resource more helpful.
  5. What are some C string methods? strncmp, strncpy, strncat. You can convert a C++ string to a C string by calling the data() method on a C++ string.
  6. What is a null terminated string, or how do C strings work (not C++ std::string)? - Check this out
  7. Why does it say a type isn't recognized! You need to forward declare types. When you use the type, you must use a POINTER to the type, not the type itself. For example, if I want to have a list of <BasicBlock> in the Method object, in the Method.h file, you have to have a list of BasicBlock*, NOT BasicBlock. And forward declare BasicBlock. Then in the Method.cpp file, you need to include basicBlock.h.

 

Uncategorized:

  1. Javap doesn't exist - Install the Java JDK, not the JRE.
  2. How do I see the bytecodes in a javac generated .class file (Not our .out file)? javap -v TestClass
  3. What language features do we need to support? if statements, while, for, only local variables that are integers. Arithmetic operators (+, -, *, /).