The Project
FAQ (Thanks to Dr. Mason Chang for providing the following project information):
More
information will be added as the class progresses.
Parsing the
constant pool / .class file:
- My fread is weird even though I haven't reached the end
of file or I get an ferror! - Make sure you open the file in "binary"
mode. Add a b
to the open mode. - Read the first line of the paragraph.
- What are the file operations on C/C++? Google fopen,
fread, fclose, ferror, feof.
- The print and other memory operations? printf, scanf,
memcpy, memset, sprintf, malloc
- I'm getting a crazy big number / negative number
when reading a byte! A byte, when casted to an int, will sign extend
the value. So if the byte value read is greater than 127, you need to chop
off the bottom 8 bits. You can do this by bitwise ANDING the value with
0xFF.
- What's the function to get the size of a data type? sizeof
- Do you have an example? Sure!
Consider the beginning of a .out file:
Magic CP Entries
E1 E2 NameLen1
NameChars
E3 E4 NameLen2
FF EE CD AB | 1E 00 00 00 | 00 | 02 | 06 00 00 00 |3C 69 6E 69 74 3E | 00
| 02 | 07 00 00 00
* | Denotes a logical separation of data as defined in the .out file spec.
First parse the magic numbers as an integer. Then parse the constant pool
size as an integer. The constant pool size refers to the number of ENTRIES
in the constant pool, not the actual byte length of the constant pool. Then
for each entry, the constant pool is either NONE (0), an integer (1), or a
method (2). So the FIRST entry in the constant pool (E1), is the type
NONE. The SECOND entry of the constant pool (E2) is a method (2). Then we
parse a method constant pool entry.
The method constant pool entry starts with an integer that tells you how
many bytes a method name is. Parse that integer (NameLen1). Then we parse
(6) * the number of bytes which is in (NameChars). We have finished
parsing ONE method constant pool entry.
Now we parse the THIRD entry (E3) of the constant pool which is NONE. The
fourth entry of the constant pool (E4) is a method (2), so we parse
another method using the same technique we used in the previous paragraph.
- When will I see a CP_INT? You need a really big
constant integer. For example, int x = 323878237; This should turn into
the Java bytecode ldc, which then references the constant pool.
JVM
Interpreter:
- How do I know when to call a method? Just do a name compare.
If the method you need to call is the correct method name, call it. We can
assume all method names are unique.
- How do I implement system.out.println or the print
function? Whenever
you see the method name println or printInt or whatever your print
function is, call the C method printf.
Building SSA:
- What are the three steps again!? 1) Create Basic Blocks.
2) Link Basic Blocks with successors / predecessors. 3) Create
Instructions. To do steps 1 and 2 DO NOT require you to implement
Instructions yet.
- How do I know what pc represents which block? You need a MAP that
maps program counter locations in the bytecode (pc) to the basic block.
This PC location represents the BEGINNING of the block, not the
RANGE of a block. For example, if the first basic block represents PC 0 -
9, your map should only contain one entry for pc 0 -> Basic Block 0.
- What are the instruction types I need? Instruction,
BinaryInstruction, ConditionInstruction, ConstantInt, PhiInstruction,
CallInstruction, UnconditionalBranch, ConditionalBranch,
ReturnInstruction. If you do arrays, you need a NewArrayInstruction,
SetArrayElementInstruction, and GetArrayElementInstruction. If you do
methods, you need a ParameterInstruction for method parameters.
- How should I represent everything? A Method object
has a List<BasicBlock*>. Each BasicBlock is an object. A BasicBlock
has a List<BasicBlock*> for it's successors, and another list
for it's predecessors. Each BasicBlock has a List<Instruction*>.
Each Instruction points to its operands.
- What's in a PhiInstruction? A phi should have
a List<BasicBlock*> and a List<Instruction*>. Phi's really
have an operand that represents a pair [BasicBlock*, Instruction*], that
says if control flow comes from a basic block BB1, the phi represents this
Instruction I1. A nice trick is to map each element in the
List<BasicBlock*> and List<Instruction*> at the same index. So
if a Phi ( [BB2, ConstantValueInstruction(42]), the 0 element of the
List<BasicBlock*> can point to BB2 and the 0 element of the
List<Instruction*> can point to the ConstantValueInstruction(42).
- I'm getting loop headers that have forward GOTOs! Make sure you build
your .java files with javac from the command line, not eclipse.
- What are these state objects? The state keeps track
of what instruction each local variable points to at this current point in
time. You will have MULTIPLE state objects. You always operate on one,
that you keep updating. However, the begin states and end states of each
basic block are CLONED states. If you have one basic block, you have three
state objects: The current one you modify, and the begin/end states which
are independent states. They may POINT to the same instruction, but they
are different state objects.
Register
Allocator:
- What's the basic block order? You need to build your
blocks in reverse
post order. A sketch of the algorithm is located on wikipedia under topological
sorting. Use the version that uses the depth first search. The only
kicker is that when you add a node to the list, add it to
the BEGINNING of the list, rather than the end of the
list.
- What order do I register allocate in? You iterate backwards
from the Reverse post order list.
- What are the four steps? 1) Get blocks in
reverse post order. 2) Build live ranges. 3) Build interference graph. 4)
Allocate registers.
- Which registers do we NEVER use again? ESP, EBP
- How are phis treated? The OPERANDS of a phi, get their live ranges added up
to the current phi instruction id. The Operands of the phi DO NOT get
added into the live set.
Assembler:
- How do I read the table of doom / assemble
something! - The
mod r/m table is freakishly confusing. The whole point of the mod r/m
table is to build 1 byte, 8 bits total. It is segmented into three parts.
The top 2 bits represent the "MOD" or how to access the
register. The next 3 bits represent the destination register and is a
value of 0 - 7. The bottom 3 bits represent the source register
and is a value of 0 - 7. So the mod r/m byte looks like:
MOD BIT dst reg src register
00 000
000
Let's say we wanted to move EBX to EDX. The opcode for mov is 0x8b. Since
we are using the actual registers rather than referring to the value IN
the register, the MOD bit is 0b11 (or 3 in decimal). EBX is 3 (0b011) in
the table and is our src register. EDX is register 2 (0b010). So in total
you need to write two bytes:
0x8b; -- The mov opcode
0xd3 = 0b11 010 011 (MOD 11, dst EDX, src EBX)
You can build it by lots of bit shifting. Ala ( (3 << 6) | (2
<< 3) | (3) ). Shift the number "3" six bits left, or with
("2" shift three bits left") or with "3".
0b = binary. 0x = hex.
- What x86 instructions should we use? add, sub, idiv,
imul, mov, call, push, pop, return, jump, cmp, jcc. Remember, idiv and
imul put the result in certain registers. idiv puts the quotient in EAX,
and the remainder in EDX. imul can sometimes put certain results in
certain registers if you want. jccs must occur immediatley after a cmp.
- What am I looking for in the intel manual? How do I
know which opcode to use? Look for instructions that use R32/MODRM
requests. For example, add, use 0x3 which has the destination (left value)
saying ADD r32, r/m32. CMP should use 3b. (CMP r32, r/m32 - the r32 should
be the left. Do not use CMP r/m32, r32).
- How do I debug this with GDB? Lookup info registers,
display /i $pc, si, x commands. Info registers gives you the values in
registers. display /i $pc single steps x86 and si. Checkout examining memory
here. Or the overall
using gdb. GDB is painful.
- What happens between a call? You need to save all
the registers, perform the call, then restore them. You can do this by
pushing all the registers onto the stack, make the call, then pop them off
the stack again.
- How do I call the print method? Create a C method that
calls into printf and takes an integer. Get the address of that method,
move it into EAX, and perform a call indirect near. 0xFF /2. The 2 here
means the mod/rm bit for the destination register is set to 2. So: (3
<< 6) | (2 << 3) | EAX; (EAX here is where we stored the
address of printInt, but you can choose any register.
- How do I make debugging easier? Between each
instruction you generate x86 for, you can insert a nop. The nop opcode
does nothing. It will help you deliniate which instructions emitted what
x86 code.
- How do I resolve Phis? When you reach a jump instruction,
check which block you are jumping to. If the target block has phis, get
the phi instructions for the target block. Insert moves in your current
block from the phi operand register to the phi instruction register.
- I'm getting a permission denied when I try to execute
jit compiled code -
Make sure you mark the page as executable. This is mprotect
on OSX/Linux and VirtualProtect
on Windows. Include <sys/mman.h> on Unix/ OSX and
"Windows.h" on Windows.
- What do I do for a constant int instruction? Constants are known as
"immediates" in assembly. So look for mov r32, immi32. 0xb8 +
rd. The +rd means 0xb8 + the destination register. So let's say you wanted
to move the number 10 into register EBX (011 - 3). Remember, the
Registers.h file has the registers in the order as the table of doom. So
you need to do 0xb8 + EBX (0xb8 + 3) in one byte. Then 32 bit integer for
the number 10:
byte1: 0xb8 + 3.
byte2 - 6 = 0x0000000A.
- What does it mean opcode +rd? Like push/pop
instructions?
The rd stands for the destination register. So if you want to push EDX,
EDX is 010 (2), so the opcode is opcode + 2. Checkout question #11 for a
concrete example.
- How do you deal with this CDECL stuff? Here is a really
good tutorial on it.
- How do I generate jump offsets? Your jcc / jmp requires
a 4 byte offset. This offset is from the location of the jump instruction
TO the memory address of the jump target. So let's say your jump
instruction exists at memory location 0x100. And you wanted to jump to
address 0x400. You don't fill the 4 byte offset with 0x400. You have to do
0x400 - 0x100 = 0x300.
- Which jump and return instructions should we use? Use the jcc/jmp/ret
NEAR instructions, not the FAR ones.
General
C/C++:
- Where do I get containers (lists, vectors, hashmaps) in
C++? You
can roll your own by making wrappers around the vector class, or use
the std containers
here.
- What's the syntax to cast a void* to a function
pointer?
void* compiledCode = assembler.assemble(method); // the location of jit
compiled code
int (*fp)(); // Declare a function pointer to a method that returns an int
and takes no parameters
fp = (int (*)())(compiledCode); // cast the void* to the function pointer
fp(); // Actually execute it
Heres more on C and C++ function pointers.
- I'm making a template class but it says the class
doesn't exist! Template
classes must be defined in the header file, not .cpp file.
- What's the order for C++ include files? .cpp Files should
always only include other header files. .h files should NEVER include
other header files unless they are standard files (<iostream>, etc).
If you are getting a type not defined, you have to forward declare it. In
the header file, just type "class SomethingElse" and have
pointers to those structures. Here's
more info. Another student found this resource more
helpful.
- What are some C string methods? strncmp, strncpy,
strncat. You can convert a C++ string to a C string by calling the data()
method on a C++ string.
- What is a null terminated string, or how do C strings
work (not C++ std::string)? - Check
this out.
- Why does it say a type isn't recognized! You need to forward
declare types. When you use the type, you must use a POINTER to the type,
not the type itself. For example, if I want to have a list of
<BasicBlock> in the Method object, in the Method.h file, you have to
have a list of BasicBlock*, NOT BasicBlock. And forward declare
BasicBlock. Then in the Method.cpp file, you need to include basicBlock.h.
Uncategorized:
- Javap doesn't exist - Install the Java JDK,
not the JRE.
- How do I see the bytecodes in a javac generated .class
file (Not our .out file)? javap -v TestClass
- What language features do we need to support? if statements, while,
for, only local variables that are integers. Arithmetic operators (+, -,
*, /).