Originally I wrote ARC as standalone so I could test it without linking the whole thing into AR (the AR binary takes just too long to link). Currently ARC has been integrated into AR, and the standalone mode of ARC is broken.
The standalone ARC interpreter starts at main in code.cc. After initializing various things (e.g. the standard types like string and int), the interpreter loads the class named /main in the root of the virtual filesystem (the root can be changed by supplying another one on the command line). In standalone mode, the interpreter is usually given test/ as filesystem root, which runs the class that exercises various aspects of the interpreter.
In standalone mode, all the files listed in the .compilelist file are compiled. Then we hope we managed to compile a /main class and try to look for it. That class is then created, a virtual machine is set up with it, and the function create is called within it.
Some benchmarks are then printed out - usually the ARC code manages some 600,000 instructions per second, if all safety checks are off. This can probably be optimized, but I haven't bothered yet.
compile is used to compile a class. It takes the whole class as parameter. I use a perhaps peculiar mechanism to "upgrade" classes: if the class already exists, the existing object is replaced. Since I keep pointers to classes around (e.g. derived classes keep a pointer to their superclass) this is done by some tricks which involve calling the destructor in place. Maybe I should clean it up someday, but keeping the pointer the same gives some advantages.
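The "destructor in place" trick can be sketched roughly as follows. This is only an illustration and not the actual ARC code: the Class struct here is a hypothetical stand-in for the one in h/Class.h, and upgradeInPlace and the version field are names made up for the example.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for ARC's Class; the real one lives in h/Class.h.
struct Class {
    std::string name;
    int version;          // made-up field so we can see the object change
    Class* superclass;
    Class(std::string n, int v, Class* super)
        : name(std::move(n)), version(v), superclass(super) {}
};

// Rebuild the object at the same address, so every pointer to it that is
// kept elsewhere (e.g. a derived class's superclass pointer) stays valid.
void upgradeInPlace(Class* existing, std::string name, int version, Class* super) {
    assert(existing != nullptr);
    existing->~Class();                                     // destroy the old contents in place
    new (existing) Class(std::move(name), version, super);  // placement new at the same address
}
```

Note that if the constructor fails halfway, the old object is already gone, which is part of why this approach is fragile.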
Compilation sets up a setjmp buffer. If something goes wrong during compilation, we longjmp back to here. This is far easier than handling error codes. I don't like exceptions - and I didn't feel like introducing exceptions to the code just because of this one place. longjmp may result in some memory loss, but it's negligible.
Compilation happens using flex/bison-generated code, located in arc.l (lexer) and arc.y (parser). The lexer returns tokens to the parser, and the parser tries to put them together as rules. Classes are used heavily there: everything that's statement-like inherits from Fragment. Things that may have a value inherit from Expression. This works fairly well IMHO.
After a tree has been built up, we first call typecheck() (a virtual function) on all its contents. Each fragment calls typecheck on its children, then uses the results to check whether it is valid itself. For example, for a BinaryExpression a+b, the expression calls typecheck on a, which determines its type is Integer; b determines its type is Character. Now BinaryExpression dies with a compile error.
After everything has been typechecked, the code is actually generated, by similarly calling generate() everywhere. Each instruction generates the code for its children and then its own (in varying order: for example a for loop would generate some code for the initializers, then the code for the condition, then perhaps jump out based on the condition, then generate the code for the block, then the post-block expression - and then back to the beginning). Labels are allocated and start at 0; the code keeps a list of where label references are and once everything is in place, those places are backpatched with the real locations of the code.
The virtual machine (Machine.cc) sets the instruction pointer to 0. If there are any local variables in the function executed, a sufficient number of zeroed variables are pushed on the stack (initialization like int a = 4 is done by actually executing a = 4 when the code starts running).
The 32-bit instruction at the instruction pointer address is examined and broken up into 3 fields: a 6-bit opcode value, a 2-bit opcode flag value and a 24-bit immediate value. We look up the opcode in the opcode table, and move as many arguments as the opcode takes from the stack to a separate opcode argument stack. The opcode function (h/opgen.h) is then called.
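A minimal sketch of the setjmp/longjmp pattern, under stated assumptions: compileErrorJump, compileError and parseAndGenerate are hypothetical names, and the toy "compiler" simply rejects any source containing a '?'.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstring>

static std::jmp_buf compileErrorJump;               // hypothetical name for the jump buffer
static const char* compileErrorMessage = nullptr;   // last error, for reporting

// Called from anywhere deep inside the compiler when something is wrong.
[[noreturn]] void compileError(const char* message) {
    assert(message != nullptr);
    compileErrorMessage = message;
    std::longjmp(compileErrorJump, 1);              // unwind straight back to compile()
}

// Toy stand-in for the real parser and code generator.
void parseAndGenerate(const char* source) {
    if (std::strchr(source, '?') != nullptr)
        compileError("unexpected '?'");
}

// Returns true on success, false if compilation bailed out via longjmp.
bool compile(const char* source) {
    if (setjmp(compileErrorJump) != 0)
        return false;                               // we arrive here after a longjmp
    parseAndGenerate(source);
    return true;
}
```

The memory loss comes from the fact that longjmp skips whatever cleanup the abandoned code would have done. In C++ it is also undefined behaviour to jump over frames holding objects with non-trivial destructors, so the sketch keeps such objects out of the jumped-over code.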
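The recursive typecheck can be sketched like this. It is an illustration only: the Type enum and the Variable class are assumptions, and instead of dying with a compile error the sketch returns a Type::Error marker so it can be tested in isolation.

```cpp
#include <cassert>
#include <memory>
#include <utility>

enum class Type { Integer, Character, Error };   // hypothetical type set

struct Expression {
    virtual ~Expression() {}
    virtual Type typecheck() const = 0;          // each node reports its own type
};

// Leaf node: a variable whose declared type is already known.
struct Variable : Expression {
    Type declared;
    explicit Variable(Type t) : declared(t) {}
    Type typecheck() const override { return declared; }
};

struct BinaryExpression : Expression {
    std::unique_ptr<Expression> left, right;
    BinaryExpression(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
        : left(std::move(l)), right(std::move(r)) { assert(left && right); }

    // Typecheck the children first, then validate this node using their results.
    Type typecheck() const override {
        Type l = left->typecheck();
        Type r = right->typecheck();
        if (l == Type::Error || r == Type::Error || l != r)
            return Type::Error;                  // real compiler: report and abort
        return l;
    }
};
```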
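Label backpatching could look roughly like the following. CodeBuffer and its members are hypothetical names, and the instruction encoding (opcode in the top 6 bits, jump target in the low 24 bits) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CodeBuffer {
    struct Fixup { std::size_t at; int label; };   // instruction index that refers to a label

    std::vector<std::uint32_t> code;     // emitted instructions
    std::vector<int> labelAddress;       // label number -> instruction index, -1 until placed
    std::vector<Fixup> fixups;           // every label reference seen so far

    int newLabel() {                     // labels are numbered from 0
        labelAddress.push_back(-1);
        return static_cast<int>(labelAddress.size()) - 1;
    }
    void placeLabel(int label) { labelAddress[label] = static_cast<int>(code.size()); }

    void emit(std::uint32_t opcode, std::uint32_t immediate = 0) {
        code.push_back((opcode << 26) | (immediate & 0xFFFFFFu));
    }
    void emitJump(std::uint32_t opcode, int label) {
        fixups.push_back({code.size(), label});    // remember where the reference is
        emit(opcode);                              // target is left as 0 for now
    }
    // Once all code is in place, fill in the real targets.
    void backpatch() {
        for (const Fixup& f : fixups) {
            assert(labelAddress[f.label] >= 0);    // every referenced label must be placed
            code[f.at] |= static_cast<std::uint32_t>(labelAddress[f.label]) & 0xFFFFFFu;
        }
    }
};
```

A loop would allocate one label for its top and one for its exit, emit a forward jump to the exit before the exit's address is known, and rely on backpatch() to fix it up at the end.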
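Splitting an instruction word into its three fields can be sketched as below. The text above gives the field widths but not their positions, so the layout used here (opcode in the top 6 bits, then the 2 flag bits, then the 24-bit immediate) is an assumption, as are the names Decoded, decode and encode.

```cpp
#include <cassert>
#include <cstdint>

struct Decoded {
    unsigned opcode;          // 6 bits
    unsigned flag;            // 2 bits
    std::uint32_t immediate;  // 24 bits
};

// Assumed layout: [ opcode:6 | flag:2 | immediate:24 ]
Decoded decode(std::uint32_t instruction) {
    Decoded d;
    d.opcode = instruction >> 26;
    d.flag = (instruction >> 24) & 0x3u;
    d.immediate = instruction & 0xFFFFFFu;
    return d;
}

std::uint32_t encode(unsigned opcode, unsigned flag, std::uint32_t immediate) {
    assert(opcode < 64 && flag < 4 && immediate <= 0xFFFFFFu);
    return (static_cast<std::uint32_t>(opcode) << 26) |
           (static_cast<std::uint32_t>(flag) << 24) |
           immediate;
}
```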
As it is with stack based machines, everything operates on the stack. There are no real registers that can be manipulated, except the instruction register. Thus, a calculation like a + b will produce the following code: PUSH local variable "a" on the stack (oGetLocal). PUSH local variable "b" on the stack. Call oAdd. The oAdd instruction pops the top two things on the stack, adds them, and stores the result on top of the stack. It is very easy to generate instructions for a machine like this.
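The a + b example can be played through with a toy interpreter loop. Only the opcode names oGetLocal and oAdd come from the text; the Instr struct, the opcode numbering and the run function are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum Opcode { oGetLocal, oAdd };   // numeric values are arbitrary here

struct Instr {
    Opcode op;
    std::uint32_t immediate;       // for oGetLocal: index of the local variable
};

// Executes the code and returns whatever is left on top of the stack.
int run(const std::vector<Instr>& code, const std::vector<int>& locals) {
    std::vector<int> stack;
    for (std::size_t ip = 0; ip < code.size(); ++ip) {
        const Instr& in = code[ip];
        switch (in.op) {
        case oGetLocal:
            stack.push_back(locals[in.immediate]);   // push the local's value
            break;
        case oAdd: {
            assert(stack.size() >= 2);
            int b = stack.back(); stack.pop_back();  // pop the top two values
            int a = stack.back(); stack.pop_back();
            stack.push_back(a + b);                  // push the sum
            break;
        }
        }
    }
    assert(!stack.empty());
    return stack.back();
}
```

With locals a = 3 and b = 4, the sequence oGetLocal 0, oGetLocal 1, oAdd leaves 7 on the stack.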
The immediate value is used for small parameters: for example, to push a string we store the string number in the immediate value. To jump to another instruction in the same function we store the instruction number in the immediate value.
The flag value is used to slightly modify the behavior of opcodes (rather than introducing new opcodes). There is only one use for that currently: if set to 1, a function call will make the callee NOT clean up the stack. Usually, a function call consists of pushing all the arguments on the stack, pushing the object (if making a method call within another object) and calling one of the oXCall functions. That function finishes by popping the return value if any, popping all arguments and locals and then pushing the return value back. When the flag is set, only locals are popped. This is used for passing "by reference": when a function wants to modify one of its parameters and not just its value, it will use this flag and leave the modified value on the stack. The caller then generates extra code to copy the value back into the variable where it originated.
A function call also pushes a frame on the function call stack. It contains the current value of the frame pointer. The frame pointer is then set to point at the first argument passed: a function always accesses its arguments and local variables based on this frame pointer, no matter how much is on the stack.
Exceptions are in their infancy still. If an exception occurs, a flag is set. The virtual machine interpreter sees this flag and looks back through the exception frames. Each exception frame is pushed when a try {} block is entered and popped when it's exited. The exception frame holds what exceptions are handled here and where to jump to in case of error. When handling exceptions, each frame is examined in turn to see if it can handle this exception. When one is found, any function frames on the stack are popped (freeing any locals) and the error handler is entered.
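Frame-pointer-relative access might be sketched like this. The Machine struct below is a simplified, hypothetical model and not the one in Machine.cc. It covers only the ordinary call path (flag not set) and assumes the function always leaves a return value on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Machine {
    std::vector<int> stack;
    std::vector<std::size_t> callStack;   // saved frame pointers, one per active call
    std::size_t framePointer = 0;

    // The caller has already pushed argCount arguments.
    void enter(std::size_t argCount, std::size_t localCount) {
        assert(stack.size() >= argCount);
        callStack.push_back(framePointer);            // save the caller's frame pointer
        framePointer = stack.size() - argCount;       // point at the first argument
        stack.resize(stack.size() + localCount, 0);   // zeroed local variables
    }

    // Arguments come first, then locals, whatever else is on the stack.
    int& slot(std::size_t index) { return stack[framePointer + index]; }

    // Pop the return value, drop arguments and locals, push the value back.
    void leave() {
        assert(!callStack.empty() && stack.size() > framePointer);
        int result = stack.back();
        stack.resize(framePointer);
        stack.push_back(result);
        framePointer = callStack.back();
        callStack.pop_back();
    }
};
```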
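The search through the exception frames can be sketched as follows. All names are hypothetical. In particular, storing the call depth in each exception frame is an assumption about how the machine could know how many function frames to pop, and exception types are modelled as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ExceptionFrame {
    int handledType;              // which exception this try {} block handles
    std::size_t handlerAddress;   // where to jump to in case of error
    std::size_t callDepth;        // assumed: number of function frames when the try was entered
};

struct Unwinder {
    std::vector<ExceptionFrame> exceptionFrames;   // pushed on try entry, popped on exit
    std::size_t callDepth = 0;                     // number of active function frames

    // Examine frames from the innermost outwards. On a match, drop the
    // function frames above it and report the handler address through ip.
    bool raise(int type, std::size_t& ip) {
        while (!exceptionFrames.empty()) {
            ExceptionFrame frame = exceptionFrames.back();
            exceptionFrames.pop_back();
            if (frame.handledType == type) {
                assert(frame.callDepth <= callDepth);
                callDepth = frame.callDepth;       // pops function frames (and their locals)
                ip = frame.handlerAddress;         // enter the error handler
                return true;
            }
        }
        return false;                              // nobody handles this exception
    }
};
```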
ARC code can call native functions within native objects. All ARC objects are of type Entity (h/Entity.h). All these objects have a pointer to an Arcable (h/Arcable.h), which defines what methods this object has. For objects which are fully ARC code this is really a Class (h/Class.h).
For native objects this points to a wrapper. The wrapper is automatically generated by running the util/glue2 utility on the interface files. The interface files (called *.intf) list the ARC functions and properties the class should export. Initially, I used to mark the header files directly with comments like // ARC, but that proved too intrusive. The new way is also much less complicated - parsing C++ code was difficult and unreliable.
Functions within those classes which also have this comment by them have a static wrapper created within the Wrapper class. This wrapper takes one argument: a Machine*. It pops the arguments off the machine's stack and calls the native function with them. Any return value is pushed back.
These wrapper functions are then inserted into a hash table which contains the function name (including types) and the wrapper function pointers.
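A wrapper and its table entry could look roughly like this. It is a hand-written imitation and not actual glue2 output: the cut-down Machine, the native function addGold and the key format "addGold(int,int)" are all assumptions.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Cut-down, hypothetical Machine: just an int stack.
struct Machine {
    std::vector<int> stack;
    void push(int v) { stack.push_back(v); }
    int pop() {
        assert(!stack.empty());
        int v = stack.back();
        stack.pop_back();
        return v;
    }
};

typedef void (*WrapperFn)(Machine*);
typedef std::unordered_map<std::string, WrapperFn> WrapperTable;

// The native function being exported (made up for the example).
int addGold(int current, int amount) { return current + amount; }

// Wrapper: pops the arguments (last one first), calls the native
// function and pushes the return value back.
void wrap_addGold(Machine* m) {
    int amount = m->pop();
    int current = m->pop();
    m->push(addGold(current, amount));
}

WrapperTable buildWrapperTable() {
    WrapperTable table;
    table["addGold(int,int)"] = &wrap_addGold;   // key: name including types
    return table;
}

// Looks up a wrapper by signature and runs it. Returns false if unknown.
bool callNative(const WrapperTable& table, const std::string& signature, Machine* m) {
    WrapperTable::const_iterator it = table.find(signature);
    if (it == table.end())
        return false;
    it->second(m);
    return true;
}
```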
Variables cannot be accessed directly (this is true for both native and normal ARC objects). When code tries to access the variable foobar in an object, the compiler looks for the function get$foobar in the object. This allows for Delphi-like properties: you can make it so that doing ch.room = some_room will, rather than setting the room directly and screwing things up majorly, call a function which removes the character from the current room and re-adds her to the target room. The glue generator can generate this accessor code directly, for read and/or write access, so for a variable like "name" you can let the glue generator create the required get$name function.
Inheritance is currently really simple - I definitely want multiple inheritance in the future. Classes which inherit have a pointer to their superclass. When a function is called, we first try to search for it in the current class, then the superclass. This allows for function overriding.
Variables are handled in a similar way. Since variables are referred to by index, a class inherits the variables when inheriting and continues to use the next available index. This will work somewhat differently with multiple inheritance, but I have a plan for that.
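The function lookup through the superclass chain can be sketched as below. The Class struct is a hypothetical simplification (the real one is in h/Class.h), and mapping function names to integer code addresses is an assumption for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical simplification of a class: a superclass pointer plus a
// table mapping function names to code addresses.
struct Class {
    const Class* superclass;
    std::unordered_map<std::string, int> functions;
};

// Search the class itself first, then walk up the superclass chain.
// Returns a pointer to the code address, or nullptr if nothing matches.
const int* findFunction(const Class* cls, const std::string& name) {
    assert(!name.empty());
    for (; cls != nullptr; cls = cls->superclass) {
        std::unordered_map<std::string, int>::const_iterator it = cls->functions.find(name);
        if (it != cls->functions.end())
            return &it->second;   // the most derived definition wins
    }
    return nullptr;
}
```

Because the derived class is searched first, a function defined there hides the one with the same name in the superclass, which is all that overriding needs.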
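Index allocation under single inheritance might be sketched like this. ClassLayout and its members are hypothetical names. The idea is only that a derived class starts numbering its own variables where its superclass stopped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ClassLayout {
    const ClassLayout* superclass;
    std::vector<std::string> ownVariables;   // variables declared by this class only

    // First index available to this class: just past the inherited variables.
    std::size_t firstIndex() const {
        return superclass ? superclass->variableCount() : 0;
    }
    std::size_t variableCount() const { return firstIndex() + ownVariables.size(); }

    // Index of a variable, searching this class and then its ancestors.
    // Returns -1 if the variable is unknown.
    long indexOf(const std::string& name) const {
        assert(!name.empty());
        for (std::size_t i = 0; i < ownVariables.size(); ++i)
            if (ownVariables[i] == name)
                return static_cast<long>(firstIndex() + i);
        return superclass ? superclass->indexOf(name) : -1;
    }
};
```

An inherited variable keeps the index it had in the superclass, so code compiled against the superclass still works on instances of the derived class.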
The directory aa/ contains the ARC/AR interface.