rcpu

VM emulator and assembler written in Crystal

Stars
37

= RCPU :experimental: true :toc: :toc-placement!:

link:http://github.com/ddfreyne/rcpu[_RCPU_] is a virtual machine emulator and assembler written in Crystal.

CAUTION: This project is very much a work in progress.

toc::[]

== Building

You will need link:http://crystal-lang.org/[Crystal] to build RCPU. Once you have Crystal, run kbd:[make], which will generate the compiled assembler and emulator binaries in the current directory.

== Usage

To assemble a file, use rcpu-assemble, passing the input filename (with the rcs extension). This will generate a corresponding rcb file. For example:

% ./rcpu-assemble samples/fib.rcs

To run a file, use rcpu-emulate, passing the input filename (with the rcb extension). For example:

% ./rcpu-emulate samples/fib.rcb

The rcs extension is used for RCPU source files, while rcb is for RCPU binary (i.e. compiled) files.

== Tests

Run kbd:[make spec] to compile and run the tests.

== Architecture

=== Registers

[options="header",cols="1,1,1,3"] |=== | Name | Code | Size (bytes) | Purpose | r0 | 0x00 | 4 | general-purpose | r1 | 0x01 | 4 | general-purpose | r2 | 0x02 | 4 | general-purpose | r3 | 0x03 | 4 | general-purpose | r4 | 0x04 | 4 | general-purpose | r5 | 0x05 | 4 | general-purpose | r6 | 0x06 | 4 | general-purpose | r7 | 0x07 | 4 | general-purpose | rpc | 0x08 | 4 | program counter (a.k.a instruction pointer) | rflags | 0x09 | 1 | contains result of cmp(i) | rsp | 0x0a | 4 | stack pointer | rbp | 0x0b | 4 | base pointer | rr | 0x0c | 4 | return value |===

=== Instruction format

Instructions are of variable length.

The first byte is the opcode. Register arguments are one byte long, and label/immediate arguments are four bytes long.

=== Instruction set

[options="header",cols="1,1,1,2"] |=== | Mnemonic | Opcode | Arguments | Effect 4+h|Function handling | call(i) | 0x01, 0x02 | reg/imm | | ret | 0x03 | (none) | 4+h|Stack management | push(i) | 0x04, 0x05 | reg/imm | push a0 onto stack | pop | 0x06 | reg | pop into a0 4+h|Branching | j(i) | 0x07, 0x08 | reg/imm | pc ← a0 | je(i) | 0x09, 0x0a | reg/imm | if = then pc ← a0 | jne(i) | 0x0b, 0x0c | reg/imm | if ≠ then pc ← a0 | jg(i) | 0x0d, 0x0e | reg/imm | if > then pc ← a0 | jge(i) | 0x0f, 0x10 | reg/imm | if ≥ then pc ← a0 | jl(i) | 0x11, 0x12 | reg/imm | if < then pc ← a0 | jle(i) | 0x13, 0x14 | reg/imm | if ≤ then pc ← a0 4+h|Arithmetic | cmp(i) | 0x15, 0x16 | reg, reg/imm | (see below) | mod(i) | 0x17, 0x18 | reg, reg, reg/imm | a0 ← a1 % a2 | add(i) | 0x19, 0x1a | reg, reg, reg/imm | a0 ← a1 + a2 | sub(i) | 0x1b, 0x1c | reg, reg, reg/imm | a0 ← a1 - a2 | mul(i) | 0x1d, 0x1e | reg, reg, reg/imm | a0 ← a1 * a2 | div(i) | 0x1f, 0x20 | reg, reg, reg/imm | a0 ← a1 / a2 | xor(i) | 0x21, 0x22 | reg, reg, reg/imm | a0 ← a1 ^ a2 | or(i) | 0x23, 0x24 | reg, reg, reg/imm | a0 ← a1 | a2 | and(i) | 0x25, 0x26 | reg, reg, reg/imm | a0 ← a1 & a2 | shl(i) | 0x27, 0x28 | reg, reg, reg/imm | a0 ← a1 << a2 | shr(i) | 0x29, 0x2a | reg, reg, reg/imm | a0 ← a1 >> a2 | not | 0x2b | reg, reg | a0 ← ~a1 4+h|Register handling | mov | 0x2c | reg, reg | a0 ← a1 | li | 0x2d | reg, imm | a0 ← a1 4+h|Memory handling | lw | 0x2e | reg, imm | (see below) | lh | 0x2f | reg, imm | (see below) | lb | 0x30 | reg, imm | (see below) | sw | 0x31 | reg, imm | (see below) | sh | 0x32 | reg, imm | (see below) | sb | 0x33 | reg, imm | (see below) 4+h|Special | sleep | 0xfd | imm | sleep for a0 ms | prn | 0xfe | reg | print a0 | halt | 0xff | (none) | stops emulation |===

cmp(i) updates the flags register and sets the 0x01 bit to true if the arguments are equal, and the 0x02 bit to true if the first argument is greater than the second.

lw, lh and lb load data from memory into a register. lw loads a word (4 bytes), lh loads a half word (2 bytes) and lb loads a byte. Similarly, sw, sh and sb store data from a register into memory.

Several opcodes have an (i) variant. These variants take a four-byte immediate argument (meaning the data is encoded in the instruction) rather than a register name. For opcodes that have immediate variants, the Opcode column contains the non-immediate variant followed by the immediate variant.

Label arguments are identical to immediate arguments.

=== Memory layout

[options="header",cols="1,3"] |=== | Range | Use | 0x00 – … | Program code | … – 0xffff | Stack (grows downward, word-aligned) | 0x00010000 - 0x00014b00 | Video memory (160 by 120 pixels, 1 byte per pixel) |===

=== Assembly language syntax

A lines can be an instruction line, a label line, or a data directive line. Blank lines are ignored.

Comments start with the # character and can appear anywhere on a line, including a blank line. For example:


# load coords
li r2, 0                 # x (in px)
li r3, 0                 # y (in px)

An instruction line starts with a tab character, followed by the instruction mnemonic, and arguments separated by commas. For example:


li r3, 0                 # y (in px)
jei @print-string-done
addi rsp, rsp, 12

Register arguments are indicated with an r prefix (e.g. rsp or r0).

Immediate values can be given in decimal (e.g. 123), in hexadecimal (starting with 0x, e.g. 0xfe), or in binary (starting with 0b, e.g. 0b10010000).

Label arguments start with the @ character.

A label line starts with an identifier, followed by a colon. For example:


print-string-loop:

A data directive line starts with a period, followed by the directive name, followed by optional arguments. For example:


.byte 0x73 # s .byte 0x6c # l .byte 0x65 # e .byte 0x65 # e .byte 0x70 # p

.word @char-left-parenthesis # (
.word @char-right-parenthesis # )
.word @char-question-mark # * - TODO
.word @char-question-mark # + - TODO
.word @char-comma # ,
.word @char-dash # -
.word @char-period # .
.word @char-slash # /

The supported data directives are .byte, .half and .word; they insert a byte, a half word (two bytes) or a word (four bytes), respectively.

See the examples in the samples directory for inspiration.

== To do

  • Finish implementing all opcodes
  • Tests
  • Line/column numbers in parser error messages
  • RCPU prefix in binaries