If you are reading this article I guess that you are interested in programming and building tools from scratch. Today I would like to tell you how to build a simple programming language and a virtual machine for its execution. Nowadays the community has a lot of technologies, languages and frameworks, so our current goal is not to invent “yet another useless language, and present it like a silver bullet”, but to understand how to design and animate programs written in our own language.
Having discussed our goal, we can start the building process. First of all we need to describe some grammatics, that our language will follow.
Disclaimer: I’m very interested in esoteric programming languages like HQ9+ whose creator has invented a bucket of esoteric programming languages, for example ZOMBIE, a programming language for evil necromancers. I would like to support the naming tradition and name the programming language that we are inventing SLAP. Simple Language for Aggressive Programmers. Its distinctive feature will be aggressive language statements that perform regular operations.
Our language will be simple, and statements will look like assembler opcodes. It means that every line of code contains one operation, and there are no operations or statements that can be placed in two or more lines.
SLAP will have a few groups of opcodes:
SLAP’s memory model will have only one global scope, because it has no ability to define functions, and no garbage collection, therefore, once defined, the variable will live forever.
This is a minimalistic design which allows to implement any (or almost any) algorithm and interact with users. Let’s describe each group, its usage rules and restrictions.
PLOPIN name value
PLOPIN opcode defines variables with specified name and value in global scope. There are no restrictions for name and value, but there are only two data types - int and string. If the value is not int, it will be saved as string. To redefine existing variable or assign value of any variable to another variable, you can use the following syntax:
PLOPIN var1, var2
The virtual machine that executes our code, will use stack (sometime in the future. not now), so the language must have an ability to interact with stack.
There are two opcodes:
SLAPSHOT [var | val]
- pushes the value of the variable (or directly saves value) to the top of the stack;BACKSLAP [var]
- pops the value from the top of the stack and saves it into a variable. If the variable with the given name does not exist, it will be defined.Next, we need a few opcodes to do some arithmetical operations:
operand1 SLAP operand2
is equal to operand1 = operand1 + operand2operand1 UNSLAP operand2
is equal to operand1 = operand1 - operand2operand1 MULTISLAP operand2
is equal to operand1 = operand1 * operand2operand1 MULTIUNSLAP operand2
is equal to operand1 = operand1 / operand2As you can see, we have no kind of operand to save the result into, so you can backup the value of the left operand to the stack, or use PLOPIN to copy your variable.
These kinds of operands produce boolean values which can be used by flow control opcodes:
operand1 PUNCHES operand2
- equal to operand1 > operand2operand1 NONSMACK operand2
- equal to operand1 == operand2I would not implement opcodes for less, less than, greater than, and not equal operations just because I’m lazy, and that two opcodes are enough to implement certain kinds of algorithms.
It is necessary to mention that these opcodes will add the result of comparison on the top of the stack.
Opcodes that were defined above are necessary, but we have no way to interact with users. Let’s fix this!
SLAPPUT [var]
- used to prompt user and save input to the variableSLAPSCREEN [var | val]
- used to print the value or the variable (or just direct value) to the screenThe most important part of our concept is flow control. In regular programming languages we have statements like if-else and loops that can be used for branching or cycling the flow of execution. Let the SLAP have the following abilities:
[mark]
- any word or other character sequence that is not an opcode is used like a “bookmark”. Jump opcodes will seek the mark and pass the control to that instructionBEAT [mark]
- unconditionally jump to markKNOCK [mark]
- more polite version of BEAT. KNOCK will jump to the mark only if there is a logical True (or 1) on the top of the stack (produced by PUNCHES or NONSMACK opcodes).As practice shows, this amount of opcodes and their meanings is enough to implement something like factorial calculation, or Fibonacci number calculation.
Now, we have decided on a language concept, and it’s time to talk about its implementation. We will use Python for the first implementation of a SLAP’s virtual machine.
I have created the class SlapVM, which consists of:
Furthermore, we need a bunch of methods:
I need to say a few words about opcodes’ implementation. As you may see, all opcodes have operands. Opcode can have one or two operands. I will call them left operand (or left argument - larg) and right operand (or right argument - rarg). Each method implementing opcode must have the following sequence of arguments:
def opcode(cls, larg, rarg, *args):
pass
That helps us to keep all opcodes compatible with any count of arguments. Or you can just use an args list, but this approach is more implicit than the first, so I prefer the first convention.
Our “sample” SlapVM code will look like the following:
import tokenize
from io import StringIO
class SlapVM(object):
__code=list()
__ip=0
__stack = list()
__vars=dict()
@classmethod
def run(cls, code: str):
pass
@classmethod
def run_line(cls, code):
pass
@classmethod
def get_opcodes(cls):
return {
"slapshot": cls.slapshot,
"backslap": cls.backslap,
"slap": cls.slap,
"unslap": cls.unslap,
"multislap": cls.multislap,
"multiunslap": cls.multiunslap,
"slapscreen": cls.slapscreen,
"slapput": cls.slapput,
"knock": cls.knock,
"beat": cls.beat,
"plopin": cls.plop_in,
"punches": cls.punches,
"nonsmack": cls.nonsmack
}
@classmethod
def slapshot(cls, var, *args):
pass
After that, I need to explain how the run and run_line methods work. Look at the following code:
@classmethod
def run(cls, code: str):
cls.__stack = list()
cls.__vars=dict()
cls.__code=list()
cls.__ip=0
cls.__code=code.splitlines()
while cls.__ip < len(cls.__code):
cls.run_line(cls.__code[cls.__ip])
cls.__ip = cls.__ip + 1
@classmethod
def run_line(cls, code):
stream = StringIO(code)
tokens = tokenize.generate_tokens(stream.readline)
larg = rarg = opcode = None
for toknum, tokval, _, _, _ in tokens:
if toknum == tokenize.NAME:
if cls.is_opcode(tokval):
opcode = tokval
elif larg is None:
larg = int(tokval) if tokval.isdigit() else tokval
else:
rarg = int(tokval) if tokval.isdigit() else tokval
if toknum == tokenize.NUMBER:
if larg is None:
larg = int(tokval) if tokval.isdigit() else tokval
else:
rarg = int(tokval) if tokval.isdigit() else tokval
if opcode is not None:
cls.execute_code_object(opcode, larg, rarg)
I will start the explanation from the run_line method.
How run_line method works
We know that our program consists of lines, and each line can be executed by VM. So, from this point of view, the run_line function is the main method that executes our code. It performs the following steps:
Summarizing, the run_line method runs the line of code.
How run method works
run is the method that stays on the top of SlapVM and becomes an entrypoint for code execution. It performs simple steps:
If somewhere in the code the opcode KNOCK or BEAT occurs, it directly changes the __ip value, so the next line will be executed from that offset into __code list.
Next, we need to define methods used to do the work that opcodes are supposed to handle. I would not add a complete listing here, you can see it on github.
We have finished a great job, but we still need to write and execute a simple program. Let it be a factorial calculator.
SLAPPUT factorial
PLOPIN fact_tmp factorial
factorial_calc
fact_tmp UNSLAP 1
factorial MULTISLAP fact_tmp
fact_tmp PUNCHES 1
KNOCK factorial_calc
SLAPSCREEN factorial
This program asks the user for a number, copies it to a temporary variable, and starts multiplying the first value to its copy, decrementing a copy while it is more than 1. After that temporary value becomes 1, the program prints factorial value and ends.