Creating a scripting engine

I decided to write this article because I've already made many scripting engines for my projects and I know the needs of such an engine. When I saw at Hugi's homepage that there was need for an article about coding a scripting engine I said to myself 'Hey Bad, this article is waiting for you to be written', and I did it.

Here I'll not talk and provide code about a specific language, like C or Pascal. I'll only write the way to make a scripting engine, not the code that you need. Just the theory behind it.

Ok, let's start with the...

First things

Let's say that you have the following script for a scripting demo and you want to make a script engine that executes it (warning: big script!):

; this is a script that can be used for making a simple script-based
demo
; (c) 2001 Kostas Michalopoulos aka Bad Sector

; define variables
var timer
; first part - a simple scroller, a fire effect and two bouncing
balls
set timer, 0
set_scroller "Welcome to my ScriptDemo, a fantastic demo ... blah ..."
first_part_loop:
  inc timer
  wait 10
  if timer = 20 start_fire
  if timer = 50 add_bouncing_ball
  if timer = 55 add_bouncing_ball
  if timer < 100 goto first_part_loop
fade_out fast
stop_fire
remove_bouncing_balls

; second part - two semi-visible bouncing balls infont of a tunner
effect and a scroller (again)
fade_in slow
set timer, 0
set_visibility 50
add_bouncing_ball
second_part_loop:
  inc timer
  wait 10
  if timer = 10 add_bouncing_ball
  if timer = 30 start_tunnel
  if timer = 50 set_scroller "what do you think about my lame effects ... blah ..."
if timer < 200 goto second_part_loop
fade_out fast

(Note: that script came from my imagination right now and I don't know if it will work. :))

Ok, now we have the script, but how can we execute it? We can simply make a parser that executes the script line-by-line. It's very easy, but also it's very slow. So let's do something faster. We can compile it to data that we can run faster and then execute that data... how? Take a look..

How many commands are we using? Let's count them: var, set, set_scroller, inc, wait, if, start_fire, add_bouncing_ball, end, fade_out, stop_fire, remove_bouncing_balls, fade_in, set_visibility and start_tunnel, so they are 15. Now let's refer to them as numbers instead of words. So var is now 0, set is 1, set_scrolleris 2, etc and start_tunnel is 14.

Now we can make a parser that changes the commands to numbers and their parameters in a form that is easier for the other parser (the one that will execute the pcode) to understand what it is. Look at commands wait and set_visibility. Both of them can take numbers and variables as parameters. In a script engine that doesn't compile to pcode, when the parser tries to understand what kind of parameter that is, it must check the starts and ends with " to see if is a string and it is then to use the string inside the " as parameter. If not then it must check if it is a variable and if there is a value for that variable (and use this value as parameter). If the parameter isn't a variable then it must check if is a number. All these checks must be done everytime the parser reaches that command. In our script engine, we'll do the same thing but only one time, when compiling the script to pcode. Then we'll mark the parameter as string, variable or number so the second parser know how to handle it without doing tests all the time.

The first parser

Here is how the first parser (the one that takes the script and converts/compiles it to pcode) works. This parser will parse commands line-by-line (that means that a line cannot contain more that one commands).

So, say that we have this line to parse:

      set_scroller "Welcome to my ScriptDemo, a fantastic demo ... blah ..."

Every line is separated in three parts: the spaces, the command and the parameters. The spaces are just space bar or tab characters that the programmer adds in front of every line just to make the code look better. These spaces MUST be removed so that the command will be parsed correctly. The remaining two parts are what their names say; the command and its parameters.

In the given script, the command is separated from the parameters with a space character (' ') and the parameters from each other either with a space character or a comma (','). We can easily write a piece of code that separates the command from the parameters and the parameters from each other (if you don't know how, then this is something to think about; don't wait all this from me, this is a theory article) by checking if we are inside or outside from a string (a flip-flop system will help here).

Once the commands have been separated from the parameters, we can replace them with a number (here, a good idea is to use a table). This number will be stored somewhere (for example to a command list) as a byte or an unsigned integer (word), depending on how many commands we have. Then the parameters will be checked to see if they are strings, variables or numbers (in some cases you can mark number as strings, but it's better to separate strings from number just to avoid the conversion - where it's needed - later) and they will be saved in the same place where the number of the command was saved, probably in another list (parameter list). Last, you must save the count of the parameters (how many are they?) in the same place.

Special cases

Note that there are special cases, that you must check what these parameters contain and write another kind of information in the record. For example the command if can have this form:

if parameter1 = | <> | > | < parameter2 command...

Here we can't check all parameters if they are values (strings, vars, etc), because for sure '=', '<>', '>' and '<' aren't and the same goes for the command. In this case, a number must be saved in the record; this number represens the second parameter ('=', '<>', '>' and '<'). The first and the third parameters will be saved normally as all other parameters and the command will be parsed and saved in the next place in the command list. Simply, if the if command doesn't want to run the command, the second parser will skip the next from the if command.

Another special case is the goto command and the labels. In the example script, labels are ending with the ':' character, so we can just check if the command ends with this character and if so, add it to a list (label list). Each record of this list must contain the name and the position of the next command from this label (this is easy; just use the position of the label, because you'll not save it to the commands list, so the next command will be saved in this position ;)). Then when the parser reaches to a goto command, instead of saving its parameter to the command list, search in the label list to find the given label and store in command list, its position (a pointer in numeric format maybe?) so that when the second parser reaches a number that represents the goto command, it will not have to search the label list to find the given label and will immediately jump to the position where this label was defined.

Other special cases are the fade_in and fade_out commands, but I'll leave it to you to find out why they are... :)

The second parser

Now, we have the script compiler (probably not to pcode as we said, but anyway, in a form that allows us to run it more easily and faster). The 2nd parser is the easiest part of the script engine because the dirty work is done by the first one. So we execute each command in the command list by checking its number (faster than checking a whole word, right?). Then if a command needs a parameter (for example set_scroller), we get the nth parameter from the list and use it. :) EASY. The same goes for all other commands except... special cases (again).

There are some special cases, like the if command. We have to check if the given parameters are TRUE and if not, we must skip (don't run) the next command in command list. Another special case is - of course - the goto command. If a goto command is given, we must find the specified record in the label list and continue the execution of the script (pcode or anyway) from the position specified in the record.

That is how a scripting engine is done. I hope that you found it easy (because it IS easy!).

Saving the compiled code to a file

There are cases when you probably want to save the compiled code to a file (e.g. in a diskmag), instead of compiling the sources again and again. In these cases you must save all lists (command list, parameter list in each command list recordand label list) to the file and then to load them again before use the second parser, so that the compiled script will be executed normally.

More optimizing

You can optimize your script engine by defining more numbers for one command that takes many forms. A good example is the if command that takes four forms (one for '=', one for '>', one for '<' and one for '<>'), so instead of making the engine checking these values (and use more cpu cycles), you can define one number for each form (this will make the machine code bigger though).

Another way to make the script engine faster is to make the first parser reserve some space for each variable (must make a list to avoid duplicates) and where these variables are referred, instead of saving their name, to save a pointer that points to the reserved memory. This will save a lot of cpu cycles, but you cannot save the compiled code to a file (except if everytime you load the code, re-allocate the memory for these variables).

Why all this and not a simple command-by-command parser? Because of the speed. One year ago, I was making simple parsers when I needed a script engine. The result was good, but slow (about 40-50 commands per second). In the last months, I needed a very fast script parser but I didn't know how to make one. One day when I was thinking what to do about that, I remembered that the Sierra On-Line's adventure game engine, named AGI, was fast in my old XT because was using compiled scripts. Also I remembered that until its 4th version, Visual Basic had been using p-code for its EXEcutables. So I said to myself 'this is what I've to do; to use p-code compiled scripts' and so I did. Actually when I started to program my scripting engine in DJGPP (a freeware 32bit C/C++ compiler for DOS) I wasn't sure if the result would be faster than 200-300 commands per second. And this is the reason why I was impressed when I counted 16000 commands per second! And my engine wasn't (and isn't yet :)) so optimized. For example I'm still searching for variables instead of using pointers (and I don't need to save the compiled code to any file :P). Think how much faster will this engine become after optimizing it!

Of course there is a big difference between 50 (and even 300) and 16000 commands per second, so the result deserves the work. ;)

Epilogue

I'll try to write another article describing the use of begin-end marks and maybe subfunctions. But I'll do that later, probably in one or two months, I'll see...

Good coding,

Bad Sector of The Lab