Syntax Analyzer


The syntax analyzer is a module that reads the token stream delivered by the lexer module and builds an in-memory data structure representing the syntactically analyzed program. The syntax analyzer is contained in the source file `expression.c'. The name of this module comes from the fact that the most important and most complex task of syntax analysis is the analysis of expressions.

For the syntax analyzer the program is a series of commands, and a command is a series of symbols. There is nothing like command blocks, or one command embedded in another command. Therefore the syntax definition is quite simple and yet still powerful enough to define a BASIC-like language.

Because syntax analysis is quite a complex task and the syntax analyzer built for ScriptBasic is quite a complex one, I recommend that you first read the tutorial on the ScriptBasic web site that covers syntax analysis. It is a series of slides, accompanied by audio narration, explaining the structure of the syntax analyzer of ScriptBasic.

The syntax analyzer is configured using a structure that contains the configuration parameters and the "global variables" of the syntactical analyzer. This structure contains a pointer to the array holding the syntax definition. Each element of the array defines the syntax of one command. A command syntax is the list of symbols that make up the command. When the syntactical analyzer tries to analyze a line it tries the array elements one after the other until it finds one that matches the line. When checking a line against a syntax definition the syntactical analyzer takes the lexical elements of the line and checks that each of them matches the next symbol in the syntax definition. A symbol can be as simple as a reserved word, like `if', `else' or `endif'. Such a syntax element is matched only by that specific keyword. On the other hand a symbol in the syntax definition can be as complex as an expression, which is matched by a whole expression.
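To make this table-driven approach more tangible, here is a minimal sketch in C of what such a syntax definition array could look like. All type and constant names in it (CommandDef, SymbolCode, SYM_EXPRESSION and so on) are invented for illustration; they are not the actual definitions used in the ScriptBasic sources.

    #include <stddef.h>

    /* Illustrative symbol codes: some stand for constant reserved words,
       others for variable elements matched by whole constructs. */
    typedef enum {
        SYM_END = 0,        /* terminates the symbol list of a command   */
        SYM_KW_IF,          /* the constant reserved word "if"           */
        SYM_KW_THEN,        /* the constant reserved word "then"         */
        SYM_KW_PRINT,       /* the constant reserved word "print"        */
        SYM_EXPRESSION,     /* matched by a whole expression             */
        SYM_STRING          /* matched by a single string token          */
    } SymbolCode;

    /* One element of the array: the symbol list defining one command. */
    typedef struct {
        const char *name;         /* command name, for diagnostics       */
        SymbolCode  symbols[8];   /* symbols making up the command line  */
    } CommandDef;

    /* The analyzer tries these elements one after the other until one
       of them matches the line being analyzed. */
    static const CommandDef syntax_table[] = {
        { "IF",    { SYM_KW_IF, SYM_EXPRESSION, SYM_KW_THEN, SYM_END } },
        { "PRINT", { SYM_KW_PRINT, SYM_EXPRESSION, SYM_END } },
        { NULL,    { SYM_END } }  /* end of table */
    };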

The syntax analyzer has some built-in assumptions about the language, but the actual syntax is defined in tables. This way it is possible to analyze different languages using different tables in the same program, even in the same process in separate threads.

When the syntax analyzer reads the syntax definition of a line and matches the tokens delivered by the lexer against the syntax elements, it may do several things:

The first case is when the syntax element is a constant symbol, for example a command keyword. In this case there is nothing to do with the keyword except that the syntax analyzer has to recognize that this is the statement identified by the keyword. The actual code will be generated later, when non-constant syntactical elements are found.

When the syntax analyzer sees that the next syntax element is a variable, non-constant syntax element, it matches the coming tokens and creates the nodes that hold the actual values of the tokens. For example, when the syntax element is a string, the syntax analyzer checks that the coming token is a string and creates a node that holds the string. The most important example is the syntax element expression. In this case the syntax analyzer checks that the coming tokens form an expression and not only "consumes" these tokens, but creates several nodes that hold the structure of the expression.
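The following self-contained sketch illustrates the difference between the two kinds of elements during matching. It is a deliberately simplified model: the types and functions (Token, Node, match_line) are invented for this example, and an expression is reduced here to a single number token, whereas the real analyzer builds a whole subtree of nodes for it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef enum { TOK_KEYWORD, TOK_NUMBER } TokenType;
    typedef struct { TokenType type; const char *text; long value; } Token;

    typedef enum { SYM_END, SYM_KW_PRINT, SYM_EXPRESSION } Symbol;

    /* Nodes of the memory structure the analyzer builds. */
    typedef struct Node { struct Node *next; long value; } Node;

    /* Match one line of tokens against one symbol list.  Constant symbols
       (keywords) are only checked and consumed; variable symbols also
       create nodes holding the token values. */
    static Node *match_line(const Symbol *syms, const Token *toks, size_t ntok)
    {
        Node head = { NULL, 0 }, *tail = &head;
        size_t t = 0;
        for (size_t s = 0; syms[s] != SYM_END; s++) {
            if (t >= ntok)
                return NULL;
            if (syms[s] == SYM_KW_PRINT) {
                if (toks[t].type != TOK_KEYWORD || strcmp(toks[t].text, "print"))
                    return NULL;      /* keyword does not match the line  */
                t++;                  /* consumed, but no node is created */
            } else {                  /* SYM_EXPRESSION, simplified       */
                if (toks[t].type != TOK_NUMBER)
                    return NULL;
                tail->next = malloc(sizeof(Node));
                tail = tail->next;
                tail->next = NULL;
                tail->value = toks[t].value;  /* node holds the value     */
                t++;
            }
        }
        return t == ntok ? head.next : NULL; /* whole line must be used   */
    }

    int main(void)
    {
        const Symbol print_cmd[] = { SYM_KW_PRINT, SYM_EXPRESSION, SYM_END };
        const Token  line[]      = { { TOK_KEYWORD, "print", 0 },
                                     { TOK_NUMBER,  "42",   42 } };
        Node *nodes = match_line(print_cmd, line, 2);
        if (nodes != NULL)
            printf("matched PRINT, expression node value = %ld\n", nodes->value);
        free(nodes);
        return 0;
    }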

We can distinguish between constant and variable symbolic definition elements.

There are also some special symbols that are always matched whenever the syntax analyzer checks them. They do not consume any lexical element from the line, but they do generate values in the memory structure that the analyzer builds.
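As an illustration of such a special symbol, the following small hypothetical example shows a handler that always succeeds: it advances no token pointer and merely plants a value (here a line number) into the node list the analyzer is building. Again, the names are invented and do not appear in the real sources.

    #include <stdio.h>

    typedef struct Node { struct Node *next; long value; } Node;

    /* Handler for a hypothetical "special" symbol: always succeeds,
       consumes no token, only appends a node carrying a value. */
    static Node *emit_special(Node *tail, Node *node, long value)
    {
        node->next = NULL;
        node->value = value;
        tail->next = node;
        return node;               /* new tail of the node list */
    }

    int main(void)
    {
        Node head = { NULL, 0 }, extra;
        Node *tail = emit_special(&head, &extra, 12L);
        printf("special symbol emitted %ld without consuming any token\n",
               tail->value);
        return 0;
    }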

The symbolic definition elements are:

Note that you can find other syntax definition elements in the file `syntax.def'. However, these are converted to character values by the Perl script tool `syntaxer.pl'. These pseudo syntax definition elements are: