126.96.36.199. What is an expression in ScriptBasic
188.8.131.52. Some NOTE on SymbolXXX functions
The syntax analyzer is a module that reads the token stream delivered by the lexer module and builds a memory data structure containing the syntactically analyzed and built program. The syntax analyzer is contained in the source file `expression.c' The name of this module come from the fact that the most important and most complex task of syntax analysis is the analysis of the expressions.
For the syntax analyzer the program is a series of commands. A command is a series of symbols. There is nothing like command blocks, or one command embedding another command. Therefore the syntax definition is quite simple and yet still powerful enough to define a BASIC like language.
Because syntax analysis is quite a complex task and the syntax analyzer built for ScriptBasic is quite a complex one I recommend that you first read the tutorial from the ScriptBasic web site that talks about the syntax analysis. This is a series of slides together with real audio voice explaining the structure of the syntax analyzer of ScriptBasic.
The syntax analyzer should be configured using a structure containing the configuration parameters and the “global variables” for the syntactical analyzer. This structure contains the pointer to the array containing the syntax definition. Each element of the array defines command syntax. Command syntax is the list of the symbols that construct the command. When the syntactical analyzer tries to analyze a line it tries the array elements until it finds one matching the line. When checking a line against a syntax definition the syntactical analyzer takes the lexical elements on the line and checks that they match the next symbol in the syntax definition. A symbol can be as simple as a reserved word, like if else, endif. Such a syntax element is matched by the specific keyword. On the other hand a symbol on the syntax definition can be as complex as an expression, which is matched by a whole expression.
The syntax analyzer has some built in assumption about the language, but the actual syntax is defined in tables. This way it is possible to analyze different languages using different tables in the same program even in the same process in separate threads.
When the syntax analyzer reads the syntax definition of a line and matches the tokens from the lexer against the syntax element it may do several things:
- recognizes that the syntax element matches the coming token and goes on
- recognizes that the syntax element matches the coming token(s) and creates one or more new nodes in memory that hold the values associated with the tokens
- recognizes the syntax element, does not match it against any token, and does some side effect
The first is the case when the syntax element is a constant symbol. For example it is a command keyword. In this case there is nothing to do with the keyword except that the syntax analyzer has to recognize that this is the statement identified by the keyword. The actual code will be generated later when non-constant syntactical elements are found.
When the syntax analyzer sees that the next syntax element is some variable, non-constant syntax element it matches the coming tokens and creates the nodes that hold the actual value for the tokens. For example when the syntax element is string the syntax analyzer checks that the coming token is a string and creates a node that holds the string. The most important example is the syntax element expression. In this case the syntax analyzer checks that the coming tokens form an expression and not only "consumes" these tokens, but creates several nodes that hold the structure of the expression.
We can distinguish between constant and variable symbolic definition elements.
A constant symbolic element is matched by a constant symbol. A constant symbolic element is a reserved keyword, or a special character that should appear at a certain position on the line.
A variable symbolic element on the other hand is matched by several different actual values. The simplest example of a variable symbolic element is a number. A number in the syntax definition tells the analyzer that a number should appear at the position on the line. However it does not specify the value of the number. Any number can appear and is valid at the position. When a variable symbolic element is matched the actual value, which was presented on the line and matched the symbolic definition element is stored in the memory structure that the analyzer builds.
There are some special symbols that are always matched whenever they are checked by the syntax analyzer. They do not consume any lexical element from the line, and generate values in the memory structure that the analyzer builds.
The symbolic definition elements are:
Note that you can find other syntax definition elements in the file syntax.def. However these are converted to a character value by the Perl script tool syntaxer.pl These pseudo syntax definition elements are:
expression This element matches an expression. When this syntax definition element should be matched the syntactical analyzer tries to work up an expression starting from the actual position in the lexical unit stream.
expression_list This element matches a list of expressions separated by comma characters.
string This element is matched by a single string. Whenever you think to use this syntax definition element consider using expression instead. This element matches only a single string and not an expression resulting string value.
integer This element is matched by an integer number. Whenever you think to use this syntax definition element consider using expression instead. This element matches only a single string and not an expression resulting integer value.
float This element is matched by an integer or float number. Whenever you think to use this syntax definition element consider using expression instead. This element matches only a single string and not an expression resulting integer or float value. Implementing a command that requires this symbolic element should accept integer values at the same location because this value matches any float or integer value. Any integer value passed on a float syntax location is converted to float during compile time. (Note that the C code does not use the C type float. All float numbers are stored and handled using the C type double.)
symbol This element accepts a symbol. Before using this syntax definition element you should be familiar with the other elements that may accept a symbol for a certain role. You should use this element when wanting to deal with the actual name of the symbol during run time. Note however that there are other elements that you should consider before using this syntax definition element.
absolute_symbol This element accepts a symbol similar to the syntax definition element symbol. The difference is that symbol accepts a relative symbol which is treated as belonging to the current name space unless explicit name space was defined. Absolute symbol is taken without any modification.
name_space This syntax definition element is matched by an absolute symbol and sets the current name space.
end_name_space this syntax definition element does not consume any lexical elements, but closes a name space and the surrounding name space is used afterwards.
lval This element can be matched by a left value. That is some variable reference to which value can be assigned.
lval_list This element can be matched by a list of left values. A list is several left values separated by commas.
local_start This syntax definition element is always matched, and does not consume any lexical element from the list. When this syntax definition element is reached by the syntax analyzer it starts a new local scope. Such a point is usually the start of a function or procedure.
local_end This syntax definition element is the pair of local start. This syntax definition element is always matched, and does not consume any lexical element from the list. When this syntax definition element is reached by the syntax analyzer it finishes the local scope. Such a point is usually the end of a function or procedure. Note that local scopes can not be nested.
local This syntax definition element is matched by a symbol, which is treated as a local variable. The symbol is modified according to name space. The syntax definition element is not matched whenever it is tried to be used outside of local scope.
local_list This symbol definition element is matched by a comma separated list of local.
function This symbol definition element is matched by a symbol, which is treated as a function name. Note that the local scope does not automatically start when such a syntax definition element is matched.
thisfn This symbol definition element is matched within a local scope by the name of the actual function or procedure. This is usually used to describe the assignment that assigns a return value to the function name.
label This symbol definition matches a label, which is usually used after “goto” like instructions.
label_def This symbol definition is matched by a symbol. This symbol is going to be treated as a label. All labels are global.
come_forward These symbol definition elements should be used to define block and looping structures. Whenever an instruction like for/next or if/then/else is used it has to define where to continue execution based on the condition. The symbol definition elements go-forward and come-back place the current instruction location on a compile time stack. The symbol definition elements go-back and come-forward take the last element from the same stack. They also check that the location was placed on the stack by a matching construct, assuring that no out of order nesting structures appear, like if/for/endif/next.
nl end of line ('\n' character)
tab tab character ('\t' character)