2.0.2. Execute external preprocessors


This function executes the external preprocessors that are required either by the command line options or by the file name extensions.

The preprocessors given on the command line are executed in the order they are listed in the array ppszArgPreprocessor. These command line preprocessors are checked first.

If no preprocessors are given on the command line, the preprocessors that the configuration file assigns to the file name extensions are executed instead. The function also passes back a modified file name, so that after preprocessing the caller reads the source code from the preprocessed file rather than from the original input file.

The return value of the function is an error code. It is PREPROC_ERROR_SUCCESS, which is zero, if the preprocessing was successful. A positive return value is one of the error codes defined in the file errcodes.def, prefixed with PREPROC_.

int epreproc(ptConfigTree pCONF,
             char *pszInputFileName,
             char **pszOutputFileName,
             char **ppszArgPreprocessor,
             void *(*thismalloc)(unsigned int),
             void (*thisfree)(void *));

The first argument pCONF is the configuration data pointer which is passed to the configuration handling routines.

The second argument pszInputFileName points to the name of the input file.

The third argument is an output variable. On return it points either to the output file name or to NULL. If it is NULL, then either an error has occurred or the file needed no preprocessing; the return value of the function tells the two cases apart. If the file needed preprocessing and the preprocessing was executed successfully, this variable points to a ZCHAR string allocated via the function thismalloc. It is the responsibility of the caller to release this memory after use by calling the function pointed to by thisfree.

The fourth argument ppszArgPreprocessor is an array of pointers to ZCHAR strings, each holding the symbolic name of an external preprocessor to be applied to the input file. These symbolic names are defined in the configuration file, which specifies the actual executable of each preprocessor and the temporary directory where the preprocessed file is stored. The final element of the pointer array should be NULL. If ppszArgPreprocessor is NULL, or the array it points to contains only the terminating NULL pointer, then the extensions of the file name determine which preprocessors are applied. In that case the preprocessors are applied in the left-to-right order of the file name extensions.
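
The selection rule described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the ScriptBasic code: the function select_preprocessors and the limits MAX_PP and MAX_NAME are hypothetical, and the sketch only collects names. It returns every extension of the file name; looking each name up in the configuration file to find the executable and the temporary directory is not shown.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PP   16   /* hypothetical limit on the number of preprocessors */
#define MAX_NAME 64   /* hypothetical limit on the length of a name */

/* Hypothetical helper: fill out[] with the names of the preprocessors to
   run on pszFileName and return how many there are. A non-empty,
   NULL-terminated command line list takes precedence; otherwise the file
   name extensions are used, from left to right. */
int select_preprocessors(char **ppszArgPreprocessor,
                         const char *pszFileName,
                         char out[MAX_PP][MAX_NAME])
{
  int n = 0;
  const char *base, *p, *dot;

  /* Case 1: at least one preprocessor was named on the command line. */
  if (ppszArgPreprocessor != NULL && ppszArgPreprocessor[0] != NULL) {
    for (; n < MAX_PP && ppszArgPreprocessor[n] != NULL; n++) {
      strncpy(out[n], ppszArgPreprocessor[n], MAX_NAME - 1);
      out[n][MAX_NAME - 1] = '\0';
    }
    return n;
  }

  /* Case 2: use the extensions. Skip the directory part first so that
     dots in directory names are not mistaken for extensions. */
  base = pszFileName;
  for (p = pszFileName; *p != '\0'; p++)
    if (*p == '/' || *p == '\\') base = p + 1;

  dot = strchr(base, '.');
  while (dot != NULL && n < MAX_PP) {
    const char *start = dot + 1;
    const char *end = strchr(start, '.');
    size_t len = end ? (size_t)(end - start) : strlen(start);
    if (len >= MAX_NAME) len = MAX_NAME - 1;
    if (len > 0) {                 /* ignore empty extensions like "a..b" */
      memcpy(out[n], start, len);
      out[n][len] = '\0';
      n++;
    }
    dot = end;
  }
  return n;
}
```

For a file named prog.heb.bas with no command line preprocessors, the sketch yields heb and then bas, in that order.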

The arguments thismalloc and thisfree should point to malloc and free, or to a pair of functions that behave the same way. These functions are used through the myalloc.c module and also to allocate the new pszOutputFileName string on success. Therefore the caller should release the string pointed to by pszOutputFileName with the function pointed to by thisfree after the function has returned.
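
The following sketch shows the caller's side of this contract. The standard malloc takes a size_t, while thismalloc is declared as void *(*)(unsigned int), so on platforms where size_t is not unsigned int, passing malloc directly is a type mismatch; thin wrappers avoid that. Because the real epreproc needs a configuration tree, the sketch calls a hypothetical stand-in, fake_epreproc, with a reduced argument list that only mimics the documented outcomes (error, no preprocessing needed, preprocessed file created). The names my_malloc, my_free, fake_epreproc and resolve_source are all illustrative.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Wrappers whose types match the thismalloc/thisfree parameters exactly. */
static void *my_malloc(unsigned int size) { return malloc(size); }
static void my_free(void *p) { free(p); }

/* Hypothetical stand-in for epreproc: names ending in ".heb" are
   "preprocessed" into NAME.out; other names need no preprocessing. */
static int fake_epreproc(char *pszInputFileName, char **pszOutputFileName,
                         void *(*thismalloc)(unsigned int),
                         void (*thisfree)(void *))
{
  size_t len = strlen(pszInputFileName);
  (void)thisfree;                       /* not needed by this stand-in */
  *pszOutputFileName = NULL;
  if (len < 4 || strcmp(pszInputFileName + len - 4, ".heb") != 0)
    return 0;                           /* success, nothing to do */
  *pszOutputFileName = thismalloc((unsigned int)(len + 5));
  if (*pszOutputFileName == NULL)
    return 1;                           /* hypothetical nonzero error code */
  strcpy(*pszOutputFileName, pszInputFileName);
  strcpy(*pszOutputFileName + len, ".out");
  return 0;
}

/* Caller-side handling: copy the name of the file to compile next into
   buf and return the error code (0 on success). */
static int resolve_source(char *pszInput, char *buf, size_t bufsize)
{
  char *pszOutput = NULL;
  int iError = fake_epreproc(pszInput, &pszOutput, my_malloc, my_free);

  if (iError != 0)                      /* error: no output name */
    return iError;
  if (pszOutput == NULL) {              /* no preprocessing was needed */
    snprintf(buf, bufsize, "%s", pszInput);
  } else {                              /* a preprocessed file was created */
    snprintf(buf, bufsize, "%s", pszOutput);
    my_free(pszOutput);                 /* release with the matching thisfree */
  }
  return 0;
}
```

The important detail is the last branch: the returned string is released with the same function that was passed as thisfree, never with a different deallocator.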


    This module reads the source file into memory. Source programs are usually small compared to the available memory, so they can be read entirely into RAM. The ScriptBasic source code itself is approximately 1MB, and I develop it on a station that has 386MB of memory, so even a fairly large program fits into memory easily. BASIC programs executed by the ScriptBasic interpreter are likely to be much smaller than that.

    The source code is stored in memory pieces that form a linked list. Each element of the list contains one line of the source code together with information used for debugging and error reporting: the name of the file the line was read from and the line number. When the lexer (detailed later) performs the lexical analysis, it inherits this information, so that lexical and syntax errors are reported with the correct line number.
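
    A minimal sketch of such a line list, assuming a hypothetical SourceLine structure (not ScriptBasic's actual type) and a source file that has already been read into a memory buffer:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: one source line plus the data needed for
   error reporting. */
typedef struct SourceLine {
  char *text;                /* the line, including its '\n' if present */
  const char *file;          /* name of the file the line was read from */
  long lineno;               /* 1-based line number inside that file */
  struct SourceLine *next;
} SourceLine;

void free_lines(SourceLine *ln)
{
  while (ln != NULL) {
    SourceLine *next = ln->next;
    free(ln->text);
    free(ln);
    ln = next;
  }
}

/* Split an in-memory buffer into a linked list of lines, keeping the
   source order. Returns NULL on empty input or if memory runs out. */
SourceLine *split_lines(const char *buf, const char *file)
{
  SourceLine *head = NULL, **tail = &head;
  long lineno = 0;

  while (*buf != '\0') {
    const char *eol = strchr(buf, '\n');
    size_t len = eol ? (size_t)(eol - buf) + 1 : strlen(buf);
    SourceLine *ln = malloc(sizeof *ln);
    char *text = malloc(len + 1);
    if (ln == NULL || text == NULL) {   /* out of memory: undo everything */
      free(ln);
      free(text);
      free_lines(head);
      return NULL;
    }
    memcpy(text, buf, len);
    text[len] = '\0';
    ln->text = text;
    ln->file = file;
    ln->lineno = ++lineno;
    ln->next = NULL;
    *tail = ln;                         /* append at the end */
    tail = &ln->next;
    buf += len;
  }
  return head;
}
```

    Lines pulled in by include or import would carry the name of their own file and their own line numbers, which is what makes the error messages point to the right place.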

    The reader module also handles the include and import directives, which insert the content of other files into the source. (Note that import inserts the content of a file only if it has not been loaded yet.)
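
    The include/import distinction can be sketched with a small registry of loaded files. The function should_insert and the fixed-size registry are hypothetical; the sketch compares names as plain strings, while a real implementation may need to recognize different spellings of the same path.

```c
#include <string.h>

#define MAX_FILES 64   /* hypothetical limit */

/* Hypothetical registry of the files already loaded. */
static const char *loaded[MAX_FILES];
static int nloaded = 0;

static int is_loaded(const char *name)
{
  int i;
  for (i = 0; i < nloaded; i++)
    if (strcmp(loaded[i], name) == 0) return 1;
  return 0;
}

/* Return 1 if the directive should insert the named file, 0 otherwise.
   "include" always inserts; "import" inserts only the first time. */
int should_insert(const char *directive, const char *name)
{
  int first = !is_loaded(name);
  if (first && nloaded < MAX_FILES)
    loaded[nloaded++] = name;   /* the caller keeps the string alive */
  if (strcmp(directive, "import") == 0)
    return first;
  return 1;                     /* "include" */
}
```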

    The module also processes lines that look like

    use preprocessor

    and loads the internal preprocessor named on the line. Preprocessors

    When the reader has finished, the subsequent modules have the full source file in memory, ready to be processed. The module also provides getc- and ungetc-like functions that return the characters read, one at a time. These are used by the lexer.
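
    The getc/ungetc style interface can be pictured as a cursor over the line list. The Line and Reader types and the reader_* functions are hypothetical names chosen for this sketch, which keeps a single character of pushback and returns -1 (like EOF) at the end of the source:

```c
#include <stddef.h>

/* Hypothetical minimal line node with only the fields used here. */
typedef struct Line {
  const char *text;
  struct Line *next;
} Line;

/* Reading position inside the list of lines. */
typedef struct {
  const Line *line;   /* current line, NULL at end of input */
  size_t pos;         /* index of the next character in line->text */
  int pushed;         /* character pushed back by reader_ungetc, or -1 */
} Reader;

void reader_init(Reader *r, const Line *first)
{
  r->line = first;
  r->pos = 0;
  r->pushed = -1;
}

/* Return the next character, moving to the next line when the current
   one is exhausted; return -1 at the end of the source. */
int reader_getc(Reader *r)
{
  int c;
  if (r->pushed != -1) {
    c = r->pushed;
    r->pushed = -1;
    return c;
  }
  while (r->line != NULL && r->line->text[r->pos] == '\0') {
    r->line = r->line->next;
    r->pos = 0;
  }
  if (r->line == NULL)
    return -1;
  return (unsigned char)r->line->text[r->pos++];
}

/* Push one character back; only one level of pushback is kept. */
void reader_ungetc(Reader *r, int c)
{
  r->pushed = c;
}
```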


    The lexer module uses the line stream (or, seen from a different point of view, the character stream) provided by the reader. It reads the characters and builds up a linked list of tokens in the order they appear in the input. Each token is a BASIC keyword, a real or integer number, a symbol, a string, a multi-line string, or a character. Each element also records the name of the file and the line number within that file where the token originally appeared.

    When the lexer has finished, the list of lines is no longer needed, and the reader can release the memory occupied by the source lines.

    The lexer also provides functions that the syntax analyzer uses to read the tokens one after the other as the analysis proceeds.
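
    The following sketch illustrates the token list and the sequential access on top of it. It is deliberately simplified and hypothetical: it handles only symbols, integers, double-quoted strings and single characters, does not separate keywords from other symbols, and the names lex_line, token_peek and token_next are not ScriptBasic's.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds; the real lexer distinguishes more
   (keywords, real numbers, multi-line strings, ...). */
typedef enum { TK_SYMBOL, TK_INTEGER, TK_STRING, TK_CHAR } TokenKind;

typedef struct Token {
  TokenKind kind;
  char text[32];           /* token text, truncated in this sketch */
  long lineno;             /* line the token came from */
  struct Token *next;
} Token;

/* Tokenize one source line and append the tokens at *tail.
   Returns the new tail, or NULL if memory runs out. */
Token **lex_line(const char *s, long lineno, Token **tail)
{
  while (*s != '\0') {
    const char *start = s;
    TokenKind kind;
    Token *t;
    size_t len;

    if (isspace((unsigned char)*s)) { s++; continue; }
    if (isalpha((unsigned char)*s)) {          /* symbol or keyword */
      kind = TK_SYMBOL;
      while (isalnum((unsigned char)*s)) s++;
    } else if (isdigit((unsigned char)*s)) {   /* integer number */
      kind = TK_INTEGER;
      while (isdigit((unsigned char)*s)) s++;
    } else if (*s == '"') {                    /* string, quotes kept */
      kind = TK_STRING;
      s++;
      while (*s != '\0' && *s != '"') s++;
      if (*s == '"') s++;
    } else {                                   /* any other character */
      kind = TK_CHAR;
      s++;
    }
    if ((t = malloc(sizeof *t)) == NULL) return NULL;
    len = (size_t)(s - start);
    if (len >= sizeof t->text) len = sizeof t->text - 1;
    memcpy(t->text, start, len);
    t->text[len] = '\0';
    t->kind = kind;
    t->lineno = lineno;
    t->next = NULL;
    *tail = t;                                 /* keep input order */
    tail = &t->next;
  }
  return tail;
}

/* Sequential access for the syntax analyzer: look without consuming. */
const Token *token_peek(const Token **cur) { return *cur; }

/* Consume the current token and advance to the next one. */
const Token *token_next(const Token **cur)
{
  const Token *t = *cur;
  if (t != NULL) *cur = t->next;
  return t;
}
```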


    The syntaxer reads the list of tokens provided by the lexical analysis module and creates an internal structure that is already very similar to the executable internal code of ScriptBasic. The syntax analyzer detects any syntax error in the program, and when it finishes, the result is a large, cross-linked memory structure that contains the almost-executable code.

    The syntax analyzer is responsible for building up the evaluation trees of the expressions, the execution nodes, the variable numbering, and so on.
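
    As an illustration of an evaluation tree, here is a hypothetical expression node type with a recursive evaluator. It assumes that operator precedence has already been resolved while the tree was built, so that 1 + 2 * 3 becomes a + node whose right operand is a * node:

```c
#include <stdlib.h>

/* Hypothetical expression node: a constant or a binary operator whose
   operands are sub-trees. */
typedef struct Expr {
  char op;                   /* '+', '-', '*', or 0 for a constant */
  long value;                /* used when op == 0 */
  struct Expr *left, *right;
} Expr;

static Expr *num(long v)
{
  Expr *e = calloc(1, sizeof *e);
  if (e != NULL) e->value = v;
  return e;
}

static Expr *bin(char op, Expr *l, Expr *r)
{
  Expr *e = calloc(1, sizeof *e);
  if (e != NULL) { e->op = op; e->left = l; e->right = r; }
  return e;
}

/* Evaluate recursively: operands first, then the operator. */
static long eval(const Expr *e)
{
  switch (e->op) {
  case '+': return eval(e->left) + eval(e->right);
  case '-': return eval(e->left) - eval(e->right);
  case '*': return eval(e->left) * eval(e->right);
  default:  return e->value;
  }
}
```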

    When the code refers to a variable, for example one named variable, the syntax analyzer is responsible for allocating a slot for it and for converting the name to a serial number that identifies the variable wherever it is used. After syntax analysis there are no named variables any more (except for debuggers). Global variables are numbered from 1 to n, and local variables are also identified by numbers. Functions have no names either: each function is identified by a C pointer to the node where the function starts.
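
    The name-to-number conversion can be sketched as a lookup that hands out the next serial number for names it has not seen before. The SymbolTable type and var_serial are hypothetical; a separate table per function would number the local variables the same way.

```c
#include <string.h>

#define MAX_VARS 256   /* hypothetical limit */

/* Hypothetical symbol table used only during syntax analysis. */
typedef struct {
  const char *names[MAX_VARS];
  int count;
} SymbolTable;

/* Return the serial number (1..n) of a variable, allocating a new slot
   the first time the name is seen. Returns 0 if the table is full. */
int var_serial(SymbolTable *st, const char *name)
{
  int i;
  for (i = 0; i < st->count; i++)
    if (strcmp(st->names[i], name) == 0)
      return i + 1;
  if (st->count == MAX_VARS)
    return 0;
  st->names[st->count++] = name;   /* the caller keeps the string alive */
  return st->count;
}
```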

    To ease the life of those who want to embed ScriptBasic, the symbol table listing the global variables, functions, and subroutines is appended to the byte-code, and functions in the scriba_* embedding interface handle these symbol tables. ScriptBasic itself, however, does not use variable, function, or subroutine names after syntax analysis.


    The builder is the module that creates the code, which is used by the execution system. Why do we have a separate builder? Isn't it the role of the syntax analyzer to build the code?

    Yes, and no. The code that was created by the syntax analyzer could be used to execute the BASIC program, but ScriptBasic still inserts an extra transformation before executing the program. The reason for this extra step is to create a byte code that can be stored in a continuous memory area and thus can easily be saved to or loaded from disk.

    When the syntax analyzer creates the nodes, it does not know the final number of nodes in the byte-code, nor the number of distinct strings or the size of the string table. As it creates the code, the syntax analyzer allocates memory for each new block separately, and the nodes are linked together using C pointers. As a result, the final structure is not contiguous in memory and cannot be saved to disk or loaded back.

    By the time the builder starts, both the number of nodes and the total size of the string constants are known. The builder allocates the memory needed for the whole code and fills in the actual code. The builder's nodes are slightly smaller than those of the syntax analyzer, and they refer to each other by node serial number instead of by pointer. This is almost as efficient as using pointers, and because the values do not depend on where the nodes are located in memory, the code can be saved to disk and loaded again for execution.
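
    A minimal sketch of this conversion, assuming hypothetical PNode (syntax analyzer) and BNode (builder) types in which each node holds one reference to another node. The first pass assigns serial numbers and the second copies the nodes into one array, replacing each pointer with the serial number of the referenced node. In this sketch the next node in sequence is simply the next array element; the layout of ScriptBasic's real nodes is not shown here.

```c
#include <stdlib.h>

/* Syntax-analyzer style node (hypothetical): allocated one at a time
   and linked with C pointers. */
typedef struct PNode {
  int opcode;
  struct PNode *next;     /* next node in creation order */
  struct PNode *target;   /* node it refers to (may be later), or NULL */
  int serial;             /* assigned by the builder, 1..n */
} PNode;

/* Builder style node (hypothetical): lives in one contiguous array and
   refers to other nodes by serial number; 0 means "no node". */
typedef struct {
  int opcode;
  int target;
} BNode;

/* Convert a chain of at most n pointer-linked nodes into an array of
   n + 1 BNodes (index 0 stays unused). Returns NULL on failure. */
BNode *build(PNode *first, int n)
{
  BNode *code = calloc((size_t)n + 1, sizeof *code);
  PNode *p;
  int i = 0;

  if (code == NULL) return NULL;
  /* Pass 1: number the nodes. A separate pass is needed because a node
     may refer to a node that comes later. */
  for (p = first; p != NULL; p = p->next) {
    if (++i > n) { free(code); return NULL; }
    p->serial = i;
  }
  /* Pass 2: copy each node and translate pointers to serial numbers. */
  for (p = first; p != NULL; p = p->next) {
    code[p->serial].opcode = p->opcode;
    code[p->serial].target = p->target ? p->target->serial : 0;
  }
  return code;
}
```

    Because the array holds only serial numbers, copying it byte for byte to another address, as saving and loading would, leaves every reference valid.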


    The executor kills the code. Oh no! I am just kidding.

    It actually executes the code. It takes the code generated by the builder module, executes the nodes one by one, and finally exits.
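
    The execution loop can be pictured as follows. This is a toy sketch with hypothetical opcodes and a hypothetical Node layout that uses serial-number links like the ones the builder produces; the real executor handles far more node types:

```c
#include <stdio.h>

/* Hypothetical opcodes for a toy accumulator machine. */
typedef enum { OP_ADD, OP_MUL, OP_PRINT } OpCode;

/* Index 0 of the code array is unused; a next value of 0 ends the run. */
typedef struct {
  OpCode opcode;
  long arg;     /* operand for OP_ADD and OP_MUL */
  int next;     /* serial number of the node to execute next, 0 = exit */
} Node;

/* Execute nodes one by one starting at node 1 and return the final
   value of the accumulator. */
long execute(const Node *code)
{
  long acc = 0;
  int pc = 1;
  while (pc != 0) {
    const Node *n = &code[pc];
    switch (n->opcode) {
    case OP_ADD:   acc += n->arg; break;
    case OP_MUL:   acc *= n->arg; break;
    case OP_PRINT: printf("%ld\n", acc); break;
    }
    pc = n->next;
  }
  return acc;
}
```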

    The following sections detail these modules and also some other modules that help these modules to perform their actual tasks.
