Text and binary files

[<<<] [>>>]

Before going into details we have to talk about the different types of files and the different ways a file can be opened.

A file is a byte stream for ScriptBasic. You can open a file for reading, writing, or to do both of these operations without closing and reopening the file. Some operating systems, like Windows NT, distinguish between text files and binary files. Other operating systems, like UNIX do not make such a strict distinction.

Text files usually contain lines of text. Binary files may contain any code. When you open a file that you wan to write textual data into, or you know that the file contains textual data, you have to use one of the text mode opening. When you want to handle arbitrary data, use binary type of opening.

Calling a file to be text or binary under Windows NT is not precise. All files are the same. The differentiation between textual and binary should be set up on the way we use the files. You can open a text file in binary mode and you can open a binary file in text mode. A file itself is nor binary neither text. The difference is how we handle them.

Binary files are simple. Bytes follow each other and when we read or write such a file the operating system reads or writes the bytes as they are without any conversion. Whenever you want to go to a certain position of a file using the command seek, you can without problem.

Textual files are a bit different. Whenever you open a file in textual mode and start to read it, the operating system converts each carriage-return and line-feed character pairs to a single line feed character. Whenever you write to a file opened in textual mode all line-feed characters produce two characters in the output file: a carriage-return and a line-feed character.

Setting the position in a textual file is also tricky. Should we count the bytes in the file or the bytes we have wrote to the file. They are different. If we have send x pieces of new-line characters to the file that generated two times x bytes. The safest rule is: do not position within a text file. Read it from start, write it from start or append to it. The not so safe rule is position to "non-calculated" position. Your calculation may be erroneous, but if you position to a value that was reported by the function pos() beforehand, you may be safe.

UNIX operating systems handle all files in one way that makes life simpler. Text file lines are separated with a single line-feed character and opening a file in textual mode is just the same as opening in binary mode. If you write a program for UNIX you need not worry about textual and binary mode. But this is true only so long as long your program is running under UNIX. Whenever someone starts to use your program under Windows he or she may start to report you bugs that did not appear under UNIX. Therefore it is recommended that you use the appropriate file opening modes under UNIX the same way you would under Windows NT.

There are five different ways you can open a file. These are

There is an extra mode called socket that opens connection to a remote machine and not a file. That will be detailed later.

The first four modes are textual modes. Open a file for input whenever you want to read data from the file. If you open a file for output you can write to the file. However all data that was stored in the file is destroyed, and in case the file did not exist it is created. If you want to write new data to a file still keeping the existing content open the file for appending to it, using the keyword append.

Random gives a relative freedom to open a file for both reading and writing. When you open a file using the keyword random, you can both read from the file and write to the file. You can, for example, read from the file until you find the end of the file and then append after the current content new lines of text. In another scenario you can read the current content, seek to the position zero, which means the start of the file and rewrite the existing content. However you should only use the mode random only when you really want to write the file. Operating system permission may allow you to read a certain file and not to write. Trying to open the file in mode random will not be successful in that situation and prevent your program's ability to read the file content.

The last mode is binary. This opens the file for both reading and writing in binary mode. You can send any data to the file without conversion and you can position the file pointer to any position without worry.

All file modes but input create the file for you if it did not exists and all modes but input create the directory for the file if it did not exist. In other words there is no need to ensure that the directory exists for a file before opening it using any of the outputting modes.

The format of the open statement is:

open file for mode as [ # ] file-number [LEN=record_length]

The parts between the [ and ] characters are optional.

The `file' is the name of the file to be opened. The mode is one of the keywords above. The file-number is the number you want to refer to the opened file. This number should be between 1 and 512 in the current implementation of ScriptBasic. 512 is the maximum number of concurrently open files in ScriptBasic, unless the underlying operating system gives a lower limit. The file number should be free to use, in other words no two files can be opened at a time with the same file number. However if you close a file, the file number is released and can be used in subsequent open statements.

The # character is optionally allowed before the file number for compatibility reasons with other BASIC interpreters.

The last optional part specifies the record length in terms of bytes. If this record length is not defined the file is treated as series of bytes. If the record length is larger than one the positioning statements count the position and length in records instead of bytes.


[<<<] [>>>]