25.113.1. LIKE Details

[<<<] [>>>]

Pattern matching in ScriptBasic is similar to the pattern matching that you get used to on the UNIX or Windows NT command line. The operator like compares a string to a pattern.

string like pattern

Both string and pattern are expressions that should evaluate to a string. If the pattern matches the string the result of the operator is true, otherwise the result is false.

The pattern may contain normal characters, wild card characters and joker characters. The normal characters match themselves. The wild card characters match one or more characters from the set they are for. The joker characters match one character from the set they stand for. For example:

Const nl="\n"
print "file.txt" like "*.txt",nl
print "file0.txt" like "*?.txt",nl
print "file.text" like "*.txt",nl

will print

-1
-1
0

The wild card character * matches a list of characters of any code. The joker character ? matches a single character of any code. In the first print statement the * character matches the string file and .txt matches itself at the end of the string. In the second example * matches the string file and the joker ? matches the character 0. The wild card character * is the most general wild card character because it matches one or more of any character. There are other wild card characters. The character # matches one or more digits, $ matches one or more alphanumeric characters and finally @ matches one or more alpha characters (letters).

*	all characters
#	0123456789
$	0123456789abcdefghijklmnopqrstxyvwzABCDEFGHIJKLMNOPQRSTXYVWZ
@	abcdefghijklmnopqrstxyvwzABCDEFGHIJKLMNOPQRSTXYVWZ

A space in the pattern matches one or more white spaces, but the space is not a regular wild card character, because it behaves a bit different.

Note that wild card character match ONE or more characters and not zero or more as in other systems. Joker characters match exactly one character, and there is only one joker character by default, the character ?, which matches a single character of any code.

We can match a string to a pattern, but that is little use, unless we can tell what substring the joker or wildcard characters matched. For the purpose the function joker is available. The argument of this function is an integer number, n starting from 1 and the result is the substring that the last pattern matching operator found to match the nth joker or wild card character. For example

Const nl="\n"
if "file.txt" like "*.*" then
  print "File=",joker(1)," extension=",joker(2),nl
else
  print "did not match"
endif

will print

File=file extension=txt

If the pattern did not match the string or the argument of the function joker is zero or negative, or is larger than the serial number of the last joker or wild card character the result is undef.

Note that there is no separate function for the wild card character substrings and one for the joker characters. The function joker serves all of them counting each from left to right. The function joker does not count, nor return the spaces, because programs usually are not interested in the number of the spaces that separate the lexical elements matched by the pattern.

Sometimes you want a wild card character or joker character to match only itself. For example you want to match the string "13*52" to the pattern two numbers separated by a star. The problem is that the star character is a wild card character and therefore "#*#" matches any string that starts and ends with a digit. But that may not be a problem. A * character matches one or more characters, and therefore "#*#" will indeed match "13*52". The problem is, when we want to use the substrings.

Const nl="\n"
a="13*52" like "#*#"
print joker(1)," ",joker(3),nl
a="13*52" like "#~*#"
print joker(1)," ",joker(2),nl

will print

1 52
13 52

The first # character matches one character, the * character matches the substring "3*" and the final # matches the number 52.

The solution is the pattern escape character. The pattern escape character is the tilde character: ~. Any character following the ~ character is treated as normal character and is matched only by itself. This is true for any normal character, for wild card characters; joker characters; for the space and finally for the tilde character itself. The space character following the tilde character matches exactly one space characters.

Pattern matching is not always as simple as it seems to be from the previous examples. The pattern "*.*" matches files having extension and joker(1) and joker(2) can be used to retrieve the file name and the extension. What about the file sciba_source.tar.gz? Will it result

File=scriba_source.tar extension=gz

or

File=scriba_source extension=tar.gz

The correct result is the second. Wild card characters implemented in ScriptBasic are not greedy. They eat up only as many characters as they need.

Up to now we were talking about wild card characters and the joker character defining what matches what as final rule carved into stone. But these rules are only the default behavior of these characters and the program can alter the set of characters that a joker or wild card character matches.

There are 13 characters that can play joker or wild card character role in pattern matching. These are:

*  #  $  @  ?  &  %  !  +  /  |  <  >

When the program starts only the first five characters have special meaning the others are normal characters. To change the role of a character the program has to execute a set joker or set wild command. The syntax of the commands are:

set joker expression to expression set wild expression to expression

Both expressions should evaluate to string. The first character of the first string should be the joker or wild card character and the second string should contain all the characters that the joker or wild card character matches. The command set joker alters the behavior of the character to be a joker character matching a single character in the compared string. The command set wild alters the behavior of the character to be a wild card character matching one or more characters in the compared string. For ex-ample if you may want the & character to match all hexadecimal characters the program has to execute:

set wild "&" to "0123456789abcdefABCDEF"

If a character is currently a joker of wild card character you can alter it to be a normal character issuing one of the commands

set no joker expression set no wild expression

where expression should evaluate to a string and the first character of the string should give the character to alter the behavior of.

The two commands are identical, you may always use one or the other; you can use set no joker for a character being currently wild card character and vice versa. You can execute the command even if the character is currently a normal character in the pattern matching game.

Using the commands now we can see that


[<<<] [>>>]