Pattern Matching

Pattern matching is a way to test whether data has a particular structure. It can be used for data validation and as a means to extract the part of a data item that matches against a specified element of the pattern. The pattern matching operations are the query processor LIKE and UNLIKE keywords, the QMBasic MATCHES operator and MATCHFIELD() function, and the M search of ED. All of these compare a character string with a pattern template.

 

Pattern matching breaks the character set into three classes of character, each represented by a character type code:

A    Alphabetic, upper and lowercase A - Z
N    Numeric, digits 0 - 9
X    Any character, including alphanumerics

 

There are also three ways to specify how many characters are present:

4      Exactly 4 characters
2-7    Between 2 and 7 characters
0      Any number of characters, including none

 

The template consists of one or more concatenated items, each of which is either a length followed by a character type code or a quoted literal string:

 

0X          Zero or more characters of any type
nX          Exactly n characters of any type
n-mX        Between n and m characters of any type
0A          Zero or more alphabetic characters
nA          Exactly n alphabetic characters
n-mA        Between n and m alphabetic characters
0N          Zero or more numeric characters
nN          Exactly n numeric characters
n-mN        Between n and m numeric characters
"string"    A literal string which must match exactly. Either single or double quotation marks may be used.

 

The values n and m are integers with any number of digits. m must be greater than or equal to n.
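As a rough illustration of how these items combine, the following Python sketch translates a template into a regular expression. It is not part of QM: the function name template_to_regex, the helper names and the regex mapping are assumptions made for this example, and it handles only length/type pairs, the ... synonym for 0X and quoted literals (no tilde inversion, no unquoted literals).

```python
import re

# Hypothetical tokenizer for one template item: "...", a length/type pair,
# or a quoted literal. Tilde inversion and unquoted literals are not handled.
_ITEM = re.compile(
    r"(?P<dots>\.\.\.)"
    r"|(?P<n>\d+)(?:-(?P<m>\d+))?(?P<type>[ANX])"
    r"|(?P<quote>[\"'])(?P<literal>.*?)(?P=quote)"
)

# Assumed regex equivalents of the three character type codes.
_CLASS = {"A": "[A-Za-z]", "N": "[0-9]", "X": "."}


def template_to_regex(template):
    """Translate a template such as 1A1-3N3A into a compiled Python regex."""
    parts = []
    pos = 0
    while pos < len(template):
        item = _ITEM.match(template, pos)
        if item is None:
            raise ValueError(f"unsupported template item at offset {pos}")
        if item.group("dots"):
            parts.append(".*?")  # ... is a synonym for 0X
        elif item.group("type"):
            n = int(item.group("n"))
            m = item.group("m")
            code = item.group("type")
            if m is not None:
                count = "{%d,%d}" % (n, int(m))  # n-m: between n and m
            elif n == 0:
                count = "*"                      # 0: any number, including none
            else:
                count = "{%d}" % n               # n: exactly n
            # Variable-length X items are made non-greedy; A and N stay greedy.
            lazy = "?" if code == "X" and (m is not None or n == 0) else ""
            parts.append(_CLASS[code] + count + lazy)
        else:
            parts.append(re.escape(item.group("literal")))  # quoted literal
        pos = item.end()
    return re.compile("".join(parts), re.DOTALL)
```

With this sketch, template_to_regex("1A1-3N3A").fullmatch("A123BCD") returns a match object, while template_to_regex("1A3N3A").fullmatch("A12BCD") returns None. The whole string must match, so fullmatch is used rather than search.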

 

 

The 0X code is a wildcard that matches against anything. It has a commonly used synonym:

...    Zero or more characters of any type

 

 

The 0A, nA, 0N, nN and "string" patterns may be preceded by a tilde (~) to invert the match condition. The inversion applies to the character type, not to the test as a whole. For example, ~4N matches exactly four non-numeric characters, such as ABCD. It does not mean "any string that is not four numeric characters", so 12C4 does not match.
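The closest regular expression analogue of an inverted item is a negated character class. The Python sketch below is only an approximation of ~4N for illustration; the name NOT_4N is hypothetical and the mapping to a regex is an assumption, not QM's implementation.

```python
import re

# Assumed regex analogue of ~4N: exactly four characters, none of them a digit.
NOT_4N = re.compile(r"[^0-9]{4}", re.DOTALL)

print(NOT_4N.fullmatch("ABCD") is not None)   # four non-numeric characters
print(NOT_4N.fullmatch("12C4") is not None)   # contains digits, so no match
print(NOT_4N.fullmatch("ABCDE") is not None)  # wrong length, so no match
```

The first line prints True and the other two print False, mirroring the ABCD and 12C4 cases described above.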

 

A null string matches the patterns ..., 0A, 0X and 0N, their inverses (~0A, etc.) and "".

 

The 0X and n-mX patterns match against as few characters as necessary before control passes to the next pattern. For example, the string ABC123DEF matched against the pattern 0X2N0X matches the pattern components as ABC, 12 and 3DEF.

 

The 0N, n-mN, 0A, and n-mA patterns match against as many characters as possible. For example, the string ABC123DEF matched against the pattern 0X2-3N0X matches the pattern components as ABC, 123 and DEF.
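The difference between the two behaviours resembles non-greedy and greedy quantifiers in regular expressions. The Python sketch below reproduces the two ABC123DEF examples using hand-written regex equivalents of 0X2N0X and 0X2-3N0X. The helper split_components and the regex translations are assumptions for illustration only.

```python
import re


def split_components(regex, text):
    """Return the text matched by each group, or None if there is no match."""
    match = re.fullmatch(regex, text, re.DOTALL)
    return match.groups() if match else None


# 0X2N0X: both X items are non-greedy, 2N takes exactly two digits.
print(split_components(r"(.*?)([0-9]{2})(.*?)", "ABC123DEF"))
# ('ABC', '12', '3DEF')

# 0X2-3N0X: the N item is greedy, so it takes all three digits.
print(split_components(r"(.*?)([0-9]{2,3})(.*?)", "ABC123DEF"))
# ('ABC', '123', 'DEF')
```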

 

The template string may contain alternative patterns separated by value marks. The source data will match the overall pattern if any of the pattern values match.
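A minimal Python sketch of this "any alternative matches" rule follows. It assumes the value mark is character 253, as is conventional in multivalue systems, and uses a small hypothetical lookup table of regex equivalents for two templates rather than a full translator.

```python
import re

VM = chr(253)  # assumption: the value mark is character 253

# Hypothetical regex equivalents for the two templates used in this example.
EQUIVALENT = {"4N": r"[0-9]{4}", "2A": r"[A-Za-z]{2}"}


def matches_any(value, template):
    """True if value matches at least one value-mark-separated alternative."""
    return any(
        re.fullmatch(EQUIVALENT[alternative], value) is not None
        for alternative in template.split(VM)
    )


template = "4N" + VM + "2A"          # four digits OR two letters
print(matches_any("1234", template))  # True
print(matches_any("AB", template))    # True
print(matches_any("A1", template))    # False
```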

 

 

Examples

 

The string "A123BCD" matches each of the following patterns:

1A3N3A

1A1-3N3A

'A'1-3N3A

0A0N0A

1A...3A

1A~3A3A

and many more

 

It is often acceptable to omit the quotes around literal components. The above example would also match

A1-3N3A

The leading A cannot be mistaken for a character type code because it is not preceded by a length value. It is, however, recommended that the quotes should be included. Omitting the quotes in a pattern used in the MATCHFIELD() function may affect the function's behaviour because each character of the literal is counted as a separate component of the pattern.

 

 

A program might need to test whether data entered by a user is a non-negative integer (whole number) value. The QMBasic NUM() function can be used to test for numeric data but this would allow fractional or negative values. Testing against a pattern of "1-4N" would allow only integer values in the range 0 to 9999. To remove the upper limit, a pattern of 1N0N tests for one digit followed by any number of further digits, including none.
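The same validation can be sketched in Python for comparison. The function name is_non_negative_integer and its max_digits parameter are hypothetical. The regular expressions are assumed equivalents of 1-4N (with max_digits=4) and 1N0N (with no limit).

```python
import re


def is_non_negative_integer(text, max_digits=None):
    """Check that text consists only of digits, optionally limiting the length."""
    if max_digits is None:
        pattern = r"[0-9][0-9]*"             # like 1N0N: one digit, then any more
    else:
        pattern = r"[0-9]{1,%d}" % max_digits  # like 1-4N when max_digits is 4
    return re.fullmatch(pattern, text) is not None


print(is_non_negative_integer("9999", max_digits=4))   # True
print(is_non_negative_integer("10000", max_digits=4))  # False
print(is_non_negative_integer("10000"))                # True
print(is_non_negative_integer("-5"))                   # False
print(is_non_negative_integer("1.5"))                  # False
```

As with the pattern templates, a sign, a decimal point or an empty string causes the test to fail, which a general numeric check would not guarantee.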