The matcher language is a declarative language for specifying a matcher procedure. A matcher procedure is a procedure that accepts a single parser-buffer argument and returns a boolean value indicating whether the match it performs was successful. If the match succeeds, the internal pointer of the parser buffer is moved forward over the matched text. If the match fails, the internal pointer is unchanged.
For example, here is a matcher procedure that matches the character `a':
(lambda (b) (match-parser-buffer-char b #\a))
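To see the pointer behavior concretely, here is a minimal sketch that applies such a procedure to a small buffer. It assumes MIT/GNU Scheme's parser-buffer procedures `string->parser-buffer` and `peek-parser-buffer-char`; the names `match-a`, `buffer`, `first-try`, `second-try`, and `next-char` are made up for illustration.

```scheme
;; A hand-written matcher procedure for the character a.
(define (match-a buffer)
  (match-parser-buffer-char buffer #\a))

;; Illustrative input; any string would do.
(define buffer (string->parser-buffer "ab"))

;; The first call matches a and moves the pointer forward.
(define first-try (match-a buffer))

;; The next character is b, so the second call fails
;; and leaves the pointer where it was.
(define second-try (match-a buffer))

;; Peeking does not move the pointer; it shows the next character.
(define next-char (peek-parser-buffer-char buffer))
```

After these definitions, first-try should be true, second-try false, and next-char the character `b'.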
Here is another example that matches two given characters, c1 and c2, in sequence:
(lambda (b)
  (let ((p (get-parser-buffer-pointer b)))
    (if (match-parser-buffer-char b c1)
        (if (match-parser-buffer-char b c2)
            #t
            (begin
              (set-parser-buffer-pointer! b p)
              #f))
        #f)))
This code is clear, but it has lots of details that get in the way of understanding what it is doing. Here is the same example in the matcher language:
(*matcher (seq (char c1) (char c2)))
This is much simpler and more intuitive. And it generates virtually the same code:
(pp (*matcher (seq (char c1) (char c2))))
-| (lambda (#[b1])
-|   (let ((#[p1] (get-parser-buffer-pointer #[b1])))
-|     (and (match-parser-buffer-char #[b1] c1)
-|          (if (match-parser-buffer-char #[b1] c2)
-|              #t
-|              (begin
-|                (set-parser-buffer-pointer! #[b1] #[p1])
-|                #f)))))
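The following sketch shows the generated matcher in use, including the pointer being restored after a partial match. It assumes the parser language has been loaded with `(load-option '*parser)`; the bindings `match-pair`, `buffer`, `failed`, `after-failure`, and `ok` are illustrative names, and the values chosen for c1 and c2 are arbitrary.

```scheme
;; Assumption: the parser language must be loaded before use.
(load-option '*parser)

;; Arbitrary characters for the two operands.
(define c1 #\a)
(define c2 #\b)

(define match-pair (*matcher (seq (char c1) (char c2))))

;; On "acab" the first character matches but the second does not,
;; so the whole match fails and the pointer is put back.
(define buffer (string->parser-buffer "acab"))
(define failed (match-pair buffer))
(define after-failure (peek-parser-buffer-char buffer))

;; On "ab" both characters match.
(define ok (match-pair (string->parser-buffer "ab")))
```

Here failed should be false, after-failure should still be the first character `a', and ok should be true.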
Now that we have seen an example of the language, it's time to look at the details. The *matcher special form is the interface between the matcher language and Scheme.
In the form `(*matcher mexp)', the operand mexp is an expression in the matcher language. The *matcher expression expands into Scheme code that implements a matcher procedure.
Here are the predefined matcher expressions. New matcher expressions can be defined using the macro facility (see Parser-language Macros). We will start with the primitive expressions.
The char, char-ci, not-char, and not-char-ci expressions match a given character. In each case, the expression operand is a Scheme expression that must evaluate to a character at run time. The `-ci' expressions do case-insensitive matching. The `not-' expressions match any character other than the given one.
The string and string-ci expressions match a given string. The expression operand is a Scheme expression that must evaluate to a string at run time. The string-ci expression does case-insensitive matching.
These expressions match a single character that is a member of a given character set. The expression operand is a Scheme expression that must evaluate to a character set at run time.
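As a rough illustration of the character, string, and character-set primitives, the sketch below builds one matcher of each kind. It assumes `(load-option '*parser)` is needed first and uses the built-in character set `char-set:numeric`; the matcher names and the sample inputs are hypothetical.

```scheme
;; Assumption: the parser language must be loaded before use.
(load-option '*parser)

;; Case-insensitive string match.
(define match-keyword (*matcher (string-ci "select")))

;; Any single numeric character.
(define match-digit (*matcher (char-set char-set:numeric)))

;; Any single character other than a comma.
(define match-non-comma (*matcher (not-char #\,)))

(define keyword-ok (match-keyword (string->parser-buffer "SELECT *")))
(define digit-ok (match-digit (string->parser-buffer "7up")))
(define comma-rejected (match-non-comma (string->parser-buffer ",x")))
```

The first two results should be true; the last should be false, because the input begins with the excluded character.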
The end-of-input expression is successful only when there are no more characters available to be matched.
The discard-matched expression always successfully matches the null string. However, it isn't meant to be used as a matching expression; it is used for its effect. discard-matched causes all of the buffered text prior to this point to be discarded (i.e. it calls discard-parser-buffer-head! on the parser buffer).
Note that discard-matched may not be used in certain places in a matcher expression. The reason for this is that it deliberately discards information needed for backtracking, so it may not be used in a place where subsequent backtracking will need to back over it. As a rule of thumb, use discard-matched only in the last operand of a seq or alt expression (including any seq or alt expressions in which it is indirectly contained).
In addition to the above primitive expressions, there are two convenient abbreviations. A character literal (e.g. `#\A') is a legal primitive expression, and is equivalent to a char expression with that literal as its operand (e.g. `(char #\A)'). Likewise, a string literal is equivalent to a string expression (e.g. `(string "abc")').
Next there are several combinator expressions. These closely correspond to similar combinators in regular expressions. Parameters named mexp are arbitrary expressions in the matcher language.
The seq expression matches each mexp operand in sequence. For example,
(seq (char-set char-set:alphabetic) (char-set char-set:numeric))
matches an alphabetic character followed by a numeric character, such as `H4'.
Note that if there are no mexp operands, the seq expression successfully matches the null string.
The alt expression attempts to match each mexp operand in order from left to right. The first one that successfully matches becomes the match for the entire alt expression.
The alt expression participates in backtracking. If one of the mexp operands matches, but the overall match in which this expression is embedded fails, the backtracking mechanism will cause the alt expression to try the remaining mexp operands. For example, if the expression
(seq (alt "ab" "a") "b")
is matched against the text `abc', the alt expression will initially match its first operand. But it will then fail to match the second operand of the seq expression. This will cause the alt to be restarted, at which time it will match `a', and the overall match will succeed.
Note that if there are no mexp operands, the alt match will always fail.
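The backtracking example can be tried directly. The sketch below assumes `(load-option '*parser)` and the parser-buffer procedure `get-parser-buffer-tail`, which returns the text between a saved pointer and the current one; the names `match-ab`, `buffer`, `start`, `matched?`, and `consumed` are illustrative.

```scheme
;; Assumption: the parser language must be loaded before use.
(load-option '*parser)

(define match-ab (*matcher (seq (alt "ab" "a") "b")))

(define buffer (string->parser-buffer "abc"))

;; Remember where matching started so the matched text can be read back.
(define start (get-parser-buffer-pointer buffer))

;; The first alternative "ab" leaves "c" for the final "b", which fails;
;; the alt is then retried with "a", after which "b" matches.
(define matched? (match-ab buffer))

;; Text consumed by the successful match.
(define consumed (get-parser-buffer-tail buffer start))
```

According to the description of alt, matched? should be true and consumed should be the two characters `ab', with `c' left unread.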
The * expression matches zero or more occurrences of the mexp operand. (Consequently this match always succeeds.)
The * expression participates in backtracking; if it matches N occurrences of mexp, but the overall match fails, it will backtrack to N-1 occurrences and continue. If the overall match continues to fail, the * expression will continue to backtrack until there are no occurrences left.
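A small sketch of this behavior, assuming `(load-option '*parser)`; the matcher names and inputs are made up for illustration. In the second matcher the repetition must give back one occurrence before the rest of the sequence can match.

```scheme
;; Assumption: the parser language must be loaded before use.
(load-option '*parser)

;; Zero occurrences is still a successful match.
(define match-as (*matcher (* (char #\a))))
(define empty-ok (match-as (string->parser-buffer "bbb")))

;; The repetition first takes all three a's, leaving only "b" for the
;; final "ab"; backing off to two a's lets "ab" match.
(define match-run (*matcher (seq (* (char #\a)) "ab")))
(define run-ok (match-run (string->parser-buffer "aaab")))
```

Both empty-ok and run-ok should be true if the repetition backtracks as described.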
The + expression matches one or more occurrences of the mexp operand. It is equivalent to
(seq mexp (* mexp))
The ? expression matches zero or one occurrences of the mexp operand. It is equivalent to
(alt mexp (seq))
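The repetition combinators compose naturally. As a sketch, here is a matcher for an optionally signed run of digits, written with the `+' (one or more) and `?' (zero or one) combinators and the character-literal abbreviation. It assumes `(load-option '*parser)`; the name `match-integer` and the sample inputs are hypothetical.

```scheme
;; Assumption: the parser language must be loaded before use.
(load-option '*parser)

;; An optional sign followed by at least one numeric character.
(define match-integer
  (*matcher
   (seq (? (alt #\+ #\-))
        (+ (char-set char-set:numeric)))))

(define signed-ok (match-integer (string->parser-buffer "-42")))
(define unsigned-ok (match-integer (string->parser-buffer "7")))

;; A sign with no digits after it does not match.
(define sign-only (match-integer (string->parser-buffer "+")))
```

The first two results should be true and the last false.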
The sexp expression allows arbitrary Scheme code to be embedded inside a matcher. The expression operand must evaluate to a matcher procedure at run time; the procedure is called to match the parser buffer. For example,
(*matcher (seq "a" (sexp parse-foo) "b"))
expands to
(lambda (#[b1])
  (let ((#[p1] (get-parser-buffer-pointer #[b1])))
    (and (match-parser-buffer-char #[b1] #\a)
         (if (parse-foo #[b1])
             (if (match-parser-buffer-char #[b1] #\b)
                 #t
                 (begin
                   (set-parser-buffer-pointer! #[b1] #[p1])
                   #f))
             (begin
               (set-parser-buffer-pointer! #[b1] #[p1])
               #f)))))
The case in which expression is a symbol is so common that it has an abbreviation: `(sexp symbol)' may be abbreviated as just symbol.
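For a runnable variant, the sketch below embeds a hand-written matcher procedure in a larger matcher, once with sexp and once with the bare-symbol abbreviation. It assumes `(load-option '*parser)` and the procedures `match-parser-buffer-char-in-set` and `string->char-set`; the names `vowels`, `match-vowel`, `match-bvt`, and `match-bvt-short` are invented for the example.

```scheme
;; Assumption: the parser language must be loaded before use.
(load-option '*parser)

(define vowels (string->char-set "aeiou"))

;; An ordinary matcher procedure: takes a parser buffer, returns a boolean.
(define (match-vowel buffer)
  (match-parser-buffer-char-in-set buffer vowels))

;; Explicit sexp form.
(define match-bvt (*matcher (seq "b" (sexp match-vowel) "t")))

;; Same matcher using the symbol abbreviation.
(define match-bvt-short (*matcher (seq "b" match-vowel "t")))

(define bat-ok (match-bvt (string->parser-buffer "bat")))
(define bit-ok (match-bvt-short (string->parser-buffer "bit")))
(define bxt-rejected (match-bvt (string->parser-buffer "bxt")))
```

The first two results should be true and the last false.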
The with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then matches the pattern specified by mexp. Identifier must be a symbol.
This is meant to be used in conjunction with sexp, as a way to capture a pointer to a part of the input stream that is outside the sexp expression. An example of the use of with-pointer appears above (see with-pointer example).
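As a further sketch, the following matcher uses with-pointer together with sexp to record the text of a matched word. It assumes `(load-option '*parser)`, that the bound identifier is visible to the Scheme expression inside sexp, and that `get-parser-buffer-tail` returns the text between the saved pointer and the current one. The names `last-word`, `match-word`, `start`, and `word-ok` are hypothetical.

```scheme
;; Assumption: the parser language must be loaded before use.
(load-option '*parser)

;; Holds the most recently matched word (illustrative side channel).
(define last-word #f)

(define match-word
  (*matcher
   (with-pointer start
     (seq (+ (char-set char-set:alphabetic))
          ;; An embedded matcher procedure that records the text matched
          ;; since start and then reports success.
          (sexp (lambda (buffer)
                  (set! last-word (get-parser-buffer-tail buffer start))
                  #t))))))

(define word-ok (match-word (string->parser-buffer "hello, world")))
```

If the assumptions hold, word-ok is true and last-word is the string `hello'. Recording through a global variable keeps the sketch short; it is not meant as a recommended design.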