The parser language is a declarative language for specifying a
parser procedure. A parser procedure is a procedure that
accepts a single parser-buffer argument and parses some of the input
from the buffer. If the parse is successful, the procedure returns a
vector of objects that are the result of the parse, and the internal
pointer of the parser buffer is advanced past the input that was
parsed. If the parse fails, the procedure returns #f and the
internal pointer is unchanged. This interface is much like that of a
matcher procedure, except that on success the parser procedure returns
a vector of values rather than #t.
The *parser special form is the interface between the parser language and Scheme.
The operand pexp is an expression in the parser language. The *parser expression expands into Scheme code that implements a parser procedure.
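As a minimal sketch of how a parser procedure built with *parser might be used, the following assumes MIT/GNU Scheme, where string->parser-buffer creates a parser buffer from a string and the matcher language provides the char-set expression (neither is described in this section). The names parse-word and word-buffer are hypothetical.

```scheme
;; Sketch only: assumes MIT/GNU Scheme with string->parser-buffer
;; and the matcher-language (char-set ...) expression.
(define parse-word
  (*parser (match (+ (char-set char-set:alphabetic)))))

(define word-buffer (string->parser-buffer "hello world"))

;; First call: the leading letters are parsed and the pointer advances.
(define first-result (parse-word word-buffer))

;; Second call: the next character is a space, so the parse fails
;; and the pointer stays where it was.
(define second-result (parse-word word-buffer))
```

Under these assumptions, first-result is a vector holding the string "hello" and second-result is #f.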
There are several primitive expressions in the parser language. The first two provide a bridge to the matcher language (see *Matcher):
The match expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the match expression is a vector of one element: a string containing the matched text.
The noise expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the noise expression is a vector of zero elements. (In other words, the text is matched and then thrown away.)
The mexp operand is often a known character or string, so in the case that mexp is a character or string literal, the noise expression can be abbreviated as the literal. In other words, `(noise "foo")' can be abbreviated just `"foo"'.
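The difference between match and noise, and the literal abbreviation, can be sketched as follows. This assumes MIT/GNU Scheme with string->parser-buffer and the matcher-language string expression; the three parser names are hypothetical.

```scheme
;; Sketch only: three parsers that all recognize the text "foo".
(define keep-foo   (*parser (match (string "foo"))))  ; keeps the text
(define drop-foo   (*parser (noise (string "foo"))))  ; discards the text
(define drop-foo-2 (*parser "foo"))                   ; abbreviation of noise

(define kept      (keep-foo   (string->parser-buffer "foo")))
(define dropped   (drop-foo   (string->parser-buffer "foo")))
(define dropped-2 (drop-foo-2 (string->parser-buffer "foo")))
```

Here kept should be a one-element vector containing "foo", while dropped and dropped-2 should both be empty vectors, which still count as success because they are not #f.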
Sometimes it is useful to be able to insert arbitrary values into the parser result. The values expression supports this. The expression arguments are arbitrary Scheme expressions that are evaluated at run time and returned in a vector. The values expression always succeeds and never modifies the internal pointer of the parser buffer.
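A small sketch of values, assuming MIT/GNU Scheme with string->parser-buffer; insert-tag and values-buffer are hypothetical names. The second parse on the same buffer illustrates that values consumed no input.

```scheme
;; Sketch only: values evaluates its operands as ordinary Scheme
;; expressions and returns them in a vector without reading input.
(define insert-tag (*parser (values 'tag (+ 1 2))))

(define values-buffer (string->parser-buffer "abc"))
(define tag-result (insert-tag values-buffer))

;; The pointer has not moved, so the whole string is still available.
(define rest-result
  ((*parser (match (string "abc"))) values-buffer))
```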
The discard-matched expression always succeeds, returning a vector of zero elements. In all other respects it is identical to the discard-matched expression in the matcher language.
Next there are several combinator expressions. Parameters named pexp are arbitrary expressions in the parser language. The first few combinators are direct equivalents of those in the matcher language.
The seq expression parses each of the pexp operands in order. If all of the pexp operands successfully match, the result is the concatenation of their values (by vector-append).
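To illustrate how seq concatenates results, here is a sketch that parses a hypothetical name=number form. It assumes MIT/GNU Scheme with string->parser-buffer and the matcher-language char-set expression; parse-assignment is an illustrative name.

```scheme
;; Sketch only: the "=" literal is noise, so it contributes no values;
;; the two match expressions each contribute one string.
(define parse-assignment
  (*parser
   (seq (match (+ (char-set char-set:alphabetic)))
        "="
        (match (+ (char-set char-set:numeric))))))

(define assignment-ok
  (parse-assignment (string->parser-buffer "x=42")))

;; No digits follow the "=", so the whole seq fails.
(define assignment-bad
  (parse-assignment (string->parser-buffer "x=")))
```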
The alt expression attempts to parse each pexp operand in order from left to right. The first one that successfully parses produces the result for the entire alt expression.
Like the alt expression in the matcher language, this expression participates in backtracking.
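A sketch of alt choosing between two alternatives, assuming MIT/GNU Scheme with string->parser-buffer; parse-boolean is a hypothetical name. Note that a successful parse of "false" yields a vector containing #f, which is distinct from the bare #f that signals failure.

```scheme
;; Sketch only: each branch discards its keyword and inserts a value.
(define parse-boolean
  (*parser
   (alt (seq "true"  (values #t))
        (seq "false" (values #f)))))

(define bool-true  (parse-boolean (string->parser-buffer "true")))
(define bool-false (parse-boolean (string->parser-buffer "false")))
(define bool-none  (parse-boolean (string->parser-buffer "maybe")))
```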
The * expression parses zero or more occurrences of pexp. The results of the parsed occurrences are concatenated together (by vector-append) to produce the expression's result.
Like the * expression in the matcher language, this expression participates in backtracking.
The + expression parses one or more occurrences of pexp. It is equivalent to
(seq pexp (* pexp))
The ? expression parses zero or one occurrences of pexp. It is equivalent to
(alt pexp (seq))
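The repetition combinators can be sketched together. This example assumes the one-or-more and zero-or-one forms are spelled + and ? as in the matcher language, and assumes MIT/GNU Scheme with string->parser-buffer and the matcher-language char-set expression; parse-digit-groups is a hypothetical name.

```scheme
;; Sketch only: zero or more digit groups, each optionally followed
;; by a comma that is matched as noise and dropped.
(define parse-digit-groups
  (*parser
   (* (seq (match (+ (char-set char-set:numeric)))
           (? ",")))))

(define groups-some
  (parse-digit-groups (string->parser-buffer "1,22,3")))

;; Zero occurrences is still a success, giving an empty vector.
(define groups-none
  (parse-digit-groups (string->parser-buffer "")))
```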
The next three expressions do not have equivalents in the matcher language. Each accepts a single pexp argument, which is parsed in the usual way. These expressions perform transformations on the returned values of a successful match.
The transform expression performs an arbitrary transformation of the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and must return a vector or #f. If it returns a vector, the parse is successful, and those are the resulting values. If it returns #f, the parse fails and the internal pointer of the parser buffer is returned to what it was before pexp was parsed.
For example:
(transform (lambda (v) (if (= 0 (vector-length v)) #f v)) ...)
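A complete sketch of transform used as a validity check, assuming MIT/GNU Scheme with string->parser-buffer and the matcher-language char-set expression. The name parse-small-number and the limit of 100 are illustrative only.

```scheme
;; Sketch only: convert the matched digits to a number, and reject
;; the parse by returning #f when the number is 100 or more.
(define parse-small-number
  (*parser
   (transform (lambda (v)
                (let ((n (string->number (vector-ref v 0))))
                  (if (< n 100)
                      (vector n)
                      #f)))
     (match (+ (char-set char-set:numeric))))))

(define small-ok  (parse-small-number (string->parser-buffer "42")))
(define small-bad (parse-small-number (string->parser-buffer "420")))
```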
The encapsulate expression transforms the values returned by parsing pexp into a single value. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and may return any Scheme object. The result of the encapsulate expression is a vector of length one containing that object. (And consequently encapsulate doesn't change the success or failure of pexp, only its value.)
For example:
(encapsulate vector->list ...)
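As a fuller sketch, encapsulate can bundle several parsed values into one object. This assumes MIT/GNU Scheme with string->parser-buffer and the matcher-language char-set expression; parse-binding is a hypothetical name.

```scheme
;; Sketch only: the inner seq yields two strings, and encapsulate
;; replaces them with a single pair.
(define parse-binding
  (*parser
   (encapsulate (lambda (v)
                  (cons (vector-ref v 0) (vector-ref v 1)))
     (seq (match (+ (char-set char-set:alphabetic)))
          "="
          (match (+ (char-set char-set:numeric)))))))

(define binding-result
  (parse-binding (string->parser-buffer "x=42")))
```

Under these assumptions binding-result is a vector of length one whose element is the pair ("x" . "42").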
The map expression performs a per-element transform on the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is mapped (by vector-map) over the values returned from the parse. The mapped values are returned as the result of the map expression. (And consequently map doesn't change the success or failure of pexp, nor the number of values returned.)
For example:
(map string->symbol ...)
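A complete sketch of map applied to several parsed strings, assuming MIT/GNU Scheme with string->parser-buffer and the matcher-language char-set expression, and assuming the one-or-more and zero-or-one forms are spelled + and ?. The name parse-number-list is hypothetical.

```scheme
;; Sketch only: each matched digit string is converted to a number;
;; the count of values is unchanged.
(define parse-number-list
  (*parser
   (map string->number
        (* (seq (match (+ (char-set char-set:numeric)))
                (? ","))))))

(define number-list
  (parse-number-list (string->parser-buffer "1,22,3")))
```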
Finally, as in the matcher language, we have sexp and with-pointer to support embedding Scheme code in the parser.
The sexp expression allows arbitrary Scheme code to be embedded inside a parser. The expression operand must evaluate to a parser procedure at run time; the procedure is called to parse the parser buffer. This is the parser-language equivalent of the sexp expression in the matcher language.
The case in which expression is a symbol is so common that it has an abbreviation: `(sexp symbol)' may be abbreviated as just symbol.
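The following sketch shows one parser procedure reused inside another, once through the symbol abbreviation and once through an explicit sexp. It assumes MIT/GNU Scheme with string->parser-buffer and the matcher-language char-set expression; parse-number and parse-sum are hypothetical names.

```scheme
;; Sketch only: parse-number is an ordinary parser procedure.
(define parse-number
  (*parser
   (map string->number
        (match (+ (char-set char-set:numeric))))))

;; The bare symbol and the (sexp ...) form both call parse-number;
;; the "+" string literal is noise and is dropped.
(define parse-sum
  (*parser (seq parse-number "+" (sexp parse-number))))

(define sum-operands
  (parse-sum (string->parser-buffer "1+2")))
```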
The with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then parses the pattern specified by pexp. Identifier must be a symbol. This is the parser-language equivalent of the with-pointer expression in the matcher language.
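One possible use of with-pointer is to record where a token started, for instance for later error reporting. The sketch below assumes MIT/GNU Scheme with string->parser-buffer and the matcher-language char-set expression; parse-located-word is a hypothetical name, and the pointer is treated as an opaque object.

```scheme
;; Sketch only: start is bound to the buffer pointer before the word
;; is parsed, and values places it in the result ahead of the text.
(define parse-located-word
  (*parser
   (with-pointer start
     (seq (values start)
          (match (+ (char-set char-set:alphabetic)))))))

(define located
  (parse-located-word (string->parser-buffer "abc")))
```

Under these assumptions located is a two-element vector: the saved pointer followed by the string "abc".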