The parser buffer mechanism facilitates construction of parsers for complex grammars. It does this by providing an input stream with unbounded buffering and backtracking. The amount of buffering is under program control. The stream can backtrack to any position in the buffer.
The mechanism defines two data types: the parser buffer and the parser-buffer pointer. A parser buffer is like an input port with buffering and backtracking. A parser-buffer pointer is a pointer into the stream of characters provided by a parser buffer.
Note that all of the procedures defined here consider a parser buffer to contain a stream of Unicode characters.
There are several constructors for parser buffers:
Returns a parser buffer that buffers characters read from port.
Returns a parser buffer that buffers the characters in the argument substring. This is equivalent to creating a string input port and calling input-port->parser-buffer, but it runs faster and uses less memory.
Like substring->parser-buffer, but buffers the entire string.
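As a minimal sketch of the two string-based constructors, the following builds one buffer over a whole string and one over part of it. The names string->parser-buffer and read-parser-buffer-char do not appear in the text above and are assumed here, as is the argument order (string, start, end) for substring->parser-buffer.

```scheme
;; Assumption: string->parser-buffer is the whole-string constructor and
;; read-parser-buffer-char reads one character from a buffer.
(define text "hello world")

;; Buffers every character of text.
(define whole (string->parser-buffer text))

;; Buffers only the characters from index 6 up to (not including) 11,
;; that is, "world".  The (string start end) order is an assumption.
(define part (substring->parser-buffer text 6 11))

;; Each buffer starts reading at the beginning of its own range.
(define first-of-whole (read-parser-buffer-char whole))
(define first-of-part (read-parser-buffer-char part))
```

The two buffers are independent: reading from one does not move the internal pointer of the other.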
Returns a parser buffer that buffers the characters returned by calling source. Source is a procedure of three arguments: a string, a start index, and an end index (in other words, a substring specifier). Each time source is called, it writes some characters in the substring, and returns the number of characters written. When there are no more characters available, it returns zero. It must not return zero in any other circumstance.
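The source protocol described above can be sketched as follows. The helper make-string-source and the procedure drain are hypothetical, written only for illustration; source->parser-buffer and read-parser-buffer-char are assumed names for the constructor described here and for the character reader.

```scheme
;; Hypothetical helper: returns a source procedure that hands out the
;; characters of text, as many per call as the target substring can hold.
(define (make-string-source text)
  (let ((pos 0))
    (lambda (string start end)
      ;; Copy at most (- end start) characters into string[start, end).
      (let ((n (min (- end start) (- (string-length text) pos))))
        (do ((i 0 (+ i 1)))
            ((= i n))
          (string-set! string (+ start i) (string-ref text (+ pos i))))
        (set! pos (+ pos n))
        ;; n is zero only once text is exhausted, as the protocol requires.
        n))))

;; Hypothetical helper: reads buffer to the end and returns the
;; characters as a string.
(define (drain buffer)
  (let loop ((chars '()))
    (let ((char (read-parser-buffer-char buffer)))
      (if char
          (loop (cons char chars))
          (list->string (reverse chars))))))

;; Assumption: source->parser-buffer is the name of the constructor.
(define drained
  (drain (source->parser-buffer (make-string-source "abc"))))
```

A source that reads from a network connection or a decompressor would follow the same shape: fill what it can, report the count, and return zero only at the true end of input.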
Parser buffers and parser-buffer pointers may be distinguished from other objects:
Returns #t if object is a parser-buffer pointer, otherwise returns #f.
Characters can be read from a parser buffer much as they can be read from an input port. The parser buffer maintains an internal pointer indicating its current position in the input stream. Additionally, the buffer remembers all characters that were previously read, and can look at characters arbitrarily far ahead in the stream. It is this buffering capability that facilitates complex matching and backtracking.
Returns the next character in buffer, advancing the internal pointer past that character. If there are no more characters available, returns #f and leaves the internal pointer unchanged.
Returns the next character in buffer, or #f if no characters are available. Leaves the internal pointer unchanged.
Returns a character in buffer. Index is a non-negative integer specifying the character to be returned. If index is zero, returns the next available character; if it is one, returns the character after that, and so on. If index specifies a position after the last character in buffer, returns #f. Leaves the internal pointer unchanged.
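The difference between reading, peeking, and indexed lookahead can be sketched as below. Only peek-parser-buffer-char is named in this section; read-parser-buffer-char, parser-buffer-ref, and string->parser-buffer are assumed names for the procedures described above.

```scheme
;; Assumed names: string->parser-buffer, read-parser-buffer-char,
;; parser-buffer-ref.
(define buffer (string->parser-buffer "abc"))

;; Peeking returns the next character and leaves the pointer alone.
(define peeked (peek-parser-buffer-char buffer))

;; Reading returns the same character and advances past it.
(define consumed (read-parser-buffer-char buffer))

;; Indexed lookahead is relative to the internal pointer, which now
;; sits just before the second character.
(define next (parser-buffer-ref buffer 0))
(define after-next (parser-buffer-ref buffer 1))

;; An index beyond the last character yields #f.
(define past-end (parser-buffer-ref buffer 2))
```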
The internal pointer of a parser buffer can be read or written:
Returns a parser-buffer pointer object corresponding to the internal pointer of buffer.
Sets the internal pointer of buffer to the position specified by pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.
Returns a newly-allocated string consisting of all of the characters in buffer that fall between pointer and buffer's internal pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.
Discards all characters in buffer that have already been read; in other words, all characters prior to the internal pointer. After this operation has completed, it is no longer possible to move the internal pointer backwards past the current position by calling set-parser-buffer-pointer!.
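A typical backtracking sequence, sketched under the assumption that the string-extracting procedure described above is named get-parser-buffer-tail (string->parser-buffer and read-parser-buffer-char are likewise assumed names): save a pointer, read ahead, inspect what was consumed, rewind, and finally commit by discarding the head.

```scheme
(define buffer (string->parser-buffer "let x"))

;; Remember where this parsing attempt started.
(define start (get-parser-buffer-pointer buffer))

;; Consume three characters.
(read-parser-buffer-char buffer)
(read-parser-buffer-char buffer)
(read-parser-buffer-char buffer)

;; Assumed name: get-parser-buffer-tail returns the text between start
;; and the internal pointer.
(define consumed-text (get-parser-buffer-tail buffer start))

;; Backtrack: the next character is once again the first one.
(set-parser-buffer-pointer! buffer start)
(define after-rewind (peek-parser-buffer-char buffer))

;; Commit to one character, then release everything before the pointer.
(read-parser-buffer-char buffer)
(discard-parser-buffer-head! buffer)
;; start now lies in the discarded range and must not be passed to
;; set-parser-buffer-pointer! again.
```

Discarding the head is what keeps the amount of buffering under program control: a parser that never discards retains every character it has read.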
The next rather large set of procedures does conditional matching against the contents of a parser buffer. All matching is performed relative to the buffer's internal pointer, so the first character to be matched against is the next character that would be returned by peek-parser-buffer-char. The returned value is always #t for a successful match, and #f otherwise. For procedures whose names do not end in `-no-advance', a successful match also moves the internal pointer of the buffer forward to the end of the matched text; otherwise the internal pointer is unchanged.
Each of these procedures compares a single character in buffer to char. The basic comparison match-parser-buffer-char compares the character to char using char=?. The procedures whose names contain the `-ci' modifier do case-insensitive comparison (i.e. they use char-ci=?). The procedures whose names contain the `not-' modifier are successful if the character doesn't match char.
These procedures compare the next character in buffer against char-set using char-set-member?.
These procedures match string against buffer's contents. The `-ci' procedures do case-insensitive matching.
These procedures match the specified substring against buffer's contents. The `-ci' procedures do case-insensitive matching.
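The matching procedures can be sketched together in one short session. Only match-parser-buffer-char and peek-parser-buffer-char are named in this section; the other names used here (match-parser-buffer-string, match-parser-buffer-string-no-advance, match-parser-buffer-char-ci, match-parser-buffer-char-in-set, string->parser-buffer) are assumptions built from the `-ci' and `-no-advance' naming pattern described above and may differ between releases.

```scheme
(define buffer (string->parser-buffer "if xy"))

;; Lookahead only: succeeds, but the pointer stays at the start.
(define looked (match-parser-buffer-string-no-advance buffer "if"))

;; Same match, this time advancing past "if".
(define matched-if (match-parser-buffer-string buffer "if"))

;; The next character is a space, so this fails and the pointer
;; does not move.
(define mismatch (match-parser-buffer-char buffer #\x))

;; This succeeds and advances past the space.
(define matched-space (match-parser-buffer-char buffer #\space))

;; Case-insensitive match of the lowercase x against uppercase X.
(define matched-ci (match-parser-buffer-char-ci buffer #\X))

;; Character-set match of the final letter.
(define matched-letter
  (match-parser-buffer-char-in-set buffer char-set:alphabetic))

;; Everything has been consumed.
(define at-end (peek-parser-buffer-char buffer))
```

Because a failed match leaves the internal pointer unchanged, alternatives can be tried one after another without saving and restoring a pointer by hand.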
The remaining procedures provide information that can be used to identify locations in a parser buffer's stream.
Returns a string describing the location of pointer in terms of its character and line indexes. The resulting string is meant to be presented to an end user in order to direct their attention to a feature in the input stream. In this string, the indexes are presented as one-based numbers.
Pointer may alternatively be a parser buffer, in which case it is equivalent to having specified the buffer's internal pointer.
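A minimal sketch of reporting a location, assuming the procedure described above is named parser-buffer-position-string (string->parser-buffer and read-parser-buffer-char are also assumed names). The exact wording of the returned string is not specified here, so the example does not depend on it.

```scheme
(define buffer (string->parser-buffer "ab\ncd"))

;; Advance onto the second line.
(read-parser-buffer-char buffer)
(read-parser-buffer-char buffer)
(read-parser-buffer-char buffer)
(read-parser-buffer-char buffer)

(define here (get-parser-buffer-pointer buffer))

;; Assumed name: parser-buffer-position-string.  Passing the buffer
;; itself is described as equivalent to passing its internal pointer.
(define from-pointer (parser-buffer-position-string here))
(define from-buffer (parser-buffer-position-string buffer))
```

A parser might append such a string to an error message when a match fails, so that the user can find the offending input.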