14.10 Parser Buffers

The parser buffer mechanism facilitates construction of parsers for complex grammars. It does this by providing an input stream with unbounded buffering and backtracking. The amount of buffering is under program control. The stream can backtrack to any position in the buffer.

The mechanism defines two data types: the parser buffer and the parser-buffer pointer. A parser buffer is like an input port with buffering and backtracking. A parser-buffer pointer is a pointer into the stream of characters provided by a parser buffer.

Note that all of the procedures defined here consider a parser buffer to contain a stream of Unicode characters.

There are several constructors for parser buffers:

— procedure: input-port->parser-buffer port

Returns a parser buffer that buffers characters read from port.

— procedure: substring->parser-buffer string start end

Returns a parser buffer that buffers the characters in the argument substring. This is equivalent to creating a string input port and calling input-port->parser-buffer, but it runs faster and uses less memory.

— procedure: string->parser-buffer string

Like substring->parser-buffer but buffers the entire string.
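As a brief illustration, the three constructors above might be used as follows. This is a minimal sketch, not a definitive usage pattern; the variable names are chosen only for this example, and open-input-string is the standard string-port constructor.

```scheme
;; Buffer an entire string.
(define from-string
  (string->parser-buffer "hello world"))

;; Buffer only the characters at indexes 6 (inclusive) to 11 (exclusive),
;; that is, "world".
(define from-substring
  (substring->parser-buffer "hello world" 6 11))

;; Buffer the characters delivered by an input port.
(define from-port
  (input-port->parser-buffer (open-input-string "hello world")))

;; The first character available from the substring buffer.
(define first-char
  (read-parser-buffer-char from-substring))   ; => #\w
```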

— procedure: source->parser-buffer source

Returns a parser buffer that buffers the characters returned by calling source. Source is a procedure of three arguments: a string, a start index, and an end index (in other words, a substring specifier). Each time source is called, it writes some characters in the substring, and returns the number of characters written. When there are no more characters available, it returns zero. It must not return zero in any other circumstance.
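The source protocol described above can be sketched with a procedure that hands out the characters of a fixed text. The helper make-string-source is hypothetical and exists only for this example. The sketch assumes that the string passed to source can be modified with string-set!, and that the buffer never calls source with an empty substring (otherwise the procedure would return zero while characters remain, which the protocol forbids).

```scheme
;; Hypothetical helper: returns a source procedure that delivers the
;; characters of TEXT, filling as much of the given substring as it can.
(define (make-string-source text)
  (let ((position 0))                   ; next character of TEXT to deliver
    (lambda (string start end)
      (let loop ((i start))
        (if (and (< i end) (< position (string-length text)))
            (begin
              ;; Assumption: STRING is mutable via string-set!.
              (string-set! string i (string-ref text position))
              (set! position (+ position 1))
              (loop (+ i 1)))
            ;; Number of characters written; zero once TEXT is exhausted.
            (- i start))))))

(define source-buffer
  (source->parser-buffer (make-string-source "hello")))

(define first-from-source
  (read-parser-buffer-char source-buffer))
```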

Parser buffers and parser-buffer pointers may be distinguished from other objects:

— procedure: parser-buffer? object

Returns #t if object is a parser buffer, otherwise returns #f.

— procedure: parser-buffer-pointer? object

Returns #t if object is a parser-buffer pointer, otherwise returns #f.

Characters can be read from a parser buffer much as they can be read from an input port. The parser buffer maintains an internal pointer indicating its current position in the input stream. Additionally, the buffer remembers all characters that were previously read, and can look at characters arbitrarily far ahead in the stream. It is this buffering capability that facilitates complex matching and backtracking.

— procedure: read-parser-buffer-char buffer

Returns the next character in buffer, advancing the internal pointer past that character. If there are no more characters available, returns #f and leaves the internal pointer unchanged.

— procedure: peek-parser-buffer-char buffer

Returns the next character in buffer, or #f if no characters are available. Leaves the internal pointer unchanged.

— procedure: parser-buffer-ref buffer index

Returns a character in buffer. Index is a non-negative integer specifying the character to be returned. If index is zero, returns the next available character; if it is one, returns the character after that, and so on. If index specifies a position after the last character in buffer, returns #f. Leaves the internal pointer unchanged.
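The difference between the three reading procedures can be seen in a short sketch; the variable names are chosen only for this example, and the values in the comments follow from the descriptions above.

```scheme
(define buffer (string->parser-buffer "abc"))

;; Peeking does not move the internal pointer.
(define peeked (peek-parser-buffer-char buffer))    ; => #\a

;; Reading returns the same character and advances past it.
(define consumed (read-parser-buffer-char buffer))  ; => #\a

;; Indexes given to parser-buffer-ref are relative to the internal pointer.
(define ahead-0 (parser-buffer-ref buffer 0))       ; => #\b
(define ahead-1 (parser-buffer-ref buffer 1))       ; => #\c
(define ahead-2 (parser-buffer-ref buffer 2))       ; => #f, past the end
```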

The internal pointer of a parser buffer can be read or written:

— procedure: get-parser-buffer-pointer buffer

Returns a parser-buffer pointer object corresponding to the internal pointer of buffer.

— procedure: set-parser-buffer-pointer! buffer pointer

Sets the internal pointer of buffer to the position specified by pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.

— procedure: get-parser-buffer-tail buffer pointer

Returns a newly-allocated string consisting of all of the characters in buffer that fall between pointer and buffer's internal pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.
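Backtracking with these procedures might look like the following sketch: a pointer is saved, some characters are consumed, the consumed text is extracted, and the buffer is then rewound. The variable names are chosen only for this example.

```scheme
(define buffer (string->parser-buffer "let x"))

;; Remember where this attempt started.
(define start (get-parser-buffer-pointer buffer))

;; Consume three characters.
(read-parser-buffer-char buffer)
(read-parser-buffer-char buffer)
(read-parser-buffer-char buffer)

;; The text between the saved pointer and the current position.
(define consumed-text (get-parser-buffer-tail buffer start))  ; => "let"

;; Backtrack: the next read starts again at the saved position.
(set-parser-buffer-pointer! buffer start)
(define after-rewind (peek-parser-buffer-char buffer))        ; => #\l
```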

— procedure: discard-parser-buffer-head! buffer

Discards all characters in buffer that have already been read; in other words, all characters prior to the internal pointer. After this operation has completed, it is no longer possible to move the internal pointer backwards past the current position by calling set-parser-buffer-pointer!.
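One possible use of discard-parser-buffer-head! is to release buffered text once a token has been fully recognized. The following is a sketch under that assumption; read-word is a hypothetical helper, not part of the parser buffer interface.

```scheme
;; Hypothetical helper: consumes a run of alphabetic characters, returns
;; it as a string, and then discards everything read so far.
(define (read-word buffer)
  (let ((start (get-parser-buffer-pointer buffer)))
    (let loop ()
      (let ((char (peek-parser-buffer-char buffer)))
        (if (and char (char-alphabetic? char))
            (begin
              (read-parser-buffer-char buffer)
              (loop)))))
    ;; Extract the word BEFORE discarding: START lies inside the discarded
    ;; range afterwards and must not be used again.
    (let ((word (get-parser-buffer-tail buffer start)))
      (discard-parser-buffer-head! buffer)
      word)))

(define word-buffer (string->parser-buffer "foo bar"))
(define first-word (read-word word-buffer))   ; => "foo"
```

After the call, the internal pointer sits just before the space, and pointers obtained earlier in the consumed region can no longer be passed to set-parser-buffer-pointer!.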

The next, rather large, set of procedures performs conditional matching against the contents of a parser buffer. All matching is performed relative to the buffer's internal pointer, so the first character to be matched against is the next character that would be returned by peek-parser-buffer-char. The returned value is always #t for a successful match, and #f otherwise. For procedures whose names do not end in `-no-advance', a successful match also moves the internal pointer of the buffer forward to the end of the matched text. In all other cases (a failed match, or any `-no-advance' procedure), the internal pointer is unchanged.

— procedure: match-parser-buffer-char buffer char
— procedure: match-parser-buffer-char-ci buffer char
— procedure: match-parser-buffer-not-char buffer char
— procedure: match-parser-buffer-not-char-ci buffer char
— procedure: match-parser-buffer-char-no-advance buffer char
— procedure: match-parser-buffer-char-ci-no-advance buffer char
— procedure: match-parser-buffer-not-char-no-advance buffer char
— procedure: match-parser-buffer-not-char-ci-no-advance buffer char

Each of these procedures compares the next character in buffer to char. The basic comparison match-parser-buffer-char compares the character to char using char=?. The procedures whose names contain the `-ci' modifier do case-insensitive comparison (i.e. they use char-ci=?). The procedures whose names contain the `not-' modifier are successful if the character does not match char.
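The following sketch shows how the modifiers combine; the variable names are chosen only for this example, and the results in the comments follow from the descriptions above.

```scheme
(define buffer (string->parser-buffer "Ab"))

;; Case-sensitive comparison of #\A with #\a fails; the pointer stays put.
(define exact (match-parser-buffer-char buffer #\a))           ; => #f

;; Case-insensitive comparison succeeds and advances past #\A.
(define folded (match-parser-buffer-char-ci buffer #\a))       ; => #t

;; The next character is #\b, which is not #\x, so the `not-' test
;; succeeds; the `-no-advance' variant leaves the pointer unchanged.
(define negated
  (match-parser-buffer-not-char-no-advance buffer #\x))        ; => #t

(define still-next (peek-parser-buffer-char buffer))           ; => #\b
```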

— procedure: match-parser-buffer-char-in-set buffer char-set
— procedure: match-parser-buffer-char-in-set-no-advance buffer char-set

These procedures compare the next character in buffer against char-set using char-set-member?.
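A common use of character-set matching is skipping over a run of characters. The sketch below assumes the predefined character set char-set:whitespace; skip-whitespace is a hypothetical helper written only for this example.

```scheme
;; Hypothetical helper: advances past any whitespace at the current
;; position.  Each successful match consumes one character; the loop
;; ends at the first non-whitespace character or at end of input.
(define (skip-whitespace buffer)
  (let loop ()
    (if (match-parser-buffer-char-in-set buffer char-set:whitespace)
        (loop))))

(define buffer (string->parser-buffer "   x"))
(skip-whitespace buffer)
(define after-skip (peek-parser-buffer-char buffer))   ; => #\x
```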

— procedure: match-parser-buffer-string buffer string
— procedure: match-parser-buffer-string-ci buffer string
— procedure: match-parser-buffer-string-no-advance buffer string
— procedure: match-parser-buffer-string-ci-no-advance buffer string

These procedures match string against buffer's contents. The `-ci' procedures do case-insensitive matching.
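For instance, recognizing a keyword might be sketched as follows; the input text and variable names are chosen only for this example.

```scheme
(define buffer (string->parser-buffer "Lambda (x) x"))

;; Case-sensitive match fails on the first character; nothing is consumed.
(define exact (match-parser-buffer-string buffer "lambda"))       ; => #f

;; Case-insensitive match succeeds and consumes all six characters.
(define folded (match-parser-buffer-string-ci buffer "lambda"))   ; => #t

;; Look ahead for " (" without consuming it.
(define lookahead
  (match-parser-buffer-string-no-advance buffer " ("))            ; => #t

(define next-char (peek-parser-buffer-char buffer))               ; => #\space
```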

— procedure: match-parser-buffer-substring buffer string start end
— procedure: match-parser-buffer-substring-ci buffer string start end
— procedure: match-parser-buffer-substring-no-advance buffer string start end
— procedure: match-parser-buffer-substring-ci-no-advance buffer string start end

These procedures match the specified substring against buffer's contents. The `-ci' procedures do case-insensitive matching.
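The substring variants avoid copying when the text to match is part of a larger string. In this sketch, the pattern string and variable names are chosen only for this example; indexes 2 (inclusive) to 5 (exclusive) of the pattern select "(x)".

```scheme
(define buffer (string->parser-buffer "(x) rest"))

;; Match only the "(x)" portion of the larger pattern string.
(define matched
  (match-parser-buffer-substring buffer "xx(x)yy" 2 5))   ; => #t

;; The three matched characters have been consumed.
(define next-char (peek-parser-buffer-char buffer))       ; => #\space
```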

The remaining procedures provide information that can be used to identify locations in a parser buffer's stream.

— procedure: parser-buffer-position-string pointer

Returns a string describing the location of pointer in terms of its character and line indexes. The resulting string is meant to be presented to an end user in order to direct their attention to a feature in the input stream. In this string, the indexes are presented as one-based numbers.

Pointer may alternatively be a parser buffer, in which case it is equivalent to having specified the buffer's internal pointer.

— procedure: parser-buffer-pointer-index pointer
— procedure: parser-buffer-pointer-line pointer

These procedures return the character index and the line index of pointer, respectively. Both indexes are zero-based.
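A parser might use these procedures to report where it stopped. The following is a sketch; the error-message wording and the variable names are assumptions made for this example, and the exact text produced by parser-buffer-position-string is not shown because it is not specified here.

```scheme
(define buffer (string->parser-buffer "ab\ncd"))

;; Consume four characters: #\a, #\b, the newline, and #\c.
(do ((i 0 (+ i 1)))
    ((= i 4))
  (read-parser-buffer-char buffer))

(define here (get-parser-buffer-pointer buffer))

;; Zero-based indexes, suitable for further computation.
(define char-index (parser-buffer-pointer-index here))   ; => 4
(define line-index (parser-buffer-pointer-line here))    ; => 1

;; One-based description, suitable for showing to a user.
(define message
  (string-append "Parse error at "
                 (parser-buffer-position-string here)))
```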