vendor/bernparsec - Bern Libraries

The core idea

A parser is a small recipe for reading text. You build large parsers out of small ones using combinators - functions that take parsers and return a new, bigger parser. A parser is not a function; it is a plain object that describes what to read, and a single interpreter walks that description against the input. That makes parsers cheap to build, easy to print, and composable.

import vendor/bernparsec

greeting = before(string("hello"), eof())   -- "hello" then end-of-input
parseTest(greeting, "hello")
-- Output: "hello"

Most combinators on this page come in two flavours: the terse core name (used throughout) and a plain-English friendly alias (listed in the aliases table). They are interchangeable - mix whichever reads best.

Running a parser

Running a parser against input produces a result object:

Field	Meaning
`result["ok"]`	`true` if the parser matched
`result["value"]`	the parsed value (when `ok`)
`result["state"]`	where parsing stopped - `index`, `line`, `column`
`result["expected"]`	list of what was expected (when it failed)
`result["message"]`	a human-readable error (when it failed)

parse(parser, source_name, input) → result

Run a parser, labelling the input source_name for error messages. (run is the friendly alias.)

r = parse(string("hi"), "greeting.txt", "hi there")
r["ok"]
-- Output: true
r["value"]
-- Output: "hi"
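The failure-side fields from the table can be read the same way. This is a minimal sketch, assuming `state` is itself a dictionary keyed by the names listed above (the table names the keys but does not show the access syntax). Outputs other than `ok` are left out because their exact formatting is not shown on this page.

```bern
import vendor/bernparsec

r = parse(char('a'), "in", "zzz")
r["ok"]
-- Output: false
r["message"]            -- human-readable error text
r["expected"]           -- list of what the parser was looking for
r["state"]["line"]      -- assumed access pattern: state as a nested dictionary
r["state"]["column"]
```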

parseTest(parser, input) → value | string

Run a parser and return the parsed value on success, or a pretty error string on failure - perfect for the REPL. (Alias: quickParse.)

parseTest(char('a'), "abc")
-- Output: 'a'
parseTest(char('a'), "xyz")
-- Output: ":1:1: unexpected input; expected: 'a'"

errorBundlePretty(result) → string

Format any result as a one-line source:line:column: message; expected: … string. (Alias: prettyError.)

r = parse(char('a'), "in", "zzz")
errorBundlePretty(r)
-- Output: "in:1:1: unexpected input; expected: 'a'"

Primitive parsers

char(c) → parser

Match exactly the character c.

parseTest(char('a'), "abc")
-- Output: 'a'

string(token) → parser

Match the exact text token.

parseTest(string("let"), "let x = 1")
-- Output: "let"

anySingle() → parser

Match any single character.

parseTest(anySingle(), "Z!")
-- Output: 'Z'

oneOf(chars) → parser

Match any one character that appears in chars.

parseTest(oneOf("+-*/"), "*3")
-- Output: '*'

noneOf(chars) → parser

Match any one character that does not appear in chars.

parseTest(noneOf(" \t\n"), "hi")
-- Output: 'h'

satisfy(pred) · satisfy(pred, label) → parser

Match a character for which pred(c) is true. The optional label names it in error messages.

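-- pred receives the candidate character and must return true or false
isX = satisfy(\c -> c == 'x', "an x")
parseTest(isX, "xyz")
-- Output: 'x'

-- for a fixed set of characters, oneOf plus label does the same job
vowel = label(oneOf("aeiou"), "vowel")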

eof() → parser

Succeed only at the end of the input (consumes nothing).

parseTest(before(string("ok"), eof()), "ok")
-- Output: "ok"
parseTest(before(string("ok"), eof()), "okay")
-- Output: (error: expected end of input)

pure(value) → parser

Always succeed, yielding value without consuming input. (Alias: succeed.)

parseTest(pure(42), "anything")
-- Output: 42

fail_parser(message) → parser

Always fail with message. (Alias: failWith.)

parseTest(fail_parser("nope"), "abc")
-- Output: ":1:1: nope; expected: "

Character classes

Ready-made satisfy parsers for common classes, each with a sensible error label.

digitChar() → parser

One digit 0–9.

parseTest(digitChar(), "7x")
-- Output: '7'

letterChar() → parser

One letter a–z / A–Z.

parseTest(letterChar(), "Bern")
-- Output: 'B'

alphaNumChar() → parser

One letter or digit.

parseTest(some(alphaNumChar()), "id42 ")
-- Output: ['i', 'd', '4', '2']

spaceChar() · newline() · tab() → parser

One whitespace character, a newline, or a tab respectively.

parseTest(spaceChar(), " x")
-- Output: ' '

space() · space1() → parser

space() matches zero or more whitespace characters; space1() requires at least one.

parseTest(space(), "   hi")
-- Output: [' ', ' ', ' ']

Sequencing & mapping

mapP(parser, mapper) → parser

Run parser, then transform its result with mapper. (Alias: mapResult.)

parseTest(mapP(digitChar(), char_to_digit), "9")
-- Output: 9   (the integer, not the character)

thenP(left, right) → parser

Run both in order, keep the right value. (Alias: keepRight.)

parseTest(thenP(char('$'), decimal()), "$50")
-- Output: 50   (the '$' is consumed but discarded)

before(left, right) → parser

Run both in order, keep the left value. (Alias: keepLeft.)

parseTest(before(decimal(), char('%')), "75%")
-- Output: 75

bind(parser, to_parser) → parser

Result-dependent sequencing: run parser, then feed its value to to_parser, which returns the next parser to run. (Alias: andThen.)

-- read a digit n, then read exactly that many letters
repeated = bind(mapP(digitChar(), char_to_digit), \n -> count(n, letterChar()))
parseTest(repeated, "3abc")
-- Output: ['a', 'b', 'c']

Choice & failure

orElse(left, right) → parser

Try left; if it fails, try right. (Alias: orTry.)

yesNo = orElse(string("yes"), string("no"))
parseTest(yesNo, "no")
-- Output: "no"

choice(parsers) → parser

Try each parser in the list, returning the first that matches. (Alias: firstOf.)

keyword = choice([string("if"), string("else"), string("end")])
parseTest(keyword, "else ...")
-- Output: "else"

label(parser, expected_label) → parser

Give a parser a friendly name for error messages. (Alias: describe.)

p = label(digitChar(), "a single digit")
errorBundlePretty(parse(p, "in", "x"))
-- Output: "in:1:1: unexpected input; expected: a single digit"

try(parser) → parser

Provided for megaparsec familiarity; in BernParsec it returns the parser unchanged.

parseTest(try(string("ab")), "abc")
-- Output: "ab"

Repetition

many(parser) → parser

Match zero or more times, collecting a list. (Alias: zeroOrMore.)

parseTest(many(digitChar()), "123abc")
-- Output: ['1', '2', '3']
parseTest(many(digitChar()), "abc")
-- Output: []

some(parser) → parser

Match one or more times. (Alias: oneOrMore.)

parseTest(some(letterChar()), "Bern2")
-- Output: ['B', 'e', 'r', 'n']

count(n, parser) → parser

Match exactly n times. (Alias: repeatExactly.)

parseTest(count(4, digitChar()), "2026!")
-- Output: ['2', '0', '2', '6']

optional(parser) → parser

Match at most once, returning a Maybe (BPJust value or BPNothing). (Alias: optionally.)

parseTest(optional(char('-')), "-5")
-- Output: BPJust('-')
parseTest(optional(char('-')), "5")
-- Output: BPNothing()

sepBy(parser, separator) → parser

Zero or more items separated by separator. (Alias: separatedBy.)

csv = sepBy(decimal(), char(','))
parseTest(csv, "1,2,3")
-- Output: [1, 2, 3]
parseTest(csv, "")
-- Output: []

sepBy1(parser, separator) → parser

One or more items separated by separator. (Alias: separatedBy1.)

parseTest(sepBy1(letterChar(), char('-')), "a-b-c")
-- Output: ['a', 'b', 'c']

manyTill(parser, end_parser) → parser

Repeat parser until end_parser matches; the terminator is consumed. (Alias: repeatUntil.)

comment = thenP(string("--"), manyTill(anySingle(), newline()))
parseTest(comment, "-- a note\nrest")
-- Output: [' ', 'a', ' ', 'n', 'o', 't', 'e']

between(open, close, parser) → parser

Match parser wrapped by open and close, keeping only the inner value. (Alias: surroundedBy.)

quoted = between(char('"'), char('"'), many(noneOf("\"")))
parseTest(quoted, "\"hi\"")
-- Output: ['h', 'i']

Look-around

lookAhead(parser) → parser

Match parser but consume no input - peek at what's next. (Alias: peek.)

parseTest(lookAhead(string("ab")), "abc")
-- Output: "ab"   (the cursor stays at position 0)

notFollowedBy(parser) → parser

Succeed only if parser does not match here; consumes nothing. (Alias: notAhead.)

-- "let" not immediately followed by a letter (so not "letter")
kw = before(string("let"), notFollowedBy(letterChar()))
parseTest(kw, "let x")
-- Output: "let"

Recursion & operators

lazy(thunk) → parser

Defer building a parser until it is needed - the key to recursive grammars, where a parser refers to itself. (Alias: deferred.)

def value() -> orElse(decimal(), between(char('('), char(')'), lazy(value)))
parseTest(value(), "(((7)))")
-- Output: 7

chainl1(term, op, combine) → parser

Parse one or more terms separated by op, folding them left-associatively with combine(left, op_value, right). Ideal for arithmetic. (Alias: chainLeft.)

plus = mapP(char('+'), \_ -> "+")
sum  = chainl1(decimal(), plus, \a, _, b -> a + b)
parseTest(sum, "1+2+3")
-- Output: 6
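Left-associativity only becomes visible with an operator such as subtraction. The sketch below reuses the pattern above with an illustrative `minus` / `diff` pair (names not part of the library). Folding from the left gives (10 - 3) - 2, whereas a right fold would give 10 - (3 - 2) = 9.

```bern
import vendor/bernparsec

minus = mapP(char('-'), \_ -> "-")
diff  = chainl1(decimal(), minus, \a, _, b -> a - b)
parseTest(diff, "10-3-2")
-- Output: 5   (folded as (10 - 3) - 2)
```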

Lexing & whitespace

lexeme(parser) → parser

Run parser, then skip any trailing whitespace - so tokens don't have to worry about the spaces after them. (Alias: token.)

parseTest(lexeme(decimal()), "42   ")
-- Output: 42   (trailing spaces consumed)
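Because lexeme only skips whitespace after a token, whitespace at the very start of the input has to be skipped once up front. One way to do that with the combinators on this page is sketched below; the names `num` and `whole` are illustrative.

```bern
import vendor/bernparsec

num   = lexeme(decimal())
-- skip leading whitespace once, parse the token, then require end of input
whole = before(thenP(space(), num), eof())
parseTest(whole, "  42  ")
-- Output: 42
```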

symbol(s) → parser

Match the exact text s as a token, skipping trailing whitespace.

parseTest(sepBy1(symbol("ok"), symbol(",")), "ok , ok , ok")
-- Output: ["ok", "ok", "ok"]

Numbers

decimal() → parser

An unsigned integer (one or more digits). (Aliases: digits, wholeNumber.)

parseTest(decimal(), "123")
-- Output: 123

float() → parser

A number with a fractional part (digits, a dot, digits). (Alias: decimalNumber.)

parseTest(float(), "12.50")
-- Output: 12.5

signed(number_parser) → parser

Allow an optional leading + or - in front of a number parser, negating on -. (Alias: withSign.)

parseTest(signed(float()), "-3.5")
-- Output: -3.5

integer() → parser

A signed integer - signed(decimal()). (Alias: integerNumber.)

parseTest(integer(), "-42")
-- Output: -42

Brackets

Wrappers that parse something inside brackets and skip inner whitespace (built on symbol).

parens(parser) · brackets(parser) · braces(parser) → parser

Parse parser inside ( ), [ ], or { }. (Aliases: inParens, inBrackets, inBraces.)

list = brackets(sepBy(lexeme(decimal()), symbol(",")))
parseTest(list, "[ 1, 2, 3 ]")
-- Output: [1, 2, 3]

Operators (Bern 2.1)

BernParsec ships a set of infix operators, inspired by Haskell's megaparsec and Applicative, so a grammar reads almost like the thing it describes. They are just the combinators above with symbols, so everything you already know still applies.

Operator	Same as	Meaning
`f <$> p`	`mapP(p, f)`	apply `f` to whatever `p` parses
`pf <*> px`	applicative	sequence two parsers; apply `pf`'s result to `px`'s
`p <* q`	`before(p, q)`	run both in order, keep the left result
`p *> q`	`thenP(p, q)`	run both in order, keep the right result
`p <|> q`	`orElse(p, q)`	try `p`; if it fails, try `q`
`p >>= f`	`bind(p, f)`	run `p`, then build the next parser from its value
`p <?> "lbl"`	`label(p, "lbl")`	rename `p` for nicer error messages

Precedence mirrors Haskell: <$>, <*>, <* and *> bind tightest, then <|>, then >>=, then <?> (the loosest, so it labels a whole parser). All are left-associative.
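To see how those precedence levels group an expression, here is a small sketch built only from the operators in the table. The names `amount` and `amount2` are illustrative; the second definition spells out the grouping with the named combinators.

```bern
import vendor/bernparsec

-- *> and <* bind tighter than <|>, and <?> is loosest, so this groups as
--   ((char('$') *> decimal()) <|> (decimal() <* char('%'))) <?> "an amount"
amount = char('$') *> decimal() <|> decimal() <* char('%') <?> "an amount"

-- the same parser spelled with the named combinators
amount2 = label(orElse(thenP(char('$'), decimal()), before(decimal(), char('%'))), "an amount")

parseTest(amount, "$50")
-- Output: 50
parseTest(amount, "75%")
-- Output: 75
```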

A grammar that reads like its shape

import core
import vendor/bernparsec

-- a token: a signed number, trailing spaces eaten
num = lexeme(signed(decimal()))

-- "a price tag like $12, keep the number"
price = char('$') *> num

parseTest(price, "$12")
-- Output: 12

-- "two numbers separated by a comma, in brackets"
pair = (\x, y -> [x, y]) <$> (num <* symbol(",")) <*> num
parseTest(inBrackets(pair), "[ 3, 4 ]")
-- Output: [3, 4]

-- choice + label
keyword = string("if") <|> string("else") <|> string("end") <?> "a keyword"
parseTest(keyword, "else")
-- Output: "else"

Note: these operators are defined inside vendor/bernparsec with Bern's custom-operator feature, and they reach your file through import like any other definition.

Friendly aliases

Most core combinators have a plain-English alias that forwards to them. Use whichever reads better - or mix both in the same grammar.

Friendly alias	Core name
`literalChar`	`char`
`anyChar`	`anySingle`
`charIn`	`oneOf`
`charNotIn`	`noneOf`
`charWhere`	`satisfy`
`text`	`string`
`endOfInput`	`eof`
`succeed`	`pure`
`failWith`	`fail_parser`
`digit` / `letter`	`digitChar` / `letterChar`
`letterOrDigit`	`alphaNumChar`
`whitespaceChar`	`spaceChar`
`mapResult`	`mapP`
`andThen`	`bind`
`keepRight` / `keepLeft`	`thenP` / `before`
`orTry`	`orElse`
`firstOf`	`choice`
`zeroOrMore` / `oneOrMore`	`many` / `some`
`repeatExactly`	`count`
`optionally`	`optional`
`separatedBy` / `separatedBy1`	`sepBy` / `sepBy1`
`repeatUntil`	`manyTill`
`peek` / `notAhead`	`lookAhead` / `notFollowedBy`
`describe`	`label`
`deferred`	`lazy`
`surroundedBy`	`between`
`chainLeft`	`chainl1`
`token`	`lexeme`
`digits` / `wholeNumber`	`decimal`
`integerNumber`	`integer`
`decimalNumber`	`float`
`withSign`	`signed`
`inParens` / `inBrackets` / `inBraces`	`parens` / `brackets` / `braces`
`run` / `quickParse` / `prettyError`	`parse` / `parseTest` / `errorBundlePretty`
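As a quick illustration of the table, the opening example of this page can be written entirely with aliases. This sketch assumes each alias takes the same arguments as its core name, which the "forwards to" wording suggests; `greeting2` is an illustrative name.

```bern
import vendor/bernparsec

-- core names
greeting = before(string("hello"), eof())
-- friendly aliases for the same parser
greeting2 = keepLeft(text("hello"), endOfInput())

quickParse(greeting2, "hello")
-- Output: "hello"
```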

Advanced: ADT form

Every parser can also be expressed as the BPParser ADT (the BP* constructors), which is handy when you want a parser you can pattern-match on or pass around as plain data. parser_from_adt / normalize_parser translate the ADT form into the dictionary form the interpreter runs.

-- a parser described as data, then run
p = parser_from_adt(BPString("hi"))
parseTest(p, "hi there")
-- Output: "hi"

The result, state, and Maybe values have ADT forms too - BPOk value state / BPErr expected message state, BPState input source index line column, and BPJust value / BPNothing. The parseADT, runParserADT, and parseTestADT functions run a parser and hand back the ADT result instead of the dictionary.

parseADT(string("hi"), "src", "hi")
-- Output: BPOk("hi", BPState("hi", "src", 2, 1, 3))

Putting it together

The combinators shine when composed into a real grammar. Here is a small arithmetic evaluator that combines lexeme, parens, lazy, chainl1, and choice to parse and evaluate expressions with correct precedence:

A four-function calculator

import vendor/bernparsec

-- a number, or a parenthesised expression (recursion via lazy)
def factor() -> choice([
    lexeme(signed(float())),
    lexeme(signed(decimal())),
    parens(lazy(expr))
])

-- '*' and '/' bind tighter than '+' and '-'
def mulOp() -> choice([
    mapP(symbol("*"), \_ -> '*'),
    mapP(symbol("/"), \_ -> '/')
])
def term() -> chainl1(factor(), mulOp(), \a, op, b ->
    if op == '*' then a * b else a / b end)

def addOp() -> choice([
    mapP(symbol("+"), \_ -> '+'),
    mapP(symbol("-"), \_ -> '-')
])
def expr() -> chainl1(term(), addOp(), \a, op, b ->
    if op == '+' then a + b else a - b end)

parseTest(expr(), "2 + 3 * 4")
-- Output: 14
parseTest(expr(), "(2 + 3) * 4")
-- Output: 20

And a key–value config line parser that brings together some, noneOf, bind, before, and sepBy:

Parsing "key=value" pairs

import vendor/bernparsec

ident = mapP(some(letterChar()), \cs -> "" <> cs)
val   = mapP(some(noneOf(";")), \cs -> "" <> cs)
pair  = bind(before(ident, symbol("=")), \k ->
            mapP(val, \v -> [k, v]))

config = sepBy(lexeme(pair), symbol(";"))

parseTest(config, "host=localhost; port=8080")
-- Output: [["host", "localhost"], ["port", "8080"]]
