vendor/bernparsec (Vendor)

A tiny parser-combinator library for Bern. Build big parsers by gluing small ones together - the core names follow Haskell's megaparsec.

import vendor/bernparsec

The core idea

A parser is a small recipe for reading text. You build large parsers out of small ones using combinators - functions that take parsers and return a new, bigger parser. A parser is not a function; it is a plain object that describes what to read, and a single interpreter walks that description against the input. That makes parsers cheap to build, easy to print, and composable.

import vendor/bernparsec

greeting = before(string("hello"), eof())   -- "hello" then end-of-input
parseTest(greeting, "hello")
-- Output: "hello"
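To make the "parsers are data" idea concrete, here is a minimal Python sketch, not BernParsec's actual source: each parser is a plain dict, and one hypothetical run() function interprets it. The dict keys and the run() signature are assumptions chosen for illustration.

```python
# Minimal sketch, not BernParsec's source: parsers are plain dicts that
# describe what to read, and a single interpreter, run(), walks them.

def string(token):
    return {"kind": "string", "token": token}

def eof():
    return {"kind": "eof"}

def before(left, right):
    return {"kind": "before", "left": left, "right": right}

def run(parser, text, index=0):
    """Return (ok, value, next_index); on failure next_index is the start index."""
    kind = parser["kind"]
    if kind == "string":
        token = parser["token"]
        if text.startswith(token, index):
            return True, token, index + len(token)
        return False, None, index
    if kind == "eof":
        return index == len(text), None, index
    if kind == "before":
        ok, value, mid = run(parser["left"], text, index)
        if not ok:
            return False, None, index
        ok, _, end = run(parser["right"], text, mid)
        return (True, value, end) if ok else (False, None, index)
    raise ValueError(f"unknown parser kind: {kind}")

greeting = before(string("hello"), eof())
print(greeting)                 # prints the nested dict describing the parser
print(run(greeting, "hello"))   # (True, 'hello', 5)
print(run(greeting, "hello!"))  # (False, None, 0)
```

Because greeting is just a dict, it can be printed, compared, or stored before it is ever run.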

Most combinators on this page come in two flavours: the terse core name (used throughout) and a plain-English friendly alias (listed in the aliases table). The two are interchangeable - mix whichever reads best.

Running a parser

Running a parser against input produces a result object:

Field               Meaning
result["ok"]        true if the parser matched
result["value"]     the parsed value (when ok)
result["state"]     where parsing stopped - index, line, column
result["expected"]  list of what was expected (when it failed)
result["message"]   a human-readable error (when it failed)

parse(parser, source_name, input) → result

Run a parser, labelling the input source_name for error messages. (run is the friendly alias.)

r = parse(string("hi"), "greeting.txt", "hi there")
r["ok"]
-- Output: true
r["value"]
-- Output: "hi"
parseTest(parser, input) → value | string

Run a parser and return the parsed value on success, or a pretty error string on failure - perfect for the REPL. (Alias: quickParse.)

parseTest(char('a'), "abc")
-- Output: 'a'
parseTest(char('a'), "xyz")
-- Output: ":1:1: unexpected input; expected: 'a'"
errorBundlePretty(result) → string

Format a result as a one-line string of the form source:line:column: message; expected: … (Alias: prettyError.)

r = parse(char('a'), "in", "zzz")
errorBundlePretty(r)
-- Output: "in:1:1: unexpected input; expected: 'a'"
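The state fields (index, line, column) and the errorBundlePretty string fit together roughly as in the Python sketch below. The helpers position and pretty_error are hypothetical, and joining several expected items with ", " is an assumption; this page only shows messages with a single expected item.

```python
# Sketch of rendering a failure in the documented
# "source:line:column: message; expected: ..." shape.
# position() and pretty_error() are hypothetical helpers, not BernParsec functions.

def position(text, index):
    """1-based (line, column) of text[index], counting '\n' as a line break."""
    line = text.count("\n", 0, index) + 1
    last_newline = text.rfind("\n", 0, index)   # -1 when on the first line
    column = index - last_newline
    return line, column

def pretty_error(source_name, text, index, message, expected):
    line, column = position(text, index)
    # Assumption: several expected items are joined with ", ".
    return f"{source_name}:{line}:{column}: {message}; expected: {', '.join(expected)}"

print(pretty_error("in", "zzz", 0, "unexpected input", ["'a'"]))
# in:1:1: unexpected input; expected: 'a'
```

With an empty source name this yields the ":1:1: ..." form that parseTest prints.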

Primitive parsers

char(c) → parser

Match exactly the character c.

parseTest(char('a'), "abc")
-- Output: 'a'
string(token) → parser

Match the exact text token.

parseTest(string("let"), "let x = 1")
-- Output: "let"
anySingle() → parser

Match any single character.

parseTest(anySingle(), "Z!")
-- Output: 'Z'
oneOf(chars) → parser

Match any one character that appears in chars.

parseTest(oneOf("+-*/"), "*3")
-- Output: '*'
noneOf(chars) → parser

Match any one character that does not appear in chars.

parseTest(noneOf(" \t\n"), "hi")
-- Output: 'h'
satisfy(pred) · satisfy(pred, label) → parser

Match a character for which pred(c) is true. The optional label names it in error messages.

vowel = label(oneOf("aeiou"), "vowel")   -- for a fixed set of characters, oneOf is simpler than satisfy
parseTest(satisfy(\c -> c == 'x', "an x"), "xyz")
-- Output: 'x'
eof() → parser

Succeed only at the end of the input (consumes nothing).

parseTest(before(string("ok"), eof()), "ok")
-- Output: "ok"
parseTest(before(string("ok"), eof()), "okay")
-- Output: (error: expected end of input)
pure(value) → parser

Always succeed, yielding value without consuming input. (Alias: succeed.)

parseTest(pure(42), "anything")
-- Output: 42
fail_parser(message) → parser

Always fail with message. (Alias: failWith.)

parseTest(fail_parser("nope"), "abc")
-- Output: ":1:1: nope; expected: "
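All of the single-character primitives above can be expressed in terms of satisfy. The following Python sketch shows that idea with plain functions rather than BernParsec's data form: a parser here is a function taking (text, i) and returning (value, next_i), or None on failure. The Python names are hypothetical stand-ins for the Bern combinators.

```python
# Sketch with plain functions (not BernParsec's data form):
# a parser is (text, i) -> (value, next_i), or None on failure.

def satisfy(pred):
    def parse(text, i):
        if i < len(text) and pred(text[i]):
            return text[i], i + 1
        return None
    return parse

def char(c):
    return satisfy(lambda x: x == c)

def one_of(chars):
    return satisfy(lambda x: x in chars)

def none_of(chars):
    return satisfy(lambda x: x not in chars)

def any_single():
    return satisfy(lambda x: True)

def pure(value):
    return lambda text, i: (value, i)       # always succeeds, consumes nothing

def eof():
    return lambda text, i: (None, i) if i == len(text) else None

print(char("a")("abc", 0))        # ('a', 1)
print(one_of("+-*/")("*3", 0))    # ('*', 1)
print(none_of(" \t\n")("hi", 0))  # ('h', 1)
print(char("a")("xyz", 0))        # None
```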

Character classes

Ready-made satisfy parsers for common classes, each with a sensible error label.

digitChar() → parser

One digit 0–9.

parseTest(digitChar(), "7x")
-- Output: '7'
letterChar() → parser

One letter a–z / A–Z.

parseTest(letterChar(), "Bern")
-- Output: 'B'
alphaNumChar() → parser

One letter or digit.

parseTest(some(alphaNumChar()), "id42 ")
-- Output: ['i', 'd', '4', '2']
spaceChar() · newline() · tab() → parser

One whitespace character, a newline, or a tab respectively.

parseTest(spaceChar(), " x")
-- Output: ' '
space() · space1() → parser

space() matches zero or more whitespace characters; space1() requires at least one.

parseTest(space(), "   hi")
-- Output: [' ', ' ', ' ']

Sequencing & mapping

mapP(parser, mapper) → parser

Run parser, then transform its result with mapper. (Alias: mapResult.)

parseTest(mapP(digitChar(), char_to_digit), "9")
-- Output: 9   (the integer, not the character)
thenP(left, right) → parser

Run both in order, keep the right value. (Alias: keepRight.)

parseTest(thenP(char('$'), decimal()), "$50")
-- Output: 50   (the '$' is consumed but discarded)
before(left, right) → parser

Run both in order, keep the left value. (Alias: keepLeft.)

parseTest(before(decimal(), char('%')), "75%")
-- Output: 75
bind(parser, to_parser) → parser

Result-dependent sequencing: run parser, then feed its value to to_parser, which returns the next parser to run. (Alias: andThen.)

-- read a digit n, then read exactly that many letters
repeated = bind(mapP(digitChar(), char_to_digit), \n -> count(n, letterChar()))
parseTest(repeated, "3abc")
-- Output: ['a', 'b', 'c']
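bind is what lets a later parser depend on an earlier result. Here is a Python sketch of the digit-then-that-many-letters example, using hypothetical stand-ins for digitChar (already mapped to an int), letterChar, count and bind:

```python
# Sketch with plain functions: a parser is (text, i) -> (value, next_i) or None.
# digit, letter, count and bind are illustrative stand-ins, not BernParsec code.

def digit(text, i):
    if i < len(text) and "0" <= text[i] <= "9":
        return int(text[i]), i + 1          # already mapped to an int
    return None

def letter(text, i):
    if i < len(text) and text[i].isascii() and text[i].isalpha():
        return text[i], i + 1
    return None

def count(n, p):
    def parse(text, i):
        values = []
        for _ in range(n):
            r = p(text, i)
            if r is None:
                return None
            value, i = r
            values.append(value)
        return values, i
    return parse

def bind(p, to_parser):
    def parse(text, i):
        r = p(text, i)
        if r is None:
            return None
        value, i = r
        return to_parser(value)(text, i)    # the next parser depends on value
    return parse

repeated = bind(digit, lambda n: count(n, letter))
print(repeated("3abc", 0))   # (['a', 'b', 'c'], 4)
print(repeated("3ab", 0))    # None: only two letters follow the 3
```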

Choice & failure

orElse(left, right) → parser

Try left; if it fails, try right. (Alias: orTry.)

yesNo = orElse(string("yes"), string("no"))
parseTest(yesNo, "no")
-- Output: "no"
choice(parsers) → parser

Try each parser in the list, returning the first that matches. (Alias: firstOf.)

keyword = choice([string("if"), string("else"), string("end")])
parseTest(keyword, "else ...")
-- Output: "else"
label(parser, expected_label) → parser

Give a parser a friendly name for error messages. (Alias: describe.)

p = label(digitChar(), "a single digit")
errorBundlePretty(parse(p, "in", "x"))
-- Output: "in:1:1: unexpected input; expected: a single digit"
try(parser) → parser

Provided for megaparsec familiarity; in BernParsec it returns the parser unchanged.

parseTest(try(string("ab")), "abc")
-- Output: "ab"
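Here is a Python sketch of ordered choice and labels. It retries the right branch from the original position, which is why a try wrapper would change nothing in this sketch. Merging the expected labels of failed branches is an assumption about how a combined "expected: ..." list could be built, not documented BernParsec behaviour; all names are hypothetical.

```python
# Sketch: each parser returns ("ok", value, next_i) or ("err", expected_labels, i).

def string(token):
    def parse(text, i):
        if text.startswith(token, i):
            return ("ok", token, i + len(token))
        return ("err", [repr(token)], i)
    return parse

def or_else(left, right):
    def parse(text, i):
        r = left(text, i)
        if r[0] == "ok":
            return r
        r2 = right(text, i)              # retry from the same starting position
        if r2[0] == "ok":
            return r2
        return ("err", r[1] + r2[1], i)  # assumption: merge what both branches expected
    return parse

def choice(parsers):
    result = parsers[0]                  # expects a non-empty list
    for p in parsers[1:]:
        result = or_else(result, p)
    return result

def label(p, name):
    def parse(text, i):
        r = p(text, i)
        return r if r[0] == "ok" else ("err", [name], i)
    return parse

keyword = choice([string("if"), string("else"), string("end")])
print(keyword("else ...", 0))                   # ('ok', 'else', 4)
print(keyword("while", 0))                      # ('err', ["'if'", "'else'", "'end'"], 0)
print(label(keyword, "a keyword")("while", 0))  # ('err', ['a keyword'], 0)
```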

Repetition

many(parser) → parser

Match zero or more times, collecting a list. (Alias: zeroOrMore.)

parseTest(many(digitChar()), "123abc")
-- Output: ['1', '2', '3']
parseTest(many(digitChar()), "abc")
-- Output: []
some(parser) → parser

Match one or more times. (Alias: oneOrMore.)

parseTest(some(letterChar()), "Bern2")
-- Output: ['B', 'e', 'r', 'n']
count(n, parser) → parser

Match exactly n times. (Alias: repeatExactly.)

parseTest(count(4, digitChar()), "2026!")
-- Output: ['2', '0', '2', '6']
optional(parser) → parser

Match at most once, returning a Maybe (BPJust value or BPNothing). (Alias: optionally.)

parseTest(optional(char('-')), "-5")
-- Output: BPJust('-')
parseTest(optional(char('-')), "5")
-- Output: BPNothing()
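many, some and optional differ only in how many matches they require. Below is a Python sketch with plain functions; the names are hypothetical, and tuples stand in for BPJust and BPNothing. The no-progress guard in many is a safety choice of this sketch, not documented behaviour.

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def satisfy(pred):
    def parse(text, i):
        if i < len(text) and pred(text[i]):
            return text[i], i + 1
        return None
    return parse

def many(p):
    def parse(text, i):
        values = []
        while True:
            r = p(text, i)
            if r is None or r[1] == i:   # stop on failure (or if p made no progress)
                return values, i
            value, i = r
            values.append(value)
    return parse

def some(p):
    rest = many(p)
    def parse(text, i):
        r = p(text, i)
        if r is None:
            return None                  # at least one match is required
        first, i = r
        more, i = rest(text, i)
        return [first] + more, i
    return parse

def optional(p):
    def parse(text, i):
        r = p(text, i)
        if r is None:
            return ("BPNothing",), i     # tuple stand-ins for BPNothing / BPJust
        value, i = r
        return ("BPJust", value), i
    return parse

digit = satisfy(lambda c: "0" <= c <= "9")
letter = satisfy(lambda c: c.isascii() and c.isalpha())
minus = satisfy(lambda c: c == "-")

print(many(digit)("123abc", 0))   # (['1', '2', '3'], 3)
print(many(digit)("abc", 0))      # ([], 0)
print(some(letter)("Bern2", 0))   # (['B', 'e', 'r', 'n'], 4)
print(optional(minus)("-5", 0))   # (('BPJust', '-'), 1)
print(optional(minus)("5", 0))    # (('BPNothing',), 0)
```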
sepBy(parser, separator) → parser

Zero or more items separated by separator. (Alias: separatedBy.)

csv = sepBy(decimal(), char(','))
parseTest(csv, "1,2,3")
-- Output: [1, 2, 3]
parseTest(csv, "")
-- Output: []
sepBy1(parser, separator) → parser

One or more items separated by separator. (Alias: separatedBy1.)

parseTest(sepBy1(letterChar(), char('-')), "a-b-c")
-- Output: ['a', 'b', 'c']
manyTill(parser, end_parser) → parser

Repeat parser until end_parser matches; the terminator is consumed. (Alias: repeatUntil.)

comment = thenP(string("--"), manyTill(anySingle(), newline()))
parseTest(comment, "-- a note\nrest")
-- Output: [' ', 'a', ' ', 'n', 'o', 't', 'e']
between(open, close, parser) → parser

Match parser wrapped by open and close, keeping only the inner value. (Alias: surroundedBy.)

quoted = between(char('"'), char('"'), many(noneOf("\"")))
parseTest(quoted, "\"hi\"")
-- Output: ['h', 'i']
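sepBy, manyTill and between can be sketched in Python the same way, again with hypothetical names. What happens to a trailing separator with no item after it is not specified on this page; this sketch simply leaves it unconsumed.

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def satisfy(pred):
    def parse(text, i):
        if i < len(text) and pred(text[i]):
            return text[i], i + 1
        return None
    return parse

def char(c):
    return satisfy(lambda x: x == c)

any_single = satisfy(lambda x: True)
digit = satisfy(lambda x: "0" <= x <= "9")

def sep_by(p, sep):
    def parse(text, i):
        r = p(text, i)
        if r is None:
            return [], i                 # zero items is a success
        items = [r[0]]
        i = r[1]
        while True:
            s = sep(text, i)
            if s is None:
                return items, i
            r = p(text, s[1])
            if r is None:
                return items, i          # sketch choice: dangling separator left unconsumed
            items.append(r[0])
            i = r[1]
    return parse

def many_till(p, end):
    def parse(text, i):
        items = []
        while True:
            e = end(text, i)
            if e is not None:
                return items, e[1]       # terminator consumed but not collected
            r = p(text, i)
            if r is None:
                return None              # input ran out before the terminator
            items.append(r[0])
            i = r[1]
    return parse

def between(open_, close, p):
    def parse(text, i):
        o = open_(text, i)
        if o is None:
            return None
        r = p(text, o[1])
        if r is None:
            return None
        c = close(text, r[1])
        return None if c is None else (r[0], c[1])   # keep only the inner value
    return parse

csv = sep_by(digit, char(","))
print(csv("1,2,3", 0))   # (['1', '2', '3'], 5)
print(csv("", 0))        # ([], 0)
print(many_till(any_single, char("\n"))("-- a note\nrest", 2))
# ([' ', 'a', ' ', 'n', 'o', 't', 'e'], 10)
print(between(char("["), char("]"), csv)("[1,2]", 0))   # (['1', '2'], 5)
```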

Look-around

lookAhead(parser) → parser

Match parser but consume no input - peek at what's next. (Alias: peek.)

parseTest(lookAhead(string("ab")), "abc")
-- Output: "ab"   (the cursor stays at position 0)
notFollowedBy(parser) → parser

Succeed only if parser does not match here; consumes nothing. (Alias: notAhead.)

-- "let" not immediately followed by a letter (so not "letter")
kw = before(string("let"), notFollowedBy(letterChar()))
parseTest(kw, "let x")
-- Output: "let"
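Both look-around combinators run a parser and then rewind to where they started. A Python sketch with hypothetical names:

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def satisfy(pred):
    def parse(text, i):
        if i < len(text) and pred(text[i]):
            return text[i], i + 1
        return None
    return parse

def string(token):
    def parse(text, i):
        if text.startswith(token, i):
            return token, i + len(token)
        return None
    return parse

def before(left, right):
    def parse(text, i):
        left_r = left(text, i)
        if left_r is None:
            return None
        right_r = right(text, left_r[1])
        return None if right_r is None else (left_r[0], right_r[1])
    return parse

def look_ahead(p):
    def parse(text, i):
        r = p(text, i)
        return None if r is None else (r[0], i)   # keep the value, rewind to i
    return parse

def not_followed_by(p):
    def parse(text, i):
        return (None, i) if p(text, i) is None else None   # never consumes input
    return parse

letter = satisfy(lambda c: c.isascii() and c.isalpha())
kw = before(string("let"), not_followed_by(letter))

print(look_ahead(string("ab"))("abc", 0))   # ('ab', 0)
print(kw("let x", 0))                       # ('let', 3)
print(kw("letter", 0))                      # None
```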

Recursion & operators

lazy(thunk) → parser

Defer building a parser until it is needed - the key to recursive grammars, where a parser refers to itself. (Alias: deferred.)

def value() -> orElse(decimal(), between(char('('), char(')'), lazy(value)))
parseTest(value(), "(((7)))")
-- Output: 7
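Why lazy is needed: building value() eagerly would call value() again while constructing the parser, recursing forever before any input is read. Here is a Python sketch with hypothetical names; this lazy rebuilds the inner parser on every run, and whether BernParsec caches it is not stated on this page.

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def char(c):
    def parse(text, i):
        if i < len(text) and text[i] == c:
            return c, i + 1
        return None
    return parse

def decimal(text, i):
    j = i
    while j < len(text) and "0" <= text[j] <= "9":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def or_else(left, right):
    def parse(text, i):
        r = left(text, i)
        return r if r is not None else right(text, i)
    return parse

def between(open_, close, p):
    def parse(text, i):
        o = open_(text, i)
        if o is None:
            return None
        r = p(text, o[1])
        if r is None:
            return None
        c = close(text, r[1])
        return None if c is None else (r[0], c[1])
    return parse

def lazy(thunk):
    def parse(text, i):
        return thunk()(text, i)    # build the parser only when it is run
    return parse

def value():
    # Passing value() instead of lazy(value) would recurse forever while
    # *building* the parser, before any input is read.
    return or_else(decimal, between(char("("), char(")"), lazy(value)))

print(value()("(((7)))", 0))   # (7, 7)
print(value()("((7)", 0))      # None
```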
chainl1(term, op, combine) → parser

Parse one or more terms separated by op, folding them left-associatively with combine(left, op_value, right). Ideal for arithmetic. (Alias: chainLeft.)

plus = mapP(char('+'), \_ -> "+")
sum  = chainl1(decimal(), plus, \a, _, b -> a + b)
parseTest(sum, "1+2+3")
-- Output: 6
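chainl1's left fold matters for operators that are not associative, such as subtraction. A Python sketch (hypothetical names) in which 8-3-2 folds as (8 - 3) - 2:

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def char(c):
    def parse(text, i):
        if i < len(text) and text[i] == c:
            return c, i + 1
        return None
    return parse

def decimal(text, i):
    j = i
    while j < len(text) and "0" <= text[j] <= "9":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def chainl1(term, op, combine):
    def parse(text, i):
        r = term(text, i)
        if r is None:
            return None
        acc, i = r
        while True:
            o = op(text, i)
            if o is None:
                return acc, i
            t = term(text, o[1])
            if t is None:
                return acc, i               # sketch choice: dangling operator left unconsumed
            acc = combine(acc, o[0], t[0])  # fold into the running left-hand value
            i = t[1]
    return parse

total = chainl1(decimal, char("+"), lambda a, _op, b: a + b)
diff = chainl1(decimal, char("-"), lambda a, _op, b: a - b)
print(total("1+2+3", 0))   # (6, 5)
print(diff("8-3-2", 0))    # (3, 5), i.e. (8 - 3) - 2 rather than 8 - (3 - 2)
```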

Lexing & whitespace

lexeme(parser) → parser

Run parser, then skip any trailing whitespace - so tokens don't have to worry about the spaces after them. (Alias: token.)

parseTest(lexeme(decimal()), "42   ")
-- Output: 42   (trailing spaces consumed)
symbol(s) → parser

Match the exact text s as a token, skipping trailing whitespace.

parseTest(sepBy1(symbol("ok"), symbol(",")), "ok , ok , ok")
-- Output: ["ok", "ok", "ok"]
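A Python sketch of the lexeme pattern follows. Treating symbol(s) as lexeme(string(s)) is an assumption consistent with the description above, and the helper names are hypothetical.

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def skip_space(text, i):
    while i < len(text) and text[i].isspace():
        i += 1
    return i

def string(token):
    def parse(text, i):
        if text.startswith(token, i):
            return token, i + len(token)
        return None
    return parse

def decimal(text, i):
    j = i
    while j < len(text) and "0" <= text[j] <= "9":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def lexeme(p):
    def parse(text, i):
        r = p(text, i)
        if r is None:
            return None
        return r[0], skip_space(text, r[1])   # eat trailing whitespace
    return parse

def symbol(s):
    return lexeme(string(s))   # assumption: symbol is lexeme over string

def sep_by1(p, sep):
    def parse(text, i):
        r = p(text, i)
        if r is None:
            return None
        items, i = [r[0]], r[1]
        while True:
            s = sep(text, i)
            if s is None:
                return items, i
            r = p(text, s[1])
            if r is None:
                return items, i
            items.append(r[0])
            i = r[1]
    return parse

print(lexeme(decimal)("42   ", 0))                            # (42, 5)
print(sep_by1(symbol("ok"), symbol(","))("ok , ok , ok", 0))  # (['ok', 'ok', 'ok'], 12)
```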

Numbers

decimal() → parser

An unsigned integer (one or more digits). (Aliases: digits, wholeNumber.)

parseTest(decimal(), "123")
-- Output: 123
float() → parser

A number with a fractional part (digits, a dot, digits). (Alias: decimalNumber.)

parseTest(float(), "12.50")
-- Output: 12.5
signed(number_parser) → parser

Allow an optional leading + or - in front of a number parser, negating on -. (Alias: withSign.)

parseTest(signed(float()), "-3.5")
-- Output: -3.5
integer() → parser

A signed integer - signed(decimal()). (Alias: integerNumber.)

parseTest(integer(), "-42")
-- Output: -42
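signed wraps any number parser, which is how integer() can be defined as signed(decimal()). A Python sketch under the same conventions; the names are hypothetical, and float_ avoids shadowing Python's built-in float.

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def decimal(text, i):
    j = i
    while j < len(text) and "0" <= text[j] <= "9":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def float_(text, i):
    # digits, a dot, digits
    whole = decimal(text, i)
    if whole is None or whole[1] >= len(text) or text[whole[1]] != ".":
        return None
    frac = decimal(text, whole[1] + 1)
    if frac is None:
        return None
    return float(text[i:frac[1]]), frac[1]

def signed(number_parser):
    def parse(text, i):
        sign = 1
        if i < len(text) and text[i] in "+-":
            sign = -1 if text[i] == "-" else 1
            i += 1
        r = number_parser(text, i)
        if r is None:
            return None
        return sign * r[0], r[1]   # negate on a leading '-'
    return parse

integer = signed(decimal)
print(integer("-42", 0))           # (-42, 3)
print(signed(float_)("-3.5", 0))   # (-3.5, 4)
print(float_("12.50", 0))          # (12.5, 5)
```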

Brackets

Wrappers that parse something inside brackets and skip inner whitespace (built on symbol).

parens(parser) · brackets(parser) · braces(parser) → parser

Parse parser inside ( ), [ ], or { }. (Aliases: inParens, inBrackets, inBraces.)

list = brackets(sepBy(lexeme(decimal()), symbol(",")))
parseTest(list, "[ 1, 2, 3 ]")
-- Output: [1, 2, 3]

Operators (Bern 2.1)

BernParsec ships a set of infix operators, inspired by Haskell's megaparsec and Applicative, so a grammar reads almost like the thing it describes. They are just the combinators above with symbols, so everything you already know still applies.

Operator     Same as          Meaning
f <$> p      mapP(p, f)       apply f to whatever p parses
pf <*> px    applicative      sequence two parsers; apply pf's result to px's
p <* q       before(p, q)     run both in order, keep the left result
p *> q       thenP(p, q)      run both in order, keep the right result
p <|> q      orElse(p, q)     try p; if it fails, try q
p >>= f      bind(p, f)       run p, then build the next parser from its value
p <?> "lbl"  label(p, "lbl")  rename p for nicer error messages

Precedence mirrors Haskell: <$>, <*>, <* and *> have the highest precedence, then <|>, then >>=, then <?> (the lowest, so it labels a whole parser). All are left-associative.

A grammar that reads like its shape

import core
import vendor/bernparsec

-- a token: a signed number, trailing spaces eaten
num = lexeme(signed(decimal()))

-- "a price tag like $12, keep the number"
price = char('$') *> num

parseTest(price, "$12")
-- Output: 12

-- "two numbers separated by a comma, in brackets"
pair = (\x, y -> [x, y]) <$> (num <* symbol(",")) <*> num
parseTest(inBrackets(pair), "[ 3, 4 ]")
-- Output: [3, 4]

-- choice + label
keyword = string("if") <|> string("else") <|> string("end") <?> "a keyword"
parseTest(keyword, "else")
-- Output: "else"
Note: these operators are defined inside vendor/bernparsec with Bern's custom-operator feature, and they reach your file through import like any other definition.
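<*> is the least familiar operator in the table. In this Python sketch, hypothetical fmap and ap functions play the roles of <$> and <*>, and the two-argument function is written in curried form so that fmap can partially apply it. The nesting ap(fmap(f, p), q) mirrors how the left-associative f <$> p <*> q groups.

```python
# Sketch: a parser is (text, i) -> (value, next_i) or None.

def char(c):
    def parse(text, i):
        if i < len(text) and text[i] == c:
            return c, i + 1
        return None
    return parse

def decimal(text, i):
    j = i
    while j < len(text) and "0" <= text[j] <= "9":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def before(p, q):                    # p <* q
    def parse(text, i):
        rp = p(text, i)
        if rp is None:
            return None
        rq = q(text, rp[1])
        return None if rq is None else (rp[0], rq[1])
    return parse

def fmap(f, p):                      # f <$> p
    def parse(text, i):
        r = p(text, i)
        return None if r is None else (f(r[0]), r[1])
    return parse

def ap(pf, px):                      # pf <*> px
    def parse(text, i):
        rf = pf(text, i)
        if rf is None:
            return None
        rx = px(text, rf[1])
        if rx is None:
            return None
        return rf[0](rx[0]), rx[1]   # apply pf's function to px's value
    return parse

pair = ap(fmap(lambda x: lambda y: [x, y], before(decimal, char(","))), decimal)
print(pair("3,4", 0))   # ([3, 4], 3)
print(pair("3;4", 0))   # None
```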

Friendly aliases

Most core combinators have a plain-English alias that forwards to the core name. Use whichever reads better - or mix both in the same grammar.

Friendly alias                    Core name
literalChar                       char
anyChar                           anySingle
charIn                            oneOf
charNotIn                         noneOf
charWhere                         satisfy
text                              string
endOfInput                        eof
succeed                           pure
failWith                          fail_parser
digit / letter                    digitChar / letterChar
letterOrDigit                     alphaNumChar
whitespaceChar                    spaceChar
mapResult                         mapP
andThen                           bind
keepRight / keepLeft              thenP / before
orTry                             orElse
firstOf                           choice
zeroOrMore / oneOrMore            many / some
repeatExactly                     count
optionally                        optional
separatedBy / separatedBy1        sepBy / sepBy1
repeatUntil                       manyTill
peek / notAhead                   lookAhead / notFollowedBy
describe                          label
deferred                          lazy
surroundedBy                      between
chainLeft                         chainl1
token                             lexeme
digits / wholeNumber              decimal
integerNumber                     integer
decimalNumber                     float
withSign                          signed
inParens / inBrackets / inBraces  parens / brackets / braces
run / quickParse / prettyError    parse / parseTest / errorBundlePretty

Advanced: ADT form

Every parser can also be expressed as the BPParser ADT (the BP* constructors), which is handy when you want a parser you can pattern-match on or pass around as plain data. parser_from_adt / normalize_parser translate the ADT form into the dictionary form the interpreter runs.

-- a parser described as data, then run
p = parser_from_adt(BPString("hi"))
parseTest(p, "hi there")
-- Output: "hi"

The result, state, and Maybe values have ADT forms too - BPOk value state / BPErr expected message state, BPState input source index line column, and BPJust value / BPNothing. The parseADT, runParserADT, and parseTestADT functions run a parser and hand back the ADT result instead of the dictionary.

parseADT(string("hi"), "src", "hi")
-- Output: BPOk("hi", BPState("hi", "src", 2, 1, 3))
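The point of the ADT form is that a parser becomes data you can inspect. Here is a Python sketch using frozen dataclasses: BPString comes from this page, while BPOrElse, to_dict and show are hypothetical names for illustration only, and the dict layout is an assumption rather than BernParsec's real dictionary form.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class BPString:
    token: str

@dataclass(frozen=True)
class BPOrElse:                 # hypothetical constructor name, for illustration
    left: "ParserADT"
    right: "ParserADT"

ParserADT = Union[BPString, BPOrElse]

def to_dict(p):
    """Translate the ADT into a dict form, in the spirit of parser_from_adt."""
    if isinstance(p, BPString):
        return {"kind": "string", "token": p.token}
    if isinstance(p, BPOrElse):
        return {"kind": "orElse", "left": to_dict(p.left), "right": to_dict(p.right)}
    raise TypeError(f"not a parser ADT value: {p!r}")

def show(p):
    """Match on the ADT to describe a parser without running it."""
    if isinstance(p, BPString):
        return repr(p.token)
    if isinstance(p, BPOrElse):
        return f"{show(p.left)} or {show(p.right)}"
    raise TypeError(f"not a parser ADT value: {p!r}")

yes_no = BPOrElse(BPString("yes"), BPString("no"))
print(show(yes_no))      # 'yes' or 'no'
print(to_dict(yes_no))
```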

Putting it together

The combinators shine when composed into a real grammar. Here is a small arithmetic evaluator that combines lexeme, parens, lazy, chainl1, and choice to parse and evaluate expressions with correct precedence:

A four-function calculator

import vendor/bernparsec

-- a number, or a parenthesised expression (recursion via lazy)
def factor() -> choice([
    lexeme(signed(float())),
    lexeme(signed(decimal())),
    parens(lazy(expr))
])

-- '*' and '/' bind tighter than '+' and '-'
def mulOp() -> choice([
    mapP(symbol("*"), \_ -> '*'),
    mapP(symbol("/"), \_ -> '/')
])
def term() -> chainl1(factor(), mulOp(), \a, op, b ->
    if op == '*' then a * b else a / b end)

def addOp() -> choice([
    mapP(symbol("+"), \_ -> '+'),
    mapP(symbol("-"), \_ -> '-')
])
def expr() -> chainl1(term(), addOp(), \a, op, b ->
    if op == '+' then a + b else a - b end)

parseTest(expr(), "2 + 3 * 4")
-- Output: 14
parseTest(expr(), "(2 + 3) * 4")
-- Output: 20

And a key–value config line parser that brings together some, bind, sepBy, and before:

Parsing "key=value" pairs

import vendor/bernparsec

ident = mapP(some(letterChar()), \cs -> "" <> cs)
val   = mapP(some(noneOf(";")), \cs -> "" <> cs)
pair  = bind(before(ident, symbol("=")), \k ->
            mapP(val, \v -> [k, v]))

config = sepBy(lexeme(pair), symbol(";"))

parseTest(config, "host=localhost; port=8080")
-- Output: [["host", "localhost"], ["port", "8080"]]