A tiny parser-combinator library for Bern. Build big parsers by gluing small ones together - the core names follow Haskell's megaparsec.
import vendor/bernparsec
A parser is a small recipe for reading text. You build large parsers out of small ones using combinators - functions that take parsers and return a new, bigger parser. A parser is not a function; it is a plain object that describes what to read, and a single interpreter walks that description against the input. That makes parsers cheap to build, easy to print, and composable.
import vendor/bernparsec
greeting = before(string("hello"), eof()) -- "hello" then end-of-input
parseTest(greeting, "hello")
-- Output: "hello"
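The "parser as data, one interpreter" idea can be modelled in a few lines. The sketch below is in Python rather than Bern and is only an illustration: the dictionary layout, the `kind` field, and the `interpret` function are assumptions made for this example, not BernParsec's actual internals.

```python
# A toy model of "parsers as data": each parser is a plain dict, and one
# interpreter function walks it. All names here are illustrative only.

def string(token):
    return {"kind": "string", "token": token}

def eof():
    return {"kind": "eof"}

def before(left, right):
    return {"kind": "before", "left": left, "right": right}

def interpret(parser, text, index=0):
    """Return (value, next_index) on success, or None on failure."""
    kind = parser["kind"]
    if kind == "string":
        token = parser["token"]
        if text.startswith(token, index):
            return token, index + len(token)
        return None
    if kind == "eof":
        return (None, index) if index == len(text) else None
    if kind == "before":
        first = interpret(parser["left"], text, index)
        if first is None:
            return None
        value, after_left = first
        second = interpret(parser["right"], text, after_left)
        if second is None:
            return None
        return value, second[1]  # keep the left value, advance past both
    raise ValueError(f"unknown parser kind: {kind}")

greeting = before(string("hello"), eof())
print(greeting)                        # the description is ordinary, printable data
print(interpret(greeting, "hello"))    # ('hello', 5)
print(interpret(greeting, "hello!"))   # None
```

Because building `greeting` only allocates a small nested dictionary, construction is cheap and nothing touches the input until the interpreter runs.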
Every combinator on this page comes in two flavours: the terse core name (used throughout) and a plain-English friendly alias (listed in the aliases table). They are interchangeable - mix whichever reads best.
Running a parser against input produces a result object:
| Field | Meaning |
|---|---|
| `result["ok"]` | true if the parser matched |
| `result["value"]` | the parsed value (when ok) |
| `result["state"]` | where parsing stopped - index, line, column |
| `result["expected"]` | list of what was expected (when it failed) |
| `result["message"]` | a human-readable error (when it failed) |
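To make the index / line / column triple in the state concrete, here is a small Python sketch of how a line and column can be derived from an index. It assumes a 0-based index and 1-based line and column, which is what the `BPState("hi", "src", 2, 1, 3)` output later on this page suggests; `position_at` is a hypothetical helper, not part of BernParsec.

```python
def position_at(text, index):
    """Return (line, column), both 1-based, for a 0-based index into text."""
    consumed = text[:index]
    line = consumed.count("\n") + 1
    last_newline = consumed.rfind("\n")   # -1 while still on the first line
    column = index - last_newline         # so the first column is 1
    return line, column

print(position_at("hi", 2))       # (1, 3)
print(position_at("ab\ncd", 4))   # (2, 2)
```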
Run a parser, labelling the input source_name for error messages. (run is the friendly alias.)
r = parse(string("hi"), "greeting.txt", "hi there")
r["ok"]
-- Output: true
r["value"]
-- Output: "hi"
Run a parser and return the parsed value on success, or a pretty error string on failure - perfect for the REPL. (Alias: quickParse.)
parseTest(char('a'), "abc")
-- Output: 'a'
parseTest(char('a'), "xyz")
-- Output: ":1:1: unexpected input; expected: 'a'"
Format any result as a one-line source:line:column: message; expected: … string. (Alias: prettyError.)
r = parse(char('a'), "in", "zzz")
errorBundlePretty(r)
-- Output: "in:1:1: unexpected input; expected: 'a'"
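The one-line error layout can be reproduced with a short formatter. This Python sketch is an assumption-laden illustration: `pretty_error` and its parameters are hypothetical, and joining several expected items with a comma is a guess, since the examples on this page only ever show zero or one item.

```python
def pretty_error(source_name, line, column, message, expected):
    """Build 'source:line:column: message; expected: a, b' on one line."""
    return f"{source_name}:{line}:{column}: {message}; expected: {', '.join(expected)}"

print(pretty_error("in", 1, 1, "unexpected input", ["'a'"]))
# in:1:1: unexpected input; expected: 'a'
```

With an empty source name the string simply starts with the colon, which matches the `":1:1: ..."` outputs that parseTest shows.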
Match exactly the character c.
parseTest(char('a'), "abc")
-- Output: 'a'
Match the exact text token.
parseTest(string("let"), "let x = 1")
-- Output: "let"
Match any single character.
parseTest(anySingle(), "Z!") -- Output: 'Z'
Match any one character that appears in chars.
parseTest(oneOf("+-*/"), "*3")
-- Output: '*'
Match any one character that does not appear in chars.
parseTest(noneOf(" \t\n"), "hi")
-- Output: 'h'
Match a character for which pred(c) is true. The optional label names it in error messages.
vowel = satisfy(\c -> oneOf("aeiou") != -1, "vowel")
parseTest(satisfy(\c -> c == 'x', "an x"), "xyz")
-- Output: 'x'
Succeed only at the end of the input (consumes nothing).
parseTest(before(string("ok"), eof()), "ok")
-- Output: "ok"
parseTest(before(string("ok"), eof()), "okay")
-- Output: (error: expected end of input)
Always succeed, yielding value without consuming input. (Alias: succeed.)
parseTest(pure(42), "anything") -- Output: 42
Always fail with message. (Alias: failWith.)
parseTest(fail_parser("nope"), "abc")
-- Output: ":1:1: nope; expected: "
Ready-made satisfy parsers for common classes, each with a sensible error label.
One digit 0-9.
parseTest(digitChar(), "7x") -- Output: '7'
One letter a-z / A-Z.
parseTest(letterChar(), "Bern") -- Output: 'B'
One letter or digit.
parseTest(some(alphaNumChar()), "id42 ") -- Output: ['i', 'd', '4', '2']
One whitespace character, a newline, or a tab respectively.
parseTest(spaceChar(), " x") -- Output: ' '
space() matches zero or more whitespace characters; space1() requires at least one.
parseTest(space(), "   hi") -- Output: [' ', ' ', ' ']
Run parser, then transform its result with mapper. (Alias: mapResult.)
parseTest(mapP(digitChar(), char_to_digit), "9") -- Output: 9 (the integer, not the character)
Run both in order, keep the right value. (Alias: keepRight.)
parseTest(thenP(char('$'), decimal()), "$50")
-- Output: 50 (the '$' is consumed but discarded)
Run both in order, keep the left value. (Alias: keepLeft.)
parseTest(before(decimal(), char('%')), "75%")
-- Output: 75
Result-dependent sequencing: run parser, then feed its value to to_parser, which returns the next parser to run. (Alias: andThen.)
-- read a digit n, then read exactly that many letters
repeated = bind(mapP(digitChar(), char_to_digit), \n -> count(n, letterChar()))
parseTest(repeated, "3abc")
-- Output: ['a', 'b', 'c']
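What makes this kind of sequencing different from thenP or before is that the second parser does not exist until the first value is known. A minimal Python model, assuming a parser is a function `(text, index) -> (value, next_index) or None`; this representation and the helper names are illustrative, not BernParsec's implementation.

```python
def digit(text, i):
    """Parse one ASCII digit and return it as an int."""
    if i < len(text) and text[i] in "0123456789":
        return int(text[i]), i + 1
    return None

def letter(text, i):
    if i < len(text) and text[i].isalpha():
        return text[i], i + 1
    return None

def count(n, parser):
    def run(text, i):
        values = []
        for _ in range(n):
            step = parser(text, i)
            if step is None:
                return None
            value, i = step
            values.append(value)
        return values, i
    return run

def bind(parser, to_parser):
    def run(text, i):
        first = parser(text, i)
        if first is None:
            return None
        value, i = first
        return to_parser(value)(text, i)   # the next parser is built from value
    return run

repeated = bind(digit, lambda n: count(n, letter))
print(repeated("3abc", 0))   # (['a', 'b', 'c'], 4)
print(repeated("3ab", 0))    # None: only two letters follow the 3
```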
Try left; if it fails, try right. (Alias: orTry.)
yesNo = orElse(string("yes"), string("no"))
parseTest(yesNo, "no")
-- Output: "no"
Try each parser in the list, returning the first that matches. (Alias: firstOf.)
keyword = choice([string("if"), string("else"), string("end")])
parseTest(keyword, "else ...")
-- Output: "else"
Give a parser a friendly name for error messages. (Alias: describe.)
p = label(digitChar(), "a single digit")
errorBundlePretty(parse(p, "in", "x"))
-- Output: "in:1:1: unexpected input; expected: a single digit"
Provided for megaparsec familiarity; in BernParsec it returns the parser unchanged.
parseTest(try(string("ab")), "abc")
-- Output: "ab"
Match zero or more times, collecting a list. (Alias: zeroOrMore.)
parseTest(many(digitChar()), "123abc")
-- Output: ['1', '2', '3']
parseTest(many(digitChar()), "abc")
-- Output: []
Match one or more times. (Alias: oneOrMore.)
parseTest(some(letterChar()), "Bern2") -- Output: ['B', 'e', 'r', 'n']
Match exactly n times. (Alias: repeatExactly.)
parseTest(count(4, digitChar()), "2026!") -- Output: ['2', '0', '2', '6']
Match at most once, returning a Maybe (BPJust value or BPNothing). (Alias: optionally.)
parseTest(optional(char('-')), "-5")
-- Output: BPJust('-')
parseTest(optional(char('-')), "5")
-- Output: BPNothing()
Zero or more items separated by separator. (Alias: separatedBy.)
csv = sepBy(decimal(), char(','))
parseTest(csv, "1,2,3")
-- Output: [1, 2, 3]
parseTest(csv, "")
-- Output: []
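The loop behind a separated list is short enough to show in full. This is a Python sketch using a function-based parser model `(text, index) -> (value, next_index) or None`, which is an assumption for illustration. How a trailing separator is treated is also an assumption: here it is left unconsumed, and BernParsec may behave differently.

```python
def decimal(text, i):
    """Parse one or more ASCII digits as an int."""
    j = i
    while j < len(text) and text[j] in "0123456789":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def char(c):
    def run(text, i):
        return (c, i + 1) if text.startswith(c, i) else None
    return run

def sep_by(item, separator):
    def run(text, i):
        first = item(text, i)
        if first is None:
            return [], i                 # zero items is still a success
        value, i = first
        values = [value]
        while True:
            sep = separator(text, i)
            if sep is None:
                return values, i
            nxt = item(text, sep[1])
            if nxt is None:
                return values, i         # assumption: dangling separator stays unread
            value, i = nxt
            values.append(value)
    return run

csv = sep_by(decimal, char(","))
print(csv("1,2,3", 0))   # ([1, 2, 3], 5)
print(csv("", 0))        # ([], 0)
```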
One or more items separated by separator. (Alias: separatedBy1.)
parseTest(sepBy1(letterChar(), char('-')), "a-b-c")
-- Output: ['a', 'b', 'c']
Repeat parser until end_parser matches; the terminator is consumed. (Alias: repeatUntil.)
comment = thenP(string("--"), manyTill(anySingle(), newline()))
parseTest(comment, "-- a note\nrest")
-- Output: [' ', 'a', ' ', 'n', 'o', 't', 'e']
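The order of checks matters here: the terminator is tried first on every iteration, so it is never swallowed by the repeated parser. A Python sketch of that loop, again with parsers modelled as `(text, index) -> (value, next_index) or None`; the names are illustrative and not BernParsec's source.

```python
def any_single(text, i):
    return (text[i], i + 1) if i < len(text) else None

def newline(text, i):
    return ("\n", i + 1) if text.startswith("\n", i) else None

def many_till(parser, end_parser):
    def run(text, i):
        values = []
        while True:
            end = end_parser(text, i)
            if end is not None:
                return values, end[1]    # terminator consumed, not collected
            step = parser(text, i)
            if step is None:
                return None              # input ran out before the terminator
            value, i = step
            values.append(value)
    return run

body = many_till(any_single, newline)
print(body(" a note\nrest", 0))
# ([' ', 'a', ' ', 'n', 'o', 't', 'e'], 8)
print(body("no terminator", 0))   # None
```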
Match parser wrapped by open and close, keeping only the inner value. (Alias: surroundedBy.)
quoted = between(char('"'), char('"'), many(noneOf("\"")))
parseTest(quoted, "\"hi\"")
-- Output: ['h', 'i']
Match parser but consume no input - peek at what's next. (Alias: peek.)
parseTest(lookAhead(string("ab")), "abc")
-- Output: "ab" (the cursor stays at position 0)
Succeed only if parser does not match here; consumes nothing. (Alias: notAhead.)
-- "let" not immediately followed by a letter (so not "letter")
kw = before(string("let"), notFollowedBy(letterChar()))
parseTest(kw, "let x")
-- Output: "let"
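Both lookahead combinators come down to running a parser and then ignoring where it stopped. The Python sketch below models that with parsers as `(text, index) -> (value, next_index) or None`; this representation is an assumption for illustration only.

```python
def string(token):
    def run(text, i):
        return (token, i + len(token)) if text.startswith(token, i) else None
    return run

def letter(text, i):
    if i < len(text) and text[i].isalpha():
        return text[i], i + 1
    return None

def look_ahead(parser):
    def run(text, i):
        result = parser(text, i)
        return None if result is None else (result[0], i)   # value kept, index restored
    return run

def not_followed_by(parser):
    def run(text, i):
        return (None, i) if parser(text, i) is None else None
    return run

def before(left, right):
    def run(text, i):
        first = left(text, i)
        if first is None:
            return None
        second = right(text, first[1])
        return None if second is None else (first[0], second[1])
    return run

kw = before(string("let"), not_followed_by(letter))
print(kw("let x", 0))                        # ('let', 3)
print(kw("letter", 0))                       # None
print(look_ahead(string("ab"))("abc", 0))    # ('ab', 0)
```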
Defer building a parser until it is needed - the key to recursive grammars, where a parser refers to itself. (Alias: deferred.)
def value() -> orElse(decimal(), between(char('('), char(')'), lazy(value)))
parseTest(value(), "(((7)))")
-- Output: 7
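Why deferral is needed: if the grammar function called itself directly while being built, construction would recurse forever before any input was read. Wrapping the reference postpones the call until the input actually reaches that point. A Python sketch under the same function-based parser model used for illustration elsewhere (an assumption, not BernParsec's implementation):

```python
def decimal(text, i):
    j = i
    while j < len(text) and text[j] in "0123456789":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def char(c):
    def run(text, i):
        return (c, i + 1) if text.startswith(c, i) else None
    return run

def or_else(left, right):
    def run(text, i):
        result = left(text, i)
        return result if result is not None else right(text, i)
    return run

def between(open_p, close_p, inner):
    def run(text, i):
        opened = open_p(text, i)
        if opened is None:
            return None
        middle = inner(text, opened[1])
        if middle is None:
            return None
        closed = close_p(text, middle[1])
        return None if closed is None else (middle[0], closed[1])
    return run

def lazy(make_parser):
    def run(text, i):
        return make_parser()(text, i)   # built only when parsing gets here
    return run

def value():
    # lazy(value) passes the function itself; value() is not called yet
    return or_else(decimal, between(char("("), char(")"), lazy(value)))

print(value()("(((7)))", 0))   # (7, 7)
```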
Parse one or more terms separated by op, folding them left-associatively with combine(left, op_value, right). Ideal for arithmetic. (Alias: chainLeft.)
plus = mapP(char('+'), \_ -> "+")
sum = chainl1(decimal(), plus, \a, _, b -> a + b)
parseTest(sum, "1+2+3")
-- Output: 6
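Left-associativity is easier to see with subtraction than with addition, because the grouping changes the answer. The following Python sketch models the fold with parsers as `(text, index) -> (value, next_index) or None`; the model is an assumption, and stopping before a dangling operator is a choice made for this example.

```python
def decimal(text, i):
    j = i
    while j < len(text) and text[j] in "0123456789":
        j += 1
    return (int(text[i:j]), j) if j > i else None

def char(c):
    def run(text, i):
        return (c, i + 1) if text.startswith(c, i) else None
    return run

def chainl1(term, op, combine):
    def run(text, i):
        first = term(text, i)
        if first is None:
            return None
        acc, i = first
        while True:
            operator = op(text, i)
            if operator is None:
                return acc, i
            right = term(text, operator[1])
            if right is None:
                return acc, i            # stop before a dangling operator
            acc = combine(acc, operator[0], right[0])   # fold into the left side
            i = right[1]
    return run

difference = chainl1(decimal, char("-"), lambda a, _op, b: a - b)
print(difference("10-2-3", 0))   # (5, 6): (10 - 2) - 3, not 10 - (2 - 3)
```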
Run parser, then skip any trailing whitespace - so tokens don't have to worry about the spaces after them. (Alias: token.)
parseTest(lexeme(decimal()), "42 ") -- Output: 42 (trailing spaces consumed)
Match the exact text s as a token, skipping trailing whitespace.
parseTest(sepBy1(symbol("ok"), symbol(",")), "ok , ok , ok")
-- Output: ["ok", "ok", "ok"]
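The token convention is "each token cleans up after itself": whitespace is skipped after a match, never before it. A small Python model of that rule, with parsers as `(text, index) -> (value, next_index) or None`; the helper names are hypothetical and the exact set of characters BernParsec treats as whitespace is not assumed here beyond Python's `str.isspace`.

```python
def string(token):
    def run(text, i):
        return (token, i + len(token)) if text.startswith(token, i) else None
    return run

def skip_spaces(text, i):
    while i < len(text) and text[i].isspace():
        i += 1
    return i

def lexeme(parser):
    def run(text, i):
        result = parser(text, i)
        if result is None:
            return None
        value, i = result
        return value, skip_spaces(text, i)   # trailing whitespace only
    return run

def symbol(token):
    return lexeme(string(token))

print(symbol("ok")("ok , ok", 0))   # ('ok', 3)
print(symbol("ok")(" ok", 0))       # None: leading whitespace is not skipped
```

One consequence of this design is that whitespace at the very start of the input has to be skipped once by the top-level parser, since no token will do it.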
An unsigned integer (one or more digits). (Aliases: digits, wholeNumber.)
parseTest(decimal(), "123") -- Output: 123
A number with a fractional part (digits, a dot, digits). (Alias: decimalNumber.)
parseTest(float(), "12.50") -- Output: 12.5
Allow an optional leading + or - in front of a number parser, negating on -. (Alias: withSign.)
parseTest(signed(float()), "-3.5") -- Output: -3.5
A signed integer - signed(decimal()). (Alias: integerNumber.)
parseTest(integer(), "-42") -- Output: -42
Wrappers that parse something inside brackets and skip inner whitespace (built on symbol).
Parse parser inside ( ), [ ], or { }. (Aliases: inParens, inBrackets, inBraces.)
list = brackets(sepBy(lexeme(decimal()), symbol(",")))
parseTest(list, "[ 1, 2, 3 ]")
-- Output: [1, 2, 3]
BernParsec ships a set of infix operators, inspired by Haskell's megaparsec and Applicative, so a grammar reads almost like the thing it describes. They are just the combinators above with symbols, so everything you already know still applies.
| Operator | Same as | Meaning |
|---|---|---|
| `f <$> p` | `mapP(p, f)` | apply f to whatever p parses |
| `pf <*> px` | applicative | sequence two parsers; apply pf's result to px's |
| `p <* q` | `before(p, q)` | run both in order, keep the left result |
| `p *> q` | `thenP(p, q)` | run both in order, keep the right result |
| `p <\|> q` | `orElse(p, q)` | try p; if it fails, try q |
| `p >>= f` | `bind(p, f)` | run p, then build the next parser from its value |
| `p <?> "lbl"` | `label(p, "lbl")` | rename p for nicer error messages |
Precedence mirrors Haskell: <$>, <*>, <* and *> bind tightest, then <|>, then >>=, then <?> (the loosest, so it labels a whole parser). All are left-associative.
import core
import vendor/bernparsec
-- a token: a signed number, trailing spaces eaten
num = lexeme(signed(decimal()))
-- "a price tag like $12, keep the number"
price = char('$') *> num
parseTest(price, "$12")
-- Output: 12
-- "two numbers separated by a comma, in brackets"
pair = (\x, y -> [x, y]) <$> (num <* symbol(",")) <*> num
parseTest(inBrackets(pair), "[ 3, 4 ]")
-- Output: [3, 4]
-- choice + label
keyword = string("if") <|> string("else") <|> string("end") <?> "a keyword"
parseTest(keyword, "else")
-- Output: "else"
The operators are defined in vendor/bernparsec with Bern's custom-operator feature, and they reach your file through import like any other definition.
Every core combinator has a plain-English alias that forwards to it. Use whichever reads better - or mix both in the same grammar.
| Friendly alias | Core name |
|---|---|
| literalChar | char |
| anyChar | anySingle |
| charIn | oneOf |
| charNotIn | noneOf |
| charWhere | satisfy |
| text | string |
| endOfInput | eof |
| succeed | pure |
| failWith | fail_parser |
| digit / letter | digitChar / letterChar |
| letterOrDigit | alphaNumChar |
| whitespaceChar | spaceChar |
| mapResult | mapP |
| andThen | bind |
| keepRight / keepLeft | thenP / before |
| orTry | orElse |
| firstOf | choice |
| zeroOrMore / oneOrMore | many / some |
| repeatExactly | count |
| optionally | optional |
| separatedBy / separatedBy1 | sepBy / sepBy1 |
| repeatUntil | manyTill |
| peek / notAhead | lookAhead / notFollowedBy |
| describe | label |
| deferred | lazy |
| surroundedBy | between |
| chainLeft | chainl1 |
| token | lexeme |
| digits / wholeNumber | decimal |
| integerNumber | integer |
| decimalNumber | float |
| withSign | signed |
| inParens / inBrackets / inBraces | parens / brackets / braces |
| run / quickParse / prettyError | parse / parseTest / errorBundlePretty |
Every parser can also be expressed as the BPParser ADT (the BP* constructors), which is handy when you want a parser you can pattern-match on or pass around as plain data. parser_from_adt / normalize_parser translate the ADT form into the dictionary form the interpreter runs.
-- a parser described as data, then run
p = parser_from_adt(BPString("hi"))
parseTest(p, "hi there")
-- Output: "hi"
The result, state, and Maybe values have ADT forms too - BPOk value state / BPErr expected message state, BPState input source index line column, and BPJust value / BPNothing. The parseADT, runParserADT, and parseTestADT functions run a parser and hand back the ADT result instead of the dictionary.
parseADT(string("hi"), "src", "hi")
-- Output: BPOk("hi", BPState("hi", "src", 2, 1, 3))
The combinators shine when composed into a real grammar. Here is a small arithmetic evaluator that combines lexeme, parens, lazy, chainl1, and choice to parse and evaluate expressions with correct precedence:
import vendor/bernparsec
-- a number, or a parenthesised expression (recursion via lazy)
def factor() -> choice([
    lexeme(signed(float())),
    lexeme(signed(decimal())),
    parens(lazy(expr))
])
-- '*' and '/' bind tighter than '+' and '-'
def mulOp() -> choice([
    mapP(symbol("*"), \_ -> '*'),
    mapP(symbol("/"), \_ -> '/')
])
def term() -> chainl1(factor(), mulOp(), \a, op, b ->
    if op == '*' then a * b else a / b end)
def addOp() -> choice([
    mapP(symbol("+"), \_ -> '+'),
    mapP(symbol("-"), \_ -> '-')
])
def expr() -> chainl1(term(), addOp(), \a, op, b ->
    if op == '+' then a + b else a - b end)
parseTest(expr(), "2 + 3 * 4")
-- Output: 14
parseTest(expr(), "(2 + 3) * 4")
-- Output: 20
And a key-value config line parser that brings together some, bind, sepBy, and before:
import vendor/bernparsec
ident = mapP(some(letterChar()), \cs -> "" <> cs)
val = mapP(some(noneOf(";")), \cs -> "" <> cs)
pair = bind(before(ident, symbol("=")), \k ->
    mapP(val, \v -> [k, v]))
config = sepBy(lexeme(pair), symbol(";"))
parseTest(config, "host=localhost; port=8080")
-- Output: [["host", "localhost"], ["port", "8080"]]