Spent some time on Parsec recently, and would like to take some notes for personal reflection, which hopefully could be of any use to others.

One simple calculator is a good example to illustrate how Parsec could be used, so we would ignore all kind of details, and only focus on the most essential part.

The intuitive way of writing expr using BNF would be something like below, and most parser generators would accept it and generate the intended parser given some extra rules for precedence and association.

expr = expr + expr
     | expr - expr
     | expr * expr
     | expr / expr
     | term

However, Parsec doesn’t support left recursion; the above rules would cause infinite loop in Parsec. Fortunately, this problem has gained enough attention so that dedicate solutions are provided in Parsec.

Before jumping to the final answer, it’s good to have a high level overview about how Parsec works from users’ perspective. Firstly, we need to provide the definition of the language you want to parse, which includes but not limited to, reserved operations, keywords, comment style, etc. The specific module for this task is Text.Parsec.Language. Secondly, we build lexers from the language definition from first step, using module Text.Parsec.Token. The lexers constructed this way could take care of trailing spaces, which makes parsing much easier to work with. Last, we create the expression parser using buildExpressionParser from module Text.Parsec.Expr. Here we could define all the operations supported, prefixed, postfixed, and infixed, along with the precedence and association.

The complete code is shown below, less than 50 lines.

import Text.Parsec
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as P
import Text.Parsec.Expr
import Control.Applicative ((<$>))

lexer =
  P.makeTokenParser emptyDef{
    P.reservedOpNames = ["+", "-", "*", "/"]

reservedOp = P.reservedOp lexer
parens     = P.parens lexer
natural    = P.natural lexer
float      = P.float lexer

expr = buildExpressionParser table term

term = parens expr
    <|> fromIntegral <$> natural
    <|> float

parse_expr = expr

table = [ [prefix "-" negate]
  , [postfix "++" (+1), postfix "--" ((-)1)]
  , [binary "*" (*) AssocLeft, binary "/" (/) AssocLeft]
  , [binary "+" (+) AssocLeft, binary "-" (-) AssocLeft]

prefix name f       = Prefix $ do {reservedOp name; return f}
postfix name f      = Postfix $ do {reservedOp name; return f}
binary name f assoc = Infix (do {reservedOp name; return f}) assoc

p = print . (parse parse_expr "input")

inputs = [
  "1"       , "1.0"     ,
  "-1"      , "-1.0"    ,
  "1++"     , "1.0--"   ,
  "1*2"     , "1/2"     ,
  "1+2"     , "1-2"     ,
  "(1+2)*2" , "(1-2)/2"

main = do
  mapM_ p inputs