Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
519 views
in Technique[技术] by (71.8m points)

parsing - Generate parser that runs a received parser on the output of another parser and monadically joins the results

given the following type and function, meant to parse a field of a CSV field into a string:

type Parser resultType = ParsecT String () Identity resultType
cell :: Parser String 

I have implemented the following function:

customCell :: String -> Parser res  -> Parser res
customCell typeName subparser = 
  cell
    >>= either (const $ unexpected typeName) 
               return . parse (subparser <* eof) ""

Though I cannot stop thinking that I am not using the Monad concept as much as desired and that eventually there is a better way to merge the result of the inner with the outer parser, specially on what regards its failure.

Does anybody know how could I do so, or is this code what is meant to be done?

PS - I now realised that my type simplification is probably not appropriate and that maybe what I want is to replace the underlying Identity Monad by the Either Monad.... Unfortunately, I do not feel enough acquainted with monad transformers yet.

PS2 - What the hell is the underlying monad good for anyway?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Elaborating on @Daniel Wagner's answer... The way parsers are normally built with Parsec, you start with low-level parsers that parse specific characters (e.g., a plus sign or a digit), and you build parsers on top of them using combinators (like a many1 combinator that turns a parser that reads a single digit into a parser that reads one or more digits, or a monadic parse that parsers "one or more digits" followed by a "plus sign" followed by "one or more digits").

However, each parser, whether it's a low-level digit parser or a higher-level "addition expression" parser, is intended to be applied directly to the same input stream.

What you don't typically do is write a parser that gobbles a chunk of the input stream to produce, say, a String and another parser that parses that String (instead of the original input stream) and try to combine them. This is the kind of "vertical composition" that isn't directly supported by Parsec and looks unnatural and non-monadic.

As pointed out in the comments, there are some situations where vertical composition is the cleanest overall approach (like when you have one language embedded within the components or expressions of another language), but it's not the usual approach taken by a Parsec parser.

The bottom line in your application is that a cell parser that produces only a String is too specialized to be useful. A more useful Parsec framework for CSV files would be:

import Text.Parsec
import Text.Parsec.String

-- | `csv cell` parses a CSV file each of whose elements is parsed by `cell`
csv :: Parser a -> Parser [[a]]
csv cell = many (row cell)

-- | `row cell` parses a newline-terminated row of comma separated
--   `cell`-expressions
row :: Parser a -> Parser [a]
row cell = sepBy cell (char ',') <* char '
'

Now, you can write a custom cell parser that parses positive integers:

customCell :: Parser Int
customCell = read <$> many1 digit

and parse CSV files:

> parse (csv customCell) "" "1,2,3
4,5,6
"
Right [[1,2,3],[4,5,6]]
>

Here, instead of having a cell subparser that explicitly parses a comma-delimited cell into a string to be fed to a different parser, the "cell" is an implicit context in which a supplied cell parser is called to parse the underlying input stream at the appropriate point where one would expect a comma-delimited cell in the middle of a row in the middle of the input stream.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...