Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Regular Expression to split by comma + ignores comma within double quotes. VB.NET

I'm trying to parse a CSV file with VB.NET.

The CSV file contains values like 0,"1,2,3",4, which a plain split on commas breaks into 5 fields instead of 3. There are many examples in other languages on Stack Overflow, but I can't get them to work in VB.NET. Here is my code so far, but it doesn't work...

 Dim t As String() = Regex.Split(str(i), ",(?=([^""]*""[^""]*"")*[^""]*$)")

1 Reply


Assuming your CSV is well-formed (i.e. no " other than those used to delimit string fields, or those escaped like \"), you can split on a comma that's followed by an even number of non-escaped " marks. (If the comma is inside a quoted field, an odd number of " marks is left on the rest of the line.)
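To make the counting rule concrete, here is a minimal VB.NET sketch that applies it without any regex: for each comma it counts the quote marks that follow, and splits only when that count is even. The names ParityDemo and SplitOnEvenQuoteCommas are made up for illustration, and this version ignores backslash escapes, so it only covers the simple case.

```vbnet
Imports System
Imports System.Collections.Generic

Module ParityDemo
    ' Hypothetical helper: split on every comma that is followed by an
    ' even number of double-quote characters (escapes are NOT handled).
    Function SplitOnEvenQuoteCommas(line As String) As List(Of String)
        Dim fields As New List(Of String)()
        Dim fieldStart As Integer = 0

        For i As Integer = 0 To line.Length - 1
            If line(i) <> ","c Then Continue For

            ' Count the quote marks between this comma and the end of the line.
            Dim quotesAfter As Integer = 0
            For j As Integer = i + 1 To line.Length - 1
                If line(j) = """"c Then quotesAfter += 1
            Next

            ' Even count: the comma is outside any quoted field, so split here.
            If quotesAfter Mod 2 = 0 Then
                fields.Add(line.Substring(fieldStart, i - fieldStart))
                fieldStart = i + 1
            End If
        Next

        ' Whatever follows the last split point is the final field.
        fields.Add(line.Substring(fieldStart))
        Return fields
    End Function

    Sub Main()
        ' The sample line from the question: 0,"1,2,3",4
        For Each field As String In SplitOnEvenQuoteCommas("0,""1,2,3"",4")
            Console.WriteLine(field)
        Next
    End Sub
End Module
```

For the sample line, only the commas after 0 and before 4 have an even number of quote marks after them (2 and 0), so the line splits into three fields. The lookahead in the regex performs the same check in a single pattern.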

The regex you've tried is almost there.

The following looks for a comma followed by an even number of any sort of quote marks:

,(?=([^"]*"[^"]*")*[^"]*$)

To modify it to look for an even number of non-escaped quote marks (assuming quote marks are escaped with a backslash, like \"), I replace each [^"] with ([^"\\]|\\.). This means "match a character that isn't a " and isn't a backslash, OR match a backslash and the character immediately following it".

,(?=(([^"\\]|\\.)*"([^"\\]|\\.)*")*([^"\\]|\\.)*$)

(The backslash is doubled because I want to match a literal backslash.)

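Now to get it into VB.NET you just need to double all your quote marks. The backslashes stay as they are, since a backslash is not an escape character in a VB.NET string literal. One .NET-specific detail: Regex.Split also puts the text of any capturing groups into the result array, so the groups are written as non-capturing (?:...) here:

splitRegex = ",(?=(?:(?:[^""\\]|\\.)*""(?:[^""\\]|\\.)*"")*(?:[^""\\]|\\.)*$)"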
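As a rough usage sketch, the pattern could be plugged into Regex.Split like this. It assumes quote marks inside fields are escaped with a backslash, and it uses non-capturing groups (?:...) because Regex.Split adds captured group text to the returned array. The module name, the SplitRegex field, and the sample lines are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CsvSplitDemo
    ' Split on a comma followed by an even number of non-escaped quote marks.
    ' Groups are non-capturing so Regex.Split returns only the fields.
    Private ReadOnly SplitRegex As New Regex(
        ",(?=(?:(?:[^""\\]|\\.)*""(?:[^""\\]|\\.)*"")*(?:[^""\\]|\\.)*$)")

    Sub Main()
        ' Illustrative input: the line from the question, plus one line
        ' with backslash-escaped quotes inside a quoted field.
        Dim lines As String() = {
            "0,""1,2,3"",4",
            "a,""say \""hi\"", then go"",b"
        }

        For Each line As String In lines
            Dim fields As String() = SplitRegex.Split(line)
            Console.WriteLine("{0} fields:", fields.Length)
            For Each field As String In fields
                Console.WriteLine("  " & field)
            Next
        Next
    End Sub
End Module
```

Both sample lines should come back as three fields, with the quoted field kept whole (the surrounding quote marks are still part of the field text). If the file can contain quoted fields that span several lines, a dedicated parser such as Microsoft.VisualBasic.FileIO.TextFieldParser is probably a better fit than a per-line regex.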
