Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Handling different escaping sequences?

I'm using ANTLR with the Presto grammar to parse SQL queries. This is the original string definition I used:

STRING
    : '\'' ( '\\' .       // match \ followed by any char
           | ~[\\']       // match anything other than \ and '
           | '\'\''       // match ''
           )*
      '\''
    ;

This worked for most queries until I ran into queries that use different escaping rules. For example:

select 
table1(replace(replace(some_col,'\'',''),'"' ,'')) as features 
from table1

So I modified my STRING definition, and it now looks like this:

STRING
    : '\'' ( '\\' .
           | '\\\\' . {HelperUtils.isNeedSpecialEscaping(this)}?   // match two backslashes followed by any char
           | ~[\\']       // match anything other than \ and '
           | '\'\''       // match ''
           )*
      '\''
    ;

However, this doesn't work for the query above, because I get

'\'',''),'

as a single string token. The predicate returns true for this query. Any idea how I can handle this query as well?
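One way to see where '\'',''),' comes from is to scan the example query under both escaping conventions. Below is a minimal Java sketch, not the ANTLR-generated lexer, that mimics the alternatives of the STRING rules. The class SqlStringScanner, its methods and the backslashEscapes flag are hypothetical names; the flag stands in for a predicate such as HelperUtils.isNeedSpecialEscaping(this). It assumes that, when the bad token appears, the lexer is effectively treating the backslash as an ordinary character.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hand-written scanner that mimics the STRING alternatives.
// Class, method and flag names are hypothetical; the flag plays the role of
// a lexer predicate such as HelperUtils.isNeedSpecialEscaping(this).
public class SqlStringScanner {

    // Scans one single-quoted literal starting at 'start' (must be a quote).
    // Returns the index just past the closing quote, or -1 if it never closes.
    static int scanLiteral(String s, int start, boolean backslashEscapes) {
        if (s.charAt(start) != '\'') {
            throw new IllegalArgumentException("literal must start with a quote");
        }
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean hasNext = i + 1 < s.length();
            if (backslashEscapes && c == '\\' && hasNext) {
                i += 2;               // like '\\' . : backslash plus any char
            } else if (c == '\'' && hasNext && s.charAt(i + 1) == '\'') {
                i += 2;               // like '\'\'' : doubled quote
            } else if (c == '\'') {
                return i + 1;         // closing quote
            } else {
                i += 1;               // any other char (including a backslash when escapes are off)
            }
        }
        return -1;
    }

    // Returns every string literal in 'sql', skipping all other text.
    // An unterminated literal is returned as-is up to the end of the input.
    static List<String> literals(String sql, boolean backslashEscapes) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sql.length()) {
            if (sql.charAt(i) != '\'') {
                i++;
                continue;
            }
            int end = scanLiteral(sql, i, backslashEscapes);
            if (end < 0) {
                out.add(sql.substring(i));
                break;
            }
            out.add(sql.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        String sql = "select table1(replace(replace(some_col,'\\'',''),'\"' ,'')) "
                   + "as features from table1";
        System.out.println("backslash is literal: " + literals(sql, false));
        System.out.println("backslash escapes:    " + literals(sql, true));
    }
}
```

With the flag off, the first literal comes out as '\'',''),', the same token reported in the question: the backslash is consumed as a plain character and each '' is read as an escaped quote, so the literal only closes at the quote just before ". With the flag on, the same text splits into '\'', '', '"' and ''.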

Thanks, Nir.

1 Reply

In the end I was able to solve it. This is the rule I ended up using:

STRING
    : '\'' ( '\\\\' . {HelperUtils.isNeedSpecialEscaping(this)}?
           | '\\' (~[\\] | . {!HelperUtils.isNeedSpecialEscaping(this)}?)
           | ~[\\']       // match anything other than \ and '
           | '\'\''       // match ''
           )*
      '\''
    ;
