Monday, February 7, 2011

What is the proper regular expression for an unescaped backslash before a character?

Let's say I want to represent \q (or any other particular "backslash-escaped character"). That is, I want to match \q but not \\q, since the latter is a backslash-escaped backslash followed by a q. Yet \\\q would match, since it's a backslash-escaped backslash followed by a backslash-escaped q. (Well, it would match the \q at the end, not the \\ at the beginning.)

I know I need a negative lookbehind, but they always tie my head up in knots, especially since the backslashes themselves have to be escaped in the regexp.

  • Now You Have Two Problems.

    Just write a simple parser. If the regex ties your head up in knots now, just wait a month.

    Frank Krueger : What's with all the down-modding going on?
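
The "simple parser" suggestion above can be sketched roughly as follows. This is only an illustration in Python, not code from any poster; the function name `find_escaped` and its parameters are hypothetical. It walks the string left to right and treats every backslash as consuming the character after it, so a `\\` pair can never be mistaken for the start of `\q`.

```python
def find_escaped(text, char="q"):
    """Return the index of each backslash that escapes `char` in `text`.

    Hypothetical helper: a single left-to-right scan, no regex involved.
    """
    hits = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # A backslash always pairs with the character that follows it.
            if text[i + 1] == char:
                hits.append(i)
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return hits
```

For example, `find_escaped(r"\\\q")` reports the backslash at index 2, which is the `\q` at the end rather than the `\\` at the beginning.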
  • To clarify for all the others that seem to have trouble understanding the question:

    He's looking for a single character (non-space?) preceded by an ODD number of '\' characters.
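
The odd-number rule stated above can be checked directly by counting backslashes backwards from the character in question. A minimal sketch in Python, assuming the caller already knows the index of the character; `is_escaped` is a made-up name for illustration.

```python
def is_escaped(text, index):
    """True if text[index] is preceded by an odd number of backslashes."""
    count = 0
    j = index - 1
    # Count the unbroken run of backslashes immediately before the character.
    while j >= 0 and text[j] == "\\":
        count += 1
        j -= 1
    return count % 2 == 1
```

An even count (including zero) means every backslash in the run is paired with another backslash, so the character itself is not escaped.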

  • The best solution to this is to do your own string parsing, as regular expressions don't really support what you are trying to do. (Rep @Frank Krueger if you go this way; I'm just repeating his advice.)

    I did, however, take a shot at an exclusionary regex. This will match all strings that do not fit your criteria of a "\" followed by a character.

    (?:[\\][\\])(?!(([\\](?![\\])[a-zA-Z])))
    
    From Jared
  • Updated: My new and improved regex, supporting more than 3 backslashes:

    /(?<!\\)    # Not preceded by a single backslash
      (?>\\\\)* # an even number of backslashes
      \\q       # Followed by a \q
      /x;

    Or, if your regex library doesn't support extended syntax:

    /(?<!\\)(?>\\\\)*\\q/

    Output of my test program:

    q does not match
    \q does match
    \\q does not match
    \\\q does match
    \\\\q does not match
    \\\\\q does match

    Older version

    /(?:(?<!\\)|(?<=\\\\))\\q/
    Jared : leon, what language / program was that run in?
    Leon Timmermans : Perl, what else for you use for regular expression ;-)
    Jared : LOL, should have guessed :D
    Leon Timmermans : Just see how broken English that sentence it. It should have been 'what else do you use for regular expressions'
    Jared : Oh Man! I never even picked up on that. ugh, it's going to be a rough day.
  • Leon Timmermans got exactly what I was looking for. I would add one small improvement for those who come here later:

    /(?<!\\)(?:\\\\)*\\q/
    

    The additional ?: at the beginning of the (\\\\) group keeps it from being saved in any match data. I can't imagine a scenario where I'd want the text of that saved.

    Leon Timmermans : True. If you want it even better, you could do /(?<!\\)(?>\\\\)*\\q/. It has slightly better performance in the case of a non-match.
    l0b0 : Thanks James, works a charm. Example: To split a string at the first unescaped colon character in Python: `re.match(r'(.*?(?
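
For readers who want to reproduce the test table from the accepted answer outside Perl, here is a small Python sketch using the non-capturing form of the pattern. It is not the original test program, and the names `ESCAPED_Q` and `matches` are invented for this example. The atomic-group form `(?>...)` should behave the same way where the engine supports it; Python's `re` module only accepts that syntax from version 3.11 on, so the `(?:...)` form is used here.

```python
import re

# Non-capturing variant of the pattern from the thread, written as a raw
# string so the backslashes only need to be escaped once (for the regex).
ESCAPED_Q = re.compile(r"(?<!\\)(?:\\\\)*\\q")

def matches(text):
    """True if `text` contains a q escaped by an odd run of backslashes."""
    return ESCAPED_Q.search(text) is not None

samples = [r"q", r"\q", r"\\q", r"\\\q", r"\\\\q", r"\\\\\q"]
for s in samples:
    print(s, "does match" if matches(s) else "does not match")
```

Note that a successful match also includes any leading `\\` pairs, so the match start is not necessarily the position of the `\q` itself.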
