Thursday, March 31, 2011

How to get a regular expression to match items with spaces

The following regular expression works when the input contains no space character. When the input does contain a space, it fails: the result looks like a link, but the JavaScript doesn't run.

How can I change it?

The regular expression should work for both World and The World.

    Dim makebkz As String
    Dim pattern As String = "\(bkz: ([a-z0-9$&.öışçğü\s]+)\)"
    Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
    Dim myAnchor As New System.Web.UI.HtmlControls.HtmlAnchor()
    Dim postbackRef As String = Page.GetPostBackEventReference(myAnchor, "$1")
    myAnchor.HRef = postbackRef

    str = regex.Replace(str, "(bkz: <a href=javascript:" & myAnchor.HRef & "><font color=""#CC0000"">$1</font></a> )")
    makebkz = str
    Return makebkz
From stackoverflow
  • Actually, there is a token for a space character. It is '\s' (minus the quotes).

  • \s will match any whitespace character. Be sure to escape this properly.

  • Try making the space optional: \s* means there can be zero or more whitespace characters after bkz:. You can also use \s? if there is either no space or exactly one.

    Dim pattern As String = "\(bkz:\s*([a-z0-9$&.öışçğü\s]+)\)"
    

    However, since the character class inside the capture group also matches whitespace, you may want to require that the first character of the parenthesized match isn't a space. Here is a sample for that; it uses * for the remaining characters so that a single-character term still matches:

    Dim pattern As String = "\(bkz:\s*([a-z0-9$&.öışçğü][a-z0-9$&.öışçğü\s]*)\)"
    
  • Just to be clear, are you saying that something like (bkz: world) works, but (bkz: the world) does not?

    The regex you currently have will match both (verified in RegexBuddy), and your capture group should be fine: it should capture world in the first case and the world in the second.

    What is being stored in str after your call to Replace in the case where things aren't working? My guess is that the string you're generating is where the problem is, not the regex itself.

  • I think this line should be changed:

    str = regex.Replace(str, "(bkz: <a href=javascript:" & myAnchor.HRef & "><font color=""#CC0000"">$1</font></a> )")
    

    If I read your code correctly, you are including the $1 in the HRef (that's what Page.GetPostBackEventReference(myAnchor, "$1") is doing), so the replacement fills it in both in the href and in the text between the font tags. Your output would be something like:

    (bkz: <a href=javascript:__doPostBack(The World)><font color="#CC0000">The World</font></a> )
    

    If you update your replace function to this, it should work:

    str = regex.Replace(str, "(bkz: <a href=""javascript:" & myAnchor.HRef & """><font color=""#CC0000"">$1</font></a> )")
    

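    I.e. wrap the JavaScript call in quotes, and your world will be good. Without the quotes, the browser ends the href value at the first space, so a term such as The World cuts the JavaScript call short.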
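
The suggestions above can be combined into one small console program. This is only a sketch under stated assumptions, not the asker's actual page code: the constant PostbackRef is a hypothetical stand-in for whatever Page.GetPostBackEventReference returns at run time, and the control ID myAnchor inside it is made up for illustration. The module and function names are also invented here.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BkzDemo
    ' Hypothetical stand-in for the string returned by
    ' Page.GetPostBackEventReference(myAnchor, "$1"). The control ID
    ' 'myAnchor' is an assumption made for this example only.
    Private Const PostbackRef As String = "__doPostBack('myAnchor','$1')"

    Function MakeBkz(ByVal source As String) As String
        ' \s* allows optional whitespace after "bkz:". The capture group must
        ' start with a non-space character and may contain spaces after that.
        Dim pattern As String = "\(bkz:\s*([a-z0-9$&.öışçğü][a-z0-9$&.öışçğü\s]*)\)"
        Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)

        ' The href value is wrapped in double quotes (written as "" inside a
        ' VB string literal), so a space in the captured term no longer ends
        ' the attribute. $1 is substituted in both the href and the link text.
        Dim replacement As String = "(bkz: <a href=""javascript:" & PostbackRef &
            """><font color=""#CC0000"">$1</font></a> )"

        Return regex.Replace(source, replacement)
    End Function

    Sub Main()
        ' The two inputs named in the question.
        For Each sample As String In New String() {"(bkz: World)", "(bkz: The World)"}
            Console.WriteLine(MakeBkz(sample))
        Next
    End Sub
End Module
```

With the stand-in value above, the second input should come out as (bkz: <a href="javascript:__doPostBack('myAnchor','The World')"><font color="#CC0000">The World</font></a> ), with the whole JavaScript call inside the quoted href. The single quotes in the postback call do not clash with the double quotes around the attribute.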
