Regular Expressions in C# – Advanced Language Elements

Grouping Constructs

  • ( ) – captures the matched substring (or acts as a noncapturing group when the RegexOptions.ExplicitCapture option is set). Captures using ( ) are numbered automatically based on the order of the opening parenthesis, starting from one. Capture number zero is always the text matched by the whole regular expression pattern
  • (?<name> ) – captures the matched substring into a named or numbered group. The string used for name must not contain any punctuation and cannot begin with a digit unless it is entirely a number. You can use single quotes instead of angle brackets; for example, (?'name' )
  • (?<name1-name2> ) – balancing group definition. Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets
  • (?: ) – noncapturing group
  • (?imnsx-imnsx: ) – applies or disables the specified options within the subexpression. For example, (?i-s: ) turns on case insensitivity and disables single-line mode
  • (?= ) – zero-width positive lookahead assertion. Continues match only if the subexpression matches at this position on the right. For example, \w+(?=\d) matches a word followed by a digit, without matching the digit. Once the subexpression has matched, the engine does not backtrack into it
  • (?! ) – zero-width negative lookahead assertion. Continues match only if the subexpression does not match at this position on the right. For example, \b(?!un)\w+\b matches words that do not begin with un
  • (?<= ) – zero-width positive lookbehind assertion. Continues match only if the subexpression matches at this position on the left. For example, (?<=19)99 matches instances of 99 that follow 19. Once the subexpression has matched, the engine does not backtrack into it
  • (?<! ) – zero-width negative lookbehind assertion. Continues match only if the subexpression does not match at this position on the left. For example, (?<!19)99 matches instances of 99 that do not follow 19
  • (?> ) – nonbacktracking subexpression (also known as a “greedy” subexpression or atomic group). The subexpression is fully matched once and then does not participate piecemeal in backtracking; that is, the subexpression matches only strings that would be matched by the subexpression alone
  • Named captures are numbered sequentially, based on the left-to-right order of the opening parenthesis (like unnamed captures), but numbering of named captures starts after all unnamed captures have been counted.
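A few of the grouping constructs above can be tried out with a short sketch like the following. The class and method names (GroupingDemo, GroupOrder, AllMatches, IsBalanced) and the sample inputs are made up for illustration; only Regex, Match and their members come from System.Text.RegularExpressions. The balanced-parentheses pattern is one common way to use a balancing group with name1 omitted, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupingDemo
{
    // Unnamed groups are numbered first (1 and 2); the named group "digit" becomes 3.
    public static string[] GroupOrder()
    {
        Match m = Regex.Match("a1 ", @"(\w)(?<digit>\d)(\s)");
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    // Hypothetical helper: collects the text of every match of pattern in input.
    public static List<string> AllMatches(string input, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            result.Add(m.Value);
        return result;
    }

    // Balancing group: each "(" pushes a capture onto "open", each ")" pops one.
    // The final (?(open)(?!)) fails the match if any "(" is still unclosed.
    public static bool IsBalanced(string input)
    {
        return Regex.IsMatch(input, @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", GroupOrder()));                           // a| |1

        // Negative lookahead: words that do not begin with "un".
        Console.WriteLine(string.Join(",", AllMatches("happy unhappy done", @"\b(?!un)\w+\b")));

        // Positive lookbehind: only the 99 that follows 19.
        Console.WriteLine(Regex.Match("1999 2099", @"(?<=19)99").Index);             // 2

        // Nonbacktracking group: a+ keeps all three a's, so the trailing "ab" cannot match.
        Console.WriteLine(Regex.IsMatch("aaab", @"a+ab"));                           // True
        Console.WriteLine(Regex.IsMatch("aaab", @"(?>a+)ab"));                       // False

        Console.WriteLine(IsBalanced("(a(b)c)"));                                    // True
        Console.WriteLine(IsBalanced("(a(b)c"));                                     // False
    }
}
```

Accessing the named group as m.Groups["digit"] returns the same capture as m.Groups[3] in this pattern.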

Backreference Constructs

  • \number – backreference. For example, (\w)\1 finds doubled word characters
  • \k<name> – named backreference. For example, (?<char>\w)\k<char> finds doubled word characters. The expression (?<43>\w)\43 does the same. You can use single quotes instead of angle brackets; for example, \k'char'
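As a quick check of the backreference forms above, the sketch below runs the numbered, angle-bracket and single-quote variants against the same input. BackreferenceDemo, Doubled and the sample word are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Hypothetical helper: returns every doubled-character match found by the given pattern.
    public static string Doubled(string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches("bookkeeper", pattern))
            found.Add(m.Value);
        return string.Join(",", found);
    }

    public static void Main()
    {
        Console.WriteLine(Doubled(@"(\w)\1"));               // numbered backreference
        Console.WriteLine(Doubled(@"(?<char>\w)\k<char>"));  // named, angle brackets
        Console.WriteLine(Doubled(@"(?'char'\w)\k'char'"));  // named, single quotes
        // Each line prints: oo,kk,ee
    }
}
```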

Alternation Constructs

  • | – matches any one of the terms separated by the | (vertical bar) character. The leftmost successful match wins
  • (?(expression)yes|no) – matches the “yes” part if the expression matches at this point; otherwise, matches the “no” part. The “no” part can be omitted. The expression can be any valid subexpression, but it is turned into a zero-width assertion, so this syntax is equivalent to (?(?=expression)yes|no). Note that if the expression is the name of a named group or a capturing group number, the alternation construct is interpreted as a capture test (described in the next item of this list). To avoid confusion in these cases, you can write the lookahead (?=expression) explicitly
  • (?(name)yes|no) – matches the “yes” part if the named capture string has a match; otherwise, matches the “no” part. The “no” part can be omitted. If the given name does not correspond to the name or number of a capturing group used in this expression, the alternation construct is interpreted as an expression test (described in the preceding item of this list)
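The three alternation forms can be compared side by side in a small sketch. AlternationDemo and its methods are hypothetical names for this example, and the patterns are illustrative choices rather than anything prescribed by the list above. The expression test spells out the lookahead explicitly so it cannot be mistaken for a capture test.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Capture test: a closing ")" is required only if the optional "(" was captured.
    public static bool IsOptionallyParenthesized(string input)
    {
        return Regex.IsMatch(input, @"^(?<open>\()?\d+(?(open)\))$");
    }

    // Expression test: if a digit is next, require exactly three digits;
    // otherwise require a run of lowercase letters.
    public static string Tokens(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\b(?(?=\d)\d{3}|[a-z]+)\b"))
            found.Add(m.Value);
        return string.Join(",", found);
    }

    public static void Main()
    {
        // Plain alternation: the first alternative that succeeds wins, not the longest.
        Console.WriteLine(Regex.Match("catalog", "cat|catalog").Value);  // cat

        Console.WriteLine(IsOptionallyParenthesized("42"));    // True
        Console.WriteLine(IsOptionallyParenthesized("(42)"));  // True
        Console.WriteLine(IsOptionallyParenthesized("(42"));   // False
        Console.WriteLine(IsOptionallyParenthesized("42)"));   // False

        Console.WriteLine(Tokens("abc 123 12"));               // abc,123
    }
}
```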

Miscellaneous Constructs

  • (?imnsx-imnsx) – turns options such as case insensitivity on or off in the middle of a pattern. Option changes are effective until the end of the enclosing group. See also the grouping construct (?imnsx-imnsx: ), which is a cleaner form
  • (?# ) – inline comment inserted within a regular expression. The comment terminates at the first closing parenthesis character
  • # [to end of line] – X-mode comment. The comment begins at an unescaped # and continues to the end of the line. (Note that the x option or the RegexOptions.IgnorePatternWhitespace enumerated option must be activated for this kind of comment to be recognized.)
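The miscellaneous constructs are easiest to see in a short experiment. In the sketch below, MiscDemo, UnitPattern and the "12px" sample are hypothetical names and data for this example; RegexOptions.IgnorePatternWhitespace is the option named in the list above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MiscDemo
{
    // X-mode pattern: whitespace is ignored and # starts a comment that runs to the end of the line.
    public const string UnitPattern = @"
        \d+   # one or more digits
        px    # literal unit suffix
    ";

    public static bool MatchesUnit(string input)
    {
        return Regex.IsMatch(input, UnitPattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        // (?i) takes effect from where it appears: "a" stays case-sensitive, "b" does not.
        Console.WriteLine(Regex.IsMatch("aB", "a(?i)b"));   // True
        Console.WriteLine(Regex.IsMatch("AB", "a(?i)b"));   // False

        // The option change ends with the enclosing group, so "c" is case-sensitive again.
        Console.WriteLine(Regex.IsMatch("aBc", "(a(?i)b)c"));  // True
        Console.WriteLine(Regex.IsMatch("aBC", "(a(?i)b)c"));  // False

        // Inline comment: ignored by the engine, ends at the first ")".
        Console.WriteLine(Regex.IsMatch("12px", @"\d+(?# one or more digits)px"));  // True

        Console.WriteLine(MatchesUnit("12px"));  // True
    }
}
```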
