Regular Expressions in C#: Language Elements

Regular Expression Language Elements

Meta Characters

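  • . – matches any single character except \n (with RegexOptions.Singleline it also matches \n)
  • $ – matches the end of the string or the position just before a final \n; with RegexOptions.Multiline it matches the end of every line
  • ^ – matches the beginning of the string; with RegexOptions.Multiline it matches the beginning of every line
  • * – matches zero or more occurrences of the character or group immediately preceding
  • \ – the escape character. A metacharacter that follows it is treated as an ordinary (literal) character, so \. matches a period; followed by certain letters it instead introduces an escape such as \d or \t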
  • [] – matches any one of the characters between the brackets
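  • [a-z] – a range of characters can be specified with a hyphen
  • [^a-z] – a caret immediately after the opening bracket negates the class: matches any character except those listed or in the range
  • () – treats the expression between ( and ) as a group and captures the text it matches. Groups are numbered from 1 in the order of their opening parentheses and can be referenced as \1, \2, ... inside the pattern or as $1, $2, ... in a replacement string. .NET is not limited to nine groups, and named groups (?<name> ) are also supported
  • | – alternation: matches either the expression on its left or the one on its right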
  • + – matches one or more occurrences of the character or regular expression immediately preceding
  • ? – matches 0 or 1 occurrence of the character or regular expression immediately preceding
  • {n} – matches the preceding element exactly n times
  • {n,} – matches the preceding element at least n times
  • {n,m} – matches the preceding element at least n but no more than m times
  • *? – lazy *: matches zero or more times, but as few times as possible
  • +? – lazy +: matches one or more times, but as few times as possible
  • ?? – lazy ?: matches zero times if possible, otherwise once
  • {n}? – lazy {n}: behaves exactly like {n}, since the count is fixed
  • {n,}? – lazy {n,}: matches at least n times, but as few times as possible
  • {n,m}? – lazy {n,m}: matches between n and m times, but as few times as possible
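The sketch below puts several of these metacharacters side by side: greedy versus lazy quantifiers, a repeated group with alternation, a backreference, and how RegexOptions.Singleline changes the meaning of the dot. It assumes .NET 6 or later (C# top-level statements), and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input used only for illustration.
string html = "<b>bold</b> and <i>italic</i>";

// Greedy .* consumes as much as possible, then backtracks to the LAST '>'.
string greedy = Regex.Match(html, "<.*>").Value;

// Lazy .*? stops at the FIRST '>' that lets the whole pattern succeed.
string lazy = Regex.Match(html, "<.*?>").Value;

// The group (ab) repeated 2 to 3 times, followed by an alternation (c|d).
bool repeated = Regex.IsMatch("ababd", "^(ab){2,3}(c|d)$");

// Backreference \1 repeats the text captured by group 1 (finds a doubled word).
Match doubled = Regex.Match("this is is a test", @"\b(\w+) \1\b");

// By default '.' does not match '\n'; RegexOptions.Singleline changes that.
bool dotDefault = Regex.IsMatch("a\nb", "a.b");
bool dotSingleline = Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline);

Console.WriteLine(greedy);                          // <b>bold</b> and <i>italic</i>
Console.WriteLine(lazy);                            // <b>
Console.WriteLine(repeated);                        // True
Console.WriteLine(doubled.Groups[1].Value);         // is
Console.WriteLine($"{dotDefault} {dotSingleline}"); // False True
```

The greedy/lazy pair is the usual reason to reach for *?: when the input contains several closing delimiters, the lazy form stops at the nearest one.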

Character Escapes

  • \a – matches a bell (alarm) \u0007
  • \b – matches a backspace \u0008 if in a [] character class; otherwise, \b denotes a word boundary (between \w and \W characters). In a replacement pattern, \b always denotes a backspace
  • \t – matches a tab \u0009
  • \r – matches a carriage return \u000D
  • \v – matches a vertical tab \u000B
  • \f – matches a form feed \u000C
  • \n – matches a new line \u000A
  • \e – matches an escape \u001B
  • \040 – matches an ASCII character in octal notation (up to three digits); numbers with no leading zero are treated as backreferences if they have only one digit or if they correspond to a capturing group number
  • \x20 – matches an ASCII character using hexadecimal representation (exactly two digits)
  • \cC – matches an ASCII control character; for example, \cC is Ctrl+C (\u0003)
  • \u0020 – matches a Unicode character using hexadecimal representation (exactly four digits)
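A short check of several escapes, assuming .NET 6 or later. The patterns are written as verbatim strings (@"...") so that the backslash sequences reach the regex engine instead of being interpreted by the C# compiler; the inputs are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// The C# literal "\t" is already a real tab; the regex @"\t" is the engine's tab escape.
bool hasTab = Regex.IsMatch("name\tvalue", @"\t");

// \040 (octal), \x20 (hex) and \u0020 (Unicode) all denote the space character.
bool octal = Regex.IsMatch("a b", @"a\040b");
bool hex = Regex.IsMatch("a b", @"a\x20b");
bool unicode = Regex.IsMatch("a b", @"a\u0020b");

// \cC is the control character Ctrl+C, i.e. \u0003.
bool control = Regex.IsMatch("\u0003", @"\cC");

// Inside a character class, \b means backspace (\u0008), not a word boundary.
// Note: "\b" in a regular C# string literal is itself the backspace character.
bool backspace = Regex.IsMatch("\b", @"[\b]");

Console.WriteLine($"{hasTab} {octal} {hex} {unicode} {control} {backspace}");
// True True True True True True
```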

Substitutions

  • $number – substitutes the last substring matched by the group with that (decimal) number, for example $1
  • ${name} – substitutes the last substring matched by a named (?<name> ) group
  • $$ – substitutes a single “$” literal
  • $& – substitutes a copy of the entire match itself
  • $` – substitutes all the text of the input string before the match
  • $' – substitutes all the text of the input string after the match
  • $+ – substitutes the last group captured
  • $_ – substitutes the entire input string
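The substitutions are used in the replacement string passed to Regex.Replace. This sketch, assuming .NET 6 or later and a made-up date as input, exercises numbered and named groups, $$, $&, $`, $' and $_.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical ISO-style date used as sample input.
string date = "2024-05-17";

// $1, $2, $3 refer to numbered groups; here they reorder the date parts.
string byNumber = Regex.Replace(date, @"(\d{4})-(\d{2})-(\d{2})", "$3/$2/$1");

// ${name} refers to a named group declared with (?<name>...).
string byName = Regex.Replace(date, @"(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})", "${d}.${m}.${y}");

// $$ emits a literal '$' and $& re-inserts the whole match.
string price = Regex.Replace("cost: 42", @"\d+", "$$$&");

// $` and $' insert the input text before and after the match.
string around = Regex.Replace("abc", "b", "[$`|$']");

// $_ inserts the entire input string.
string whole = Regex.Replace("ab", "b", "($_)");

Console.WriteLine(byNumber); // 17/05/2024
Console.WriteLine(byName);   // 17.05.2024
Console.WriteLine(price);    // cost: $42
Console.WriteLine(around);   // a[a|c]c
Console.WriteLine(whole);    // a(ab)
```

Named groups make the replacement string self-documenting and keep working if another group is later added in front of them.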

Character Classes

  • \p{name} – matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges
  • \P{name} – matches any character not in the Unicode group or block range specified by {name}
  • \w – matches any word character
  • \W – matches any nonword character
  • \s – matches any white-space character
  • \S – matches any non-white-space character
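  • \d – matches any decimal digit (in .NET this includes all Unicode decimal digits, not only 0-9, unless RegexOptions.ECMAScript is set)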
  • \D – matches any nondigit
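A minimal sketch of the character classes in use, assuming .NET 6 or later; the input strings are illustrative. It also shows that .NET's \d accepts non-ASCII decimal digits such as ARABIC-INDIC DIGIT ONE (U+0661) unless RegexOptions.ECMAScript is specified.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \d+ finds runs of decimal digits.
string[] numbers = Regex.Matches("Order 12, qty 7", @"\d+")
    .Cast<Match>().Select(m => m.Value).ToArray();

// \p{Lu} matches any Unicode uppercase letter (general category Lu).
string capitals = string.Concat(Regex.Matches("Hello World", @"\p{Lu}")
    .Cast<Match>().Select(m => m.Value));

// \P{L} matches anything that is NOT a letter; removing those leaves only letters.
string lettersOnly = Regex.Replace("a1 b2", @"\P{L}", "");

// \s+ collapses runs of white space (spaces, tabs, newlines) into a single space.
string collapsed = Regex.Replace("a  b\t\tc", @"\s+", " ");

// \W+ splits on runs of non-word characters.
string[] words = Regex.Split("one, two;three", @"\W+");

// U+0661 is a Unicode decimal digit; ECMAScript mode restricts \d to [0-9].
bool unicodeDigit = Regex.IsMatch("\u0661", @"^\d$");
bool ecmaDigit = Regex.IsMatch("\u0661", @"^\d$", RegexOptions.ECMAScript);

Console.WriteLine(string.Join(",", numbers));     // 12,7
Console.WriteLine(capitals);                      // HW
Console.WriteLine(lettersOnly);                   // ab
Console.WriteLine(collapsed);                     // a b c
Console.WriteLine(string.Join("|", words));       // one|two|three
Console.WriteLine($"{unicodeDigit} {ecmaDigit}"); // True False
```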

Atomic Zero-Width Assertions

  • \A – specifies that the match must occur at the beginning of the string (ignores the Multiline option)
  • \Z – specifies that the match must occur at the end of the string or before \n at the end of the string (ignores the Multiline option)
  • \z – specifies that the match must occur at the end of the string (ignores the Multiline option)
  • \G – specifies that the match must occur at the point where the previous match ended. When used with Match.NextMatch(), this ensures that matches are all contiguous
  • \b – specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries – that is, at the first or last characters in words separated by any nonalphanumeric characters
  • \B – specifies that the match must not occur on a \b boundary
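The zero-width assertions are easiest to compare on input with embedded newlines. The sketch below assumes .NET 6 or later and uses a made-up two-line string; it contrasts ^ with and without RegexOptions.Multiline, shows that \A ignores that option, distinguishes \Z from \z, and demonstrates \b, \B and \G.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical two-line input that ends with a newline.
string text = "first line\nsecond line\n";

// ^ alone matches only at the start of the input...
int caretDefault = Regex.Matches(text, @"^\w+").Count;
// ...but with RegexOptions.Multiline it also matches right after each \n.
int caretMultiline = Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count;
// \A ignores Multiline and still anchors to the very start.
int anchorA = Regex.Matches(text, @"\A\w+", RegexOptions.Multiline).Count;

// \Z may match just before a final \n; \z only at the absolute end.
bool endBigZ = Regex.IsMatch(text, @"line\Z");
bool endSmallZ = Regex.IsMatch(text, @"line\z");

// \b requires a word boundary; \B requires the absence of one.
int wholeWord = Regex.Matches("cat concat category", @"\bcat\b").Count;
int insideWord = Regex.Matches("cat concat category", @"\Bcat").Count;

// \G forces each match to start where the previous one ended,
// so matching stops at the first non-digit.
string[] leadingDigits = Regex.Matches("123abc456", @"\G\d")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine($"{caretDefault} {caretMultiline} {anchorA}"); // 1 2 1
Console.WriteLine($"{endBigZ} {endSmallZ}");                     // True False
Console.WriteLine($"{wholeWord} {insideWord}");                  // 1 1
Console.WriteLine(string.Join(",", leadingDigits));              // 1,2,3
```

The \Z versus \z difference matters when validating whole inputs: a pattern ending in \Z (or a plain $) also accepts a trailing newline, while \z does not.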
