Regular Expressions is a sequence of characters that forms a pattern, which is mainly used for search and replace. The purpose of creating a pattern is to match specific strings, so that the developer can extract characters based on conditions and replace certain characters.
RegExp object helps the developers to match the pattern of strings and the properties and methods help us to work with Regular Expressions easily. It is similar to RegExp in JavaScript
Pattern − The Pattern method represents a string that is used to define the regular expression and it should be set before using the regular expression object.
IgnoreCase − A Boolean property that represents if the regular expression should be tested against all possible matches in a string if true or false. If not specified explicitly, IgnoreCase value is set to False.
Global − A Boolean property that represents if the regular expression should be tested against all possible matches in a string. If not specified explicitly, Global value is set to False.
Test(search-string) − The Test method takes a string as its argument and returns True if the regular expression can successfully be matched against the string, otherwise False is returned.
Replace(search-string, replace-string) − The Replace method takes 2 parameters. If the search is successful then it replaces that match with the replace-string, and the new string is returned. If there are no matches then the original search-string is returned.
Execute(search-string) − The Execute method works like Replace, except that it returns a Matches collection object, containing a Match object for each successful match. It doesn't modify the original string.
The Matches collection object is returned as a result of the Execute method. This collection object can contain zero or more Match objects and the properties of this object are read-only.
Count − The Count method represents the number of match objects in the collection.
Item − The Item method enables the match objects to be accessed from matches collections object.
The Match object is contained within the matches collection object. These objects represent the successful match after the search for a string.
FirstIndex − It represents the position within the original string where the match occurred. This index are zero-based which means that the first position in a string is 0.
Length − A value that represents the total length of the matched string.
Value − A value that represents the matched value or text. It is also the default value when accessing the Match object.
The pattern building is similar to PERL. Pattern building is the most important thing while working with Regular Expressions. In this section, we will deal with how to create a pattern based on various factors.
The significance of position matching is to ensure that we place the regular expressions at the correct places.
Symbol | Description |
---|---|
^ | Matches only the beginning of a string. |
$ | Match only the end of a string. |
\b | Matches any word boundary |
\B | Matches any non-word boundary |
Any form of characters such as alphabet, number or special character or even decimal, hexadecimal can be treated as a Literal. Since few of the characters have already got a special meaning within the context of Regular Expression, we need to escape them using escape sequences.
Symbol | Description |
---|---|
Alphanumeric | Matches alphabetical and numerical characters only. |
\n | Matches a new line. |
\[ | Matches [ literal only |
\] | Matches ] literal only |
\( | Matches ( literal only |
\) | Matches ) literal only |
\t | Matches horizontal tab |
\v | Matches vertical tab |
\| | Matches | literal only |
\{ | Matches { literal only |
\} | Matches } literal only |
\\ | Matches \ literal only |
\? | Matches ? literal only |
\* | Matches * literal only |
\+ | Matches + literal only |
\. | Matches . literal only |
\b | Matches any word boundary |
\B | Matches any non-word boundary |
\f | Matches a form feed |
\r | Matches carriage return |
\xxx | Matches the ASCII character of an octal number xxx. |
\xdd | Matches the ASCII character of an hexadecimal number dd. |
\uxxxx | Matches the ASCII character of an UNICODE literal xxxx. |
The character classes are the Pattern formed by customized grouping and enclosed within [ ] braces. If we are expecting a character class that should not be in the list, then we should ignore that particular character class using the negative symbol, which is a cap ^.
Symbol | Description |
---|---|
[xyz] | Match any of the character class enclosed within the character set. |
[^xyz] | Matches any of the character class that are NOT enclosed within the character set. |
. | Matches any character class except \n |
\w | Match any word character class. Equivalent to [a-zA-Z_0-9] |
\W | Match any non-word character class. Equivalent to [^a-zA-Z_0-9] |
\d | Match any digit class. Equivalent to [0-9]. |
\D | Match any non-digit character class. Equivalent to [^0-9]. |
\s | Match any space character class. Equivalent to [ \t\r\n\v\f] |
\S | Match any space character class. Equivalent to [^\t\r\n\v\f] |
Repetition matching allows multiple searches within the regular expression. It also specifies the number of times an element is repeated in a Regular Expression.
Symbol | Description |
---|---|
* | Matches zero or more occurrences of the given regular Expression. Equivalent to {0,}. |
+ | Matches one or more occurrences of the given regular Expression. Equivalent to {1,}. |
? | Matches zero or one occurrences of the given regular Expression. Equivalent to {0,1}. |
{x} | Matches exactly x number of occurrences of the given regular expression. |
{x,} | Match atleast x or more occurrences of the given regular expression. |
{x,y} | Matches x to y number of occurences of the given regular expression. |
Alternation and grouping helps developers to create more complex Regular Expressions in particularly handling intricate clauses within a Regular Expression which gives a great flexibility and control.
Symbol | Description |
---|---|
0 | Grouping a clause to create a clause. "(xy)?(z)" matches "xyz" or "z". |
| | Alternation combines one regular expression clause and then matches any of the individual clauses. "(ij)|(23)|(pq)" matches "ij" or "23" or "pq". |
Given below are a few examples that clearly explain how to build a Regular Expression.
Regular Expression | Description |
---|---|
"^\s*.." and "..\s*$" | Represents that there can be any number of leading and trailing space characters in a single line. |
"((\$\s?)|(#\s?))?" | Represents an optional $ or # sign followed by an optional space. |
"((\d+(\.(\d\d)?)?))" | Represents that at least one digit is present followed by an optional decimals and two digits after decimals. |
The below example checks whether or not the user entered an email id whose format should match such that there is an email id followed by '@' and then followed by domain name.
<!DOCTYPE html> <html> <body> <script language = "vbscript" type = "text/vbscript"> strid = "welcome.user@howcodex.co.us" Set re = New RegExp With re .Pattern = "^[\w-\.]{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}$" .IgnoreCase = False .Global = False End With ' Test method returns TRUE if a match is found If re.Test( strid ) Then Document.write(strid & " is a valid e-mail address") Else Document.write(strid & " is NOT a valid e-mail address") End If Set re = Nothing </script> </body> </html>