Regular expressions, commonly referred to as “regex” or “regexp”, are a special kind of text string used to describe a search pattern for string data. Put simply, regular expressions find patterns in strings. Neat, huh? This post will cover how to use regular expressions, considerations when using special characters, and character classes.
Regular expressions are both confusing and powerful, and understanding them will definitely aid us in our programming endeavors. The concept of regular expressions originated in 1956, when Stephen Cole Kleene, an American mathematician, created a formal description of regular languages using his own mathematical notation called regular sets. Regular expressions are used to power search engines, text editor “find and replace” features, and even lexical analysis. However, we’re going to focus on regular expressions in the context of programming with JavaScript, and as such we’ll be using regular expressions with the string methods search, replace, match, and split. Additionally, we’re able to use regular expressions with the test and exec methods built into the RegExp object.
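To get a feel for those six methods before digging in, here is a minimal sketch that calls each one once. The sample sentence, the pattern, and the variable names are made up for illustration.

```javascript
// Hypothetical sample data; the string and pattern are illustrative only.
const sentence = 'The quick brown fox';
const pattern = /quick/;

// String methods that accept a regular expression
const position = sentence.search(pattern);          // index of the first match, or -1
const replaced = sentence.replace(pattern, 'slow'); // new string with the first match replaced
const matched = sentence.match(pattern);            // array describing the match, or null
const words = sentence.split(/ /);                  // array of pieces, split at each match

// RegExp methods
const found = pattern.test(sentence);   // true or false
const details = pattern.exec(sentence); // array describing the match, or null

console.log(position, replaced, matched[0], words, found, details.index);
```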
Using Regular Expressions
There are two ways to write regular expressions: as a regular expression literal or by using the RegExp object constructor. The simplest regular expression would consist of a single character and find the first occurrence of that character in a string. Both of the regular expressions in the example below contain the same search pattern.
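A minimal sketch of the two syntaxes, assuming a single-character pattern a and a hypothetical sample string:

```javascript
// Regular expression literal: the pattern sits between two forward slashes.
const literalRegex = /a/;

// RegExp constructor: the pattern is passed in as a string.
const constructorRegex = new RegExp('a');

// Hypothetical sample string, chosen only for illustration.
const fruit = 'banana';

// Both regexes hold the same pattern, so both find the first "a".
console.log(fruit.search(literalRegex));     // 1
console.log(fruit.search(constructorRegex)); // 1
```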
Regular expression literals are compiled when the script that invokes them loads, whereas regex objects are compiled at run time. So how do we choose which regex syntax to use? If our regex pattern is going to remain constant, then using a regex literal can help improve performance. Conversely, if our regex pattern is going to change or be pulled in from an external source (like user input), then we want to use the RegExp object constructor.
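As a rough sketch of the second case, the helper below builds its pattern from a value that is only known at run time. The function name containsTerm and the sample values are hypothetical.

```javascript
// Hypothetical helper: builds a regex from a search term supplied at run time.
function containsTerm(text, term) {
  const regex = new RegExp(term); // the pattern is not known until this line runs
  return regex.test(text);
}

const userInput = 'fox'; // stand-in for a value read from a form field
console.log(containsTerm('The quick brown fox', userInput)); // true
console.log(containsTerm('The quick brown fox', 'cat'));     // false
```

Keep in mind that any special characters in the term are interpreted as regex syntax rather than as literal text.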
Special Characters
It’s worth noting that there are some special considerations when it comes to using certain characters with the regular expression literal syntax. Because a regex literal is delimited by forward slashes, any forward slash that we want to be part of the search pattern needs to be escaped with a backslash. Additionally, there are 12 characters that have special meanings in regular expressions, and as such they also need to be escaped with a backslash if we want to use them as literals in a regex pattern:
? question mark
+ plus sign
$ dollar sign
* asterisk
\ backslash
. period
| pipe
^ caret
( opening parenthesis
) closing parenthesis
[ opening square bracket
{ opening curly brace
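Here is a short sketch of escaping in practice. The sample strings are made up for illustration, and only a few of the special characters are shown.

```javascript
// A forward slash must be escaped inside a regex literal.
const slash = /\//;
console.log(slash.test('2024/01/15')); // true

// An unescaped period is a wildcard; an escaped period matches only ".".
console.log(/1.5/.test('1x5'));  // true
console.log(/1\.5/.test('1x5')); // false
console.log(/1\.5/.test('1.5')); // true

// Escaping "+" and "?" so they are treated as ordinary characters.
const equation = '1+1=2?';
console.log(equation.search(/\+/)); // 1
console.log(equation.search(/\?/)); // 5
```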
Character Classes/Sets
A character class (also referred to as a set) is a way to search for a specific character or set of characters out of various combinations of characters within a string. Say we wanted to find both the American and British spellings of the word ‘grey’ using a regex pattern. We could construct our pattern using character classes as follows:
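A minimal sketch of such a pattern, assuming the two spellings are ‘gray’ and ‘grey’. The variable names and sample sentences are hypothetical.

```javascript
// [ae] matches exactly one character: either "a" or "e".
const greyOrGray = /gr[ae]y/;

// Hypothetical sample sentences.
const british = 'The cat is grey';
const american = 'The cat is gray';

console.log(british.match(greyOrGray)[0]);  // "grey"
console.log(american.match(greyOrGray)[0]); // "gray"

// Any other letter in that position does not match.
console.log(greyOrGray.test('groy')); // false

// Swapping the characters inside the brackets gives an equivalent pattern.
console.log(/gr[ea]y/.test('gray')); // true
```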
It’s worth noting that the order of characters in a character class doesn’t matter. Additionally, you can invert/negate a character class by inserting a caret after the opening square bracket, resulting in a regex pattern that finds any character that is not in the character class. In the example below, the function noZAfterA returns an array with the matched regex pattern as its first element. The regex pattern a[^z] matches the ap in apple but it does not match mocha; this is because there is no character after the ‘a’ for our pattern to match.
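One way noZAfterA could be written, assuming it simply wraps the string match method. The function body and the extra test word lazy are assumptions made for this sketch.

```javascript
// Assumed implementation: the function name comes from the text above,
// but this body is a guess at the simplest version that fits the description.
function noZAfterA(str) {
  // "a" followed by any single character that is not "z"
  return str.match(/a[^z]/);
}

console.log(noZAfterA('apple')[0]); // "ap"
console.log(noZAfterA('mocha'));    // null, because nothing follows the "a"
console.log(noZAfterA('lazy'));     // null, because the "a" is followed by "z"
```

Because match returns null when nothing is found, calling code should check the result before reading its first element.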
Conclusion
Regular expressions ought to be part of every programmer’s toolkit, and having a solid understanding of regex patterns and how to use them can empower us to search, filter, and validate (among other things) all manner of string data. In Intermediate Regular Expressions we will dig into more advanced regex patterns that include repeating patterns, shorthand for character classes, and using anchors to mark positions within a string.