Aside from the meta characters we looked at previously, there are also special characters you can use in regex patterns.
As we saw, meta characters such as whitespace and digit are preceded by a backward slash. Special characters, by contrast, are literal symbol characters that have a special meaning in a pattern.
Some special characters can exist alone, while others cannot--they have to be used with other characters.
Let's look at the special characters we have in regular expressions.
Wildcard Character .
The dot character is called the Wildcard character. As a wildcard, it matches ANY character: a letter, a number, a space, and even a symbol. The only character it does not match by default is the newline character.
This special character can exist alone. Let's see an example pattern:
/b.{2}d/
This pattern matches any substring that starts with "b", followed by any two characters (the wildcard repeated twice), followed by "d".
A string example:
This pattern matches “b9ad” because "9" and "a" are covered by the wildcard character repeated twice.
“ba d” and “ba$d” are also matched.
If I replace the space in this second match with a newline:
You see that the pattern no longer matches the substring. That's because, as I mentioned earlier, the wildcard character does not match a newline. You can change this default behaviour with the newline flag s, as we saw in the flags lesson.
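Since the example strings are easier to follow in code, here is a minimal JavaScript sketch of the wildcard behaviour described above. The pattern comes from this lesson and the test strings are the ones mentioned in the text; the variable names are only for illustration.

```javascript
// The pattern from the lesson: "b", any two characters, then "d".
const pattern = /b.{2}d/;

console.log(pattern.test("b9ad"));  // true: "9" and "a" fill the two wildcard slots
console.log(pattern.test("ba d"));  // true: a space counts as a character
console.log(pattern.test("ba$d"));  // true: so does a symbol
console.log(pattern.test("ba\nd")); // false: the wildcard skips a newline by default

// Same pattern with the s flag, so the wildcard also matches a newline.
const patternWithS = /b.{2}d/s;
console.log(patternWithS.test("ba\nd")); // true
```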
Beginning Character ^
The caret character is called the Beginning character, and it is also known as an anchor. It matches at the beginning of a string, and only if the pattern that follows it is found there. Let's look at an example:
/^".{3,}"/
This pattern means, "match the beginning part of a string, if that part starts with a double quote, followed by any character repeated three or more times, followed by a double quote".
String example:
As you can see here, the substring at the beginning is a match, because it has a quote, followed by some characters that match the wildcard special character, then another quote.
But if I add a number to the beginning of this string:
You see that it is no longer a match. That's because we used the beginning special character, and the substring that matches our pattern is not at the beginning.
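Here is a small JavaScript sketch of the same idea. The pattern is the one from this section, but the two sample strings are hypothetical stand-ins that I made up for illustration.

```javascript
// Pattern from the lesson: a double quote at the very start of the string,
// three or more characters, then another double quote.
const startsWithQuote = /^".{3,}"/;

// Hypothetical sample strings (not taken from the original lesson).
const atStart = '"hello" she said';
const shifted = '1"hello" she said';

console.log(startsWithQuote.test(atStart));     // true
console.log(atStart.match(startsWithQuote)[0]); // "hello" (including the quotes)
console.log(startsWithQuote.test(shifted));     // false: the quote is no longer at the beginning
```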
One question you may have right now is:
"What is the difference between the Negated Character Class, and the Beginning Special Character? Since they both use the caret symbol?"
I'll explain that difference in the next lesson.
Beginning Substrings on Different Lines
Now, remember when we talked about the multiline flag m in the flags lesson? This is where we apply it.
Let's say we have two sentences in our string on different lines:
Our pattern has the beginning special character, followed by a character class with "o" and "i" followed by "pple".
Even though we applied a global flag g, you can see that the only match we have is “opple”. “ipple” is not matched even though it begins the next line and matches our specified pattern.
The reason for this is that by default, the beginning special character only matches at the start of the whole string. The regular expression doesn't treat a line break as the start of a new string; it handles all the lines together as one string.
But watch what happens when I apply the multiline flag m:
By applying this flag, you see that “ipple”, which begins the second line, is now matched.
The multiline flag changes the default behaviour by making the beginning special character match the beginning of each line, not just everything together as one string.
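To make the difference concrete, here is a short JavaScript sketch. The pattern follows the description above (the beginning character, a character class with "o" and "i", then "pple"), while the two-line string is a hypothetical example of mine.

```javascript
// Hypothetical two-line string: "\n" is the line break between the sentences.
const lines = "opple is not a word\nipple is not a word either";

// Global flag only: ^ matches at the start of the whole string.
const withoutM = lines.match(/^[oi]pple/g);
console.log(withoutM); // [ 'opple' ]

// Global and multiline flags: ^ also matches at the start of each line.
const withM = lines.match(/^[oi]pple/gm);
console.log(withM); // [ 'opple', 'ipple' ]
```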
Ending Character $
The dollar character is called the Ending character, and it is also known as an anchor. When you use this special character in a pattern, it does the opposite of the beginning special character: it matches the end part of a string, if that end part matches our pattern.
On its own it is not useful, so you place it after a pattern. Let's see an example:
/\s\w{4}$/g
This pattern matches the end part of a string if the end part has a space, followed by a word character repeated four times.
String example:
As you see here, the ending part " ball" is matched because the pattern is at the end of the string.
Now if I add a full stop to the end of the string:
You see that the end part is no longer a match. The string now ends with a full stop, which our pattern does not allow for at the end.
What if we had a string that broke into two lines?
As you can see in this string, only the end part " fish" is a match. Just like the beginning special character, the ending special character, by default, treats the whole string as one string, even though there are line breaks. If we want to have multiple matches for different lines, then we include the multiline flag:
Now we have " ball" and " fish" as matches since they end each line.
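The cases above can be sketched in JavaScript as follows. The pattern is the one from this section; the sentences are hypothetical strings I chose so that they end in " ball" and " fish" like the matches mentioned in the text.

```javascript
// Hypothetical sample strings ending in " ball" and " fish".
const oneLine = "I kicked the ball";
const twoLines = "I kicked the ball\nI caught a fish";

// A space followed by four word characters at the end of the string.
const endMatch = oneLine.match(/\s\w{4}$/g);
console.log(endMatch); // [ ' ball' ]

// Adding a full stop means the string no longer ends with four word characters.
const withFullStop = (oneLine + ".").match(/\s\w{4}$/g);
console.log(withFullStop); // null

// Without the m flag, only the end of the whole string counts.
const lastLineOnly = twoLines.match(/\s\w{4}$/g);
console.log(lastLineOnly); // [ ' fish' ]

// With the m flag, $ also matches at the end of each line.
const everyLine = twoLines.match(/\s\w{4}$/gm);
console.log(everyLine); // [ ' ball', ' fish' ]
```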
Combining the beginning and ending special characters allows you to validate a string from the beginning to the end. This is useful for things like validating passwords, emails, etc.
You can learn more about that in this lesson.
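As a rough illustration of validating a whole string, here is a JavaScript sketch. The four-digit pattern is a hypothetical example of my own, not one from the linked lesson; it just shows how the two anchors work together.

```javascript
// Hypothetical validation pattern: the whole string must be exactly four digits.
const fourDigits = /^\d{4}$/;

console.log(fourDigits.test("1234"));  // true
console.log(fourDigits.test("12345")); // false: one digit too many
console.log(fourDigits.test("a1234")); // false: extra character at the beginning

// Without the anchors, the pattern only needs to be found somewhere inside.
const unanchored = /\d{4}/;
console.log(unanchored.test("a12345")); // true
```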
Escape Character \
The backward slash character represents the Escape character. What does "escaping" mean here?
Let's see an example:
Our pattern has a character class with a range of "a" to "z", followed by ".", followed by "com". Which means this should match "a.com", "b.com", "c.com" and so on...right? 🤔
But this regular expression will not work like you expect it to. This pattern will match “a.com”, but it will also match “a8com” or “a$com”:
The reason is that, as we have seen earlier, the period sign is a wildcard special character which matches any character at all. That's why it matches "8" in "a8com" and "$" in "a$com".
So how do we specify that we do not want the special period character, but we want the literal period sign? This is where the escape character comes in.
As the name implies, this character is used to escape a character. It cannot be used by itself. It is used before the character you want to escape.
Here's how we use it:
[a-z]\.com
In this pattern, we have defined that we want the normal "." character, and not the special "." character. When you now apply this on our string:
You see it now matches only "a.com". That's because "8" is not "." and neither is "$".
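Here is a JavaScript sketch that puts the two patterns side by side. The patterns are the ones from this section; I added the global flag and joined the three example substrings into one hypothetical string so that all matches are listed at once.

```javascript
// Hypothetical string containing the three substrings discussed above.
const addresses = "a.com a8com a$com";

// Unescaped: "." is the wildcard, so it matches ".", "8" and "$".
const unescapedMatches = addresses.match(/[a-z].com/g);
console.log(unescapedMatches); // [ 'a.com', 'a8com', 'a$com' ]

// Escaped: "\." is a literal period, so only "a.com" matches.
const escapedMatches = addresses.match(/[a-z]\.com/g);
console.log(escapedMatches); // [ 'a.com' ]
```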
Escaping Special Characters
The escape special character is useful for escaping characters that by default have a different meaning in patterns. Examples are escaping the:
- beginning special character \^hello
- ending special character hello\$
- and many other special characters.
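The two escaped patterns in the list can be tried out like this in JavaScript. The sample strings are hypothetical and only there to show the literal caret and dollar sign being matched.

```javascript
// "\^" matches a literal caret instead of the beginning of the string.
const literalCaret = /\^hello/;
console.log(literalCaret.test("say ^hello")); // true
console.log(literalCaret.test("hello"));      // false: there is no "^" in the string

// "\$" matches a literal dollar sign instead of the end of the string.
const literalDollar = /hello\$/;
console.log(literalDollar.test("hello$ there")); // true
console.log(literalDollar.test("hello"));        // false: there is no "$" in the string
```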
Escaping the Escape Special Character
You can also use the escape character to escape itself 😂
By default, the backward slash has a special function, which is escaping characters. But what if you had a string like “He wrote it as hello\” and you wanted to match the end part of the string, ending with hello and a backward slash? Then your pattern would be:
hello\\$
Here, you have specified that you do not want the special backward slash character, but the normal backward slash character after hello:
In this case, you escaped the backward slash character, which means the dollar sign keeps its special function of matching the end of strings. If I put something else after the end of the string:
You see it is no longer a match.
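Here is how that could look in JavaScript. One thing to watch for, which is an aside of mine and not part of the lesson: inside a JavaScript string literal the backward slash must also be escaped, so the single backward slash at the end of the sentence is typed as two.

```javascript
// Pattern from the lesson: "hello", a literal backward slash, then the end of the string.
const endsWithBackslash = /hello\\$/;

// In a JavaScript string literal, "\\" produces ONE backward slash character.
const sentence = "He wrote it as hello\\";

console.log(sentence);                               // He wrote it as hello\
console.log(endsWithBackslash.test(sentence));       // true
console.log(endsWithBackslash.test(sentence + "!")); // false: the backward slash is no longer at the end
```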
Escaping a Meta Character
Also, you can escape a meta character. As we have seen previously, meta characters involve the use of a backward slash. We saw the digit meta character written as \d (backward slash d), the whitespace meta character written as \s (backward slash s), and many others.
What if we wanted to write a pattern that matches the backward slash d in this string: “The author wrote \d”? If we use \d as the pattern, it won't work, because that is the digit meta character, which matches a digit.
Here, we don't want to match a digit; we want to match a literal backward slash followed by d. In this case, we escape the backward slash of the meta character: \\d
So this is how you use the escape character for escaping characters that have special meanings when you want the literal character.
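A quick JavaScript sketch of escaping a meta character, using the string from this section. As an assumption on my side about how you would type it, the backward slash in the string literal is doubled so that the string really contains one backward slash followed by "d".

```javascript
// The string from the lesson; "\\d" in the literal is one backward slash plus "d".
const authorText = "The author wrote \\d";

// \d is the digit meta character, and the string contains no digit.
console.log(/\d/.test(authorText)); // false

// \\d is an escaped backward slash followed by the letter "d".
const literalMatch = authorText.match(/\\d/);
console.log(literalMatch[0]); // \d

// The escaped pattern no longer matches digits at all.
console.log(/\\d/.test("The author wrote 7")); // false
```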
There are still more special characters to look at. The remaining ones are known as quantifier special characters. They work similarly to the quantifiers we have looked at previously.