Previously, we looked at character classes, which allow you to specify a "this or that". A similar concept in regular expressions is alternation, also called alternating characters.
What is Alternation?
Alternation allows you to specify a "this" or "that" character or group of characters. You create it with the pipe symbol |. For example:
/this|that/g
This pattern will match the characters "this" or "that".
Because we have the g flag, the pattern matches every occurrence of "this" and "that" in the string, not just the first one.
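Here is a minimal sketch of that pattern in JavaScript. The sample string and the variable names are my own assumptions, chosen only for illustration:

```javascript
// Hypothetical sample string, made up for this example.
const sampleText = "I like this and that";

// With the g flag, match() returns every match of either alternative.
const allMatches = sampleText.match(/this|that/g);
console.log(allMatches); // ["this", "that"]

// Without the g flag, only the first match is returned.
const firstMatch = sampleText.match(/this|that/);
console.log(firstMatch[0]); // "this"
```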
How does this differ from character classes?
Alternation vs Character Classes
The difference is:
Character classes only allow you to provide characters: [abc]. They do not allow you to provide other expressions like groups and quantifiers.
For example, let's say you have a character class containing the \w meta character and the plus + quantifier, which normally means "one or more":
/[\w+]/
Putting the plus + in a character class makes it a “normal character” and not a quantifier. Most special characters, such as +, * and parentheses, lose their special meaning inside a character class and are treated as normal characters. Shorthand classes like \w still work there.
As you can see above, every character except "$" and "&" is matched. These are the only characters that are neither \w (word characters) nor +, so the character class skips them.
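To see this in code, here is a small sketch. The sample string is an assumption of mine (it is not the string from the original example), and it compares the character class with the same pattern written outside a class:

```javascript
// Hypothetical sample string containing word characters, "+", "$" and "&".
const symbols = "a+b$c&d_1";

// Inside the class, + is a literal plus sign, so each match is a single
// character that is either a word character or "+".
const classMatches = symbols.match(/[\w+]/g);
console.log(classMatches); // ["a", "+", "b", "c", "d", "_", "1"]

// Outside a class, + is a quantifier: one or more word characters in a row.
const quantifierMatches = symbols.match(/\w+/g);
console.log(quantifierMatches); // ["a", "b", "c", "d_1"]
```

Only "$" and "&" are left out of the first result, and the "+" shows up as a match instead of acting as a quantifier.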
Also if you attempt to have groups in a character class, like this:
/[(ha)]ing/
It also won't work, as the parentheses would be treated as normal characters. So, this character class would mean “opening parenthesis (” or “h” or “a” or “closing parenthesis )”.
From the matches, you see "(ing", "aing", and ")ing", because they begin with a character from the character class followed by "ing".
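A quick sketch of this behaviour, using a sample string I made up for illustration, next to the same pattern written with a real group:

```javascript
// Hypothetical sample string, made up for this example.
const words = "(ing aing )ing hing haing";

// The class matches ONE of "(", "h", "a" or ")" followed by "ing".
const classResult = words.match(/[(ha)]ing/g);
console.log(classResult); // ["(ing", "aing", ")ing", "hing", "aing"]

// A real group matches the whole sequence "ha" followed by "ing".
const groupResult = words.match(/(ha)ing/g);
console.log(groupResult); // ["haing"]
```

Notice that the class only finds "aing" inside "haing", because it can consume just one character before "ing".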
This is how alternation is different.
Unlike character classes, alternation allows you to combine other special characters in your "this or that" pattern.
With alternation, you can provide expressions like characters, groups, quantifiers or any other valid regular expression. For example:
/th(is|at)/
Here we have a group and in it, we use alternation to indicate "is" or "at". So the pattern here means:
"t" followed by "h" followed by "is" or "at":
The orange color signifies the groups that are captured. Remember you can turn off capturing with a question mark followed by a colon in the group: (?:..)
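As a rough sketch, this is how the captured group shows up in JavaScript and what changes with a non-capturing group. The sample string and variable names are assumptions for illustration:

```javascript
// Hypothetical sample string; "then" is included to show a non-match.
const sentence = "this that then";

// Capturing group: each match carries the full match and the group value.
const withGroups = [...sentence.matchAll(/th(is|at)/g)].map((m) => [m[0], m[1]]);
console.log(withGroups); // [["this", "is"], ["that", "at"]]

// Non-capturing group: same full matches, but nothing is captured.
const withoutGroups = [...sentence.matchAll(/th(?:is|at)/g)].map((m) => [...m]);
console.log(withoutGroups); // [["this"], ["that"]]
```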
Let's see another example. Let's say we had a string like:
"I grabbed a refreshing can of Coca-Cola, or as some people call it, Coke, while others prefer to refer to it as Coca Cola"
And you want to match "Coca-Cola", "Coke" and "Coca Cola". Here are a few ways you can use alternation for this:
/Coca-Cola|Coke|Coca Cola/ig
Here we basically said "Coca-Cola" or "Coke" or "Coca Cola". Very easy to read, but we can make this shorter.
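If you want to try the plain "Coca-Cola or Coke or Coca Cola" alternation in JavaScript, a minimal sketch could look like this (the variable names are my own):

```javascript
const drinkText =
  "I grabbed a refreshing can of Coca-Cola, or as some people call it, Coke, while others prefer to refer to it as Coca Cola";

// Three full alternatives, tried from left to right at each position.
const flatMatches = drinkText.match(/Coca-Cola|Coke|Coca Cola/gi);
console.log(flatMatches); // ["Coca-Cola", "Coke", "Coca Cola"]
```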
Very slightly shorter, but what we're doing here is "Co" followed by:
- "ca-Cola" or
- "ke" or
- "ca Cola"
And of course, the groups are captured.
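The description above ("Co" followed by "ca-Cola", "ke" or "ca Cola") corresponds to a pattern like the one in this sketch. I'm reconstructing it from the description, so treat the exact pattern and the ig flags as assumptions:

```javascript
const colaText =
  "I grabbed a refreshing can of Coca-Cola, or as some people call it, Coke, while others prefer to refer to it as Coca Cola";

// Assumed pattern: shared prefix "Co", then a group with three alternatives.
const groupedPattern = /Co(ca-Cola|ke|ca Cola)/gi;

// m[0] is the full match, m[1] is what the group captured.
const groupedMatches = [...colaText.matchAll(groupedPattern)].map((m) => [m[0], m[1]]);
console.log(groupedMatches);
// [["Coca-Cola", "ca-Cola"], ["Coke", "ke"], ["Coca Cola", "ca Cola"]]
```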
Now let's make this shorter.
Here we say, "Co" followed by either:
- "ca", followed by "-" or " ", followed by "Cola", or
- "ke"
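Here is a sketch of the nested version in JavaScript. The pattern is reconstructed from the description and the ca(-| ) fragment, so the full pattern and its flags are assumptions on my part:

```javascript
const nestedText =
  "I grabbed a refreshing can of Coca-Cola, or as some people call it, Coke, while others prefer to refer to it as Coca Cola";

// Assumed pattern: the inner group (-| ) alternates between a hyphen and a space.
const nestedPattern = /Co(ca(-| )Cola|ke)/gi;

const nestedMatches = [...nestedText.matchAll(nestedPattern)];

// The full matches are the same three strings as with the longer patterns.
console.log(nestedMatches.map((m) => m[0])); // ["Coca-Cola", "Coke", "Coca Cola"]

// The inner group only takes part in the "ca...Cola" branch,
// so it is undefined for "Coke".
console.log(nestedMatches.map((m) => m[2])); // ["-", undefined, " "]
```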
Though shorter, this becomes complex, and readability suffers. Like I mentioned at the beginning of this course, there are many patterns you can write to achieve the same thing, but always keep readability in mind.
In this pattern, we have a nested group (-| ) right after "ca", which is an alternation between "-" and " ", so that part matches "ca-" or "ca ". This way, we can match Coca-Cola and Coca Cola. While this does the job, I'd rather use our earlier solution:
/Coca-Cola|Coke|Coca Cola/ig
This is easier to read, and it's pretty clear what we're trying to achieve.
Solve this
Try to solve these exercises on your own, and you can share on Twitter and tag me @iamdillion.
Match all filenames with their extensions here.
Match all domains in this string.
Now let's move on to a more interesting concept known as Lookaheads.