We've looked at backreferences in the previous lesson. And we saw how they can be useful when matching different quotes in a string:
Let's look at another example where backreferences can be very helpful.
Let's say you had this string:
my name is john john and i would like some tea tea please
Here, we have two repeated words: "john" and "tea". If you were to use a programming language to find these repeated words, you would probably have to loop through every word to check.
"Check 'my'. Is it followed by 'my'? No. Check 'name'...and so on"
We can easily do this with backreferences in regular expressions.
First, we have a pattern to match a word:
Our pattern here is "a word boundary followed by word characters repeated one or more times followed by a space".
From the matches, you see all words matched, except "please" as it does not have a space in front of it.
Now, let's look for the repetitions. We have to capture the word as a group as that's the only way we can use backreferences:
Time to use a backreference:
In our pattern, we added \1 after the space.
Now the regex engine captures a word like "my", then it checks if it is followed by a space and followed by "my". For "my", that is not the case.
In the case of "john", the regex engine captures the word, then it checks if it is followed by a space, and followed by "john". That's the case here, and also for "tea".
Instead of matching "john john", we can match only "john " by using a lookahead pattern :
By using the backreference in the lookahead pattern, (?=\1), we now match the word and the space only if the word is repeated coming after it. If I remove the repetition for "john", you see it no longer becomes a match:
Now, using any programming language, you can simply replace the repeated words or do anything you want them 😅
I hope this example was helpful and shows you how useful backreferences can be.