Here is the video version for this topic. You can read the written version which comes after the video section.

Before we move further to start creating advanced regular expressions, I want to quickly explain how a regular expression works.

What you should know is that a regular expression is just a string that defines a pattern. For example:

/name/
/code/
/code/ig

These are just pattern definitions using the literal notation. What actually "applies" these pattern definitions on strings to get matches is a REGEX ENGINE.

A regex engine is responsible for finding AND returning the parts of a string that matches a defined pattern.

So your programming languages like JavaScript, Python, PHP—they all have these regex engines for executing regular expressions.

There are two major types of regex engines—Backtracking regex engines and Finite-state regex engines.

In this course, we're going to focus on Backtracking regex engines as that is what is most common in programming languages.

Here is what happens in when executing regular expressions with backtracking regex engines.

When you apply a regular expression to a string, the engine does a search through the string from left to right. What happens is that, it starts searching from the beginning, until it encounters the first token, or you can say first character in the string that matches the first character in your pattern.

So let's say you have a pattern like /code/ and you have a string like:

She is coming to school to code but I could not come to code

The engine starts searching the string to find a “c”, since that is what begins the pattern. It finds a “c” at position 7 (counting from 0). It considers this a potential match. The engine keeps record of position 7 as its start:

Next up in the pattern is “o”, so the engine checks if “o” comes after the “c” it found in the string. It sees that there's “co”, so things are going well so far.

Next up in the pattern is “d”, so the engine checks if “d” comes after the “co” it has found so far in the string. Ooops…there is no “d”, instead it sees an “m”. This is the engine's cue or signal that this part of the string is not a match for the defined pattern:

Now this is where the backtracking occurs. The engine would backtrack to the position 7 where it found a potential match:

Since it has verified that it wasn't an actual match, the engine would try to look for the next potential match.

So it keeps searching until it finds another “c” which again comes from the pattern. It sees another “c” at position 18. Again, the engine keeps record of this position as the start. It checks if "o" (from the pattern) comes after the "c" it found. Unfortunately, "h" comes after "c", so this it not a valid match:

The engine backtracks to position 18 where it stopped.

The engine continues searching again and finds a "c" at position 27 (after "to "). The engine keeps record of this position again.

Next in the pattern is “o”, the engine checks if “o” comes after the “c” it found. There's an “o”, so we have “co” matching so far. Awesome. From the pattern, we have “d” next, so the engine checks if “d” comes after the “co” it has found so far. Wooo…there's a “d”, so we have “cod” so far.

The engine is like "we're almost there…almost there". Next up in the pattern is “e”, so the engine checks if “e” comes after the “cod” it has found so far. Amazing! there's an “e”, so we have “code” so far, which is an exact match for the pattern:

The engine has found the exact match, and so it stops searching, and returns that match:

Regex/code/
Input
Match

But, there's another match for the code pattern (at the end of the string) and the engine only gets the first match.

As we saw in the flags lesson, the default behaviour of regular expressions is to return ONLY the first match. But now that we want to get all the matches, we use the g flag: /code/g.

Here's what the regex engine does after this flag has been applied.

After it has found the first part of the string that matches the defined pattern, it takes note of the substring and continues searching from there. Without the “g” flag, it would stop searching after the first match.

Now that we have changed the default behaviour, it continues searching for another potential match. It finds the “c” at position 38 (in "could"). It checks if it is followed by “o”. It is. So it checks if it is followed “d” and unfortunately it's not. Backtracks again.

It finds a "c" at position 48 (in "come"). Does "o" come after it, yes. Does "d" come after the "o", no! Backtracks again.

If finds a "c" at position 56 (in "code"). Does "o" come after it, yes. Does "d" come after the "o", yes. Almost there…then it checks if it is followed by “e”, and it actually is. So we have “code” which is an exact match for the pattern:

The engine also takes note of this substring. We have two matches so far:

Regex/code/g
Input
Match

And now the engine comes to the end of the string. If the string had not stopped here, the engine would keep searching for more matches until the end of the string.

So this is how backtracking regex engines work and programming languages like JavaScript, Python, JAVA, PHP and much more use this regex engine implementation in the background for applying regular expressions on strings.

Things can look more advanced than this under the hood if you're working with advanced regular expressions. But what I've shown in this lesson is the basic idea.


It's important to understand this concept as we progress in this course, so I hope you now understand how regular expressions work under the hood.

Let's move forward to learning how to create advanced regular expressions.