Today I learned: How to match text that spans multiple lines with a JavaScript regular expression
Problem
Let's say you have the following five-line quote by Mark Twain:
Keep away from people who try to belittle your ambitions.
Small people always do that,
but the really great make you feel that you,
too,
can become great.
Now you want to use a regular expression to match everything from people on line 1
up to really on line 3.
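To see why a special token is needed, here is a quick check showing that the usual `.*` gives up at the first line break. It's a minimal sketch that runs in Node.js or a browser console. The variable names `quote` and `naive` are just names picked for illustration.

```javascript
// The five-line quote, with "\n" separating the lines.
const quote = [
  "Keep away from people who try to belittle your ambitions.",
  "Small people always do that,",
  "but the really great make you feel that you,",
  "too,",
  "can become great.",
].join("\n");

// Naive attempt: "." does not match line terminators such as "\n",
// so ".*" cannot bridge the gap between line 1 and line 3.
const naive = quote.match(/people.*really/);
console.log(naive); // null
```

Neither line containing "people" also contains "really", so there is no match at all.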
Hint: Spotlight on a very useful set of tokens
The token [^] matches any character, including newlines. It is a negated character class with nothing inside it, so it excludes nothing.
If you append the * quantifier token to it, the result matches whatever [^] matches,
between zero and an unlimited number of times.
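Here is a small sketch comparing [^] with `.` on a lone newline, then letting `[^]*` run across line breaks. The sample string `text` is made up for the demo.

```javascript
// [^] is a negated character class with an empty set: "any character
// that is not in the empty set", i.e. every character, "\n" included.
const caretMatchesNewline = /^[^]$/.test("\n");
const dotMatchesNewline = /^.$/.test("\n");
console.log(caretMatchesNewline); // true
console.log(dotMatchesNewline);   // false: "." skips line terminators

// With the * quantifier, [^]* can run across any number of line breaks.
const text = "first line\nsecond line\nthird line";
const spansLines = /first[^]*third/.test(text);
console.log(spansLines); // true
```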
Beware, though: [^] only behaves this way in JavaScript. Most other regex flavors reject it or parse it differently.
A portable alternative is [\s\S], a character class combining two complementary tokens:
- \s matches any whitespace character
- \S matches any non-whitespace character
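Because every character is either whitespace or non-whitespace, [\s\S] should accept exactly what [^] accepts. The sketch below checks both classes against a handful of sample characters, including line terminators. The `samples` list is an arbitrary pick, not an exhaustive test. It sticks to single UTF-16 code units because, without the `u` flag, characters like emoji count as two units.

```javascript
// A few sample characters, including "\n", "\r" and U+2028 (line separator).
const samples = ["a", " ", "\n", "\r", "\t", "\u00e9", "\u2028"];

// [\s\S] = "whitespace OR non-whitespace", so it covers every character,
// just like [^] does in JavaScript.
const bothMatchEverything = samples.every(
  (ch) => /^[\s\S]$/.test(ch) && /^[^]$/.test(ch)
);
console.log(bothMatchEverything); // true
```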
Solution
Using [^]*
/people[^]*.*really/m
According to regex101.com, it breaks down as follows:
- people matches the characters people literally (case sensitive)
- [^] matches any character, including newline
- * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- . matches any character (except for line terminators)
- * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- really matches the characters really literally (case sensitive)
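Global pattern flag:
m modifier: multi line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string).
Since this pattern contains neither ^ nor $, the m flag doesn't actually change the result here.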
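Here is the [^]* pattern applied to the quote, as a runnable sketch. `quote`, `match` and `trimmed` are illustrative names. The last lines compare it with a trimmed-down `/people[^]*really/`, which drops the extra `.*` and the flag. Since the greedy `[^]*` already reaches the last really, that variant should produce the identical match.

```javascript
// The quote from the top of the post, one "\n" per line break.
const quote = [
  "Keep away from people who try to belittle your ambitions.",
  "Small people always do that,",
  "but the really great make you feel that you,",
  "too,",
  "can become great.",
].join("\n");

// The pattern from this post, unchanged.
const match = quote.match(/people[^]*.*really/m);
console.log(match[0]);
// people who try to belittle your ambitions.
// Small people always do that,
// but the really

// Trimmed-down variant without the extra ".*" and the "m" flag.
const trimmed = quote.match(/people[^]*really/);
console.log(trimmed[0] === match[0]); // true
```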
Using [\s\S]*
/people[\s\S]*.*really/m
According to regex101.com, it breaks down as follows:
- people matches the characters people literally (case sensitive)
- \s matches any whitespace character (equivalent to [\r\n\t\f\v \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff])
- \S matches any non-whitespace character (equivalent to [^\r\n\t\f\v \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff])
- * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- . matches any character (except for line terminators)
- * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- really matches the characters really literally (case sensitive)
Global pattern flag:
m modifier: multi line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string).
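The same check for the portable version, again as a sketch with illustrative variable names. It also compares the result against the [^] pattern to confirm that both select the same three-line span.

```javascript
// The quote from the top of the post, one "\n" per line break.
const quote = [
  "Keep away from people who try to belittle your ambitions.",
  "Small people always do that,",
  "but the really great make you feel that you,",
  "too,",
  "can become great.",
].join("\n");

// Same idea, with the portable [\s\S] class instead of [^].
const viaWhitespaceClasses = quote.match(/people[\s\S]*.*really/m);
const viaEmptyNegation = quote.match(/people[^]*.*really/m);

console.log(viaWhitespaceClasses[0].split("\n").length); // 3: the match spans three lines
console.log(viaWhitespaceClasses[0] === viaEmptyNegation[0]); // true
```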