Regular Expressions

40 min read Intermediate

Master pattern matching with regular expressions for searching, validating, and transforming text in JavaScript.

RegExp Basics

Regular expressions are patterns used to match character combinations in strings.

JavaScript
// Creating regular expressions
const regex1 = /pattern/;              // Literal notation
const regex2 = new RegExp("pattern");  // Constructor

// Flags
const caseInsensitive = /pattern/i;    // Case insensitive
const globalSearch = /pattern/g;       // Find all matches
const multiLine = /pattern/m;          // Multi-line mode
const combined = /pattern/gim;         // Multiple flags

// Basic testing
const pattern = /hello/i;
console.log(pattern.test("Hello World")); // true
console.log(pattern.test("Hi there"));    // false

// Finding matches
const text = "Hello World, hello universe";
console.log(text.match(/hello/));   // ["hello", index: 13...]
console.log(text.match(/hello/g));  // ["hello"]
console.log(text.match(/hello/gi)); // ["Hello", "hello"]

// exec() - more detail
const regex = /hello/gi;
let match;
while ((match = regex.exec(text)) !== null) {
    console.log(`Found "${match[0]}" at index ${match.index}`);
}
// Found "Hello" at index 0
// Found "hello" at index 13

// Search position
console.log(text.search(/world/i)); // 6 (index of first match)
console.log(text.search(/xyz/));    // -1 (not found)

// Replace
console.log(text.replace(/hello/gi, "Hi"));
// "Hi World, Hi universe"
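
The m flag is easy to misread from a one-line description. The sketch below shows what it changes, next to the s (dotAll) flag listed in the summary, since the two are often confused. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample input: three lines separated by \n
const lines = "alpha\nbeta\ngamma";

// Without m, ^ and $ anchor only at the start and end of the whole string
const firstOnly = lines.match(/^\w+$/g);    // null: no single word spans the whole string

// With m, ^ and $ also match at each line break
const perLine = lines.match(/^\w+$/gm);     // ["alpha", "beta", "gamma"]

// . does not match \n unless the s (dotAll) flag is set
const dotStops = /alpha.beta/.test(lines);  // false
const dotAll = /alpha.beta/s.test(lines);   // true

console.log(firstOnly, perLine, dotStops, dotAll);
```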

Pattern Characters

Special characters and sequences for building patterns.

JavaScript
// ===== Character Classes =====
/\d/    // Any digit (0-9)
/\D/    // Any non-digit
/\w/    // Word character (a-z, A-Z, 0-9, _)
/\W/    // Non-word character
/\s/    // Whitespace (space, tab, newline)
/\S/    // Non-whitespace
/./     // Any character except newline

// Examples
console.log("abc123".match(/\d+/));  // ["123"]
console.log("hello world".match(/\s/)); // [" "]

// ===== Character Sets =====
/[abc]/     // Match a, b, or c
/[a-z]/     // Match any lowercase letter
/[A-Z]/     // Match any uppercase letter
/[0-9]/     // Match any digit
/[a-zA-Z]/  // Match any letter
/[^abc]/    // Match anything EXCEPT a, b, c (negation)

console.log("apple".match(/[aeiou]/g)); // ["a", "e"]
console.log("abc123".match(/[^a-z]/g)); // ["1", "2", "3"]

// ===== Anchors =====
/^hello/    // Start of string
/world$/    // End of string
/\bhello\b/ // Word boundary

console.log(/^hello/.test("hello world")); // true
console.log(/^hello/.test("say hello"));   // false
console.log(/\bcat\b/.test("category"));   // false
console.log(/\bcat\b/.test("my cat"));     // true

// ===== Quantifiers =====
/a*/      // 0 or more a's
/a+/      // 1 or more a's
/a?/      // 0 or 1 a
/a{3}/    // Exactly 3 a's
/a{2,4}/  // 2 to 4 a's
/a{2,}/   // 2 or more a's

console.log("aaa".match(/a{2,}/)); // ["aaa"]
console.log("ab".match(/a+b/));    // ["ab"]

// Greedy vs Lazy
const html = "<div>content</div>";
console.log(html.match(/<.+>/));  // ["<div>content</div>"] (greedy)
console.log(html.match(/<.+?>/)); // ["<div>"] (lazy, add ?)

// ===== Groups =====
/(abc)/     // Capturing group
/(?:abc)/   // Non-capturing group
/(a|b)/     // Alternation (a or b)

const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const dateMatch = "2024-01-15".match(datePattern);
console.log(dateMatch[0]); // "2024-01-15" (full match)
console.log(dateMatch[1]); // "2024" (group 1)
console.log(dateMatch[2]); // "01" (group 2)
console.log(dateMatch[3]); // "15" (group 3)

// Named groups
const namedPattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const namedMatch = "2024-01-15".match(namedPattern);
console.log(namedMatch.groups.year);  // "2024"
console.log(namedMatch.groups.month); // "01"

Escape Special Characters

These characters have special meaning and must be escaped with \ to match them literally: . * + ? ^ $ { } [ ] ( ) | \ / (the / only needs escaping in literal notation).
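
When a pattern is built from arbitrary text with the RegExp constructor, those characters have to be escaped first. A minimal sketch follows; escapeRegExp is a hypothetical helper name, and the inputs are made up.

```javascript
// Hypothetical helper: put a backslash in front of every special character
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

const userInput = "1+1=2 (really?)";
const literal = new RegExp(escapeRegExp(userInput));

console.log(literal.test("I know 1+1=2 (really?)"));      // true
console.log(/a.b/.test("axb"));                           // true: unescaped . matches any character
console.log(new RegExp(escapeRegExp("a.b")).test("axb")); // false: the . is now literal
```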

String & RegExp Methods

Use regex with string methods for powerful text manipulation.

JavaScript
// ===== String Methods =====

// match() - find matches
"hello world".match(/o/g); // ["o", "o"]

// matchAll() - iterator of all matches with details
const text = "test1 test2 test3";
const matches = text.matchAll(/test(\d)/g);
for (const match of matches) {
    console.log(match[0], match[1], match.index);
}
// "test1", "1", 0
// "test2", "2", 6
// "test3", "3", 12

// replace() - replace matches
"hello world".replace(/o/g, "0"); // "hell0 w0rld"

// Replace with callback
"hello world".replace(/\w+/g, (word) => {
    return word.toUpperCase();
}); // "HELLO WORLD"

// Replace with capture groups
"John Smith".replace(/(\w+) (\w+)/, "$2, $1"); // "Smith, John"

// Named group replacement
"2024-01-15".replace(
    /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
    "$<day>/$<month>/$<year>"
); // "15/01/2024"

// replaceAll() - replace all (ES2021)
"hello hello".replaceAll(/hello/g, "hi"); // "hi hi"

// split() - split by pattern
"apple, banana,  orange".split(/,\s*/); 
// ["apple", "banana", "orange"]

"one1two2three3".split(/\d/);
// ["one", "two", "three", ""]

// search() - find index
"hello world".search(/world/); // 6


// ===== RegExp Methods =====

// test() - check if matches
/\d+/.test("abc123"); // true
/\d+/.test("abcdef"); // false

// exec() - detailed match info
const regex = /(\w+)@(\w+)/g;
const str = "user@domain other@site";
let result;

while ((result = regex.exec(str)) !== null) {
    console.log({
        full: result[0],
        user: result[1],
        domain: result[2],
        index: result.index
    });
}
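
The exec() loop works because a regex with the g flag remembers where its last match ended, in its lastIndex property. The same state can be surprising when one global regex is reused with test(). A short sketch, with illustrative variable names:

```javascript
const digits = /\d+/g;

// Each call resumes searching from digits.lastIndex
const first = digits.test("abc123");   // true, lastIndex is now 6
const second = digits.test("abc123");  // false: search started at index 6, lastIndex resets to 0
const third = digits.test("abc123");   // true again

// Without the g flag, test() carries no state between calls
const stateless = /\d+/;
const repeat = [stateless.test("abc123"), stateless.test("abc123")];

console.log(first, second, third, repeat); // true false true [ true, true ]
```

If a shared regex is only used for yes/no checks, dropping the g flag is one simple way to avoid this.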

Common Patterns

Useful regex patterns for validation and parsing.

JavaScript
// Email (basic)
const emailPattern = /^[\w.-]+@[\w.-]+\.\w{2,}$/;
emailPattern.test("user@example.com"); // true

// URL
const urlPattern = /^https?:\/\/[\w.-]+(?:\/[\w.-]*)*\/?$/;
urlPattern.test("https://example.com/path"); // true

// Phone number (various formats)
const phonePattern = /^[\d\s\-+()]{10,}$/;
phonePattern.test("+1 (555) 123-4567"); // true

// Password (8+ chars, letter, number, special)
const passwordPattern = /^(?=.*[A-Za-z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!%*#?&]{8,}$/;
passwordPattern.test("Pass@123"); // true

// Date (YYYY-MM-DD)
const datePattern = /^\d{4}-\d{2}-\d{2}$/;

// Time (HH:MM or HH:MM:SS)
const timePattern = /^\d{2}:\d{2}(:\d{2})?$/;

// Hex color
const hexColorPattern = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;
hexColorPattern.test("#FFF"); // true
hexColorPattern.test("#1a2b3c"); // true

// IP address (IPv4 format only; does not check that each octet is 0-255)
const ipPattern = /^(\d{1,3}\.){3}\d{1,3}$/;

// Credit card (basic, just 16 digits)
const cardPattern = /^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$/;

// Slug (URL-friendly string)
const slugPattern = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Username (alphanumeric, underscore, 3-16 chars)
const usernamePattern = /^[a-zA-Z0-9_]{3,16}$/;

// Extract hashtags
const text = "Hello #world and #javascript";
const hashtags = text.match(/#\w+/g); // ["#world", "#javascript"]

// Extract mentions
const mentions = "@user1 and @user2".match(/@\w+/g);
// ["@user1", "@user2"]

// Extract URLs from text
const urlExtractor = /https?:\/\/[^\s]+/g;
"Visit https://example.com for more".match(urlExtractor);
// ["https://example.com"]

// Remove HTML tags
const html = "<p>Hello <b>World</b></p>";
html.replace(/<[^>]+>/g, ""); // "Hello World"

// Trim whitespace
"  hello  ".replace(/^\s+|\s+$/g, ""); // "hello"

// Collapse multiple spaces
"hello    world".replace(/\s+/g, " "); // "hello world"

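Several of these patterns check only the shape of the input. ipPattern, for example, accepts 999.999.999.999. One way to tighten it, sketched here with a hypothetical isValidIPv4 helper, is to let the regex capture the octets and check the numeric range in plain code.

```javascript
// Hypothetical helper: regex for the format, plain code for the numeric range
function isValidIPv4(input) {
    const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(input);
    if (match === null) return false;
    // match[1]..match[4] hold the four octets as strings
    return match.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4("192.168.0.1"));     // true
console.log(isValidIPv4("999.999.999.999")); // false
console.log(isValidIPv4("1.2.3"));           // false
```

This sketch still accepts leading zeros such as 01; whether that is acceptable depends on the application.
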
Advanced Features

Lookahead, lookbehind, and Unicode support.

JavaScript
// ===== Lookahead =====
// Positive lookahead (?=...) - matches if followed by
/\d+(?=px)/.test("100px"); // true - digit followed by "px"
"100px 50em".match(/\d+(?=px)/g); // ["100"]

// Negative lookahead (?!...) - matches if NOT followed by
/\d+(?!px)/.test("100em"); // true
"100px 50em".match(/\d+(?!px)/g); // ["10", "50"]

// ===== Lookbehind =====
// Positive lookbehind (?<=...) - matches if preceded by
/(?<=\$)\d+/.test("$100"); // true
"$100 €200".match(/(?<=\$)\d+/g); // ["100"]

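// Negative lookbehind (?<!...) - matches if NOT preceded by
/(?<!\$)\d+/.test("€200"); // true

// ===== Backreferences =====
// \1 matches the same text that group 1 captured
const htmlTag = /<(\w+)>.*?<\/\1>/;
htmlTag.test("<div>content</div>");  // true
htmlTag.test("<div>content</span>"); // false
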
Testing Regex

Use online tools like regex101.com to build and test regular expressions with explanations of each part.

Named Capture Groups

Named groups make regex more readable and maintainable by letting you reference captured content by name instead of number.

JavaScript - Named Groups
// Named capture groups with (?<name>pattern)
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = dateRegex.exec('2024-03-15');

// Access by name
console.log(match.groups.year);  // '2024'
console.log(match.groups.month); // '03'
console.log(match.groups.day);   // '15'

// Destructure groups
const { year, month, day } = match.groups;
console.log(`${month}/${day}/${year}`); // '03/15/2024'

// Named groups in replace
const reformatted = '2024-03-15'.replace(
    /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
    '$<month>/$<day>/$<year>'
);
console.log(reformatted); // '03/15/2024'

// Named backreferences
const duplicateWords = /\b(?<word>\w+)\s+\k<word>\b/gi;
'the the quick brown fox fox'.replace(duplicateWords, '$<word>');
// 'the quick brown fox'

// Complex parsing with named groups
const urlRegex = /^(?<protocol>https?):\/\/(?<host>[^\/]+)(?<path>\/[^?]*)?(?<query>\?.*)?$/;
const urlMatch = urlRegex.exec('https://example.com/api/users?page=1');

console.log(urlMatch.groups);
// { protocol: 'https', host: 'example.com', path: '/api/users', query: '?page=1' }

// Using with matchAll
const logRegex = /\[(?<level>\w+)\] (?<message>.+)/g;
const logData = '[INFO] Server started\n[ERROR] Connection failed';

for (const match of logData.matchAll(logRegex)) {
    console.log(`${match.groups.level}: ${match.groups.message}`);
}
// INFO: Server started
// ERROR: Connection failed
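
Both match() and exec() return null when the pattern does not match, so reading .groups directly throws on bad input. A minimal sketch of a guard follows; parseIsoDate is a hypothetical helper name.

```javascript
// Hypothetical helper: returns numeric date parts, or null if the input has the wrong shape
function parseIsoDate(input) {
    const match = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/.exec(input);
    if (match === null) return null;
    const { year, month, day } = match.groups;
    return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate("2024-03-15")); // { year: 2024, month: 3, day: 15 }
console.log(parseIsoDate("15/03/2024")); // null
```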

Lookahead and Lookbehind

Lookaround assertions match positions based on what comes before or after, without including it in the match. They're zero-width assertions.

JavaScript - Lookaround Assertions
// Positive Lookahead (?=pattern) - must be followed by
const beforeDollar = /\d+(?=\$)/g;
'100$ 200€ 300$'.match(beforeDollar); // ['100', '300']

// Negative Lookahead (?!pattern) - must NOT be followed by
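// The lookahead also excludes digits; with plain (?!\$) the engine would
// backtrack and match '10' and '30' as well
const notFollowedByDollar = /\d+(?![\d$])/g;
'100$ 200€ 300$'.match(notFollowedByDollar); // ['200']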

// Positive Lookbehind (?<=pattern) - must be preceded by
const afterDollar = /(?<=\$)\d+/g;
'$100 €200 $300'.match(afterDollar); // ['100', '300']

// Negative Lookbehind (?<!pattern) - must NOT be preceded by
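
A negative lookbehind, written (?<!pattern), succeeds only where the text just before the current position does not match pattern. The sketch below uses made-up variable names. Note that the second pattern excludes a preceding digit as well as $; otherwise the engine simply starts the match one digit later.

```javascript
const prices = '$100 €200 $300';

// Naive version: "00" is preceded by "1", not "$", so it still matches
const naive = prices.match(/(?<!\$)\d+/g);              // ['00', '200', '00']

// Excluding a preceding digit too keeps only whole numbers
const notAfterDollar = prices.match(/(?<![\d$])\d+/g);  // ['200']

console.log(naive, notAfterDollar);
```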

Unicode in Regular Expressions

The u flag enables full Unicode support, allowing you to match emojis, international characters, and use Unicode property escapes.

JavaScript - Unicode Regex
// The 'u' flag enables Unicode mode
const emoji = '👨‍👩‍👧‍👦';  // Family emoji (complex grapheme cluster)

// Without 'u' flag - sees surrogate pairs
/^.$/.test(emoji);   // false (sees multiple "characters")
emoji.length;         // 11 (!) - surrogate pairs + ZWJ

// With 'u' flag - still complex due to ZWJ sequence
/^.$/u.test(emoji);  // false (still multiple code points)

// Simple emojis work better
/^.$/u.test('😀');   // true

// Unicode property escapes (\p{...})
// Match any letter from any language
const anyLetter = /^\p{Letter}+$/u;
anyLetter.test('Hello');    // true
anyLetter.test('Привет');   // true (Russian)
anyLetter.test('你好');      // true (Chinese)
anyLetter.test('123');      // false

// Common Unicode properties
/\p{Emoji}/u.test('🎉');              // true
/\p{Script=Greek}/u.test('Ω');        // true
/\p{Script=Han}/u.test('漢');         // true
/\p{Number}/u.test('⅓');              // true (fraction)
/\p{Punctuation}/u.test('「');         // true (CJK bracket)
/\p{Currency_Symbol}/u.test('€');     // true

// Negated properties (uppercase P)
/\P{Number}/u.test('A');              // true (not a number)

// Match words in any language
const anyWord = /[\p{Letter}\p{Mark}]+/gu;
'Hello мир 世界'.match(anyWord); // ['Hello', 'мир', '世界']

// Email-like pattern supporting international characters
const internationalEmail = /[\p{Letter}\p{Number}.]+@[\p{Letter}\p{Number}.-]+\.[\p{Letter}]{2,}/u;
internationalEmail.test('用户@例子.中国'); // true

// The 'v' flag (ES2024) - extended Unicode sets
// Allows set operations and multi-char matching
const emojiPattern = /[\p{Emoji}--\p{ASCII}]/v;  // Emoji but not ASCII
emojiPattern.test('😀'); // true
emojiPattern.test('1');  // false (ASCII digit matches Emoji but excluded)

// String properties with 'v' flag
const flagEmoji = /^\p{RGI_Emoji_Flag_Sequence}$/v;
flagEmoji.test('🇺🇸'); // true
flagEmoji.test('🇯🇵'); // true
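
A quick way to see what the u flag changes is to count what . matches. In this small sketch the emoji are written as escapes so the code points are explicit; the variable names are illustrative.

```javascript
const smiley = '\u{1F600}'; // 😀: one code point, stored as two UTF-16 code units

const withoutU = smiley.match(/./g).length;  // 2: each surrogate half matches separately
const withU = smiley.match(/./gu).length;    // 1: the whole code point matches

// The family emoji: four emoji code points joined by three zero-width joiners (U+200D)
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
const familyUnits = family.length;                // 11 UTF-16 code units
const familyPoints = family.match(/./gu).length;  // 7 code points, even in Unicode mode

console.log(withoutU, withU, familyUnits, familyPoints);
```

Neither flag makes . match a whole ZWJ sequence, which is why the family emoji fails /^.$/u.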

Regex Performance Tips
  • Be specific: Use \d instead of . when matching digits
  • Avoid catastrophic backtracking: Don't nest quantifiers like (a+)+
  • Use non-capturing groups: (?:...) when you don't need the capture
  • Anchor when possible: ^ and $ prevent unnecessary scanning
  • Compile once: Store regex in variables, don't recreate in loops
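
The nested-quantifier tip can be made concrete. In the sketch below the two patterns accept exactly the same strings, but the nested one can try exponentially many ways to split a run of a's before giving up on a non-matching input. The inputs are kept short so the nested pattern still finishes quickly; all names are illustrative.

```javascript
const nested = /^(a+)+$/;  // nested quantifiers: prone to catastrophic backtracking
const flat = /^a+$/;       // accepts the same strings without nesting

// Both regexes are created once, outside the loop, and reused
const samples = ["a", "aaaa", "aaab", ""];
const results = samples.map((s) => [nested.test(s), flat.test(s)]);

console.log(results);
// [ [ true, true ], [ true, true ], [ false, false ], [ false, false ] ]
```

On a long run of a's followed by a b, a backtracking engine may spend far longer rejecting the input with the nested pattern than with the flat one.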

Summary

Character Classes

\d, \w, \s for digits, words, whitespace

Quantifiers

*, +, ?, {n,m} for repetition

Anchors

^, $, \b for positions

Groups

() for capturing, (?:) for non-capturing

Lookahead/behind

(?=), (?!), (?<=), (?<!)

Flags

g, i, m, s, u for behavior