What is Python regex and where is it used?

Python regex refers to regular expressions implemented through Python’s built-in re module. It is used to search, match, extract, replace, and validate text based on defined patterns. Common use cases include email validation, log file analysis, data cleaning, text preprocessing for machine learning, and parsing structured or semi-structured text.

What does re.compile() do in Python regex?

re.compile() converts a regex pattern into a compiled regular expression object. This object can be reused multiple times without recompiling the pattern. It improves performance when the same pattern is applied repeatedly and makes code more readable and maintainable in large projects.

What is the difference between re.search() and re.match() in Python?

re.search() scans the entire string and returns the first match it finds anywhere in the text. re.match() attempts to match the pattern only at the beginning of the string. This difference is important because many beginners mistakenly use re.match() when they actually need re.search().

What are greedy and lazy quantifiers in Python regex?

Greedy quantifiers match as much text as possible while still allowing the pattern to succeed. By default, all quantifiers in Python regex are greedy. Lazy quantifiers match as little text as possible and are created by adding a ? after the quantifier. Lazy matching is especially useful when working with HTML or nested text structures.
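
As a rough illustration, the sketch below runs a greedy and a lazy version of the same pattern over a small, made-up HTML snippet (the sample string and variable names are assumptions for the example):

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so the match runs to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" that lets the pattern succeed.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```

For real HTML a dedicated parser is usually the safer choice; this only shows how the ? changes what the quantifier consumes.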

What is the difference between re.findall() and re.finditer()?

re.findall() returns all matches as a list of strings or tuples, depending on capturing groups. re.finditer() returns an iterator of Match objects, providing access to match positions and detailed metadata. re.finditer() is preferred when working with large texts or when index information is required.
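
A minimal sketch of the two side by side, using a made-up sample sentence:

```python
import re

text = "Order 12 shipped, order 345 pending"

# findall() returns only the matched strings.
numbers = re.findall(r"\d+", text)

# finditer() yields Match objects, so start/end positions are available too.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(numbers)    # ['12', '345']
print(positions)  # [('12', 6, 8), ('345', 24, 27)]
```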

Mastering Python Regex (Regular Expressions): A Step-by-Step Guide

Introduction to Python Regex

What is a Python Regular Expression (Regex)?

A regular expression is a special sequence of characters that defines a search pattern.

This is like a super-powered “find” tool. Instead of just searching for an exact word like “cat”, a regex lets you describe a pattern of text you’re looking for, such as:

Any word that starts with “c” and ends with “t” (cat, cot, cut, etc.)
All email addresses in a big block of text
All phone numbers in a certain format
Only strings that look like valid dates

In short: Regex = a mini-language for describing patterns in text.

Why are Regular Expressions useful?

Regex is incredibly powerful because it helps you do three main things with text:

Search / Matching
- “Does this string contain a pattern I’m looking for?” Example: Check if a user’s input contains the word “error”.
Data Validation
- “Is this piece of text in the correct format?” Example: Is this a valid email? Is this a strong password? Is this a properly formatted phone number?
Extraction & Manipulation
- “Find and pull out specific pieces of information.” Example: Extract all phone numbers from a long document.
- “Replace parts of text.” Example: Redact all email addresses by replacing them with “[EMAIL REDACTED]”.

Almost every programming language (including Python) supports regex because working with text is so common.

The re Module in Python

Python has a built-in module called re that gives us all the regex superpowers.

To use it, you first need to import it:

import re

That’s it! Once you import re, you can start using regex functions.

Important Tip: Raw Strings (r"pattern")

In regex, the backslash \ is a very special character. It’s used for things like:

\d → means “any digit”
\s → means “any whitespace”
\w → means “any word character”

But here’s the problem: Python’s normal strings also use \ for escape sequences. For example, \n means “newline”, \t means “tab”.

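So if you write a normal string like "\d", Python first tries to read \d as one of its own escape sequences. It is not one, so the backslash happens to survive, but recent Python versions warn about the invalid escape sequence. Worse, a sequence that Python does recognize, such as "\b" (backspace), is silently converted before the regex engine ever sees it.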

Solution: Use raw strings by putting an r in front of the quotes:

pattern = r"\d"        # Correct way for regex (raw string)
# NOT this:
pattern = "\d"         # Avoid: Python warns about the invalid escape sequence

Rule of thumb: Always use raw strings for regex patterns in Python → r"your pattern here"

It prevents Python from misinterpreting your backslashes.
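
To make the risk concrete, here is a small sketch using \b, which means "backspace" to Python but "word boundary" to the regex engine (the sample text and variable names are made up for illustration):

```python
import re

# "\b" in a normal string is a single backspace character;
# r"\b" keeps the backslash, so it is two characters long.
print(len("\b"), len(r"\b"))  # 1 2

text = "my cat sat"

# Without the r prefix the pattern looks for backspace characters.
plain = re.findall("\bcat\b", text)

# With the r prefix the regex engine sees word boundaries.
raw = re.findall(r"\bcat\b", text)

print(plain)  # []
print(raw)    # ['cat']
```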

Your First Match: Basic Matching (Literal Characters)

Let’s start with the simplest kind of regex: matching literal characters.

“Literal” means exact text — what you see is what you search for.

We’ll use the most common function: re.search()

re.search(pattern, string) → Looks for the first occurrence of the pattern in the string. → Returns a Match object if found, or None if not found.

Example 1: Searching for the exact word “hello”

import re

text = "Hello world, hello again!"

# Pattern: exact word "hello" (using raw string)
pattern = r"hello"

# Search for the pattern
match = re.search(pattern, text)

if match:
    print("Found it!")
    print("Matched text:", match.group())   # Shows what was matched
    print("Starts at index:", match.start())  # Position where it starts
    print("Ends at index:", match.end())      # Position where it ends
else:
    print("Not found.")

Output:

Found it!
Matched text: hello
Starts at index: 13
Ends at index: 18

Notice:

It found “hello” (lowercase) in “hello again!”
It ignored “Hello” at the beginning because regex is case-sensitive by default.

Example 2: Case-sensitive vs case-insensitive

text = "Hello world, hello again!"

pattern = r"hello"
match = re.search(pattern, text)
print(match)  # Finds the second "hello"

pattern = r"Hello"
match = re.search(pattern, text)
print(match.group())  # Outputs: Hello (finds the first one)

To make it case-insensitive, add a third argument:

pattern = r"hello"
match = re.search(pattern, text, re.IGNORECASE)  # or re.I
if match:
    print(match.group())  # Will find "Hello" or "hello"

Example 3: What if it’s not found?

text = "Hi there!"

pattern = r"python"
match = re.search(pattern, text, re.IGNORECASE)

if match:
    print("Found!")
else:
    print("Not found.")  # This will print

Essential Python Regex Functions for Beginners

We’ll go through each function one by one, with clear explanations, code examples you can run yourself, and comparisons so you really understand the differences.

Using `re.search()` to Find Patterns Anywhere in a String

re.search() is your go-to tool when you want to find the first occurrence of a pattern anywhere in a string.

Basic Syntax

import re

match = re.search(pattern, string)

pattern: Your regex (as a raw string: r”…”)
string: The text you’re searching in
Returns: A Match object if found, or None if not found

Simple Example

import re

text = "The rain in Spain stays mainly in the plain"

# Search for the word "Spain"
match = re.search(r"Spain", text)

if match:
    print("Found!")
    print("Matched text:", match.group())    # Spain
    print("Start position:", match.start())  # 12
    print("End position:", match.end())      # 17
    print("Span:", match.span())             # (12, 17)
else:
    print("Not found")

Output:

Found!
Matched text: Spain
Start position: 12
End position: 17
Span: (12, 17)

Using Character Classes with re.search()

Let’s make it more powerful!

text = "Contact: john.doe123@example.com or 555-123-4567"

# Find the first sequence of digits (phone or part of email)
match = re.search(r"\d+", text)

if match:
    print("First number sequence:", match.group())  # 123 (from email)

# Find the first word (sequence of word characters)
match = re.search(r"\w+", text)
print(match.group())  # Contact

# Find the first email-like pattern ([\w.]+ lets the local part contain dots)
match = re.search(r"[\w.]+@\w+\.\w+", text)
if match:
    print("First email:", match.group())  # john.doe123@example.com

Case-Insensitive Search

By default, regex is case-sensitive.

text = "Python is great, python is fun"

match = re.search(r"python", text)
print(match.group())  # python (second one)

# To ignore case:
match = re.search(r"python", text, re.IGNORECASE)  # or re.I
print(match.group())  # Python (finds first occurrence)

Using Square Brackets in re.search()

text = "Room 42, Building B-15, Floor 3"

# Find the first sequence of only digits
match = re.search(r"[0-9]+", text)
print(match.group())  # 42

# Find the first sequence with letters and hyphens (like B-15)
match = re.search(r"[A-Za-z-]+", text)
print(match.group())  # Room (first word)

# Find something with mixed letters and numbers (like B-15)
match = re.search(r"[A-Za-z0-9-]+", text)
print(match.group())  # Room (still first)

# Better: find the building code specifically
match = re.search(r"[A-Z]-\d+", text)
print(match.group())  # B-15

What If Nothing Is Found?

text = "No numbers here!"

match = re.search(r"\d+", text)

if match:
    print("Found:", match.group())
else:
    print("No digits found")  # This will print

Always check if match: before using .group()!

Practical Real-World Examples

# 1. Check if a string contains a valid year (19xx or 20xx)
text = "Copyright 2025 Company Inc."
if re.search(r"\b(19|20)\d{2}\b", text):
    print("Contains a valid year")

# 2. Find the first hashtag
tweet = "Loving #Python and #Regex today!"
match = re.search(r"#\w+", tweet)
if match:
    print("First hashtag:", match.group())  # #Python

# 3. Detect if a password contains at least one special character
password = "MyPass123!"
if re.search(r"[^a-zA-Z0-9]", password):  # r"\W" is close, but it does not treat "_" as special
    print("Has special character")

Figure: Visualizing the fundamental difference between Python’s re.match() and re.search() methods. While match() is restricted to the start of a string, search() performs a full-string scan to locate patterns anywhere. In the example string "Learn Python Today", match() returns None because the pattern is not at the start, while search() finds it at index 6.

Using re.match() for Beginning-of-String Validation

Great job mastering re.search()! Now let’s look at its close cousin: re.match().

re.match() is perfect when you want to check if a pattern matches at the very beginning of the string — and only at the beginning.

This makes it useful for validating how a string begins. To require that the entire string follows a format, add a $ anchor or use re.fullmatch(), both covered below.

Basic Syntax

import re

match = re.match(pattern, string)

Just like re.search(), it returns a Match object if there’s a match, or None if not.
But it only tries to match at the start of the string.

Key Difference: re.match() vs re.search()

Function	Where it looks for a match	Returns match if pattern is…	Common Use Case
re.search()	Anywhere in the string	Found anywhere	Finding patterns in text
re.match()	Only at the beginning of the string	At the very start	Validating format (e.g., input, codes)

Visual Example

text = "123-456-7890 is my phone number"

# Using re.search() — finds digits anywhere
print(re.search(r"\d{3}", text).group())   # 123 (first three digits)

# Using re.match() — only checks the start
print(re.match(r"\d{3}", text).group())    # 123 (it starts with 123)

# Now change the text
text2 = "Phone: 123-456-7890"

print(re.search(r"\d{3}", text2).group())  # 123 (finds it after "Phone: ")
print(re.match(r"\d{3}", text2))           # None — doesn't start with digits!

Strict Validation with re.match()

This is where re.match() really shines.

Example 1: Validate a string that must start with a protocol

urls = [
    "https://example.com",
    "http://site.org",
    "ftp://old.com",
    "www.example.com"   # No protocol!
]

for url in urls:
    if re.match(r"https?://", url):
        print(url, "→ Valid (starts with http or https)")
    elif re.match(r"ftp://", url):
        print(url, "→ Valid FTP")
    else:
        print(url, "→ Missing protocol")

Example 2: Validate a US phone number format from the start

phones = [
    "555-123-4567",
    "(555) 123-4567",
    "555.123.4567",
    "123-456-7890 extra text"   # Has extra stuff
]

pattern = r"(\(\d{3}\)|\d{3})[-.\s]\d{3}[-.\s]\d{4}"

for phone in phones:
    if re.match(pattern, phone):
        print(phone, "→ Valid format at start")
    else:
        print(phone, "→ Invalid or extra text before")

Notice that the last one passes too. re.match() only guarantees that the pattern begins at the start of the string; it says nothing about what comes after the match.

So what happens when a clean phone number is followed by trailing text?

re.match(pattern, "555-123-4567") → match
re.match(pattern, "555-123-4567 extra") → also a match (match() only anchors the start, so the trailing text is ignored)

Note: With re.match(), a leading ^ is redundant because match() already anchors at the start of the string, even in multiline mode. Adding $ at the end is what forces the pattern to reach the end of the string.

Best Practice for Full Validation: Use re.fullmatch() (Python 3.4+) for exact whole-string match:

if re.fullmatch(pattern, phone):
    print("Exact match — no extra text!")

Or with anchors:

re.match(r"^\d{3}-\d{3}-\d{4}$", phone)
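
To see the difference concretely, here is a small sketch comparing re.match() and re.fullmatch() on a few made-up inputs (the sample strings and variable names are illustrative):

```python
import re

pattern = r"\d{3}-\d{3}-\d{4}"
samples = ["555-123-4567", "555-123-4567 extra", "call 555-123-4567"]

results = {}
for s in samples:
    starts_ok = bool(re.match(pattern, s))      # anchored at the start only
    whole_ok = bool(re.fullmatch(pattern, s))   # must cover the entire string
    results[s] = (starts_ok, whole_ok)
    print(f"{s!r}: match={starts_ok}, fullmatch={whole_ok}")

# '555-123-4567': match=True, fullmatch=True
# '555-123-4567 extra': match=True, fullmatch=False
# 'call 555-123-4567': match=False, fullmatch=False
```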

More Real-World Validation Examples

# 1. Check if a string is a valid Python identifier (starts with letter/underscore)
def is_valid_identifier(s):
    return bool(re.match(r"^[a-zA-Z_]\w*$", s))

print(is_valid_identifier("my_var"))     # True
print(is_valid_identifier("123var"))     # False — starts with digit
print(is_valid_identifier("my-var"))     # False — hyphen not in \w

# 2. Validate hexadecimal color code (must start with # and be exact)
colors = ["#FF0000", "#abc", "#GGG", "FF0000", "#12345G"]

for color in colors:
    if re.match(r"^#[0-9A-Fa-f]{6}$", color) or re.match(r"^#[0-9A-Fa-f]{3}$", color):
        print(color, "→ Valid hex color")
    else:
        print(color, "→ Invalid")

re.findall()

This function finds ALL non-overlapping matches and returns them as a list of strings.

Perfect when you want every occurrence.

Example:

text = "My numbers are 123, 456, and 789."

# Find all sequences of digits
pattern = r"\d+"    # \d means digit, + means one or more → we'll learn this soon!

numbers = re.findall(pattern, text)
print(numbers)      # Output: ['123', '456', '789']

Another example:

email_text = "Contact us at support@example.com or sales@company.org"

emails = re.findall(r"\w+@\w+\.\w+", email_text)
print(emails)       # Output: ['support@example.com', 'sales@company.org']

If no matches → returns empty list [].

Key Difference:

re.search() → returns first match as a Match object (or None)
re.findall() → returns all matches as a list of strings
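
One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the group contents instead of the whole match. A short sketch, reusing the email text from above (the variable names are made up):

```python
import re

text = "Contact us at support@example.com or sales@company.org"

# No groups: each item is the whole match.
whole = re.findall(r"\w+@\w+\.\w+", text)

# One group: each item is just that group's text.
users = re.findall(r"(\w+)@\w+\.\w+", text)

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)

print(whole)  # ['support@example.com', 'sales@company.org']
print(users)  # ['support', 'sales']
print(pairs)  # [('support', 'example.com'), ('sales', 'company.org')]
```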

re.split()

This splits a string into a list, using the regex pattern as the separator.

Like string.split(), but way more powerful because you can split on complex patterns.

Example:

text = "apple, banana; orange  grape"

# Split on comma, semicolon, or multiple spaces
pattern = r"[,\s;]+"

fruits = re.split(pattern, text)
print(fruits)       
# Output: ['apple', 'banana', 'orange', 'grape']

Note: It removes the separators completely.
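
If you do want to keep the separators, wrapping the separator in a capturing group makes re.split() include it in the result. A minimal sketch with a made-up sample string:

```python
import re

text = "apple, banana; orange"

# Plain pattern: separators are dropped.
dropped = re.split(r"[,;]\s*", text)

# Capturing group: the captured separator is kept as its own list item.
kept = re.split(r"([,;])\s*", text)

print(dropped)  # ['apple', 'banana', 'orange']
print(kept)     # ['apple', ',', 'banana', ';', 'orange']
```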

Another useful one:

sentence = "Hello!!! How are you???"

words = re.split(r"[!?.]+", sentence)
print(words)        # ['Hello', ' How are you', '']

You can also limit the number of splits:

re.split(r"\s+", "one two three four", maxsplit=2)
# Output: ['one', 'two', 'three four']

re.sub()

Short for “substitute” → search and replace using regex.

Replaces all matches with a replacement string.

Syntax:

re.sub(pattern, replacement, string)

Example:

text = "I have 5 apples and 10 oranges."

# Replace all numbers with the word "FRUIT"
new_text = re.sub(r"\d+", "FRUIT", text)
print(new_text)     
# Output: I have FRUIT apples and FRUIT oranges.

Another example – redacting emails:

text = "Email me at john@example.com or jane@site.org"

redacted = re.sub(r"\w+@\w+\.\w+", "[EMAIL REDACTED]", text)
print(redacted)
# Output: Email me at [EMAIL REDACTED] or [EMAIL REDACTED]

You can also limit the number of replacements:

re.sub(r"\d+", "X", "123 456 789", count=2)
# Output: X X 789
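
The replacement does not have to be a fixed string. As a sketch, the example below shows two other options that re.sub() supports: a backreference to a captured group, and a function that computes each replacement (the helper name double is made up for illustration):

```python
import re

text = "I have 5 apples and 10 oranges."

# Backreference: \1 inserts whatever the first group captured.
bracketed = re.sub(r"(\d+)", r"[\1]", text)

# Callable: receives each Match object and returns the replacement string.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)

print(bracketed)  # I have [5] apples and [10] oranges.
print(doubled)    # I have 10 apples and 20 oranges.
```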

re.compile() – Optional but Highly Recommended

If you’re going to use the same pattern multiple times, it’s better to compile it once.

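Compiling turns the pattern into a regex object that can be reused → cleaner, and slightly faster in tight loops (the re module already caches recently used patterns, so the speed gain is usually modest).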

Example:

# Without compile
text1 = "abc123"
text2 = "xyz456"

print(bool(re.search(r"\d+", text1)))
print(bool(re.search(r"\d+", text2)))

# With compile → better if using many times
pattern = re.compile(r"\d+")    # Compile once

print(bool(pattern.search(text1)))
print(bool(pattern.search(text2)))

You can use the compiled pattern with all functions:

pattern = re.compile(r"\d+")

print(pattern.findall("Numbers: 12, 34, 56"))    # ['12', '34', '56']
print(pattern.sub("NUM", "Age: 25 years"))       # Age: NUM years

Why use compile()?

Faster when reusing the pattern many times
Cleaner code
You can give the pattern a name (good for complex ones)
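
As an illustration of the "name it" point, here is a sketch of a compiled pattern written with the re.VERBOSE flag, which lets you spread a pattern over several lines and comment each part. The name PHONE and the phone format are assumptions made for this example:

```python
import re

# Hypothetical named pattern; re.VERBOSE ignores whitespace and
# treats everything after "#" inside the pattern as a comment.
PHONE = re.compile(
    r"""
    (\d{3})   # area code
    -
    (\d{3})   # prefix
    -
    (\d{4})   # line number
    """,
    re.VERBOSE,
)

m = PHONE.search("Call 555-123-4567 today")
print(m.groups())  # ('555', '123', '4567')
```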

Python Regex Syntax: Metacharacters and Special Sequences

Figure: A visual cheat sheet for Python Regex Metacharacters. This guide categorizes fundamental symbols (^, $, ., ?, *, +) into anchors, wildcards, and quantifiers to illustrate how complex string patterns are constructed, with ^Py.+n$ as a combined example.

Metacharacters and Special Sequences — this is where regex goes from “basic search” to “super powerful pattern machine.”

Think of metacharacters as the magic symbols that let you describe flexible patterns instead of exact text.

Let’s go through each one slowly, with lots of examples and code you can try right away.

Common Metacharacters

1. The Dot ( . )

The period . is the wildcard: it matches any single character except a newline (\n).

Examples:

import re

text = "cat hat bat rat sat"

# Pattern: any 3-letter word ending with "at"
pattern = r".at"

matches = re.findall(pattern, text)
print(matches)  
# Output: ['cat', 'hat', 'bat', 'rat', 'sat']

It matches:

cat → “c” is one character
hat → “h”
even if it were “@at” or “3at” — the dot accepts almost anything.

What it does NOT match:

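“at” on its own (only 2 characters, so there is nothing for the dot to match)
“flat” as a whole word (the dot is only one character, so the pattern matches just the “lat” inside it)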

Another example:

text2 = "abc def\nghi"

pattern = r"a.c"    # a, then any char, then c
print(re.findall(pattern, text2))  # ['abc'] — doesn't cross newline

The dot stops at newline by default.

Key: One dot = exactly one character (anything except \n).
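
If you do need the dot to match newlines as well, the re.DOTALL flag changes that behavior. A minimal sketch with a made-up string:

```python
import re

text = "a\nc abc"

# Default: the dot refuses to match "\n", so "a\nc" is skipped.
default = re.findall(r"a.c", text)

# re.DOTALL: the dot matches any character, including "\n".
dotall = re.findall(r"a.c", text, re.DOTALL)

print(default)  # ['abc']
print(dotall)   # ['a\nc', 'abc']
```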

2. Anchors: ^ and $

These don’t match characters — they match positions.

^ → start of the string (or start of each line with special flag)
$ → end of the string (or end of each line)

Example with ^ (start):

lines = ["python", "Python", "java", "PYTHON"]

for line in lines:
    if re.match(r"^python", line, re.IGNORECASE):
        print(line, "→ starts with python")
    if re.match(r"^Python", line):
        print(line, "→ starts exactly with capital P")

Output:

python → starts with python
Python → starts with python
Python → starts exactly with capital P
PYTHON → starts with python

re.match() already checks the start, but ^ is useful with re.search() too.

Example with $ (end):

text = "email: user@example.com\nphone: 123-456-7890\nend"

pattern = r"\.com$"   # ends with .com (the dot is escaped so it matches a literal ".")
match = re.search(pattern, text)
print(match)  # None, because the whole text ends with "end", not ".com"

# To match at the end of each line, use the re.MULTILINE flag
matches = re.findall(r"\.com$", text, re.MULTILINE)
print(matches)  # ['.com']

Super useful for validation:

r”^\d{3}-\d{3}-\d{4}$” → exact phone number format like 123-456-7890

Character Classes in Python: \d, \w, \s, and \b

These are the most useful shorthand character classes in Python’s re module. They act like pre-built sets of characters.

Shorthand	Meaning	Equivalent to	Example Pattern	Matches in “Hello 123_world!”
\d	Any digit (0–9)	[0-9]	r”\d+”	“123”
\D	Any non-digit	[^0-9]	r”\D+”	“Hello “, “_world!”
\w	Any word character (letter, digit, underscore)	[a-zA-Z0-9_]	r”\w+”	“Hello”, “123_world”
\W	Any non-word character	[^a-zA-Z0-9_]	r”\W+”	” “, “!”
\s	Any whitespace (space, tab, newline, etc.)	[ \t\n\r\f\v]	r”\s+”	” ” (the space)
\S	Any non-whitespace	[^ \t\n\r\f\v]	r”\S+”	“Hello”, “123_world!”
\b	Word boundary (position between \w and \W)	—	r”\bcat\b”	whole word “cat” only
\B	Non-word boundary	—	r”\Bcat\B”	“cat” inside “scatter”
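
One caveat about the "Equivalent to" column: for ordinary text (str) patterns in Python 3, \d, \w, and \s are Unicode-aware by default, so the bracket expressions are exact equivalents only when the re.ASCII flag is set. A small sketch, using a made-up sample string:

```python
import re

# "café naïve", written with escapes to avoid encoding surprises.
text = "caf\u00e9 na\u00efve"

# Default: accented letters count as word characters.
unicode_words = re.findall(r"\w+", text)

# re.ASCII: \w is limited to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['café', 'naïve']
print(ascii_words)    # ['caf', 'na', 've']
```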

Practical Examples

import re

text = "Hello 123_world!   email@site.com"

print(re.findall(r"\d+", text))     # ['123']
print(re.findall(r"\w+", text))     # ['Hello', '123_world', 'email', 'site', 'com']
print(re.findall(r"\s+", text))     # [' ', '   ']
print(re.findall(r"\S+", text))     # ['Hello', '123_world!', 'email@site.com']

# Word boundaries in action
text2 = "the cat scattered the scattercat"

print(re.findall(r"cat", text2))          # ['cat', 'cat', 'cat', 'cat'] (all four occurrences)
print(re.findall(r"\bcat\b", text2))      # ['cat'] (only the whole word "cat")
print(re.findall(r"\Bcat\B", text2))      # ['cat', 'cat'] (inside "scattered" and the first "cat" in "scattercat")

Tip: Always use raw strings (r””) with these — backslashes are common!

Using Square Brackets [] for Custom Character Sets

Square brackets let you define your own set of characters to match exactly one character from that set.

Basic Syntax

[abc] → matches a or b or c
[^abc] → matches anything except a, b, or c (negation)

Ranges

You can use hyphens for ranges:

Pattern	Meaning	Example Matches
[a-z]	Any lowercase letter	a, b, …, z
[A-Z]	Any uppercase letter	A, B, …, Z
[0-9]	Any digit	0 through 9
[a-zA-Z]	Any letter (either case)	a–z or A–Z
[a-z0-9]	Lowercase letter or digit
[a-zA-Z0-9_]	Same as \w

Examples in Code

text = "abc123XYZ!@#"

print(re.findall(r"[aeiou]", text))        # ['a'] (the only lowercase vowel)
print(re.findall(r"[a-z]", text))          # ['a', 'b', 'c']
print(re.findall(r"[0-9]", text))          # ['1', '2', '3']
print(re.findall(r"[A-Z]", text))          # ['X', 'Y', 'Z']
print(re.findall(r"[!@#]", text))          # ['!', '@', '#']

# Negation
print(re.findall(r"[^0-9]", text))         # All non-digits: ['a','b','c','X','Y','Z','!','@','#']
print(re.findall(r"[^a-zA-Z0-9_]", text))   # Same as \W: ['!', '@', '#']

Real-World Uses

# Extract all vowels (case-insensitive)
re.findall(r"[aeiouAEIOU]", "Education")
# → ['E', 'u', 'a', 'i', 'o']

# Match hexadecimal digits
re.findall(r"[0-9A-Fa-f]+", "Color: #FF8800 and A1B2C3")
# → ['C', 'FF8800', 'a', 'd', 'A1B2C3']  (letters a–f inside ordinary words match too)

# Match only letters (no numbers or symbols)
re.findall(r"[a-zA-Z]+", "Room 42 has Wi-Fi!")
# → ['Room', 'has', 'Wi', 'Fi']

Quick Mix-and-Match

You can combine literals, ranges, and special chars:

# Match valid characters in a username (letters, digits, hyphen, underscore)
pattern = r"[a-zA-Z0-9_-]+"
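
Note that the character class only says which characters are allowed; to check a whole username, the pattern has to cover the entire string. A minimal sketch, assuming a hypothetical helper named is_valid_username built on re.fullmatch:

```python
import re

# Hypothetical helper (not from the article): accepts only names made
# entirely of letters, digits, hyphens, and underscores.
USERNAME_RE = re.compile(r"[a-zA-Z0-9_-]+")

def is_valid_username(name):
    # fullmatch succeeds only if the whole string fits the pattern
    return USERNAME_RE.fullmatch(name) is not None

print(is_valid_username("data_fan-42"))   # True
print(is_valid_username("bad name!"))     # False
```

Placing the hyphen last inside the brackets keeps it a literal character rather than the start of a range.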

Mini Practice (Try These!)

Extract all lowercase letters from “Hello World 123!”
Find all sequences of exactly 3 digits: “12 123 1234 56789” → only “123”
Match punctuation marks only: “, . ! ? ; :”
Find whole words consisting only of uppercase letters in “NASA and USA are acronyms”
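
Try the exercises yourself first. One possible set of answers is sketched below; the variable names are only illustrative, and other patterns would work just as well:

```python
import re

# 1. All lowercase letters
lower = re.findall(r"[a-z]", "Hello World 123!")
print(lower)      # ['e', 'l', 'l', 'o', 'o', 'r', 'l', 'd']

# 2. Sequences of exactly 3 digits (word boundaries exclude longer runs)
three = re.findall(r"\b\d{3}\b", "12 123 1234 56789")
print(three)      # ['123']

# 3. Punctuation marks only
punct = re.findall(r"[,.!?;:]", ", . ! ? ; :")
print(punct)      # [',', '.', '!', '?', ';', ':']

# 4. Whole words made only of uppercase letters
acronyms = re.findall(r"\b[A-Z]+\b", "NASA and USA are acronyms")
print(acronyms)   # ['NASA', 'USA']
```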

You’ve got the tools to match almost any character pattern now!

Advanced Python Regex Techniques for 2026

Quantifiers (Repetitions)

You’ve mastered the building blocks — now we’re adding quantifiers, the tools that let you say “how many times” something should appear.

Quantifiers come right after a character, class, or group, and control repetition.

Figure: Visual guide to Regex Quantifiers in Python. The chart shows how the symbols *, +, ?, {n}, {n,m}, and {n,} control the repetition of characters in a pattern, from simple wildcards to specific numeric ranges.

Let’s learn them one by one with clear examples.

1. Zero or One: ? (Question Mark)

The ? makes the preceding item optional — it can appear 0 or 1 time.

Classic example: British vs American spelling

import re

text = "color colour coloring coloured"

pattern = r"colou?r"    # 'u' is optional

matches = re.findall(pattern, text)
print(matches)
# Output: ['color', 'colour', 'color', 'colour']  ("coloring" and "coloured" contain matches too)

To match only the standalone words, add word boundaries:

print(re.findall(r"\bcolou?r\b", text))
# Output: ['color', 'colour']

The \b ensures whole words.

Another great use: optional letters in file extensions

filenames = ["report.pdf", "image.jpg", "photo.jpeg", "script.py"]

for name in filenames:
    if re.search(r"\.jpe?g$", name):   # .jpg or .jpeg (the 'e' is optional)
        print(name)
# Prints image.jpg and photo.jpeg

# Optional "s" for plural
re.findall(r"cats?", "cat cats catss")  # ['cat', 'cats', 'cats']

Key: ? = exactly 0 or 1 of the thing before it.
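
Another common case is an optional sign in front of a number. A small sketch (the sample text is made up for illustration):

```python
import re

text = "Temperatures: -5 12 -30 7"

# -? allows zero or one minus sign before the digits
numbers = re.findall(r"-?\d+", text)
print(numbers)   # ['-5', '12', '-30', '7']
```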

2. Zero or More: * (Asterisk)

Matches zero or more repetitions of the preceding item.

It will match as many as possible (we’ll talk about “greedy” soon).

Examples:

text = "ab acb aacb aaacb"

pattern = r"a*b"    # zero or more 'a's followed by 'b'

matches = re.findall(pattern, text)
print(matches)
# Output: ['ab', 'b', 'b', 'b']

In "acb", "aacb", and "aaacb" the 'c' interrupts the run of 'a's, so only the final 'b' matches. A cleaner text shows the pattern better:

text = "b ab aab aaab aaaab"

print(re.findall(r"a*b", text))
# ['b', 'ab', 'aab', 'aaab', 'aaaab']

Yes! It matches any number of ‘a’s (including zero) before ‘b’.

Another common use: matching spaces

text = "hello     world  python"

print(re.split(r"\s*", text))   # Careful: \s* can match the empty string, so Python 3.7+ also splits between every character
print(re.split(r"\s+", text))   # ['hello', 'world', 'python'] (use + to split on real whitespace only)

More practical:

# Match HTML tags (simple version)
re.findall(r"<.*>", "<p>Hello</p> <b>bold</b>")
# ['<p>Hello</p> <b>bold</b>'] (one big match, because * is greedy!)

We’ll fix that with non-greedy later.

3. One or More: + (Plus)

Matches one or more repetitions — must appear at least once.

Example:

text = "b ab aab aaab"

pattern = r"a+b"

print(re.findall(pattern, text))
# ['ab', 'aab', 'aaab'] — no plain "b" because at least one 'a' required

Super useful for grabbing numbers, words, etc.

text = "Prices: $5, $100, $2500"

prices = re.findall(r"\$\d+", text)
print(prices)  # ['$5', '$100', '$2500']

Or a simple email-like pattern:

re.findall(r"\w+@\w+\.\w+", "john.doe@example.com")
# ['doe@example.com'] (\w does not match '.', so "john." is left out)

4. Specific Counts: {m,n}

This is the most precise quantifier.

{m} → exactly m times
{m,} → at least m times (no upper limit)
{m,n} → between m and n times (inclusive)

Examples:

# Exactly 4 digits (like a year 0000-9999)
text = "Years: 1999 2025 12345 850"

print(re.findall(r"\b\d{4}\b", text))
# ['1999', '2025']

# At least 3 digits
print(re.findall(r"\d{3,}", text))
# ['1999', '2025', '12345', '850']

# Between 2 and 4 digits
print(re.findall(r"\b\d{2,4}\b", text))
# ['1999', '2025', '850']  (12345 has 5 digits, so it is not included)

Real-world uses:

Phone numbers: \d{3}-\d{3}-\d{4}
ZIP codes: \d{5}(-\d{4})?
Hex colors: #[0-9A-Fa-f]{6} or #[0-9A-Fa-f]{3}
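
The three patterns above can be tried out as follows. This is a sketch that assumes each value should be checked as a whole string (hence re.fullmatch); the variable names and sample values are illustrative, and the two hex alternatives are combined with a non-capturing group (?:...):

```python
import re

phone_re = re.compile(r"\d{3}-\d{3}-\d{4}")
zip_re = re.compile(r"\d{5}(-\d{4})?")
hex_re = re.compile(r"#(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{3})")

print(bool(phone_re.fullmatch("555-123-4567")))  # True
print(bool(zip_re.fullmatch("90210")))           # True
print(bool(zip_re.fullmatch("90210-1234")))      # True
print(bool(zip_re.fullmatch("9021")))            # False (only 4 digits)
print(bool(hex_re.fullmatch("#FF8800")))         # True
print(bool(hex_re.fullmatch("#F80")))            # True
print(bool(hex_re.fullmatch("#FF88")))           # False (4 hex digits)
```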

Putting It All Together

# Validate a simple password: 8-16 characters, at least one digit and one letter
pattern = r"^(?=.*[a-zA-Z])(?=.*\d).{8,16}$"

# We'll learn lookaheads later — but this is powerful!

Or extract repeated patterns:

text = "haha hahaha haaaaha"

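print(re.findall(r"ha+", text))          # ['ha', 'ha', 'ha', 'ha', 'ha', 'haaaa', 'ha'] (+ repeats only the 'a')
print(re.findall(r"(?:ha){2,}", text))   # ['haha', 'hahaha'] ("ha" repeated two or more times)
# (?:...) groups without capturing; with plain (ha){2,}, findall would return only the captured 'ha'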

5. Greedy vs Non-Greedy (Lazy) Matching

By default, ?, *, +, and {m,n} are greedy: they match as much as possible.

Greedy example (problem!):

text = '<p>Hello <b>world</b></p>'

print(re.findall(r"<.*>", text))
# Output: ['<p>Hello <b>world</b></p>'] — one huge match!
# It grabbed everything from first < to last >

We wanted separate tags.

Fix: Make it non-greedy with ?

Add ? after the quantifier → matches as little as possible.

print(re.findall(r"<.*?>", text))
# Output: ['<p>', '<b>', '</b>', '</p>'] — perfect!

Works with all:

*? → zero or more, lazily
+? → one or more, lazily
{m,n}? → lazily
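
A quick way to see the lazy variants at work is to run the greedy and lazy form of the same quantifier side by side. A minimal sketch on a made-up string:

```python
import re

text = "aaaa"

greedy_plus = re.findall(r"a+", text)
lazy_plus = re.findall(r"a+?", text)
greedy_range = re.findall(r"a{2,3}", text)
lazy_range = re.findall(r"a{2,3}?", text)

print(greedy_plus)    # ['aaaa']             (takes everything at once)
print(lazy_plus)      # ['a', 'a', 'a', 'a'] (takes one at a time)
print(greedy_range)   # ['aaa']              (the leftover 'a' is too short to match)
print(lazy_range)     # ['aa', 'aa']         (stops at the minimum of 2)
```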

Another example:

text = "12345"

print(re.search(r"\d{2,4}", text).group())   # '1234' (greedy — takes 4)
print(re.search(r"\d{2,4}?", text).group())  # '12' (lazy — takes minimum 2)

Rule:

Greedy (default): match as much as possible while still allowing overall match
Non-greedy (? after): match as little as possible

Grouping and Alternation

You’ve already learned how to find patterns and control repetitions. Now we’re stepping into one of the most powerful parts of regex: Grouping and Alternation.

This is where regex turns from “find stuff” into “extract and organize specific pieces of information.”

Let’s go step by step, as always!

Figure: An overview of Regex Grouping and Alternation. The guide shows how parentheses treat multiple characters as a single unit, as in (abc)+, and how the pipe operator creates either/or patterns, as in cat|dog and (Learn|Code) Python.

1. Grouping with Parentheses (())

Parentheses () do two important things:

Group sub-patterns together so you can apply quantifiers to the whole group.
Capture the matched text inside the group so you can retrieve it later.

A. Grouping to apply quantifiers

Without parentheses, quantifiers only apply to the single character before them.

With parentheses, they apply to the entire group.

Example:

import re

text = "ab abc abab ababab"

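# Without grouping
print(re.findall(r"ab+", text))
# ['ab', 'ab', 'ab', 'ab', 'ab', 'ab', 'ab']  → + only applies to 'b', so "abab" gives two separate matches

# With grouping (finditer + group() shows each whole match;
# findall would return only the last captured 'ab' of each match)
print([m.group() for m in re.finditer(r"(ab)+", text)])
# ['ab', 'ab', 'abab', 'ababab']  → + applies to the whole "ab"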

See the difference? (ab)+ means “one or more repetitions of the sequence ‘ab'”.

Other examples:

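# Note: when the pattern contains a capturing group, findall returns the group's text, not the whole match
re.findall(r"(ha)+", "ha haha hahaha")     # ['ha', 'ha', 'ha']
re.findall(r"(wo)+", "wo wowowo")          # ['wo', 'wo']
re.findall(r"(\d{3})-?", "123-456 789")     # ['123', '456', '789']  (optional dash)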

B. Capturing groups to extract parts

Every time you use (), the matched text inside that group is saved (captured).

You can access them using .group(n) where:

.group(0) or .group() → the entire match
.group(1) → first group
.group(2) → second group, etc.

Real-world example: Extract username and domain from email

text = "Contact: john.doe@example.com"

pattern = r"([\w.]+)@(\w+\.\w+)"   # [\w.]+ lets the username contain dots

match = re.search(pattern, text)

if match:
    print("Full match:", match.group(0))   # john.doe@example.com
    print("Username:", match.group(1))     # john.doe
    print("Domain:", match.group(2))       # example.com

You can also use .groups() to get all captured groups as a tuple:

print(match.groups())          # ('john.doe', 'example.com')

Or .groupdict() (we’ll see this with named groups soon).

Multiple groups example: Parsing dates

date_text = "Today's date: 2025-12-29"

pattern = r"(\d{4})-(\d{2})-(\d{2})"

match = re.search(pattern, date_text)

if match:
    year = match.group(1)
    month = match.group(2)
    day = match.group(3)
    print(f"Year: {year}, Month: {month}, Day: {day}")
    # Output: Year: 2025, Month: 12, Day: 29

2. Alternation: | (The OR operator)

The pipe means OR — match this or that.

It has the lowest precedence of any regex operator, so it splits the whole pattern into alternatives unless you limit it with a group.

Simple example:

text = "I like cats and dogs"

pattern = r"cat|dog"

print(re.findall(pattern, text))
# ['cat', 'dog']

With grouping for more control:

text = "The gray cat and the grey dog"

# Match American OR British spelling
pattern = r"gr(e|a)y"

print(re.findall(pattern, text))
# ['a', 'e']  → findall returns only the captured letter, not the whole word

# Better: use a non-capturing group (?:...) to get the whole word
pattern = r"\bgr(?:e|a)y\b"   # \b for whole word

matches = re.findall(pattern, text)
print(matches)  # ['gray', 'grey']

A more complex example, using optional parts rather than |:

# Match different phone number formats
text = "Call me at 123-456-7890 or (123) 456-7890 or 123.456.7890"

pattern = r"\(?(\d{3})\)?[.\s-]?(\d{3})[.\s-]?(\d{4})"

# This captures the three parts regardless of separators
matches = re.findall(pattern, text)
print(matches)
# [('123', '456', '7890'), ('123', '456', '7890'), ('123', '456', '7890')]

Alternation with different lengths:

# Match "http", "https", or "ftp"
pattern = r"https?|ftp"

# https? means http or https (s is optional)
# | ftp adds ftp as another option

re.findall(pattern, "https://site.com ftp://old.com http://example.org")
# ['https', 'ftp', 'http']

Important: Alternation tries options from left to right and stops at the first one that works.
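
Because of this left-to-right rule, the order of the alternatives matters when one option is a prefix of another. A short sketch with made-up input:

```python
import re

text = "JavaScript"

short_first = re.findall(r"Java|JavaScript", text)
long_first = re.findall(r"JavaScript|Java", text)

print(short_first)   # ['Java']        ("Java" wins before "JavaScript" is tried)
print(long_first)    # ['JavaScript']  (put the longer option first)
```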

3. Named Groups (Highly Recommended!)

When you have many groups, remembering group(1), group(2) gets confusing.

Named groups let you give each group a meaningful name.

Syntax: (?P<name>pattern)

Access with .group('name') or .groupdict()

Example: Improved email extraction

pattern = r"(?P<username>\w+)@(?P<domain>\w+\.\w+)"

match = re.search(pattern, "alice@wonderland.org")

if match:
    print(match.group('username'))   # alice
    print(match.group('domain'))     # wonderland.org
    print(match.groupdict())         
    # {'username': 'alice', 'domain': 'wonderland.org'}

Date example with names:

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

match = re.search(pattern, "Event: 2025-12-29")

if match:
    data = match.groupdict()
    print(data)
    # {'year': '2025', 'month': '12', 'day': '29'}
    print(f"{data['month']}/{data['day']}/{data['year']}")
    # 12/29/2025
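
Named groups can also be referenced in a re.sub() replacement string with the \g<name> syntax. A minimal sketch reusing the date pattern above (the variable name result is just for illustration):

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# \g<name> inserts the text captured by the named group
result = re.sub(pattern, r"\g<month>/\g<day>/\g<year>", "Event: 2025-12-29")
print(result)   # Event: 12/29/2025
```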

Pro tip: Always use named groups in real projects — your future self will thank you!

Practical Python Regex Examples and Real-World Use Cases

You’ve learned all the core concepts. Now it’s time for the fun part: Practical Real-World Examples and Exercises.

We’ll bring everything together with real-world tasks you’ll encounter in programming, data processing, web scraping, form validation, and more. I’ll show complete code examples, explain the patterns step-by-step, and then give you challenges to try yourself.

Let’s dive in!

How to Validate an Email Address in Python

Regex shines for checking if input is in the correct format.

A. Email Address Validation

A solid (but not overly complex) email pattern:

import re

email_pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

def is_valid_email(email):
    return bool(re.match(email_pattern, email.strip()))

# Test it
test_emails = [
    "test@example.com",          # Valid
    "john.doe@sub.domain.co.uk", # Valid (multiple parts)
    "bad@email",                 # Invalid (no TLD)
    "noat.com",                  # Invalid
    "@missinglocal.com",         # Invalid
    "spaces @ example.com",      # Invalid
    "good+filter@my-site.org"    # Valid (plus addressing)
]

for email in test_emails:
    print(f"{email:30} → {is_valid_email(email)}")

Explanation of the pattern:

^ → start of string
[a-zA-Z0-9._%+-]+ → one or more allowed local-part characters
@ → literal @
[a-zA-Z0-9.-]+ → domain name
\. → literal dot
[a-zA-Z]{2,}$ → TLD at least 2 letters, end of string

This catches most real emails without being too strict.
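
One subtlety with the $ anchor: without re.MULTILINE it also matches just before a newline at the very end of the string. The function above avoids the problem by calling .strip() first; the sketch below shows the difference when the input is not stripped, using re.fullmatch as a stricter alternative:

```python
import re

email_pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

with_match = bool(re.match(email_pattern, "test@example.com\n"))
with_fullmatch = bool(re.fullmatch(email_pattern, "test@example.com\n"))
clean_input = bool(re.fullmatch(email_pattern, "test@example.com"))

print(with_match)       # True  ($ matches before the final newline)
print(with_fullmatch)   # False (the newline is not part of the pattern)
print(clean_input)      # True
```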

B. Phone Number Validation (US format)

Common formats: 123-456-7890, (123) 456-7890, 123.456.7890

phone_pattern = r"^(\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}$"

def is_valid_phone(phone):
    return bool(re.match(phone_pattern, phone))

tests = ["123-456-7890", "(123) 456-7890", "123.456.7890", "1234567890", "bad"]
for p in tests:
    print(p, "→", is_valid_phone(p))
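
Once a number is known to be valid, capturing groups make it easy to store it in one consistent format. A sketch assuming a hypothetical normalize_phone helper; for brevity its pattern is a simplification that does not check that the parentheses are balanced:

```python
import re

# Hypothetical helper: rewrites any accepted format as 123-456-7890.
phone_re = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(phone):
    m = phone_re.match(phone)
    if m is None:
        return None               # not a recognizable phone number
    return "-".join(m.groups())   # join the three captured parts

print(normalize_phone("(123) 456-7890"))  # 123-456-7890
print(normalize_phone("123.456.7890"))    # 123-456-7890
print(normalize_phone("bad"))             # None
```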

C. Password Strength Checking

Example: At least 8 characters, with at least one uppercase, one lowercase, one digit.

password_pattern = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$"

# This uses positive lookahead (advanced, but useful!)
# (?=.*[a-z]) → must contain lowercase somewhere
# etc.

def check_password_strength(pw):
    if re.match(password_pattern, pw):
        return "Strong enough"
    else:
        return "Too weak"

print(check_password_strength("Weak123"))     # Too short → weak
print(check_password_strength("StrongPass1")) # Strong enough
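
A single lookahead pattern only answers yes or no. If you want to tell the user which rule failed, separate checks are often easier to read. A sketch with a hypothetical password_problems helper that applies the same rules one at a time:

```python
import re

# Hypothetical helper: returns a list of unmet rules instead of a single yes/no.
def password_problems(pw):
    problems = []
    if len(pw) < 8:
        problems.append("fewer than 8 characters")
    if not re.search(r"[a-z]", pw):
        problems.append("no lowercase letter")
    if not re.search(r"[A-Z]", pw):
        problems.append("no uppercase letter")
    if not re.search(r"\d", pw):
        problems.append("no digit")
    return problems

print(password_problems("Weak123"))      # ['fewer than 8 characters']
print(password_problems("password"))     # ['no uppercase letter', 'no digit']
print(password_problems("StrongPass1"))  # []
```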

Extracting Phone Numbers and Dates (YYYY-MM-DD)

A. Extracting Dates from Log Files

Common formats: 2025-12-29, 12/29/2025, Dec 29, 2025

log_line = "ERROR [2025-12-29 14:30:45] Database connection failed on 12/29/2025"

# Extract ISO format dates (YYYY-MM-DD)
iso_dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", log_line)
print("ISO dates:", iso_dates)  # ['2025-12-29']

# Extract MM/DD/YYYY
slash_dates = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", log_line)
print("Slash dates:", slash_dates)  # ['12/29/2025']

With named groups:

date_pattern = r"(?P<iso>\d{4}-\d{2}-\d{2})|(?P<slash>\d{2}/\d{2}/\d{4})"

matches = re.finditer(date_pattern, log_line)
for m in matches:
    if m.group('iso'):
        print("Found ISO:", m.group('iso'))
    if m.group('slash'):
        print("Found slash:", m.group('slash'))
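
When each alternative has its own named group, the match object's lastgroup attribute tells you which one matched, which can replace the chain of if statements. A minimal sketch on the same log line:

```python
import re

log_line = "ERROR [2025-12-29 14:30:45] Database connection failed on 12/29/2025"
date_pattern = r"(?P<iso>\d{4}-\d{2}-\d{2})|(?P<slash>\d{2}/\d{2}/\d{4})"

# m.lastgroup is the name of the group that took part in the match
found = [(m.lastgroup, m.group()) for m in re.finditer(date_pattern, log_line)]
print(found)   # [('iso', '2025-12-29'), ('slash', '12/29/2025')]
```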

B. Parsing Information from a Simple CSV String

csv_line = 'John Doe,john@example.com,25,New York,NY'

# Split and extract with groups
pattern = r'^([^,]+),([^,]+),(\d+),([^,]+),([A-Z]{2})$'

match = re.match(pattern, csv_line)
if match:
    name, email, age, city, state = match.groups()
    print(f"Name: {name}, Email: {email}, Age: {age}")
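
Keep in mind that this pattern assumes no field contains a comma. Real CSV data can quote fields that do, and the standard csv module handles that case. A sketch with a made-up line whose first field is quoted:

```python
import csv

csv_line = '"Doe, John",john@example.com,25,New York,NY'

# csv.reader accepts any iterable of lines, so a one-item list works
row = next(csv.reader([csv_line]))
print(row)   # ['Doe, John', 'john@example.com', '25', 'New York', 'NY']
```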

Cleaning Scraped Data and Removing Special Characters

A. Removing Extra Whitespace

messy_text = "Too   many     spaces    here\n\n\nAnd empty lines."

clean = re.sub(r"\s+", " ", messy_text)   # All whitespace → single space
clean = clean.strip()                    # Remove leading/trailing
print(clean)
# "Too many spaces here And empty lines."
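
If the line breaks carry meaning, you may prefer to collapse spaces and blank lines separately rather than turning everything into one line. One possible variant, assuming only spaces and tabs should be squeezed:

```python
import re

messy_text = "Too   many     spaces    here\n\n\nAnd empty lines."

clean = re.sub(r"[ \t]+", " ", messy_text)   # runs of spaces/tabs → one space
clean = re.sub(r"\n{2,}", "\n", clean)       # runs of newlines → one newline
print(repr(clean))
# 'Too many spaces here\nAnd empty lines.'
```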

B. Reformatting Dates (MM-DD-YYYY → YYYY/MM/DD)

text = "Event on 12-29-2025 and another on 01-15-2026."

def reformat_date(match):
    month, day, year = match.groups()
    return f"{year}/{month}/{day}"

new_text = re.sub(r"(\d{2})-(\d{2})-(\d{4})", reformat_date, text)
print(new_text)
# "Event on 2025/12/29 and another on 2026/01/15."

4. Practice Exercises (Your Turn!)

Now test your skills! Try writing the regex yourself, then run the code to check.

Exercise 1: Find All Hashtags in a Tweet

tweet = "Loving #Python regex today! #Regex #programming #100DaysOfCode is fun."

# Write pattern to find all hashtags (including the #)
pattern = r"#\w+"   # Your pattern here

hashtags = re.findall(pattern, tweet)
print(hashtags)
# Expected: ['#Python', '#Regex', '#programming', '#100DaysOfCode']

Exercise 2: Extract All URLs from HTML/Text

text = '<a href="https://example.com">Visit</a> or http://old.site.org/path?query=1#fragment'

# Basic HTTP/HTTPS URLs (no quotes issue)
pattern = r"https?://[^\s<>\"]+"

urls = re.findall(pattern, text)
print(urls)
# Expected: ['https://example.com', 'http://old.site.org/path?query=1#fragment']
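
In running prose, a URL is often followed by a comma or period, which this pattern would include in the match. A simple workaround, sketched here on a made-up sentence, is to strip trailing punctuation afterwards:

```python
import re

text = "See https://example.com/docs, or http://example.org."

raw = re.findall(r"https?://[^\s<>\"]+", text)
urls = [u.rstrip(".,;:!?") for u in raw]   # drop punctuation at the end

print(raw)    # ['https://example.com/docs,', 'http://example.org.']
print(urls)   # ['https://example.com/docs', 'http://example.org']
```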

Exercise 3: Validate Strong Password (8-20 chars, upper, lower, digit, special char)

Try this pattern:

strong_pw_pattern = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&]).{8,20}$"

# Test with: "Passw0rd!", "weak", "SuperStrong123!"
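
A small loop is enough to run the suggested tests; the results dictionary is only there to collect the outcomes:

```python
import re

strong_pw_pattern = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&]).{8,20}$"

results = {}
for pw in ["Passw0rd!", "weak", "SuperStrong123!"]:
    results[pw] = bool(re.match(strong_pw_pattern, pw))
    print(pw, "→", results[pw])
# Passw0rd! → True
# weak → False
# SuperStrong123! → True
```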

Exercise 4: Extract Quoted Text

text = 'She said "Hello world" and then "Goodbye" quietly.'

# Extract text inside double quotes
pattern = r'"(.*?)"'   # Non-greedy!

quotes = re.findall(pattern, text)
print(quotes)
# Expected: ['Hello world', 'Goodbye']

Bonus Challenge: Redact All Email Addresses

text = "Contact john@example.com or support@site.org for help."

redacted = re.sub(r"\b\w+@\w+\.\w+\b", "[EMAIL REDACTED]", text)
print(redacted)

Python Regex Best Practices

You’ve built some serious skills, and now we’re wrapping up with Best Practices and Debugging — the secrets that turn “it works… sometimes” into reliable, maintainable regex magic.

Let’s go through this like always: clear, practical, and teacher-to-student. 😊

1. Tips for Writing Readable and Efficient Regex

Regex can quickly become a tangled mess (the infamous “write-only” code). Here’s how to keep yours clean and fast.

Why You Should Use Raw Strings (`r""`) for Regex Patterns

Always use r"" for patterns, so that Python passes backslashes such as \d and \b to the regex engine unchanged.
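
To see why this matters, compare how Python treats "\b" with and without the r prefix. In a normal string it is the backspace character; in a raw string it stays a backslash followed by b, which the regex engine reads as a word boundary:

```python
import re

print(len("\b"), len(r"\b"))   # 1 2

without_raw = re.findall("\bcat\b", "the cat sat")
with_raw = re.findall(r"\bcat\b", "the cat sat")

print(without_raw)   # []       (looks for backspace characters around "cat")
print(with_raw)      # ['cat']  (word boundaries, as intended)
```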

For complex patterns, use the re.VERBOSE (or re.X) flag to add whitespace and comments:

import re

email_pattern = re.compile(r"""
    ^[a-zA-Z0-9._%+-]+   # Local part
    @                    # At symbol
    [a-zA-Z0-9.-]+       # Domain name
    \.                   # Dot
    [a-zA-Z]{2,}$        # Top-level domain
""", re.VERBOSE)

text = "test@example.com"
print(bool(email_pattern.match(text)))  # True

This makes long patterns readable!

B. Use Named Groups Instead of Numbered Ones

We touched on this — always prefer named groups:

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
# Much better than remembering group(1) is year!

C. Be Specific, Not Overly Greedy

Avoid overly broad patterns like .* when you can be more precise.

Bad (greedy and slow):

r"<.*>"  # Matches from first < to last > on the whole page!

Good:

r"<[^>]*>"  # Matches only within one tag

D. Start Simple, Then Build Up

Test small pieces first:

Does \d{4} match years? Yes.
Add dashes: \d{4}-\d{2}-\d{2}
Add anchors: ^\d{4}-\d{2}-\d{2}$
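
The build-up above can be checked stage by stage against a few sample strings. A sketch with made-up samples; each row shows which samples still match as the pattern gets stricter:

```python
import re

stages = [r"\d{4}", r"\d{4}-\d{2}-\d{2}", r"^\d{4}-\d{2}-\d{2}$"]
samples = ["2025-12-29", "on 2025-12-29", "2025"]

table = {}
for p in stages:
    table[p] = [bool(re.search(p, s)) for s in samples]
    print(p, table[p])
# \d{4} [True, True, True]
# \d{4}-\d{2}-\d{2} [True, True, False]
# ^\d{4}-\d{2}-\d{2}$ [True, False, False]
```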

E. Reuse with re.compile()

If using the same pattern multiple times:

url_pattern = re.compile(r"https?://\S+")
# Then reuse: url_pattern.findall(text)

Faster and cleaner.
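
A compiled pattern is typically defined once and then reused inside a loop. The sketch below uses made-up lines; note that the re module also caches recently used patterns internally, so the speed gain is usually modest and the main benefit is readability:

```python
import re

url_pattern = re.compile(r"https?://\S+")   # compiled once

lines = [
    "Docs: https://example.com/docs",
    "no links here",
    "Mirror: http://example.org and https://example.net",
]

found = [url_pattern.findall(line) for line in lines]   # reused for every line
print(found)
# [['https://example.com/docs'], [], ['http://example.org', 'https://example.net']]
```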

2. Using Online Regex Testers/Debuggers

Regex101.com is your best friend! (Highly recommend bookmarking it)

Why it’s amazing:

Real-time testing as you type
Explains each part of your pattern
Supports Python flavor (choose “Python” in the flavor dropdown)
Shows match information, groups, and errors
Has a debugger and quick reference

Other good ones:

https://regexr.com (great for visualization)
https://pythex.org (Python-specific, simple)

Pro tip: Always test your regex on Regex101 with the Python flavor before using it in code.

3. Using Flags

Flags modify how regex behaves. Pass them as the third argument or with re.compile().

Common and useful flags:

Flag	Meaning	Example Use Case
re.IGNORECASE (re.I)	Case-insensitive matching	Matching “python” or “Python”
re.MULTILINE (re.M)	^ and $ match start/end of each line	Processing multi-line logs
re.DOTALL (re.S)	. matches newline too	Matching across multiple lines
re.VERBOSE (re.X)	Allow whitespace and comments in pattern	Long, readable patterns
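
A short sketch of the first and third flags in the table, using made-up input:

```python
import re

# re.IGNORECASE: one pattern matches any capitalization
names = re.findall(r"python", "Python and PYTHON", re.IGNORECASE)
print(names)   # ['Python', 'PYTHON']

# re.DOTALL: without it, '.' stops at newlines
html = "<div>\nline one\nline two\n</div>"
no_flag = re.search(r"<div>(.*)</div>", html)
with_flag = re.search(r"<div>(.*)</div>", html, re.DOTALL)

print(no_flag)                    # None
print(repr(with_flag.group(1)))   # '\nline one\nline two\n'
```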

Example: Combining flags

text = """
Error in line 1
WARNING: low memory
Error in line 3
"""

# Find lines starting with "Error"
pattern = re.compile(r"^Error.*", re.MULTILINE | re.IGNORECASE)

matches = pattern.findall(text)
print(matches)
# ['Error in line 1', 'Error in line 3']

Combining re.MULTILINE with re.VERBOSE:

pattern = re.compile(r"""
    ^               # Start of line
    \w+             # Word
    \s+             # Spaces
    \d+             # Number
""", re.MULTILINE | re.VERBOSE)

4. When NOT to Use Regex

Regex is powerful, but not always the best tool. Sometimes simple string methods are faster, clearer, and less error-prone.

Use string methods when:

Checking if a substring exists:

# Better than regex
if "error" in log_line.lower():
    print("Found an error entry")

Splitting on fixed delimiters:

# Use str.split() instead of re.split(r",")
parts = line.split(",")

Simple replacements:

# Often better (collapses double spaces; " ".join(text.split()) collapses any run of whitespace)
cleaned = text.replace("  ", " ").strip()

Parsing structured data: use json.loads(), the csv module, datetime.strptime(), and similar purpose-built parsers.
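A short sketch of why a real parser helps with structured data: the CSV row and JSON string below are made up for illustration, and the row contains a comma inside a quoted field, which a plain split (or a naive comma regex) breaks apart.

```python
import csv
import json

# Hypothetical CSV row with a comma inside a quoted field.
row = 'widget,"Large, blue",4.50'

# Splitting on commas cuts the quoted field in two.
naive = row.split(",")

# csv.reader understands quoting and keeps the field intact.
parsed = next(csv.reader([row]))

print(naive)
print(parsed)

# Hypothetical JSON document; json.loads() returns plain Python objects.
record = json.loads('{"name": "widget", "tags": ["large", "blue"]}')
print(record["tags"])
```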

Rule of thumb:

Use regex when you need flexible pattern matching. Use built-in string methods or parsers when the format is simple and predictable.

Example: Validating a date?

Simple cases → datetime.strptime() with try/except
Complex/varying formats → regex
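For the simple case, a minimal sketch might look like this. The helper name is_valid_date is hypothetical, and the format string assumes dates written as year-month-day. The regex is included only to show that a pattern checks the shape of the text, not whether the date exists.

```python
import re
from datetime import datetime


def is_valid_date(text):
    """Return True if text parses as a calendar date with %Y-%m-%d."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# A regex only checks the shape: four digits, two digits, two digits.
shape_only = re.compile(r"^\d{4}-\d{2}-\d{2}$")

for candidate in ["2025-02-28", "2025-02-30", "2025-13-01"]:
    print(candidate, bool(shape_only.match(candidate)), is_valid_date(candidate))
```

All three strings have the right shape, but only the first is a real date. That is why strptime() is the better choice when the format is fixed.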

Summary: Best Practices Checklist

Do This	Avoid This
Use raw strings r""	Forgetting the r prefix and having to double every backslash
Use named groups	Relying only on numbered groups
Use re.VERBOSE for complex patterns	Writing long one-line regex
Test on Regex101.com	Debugging only in your code
Use re.compile() for reuse	Re-writing the pattern every time
Be specific (avoid greedy .*)	Overly broad patterns
Consider simple alternatives first	Using regex for everything
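Two rows of the checklist, named groups and re.VERBOSE, combine naturally. The following is a small sketch rather than a canonical pattern. The group names and the sample sentence are chosen only for illustration.

```python
import re

# Named groups plus re.VERBOSE: each part of the pattern documents itself.
date_pattern = re.compile(r"""
    (?P<year>\d{4})  -    # four-digit year
    (?P<month>\d{2}) -    # two-digit month
    (?P<day>\d{2})        # two-digit day
""", re.VERBOSE)

# Hypothetical sample sentence, made up for this sketch.
match = date_pattern.search("Backup finished on 2025-03-09 at noon")

print(match.group("year"))
print(match.groupdict())
```

In verbose mode, whitespace in the pattern is ignored, so a literal space has to be written as "\ " or "[ ]".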

Final Thoughts

You’ve now completed a full regex journey — from “what is this wizardry?” to building real-world, maintainable patterns!

You can now:

Validate inputs
Extract data from messy text
Clean and transform strings
Write readable and efficient patterns

Next steps:

Practice on real data (logs, CSVs, user inputs)
Try challenges on sites like Regex Crossword or Exercism
Use regex in your projects confidently

Python Regex Master Cheat Sheet

[Image: A color-coded visual cheat sheet for Python regular expressions, laid out as a table with columns for Symbol, Meaning/Description, and Example/Matches. It covers anchors, character classes, shorthands, quantifiers, lookarounds, grouping, and Python-specific flags.]

The Ultimate Python Regex Reference: a complete guide to syntax, special sequences, and advanced lookaround patterns for Python’s re module.

Download the High-Res Version

While the image above is great for a quick glance, you can download the PNG file for closer zooming and clean printing.

Click here to download the Master Cheat Sheet

External Resources

Official Python Documentation

1. Python `re` Module Documentation

The authoritative reference for all regex functions, syntax, flags, and detailed behavior in Python.
https://docs.python.org/3/library/re.html

2. Python Regular Expression HOWTO

A beginner-friendly tutorial from Python’s own documentation explaining regex basics, usage, and examples.
https://docs.python.org/3/howto/regex.html

3. Google Python Education: Regular Expressions

An official Google educational resource that breaks down how regex works in Python with simple examples.
https://developers.google.com/edu/python/regular-expressions

Interactive Learning & Practice

4. RegexOne – Interactive Regex Lessons

Step-by-step interactive exercises for learning regex from scratch. Works for all regex flavors and helps build pattern intuition.
https://regexone.com/

5. RegexLearn – Step by Step Regex Tutorials

A dedicated online tutorial platform where users learn regex basics, syntax, and examples with practice challenges.
https://regexlearn.com/

6. Regex101 – Python Flavor Regex Tester

Not a tutorial site, but a real-time regex tester that supports Python regex syntax. It shows matched groups, explains patterns, and visualizes matches — extremely useful for building and debugging regex.
https://regex101.com/

Python Regex FAQs (Frequently Asked Questions)

What is Python regex and where is it used?
Python regex refers to regular expressions implemented through Python’s built-in re module. It is used to search, match, extract, replace, and validate text based on defined patterns. Common use cases include email validation, log file analysis, data cleaning, text preprocessing for machine learning, and parsing structured or semi-structured text.
What does re.compile() do in Python regex?
re.compile() converts a regex pattern into a compiled regular expression object. This object can be reused multiple times without recompiling the pattern. It improves performance when the same pattern is applied repeatedly and makes code more readable and maintainable in large projects.
What is the difference between re.search() and re.match() in Python?
re.search() scans the entire string and returns the first match it finds anywhere in the text.
re.match() attempts to match the pattern only at the beginning of the string.
This difference is important because many beginners mistakenly use re.match() when they actually need re.search().
What are greedy and lazy quantifiers in Python regex?
Greedy quantifiers match as much text as possible while still allowing the pattern to succeed. By default, all quantifiers in Python regex are greedy. Lazy quantifiers match as little text as possible and are created by adding a ? after the quantifier. Lazy matching is especially useful for delimited text such as HTML tags or quoted strings, although truly nested structures are better handled by a real parser.
What is the difference between re.findall() and re.finditer()?
re.findall() returns all matches as a list of strings or tuples, depending on capturing groups.
re.finditer() returns an iterator of Match objects, providing access to match positions and detailed metadata. re.finditer() is preferred when working with large texts or when index information is required.
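The last three answers can be seen side by side in a short sketch. All sample strings are made up for illustration.

```python
import re

# search() vs match(): match() only succeeds at the start of the string.
text = "Order ID: 12345"
match_result = re.match(r"\d+", text)      # digits are not at the start
search_result = re.search(r"\d+", text)    # finds the digits anywhere

# Greedy vs lazy: ".+" grabs as much as it can, ".+?" as little as it can.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

# findall() vs finditer(): plain tuples versus Match objects with positions.
pair_pattern = re.compile(r"(\w+)=(\d+)")
pairs = pair_pattern.findall("a=1, b=22")
spans = [(m.group(1), m.span()) for m in pair_pattern.finditer("a=1, b=22")]

print(match_result, search_result.group())
print(greedy)
print(lazy)
print(pairs)
print(spans)
```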

Essential Foundations (Highly Recommended)

These articles cover the mechanics of how Python handles text, which is critical for understanding why we use r"" (raw strings) in Regex.

Escape Sequences and Raw Strings in Python: Regex patterns are full of backslashes. This article explains how to handle them without errors.
The Complete Guide to Python String Methods: Sometimes Regex is overkill. This guide helps you decide when to use simple methods like .find() or .replace() instead.

Advanced Data Manipulation

If your Regex work focuses on data extraction (scraping or cleaning), these are the perfect “next steps.”

A Guide to Web Scraping in Python using Beautiful Soup: Regex is often used inside web scrapers to find specific patterns in HTML.
Mastering Input and Output Operations in Python: For users who need to read a file, run a Regex search, and save the results.

Practice & Projects

For readers who want to see Regex logic applied to real-world code.

How to Build a Python Port Scanner from Scratch: Shows practical string parsing and network logic.
Python Terminology Cheat Sheet for Interviews: Regex is a common “live coding” topic; this helps you prepare for the surrounding theory.

Emmimal Alexander | Emitechlogic.com
