Regular Expressions

Learn to use Python's re module to search, match, and transform text using powerful pattern expressions.

Imagine you have thousands of email addresses in a text file and you need to find all of them. Or you want to check if a phone number looks valid before storing it. You could write dozens of if-statements to handle every case — or you could use a regular expression (regex) to describe the pattern in one concise line.

Regular expressions are a mini-language for describing text patterns. They feel strange at first, but once they click, they become one of the most powerful tools in your toolkit.

#Importing the re Module

Python's built-in re module gives you everything you need. No installation required — just import it.

re.search returns a Match object if the pattern is found, or None if not.
import re

text = "My email is hello@example.com"
result = re.search(r"hello", text)
print(result)
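
Printing the Match object shows a summary, but you usually want specific pieces of it. The sketch below reuses the same text and pattern and shows a few standard Match methods; the variable name `missing` is only for illustration.

```python
import re

text = "My email is hello@example.com"
result = re.search(r"hello", text)

if result is not None:
    print(result.group())  # the matched text: hello
    print(result.start())  # index where the match begins: 12
    print(result.end())    # index just past the match: 17
    print(result.span())   # both at once: (12, 17)

# When the pattern is not present, search returns None instead of a Match
missing = re.search(r"goodbye", text)
print(missing)  # None
```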

#Always Use Raw Strings

Tip

Use r"..." for Regex Patterns

Always write your regex patterns as raw strings: r"\d+" instead of "\\d+".

In a normal Python string, \n means newline and \t means tab. In a raw string, backslashes are treated literally — which is exactly what regex needs. Making this a habit saves you from mysterious bugs.

Always prefix regex patterns with r to avoid backslash headaches.
import re

# Without raw string — backslash causes confusion
pattern_bad  = "\d+"   # invalid string escape: Python keeps the backslash but emits a warning

# With raw string — backslash is passed through to regex
pattern_good = r"\d+"  # regex sees \d, which means "a digit"

print(re.findall(pattern_good, "I have 3 cats and 12 dogs"))
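
To see where a missing r really bites, here is a small sketch using `\b`, the regex word-boundary marker (not covered elsewhere in this lesson). In a normal string, `\b` is a backspace character, so the pattern quietly stops matching. The names `plain` and `raw` are just for illustration.

```python
import re

# "\n" is one character (a newline); r"\n" is two (a backslash and an n)
print(len("\n"), len(r"\n"))  # 1 2

# Doubling the backslash in a normal string gives the same pattern as a raw string
print("\\d+" == r"\d+")  # True

sentence = "cat concat cat"

# Normal string: \b becomes a backspace character, so nothing matches
plain = re.findall("\bcat\b", sentence)
print(plain)  # []

# Raw string: regex receives \b and treats it as a word boundary
raw = re.findall(r"\bcat\b", sentence)
print(raw)  # ['cat', 'cat']
```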

#The Four Core Functions

The re module has many functions, but four cover almost every situation you will encounter:

  • re.search(pattern, text) — finds the first match anywhere in the text
  • re.match(pattern, text) — checks if the text starts with the pattern
  • re.findall(pattern, text) — returns a list of all matches
  • re.sub(pattern, replacement, text) — replaces all matches with something new
The three most-used regex functions in one example.
import re

text = "Prices: $10, $250, $3"

# findall — grab every number
all_numbers = re.findall(r"\d+", text)
print(all_numbers)

# search — find the first number
first = re.search(r"\d+", text)
print(first.group())

# sub — replace all prices with ???
censored = re.sub(r"\$\d+", "???", text)
print(censored)
Note

re.search vs re.match

re.match only checks at the very beginning of the string. If the pattern could appear anywhere in the text, use re.search instead. Most of the time, re.search is what you want.
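
Here is a short side-by-side sketch of the difference. It also shows `re.fullmatch`, a standard function not listed above, which requires the entire string to fit the pattern and is handy for validation. The sample text and variable names are made up for illustration.

```python
import re

text = "Order 66 shipped"

# match only looks at the very start of the string
at_start = re.match(r"\d+", text)
print(at_start)  # None, because the text starts with "Order", not a digit

# search scans the whole string for the first match
anywhere = re.search(r"\d+", text)
print(anywhere.group())  # 66

# match succeeds when the pattern sits at position 0
prefix = re.match(r"Order", text)
print(prefix.group())  # Order

# fullmatch succeeds only if the whole string fits the pattern
print(re.fullmatch(r"\d+", "66") is not None)        # True
print(re.fullmatch(r"\d+", "66 units") is not None)  # False
```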

#Common Metacharacters and Character Classes

Regular expressions get their power from metacharacters — special symbols that represent patterns rather than literal characters. Here are the most important ones:

  • . — any single character (except newline)
  • \d — any digit (0–9)
  • \w — any word character (letters, digits, underscore)
  • \s — any whitespace (space, tab, newline)
  • \D, \W, \S — the opposite of the lowercase versions
  • [abc] — any one of: a, b, or c
  • [a-z] — any lowercase letter
  • ^ — start of string; $ — end of string
  • + — one or more of the previous; * — zero or more; ? — zero or one
  • {3} — exactly 3 of the previous; {2,5} — between 2 and 5
Mixing metacharacters lets you describe complex patterns compactly.
import re

examples = [
    (r"\d{3}-\d{4}",  "Call us at 555-1234 anytime"),
    (r"[aeiou]+",     "beautiful"),
    (r"\w+@\w+\.\w+", "contact: info@pyquest.dev"),
    (r"^Hello",       "Hello, world!"),
]

for pattern, text in examples:
    match = re.search(pattern, text)
    if match:
        print(f"Pattern {pattern!r} found: {match.group()!r}")
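
A few more of the symbols from the list above, tried one at a time. This is a small sketch with invented sample strings; the variable names are only for illustration.

```python
import re

# {2,5}: between 2 and 5 digits. A lone "7" is too short,
# and "123456" is matched greedily as its first 5 digits.
runs = re.findall(r"\d{2,5}", "7 42 123456")
print(runs)  # ['42', '12345']

# \D: anything that is NOT a digit. Removing those leaves only the digits.
digits_only = re.sub(r"\D", "", "(555) 123-4567")
print(digits_only)  # 5551234567

# ?: zero or one of the previous character, so the "u" is optional
spellings = re.findall(r"colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']

# \S+: runs of non-whitespace characters
words = re.findall(r"\S+", "a  b\tc\n")
print(words)  # ['a', 'b', 'c']
```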

#Capturing Groups with ()

Wrapping part of a pattern in parentheses () creates a group. Groups let you extract specific pieces of a match, not just the whole thing. Think of groups like numbered buckets inside your pattern: the first pair of parentheses fills bucket 1, the second fills bucket 2, and so on.

group(0) is the whole match; group(1), group(2), ... are the captured pieces.
import re

email = "Send receipts to billing@company.org"

# Groups capture the username and domain separately
match = re.search(r"(\w+)@([\w.]+)", email)

if match:
    print("Full match:", match.group(0))   # entire match
    print("Username: ", match.group(1))   # first group
    print("Domain:   ", match.group(2))   # second group
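
Groups also change what the other core functions give you. The sketch below, using a made-up `log` string, shows three standard behaviours: `match.groups()` returns all captured pieces as a tuple, `re.findall` returns one tuple per match when the pattern has groups, and `re.sub` can refer back to groups with `\1` and `\2` in the replacement.

```python
import re

log = "alice@example.com, bob@work.org"
pattern = r"(\w+)@([\w.]+)"

# groups() returns every captured piece of one match as a tuple
match = re.search(pattern, log)
if match:
    username, domain = match.groups()
    print(username, domain)  # alice example.com

# With groups in the pattern, findall returns a list of tuples
pairs = re.findall(pattern, log)
print(pairs)  # [('alice', 'example.com'), ('bob', 'work.org')]

# In a replacement string, \1 and \2 stand for the captured groups
rewritten = re.sub(pattern, r"\1 at \2", log)
print(rewritten)  # alice at example.com, bob at work.org
```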

#Practical Example: Validating and Extracting Emails

Let's put it all together with a real task: find every valid-looking email address in a block of text.

findall collects every email-shaped string. 'not-an-email' is correctly skipped.
import re

text = """
Team contacts:
  alice@example.com
  bob.smith@work.co.uk
  not-an-email
  charlie@dev.io
"""

# A reasonable (not perfect) email pattern. The domain part allows dots,
# so multi-part domains like work.co.uk are matched in full.
email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

emails = re.findall(email_pattern, text)
print("Found emails:")
for email in emails:
    print(" ", email)
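
Extracting and validating are slightly different jobs. To check whether one whole string is an email-shaped value, a sketch like the following may help. It assumes a pattern whose domain part allows dots, and it uses `re.compile` and `fullmatch`, which are standard but not covered above. The names `EMAIL_RE` and `looks_like_email` are hypothetical.

```python
import re

# Compile once so the pattern can be reused (EMAIL_RE is an illustrative name)
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(candidate):
    """Return True if the ENTIRE string is email-shaped (a rough check, not a guarantee)."""
    return EMAIL_RE.fullmatch(candidate) is not None

for sample in ["alice@example.com", "bob.smith@work.co.uk",
               "not-an-email", "alice@example.com extra"]:
    print(sample, "->", looks_like_email(sample))
```

Because fullmatch must consume the whole string, trailing junk such as " extra" makes the check fail, which findall would not notice.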
Common mistake

Always Check for None Before Calling .group()

When re.search finds nothing, it returns None. Calling .group() on None raises an AttributeError and crashes your program.

Always guard with an if check:

match = re.search(r"\d+", text)
if match:
    print(match.group())

This is one of the most common regex mistakes beginners make.
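
One way to make the guard hard to forget is to wrap it in a small helper. This is a minimal sketch; `first_number` is a hypothetical name, and returning None for "nothing found" is just one possible design choice.

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    if match is None:
        return None  # nothing found, so there is no .group() to call
    return int(match.group())

print(first_number("Room 101 is upstairs"))  # 101
print(first_number("No digits here!"))       # None
```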

Think of it like

Regex is Like a Metal Detector

A regex pattern is like a metal detector set to a specific frequency. You sweep it over text (the beach), and it beeps only when it finds something matching your signal. The tighter you tune the frequency (your pattern), the more precisely it picks out what you want — and ignores everything else.

#When Regex is Overkill

Regex is powerful, but it is not always the right tool. For simple tasks, plain string methods are cleaner, faster, and easier to read:

  • Checking if a string starts with something? Use str.startswith().
  • Finding a word? Use in or str.find().
  • Splitting on a comma? Use str.split(",").
  • Replacing a fixed word? Use str.replace().

Reach for regex when patterns are variable or complex — like phone numbers in many formats, or extracting data from messy text.

Use plain string methods for simple lookups. Save regex for genuinely complex patterns.
# Overkill with regex:
import re
if re.search(r"hello", "hello world"):
    print("found")

# Better — simpler and clearer:
if "hello" in "hello world":
    print("found")
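
The same idea, spelled out for each of the simple cases listed above, followed by one case where regex does earn its keep. The sample strings and variable names are invented for illustration.

```python
import re

line = "hello world,foo,bar"

# Simple, fixed text: plain string methods are enough
print(line.startswith("hello"))    # True
print("world" in line)             # True
print(line.find("world"))          # 6
print(line.split(","))             # ['hello world', 'foo', 'bar']
print(line.replace("foo", "baz"))  # hello world,baz,bar

# Variable formats: this is where a pattern is worth it.
# The separator may be a hyphen, a dot, or a space.
contacts = "Call 555-123-4567, 555.123.4567 or 555 123 4567"
phones = re.findall(r"\d{3}[-. ]\d{3}[-. ]\d{4}", contacts)
print(phones)  # ['555-123-4567', '555.123.4567', '555 123 4567']
```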
Watch out

Regex Can't Parse HTML or JSON

A classic mistake is using regex to parse structured formats like HTML or JSON. These formats have nesting and special rules that regex cannot handle reliably. Use html.parser or BeautifulSoup for HTML, and json.loads() for JSON. Regex is for flat text patterns, not document structures.
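
As a rough sketch of what the dedicated tools look like, here are `json.loads` and the standard library's `html.parser` handling nested input. The sample data and the `LinkCollector` class are made up for illustration.

```python
import json
from html.parser import HTMLParser

# JSON: json.loads understands nesting, so you just index into the result
data = json.loads('{"user": {"name": "Ada", "tags": ["a", "b"]}}')
print(data["user"]["tags"])  # ['a', 'b']

# HTML: subclass HTMLParser and react to the tags you care about
class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, in any order
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<p>See <a href="https://example.com">this</a> '
            'and <a class="x" href="/docs">that</a>.</p>')
print(parser.links)  # ['https://example.com', '/docs']
```

The parser copes with attribute order and nesting, both of which a hand-written regex tends to get wrong.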

Quick check

What does re.findall(r"\d+", "abc 42 def 7") return?

Key takeaways

  • Always write regex patterns as raw strings (r"...") to avoid backslash confusion.
  • Use re.search to find a pattern anywhere, re.findall to collect all matches, and re.sub to replace matches.
  • Parentheses () create capturing groups so you can extract specific parts of a match.
  • Always check that re.search returned a match (not None) before calling .group() on it.
  • Regex is powerful but not always the right tool — prefer simple string methods for simple tasks.
Practice challenges
Predict the output #1

What does this code print?

import re

text = "I have 3 cats and 12 dogs"
result = re.findall(r"\d+", text)
print(result)
Fix the bug #2

This code crashes at runtime. What is the bug?

import re

text = "No digits here!"
match = re.search(r"\d+", text)
print(match.group())
Fill in the blank #3

Complete the code so it replaces every price (like $10 or $250) with the word FREE and prints the result.

import re

text = "Apples $2, Oranges $5"
result = re.____(r"\$\d+", "FREE", text)
print(result)
Reorder the lines #4

Put these lines in the right order to extract the username and domain from an email address and print them separately.

1
    print(match.group(1), match.group(2))
2
match = re.search(r"(\w+)@([\w.]+)", email)
3
if match:
4
email = "alice@example.com"
5
import re
Your turn
Practice exercise

Write a function called extract_phone_numbers(text) that takes a string and returns a list of all phone numbers in the format NNN-NNN-NNNN (three digits, hyphen, three digits, hyphen, four digits). Test it on a sample paragraph.
