Regular Expressions
Learn to use Python's re module to search, match, and transform text using powerful pattern expressions.
Imagine you have thousands of email addresses in a text file and you need to find all of them. Or you want to check if a phone number looks valid before storing it. You could write dozens of if-statements to handle every case — or you could use a regular expression (regex) to describe the pattern in one concise line.
Regular expressions are a mini-language for describing text patterns. They feel strange at first, but once they click, they become one of the most powerful tools in your toolkit.
A regular expression (regex) is a tiny pattern language that describes what text to look for. Instead of dozens of if-statements, one pattern can find emails, phone numbers, or prices in a single line.
#Importing the re Module
Python's built-in re module gives you everything you need. No installation required — just import it.
import re
text = "My email is hello@example.com"
result = re.search(r"hello", text)
print(result)
#Always Use Raw Strings
Use r"..." for Regex Patterns
Always write your regex patterns as raw strings: r"\d+" instead of "\\d+".
In a normal Python string, \n means newline and \t means tab. In a raw string, backslashes are treated literally — which is exactly what regex needs. Making this a habit saves you from mysterious bugs.
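To make the difference concrete, here is a small sketch using \b, which Python reads as a backspace character in a normal string but which regex reads as a word boundary. The sample text and variable names are illustrative assumptions, not part of the lesson.

```python
import re

text = "cat concatenate cat"

# "\b" in a normal string is one backspace character; r"\b" is two characters
print(len("\b"), len(r"\b"))

# Non-raw: the regex engine receives literal backspaces, which the text does not contain
non_raw_matches = re.findall("\bcat\b", text)
print(non_raw_matches)  # []

# Raw: the regex engine receives \b, its word-boundary assertion
raw_matches = re.findall(r"\bcat\b", text)
print(raw_matches)  # ['cat', 'cat']
```

The non-raw version fails silently: no error, just an empty result, which is why the raw-string habit is worth forming.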
import re
# Without a raw string: "\d" is not a valid Python escape, so the backslash
# survives, but Python warns about it (a SyntaxWarning since Python 3.12)
pattern_bad = "\d+"
# With a raw string: the backslash is passed through to regex, with no warning
pattern_good = r"\d+" # regex sees \d, which means "a digit"
print(re.findall(pattern_good, "I have 3 cats and 12 dogs")) # ['3', '12']
#The Four Core Functions
The re module has many functions, but four cover almost every situation you will encounter:
- re.search(pattern, text) finds the first match anywhere in the text
- re.match(pattern, text) checks if the text starts with the pattern
- re.findall(pattern, text) returns a list of all matches
- re.sub(pattern, replacement, text) replaces all matches with something new
import re
text = "Prices: $10, $250, $3"
# findall — grab every number
all_numbers = re.findall(r"\d+", text)
print(all_numbers)
# search — find the first number
first = re.search(r"\d+", text)
print(first.group())
# sub — replace all prices with ???
censored = re.sub(r"\$\d+", "???", text)
print(censored)
re.search vs re.match
re.match only checks at the very beginning of the string. If the pattern could appear anywhere in the text, use re.search instead. Most of the time, re.search is what you want.
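A short sketch of that difference, using a made-up sample string (an assumption for illustration):

```python
import re

text = "Order 66 shipped"

# re.match only looks at position 0, and the text starts with "O", not a digit
at_start = re.match(r"\d+", text)
print(at_start)  # None

# re.search scans forward until it finds a match
anywhere = re.search(r"\d+", text)
print(anywhere.group())  # 66

# re.match succeeds when the pattern fits at the very beginning
prefix = re.match(r"Order", text)
print(prefix.group())  # Order
```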
#Common Metacharacters and Character Classes
Regular expressions get their power from metacharacters — special symbols that represent patterns rather than literal characters. Here are the most important ones:
- . matches any single character (except newline)
- \d matches any digit (0–9)
- \w matches any word character (letters, digits, underscore)
- \s matches any whitespace (space, tab, newline)
- \D, \W, \S match the opposite of the lowercase versions
- [abc] matches any one of: a, b, or c
- [a-z] matches any lowercase letter
- ^ matches the start of the string; $ matches the end of the string
- + means one or more of the previous item; * means zero or more; ? means zero or one
- {3} means exactly 3 of the previous item; {2,5} means between 2 and 5
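A few of these are easier to grasp when run. The sketch below tries the ? and {2,5} quantifiers, \s, \D, and the $ anchor on invented sample strings; the variable names are assumptions for illustration.

```python
import re

# ? makes the previous item optional, so both spellings match
spellings = re.findall(r"colou?r", "color and colour")
print(spellings)  # ['color', 'colour']

# {2,5} matches runs of 2 to 5 digits; a lone digit is skipped
digit_runs = re.findall(r"\d{2,5}", "7 42 123456")
print(digit_runs)  # ['42', '12345']

# \s+ matches a run of whitespace; re.sub collapses each run to one space
collapsed = re.sub(r"\s+", " ", "too   many\tspaces")
print(collapsed)  # too many spaces

# \D is the opposite of \d: remove everything that is not a digit
digits_only = re.sub(r"\D", "", "(555) 123-4567")
print(digits_only)  # 5551234567

# $ anchors the pattern to the end of the string
ends_py = [bool(re.search(r"\.py$", name)) for name in ("script.py", "script.py.bak")]
print(ends_py)  # [True, False]
```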
import re
examples = [
(r"\d{3}-\d{4}", "Call us at 555-1234 anytime"),
(r"[aeiou]+", "beautiful"),
(r"\w+@\w+\.\w+", "contact: info@pyquest.dev"),
(r"^Hello", "Hello, world!"),
]
for pattern, text in examples:
    match = re.search(pattern, text)
    if match:
        print(f"Pattern {pattern!r} found: {match.group()!r}")
#Capturing Groups with ()
Wrapping part of a pattern in parentheses () creates a group. Groups let you extract specific pieces of a match, not just the whole thing. Think of groups like numbered buckets inside your pattern: group 1 is the first pair of parentheses, group 2 is the second, and so on.
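Groups also change what re.findall returns, which often surprises people. The sketch below, on an invented sample string, shows that behavior alongside the match object's groups() method, which returns every captured group at once.

```python
import re

log = "alice@example.com wrote to bob@work.org"

# With two groups in the pattern, findall returns (group 1, group 2) tuples
pairs = re.findall(r"(\w+)@([\w.]+)", log)
print(pairs)  # [('alice', 'example.com'), ('bob', 'work.org')]

# groups() on a single match returns all captured groups as a tuple
first = re.search(r"(\w+)@([\w.]+)", log)
if first:
    username, domain = first.groups()
    print(username, domain)  # alice example.com
```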
import re
email = "Send receipts to billing@company.org"
# Groups capture the username and domain separately
match = re.search(r"(\w+)@([\w.]+)", email)
if match:
    print("Full match:", match.group(0)) # entire match
    print("Username: ", match.group(1)) # first group
    print("Domain: ", match.group(2)) # second group
#Practical Example: Validating and Extracting Emails
Let's put it all together with a real task: find every valid-looking email address in a block of text.
import re
text = """
Team contacts:
alice@example.com
bob.smith@work.co.uk
not-an-email
charlie@dev.io
"""
# A reasonable (not perfect) email pattern; the dot inside the domain class
# lets multi-part domains such as work.co.uk match in full
email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
emails = re.findall(email_pattern, text)
print("Found emails:")
for email in emails:
    print(" ", email)
Always Check for None Before Calling .group()
When re.search finds nothing, it returns None. Calling .group() on None raises an AttributeError and crashes your program.
Always guard with an if check:
match = re.search(r"\d+", text)
if match:
    print(match.group())
This is one of the most common regex mistakes beginners make.
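The same None check is one way to approach validation. The sketch below assumes a hypothetical helper, looks_like_email, and a pattern in the same spirit as the extraction example; it uses re.fullmatch, which returns None unless the entire string fits the pattern.

```python
import re

# A "reasonable, not perfect" email pattern, compiled once for reuse
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(candidate):
    """Hypothetical helper: True if the whole string fits the pattern."""
    # fullmatch returns None unless the entire string matches
    return EMAIL_RE.fullmatch(candidate) is not None

print(looks_like_email("alice@example.com"))     # True
print(looks_like_email("alice@example.com!!!"))  # False
print(looks_like_email("not-an-email"))          # False
```

Using fullmatch rather than search matters here: search would accept any string that merely contains something email-shaped.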
Regex is Like a Metal Detector
A regex pattern is like a metal detector set to a specific frequency. You sweep it over text (the beach), and it beeps only when it finds something matching your signal. The tighter you tune the frequency (your pattern), the more precisely it picks out what you want — and ignores everything else.
#When Regex is Overkill
Regex is powerful, but it is not always the right tool. For simple tasks, plain string methods are cleaner, faster, and easier to read:
- Checking if a string starts with something? Use str.startswith().
- Finding a word? Use in or str.find().
- Splitting on a comma? Use str.split(",").
- Replacing a fixed word? Use str.replace().
Reach for regex when patterns are variable or complex — like phone numbers in many formats, or extracting data from messy text.
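For contrast, here is a sketch of a case where regex earns its place: phone numbers written with different separators. The sample text and the choice of accepted separators are assumptions for illustration.

```python
import re

text = "Call 555-123-4567, 555.123.4567 or 555 123 4567."

# [-. ] accepts a hyphen, a dot, or a space between the digit groups
phones = re.findall(r"\d{3}[-. ]\d{3}[-. ]\d{4}", text)
print(phones)  # ['555-123-4567', '555.123.4567', '555 123 4567']
```

Covering the same three formats with plain string methods would take several separate checks.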
# Overkill with regex:
import re
if re.search(r"hello", "hello world"):
    print("found")
# Better: simpler and clearer
if "hello" in "hello world":
    print("found")
Regex Can't Parse HTML or JSON
A classic mistake is using regex to parse structured formats like HTML or JSON. These formats have nesting and special rules that regex cannot handle reliably. Use html.parser or BeautifulSoup for HTML, and json.loads() for JSON. Regex is for flat text patterns, not document structures.
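A minimal sketch of the parser-based approach, using only the standard library. The LinkCollector class and the sample inputs are hypothetical, written just for this illustration.

```python
import json
from html.parser import HTMLParser

# JSON: the json module handles nesting and escaping for you
data = json.loads('{"user": {"name": "Ada", "tags": ["a", "b"]}}')
print(data["user"]["tags"])  # ['a', 'b']

class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<p>See <a href="https://example.com">this <b>page</b></a></p>')
print(collector.links)  # ['https://example.com']
```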
What does re.findall(r"\d+", "abc 42 def 7") return?
Key takeaways
- Always write regex patterns as raw strings (r"...") to avoid backslash confusion.
- Use re.search to find a pattern anywhere, re.findall to collect all matches, and re.sub to replace matches.
- Parentheses () create capturing groups so you can extract specific parts of a match.
- Always check that re.search returned a match (not None) before calling .group() on it.
- Regex is powerful but not always the right tool — prefer simple string methods for simple tasks.
What does this code print?
import re
text = "I have 3 cats and 12 dogs"
result = re.findall(r"\d+", text)
print(result)
This code crashes at runtime. What is the bug?
import re
text = "No digits here!"
match = re.search(r"\d+", text)
print(match.group())
Complete the code so it replaces every price (like $10 or $250) with the word FREE and prints the result.
import re
text = "Apples $2, Oranges $5"
result = re.____(r"\$\d+", "FREE", text)
print(result)
Put these lines in the right order to extract the username and domain from an email address and print them separately.
print(match.group(1), match.group(2))
match = re.search(r"(\w+)@([\w.]+)", email)
if match:
email = "alice@example.com"
import re
Write a function called extract_phone_numbers(text) that takes a string and returns a list of all phone numbers in the format NNN-NNN-NNNN (three digits, hyphen, three digits, hyphen, four digits). Test it on a sample paragraph.