Author’s Note: I’m learning about regular expressions alongside you as I write this series. While I’ve done my research and tested the examples, there might be mistakes or oversights. If you spot any errors or have suggestions for improvement, please let me know! We’re all learning together. 🌱
Introduction
Ever wished you could find all phone numbers in a document with just one line of code? Or validate email addresses without writing dozens of if statements? That’s where regular expressions (regex) come in handy!
Think of regex as a super-powered search tool. Instead of looking for exact text like “cat”, you can search for patterns like “any three-letter word ending in ‘at’”. In Python, the re module gives you access to this powerful pattern-matching capability.
In this guide, we’ll explore how to use Python regex to solve real-world text processing problems. You’ll learn the basics, see practical examples, and even try your hand at writing your own patterns.
What Are Regular Expressions?
Regular expressions are special text patterns that describe how to search for text. They’re like wildcards on steroids. While a simple search finds exact matches, regex can find patterns like:
- All words starting with “Python”
- Phone numbers in any format
- Email addresses
- Dates in MM/DD/YYYY format
Here’s a simple example:
import re

text = "My phone number is 415-555-1234"
pattern = r'\d{3}-\d{3}-\d{4}'
match = re.search(pattern, text)
if match:
    print(f"Found: {match.group()}")  # Output: Found: 415-555-1234
Found: 415-555-1234
The pattern \d{3}-\d{3}-\d{4} means “three digits, dash, three digits, dash, four digits”.
Setting Up: The re Module
Before using regex in Python, you need to import the re module:
import re
Python’s re module provides several functions for pattern matching:
Function | What It Does |
---|---|
re.search() | Finds the first match anywhere in the string |
re.match() | Checks if the pattern matches at the start of the string |
re.findall() | Returns all matches as a list |
re.sub() | Replaces matches with new text |
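To make the differences in the table concrete, here is a minimal sketch that runs all four functions on one string. The sample text and patterns are made up for illustration.

```python
import re

text = "cat bat rat"

# re.search: first match anywhere in the string
first = re.search(r'[br]at', text)
first_text = first.group() if first else None

# re.match: only succeeds if the pattern matches at the very start
at_start = re.match(r'[br]at', text)  # text starts with "cat", so no match

# re.findall: every non-overlapping match, as a list of strings
all_matches = re.findall(r'\wat', text)

# re.sub: replace every match with new text
replaced = re.sub(r'[cbr]at', 'pet', text)

print(first_text)   # bat
print(at_start)     # None
print(all_matches)  # ['cat', 'bat', 'rat']
print(replaced)     # pet pet pet
```

Notice that re.match() returns None here even though the pattern appears later in the string. That is the most common surprise when switching between search() and match().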
Basic Pattern Elements
Let’s start with the building blocks of regex patterns:
Character Classes
These are shortcuts for common character types:
- \d - Any digit (0-9)
- \w - Any word character (letters, digits, underscore)
- \s - Any whitespace (space, tab, newline)
- . - Any character except newline
# Finding all digits in a string
text = "I have 2 cats and 3 dogs"
digits = re.findall(r'\d', text)
print(digits)  # Output: ['2', '3']
['2', '3']
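The other character classes work the same way. Here is a small sketch using a made-up sample string that contains a space, a tab, and a trailing newline, so each class has something to find.

```python
import re

sample = "Room 42\tis_open\n"

# \w+ grabs runs of letters, digits, and underscores
words = re.findall(r'\w+', sample)

# \s matches each whitespace character individually
spaces = re.findall(r'\s', sample)

# . matches any single character except a newline
after_four = re.findall(r'4.', sample)

# The only 'n' in the sample is followed by a newline, so 'n.' cannot match
no_newline = re.search(r'n.', sample)

print(words)       # ['Room', '42', 'is_open']
print(spaces)      # [' ', '\t', '\n']
print(after_four)  # ['42']
print(no_newline)  # None
```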
Quantifiers
These specify how many times a pattern should repeat:
- * - Zero or more times
- + - One or more times
- ? - Zero or one time
- {n} - Exactly n times
- {n,m} - Between n and m times
# Finding words with 3 or more letters
text = "I am learning Python"
long_words = re.findall(r'\w{3,}', text)
print(long_words)  # Output: ['learning', 'Python']
['learning', 'Python']
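To see how the quantifiers differ from each other, it helps to run them against the same input. This is a small sketch with made-up sample strings; it also uses \b (a word boundary), which comes up again later in this guide.

```python
import re

text = "color colour colouur"

# ? : the 'u' may appear zero or one time
optional_u = re.findall(r'colou?r', text)

# * : the 'u' may appear zero or more times
any_u = re.findall(r'colou*r', text)

# + : the 'u' must appear at least once
some_u = re.findall(r'colou+r', text)

# {n,m} : standalone numbers with 2 or 3 digits
two_or_three = re.findall(r'\b\d{2,3}\b', "7 42 123 4567")

print(optional_u)    # ['color', 'colour']
print(any_u)         # ['color', 'colour', 'colouur']
print(some_u)        # ['colour', 'colouur']
print(two_or_three)  # ['42', '123']
```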
Common Regex Patterns for Beginners
1. Email Validation
def is_valid_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return re.match(pattern, email) is not None
# Test it
print(is_valid_email("user@example.com"))  # True
True
print(is_valid_email("invalid.email"))  # False
False
2. Phone Number Extraction
text = "Call me at 415-555-1234 or (555) 987-6543"
pattern = r'(\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4})'
phones = re.findall(pattern, text)
print(phones)  # ['415-555-1234', '(555) 987-6543']
['415-555-1234', '(555) 987-6543']
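Once the numbers are extracted, a common next step is to normalize them to one format. The sketch below is one possible approach, not the only one: it uses \D (any character that is not a digit, the opposite of \d) with re.sub() to strip the punctuation.

```python
import re

phones = ['415-555-1234', '(555) 987-6543']

# Remove everything that is not a digit
digits_only = [re.sub(r'\D', '', phone) for phone in phones]

print(digits_only)  # ['4155551234', '5559876543']
```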
3. Password Strength Check
def check_password(password):
    # At least 8 chars, one uppercase, one lowercase, one digit
    if len(password) < 8:
        return False
    if not re.search(r'[A-Z]', password):
        return False
    if not re.search(r'[a-z]', password):
        return False
    if not re.search(r'\d', password):
        return False
    return True
print(check_password("Pass123!"))  # True
True
print(check_password("weak"))  # False
False
Groups: Extracting Parts of Matches
Groups let you extract specific parts of a match using parentheses:
# Extract area code and number separately
phone = "415-555-1234"
pattern = r'(\d{3})-(\d{3}-\d{4})'
match = re.search(pattern, phone)
if match:
    print(f"Area code: {match.group(1)}")  # 415
    print(f"Number: {match.group(2)}")  # 555-1234
    print(f"Full match: {match.group(0)}")  # 415-555-1234
Area code: 415
Number: 555-1234
Full match: 415-555-1234
Remember: group(0) is the entire match, group(1) is the first set of parentheses, and so on.
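Numbered groups get hard to track as patterns grow. Python also supports named groups with the (?P<name>...) syntax. Here is the same phone example as a sketch; the group names area and number are my own choice for illustration.

```python
import re

phone = "415-555-1234"

# (?P<name>...) gives each group a label instead of a position
pattern = r'(?P<area>\d{3})-(?P<number>\d{3}-\d{4})'
match = re.search(pattern, phone)

area = match.group('area') if match else None
parts = match.groupdict() if match else {}

print(area)   # 415
print(parts)  # {'area': '415', 'number': '555-1234'}
```

Named groups still work with numbers, so match.group(1) returns the same value as match.group('area').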
Special Characters and Escaping
Some characters have special meanings in regex. To match them literally, you need to escape them with a backslash:
Character | Special Meaning | To Match Literally |
---|---|---|
. | Any character | \. |
* | Zero or more | \* |
+ | One or more | \+ |
? | Zero or one | \? |
^ | Start of string | \^ |
$ | End of string | \$ |
# Matching a literal period
text = "The price is $19.99"
pattern = r'\$\d+\.\d{2}'
match = re.search(pattern, text)
print(match.group())  # $19.99
$19.99
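When the text you want to match comes from a variable (user input, for example), escaping by hand is error-prone. The re module provides re.escape() for this. A minimal sketch, with a made-up search string:

```python
import re

user_input = "1+1=2?"
haystack = "Is 1+1=2? Yes."

# Used as-is, + and ? act as quantifiers, so the literal text is not found
unescaped = re.search(user_input, haystack)

# re.escape() adds the backslashes needed to match the text literally
escaped = re.search(re.escape(user_input), haystack)

print(unescaped)        # None
print(escaped.group())  # 1+1=2?
```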
Using Raw Strings (Important!)
Always use raw strings (prefix with r) for regex patterns:
# Good - raw string
pattern = r'\d+'

# Bad - regular string (backslash might be interpreted)
pattern = '\d+'
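Raw strings prevent Python from interpreting backslashes as escape characters before the pattern ever reaches the regex engine. A sequence like '\d' happens to survive in a regular string, but recent Python versions warn about it, and '\b' silently becomes a backspace character.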
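Here is a small sketch of a case where the missing r actually changes the result. In a regular string, '\b' is the backspace character, while in a regex \b means a word boundary. The sample text is made up.

```python
import re

text = "cat concat"

# Raw string: \b reaches the regex engine as a word boundary
raw_result = re.findall(r'\bcat\b', text)

# Regular string: Python turns '\b' into a backspace character first
plain_result = re.findall('\bcat\b', text)

print(raw_result)    # ['cat']
print(plain_result)  # []
print(len(r'\b'), len('\b'))  # 2 1
```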
Common Mistakes to Avoid
1. Greedy vs. Non-Greedy Matching
By default, quantifiers are “greedy” - they match as much as possible:
text = '<b>Bold</b> and <i>Italic</i>'
# Greedy - matches too much!
greedy = re.findall(r'<.*>', text)
print(greedy)  # ['<b>Bold</b> and <i>Italic</i>']
['<b>Bold</b> and <i>Italic</i>']
# Non-greedy - add ? after quantifier
non_greedy = re.findall(r'<.*?>', text)
print(non_greedy)  # ['<b>', '</b>', '<i>', '</i>']
['<b>', '</b>', '<i>', '</i>']
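Another way to avoid the greedy trap is to say exactly which characters are allowed inside the match. The sketch below uses a negated character class, [^>], meaning "any character except >". It gives the same result as the non-greedy version for this input.

```python
import re

text = '<b>Bold</b> and <i>Italic</i>'

# [^>]+ can never run past the first closing bracket
tags = re.findall(r'<[^>]+>', text)

print(tags)  # ['<b>', '</b>', '<i>', '</i>']
```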
2. Forgetting Anchors
Use ^ and $ to match the entire string:
# Without anchors - matches partial string
pattern = r'\d{3}'
print(re.search(pattern, "abc123def"))  # Matches!
<re.Match object; span=(3, 6), match='123'>
# With anchors - must be entire string
pattern = r'^\d{3}$'
print(re.search(pattern, "abc123def"))  # No match
None
print(re.search(pattern, "123"))  # Matches!
<re.Match object; span=(0, 3), match='123'>
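One subtle detail: $ also matches just before a newline at the very end of the string. If you need a strict whole-string check, re.fullmatch() is an alternative worth knowing. A minimal sketch:

```python
import re

pattern = r'^\d{3}$'

# $ accepts a single trailing newline, so this still matches
with_newline = re.search(pattern, "123\n")

# re.fullmatch() requires the whole string to match; no anchors needed
strict_ok = re.fullmatch(r'\d{3}', "123")
strict_newline = re.fullmatch(r'\d{3}', "123\n")

print(with_newline is not None)    # True
print(strict_ok is not None)       # True
print(strict_newline is not None)  # False
```

This matters most for validation, where input read from a file or form can carry a stray newline.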
3. Case Sensitivity
Regex is case-sensitive by default. Use the re.IGNORECASE flag for case-insensitive matching:
text = "Python PYTHON python"
# Case-sensitive
print(re.findall(r'python', text))  # ['python']
['python']
# Case-insensitive
print(re.findall(r'python', text, re.IGNORECASE))  # ['Python', 'PYTHON', 'python']
['Python', 'PYTHON', 'python']
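Flags can also be written inside the pattern or combined with the | operator. The sketch below shows both. It introduces re.MULTILINE, which this guide has not covered: it makes ^ and $ match at the start and end of each line rather than only the whole string. The sample text is made up.

```python
import re

text = "Python\npython rocks\nPYTHON"

# (?i) at the start of the pattern is the inline form of re.IGNORECASE
inline = re.findall(r'(?i)python', text)

# Combine flags with | : case-insensitive AND line-by-line anchors
whole_lines = re.findall(r'^python$', text, re.IGNORECASE | re.MULTILINE)

print(inline)       # ['Python', 'python', 'PYTHON']
print(whole_lines)  # ['Python', 'PYTHON']
```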
Your Turn!
Here’s a practical exercise to test your new regex skills:
Challenge: Write a regex pattern to find all dates in the format MM/DD/YYYY in the following text:
text = """
Important dates:
- Project starts on 01/15/2025
- First deadline: 02/28/2025
- Final submission: 12/31/2025
- Invalid date: 13/45/2025
"""
# Write your pattern here
pattern = r'___'  # Fill in the blank!

dates = re.findall(pattern, text)
print(dates)
Solution
text = """
Important dates:
- Project starts on 01/15/2025
- First deadline: 02/28/2025
- Final submission: 12/31/2025
- Invalid date: 13/45/2025
"""
# Solution
pattern = r'\b(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}\b'

# With capturing groups, findall() returns a tuple of the groups for each match
dates = re.findall(pattern, text)
print(dates)  # [('01', '15'), ('02', '28'), ('12', '31')]
[('01', '15'), ('02', '28'), ('12', '31')]
# To get full dates as strings, make the groups non-capturing with (?:...):
pattern = r'\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12][0-9]|3[01])/\d{4}\b'
dates = re.findall(pattern, text)
print(dates)  # ['01/15/2025', '02/28/2025', '12/31/2025']
['01/15/2025', '02/28/2025', '12/31/2025']
The pattern breaks down as:
- \b - Word boundary
- (?:0[1-9]|1[0-2]) - Month: 01-09 or 10-12
- / - Literal forward slash
- (?:0[1-9]|[12][0-9]|3[01]) - Day: 01-09, 10-29, or 30-31
- / - Another forward slash
- \d{4} - Four-digit year
- \b - Word boundary
Note: This pattern doesn’t validate if dates are real (like February 30th).
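If you do need real calendar dates, one option is to let regex find the candidates and then hand each one to the datetime module for the actual check. This is a sketch of that idea; the helper name is_real_date and the sample text are made up for illustration.

```python
import re
from datetime import datetime

date_pattern = r'\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12][0-9]|3[01])/\d{4}\b'

def is_real_date(candidate):
    """Return True if candidate is an actual MM/DD/YYYY calendar date."""
    try:
        datetime.strptime(candidate, "%m/%d/%Y")
        return True
    except ValueError:
        return False

text = "Valid: 02/28/2025. Impossible: 02/30/2025."

# Step 1: regex finds everything shaped like a date
candidates = re.findall(date_pattern, text)

# Step 2: datetime rejects dates that do not exist
real_dates = [d for d in candidates if is_real_date(d)]

print(candidates)  # ['02/28/2025', '02/30/2025']
print(real_dates)  # ['02/28/2025']
```

Splitting the work this way keeps the pattern readable and leaves calendar rules (month lengths, leap years) to code that already knows them.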
Quick Reference Guide
Common Patterns
# Email
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

# Phone (US format)
phone_pattern = r'(\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4})'

# URL
url_pattern = r'https?://(?:www\.)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&\'\(\)\*\+,;=.]+'

# Date (MM/DD/YYYY)
date_pattern = r'\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12][0-9]|3[01])/\d{4}\b'
Most Used Functions
# Search for first match
match = re.search(pattern, text)
if match:
    result = match.group()

# Find all matches
matches = re.findall(pattern, text)

# Replace matches
new_text = re.sub(pattern, replacement, text)

# Split by pattern
parts = re.split(pattern, text)
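The snippets above are templates, so here is a runnable sketch of the two functions that have not appeared in an example yet, re.sub() and re.split(), using made-up sample strings. It also shows re.finditer(), which is not covered elsewhere in this guide: it works like findall() but yields match objects, so you can get positions too.

```python
import re

# Split on any run of commas, semicolons, or whitespace
parts = re.split(r'[,;\s]+', "apples, oranges;bananas  grapes")

# Collapse repeated whitespace into a single space
new_text = re.sub(r'\s+', ' ', "too    many   spaces")

# finditer yields match objects, so each match comes with its position
positions = [(m.group(), m.start()) for m in re.finditer(r'\d+', "a1b22c333")]

print(parts)      # ['apples', 'oranges', 'bananas', 'grapes']
print(new_text)   # too many spaces
print(positions)  # [('1', 1), ('22', 3), ('333', 6)]
```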
Key Takeaways
- Always use raw strings (r'pattern') for regex patterns
- Start simple - build complex patterns step by step
- Test your patterns with online tools like regex101.com
- Remember the difference between search(), match(), and findall()
- Escape special characters when you want to match them literally
- Use groups to extract parts of your matches
- Be careful with greedy matching - add ? to make quantifiers non-greedy
Conclusion
Regular expressions might seem intimidating at first, but they’re just patterns made up of simple building blocks. Start with basic patterns like \d+ for numbers or \w+ for words, then gradually combine them to solve more complex problems.
The key is practice! Try modifying the examples in this guide, experiment with your own patterns, and don’t be afraid to make mistakes. Every Python programmer started exactly where you and I are now.
Ready to level up your text processing skills? Pick a real problem you’re facing, maybe cleaning up messy data or validating user input, and try solving it with regex. You’ll be surprised how much time it can save!
Frequently Asked Questions
Q: When should I use regex instead of regular string methods? A: Use regex when you need pattern matching, not exact matching. For simple tasks like checking if a string starts with something, use str.startswith(). For complex patterns like “find all email addresses,” use regex.
Q: Why do my patterns sometimes not work? A: Common issues include forgetting to use raw strings, not escaping special characters, or using greedy matching when you need non-greedy. Test your patterns piece by piece to find the problem.
Q: Are Python regex patterns the same as in other languages? A: The basics are similar, but there are differences. Python uses Perl-style syntax with some variations, so a pattern copied from another language may need small changes. Always check the Python-specific documentation.
Q: How can I make my regex patterns more readable? A: Use the re.VERBOSE flag to write patterns across multiple lines with comments:
pattern = re.compile(r'''
    \d{3}  # Area code
    -      # Separator
    \d{4}  # Number
''', re.VERBOSE)
Q: Is there a performance impact with complex regex? A: Yes, poorly written patterns can be slow. The re module caches recently compiled patterns for efficiency (512 of them in current CPython, an implementation detail that may change). For frequently used patterns, compile them once and reuse them.
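Here is a minimal sketch of the "compile once and reuse" advice. The constant name PHONE_RE and the sample lines are made up for illustration.

```python
import re

# Compile once, outside the loop
PHONE_RE = re.compile(r'\d{3}-\d{3}-\d{4}')

lines = [
    "Office: 415-555-1234",
    "No number on this line",
    "Mobile: 555-987-6543",
]

found = []
for line in lines:
    # Compiled patterns have the same methods as the module: search, findall, sub...
    match = PHONE_RE.search(line)
    if match:
        found.append(match.group())

print(found)  # ['415-555-1234', '555-987-6543']
```

Besides any speed benefit, giving the pattern a name makes the code that uses it easier to read.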
References
Python Software Foundation. “re — Regular expression operations.” Python Documentation.
Python Software Foundation. “Regular Expression HOWTO.” Python Documentation.
Happy Coding! 🚀
You can connect with me at any one of the below:
Telegram Channel here: https://t.me/steveondata
LinkedIn Network here: https://www.linkedin.com/in/spsanderson/
Mastodon Social here: https://mstdn.social/@stevensanderson
RStats Network here: https://rstats.me/@spsanderson
GitHub Network here: https://github.com/spsanderson
Bluesky Network here: https://bsky.app/profile/spsanderson.com
My Book: Extending Excel with Python and R here: https://packt.link/oTyZJ