Splitting strings at uppercase letters is essential when working with camelCase or PascalCase identifiers, parsing API responses, or converting variable names to readable formats. Whether you're processing code identifiers, cleaning data, or building text parsers, Python provides multiple efficient ways to split strings at capital letters.
This tutorial shows you practical methods to split strings at uppercase letters using regex, list comprehension, and built-in functions.
1. Using Regex with re.findall() (Most Reliable)
Regular expressions provide the most flexible and powerful approach for splitting strings at uppercase letters.
import re
def split_at_uppercase(text):
"""Split string at uppercase letters using regex"""
return re.findall('[A-Z][^A-Z]*', text)
# Examples
print(split_at_uppercase('ThisIsATest'))
# Output: ['This', 'Is', 'A', 'Test']
print(split_at_uppercase('parseHTMLString'))
# Output: ['parse', 'H', 'T', 'M', 'L', 'String']
print(split_at_uppercase('camelCaseExample'))
# Output: ['camel', 'Case', 'Example']
output:
['This', 'Is', 'A', 'Test']
['H', 'T', 'M', 'L', 'String']
['Case', 'Example']
How it works: The pattern [A-Z][^A-Z]* matches an uppercase letter followed by zero or more non-uppercase characters. This cleanly splits at each capital letter while keeping the uppercase letter with its following lowercase characters.
2. Using re.sub() for Space Insertion
A cleaner approach for converting camelCase to readable text inserts spaces before uppercase letters, then splits normally.
import re
def camel_case_split(text):
"""Convert camelCase to separate words"""
# Insert space before uppercase letters
spaced = re.sub(r'([A-Z])', r' \1', text)
# Split and clean up
return spaced.split()
# Examples
print(camel_case_split('thisIsMyVariableName'))
# Output: ['this', 'Is', 'My', 'Variable', 'Name']
print(camel_case_split('parseJSONData'))
# Output: ['parse', 'J', 'S', 'O', 'N', 'Data']
# Convert to readable format
text = 'getUserProfileData'
words = camel_case_split(text)
readable = ' '.join(words).lower()
print(readable)
# Output: 'get user profile data'
output:
['this', 'Is', 'My', 'Variable', 'Name']
['parse', 'J', 'S', 'O', 'N', 'Data']
get user profile data
When to use this: Perfect for converting function names or variable names into human-readable text for documentation, logs, or UI display.
3. Handling Consecutive Uppercase Letters (Acronyms)
For strings with acronyms like "parseHTMLString" or "XMLHttpRequest", you need a smarter pattern that keeps acronyms together.
import re
def split_preserve_acronyms(text):
"""Split at uppercase while preserving acronyms"""
# Matches: lowercase followed by uppercase, or uppercase followed by uppercase+lowercase
return re.sub('([a-z])([A-Z])', r'\1 \2',
re.sub('([A-Z]+)([A-Z][a-z])', r'\1 \2', text)).split()
# Examples
print(split_preserve_acronyms('parseHTMLString'))
# Output: ['parse', 'HTML', 'String']
print(split_preserve_acronyms('XMLHttpRequest'))
# Output: ['XML', 'Http', 'Request']
print(split_preserve_acronyms('getAPIResponseData'))
# Output: ['get', 'API', 'Response', 'Data']
print(split_preserve_acronyms('HTTPSConnection'))
# Output: ['HTTPS', 'Connection']
result:
['parse', 'HTML', 'String']
['XML', 'Http', 'Request']
['get', 'API', 'Response', 'Data']
['HTTPS', 'Connection']
Why this matters: Acronyms like HTTP, XML, API, or JSON should remain together rather than being split into individual letters.
4. Converting to snake_case or kebab-case
A practical real-world use case: converting camelCase variable names to snake_case for Python conventions or kebab-case for URLs.
import re
def camel_to_snake(text):
"""Convert camelCase to snake_case"""
# Insert underscore before uppercase letters
snake = re.sub('([a-z])([A-Z])', r'\1_\2', text)
# Handle acronyms
snake = re.sub('([A-Z]+)([A-Z][a-z])', r'\1_\2', snake)
return snake.lower()
def camel_to_kebab(text):
"""Convert camelCase to kebab-case"""
kebab = re.sub('([a-z])([A-Z])', r'\1-\2', text)
kebab = re.sub('([A-Z]+)([A-Z][a-z])', r'\1-\2', kebab)
return kebab.lower()
# Examples
print(camel_to_snake('getUserData'))
# Output: 'get_user_data'
print(camel_to_snake('parseHTMLContent'))
# Output: 'parse_html_content'
print(camel_to_kebab('myVariableName'))
# Output: 'my-variable-name'
print(camel_to_kebab('XMLHttpRequest'))
# Output: 'xml-http-request'
# Batch conversion
variables = ['firstName', 'lastName', 'userID', 'getAPIKey']
snake_vars = [camel_to_snake(v) for v in variables]
print(snake_vars)
# Output: ['first_name', 'last_name', 'user_id', 'get_api_key']
result:
get_user_data
parse_html_content
my-variable-name
xml-http-request
['first_name', 'last_name', 'user_id', 'get_api_key']
Common Use Cases
API Response Parsing: Converting JavaScript camelCase keys to Python snake_case
# JavaScript: { "firstName": "John", "lastName": "Doe" }
# Python: { "first_name": "John", "last_name": "Doe" }
Code Analysis: Extracting words from function or class names for documentation
URL Generation: Converting class names to kebab-case for RESTful URLs
# UserProfile → user-profile
# BlogPost → blog-post
Database Column Mapping: Converting model field names between naming conventions
Log Messages: Making camelCase identifiers readable in error messages
# "getUserProfileData failed" → "Get User Profile Data failed"
Edge Cases to Handle
Starting with lowercase:
print(split_at_uppercase('myVariable'))
# Output: ['my', 'Variable']
All uppercase:
print(split_at_uppercase('HTML'))
# Output: ['H', 'T', 'M', 'L']
# Use preserve_acronyms version for: ['HTML']
Numbers in string:
print(camel_to_snake('user123Data'))
# Output: 'user123_data'
Empty or single character:
print(split_at_uppercase('')) # Output: []
print(split_at_uppercase('a')) # Output: []
print(split_at_uppercase('A')) # Output: ['A']
Quick Reference: Regex Patterns
| Pattern | Matches | Use Case |
|---|---|---|
[A-Z][^A-Z]* |
Uppercase + non-uppercase chars | Basic split |
([a-z])([A-Z]) |
Lowercase before uppercase | Boundary detection |
([A-Z]+)([A-Z][a-z]) |
Acronym before word | Preserve acronyms |
\1_\2 |
Replacement with underscore | snake_case conversion |
\1-\2 |
Replacement with hyphen | kebab-case conversion |
Choosing the Right Method
- Simple splitting? → Use
re.findall('[A-Z][^A-Z]*', text)(Method 1) - Human-readable text? → Use
re.sub()with space insertion (Method 2) - Handling acronyms? → Use double regex substitution (Method 3)
- Converting naming conventions? → Use snake_case/kebab-case converters (Method 4)
For most camelCase to snake_case conversions (the most common use case), Method 4 with acronym handling is your best choice. It's production-ready and handles edge cases properly.