Splitting strings at uppercase letters is essential when working with camelCase or PascalCase identifiers, parsing API responses, or converting variable names to readable formats. Whether you're processing code identifiers, cleaning data, or building text parsers, Python provides multiple efficient ways to split strings at capital letters.

This tutorial shows you practical methods to split strings at uppercase letters, built mainly on re.findall() and re.sub() from Python's re module.

1. Using Regex with re.findall() (Simplest)

Regular expressions provide the most flexible and powerful approach for splitting strings at uppercase letters.

import re

def split_at_uppercase(text):
    """Split string at uppercase letters using regex"""
    return re.findall('[A-Z][^A-Z]*', text)

# Examples
print(split_at_uppercase('ThisIsATest'))
# Output: ['This', 'Is', 'A', 'Test']

print(split_at_uppercase('parseHTMLString'))
# Output: ['H', 'T', 'M', 'L', 'String']  (the leading 'parse' is dropped)

print(split_at_uppercase('camelCaseExample'))
# Output: ['Case', 'Example']  (the leading 'camel' is dropped)

output:

['This', 'Is', 'A', 'Test']
['H', 'T', 'M', 'L', 'String']
['Case', 'Example']

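How it works: The pattern [A-Z][^A-Z]* matches an uppercase letter followed by zero or more non-uppercase characters, so each capital letter stays with the lowercase characters that follow it. Two limitations follow from this: any text before the first uppercase letter matches nothing and is dropped (the 'parse' in 'parseHTMLString'), and every letter of an acronym becomes its own item.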
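If the text before the first capital has to be kept, as it usually does for camelCase identifiers, one option is to add a second alternative that matches a run of non-uppercase characters. The sketch below is an illustration, not part of the method above; the name split_keep_prefix is made up for this example.

```python
import re

def split_keep_prefix(text):
    """Split at uppercase letters, keeping any leading lowercase run.

    Hypothetical variant of split_at_uppercase. The extra alternative
    [^A-Z]+ can only match at the start of the string, because every
    later chunk begins with an uppercase letter and swallows the
    non-uppercase characters after it.
    """
    return re.findall(r'[A-Z][^A-Z]*|[^A-Z]+', text)

print(split_keep_prefix('camelCaseExample'))  # ['camel', 'Case', 'Example']
print(split_keep_prefix('ThisIsATest'))       # ['This', 'Is', 'A', 'Test']
```

Acronyms are still split letter by letter with this pattern; it only addresses the dropped prefix.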

2. Using re.sub() for Space Insertion

To convert camelCase to readable text, insert a space before each uppercase letter and then split on whitespace. Unlike Method 1, this keeps a leading lowercase word.

import re

def camel_case_split(text):
    """Convert camelCase to separate words"""
    # Insert space before uppercase letters
    spaced = re.sub(r'([A-Z])', r' \1', text)
    # Split and clean up
    return spaced.split()

# Examples
print(camel_case_split('thisIsMyVariableName'))
# Output: ['this', 'Is', 'My', 'Variable', 'Name']

print(camel_case_split('parseJSONData'))
# Output: ['parse', 'J', 'S', 'O', 'N', 'Data']

# Convert to readable format
text = 'getUserProfileData'
words = camel_case_split(text)
readable = ' '.join(words).lower()
print(readable)
# Output: 'get user profile data'

output:

['this', 'Is', 'My', 'Variable', 'Name']
['parse', 'J', 'S', 'O', 'N', 'Data']
get user profile data

When to use this: Perfect for converting function names or variable names into human-readable text for documentation, logs, or UI display.

3. Handling Consecutive Uppercase Letters (Acronyms)

For strings with acronyms like "parseHTMLString" or "XMLHttpRequest", you need a smarter pattern that keeps acronyms together.

import re

def split_preserve_acronyms(text):
    """Split at uppercase while preserving acronyms"""
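    # Step 1: separate an acronym from the capitalized word after it (HTMLString -> HTML String)
    spaced = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1 \2', text)
    # Step 2: separate a lowercase letter from the uppercase letter after it (parseHTML -> parse HTML)
    spaced = re.sub(r'([a-z])([A-Z])', r'\1 \2', spaced)
    return spaced.split()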

# Examples
print(split_preserve_acronyms('parseHTMLString'))
# Output: ['parse', 'HTML', 'String']

print(split_preserve_acronyms('XMLHttpRequest'))
# Output: ['XML', 'Http', 'Request']

print(split_preserve_acronyms('getAPIResponseData'))
# Output: ['get', 'API', 'Response', 'Data']

print(split_preserve_acronyms('HTTPSConnection'))
# Output: ['HTTPS', 'Connection']

output:

['parse', 'HTML', 'String']
['XML', 'Http', 'Request']
['get', 'API', 'Response', 'Data']
['HTTPS', 'Connection']

Why this matters: Acronyms like HTTP, XML, API, or JSON should remain together rather than being split into individual letters.
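The same two boundaries can also be expressed as zero-width lookarounds and passed to re.split(), which avoids inserting spaces and splitting again. This is a sketch of an alternative, not the method shown above; split_words and _BOUNDARY are names chosen for this example, and it assumes Python 3.7 or later, where re.split() accepts patterns that can match an empty string.

```python
import re

# Zero-width boundaries:
#   1. between a lowercase letter and an uppercase letter
#   2. between the end of an acronym and the capitalized word after it
_BOUNDARY = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')

def split_words(text):
    """Split an identifier at word boundaries, keeping acronyms together."""
    # Filter out the empty string that re.split() returns for empty input
    return [part for part in _BOUNDARY.split(text) if part]

print(split_words('parseHTMLString'))  # ['parse', 'HTML', 'String']
print(split_words('XMLHttpRequest'))   # ['XML', 'Http', 'Request']
```

Because nothing is inserted into the text, this version also leaves any spaces already present in the input untouched, which may or may not be what you want.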

4. Converting to snake_case or kebab-case

A practical real-world use case: converting camelCase variable names to snake_case for Python conventions or kebab-case for URLs.

import re

def camel_to_snake(text):
    """Convert camelCase to snake_case"""
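    # Insert an underscore between a lowercase letter and the uppercase letter after it
    snake = re.sub('([a-z])([A-Z])', r'\1_\2', text)
    # Handle acronyms: split before the last capital of an uppercase run (HTMLContent -> HTML_Content)
    snake = re.sub('([A-Z]+)([A-Z][a-z])', r'\1_\2', snake)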
    return snake.lower()

def camel_to_kebab(text):
    """Convert camelCase to kebab-case"""
    kebab = re.sub('([a-z])([A-Z])', r'\1-\2', text)
    kebab = re.sub('([A-Z]+)([A-Z][a-z])', r'\1-\2', kebab)
    return kebab.lower()

# Examples
print(camel_to_snake('getUserData'))
# Output: 'get_user_data'

print(camel_to_snake('parseHTMLContent'))
# Output: 'parse_html_content'

print(camel_to_kebab('myVariableName'))
# Output: 'my-variable-name'

print(camel_to_kebab('XMLHttpRequest'))
# Output: 'xml-http-request'

# Batch conversion
variables = ['firstName', 'lastName', 'userID', 'getAPIKey']
snake_vars = [camel_to_snake(v) for v in variables]
print(snake_vars)
# Output: ['first_name', 'last_name', 'user_id', 'get_api_key']

output:

get_user_data
parse_html_content
my-variable-name
xml-http-request
['first_name', 'last_name', 'user_id', 'get_api_key']

Common Use Cases

API Response Parsing: Converting JavaScript camelCase keys to Python snake_case

# JavaScript: { "firstName": "John", "lastName": "Doe" }
# Python: { "first_name": "John", "last_name": "Doe" }
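A decoded JSON response is often nested, so the key conversion usually has to recurse through dictionaries and lists. The sketch below shows one way to do that; snake_case_keys is a hypothetical helper, the payload is made-up sample data, and camel_to_snake is repeated from Method 4 so the block runs on its own.

```python
import re

def camel_to_snake(text):
    """Same two-step conversion as in Method 4."""
    snake = re.sub(r'([a-z])([A-Z])', r'\1_\2', text)
    snake = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', snake)
    return snake.lower()

def snake_case_keys(data):
    """Recursively convert dict keys to snake_case; values are left unchanged."""
    if isinstance(data, dict):
        return {
            (camel_to_snake(key) if isinstance(key, str) else key): snake_case_keys(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [snake_case_keys(item) for item in data]
    return data

# Made-up sample payload for illustration
payload = {
    "firstName": "John",
    "lastName": "Doe",
    "homeAddress": {"zipCode": "12345"},
    "phoneNumbers": [{"countryCode": "+1"}],
}
print(snake_case_keys(payload))
```

Two camelCase keys that map to the same snake_case name would overwrite each other here, so a real converter may want to detect collisions.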

Code Analysis: Extracting words from function or class names for documentation

URL Generation: Converting class names to kebab-case for RESTful URLs

# UserProfile → user-profile
# BlogPost → blog-post
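As a rough illustration of the URL case, a path segment can be derived from a class's __name__. The classes and the resource_path helper below are hypothetical, and camel_to_kebab is repeated from Method 4 so the block is self-contained.

```python
import re

def camel_to_kebab(text):
    """Same two-step conversion as in Method 4."""
    kebab = re.sub(r'([a-z])([A-Z])', r'\1-\2', text)
    kebab = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1-\2', kebab)
    return kebab.lower()

# Hypothetical model classes, used only for their names
class UserProfile:
    pass

class BlogPost:
    pass

def resource_path(cls):
    """Build a URL path segment from a class name (hypothetical helper)."""
    return '/' + camel_to_kebab(cls.__name__)

print(resource_path(UserProfile))  # /user-profile
print(resource_path(BlogPost))     # /blog-post
```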

Database Column Mapping: Converting model field names between naming conventions

Log Messages: Making camelCase identifiers readable in error messages

# "getUserProfileData failed" → "Get User Profile Data failed"
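One way to produce that kind of message is to split the identifier with the acronym-aware substitutions from Method 3 and capitalize the first letter of each word. The humanize function below is a hypothetical helper written for this example.

```python
import re

def humanize(identifier):
    """Turn a camelCase identifier into capitalized, space-separated words."""
    spaced = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1 \2', identifier)
    spaced = re.sub(r'([a-z])([A-Z])', r'\1 \2', spaced)
    # Upper-case only the first character so acronyms such as "API" stay intact
    return ' '.join(word[:1].upper() + word[1:] for word in spaced.split())

print(humanize('getUserProfileData') + ' failed')  # Get User Profile Data failed
print(humanize('getAPIKey'))                       # Get API Key
```

str.title() is avoided on purpose: it would lowercase the rest of each word and turn "API" into "Api".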

Edge Cases to Handle

Starting with lowercase:

print(split_at_uppercase('myVariable'))
# Output: ['Variable']  (the lowercase prefix 'my' is dropped)
# camel_case_split('myVariable') keeps it: ['my', 'Variable']

All uppercase:

print(split_at_uppercase('HTML'))
# Output: ['H', 'T', 'M', 'L']
# Use split_preserve_acronyms('HTML') to get: ['HTML']

Numbers in string:

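print(camel_to_snake('user123Data'))
# Output: 'user123data'
# No underscore is inserted: ([a-z])([A-Z]) needs a lowercase letter, not a digit, before the capital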
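If a digit followed by a capital should count as a word boundary, the first character class can be widened to include digits. This is a sketch of one possible variant; the name camel_to_snake_digits is made up for this example.

```python
import re

def camel_to_snake_digits(text):
    """camelCase to snake_case, treating digits like lowercase letters."""
    # [a-z0-9] instead of [a-z]: a digit before a capital is also a boundary
    snake = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', text)
    snake = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', snake)
    return snake.lower()

print(camel_to_snake_digits('user123Data'))  # user123_data
print(camel_to_snake_digits('getUserData'))  # get_user_data
```

This variant still does not separate letters from the digits that follow them, so 'user123' stays one word.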

Empty or single character:

print(split_at_uppercase(''))  # Output: []
print(split_at_uppercase('a'))  # Output: []
print(split_at_uppercase('A'))  # Output: ['A']

Quick Reference: Regex Patterns

Pattern                 Matches                            Use Case
[A-Z][^A-Z]*            Uppercase + non-uppercase chars    Basic split
([a-z])([A-Z])          Lowercase before uppercase         Boundary detection
([A-Z]+)([A-Z][a-z])    Acronym before word                Preserve acronyms
\1_\2                   Replacement with underscore        snake_case conversion
\1-\2                   Replacement with hyphen            kebab-case conversion

Choosing the Right Method

  • Simple splitting? → Use re.findall('[A-Z][^A-Z]*', text) (Method 1)
  • Human-readable text? → Use re.sub() with space insertion (Method 2)
  • Handling acronyms? → Use double regex substitution (Method 3)
  • Converting naming conventions? → Use snake_case/kebab-case converters (Method 4)

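For most camelCase to snake_case conversions (the most common use case), Method 4 with acronym handling is the best starting point. It covers ordinary identifiers and acronyms; inputs containing digits or non-ASCII letters need additional patterns, since [a-z] and [A-Z] match only ASCII letters.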