This post collects practical examples, for beginners and advanced users alike, of how to split strings into lists in Python. It covers splitting on a separator, parsing a dictionary-like string, splitting only on the first separator(s), handling consecutive separators, and using regular expressions to split strings:
- Simple split of string into list
- Python split string by separator
- Split multi-line string into a list (per line)
- Split string dictionary into lists (map)
- Python split string by first occurrence
- Split string by consecutive separators (regex)
Simple split of string into list
If you want to split a string into a list of substrings, you can simply use the split() method. It can be called:
- without an argument: the string is split on runs of whitespace (spaces, tabs, newlines), and leading and trailing whitespace is ignored
- with an argument such as a comma or a dot: see the next section
print("Python2 Python3 Python Numpy".split())
print("Python2, Python3, Python, Numpy".split())
the result is:
['Python2', 'Python3', 'Python', 'Numpy']
['Python2,', 'Python3,', 'Python,', 'Numpy']
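Calling split() with no argument is not the same as passing a single space. The two behave differently when a string contains repeated, leading, or trailing spaces. The short comparison below uses a made-up sample string to show the difference:

```python
text = "  Python2   Python3  Numpy "

# No argument: each run of whitespace counts as one separator,
# and leading/trailing whitespace is ignored.
no_arg = text.split()

# Explicit " ": every single space is a separator, so empty strings appear.
one_space = text.split(" ")

print(no_arg)     # ['Python2', 'Python3', 'Numpy']
print(one_space)  # ['', '', 'Python2', '', '', 'Python3', '', 'Numpy', '']
```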
Python split string by separator
To split a string by a comma or any other character, use the same split() method and pass the separator as the argument. In the example below the strings are split by a comma and by a semicolon, as you might do for simple CSV-like data:
print("Python2, Python3, Python, Numpy".split(','))
print("Python2; Python3; Python; Numpy".split(';'))
the result is:
['Python2', ' Python3', ' Python', ' Numpy']
['Python2', ' Python3', ' Python', ' Numpy']
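Each item above keeps the space that followed the separator. A common next step is to strip that whitespace with a list comprehension. For real CSV data, where fields may be quoted and contain commas, the standard csv module is usually the safer choice. This is a minimal sketch of both approaches; the sample strings are made up for illustration:

```python
import csv

line = "Python2, Python3, Python, Numpy"

# Split on the comma, then strip surrounding whitespace from every item.
items = [item.strip() for item in line.split(",")]
print(items)  # ['Python2', 'Python3', 'Python', 'Numpy']

# csv.reader respects quotes; skipinitialspace ignores spaces after each delimiter.
rows = list(csv.reader(['Python2, "Numpy, SciPy", Python3'], skipinitialspace=True))
print(rows)  # [['Python2', 'Numpy, SciPy', 'Python3']]
```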
Note that the separator is dropped from the output list. To keep the separators as separate items, use re.split() with a capturing group (parentheses around the separator pattern):
import re
sep = re.split(',', 'Python2, Python3, Python, Numpy')
print(sep)
sep = re.split('(,)', 'Python2, Python3, Python, Numpy')
print(sep)
and the result is:
['Python2', ' Python3', ' Python', ' Numpy']
['Python2', ',', ' Python3', ',', ' Python', ',', ' Numpy']
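If you want each separator to stay attached to the word before it, you can use a list comprehension (no regular expressions). The last item gets no separator, because none followed it in the original string:
text = 'Python2, Python3, Python, Numpy'
sep = ','
parts = text.split(sep)
# Re-attach the separator to every part except the last one
result = [x + sep for x in parts[:-1]] + parts[-1:]
print(result)
the result is:
['Python2,', ' Python3,', ' Python,', ' Numpy']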
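If you prefer a regular expression anyway, a zero-width lookbehind can split right after each comma, so the separator stays attached to the preceding item. This sketch relies on re.split() splitting on empty matches, which Python supports since version 3.7:

```python
import re

text = "Python2, Python3, Python, Numpy"

# (?<=,) matches the empty position right after each comma,
# so each comma stays at the end of the piece before it.
parts = re.split(r"(?<=,)", text)
print(parts)  # ['Python2,', ' Python3,', ' Python,', ' Numpy']
```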
Split multi-line string into a list (per line)
We can use the same split() method with the newline character '\n' as the separator. Blank lines can be skipped, and extra spaces around each line can be removed with strip(), or with lstrip() for leading spaces only:
text = """
Python is cool
Python is easy
Python is mighty
"""
lines = []
for line in text.split("\n"):
    if not line.strip():  # skip blank lines
        continue
    lines.append(line.strip())
print(lines)
the result is:
['Python is cool', 'Python is easy', 'Python is mighty']
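A more compact variant uses str.splitlines(). It also recognizes '\r\n' and other line boundaries, so it copes with Windows line endings. Combined with a list comprehension, it can filter and strip in one step:

```python
text = """
Python is cool
Python is easy
Python is mighty
"""

# splitlines() breaks on any line boundary; blank lines are filtered out
# and surrounding spaces are stripped from the rest.
lines = [line.strip() for line in text.splitlines() if line.strip()]
print(lines)  # ['Python is cool', 'Python is easy', 'Python is mighty']
```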
Split string dictionary into lists (map)
Let's say we have a string formatted like a dictionary, with one key => value pair per line, and we want to turn these pairs into lists or a map (dict). Here is a simple example:
dictionary = """\
key1 => value1
key2 => value2
key3 => value3
"""
mydict = {}
listKey = []
listValue = []
for line in dictionary.split("\n"):
    if not line.strip():  # skip blank lines
        continue
    # Split on the arrow and remove the spaces around key and value
    k, v = [word.strip() for word in line.split("=>")]
    mydict[k] = v
    listKey.append(k)
    listValue.append(v)
print(mydict)
print(listKey)
print(listValue)
the result is one dict and two lists (since Python 3.7, dicts keep insertion order):
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
['key1', 'key2', 'key3']
['value1', 'value2', 'value3']
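The same parsing can be written more compactly with a dict comprehension. This version passes maxsplit=1 to split(), so a value that itself contains "=>" does not break the unpacking. One difference from the loop above: if a key repeats, the dict keeps only the last value, so the key and value lists derived from it would be shorter than the loop's lists.

```python
dictionary = """\
key1 => value1
key2 => value2
key3 => value3
"""

# Split each non-blank line once on '=>' and strip both halves.
pairs = (line.split("=>", 1) for line in dictionary.splitlines() if line.strip())
mydict = {k.strip(): v.strip() for k, v in pairs}

keys = list(mydict)            # dict keys in insertion order
values = list(mydict.values())

print(mydict)  # {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
print(keys)    # ['key1', 'key2', 'key3']
print(values)  # ['value1', 'value2', 'value3']
```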
Python split string by first occurrence
If you need to split only at the first few separators instead of all of them, pass the maxsplit argument. In this example at most 3 splits are made, so the list has 4 items and the remainder of the string, separators included, stays in the last item:
text = "Python2, Python3, Python, Numpy, Python2, Python3, Python, Numpy"
data = text.split(", ", 3)  # maxsplit=3
for temp in data:
    print(temp)
the result is:
Python2
Python3
Python
Numpy, Python2, Python3, Python, Numpy
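For a split at only the first occurrence, set maxsplit to 1. The related methods rsplit() and partition() cover the mirror case (split at the last occurrence) and the "always give me three parts" case. When the separator is missing, partition() returns the whole string plus two empty strings. The sample strings below are illustrative:

```python
text = "Python2, Python3, Python, Numpy"

# maxsplit=1: split only at the first separator.
first = text.split(", ", 1)
print(first)    # ['Python2', 'Python3, Python, Numpy']

# rsplit() with maxsplit=1: split only at the last separator.
last = text.rsplit(", ", 1)
print(last)     # ['Python2, Python3, Python', 'Numpy']

# partition() always returns a 3-tuple: (before, separator, after).
head = text.partition(", ")
print(head)     # ('Python2', ', ', 'Python3, Python, Numpy')

# If the separator is not found, the tuple is (whole string, '', '').
missing = "Numpy".partition(", ")
print(missing)  # ('Numpy', '', '')
```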
Split string by consecutive separators (regex)
If you want several consecutive separators to count as one, you need the re module. Calling split() with an explicit separator does not collapse repeats:
str.split() vs re.split():
import re
print('Hello1111World'.split('1'))
print(re.split('1+', 'Hello1111World' ))
the result is:
['Hello', '', '', '', 'World']
['Hello', 'World']
This is useful when you want to collapse runs of repeated characters into a single separator. For whitespace alone, split() without an argument already does this.
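The same idea extends to several different separators at once. This minimal sketch uses made-up input that mixes commas, semicolons and spaces. One character class covers all of them, and the empty strings produced by leading or trailing separators are filtered out:

```python
import re

text = "Python2,  Python3;Python ;; Numpy"

# [,;\s]+ matches any run of commas, semicolons and whitespace,
# so each run is treated as a single separator.
parts = re.split(r"[,;\s]+", text)
print(parts)    # ['Python2', 'Python3', 'Python', 'Numpy']

# Separators at the start or end produce empty strings; drop them.
padded = re.split(r"[,;\s]+", " Python2, Numpy; ")
cleaned = [p for p in padded if p]
print(padded)   # ['', 'Python2', 'Numpy', '']
print(cleaned)  # ['Python2', 'Numpy']
```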