Columns can be split with Python and Pandas by:
- creating new dataframe from the results - you don't need to provide column names and types
- adding the results as columns to the old dataframe - you will need to provide headers for your columns
Both methods use pandas.Series.str.split:
Series.str.split(pat=None, n=-1, expand=False)
Split strings around given separator/delimiter.
Split each string in the caller’s values by given pattern, propagating NaN values. Equivalent to str.split().
import pandas as pd
df = pd.DataFrame({'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Samsung',
'LG', 'LG', 'LG', 'LG', 'LG', 'Sony', 'Sony', 'Sony',
'Sony', 'Sony'],
'Date': ['10/9/2018', '10/9/2018', '10/9/2018', '10/10/2018',
'10/10/2018', '10/10/2018', '10/9/2018', '10/9/2018',
'10/9/2018', '10/10/2018', '10/10/2018', '10/10/2018',
'10/10/2018', '10/10/2018', '10/10/2018'],
'Country': ['India', 'India', 'USA', 'France', 'India', 'India',
'Germany', 'USA', 'Brazil', 'Brazil', 'India', 'Germany',
'India', 'India', 'Brazil'],
'Sells': [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53]})
# creating new dataframe single column
new_df_no_expand = df['Date'].str.split('/')
# expand to new columns
new_df = df['Date'].str.split('/',expand=True)
# limit the split to first one
new_df_limit = df['Date'].str.split('/', n=1,expand=True)
# adding new columns to the old dataframe
df[['day', 'mm', 'year']] = df['Date'].str.split('/',expand=True)
a = 1
Result for new dataframe:
0 | 1 | 2 | |
---|---|---|---|
0 | 10 | 9 | 2018 |
1 | 10 | 9 | 2018 |
2 | 10 | 9 | 2018 |
3 | 10 | 10 | 2018 |
4 | 10 | 10 | 2018 |
5 | 10 | 10 | 2018 |
6 | 10 | 9 | 2018 |
7 | 10 | 9 | 2018 |
8 | 10 | 9 | 2018 |
9 | 10 | 10 | 2018 |
10 | 10 | 10 | 2018 |
11 | 10 | 10 | 2018 |
12 | 10 | 10 | 2018 |
13 | 10 | 10 | 2018 |
14 | 10 | 10 | 2018 |
if you want to add the result values as columns to the old dataframe:
Company | Date | Country | Sells day | mm | year | ||
---|---|---|---|---|---|---|---|
0 | Samsung | 10/9/2018 | India | 15 | 10 | 9 | 2018 |
1 | Samsung | 10/9/2018 | India | 81 | 10 | 9 | 2018 |
2 | Samsung | 10/9/2018 | USA | 29 | 10 | 9 | 2018 |
3 | Samsung | 10/10/2018 | France | 33 | 10 | 10 | 2018 |
4 | Samsung | 10/10/2018 | India | 21 | 10 | 10 | 2018 |
5 | LG | 10/10/2018 | India | 42 | 10 | 10 | 2018 |
6 | LG | 10/9/2018 | Germany | 67 | 10 | 9 | 2018 |
7 | LG | 10/9/2018 | USA | 35 | 10 | 9 | 2018 |
8 | LG | 10/9/2018 | Brazil | 2 | 10 | 9 | 2018 |
9 | LG | 10/10/2018 | Brazil | 34 | 10 | 10 | 2018 |
10 | Sony | 10/10/2018 | India | 21 | 10 | 10 | 2018 |
11 | Sony | 10/10/2018 | Germany | 50 | 10 | 10 | 2018 |
12 | Sony | 10/10/2018 | India | 10 | 10 | 10 | 2018 |
13 | Sony | 10/10/2018 | India | 26 | 10 | 10 | 2018 |
14 | Sony | 10/10/2018 | Brazil | 53 | 10 | 10 | 2018 |
In case of a missing data you will get None - example:
10/10/2018
10/10
will result into:
0 1 2
0 10 10 2018
1 10 10 None