Concatenating two columns of pandas dataframe is simple as concatenating strings in python.
In python you can do concatenation of two strings as follow:
conc_str = 'pyt' + 'hon'
print(conc_str)
result:
python
if you want to apply similar operation to pandas data frame by combining two and more columns you can use the following way:
import pandas as pd
df = pd.DataFrame({'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Samsung',
'LG', 'LG', 'LG', 'LG', 'LG', 'Sony', 'Sony', 'Sony',
'Sony', 'Sony'],
'Date': ['10/9/2018', '10/9/2018', '10/9/2018', '10/10/2018',
'10/10/2018', '10/10/2018', '10/9/2018', '10/9/2018',
'10/9/2018', '10/10/2018', '10/10/2018', '10/10/2018',
'10/10/2018', '10/10/2018', '10/10/2018'],
'Country': ['India', 'India', 'USA', 'France', 'India', 'India',
'Germany', 'USA', 'Brazil', 'Brazil', 'India', 'Germany',
'India', 'India', 'Brazil'],
'Sells': [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53]})
df['CountryCompany'] = df.Country + df.Company
print(df.CountryCompany[0:3])
This will create a new series/column in the dataframe and you can see the result below:
0 IndiaSamsung
1 IndiaSamsung
2 USASamsung
As you can see we are using the dot notation to get information from the new column. This is working only for columns without spaces. So the dot notation is not working with :
print(df.Country Company)
instead of this you need to use:
print(df['Country Company'])
You can do better formatting with concatenation for example adding separators like:
df['CountryDate'] = df.Company + ', ' + df.Date
print(df.CountryDate[0:3])
result:
Name: CountryCompany, dtype: object
0 Samsung, 10/9/2018
1 Samsung, 10/9/2018
2 Samsung, 10/9/2018
You should be careful for the data types of the concatenated columns because you can get error like:
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U21') dtype('<U21') dtype('<U21')
TypeError: Can't convert 'int' object to str implicitly
if you try to use column/series with integer type like:
df['CountrySells'] = df.Company + ', ' + df.Sells
in order to solve it you will need to convert the column to string by:
df['CountrySells'] = df.Company + ', ' + df.Sells.map(str)
print(df.CountrySells[0:3])
and the result will be:
0 Samsung, 15
1 Samsung, 81
2 Samsung, 29
Name: CountrySells, dtype: object
Note: If you do simple conversion for the concatenation like:
df['CountrySells'] = df.Company + ', ' + str(df.Sells)
print(df.CountrySells[0:3])
then you will get an unexpected result like:
0 Samsung, 0 15\n1 81\n2 29\n3 3...
1 Samsung, 0 15\n1 81\n2 29\n3 3...
2 Samsung, 0 15\n1 81\n2 29\n3 3...
Name: CountrySells, dtype: object
Concatenating more than two columns is simple too. For example creating a series by concatenating 3 columns is done my:
df['CountryDate'] = df.Company + ', ' + df.Date + ', ' + df.Country
print(df.CountryDate[0:3])
the result is:
0 Samsung, 10/9/2018, India
1 Samsung, 10/9/2018, India
2 Samsung, 10/9/2018, USA
Name: CountryDate, dtype: object
You have one more way of custom concatenation for pandas dataframes. And this solutuion is by applying lambda expressions to the columns for each row. You can see the example below:
concat = df.apply(lambda x:'%s, %s' % (x['Company'],x['Sells']),axis=1)
print(concat)
result:
0 Samsung, 15
1 Samsung, 81
2 Samsung, 29
Testing performance of the both methods
-
df.Company + df.Country - this method is a bit faster( check the bottom of the article)
-
df.Company.str.cat(df.Country) - 5.073 seconds
-
df.Company + df.Country - this method is a bit faster - 4.004 seconds