In this article you can find two examples how to use pandas and python with functions: group by and sum. You can see the example data below. This article describes how to group by and sum by two and more columns with pandas.
- table 1
Country | Company | Date | Sells | |
---|---|---|---|---|
0 | India | Samsung | 10/09/18 | 15 |
1 | India | Samsung | 10/09/18 | 81 |
2 | USA | Samsung | 10/09/18 | 29 |
3 | France | Samsung | 10/10/18 | 33 |
4 | India | Samsung | 10/10/18 | 21 |
5 | India | LG | 10/10/18 | 42 |
6 | Germany | LG | 10/09/18 | 67 |
7 | USA | LG | 10/09/18 | 35 |
8 | Brazil | LG | 10/09/18 | 2 |
9 | Tony | LG | 10/10/18 | 34 |
10 | India | Sony | 10/10/18 | 21 |
11 | Germany | Sony | 10/10/18 | 50 |
12 | India | Sony | 10/10/18 | 10 |
13 | India | Sony | 10/10/18 | 26 |
14 | Brazil | Sony | 10/10/18 | 53 |
table 2
Company | Date | Sells | Promo | |
---|---|---|---|---|
0 | Samsung | 08/09/2018 | 11 | 54 |
1 | Samsung | 10/09/2018 | 8 | 56 |
2 | Samsung | 09/09/2018 | 43 | 54 |
3 | Samsung | 04/09/2018 | 4 | 4 |
4 | Sony | 08/09/2018 | 45 | 3 |
5 | Sony | 10/09/2018 | 534 | 34 |
6 | Sony | 09/09/2018 | 42 | 32 |
7 | Sony | 04/09/2018 | 32 | 15 |
Now lets say that we want to get information about the sells per country in the first table using Python and pandas. The next example shows how this can be done:
import pandas as pd
df = pd.DataFrame({'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Samsung',
'LG', 'LG', 'LG', 'LG', 'LG', 'Sony', 'Sony', 'Sony',
'Sony', 'Sony'],
'Date': ['10/9/2018', '10/9/2018', '10/9/2018', '10/10/2018',
'10/10/2018', '10/10/2018', '10/9/2018', '10/9/2018',
'10/9/2018', '10/10/2018', '10/10/2018', '10/10/2018',
'10/10/2018', '10/10/2018', '10/10/2018'],
'Country': ['India', 'India', 'USA', 'France', 'India', 'India',
'Germany', 'USA', 'Brazil', 'Brazil', 'India', 'Germany',
'India', 'India', 'Brazil'],
'Sells': [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53]})
print(df.groupby(['Company', 'Country']).sum())
result:
Company | Country | Sells |
---|---|---|
LG | Brazil | 36 |
Germany | 67 | |
India | 42 | |
USA | 35 | |
Samsung | France | 33 |
India | 117 | |
USA | 29 | |
Sony | Brazil | 53 |
Germany | 50 | |
India | 57 |
now lets see how to get sells in a country per company. This can be done by:
print(df.groupby(['Country', 'Company']).sum())
the result is:
Sells | ||
---|---|---|
Country | Company | |
Brazil | LG | 36 |
Sony | 53 | |
France | Samsung | 33 |
Germany | LG | 67 |
Sony | 50 | |
India | LG | 42 |
Samsung | 117 | |
Sony | 57 | |
USA | LG | 35 |
Samsung | 29 |
Now lets have a look on the second table and get sells per date:
df = pd.DataFrame({'Date': ['08/09/2018', '10/09/2018', '09/09/2018', '04/09/2018', '08/09/2018', '10/09/2018',
'09/09/2018', '04/09/2018'],
'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Sony', 'Sony', 'Sony', 'Sony'],
'Promotions': [54, 56, 54, 4, 3, 34, 32, 15]})
print(df)
group = df['Promotions'].groupby(df['Date']).sum()
print(group)
result:
Date | |
---|---|
04/09/2018 | 19 |
08/09/2018 | 57 |
09/09/2018 | 86 |
10/09/2018 | 90 |
Name: Promotions, dtype: int64 |
You can have also new column as a total from your data:
import pandas as pd
df = pd.DataFrame({'Date': ['08/09/2018', '10/09/2018', '09/09/2018', '04/09/2018', '08/09/2018', '10/09/2018',
'09/09/2018', '04/09/2018'],
'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Sony', 'Sony', 'Sony', 'Sony'],
'Promotions': [54, 56, 54, 4, 3, 34, 32, 15],
'Sells': [23, 34, 54, 6, 96, 45, 3, 12]})
df['Total'] = df['Promotions'].groupby(df['Date']).transform('sum')
print(df)
result:
Company | Date | Promotions | Sells | Total | |
---|---|---|---|---|---|
0 | Samsung | 8/9/2018 | 54 | 23 | 57 |
1 | Samsung | 10/9/2018 | 56 | 34 | 90 |
2 | Samsung | 9/9/2018 | 54 | 54 | 86 |
3 | Samsung | 4/9/2018 | 4 | 6 | 19 |
4 | Sony | 8/9/2018 | 3 | 96 | 57 |
5 | Sony | 10/9/2018 | 34 | 45 | 90 |
6 | Sony | 9/9/2018 | 32 | 3 | 86 |
7 | Sony | 4/9/2018 | 15 | 12 | 19 |