Pandas Tutorial : How to split dataframe by string or date

In this article you will find 3 different examples about how to split a dataframe into new dataframes based on a column. The examples are:

How to split dataframe on a month basis
How to split dataframe per year
Split dataframe on a string column
References

Video tutorial

Pandas: How to split dataframe on a month basis

You can see the dataframe on the picture below. Initially the columns: "day", "mm", "year" don't exists. We are going to split the dataframe into several groups depending on the month. For that purpose we are splitting column date into day, month and year. After that we will group on the month column. Finally we are printing the output dataframes:

import pandas as pd

df = pd.DataFrame(
    {
        "Company": [
            "Samsung", "Samsung", "Samsung", "Samsung", "Samsung", "LG", "LG", "LG", "LG", "LG", "Sony", "Sony", "Sony",
            "Sony", "Sony",
        ],
        "Date": [
            "10/9/2015", "10/9/2015", "10/9/2017", "10/10/2017", "10/10/2017", "10/10/2018", "10/9/2018", "10/9/2018",
            "10/9/2018", "10/10/2016", "10/10/2016", "10/10/2016", "10/10/2019", "10/10/2019", "10/10/2019",
        ],
        "Country": [
            "India", "India", "USA", "France", "India", "India", "Germany", "USA", "Brazil", "Brazil", "India",
            "Germany", "India", "India", "Brazil",
        ],
        "Sells": [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53],
    }
)

df[["day", "mm", "year"]] = df["Date"].str.split("/", expand=True)
agg = df.groupby(['mm'])
for group in agg:
    print(group)

result:

Company	Date	Country	Sells day	mm	year
3	Samsung	10/10/2017	France	33	10	10	2017
4	Samsung	10/10/2017	India	21	10	10	2017
5	LG	10/10/2018	India	42	10	10	2018
9	LG	10/10/2016	Brazil	34	10	10	2016
10	Sony	10/10/2016	India	21	10	10	2016
11	Sony	10/10/2016	Germany	50	10	10	2016
12	Sony	10/10/2019	India	10	10	10	2019
13	Sony	10/10/2019	India	26	10	10	2019
14	Sony	10/10/2019	Brazil	53	10	10	2019
Company	Date	Country	Sells day mm	year
0	Samsung	10/9/2015	India	15	10	9	2015
1	Samsung	10/9/2015	India	81	10	9	2015
2	Samsung	10/9/2017	USA	29	10	9	2017
6	LG	10/9/2018	Germany	67	10	9	2018
7	LG	10/9/2018	USA	35	10	9	2018
8	LG	10/9/2018	Brazil	2	10	9	2018

Pandas: How to split dataframe per year

This time we will use different approach in order to achieve similar behavior. First we will use lambda in order to convert the string into date. Then we are extracting the periods. The final part is to group by the extracted years:

import dateutil
import pandas as pd

df = pd.DataFrame(
    {
        "Company": [
            "Samsung", "Samsung", "Samsung", "Samsung", "Samsung", "LG", "LG", "LG", "LG", "LG", "Sony", "Sony", "Sony",
            "Sony", "Sony",
        ],
        "Date": [
            "10/9/2015", "10/9/2015", "10/9/2017", "10/10/2017", "10/10/2017", "10/10/2018", "10/9/2018", "10/9/2018",
            "10/9/2018", "10/10/2016", "10/10/2016", "10/10/2016", "10/10/2019", "10/10/2019", "10/10/2019",
        ],
        "Country": [
            "India", "India", "USA", "France", "India", "India", "Germany", "USA", "Brazil", "Brazil", "India",
            "Germany", "India", "India", "Brazil",
        ],
        "Sells": [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53],
    }
)

df["Date"] = df["Date"].apply(lambda x: dateutil.parser.parse(x))
year = df["Date"].dt.to_period("Y")
agg = df.groupby([year])
for group in agg:
    print(group)

result:

Company	Date Country	Sells	day		mm	year
0	Samsung 2015-10-09	India	15	10	9	2015
1	Samsung 2015-10-09	India	81	10	9	2015

Company	Date	Country	Sells day	mm	year
9	LG 2016-10-10	Brazil	34	10	10	2016
10	Sony 2016-10-10	India	21	10	10	2016
11	Sony 2016-10-10	Germany	50	10	10	2016

Company	Date Country	Sells day	mm	year
2	Samsung 2017-10-09	USA	29	10	9	2017
3	Samsung 2017-10-10	France	33	10	10	2017
4	Samsung 2017-10-10	India	21	10	10	2017

Company	Date	Country	Sells day	mm	year
5	LG 2018-10-10	India	42	10	10	2018
6	LG 2018-10-09	Germany	67	10	9	2018
7	LG 2018-10-09	USA	35	10	9	2018
8	LG 2018-10-09	Brazil	2	10	9	2018

Company	Date Country	Sells day	mm	year
12	Sony 2019-10-10	India	10	10	10	2019
13	Sony 2019-10-10	India	26	10	10	2019
14	Sony 2019-10-10	Brazil	53	10	10	2019

Pandas: Split dataframe on a strign column

This time the dataframe is a different one. And we have records for two companies inside. If our goal is to split this data frame into new ones based on the companies then we can do:

import pandas as pd

df = pd.DataFrame(
    {
        "Date": [
            "08/09/2018", "10/09/2017", "09/09/2017", "04/09/2017", "08/09/2018", "10/09/2018", "09/09/2018",
            "04/09/2018",
        ],
        "Company": [
            "Samsung", "Samsung", "Samsung", "Samsung", "Sony", "Sony", "Sony", "Sony",
        ],
    }
)

group = df["Company"]
# group by that period
agg = df.groupby([group])
for year, group in agg:
    print(group)

result:

Date	Company
0	08/09/2018	Samsung
1	10/09/2017	Samsung
2	09/09/2017	Samsung
3	04/09/2017	Samsung
Date	Company
4	08/09/2018	Sony
5	10/09/2018	Sony
6	09/09/2018	Sony
7	04/09/2018	Sony

Pandas: How to split dataframe on a month basis

Pandas: How to split dataframe per year

Pandas: Split dataframe on a strign column

Reference