In this short guide, we will learn how to merge multiple DataFrames in Pandas using different techniques like merge(), concat(), and join(). Whether you're combining customer data from different sources, merging sales records, or joining employee information, Pandas provides powerful methods to combine DataFrames efficiently.
Understanding when to use merge vs concat vs join is crucial for data analysis, ETL pipelines, and data science projects. Each method serves different use cases depending on how your data is structured.
1. Merging Multiple DataFrames Using merge()
The merge() function works like SQL joins and is perfect when you need to combine DataFrames based on common columns. It supports inner, outer, left, and right joins for flexible data combination.
import pandas as pd
df1 = pd.DataFrame({
'id': [1, 2, 3],
'name': ['Emma', 'James', 'Oliver']
})
df2 = pd.DataFrame({
'id': [3, 4, 5],
'age': [25, 30, 35]
})
df3 = pd.DataFrame({
'id': [5, 6, 7],
'city': ['New York', 'Los Angeles', 'Chicago']
})
result = pd.merge(pd.merge(df1, df2, on='id', how='outer'), df3, on='id', how='outer')
print(result)
Output Result:
| | id | name | age | city |
|---|---|---|---|---|
| 0 | 1 | Emma | NaN | NaN |
| 1 | 2 | James | NaN | NaN |
| 2 | 3 | Oliver | 25.0 | NaN |
| 3 | 4 | NaN | 30.0 | NaN |
| 4 | 5 | NaN | 35.0 | New York |
| 5 | 6 | NaN | NaN | Los Angeles |
| 6 | 7 | NaN | NaN | Chicago |
How it works: The outer join keeps all rows from both DataFrames, filling missing values with NaN. The on='id' parameter specifies the key column for matching. This is essential when working with relational data or database-style joins.
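If you only want to keep the ids from df1, the same chain works with left joins. A minimal variation on the code above, continuing with df1, df2, and df3 as defined there:
# Left joins keep every row from df1 and only the matching rows from df2 and df3
left_result = pd.merge(pd.merge(df1, df2, on='id', how='left'), df3, on='id', how='left')
print(left_result)
# Only ids 1, 2, 3 remain; age is filled for id 3 only, and city stays NaN throughout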
2. Concatenating Multiple DataFrames with concat()
The concat() function is ideal for stacking DataFrames vertically (rows) or horizontally (columns). Unlike merge, it doesn't need a key column to match on; it simply stacks the DataFrames and aligns them by column name (or by index when axis=1).
import pandas as pd
df1 = pd.DataFrame({
'Name': ['Sarah', 'Michael'],
'Sales': [50000, 60000]
})
df2 = pd.DataFrame({
'Name': ['David', 'Lisa'],
'Revenue': [75000, 85000]
})
df3 = pd.DataFrame({
'Name': ['John', 'Anna'],
'Profit': [12000, 15000]
})
result = pd.concat([df1, df2, df3], axis=0, join='outer', ignore_index=True)
print(result)
Output Result:
| | Name | Sales | Revenue | Profit |
|---|---|---|---|---|
| 0 | Sarah | 50000.0 | NaN | NaN |
| 1 | Michael | 60000.0 | NaN | NaN |
| 2 | David | NaN | 75000.0 | NaN |
| 3 | Lisa | NaN | 85000.0 | NaN |
| 4 | John | NaN | NaN | 12000.0 |
| 5 | Anna | NaN | NaN | 15000.0 |
Key parameters explained:
- axis=0: Concatenates vertically (stacks rows). Use axis=1 for horizontal concatenation (see the sketch below)
- join='outer': Includes all columns from all DataFrames, filling missing with NaN
- ignore_index=True: Creates a new sequential index instead of keeping original indices
When to use: Perfect for combining monthly reports, quarterly data, or appending new records to existing datasets.
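For horizontal concatenation, here is a minimal sketch continuing with df1 and df2 from above; the set_index('Name') calls are only there to give both frames a shared, meaningful index and to avoid duplicate Name columns:
# axis=1 aligns rows on the index and places the columns side by side
wide = pd.concat([df1.set_index('Name'), df2.set_index('Name')], axis=1)
print(wide)
# Sarah and Michael only have Sales, David and Lisa only have Revenue,
# so the unmatched column is filled with NaN for each row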
3. Joining DataFrames Based on Index with join()
The join() method combines DataFrames based on their index values rather than column names. When your DataFrames already share a meaningful index, index-based joins are typically faster than column-based merges, and join() gives the most concise syntax for side-by-side combinations.
import pandas as pd
sales_df = pd.DataFrame({
'Q1_Sales': [100000, 150000, 200000],
'Q2_Sales': [120000, 160000, 210000]
}, index=['Product_A', 'Product_B', 'Product_C'])
costs_df = pd.DataFrame({
'Q1_Costs': [70000, 90000, 130000],
'Q2_Costs': [75000, 95000, 135000]
}, index=['Product_A', 'Product_B', 'Product_C'])
profit_df = pd.DataFrame({
'Q1_Profit': [30000, 60000, 70000],
'Q2_Profit': [45000, 65000, 75000]
}, index=['Product_A', 'Product_B', 'Product_C'])
result = sales_df.join([costs_df, profit_df])
print(result)
Output Result:
| | Q1_Sales | Q2_Sales | Q1_Costs | Q2_Costs | Q1_Profit | Q2_Profit |
|---|---|---|---|---|---|---|
| Product_A | 100000 | 120000 | 70000 | 75000 | 30000 | 45000 |
| Product_B | 150000 | 160000 | 90000 | 95000 | 60000 | 65000 |
| Product_C | 200000 | 210000 | 130000 | 135000 | 70000 | 75000 |
Why use join(): When your DataFrames are already indexed properly, join gives the cleanest syntax and takes advantage of index alignment. It's commonly used in time series analysis, financial data, and panel data where indices represent dates or identifiers.
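join() defaults to how='left'. As a small sketch of what that means, the extra_df frame below is hypothetical (not part of the example above) and is deliberately missing Product_C:
# Default left join keeps every row of sales_df; how='inner' keeps only shared index labels
extra_df = pd.DataFrame({'Returns': [5000, 8000]}, index=['Product_A', 'Product_B'])
print(sales_df.join(extra_df))               # Product_C gets NaN in Returns
print(sales_df.join(extra_df, how='inner'))  # Product_C is dropped entirely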
4. Real-World Example: Combining Customer Data
Here's a practical example combining customer information from multiple sources using all three methods to compare their behavior.
import pandas as pd
customers = pd.DataFrame({
'customer_id': [101, 102, 103],
'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown']
})
orders = pd.DataFrame({
'customer_id': [101, 102, 104],
'total_spent': [5000, 7500, 3000]
})
loyalty = pd.DataFrame({
'customer_id': [101, 103, 105],
'points': [2500, 1800, 900]
})
merged_data = pd.merge(customers, orders, on='customer_id', how='inner')
merged_data = pd.merge(merged_data, loyalty, on='customer_id', how='inner')
print("Merged Result (Inner Join - Only Matching Customers):")
print(merged_data)
print()
all_customers = pd.merge(customers, orders, on='customer_id', how='outer')
all_customers = pd.merge(all_customers, loyalty, on='customer_id', how='outer')
print("Merged Result (Outer Join - All Customers):")
print(all_customers)
Output Result:

Merged Result (Inner Join - Only Matching Customers):

| | customer_id | name | total_spent | points |
|---|---|---|---|---|
| 0 | 101 | Alice Johnson | 5000 | 2500 |

Merged Result (Outer Join - All Customers):

| | customer_id | name | total_spent | points |
|---|---|---|---|---|
| 0 | 101 | Alice Johnson | 5000.0 | 2500.0 |
| 1 | 102 | Bob Smith | 7500.0 | NaN |
| 2 | 103 | Charlie Brown | NaN | 1800.0 |
| 3 | 104 | NaN | 3000.0 | NaN |
| 4 | 105 | NaN | NaN | 900.0 |
Real-world insight: The inner join shows only customers present in all datasets (complete records), while the outer join reveals data gaps - customers without orders or loyalty points, which is valuable for data quality analysis.
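For this kind of data-quality check, merge() also accepts indicator=True, which adds a _merge column recording where each row came from ('both', 'left_only', or 'right_only'). A short sketch using the customers and orders frames from this example:
# indicator=True flags the origin of every row in the merged result
audit = pd.merge(customers, orders, on='customer_id', how='outer', indicator=True)
print(audit[audit['_merge'] != 'both'])  # customers without orders, and orders without a known customer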
Common Use Cases
CRM Data Integration: Merging customer profiles, purchase history, and support tickets
customers.merge(purchases, on='customer_id').merge(tickets, on='customer_id')
Financial Analysis: Combining quarterly revenue, expenses, and profit reports
pd.concat([q1_data, q2_data, q3_data, q4_data], ignore_index=True)
E-commerce Analytics: Joining product catalogs with inventory and sales data
products.join([inventory, sales_stats])
ETL Pipelines: Merging data from multiple database tables or API sources
Time Series Analysis: Concatenating daily, weekly, or monthly datasets
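As a sketch of the time series case, the daily frames below are hypothetical; the point is that concat() preserves a DatetimeIndex, so the combined result can be sorted and resampled afterwards:
import pandas as pd
# Hypothetical daily sales frames with a DatetimeIndex
day1 = pd.DataFrame({'sales': [100, 120]}, index=pd.to_datetime(['2024-01-01', '2024-01-02']))
day2 = pd.DataFrame({'sales': [90, 110]}, index=pd.to_datetime(['2024-01-03', '2024-01-04']))
combined = pd.concat([day1, day2]).sort_index()  # one continuous daily series
print(combined.resample('W').sum())              # e.g. roll the daily data up to weekly totals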
Join Types Quick Reference
| Join Type | Description | Use When |
|---|---|---|
| inner | Only matching rows | Need complete records |
| outer | All rows from both | Want to see all data |
| left | All from left, matching from right | Preserve main dataset |
| right | All from right, matching from left | Preserve secondary dataset |
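To see the four join types side by side, here is a small sketch reusing the customers (ids 101-103) and orders (ids 101, 102, 104) frames from the real-world example above:
# Compare how many rows each join type returns
for how in ['inner', 'outer', 'left', 'right']:
    joined = pd.merge(customers, orders, on='customer_id', how=how)
    print(f"{how}: {len(joined)} rows -> ids {sorted(joined['customer_id'])}")
# inner: 2 rows, outer: 4 rows, left: 3 rows, right: 3 rows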
Method Comparison
| Method | Best For | Key Parameter | Speed |
|---|---|---|---|
| merge() | Column-based joins | on='column' | Medium |
| concat() | Stacking DataFrames | axis=0 or 1 | Fast |
| join() | Index-based joins | Index matching | Fast (index-based) |
Choosing the Right Method
- Have common columns? → Use merge() with appropriate join type
- Stacking similar DataFrames? → Use concat() with axis parameter
- Already indexed data? → Use join() for best performance
- Multiple DataFrames to merge? → Use reduce() with merge() or multiple concat()
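A minimal sketch of the reduce() pattern mentioned above, continuing with df1, df2, and df3 from the first example:
from functools import reduce
# Fold an outer merge on 'id' across any number of DataFrames
frames = [df1, df2, df3]
merged = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), frames)
print(merged)  # same result as the nested pd.merge() calls in section 1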
For most database-style operations where you need to match records by ID or key, merge() is your best choice: it's the most flexible method, and an outer join handles missing data gracefully by keeping every record from both sides.