In this short guide, we will learn how to read CSV files with MultiIndex columns using pandas read_csv() function. Whether you're working with financial reports, time series data, or hierarchical datasets, properly parsing multi-level column headers is essential for data analysis with nested categories and structured data organization.
MultiIndex columns allow hierarchical organization of data, common in quarterly reports, regional data, and multi-dimensional datasets.
Problem: Parsing Multi-Index Headers from CSV
Standard CSV reading treats all header rows as a single level, losing hierarchical structure in files with multi-level column names.
Example
import pandas as pd
from io import StringIO
csv_data = """Company,,,Revenue,,
,Q1,Q2,Q3,Q1,Q2
Apple,100,150,200,50,75
Google,120,180,220,60,90"""
df = pd.read_csv(StringIO(csv_data), header=[0, 1])
| Company | Unnamed: 1_level_0 | Unnamed: 2_level_0 | Revenue | Unnamed: 4_level_0 | Unnamed: 5_level_0 | |
|---|---|---|---|---|---|---|
| Unnamed: 0_level_1 | Q1 | Q2 | Q3 | Q1 | Q2 | |
| 0 | Apple | 100 | 150 | 200 | 50 | 75 |
| 1 | 120 | 180 | 220 | 60 | 90 |
now saving the dataframe as csv and reading it again:
df.to_csv("data.csv", index=False)
df1 = pd.read_csv("data.csv")
df1
| Company | Unnamed: 1_level_0 | Unnamed: 2_level_0 | Revenue | Unnamed: 4_level_0 | Unnamed: 5_level_0 | |
|---|---|---|---|---|---|---|
| 0 | Unnamed: 0_level_1 | Q1 | Q2 | Q3 | Q1 | Q2 |
| 1 | Apple | 100 | 150 | 200 | 50 | 75 |
| 2 | 120 | 180 | 220 | 60 | 90 |
ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects
To solve it we need to use: df1 = pd.read_csv("data.csv", header=[0, 1])
Challenge: First row contains category headers (Company, Revenue), second row contains sub-headers (Name, Q1, Q2).
Solution: Use header Parameter for Multi-Level Columns
The solution is to use basic MultiIndex reading with header=[0, 1].
The header parameter with a list of row indices tells pandas to create MultiIndex columns from multiple header rows.
import pandas as pd
from io import StringIO
csv_data = """Company,Company,Revenue,Revenue
,Name,Q1,Q2
AAPL,Apple Inc,95000,105000
MSFT,Microsoft Corp,78000,82000
TSLA,Tesla Inc,65000,71000"""
df = pd.read_csv(StringIO(csv_data), header=[0, 1])
print("Column levels:", df.columns.nlevels)
print("\nDataFrame:")
print(df)
Output Result:
Column levels: 2
DataFrame:
Company Revenue
Name AAPL Q1 Q2
0 Apple Inc 95000 105000
1 Microsoft Corp 78000 82000
2 Tesla Inc 65000 71000
| Company | Revenue | |||
|---|---|---|---|---|
| Unnamed: 0_level_1 | Name | Q1 | Q2 | |
| 0 | AAPL | Apple Inc | 95000 | 105000 |
| 1 | MSFT | Microsoft Corp | 78000 | 82000 |
| 2 | TSLA | Tesla Inc | 65000 | 71000 |
How it works: The header=[0, 1] parameter instructs pandas to use rows 0 and 1 as column headers, creating a two-level MultiIndex automatically.
Known Bug: Empty Values in MultiIndex Headers
Pandas Issue #59560: The read_csv() function has inconsistent behavior when reading MultiIndex with empty header values.
Bug Description
When CSV files have empty cells in the first header row (representing column groups), pandas may:
- Create "Unnamed: X_level_Y" column names
- Lose hierarchical structure
- Produce inconsistent MultiIndex
Workaround for Empty Header Values
import pandas as pd
from io import StringIO
csv_data = """,,Revenue,Revenue
Company,Ticker,Q1,Q2
Apple,AAPL,95000,105000
Google,GOOGL,85000,92000"""
df = pd.read_csv(StringIO(csv_data), header=[0, 1])
df.columns = pd.MultiIndex.from_tuples([
('' if 'Unnamed' in str(c[0]) else c[0], c[1])
for c in df.columns
])
print(df)
Before
| Unnamed: 0_level_0 | Unnamed: 1_level_0 | Revenue | ||
|---|---|---|---|---|
| Company | Ticker | Q1 | Q2 | |
| 0 | Apple | AAPL | 95000 | 105000 |
| 1 | GOOGL | 85000 | 92000 | |
After Result:
| Revenue | ||||
|---|---|---|---|---|
| Company | Ticker | Q1 | Q2 | |
| 0 | Apple | AAPL | 95000 | 105000 |
| 1 | GOOGL | 85000 | 92000 | |
Fix: Manually replace "Unnamed" strings with empty strings to restore intended structure.
TLDR - How to Save and Load MultiIndex DataFrames as CSV
Write MultiIndex to CSV
my_dataframe.to_csv('output.csv')
The multiindex columns appear as regular columns (e.g., 'resolution', 'batch_size') in the CSV file.
Read CSV Back to MultiIndex
import pandas as pd
my_dataframe = pd.read_csv('output.csv', index_col=[0, 1])
Key parameter: index_col=[0, 1] - list of column numbers containing the multiindex levels (adjust based on your structure).