How to Read Multi-Index CSV Files in Pandas

In this short guide, we will learn how to read CSV files with MultiIndex columns using pandas read_csv() function. Whether you're working with financial reports, time series data, or hierarchical datasets, properly parsing multi-level column headers is essential for data analysis with nested categories and structured data organization.

MultiIndex columns allow hierarchical organization of data, common in quarterly reports, regional data, and multi-dimensional datasets.

Problem: Parsing Multi-Index Headers from CSV

Standard CSV reading treats all header rows as a single level, losing hierarchical structure in files with multi-level column names.

Example

import pandas as pd
from io import StringIO

csv_data = """Company,,,Revenue,,
,Q1,Q2,Q3,Q1,Q2
Apple,100,150,200,50,75
Google,120,180,220,60,90"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

	Company	Unnamed: 1_level_0	Unnamed: 2_level_0	Revenue	Unnamed: 4_level_0	Unnamed: 5_level_0
	Unnamed: 0_level_1	Q1	Q2	Q3	Q1	Q2
0	Apple	100	150	200	50	75
1	Google	120	180	220	60	90

now saving the dataframe as csv and reading it again:

df.to_csv("data.csv", index=False)


df1 = pd.read_csv("data.csv")

df1

	Company	Unnamed: 1_level_0	Unnamed: 2_level_0	Revenue	Unnamed: 4_level_0	Unnamed: 5_level_0
0	Unnamed: 0_level_1	Q1	Q2	Q3	Q1	Q2
1	Apple	100	150	200	50	75
2	Google	120	180	220	60	90

ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects

To solve it we need to use: df1 = pd.read_csv("data.csv", header=[0, 1])

Challenge: First row contains category headers (Company, Revenue), second row contains sub-headers (Name, Q1, Q2).

Solution: Use header Parameter for Multi-Level Columns

The solution is to use basic MultiIndex reading with header=[0, 1].

The header parameter with a list of row indices tells pandas to create MultiIndex columns from multiple header rows.

import pandas as pd
from io import StringIO

csv_data = """Company,Company,Revenue,Revenue
,Name,Q1,Q2
AAPL,Apple Inc,95000,105000
MSFT,Microsoft Corp,78000,82000
TSLA,Tesla Inc,65000,71000"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

print("Column levels:", df.columns.nlevels)
print("\nDataFrame:")
print(df)

Output Result:

Column levels: 2

DataFrame:
  Company                Revenue       
     Name         AAPL    Q1     Q2
0   Apple Inc    95000  105000
1   Microsoft Corp 78000   82000
2   Tesla Inc     65000   71000

	Company		Revenue
	Unnamed: 0_level_1	Name	Q1	Q2
0	AAPL	Apple Inc	95000	105000
1	MSFT	Microsoft Corp	78000	82000
2	TSLA	Tesla Inc	65000	71000

How it works: The header=[0, 1] parameter instructs pandas to use rows 0 and 1 as column headers, creating a two-level MultiIndex automatically.

Known Bug: Empty Values in MultiIndex Headers

Pandas Issue #59560: The read_csv() function has inconsistent behavior when reading MultiIndex with empty header values.

Bug Description

When CSV files have empty cells in the first header row (representing column groups), pandas may:

Create "Unnamed: X_level_Y" column names
Lose hierarchical structure
Produce inconsistent MultiIndex

Workaround for Empty Header Values

import pandas as pd
from io import StringIO

csv_data = """,,Revenue,Revenue
Company,Ticker,Q1,Q2
Apple,AAPL,95000,105000
Google,GOOGL,85000,92000"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

df.columns = pd.MultiIndex.from_tuples([
    ('' if 'Unnamed' in str(c[0]) else c[0], c[1]) 
    for c in df.columns
])

print(df)

Before

	Unnamed: 0_level_0	Unnamed: 1_level_0	Revenue
	Company	Ticker	Q1	Q2
0	Apple	AAPL	95000	105000
1	Google	GOOGL	85000	92000

After Result:

			Revenue
	Company	Ticker	Q1	Q2
0	Apple	AAPL	95000	105000
1	Google	GOOGL	85000	92000

Fix: Manually replace "Unnamed" strings with empty strings to restore intended structure.

TLDR - How to Save and Load MultiIndex DataFrames as CSV

Write MultiIndex to CSV

my_dataframe.to_csv('output.csv')

The multiindex columns appear as regular columns (e.g., 'resolution', 'batch_size') in the CSV file.

Read CSV Back to MultiIndex

import pandas as pd

my_dataframe = pd.read_csv('output.csv', index_col=[0, 1])

Key parameter: index_col=[0, 1] - list of column numbers containing the multiindex levels (adjust based on your structure).

> Python Basics

> Advanced Python Tutorials

> Python Errors

> Pandas Advanced

> Pandas Count

> Pandas Column

> Pandas Basics

> Pandas DataFrame

> Pandas Row

> User Interface

> Advanced Linux

> Troubleshoot

> Video & Sound

> Linux Commands

> MySQL

> SQL Basics

> Python

> DB apps

> JupyterLab

> Jupyter Tips

> Jupyter Display

> Regex in Text Editor

> Regex Basics

> Regex Match

> Regex Date

> PyCharm Advanced

> Git and PyCharm

> PyCharm Error

> PyCharm Tips

> Linux Mint Applications

> VIrtual Machine

> Miscellaneous

> Java

> Automation

> Windows

> Office

> Cheat Sheet