In this short guide, we will learn how to read CSV files with MultiIndex columns using pandas read_csv() function. Whether you're working with financial reports, time series data, or hierarchical datasets, properly parsing multi-level column headers is essential for data analysis with nested categories and structured data organization.

MultiIndex columns allow hierarchical organization of data, common in quarterly reports, regional data, and multi-dimensional datasets.

Problem: Parsing Multi-Index Headers from CSV

Standard CSV reading treats all header rows as a single level, losing hierarchical structure in files with multi-level column names.

Example

import pandas as pd
from io import StringIO

csv_data = """Company,,,Revenue,,
,Q1,Q2,Q3,Q1,Q2
Apple,100,150,200,50,75
Google,120,180,220,60,90"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

Company Unnamed: 1_level_0 Unnamed: 2_level_0 Revenue Unnamed: 4_level_0 Unnamed: 5_level_0
Unnamed: 0_level_1 Q1 Q2 Q3 Q1 Q2
0 Apple 100 150 200 50 75
1 Google 120 180 220 60 90

now saving the dataframe as csv and reading it again:

df.to_csv("data.csv", index=False)


df1 = pd.read_csv("data.csv")

df1
Company Unnamed: 1_level_0 Unnamed: 2_level_0 Revenue Unnamed: 4_level_0 Unnamed: 5_level_0
0 Unnamed: 0_level_1 Q1 Q2 Q3 Q1 Q2
1 Apple 100 150 200 50 75
2 Google 120 180 220 60 90

ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects

To solve it we need to use: df1 = pd.read_csv("data.csv", header=[0, 1])

Challenge: First row contains category headers (Company, Revenue), second row contains sub-headers (Name, Q1, Q2).

Solution: Use header Parameter for Multi-Level Columns

The solution is to use basic MultiIndex reading with header=[0, 1].

The header parameter with a list of row indices tells pandas to create MultiIndex columns from multiple header rows.

import pandas as pd
from io import StringIO

csv_data = """Company,Company,Revenue,Revenue
,Name,Q1,Q2
AAPL,Apple Inc,95000,105000
MSFT,Microsoft Corp,78000,82000
TSLA,Tesla Inc,65000,71000"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

print("Column levels:", df.columns.nlevels)
print("\nDataFrame:")
print(df)

Output Result:

Column levels: 2

DataFrame:
  Company                Revenue       
     Name         AAPL    Q1     Q2
0   Apple Inc    95000  105000
1   Microsoft Corp 78000   82000
2   Tesla Inc     65000   71000
Company Revenue
Unnamed: 0_level_1 Name Q1 Q2
0 AAPL Apple Inc 95000 105000
1 MSFT Microsoft Corp 78000 82000
2 TSLA Tesla Inc 65000 71000

How it works: The header=[0, 1] parameter instructs pandas to use rows 0 and 1 as column headers, creating a two-level MultiIndex automatically.

Known Bug: Empty Values in MultiIndex Headers

Pandas Issue #59560: The read_csv() function has inconsistent behavior when reading MultiIndex with empty header values.

Bug Description

When CSV files have empty cells in the first header row (representing column groups), pandas may:

  • Create "Unnamed: X_level_Y" column names
  • Lose hierarchical structure
  • Produce inconsistent MultiIndex

Workaround for Empty Header Values

import pandas as pd
from io import StringIO

csv_data = """,,Revenue,Revenue
Company,Ticker,Q1,Q2
Apple,AAPL,95000,105000
Google,GOOGL,85000,92000"""

df = pd.read_csv(StringIO(csv_data), header=[0, 1])

df.columns = pd.MultiIndex.from_tuples([
    ('' if 'Unnamed' in str(c[0]) else c[0], c[1]) 
    for c in df.columns
])

print(df)

Before

Unnamed: 0_level_0 Unnamed: 1_level_0 Revenue
Company Ticker Q1 Q2
0 Apple AAPL 95000 105000
1 Google GOOGL 85000 92000

After Result:

Revenue
Company Ticker Q1 Q2
0 Apple AAPL 95000 105000
1 Google GOOGL 85000 92000

Fix: Manually replace "Unnamed" strings with empty strings to restore intended structure.

TLDR - How to Save and Load MultiIndex DataFrames as CSV

Write MultiIndex to CSV

my_dataframe.to_csv('output.csv')

The multiindex columns appear as regular columns (e.g., 'resolution', 'batch_size') in the CSV file.

Read CSV Back to MultiIndex

import pandas as pd

my_dataframe = pd.read_csv('output.csv', index_col=[0, 1])

Key parameter: index_col=[0, 1] - list of column numbers containing the multiindex levels (adjust based on your structure).

Resources