How to Merge Multiple JSON Files with Python

In this quick article, we'll look at a few examples of how to merge multiple JSON files into one with Python. We will also answer these questions:

  • How to merge 2 JSON files in Python?
  • How to merge all JSON files in a directory?
  • How to merge JSON files by pattern?
  • How to keep a trace of the source file when merging multiple JSON files?

If you are interested in combining CSV files into a DataFrame, you can check this detailed article: How to merge multiple CSV files with Python

This image illustrates the process of merging two JSON files into a single one with Python:

Merge multiple JSON files into one

To merge multiple JSON files into one DataFrame, while keeping a trace of the source file, we can use:

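import glob
import json
import os

import pandas as pd

json_dir = 'data/json_files_dir'

# Collect every file in the folder that matches the pattern
json_pattern = os.path.join(json_dir, '*.json')
file_list = glob.glob(json_pattern)

dfs = []
for file in file_list:
    with open(file) as f:
        # Parse the file and flatten the records into a DataFrame
        json_data = pd.json_normalize(json.loads(f.read()))
    # Store the file name so each row can be traced back to its source
    json_data['site'] = file.rsplit("/", 1)[-1]
    dfs.append(json_data)
df = pd.concat(dfs)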

If you would like to learn more about this code and how to customize it, check the next steps:

Step 1: List multiple JSON files in a folder

Merging multiple files requires several Python libraries: pandas, glob, os and json.

Next we can see how to list JSON files in a folder with Python:

import pandas as pd
import glob, os, json


json_dir = 'data/json_files_dir'

json_pattern = os.path.join(json_dir, '*.json')
file_list = glob.glob(json_pattern)

This will result in a list of paths to the matching JSON files. The paths have the same form as json_dir, so a relative folder gives relative paths:

['data/json_files_dir/file1.json',
'data/json_files_dir/file2.json'
]

Note: for JSON Lines files you may need to change the matching pattern, for example to '*.jl'
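
As a quick check of the pattern matching, the sketch below builds a throwaway folder and lists only the files that match '*.json'. The folder and the file names are made up for illustration. Sorting the result is an optional extra, since glob.glob does not guarantee any particular order:

```python
import glob
import os
import tempfile

# Hypothetical demo folder: two JSON files and one unrelated file
demo_dir = tempfile.mkdtemp()
for name in ["file2.json", "file1.json", "notes.txt"]:
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write("{}")

json_pattern = os.path.join(demo_dir, "*.json")
# sorted() makes the order of the matched paths repeatable
file_list = sorted(glob.glob(json_pattern))

# Show only the file names; the .txt file is not matched
file_names = [os.path.basename(path) for path in file_list]
print(file_names)
```

Changing the pattern to '*.jl' in the same sketch would match JSON Lines files instead.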

Step 2: Read and merge multiple JSON files into a DataFrame

Next we are going to process all JSON files found in the previous step, one by one.

We read each file with f.read() and parse its content as JSON with json.loads.

Then we create a Pandas DataFrame from the parsed data with pd.json_normalize, which flattens nested objects into columns. Each DataFrame is appended to a list.

The last step is concatenating the list of DataFrames into a single one with pd.concat(dfs):

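dfs = []
for file in file_list:
    with open(file) as f:
        # Parse the file and flatten the records into a DataFrame
        json_data = pd.json_normalize(json.loads(f.read()))
    # Store the file name so each row can be traced back to its source
    json_data['site'] = file.rsplit("/", 1)[-1]
    dfs.append(json_data)
df = pd.concat(dfs)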
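
If the merged result should end up as a single JSON file again, one option is DataFrame.to_json. This is a minimal sketch: the small DataFrame stands in for the merged df, and the output file name merged.json is an assumption chosen for the example:

```python
import json
import os
import tempfile

import pandas as pd

# Stand-in for the merged DataFrame (values are made up)
df = pd.DataFrame({"id": [1, 2], "site": ["file1.json", "file2.json"]})

# Hypothetical output location inside a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "merged.json")

# orient="records" writes a JSON array with one object per row
df.to_json(out_path, orient="records")

# Read the file back to confirm that it is valid JSON
with open(out_path) as f:
    merged = json.load(f)
print(merged)
```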

If you would like to trace which file each record comes from, you can use a line like:

json_data['site'] = file.rsplit("/", 1)[-1]

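This keeps only the part of the path after the last "/", so:

'data/json_files_dir/file1.json'

will be kept as:

'file1.json'

On Windows, where glob can return paths that contain backslashes, os.path.basename(file) is a safer way to get the file name.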
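
To see the trace column in action on exactly two files, here is a self-contained sketch. The sample records and file names are invented for the demo, and it uses os.path.basename instead of rsplit to extract the file name, which is a swapped-in alternative rather than the line shown above:

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data: two small JSON files in a temporary folder
demo_dir = tempfile.mkdtemp()
samples = {
    "file1.json": [{"id": 1, "user": {"name": "Ann"}}],
    "file2.json": [{"id": 2, "user": {"name": "Bob"}}],
}
for name, records in samples.items():
    with open(os.path.join(demo_dir, name), "w") as f:
        json.dump(records, f)

dfs = []
for name in ["file1.json", "file2.json"]:
    path = os.path.join(demo_dir, name)
    with open(path) as f:
        # Nested objects become dotted columns such as "user.name"
        json_data = pd.json_normalize(json.loads(f.read()))
    # basename uses the path separator of the current operating system
    json_data["site"] = os.path.basename(path)
    dfs.append(json_data)

# ignore_index=True renumbers the rows of the merged DataFrame from 0
df = pd.concat(dfs, ignore_index=True)
print(df)
```

Each row of the merged DataFrame carries the name of the file it came from in the site column.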

Python merge JSON files - JSON Lines

Below you can find how to merge multiple JSON Lines files. What we need is to read each file with pd.read_json and add the parameter lines=True:

import glob
import os

import pandas as pd

json_dir = '/home/user/data'

json_pattern = os.path.join(json_dir, '*.json')
file_list = glob.glob(json_pattern)

dfs = []
for file in file_list:
    # lines=True reads one JSON object per line
    df_temp = pd.read_json(file, lines=True)
    # Store the file name so each row can be traced back to its source
    df_temp['source'] = file.rsplit("/", 1)[-1]
    dfs.append(df_temp)
df = pd.concat(dfs)
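
The same idea can be tried end to end without any existing data. The sketch below writes two small JSON Lines files into a temporary folder and merges them. The file names, the '.jl' extension and the records are assumptions made for the demo:

```python
import os
import tempfile

import pandas as pd

# Hypothetical JSON Lines files: one JSON object per line
demo_dir = tempfile.mkdtemp()
samples = {
    "a.jl": '{"id": 1, "title": "first"}\n{"id": 2, "title": "second"}\n',
    "b.jl": '{"id": 3, "title": "third"}\n',
}
for name, text in samples.items():
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(text)

dfs = []
for name in sorted(samples):
    path = os.path.join(demo_dir, name)
    # lines=True parses each line of the file as a separate record
    df_temp = pd.read_json(path, lines=True)
    df_temp["source"] = name
    dfs.append(df_temp)

# ignore_index=True renumbers the rows of the merged DataFrame from 0
df = pd.concat(dfs, ignore_index=True)
print(df)
```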

Summary

In this article we covered how to merge multiple JSON files into one in Python.

We also covered merging all JSON files in a directory by pattern. As a bonus, we saw how to load the data into a Pandas DataFrame and keep track of each source JSON file.