In this article, we'll see how to read and extract files from zip and tar.gz archives with Python. We will cover extracting a single file as well as multiple files from an archive.

If you are interested in parallel extraction from archives, then you can check: Python Parallel Processing Multiple Zipped JSON Files Into Pandas DataFrame

Step 1: Get info from Zip Or Tar.gz Archive with Python

First, we can inspect the contents of the zip file with this code snippet:

from zipfile import ZipFile

# Named "archive" so the variable does not shadow the zipfile module name
archive = 'file.zip'

z = ZipFile(archive)
z.infolist()

The result (only the last entry is shown):

 <ZipInfo filename='text1.txt' filemode='-rw-rw-r--' external_attr=0x8020 file_size=0>]

From this output we can read the names and sizes of the two files in the archive:

  • pandas-dataframe-background-color-based-condition-value-python.png
  • text1.txt
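The step title also mentions tar.gz archives, so here is a minimal sketch of reading the same kind of metadata from both formats. The helper names zip_summary and tar_summary and the demo archives are assumptions made for illustration; they are not part of the original example.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path


def zip_summary(path):
    """Return (name, uncompressed size in bytes) for every entry of a zip archive."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def tar_summary(path):
    """Return (name, size in bytes) for every member of a tar or tar.gz archive."""
    # Mode "r:*" lets tarfile detect the compression on its own.
    with tarfile.open(path, "r:*") as tf:
        return [(member.name, member.size) for member in tf.getmembers()]


# Build two tiny throwaway archives so the sketch runs without external files.
workdir = Path(tempfile.mkdtemp())
zip_path = workdir / "demo.zip"
tar_path = workdir / "demo.tar.gz"

with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("text1.txt", "hello")

with tarfile.open(tar_path, "w:gz") as tf:
    payload = b"hello"
    member = tarfile.TarInfo(name="text1.txt")
    member.size = len(payload)
    tf.addfile(member, io.BytesIO(payload))

zip_entries = zip_summary(zip_path)
tar_entries = tar_summary(tar_path)
print(zip_entries)
print(tar_entries)
```

Both helpers return plain tuples, which makes it easy to sort or filter the entries by size before deciding what to extract.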

Step 2: List and Read all files from Archive with Python

Next, we can collect the names of all files in the archive into a list:

from zipfile import ZipFile

archive = 'file.zip'

zip_file = ZipFile(archive)
[text_file.filename for text_file in zip_file.infolist()]

Result:

['pandas-dataframe-background-color-based-condition-value-python.png',
'text1.txt']
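If only the names are needed, ZipFile also provides the namelist() method, which should give the same list without the comprehension. The sketch below compares both approaches on a small demo archive; the archive and its file names are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path

# Create a small demo archive so the example is self-contained.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("text1.txt", "hello")
    zf.writestr("data/records.json", "[]")

with zipfile.ZipFile(archive) as zip_file:
    # Names taken from the ZipInfo objects, as in the snippet above.
    names_from_infolist = [info.filename for info in zip_file.infolist()]
    # Names returned directly by the archive.
    names_from_namelist = zip_file.namelist()

print(names_from_namelist)
```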

If you want to filter them (for example, keep only the .json files) and read the matching files as Pandas DataFrames, you can do:

import pandas as pd
from zipfile import ZipFile

archive = 'file.zip'

zip_file = ZipFile(archive)
# Map each .json file name to a DataFrame (assumes a layout read_json can parse)
dfs = {
    text_file.filename: pd.read_json(zip_file.open(text_file.filename))
    for text_file in zip_file.infolist()
    if text_file.filename.endswith('.json')
}
dfs
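When Pandas is not needed, the same filtering idea can be sketched with the standard json module. The helper name load_json_members and the demo archive contents are assumptions for illustration only.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def load_json_members(path):
    """Parse every .json member of a zip archive; return {member name: parsed data}."""
    parsed = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                # zf.open() returns a binary file object that json.load can read.
                with zf.open(name) as fh:
                    parsed[name] = json.load(fh)
    return parsed


# Demo archive with one JSON file and one file that should be skipped.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("a.json", json.dumps({"id": 1}))
    zf.writestr("notes.txt", "skip me")

parsed = load_json_members(archive)
print(parsed)
```

The files are read straight from the archive, so nothing is written to disk.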

Step 3: Extract files from zip archive With Python

The zipfile module from the Python standard library can be used to extract files from a zip archive. Basic usage is shown below:

import zipfile

archive = 'file.zip'
directory_to_extract_to = 'extracted'  # example output folder (an assumed name)

with zipfile.ZipFile(archive, 'r') as zip_file:
    zip_file.extractall(directory_to_extract_to)
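To extract several files but not the whole archive, extractall() accepts a members argument with the names to extract. The following is a minimal sketch; the helper name extract_matching and the demo files are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_matching(archive_path, target_dir, suffix):
    """Extract only the members whose names end with suffix; return their names."""
    with zipfile.ZipFile(archive_path) as zf:
        wanted = [name for name in zf.namelist() if name.endswith(suffix)]
        # members must be a subset of the names returned by namelist().
        zf.extractall(path=target_dir, members=wanted)
    return wanted


# Demo archive with two files; only the .txt file should be extracted.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("text1.txt", "hello")
    zf.writestr("image.png", b"not a real image")

out_dir = workdir / "out"
extracted = extract_matching(archive, out_dir, ".txt")
print(extracted)
```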

Step 4: Extract files from Tar/Tar.gz With Python

For tar and tar.gz files, we can use the code below to extract the files. It uses the tarfile module and checks the file extension in order to choose the proper read mode:

import tarfile

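archive = 'file.tar.gz'

# Pick the read mode from the file extension
if archive.endswith("tar.gz"):
    tar = tarfile.open(archive, "r:gz")
elif archive.endswith("tar"):
    tar = tarfile.open(archive, "r:")
else:
    raise ValueError("Not a tar or tar.gz archive: " + archive)

tar.extractall()
tar.close()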

Note: All files from the archive will be extracted into the current working directory of the script.
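To extract into another directory, extractall() takes a path argument. The sketch below also opens the archive with mode "r:*", which lets tarfile detect the compression itself, and passes filter="data" when the running Python version supports extraction filters. The helper name extract_tar and the demo archive are assumptions for illustration.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_tar(archive_path, target_dir):
    """Extract every member of a tar or tar.gz archive into target_dir."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # "r:*" asks tarfile to detect the compression on its own.
    with tarfile.open(archive_path, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: "data" rejects members
            # that would end up outside target_dir.
            tar.extractall(path=target_dir, filter="data")
        else:
            tar.extractall(path=target_dir)


# Build a small tar.gz archive so the sketch runs on its own.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo.tar.gz"
with tarfile.open(archive, "w:gz") as tf:
    payload = b"hello"
    member = tarfile.TarInfo(name="text1.txt")
    member.size = len(payload)
    tf.addfile(member, io.BytesIO(payload))

out_dir = workdir / "out"
extract_tar(archive, out_dir)
print(sorted(p.name for p in out_dir.iterdir()))
```

The with block closes the archive automatically, so no explicit close() call is needed.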

Step 5: Extract single file from Archive

If you want just a single file from the archive, you can use the extract method, for example: zipObject.extract(fileName, 'temp_py'). Basic usage is shown below:

import zipfile

archive = 'file.zip'

with zipfile.ZipFile(archive, 'r') as zip_file:
    zip_file.extract('text1.txt', '.')

In this example we extract the file 'text1.txt' into the current working directory. If you want to change the output directory, then you can change the second parameter, '.'.
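A single file can also be read straight into memory without writing it to disk. The sketch below shows extract() next to ZipFile.read(), plus TarFile.extractfile() as the tar counterpart. The demo archives and the 'temp_py' output folder are used here only for illustration.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path

# Build small zip and tar.gz archives so the example is self-contained.
workdir = Path(tempfile.mkdtemp())
zip_archive = workdir / "demo.zip"
tar_archive = workdir / "demo.tar.gz"

with zipfile.ZipFile(zip_archive, "w") as zf:
    zf.writestr("text1.txt", "hello")
    zf.writestr("other.txt", "ignored")

with tarfile.open(tar_archive, "w:gz") as tf:
    payload = b"hello from tar"
    member = tarfile.TarInfo(name="text1.txt")
    member.size = len(payload)
    tf.addfile(member, io.BytesIO(payload))

with zipfile.ZipFile(zip_archive) as zip_file:
    # extract() writes one member to disk and returns the path it created.
    written_path = zip_file.extract("text1.txt", workdir / "temp_py")
    # read() returns the member's bytes without touching the disk.
    zip_content = zip_file.read("text1.txt")

with tarfile.open(tar_archive, "r:gz") as tar:
    # extractfile() returns a file-like object for a regular file member.
    tar_content = tar.extractfile("text1.txt").read()

print(written_path)
print(zip_content, tar_content)
```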

Conclusion

In this tutorial, we covered how to extract single or multiple files from an archive with Python, using two standard-library modules: zipfile and tarfile.

You've also learned how to list and get info from archived files.