Searching in multiple tabs of excel file and extracting the found results in a text file is easy task with Python and Pandas. Python offers an easy way to read information from excel files like:
These are the method header:
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, **kwds)[source]
Find Results in all excel sheets and copy them
The code below reads excel file from the old format - .xls . Read all sheet names, Then read each sheet one by one and search for a string: default. If the string is found extract information from two columns from the sheet. Finally will pretty print the results. You may need to install two python modules in order to run it:
- tabulate 0.8.2 -
pip install tabulate
- pandas 0.23.4 -
pip install pandas
and the code is:
import pandas as pd
from tabulate import tabulate
pd.set_option('display.max_columns', None) # or 1000
pd.set_option('display.max_rows', None) # or 1000
pd.set_option('display.max_colwidth', -1) # or 199
xls = pd.ExcelFile('C:\\Users\\myuser\\funky.xls')
# how to convert a single sheet into dataframe
# df2 = pd.read_excel(xls, 'first sheet')
# read all excel sheets names
sheet_names = xls.sheet_names
my_sheet = ["first sheet", "second sheet", "third sheet", "forth sheet"]
# read all excel sheets and search in a column
for tab in mylist:
print('################################## ' + sheet + ' ##################################')
df = pd.read_excel(xls, tab)
# extract a single column
print(df['First Column'].to_string())
# extract a single column and search in it
print (tabulate(df[df['First Column'].str.contains('salary', na=False)]))
# search in a column and extract results from two columns
agg = df[df['Second Column'].str.contains('salary', na=False)]
print(tabulate(agg[['Second Column', 'Third Column']]))
agg = df[df['Second Column'].str.contains('bonus', na=False)]
print(tabulate(agg[['Second Column', 'Third Column']]))
How the code is working:
- first we import the required modules
- then we set pretty print options for pandas
- the excel files is read and converted to pandas dataframe
- we get all excel sheet names
- loop over all sheets (or the predefined ones)
- search in a excel column and output two columns
- finally we pretty print the found results from all excel sheets
You can modify this code to match your needs. For my needs I needed to work with excel file with 60+ tabs, search for several matches and then extract the information. With Python and Pandas everything was done in minutes and ready to be reused.
Store dataframe as excel file
Other examples for pandas and excel:
df_excel = pd.DataFrame([('John', 1),
('Doe', 2),
('Jovi', 3)],
columns=['Name', 'Id'])
Specify the index and the header options
pd.read_excel('tmp.xlsx', index_col=None, header=None)
Read Google Sheet data using the Python API
If you wonder is it possible to read excel files from Google docs - the answer is - yes - it's possible. You can find more information and example on this link here:
How to access Google Sheet data using the Python API and convert to Pandas dataframe