Do you need to search and replace a list of values in a big Excel file with many sheets?
If so, I'll show you the steps to search in Excel file - list of words and replace them. In this article you can find the exact cell(with packages like xlrd, xlwt and openpyxl) and partial cell match. At the end a new Excel file is generated with the replaced values.
Let's check the example data:
A | B | C |
---|---|---|
4321 | 3210 | 2100 |
1 | 0 | 0 |
2 | 1 | 0 |
3 | 2 | 1 |
4 | 3 | 2 |
we are going to search for 0 and 1 - and replace them with False and True.
A | B | C |
---|---|---|
4321 | 3210 | 2100 |
True | False | False |
2 | True | False |
3 | 2 | True |
4 | 3 | 2 |
You can check also this video:
Easily extract information from Excel with Python and Pandas
Python and Excel - search and replace with xlrd and xlwt
In this example we are going to use two Python packages: xlrd
and xlwt
.
Step 1: Read Excel file
The first package reads the Excel file and a given sheet by name or index:
import xlwt
import xlrd
# read Excel file and sheet by name
workbook = xlrd.open_workbook('/home/vanx/Documents/example1.xlsx')
sheet = workbook.sheet_by_name('Test')
sheet2 = workbook.sheet_by_index(2)
Step 2: Create new Excel file
The second package - xlwt
- will be used to write the data into new Excel file:
new_workbook = xlwt.Workbook()
new_sheet = new_workbook.add_sheet('Test')
Step 3: Search and replace a cell in xlsx file
The next step is to define replacement pairs like: {1:True, 0:False}
:
replacement = {1:True, 0:False}
Step 4: Search and replace a cell in xlsx file
Finally the code iterates over the rows and columns is controlled by:
ncols
- number of columns in the selected sheetnrows
- number of rows in the selected sheet
This is the final part of the code:
# iterate over the rows
for i in range(sheet.nrows):
print(i)
data = [sheet.cell_value(i, col) for col in range(sheet.ncols)]
for index, value in enumerate(data):
if value in replacement.keys():
new_sheet.write(i, index, str(replacement.get(value)))
else:
new_sheet.write(i, index, value)
new_workbook.save('example.xls')
where new_workbook.save('example.xlsx')
saves the data into file - example.xlsx
Python and Excel - search and replace with openpyxl
Another Python package exists and can be used - openpyxl
. It can do the same as the previous example. Difference is that only one module is used for reading and writing new Excel file:
import openpyxl
from openpyxl.utils.cell import get_column_letter
wb = openpyxl.load_workbook('/home/vanx/Documents/example1.xlsx')
wb.sheetnames
sheet = wb["Test"]
number_rows = sheet.max_row
number_columns = sheet.max_column
replacement = {'1':True, '0':False}
for i in range(number_columns):
for k in range(number_rows):
cell = str(sheet[get_column_letter(i+1)+str(k+1)].value)
for key in replacement.keys():
if str(cell) == key:
newCell = replacement.get(key)
sheet[get_column_letter(i+1)+str(k+1)] = str(newCell)
wb.save('example1.xlsx')
The code above reads file: example1.xlsx
from folder ~/Documents
and then produces a new Excel file in the current working directory. In the file there are several sheets - we are interested in the one named - Test
.
Getting the number of rows and columns is done by:
sheet.max_row
sheet.max_column
Finally we iterate and search for the values. In case of a match then we will replace the cell with the new values.
Python and Excel - partial match
The previous examples work with exact text matches for the cell value. If you need to perform partial match than you can try with Python code like:
if str(cell[0]) == key:
This will search only the first character of the cell if it's exactly the searched value.
Or if the cell contains the key:
if key in str(cell[0]):
Regex can be used as well. The problem for such partial matches is performance - it might take resources and time for large Excel files.