Do you need to search and replace a list of values in a big Excel file with many sheets?
If so, I'll show you the steps to search in Excel file - list of words and replace them. In this article you can find the exact cell(with packages like xlrd, xlwt and openpyxl) and partial cell match. At the end a new Excel file is generated with the replaced values.
Let's check the example data:
| A | B | C |
|---|---|---|
| 4321 | 3210 | 2100 |
| 1 | 0 | 0 |
| 2 | 1 | 0 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
we are going to search for 0 and 1 - and replace them with False and True.
| A | B | C |
|---|---|---|
| 4321 | 3210 | 2100 |
| True | False | False |
| 2 | True | False |
| 3 | 2 | True |
| 4 | 3 | 2 |
You can check also this video:
Easily extract information from Excel with Python and Pandas
Python and Excel - search and replace with xlrd and xlwt
In this example we are going to use two Python packages: xlrd and xlwt.
Step 1: Read Excel file
The first package reads the Excel file and a given sheet by name or index:
import xlwt
import xlrd
# read Excel file and sheet by name
workbook = xlrd.open_workbook('/home/vanx/Documents/example1.xlsx')
sheet = workbook.sheet_by_name('Test')
sheet2 = workbook.sheet_by_index(2)
Step 2: Create new Excel file
The second package - xlwt - will be used to write the data into new Excel file:
new_workbook = xlwt.Workbook()
new_sheet = new_workbook.add_sheet('Test')
Step 3: Search and replace a cell in xlsx file
The next step is to define replacement pairs like: {1:True, 0:False}:
replacement = {1:True, 0:False}
Step 4: Search and replace a cell in xlsx file
Finally the code iterates over the rows and columns is controlled by:
ncols- number of columns in the selected sheetnrows- number of rows in the selected sheet
This is the final part of the code:
# iterate over the rows
for i in range(sheet.nrows):
print(i)
data = [sheet.cell_value(i, col) for col in range(sheet.ncols)]
for index, value in enumerate(data):
if value in replacement.keys():
new_sheet.write(i, index, str(replacement.get(value)))
else:
new_sheet.write(i, index, value)
new_workbook.save('example.xls')
where new_workbook.save('example.xlsx') saves the data into file - example.xlsx
Python and Excel - search and replace with openpyxl
Another Python package exists and can be used - openpyxl. It can do the same as the previous example. Difference is that only one module is used for reading and writing new Excel file:
import openpyxl
from openpyxl.utils.cell import get_column_letter
wb = openpyxl.load_workbook('/home/vanx/Documents/example1.xlsx')
wb.sheetnames
sheet = wb["Test"]
number_rows = sheet.max_row
number_columns = sheet.max_column
replacement = {'1':True, '0':False}
for i in range(number_columns):
for k in range(number_rows):
cell = str(sheet[get_column_letter(i+1)+str(k+1)].value)
for key in replacement.keys():
if str(cell) == key:
newCell = replacement.get(key)
sheet[get_column_letter(i+1)+str(k+1)] = str(newCell)
wb.save('example1.xlsx')
The code above reads file: example1.xlsx from folder ~/Documents and then produces a new Excel file in the current working directory. In the file there are several sheets - we are interested in the one named - Test.
Getting the number of rows and columns is done by:
sheet.max_rowsheet.max_column
Finally we iterate and search for the values. In case of a match then we will replace the cell with the new values.
Python and Excel - partial match
The previous examples work with exact text matches for the cell value. If you need to perform partial match than you can try with Python code like:
if str(cell[0]) == key:
This will search only the first character of the cell if it's exactly the searched value.
Or if the cell contains the key:
if key in str(cell[0]):
Regex can be used as well. The problem for such partial matches is performance - it might take resources and time for large Excel files.