How to read and write Excel using Python- xlrd and xlwt
1. Introduction
In our daily work, we frequently use office software such as Word, Excel, PowerPoint, and PDF. However, we often encounter repetitive and tedious tasks, making manual operations highly inefficient. Therefore, automating these tasks with Python has become essential.
In this series of articles, I will provide a comprehensive overview of Python-based office automation, packed with practical insights!
2. Preparation
Common libraries for handling Excel files in Python include:
- xlrd / xlwt
- openpyxl
- Pandas
- xlsxwriter
- xlwings
- pywin32
Among these, xlrd and xlwt are two of the most widely used libraries.
- xlrd is used for reading Excel files.
- xlwt is used for writing data to Excel files.
To install these libraries, use the following commands:
pip3 install xlrd pip3 install xlwt
3. Reading Excel with xlrd
Use open_workbook(filepath)
from xlrd to open a local Excel file:
import xlrd # Open the file and return a workbook object wb = xlrd.open_workbook(file_path)
The workbook object’s nsheets
attribute retrieves the number of sheets, while sheet_names()
returns a list of all sheet names:
# Count the number of sheets sheets_num, sheets_names = wb.nsheets, wb.sheet_names() print('Total number of sheets:', sheets_num) print('Sheet names:', sheets_names)
There are two ways to select a specific sheet:
- By sheet name
- By index (starting from 0)
# Get a specific sheet by name or index sheet = wb.sheet_by_index(0) # sheet = wb.sheet_by_name('First Sheet') print(sheet)
Each sheet object has the following attributes:
name
: Sheet namenrows
: Number of rowsncols
: Number of columns
The methods row_values(index)
and col_values(index)
retrieve data from a specific row or column as a list:
# Get the sheet name, row count, and column count sheet_name, sheet_row_count, sheet_column_count = sheet.name, sheet.nrows, sheet.ncols print('Current sheet name:', sheet_name, ", Rows:", sheet_row_count, ", Columns:", sheet_column_count) # Get a specific row (index starts from 0) # Example: Get the second row row_datas = sheet.row_values(1) print('Second row data:', row_datas) # Get a specific column (index starts from 0) # Example: Get the second column column_datas = sheet.col_values(1) print('Second column data:', column_datas)
To retrieve a specific cell, use cell(row_index, column_index)
. Note that row and column indices start from 0.
In xlrd, cell data types are represented by ctype
with the following mappings:
- 0 — empty
- 1 — string
- 2 — number
- 3 — date
- 4 — boolean
- 5 — error
# Get data from a specific cell # Example: Get the cell at row 2, column 1 one_cell = sheet.cell(1, 0) # Cell value cell_value = one_cell.value print("Cell value:", cell_value) # Cell data type cell_type = one_cell.ctype print("Cell data type:", cell_type)
To retrieve all cell data in the sheet, iterate through all rows and columns:
# Get all cell values print('All data in the sheet:') for r in range(sheet.nrows): for i in range(sheet.ncols): print(sheet.cell(r, i).value)
4. Writing Excel with xlwt
To write data to Excel, xlwt is a convenient choice.
First, create a workbook object using Workbook()
. Then, add a sheet using add_sheet(sheetname)
:
import xlwt sheetname = 'First Sheet' # Create a workbook object wb = xlwt.Workbook() # Add a sheet by name sheet = wb.add_sheet(sheetname)
Next, use the write()
method of the sheet object to insert data into cells by specifying row and column indices:
# Write data to the sheet # Parameters: row index (from 0), column index (from 0), cell value # Write headers for index, title in enumerate(self.titles): sheet.write(0, index, title) # Write values for index_row, row_values in enumerate(self.values): for index_column, column_value in enumerate(row_values): sheet.write(index_row + 1, index_column, column_value)
Finally, call save(filepath)
to generate the Excel file locally:
# Save the file wb.save(filepath)
5. Advanced Techniques
Here are some commonly used advanced techniques:
- Get All Visible Sheets
When reading sheet data, it’s often necessary to filter out hidden sheets. A sheet is visible if itsvisibility
attribute is 0:
def get_all_visiable_sheets(self, wb): """ Get all visible sheets :param wb: :return: """ return list(filter(lambda item: item.visibility == 0, wb.sheets())) # Get all visible sheets sheet_visiable = self.get_all_visiable_sheets(wb) print('All visible sheets:', sheet_visiable)
- Get Visible Rows or Columns
Some rows or columns in a sheet may be hidden:
def get_all_visiable_rows(self, sheet): """ Get all visible rows in a sheet :param sheet: :return: """ result = [index for index in range(sheet.nrows) if sheet.rowinfo_map[index].hidden == 0] return result def get_all_visiable_columns(self, sheet): """ Get all visible columns in a sheet :param sheet: :return: """ result = [index for index in range(sheet.ncols) if sheet.colinfo_map[index].hidden == 0] return result
- Get Cell Styles
To retrieve font color and background color of a cell:
def get_cell_bg_color(self, wb, sheet, row_index, col_index): """ Get the background color of a cell :param wb: :param sheet: :param row_index: :param col_index: :return: """ xfx = sheet.cell_xf_index(row_index, col_index) xf = wb.xf_list[xfx] # Font color font_color = wb.font_list[xf.font_index].colour_index # Background color bg_color = xf.background.pattern_colour_index return font_color, bg_color
Note: When reading cell styles with xlrd, you must open the workbook with formatting_info=True
:
# Must set formatting_info=True to read cell properties wb = xlrd.open_workbook(file_path, formatting_info=True) sheet = wb.sheet_by_index(0)
6. Conclusion
Using xlrd and xlwt together can handle most tasks. For advanced features like copying, splitting, or filtering, consider using the xlutils library.
However, note that this combination has limited compatibility with .xlsx files. If you need to work with .xlsx files, convert them to .xls first.
Related articles