Python operation on Excel — How to use openpyxl
1. Introduction
In the previous article, we discussed using the combination of xlrd, xlwt, and xlutils to manipulate Excel files.
This article continues with another approach: openpyxl.
It must be said that openpyxl is more powerful!
It supports the .xlsx format spreadsheet files and also integrates with packages like NumPy and Pandas, enabling features such as chart creation.
2. Preparation
First, we need to install the dependency package.
bash
# Install the dependency package pip3 install openpyxl
3. Reading Data
Use load_workbook(filepath)
from openpyxl to load a local Excel file. It returns a Workbook object.
python
import openpyxl # Load a local Excel file wb = openpyxl.load_workbook(file_path)
Using the Workbook object, you can get all Sheet names and the list of Sheets.
python
def get_all_sheet_names(wb): """ Get the names of all sheets :param wb: Workbook object :return: List of sheet names """ # List of sheet names sheet_names = wb.sheetnames return sheet_names def get_all_sheets(wb): """ Get all sheet objects :param wb: Workbook object :return: List of sheet objects """ # List of sheet names sheet_names = get_all_sheet_names(wb) # All sheets sheets = [] for sheet_name in sheet_names: sheet = wb[sheet_name] sheets.append(sheet) return sheets
The Workbook object provides an active
attribute to quickly get the currently active Sheet.
python
def get_current_sheet(wb): """ Get the currently active sheet (defaults to the last sheet) :param wb: Workbook object :return: The active Sheet object """ # The currently active sheet current_sheet = wb.active return current_sheet
Alternatively, you can get a specific Sheet object by its name.
python
def get_sheet_by_name(wb, sheet_name): """ Find a specific sheet by its name :param wb: Workbook object :param sheet_name: Name of the sheet to find :return: Sheet object if found, else None """ sheet_names = get_all_sheet_names(wb) if sheet_name in sheet_names: result = wb[sheet_name] else: result = None return result
Use sheet.max_row
and sheet.max_column
to get the number of rows and columns in the current Sheet.
python
def get_row_and_column_num(sheet): """ Get the number of rows and columns in a sheet :param sheet: Sheet object :return: Tuple (row_count, column_count) """ # Number of rows row_count = sheet.max_row # Number of columns column_count = sheet.max_column return row_count, column_count # Example usage: Get row and column count row_count, column_count = get_row_and_column_num(sheet) print('Number of rows and columns:', row_count, column_count)
openpyxl provides two ways to locate a cell:
- Numeric Indexing (1-based)
- Uses row number and column number (e.g.,
row_index=1
,column_index=1
).
- Uses row number and column number (e.g.,
- String Indexing
- Uses column letter + row number (e.g.,
'A1'
for the cell at row 1, column 1).
- Uses column letter + row number (e.g.,
Additionally, openpyxl.utils
provides helper functions to convert between these column index formats.
python
from openpyxl.utils import get_column_letter, column_index_from_string def column_num_to_str(num): """ Convert Excel column number to letter(s) :param num: Column number (integer) :return: Column letter(s) (string) """ return get_column_letter(num) def column_str_to_num(str): """ Convert Excel column letter(s) to number :param str: Column letter(s) (string) :return: Column number (integer) """ return column_index_from_string(str)
Cells can be accessed using either of the two indexing methods mentioned above.
python
def get_cell(sheet, row_index, column_index): """ Get a cell object :param sheet: Sheet object :param row_index: Row number (integer, 1-based) :param column_index: Column number (integer, 1-based) :return: Cell object """ # Note: openpyxl uses 1-based indexing, unlike xlrd which is 0-based. # Get a specific cell (either method works) # Example: Get cell A1 (first row, first column) # cell_one = sheet['A1'] cell_one = sheet.cell(row=row_index, column=column_index) return cell_one
In daily Excel processing, you might need to check a cell’s data type. openpyxl doesn’t provide a direct method for this.
We can get the value using the cell’s value
attribute and then use isinstance
to check the type.
python
import datetime def get_cell_value_and_type(cell): """ Get the value and data type of a cell :param cell: Cell object :return: Tuple (cell_value, cell_type) """ # Cell value cell_value = cell.value # Cell data type cell_type = get_cell_value_type(cell_value) return cell_value, cell_type def get_cell_value_type(cell_value): """ Determine the data type of a cell value :param cell_value: The value from a cell :return: Integer representing data type """ # Type mapping: # 0: empty # 1: number # 2: string # 3: date # 4: other if cell_value is None: cell_type = 0 elif isinstance(cell_value, (int, float)): cell_type = 1 elif isinstance(cell_value, str): cell_type = 2 elif isinstance(cell_value, datetime.datetime): cell_type = 3 else: cell_type = 4 return cell_type
To get data from a specific row or column:
python
def get_row_cells_by_index(sheet, row_index): """ Get all cells in a specific row by row number :param sheet: Sheet object :param row_index: Row number (integer) :return: Tuple of Cell objects in the row """ # Note: Indexing starts at 1 row_cells = sheet[row_index] return row_cells def get_column_cells_by_index(sheet, column_index): """ Get all cells in a specific column by column number :param sheet: Sheet object :param column_index: Column number (integer) :return: Tuple of Cell objects in the column """ # Convert number to letter column_index_str = column_num_to_str(column_index) # Get cells in the column column_cells = sheet[column_index_str] return column_cells
Note: Getting row data requires a numeric index; getting column data requires a string index (letter).
Similar to Python list slicing, openpyxl supports using the :
symbol to get a range of rows or columns.
python
def get_rows_by_range(sheet, row_index_start, row_index_end): """ Select a range of rows by start and end row numbers Example: Select rows 2 to 4 (inclusive) :param sheet: Sheet object :param row_index_start: Start row number :param row_index_end: End row number :return: Tuple containing tuples of Cell objects for each row in the range """ rows_range = sheet[row_index_start:row_index_end] return rows_range def get_columns_by_range(sheet, column_index_start, column_index_end): """ Select a range of columns by start and end column numbers Example: Select columns 2 to 4 (inclusive) :param sheet: Sheet object :param column_index_start: Start column number :param column_index_end: End column number :return: Tuple containing tuples of Cell objects for each column in the range """ columns_range = sheet[column_num_to_str(column_index_start):column_num_to_str(column_index_end)] return columns_range
4. Writing Data
To write data to an Excel file:
First, create an Excel Workbook object using openpyxl.Workbook()
.
Then, create a new Sheet in the workbook using create_sheet()
.
python
# Create an Excel Workbook # Note: A new Excel file automatically gets a default sheet named 'Sheet' wb = openpyxl.Workbook() # Create a new sheet, inserted at the end by default # new_sheet = wb.create_sheet('New Sheet') # You can specify the insertion position using the index parameter # Example: Insert at the beginning (index 0) new_sheet = wb.create_sheet('New Sheet', 0)
By default, the new Sheet is appended at the end. The second parameter (index
) specifies the insertion position.
Sheet tab background color can also be modified using the sheet_properties.tabColor
property, assigning an RGB color value.
For example, to set a Sheet’s tab color to red, find the target Sheet and set the color value to ‘FF0000’.
python
def set_sheet_tab_color(sheet, rgb_value): """ Set the Sheet tab color :param sheet: Sheet object :param rgb_value: RGB color string (e.g., 'FF0000' for red) :return: None """ # Set the tab color (RRGGBB) sheet.sheet_properties.tabColor = rgb_value # Set the sheet tab color to red set_sheet_tab_color(new_sheet, 'FF0000')
openpyxl supports writing data to cells using either numeric indexing or string indexing.
python
def write_value_to_cell_with_num(sheet, row_index, column_index, value): """ Write data to a cell using row and column numbers :param sheet: Sheet object :param row_index: Row number (integer) :param column_index: Column number (integer) :param value: Value to write :return: None """ # Either method works: sheet.cell(row=row_index, column=column_index, value=value) # sheet.cell(row=row_index, column=column_index).value = value def write_value_to_cell_with_str(sheet, cell_address, value): """ Write data to a cell using cell address string (e.g., 'A1') :param sheet: Sheet object :param cell_address: Cell address string (e.g., 'A1') :param value: Value to write :return: None """ sheet[cell_address] = value
Inserting an image into a cell is straightforward using the add_image()
method.
It takes two parameters: the image object and the target cell’s string address.
For convenience, we can create two wrapper functions for inserting images.
python
from openpyxl.drawing.image import Image def insert_img_to_cell_with_num(sheet, image_path, row_index, column_index): """ Insert an image into a cell specified by row and column numbers :param sheet: Sheet object :param image_path: Path to the image file :param row_index: Row number :param column_index: Column number :return: None """ # Convert row/column numbers to cell address string cell_address = column_num_to_str(column_index) + str(row_index) insert_img_to_cell_with_str(sheet, image_path, cell_address) def insert_img_to_cell_with_str(sheet, image_path, cell_address): """ Insert an image into a cell specified by address string :param sheet: Sheet object :param image_path: Path to the image file :param cell_address: Cell address string (e.g., 'A1') :return: None """ img = Image(image_path) sheet.add_image(img, cell_address)
Finally, call the Workbook’s save()
method to physically write the data to the Excel file.
python
# Important: Must call save to persist changes to the file wb.template = False wb.save('new.xlsx')
5. Modifying Data
Modifying data includes changing cell values and modifying cell styles.
To modify cell data, read the Workbook, get the target Sheet, use the methods above to change cell values, and then call save()
to overwrite the file.
python
def modify_excel(file_path): """ Modify data in an existing local Excel file :param file_path: Path to the Excel file :return: None """ # Load the local Excel file wb = openpyxl.load_workbook(file_path) # Get a specific sheet sheet = wb['First Sheet'] # Use the actual sheet name print(sheet) # Directly modify a cell's value write_value_to_cell_with_num(sheet, 1, 1, 'Name1') # Save and overwrite the original file wb.save(file_path)
Cell styles include: font style, cell background, borders, alignment, etc.
Let’s take common styles like font and alignment as examples.
First, create a Font
object from openpyxl.styles
, specifying font name, size, bold, italic, color, underline, etc.
python
from openpyxl.styles import Font # Font formatting # Specify font type, size, bold, color, etc. font0 = Font(name='Calibri', size=20, bold=False, italic=False, vertAlign=None, underline='none', strike=False, color='FF00FF00') # Green color
Next, create an Alignment
object to specify the cell’s alignment.
python
from openpyxl.styles import Alignment # Cell alignment alignment0 = Alignment(horizontal='center', # Horizontal alignment: 'center', 'left', 'right' vertical='bottom', # Vertical alignment: 'bottom', 'top', 'center' text_rotation=0, wrap_text=False, shrink_to_fit=False, indent=0)
Finally, assign the Font and Alignment objects to the cell’s font
and alignment
attributes.
python
# Apply the styles (font, alignment) to cell A1 sheet['A1'].font = font0 sheet['A1'].alignment = alignment0
6. Advanced Usage
Let’s discuss some common advanced features.
1. Get Visible and Hidden Sheets
Check the Sheet object’s sheet_state
property to determine if it’s visible or hidden.
- Value
'visible'
means the Sheet is shown. - Value
'hidden'
means the Sheet is hidden.
python
def get_all_visible_sheets(wb): """ Get all visible sheets in the workbook :param wb: Workbook object :return: List of visible Sheet objects """ all_sheets = get_all_sheets(wb) # Assuming this function gets all sheets return [sheet for sheet in all_sheets if sheet.sheet_state == 'visible'] def get_all_hidden_sheets(wb): """ Get all hidden sheets in the workbook :param wb: Workbook object :return: List of hidden Sheet objects """ all_sheets = get_all_sheets(wb) return [sheet for sheet in all_sheets if sheet.sheet_state == 'hidden']
2. Get Lists of Hidden/Visible Row and Column Indices
For brevity, let’s take getting all visible/hidden row indices as an example.
Iterate through the Sheet’s row_dimensions
attribute. Check the hidden
property of each row dimension to determine if the row is hidden or visible.
python
def get_all_rows_index(sheet, get_hidden): """ Get all hidden or visible row indices :param sheet: Sheet object :param get_hidden: Boolean. True -> get hidden rows, False -> get visible rows :return: List of row indices (integers) """ # Iterate through rows hidden_indices = [] # Find indices of all hidden rows for row_index, row_dimension in sheet.row_dimensions.items(): if row_dimension.hidden: hidden_indices.append(row_index) # Get total number of rows (assuming get_row_and_column_num returns (rows, cols)) total_rows = get_row_and_column_num(sheet)[0] # Indices of all visible rows (1 to total_rows, excluding hidden ones) visible_indices = [index for index in range(1, total_rows + 1) if index not in hidden_indices] # Return hidden or visible indices based on parameter return hidden_indices if get_hidden else visible_indices
3. Get Cell Font Color and Cell Background Color
The cell’s font.color.rgb
and fill.fgColor.rgb
properties represent the font color and cell background color, respectively.
python
def get_cell_font_color(sheet, row_index, column_index): """ Get the font color of a cell :param sheet: Sheet object :param row_index: Row number :param column_index: Column number :return: RGB color string or None if not set """ cell = sheet.cell(row=row_index, column=column_index) if cell.font.color: # Check if font color is set return cell.font.color.rgb else: # Color might not exist if cell is empty or uses default return None def get_cell_bg_color(sheet, row_index, column_index): """ Get the background/fill color of a cell :param sheet: Sheet object :param row_index: Row number :param column_index: Column number :return: RGB color string or None if not set """ cell = sheet.cell(row=row_index, column=column_index) # Check if fill style and foreground color are set if cell.fill and cell.fill.fgColor: return cell.fill.fgColor.rgb else: return None
7. Conclusion
As we can see, compared to xlrd/xlwt, openpyxl provides a wealth of practical APIs, is more powerful, and fully supports the .xlsx format!
Related articles