Python operation on Excel — How to use openpyxl

1. Introduction

In the previous article, we discussed using the combination of xlrd, xlwt, and xlutils to manipulate Excel files.

This article continues with another approach: openpyxl.

It must be said that openpyxl is more powerful!

It supports the .xlsx format spreadsheet files and also integrates with packages like NumPy and Pandas, enabling features such as chart creation.

2. Preparation

First, we need to install the dependency package.

bash

# Install the dependency package
pip3 install openpyxl

3. Reading Data

Use load_workbook(filepath) from openpyxl to load a local Excel file. It returns a Workbook object.

python

import openpyxl

# Load a local Excel file
wb = openpyxl.load_workbook(file_path)

Using the Workbook object, you can get all Sheet names and the list of Sheets.

python

def get_all_sheet_names(wb):
    """
    Get the names of all sheets
    :param wb: Workbook object
    :return: List of sheet names
    """
    # List of sheet names
    sheet_names = wb.sheetnames
    return sheet_names

def get_all_sheets(wb):
    """
    Get all sheet objects
    :param wb: Workbook object
    :return: List of sheet objects
    """
    # List of sheet names
    sheet_names = get_all_sheet_names(wb)

    # All sheets
    sheets = []
    for sheet_name in sheet_names:
        sheet = wb[sheet_name]
        sheets.append(sheet)

    return sheets

The Workbook object provides an active attribute to quickly get the currently active Sheet.

python

def get_current_sheet(wb):
    """
    Get the currently active sheet (defaults to the last sheet)
    :param wb: Workbook object
    :return: The active Sheet object
    """
    # The currently active sheet
    current_sheet = wb.active
    return current_sheet

Alternatively, you can get a specific Sheet object by its name.

python

def get_sheet_by_name(wb, sheet_name):
    """
    Find a specific sheet by its name
    :param wb: Workbook object
    :param sheet_name: Name of the sheet to find
    :return: Sheet object if found, else None
    """
    sheet_names = get_all_sheet_names(wb)
    if sheet_name in sheet_names:
        result = wb[sheet_name]
    else:
        result = None
    return result

Use sheet.max_row and sheet.max_column to get the number of rows and columns in the current Sheet.

python

def get_row_and_column_num(sheet):
    """
    Get the number of rows and columns in a sheet
    :param sheet: Sheet object
    :return: Tuple (row_count, column_count)
    """
    # Number of rows
    row_count = sheet.max_row
    # Number of columns
    column_count = sheet.max_column

    return row_count, column_count

# Example usage: Get row and column count
row_count, column_count = get_row_and_column_num(sheet)
print('Number of rows and columns:', row_count, column_count)

openpyxl provides two ways to locate a cell:

  1. Numeric Indexing (1-based)
    • Uses row number and column number (e.g., row_index=1column_index=1).
  2. String Indexing
    • Uses column letter + row number (e.g., 'A1' for the cell at row 1, column 1).

Additionally, openpyxl.utils provides helper functions to convert between these column index formats.

python

from openpyxl.utils import get_column_letter, column_index_from_string

def column_num_to_str(num):
    """
    Convert Excel column number to letter(s)
    :param num: Column number (integer)
    :return: Column letter(s) (string)
    """
    return get_column_letter(num)

def column_str_to_num(str):
    """
    Convert Excel column letter(s) to number
    :param str: Column letter(s) (string)
    :return: Column number (integer)
    """
    return column_index_from_string(str)

Cells can be accessed using either of the two indexing methods mentioned above.

python

def get_cell(sheet, row_index, column_index):
    """
    Get a cell object
    :param sheet: Sheet object
    :param row_index: Row number (integer, 1-based)
    :param column_index: Column number (integer, 1-based)
    :return: Cell object
    """
    # Note: openpyxl uses 1-based indexing, unlike xlrd which is 0-based.
    # Get a specific cell (either method works)
    # Example: Get cell A1 (first row, first column)
    # cell_one = sheet['A1']
    cell_one = sheet.cell(row=row_index, column=column_index)
    return cell_one

In daily Excel processing, you might need to check a cell’s data type. openpyxl doesn’t provide a direct method for this.
We can get the value using the cell’s value attribute and then use isinstance to check the type.

python

import datetime

def get_cell_value_and_type(cell):
    """
    Get the value and data type of a cell
    :param cell: Cell object
    :return: Tuple (cell_value, cell_type)
    """
    # Cell value
    cell_value = cell.value
    # Cell data type
    cell_type = get_cell_value_type(cell_value)
    return cell_value, cell_type

def get_cell_value_type(cell_value):
    """
    Determine the data type of a cell value
    :param cell_value: The value from a cell
    :return: Integer representing data type
    """
    # Type mapping:
    # 0: empty
    # 1: number
    # 2: string
    # 3: date
    # 4: other
    if cell_value is None:
        cell_type = 0
    elif isinstance(cell_value, (int, float)):
        cell_type = 1
    elif isinstance(cell_value, str):
        cell_type = 2
    elif isinstance(cell_value, datetime.datetime):
        cell_type = 3
    else:
        cell_type = 4
    return cell_type

To get data from a specific row or column:

python

def get_row_cells_by_index(sheet, row_index):
    """
    Get all cells in a specific row by row number
    :param sheet: Sheet object
    :param row_index: Row number (integer)
    :return: Tuple of Cell objects in the row
    """
    # Note: Indexing starts at 1
    row_cells = sheet[row_index]
    return row_cells

def get_column_cells_by_index(sheet, column_index):
    """
    Get all cells in a specific column by column number
    :param sheet: Sheet object
    :param column_index: Column number (integer)
    :return: Tuple of Cell objects in the column
    """
    # Convert number to letter
    column_index_str = column_num_to_str(column_index)
    # Get cells in the column
    column_cells = sheet[column_index_str]
    return column_cells

Note: Getting row data requires a numeric index; getting column data requires a string index (letter).

Similar to Python list slicing, openpyxl supports using the : symbol to get a range of rows or columns.

python

def get_rows_by_range(sheet, row_index_start, row_index_end):
    """
    Select a range of rows by start and end row numbers
    Example: Select rows 2 to 4 (inclusive)
    :param sheet: Sheet object
    :param row_index_start: Start row number
    :param row_index_end: End row number
    :return: Tuple containing tuples of Cell objects for each row in the range
    """
    rows_range = sheet[row_index_start:row_index_end]
    return rows_range

def get_columns_by_range(sheet, column_index_start, column_index_end):
    """
    Select a range of columns by start and end column numbers
    Example: Select columns 2 to 4 (inclusive)
    :param sheet: Sheet object
    :param column_index_start: Start column number
    :param column_index_end: End column number
    :return: Tuple containing tuples of Cell objects for each column in the range
    """
    columns_range = sheet[column_num_to_str(column_index_start):column_num_to_str(column_index_end)]
    return columns_range

4. Writing Data

To write data to an Excel file:

First, create an Excel Workbook object using openpyxl.Workbook().

Then, create a new Sheet in the workbook using create_sheet().

python

# Create an Excel Workbook
# Note: A new Excel file automatically gets a default sheet named 'Sheet'
wb = openpyxl.Workbook()

# Create a new sheet, inserted at the end by default
# new_sheet = wb.create_sheet('New Sheet')
# You can specify the insertion position using the index parameter
# Example: Insert at the beginning (index 0)
new_sheet = wb.create_sheet('New Sheet', 0)

By default, the new Sheet is appended at the end. The second parameter (index) specifies the insertion position.

Sheet tab background color can also be modified using the sheet_properties.tabColor property, assigning an RGB color value.

For example, to set a Sheet’s tab color to red, find the target Sheet and set the color value to ‘FF0000’.

python

def set_sheet_tab_color(sheet, rgb_value):
    """
    Set the Sheet tab color
    :param sheet: Sheet object
    :param rgb_value: RGB color string (e.g., 'FF0000' for red)
    :return: None
    """
    # Set the tab color (RRGGBB)
    sheet.sheet_properties.tabColor = rgb_value

# Set the sheet tab color to red
set_sheet_tab_color(new_sheet, 'FF0000')

openpyxl supports writing data to cells using either numeric indexing or string indexing.

python

def write_value_to_cell_with_num(sheet, row_index, column_index, value):
    """
    Write data to a cell using row and column numbers
    :param sheet: Sheet object
    :param row_index: Row number (integer)
    :param column_index: Column number (integer)
    :param value: Value to write
    :return: None
    """
    # Either method works:
    sheet.cell(row=row_index, column=column_index, value=value)
    # sheet.cell(row=row_index, column=column_index).value = value

def write_value_to_cell_with_str(sheet, cell_address, value):
    """
    Write data to a cell using cell address string (e.g., 'A1')
    :param sheet: Sheet object
    :param cell_address: Cell address string (e.g., 'A1')
    :param value: Value to write
    :return: None
    """
    sheet[cell_address] = value

Inserting an image into a cell is straightforward using the add_image() method.

It takes two parameters: the image object and the target cell’s string address.

For convenience, we can create two wrapper functions for inserting images.

python

from openpyxl.drawing.image import Image

def insert_img_to_cell_with_num(sheet, image_path, row_index, column_index):
    """
    Insert an image into a cell specified by row and column numbers
    :param sheet: Sheet object
    :param image_path: Path to the image file
    :param row_index: Row number
    :param column_index: Column number
    :return: None
    """
    # Convert row/column numbers to cell address string
    cell_address = column_num_to_str(column_index) + str(row_index)
    insert_img_to_cell_with_str(sheet, image_path, cell_address)

def insert_img_to_cell_with_str(sheet, image_path, cell_address):
    """
    Insert an image into a cell specified by address string
    :param sheet: Sheet object
    :param image_path: Path to the image file
    :param cell_address: Cell address string (e.g., 'A1')
    :return: None
    """
    img = Image(image_path)
    sheet.add_image(img, cell_address)

Finally, call the Workbook’s save() method to physically write the data to the Excel file.

python

# Important: Must call save to persist changes to the file
wb.template = False
wb.save('new.xlsx')

5. Modifying Data

Modifying data includes changing cell values and modifying cell styles.

To modify cell data, read the Workbook, get the target Sheet, use the methods above to change cell values, and then call save() to overwrite the file.

python

def modify_excel(file_path):
    """
    Modify data in an existing local Excel file
    :param file_path: Path to the Excel file
    :return: None
    """
    # Load the local Excel file
    wb = openpyxl.load_workbook(file_path)

    # Get a specific sheet
    sheet = wb['First Sheet'] # Use the actual sheet name
    print(sheet)

    # Directly modify a cell's value
    write_value_to_cell_with_num(sheet, 1, 1, 'Name1')

    # Save and overwrite the original file
    wb.save(file_path)

Cell styles include: font style, cell background, borders, alignment, etc.

Let’s take common styles like font and alignment as examples.

First, create a Font object from openpyxl.styles, specifying font name, size, bold, italic, color, underline, etc.

python

from openpyxl.styles import Font

# Font formatting
# Specify font type, size, bold, color, etc.
font0 = Font(name='Calibri',
             size=20,
             bold=False,
             italic=False,
             vertAlign=None,
             underline='none',
             strike=False,
             color='FF00FF00') # Green color

Next, create an Alignment object to specify the cell’s alignment.

python

from openpyxl.styles import Alignment

# Cell alignment
alignment0 = Alignment(horizontal='center', # Horizontal alignment: 'center', 'left', 'right'
                       vertical='bottom',   # Vertical alignment: 'bottom', 'top', 'center'
                       text_rotation=0,
                       wrap_text=False,
                       shrink_to_fit=False,
                       indent=0)

Finally, assign the Font and Alignment objects to the cell’s font and alignment attributes.

python

# Apply the styles (font, alignment) to cell A1
sheet['A1'].font = font0
sheet['A1'].alignment = alignment0

6. Advanced Usage

Let’s discuss some common advanced features.

1. Get Visible and Hidden Sheets

Check the Sheet object’s sheet_state property to determine if it’s visible or hidden.

  • Value 'visible' means the Sheet is shown.
  • Value 'hidden' means the Sheet is hidden.

python

def get_all_visible_sheets(wb):
    """
    Get all visible sheets in the workbook
    :param wb: Workbook object
    :return: List of visible Sheet objects
    """
    all_sheets = get_all_sheets(wb) # Assuming this function gets all sheets
    return [sheet for sheet in all_sheets if sheet.sheet_state == 'visible']

def get_all_hidden_sheets(wb):
    """
    Get all hidden sheets in the workbook
    :param wb: Workbook object
    :return: List of hidden Sheet objects
    """
    all_sheets = get_all_sheets(wb)
    return [sheet for sheet in all_sheets if sheet.sheet_state == 'hidden']

2. Get Lists of Hidden/Visible Row and Column Indices

For brevity, let’s take getting all visible/hidden row indices as an example.

Iterate through the Sheet’s row_dimensions attribute. Check the hidden property of each row dimension to determine if the row is hidden or visible.

python

def get_all_rows_index(sheet, get_hidden):
    """
    Get all hidden or visible row indices
    :param sheet: Sheet object
    :param get_hidden: Boolean. True -> get hidden rows, False -> get visible rows
    :return: List of row indices (integers)
    """
    # Iterate through rows
    hidden_indices = []

    # Find indices of all hidden rows
    for row_index, row_dimension in sheet.row_dimensions.items():
        if row_dimension.hidden:
            hidden_indices.append(row_index)

    # Get total number of rows (assuming get_row_and_column_num returns (rows, cols))
    total_rows = get_row_and_column_num(sheet)[0]
    # Indices of all visible rows (1 to total_rows, excluding hidden ones)
    visible_indices = [index for index in range(1, total_rows + 1) if index not in hidden_indices]

    # Return hidden or visible indices based on parameter
    return hidden_indices if get_hidden else visible_indices

3. Get Cell Font Color and Cell Background Color

The cell’s font.color.rgb and fill.fgColor.rgb properties represent the font color and cell background color, respectively.

python

def get_cell_font_color(sheet, row_index, column_index):
    """
    Get the font color of a cell
    :param sheet: Sheet object
    :param row_index: Row number
    :param column_index: Column number
    :return: RGB color string or None if not set
    """
    cell = sheet.cell(row=row_index, column=column_index)
    if cell.font.color: # Check if font color is set
        return cell.font.color.rgb
    else:
        # Color might not exist if cell is empty or uses default
        return None

def get_cell_bg_color(sheet, row_index, column_index):
    """
    Get the background/fill color of a cell
    :param sheet: Sheet object
    :param row_index: Row number
    :param column_index: Column number
    :return: RGB color string or None if not set
    """
    cell = sheet.cell(row=row_index, column=column_index)
    # Check if fill style and foreground color are set
    if cell.fill and cell.fill.fgColor:
        return cell.fill.fgColor.rgb
    else:
        return None

7. Conclusion

As we can see, compared to xlrd/xlwt, openpyxl provides a wealth of practical APIs, is more powerful, and fully supports the .xlsx format!