Python Tutorial: How to Read Multiple Excel Formats and Perform Cross-Sheet Matching & Merging (Supports XLS/XLSX)

We have two Excel files: one is the main data sheet, and the other is a complete information sheet for students/employees/customers.
The goal is to match records by name and supplement the complete information into the main sheet.
It sounds simple, but several common pitfalls can occur in practice, such as:

The main sheet is in XLS format, while the information sheet is in XLSX format.
The headers in either sheet are not in the first row.
The first column of the information sheet is an index/sequence number, with the real data starting from the second column.
After merging, new fields need to be inserted at a specific position, not simply appended to the end.

This article shares a solution from a real project:
Using Python + Pandas + xlrd + openpyxl to implement:

Reading multiple Excel formats
Flexible header row handling
Matching and merging by name
Outputting results with a custom column order.

1. Requirement Breakdown

The final objectives are:

Read both XLS and XLSX files.
Support skipping header rows.
Support skipping the first column (commonly used for sequence numbers).
Perform matching based on the “Name” field.
Insert the matched results after a specified column (e.g., Column I) in the main sheet.
Output a new XLSX file.

To achieve this, we need to write a “Universal Excel Reading Function”.

2. Universal Excel Reading Function (Supports XLS / XLSX)

Core challenges:

Pandas cannot directly read old .xls format files with merged cells or special formats.
xlrd can only read .xls files (newer versions do not support .xlsx).

Therefore, the strategy is:

For .xls files → use xlrd.
For .xlsx files → use Pandas (with openpyxl engine).
Manually implement features like skipping rows and skipping the first column.

3. Overall Process for Cross-Sheet Matching

The overall data processing logic is as follows:

Read the main sheet (supports both XLS/XLSX).
Read the information sheet (e.g., where the real header starts from the second row).
Set the “Name” column as the index for the information sheet.
Iterate through each row in the main sheet, retrieving corresponding information based on the name.
Combine the matched results into a new DataFrame.
Insert this new DataFrame after the specified column in the main sheet.
Output the final merged sheet.

This process is robust and adaptable to various Excel formats.

4. Complete Example Code (Desensitized)

Here is the universal example code, which you can reuse directly:

python

import pandas as pd
import xlrd
import openpyxl

def read_excel_any(path, sheet_name=None, skip_header_rows=0, skip_first_column=False):
    """
    Universal function to read both XLS and XLSX files.
    - skip_header_rows: Skip the first N rows.
    - skip_first_column: Whether to skip the first column (e.g., index/sequence number).
    """
    if path.lower().endswith(".xls"):
        book = xlrd.open_workbook(path)
        sheet = book.sheet_by_name(sheet_name) if sheet_name else book.sheet_by_index(0)

        data = []
        for r in range(sheet.nrows):
            if r < skip_header_rows:
                continue
            row = sheet.row_values(r)
            if skip_first_column:
                row = row[1:]
            data.append(row)

        df = pd.DataFrame(data[1:], columns=data[0])
        return df

    else:  # XLSX
        df = pd.read_excel(
            path,
            sheet_name=sheet_name,
            engine="openpyxl",
            skiprows=skip_header_rows
        )
        if skip_first_column:
            df = df.iloc[:, 1:]
        return df


# ===== Configuration Area (Example) ===== #
main_file = "main_sheet.xls"
main_sheet = "DataSheet"

info_file = "info_sheet.xlsx"
info_sheet = "CompleteInfo"

info_skip_header = 1  # Skip 1 header row in info sheet
info_skip_first_col = True  # Skip the first column in info sheet
# ======================================== #

# Read the main sheet
df_main = read_excel_any(main_file, sheet_name=main_sheet)

# Read the information sheet
df_info = read_excel_any(
    info_file,
    sheet_name=info_sheet,
    skip_header_rows=info_skip_header,
    skip_first_column=info_skip_first_col
)

# Ensure both sheets have a "Name" column
if "Name" not in df_main.columns or "Name" not in df_info.columns:
    raise ValueError('Both sheets must contain a "Name" column!')

# Set the index of the information sheet to "Name"
df_info_index = df_info.set_index("Name")

# Perform matching by name
matched_data = []
for name in df_main["Name"]:
    if name in df_info_index.index:
        matched_data.append(df_info_index.loc[name].to_dict())
    else:
        # If the name doesn't exist, fill with None/blank
        matched_data.append({col: None for col in df_info.columns if col != "Name"})

df_match = pd.DataFrame(matched_data)

# Insert matched content at the specified position (Example: after column 8, i.e., index 8)
insert_pos = 8
cols_main = df_main.columns.tolist()
new_cols = cols_main[:insert_pos + 1] + df_match.columns.tolist() + cols_main[insert_pos + 1:]

df_out = pd.concat([df_main, df_match], axis=1)[new_cols]

df_out.to_excel("output_result.xlsx", index=False)

5. Explanation of Key Technical Points

① Significance of Encapsulating the Universal Reading Function
In many projects, the main sheet and information sheet have different formats, structures, and column layouts.
After encapsulating the universal function:

The script becomes highly reusable.
Switching Excel files only requires modifying the filenames.
The structure is clearer and easier to maintain.

② Using set_index for Matching
df_info.set_index("Name") creates an index on the “Name” column.
This makes lookups extremely efficient, similar to dictionary queries.

③ Maintaining the Original Field Order
Many business sheets are “fixed templates” where column order cannot be arbitrarily changed.
By customizing new_cols, we ensure the final column order is fully controllable.

6. Example of the Final Output

The final output Excel file will:

Retain the original column order of the main sheet.
Insert the fields from the information sheet after the specified column (e.g., Column I).
Perform row-by-row matching based on the “Name” field.
Fill with blanks if a name is not found in the information sheet.

The entire process is automated, eliminating the need for manual filtering, copying, and pasting.

Easy Python

Python Tutorial: How to Read Multiple Excel Formats and Perform Cross-Sheet Matching & Merging (Supports XLS/XLSX)

New Article

Related articles