Python Programming in Action: Building a Batch Excel Processing Tool (Desktop Utility Script)

With the growing demand for office automation, creating your own “batch processing tool” using Python can easily save you from a lot of repetitive tasks. Examples include:

Batch merging Excel files
Batch replacing content
Batch splitting data
Batch adding watermarks
Batch generating reports
Batch converting formats (CSV ↔ Excel)

In this section, we will build a versatile batch processing script to give you a practical understanding of how real-world projects are implemented.

1. Project Goal

Build a Python desktop script with the following functionalities:

Read all Excel files (.xls/.xlsx) in a specified folder.
Process each file (customizable based on your business needs).
Output the processed results to a new folder.
Have a well-structured design: extensible, maintainable, and reusable.

2. Main Libraries Used

We will use these tools:

python

import os
import pandas as pd
from openpyxl import load_workbook

Pandas is the core library, and openpyxl assists with advanced operations.

3. Project Structure Design

A simple directory structure is recommended:

text

excel_batch_tool/
│── main.py               # Main script
│── processors.py         # Contains various processing methods
│── input/                # Input folder
│── output/               # Output folder for results

Module separation is crucial in real projects, making future feature expansion significantly easier.

4. Core Function: Batch Traversing Excel Files

main.py

python

import os
import pandas as pd
from processors import process_excel

INPUT_DIR = "input"
OUTPUT_DIR = "output"

def ensure_output_folder():
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

def main():
    ensure_output_folder()

    for filename in os.listdir(INPUT_DIR):
        if filename.endswith((".xlsx", ".xls")):
            path = os.path.join(INPUT_DIR, filename)
            print(f"Processing: {filename}")

            df = pd.read_excel(path)

            # Call custom processing logic
            df = process_excel(df)

            # Output results
            output_path = os.path.join(OUTPUT_DIR, filename)
            df.to_excel(output_path, index=False)
            print(f"Saved to: {output_path}")

if __name__ == "__main__":
    main()

You can reuse this template repeatedly; just modify processors.py to add new functionalities.

5. Custom Processing Logic: processors.py

Example: Removing empty rows, standardizing column names, adding a new field.

python

def process_excel(df):
    # 1. Remove completely empty rows
    df = df.dropna(how="all")

    # 2. Standardize column name format
    df.columns = [str(c).strip() for c in df.columns]

    # 3. Add a timestamp column
    import datetime
    df["Processing_Time"] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    return df

In the future, you can replace this with:

Batch value replacement
Batch adding formulas
Batch merging multiple sheets
Batch splitting into multiple files
Exporting after conditional filtering
Statistical analysis and report generation

The possibilities for extension are entirely up to you.

6. Practical Example: Batch Replacing Content in a Column

For example, if you want to replace all occurrences of “Old_Name” with “New_Name” in the “Product_Name” column across all Excel files:

python

def process_excel(df):
    if "Product_Name" in df.columns:
        df["Product_Name"] = df["Product_Name"].replace("Old_Name", "New_Name")
    return df

7. Practical Example: Batch Merging All Excel Files

You can even merge the contents of all files into one:

In processors.py:

python

def process_excel(df):
    return df  # No processing, return as-is

Then, modify main.py:

python

all_data = []

for filename in os.listdir(INPUT_DIR):
    if filename.endswith((".xls", ".xlsx")):
        df = pd.read_excel(os.path.join(INPUT_DIR, filename))
        df["Source_File"] = filename # Add source file name
        all_data.append(df)

final_df = pd.concat(all_data, ignore_index=True)
final_df.to_excel("output/Merged_Result.xlsx", index=False)

This creates a commercially viable “Multi-File Merger Tool.”

8. Practical Example: Batch Splitting

Split a file into multiple Excel files based on a specific field:

python

def process_excel(df):
    for name, group in df.groupby("Category"):
        group.to_excel(f"output/{name}.xlsx", index=False)
    return None  # Indicates not to write the original combined DF

In main.py, decide whether to write the file based on whether process_excel returns a DataFrame.

9. Summary

Through this project, you will master:

Flexible use of pandas + openpyxl
How to structure a batch processing script
Practical application of functions and modularization
Methods for implementing office automation in real scenarios
How to develop Python tools into marketable small products (which you are already doing!)

By continuously expanding this script, it can evolve into:

A Merger Tool
A Splitter Tool
A Replacement Tool
A Field Extraction Tool
A Watermarking Tool
A Batch Format Conversion Tool

It holds significant potential for commercial value.

Easy Python

Python Programming in Action: Building a Batch Excel Processing Tool (Desktop Utility Script)

New Article

Related articles