Python Programming in Action: Building a Batch Excel Processing Tool (Desktop Utility Script)

david 28/11/2025

With the growing demand for office automation, creating your own “batch processing tool” using Python can easily save you from a lot of repetitive tasks. Examples include:

  • Batch merging Excel files
  • Batch replacing content
  • Batch splitting data
  • Batch adding watermarks
  • Batch generating reports
  • Batch converting formats (CSV ↔ Excel)

In this section, we will build a versatile batch processing script to give you a practical understanding of how real-world projects are implemented.


1. Project Goal

Build a Python desktop script with the following functionalities:

  1. Read all Excel files (.xls/.xlsx) in a specified folder.
  2. Process each file (customizable based on your business needs).
  3. Output the processed results to a new folder.
  4. Have a well-structured design: extensible, maintainable, and reusable.

2. Main Libraries Used

We will use these tools:

python

import os
import pandas as pd
from openpyxl import load_workbook

Pandas is the core library, and openpyxl assists with advanced operations.


3. Project Structure Design

A simple directory structure is recommended:

text

excel_batch_tool/
│── main.py               # Main script
│── processors.py         # Contains various processing methods
│── input/                # Input folder
│── output/               # Output folder for results

Module separation is crucial in real projects, making future feature expansion significantly easier.


4. Core Function: Batch Traversing Excel Files

main.py

python

import os
import pandas as pd
from processors import process_excel

INPUT_DIR = "input"
OUTPUT_DIR = "output"

def ensure_output_folder():
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

def main():
    ensure_output_folder()

    for filename in os.listdir(INPUT_DIR):
        if filename.endswith((".xlsx", ".xls")):
            path = os.path.join(INPUT_DIR, filename)
            print(f"Processing: {filename}")

            df = pd.read_excel(path)

            # Call custom processing logic
            df = process_excel(df)

            # Output results
            output_path = os.path.join(OUTPUT_DIR, filename)
            df.to_excel(output_path, index=False)
            print(f"Saved to: {output_path}")

if __name__ == "__main__":
    main()

You can reuse this template repeatedly; just modify processors.py to add new functionalities.


5. Custom Processing Logic: processors.py

Example: Removing empty rows, standardizing column names, adding a new field.

python

def process_excel(df):
    # 1. Remove completely empty rows
    df = df.dropna(how="all")

    # 2. Standardize column name format
    df.columns = [str(c).strip() for c in df.columns]

    # 3. Add a timestamp column
    import datetime
    df["Processing_Time"] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    return df

In the future, you can replace this with:

  • Batch value replacement
  • Batch adding formulas
  • Batch merging multiple sheets
  • Batch splitting into multiple files
  • Exporting after conditional filtering
  • Statistical analysis and report generation

The possibilities for extension are entirely up to you.


6. Practical Example: Batch Replacing Content in a Column

For example, if you want to replace all occurrences of “Old_Name” with “New_Name” in the “Product_Name” column across all Excel files:

python

def process_excel(df):
    if "Product_Name" in df.columns:
        df["Product_Name"] = df["Product_Name"].replace("Old_Name", "New_Name")
    return df

7. Practical Example: Batch Merging All Excel Files

You can even merge the contents of all files into one:

In processors.py:

python

def process_excel(df):
    return df  # No processing, return as-is

Then, modify main.py:

python

all_data = []

for filename in os.listdir(INPUT_DIR):
    if filename.endswith((".xls", ".xlsx")):
        df = pd.read_excel(os.path.join(INPUT_DIR, filename))
        df["Source_File"] = filename # Add source file name
        all_data.append(df)

final_df = pd.concat(all_data, ignore_index=True)
final_df.to_excel("output/Merged_Result.xlsx", index=False)

This creates a commercially viable “Multi-File Merger Tool.”


8. Practical Example: Batch Splitting

Split a file into multiple Excel files based on a specific field:

python

def process_excel(df):
    for name, group in df.groupby("Category"):
        group.to_excel(f"output/{name}.xlsx", index=False)
    return None  # Indicates not to write the original combined DF

In main.py, decide whether to write the file based on whether process_excel returns a DataFrame.


9. Summary

Through this project, you will master:

  • Flexible use of pandas + openpyxl
  • How to structure a batch processing script
  • Practical application of functions and modularization
  • Methods for implementing office automation in real scenarios
  • How to develop Python tools into marketable small products (which you are already doing!)

By continuously expanding this script, it can evolve into:

  • A Merger Tool
  • A Splitter Tool
  • A Replacement Tool
  • A Field Extraction Tool
  • A Watermarking Tool
  • A Batch Format Conversion Tool

It holds significant potential for commercial value.