With the growing demand for office automation, creating your own “batch processing tool” using Python can easily save you from a lot of repetitive tasks. Examples include:
- Batch merging Excel files
- Batch replacing content
- Batch splitting data
- Batch adding watermarks
- Batch generating reports
- Batch converting formats (CSV ↔ Excel)
In this section, we will build a versatile batch processing script to give you a practical understanding of how real-world projects are implemented.
1. Project Goal
Build a Python desktop script with the following functionalities:
- Read all Excel files (.xls/.xlsx) in a specified folder.
- Process each file (customizable based on your business needs).
- Output the processed results to a new folder.
- Have a well-structured design: extensible, maintainable, and reusable.
2. Main Libraries Used
We will use these tools:
python
import os import pandas as pd from openpyxl import load_workbook
Pandas is the core library, and openpyxl assists with advanced operations.
3. Project Structure Design
A simple directory structure is recommended:
text
excel_batch_tool/ │── main.py # Main script │── processors.py # Contains various processing methods │── input/ # Input folder │── output/ # Output folder for results
Module separation is crucial in real projects, making future feature expansion significantly easier.
4. Core Function: Batch Traversing Excel Files
main.py
python
import os
import pandas as pd
from processors import process_excel
INPUT_DIR = "input"
OUTPUT_DIR = "output"
def ensure_output_folder():
if not os.path.exists(OUTPUT_DIR):
os.makedirs(OUTPUT_DIR)
def main():
ensure_output_folder()
for filename in os.listdir(INPUT_DIR):
if filename.endswith((".xlsx", ".xls")):
path = os.path.join(INPUT_DIR, filename)
print(f"Processing: {filename}")
df = pd.read_excel(path)
# Call custom processing logic
df = process_excel(df)
# Output results
output_path = os.path.join(OUTPUT_DIR, filename)
df.to_excel(output_path, index=False)
print(f"Saved to: {output_path}")
if __name__ == "__main__":
main()
You can reuse this template repeatedly; just modify processors.py to add new functionalities.
5. Custom Processing Logic: processors.py
Example: Removing empty rows, standardizing column names, adding a new field.
python
def process_excel(df):
# 1. Remove completely empty rows
df = df.dropna(how="all")
# 2. Standardize column name format
df.columns = [str(c).strip() for c in df.columns]
# 3. Add a timestamp column
import datetime
df["Processing_Time"] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
return df
In the future, you can replace this with:
- Batch value replacement
- Batch adding formulas
- Batch merging multiple sheets
- Batch splitting into multiple files
- Exporting after conditional filtering
- Statistical analysis and report generation
The possibilities for extension are entirely up to you.
6. Practical Example: Batch Replacing Content in a Column
For example, if you want to replace all occurrences of “Old_Name” with “New_Name” in the “Product_Name” column across all Excel files:
python
def process_excel(df):
if "Product_Name" in df.columns:
df["Product_Name"] = df["Product_Name"].replace("Old_Name", "New_Name")
return df
7. Practical Example: Batch Merging All Excel Files
You can even merge the contents of all files into one:
In processors.py:
python
def process_excel(df):
return df # No processing, return as-is
Then, modify main.py:
python
all_data = []
for filename in os.listdir(INPUT_DIR):
if filename.endswith((".xls", ".xlsx")):
df = pd.read_excel(os.path.join(INPUT_DIR, filename))
df["Source_File"] = filename # Add source file name
all_data.append(df)
final_df = pd.concat(all_data, ignore_index=True)
final_df.to_excel("output/Merged_Result.xlsx", index=False)
This creates a commercially viable “Multi-File Merger Tool.”
8. Practical Example: Batch Splitting
Split a file into multiple Excel files based on a specific field:
python
def process_excel(df):
for name, group in df.groupby("Category"):
group.to_excel(f"output/{name}.xlsx", index=False)
return None # Indicates not to write the original combined DF
In main.py, decide whether to write the file based on whether process_excel returns a DataFrame.
9. Summary
Through this project, you will master:
- Flexible use of pandas + openpyxl
- How to structure a batch processing script
- Practical application of functions and modularization
- Methods for implementing office automation in real scenarios
- How to develop Python tools into marketable small products (which you are already doing!)
By continuously expanding this script, it can evolve into:
- A Merger Tool
- A Splitter Tool
- A Replacement Tool
- A Field Extraction Tool
- A Watermarking Tool
- A Batch Format Conversion Tool
It holds significant potential for commercial value.