How to Read and Write Word Documents with Python: Using python-docx

1. Introduction
In everyday office automation, Python can take over repetitive document work that would otherwise be done by hand.
In our previous series, we provided a comprehensive summary of Python operations with Excel.
Starting from this article, we’ll continue discussing another common document format: Word.

2. Preparation
The most common library for Python Word operations is python-docx.
Before starting, we need to install this dependency in a virtual environment:

shell

# Install dependency
pip3 install python-docx

3. Writing Practice
First, we need to understand how a Word document is structured, from the outermost object to the innermost:

  • Document
  • Section
  • Paragraph
  • Run

Commonly manipulated data types include: paragraphs, headings, lists, images, tables, and styles.

First, create a document object using Document(), equivalent to creating a blank document:

python

from docx import Document

# 1. Create a blank document
doc = Document()

Then, we can start writing data into the document.

Headings
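Use the document object’s add_heading(text, level) method to write headings.
The first parameter is the heading text; the second is the heading level, an integer from 0 to 9. Level 0 applies the Title style, and levels 1 to 9 apply the Heading 1 to Heading 9 styles: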

python

# 2.1 Headings
# Write a Level 1, Level 2, and Level 3 heading respectively
doc.add_heading('Level 1 Heading', 1)
doc.add_heading('Level 2 Heading', 2)
doc.add_heading('Level 3 Heading', 3)

Paragraphs
Paragraphs fall into three categories:

  • Normal paragraphs
  • Custom styled paragraphs
  • Quote paragraphs

By default, use the document object’s add_paragraph(text, style) method to add a paragraph.

Normal Paragraph: If the second parameter style is not provided, a normal paragraph is added.
Quote Paragraph: For quote paragraphs, simply set the paragraph style to Intense Quote.

python

# 2.2.1 Add a normal paragraph
doc.add_paragraph("I am a normal paragraph.")

# 2.2.3 Add a quote paragraph
# Simply specify the style as 'Intense Quote'
doc.add_paragraph('-- I am a quote paragraph --', style='Intense Quote')

Custom Styled Paragraphs: There are two implementation methods:

  1. Create an empty paragraph object and specify font styles when adding Run text blocks.
  2. Create a new style (or use an existing one) with the document object, then set it as the second parameter when adding a paragraph.

Considering style reusability, the second method is more practical.
The corresponding method is: document.styles.add_style(style_name, type)

Taking the second implementation as an example, we create a custom styled paragraph, setting the font name, size, color, bold, alignment, etc.
Note: The second parameter of add_style() specifies the style type. The values used in this article are:

  • 1: Paragraph style
  • 2: Character style
  • 3: Table style

python

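# WD_PARAGRAPH_ALIGNMENT supplies values for the align parameter
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.oxml.ns import qn
from docx.shared import Pt, RGBColor


def create_style(document, style_name, style_type, font_size=-1, font_color=None, font_name=None, align=None):
    """
    Create a style, or return the existing style with the same name
    :param document: Document object
    :param style_name: Style name
    :param style_type: Style type, 1: Paragraph, 2: Character, 3: Table
    :param font_size: Font size in points; -1 keeps the default size
    :param font_color: RGB color as a list of three integers, e.g. [0xff, 0x00, 0x00]
    :param font_name: Font name, also applied to East Asian text
    :param align: Paragraph alignment, e.g. WD_PARAGRAPH_ALIGNMENT.CENTER
    :return: The style object
    """
    # Important: Reuse the style if it already exists to avoid errors when re-adding
    style_names = [style.name for style in document.styles]
    if style_name in style_names:
        return document.styles[style_name]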

    font_style = document.styles.add_style(style_name, style_type)

    # Font size
    if font_size != -1:
        font_style.font.size = Pt(font_size)

    # Font color
    # e.g., [0xff, 0x00, 0x00]
    if font_color and len(font_color) == 3:
        font_style.font.color.rgb = RGBColor(font_color[0], font_color[1], font_color[2])

    # Alignment
    # Note: Only paragraphs and tables have alignment
    if style_type != 2 and align:
        font_style.paragraph_format.alignment = align
        # font_style.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
        # font_style.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.LEFT
        # font_style.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.RIGHT

    # Chinese font name
    if font_name:
        font_style.font.name = font_name
        font_style._element.rPr.rFonts.set(qn('w:eastAsia'), font_name)

    return font_style

Finally, when adding a paragraph, pass the created style as the second parameter.
The add_paragraph() method returns a paragraph object, which can also use the add_run(text, style) method to append a Run text block with a specified style.

python

# 1/ Paragraph style
style_paragraph = create_style(document=doc, style_name="style2", style_type=1, font_size=30,
                               font_color=[0xff, 0x00, 0x00])
# 2/ Character style
style_string = create_style(document=doc, style_name="style3", style_type=2, font_size=15,
                            font_color=[0x00, 0xff, 0x00])
# 3/ Table style
# Alignment: Center
style_table = create_style(document=doc, style_name="style4", style_type=3, font_size=25,
                           font_color=[0x00, 0x00, 0xff], align=WD_PARAGRAPH_ALIGNMENT.CENTER)

current_paragraph = doc.add_paragraph("I am a paragraph with a custom style (Method 2)!!!", style_paragraph)
# Character style
current_paragraph.add_run("【Part of the text in Paragraph 2】", style_string)

Lists
Ordered and unordered lists are commonly used in Word documents.
Similar to adding paragraphs, use the document object’s add_paragraph() method with different styles:

  • Ordered list: List Number
  • Unordered list: List Bullet

python

def add_list(document, data, isorder):
    """
    Add list data as an unordered or ordered list
    :param document: Document object
    :param data: List data
    :param isorder: Whether it's an ordered list
    :return:
    """
    # 'List Number' gives an ordered list, 'List Bullet' an unordered one
    style = 'List Number' if isorder else 'List Bullet'
    for item in data:
        document.add_paragraph(item, style=style)

# 2.3 Lists
# 2.3.1 Unordered list
add_list(doc, ["Unordered-Item1", "Unordered-Item2", "Unordered-Item3"], False)

# 2.3.2 Ordered list
add_list(doc, ["Ordered-Item1", "Ordered-Item2", "Ordered-Item3"], True)

Images
Use the method: add_picture(image, width, height)

  • The first parameter: image path or image stream (for web images)
  • The second and third parameters: set image width and height

Note: If neither width nor height is specified, the image's native size is used. If only one is set, the other is scaled to preserve the aspect ratio.
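
The sizing rule in the note can be written out as plain arithmetic. The helper below, scaled_size, is a hypothetical function for illustration only and is not part of python-docx; it assumes all values share one unit.

```python
def scaled_size(native_width, native_height, width=None, height=None):
    """Return (width, height) following the sizing rule described above.

    All values share one unit (pixels, inches, EMU, ...).
    """
    if width is None and height is None:
        # Nothing requested: keep the native size
        return native_width, native_height
    if height is None:
        # Only width given: derive height from the aspect ratio
        return width, round(native_height * width / native_width)
    if width is None:
        # Only height given: derive width from the aspect ratio
        return round(native_width * height / native_height), height
    # Both given: use them as-is, even if that distorts the image
    return width, height


print(scaled_size(400, 200))             # (400, 200)
print(scaled_size(400, 200, width=100))  # (100, 50)
print(scaled_size(400, 200, height=50))  # (100, 50)
```

Setting only one dimension is usually the safer choice, because the image cannot end up stretched.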

Local Image:

python

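from docx.shared import Inches


def add_local_image(doc, image_path, width=None, height=None):
    """
    Add a local image to the Word document
    :param doc: Document object
    :param image_path: Path to the image file
    :param width: Width in inches, or None
    :param height: Height in inches, or None
    :return:
    """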
    doc.add_picture(image_path, width=None if width is None else Inches(width),
                    height=None if height is None else Inches(height))

# 2.4.1 Insert local image
add_local_image(doc, './1.png', width=2)

Web Image: First, obtain the image byte stream from the URL, then pass it as the first parameter.

python

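from io import BytesIO
from urllib.request import urlopen


def get_image_data_from_network(url):
    """
    Get network image byte stream
    :param url: Image URL
    :return: BytesIO object holding the image bytes
    """
    # Download the image and wrap the bytes in a file-like object.
    # HTTPS certificates are verified by default; disabling verification
    # globally via ssl._create_unverified_context is unsafe outside of testing.
    with urlopen(url) as response:
        return BytesIO(response.read())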

def add_network_image(doc, image_url, width=None, height=None):
    """
    Add a network image to the Word document
    :param doc:
    :param image_url:
    :param width:
    :param height:
    :return:
    """
    # Get image stream
    image_data = get_image_data_from_network(image_url)
    doc.add_picture(image_data, width=None if width is None else Inches(width),
                    height=None if height is None else Inches(height))

# 2.4.2 Insert network image
url = 'Image_URL_Address'
add_network_image(doc, url, width=3)

Tables
Use the method: add_table(rows, cols, style=None)
Return value: Table object <class 'docx.table.Table'>

  • First parameter: number of rows
  • Second parameter: number of columns
  • Third parameter: table style

Use row/column indices to get a list of all cell objects in a specific row/column.

python

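# Add a table (2 rows and 3 columns are example values)
table = doc.add_table(rows=2, cols=3)

# Get all cell objects in a row/column via index
# List of all cell objects in the first row
head_cells = table.rows[0].cells
# List of all cell objects in the first column
first_column_cells = table.columns[0].cells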

Additionally, the table object can append a row with add_row() or a column with add_column(width); the width argument is required.

Example: Insert a table with specified headers and data:

python

def add_table(doc, head_datas, datas, style=None):
    """
    Add a new table
    :param doc:
    :param head_datas: Table headers
    :param datas: Data
    :param style:
    :return:
    """
    # Add a new table
    # Complete list of table styles: https://blog.csdn.net/ibiao/article/details/78595295
    # Default style: Table Grid
    table = doc.add_table(rows=1, cols=len(head_datas), style=("Table Grid" if style is None else style))

    # List of all cell objects in the first row
    head_cells = table.rows[0].cells

    # Write data to headers
    for index, head_item in enumerate(head_datas):
        head_cells[index].text = head_item

    # Traverse data and write to table
    for data in datas:
        # Add a row individually: add_row
        row_cells = table.add_row().cells
        for index, cell in enumerate(row_cells):
            cell.text = str(data[index])

# 2.5 Table
head_datas = ["Name", "Age", "Region"]
datas = (
  ('Zhang San', 18, 'Shenzhen'),
  ('Li Si', 28, 'Beijing'),
  ('Wang Wu', 33, 'Shanghai'),
  ('Sun Liu', 42, 'Guangzhou')
)

# Add a table and specify a style
# add_table(doc, head_datas, datas, style_table)
add_table(doc, head_datas, datas)

Note: The add_table() helper above falls back to the Table Grid style when no style is given. You can also use the create_style() function above with style type 3 to create a custom table style and pass it when inserting the table.