How to Read and Write Word Documents with Python: Using python-docx
1. Introduction
In daily office automation tasks, using Python can truly double your efficiency!
In a previous series, we gave a comprehensive summary of working with Excel in Python.
Starting with this article, we turn to another common document format: Word.
2. Preparation
The most common library for Python Word operations is python-docx.
Before starting, we need to install this dependency in a virtual environment:
```shell
# Install the dependency
pip3 install python-docx
```
3. Writing Practice
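First, it helps to understand how a Word document is structured:
- Document: the whole file
- Section: a page-layout unit with its own page size, orientation, and margins
- Paragraph: a block of body text, such as a heading or a normal paragraph
- Run: a span of text inside a paragraph that shares the same character formatting (font, size, color, bold, etc.)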
The element types you will work with most often are paragraphs, headings, lists, images, tables, and styles.
First, create a document object using Document(), which is equivalent to creating a blank document:
```python
from docx import Document

# 1. Create a blank document
doc = Document()
```
Then, we can start writing data into the document.
Headings
Use the document object's add_heading(text, level) method to write headings.
The first parameter is the heading text. The second is the heading level, from 1 to 9. Level 0 is a special case: it applies the Title style rather than a heading style.
```python
# 2.1 Headings
# Write a Level 1, Level 2, and Level 3 heading respectively
doc.add_heading('Level 1 Heading', 1)
doc.add_heading('Level 2 Heading', 2)
doc.add_heading('Level 3 Heading', 3)
```
Paragraphs
Paragraphs fall into three categories:
- Normal paragraphs
- Custom styled paragraphs
- Quote paragraphs
Use the document object's add_paragraph(text, style) method to add a paragraph.
Normal Paragraph: if the second parameter, style, is omitted, a normal paragraph is added.
Quote Paragraph: set the paragraph style to Intense Quote.
```python
# 2.2.1 Add a normal paragraph
doc.add_paragraph("I am a normal paragraph.")

# 2.2.3 Add a quote paragraph
# Simply specify the style as 'Intense Quote'
doc.add_paragraph('-- I am a quote paragraph --', style='Intense Quote')
```
Custom Styled Paragraphs: there are two ways to implement these:
- Create an empty paragraph object and specify font styles when adding Run text blocks.
- Create a new style (or use an existing one) with the document object, then set it as the second parameter when adding a paragraph.
Because a style can be reused, the second method is more practical.
The corresponding method is document.styles.add_style(style_name, style_type).
Taking the second approach as an example, the helper below creates a custom style and sets its font size, color, alignment, and font name.
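Note: The second parameter specifies the style type. The three values used here correspond to the WD_STYLE_TYPE enumeration in docx.enum.style:
- 1: Paragraph style (WD_STYLE_TYPE.PARAGRAPH)
- 2: Character style (WD_STYLE_TYPE.CHARACTER)
- 3: Table style (WD_STYLE_TYPE.TABLE)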
```python
from docx.oxml.ns import qn
from docx.shared import Pt, RGBColor


def create_style(document, style_name, style_type, font_size=-1, font_color=None, font_name=None, align=None):
    """
    Create a style
    :param document: Document object
    :param style_name: Style name
    :param style_type: Style type, 1: Paragraph, 2: Character, 3: Table
    :param font_size: Font size in points (-1 keeps the default)
    :param font_color: Font color as [r, g, b], e.g. [0xff, 0x00, 0x00]
    :param font_name: Font name (also applied to East Asian text)
    :param align: Alignment, e.g. WD_PARAGRAPH_ALIGNMENT.CENTER
    :return: The new style, or the existing style with the same name
    """
    if font_color is None:
        font_color = []

    # Important: check whether the style exists, because re-adding it raises an error
    style_names = [style.name for style in document.styles]
    if style_name in style_names:
        # Reuse the existing style instead of returning None
        return document.styles[style_name]

    font_style = document.styles.add_style(style_name, style_type)

    # Font size
    if font_size != -1:
        font_style.font.size = Pt(font_size)

    # Font color, e.g. [0xff, 0x00, 0x00]
    if font_color and len(font_color) == 3:
        font_style.font.color.rgb = RGBColor(font_color[0], font_color[1], font_color[2])

    # Alignment: only paragraph and table styles have it (character styles do not).
    # Compare with None, because WD_PARAGRAPH_ALIGNMENT.LEFT has the value 0
    if style_type != 2 and align is not None:
        font_style.paragraph_format.alignment = align
        # e.g. WD_PARAGRAPH_ALIGNMENT.CENTER / .LEFT / .RIGHT

    # Font name; setting w:eastAsia makes it apply to Chinese (East Asian) text too
    if font_name:
        font_style.font.name = font_name
        font_style._element.rPr.rFonts.set(qn('w:eastAsia'), font_name)

    return font_style
```
Finally, pass the created style as the second parameter when adding a paragraph.
The add_paragraph() method returns a paragraph object, whose add_run(text, style) method appends a Run text block with a specified style.
```python
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

# 1/ Paragraph style
style_paragraph = create_style(document=doc, style_name="style2", style_type=1, font_size=30, font_color=[0xff, 0x00, 0x00])

# 2/ Character style
style_string = create_style(document=doc, style_name="style3", style_type=2, font_size=15, font_color=[0x00, 0xff, 0x00])

# 3/ Table style
# Alignment: Center
style_table = create_style(document=doc, style_name="style4", style_type=3, font_size=25, font_color=[0x00, 0x00, 0xff], align=WD_PARAGRAPH_ALIGNMENT.CENTER)

# Paragraph with the custom paragraph style
current_paragraph = doc.add_paragraph("I am a paragraph with a custom style (Method 2)!!!", style_paragraph)

# Append a Run with the character style
current_paragraph.add_run("【Part of the text in Paragraph 2】", style_string)
```
Lists
Ordered and unordered lists are commonly used in Word documents.
Similar to adding paragraphs, use the document object's add_paragraph() method with different styles:
- Ordered list: List Number
- Unordered list: List Bullet
```python
def add_list(document, data, isorder):
    """
    Add list data as an unordered/ordered list
    :param document: Document object
    :param data: List data
    :param isorder: Whether it's an ordered list
    :return:
    """
    # Unordered list
    if not isorder:
        for item in data:
            document.add_paragraph(item, style='List Bullet')
    else:
        # Ordered list
        for item in data:
            document.add_paragraph(item, style='List Number')


# 2.3 Lists
# 2.3.1 Unordered list
add_list(doc, ["Unordered-Item1", "Unordered-Item2", "Unordered-Item3"], False)

# 2.3.2 Ordered list
add_list(doc, ["Ordered-Item1", "Ordered-Item2", "Ordered-Item3"], True)
```
Images
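Use the document object's add_picture(image, width, height) method:
- The first parameter: an image path, or an image stream such as a BytesIO object (used for web images)
- The second and third parameters: the image width and height, given as length objects such as Inches(2) or Cm(5)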
Note: If width/height are not specified, the native image size is used. If only one is set, it scales proportionally.
Local Image:
```python
from docx.shared import Inches


def add_local_image(doc, image_path, width=None, height=None):
    """
    Add a local image to the Word document
    :param doc: Document object
    :param image_path: Path to the image file
    :param width: Width in inches (optional)
    :param height: Height in inches (optional)
    :return:
    """
    doc.add_picture(image_path,
                    width=None if width is None else Inches(width),
                    height=None if height is None else Inches(height))


# 2.4.1 Insert local image
add_local_image(doc, './1.png', width=2)
```
Web Image: First, obtain the image byte stream from the URL, then pass it as the first parameter.
```python
import ssl
from io import BytesIO
from urllib.request import urlopen

from docx.shared import Inches


def get_image_data_from_network(url):
    """
    Get network image byte stream
    :param url: Image URL
    :return: BytesIO containing the image data
    """
    # Skip HTTPS certificate verification for this request only
    # (avoids certificate errors on some systems; use only with trusted URLs)
    context = ssl._create_unverified_context()
    # Get the byte stream of the network image
    with urlopen(url, context=context) as response:
        image_data = BytesIO(response.read())
    return image_data


def add_network_image(doc, image_url, width=None, height=None):
    """
    Add a network image to the Word document
    :param doc: Document object
    :param image_url: Image URL
    :param width: Width in inches (optional)
    :param height: Height in inches (optional)
    :return:
    """
    # Get image stream
    image_data = get_image_data_from_network(image_url)
    doc.add_picture(image_data,
                    width=None if width is None else Inches(width),
                    height=None if height is None else Inches(height))


# 2.4.2 Insert network image
url = 'Image_URL_Address'
add_network_image(doc, url, width=3)
```
Tables
Use the document object's add_table(rows, cols, style=None) method.
Return value: Table object <class 'docx.table.Table'>
- First parameter: number of rows
- Second parameter: number of columns
- Third parameter: table style
Use row/column indices to get a list of all cell objects in a specific row/column.
```python
# Add a table
table = doc.add_table(***)

# Get all cell objects in a row/column via index
# List of all cell objects in the first row
head_cells = table.rows[0].cells
```
Additionally, the table object's add_row() and add_column(width) methods append a row or a column. Note that add_column() requires the width of the new column, for example Inches(1).
Example: Insert a table with specified headers and data:
```python
def add_table(doc, head_datas, datas, style=None):
    """
    Add a new table
    :param doc: Document object
    :param head_datas: Table headers
    :param datas: Data rows
    :param style: Table style (name or style object); defaults to "Table Grid"
    :return:
    """
    # Add a new table
    # Complete list of table styles: https://blog.csdn.net/ibiao/article/details/78595295
    # Default style: Table Grid
    table = doc.add_table(rows=1, cols=len(head_datas), style=("Table Grid" if style is None else style))

    # List of all cell objects in the first row
    head_cells = table.rows[0].cells

    # Write data to headers
    for index, head_item in enumerate(head_datas):
        head_cells[index].text = head_item

    # Traverse data and write to table
    for data in datas:
        # Add a row individually: add_row
        row_cells = table.add_row().cells
        for index, cell in enumerate(row_cells):
            cell.text = str(data[index])


# 2.5 Table
head_datas = ["Name", "Age", "Region"]
datas = (
    ('Zhang San', 18, 'Shenzhen'),
    ('Li Si', 28, 'Beijing'),
    ('Wang Wu', 33, 'Shanghai'),
    ('Sun Liu', 42, 'Guangzhou')
)

# Add a table and specify a style
# add_table(doc, head_datas, datas, style_table)
add_table(doc, head_datas, datas)
```
Note: In the add_table() helper above, the default table style is Table Grid. You can also use the create_style() helper from the Paragraphs section to create a custom table style and pass it in when inserting the table.