How to Read and Write Word Documents with Python: Using python-docx
1. Introduction
In day-to-day office work, Python can take over much of the repetitive document handling and save a lot of time.
In our previous series, we provided a comprehensive summary of Python operations with Excel.
Starting from this article, we’ll continue discussing another common document format: Word.
2. Preparation
The most common library for Python Word operations is python-docx.
Before starting, we need to install this dependency in a virtual environment:
shell
# Install the dependency
pip3 install python-docx
3. Writing Practice
We need to understand the page structure of a Word document:
- Document
- Section
- Paragraph
- Run
Commonly manipulated data types include: paragraphs, headings, lists, images, tables, and styles.
First, create a document object using Document(), equivalent to creating a blank document:
python
from docx import Document

# 1. Create a blank document
doc = Document()
Then, we can start writing data into the document.
Headings
Use the document object’s add_heading(text, level) method to write headings.
The first parameter is the heading text; the second is the heading level. Level 0 applies the Title style, and levels 1 through 9 map to Heading 1 through Heading 9:
python
# 2.1 Headings
# Write a Level 1, Level 2, and Level 3 heading respectively
doc.add_heading('Level 1 Heading', 1)
doc.add_heading('Level 2 Heading', 2)
doc.add_heading('Level 3 Heading', 3)
Paragraphs
Paragraphs fall into three categories:
- Normal paragraphs
- Custom styled paragraphs
- Quote paragraphs
By default, use the document object’s add_paragraph(text, style) method to add a paragraph.
Normal Paragraph: If the second parameter style is not provided, a normal paragraph is added.
Quote Paragraph: For quote paragraphs, simply set the paragraph style to Intense Quote.
python
# 2.2.1 Add a normal paragraph
doc.add_paragraph("I am a normal paragraph.")
# 2.2.3 Add a quote paragraph
# Simply specify the style as 'Intense Quote'
doc.add_paragraph('-- I am a quote paragraph --', style='Intense Quote')
Custom Styled Paragraphs: There are two implementation methods:
- Create an empty paragraph object and specify font styles when adding Run text blocks.
- Create a new style (or use an existing one) with the document object, then set it as the second parameter when adding a paragraph.
Considering style reusability, the second method is more practical.
The corresponding method is: document.styles.add_style(style_name, type)
Taking the second implementation as an example, we create a custom styled paragraph, setting the font name, size, color, bold, alignment, etc.
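Note: The second parameter specifies the style type. Three values are commonly used, matching WD_STYLE_TYPE.PARAGRAPH, WD_STYLE_TYPE.CHARACTER, and WD_STYLE_TYPE.TABLE in docx.enum.style: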
- 1: Paragraph style
- 2: Character style
- 3: Table style
python
from docx.oxml.ns import qn
from docx.shared import Pt, RGBColor


def create_style(document, style_name, style_type, font_size=-1, font_color=None, font_name=None, align=None):
    """
    Create a style, or return the existing style with the same name
    :param document: Document object
    :param style_name: Style name
    :param style_type: Style type, 1: Paragraph, 2: Character, 3: Table
    :param font_size: Font size in points; -1 leaves the size unset
    :param font_color: RGB color as three ints, e.g., [0xff, 0x00, 0x00]
    :param font_name: Font name, also applied to East Asian text
    :param align: Alignment, e.g., WD_PARAGRAPH_ALIGNMENT.CENTER (ignored for character styles)
    :return: The style object
    """
    # Important: check whether the style exists to avoid an error when re-adding it
    style_names = [style.name for style in document.styles]
    if style_name in style_names:
        # Return the existing style so the caller always gets a usable object
        return document.styles[style_name]

    font_style = document.styles.add_style(style_name, style_type)

    # Font size
    if font_size != -1:
        font_style.font.size = Pt(font_size)

    # Font color, e.g., [0xff, 0x00, 0x00]
    if font_color and len(font_color) == 3:
        font_style.font.color.rgb = RGBColor(font_color[0], font_color[1], font_color[2])

    # Alignment
    # Note: only paragraph and table styles have alignment.
    # Compare against None: WD_PARAGRAPH_ALIGNMENT.LEFT has the value 0 and is falsy.
    if style_type != 2 and align is not None:
        font_style.paragraph_format.alignment = align

    # Font name, including the East Asian (e.g., Chinese) font
    if font_name:
        font_style.font.name = font_name
        font_style._element.rPr.rFonts.set(qn('w:eastAsia'), font_name)

    return font_style
Finally, when adding a paragraph, pass the created style as the second parameter.
The add_paragraph() method returns a paragraph object, which can also use the add_run(text, style) method to append a Run text block with a specified style.
python
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

# 1/ Paragraph style
style_paragraph = create_style(document=doc, style_name="style2", style_type=1, font_size=30,
                               font_color=[0xff, 0x00, 0x00])

# 2/ Character style
style_string = create_style(document=doc, style_name="style3", style_type=2, font_size=15,
                            font_color=[0x00, 0xff, 0x00])

# 3/ Table style
# Alignment: Center
style_table = create_style(document=doc, style_name="style4", style_type=3, font_size=25,
                           font_color=[0x00, 0x00, 0xff], align=WD_PARAGRAPH_ALIGNMENT.CENTER)

current_paragraph = doc.add_paragraph("I am a paragraph with a custom style (Method 2)!!!", style_paragraph)
# Character style
current_paragraph.add_run("【Part of the text in Paragraph 2】", style_string)
Lists
Ordered and unordered lists are commonly used in Word documents.
Similar to adding paragraphs, use the document object’s add_paragraph() method with different styles:
- Ordered list: List Number
- Unordered list: List Bullet
python
def add_list(document, data, isorder):
    """
    Add list data to unordered/ordered lists
    :param document: Document object
    :param data: List data
    :param isorder: Whether it's an ordered list
    :return:
    """
    # Unordered list
    if not isorder:
        for item in data:
            document.add_paragraph(item, style='List Bullet')
    else:
        # Ordered list
        for item in data:
            document.add_paragraph(item, style='List Number')

# 2.3 Lists
# 2.3.1 Unordered list
add_list(doc, ["Unordered-Item1", "Unordered-Item2", "Unordered-Item3"], False)
# 2.3.2 Ordered list
add_list(doc, ["Ordered-Item1", "Ordered-Item2", "Ordered-Item3"], True)
Images
Use the method: add_picture(image, width, height)
- The first parameter: image path or image stream (for web images)
- The second and third parameters: set image width and height
Note: If width/height are not specified, the native image size is used. If only one is set, it scales proportionally.
Local Image:
python
from docx.shared import Inches


def add_local_image(doc, image_path, width=None, height=None):
    """
    Add a local image to the Word document
    :param doc: Document object
    :param image_path: Path to the image file
    :param width: Width in inches (optional)
    :param height: Height in inches (optional)
    :return:
    """
    doc.add_picture(image_path, width=None if width is None else Inches(width),
                    height=None if height is None else Inches(height))


# 2.4.1 Insert local image
add_local_image(doc, './1.png', width=2)
Web Image: First, obtain the image byte stream from the URL, then pass it as the first parameter.
python
from io import BytesIO
from urllib.request import urlopen

from docx.shared import Inches


def get_image_data_from_network(url):
    """
    Get network image byte stream
    :param url: Image URL
    :return: In-memory stream holding the image bytes
    """
    # Download the image and wrap its bytes in a stream.
    # HTTPS certificates are verified by default.
    with urlopen(url) as response:
        image_data = BytesIO(response.read())
    return image_data


def add_network_image(doc, image_url, width=None, height=None):
    """
    Add a network image to the Word document
    :param doc: Document object
    :param image_url: Image URL
    :param width: Width in inches (optional)
    :param height: Height in inches (optional)
    :return:
    """
    # Get image stream
    image_data = get_image_data_from_network(image_url)
    doc.add_picture(image_data, width=None if width is None else Inches(width),
                    height=None if height is None else Inches(height))


# 2.4.2 Insert network image
url = 'Image_URL_Address'
add_network_image(doc, url, width=3)
Tables
Use the method: add_table(row_num, column_num, style=None)
Return value: Table object <class 'docx.table.Table'>
- First parameter: number of rows
- Second parameter: number of columns
- Third parameter: table style
Use row/column indices to get a list of all cell objects in a specific row/column.
python
# Add a table
table = doc.add_table(***)

# Get all cell objects in a row/column via index
# List of all cell objects in the first row
head_cells = table.rows[0].cells
Additionally, the table object provides the add_row() and add_column(width) methods to append a row or a column; add_column() requires the width of the new column.
Example: Insert a table with specified headers and data:
python
def add_table(doc, head_datas, datas, style=None):
    """
    Add a new table
    :param doc: Document object
    :param head_datas: Table headers
    :param datas: Data rows
    :param style: Table style; defaults to Table Grid
    :return:
    """
    # Add a new table
    # Complete list of table styles: https://blog.csdn.net/ibiao/article/details/78595295
    # Default style: Table Grid
    table = doc.add_table(rows=1, cols=len(head_datas), style=("Table Grid" if style is None else style))

    # List of all cell objects in the first row
    head_cells = table.rows[0].cells

    # Write data to headers
    for index, head_item in enumerate(head_datas):
        head_cells[index].text = head_item

    # Traverse data and write to table
    for data in datas:
        # Add a row individually: add_row
        row_cells = table.add_row().cells
        for index, cell in enumerate(row_cells):
            cell.text = str(data[index])


# 2.5 Table
head_datas = ["Name", "Age", "Region"]
datas = (
('Zhang San', 18, 'Shenzhen'),
('Li Si', 28, 'Beijing'),
('Wang Wu', 33, 'Shanghai'),
('Sun Liu', 42, 'Guangzhou')
)
# Add a table and specify a style
# add_table(doc, head_datas, datas, style_table)
add_table(doc, head_datas, datas)
Note: The default table style is Table Grid. You can also use the method above to create a custom table style and set it when inserting the table.