Python Office Automation with PPT: How to Use python-pptx, Part 3
1. Introduction
As the final article in the PPT automation series, this post covers some advanced features and common tasks in PPT, including:
- Predefined Shapes
- Charts
- Reading text content
- Saving all images
2. Predefined Shapes
A slide's content area is made up of shapes of various types, including pictures, text boxes, videos, tables, and predefined shapes (autoshapes). The set of predefined shapes is quite extensive.
Use the following method to insert a shape into a slide:
python
slide.shapes.add_shape(autoshape_type_id, left, top, width, height)
Parameters:
- autoshape_type_id: shape type, a member of MSO_SHAPE
- left: distance from the left edge of the slide
- top: distance from the top edge of the slide
- width: shape width
- height: shape height
As an example, let’s insert a simple rounded rectangle.
2.1 Inserting Shapes
python
from pptx.enum.shapes import MSO_SHAPE, MSO_SHAPE_TYPE
from pptx.util import Cm, Inches


def insert_shape(slide, left, top, width, height, autoshape_type_id=MSO_SHAPE.CHEVRON, unit=Inches):
    """
    Add a shape to a slide
    :param slide: Slide
    :param left: Left offset
    :param top: Top offset
    :param width: Width
    :param height: Height
    :param autoshape_type_id: Shape type, defaults to a chevron
    :param unit: Unit, defaults to Inches
    :return: The new shape
    """
    # add_shape(autoshape_type_id, left, top, width, height)
    # Parameters: shape type, left offset, top offset, width, height
    shape = slide.shapes.add_shape(autoshape_type_id=autoshape_type_id,
                                   left=unit(left),
                                   top=unit(top),
                                   width=unit(width),
                                   height=unit(height))
    return shape


# 1. Add a rounded rectangle (assumes `slide` is an existing slide object)
rectangle = insert_shape(slide, 2, 2, 16, 8, autoshape_type_id=MSO_SHAPE.ROUNDED_RECTANGLE, unit=Cm)
2.2 Setting Shape Properties
We can then set the background (fill) color and border properties of the shape returned by the function above.
For example, set the background color to white and the border to red with a width of 0.5 cm:
python
# 2. Set shape properties
# set_widget_bg and set_widget_frame are helper functions not defined in this article
# 2.1 Background color
set_widget_bg(rectangle, bg_rgb_color=[255, 255, 255])
# 2.2 Border properties
set_widget_frame(rectangle, frame_rgb_color=[255, 0, 0], frame_width=0.5)
For more shape types, refer to:
https://python-pptx.readthedocs.io/en/latest/api/enum/MsoAutoShapeType.html
3. Charts
Charts are a common element of PPT decks. With python-pptx you can create many chart types, including bar charts, pie charts, line charts, scatter plots, 3D charts, etc.
Chart creation method:
python
slide.shapes.add_chart(chart_type, x, y, cx, cy, chart_data)
Parameters:
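- chart_type: chart type, a member of XL_CHART_TYPE
- x: distance from the left edge of the slide
- y: distance from the top edge of the slide
- cx: chart display width
- cy: chart display height
- chart_data: the chart's data, such as a ChartData object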
3.1 Creating a Line Chart
First, create a ChartData object:
python
from pptx.chart.data import ChartData

# add_slide is a helper function not defined in this article
slide = add_slide(presentation, 6)
# Create a chart data object
chart_data = ChartData()
Then, prepare chart data:
python
# Data categories (x-axis data)
chart_data.categories = [2000, 2005, 2010, 2015, 2020]
# One series per dimension (3 in total), one value per year
# Economy
chart_data.add_series("Economy", [60, 65, 75, 90, 95])
# Environment
chart_data.add_series("Environment", [95, 88, 84, 70, 54])
# Military
chart_data.add_series("Military", [40, 65, 80, 95, 98])
Finally, pass the chart type XL_CHART_TYPE.LINE (line chart) together with the chart data to add_chart to draw the chart.
For other chart types, refer to:
https://python-pptx.readthedocs.io/en/latest/api/enum/XlChartType.html
python
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Cm, Inches


def insert_chart(slide, left, top, width, height, data, unit=Inches, chart_type=XL_CHART_TYPE.COLUMN_CLUSTERED):
    """
    Insert a chart
    :param slide: Slide
    :param left: Left offset
    :param top: Top offset
    :param width: Width
    :param height: Height
    :param data: Chart data
    :param unit: Unit, defaults to Inches
    :param chart_type: Chart type, defaults to a clustered column chart
    :return: The chart object
    """
    # add_chart returns a graphic frame; the chart itself is its .chart attribute
    chart_result = slide.shapes.add_chart(chart_type=chart_type,
                                          x=unit(left), y=unit(top),
                                          cx=unit(width), cy=unit(height),
                                          chart_data=data)
    return chart_result.chart


# Add a line chart
chart = insert_chart(slide, 4, 5, 20, 9, chart_data, unit=Cm, chart_type=XL_CHART_TYPE.LINE)
3.2 Setting Chart Display Properties
For example, we can show the legend, draw smoothed lines, and set the chart's text style:
python
# Set chart display properties
# Show the legend
chart.has_legend = True
# Place the legend outside the plot area so it doesn't overlap the lines
chart.legend.include_in_layout = False
# Draw every series as a smoothed line
for series in chart.series:
    series.smooth = True
# Set the chart's text style (set_font_style is a helper function not defined in this article)
set_font_style(chart.font, font_size=12, font_color=[255, 0, 0])
4. Reading Content
A slide's content area is made up of shapes, and shape.has_text_frame tells us whether a shape has a text frame (i.e. can hold text). By iterating over every shape on every slide, we can therefore collect all of the text in the PPT.
python
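from pptx import Presentation


def get_shape_content(shape):
    """
    Minimal stand-in for the get_shape_content helper, which the article does not define:
    return the text of the shape's text frame (paragraphs joined by newlines)
    """
    return shape.text_frame.text


def read_ppt_content(presentation):
    """
    Read all text content in a PPT
    :param presentation: Presentation object
    :return: List of non-empty text strings
    """
    # All content
    results = []
    # Iterate through all slides, collecting text from text frames
    for slide in presentation.slides:
        for shape in slide.shapes:
            # Check whether the shape has a text frame
            if shape.has_text_frame:
                content = get_shape_content(shape)
                if content:
                    results.append(content)
    return results


presentation = Presentation("./raw.pptx")
# 1. All text content from regular shapes
contents = read_ppt_content(presentation)
print(contents)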
However, text in table cells cannot be read this way, because a table sits in a graphic frame that has no text frame of its own. Instead, filter for shapes of type MSO_SHAPE_TYPE.TABLE, then iterate over every row and cell of the table to get its text:
python
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE


def read_ppt_file_table(path="./raw.pptx"):
    """
    Print the text of every table cell in a PPT
    :param path: Path to the .pptx file
    :return:
    """
    # Open the PPT for reading
    presentation = Presentation(path)
    for slide in presentation.slides:
        # Iterate through all shapes (text boxes, pictures, tables, etc.)
        for shape in slide.shapes:
            # print('Current shape type:', shape.shape_type)
            # Only read tables, row by row
            if shape.shape_type == MSO_SHAPE_TYPE.TABLE:
                # Table rows (shape.table.rows)
                for row in shape.table.rows:
                    # All cells in a row (row.cells)
                    for cell in row.cells:
                        # Text in the cell's text frame (cell.text_frame.text)
                        print(cell.text_frame.text)
5. Saving Images
Sometimes we need to save all images from a PPT document locally. This can be completed in just 3 steps:
- Iterate through all shapes in the slide content area
- Keep only picture shapes (type MSO_SHAPE_TYPE.PICTURE) and get each picture's binary byte stream
- Write the picture byte stream to a file
python
import os

from pptx.enum.shapes import MSO_SHAPE_TYPE


def save_ppt_images(presentation, output_path):
    """
    Save all images from a PPT
    [Batch-exporting image assets from PPT with Python](https://www.pythonf.cn/read/49552)
    :param presentation: Presentation object
    :param output_path: Save directory
    :return:
    """
    print('Number of slides:', len(presentation.slides))
    # Create the output folder once, up front
    os.makedirs(output_path, exist_ok=True)
    # Iterate through all slides
    for index_slide, slide in enumerate(presentation.slides):
        # Iterate through all shapes
        for index_shape, shape in enumerate(slide.shapes):
            # Shapes include text boxes, pictures, autoshapes, etc.
            # Keep only picture shapes
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                # Image bytes
                image_data = shape.image.blob
                # File extension for the image format, e.g. "png" or "jpg"
                image_suffix = shape.image.ext
                # Name each file after its slide and shape index so names are unique
                file_name = f"slide{index_slide}_shape{index_shape}.{image_suffix}"
                output_image_path = os.path.join(output_path, file_name)
                print(output_image_path)
                # Write the bytes to a new file
                with open(output_image_path, 'wb') as file:
                    file.write(image_data)
6. Conclusion
This officially concludes the Python PPT Office Automation series!