Python Office Automation with PPT: How to Use python-pptx, Part 3

1. Introduction
This final article in the PPT automation series covers advanced features and common tasks in PPT, including:

  • Predefined Shapes
  • Charts
  • Reading text content
  • Saving all images

2. Predefined Shapes
The content area of a PPT slide is made up of shapes of various types: pictures, text boxes, videos, tables, and predefined shapes. python-pptx offers a large set of predefined shapes through the MSO_SHAPE enumeration.

Use the following method to insert a shape into a slide:

python

slide.shapes.add_shape(autoshape_type_id, left, top, width, height)

Parameters:

  • autoshape_type_id: Shape type (a member of MSO_SHAPE)
  • left: Distance from the left edge of the slide
  • top: Distance from the top edge of the slide
  • width: Shape width
  • height: Shape height

Let's insert a simple rounded rectangle as an example.

2.1 Inserting Shapes

python

from pptx.enum.shapes import MSO_SHAPE, MSO_SHAPE_TYPE
from pptx.util import Cm, Inches

def insert_shape(slide, left, top, width, height, autoshape_type_id=MSO_SHAPE.CHEVRON, unit=Inches):
    """
    Add shape to slide
    :param unit: Unit, defaults to Inches
    :param autoshape_type_id: Shape type
    :param slide: Slide
    :param left: Left margin
    :param top: Top margin
    :param width: Width
    :param height: Height
    :return: The newly added shape
    """
    # Add a shape
    # add_shape(self, autoshape_type_id, left, top, width, height)
    # Parameters: shape type, left margin, top margin, width, height
    shape = slide.shapes.add_shape(autoshape_type_id=autoshape_type_id,
                                   left=unit(left),
                                   top=unit(top),
                                   width=unit(width),
                                   height=unit(height))
    return shape

# 1. Add a rounded rectangle
rectangle = insert_shape(slide, 2, 2, 16, 8, autoshape_type_id=MSO_SHAPE.ROUNDED_RECTANGLE, unit=Cm)

2.2 Setting Shape Properties
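The shape object returned by the method above can be styled further, for example by setting its background color and border. The set_widget_bg and set_widget_frame calls below are helper functions whose definitions are not shown here.

Example: set the background color to white, the border color to red, and the border width to 0.5 cm: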

python

# 2. Set shape properties
# 2.1 Background color
set_widget_bg(rectangle, bg_rgb_color=[255, 255, 255])

# 2.2 Border properties
set_widget_frame(rectangle, frame_rgb_color=[255, 0, 0], frame_width=0.5)

For more shape types, refer to:
https://python-pptx.readthedocs.io/en/latest/api/enum/MsoAutoShapeType.html

3. Charts
Charts are used frequently in PPT. With python-pptx you can create many chart types, including bar charts, pie charts, line charts, scatter plots, and 3D charts.

Chart creation method:

python

slide.shapes.add_chart(chart_type, x, y, cx, cy, chart_data)

Parameters:

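  • chart_type: Chart type (a member of XL_CHART_TYPE)
  • x: Distance from the left edge of the slide
  • y: Distance from the top edge of the slide
  • cx: Chart display width
  • cy: Chart display height
  • chart_data: Chart data object holding the categories and series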

3.1 Creating a Line Chart
First, create a ChartData object:

python

from pptx.chart.data import ChartData

# add_slide() is a helper whose definition is not shown here
slide = add_slide(self.presentation, 6)

# Create a chart data object
chart_data = ChartData()

Then, prepare chart data:

python

# Data categories (x-axis data)
chart_data.categories = [2000, 2005, 2010, 2015, 2020]

# Data for each dimension per year (3 dimensions)
# Economy
chart_data.add_series("Economy", [60, 65, 75, 90, 95])

# Environment
chart_data.add_series("Environment", [95, 88, 84, 70, 54])

# Military
chart_data.add_series("Military", [40, 65, 80, 95, 98])

Finally, set the chart type to a line chart, XL_CHART_TYPE.LINE, and draw the chart from the chart data.

For other chart types, refer to:
https://python-pptx.readthedocs.io/en/latest/api/enum/XlChartType.html

python

from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Cm, Inches


def insert_chart(slide, left, top, width, height, data, unit=Inches, chart_type=XL_CHART_TYPE.COLUMN_CLUSTERED):
    """
    Insert chart
    :param slide: Slide
    :param left: Left margin
    :param top: Top margin
    :param width: Width
    :param height: Height
    :param data: Chart data
    :param unit: Data unit, defaults to: Inches
    :param chart_type: Chart type, defaults to: clustered column chart
    :return: The chart object
    """
    chart_result = slide.shapes.add_chart(chart_type=chart_type,
                                          x=unit(left), y=unit(top),
                                          cx=unit(width), cy=unit(height),
                                          chart_data=data)
    # Return chart
    return chart_result.chart

# Add chart
chart = insert_chart(slide, 4, 5, 20, 9, chart_data, unit=Cm, chart_type=XL_CHART_TYPE.LINE)

3.2 Setting Chart Display Properties
Examples: showing the chart legend, drawing smooth lines, and setting the chart text style:

python

# Set chart display properties
# Display legend
chart.has_legend = True

# Place the legend outside the plot area so it does not overlap the chart
chart.legend.include_in_layout = False

# Draw every series as a smooth line
for series in chart.series:
    series.smooth = True

# Set text style in chart
set_font_style(chart.font, font_size=12, font_color=[255, 0, 0])

4. Reading Content
As noted earlier, the content area of a slide is made up of shapes, and shape.has_text_frame reports whether a shape can contain text. By iterating over the shapes on every slide, we can therefore collect the text content of the PPT.

python

from pptx import Presentation


def read_ppt_content(presentation):
    """
    Read all content in PPT
    :param presentation:
    :return:
    """
    # All content
    results = []

    # Iterate through all slides, get values from text boxes
    for slide in presentation.slides:
        for shape in slide.shapes:
            # Check if shape contains text box
            if shape.has_text_frame:
                # get_shape_content() is a helper whose definition is not shown here
                content = get_shape_content(shape)
                if content:
                    results.append(content)

    return results

presentation = Presentation("./raw.pptx")

# 1. All text content from regular shapes
contents = read_ppt_content(presentation)
print(contents)

However, this method does not return the text in table cells. To read it, select the shapes of type MSO_SHAPE_TYPE.TABLE, then iterate over every row and cell of each table:

python

def read_ppt_file_table(self):
    """
    Read data from PPT
    :return:
    """
    # Open PPT to read
    presentation = Presentation("./raw.pptx")

    for slide in presentation.slides:
        # Iterate through all shapes
        # Shapes: shapes with content, shapes without content
        for shape in slide.shapes:
            # print('Current shape type:', shape.shape_type)
            # Only get data from tables, read content by row
            if shape.shape_type == MSO_SHAPE_TYPE.TABLE:
                # Get table rows (shape.table.rows)
                for row in shape.table.rows:
                    # All cells in a row (row.cells)
                    for cell in row.cells:
                        # Content in cell text box (cell.text_frame.text)
                        print(cell.text_frame.text)

5. Saving Images
Sometimes we need to save all images from a PPT document to local files. This takes three steps:

  1. Iterate through all shapes in the content area of each slide
  2. Keep only the picture shapes, whose type is MSO_SHAPE_TYPE.PICTURE, and get the binary data of each picture
  3. Write the picture bytes to a file

python

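import os

from pptx.enum.shapes import MSO_SHAPE_TYPE


def save_ppt_images(presentation, output_path):
    """
    Save all images from PPT
    Reference: "Batch-exporting image assets from a PPT with Python" (https://www.pythonf.cn/read/49552)
    :param presentation: Presentation object to read images from
    :param output_path: Save directory
    :return:
    """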

    print('Number of slides:', len(presentation.slides))

    # Iterate through all slides
    for index_slide, slide in enumerate(presentation.slides):
        # Iterate through all shapes
        for index_shape, shape in enumerate(slide.shapes):
            # Shapes include: text shapes, images, regular shapes, etc.

            # Keep only picture shapes
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                # Get the image's raw binary data (bytes)
                image_data = shape.image.blob

                # image/jpeg, image/png, etc.
                image_type_pre = shape.image.content_type

                # Image file extension
                image_suffix = image_type_pre.split('/')[1]

                # Create image folder to save extracted images
                os.makedirs(output_path, exist_ok=True)

                # Image save path (random_str() is a helper whose definition is not shown here)
                output_image_path = os.path.join(output_path, random_str(10) + "." + image_suffix)

                print(output_image_path)

                # Write to new file
                with open(output_image_path, 'wb') as file:
                    file.write(image_data)
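
The random_str helper used for the file name is not defined in this section. A minimal sketch using only the standard library could look like this. The implementation is an assumption, and any function that returns a unique name would do.

```python
import secrets
import string


def random_str(length):
    """Hypothetical helper: return a random string of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(random_str(10))
```

Random names cannot be traced back to their slide. If that matters, a name built from the loop indexes, such as slide number plus shape number, is a simple alternative.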

6. Conclusion
This officially concludes the Python PPT Office Automation series!