Python Office Automation with PPT: How to Use python-pptx (Part 1)

1. Introduction
Python is a strong fit for office automation. This article begins a new series on another everyday office format: PowerPoint (PPT) presentations.

2. Preparation
The main library for working with PPT files in Python is python-pptx. Before starting, install it, ideally inside a virtual environment:

bash

# Install the dependency
pip3 install python-pptx

3. PPT Structure
First, we need to understand the page structure of a PPT document:

  • A PPT document corresponds to a Presentation object
  • Presentation contains multiple Slide objects, each representing a slide
  • The content of each slide is composed of various Shapes

Second, the content elements on a slide are shapes of various kinds: text boxes, images, placeholders, tables, regular (auto) shapes, and so on. In the source code, these kinds are all defined in the MSO_SHAPE_TYPE enumeration (pptx.enum.shapes).

Finally, we need to understand layout templates. The Presentation object’s slide_layouts property returns the slide layouts of the template in use; the default template bundled with python-pptx provides 11 of them:

python

# Get one of the default template's 11 layouts from the Presentation object
# Layout indexes start at 0
slide_layout = presentation.slide_layouts[slide_style_index]

In index order, they are:

  • Title Slide
  • Title and Content
  • Section Header
  • Two Content
  • Comparison
  • Title Only
  • Blank
  • Content with Caption
  • Picture with Caption
  • Title and Vertical Text
  • Vertical Title and Text

You can also view these layouts in the slide master view of Microsoft PowerPoint or WPS.

Besides the built-in layouts, you can also customize the master and its layouts with placeholders to meet specific requirements.

4. Slide Management
A PPT file consists of one or more slides.

4.1 Adding a Slide
Simply follow these 3 steps:

  1. Instantiate a Presentation object
  2. Select a layout from the built-in templates
  3. Add a slide using the layout style

python

def add_slide(presentation, slide_style_index):
    """
    Add slide to PPT document using built-in layout
    :param presentation: Document object
    :param slide_style_index: Layout index
    :return:
    """
    # PPT layout styles
    # 11 built-in layout styles
    # 0: Title Slide
    # 1: Title and Content
    # 2: Section Header
    # 3: Two Content
    # 4: Comparison
    # 5: Title Only
    # 6: Blank
    # 7: Content with Caption
    # 8: Picture with Caption
    # 9: Title and Vertical Text
    # 10: Vertical Title and Text
    slide_layout = presentation.slide_layouts[slide_style_index]

    # Add a slide using layout style
    slide = presentation.slides.add_slide(slide_layout)

    return slide

# 1.1 Create the document, then add slides with four different layouts
from pptx import Presentation

presentation = Presentation()
slide1 = add_slide(presentation, 0)
slide2 = add_slide(presentation, 1)
slide3 = add_slide(presentation, 2)
slide4 = add_slide(presentation, 3)

4.2 Getting Existing Slides
The Presentation object’s slides property returns a Slides collection holding every slide in the current PPT document. It is not a plain list, but it supports len(), iteration, and indexing.

python

def get_slides(presentation):
    """
    Get all slides
    :param presentation:
    :return:
    """
    # All slides
    slides = presentation.slides

    # Number of slides
    slide_num = len(slides)

    return slides, slide_num

def get_slide(presentation, slide_index):
    """
    Get a specific slide by index
    :param presentation:
    :param slide_index: Page index, starting from 0
    :return:
    """
    slides, slide_num = get_slides(presentation=presentation)

    return slides[slide_index]

# 1.2.1 Get all slides
slides, slide_num = get_slides(presentation)
print('Existing slides:', slides)
print('Number of slides:', slide_num)

# 1.2.2 Get a specific slide (index 1 is the second slide)
slide = get_slide(presentation, 1)
print(slide.shapes)

4.3 Deleting a Slide
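python-pptx does not document a public method for deleting a slide. A common workaround is to look up the slide’s entry in the presentation’s internal slide ID list (the private _sldIdLst attribute) and remove it. Because this relies on a private attribute, it may break in future versions of the library: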

python

def del_slide(presentation, slide_index=0):
    """
    Delete a specific slide
    :param presentation:
    :param slide_index: Index
    :return:
    """
    # List of all slides
    slides = list(presentation.slides._sldIdLst)

    # Delete a specific slide by index
    presentation.slides._sldIdLst.remove(slides[slide_index])

# 1.3 Delete a specific slide in the PPT document by index
# Example: delete the 4th slide (index 3)
del_slide(presentation, 3)

5. Text and Paragraphs
First, we need to specify a Slide object, which can be an existing slide or a newly created one.

Then, use the slide object’s shapes property to get the collection of all shapes on that slide.

Finally, call the collection’s add_textbox() method to add a text box; it returns the new text box object:

python

add_textbox(left, top, width, height)

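Function parameters (each is a length, usually created with a helper from pptx.util such as Inches or Cm):

  • left: Distance from the slide’s left edge to the text box
  • top: Distance from the slide’s top edge to the text box
  • width: Text box width
  • height: Text box height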

This introduces another concept: the text frame (TextFrame).
PS: The text frame is what you use to add paragraphs and set styles inside a text box. You get it from the text box object’s text_frame property.

python

from pptx.util import Inches


def insert_textbox(slide, left, top, width, height, unit=Inches):
    """
    Add text box to slide
    :param unit: Unit, default set to Inches
    :param slide: Slide object
    :param left: Left margin
    :param top: Top margin
    :param width: Width
    :param height: Height
    :return:
    """
    # Text box
    textbox = slide.shapes.add_textbox(left=unit(left),
                                       top=unit(top),
                                       width=unit(width),
                                       height=unit(height))
    # Text box shape
    tf = textbox.text_frame

    return textbox, tf

For convenience, I’ve wrapped text box insertion in a helper function. The length unit defaults to Inches but can be switched to another unit such as Cm (centimeters).

Next, let’s look at common operations for text boxes and paragraphs:

5.1 Insert Text Box and Set Default Paragraph Content
A newly inserted text box’s text frame already contains one default paragraph, whose content you can set directly.

python

from pptx.util import Cm

# 2. Insert a text box into the slide; returns the text box and its text frame
textbox, tf = insert_textbox(slide, 8, 2, 10, 4, unit=Cm)

# 2.1 Default paragraph
paragraph_default = tf.paragraphs[0]
paragraph_default.text = "Set default paragraph content"

5.2 Add New Paragraph in Text Box
The text box’s text_frame property returns a TextFrame object, so we can call its add_paragraph() method to append a new paragraph.

python

# 2.2 Add a new paragraph
paragraph_new = tf.add_paragraph()

# 2.3 Set paragraph content
paragraph_new.text = "Welcome to follow the official account: AirPython\nWeekly sharing of Python original technical content!"

5.3 Set Paragraph and Text Styles
As with Word documents, python-pptx can also set paragraph styles in PPT documents.

Alignment: Alignment applies to the whole paragraph; set the paragraph object’s alignment property to a PP_ALIGN value.

python

from pptx.enum.text import PP_ALIGN


def set_parg_font_style(paragraph, font_name=None, font_color=None, font_size=-1, font_bold=False, font_italic=False,
                        paragraph_alignment=PP_ALIGN.CENTER):
    """
    Set text style in paragraph, including: font name, color, size, bold, italic
    :param paragraph_alignment: Paragraph alignment
    :param paragraph:
    :param font_name:
    :param font_color:
    :param font_size:
    :param font_bold:
    :param font_italic:
    :return:
    """

    # Alignment
    # Note: Alignment is for paragraphs
    paragraph.alignment = paragraph_alignment

    # Get font object in paragraph
    font = paragraph.font

    # Set font style
    set_font_style(font, font_name, font_color, font_size, font_bold, font_italic)

    return font

Paragraph Text Attributes: Use the paragraph object’s font property to get the font object, then set font name, size, color, italic, bold.

python

from pptx.dml.color import RGBColor
from pptx.util import Pt


def set_font_style(font, font_name=None, font_color=None, font_size=-1, font_bold=False, font_italic=False):
    """
    Set font style
    :param font:
    :param font_name:
    :param font_color:
    :param font_size:
    :param font_bold:
    :param font_italic:
    :return:
    """
    # Font name
    if font_name:
        font.name = font_name

    # Font color
    if font_color and len(font_color) == 3:
        font.color.rgb = RGBColor(font_color[0], font_color[1], font_color[2])

    # Font size
    if font_size != -1:
        font.size = Pt(font_size)

    # Bold, default not bold
    font.bold = font_bold

    # Italic, default not italic
    font.italic = font_italic

5.4 Set Text Box Background Color
Setting text box background color only requires 2 steps:

  1. Set shape fill type to solid
  2. Set text box background color

python

def set_widget_bg(widget, bg_rgb_color=None):
    """
    Set background color for [textbox/cell/shape]
    :param widget: Textbox, cell, shape
    :param bg_rgb_color: Background color value
    :return:
    """
    if bg_rgb_color and len(bg_rgb_color) == 3:
        # 1. Set shape fill type to solid
        widget.fill.solid()
        # 2. Set text box background color
        widget.fill.fore_color.rgb = RGBColor(bg_rgb_color[0], bg_rgb_color[1], bg_rgb_color[2])

# 4. Set text box background color
set_widget_bg(textbox, [0, 255, 0])

Note: This method also applies to setting background colors for table cells and regular shapes.

5.5 Text Box Word Wrap
When a text box contains text too long to fit on a single line, set the text frame’s word_wrap property to True to enable automatic line wrapping.

python

# 5. Enable automatic line wrapping for the text box
tf.word_wrap = True