I. Markdown Documents
Essentially, a Markdown document combines two layers: a tree of block-level elements and, inside each block, a sequence of inline-level elements.
Block-level elements (structure):
- heading_open → inline → heading_close
- paragraph_open → inline → paragraph_close
- list_open → list_item_open → inline → list_item_close
- blockquote_open → …
- fence (code block)
Inline-level elements (appear within a line):
- text
- image
- link_open → inline → link_close
- strong_open / strong_close
- em_open / em_close
Patterns in .md file structure:
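- Container blocks always appear in pairs (e.g., heading_open/heading_close); a leaf block such as fence is a single token.
- content carries text, not structure: it is empty on open/close tokens and filled on inline tokens and their children (a fence token also keeps its code there).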
Markdown Token
| Structure | Token Flow |
|---|---|
| Title | heading_open → inline → heading_close |
| Paragraph | paragraph_open → inline → paragraph_close |
| List Item | list_item_open → inline → list_item_close |
Token content usage:
| type | content |
|---|---|
| heading_open | "" (empty) |
| inline | The entire line’s text (including Markdown syntax) |
| text | Plain text content |
| image | alt text (i.e., the alt in ![alt]) |
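The token flows in the tables above can be checked on a small input. The following is a minimal sketch, assuming the markdown-it-py package is installed; the sample text is made up for illustration.

```python
from markdown_it import MarkdownIt

# Hypothetical two-block sample: one heading and one paragraph.
sample = "# Title\n\nSome *emphasised* text.\n"

tokens = MarkdownIt().parse(sample)

# Top-level flow: each container block is an open / inline / close triple.
flow = [t.type for t in tokens]
print(" -> ".join(flow))

# Inline-level tokens are stored on the inline token, not in the top-level list.
paragraph_inline = tokens[4]
inline_flow = [child.type for child in paragraph_inline.children]
print(inline_flow)

# Structural tokens have empty content; the inline token holds the line's text.
print(repr(tokens[0].content), repr(tokens[1].content))
```

The first print shows the block-level pairs, the second the inline-level sequence inside the paragraph.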
II. Python Package: markdown-it
1. How markdown-it Works
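markdown-it (published for Python as markdown-it-py and imported as markdown_it) parses a Markdown document into a flat list of block-level Tokens; inline-level Tokens such as text and image are nested in the children of each inline Token. Each Token has the following attributes: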
- type – “Syntactic element type” (key). Determines which Markdown structure the Token represents. Common types include:

  | Syntax | type examples |
  |---|---|
  | # Heading | heading_open, heading_close |
  | Paragraph | paragraph_open, paragraph_close |
  | Inline content | inline |
  | Image ![]() | image |
  | List - item | bullet_list_open, list_item_open, etc. |

- tag – Corresponding HTML tag name (e.g., h1, p, img).

  | type | tag |
  |---|---|
  | heading_open (###) | h3 |
  | paragraph_open | p |
  | image | img |

- content – Text content (only present for inline or text child Tokens).
  - For inline → content is the raw text of the entire line (e.g., "docker images").
  - For text → content is plain text (the actual text node).
  - For image → content is the alt text (e.g., "image-2025...").
- attrs – HTML attributes (e.g., an image’s src, alt, title are all here).
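To make the four attributes concrete, here is a small sketch that reads them from a heading token and an image token. It assumes markdown-it-py is installed; the sample text, the file name diagram.png and the title "Overview" are invented for illustration. attrGet is used to read a single attribute by name.

```python
from markdown_it import MarkdownIt

# Hypothetical sample; "diagram.png" and "Overview" are made-up values.
sample = '### Syntax\n\n![architecture](diagram.png "Overview")\n'

tokens = MarkdownIt().parse(sample)

# Structural token: type and tag are set, content is empty.
heading_open = tokens[0]
print(heading_open.type, heading_open.tag, repr(heading_open.content))

# The image is inline-level: it is a child of the paragraph's inline token.
image = tokens[4].children[0]
print(image.type, image.tag)
print("alt text:", image.content)
print("src:", image.attrGet("src"))
print("title:", image.attrGet("title"))
```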
2. Markdown → Token Mapping
Assume the original Markdown is:
```markdown
### Syntax

docker images

![image-2025...](docker-learning-use-images/...)
```
Use the following code to parse the document:
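```python
from pathlib import Path

from markdown_it import MarkdownIt

md = MarkdownIt()

# Read the .md file into a string, then parse it into a token list
md_path = Path(r"./docker-learning.md")
md_text = md_path.read_text(encoding="utf-8")
tokens = md.parse(md_text)

# Only top-level tokens are listed here; text and image tokens are in t.children
for t in tokens:
    print(f"type: {t.type}, tag: {t.tag}, content: {t.content}, attrs: {t.attrs}")
```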
The resulting semantic tree (conceptual):
```text
heading_open (tag h3)
  inline -> text("Syntax")
heading_close (tag h3)
paragraph_open
  inline -> text("docker images")
paragraph_close
paragraph_open
  inline -> image (alt="image-2025...", src="docker-learning-use-images/...")
paragraph_close
```
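A tree like the one above can be rebuilt from the flat token list by following each token's nesting value (1 for an open token, -1 for a close token, 0 otherwise) and descending into children. The sketch below assumes markdown-it-py is installed. The helper name outline and the sample image path are hypothetical stand-ins, not part of the library or the original document.

```python
from markdown_it import MarkdownIt

# Stand-in for the sample above; the image path is a hypothetical placeholder.
sample = "### Syntax\n\ndocker images\n\n![screenshot](images/example.png)\n"


def outline(tokens, depth=0):
    """Return one indented line per token, descending into children."""
    lines = []
    for tok in tokens:
        if tok.nesting == -1:      # a *_close token ends the current level
            depth -= 1
        label = f"{tok.type} (tag {tok.tag})" if tok.tag else tok.type
        lines.append("  " * depth + label)
        if tok.children:           # inline tokens (and images, for alt text) have children
            lines.extend(outline(tok.children, depth + 1))
        if tok.nesting == 1:       # a *_open token starts a new level
            depth += 1
    return lines


tree_lines = outline(MarkdownIt().parse(sample))
print("\n".join(tree_lines))
```

Indentation is derived from nesting rather than stored, which is why the parser can return a plain list and still describe a tree.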
This structure is well-suited for programmatic document analysis.
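As one example of such analysis, the sketch below collects every heading and every image path from a document. It assumes markdown-it-py is installed; the function name summarize and the sample text and paths are hypothetical.

```python
from markdown_it import MarkdownIt


def summarize(md_text):
    """Collect (level, title) pairs for headings and the src of every image."""
    tokens = MarkdownIt().parse(md_text)
    headings, images = [], []
    for i, tok in enumerate(tokens):
        if tok.type == "heading_open":
            level = int(tok.tag[1:])        # "h3" -> 3
            title = tokens[i + 1].content   # the inline token that follows
            headings.append((level, title))
        elif tok.type == "inline":
            # Images are inline-level, so look in the children list.
            for child in tok.children or []:
                if child.type == "image":
                    images.append(child.attrGet("src"))
    return headings, images


# Hypothetical document; the image path is a placeholder.
sample = "# Docker\n\n### Syntax\n\ndocker images\n\n![shot](img/a.png)\n"
headings, images = summarize(sample)
print(headings)
print(images)
```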
3. Common Usage of markdown-it
Basic Examples:
```python
from pathlib import Path

from markdown_it import MarkdownIt

# Create the parser (the package installs as markdown-it-py)
md = MarkdownIt()

# Render Markdown text to an HTML string
text = """
### Title
This is a paragraph containing **bold** and *italic* text.

"""
html = md.render(text)
print(html)

# Parse a .md file into a token list (read the file into a string first)
md_path = Path(r"./docker-learning.md")
md_text = md_path.read_text(encoding="utf-8")
tokens = md.parse(md_text)
for t in tokens:
    print(f"type: {t.type}, tag: {t.tag}, content: {t.content}, attrs: {t.attrs}")
```
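The two calls are related: render is a parse followed by a renderer pass over the tokens. The sketch below shows that relationship on a small string, assuming markdown-it-py is installed; the sample text is made up. The two-step form is useful when tokens need to be inspected or modified before the HTML is produced.

```python
from markdown_it import MarkdownIt

md = MarkdownIt()
text = "### Title\n\nThis is **bold** and *italic*.\n"

# One step: Markdown source straight to HTML.
html_direct = md.render(text)

# Two steps: parse to tokens, optionally inspect or edit them, then render.
tokens = md.parse(text)
html_from_tokens = md.renderer.render(tokens, md.options, {})

print(html_direct)
print(html_direct == html_from_tokens)
```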