Skill Seekers, A Project Every AI Developer Should Try

amy 15/04/2026

What is Skill Seekers?

Think of it as a universal translator for AI.

The Problem: AI models like Claude or Gemini often get “confused” or give “messy” results because the data they are reading (documentation or code) is unorganized.

The Solution: Skill Seekers scrapes that data, cleans it, organizes it into patterns and guides, and packages it into a format the AI understands perfectly (like a SKILL.md file).

Why Use This?

For AI Skill Builders (Claude, Gemini, OpenAI)

  • 🎯 Production-grade Skills — 500+ line SKILL.md files with code examples, patterns, and guides
  • 🔄 Enhancement Workflows — Apply security-focus, architecture-comprehensive, or custom YAML presets
  • 🎮 Any Domain — Game engines (Godot, Unity), frameworks (React, Django), internal tools
  • 🔧 Teams — Combine internal docs + code into a single source of truth
  • 📚 Quality — AI-enhanced with examples, quick reference, and navigation guidance

For RAG Builders & AI Engineers

  • 🤖 RAG-ready data — Pre-chunked LangChain Documents, LlamaIndex TextNodes, Haystack Documents
  • 🚀 99% faster — Days of preprocessing → 15–45 minutes
  • 📊 Smart metadata — Categories, sources, types → better retrieval accuracy
  • 🔄 Multi-source — Combine docs + GitHub + PDFs + videos in one pipeline
  • 🌐 Platform-agnostic — Export to any vector DB or framework without re-scraping
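
To make "pre-chunked with smart metadata" concrete, here is a minimal sketch of the idea in plain Python. It is not Skill Seekers' actual output format: the Chunk class, the chunk_text function, and the metadata keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One retrievable unit: text plus metadata a retriever can filter on."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, category: str, max_chars: int = 200) -> list[Chunk]:
    """Split on blank lines, then pack paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: list[Chunk] = []
    buffer = ""

    def flush() -> None:
        # Attach the same kind of metadata the article mentions: source and category.
        chunks.append(Chunk(buffer, {"source": source, "category": category, "index": len(chunks)}))

    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        if len(candidate) > max_chars and buffer:
            flush()
            buffer = paragraph
        else:
            buffer = candidate
    if buffer:
        flush()
    return chunks
```

Each chunk carries its own source and category, so a vector store can filter on them at query time instead of relying on text similarity alone.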

For AI Coding Assistant Users

  • 💻 Cursor / Windsurf / Cline — Generate .cursorrules / .windsurfrules / .clinerules automatically
  • 🎯 Persistent context — AI “knows” your frameworks without repeated prompting
  • 📚 Always current — Update context in minutes when docs change

Key Features

🌐 Documentation Scraping

  • ✅ Smart SPA Discovery – Three-layer discovery for JavaScript SPA sites (sitemap.xml → llms.txt → headless browser rendering)
  • ✅ llms.txt Support – Automatically detects and uses LLM-ready documentation files (10x faster)
  • ✅ Universal Scraper – Works with ANY documentation website
  • ✅ Smart Categorization – Automatically organizes content by topic
  • ✅ Code Language Detection – Recognizes Python, JavaScript, C++, GDScript, etc.
  • ✅ 24+ Ready-to-Use Presets – Godot, React, Vue, Django, FastAPI, and more
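
The three-layer discovery order (sitemap.xml → llms.txt → headless browser rendering) is a fallback chain: try the cheapest source first and move on only if it finds nothing. The sketch below shows that pattern under stated assumptions. The function names and the stand-in layers are hypothetical and do not come from the Skill Seekers codebase; real layers would fetch over the network.

```python
from typing import Callable, Optional

# A discovery layer takes a base URL and returns page URLs, or None if it found nothing.
Layer = Callable[[str], Optional[list[str]]]


def discover_pages(base_url: str, layers: list[tuple[str, Layer]]) -> tuple[str, list[str]]:
    """Try each layer in order; return (layer_name, urls) from the first that finds pages."""
    for name, layer in layers:
        urls = layer(base_url)
        if urls:
            return name, urls
    return "none", []


# Stand-in layers for illustration only.
def from_sitemap(base_url: str) -> Optional[list[str]]:
    return None  # pretend the site has no sitemap.xml


def from_llms_txt(base_url: str) -> Optional[list[str]]:
    return [f"{base_url}/llms.txt"]


def from_headless_browser(base_url: str) -> Optional[list[str]]:
    return [f"{base_url}/", f"{base_url}/guide"]


LAYERS = [
    ("sitemap.xml", from_sitemap),
    ("llms.txt", from_llms_txt),
    ("headless browser", from_headless_browser),
]
```

Calling discover_pages("https://docs.example.com", LAYERS) stops at the llms.txt layer here, so the expensive browser step never runs.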

📄 PDF Support

  • ✅ Basic PDF Extraction – Extract text, code, and images from PDF files
  • ✅ OCR for Scanned PDFs – Extract text from scanned documents
  • ✅ Password-Protected PDFs – Handle encrypted PDFs
  • ✅ Table Extraction – Extract complex tables from PDFs
  • ✅ Parallel Processing – 3x faster for large PDFs
  • ✅ Intelligent Caching – 50% faster on re-runs
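
One common way to get faster re-runs is to key extracted text by a hash of the file contents, so an unchanged PDF is never processed twice. The sketch below assumes that approach. The article does not say how Skill Seekers implements its cache, and ExtractionCache is a hypothetical name.

```python
import hashlib
from typing import Callable


class ExtractionCache:
    """In-memory cache keyed by the SHA-256 of the PDF bytes."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0

    def get_or_extract(self, pdf_bytes: bytes, extract: Callable[[bytes], str]) -> str:
        key = hashlib.sha256(pdf_bytes).hexdigest()
        if key in self._store:
            # Same bytes seen before: skip the expensive extraction step.
            self.hits += 1
            return self._store[key]
        result = extract(pdf_bytes)
        self._store[key] = result
        return result
```

Hashing the contents rather than the filename means a renamed file still hits the cache, while an edited file with the same name does not.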

🎬 Video Extraction

  • ✅ YouTube & Local Videos – Extract transcripts, on-screen code, and structured knowledge from videos
  • ✅ Visual Frame Analysis – OCR extraction from code editors, terminals, slides, and diagrams
  • ✅ GPU Auto-Detection – Automatically installs correct PyTorch build (CUDA/ROCm/MPS/CPU)
  • ✅ AI Enhancement – Two-pass: clean OCR artifacts + generate polished SKILL.md
  • ✅ Time Clipping – Extract specific sections with --start-time and --end-time
  • ✅ Playlist Support – Batch process all videos in a YouTube playlist
  • ✅ Vision API Fallback – Use Claude Vision for low-confidence OCR frames
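
Time clipping comes down to keeping only the transcript segments that overlap a window. The sketch below illustrates that logic. It assumes timestamps in SS, MM:SS, or HH:MM:SS form and segments stored as dicts with start and end in seconds; the formats that --start-time and --end-time actually accept are not stated in this article, and both function names are hypothetical.

```python
def parse_timestamp(value: str) -> int:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' to a number of seconds."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def clip_segments(segments: list[dict], start_time: str, end_time: str) -> list[dict]:
    """Keep segments that overlap the half-open window [start_time, end_time)."""
    start, end = parse_timestamp(start_time), parse_timestamp(end_time)
    return [s for s in segments if s["end"] > start and s["start"] < end]
```

A segment that merely touches the window boundary is excluded, which avoids pulling in a neighbouring section by one frame.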

🐙 GitHub Repository Analysis

  • ✅ Deep Code Analysis – AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
  • ✅ API Extraction – Functions, classes, methods with parameters and types
  • ✅ Repository Metadata – README, file tree, language breakdown, stars/forks
  • ✅ GitHub Issues & PRs – Fetch open/closed issues with labels and milestones
  • ✅ CHANGELOG & Releases – Automatically extract version history
  • ✅ Conflict Detection – Compare documented APIs vs actual code implementation
  • ✅ MCP Integration – Natural language: “Scrape GitHub repo facebook/react”
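
For a feel of what AST-based API extraction involves, here is a minimal sketch using Python's standard ast module. It only handles Python source and only positional parameters, and it is an illustration of the technique rather than Skill Seekers' own analyzer; extract_api is a hypothetical name.

```python
import ast


def extract_api(source: str) -> list[dict]:
    """List top-level functions and class methods with parameter names and annotations."""
    tree = ast.parse(source)
    api: list[dict] = []

    def describe(func, owner=None) -> dict:
        # Only positional parameters are covered; *args, **kwargs and
        # keyword-only parameters are ignored to keep the sketch short.
        params = [
            {"name": a.arg, "type": ast.unparse(a.annotation) if a.annotation else None}
            for a in func.args.args
        ]
        return {
            "name": f"{owner}.{func.name}" if owner else func.name,
            "params": params,
            "returns": ast.unparse(func.returns) if func.returns else None,
        }

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            api.append(describe(node))
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    api.append(describe(item, owner=node.name))
    return api
```

Because the code is parsed rather than run, this works on repositories whose dependencies are not installed.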

🔄 Unified Multi-Source Scraping

  • ✅ Combine Multiple Sources – Mix documentation + GitHub + PDF in one skill
  • ✅ Conflict Detection – Automatically finds discrepancies between docs and code
  • ✅ Intelligent Merging – Rule-based or AI-powered conflict resolution
  • ✅ Transparent Reporting – Side-by-side comparison with ⚠️ warnings
  • ✅ Documentation Gap Analysis – Identifies outdated docs and undocumented features
  • ✅ Single Source of Truth – One skill showing both intent (docs) and reality (code)
  • ✅ Backward Compatible – Legacy single-source configs still work
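
Conflict detection between docs and code can be pictured as a set comparison: what the docs mention, what the code defines, and where the two disagree. The sketch below assumes each side has already been reduced to a mapping from API name to parameter list. The find_conflicts function and its report keys are hypothetical, not the tool's real report format.

```python
def find_conflicts(
    documented: dict[str, list[str]],
    implemented: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Compare documented APIs against implemented ones and report the differences."""
    doc_names, code_names = set(documented), set(implemented)
    return {
        # Mentioned in the docs but absent from the code: likely outdated docs.
        "documented_but_missing": sorted(doc_names - code_names),
        # Present in the code but never documented: a documentation gap.
        "undocumented": sorted(code_names - doc_names),
        # Present on both sides but with different parameter lists.
        "signature_mismatch": sorted(
            name for name in doc_names & code_names
            if documented[name] != implemented[name]
        ),
    }
```

The first two lists correspond to the "outdated docs" and "undocumented features" cases from the gap analysis bullet; the third is the intent-versus-reality mismatch.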

🤖 Multi-LLM Platform Support

  • ✅ 12 LLM Platforms – Claude AI, Google Gemini, OpenAI ChatGPT, MiniMax AI, Generic Markdown, OpenCode, Kimi (Moonshot AI), DeepSeek AI, Qwen (Alibaba), OpenRouter, Together AI, Fireworks AI
  • ✅ Universal Scraping – Same documentation works for all platforms
  • ✅ Platform-Specific Packaging – Optimized formats for each LLM
  • ✅ One-Command Export – --target flag selects platform
  • ✅ Optional Dependencies – Install only what you need
  • ✅ 100% Backward Compatible – Existing Claude workflows unchanged

How to use it (The Workflow)

To stop the “Antigravity” effect and get exactly what you need, you should follow these three steps:

1. Ingest (The “Create” Command)

You tell the tool where the information is. If your design skills are messy, you likely need to re-run the creation from a clean source of truth.

  • Example: skill-seekers create https://docs.your-design-tool.com/
  • This command crawls the site and builds a structured folder of knowledge.

2. Enhance (The “AI Agent” Phase)

This is likely where your “messy” designs are coming from. Skill Seekers uses an AI (like Claude or Gemini) to “write” the instructions for your design skill.

  • If it’s messy, you can tell it to use a specific YAML preset.
  • Use the --agent flag to choose a different “brain” to organize the data. For example, using --agent codex might give more precise technical results than a general model.

3. Package (The “Target” Command)

Once the data is clean, you “package” it for the specific AI you are using.

  • For Claude: skill-seekers package output/my-design --target claude
  • For Cursor (AI Coding): skill-seekers package output/my-design --target cursor
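
If you package for more than one AI, it helps to script the commands instead of retyping them. The sketch below only builds and prints the command lines from this article, so it is a dry run that is safe to execute without Skill Seekers installed. The build_workflow helper is a hypothetical name, and the Enhance step is left out because the article does not show its exact command.

```python
import shlex


def build_workflow(source_url: str, output_dir: str, targets: list[str]) -> list[list[str]]:
    """Return the create command followed by one package command per target."""
    commands = [["skill-seekers", "create", source_url]]
    for target in targets:
        commands.append(["skill-seekers", "package", output_dir, "--target", target])
    return commands


# Dry run: print each command instead of executing it.
for argv in build_workflow(
    "https://docs.your-design-tool.com/",
    "output/my-design",
    ["claude", "cursor"],
):
    print(shlex.join(argv))
```

To run the commands for real, pass each argv list to subprocess.run(argv, check=True) so the script stops at the first failing step.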

Why your designs might be “Messy” (Troubleshooting)

Given the documentation, here are three reasons your “Antigravity” designs are failing:

  1. Poor Source Data: If the documentation you scraped with skill-seekers create was incomplete, the AI is “hallucinating” the gaps. Try adding more sources (like a PDF manual or a YouTube tutorial) into the same project.
  2. Missing “SKILL.md” Logic: Skill Seekers generates a file called SKILL.md, which is the “brain” of your skill. If your designs are messy, open this file and edit it by hand to add the rules your designs need, such as “Preserve Geometries” or other constraints.
  3. Target Mismatch: You might be packaging the data for the wrong system. Ensure you are using --target gemini if you are using Google’s AI, or --target claude for Anthropic.

Quick Start for your Design Skill:

If you want to try a “clean install” of a skill to see if it fixes the mess:

  1. Open your terminal.
  2. Install: pip install skill-seekers
  3. Create from the official docs: skill-seekers create https://www.atlassian.com/work-management/knowledge-sharing/documentation/software-design-document
  4. Package it: skill-seekers package output/[folder] --target [your AI]

License

MIT License