What It Does
The PDF skill gives your AI agent a full suite of PDF manipulation capabilities powered by Python libraries (`pypdf`, `pdfplumber`, `reportlab`) and command-line tools (`qpdf`, `pdftotext`, `pdftk`).
It covers reading and extracting structured data, creating new documents from scratch, merging or splitting files, adding watermarks, encrypting with passwords, and filling PDF forms.
Install this skill when you need to programmatically process, generate, or analyze PDF documents at scale.
Key Features
- Text & Table Extraction — Uses `pdfplumber` to extract plain text with layout preservation and structured tables from any page. Tables can be exported directly to `pandas` DataFrames and saved as Excel files for downstream analysis.
- PDF Creation with reportlab — Generate new PDF documents from scratch using `reportlab`'s canvas API or the higher-level Platypus document engine. Supports multi-page reports, headings, paragraphs, spacing, and page breaks.
- Merge, Split & Rotate — Combine multiple PDFs into one, split a document into individual pages or page ranges, and rotate pages in 90-degree increments, all via `pypdf` in Python or `qpdf`/`pdftk` on the command line.
- OCR for Scanned PDFs — Converts scanned, image-based PDFs to images with `pdf2image` and runs `pytesseract` OCR on each page, recovering machine-readable text from documents that contain no embedded text layer.
- Watermarking & Password Protection — Overlay a watermark page onto every page of a document using `pypdf`'s `merge_page` API. Encrypt PDFs with separate user and owner passwords, or decrypt password-protected files with `qpdf`.
- PDF Form Handling — Supports programmatic form filling via `pypdf` or the JavaScript `pdf-lib` library (detailed in the skill's `forms.md`). Suitable for automating submission workflows that require populating standard PDF forms.
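
As a rough illustration of the split and rotate features listed above, the following sketch copies a selection of pages into a new file with `pypdf`. The helper names `parse_page_ranges` and `extract_pages` and the `"1-3,5"` range syntax are hypothetical and are not part of the skill. The sketch assumes a recent `pypdf` release, where page rotation accepts only multiples of 90 degrees.

```python
from typing import List


def parse_page_ranges(spec: str, total_pages: int) -> List[int]:
    """Turn a 1-based spec such as "1-3,5" into zero-based page indices.

    The spec syntax is an assumption made for this example.
    """
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"page range {part!r} is outside 1..{total_pages}")
        indices.extend(range(start - 1, end))
    return indices


def extract_pages(src_path: str, dst_path: str, spec: str, rotate_by: int = 0) -> None:
    """Copy the selected pages into a new PDF, optionally rotating them."""
    # Third-party dependency, imported lazily so the parser above stays usable without it.
    from pypdf import PdfReader, PdfWriter

    if rotate_by % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        page = reader.pages[index]
        if rotate_by:
            page.rotate(rotate_by)  # clockwise rotation in degrees
        writer.add_page(page)
    with open(dst_path, "wb") as handle:
        writer.write(handle)
```

A call such as `extract_pages("report.pdf", "excerpt.pdf", "1-3,5", rotate_by=90)` would write pages 1, 2, 3, and 5 of a hypothetical `report.pdf` to `excerpt.pdf`, each rotated a quarter turn.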
Requirements
- **Python runtime** — Required. Libraries used: `pypdf`, `pdfplumber`, `reportlab`, `pandas`, `pdf2image`, `pytesseract`.
- **Tesseract OCR binary** *(optional)* — Required only for OCR on scanned PDFs. Must be installed separately on the host system.
- **poppler-utils** *(optional)* — Provides `pdftotext` and `pdfimages` command-line tools for text and image extraction.
- **qpdf** *(optional)* — Command-line tool for merging, splitting, rotating, and decrypting PDFs.
- **pdftk** *(optional)* — Alternative command-line tool for merging, splitting, and rotating PDFs, if available on the host.
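
One way to see which of these requirements a host already satisfies is a small check built on the Python standard library. This is a minimal sketch and not part of the skill. The function name `check_environment` is hypothetical, and the sketch assumes the tools are installed under their usual executable names (`tesseract`, `pdftotext`, `pdfimages`, `qpdf`, `pdftk`).

```python
import importlib.util
import shutil
from typing import Dict

# Import names of the Python libraries listed above.
PYTHON_LIBRARIES = ["pypdf", "pdfplumber", "reportlab", "pandas", "pdf2image", "pytesseract"]
# Executable names are assumed; some platforms may package them differently.
CLI_TOOLS = ["tesseract", "pdftotext", "pdfimages", "qpdf", "pdftk"]


def check_environment() -> Dict[str, bool]:
    """Report which libraries are importable and which tools are on PATH."""
    status: Dict[str, bool] = {}
    for name in PYTHON_LIBRARIES:
        status[name] = importlib.util.find_spec(name) is not None
    for tool in CLI_TOOLS:
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, available in check_environment().items():
        print(f"{name:12} {'found' if available else 'missing'}")
```

The check only confirms that a module or executable can be located. It does not verify versions or that the tool actually runs.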
Use Cases
- Automated report generation — An agent pulls data from an API or database, formats it using `reportlab`, and produces a branded multi-page PDF report — without any human touching a word processor.
- Bulk invoice or contract data extraction — An agent iterates over hundreds of PDF invoices, uses `pdfplumber` to extract line-item tables, and writes the structured results to a spreadsheet or database for accounting or compliance review.
- Scanned document digitization — An agent receives scanned PDFs (e.g., paper forms or legacy records), converts each page to an image, runs OCR with `pytesseract`, and stores the extracted text for search or further processing.
- PDF form auto-fill pipeline — An agent reads form field definitions from a PDF template, populates them with data from a CRM or spreadsheet, and outputs completed, ready-to-sign PDF forms — following the workflow described in `forms.md`.
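
The bulk extraction use case above could be sketched roughly as follows. The function names, the `source_file` column, and the rule that the first row of each table is a header are assumptions made for illustration. Real invoices often need per-layout handling. The sketch uses `pdfplumber`'s `extract_tables()` and writes the result with the standard `csv` module.

```python
import csv
from typing import Dict, Iterable, List, Optional

Table = List[List[Optional[str]]]


def table_to_records(table: Table) -> List[Dict[str, str]]:
    """Treat the first row as the header and map each later row to a dict."""
    if len(table) < 2:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records: List[Dict[str, str]] = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # skip rows that are entirely empty
            records.append(dict(zip(header, cells)))
    return records


def extract_records(pdf_paths: Iterable[str]) -> List[Dict[str, str]]:
    """Collect table rows from every page of every PDF."""
    import pdfplumber  # third-party dependency, imported lazily

    records: List[Dict[str, str]] = []
    for path in pdf_paths:
        with pdfplumber.open(path) as pdf:
            for page in pdf.pages:
                for table in page.extract_tables():
                    for record in table_to_records(table):
                        record["source_file"] = path  # hypothetical provenance column
                        records.append(record)
    return records


def write_csv(records: List[Dict[str, str]], csv_path: str) -> None:
    """Write the records to CSV, using the union of all keys as columns."""
    fieldnames: List[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
```

Keeping `table_to_records` separate from the file handling makes the header and empty-row rules easy to test without any PDF on disk.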