Pdf

Name: Pdf
Author: awspace

A comprehensive PDF toolkit for extracting text and tables, and for creating, merging, splitting, watermarking, and filling in PDF forms.

awspace · v1.0.0

Tags: Productivity & Tasks, Productivity, AI Powered, Open Source, Automation, Developer Tool

npx clawhub@latest install pdf

Current installs: 178

Version: v1.0.0

View source (ClawHub)

What It Does

Install this skill when you need to programmatically process, generate, or analyze PDF documents at scale.

It handles everything from reading and extracting structured data to creating new documents from scratch, merging or splitting files, adding watermarks, encrypting with passwords, and filling PDF forms.

The PDF skill gives your AI agent a full suite of PDF manipulation capabilities powered by Python libraries (`pypdf`, `pdfplumber`, `reportlab`) and command-line tools (`qpdf`, `pdftotext`, `pdftk`).

Key Features

Text & Table Extraction — Uses `pdfplumber` to extract plain text with layout preservation and structured tables from any page. Tables can be exported directly to `pandas` DataFrames and saved as Excel files for downstream analysis.
PDF Creation with reportlab — Generate new PDF documents from scratch using `reportlab`'s canvas API or the higher-level Platypus document engine. Supports multi-page reports, headings, paragraphs, spacing, and page breaks.
Merge, Split & Rotate — Combine multiple PDFs into one, split a document into individual pages or page ranges, and rotate pages in 90-degree increments, all via `pypdf` in Python or `qpdf`/`pdftk` on the command line.
OCR for Scanned PDFs — Converts scanned, image-based PDFs to images with `pdf2image` and runs `pytesseract` OCR on each page, recovering machine-readable text from documents that contain no embedded text layer.
Watermarking & Password Protection — Overlay a watermark page onto every page of a document using `pypdf`'s `merge_page` API. Encrypt PDFs with separate user and owner passwords, or decrypt password-protected files with `qpdf`.
PDF Form Handling — Supports programmatic form filling via `pypdf` or the JavaScript `pdf-lib` library (detailed in the skill's `forms.md`). Suitable for automating submission workflows that require populating standard PDF forms.
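
The merge, split, rotate, watermark, and encryption features listed above can be sketched with `pypdf` roughly as follows. This is a minimal sketch under stated assumptions, not the skill's own code: the function names (`parse_page_ranges`, `merge_pdfs`, `extract_pages`, `add_watermark`) and the `"1-3,5"` range syntax are hypothetical, and it assumes a recent `pypdf` release in which `PdfWriter.add_page` returns the page it added. `pypdf` is imported inside the functions so the range parser works even where the library is not installed.

```python
def parse_page_ranges(spec: str) -> list[int]:
    """Turn a 1-based spec such as "1-3,5" into 0-based page indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        first = int(start)
        last = int(end) if end else first
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        indices.extend(range(first - 1, last))
    return indices


def _save(writer, output_path, user_password=None, owner_password=None):
    """Write the PDF, encrypting it first when a user password is given."""
    if user_password is not None:
        writer.encrypt(user_password, owner_password)
    with open(output_path, "wb") as fh:
        writer.write(fh)


def merge_pdfs(input_paths, output_path):
    """Append every page of every input file, in order, to one output file."""
    from pypdf import PdfReader, PdfWriter  # requires `pip install pypdf`

    writer = PdfWriter()
    for path in input_paths:
        for page in PdfReader(path).pages:
            writer.add_page(page)
    _save(writer, output_path)


def extract_pages(input_path, output_path, spec, rotate_by=0):
    """Copy the pages named by `spec`, optionally rotating each one."""
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec):
        page = writer.add_page(reader.pages[index])
        if rotate_by:
            page.rotate(rotate_by)  # pypdf expects a multiple of 90
    _save(writer, output_path)


def add_watermark(input_path, watermark_path, output_path,
                  user_password=None, owner_password=None):
    """Overlay the first page of `watermark_path` on every page of the input."""
    from pypdf import PdfReader, PdfWriter

    stamp = PdfReader(watermark_path).pages[0]
    writer = PdfWriter()
    for page in PdfReader(input_path).pages:
        writer.add_page(page).merge_page(stamp)
    _save(writer, output_path, user_password, owner_password)
```

With these helpers, a call such as `extract_pages("report.pdf", "excerpt.pdf", "1-3,5", rotate_by=90)` would copy pages 1, 2, 3, and 5 into a new file and turn each a quarter-turn clockwise; the file names here are placeholders.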

Requirements

- **Python runtime** — Required. Libraries used: `pypdf`, `pdfplumber`, `reportlab`, `pandas`, `pdf2image`, `pytesseract`.
- **Tesseract OCR binary** *(optional)* — Required only for OCR on scanned PDFs. Must be installed separately on the host system.
- **poppler-utils** *(optional)* — Provides the `pdftotext` and `pdfimages` command-line tools for text and image extraction. `pdf2image` also relies on poppler, so install it if you plan to use the OCR path.
- **qpdf** *(optional)* — Command-line tool for merging, splitting, rotating, and decrypting PDFs.
- **pdftk** *(optional)* — Alternative command-line tool for merging, splitting, and rotating PDFs, if available on the host.
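
Because most of these dependencies are optional, it can help to check what the host actually provides before choosing a code path. The following is a minimal sketch using only the Python standard library. The function name `check_requirements` is hypothetical, and the executable names (`tesseract`, `pdftotext`, `pdfimages`, `qpdf`, `pdftk`) are the usual binary names for the tools listed above, which may differ on some systems.

```python
import importlib.util
import shutil

# Module and executable names taken from the requirements list above.
PYTHON_MODULES = ["pypdf", "pdfplumber", "reportlab", "pandas", "pdf2image", "pytesseract"]
CLI_TOOLS = ["tesseract", "pdftotext", "pdfimages", "qpdf", "pdftk"]


def check_requirements(modules=PYTHON_MODULES, tools=CLI_TOOLS):
    """Return (missing_modules, missing_tools) without importing any module."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    missing_modules, missing_tools = check_requirements()
    print("Missing Python modules:", ", ".join(missing_modules) or "none")
    print("Missing command-line tools:", ", ".join(missing_tools) or "none")
```

Only the Python libraries are strictly required; a missing command-line tool simply rules out the features that depend on it.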

Use Cases

Automated report generation — An agent pulls data from an API or database, formats it using `reportlab`, and produces a branded multi-page PDF report — without any human touching a word processor.
Bulk invoice or contract data extraction — An agent iterates over hundreds of PDF invoices, uses `pdfplumber` to extract line-item tables, and writes the structured results to a spreadsheet or database for accounting or compliance review.
Scanned document digitization — An agent receives scanned PDFs (e.g., paper forms or legacy records), converts each page to an image, runs OCR with `pytesseract`, and stores the extracted text for search or further processing.
PDF form auto-fill pipeline — An agent reads form field definitions from a PDF template, populates them with data from a CRM or spreadsheet, and outputs completed, ready-to-sign PDF forms — following the workflow described in `forms.md`.
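
The bulk extraction and scanned-document use cases above can be sketched as follows. This is an illustrative sketch, not the skill's own implementation: the function names are hypothetical, and it assumes that the first row of each table returned by `pdfplumber` is a header row, which will not hold for every invoice layout. Third-party libraries are imported inside the functions that need them, so the row-cleaning helper runs on its own.

```python
def table_to_records(table):
    """Convert one table (list of rows, first row = header) into dicts.

    pdfplumber represents empty cells as None, so cells are normalised to
    stripped strings and fully empty rows are dropped.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):
            records.append(dict(zip(header, cells)))
    return records


def extract_table_records(pdf_path):
    """Collect records from every table on every page, tagged with page number."""
    import pdfplumber  # requires `pip install pdfplumber`

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                for record in table_to_records(table):
                    record["page"] = page_number
                    records.append(record)
    return records


def ocr_text(pdf_path, dpi=300):
    """OCR a scanned PDF; returns one string of recognised text per page."""
    from pdf2image import convert_from_path  # needs poppler on the host
    import pytesseract  # needs the Tesseract binary on the host

    images = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(image) for image in images]


def save_to_excel(records, output_path):
    """Write the records to an .xlsx file via pandas (needs e.g. openpyxl)."""
    import pandas as pd

    pd.DataFrame(records).to_excel(output_path, index=False)
```

An agent working through a folder of invoices could call `extract_table_records` on each file, concatenate the results, and pass them to `save_to_excel`, falling back to `ocr_text` for files that have no embedded text layer.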

Installation

Run in your terminal:

npx clawhub@latest install pdf

or

click the install button at the top of this page for one-click setup.
