Learn how to connect PerfectParser to your custom scripts and software to automate document extraction.

Welcome! This guide walks through a complete end-to-end integration — from creating a parser to downloading extracted JSON.

For Zapier, Make, n8n, and webhook patterns, see Automation Integrations.

How it works

1. Configure Parser

Create a parser and upload a sample document to auto-detect fields.

2. Submit Documents

Upload files for extraction against your parser.

3. Get Results

Poll status or use webhooks, then download structured JSON.
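Step 3 can be wrapped in a small helper that waits longer between polls and gives up after a deadline, so it never loops forever. This is a minimal sketch. wait_for_extraction and its parameters are hypothetical names. It assumes the in-progress statuses are queued, pending, and processing, which are the values the end-to-end script checks. Any other value is returned as terminal.

```python
import time
from typing import Callable

# In-progress statuses, as checked by the end-to-end script.
IN_PROGRESS = {"queued", "pending", "processing"}


def wait_for_extraction(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a status that is not in progress.

    The delay doubles after each in-progress result (capped at max_delay).
    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status not in IN_PROGRESS:
            return status  # terminal; the script treats "failed" as an error
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status!r}; {timeout}s deadline reached")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

You can use the script's variables for fetch_status. Pass a lambda that sends a GET to /v1/extractions/{extraction_id} and returns response.json()["extraction"]["status"].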

Prerequisites

API key: generate one under Integrations → API Keys (it starts with pp_live_).
Sample PDF: place a PDF in your working directory. The script reads sample_document.pdf by default. To use a different file, change FILE_PATH.
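The script hard-codes API_KEY for brevity. To keep the key out of source control, you can read it from an environment variable instead. This is a minimal sketch. It assumes a hypothetical variable named PERFECTPARSER_API_KEY, and load_api_key is also an illustrative name. The prefix check only mirrors the documented pp_live_ prefix. It catches obvious copy-paste mistakes, but not revoked or mistyped keys.

```python
import os


# Hypothetical variable name; use whatever your deployment already uses.
ENV_VAR = "PERFECTPARSER_API_KEY"
KEY_PREFIX = "pp_live_"  # prefix documented for API keys


def load_api_key(environ=os.environ) -> str:
    """Read the API key from the environment and sanity-check its prefix."""
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} to your PerfectParser API key.")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(
            f"{ENV_VAR} does not start with {KEY_PREFIX!r}; check the key you copied."
        )
    return key
```

In the script, API_KEY = load_api_key() would replace both the hard-coded value and the placeholder check.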

End-to-end script

The script below does the following:

Verify your key with GET /v1/me (optional)
Create a parser
Detect fields via multipart upload (one step — no separate sample-file endpoint)
Submit an extraction with Idempotency-Key
Poll until complete
Fetch all results

import json
import os
import time
import uuid

import requests

API_KEY = "pp_live_your_api_key_here"
BASE_URL = "https://api.perfectparser.com"
FILE_PATH = "sample_document.pdf"
TIMEOUT = 60        # seconds allowed for each HTTP request
POLL_INTERVAL = 3   # seconds between status checks
MAX_WAIT = 600      # stop polling after this many seconds

if API_KEY == "pp_live_your_api_key_here":
    raise SystemExit("Replace API_KEY with your actual key.")

if not os.path.exists(FILE_PATH):
    raise SystemExit(f"File not found: {FILE_PATH}")

headers = {"X-API-Key": API_KEY}

# Optional: verify the key and check credits
me = requests.get(f"{BASE_URL}/v1/me", headers=headers, timeout=TIMEOUT)
me.raise_for_status()  # an invalid key fails here, not with a KeyError below
print(f"Credits available: {me.json()['user']['credits_available']}")

# Step 1: Create parser (json= sets the Content-Type header automatically)
create = requests.post(
    f"{BASE_URL}/v1/parsers",
    headers=headers,
    json={"name": f"auto-parser-{int(time.time())}"},
    timeout=TIMEOUT,
)
create.raise_for_status()
parser_id = create.json()["parser"]["parser_id"]

# Step 2: Detect fields (multipart: upload + detect in one call)
with open(FILE_PATH, "rb") as f:
    detect = requests.post(
        f"{BASE_URL}/v1/parsers/{parser_id}/detect-fields",
        headers=headers,
        files={"file": f},
        timeout=TIMEOUT,
    )
detect.raise_for_status()
detect_data = detect.json()
print(f"Schema fields: {list(detect_data['schema']['properties'].keys())}")
print(f"Sample file ID: {detect_data['sample_file_id']}")

# Step 3: Submit extraction (a fresh Idempotency-Key for each new submission)
with open(FILE_PATH, "rb") as f:
    extract = requests.post(
        f"{BASE_URL}/v1/extractions",
        headers={**headers, "Idempotency-Key": str(uuid.uuid4())},
        data={"parser_id": parser_id},
        files={"files": f},
        timeout=TIMEOUT,
    )
extract.raise_for_status()
extraction = extract.json()["extraction"]
extraction_id = extraction["extraction_id"]
print(f"Extraction ID: {extraction_id}")

# Step 4: Poll until the status leaves the in-progress states (or MAX_WAIT passes)
status = extraction["status"]
deadline = time.monotonic() + MAX_WAIT
while status in ("queued", "pending", "processing"):
    if time.monotonic() > deadline:
        raise SystemExit(f"Gave up after {MAX_WAIT}s; last status: {status}")
    time.sleep(POLL_INTERVAL)
    status_res = requests.get(
        f"{BASE_URL}/v1/extractions/{extraction_id}", headers=headers, timeout=TIMEOUT
    )
    status_res.raise_for_status()
    status = status_res.json()["extraction"]["status"]
    print(f"Status: {status}")

if status == "failed":
    raise SystemExit("Extraction failed.")

# Step 5: Fetch results for every document in the extraction
results = requests.get(
    f"{BASE_URL}/v1/extractions/{extraction_id}/results",
    headers=headers,
    timeout=TIMEOUT,
)
results.raise_for_status()
documents = results.json()["extraction"]["documents"]

print("\n=== EXTRACTED DATA ===")
for doc in documents:
    print(f"File: {doc['filename']}")
    print(json.dumps(doc.get("extracted_data", {}), indent=2))
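The script generates a new Idempotency-Key on each run. The header matters most when a submission has to be retried after a timeout or dropped connection. If every attempt reuses the same key, the server can recognize the duplicate instead of starting a second extraction. The sketch below assumes PerfectParser deduplicates on this header in the usual way; Automation Integrations covers its exact behavior. submit_with_retries and its parameters are hypothetical names. It retries only network errors and 5xx responses.

```python
import time
import uuid
from typing import Any, Callable, Dict


def submit_with_retries(
    post: Callable[[Dict[str, str]], Any],
    attempts: int = 3,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call post(extra_headers) up to `attempts` times with ONE Idempotency-Key.

    `post` performs the upload and returns an object with a `status_code`
    attribute (for example a requests.Response).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Created once, outside the loop, so every retry carries the same key.
    extra_headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, attempts + 1):
        try:
            resp = post(extra_headers)
        except OSError:
            # requests' ConnectionError and Timeout both subclass OSError.
            if attempt == attempts:
                raise
        else:
            # Retry only 5xx responses; hand everything else back to the caller.
            if resp.status_code < 500 or attempt == attempts:
                return resp
        sleep(backoff * attempt)  # linear backoff: 2s, 4s, ...
```

The post callable should open the file inside the function so every attempt uploads from a fresh handle. It can wrap the requests.post call from Step 3, merging extra_headers into headers.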

Fetch a single document (advanced)

If you only need one file's result (e.g. from a webhook payload), use the nested document endpoint with both IDs:

GET /v1/extractions/{extraction_id}/documents/{document_id}
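Here is a minimal Python sketch of that call, reusing the X-API-Key header from the script. document_url and get_document are illustrative names. The session argument is assumed to be a requests.Session, or anything with a compatible get method. The response JSON is returned as-is because its shape is documented in Get Document rather than here.

```python
from typing import Any
from urllib.parse import quote

BASE_URL = "https://api.perfectparser.com"


def document_url(extraction_id: str, document_id: str, base_url: str = BASE_URL) -> str:
    """Build GET /v1/extractions/{extraction_id}/documents/{document_id}."""
    # Percent-encode each ID so an unexpected character cannot change the path.
    return (
        f"{base_url}/v1/extractions/{quote(extraction_id, safe='')}"
        f"/documents/{quote(document_id, safe='')}"
    )


def get_document(session: Any, api_key: str, extraction_id: str, document_id: str) -> Any:
    """Fetch one document's result; both IDs are required."""
    resp = session.get(
        document_url(extraction_id, document_id),
        headers={"X-API-Key": api_key},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # shape described in Get Document, not assumed here
```

For example, call get_document(requests.Session(), API_KEY, extraction_id, document_id) with IDs taken from a webhook payload or from an earlier results call.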

See Get Document and Automation Integrations.

Next steps

Automation Integrations — webhooks, polling, idempotency
OpenAPI spec — machine-readable API definition
Webhooks — real-time delivery instead of polling
