Docsense developer documentation · v1

Documents in, structured data out.

Send a document and a field schema; get JSON back with a calibrated per-field confidence score and — as an add-on — bounding-box locations. Delivered as a REST API and an embeddable widget your users will actually trust.

BASE URL https://api.docsense.energyhub.cloud/v1  ·  AUTH Authorization: Bearer sk_test_… | sk_live_…
Errors are RFC 9457 application/problem+json · Idempotency-Key honored on POSTs · EU-only processing (GDPR Art. 28 processor).
Console: sign in with email + password (invite-only; your workspace admin sends one-time links). Workspace administrators create and rotate API keys in the console's Chiavi API (API keys) section.
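
Since errors arrive as RFC 9457 problem documents and POSTs honor Idempotency-Key, a client usually wants two small helpers. The following is a minimal sketch, not part of any Docsense SDK: the names Problem, parse_problem and new_idempotency_key are hypothetical, the five members are the standard ones RFC 9457 defines (Docsense may add extension members), and using a UUID as the key is an assumption because the accepted key format is not stated here.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    """The standard RFC 9457 members; every one of them is optional."""
    type: str = "about:blank"          # RFC 9457 default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None


def parse_problem(content_type: str, body: bytes) -> Optional[Problem]:
    """Return a Problem for application/problem+json bodies, else None."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/problem+json":
        return None
    doc = json.loads(body)
    standard = ("type", "title", "status", "detail", "instance")
    return Problem(**{k: doc[k] for k in standard if k in doc})


def new_idempotency_key() -> str:
    """Generate one key per logical operation (UUID format is an assumption)."""
    return str(uuid.uuid4())
```

Generate the key once per logical POST and resend the same value on every retry of that POST; a fresh key on each retry would defeat the deduplication.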

Quickstart — extract in two calls

The curl examples append /v1 themselves, so set $BASE to the host (https://api.docsense.energyhub.cloud, without the /v1 suffix) and $KEY to a secret key.

1 · Publish a template (immutable versions)

curl -X POST $BASE/v1/templates \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{
    "name": "electricity-invoice",
    "language_hint": "it",
    "features": {"grounding": true},
    "fields": [
      {"key": "holder_name",  "type": "string", "label": "Intestatario", "required": true},
      {"key": "pod",          "type": "string", "label": "Codice POD", "validate": "pod"},
      {"key": "iban",         "type": "string", "validate": "iban"},
      {"key": "total_amount", "type": "number", "required": true},
      {"key": "due_date",     "type": "date"}
    ]
  }'

Scalar types: string · number · integer · boolean · date · enum, plus group (flat object) and table (rows of scalar cells). Validators: iban · codice_fiscale · partita_iva · pod · pdr · cap · date · email — results are returned per field as checks and feed the confidence score.
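
To give a feel for what a validator such as iban can establish, here is a sketch of the standard IBAN checksum (the ISO 13616 mod-97 test). It is an illustration to run on your own side, for example before prefilling a form. It is not Docsense's implementation, the function name is hypothetical, and the service's iban validator may check more, such as country-specific lengths.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Return True if the string passes the generic IBAN mod-97 checksum."""
    s = "".join(iban.split()).upper()            # drop spaces, normalise case
    if not (15 <= len(s) <= 34) or not s.isascii() or not s.isalnum():
        return False
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False                              # country code + check digits
    rearranged = s[4:] + s[:4]                    # move the first four chars to the end
    # Letters map to numbers: A=10 ... Z=35; digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

Like the raw_text to value normalisation in the extraction response, the check ignores spacing, so "GB82 WEST 1234 5698 7654 32" and "GB82WEST12345698765432" are treated identically.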

2 · Run an extraction

curl -X POST $BASE/v1/extractions \
  -H "Authorization: Bearer $KEY" \
  -F "file=@bolletta.pdf" -F "template=electricity-invoice" -F "mode=sync"
{
  "status": "completed",
  "fields": {
    "pod": {
      "value": "IT001E12345678",
      "raw_text": "IT 001E 1234 5678",
      "confidence": 0.99,
      "status": "found",
      "checks": [{"name": "pod", "passed": true}],
      "locations": [{"page": 1, "bbox": {"x": 0.155, "y": 0.178, "w": 0.129, "h": 0.012}}]
    }
  },
  "usage": {"pages": 1},
  "processing": {"engine": "haiku", "duration_ms": 4800}
}

Use mode=async (documents over 10 pages always run async) and follow GET /v1/extractions/{id}/events, an SSE stream that emits each field as soon as it is extracted, followed by a final completed event.
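
Consuming the event stream comes down to parsing text/event-stream framing. The sketch below works on any iterable of lines, so it can sit on top of whichever HTTP client you use. The event name "field" and the payload shape (a JSON object with a "key" member) are assumptions made for illustration; only the final "completed" event is named in this page, so check the OpenAPI reference for the real names.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple

FIELD_EVENT = "field"          # assumed event name, not confirmed by this page
DONE_EVENT = "completed"


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line dispatches the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):          # comment line, often a keep-alive
            continue
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]           # one leading space is not part of the value
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


def collect_fields(lines: Iterable[str]) -> Dict[str, dict]:
    """Gather field payloads by key until the completed event arrives."""
    fields: Dict[str, dict] = {}
    for event, data in iter_sse(lines):
        if event == FIELD_EVENT:
            payload = json.loads(data)
            fields[payload["key"]] = payload    # assumed payload shape
        elif event == DONE_EVENT:
            break
    return fields
```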

Confidence bands used across the product: ≥ 0.95 confident · 0.70 to < 0.95 check · < 0.70 needs attention. Automate on the score; it is calibrated and never gated by plan. Bounding boxes (locations) are the paid grounding add-on; confidence quality is identical with or without it.
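
Automating on the score can be as simple as mapping each field to its band and routing it accordingly. This sketch uses the thresholds above and the fields object from the extraction response. The function names are hypothetical, and demoting a high-confidence field whose check failed is a local policy choice shown for illustration, not a documented Docsense rule.

```python
from typing import Dict, List


def band(confidence: float) -> str:
    """Map a confidence score to the product's band name."""
    if confidence >= 0.95:
        return "confident"
    if confidence >= 0.70:
        return "check"
    return "needs attention"


def triage(fields: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group field keys by band so each group can be routed differently."""
    out: Dict[str, List[str]] = {"confident": [], "check": [], "needs attention": []}
    for key, field in fields.items():
        b = band(field["confidence"])
        failed = any(not c["passed"] for c in field.get("checks", []))
        if b == "confident" and failed:
            b = "check"        # local policy: never auto-accept a failed check
        out[b].append(key)
    return out
```

A typical use is to write the "confident" group straight through and queue the other two groups for human review.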

File uploads & limits

Documents are capped at 50 MB on every path. To keep large uploads off your own request path (the bytes go straight to storage) — or to reuse one document across several extractions — upload the file first, then reference it by id:

curl -X POST $BASE/v1/files \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"filename": "bolletta.pdf", "content_type": "application/pdf"}'
# → { "id": "file_…", "upload": { "url": "https://…", "method": "PUT" }, … }

# upload the bytes to the returned URL (presigned S3 PUT, valid 15 minutes)
curl -X PUT "$UPLOAD_URL" -H "Content-Type: application/pdf" --data-binary @bolletta.pdf

curl -X POST $BASE/v1/extractions \
  -H "Authorization: Bearer $KEY" \
  -F "file_id=file_…" -F "template=electricity-invoice" -F "mode=async"

Check readiness with GET /v1/files/{id}. Uploaded files stay reusable for 24 hours and are then purged — every extraction keeps its own copy of the document under your workspace's retention policy.

Sync extractions (mode=sync) are limited per workspace by concurrency: responses carry X-RateLimit-Limit / X-RateLimit-Remaining, and a request over the cap gets a 429 with Retry-After. The async path absorbs bursts through the queue.
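
A client that may hit the sync concurrency cap should wait for the interval given in Retry-After instead of retrying immediately. The sketch below keeps the HTTP call abstract: send is a hypothetical callable you supply that performs one request and returns (status, headers, body). The one-second fallback and the limit of five attempts are arbitrary assumptions, and the header lookup assumes your client exposes the canonical "Retry-After" spelling.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]   # (status, headers, body)


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> float:
    """Parse Retry-After, which HTTP allows as delta-seconds or an HTTP-date."""
    if not value:
        return 1.0                                 # assumed fallback
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return 1.0
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it stops returning 429 or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        status, headers, _ = response
        if status != 429 or attempt == max_attempts:
            return response
        sleep(retry_after_seconds(headers.get("Retry-After")))
    raise AssertionError("unreachable")
```

When the workload is bursty by nature, switching to mode=async is likely the simpler choice, since the queue absorbs the burst.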

Embeddable widget

The widget gives your end users upload → live progressive extraction → review with document highlights → confirm, inside your page. No account, no cookies; the document goes from the iframe straight to Docsense, never through your origin.

1 · Mint a session (server-side, secret key)

curl -X POST $BASE/v1/widget/sessions \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{
    "template": "electricity-invoice",
    "allowed_origins": ["https://app.yourcompany.com"],
    "locale": "it",
    "consent": "notice",
    "upload_hint": "Carica la tua ultima bolletta"
  }'
# → { "token": "swt_…", "expires_at": "…" }   (TTL 30 min, refreshable in-widget)

The origin allowlist also drives the iframe's frame-ancestors CSP — only your listed origins can embed the session. The widget requires a template with features.grounding (the highlights are the widget).

2 · Embed (client-side)

<script src="https://api.docsense.energyhub.cloud/sdk/docex.js"></script>
<div id="docsense-slot"></div>
<script>
  var widget = DocEx.create({
    sessionToken: 'swt_…',              // from your backend
    container: '#docsense-slot',
    locale: 'it',
    appearance: { variables: { colorPrimary: '#0E8A6A', borderRadius: '8px' } }
  })
  widget.on('fields.confirmed', function (p) {
    // p.values = flat map of leaf keys → final, user-reviewed values
    form.nome.value = p.values.holder_name || ''
    form.pod.value  = p.values.pod || ''
  })
</script>
Events (widget → you): ready · upload.started · upload.progress · extraction.started · field.extracted (progressive) · extraction.completed · fields.confirmed · error · resize · exit
Commands (you → widget): prefill(values) · setLocale('it'|'en') · setTheme(appearance) · retry() · destroy()

TypeScript typings ship as @docex/js. All messages are namespaced docex.*, versioned, exact-origin. Try the full flow on the live demo page.

Webhooks

Get notified when an extraction finishes instead of polling. At-least-once delivery with exponential backoff (5 s → 30 s → 2 min → 10 min → 1 h, 6 attempts).
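
To estimate how long your endpoint can be down before a delivery is abandoned, the backoff schedule can be turned into attempt times. This is a back-of-the-envelope sketch that assumes the first attempt is immediate, that each listed delay is the wait before the next attempt, and that no jitter is applied; none of those details are stated on this page.

```python
from itertools import accumulate
from typing import List, Sequence

# Waits between consecutive attempts, in seconds: 5 s, 30 s, 2 min, 10 min, 1 h.
RETRY_DELAYS_S = (5, 30, 120, 600, 3600)


def attempt_offsets(delays: Sequence[int] = RETRY_DELAYS_S) -> List[int]:
    """Seconds after the first attempt at which each attempt is made."""
    return [0, *accumulate(delays)]


print(attempt_offsets())   # [0, 5, 35, 155, 755, 4355]
```

Under these assumptions the sixth and last attempt happens about 72.5 minutes after the first, so an outage longer than that would need the deliveries endpoint or GET /v1/extractions/{id} to catch up.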

curl -X POST $BASE/v1/webhook-endpoints \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"url": "https://app.yourcompany.com/hooks/docsense",
       "events": ["extraction.completed", "extraction.failed"]}'
# → { "id": "whe_…", "secret": "whsec_…" }   (secret shown once)

Each request carries webhook-id, webhook-timestamp and webhook-signature headers (svix-compatible scheme). Verify before trusting:

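# python
import base64, hmac, hashlib

def verify(secret: str, headers, body: bytes) -> bool:
    """Return True if any v1 signature in webhook-signature matches the raw body."""
    # The signing key is the base64 payload that follows the "whsec_" prefix.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    # Signed content is "<id>.<timestamp>.<body>", using the body bytes exactly as received.
    signed = f'{headers["webhook-id"]}.{headers["webhook-timestamp"]}.'.encode() + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header may list several space-separated "v1,<base64>" entries (e.g. during secret rotation).
    got = [s.split(",", 1)[1] for s in headers["webhook-signature"].split() if s.startswith("v1,")]
    # Constant-time comparison avoids leaking how much of the signature matched.
    return any(hmac.compare_digest(expected, g) for g in got)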
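
Signature verification alone does not stop a captured request from being replayed later, so receivers commonly also reject stale timestamps. The sketch below adds that check and a small signing helper for local tests. It assumes webhook-timestamp is Unix time in seconds, as in the svix-style scheme, and the five-minute tolerance is an assumed value, not one documented here; is_fresh and sign are hypothetical names.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_S = 300   # assumed tolerance window: five minutes


def is_fresh(timestamp_header: str, now: Optional[float] = None,
             tolerance_s: int = TOLERANCE_S) -> bool:
    """Return True if the webhook-timestamp is within the tolerance of now."""
    try:
        sent = int(timestamp_header)
    except (TypeError, ValueError):
        return False
    current = time.time() if now is None else now
    return abs(current - sent) <= tolerance_s


def sign(secret: str, msg_id: str, timestamp: int, body: bytes) -> str:
    """Build a "v1,<base64>" signature the same way the receiver recomputes it."""
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()
```

In a handler, check is_fresh on the webhook-timestamp header first and then verify the signature; sign is only meant for producing fixtures in your own tests.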

Payload: {"type": "extraction.completed", "created_at": …, "data": …} where data is the same body GET /v1/extractions/{id} returns. Delivery attempts are inspectable at GET /v1/webhook-endpoints/{id}/deliveries.
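
Because delivery is at-least-once, the same event can arrive more than once, and the handler should be idempotent. One common approach, sketched here, is to remember the webhook-id header of every event already processed. This assumes the id stays the same across retries of one delivery, which is how svix-style schemes behave but is not spelled out on this page. WebhookInbox is a hypothetical name, and the in-memory set stands in for a durable store.

```python
import json
from typing import Optional, Set, Tuple


class WebhookInbox:
    """Process each webhook at most once, keyed by its webhook-id header."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()     # use a database or cache in production

    def handle(self, webhook_id: str, body: bytes) -> Optional[Tuple[str, dict]]:
        """Return (type, data) for a new event, or None for a duplicate."""
        if webhook_id in self._seen:
            return None
        event = json.loads(body)
        result = (event["type"], event["data"])
        self._seen.add(webhook_id)       # record only after a successful parse
        return result
```

Call handle only after the signature has been verified, and still answer duplicates with a 2xx so the sender stops retrying.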

Usage & billing

Two meters, per page: pages_basic (value + raw text + confidence + checks) and pages_grounded (adds per-field locations). The widget always meters grounded. Check totals any time with GET /v1/usage. Test-mode keys (sk_test_…) are never billed.
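
The metering rules above can be summarised in a few lines when forecasting usage. This sketch assumes that a template with features.grounding enabled meters every page as grounded and that key mode is recognisable from the sk_test_ prefix; meter_for is a hypothetical helper and pricing is left out on purpose.

```python
from typing import Tuple


def meter_for(api_key: str, grounding: bool, via_widget: bool = False) -> Tuple[str, bool]:
    """Return (meter name, billable) for pages processed under these settings."""
    # The widget always meters grounded; otherwise the grounding feature decides.
    meter = "pages_grounded" if (grounding or via_widget) else "pages_basic"
    # Test-mode keys are never billed.
    billable = not api_key.startswith("sk_test_")
    return meter, billable
```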

© 2026 iBill S.r.l. con Socio Unico · Via dei Castani, 144 – 00172 Roma