Documents in, structured data out.
Send a document and a field schema; get JSON back with a calibrated per-field confidence score and — as an add-on — bounding-box locations. Delivered as a REST API and an embeddable widget your users will actually trust.
Base URL: https://api.docsense.energyhub.cloud/v1 (in the examples below, $BASE is https://api.docsense.energyhub.cloud and $KEY is an API key)
Auth: Authorization: Bearer sk_test_… | sk_live_…
Errors: RFC 9457 application/problem+json · Idempotency-Key honored on POSTs · EU-only processing (GDPR Art. 28 processor).
Console: sign in with email and password. Access is invite-only, via one-time links from your workspace admin. Workspace administrators create and rotate API keys in the console's Chiavi API (API keys) section.
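Because errors use RFC 9457, one generic decoder can cover every endpoint. Below is a standard-library sketch. The members type, title, status, detail and instance are the RFC's standard ones, and type defaults to about:blank when it is absent. Any extra members Docsense may add are not documented here, so the sketch keeps them in extensions. The names Problem and parse_problem are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Problem:
    """An RFC 9457 problem details object (application/problem+json)."""
    type: str = "about:blank"          # RFC 9457 default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None
    extensions: Dict[str, Any] = field(default_factory=dict)  # non-standard members


STANDARD_MEMBERS = {"type", "title", "status", "detail", "instance"}


def parse_problem(body: bytes) -> Problem:
    """Decode an error body, separating standard members from extensions."""
    data = json.loads(body)
    known = {k: data[k] for k in STANDARD_MEMBERS if k in data}
    extras = {k: v for k, v in data.items() if k not in STANDARD_MEMBERS}
    return Problem(**known, extensions=extras)
```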
Quickstart — extract in two calls
1 · Publish a template (immutable versions)
curl -X POST $BASE/v1/templates \
-H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
-d '{
"name": "electricity-invoice",
"language_hint": "it",
"features": {"grounding": true},
"fields": [
{"key": "holder_name", "type": "string", "label": "Intestatario", "required": true},
{"key": "pod", "type": "string", "label": "Codice POD", "validate": "pod"},
{"key": "iban", "type": "string", "validate": "iban"},
{"key": "total_amount", "type": "number", "required": true},
{"key": "due_date", "type": "date"}
]
}'
Field types: the scalars string · number · integer · boolean · date · enum, plus group (a flat object) and table (rows of scalar cells).
Validators: iban · codice_fiscale · partita_iva · pod · pdr · cap · date · email. Each validator's result is returned per field in checks and feeds into the confidence score.
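The following example shows the kind of test a validator such as iban runs, using the standard ISO 13616 mod-97 check. This is the public algorithm, not Docsense's implementation, which may apply additional rules such as country-specific lengths. You could run a check like this client-side as an optional pre-check. The function name iban_is_valid is hypothetical.

```python
def iban_is_valid(iban: str) -> bool:
    """ISO 13616 mod-97 IBAN check (the public algorithm, not Docsense's code)."""
    s = iban.replace(" ", "").upper()
    # Overall length 15..34, ASCII letters and digits only.
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    # Two-letter country code followed by two check digits.
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    # Move the first four characters to the end, map A=10 ... Z=35, then mod 97.
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

A single mistyped digit always fails this check, because 97 is prime and does not divide any power of 10.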
2 · Run an extraction
curl -X POST $BASE/v1/extractions \
-H "Authorization: Bearer $KEY" \
-F "file=@bolletta.pdf" -F "template=electricity-invoice" -F "mode=sync"
{
"status": "completed",
"fields": {
"pod": {
"value": "IT001E12345678",
"raw_text": "IT 001E 1234 5678",
"confidence": 0.99,
"status": "found",
"checks": [{"name": "pod", "passed": true}],
"locations": [{"page": 1, "bbox": {"x": 0.155, "y": 0.178, "w": 0.129, "h": 0.012}}]
}
},
"usage": {"pages": 1},
"processing": {"engine": "haiku", "duration_ms": 4800}
}
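A common next step is to route uncertain fields to a human for review. The sketch below assumes the response shape shown above with flat scalar fields only; group and table fields are not handled. It treats any status other than found, or any failed check, as needing review. The 0.9 threshold is an arbitrary example rather than a recommended value, and fields_needing_review is a hypothetical name.

```python
from typing import List


def fields_needing_review(result: dict, min_confidence: float = 0.9) -> List[str]:
    """Return keys of fields a human should double-check before use."""
    flagged = []
    for key, f in result.get("fields", {}).items():
        failed = any(not c.get("passed", False) for c in f.get("checks", []))
        low = f.get("confidence", 0.0) < min_confidence
        # Assumption: "found" is the only status that can be trusted as-is.
        if f.get("status") != "found" or failed or low:
            flagged.append(key)
    return flagged
```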
For longer documents, use mode=async (documents over 10 pages always run async) and follow GET /v1/extractions/{id}/events. This is a server-sent events (SSE) stream that emits each field as soon as it is extracted, then a final completed event.
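Consuming the stream only requires a line-oriented parser. The sketch below is a simplified reading of the WHATWG server-sent events rules. It handles event: and data: fields, comment lines, and blank-line dispatch, and it ignores id: and retry:. Docsense's exact event names and data payloads are not specified here, so treat those as assumptions to confirm in the OpenAPI reference. The name parse_sse is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from text/event-stream lines (simplified)."""
    event: str = ""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event if it carried data.
            if data:
                yield (event or "message", "\n".join(data))
            event, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch.
```

In practice, feed this function the decoded lines of the streaming response, for example `(raw.decode() for raw in resp)` with `resp` from `urllib.request.urlopen`.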
Bounding-box locations (locations) are the paid grounding add-on; confidence quality is identical with or without them.
File uploads & limits
Documents are capped at 50 MB on every upload path. You can upload the file first and then reference it by id. This keeps large uploads off your own request path, because the bytes go straight to storage, and it also lets you reuse one document across several extractions:
curl -X POST $BASE/v1/files \
-H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
-d '{"filename": "bolletta.pdf", "content_type": "application/pdf"}'
# → { "id": "file_…", "upload": { "url": "https://…", "method": "PUT" }, … }
# upload the bytes to the returned URL (presigned S3 PUT, valid 15 minutes)
curl -X PUT "$UPLOAD_URL" -H "Content-Type: application/pdf" --data-binary @bolletta.pdf
curl -X POST $BASE/v1/extractions \
-H "Authorization: Bearer $KEY" \
-F "file_id=file_…" -F "template=electricity-invoice" -F "mode=async"
Check readiness with GET /v1/files/{id}. Uploaded files can be reused for 24 hours and are then purged; every extraction keeps its own copy of the document under your workspace's retention policy.
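You can wrap readiness polling in a small helper. The sketch below takes a hypothetical fetch_status callable, which would be your own wrapper around GET /v1/files/{id}. The status values "ready" and "failed" are assumptions, since this section does not document that endpoint's response shape. Passing sleep in as a parameter keeps the helper testable without waiting.

```python
import time
from typing import Callable


def wait_for_file(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until an uploaded file is usable, or raise on failure/timeout."""
    waited = 0.0
    while True:
        status = fetch_status()        # e.g. GET /v1/files/{id} -> status field
        if status == "ready":          # assumed status value
            return status
        if status == "failed":         # assumed status value
            raise RuntimeError("file processing failed")
        if waited >= timeout_s:
            raise TimeoutError(f"file not ready after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s
```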
Synchronous extractions (mode=sync) are limited per workspace by concurrency. Responses carry X-RateLimit-Limit and X-RateLimit-Remaining headers, and a request over the cap gets a 429 with a Retry-After header. The async path absorbs bursts through its queue.
Embeddable widget
The widget gives your end users upload → live progressive extraction → review with document highlights → confirm, inside your page. No account, no cookies; the document goes from the iframe straight to Docsense, never through your origin.
1 · Mint a session (server-side, secret key)
curl -X POST $BASE/v1/widget/sessions \
-H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
-d '{
"template": "electricity-invoice",
"allowed_origins": ["https://app.yourcompany.com"],
"locale": "it",
"consent": "notice",
"upload_hint": "Carica la tua ultima bolletta"
}'
# → { "token": "swt_…", "expires_at": "…" } (TTL 30 min, refreshable in-widget)
The origin allowlist also drives the iframe's frame-ancestors CSP, so only your listed origins can embed the session. The widget requires a template with features.grounding enabled, because the document highlights are the core of the widget.
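Minting must happen on your server because it uses the secret key; only the swt_… token reaches the browser. The standard-library sketch below builds the same request as the curl call above without sending it. The function name session_request is hypothetical, and consent and upload_hint are omitted for brevity.

```python
import json
import urllib.request
from typing import List


def session_request(base: str, secret_key: str, template: str,
                    origins: List[str], locale: str = "it") -> urllib.request.Request:
    """Build (but do not send) the server-side POST /v1/widget/sessions call."""
    body = {
        "template": template,         # must have features.grounding enabled
        "allowed_origins": origins,   # exact origins, e.g. "https://app.yourcompany.com"
        "locale": locale,
    }
    return urllib.request.Request(
        f"{base}/v1/widget/sessions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {secret_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, call `urllib.request.urlopen(req)` and read the `token` field from the JSON response. Then pass only that token to your frontend.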
2 · Embed (client-side)
<script src="https://api.docsense.energyhub.cloud/sdk/docex.js"></script>
<div id="docsense-slot"></div>
<script>
var widget = DocEx.create({
sessionToken: 'swt_…', // from your backend
container: '#docsense-slot',
locale: 'it',
appearance: { variables: { colorPrimary: '#0E8A6A', borderRadius: '8px' } }
})
var form = document.forms[0] // your own page's form (illustrative)
widget.on('fields.confirmed', function (p) {
// p.values = flat map of leaf keys → final, user-reviewed values
form.nome.value = p.values.holder_name || ''
form.pod.value = p.values.pod || ''
})
</script>
| Events (widget → you) | Commands (you → widget) |
|---|---|
| ready · upload.started · upload.progress · extraction.started · field.extracted (progressive) · extraction.completed · fields.confirmed · error · resize · exit | prefill(values) · setLocale('it' \| 'en') · setTheme(appearance) · retry() · destroy() |
TypeScript typings ship as @docex/js. All messages are namespaced docex.*, versioned, and restricted to exact origins. Try the full flow on the live demo page.
Webhooks
Get notified when an extraction finishes instead of polling. At-least-once delivery with exponential backoff (5 s → 30 s → 2 m → 10 m → 1 h, 6 attempts).
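The schedule below reads the listed values as delays between consecutive attempts. That is an assumption, because the text does not say whether each delay is measured from the previous attempt or from the first one. Under this reading, the six attempts span a little over an hour:

```python
from datetime import timedelta

# Delays between consecutive delivery attempts, as listed above (in seconds).
RETRY_DELAYS_S = [5, 30, 2 * 60, 10 * 60, 60 * 60]


def attempt_offsets(delays):
    """Offset of each attempt from the first, assuming each delay is
    measured from the previous attempt."""
    offsets = [timedelta(0)]
    for d in delays:
        offsets.append(offsets[-1] + timedelta(seconds=d))
    return offsets


for i, off in enumerate(attempt_offsets(RETRY_DELAYS_S), start=1):
    print(f"attempt {i}: +{off}")  # last line printed: attempt 6: +1:12:35
```

On this reading, an endpoint must be reachable again within about 1 h 12 min 35 s of the first attempt to receive the event.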
curl -X POST $BASE/v1/webhook-endpoints \
-H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
-d '{"url": "https://app.yourcompany.com/hooks/docsense",
"events": ["extraction.completed", "extraction.failed"]}'
# → { "id": "whe_…", "secret": "whsec_…" } (secret shown once)
Each request carries webhook-id, webhook-timestamp and webhook-signature headers (Svix-compatible scheme). Verify the signature before trusting the payload:
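```python
import base64, hashlib, hmac, time

def verify(secret: str, headers: dict, body: bytes, tolerance_s: int = 300) -> bool:
    # Reject stale or future-dated timestamps to limit replays
    # (300 s is a common default, not a documented Docsense value).
    if abs(time.time() - int(headers["webhook-timestamp"])) > tolerance_s:
        return False
    key = base64.b64decode(secret.removeprefix("whsec_"))  # Python 3.9+
    signed = f'{headers["webhook-id"]}.{headers["webhook-timestamp"]}.'.encode() + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header holds space-separated "v1,<base64>" entries; there can be more than one.
    got = [s.split(",", 1)[1] for s in headers["webhook-signature"].split() if s.startswith("v1,")]
    return any(hmac.compare_digest(expected, g) for g in got)
```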
Payload: {"type": "extraction.completed", "created_at": …, "data": …} where
data is the same body GET /v1/extractions/{id} returns. Delivery
attempts are inspectable at GET /v1/webhook-endpoints/{id}/deliveries.
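Because delivery is at-least-once, the same event can arrive more than once. The sketch below is an idempotent handler that skips repeats by webhook-id. It assumes, as in the Svix scheme, that retries of one event reuse the same id. The in-memory set and the name handle_delivery are illustrative only. A production handler would persist ids (for example, with a unique database constraint) and guard against two concurrent deliveries of the same id.

```python
from typing import Callable, Set


def handle_delivery(webhook_id: str, payload: dict,
                    handler: Callable[[dict], None], seen: Set[str]) -> str:
    """Process a verified webhook once per webhook-id."""
    if webhook_id in seen:
        return "duplicate"
    handler(payload)        # if this raises, the id is not recorded, so a retry reprocesses
    seen.add(webhook_id)    # record only after successful processing
    return "processed"
```

Presumably you should answer duplicates with a 2xx as well, so that the sender stops retrying them.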
Usage & billing
Two meters, per page: pages_basic (value + raw text + confidence + checks) and pages_grounded (adds per-field locations). Widget extractions are always metered as pages_grounded. Check totals at any time with GET /v1/usage. Test-mode keys (sk_test_…) are never billed.
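The metering rules fit in a few lines. The sketch below predicts which meter an extraction counts against, which can help reconcile your own records with GET /v1/usage. It assumes that extractions with a grounding-enabled template bill as pages_grounded, and it takes page counts from usage.pages in each response. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def meter_for(api_key: str, grounded: bool, via_widget: bool) -> Optional[str]:
    """Which meter an extraction's pages count against (None = not billed)."""
    if api_key.startswith("sk_test_"):
        return None                   # test mode is never billed
    if grounded or via_widget:        # the widget always meters grounded
        return "pages_grounded"
    return "pages_basic"


def expected_usage(jobs: Iterable[Tuple[str, bool, bool, int]]) -> Counter:
    """Sum pages per meter from (api_key, grounded, via_widget, pages) tuples."""
    totals: Counter = Counter()
    for key, grounded, via_widget, pages in jobs:
        meter = meter_for(key, grounded, via_widget)
        if meter:
            totals[meter] += pages
    return totals
```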
Docsense · EU-only processing · OpenAPI reference · widget demo
© 2026 iBill S.r.l. con Socio Unico · Via dei Castani, 144 – 00172 Roma