PolyForm

How it works

A buyer's onboarding form goes in.
The same PDF comes back — filled, grounded, verifiable.

Suppliers receive hundreds of onboarding questionnaires — every buyer, a different layout; most of them German legalese; all of them answered today by hand, hours per form. PolyForm fills them automatically from one company master record and proves where every answer came from.

You don't read German? You don't have to. Every answer shows the master fact behind it — you verify by fact, not by language.

The pipeline — any form, one mechanism
  1. Synthesize
     Flat PDF without form fields? Fields are created from the printed layout first.
  2. Extract
     Every widget on every page, with stable IDs: the raw anatomy of the form.
  3. Label
     Geometry attaches meaning: which printed question does each box belong to?
  4. Vision rescue
     Where the layout garbles a label, a vision model reads the page like a human.
  5. Map
     One batched LLM call maps master data onto the fields: schema-enforced, reasoned, never invented.
  6. Fill
     The real PDF is filled in place. Structure, fonts and checkboxes stay intact.
  7. Verify
     Deterministic code traces every answer back to a master fact before you see it.
Geometry decides where. Vision decides what. The LLM decides meaning. Deterministic code decides trust.
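As a rough picture of step 3 (Label), a widget can be paired with the printed text nearest to it: first a span on the same line that ends to its left, otherwise a span just above it. The sketch below assumes every widget and text span already carries a bounding box in PDF points with y growing upward; `Box`, `TextSpan`, `assign_label` and the 40-point gap are hypothetical names and values, not PolyForm's own code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in PDF points; y grows upward."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class TextSpan:
    text: str
    box: Box


def _vertical_overlap(a: Box, b: Box) -> float:
    # Height shared by two boxes (0 if they sit on different lines).
    return max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))


def _horizontal_overlap(a: Box, b: Box) -> float:
    # Width shared by two boxes (0 if they do not overlap horizontally).
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def assign_label(widget: Box, spans: list[TextSpan],
                 max_gap_above: float = 40.0) -> Optional[str]:
    """Pick the printed text that most plausibly labels this widget (hypothetical heuristic)."""
    # 1) Same line, ending left of the widget: the closest one wins.
    left = [s for s in spans
            if s.box.x1 <= widget.x0 and _vertical_overlap(s.box, widget) > 0]
    if left:
        return min(left, key=lambda s: widget.x0 - s.box.x1).text
    # 2) Directly above, horizontally overlapping, within max_gap_above points.
    above = [s for s in spans
             if 0 <= s.box.y0 - widget.y1 <= max_gap_above
             and _horizontal_overlap(s.box, widget) > 0]
    if above:
        return min(above, key=lambda s: s.box.y0 - widget.y1).text
    # 3) No plausible label; in this sketch, a case for a vision-based fallback.
    return None


spans = [
    TextSpan("Firmenname", Box(50, 700, 120, 712)),
    TextSpan("USt-IdNr.", Box(50, 660, 110, 672)),
    TextSpan("Bankverbindung", Box(200, 620, 300, 632)),
]
print(assign_label(Box(130, 698, 300, 714), spans))  # Firmenname
print(assign_label(Box(200, 590, 320, 605), spans))  # Bankverbindung
```

Purely geometric rules like these break on multi-column or rotated layouts, which is exactly the gap the vision-rescue step is described as covering.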
Why the output can be trusted — the part that wins contracts
Grounding cross-check
Every answer is traced to the master fact it came from. The banner says it plainly: 38/38 grounded · 0 invented.
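A minimal sketch of such a cross-check, assuming the mapper returns, for each text answer, the master-record key it claims the answer is based on. `Answer`, `cross_check` and the dotted keys are hypothetical, and the match rule (exact equality after collapsing whitespace and case) is a deliberately strict stand-in for whatever comparison the engine actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    field_id: str
    value: str
    fact_key: Optional[str]  # master-record key the mapper cited (hypothetical)


def _norm(value: object) -> str:
    # Collapse whitespace and case so "Muster  GmbH" matches "Muster GmbH".
    return " ".join(str(value).split()).casefold()


def cross_check(answers: list[Answer], master: dict[str, object]):
    """Split answers into grounded and invented, and build the banner text."""
    grounded, invented = [], []
    for a in answers:
        cited = a.fact_key is not None and a.fact_key in master
        if cited and _norm(master[a.fact_key]) == _norm(a.value):
            grounded.append(a)
        else:
            # Uncited, unknown key, or value differs from the cited fact.
            invented.append(a)
    banner = f"{len(grounded)}/{len(answers)} grounded · {len(invented)} invented"
    return grounded, invented, banner


master = {"company.name": "Muster GmbH", "company.vat_id": "DE123456789"}
answers = [
    Answer("f1", "Muster  GmbH", "company.name"),
    Answer("f2", "DE123456789", "company.vat_id"),
    Answer("f3", "ISO 9001", None),
]
print(cross_check(answers, master)[2])  # 2/3 grounded · 1 invented
```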
Unfounded-answer backstop
A printed Ja needs a truthy fact, a Nein an explicitly false one. No fact — no answer, deterministically.
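The rule reads naturally as a deterministic filter over the mapper's proposals. This sketch assumes master facts are stored as typed Python values and that a missing fact arrives as `None`; `backstop` and the fact keys are hypothetical names.

```python
from typing import Optional


def backstop(proposed: Optional[str], fact: object) -> Optional[str]:
    """Keep a proposed Ja/Nein only if the master fact justifies it; else None (blank)."""
    if proposed == "Ja" and fact:              # Ja needs a truthy fact
        return "Ja"
    if proposed == "Nein" and fact is False:   # Nein needs an explicit False, not just "missing"
        return "Nein"
    return None                                # no justifying fact: no answer


master = {"has_iso_9001": True, "uses_subcontractors": False}
print(backstop("Ja", master.get("has_iso_9001")))                    # Ja
print(backstop("Nein", master.get("uses_subcontractors")))           # Nein
print(backstop("Nein", master.get("has_conflict_minerals_policy")))  # None
```

Using `is False` rather than plain falsiness matters here: a missing fact (`None`), `0` or an empty string must not turn into a printed "Nein".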
Dead-section blanking
Sections whose controlling question is “no” stay completely blank. No data leaks into chapters that don't apply.
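Dead-section blanking can be expressed as a single pass after mapping. The sketch assumes each section is described by its controlling field and the IDs of the fields it governs; the section table, field IDs and the set of negative answers are hypothetical.

```python
from __future__ import annotations

NO_ANSWERS = {"nein", "no"}  # hypothetical set of negative answers


def blank_dead_sections(answers: dict[str, str],
                        sections: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Return a copy of answers with every field of a 'no'-gated section set to ''."""
    out = dict(answers)  # never mutate the caller's answers
    for gate_field, members in sections.values():
        if out.get(gate_field, "").strip().casefold() in NO_ANSWERS:
            for field_id in members:
                out[field_id] = ""  # explicit blank also overwrites any pre-filled value
    return out


answers = {"q_subcontractors": "Nein", "sub_name": "Foo AG", "sub_country": "AT",
           "q_iso": "Ja", "iso_cert_no": "Q-123"}
sections = {"4.2": ("q_subcontractors", ["sub_name", "sub_country"]),
            "5.1": ("q_iso", ["iso_cert_no"])}
print(blank_dead_sections(answers, sections))
```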
Residue stripping
Previous submitters' leftovers are wiped and reported — including a warning when a foreign company name is printed in the form itself.
Honest gaps
If master data can't answer a question, the field stays blank and is surfaced for a human — never guessed.
Language discipline
Answers come out in the form's language. An English “no” never leaks onto a German form.
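One way to enforce this is to keep yes/no answers canonical inside the engine, localize them only when writing the form, and then scan the final values for yes/no words from another language. The vocabulary table below covers only German and English and, like `localize_yes_no` and `leaked_words`, is a hypothetical illustration rather than the engine's implementation.

```python
from __future__ import annotations

YES_NO = {  # hypothetical per-language vocabulary; only two languages shown
    "de": {"yes": "Ja", "no": "Nein"},
    "en": {"yes": "Yes", "no": "No"},
}


def localize_yes_no(canonical: str, form_lang: str) -> str:
    """Render a canonical 'yes'/'no' in the form's own language."""
    return YES_NO[form_lang][canonical]


def leaked_words(values: list[str], form_lang: str) -> list[str]:
    """Return values that are yes/no words from some other language."""
    own = {w.casefold() for w in YES_NO[form_lang].values()}
    foreign = {w.casefold()
               for lang, words in YES_NO.items() if lang != form_lang
               for w in words.values()} - own
    return [v for v in values if v.strip().casefold() in foreign]


print(localize_yes_no("no", "de"))                        # Nein
print(leaked_words(["Nein", "No", "Muster GmbH"], "de"))  # ['No']
```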
Proof, not promises
241 golden answers, hand-authored across 3 real forms
100% · 100% · 95.2% output accuracy on SOMA · WEBER · ViscoTec, with zero LLM-stage errors
28 deterministic gates run before and after every change
3 live field-test rounds, driving 11 findings down to zero defects

Accuracy is measured per pipeline stage against hand-authored golden truth, so every miss is attributed to the stage that caused it — that's what makes the system improvable instead of lucky.
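Stage attribution can be sketched as follows, assuming the golden truth records the expected output of each stage for every field. A miss is charged to the first stage whose output diverges, so a wrong label is not also counted against mapping and filling. The stage names and data layout are assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter

STAGES = ("extract", "label", "map", "fill")  # assumed stage names, in pipeline order


def attribute_misses(observed: dict[str, dict[str, str]],
                     golden: dict[str, dict[str, str]]) -> Counter:
    """Charge each wrong field to the first stage whose output diverges from golden."""
    blame: Counter = Counter()
    for field_id, gold in golden.items():
        seen = observed.get(field_id, {})
        for stage in STAGES:
            if seen.get(stage) != gold.get(stage):
                blame[stage] += 1
                break  # later stages only inherit this error; don't double-count
    return blame


def output_accuracy(observed: dict[str, dict[str, str]],
                    golden: dict[str, dict[str, str]]) -> float:
    """Share of fields whose final (fill-stage) value matches golden truth."""
    hits = sum(observed.get(f, {}).get("fill") == g.get("fill") for f, g in golden.items())
    return hits / len(golden)


golden = {
    "f1": {"extract": "w1", "label": "Firmenname", "map": "Muster GmbH", "fill": "Muster GmbH"},
    "f2": {"extract": "w2", "label": "USt-IdNr.", "map": "DE123456789", "fill": "DE123456789"},
    "f3": {"extract": "w3", "label": "ISO 9001", "map": "Ja", "fill": "Ja"},
}
observed = {
    "f1": dict(golden["f1"]),
    "f2": {"extract": "w2", "label": "Firmenname", "map": "Muster GmbH", "fill": "Muster GmbH"},
    "f3": {"extract": "w3", "label": "ISO 9001", "map": "Nein", "fill": "Nein"},
}
print(dict(attribute_misses(observed, golden)))  # {'label': 1, 'map': 1}
```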

See it in 90 seconds
  1. Open the app — the demo company's master data is already loaded.
  2. Upload a blank onboarding form (SOMA takes ~30s; WEBER, 221 fields across 11 pages, ~2½ min).
  3. Watch the grounding banner, review or edit any answer, download the filled PDF.

The interface is English; the answers always come out in the form's own language — an English “no” on a German form would be wrong, and the engine enforces that.

Engine: Python · pikepdf · OpenAI structured outputs. Built as a working MVP; multi-tenant accounts, persistence and SSO are deliberate post-pilot work.