Normal AI - Asking PDFs questions

Asking PDFs questions actually isn’t about text, it’s about images! Except… it’s kind of about text. And layout. And images. It’s complicated.
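The “layout” part is concrete. LayoutLM-family models take each OCR’d word together with its bounding box, rescaled to a 0 to 1000 grid so page size doesn’t matter; LayoutLMv3 also looks at the page image itself. Here is a minimal sketch of the text-plus-layout half. It assumes boxes arrive as (x0, y0, x1, y1) in pixels. The `Word` type, the helper names, and the sample receipt are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word and its pixel bounding box (x0, y0, x1, y1). Hypothetical type."""
    text: str
    box: tuple[int, int, int, int]


def normalize_box(box, width, height):
    """Rescale a pixel box to the 0-1000 grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def to_model_inputs(words, width, height):
    """Split words into the parallel text and layout lists a layout-aware model consumes."""
    texts = [w.text for w in words]
    boxes = [normalize_box(w.box, width, height) for w in words]
    return texts, boxes


# A made-up 850x1100 px receipt with two words on the same line.
words = [Word("Total:", (85, 990, 170, 1012)), Word("$42.00", (680, 990, 765, 1012))]
texts, boxes = to_model_inputs(words, width=850, height=1100)
print(texts)  # ['Total:', '$42.00']
print(boxes)  # [[100, 900, 200, 920], [800, 900, 900, 920]]
```

Both words share the same vertical band, which gives a layout-aware model a signal to tie “$42.00” to “Total:”. Plain text extraction might separate them.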

Use cases

When you’re working with semi-structured documents, like PDFs of receipts or invoices, there are often individual fields you need to pull out: totals, dates, locations, etc.

If you’re looking to pull out things like names, places, or legal rulings from a larger body of text, I recommend converting the PDF to text and using named entity recognition instead.
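A rough sketch of the text-first route follows. It extracts text with pypdf, runs a Hugging Face token-classification (“ner”) pipeline over it, and groups the entities by label. The library choices and the default NER model are assumptions for illustration, not a specific recommendation. The heavy imports live inside the functions, so the grouping helper works without them installed.

```python
from collections import defaultdict


def pdf_to_text(path):
    """Concatenate the text layer of every page (assumes the `pypdf` package)."""
    from pypdf import PdfReader  # imported lazily so group_entities works without it

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def run_ner(text):
    """Tag entities with transformers' default NER model (downloads it on first use).

    A long document would need chunking first, since the model's input length is limited.
    """
    from transformers import pipeline

    ner = pipeline("ner", aggregation_strategy="simple")
    return ner(text)


def group_entities(entities, min_score=0.5):
    """Group aggregated pipeline output by entity label, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Usage ("ruling.pdf" is a hypothetical file; needs pypdf, transformers, and network access):
# print(group_entities(run_ner(pdf_to_text("ruling.pdf"))))
```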

Try it out

This DocQuery demo is a great example of asking a PDF a question. Be sure to click the examples!

Models

Popular models

LayoutLM is a common base, with microsoft/layoutlmv3-base being by far the most popular implementation. For a fine-tuned version, take a look at impira/layoutlm-invoices.
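To try a fine-tuned checkpoint locally, here is a minimal sketch. It renders each PDF page to an image, asks the same question of every page, and keeps the highest-scoring answer. It assumes the `pdf2image` package (which needs Poppler), Tesseract plus `pytesseract` for the pipeline’s built-in OCR, and that impira/layoutlm-invoices loads with transformers’ “document-question-answering” pipeline. The helper names, file name, and questions are hypothetical. The base microsoft/layoutlmv3-base checkpoint is a pretrained backbone rather than a question answerer, so it would need fine-tuning before it could do this.

```python
def best_answer(answers):
    """Return the highest-scoring answer dict, or None if there are none."""
    return max(answers, key=lambda a: a["score"], default=None)


def ask_pdf(path, question, model="impira/layoutlm-invoices"):
    """Ask `question` of every page of the PDF at `path` and keep the best answer."""
    from pdf2image import convert_from_path  # renders pages to PIL images (needs Poppler)
    from transformers import pipeline

    qa = pipeline("document-question-answering", model=model)
    candidates = []
    for page_number, image in enumerate(convert_from_path(path), start=1):
        # With no word boxes supplied, the pipeline OCRs the image itself (via pytesseract).
        result = qa(image=image, question=question)
        # Depending on top_k, the pipeline may return one dict or a list of dicts.
        for answer in result if isinstance(result, list) else [result]:
            candidates.append({**answer, "page": page_number})
    return best_answer(candidates)


# Usage ("invoice.pdf" is hypothetical; needs network access to download the model):
# for q in ["What is the invoice total?", "What is the invoice date?"]:
#     print(q, ask_pdf("invoice.pdf", q))
```

Building the pipeline inside `ask_pdf` keeps the sketch short. When asking many questions, you would build it once and reuse it.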

State of the art

Asking PDFs questions falls under a few categories, but in this case we’ll go with “document layout analysis” on Papers with Code. There aren’t many benchmarks, but Microsoft’s LayoutLMv3 tops at least one of them.