Data extraction from PDF, DOC and DOCX in Airtable?

Topic Labels: Automations
ghuznee
5 - Automation Enthusiast

Can Airtable take an uploaded document, such as a PDF, DOC, or DOCX file, and extract the text from it?

If so, how do I set that up?

I want users to upload a document rather than paste in its text, for a smoother UX journey.

4 Replies
Hannah_Wiginton
10 - Mercury

Unfortunately, Airtable doesn't have native OCR capabilities. However, you could do this with the Vision API extension - https://support.airtable.com/docs/vision-extension - or other OCR APIs with Scripting.

Either option requires some coding knowledge. If you need support with this, you might look at hiring an Airtable consultant - https://ecosystem.airtable.com/consultants

You could also post on the Airtable Job Board - https://community.airtable.com/t5/job-board/bd-p/jobs

You might also look at the OCR integrations on Zapier:

https://zapier.com/apps/airtable/integrations/mindee-ocr

https://zapier.com/apps/airtable/integrations/nanonets-ocr

 

______________________________________
Hannah - On2Air.com - Automated Backups for Airtable

You can use a third-party service to extract the text. CloudConvert can convert PDF, DOC, and DOCX files to plain text, and PDF.co can convert PDF to text as well. You can integrate with either service using scripting or an integration tool such as Zapier or Make.
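
For the scripting route, the Airtable side of the integration might look roughly like the sketch below. It only covers reading the attachment URL from a record and building the request that writes the text back through Airtable's REST API. The conversion call itself is left as a hypothetical `extract_text` callable, because the exact CloudConvert or PDF.co request depends on the service and plan you pick. The field names `Document` and `Extracted Text` are assumptions; replace them with your own.

```python
import json
import urllib.parse
import urllib.request

AIRTABLE_API = "https://api.airtable.com/v0"


def first_attachment_url(record, attachment_field):
    """Return the URL of the first file in an attachment field, or None."""
    attachments = record.get("fields", {}).get(attachment_field) or []
    return attachments[0]["url"] if attachments else None


def build_update_request(base_id, table, record_id, text_field, text, token):
    """Build (without sending) the PATCH request that stores the extracted text."""
    table_part = urllib.parse.quote(table, safe="")
    url = f"{AIRTABLE_API}/{base_id}/{table_part}/{record_id}"
    body = json.dumps({"fields": {text_field: text}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def process_record(record, extract_text, base_id, table, token,
                   attachment_field="Document", text_field="Extracted Text"):
    """Return a ready-to-send update request, or None if there is no attachment.

    extract_text is a hypothetical callable (file URL -> str) wrapping whichever
    conversion service you choose; it is not part of any real library.
    """
    file_url = first_attachment_url(record, attachment_field)
    if file_url is None:
        return None
    text = extract_text(file_url)
    return build_update_request(
        base_id, table, record["id"], text_field, text, token
    )
```

To actually write the text back, pass the returned request to `urllib.request.urlopen`. Keeping the request-building separate from the sending makes it easy to check the payload before touching real records.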

paulo
6 - Interface Innovator

We have created a product/extension that integrates with Airtable. It can extract text from a PDF file in an attachment field and save the text in a text field. It can process records in bulk, and you can set it to run regularly. It can only process PDF files.

atusr
6 - Interface Innovator

Should we try to extract the text with CloudConvert or PDF.co first, and only fall back to an OCR solution if they fail?
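
The order proposed here (a plain text converter first, OCR only if that fails) could be sketched as a small fallback chain. This is a minimal illustration, not a tested integration: each extractor is a hypothetical callable that takes the file bytes and returns text, and `min_chars` is an assumed threshold. The threshold matters because a converter can "succeed" on a scanned PDF and still return almost nothing, since a scan has no text layer to read.

```python
def extract_with_fallback(data, extractors, min_chars=20):
    """Try each (name, extractor) pair in order and return the first usable result.

    An extractor counts as failed if it raises or returns fewer than
    min_chars non-whitespace-trimmed characters. Returns (name, text),
    or (None, "") if every extractor fails.
    """
    for name, extractor in extractors:
        try:
            text = extractor(data)
        except Exception:
            # Treat any error from a service as "try the next one".
            continue
        if text and len(text.strip()) >= min_chars:
            return name, text.strip()
    return None, ""
```

In use, the list would hold the cheaper converters first and the OCR call last, for example `[("cloudconvert", convert_fn), ("ocr", ocr_fn)]`, where both functions are wrappers you write yourself. The returned name tells you which method produced the text, which is handy to store in a separate field for troubleshooting.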