Feb 27, 2023 03:11 AM
Can Airtable take an uploaded document, such as a PDF, DOC, or DOCX file, and then extract the text from it?
If so, how do I set that up?
I want users to upload a document rather than paste in its text, for a smoother user experience.
Feb 27, 2023 06:28 AM - edited Feb 27, 2023 06:30 AM
Unfortunately, Airtable doesn't have native OCR capabilities. However, you could do this with the Vision API extension - https://support.airtable.com/docs/vision-extension - or other OCR APIs with Scripting.
Either will require knowledge of code. If you need support with this, you might look at hiring an Airtable consultant - https://ecosystem.airtable.com/consultants
And/or posting in the Airtable Job Board - https://community.airtable.com/t5/job-board/bd-p/jobs
You might also look at OCR integrations on Zapier:
https://zapier.com/apps/airtable/integrations/mindee-ocr
https://zapier.com/apps/airtable/integrations/nanonets-ocr
Feb 27, 2023 08:50 AM
You can use a third-party service to extract the text. CloudConvert can convert PDF, DOC, and DOCX files to plain text. Pdf.co can convert PDF to text as well. You can integrate with either of these services using scripting or an integration tool such as Zapier or Make.
Dec 17, 2023 01:16 AM
We have created a product/extension that integrates with Airtable. It can extract text from a PDF file in an attachment field and save this text in a text field. It can process records in bulk, and you can set it to run regularly. It can only process PDF files.
May 19, 2024 04:11 AM
Should we first try to extract the text with CloudConvert or Pdf.co,
and if they fail, then try an OCR solution?
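For anyone wanting to try that order of operations in a script, here is a minimal sketch of the fallback logic only. It assumes you have already wrapped each service in your own function that takes a file URL and returns the extracted text. The names `Extractor`, `extractWithFallback`, and the extractor labels are made up for illustration and are not part of any of these services' APIs.

```typescript
// Hypothetical type: any function that turns a file URL into extracted text.
// The actual calls to CloudConvert, Pdf.co, or an OCR API would live inside
// functions of this shape; they are not shown here.
type Extractor = (fileUrl: string) => Promise<string>;

interface ExtractionResult {
  text: string;   // the extracted text, trimmed
  method: string; // label of the extractor that produced it
}

// Try each extractor in order. Move on to the next one when an extractor
// throws or returns only whitespace (as happens with scanned, image-only PDFs).
async function extractWithFallback(
  fileUrl: string,
  extractors: Array<{ name: string; run: Extractor }>
): Promise<ExtractionResult> {
  const errors: string[] = [];
  for (const { name, run } of extractors) {
    try {
      const text = (await run(fileUrl)).trim();
      if (text.length > 0) {
        return { text, method: name };
      }
      errors.push(`${name}: no text found`);
    } catch (err) {
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All extractors failed: ${errors.join("; ")}`);
}
```

You would pass the plain text converter first and the OCR extractor last, so OCR only runs for documents where the cheaper step returns nothing.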
Oct 07, 2024 11:06 AM
Hey,
Maybe this can be helpful: there is a tool that does exactly this and is built around Airtable!
www.askair.ai
And I know about it, because I made it 😁
I love Airtable, and I had so many documents to fill in for our accounting that we built this platform.
- First, you set up your table with all the columns you need, and you add descriptions to your columns.
- Second, you connect askair to Airtable, authorize your base, activate your table, and you obtain an email address for the extractor.
- Third, you send your document to the extractor email address; it parses the document and fills in an Airtable record for you!
Hope that helps!
Oct 07, 2024 10:51 PM
Summary of solutions:
- Vision API extension for Airtable: https://support.airtable.com/docs/vision-extension
- zapier.com/apps/airtable/integrations/mindee-ocr
- zapier.com/apps/airtable/integrations/nanonets-ocr (G2: 4.8, 78 reviews)
- CloudConvert (G2: 4.7, 20 reviews)
- Pdf.co (G2: 4.7, 72 reviews)
- askair.ai
- datafetcher.com/blog/extract-data-pdfs-airtable-openai
- FileDrop for Google Sheets (not for Airtable)
Which one are you using and what's your opinion?
Nov 29, 2024 10:56 AM
Hello
I have the same requirement. I would like to do it with scripting only, without Zapier or Make.
My needs are quite light since this is only a trial, and PDF.co does not have a free tier.
I tried the Dynamic PDF API (https://dpdf.io/), which is promising, but I get the following error in Airtable:
Error: Fetch response exceeds size limit of 4.5mb
Is it because Airtable runs in Next.js Vertex? Do you have any suggestions to overcome this?
D