Skip to main content

Hey! Some weeks ago I built a super simple, yet very very powerful automation, for helping a friend parse invoices, and thought it would be helpful for others to post a basic guide (and open the discussion for improvements) here.

 



Proof of Concept:

  1. By uploading an Invoice to Airtable.

  2. An automation would use AI to extract, Invoice Line Item Descriptions, Quantity and Totals.

  3. Then the automation would create the corresponding records with precise information.

  4. Therefore my friend can now get all insights from the invoices his company receives (how much they’re spending, what they’re spending on, and a clear picture of overall finances).

 

Needs:

  1. Airtable paid plan for scripting and AI features.
     

Areas of improvement:

  1. Prompt for AI, as well as Json output would need some iterations to allow handling multiple different invoice formats.
  2. Prompt for AI should include an example Json.
  3. Many others for sure
     

High-level setup:

  1. Invoices Table
  2. Invoice Line Items table

Automation:
 

  1. Triggers when attachment field in Invoices table is not empty
  2. AI block prompt: “ {Attachment} contains an invoice. Within it you'll find invoice line items. Each line item will have (i) Description; (ii) Quantity; and (iii) Amount. Please output a json with list of each invoice and each value. Do not output any thing different to the json
  3. Script: 
    // Retrieve the configuration inputs from the automation
    let config = input.config();
    let invoiceJson = config.invoiceJson;
    let recordID = config.recordID;

    // If invoiceJson is passed as a string, parse it into an object
    if (typeof invoiceJson === "string") {
    try {
    invoiceJson = JSON.parse(invoiceJson);
    } catch (error) {
    throw new Error("Failed to parse invoiceJson: " + error.message);
    }
    }

    // If invoiceJson is not an array, check if it's an object with an "invoices" key that is an array
    if (!Array.isArray(invoiceJson)) {
    if (invoiceJson.invoices && Array.isArray(invoiceJson.invoices)) {
    invoiceJson = invoiceJson.invoices;
    } else {
    throw new Error("invoiceJson is not an array or does not contain an 'invoices' array.");
    }
    }

    // Build an array of line item objects from the invoiceJson data
    let lineItems = invoiceJson.map(item => ({
    "Description": item.description,
    "Quantity": item.quantity,
    "Amount": item.amount,
    // If you need to pass the invoice record ID along, include it as needed.
    "Invoice": recordID
    }));

    // Output the list of line items so subsequent steps can access it
    output.set("lineItems", lineItems);
    1. Repeating group with Create Record block, for creating records in Line Items table, mapping values from each line item output of the script.

Video going through it.

What other use cases could this apply to? How could this be further improved? Do you have similar workflows in place?

Happy to discuss the above!

Mike, Consultant @ Automatic Nation

Hi ​@Mike_AutomaticN ,

thank you! Great proposal! I directly “copied” it for “normal” invoices, just reading invoice amount, date and invoicing company, no line items. When I upload a file directly in airtable, it runs perfectly. I now built an automation in make.com that watches a google drive folder and as soon as a document is found there it uploads it to airtable as a new record. In this case, i.e. the record with the attachment is uploaded from make.com the automation (step generate with ai) does just deliver the json without any values; it seems as if the document is not passed in this case, or the automation triggers too fast and the document is not fully uploaded or any other issue… Any clue on that? thank you! br Chris


Hi ​@Mike_AutomaticN ,

thank you! Great proposal! I directly “copied” it for “normal” invoices, just reading invoice amount, date and invoicing company, no line items. When I upload a file directly in airtable, it runs perfectly. I now built an automation in make.com that watches a google drive folder and as soon as a document is found there it uploads it to airtable as a new record. In this case, i.e. the record with the attachment is uploaded from make.com the automation (step generate with ai) does just deliver the json without any values; it seems as if the document is not passed in this case, or the automation triggers too fast and the document is not fully uploaded or any other issue… Any clue on that? thank you! br Chris

It’s probably as you surmise, the automation is triggering too quickly. When your Make automation runs, does it result in the proper file being attached to the Airtable record? 

If so, then it’s happening too quickly and you want to change the trigger to implement some kind of delay. There are scripts that add a couple of seconds on delay into the process, or you could try something like breaking up the process into two automations (when attachments is not empty, check box. When box is checked, run script).


Hey ​@Christian_L! Thanks!!

Following with ​@DisraeliGears01 questions above.

1. Is the Make automation actually attaching a file on your Attachment field?
2. If it is, once you manually open the attachment is that the actual file you were expecting?
3. If so, what is currently your Airtable automation trigger? 
4. If it is not “When a record matches conditions where attachment is not empty”, then you might try with this one.

Re: Scripts for delays need to be used wisely, as if you trigger the automation, use a delay script, and then on the next action you dynamically map data obtained from the trigger, then the delay will not solve the issue as data is still outdated on the trigger. For this to work you would need to Find record with record ID of the record which triggered the automation, to capture any new data, and map this updated data (rather than that of the trigger) on your next action block.

Also, if you are already using Make.com or other automation tools, then you might as well assess having the parsing done on your third party automation tool with some help of OpenAI modudle/node. Just an idea, but it does require some additional development.

I’d be happy to hop on a quick call and take a look at it with you :D

Mike, Consultant @ Automatic Nation


@Mike_AutomaticN , ​@DisraeliGears01 Thank you for the replies!! In fact it was the delay… I found a solution: I added a row that checks if the upload is completed (by formula, as this can be seen on the URL format):

IF(

  AND(

    {_document URL} != "",

    FIND("dl.airtable.com", {_document URL}) > 0

  ),

  1,

  0

)

→ this works now like a charm. Or better, it worked perfectly. Now I have another problem.  The automation, which now starts correctly when the upload is completed, then calls the “generate with AR” and attached the document. This automation worked perfectly, the generate with AI delivered a correct json stream for the documents. Unfortuntely it stopped working - abviously the LLM cannot access the pdf any more. I did not change anything on the prompt; even in test mode; I select the record with the pdf, call the “generate with AI” step, it can be seen, that the pdf is added to the prompt, I do not get a result; I switched the llm (now claude, orginially it was chatgpt, which worked for some days) and claud now delivers at least an error reply: ”No access. The PDF document does not contain any visible text information or content that I can read. The file appears to be empty, or the content is not available in a format that I can read.”
Do you have any clues on that? The files are pdf, in the prompt I have the attachment, I have no clue any more, especially as it already worked…

 

thank you! br Chris

 

p.S: Update: I found a hint now: It works for some pdfs, and for some not… pdf, that I open and print > save as pdfs and then store again in airtable → it works. For some other pdfs (some Of them I scanned, some I used a scanner app on my mobile) it is not working; however If I print> save as pdf them again, then the llms can access them. Any Idea on that? especially on how to include this in the automation to have pdfs that work...


This is really just a guess, but perhaps some of your PDFs aren’t flattened and so the LLM is getting confused by the layers? Then presumably when you save for printing that flattens the file, hence why they start working. 

PDF can really be a mess of a file format sometimes 🤷‍♂️


Reply