Working with movie scripts and props

Jason_Zuidema · ‎Mar 14, 2020

Is it possible to import a movie script (from writers duet or final draft) with tagged props, and have Airtable categorize them automatically by page and scene number?

Ashwin_P · ‎Mar 14, 2020

Final Draft supports exporting to an XML file. It should be possible to parse this to create Scenes, Characters, etc. and then load them into Airtable. I don’t think a ready-made tool for this exists, though. DM me and we can discuss further.
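To give a feel for the parsing step, here is a rough sketch using Python's standard XML parser. The sample document is a hypothetical, heavily trimmed stand-in for a Final Draft export, and the element and attribute names in it (Content, Paragraph, Type, Number, Text) are assumptions that should be checked against a real exported file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for a Final Draft XML export.
# Element and attribute names are assumptions; verify against a real file.
SAMPLE_FDX = """<FinalDraft DocumentType="Script" Version="1">
  <Content>
    <Paragraph Type="Scene Heading" Number="1"><Text>INT. KITCHEN - DAY</Text></Paragraph>
    <Paragraph Type="Action"><Text>Anna picks up a black scarf.</Text></Paragraph>
    <Paragraph Type="Character"><Text>ANNA</Text></Paragraph>
    <Paragraph Type="Dialogue"><Text>Where did I leave the </Text><Text>guitar?</Text></Paragraph>
    <Paragraph Type="Scene Heading" Number="2"><Text>EXT. STREET - NIGHT</Text></Paragraph>
    <Paragraph Type="Action"><Text>A car idles at the curb.</Text></Paragraph>
  </Content>
</FinalDraft>"""


def parse_scenes(fdx_text):
    """Group paragraphs under the most recent scene heading."""
    root = ET.fromstring(fdx_text)
    scenes = []
    current = None
    for para in root.iterfind("./Content/Paragraph"):
        kind = para.get("Type", "")
        # A paragraph may be split across several Text runs; join them.
        text = "".join(t.text or "" for t in para.iterfind("Text")).strip()
        if kind == "Scene Heading":
            current = {
                "number": para.get("Number"),
                "heading": text,
                "characters": [],
                "action": [],
            }
            scenes.append(current)
        elif current is None:
            continue  # ignore anything before the first scene heading
        elif kind == "Character":
            if text not in current["characters"]:
                current["characters"].append(text)
        elif kind == "Action":
            current["action"].append(text)
    return scenes


scenes = parse_scenes(SAMPLE_FDX)
for scene in scenes:
    print(scene["number"], scene["heading"], scene["characters"])
```

Each scene dictionary could then be turned into one Airtable record, with the action lines kept around for a later prop-matching pass.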

Jason_Zuidema · ‎Mar 14, 2020

Email me: jason@jz.media

Bill_French · ‎Mar 15, 2020

Yes, but… one must ask, is this the only objective concerning the import of a script into Airtable, or are there other compelling reasons to treat the script more like data? This is an important question to ask because you may be able to achieve your objectives without using Airtable.

If the props are already managed in Airtable, an integrated approach is compelling. Even so, it would require that the tags in the props data be consistent with the tags extracted from the script. If they are only sort of consistent, then you also need a little AI: a script process that can make clever connections between, for example, a prop named “Black Scarf” and a mention in the script such as “Scarf (black)”.
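Before reaching for real AI, the “Black Scarf” versus “Scarf (black)” case can be handled with simple normalization. The sketch below is only an illustration: the function names and the sample catalog are made up, and it assumes two names refer to the same prop when they contain the same words in any order.

```python
import re


def prop_key(name):
    """Lowercase the name, drop punctuation, and sort the words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return tuple(sorted(tokens))


def match_props(catalog, mentions):
    """Map each script mention to a catalog prop with the same word set."""
    by_key = {prop_key(p): p for p in catalog}
    return {m: by_key.get(prop_key(m)) for m in mentions}


# Hypothetical props table and script mentions.
catalog = ["Black Scarf", "Acoustic Guitar"]
matches = match_props(catalog, ["Scarf (black)", "guitar, acoustic", "Red Car"])
print(matches)
```

Mentions that come back as None are the ones that would need a human, or a smarter model, to resolve.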

Ashwin_P · ‎Mar 15, 2020

Yup, that’s right, Bill. Is there an API that can identify props/entities from text? Have you tried any of the ML APIs for text analysis provided by Google or Azure?

Bill_French · ‎Mar 15, 2020

Not that I’m aware of, at least not a specific ML model designed for the Hollywood/entertainment industry per se. But it’s about time, right?

I do have a number of clients in the entertainment sector, and I was one of the architects who developed the data infrastructure for the visual effects on the first Matrix film. This included the precursor film What Dreams May Come; both were Manex Special Effects projects.

I’ve never done any work in the realm of scripts and props, but it sure makes sense to treat script content as data to accelerate processes.

An entity extraction model could be trained for props, and you could even do a rudimentary model based on a terms dictionary and some clever regular expression matching.
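A rudimentary version of the dictionary-plus-regex idea might look like the sketch below. The term list is a made-up example, and the pattern only handles whole words with an optional plural “s”, so treat it as a starting point and not a real extraction model.

```python
import re

# Hypothetical terms dictionary; in practice this would come from the props table.
PROP_TERMS = ["scarf", "guitar", "sweater", "car"]

# Whole-word match on any term, allowing a simple plural "s".
PROP_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in PROP_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def find_props(text):
    """Return the dictionary terms mentioned in the text, in order of first appearance."""
    found = []
    for m in PROP_PATTERN.finditer(text):
        term = m.group(1).lower()
        if term not in found:
            found.append(term)
    return found


line = "Anna grabs her SCARF and two guitars, then walks past the carpet to the car."
props = find_props(line)
print(props)
```

The word boundaries keep “carpet” from being counted as “car”, which is the kind of false positive a naive substring search would produce.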

The most likely candidate for crafting a truly accurate AI process is probably Scikit-Learn, one of the most widely used open-source libraries for machine learning. This library provides accessible tools for training models for classification, regression, and clustering, along with feature extraction for text. Moreover, it provides other useful capabilities such as dimensionality reduction, grid search, and cross-validation. Scikit-Learn has a huge community and a significant number of tutorials to help you get started.

Another one is spaCy, an NLP library for Python that has always made me look absolutely brilliant. 😉

Ashwin_P · ‎Mar 15, 2020

Thank you for the insights Bill.

I think this API, https://cloud.google.com/natural-language, can aid in the automatic extraction of props.

I tested a sample and it categorized “car” as “consumer goods”. Most other entities, such as “sweater” and “guitar”, get categorized as “Other”.

Bill_French · ‎Mar 15, 2020

Yes, it can. It sits alongside DialogFlow in Google Cloud’s AI offerings, and DialogFlow is very powerful for creating things like smart chatbots. I’ve used that platform extensively for a real-time dispatch-center chatbot that allows users to ask questions like:

Where is 1805?

We trained the system to understand that 1805 is an entity known as a vehicle ID. The system then queries the LA-area real-time network that tracks all public buses and instantly brings the vehicle up on a map, along with other real-time analytics and live video.

The dispatch operator might just as easily type:

Find 1805, Show 1805, Locate 180*

Each of these queries is supported in our [Stream It] solution, not because we had to program each of them, but because we simply had to teach DialogFlow that find, show, and locate are common command entities (process-related synonyms) and that four-digit numbers in context with command entities are indeed vehicle IDs.

Other command entities provide things like:

Hold 1805 at next stop

This triggers a process automation that flashes an urgent message on the heads-up display for the driver.
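To make the command-entity idea concrete, here is a hand-written, rule-based stand-in. It is not how DialogFlow works internally (DialogFlow learns these associations from training phrases instead of a fixed table), and the synonym table and function name are hypothetical. It only shows the shape of the result: a normalized command plus a four-digit vehicle ID.

```python
import re

# Hypothetical synonym table: several words map to one command entity.
COMMAND_SYNONYMS = {
    "find": "locate",
    "show": "locate",
    "locate": "locate",
    "where": "locate",
    "hold": "hold",
}

# A vehicle ID is assumed to be exactly four digits.
VEHICLE_ID = re.compile(r"\b\d{4}\b")


def parse_request(text):
    """Extract a normalized command and a vehicle ID from a typed request."""
    words = re.findall(r"[a-z]+", text.lower())
    command = next((COMMAND_SYNONYMS[w] for w in words if w in COMMAND_SYNONYMS), None)
    id_match = VEHICLE_ID.search(text)
    return {"command": command, "vehicle_id": id_match.group() if id_match else None}


for request in ["Where is 1805?", "Show 1805", "Hold 1805 at next stop"]:
    print(request, "->", parse_request(request))
```

A fixed table like this breaks as soon as someone types a phrasing nobody anticipated, which is exactly the gap a trained model is meant to close.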

For the world of props, a similar effort would be needed to teach the model that artefact references are entities that match up with actual production requirements. What’s so cool about this science is that once you teach the model with approximately 100 artefacts, it surmises the remaining near-infinite possible things that you might be referencing in a script.

Ashwin_P · ‎Mar 15, 2020

Very cool!! Is there any tutorial that you would recommend for somebody like me who has no idea about all this and is curious to learn the basics?

Thanks.

Bill_French · ‎Mar 15, 2020

I recommend diving into DialogFlow because you can go really far and even test your models without ever writing a line of code. It’s tedious wading through it all because there are so many new concepts and vernacular to understand, but the very first video (at dialogflow.com) sets the table pretty well.

Then you’ll need to start here with a barrage of examples and tutorials that should keep you deep in the rabbit hole until Coronavirus dies out.