Self Help And The California Public Records Act — The Case Of FilmLA — And Their Weirdly Intransigent Attitude Towards The Law — And A Hacky But Functional Way To Scrape Their Website — Which I Did Over The Last Week Or So — And Now It’s — At Least Theoretically — Possible To Batch Search The Permits

Background: This post follows up on a post from a few days ago, and here’s some useful background from there:

This month Los Angeles activists were forced to think a lot about film permits. First the extraordinary Ktown For All broke what turned into an international story about the City shutting down a COVID test site at Union Station to accommodate a film shoot.

Then, less than two weeks later, Streetwatch LA member Ian Carr broke the story that an entirely different film company had somehow arranged for a large encampment in front of City Hall East to be swept away in advance of their shoot. Twitter user @publicownedbus also provided valuable info, and ace Knock LA reporter Cerise Castle then wrote about the incident.[1]

Recent events have made it clear that we need an effective way to search the content of Los Angeles film permits for the names and phone numbers of location managers, locations, and other essential information. Permits are coordinated by an entity called FilmLA. FilmLA is putatively private, but it is subject to the California Public Records Act (CPRA), at the very least through its contract with the City of Los Angeles.[2] But FilmLA bossman Paul Audley refuses to comply with the law.

And while I’m not giving up on legal remedies, they take forever and it turns out that it’s not necessary to wait in order to obtain some of the records. In particular, the permits themselves. Audley admits that the permits are subject to the CPRA and they are all in some technical sense available on FilmLA’s website. However, the search is abysmal.

It’s only possible to search on four predetermined fields, which are Permit Number, Company Name, Production Title, and Date of First Activity. If you want other information, like all permits at a given location, you’re out of luck. Not only that, but it’s impossible to search even those fields without being logged in. This excludes search engines from indexing the permits (unless arrangements are made to allow them in, which FilmLA has not done).[3]

But there’s probably no way to compel these people to let search engines in, even with a lawsuit, so I took matters into my own hands and scraped most of the permits from the site.[4] I’m in the process of putting them all on Archive.Org. More than 45K individual files are uploaded so far, but the full collection is over 100GB, so it’s taking a while to get them all up. The Archive allows search engines to index its site, of course, so eventually all the permits will be searchable on the open internet.

Having all these permits on a hard drive is moderately useful also. The PDFs are machine readable, so it’s possible to search all of them for any term by script, although it’s not so easy to search for multiword phrases.[5]
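
Since converting each PDF to text is probably the slowest part of a search across tens of thousands of files, one option for repeated searches is to extract each permit’s text once and then grep the saved text files. This is a minimal Bash sketch, assuming pdftotext (from Poppler) is installed and the permit PDFs sit in the current directory with names starting with Permit_; the cache directory permit_text and both function names are hypothetical:

```shell
# Sketch (Bash): convert each permit PDF to text once, then search the text.
# Assumes pdftotext (Poppler) is installed and the PDFs are named Permit_*.
# The cache directory "permit_text" and the function names are hypothetical.

build_text_cache() {
  local cache_dir=${1:-permit_text}
  local pdf txt
  mkdir -p "$cache_dir" || return 1
  for pdf in Permit_*; do
    [[ -f $pdf ]] || continue               # glob matched nothing
    txt="$cache_dir/$pdf.txt"
    # Convert only if there is no non-empty cached text for this permit yet
    [[ -s $txt ]] || pdftotext -q "$pdf" "$txt"
  done
}

# Print the name of every cached permit whose text contains the term
# (case-insensitive, literal string rather than a regular expression)
search_text_cache() {
  local term=$1 cache_dir=${2:-permit_text}
  grep -ilF -- "$term" "$cache_dir"/*.txt | sed -e 's|.*/||' -e 's|\.txt$||'
}
```

Run build_text_cache once from the permit directory (it skips permits that already have cached text), then something like search_text_cache "Kasravi" prints the names of matching permits. The -F flag makes grep treat the term as a literal string rather than a regular expression, and -l makes it print file names instead of matching lines.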

For instance, I was able to search for location manager Mary Pat Kasravi, whom I picked at random, and find her associated with at least[6] the following permits: Permit_E00019568, Permit_E00019604, Permit_E00020576, Permit_E00021935, Permit_E00022424, Permit_E00022721, Permit_E00024562, Permit_E00026821, Permit_F00018870. If you need her contact info for a film shoot, it’s on these permits.

Similarly, if one were interested in permits for filming at the Village Theater, just for example, it’s possible to find at least these ones: Permit_E00019568, Permit_E00019604, Permit_E00019653, Permit_E00020576, Permit_E00021935, Permit_E00021990, Permit_E00022424, Permit_E00022721, Permit_E00026890, Permit_E00028447.[7] Try doing either of those searches on FilmLA’s website!

If you’re interested, I searched the PDFs using the following Bash script, launched from the directory containing all the permits. Obviously there’s a lot of room for improvement, but it works:

for item in Permit_*; do
  # Dump the PDF's text to stdout ("-") and keep matching lines, ignoring case
  search_result=$(pdftotext -q "$item" - | grep -i "<searchstring>")
  if [[ -n $search_result ]]; then
    echo "------------"
    echo "$item"
    echo "$search_result"
  fi
done

Where “<searchstring>” is the word you’re looking for. I don’t know enough about either grep or pdftotext to predict what would happen with multiple words in quotes. I suspect that even if the Bash quoting could be worked out, the method would fail whenever there’s a line break between any of the terms. I’m too lazy to figure it out right now, but I did the two-word search for “Village Theater” by chaining a couple of greps, which keeps only the lines that contain both words:

search_result=$(pdftotext -q "$item" - | grep -i village | grep -i theater)
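
One possible way around the line-break problem, sketched here rather than tested against the whole collection: squeeze every run of whitespace, newlines included, into a single space in both the extracted text and the phrase before comparing, so a phrase that pdftotext splits across lines still matches. This assumes the words are separated only by whitespace in pdftotext’s output; a word hyphenated at a line break will still be missed. The function names are hypothetical:

```shell
# Sketch (Bash): match a multiword phrase even if pdftotext splits it across
# lines, by squeezing every run of whitespace (newlines included) into a
# single space in both the text and the phrase. Words hyphenated at a line
# break are still missed. Function names are hypothetical.

# Read stdin and write it back with all whitespace runs collapsed to one space
normalize_space() {
  tr -s '[:space:]' ' '
}

# Succeed if the text on stdin contains the phrase, ignoring case
text_has_phrase() {
  local phrase
  phrase=$(printf '%s' "$1" | normalize_space)
  normalize_space | grep -qiF -- "$phrase"
}

# Print the name of each permit PDF in the current directory containing the phrase
search_permits_for_phrase() {
  local item
  for item in Permit_*; do
    [[ -f $item ]] || continue               # glob matched nothing
    if pdftotext -q "$item" - | text_has_phrase "$1"; then
      echo "$item"
    fi
  done
}
```

For example, search_permits_for_phrase "village theater" prints the file name of each matching permit. Because the normalized text is one long line, this reports which permits match rather than showing the matching lines.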

Finally, in case you’re interested, I scraped the site by using the browser’s developer console to see the GET request the browser sends to FilmLA to fetch the next page of permits. I copied that into a script that ran through all 750 pages (or whatever it was) of results and collected the download links. I then used a script to turn these into a number of HTML files with about 30K links each, and used the Firefox extension DownThemAll to download the files.[8]
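
The step of turning a long list of download links into HTML pages that DownThemAll can work through might look something like this Bash sketch. It assumes the links have already been collected into a text file with one URL per line; the function name and file names are hypothetical, and the default of 30,000 links per page just mirrors the figure above:

```shell
# Sketch (Bash + awk): split a file of URLs (one per line) into numbered HTML
# pages with at most N links each, so a browser download manager can open one
# page at a time. The function and file names are hypothetical.

make_link_pages() {
  local links_file=$1 per_page=${2:-30000} prefix=${3:-links_page}
  awk -v n="$per_page" -v prefix="$prefix" '
    NF == 0 { next }                         # skip blank lines
    count % n == 0 {                         # start a new page every n links
      if (out != "") { print "</body></html>" > out; close(out) }
      out = sprintf("%s_%03d.html", prefix, count / n + 1)
      print "<html><body>" > out
    }
    {
      printf "<a href=\"%s\">%s</a><br>\n", $0, $0 > out
      count++
    }
    END { if (out != "") { print "</body></html>" > out; close(out) } }
  ' "$links_file"
}
```

For example, make_link_pages links.txt writes links_page_001.html, links_page_002.html, and so on, which can then be opened in Firefox one at a time.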

I wasn’t able to download the permits themselves with a script because the URLs are hidden behind a server-side script and I don’t understand the scripting tools well enough to handle that, but DownThemAll does! Anyway, that’s the story, and eventually, like I said, none of this nonsense will be necessary and we’ll be able to use a search engine to understand this public information, as we should have been able to do all along!
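
For anyone who wants to try scripting that step anyway, here is an untested Bash sketch using curl. It only has a chance of working if each link behaves like an ordinary HTTP download once you’re logged in, that is, if the server-side script redirects to the PDF or sends the file name in a Content-Disposition header; whether FilmLA’s script does either is an assumption. It also assumes a cookie file (in the Netscape format curl reads) exported from a logged-in browser session. The names cookies.txt and permits/, the function name, and the one-second default delay are all hypothetical:

```shell
# Untested sketch (Bash): download every URL in a links file with curl.
# Assumes each link is an ordinary HTTP(S) URL that either redirects to the PDF
# or sends a Content-Disposition file name, and that the site accepts cookies
# exported from a logged-in browser session. File and directory names here
# (cookies.txt, permits/) are hypothetical.

download_links() {
  local links_file=$1 cookie_file=${2:-cookies.txt} dest=${3:-permits} delay=${4:-1}
  # curl runs inside $dest below, so make the cookie path absolute first
  case $cookie_file in
    /*) ;;
    *) cookie_file="$PWD/$cookie_file" ;;
  esac
  mkdir -p "$dest" || return 1
  local url
  while IFS= read -r url; do
    [[ -n $url ]] || continue                # skip blank lines
    # --location follows redirects; --remote-name plus --remote-header-name
    # saves under the server-suggested name (otherwise a name from the URL)
    ( cd "$dest" && curl --fail --silent --show-error \
        --location --remote-name --remote-header-name \
        --cookie "$cookie_file" "$url" ) || echo "failed: $url" >&2
    sleep "$delay"                           # pause between requests
  done < "$links_file"
}
```

If the server doesn’t send a file name, curl falls back to a name taken from the URL, which could be the same for every permit, so it’s worth checking the first few downloads before running the whole list.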

  1. Full disclosure: Castle is a freelance reporter rather than a Knock LA reporter. I said it the way I said it because I wanted room to link to Knock LA separately. OK, are you happy now?! Me neither. I mean me too!
  2. I say “at least” because it’s almost certainly true that if the question were asked in court the answer would be that FilmLA is subject to the CPRA as a matter of state law as well, just as it’s surely subject to the Brown Act, which isn’t in their contract at all. But neither question has been asked in court, so for now, at least, it’s their contract that keeps them subject to the CPRA.
  3. I assume FilmLA doesn’t allow search engines to log in because none of the permits’ contents are searchable on Google. I also have a CPRA request in on it, but I’m expecting them to tell me there are no responsive records, which will confirm my theory. That or else they’ll just continue to argue pointlessly about it, which will confirm my theory that the City of Los Angeles thinks it’s more efficient in the long run to pay CPRA lawyers to sue them than to comply with the damn law. Not saying they’re wrong from their point of view, either.
  4. Not all because my method is inefficient and cobbled-together, but I surely got most of them. I have a request in to Paul Audley for all the permits, which is still necessary since I can’t be sure my collection is complete. I’m pretty sure it’s not, actually, but not by much.
  5. What I mean is that searching reliably for multiword phrases is beyond my technical ability. I can write a script to do it, but I wouldn’t be able, e.g., to testify under oath that it was doing it correctly, and it sure takes a long time.
  6. At least these because after I got enough results to show the method works I killed the search script, because who really cares what this particular answer is?
  7. Again, this result is far from complete. Again I killed the search once I got a few results.
  8. I’m leaving out the details because they’re too involved. If you’re interested in doing this feel free to ask me about it if you have questions.
