LA Metro Scrapers Documentation

Welcome to the documentation for the LA Metro Scrapers! Here, you’ll find information about local development, deployment, and an overview of each scraper (and decisions that we’ve made about them).

How do they work?

At a high level, the scrapers retrieve information from Metro instances of the Legistar interface, also known as InSite, and the Legistar API (endpoints at https://webapi.legistar.com/metro/*).

See the relevant scraper documentation for details on where the data comes from and how it is parsed.

How are they run?

The scrapers are run by Airflow and populate LA Metro Councilmatic instances, outlined below.

Scraper image tag	Airflow instance	Metro instance
`main`	https://la-metro-dashboard-heroku.datamade.us/home	https://la-metro-councilmatic-staging.herokuapp.com/
`deploy`	https://la-metro-dashboard-heroku-prod.datamade.us/home	https://boardagendas.metro.net

See Deployment for more on how scraper image tags are built.

When do they run?

Tip

See the Airflow dashboard for information about the latest and next scraper runs.

Tip

Scrape schedules are written in UTC!

Subtract 7 hours to convert to Los Angeles time (8 hours during standard time, roughly November through March).
Subtract 5 hours to convert to Chicago time (6 hours during standard time).

Mental math getting you down? Try World Time Buddy!
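If you would rather not do the conversion by hand, the following is a minimal sketch using Python's standard zoneinfo module. The helper name utc_to_local and the example dates are illustrative and not part of the scrapers. It relies on the IANA time zone database, so daylight saving time is handled automatically.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_to_local(utc_dt: datetime, tz_name: str) -> datetime:
    """Convert a timezone-aware UTC datetime to the named IANA time zone."""
    return utc_dt.astimezone(ZoneInfo(tz_name))


# Illustrative run time: Saturday 03:05 UTC in July (daylight saving time in effect).
summer_run = datetime(2024, 7, 6, 3, 5, tzinfo=timezone.utc)

# Both conversions land on the previous calendar day (Friday evening).
print(utc_to_local(summer_run, "America/Los_Angeles"))
print(utc_to_local(summer_run, "America/Chicago"))
```

Note that a Saturday-morning UTC schedule falls on Friday evening in both Los Angeles and Chicago, so the day of the week can shift as well as the hour.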

windowed_bill_scrape

Scrape bills with a window of 0.05 at 5, 20, 35, and 50 minutes past the hour. This generally takes somewhere between a few seconds and a few minutes, depending on the volume of updates.

At 5, 20, 35, and 50 minutes past the hour, Sunday through Thursday
At 5, 20, 35, and 50 minutes past the hour, between 12:00 AM and 08:59 PM, only on Friday
At 5, 20, 35, and 50 minutes past the hour, between 06:00 AM and 11:59 PM, only on Saturday
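This page does not state the unit of the window. Assuming it is measured in days (an assumption, so check the scraper documentation before relying on it), a window of 0.05 would cover roughly the last 72 minutes, and a window of 1 would cover the last day. The sketch below shows that arithmetic; window_start is a hypothetical helper, not a function from the scrapers.

```python
from datetime import datetime, timedelta, timezone


def window_start(now: datetime, window_days: float) -> datetime:
    """Return the earliest update time covered by a window ending at `now`.

    Assumes the window is expressed in days, which this page does not confirm.
    """
    return now - timedelta(days=window_days)


# Illustrative run time: a windowed scrape starting at 12:05 UTC.
now = datetime(2024, 7, 3, 12, 5, tzinfo=timezone.utc)
print(window_start(now, 0.05))  # about 72 minutes before `now`
print(window_start(now, 1))     # exactly one day before `now`
```

Under that assumption, each 15-minute run would look back well past the previous run, so an update missed by one run would still fall inside the next few windows.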

fast_windowed_bill_scrape

Scrape bills with a window of 1 at 35 and 50 minutes past the hour. This generally takes somewhere between a few seconds and a few minutes, depending on the volume of updates.

At 35 and 50 minutes past the hour, between 09:00 PM and 11:59 PM, only on Friday
At 35 and 50 minutes past the hour, between 12:00 AM and 05:59 AM, only on Saturday

fast_full_bill_scrape

Scrape all bills quickly at 5 past the hour. This generally takes less than 30 minutes.

At 5 minutes past the hour, between 09:00 PM and 11:59 PM, only on Friday
At 5 minutes past the hour, between 12:00 AM and 05:59 AM, only on Saturday

windowed_event_scrape

Scrape events with a window of 0.05 at 0, 15, 30, and 45 minutes past the hour. This generally takes somewhere between a few seconds and a few minutes, depending on the volume of updates.

At 0, 15, 30, and 45 minutes past the hour, Sunday through Thursday
At 0, 15, 30, and 45 minutes past the hour, between 12:00 AM and 08:59 PM, only on Friday
At 0, 15, 30, and 45 minutes past the hour, between 06:00 AM and 11:59 PM, only on Saturday

fast_windowed_event_scrape

Scrape events with a window of 1 at 30 and 45 minutes past the hour. This generally takes somewhere between a few seconds and a few minutes, depending on the volume of updates.

At 30 and 45 minutes past the hour, between 09:00 PM and 11:59 PM, only on Friday
At 30 and 45 minutes past the hour, between 12:00 AM and 05:59 AM, only on Saturday

fast_full_event_scrape

Scrape all events quickly on the hour. This generally takes less than 30 minutes.

On the hour, between 09:00 PM and 11:59 PM, only on Friday
On the hour, between 12:00 AM and 05:59 AM, only on Saturday

person_scrape

Scrape all people and committees. This runs in lieu of a full scrape on Fridays, when all bills and events are already scraped once an hour.

At 03:05 AM, only on Saturday
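To see how the schedules fit together, the sketch below encodes the schedule lines listed for each scrape as plain data and reports which scrapes are due at a given UTC time. It is only an illustration: the SCHEDULES table and the scrapes_due helper are hypothetical, and the real schedules are defined in Airflow.

```python
from datetime import datetime, timezone

# datetime.weekday() numbers Monday as 0 and Sunday as 6.
SUN_TO_THU = {6, 0, 1, 2, 3}
FRI, SAT = 4, 5


def regular(minutes):
    """Rules for the windowed scrapes: every day except the Friday-night block."""
    return [
        (minutes, SUN_TO_THU, range(0, 24)),
        (minutes, {FRI}, range(0, 21)),   # 12:00 AM through 08:59 PM
        (minutes, {SAT}, range(6, 24)),   # 06:00 AM through 11:59 PM
    ]


def friday_night(minutes):
    """Rules for the fast scrapes: Friday 09:00 PM through Saturday 05:59 AM."""
    return [
        (minutes, {FRI}, range(21, 24)),
        (minutes, {SAT}, range(0, 6)),
    ]


# Each rule is (minutes past the hour, weekdays, hours), all in UTC.
SCHEDULES = {
    "windowed_bill_scrape": regular({5, 20, 35, 50}),
    "fast_windowed_bill_scrape": friday_night({35, 50}),
    "fast_full_bill_scrape": friday_night({5}),
    "windowed_event_scrape": regular({0, 15, 30, 45}),
    "fast_windowed_event_scrape": friday_night({30, 45}),
    "fast_full_event_scrape": friday_night({0}),
    "person_scrape": [({5}, {SAT}, range(3, 4))],
}


def scrapes_due(utc_dt):
    """Return the sorted names of scrapes whose schedule matches the UTC time."""
    return sorted(
        name
        for name, rules in SCHEDULES.items()
        if any(
            utc_dt.minute in minutes
            and utc_dt.weekday() in days
            and utc_dt.hour in hours
            for minutes, days, hours in rules
        )
    )


# Saturday 03:05 UTC: the hourly full bill scrape and the weekly person scrape.
print(scrapes_due(datetime(2024, 7, 6, 3, 5, tzinfo=timezone.utc)))
# → ['fast_full_bill_scrape', 'person_scrape']
```

Laid out this way, the windowed scrapes and the fast scrapes never overlap: the fast variants take over only during the Friday 09:00 PM to Saturday 05:59 AM block.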

What do they depend on?

The scrapers have a couple of key dependencies.

- pupa is the framework for scraping and organizing data according to the Open Civic Data standard. Our scrapers are subclasses of pupa.Scraper, and we use the pupa CLI to run scrapes.
  - See Useful pupa commands for more on the CLI.
- python-legistar-scraper is a Python wrapper for InSite and the Legistar API that we use to retrieve data. Our scrapers are also subclasses of the relevant LegistarScraper subclasses from this library.