How Scrape Works

The scrape runs every six hours on a schedule that shifts with daylight saving time. The scrape is built from scripts that manipulate files in directories. Some files are rolled up from similarly named files in subdirectories.
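A schedule that follows the local clock, and therefore shifts with daylight saving, is what cron gives by default. A hypothetical crontab entry, with assumed script and log paths:

```shell
# Hypothetical crontab entry; the paths are assumptions, not the real ones.
# cron fires on local time, so the UTC moment of each run shifts with
# daylight saving time.
#
#   0 0,6,12,18 * * * /home/wiki/scrape/cron.sh >> /tmp/scrape.log 2>&1
```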

The cron script sequences a series of scripts that collect data from federation sites.
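A minimal sketch of such a sequencing script, with each collection step reduced to a stub function; the step names are illustrative, not the actual script names:

```shell
#!/bin/sh
# Sketch of a cron script that runs collection steps in order.
set -e                    # abort the whole scrape if any step fails
step() { echo "running $1"; }
step find-sites           # discover federation sites worth scraping
step scrape-sites         # fetch page indices into the sites directory
step update-activity      # record fresh discoveries in the activity directory
step publish              # build composite files in the public directory
```

Running the steps under `set -e` means a half-finished scrape stops rather than publishing partial composites.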

The sites directory accumulates flat file indices of data collected from scraped sites.
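One way similarly named files could be rolled up into a flat index, sketched with made-up site and file names:

```shell
#!/bin/sh
# Hypothetical rollup: per-site files that share a name are concatenated
# into one flat index at the top of the sites directory.
cd "$(mktemp -d)"
mkdir -p sites/a.example.com sites/b.example.com
echo "welcome-visitors" > sites/a.example.com/pages.txt
echo "how-scrape-works" > sites/b.example.com/pages.txt
cat sites/*/pages.txt > sites/pages.txt    # the rolled-up flat index
```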

The activity directory holds the new information discovered by each scrape.
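New information can be isolated by comparing this scrape's index against the previous one; a sketch using sorted indices and `comm`, with assumed file names:

```shell
#!/bin/sh
# Hypothetical sketch: keep only lines present in this scrape but not
# the last one, i.e. the freshly discovered information.
cd "$(mktemp -d)"
mkdir -p activity
printf 'alpha\nbeta\n'        > last-scrape.txt   # sorted index, last run
printf 'alpha\nbeta\ngamma\n' > this-scrape.txt   # sorted index, this run
comm -13 last-scrape.txt this-scrape.txt > activity/new-pages.txt
cat activity/new-pages.txt                        # prints gamma
```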

The public directory serves composite files used by various downstream reports.
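A hypothetical composite build: merge the per-site indices into one file, stamping each line with its source site so downstream reports need not walk the directory tree. All names here are illustrative:

```shell
#!/bin/sh
# Hypothetical composite: one file combining every site's index,
# each line prefixed with the site it came from.
cd "$(mktemp -d)"
mkdir -p sites/a.example.com sites/b.example.com public
echo "welcome-visitors" > sites/a.example.com/pages.txt
echo "recent-changes"   > sites/b.example.com/pages.txt
for f in sites/*/pages.txt; do
  site=$(basename "$(dirname "$f")")
  sed "s|^|$site |" "$f"               # prefix each line with its site
done > public/all-pages.txt
```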

Here we depart from title case for the names of these unix elements. We use suffix words rather than file extensions; the actual file names are given on the pages that explain them.

See Merge All Graphs for a single diagram of all pages.

See How Search Works, where the data collected here is used.