CampusPulse/data-ingest

event-data-ingest

Pipeline for ingesting data about events on campus.

Contributing

How to

  1. Configure your environment (instructions on the wiki).
  2. Choose an unassigned issue, and comment that you're working on it.
  3. Open a PR containing a new fetch, parse, or normalize script! (See the wiki for details on these stages.)
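The three kinds of script above can be thought of as a chain in which each stage consumes the previous stage's output. The following is a hypothetical sketch only. The real script interfaces are described on the wiki. The function names, the CSV input, and the output fields (`name`, `start_time`) are illustrative assumptions, not event-data-ingest's actual schema.

```python
import csv
import io
import json
from typing import Iterable, Iterator

# Hypothetical fetch -> parse -> normalize chain. Function names, input format,
# and output fields are illustrative; see the wiki for the real interfaces.


def fetch() -> str:
    """Fetch stage: return raw source data (a hard-coded CSV string stands in for a download)."""
    return "title,start\nOpen Mic Night,2024-03-01T19:00\n Career Fair ,2024-03-05T10:00\n"


def parse(raw: str) -> Iterator[dict]:
    """Parse stage: turn raw text into one dict per event, keeping the source's field names."""
    yield from csv.DictReader(io.StringIO(raw))


def normalize(records: Iterable[dict]) -> Iterator[str]:
    """Normalize stage: map each record onto a common schema, emitted as one NDJSON line."""
    for rec in records:
        # Trim stray whitespace from the source and rename fields to the shared names.
        yield json.dumps({"name": rec["title"].strip(), "start_time": rec["start"]})


if __name__ == "__main__":
    for line in normalize(parse(fetch())):
        print(line)
```

Keeping each stage a function of the previous stage's output makes it possible, in this sketch, to rerun normalize against stored parse output without fetching again.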

Run the tool

See the wiki for instructions on how to run event-data-ingest.

Production Details

For more information on the pipeline stages and how to contribute, see the wiki!

The details below on interacting with our production environment are intended for staff developers.

Overall setup

In production, every stage of every runner is run, and the outputs are stored in the vaccine-feeds bucket on GCS.

If you are developing a feature that interacts with remote storage and need to test against GCS, install the gcloud SDK following its setup instructions and use the vaccine-feeds-dev bucket. You will need to be granted access first.
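As a minimal sketch of keeping development runs away from production data, a helper could choose the bucket from the environment and fall back to the dev bucket. The two bucket names come from this README. The `INGEST_ENV` variable, the `output_uri` helper, and the `<runner>/<stage>/` path layout are hypothetical and may not match how event-data-ingest actually organizes its output.

```python
import os
from typing import Optional

# Bucket names are from the README; the env variable, helper, and path layout are hypothetical.
BUCKETS = {"prod": "vaccine-feeds", "dev": "vaccine-feeds-dev"}


def output_uri(runner: str, stage: str, env: Optional[str] = None) -> str:
    """Build a gs:// prefix for one runner stage (hypothetical <runner>/<stage>/ layout)."""
    # Fall back to the hypothetical INGEST_ENV variable, then to "dev",
    # so an unconfigured run never targets the production bucket.
    env = env or os.environ.get("INGEST_ENV", "dev")
    if env not in BUCKETS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"gs://{BUCKETS[env]}/{runner}/{stage}/"
```

After the gcloud SDK is installed and access has been granted, running `gsutil ls gs://vaccine-feeds-dev/` is a quick way to confirm that you can read the dev bucket.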

Results are also periodically committed to vaccine-feed-ingest-results.

Loading to a frontend API

To load the generated output into a frontend API, the following bash one-liner collects the normalized output that all runners wrote in the last 24 hours and concatenates it into a single file:

find out -type f -mtime -1 -path "*normalized*" -print0 2> /dev/null | xargs -0 cat > "$(date +'%Y-%m-%d')_concatenated_events.parsed.normalized.ndjson"
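To call the same step from Python, a rough equivalent of the one-liner might look like the following. Like the shell version, it assumes that output lives under `out/` and that normalized files have `normalized` somewhere in their path. The function name and its parameters are hypothetical.

```python
import time
from datetime import date
from pathlib import Path
from typing import Optional


def concat_recent_normalized(
    out_dir: str = "out",
    max_age_hours: float = 24.0,
    dest: Optional[str] = None,
) -> Path:
    """Concatenate recently modified normalized files under out_dir into one NDJSON file."""
    # find's -mtime -1 selects files modified less than 24 hours ago.
    cutoff = time.time() - max_age_hours * 3600
    dest_path = Path(dest or f"{date.today():%Y-%m-%d}_concatenated_events.parsed.normalized.ndjson")
    # Build the full file list (sorted by path for a stable order) before opening dest_path.
    sources = sorted(
        p
        for p in Path(out_dir).rglob("*")
        if p.is_file() and "normalized" in str(p) and p.stat().st_mtime > cutoff
    )
    with dest_path.open("wb") as sink:
        for src in sources:
            # Like cat, copy bytes as-is; each NDJSON source should already end with a newline.
            sink.write(src.read_bytes())
    return dest_path
```

Unlike the one-liner, this version orders files by path rather than by `find` traversal order. That makes repeated runs reproducible, and the order does not matter for NDJSON loading.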

