Forecasting Framework for Mozilla
Each forecast is executed via a metaflow flow in the flows
directory. Currently supported forecasts are:
- Ads Tiles Revenue Forecast (ad_tiles_forecast)
There are additionally helper functions in (TBD when kpi_forecasting branch gets merged)
This project uses uv for project and dependency management. To use it, you'll need uv installed on your system, which can be done with pip (pip install uv). Once it's installed, run uv sync in the root directory of the project to install the dependencies. This creates a new virtual environment in a .venv directory.
To use darts, there are some additional dependencies you may need. For example, on a Mac, libomp can be installed with Homebrew: brew install libomp
Once the virtualenv is set up, a pipeline can be run locally with uv run <PATH TO FLOW FILE> run
To run locally, the GCP_PROJECT_NAME environment variable must be set to a project that the user can create a client from. For most data scientists, mozdata will work. This can be set in an rc file with the command export GCP_PROJECT_NAME=mozdata, or it can be put in front of the command that runs the job, like METAFLOW_PROFILE=local GCP_PROJECT_NAME=mozdata uv run flows/ad_tiles_forecast.py run
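Code that depends on this variable can fail fast with a readable message when it is missing. The following is a minimal sketch only; the helper name get_gcp_project is hypothetical and is not part of this repository.

```python
import os


def get_gcp_project(env=None):
    """Return the value of GCP_PROJECT_NAME, or raise if it is unset or empty.

    `get_gcp_project` is a hypothetical helper used for illustration only.
    """
    # Default to the real process environment; a plain dict can be passed in tests.
    env = os.environ if env is None else env
    project = env.get("GCP_PROJECT_NAME", "").strip()
    if not project:
        raise RuntimeError(
            "GCP_PROJECT_NAME is not set; try `export GCP_PROJECT_NAME=mozdata`"
        )
    return project
```

Calling get_gcp_project() at the start of a script surfaces a missing variable before any client is created.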
To run a notebook that can access the local metaflow, put %env METAFLOW_PROFILE=local
in a cell before loading the metaflow package.
When you set up Outerbounds (see next section), a new metaflow config file is created, so from then on you'll be using Outerbounds' perimeters for authentication by default. To use your local authentication instead, make sure there is a file at ~/.metaflowconfig/config_local.json that contains only an empty JSON object (this may be created by default; if not, you can create it yourself). You can then run with this local profile: METAFLOW_PROFILE=local uv run flows/ad_tiles_forecast.py run. See https://outerbounds.com/docs/use-multiple-metaflow-configs/
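If the local profile file is missing, it can be created by hand or with a few lines of Python. This is a sketch under the assumption that the file only needs to contain an empty JSON object, as described above; the function name ensure_local_profile and its config_dir parameter are hypothetical.

```python
import json
from pathlib import Path


def ensure_local_profile(config_dir=None):
    """Create config_local.json holding an empty JSON object if it is missing.

    `ensure_local_profile` is a hypothetical helper; `config_dir` defaults to
    ~/.metaflowconfig and exists mainly so the function can be tried safely
    against a scratch directory.
    """
    if config_dir is None:
        config_dir = Path.home() / ".metaflowconfig"
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)

    path = config_dir / "config_local.json"
    # Never overwrite an existing profile; only fill in a missing one.
    if not path.exists():
        path.write_text(json.dumps({}) + "\n")
    return path
```

An existing config_local.json is left untouched, so running this more than once is harmless.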
Outerbounds is used to run metaflow flows in the cloud. The code runs in a Docker image via Kubernetes. The image is built by CircleCI and stored in GCR for use in Outerbounds at us-docker.pkg.dev/moz-fx-mfouterbounds-prod-f98d/mfouterbounds-prod/moz-forecasting. See the Development.Docker section below for more details.
If you are new to Outerbounds, you'll need to be added to the revenue perimeter. If you do not have access, reach out to Chelsea Troy or another member of the MLOps team. You can see which perimeters you have access to with the outerbounds perimeter list command. You will also need to configure metaflow to use Outerbounds. Instructions for doing this can be found in the mlops template repo README.
Once this is set up, make sure you are in the revenue perimeter locally with the command outerbounds perimeter switch --id revenue. The whole pipeline can then be run in Outerbounds from the command line using the --with kubernetes flag, like:
uv run <PATH TO FLOW FILE> run --with kubernetes:image=us-docker.pkg.dev/moz-fx-mfouterbounds-prod-f98d/mfouterbounds-prod/moz-forecasting:latest
One nice feature of metaflow is that a specific step can be configured to run in the cloud. This is done via the @kubernetes decorator. As with the command line argument, the Docker image needs to be specified. The decorator goes before the step decorator and looks something like @kubernetes(image="us-docker.pkg.dev/moz-fx-mfouterbounds-prod-f98d/mfouterbounds-prod/moz-forecasting:latest", cpu=1)
Backfills can be executed via backfill.py, which is run via uv run from the root of the project. The script iterates over all the months between start_month and end_month inclusive, setting forecast_month to each month and running the pipeline. It raises an error if you attempt to write and at least one of the included months already has data (that is, the table already contains a forecast_month equal to one of those months). It takes the following arguments:
- test_mode: whether to write to the test table or the prod table. This also controls whether it runs locally (False) or in Outerbounds (True)
- start_month: the earliest month to include in the backfill
- end_month: the latest month to include in the backfill
- config: path to the config file from the root of the project. This is required so that the output table can be checked
- flow: path to the flow file from the root of the project
An example of the command to run the ad_tiles_forecast pipeline for only 2024-02:
uv run backfill.py --start_month=2024-02 --end_month=2024-02 --config=moz_forecasting/ad_tiles_forecast/config_2025_planning_baseline.yaml --flow=moz_forecasting/ad_tiles_forecast/flow.py
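The month iteration and the existing-data check described above can be sketched as follows. This is not the code in backfill.py; the function names months_between and check_no_existing are hypothetical, and the set of months already in the table is assumed to have been fetched beforehand.

```python
def months_between(start_month, end_month):
    """Return every month from start_month to end_month inclusive as 'YYYY-MM'.

    Hypothetical helper illustrating the inclusive month range.
    """
    start_year, start_mon = (int(part) for part in start_month.split("-"))
    end_year, end_mon = (int(part) for part in end_month.split("-"))
    for mon in (start_mon, end_mon):
        if not 1 <= mon <= 12:
            raise ValueError(f"invalid month number: {mon}")

    # Map each month to a single integer so the range is easy to walk.
    start_index = start_year * 12 + (start_mon - 1)
    end_index = end_year * 12 + (end_mon - 1)
    if start_index > end_index:
        raise ValueError("start_month must not be after end_month")

    return [
        f"{index // 12:04d}-{index % 12 + 1:02d}"
        for index in range(start_index, end_index + 1)
    ]


def check_no_existing(months, existing_months):
    """Raise if any requested month already has a forecast_month in the table.

    `existing_months` is assumed to be an iterable of 'YYYY-MM' strings that
    was read from the output table by the caller.
    """
    overlap = sorted(set(months) & set(existing_months))
    if overlap:
        raise ValueError(
            f"forecast_month already present for: {', '.join(overlap)}"
        )
```

For example, months_between("2023-11", "2024-02") yields the four months from 2023-11 through 2024-02, and check_no_existing raises if any of them is already in the table.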
Tests can be found in the tests
directory and run with uv run pytest
Linting is done via ruff. In the CI it is currently pinned to version 0.6.5. It does not need to be added as a dependency to the project and can be run as a tool via uv tool run ruff@0.6.5 check and uv tool run ruff@0.6.5 format
Note that docstrings are included in the rules and must be in numpy format.
Linting and testing are run via CircleCI.
A Docker image is built for this project and is used by Outerbounds to run it. The image is created by the CI and is only accessible to the service account associated with this project, moz-fx-mfouterbounds-prod-f98d. The image is stored in GCR, which the CI interacts with through the gcp-gcr orb. The image is only updated when a commit is made on the main branch, which (due to the repo's branch protection rules) only happens when a PR gets merged.
New flows should be added to a directory under the moz_forecasting project directory. Each flow should read in a config file that parameterizes as much about the flow as possible. To be compatible with the backfill code in backfill.py, the config should have an output section containing test and prod sections, each specifying the table, dataset and project associated with the output using keys with those names. For example:
output:
  test:
    project: moz-fx-data-bq-data-science
    dataset: jsnyder
    table: amp_rpm_forecasts
  prod:
    project: moz-fx-data-shared-prod
    dataset: ads_derived
    table: tiles_monthly_v1
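As a rough illustration of how such an output section might be consumed, the sketch below picks the test or prod entry and builds a fully qualified table name. It assumes the YAML has already been loaded into a dictionary (for instance with a YAML parser) and that a true test_mode selects the test section; the function name output_table_id is hypothetical and this is not the code used by backfill.py.

```python
def output_table_id(config, test_mode):
    """Return 'project.dataset.table' for the test or prod output section.

    Hypothetical helper; assumes test_mode=True selects the `test` section.
    """
    section = config["output"]["test" if test_mode else "prod"]

    # Report every missing key at once rather than failing on the first.
    missing = [key for key in ("project", "dataset", "table") if key not in section]
    if missing:
        raise KeyError(f"output section is missing keys: {', '.join(missing)}")

    return f"{section['project']}.{section['dataset']}.{section['table']}"


# The example config above, written as the dictionary a YAML parser would return.
config = {
    "output": {
        "test": {
            "project": "moz-fx-data-bq-data-science",
            "dataset": "jsnyder",
            "table": "amp_rpm_forecasts",
        },
        "prod": {
            "project": "moz-fx-data-shared-prod",
            "dataset": "ads_derived",
            "table": "tiles_monthly_v1",
        },
    }
}

print(output_table_id(config, test_mode=True))
print(output_table_id(config, test_mode=False))
```

Validating the three keys up front gives a clearer error than a failure deep inside a flow when a new config omits one of them.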