
Alerting integration for violations #580

Open
ritazh opened this issue Apr 21, 2020 · 21 comments
Labels
docs Pure prose enhancement New feature or request help wanted Extra attention is needed reporting triaged

Comments

@ritazh
Member

ritazh commented Apr 21, 2020

E.g., send violations to Slack.

This is from the CNCF webinar.
To summarize the ask from the webinar: when a violation is detected, it would be good to send an alert for that event to systems like Slack, Datadog, or Prometheus.

@ritazh ritazh added enhancement New feature or request help wanted Extra attention is needed labels Apr 21, 2020
@maxsmythe
Contributor

Could you add a link or something so we know what the goal of this bug would be?

This possibly sounds like an enforcement action. I also wonder how this would interact with alerts sourced from Prometheus.

One danger to watch out for: API request volume to the admission webhook can be extremely high for some kinds, so we risk spamming the alert pipeline without some volume-reducing solution.
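One possible volume-reducing approach is to suppress repeat alerts for the same key within a cooldown window. A minimal Python sketch, assuming alerts are keyed by something like (constraint, resource); `AlertThrottle` and `cooldown_s` are hypothetical names, not part of Gatekeeper:

```python
import time
from typing import Callable, Dict, Hashable


class AlertThrottle:
    """Suppress repeat alerts for the same key within a cooldown window.

    Hypothetical helper for illustration, not a Gatekeeper API.
    """

    def __init__(self, cooldown_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so tests don't have to wait
        self._last_sent: Dict[Hashable, float] = {}
        self.suppressed = 0  # number of alerts dropped so far

    def should_send(self, key: Hashable) -> bool:
        """Return True if an alert for `key` should go out now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            self.suppressed += 1
            return False
        self._last_sent[key] = now
        return True
```

The `suppressed` counter could feed a periodic summary so dropped alerts are not silently lost. The `_last_sent` map grows with every distinct key, so a long-running process would also need to evict old entries.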

@swapnild2111

Hello,

I think I am waiting for the same thing. I have deployed Gatekeeper with dryrun enabled, and I can see violations in the status field. However, I am not sure how to set up alerts for these violations in Datadog, Prometheus, Slack, or anywhere else.

Could you please help?

@maxsmythe
Contributor

What kind of alerts are you looking to have?

@swapnild2111

Alerts as in sending a Slack message, or showing in Datadog logs/metrics that violations were found, with details.

@sozercan
Member

We can do a write-up about integrating Gatekeeper metrics with Prometheus and Alertmanager (which includes integrations with Slack, Datadog, and many others).

Other than violations over a certain threshold, is there anything else you would like alerts on?

@swapnild2111

That would be great :)

@maxsmythe
Contributor

Also, the logs can be parsed for more detailed data about rejections to alert on.
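A minimal sketch of that log parsing, assuming the Gatekeeper pods emit one JSON object per line and that violation entries carry an `event_type` of `violation` (admission denials logged with `--log-denies`) or `violation_audited` (audit). These field names are assumptions based on typical Gatekeeper output; check them against the logs of your version:

```python
import json
from typing import Dict, Iterable, Iterator

# Assumed event_type values; verify against your Gatekeeper version's logs.
VIOLATION_EVENT_TYPES = {"violation", "violation_audited"}


def violations_from_logs(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield a compact record for each JSON log line that looks like a violation."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("event_type") not in VIOLATION_EVENT_TYPES:
            continue
        yield {
            "process": entry.get("process", ""),
            "constraint": f'{entry.get("constraint_kind", "")}/{entry.get("constraint_name", "")}',
            "action": entry.get("constraint_action", ""),
            "resource": f'{entry.get("resource_kind", "")} '
                        f'{entry.get("resource_namespace", "")}/{entry.get("resource_name", "")}',
            "msg": entry.get("msg", ""),
        }
```

Each yielded record can then be forwarded to whichever alert sink is in use, ideally behind some form of rate limiting.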

@bytemare

bytemare commented May 6, 2020

Hello there 👋
One of the teams I'm working with has deployed OPA Gatekeeper, and we would like to do the same: monitor every policy/compliance violation without blocking deployments yet (or the devs would kill us).

Ideally, we would need alerts sent over webhooks in JSON or to syslog, containing all the information about the violations.

Is this possible/configurable at this moment, or planned?
I would gladly help if needed.

Thanks
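For the syslog route, a minimal Python sketch that serializes each violation as single-line JSON and ships it with the standard library's `SysLogHandler`. The `cluster` tag, the logger name, and any collector host are hypothetical; the violation dict is whatever your log or audit parser produces:

```python
import json
import logging
import logging.handlers
from typing import Any, Dict


def to_syslog_json(violation: Dict[str, Any], cluster: str) -> str:
    """Serialize one violation as single-line JSON, tagged with a cluster name."""
    payload = {"source": "gatekeeper", "cluster": cluster, **violation}
    # sort_keys gives stable output; compact separators keep it on one line.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def make_syslog_logger(host: str, port: int = 514) -> logging.Logger:
    """Logger that ships records to a remote syslog collector (UDP by default)."""
    logger = logging.getLogger("gatekeeper-violations")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.handlers.SysLogHandler(address=(host, port))
        handler.setFormatter(logging.Formatter("gatekeeper: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Usage would look like `make_syslog_logger("syslog.example.internal").warning(to_syslog_json(v, "staging"))`, where the host name is a placeholder.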

@maxsmythe
Contributor

We are emitting audit violations via stderr/stdout logs. Are you able to pipe those into syslog/ELK/other log aggregator and use those to drive alerts?

That would probably give you the most detailed violation information.

@swapnild2111

What I have done is:

  1. Enabled enforcementAction: dryrun
  2. Added --log-denies
  3. Added a unique log message for violations.

After this I could see violations in logs, which I am streaming to Datadog.

In Datadog, I have created charts & added monitors by tracing those unique log messages.
The things I can do with this approach are pretty limited.

If I could get dryrun_violation_count or similar metrics, things would become much easier.
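The log-monitor approach described above (alert when too many marked log lines appear in a time window) can be sketched as a sliding-window counter. This is a hypothetical illustration of what such a monitor computes, not Datadog's implementation; `threshold` and `window_s` are assumed values, and each call to `record` would correspond to one log line containing your unique marker:

```python
from collections import deque
from typing import Deque


class WindowedCounter:
    """Fire when more than `threshold` events arrive within `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._events: Deque[float] = deque()

    def record(self, ts: float) -> bool:
        """Record an event at `ts` (seconds, non-decreasing); return True if over threshold."""
        self._events.append(ts)
        # Drop events that have fallen out of the window.
        while self._events and self._events[0] <= ts - self.window_s:
            self._events.popleft()
        return len(self._events) > self.threshold
```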

@sozercan
Member

sozercan commented May 7, 2020

@swapnild2111 you can get violations count, like:

gatekeeper_violations{enforcement_action="deny"} 19
gatekeeper_violations{enforcement_action="dryrun"} 7

See https://github.com/open-policy-agent/gatekeeper/blob/master/docs/Metrics.md for list of all metrics
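As a rough sketch of alerting on a threshold, the following parses the two `gatekeeper_violations` lines shown above out of a scraped `/metrics` body and reports which enforcement actions exceed an assumed limit. It only matches lines that carry exactly the `enforcement_action` label; in a real setup this check would normally be a Prometheus alerting rule routed through Alertmanager:

```python
import re
from typing import Dict

# Matches lines like: gatekeeper_violations{enforcement_action="deny"} 19
_LINE = re.compile(r'^gatekeeper_violations\{enforcement_action="([^"]+)"\}\s+([0-9.eE+-]+)$')


def violations_by_action(metrics_text: str) -> Dict[str, float]:
    """Extract gatekeeper_violations values keyed by enforcement_action."""
    result: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            result[m.group(1)] = float(m.group(2))
    return result


def over_threshold(counts: Dict[str, float], thresholds: Dict[str, float]) -> Dict[str, float]:
    """Return the actions whose count exceeds its (assumed) threshold."""
    return {a: v for a, v in counts.items() if a in thresholds and v > thresholds[a]}
```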

@swapnild2111

Thank you, it worked perfectly for me :)

@lechuk47

lechuk47 commented Jun 12, 2020

It would be useful to have the constraint details as metric tags. e.g. Having the constraint name and type as tags in the violation metrics will be enough to set alerts on Prometheus.
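To illustrate the request, this hypothetical sketch renders per-constraint counts with `constraint_kind` and `constraint_name` labels. The metric name `gatekeeper_violations_by_constraint` is invented for the example and is not an existing Gatekeeper metric; the sketch also assumes label values contain no quotes or backslashes, which holds for Kubernetes object names:

```python
from collections import Counter
from typing import Iterable, Tuple


def render_labeled_counts(violations: Iterable[Tuple[str, str]]) -> str:
    """Render (constraint_kind, constraint_name) pairs as Prometheus-style lines.

    The metric name and labels are hypothetical, for illustration only.
    """
    counts = Counter(violations)
    lines = []
    for (kind, name), n in sorted(counts.items()):
        lines.append(
            f'gatekeeper_violations_by_constraint{{constraint_kind="{kind}",constraint_name="{name}"}} {n}'
        )
    return "\n".join(lines)
```

One trade-off to keep in mind: every distinct constraint adds a time series, so label cardinality grows with the number of constraints.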

@morganwalker

@swapnild2111 how did you leverage those metrics via Datadog? While I plan on parsing the logs ingested to DD for violations, ideally I'd like to be able to use the metrics in DD.

@swapnild2111

@morganwalker sorry for the very late reply.

I have the following annotations on my deployment to send Prometheus metrics to Datadog:

ad.datadoghq.com/manager.check_names: '["prometheus"]'
ad.datadoghq.com/manager.init_configs: '[{}]'
ad.datadoghq.com/manager.instances: '[{"prometheus_url":"http://%%host%%:8888/metrics", "namespace": "gatekeeper-system", "metrics":["*"]}]'
prometheus.io/port: "8888"
prometheus.io/scrape: "true"

Would that be helpful for you?
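Since these annotation values are JSON embedded in YAML strings, they are easy to mis-quote. A small Python sketch that generates the same annotations shown above with `json.dumps`; the function name is hypothetical, and the container name, port, and namespace are parameters taken from that example:

```python
import json
from typing import Dict


def datadog_prometheus_annotations(container: str, port: int, namespace: str) -> Dict[str, str]:
    """Build Datadog Autodiscovery annotations for scraping a Prometheus endpoint."""
    prefix = f"ad.datadoghq.com/{container}"
    instance = {
        # %%host%% is a Datadog template variable, kept literally.
        "prometheus_url": f"http://%%host%%:{port}/metrics",
        "namespace": namespace,
        "metrics": ["*"],
    }
    return {
        f"{prefix}.check_names": json.dumps(["prometheus"]),
        f"{prefix}.init_configs": json.dumps([{}]),
        f"{prefix}.instances": json.dumps([instance]),
        "prometheus.io/port": str(port),
        "prometheus.io/scrape": "true",
    }
```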

@teochenglim

Is anyone working on Slack yet?

Allow an optional Slack feature that can be enabled. You usually need just two inputs: the Slack webhook URL and the channel to send to. Since an incoming webhook is used, a plain HTTP POST is enough, so there are few dependencies.
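A minimal sketch of that HTTP POST using only the Python standard library, assuming a Slack incoming-webhook URL and a plain `text` message. Note that Slack app webhooks post to the channel selected when the webhook was created, so a separate channel input may not apply to them:

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request with a minimal incoming-webhook JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url: str, text: str, timeout_s: float = 10.0) -> int:
    """Send the message and return the HTTP status (non-2xx raises HTTPError)."""
    with urllib.request.urlopen(build_slack_request(webhook_url, text), timeout=timeout_s) as resp:
        return resp.status
```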

My suggestion is to have two kinds of Slack message:

  1. Real-time: one message per violation.
  2. A periodic (hourly/daily/weekly) report with occurrence counts per violation type. This is a bit more complex because state has to be kept, but maybe the existing Prometheus metrics could be reused?
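For the second kind of message, a hypothetical sketch of the aggregation step: it counts violation records per constraint and enforcement action and formats one summary. The record keys `constraint` and `action` are assumptions; scheduling (hourly/daily/weekly) is left to whatever runs it:

```python
from collections import Counter
from typing import Iterable, Mapping


def summary_report(violations: Iterable[Mapping[str, str]], period: str) -> str:
    """Aggregate violation records into one summary message for a reporting period."""
    counts = Counter((v["constraint"], v["action"]) for v in violations)
    if not counts:
        return f"Gatekeeper report ({period}): no violations."
    total = sum(counts.values())
    lines = [f"Gatekeeper report ({period}): {total} violation(s)"]
    # Most frequent first; ties broken alphabetically for stable output.
    for (constraint, action), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"- {constraint} [{action}]: {n}")
    return "\n".join(lines)
```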

As in many other frameworks, the Slack webhook URL could be stored as a k8s secret.
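A sketch of reading the webhook URL from a mounted Secret, falling back to an environment variable. The mount path `/etc/slack/webhook-url` and the variable `SLACK_WEBHOOK_URL` are hypothetical defaults, not settings of any existing chart:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical defaults, not settings of any existing chart.
DEFAULT_SECRET_PATH = "/etc/slack/webhook-url"
DEFAULT_ENV_VAR = "SLACK_WEBHOOK_URL"


def load_webhook_url(
    secret_path: str = DEFAULT_SECRET_PATH,
    env_var: str = DEFAULT_ENV_VAR,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the webhook URL from a mounted Secret file, else from the environment."""
    path = Path(secret_path)
    if path.is_file():
        url = path.read_text(encoding="utf-8").strip()  # strip a trailing newline, if any
        if url:
            return url
    return env.get(env_var) or None
```

Preferring the mounted file keeps the URL out of the pod spec, while the environment fallback covers the "just give the URL" case.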

Helm values example:

slack:
  enable: true ### default false
  slack_channel: ""
  slack_title: ""  ## if all security audit messages go to a single channel, a title helps identify them
  slack_text_prefix: ""   ## a prefix can tell which cluster the message is from, e.g. staging/production
  slack_text_subfix: ""
  slack_webhook_url: ""  ## if we don't want to create a k8s secret, just give the URL
  slack_secret_name: ""  ## name of the k8s secret that holds the Slack webhook URL
  slack_report:
    slack_cron: "0 2 * * *" ## minute, hour, day (month), month, day (week); daily at 02:00
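To show how the proposed `slack_title`, `slack_text_prefix`, and `slack_text_subfix` values might combine into one message, a hypothetical sketch (these keys are only a proposal in this thread):

```python
from typing import Mapping


def compose_slack_text(values: Mapping[str, str], body: str) -> str:
    """Build message text from the proposed (hypothetical) slack.* Helm values."""
    # Prefix and suffix wrap the body on one line; empty values are skipped.
    pieces = (values.get("slack_text_prefix", ""), body, values.get("slack_text_subfix", ""))
    text = " ".join(p for p in pieces if p)
    # The title, if set, goes on its own first line.
    title = values.get("slack_title", "")
    return f"{title}\n{text}" if title else text
```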

@maxsmythe
Contributor

@sozercan did we ever document Alertmanager integrations? That seems like it would address use case #2.

As for use case #1, that sounds similar to:

#1037
#898
The push-based pipeline referenced in #897

IIRC we were also thinking about generic webhook-based reporting at some point

@stale

stale bot commented Jul 23, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 23, 2022
@ritazh ritazh added docs Pure prose reporting and removed wontfix This will not be worked on labels Aug 2, 2022
@stale

stale bot commented Oct 1, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 1, 2022
@ritazh ritazh added triaged and removed stale labels Oct 3, 2022
@debu99

debu99 commented Jun 15, 2023

Is this feature available now?

@a-thorat

a-thorat commented Nov 7, 2023

@swapnild2111 @maxsmythe
I am trying to implement violation alerting with MS Teams for the Gatekeeper Operator installed on OpenShift 4.13, but I have not been able to, since everything is managed by the operator. Any idea how I can integrate it here?

@salaxander salaxander self-assigned this Feb 21, 2024
@salaxander salaxander removed their assignment Apr 10, 2024