Unifying alerts from varied sources

2023-11-26 06:12:43

TL;DR

In this blog post, we will demonstrate the power of a unified API in consolidating and managing alerts. We'll create a workflow that, when an alert is triggered, creates a ServiceNow ticket, enriches it with data from a production database, and notifies the stakeholders.

What’s in it for you

This technical blog post will guide you on how to:

  1. Connect with any tool that generates alerts.
  2. Aggregate all alerts in a single interface.
  3. Enrich alerts with additional information from various sources.
  4. Automate processes based on these alerts.

Introduction

Before we delve into the technicalities, let's start with a brief introduction.

What is Keep?

Keep is an open-source alert management and automation platform that integrates with your monitoring tools' alerts and provides an abstraction layer over them.

What problem does Keep solve?

Despite a trend towards consolidation in the observability space, many organizations still use multiple tools to generate alerts.

Grafana's Observability Survey from 2023 indicates that over 52% of companies use more than six observability tools, often because of legacy systems, cost considerations, and specific functionality.

Keep terminology

  1. Providers: third-party tools that either trigger alerts, enrich alerts with data, or notify about alerts. Providers can include monitoring tools, databases, ticketing systems, or communication platforms.
  2. Alerts: essentially, events or signals triggered by your monitoring tools.
  3. Workflows: configurable automated processes that are initiated in response to alerts, designed to streamline your response to incidents by executing predefined actions, such as opening tickets, sending notifications, or running scripts.
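To make the "abstraction layer" idea concrete, here is a minimal Python sketch of normalizing alerts from different providers into one shape. The `Alert` fields loosely mirror the columns `keep alert list` prints later in this post; the raw payload keys (`id`, `title`, `priority`, `state`) and the severity mappings are hypothetical, not the real Datadog or Zabbix schemas, and this is not how Keep itself is implemented.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Alert:
    """Hypothetical unified alert shape shared by every provider."""
    id: str
    name: str
    severity: str
    status: str
    source: list[str]
    fingerprint: str = ""
    extra: dict[str, Any] = field(default_factory=dict)


# Hypothetical per-source severity vocabularies mapped onto one scale.
SEVERITY_MAP = {
    "datadog": {"P1": "critical", "P2": "high", "P3": "medium", "P4": "low"},
    "zabbix": {"Disaster": "critical", "High": "high", "Average": "medium", "Warning": "low"},
}

# Raw keys consumed by normalize(); anything else is kept in `extra`.
KNOWN_KEYS = {"id", "title", "priority", "state", "fingerprint"}


def normalize(source: str, raw: dict[str, Any]) -> Alert:
    """Map one provider's raw payload (illustrative keys) onto the unified Alert."""
    severity = SEVERITY_MAP.get(source, {}).get(raw.get("priority", ""), "info")
    return Alert(
        id=str(raw["id"]),
        name=raw["title"],
        severity=severity,
        status=raw.get("state", "firing"),
        source=[source],
        fingerprint=raw.get("fingerprint", ""),
        extra={k: v for k, v in raw.items() if k not in KNOWN_KEYS},
    )
```

Once every source goes through a function like this, filtering, enrichment, and workflows only have to understand one alert shape, which is the point of the abstraction layer.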

Enough talking, let's get started

Install the CLI

# Clone Keep's repo and install the Keep CLI using poetry
gh repo clone keephq/keep
cd keep && poetry install
# or just install it using pip
pip install keepcli
# for other installation options (e.g. docker) see https://docs.keephq.dev/cli/installation

Configure the CLI

You can start using Keep's managed platform right away, with no other prerequisites, by running:

# This launches an OAuth2 flow that creates a tenant for you and sets you up
keep auth login

If you are using the open-source version of Keep, run keep config to configure the CLI:

You can start using Keep without an API key (the default docker-compose configuration). When you deploy Keep to production, check how to add authentication.

keep config
Enter your keep url [http://localhost:8080]: 
Enter your api key (leave blank for localhost) []: 
Config file created at .keep.yaml

Verify everything is OK

keep whoami
Api key valid{'tenant_id': 'XXXXXX-YYYY-ZZZZ-8b5a-939af9d7f63b'}

Connect your tools

Now we're going to connect all the providers we need: Datadog to get the alerts, ServiceNow to create and track the tickets, MySQL to enrich alerts with production data, and Slack to notify whoever needs to know.

# no providers
keep provider list
+----+------+------+--------------+-------------------+
| ID | Type | Name | Installed by | Installation time |
+----+------+------+--------------+-------------------+
+----+------+------+--------------+-------------------+

# list available providers
keep provider list --available
+-----------------+-------------------------------------------------------+
|     Provider    |                      Description                      |
+-----------------+-------------------------------------------------------+
|       aks       |          Enrich alerts using data from AKS.           |
...
|      zabbix     |        Pull/Push alerts from Zabbix into Keep.        |
|     zenduty     |              Create incident in Zenduty.              |
+-----------------+-------------------------------------------------------+

Now, let's connect Datadog, MySQL, ServiceNow and Slack:

# For every provider, you can see which authentication details are needed
keep provider connect datadog --help
+----------+--------------+----------+-----------------+
| Provider | Config Param | Required |   Description   |
+----------+--------------+----------+-----------------+
| datadog  |   api_key    |   True   | Datadog Api Key |
|          |   app_key    |   True   | Datadog App Key |
+----------+--------------+----------+-----------------+
# Connect Slack
keep provider connect slack --provider-name slack-prod --webhook-url https://hooks.slack.com/services/T03PMXXXXX/B0656YYYY/yQ7zncdkuhzrGDWILtuZZZZZ
Provider slack-prod installed successfully
Provider id: 82a2c69d26e64d3f8ec81eb25d13f972

# Connect Datadog
keep provider connect datadog --provider-name datadog-prod --api-key XXXXXXX --app-key YYYYYYY
Provider datadog-prod installed successfully
Provider id: e33c9960d862453dace829f6a8aecbcf

# Connect MySQL
keep provider connect mysql --provider-name mysql-prod --username dbuser --password dbpass --host keepdb
Provider mysql-prod installed successfully
Provider id: d1c3a24621254565970ac6fab74697b7

# Connect ServiceNow
keep provider connect servicenow --provider-name servicenow-prod --service-now-base-url https://dev123456.service-now.com --username user --password password

# Verify the providers are connected
keep provider list
+----------------------------------+------------+-----------------+-------------------+----------------------------+
|                ID                |    Type    |       Name      |    Installed by   |     Installation time      |
+----------------------------------+------------+-----------------+-------------------+----------------------------+
| e33c9960d862453dace829f6a8aecbcf |  datadog   |   datadog-prod  | apikey@keephq.dev | 2023-11-08T13:23:29.531775 |
| d1c3a24621254565970ac6fab74697b7 |   mysql    |    mysql-prod   | apikey@keephq.dev | 2023-11-08T13:26:12.249923 |
| 066f2a02326c41819c19d61ed6976b65 | servicenow | servicenow-prod | apikey@keephq.dev | 2023-11-08T13:28:35.930792 |
| 82a2c69d26e64d3f8ec81eb25d13f972 |   slack    |    slack-prod   | apikey@keephq.dev | 2023-11-08T13:19:00.539780 |
+----------------------------------+------------+-----------------+-------------------+----------------------------+
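The `--help` table above is how the CLI tells you which config params a provider requires. If you script provider setup, a small pre-flight check like the sketch below can catch a missing flag before calling the CLI. Only the `datadog` entry comes from the `--help` output shown here; the other param sets are hypothetical, inferred from the flags used in the connect commands, so treat `keep provider connect <provider> --help` as the source of truth.

```python
# Required params per provider type. Only "datadog" is taken from the --help
# output above; the rest are hypothetical, inferred from the connect flags.
REQUIRED_PARAMS = {
    "datadog": {"api_key", "app_key"},
    "slack": {"webhook_url"},
    "mysql": {"username", "password", "host"},
    "servicenow": {"service_now_base_url", "username", "password"},
}


def flag_to_param(flag: str) -> str:
    """Turn a CLI flag such as --api-key into a config key such as api_key."""
    return flag.lstrip("-").replace("-", "_")


def missing_params(provider_type: str, flags: dict[str, str]) -> list[str]:
    """Return required params that were not supplied; empty values count as missing."""
    supplied = {flag_to_param(flag) for flag, value in flags.items() if value}
    return sorted(REQUIRED_PARAMS.get(provider_type, set()) - supplied)
```

For example, `missing_params("datadog", {"--api-key": "XXXXXXX"})` reports `['app_key']`, matching the `Required: True` rows in the table.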

If we go to the UI at http://localhost:3000, we can see that the providers are installed.

Review the alerts

In this section, we're going to review the alerts, show what an alert looks like in Keep, and demonstrate the enrichment and filtering capabilities.

# list all alerts
keep alert list
+---------------------+------------------------------------------------------------------+--------------------------------+----------+-----------+-------------+---------+-------------+---------------------+
|          ID         |                           Fingerprint                            |              Name              | Severity |   Status  | Environment | Service |    Source   |    Last Received    |
+---------------------+------------------------------------------------------------------+--------------------------------+----------+-----------+-------------+---------+-------------+---------------------+
| 7308482322424796476 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c589d7464bd8bf863cdeb76b864 |   Unauthorized access to API   |   high   | Recovered |  undefined  |   None  | ['datadog'] | 2023-11-13T15:32:38 |
| 7308433771057253905 | 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0 |           Test Alert           | critical | Recovered |  undefined  |   None  | ['datadog'] | 2023-11-13T14:44:24 |
...
more alerts
...
+---------------------+------------------------------------------------------------------+--------------------------------+----------+-----------+-------------+---------+-------------+---------------------+

# Filter by attribute
keep alert list --filter service=keep-api
+-----------+------------------------------------------------------------------+----------------------------+----------+--------+-------------+----------+-------------+---------------------------+
|     ID    |                           Fingerprint                            |            Name            | Severity | Status | Environment | Service  |    Source   |       Last Received       |
+-----------+------------------------------------------------------------------+----------------------------+----------+--------+-------------+----------+-------------+---------------------------+
| 120458754 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c589d7464bd8bf863cdeb76b864 | 4xx-5xx Status Code Alert  |  medium  |   OK   |  production | keep-api | ['datadog'] | 2023-05-31T10:59:29+00:00 |
| 122655180 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c389d7464bd8bf863cdeb76b864 | Unauthorized access to API |   high   |   OK   |  production | keep-api | ['datadog'] | 2023-11-08T13:29:31+00:00 |
+-----------+------------------------------------------------------------------+----------------------------+----------+--------+-------------+----------+-------------+---------------------------+


keep alert list --filter severity=critical
+-----------+------------------------------------------------------------------+------------+----------+--------+-------------+----------+-------------+---------------------------+
|     ID    |                           Fingerprint                            |    Name    | Severity | Status | Environment | Service  |    Source   |       Last Received       |
+-----------+------------------------------------------------------------------+------------+----------+--------+-------------+----------+-------------+---------------------------+
| 117493674 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c589d7464bd8bf863cdeb76b862 | Prod Alert | critical |   OK   |  production | tal-test | ['datadog'] | 2023-09-13T11:20:25+00:00 |
+-----------+------------------------------------------------------------------+------------+----------+--------+-------------+----------+-------------+---------------------------+
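To make the semantics of `--filter key=value` concrete, here is a minimal client-side approximation over a list of alert dicts: exact string equality on an attribute. Combining several filters with AND and resolving dotted keys such as `labels.team` for nested attributes are assumptions of this sketch, not documented behavior of the Keep CLI.

```python
from typing import Any


def parse_filter(expr: str) -> tuple[str, str]:
    """Split 'service=keep-api' into ('service', 'keep-api')."""
    key, sep, value = expr.partition("=")
    if not sep or not key.strip():
        raise ValueError(f"expected key=value, got {expr!r}")
    return key.strip(), value.strip()


def lookup(alert: dict[str, Any], dotted_key: str) -> Any:
    """Resolve 'severity' or a nested key such as 'labels.team'; None if absent."""
    current: Any = alert
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def _matches(alert: dict[str, Any], conditions: list[tuple[str, str]]) -> bool:
    """True if every (key, value) condition holds, comparing values as strings."""
    for key, value in conditions:
        found = lookup(alert, key)
        if found is None or str(found) != value:
            return False
    return True


def filter_alerts(alerts: list[dict[str, Any]], *exprs: str) -> list[dict[str, Any]]:
    """Keep the alerts that satisfy all key=value filters."""
    conditions = [parse_filter(e) for e in exprs]
    return [alert for alert in alerts if _matches(alert, conditions)]
```

With the three alerts from the tables above loaded as dicts, `filter_alerts(alerts, "service=keep-api")` returns the two `keep-api` alerts, and adding `"severity=high"` narrows it to the unauthorized-access alert.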

But what's even cooler is that we can filter on ANY alert attribute. On top of that, Keep lets you enrich alerts with attributes from different sources, which opens up some very useful possibilities.

To make this concrete, let's say we created a ticket in our ticketing system (we will, of course, automate this later).
We want to correlate the alert with the ticket so that we can sync any further changes to the ticket.

We also want information about the customer, which is stored in our customers database. We can get it by running:

select * from customers where customer_id = %customer_id%

+----+---------------------+------------+---------------------+--------------+---------------+-----------------------------+--------------------------------------+
| id | name                | tier       | email               | phone_number | address       | notes                       | customer_id                          |
+----+---------------------+------------+---------------------+--------------+---------------+-----------------------------+--------------------------------------+
|  1 | ABC Company         | Enterprise | abc@example.com     | 123-456-7890 | 123 Main St   | Customer since 2010         | 05bc71af-820a-11ee-b23f-0242ac110002 |
+----+---------------------+------------+---------------------+--------------+---------------+-----------------------------+--------------------------------------+
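The query above uses a `%customer_id%` placeholder. If you fill it in from code, bind the value as a query parameter rather than pasting it into the SQL string. The sketch below uses Python's built-in `sqlite3` in-memory database as a stand-in for the MySQL `customers` table (MySQL drivers typically use `%s` placeholders where `sqlite3` uses `?`); the helper name `get_customer` is hypothetical.

```python
import sqlite3
from typing import Any, Optional


def get_customer(conn: sqlite3.Connection, customer_id: str) -> Optional[dict[str, Any]]:
    """Return the matching customers row as a dict, or None if there is none."""
    # The value is bound as a parameter, so it is never parsed as SQL.
    cursor = conn.execute("SELECT * FROM customers WHERE customer_id = ?", (customer_id,))
    row = cursor.fetchone()
    if row is None:
        return None
    columns = [col[0] for col in cursor.description]
    return dict(zip(columns, row))


# In-memory stand-in for the production customers table shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT, email TEXT, "
    "phone_number TEXT, address TEXT, notes TEXT, customer_id TEXT)"
)
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "ABC Company", "Enterprise", "abc@example.com", "123-456-7890",
     "123 Main St", "Customer since 2010", "05bc71af-820a-11ee-b23f-0242ac110002"),
)

customer = get_customer(conn, "05bc71af-820a-11ee-b23f-0242ac110002")
print(customer["name"], customer["tier"])  # ABC Company Enterprise
print(get_customer(conn, "x' OR '1'='1"))  # None
```

The second call shows why binding matters: the injection-style input is treated as a literal ID and simply matches nothing.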

Let's say we want to enrich the alert with the customer ID, customer email, and ticket ID:

keep alert enrich --fingerprint 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0 customer_id=1234 ticket_id=INC00001 customer_email=abd@example.com

# Now we can filter by ticket ID:
keep alert list --filter ticket_id=INC00001
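Conceptually, `keep alert enrich` attaches extra key/value pairs to the alert identified by its fingerprint, and those pairs then behave like any other attribute for filtering. The sketch below models that with a hypothetical in-memory `AlertStore`: enrichments live beside the raw alert, keyed by fingerprint, so they are still there when the same alert fires again. This is an assumption about the behavior for illustration, not Keep's storage code.

```python
from typing import Any


def parse_pairs(pairs: list[str]) -> dict[str, str]:
    """Turn ['ticket_id=INC00001', ...] into {'ticket_id': 'INC00001', ...}."""
    parsed: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        parsed[key] = value
    return parsed


class AlertStore:
    """Hypothetical in-memory store: enrichments are kept per fingerprint,
    separately from the raw alert, and layered on top when the alert is read."""

    def __init__(self) -> None:
        self._alerts: dict[str, dict[str, Any]] = {}
        self._enrichments: dict[str, dict[str, str]] = {}

    def ingest(self, alert: dict[str, Any]) -> None:
        # A re-fired alert replaces the raw payload but keeps its enrichments.
        self._alerts[alert["fingerprint"]] = alert

    def enrich(self, fingerprint: str, pairs: list[str]) -> None:
        self._enrichments.setdefault(fingerprint, {}).update(parse_pairs(pairs))

    def get(self, fingerprint: str) -> dict[str, Any]:
        return {**self._alerts[fingerprint], **self._enrichments.get(fingerprint, {})}

    def find(self, key: str, value: str) -> list[dict[str, Any]]:
        merged = (self.get(fp) for fp in self._alerts)
        return [a for a in merged if key in a and str(a[key]) == value]
```

Calling `store.enrich(fp, ["customer_id=1234", "ticket_id=INC00001"])` followed by `store.find("ticket_id", "INC00001")` mirrors the two commands above.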

Create workflows

So far, we have connected the providers, reviewed our Datadog alerts, and enriched them with customer data and ServiceNow tickets.

Now we will wrap it up and automate the whole process using Keep Workflows.

Anatomy of a Workflow

Before diving into the CLI commands, let's review the workflow we're going to run. Keep Workflows are very similar to GitHub Actions workflows. We didn't want to reinvent the wheel here, so the syntax should feel familiar.

The full workflow YAML can be found here.

workflow:
  # some metadata
  id: example-workflow
  description: Enriches the alert and creates a ServiceNow ticket

  # The first part is the triggers. We want this workflow to execute only on critical alerts. We can filter on any alert attribute and also use regex.
  triggers:
    - type: alert
      filters:
        - key: severity
          value: critical
  steps:
  # The first step is to enrich the alert based on the SQL query. We want to add the customer name, email, and tier.
  - name: get-more-details
    provider:
      type: mysql
      config: " {{ providers.mysql-prod }} "
      # {{ alert.customer_id }} will be substituted at runtime
      with:
        query: "select * from customers where customer_id = '{{ alert.customer_id }}'"
        # Add these fields to the alert so we can use them later
        enrich_alert:
          - key: customer_name
            value: results[0].name
          - key: customer_email
            value: results[0].email
          - key: customer_tier
            value: results[0].tier
  # second part - the actions
  actions:
    # create the ServiceNow ticket
    - name: create-service-now-ticket
      # If the alert already has a ticket id, don't create a new one (consider an alert that was triggered and then resolved: we don't want another ticket for the resolved alert). Also, we only want to create tickets for Enterprise customers.
      if: "not '{{ alert.ticket_id }}' and '{{ alert.customer_tier }}' == 'Enterprise'"
      provider:
        type: servicenow
        config: " {{ providers.servicenow-prod }} "
        with:
          table_name: INCIDENT
          payload:
            short_description: "{{ alert.name }} - {{ alert.description }} [created by Keep]"
            description: "{{ alert.description }}"
          # Enrich the alert with these fields so we can correlate the alert with the ticket
          enrich_alert:
            - key: ticket_type
              value: servicenow
            - key: ticket_id
              value: results.sys_id
            - key: ticket_url
              value: results.link
            - key: ticket_status
              value: results.stage
            - key: table_name
              value: "{{ alert.annotations.ticket_type }}"
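To see what the workflow does at runtime, here is a plain-Python sketch of its three moving parts: matching the trigger filters, resolving `enrich_alert` paths such as `results[0].name` against the query result, and the action's `if` guard. Treating filter values as full-match regular expressions is an assumption about Keep's semantics, and the function names are hypothetical; Keep evaluates the YAML itself.

```python
import re
from typing import Any

# The trigger filter and the get-more-details enrichment mapping from the YAML above.
TRIGGER_FILTERS = [{"key": "severity", "value": "critical"}]
ENRICH_FROM_QUERY = [
    {"key": "customer_name", "value": "results[0].name"},
    {"key": "customer_email", "value": "results[0].email"},
    {"key": "customer_tier", "value": "results[0].tier"},
]

# Matches either a name ("results") or a list index ("[0]") inside a path.
_PATH_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def matches_trigger(alert: dict[str, Any], filters: list[dict[str, str]]) -> bool:
    """True if every filter value, read as a regex, fully matches the alert attribute."""
    return all(
        re.fullmatch(f["value"], str(alert.get(f["key"], ""))) is not None
        for f in filters
    )


def resolve(path: str, context: dict[str, Any]) -> Any:
    """Walk a path such as 'results[0].name' through nested dicts and lists."""
    current: Any = context
    for name, index in _PATH_TOKEN.findall(path):
        current = current[name] if name else current[int(index)]
    return current


def should_create_ticket(alert: dict[str, Any]) -> bool:
    """The action's `if`: no ticket yet and the customer is on the Enterprise tier."""
    return not alert.get("ticket_id") and alert.get("customer_tier") == "Enterprise"


# Simulated run: a critical alert plus the row the MySQL step would return.
alert = {"name": "Test Alert", "severity": "critical", "customer_id": "1234"}
rows = [{"name": "ABC Company", "email": "abc@example.com", "tier": "Enterprise"}]
if matches_trigger(alert, TRIGGER_FILTERS):
    for item in ENRICH_FROM_QUERY:
        alert[item["key"]] = resolve(item["value"], {"results": rows})
    print(should_create_ticket(alert))  # True
```

Note that the guard reads `customer_tier`, the key the `get-more-details` step writes; a guard on a key no step sets would never be true.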

Now that we have the workflow, let's apply and run it.

# no workflows
keep workflow list
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------------------------+--------------------------+----------------+
|                  ID                  |             Workflow ID              |         Start Time         |                   Triggered By                  |          Status          | Execution Time |
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------------------------+--------------------------+----------------+
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------------------------+--------------------------+----------------+
# Apply it:
keep workflow apply -f workflow.yaml
Workflow examples/workflows/blogpost.yml applied successfully
Workflow id: 652fe84e-5239-425b-8271-40accb1af72f
Workflow revision: 1
keep workflow list
+--------------------------------------+-------------------+-----------------------------------+----------+--------------+----------------------------+----------------------------+----------------------------+-----------------------+
|                  ID                  |        Name       |            Description            | Revision |  Created By  |       Creation Time        |        Update Time         |    Last Execution Time     | Last Execution Status |
+--------------------------------------+-------------------+-----------------------------------+----------+--------------+----------------------------+----------------------------+----------------------------+-----------------------+
| 652fe84e-5239-425b-8271-40accb1af72f | blogpost-workflow | Enrich the alerts and open ticket |    10    |     keep     | 2023-11-12T08:08:43.585226 | 2023-11-12T14:34:07.544301 |            None            |          None         |
+--------------------------------------+-------------------+-----------------------------------+----------+--------------+----------------------------+----------------------------+----------------------------+-----------------------+
# Run it with an alert as input
keep workflow run --workflow-id blogpost-workflow --fingerprint 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0
Workflow blogpost-workflow run successfully
Workflow Run ID 33e71955-81f4-4118-9771-7b638f8c59b0

# Let's review the run
keep workflow runs logs 33e71955-81f4-4118-9771-7b638f8c59b0

+-----+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |  ID |         Timestamp          | Message                                                                                                                                                                                                                                                         |
+-----+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 733 | 2023-11-13T16:11:40.462000 | Operating step get-more-details                                                                                                                                                                                                                                   |
| 734 | 2023-11-13T16:11:40.463000 | Motion get-more-details evaluated to run! Motive: no situation, therefore true.                                                                                                                                                                                     |
| 735 | 2023-11-13T16:11:40.524000 | Step get-more-details ran efficiently                                                                                                                                                                                                                          |
| 736 | 2023-11-13T16:11:40.525000 | Operating motion create-service-now-ticket                                                                                                                                                                                                                        |
| 737 | 2023-11-13T16:11:40.525000 | Motion create-service-now-ticket evaluated to run! Motive: no situation, therefore true.                                                                                                                                                                            |
| 738 | 2023-11-13T16:11:44.784000 | Created ticket: {'end result': {'dad or mum': '', 'made_sla': 'true', 'caused_by': '', 'watch_list': '', 'upon_reject': 'cancel', 'sys_updated_on': '2023-11-13 14:11:41', 'child_incidents': '0', 'hold_reason': '', 'origin_table': '', 'task_effective_number': 'INC' |
| 740 | 2023-11-13T16:12:47.552000 | Enriching alert                                                                                                                                                                                                                                                 |
| 741 | 2023-11-13T16:12:47.572000 | Alert enriched                                                                                                                                                                                                                                                  |
| 742 | 2023-11-13T16:12:47.573000 | Motion create-service-now-ticket ran efficiently                                                                                                                                                                                                               |
| 743 | 2023-11-13T16:12:47.574000 | End to run workflow blogpost-workflow                                                                                                                                                                                                                        |
+-----+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
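The `Created ticket: {'result': {...}}` log line reflects ServiceNow's Table API, which wraps the created record in a `result` object whose `sys_id` is what the workflow stores as `ticket_id`. For reference, here is a standard-library sketch of an equivalent `POST /api/now/table/incident` request with basic auth. It only builds the request and parses a sample response body; it is not the code of Keep's ServiceNow provider, and the `ticket_number` key is a hypothetical extra.

```python
import base64
import json
import urllib.request
from typing import Any


def build_incident_request(base_url: str, user: str, password: str,
                           short_description: str,
                           description: str) -> urllib.request.Request:
    """Build (without sending) a ServiceNow Table API call that creates an incident."""
    body = json.dumps({
        "short_description": short_description,
        "description": description,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/now/table/incident",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def ticket_fields(response_body: str) -> dict[str, Any]:
    """Pull the correlation fields out of the {'result': {...}} response envelope."""
    record = json.loads(response_body)["result"]
    # sys_id is what the workflow stores as ticket_id; ticket_number is a hypothetical extra.
    return {"ticket_id": record.get("sys_id"), "ticket_number": record.get("number")}
```

Sending it would be `urllib.request.urlopen(request).read()`, whose JSON body is what `ticket_fields` expects.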

keep workflow runs list
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------+-------------+----------------------------------------------------+----------------+
|                  ID                  |             Workflow ID              |         Start Time         |          Triggered By         |    Status   | Error                                              | Execution Time |
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------+-------------+----------------------------------------------------+----------------+
| 103df0aa-d6be-4290-9938-1563f8005e55 | 75c7eba2-51dc-411d-b39c-a500c98e3893 | 2023-11-13T14:11:37.911898 | manually by apikey@keephq.dev |   success   | None                                               |       69       |
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------+-------------+----------------------------------------------------+----------------+
# Let's make sure the alert was enriched with the ticket id
keep alert get 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0 | jq .ticket_id
"0f9982ec97667110beb0f0571153afa1"
# :)
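If you would rather script that final check in Python than pipe to `jq`, here is a small sketch. It assumes, as the `jq` pipe implies, that `keep alert get <fingerprint>` prints a single JSON object to stdout; the helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, Optional


def get_alert(fingerprint: str) -> dict[str, Any]:
    """Run `keep alert get <fingerprint>` and parse the JSON it prints."""
    completed = subprocess.run(
        ["keep", "alert", "get", fingerprint],
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)


def ticket_id(alert: dict[str, Any]) -> Optional[str]:
    """Python equivalent of `jq .ticket_id`; None where jq would print null."""
    return alert.get("ticket_id")
```

For example, `ticket_id(get_alert("39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0"))` should return the same `sys_id` that `jq` printed, and `None` signals the workflow did not enrich the alert.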

Voila! Now, whenever an alert is triggered, it will automatically be enriched with data from our production database, and the appropriate actions will be taken. If the alert is of high or critical severity, a ServiceNow ticket will be created and the alert will be updated with the ticket ID. For less severe alerts, the relevant person will simply be notified.
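The routing described in the paragraph above can be sketched as a small decision function. The workflow YAML in this post only triggers on `critical` alerts and has no Slack step, so including `high` in the ticket set and the Slack branch here are a hypothetical extension; the payload uses the `text` field that Slack incoming webhooks accept.

```python
import json
from typing import Any

# Hypothetical policy mirroring the paragraph above.
TICKET_SEVERITIES = {"critical", "high"}


def route(alert: dict[str, Any]) -> str:
    """Return which follow-up an enriched alert should get."""
    if alert.get("severity") in TICKET_SEVERITIES:
        # Like the workflow's `if`, skip alerts that already have a ticket.
        return "already-ticketed" if alert.get("ticket_id") else "servicenow"
    return "slack"


def slack_payload(alert: dict[str, Any]) -> bytes:
    """JSON body for a Slack incoming webhook, which accepts a `text` field."""
    customer = alert.get("customer_name", "unknown")
    text = f"[{alert['severity']}] {alert['name']} (customer: {customer})"
    return json.dumps({"text": text}).encode("utf-8")
```

Keeping the decision in one pure function makes the policy easy to test before it is encoded as workflow triggers and `if` conditions.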

Next steps

1. Join our Slack and start talking about alerting and monitoring.
2. ⭐️ Keep's repo.
3. Start playing with Keep (no credit card needed!) at https://platform.keephq.dev
4. Missing a provider or feature? Just open an issue at https://github.com/keephq/keep and we will add it ASAP (and of course, contributions are welcome!)
