BGPKIT Broker API outage resolved

Apr 17 at 08:00am PDT
Affected services
BGPKIT Broker API

Resolved
Apr 17 at 08:00am PDT

Broker API outage on 2023-04-17 post-mortem

The Broker API suffered a two-hour outage during which queries frequently timed out.

TL;DR: a surge of queries overloaded our database. We switched the API from our AWS-hosted PostgreSQL instance to a bare-metal-hosted instance with significantly more processing power.

Root cause

The outage was caused by a significant increase in queries that overloaded our cloud-based database hosted on AWS. The database was not provisioned to handle that much load, so queries timed out while waiting for responses.

Response

We have long run multiple instances of the BGPKIT Broker database: one on AWS with basic provisioning, and another on a newly provisioned Equinix Metal bare-metal server. This outage occurred because the AWS instance could not handle the surge of requests. We pointed our API endpoint to the bare-metal instance, which resolved the issue.

Next steps

We will continue to improve our infrastructure and monitoring so that we better understand query load and all users can continue to use our broker services freely.