Reducing Costs and Improving Observability With Loki
George Graham, Shawn Saavedra and Gladson George all contributed to this piece.
As one of the three pillars of observability, logs help engineers understand applications, troubleshoot anomalies, and deliver quality products to customers. ActiveCampaign produces large volumes of logs and has historically maintained multiple fragmented ELK (Elasticsearch, Logstash, and Kibana) implementations across different teams and AWS accounts. Each development team was responsible for managing its own ELK stack, which led to wide variance in logging standards and governance and limited our ability to correlate events across ActiveCampaign platforms.
This proved challenging for a few reasons. ELK is expensive at scale, requiring pre-provisioned Elasticsearch storage at a rate of $0.30/GB. Accounting for current volume and estimated growth, the ELK datastores were forecast to cost several tens of thousands of dollars per month. In addition, log-based alerting is not available in the open source version of ELK. The ELK stacks were cumbersome to maintain and expensive to operate, and they limited our ability to correlate events across our platforms and to respond to critical events through alerting when they occurred.