On August 17th, starting at 6:30am EDT, the V3 comments API began to show failures. At 1pm the degradation began causing intermittent failures in loading V3 comment widgets across all sites. Investigation continued until 3pm, when the issue was resolved.
Usage of our V3 comment widget rose to more than 4x its normal average between August 10th and 17th. Viafoura’s infrastructure autoscaling is normally triggered by CPU usage on compute resources, and it scaled the compute resources accordingly. However, the database connection pool and heap size proved to be a bottleneck during scaling: they needed to grow at a higher rate than the compute resources. Because the database connection pool size did not scale with demand, comments API responses eventually degraded.
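The mismatch between CPU-driven scaling and connection demand can be illustrated with a rough capacity estimate. The following is a minimal sketch, not a description of Viafoura's actual tooling: it applies Little's law (average concurrent connections = request rate x time each request holds a connection), and every number and function name in it is a hypothetical chosen for illustration.

```python
import math


def required_connections(requests_per_second: int, avg_hold_ms: int) -> int:
    """Estimate concurrent DB connections via Little's law.

    Both inputs are hypothetical illustration values, not measured data.
    """
    return math.ceil(requests_per_second * avg_hold_ms / 1000)


def pool_is_bottleneck(requests_per_second: int, avg_hold_ms: int,
                       nodes: int, pool_size_per_node: int) -> bool:
    """True when connection demand exceeds the total pool capacity of all nodes."""
    needed = required_connections(requests_per_second, avg_hold_ms)
    available = nodes * pool_size_per_node
    return needed > available


# Hypothetical baseline: 500 req/s, 50 ms per connection, 4 nodes, 10 connections each.
print(pool_is_bottleneck(500, 50, nodes=4, pool_size_per_node=10))

# Hypothetical 4x traffic where CPU-based autoscaling only doubled the node count:
# demand grows 4x, total pool capacity only 2x.
print(pool_is_bottleneck(2000, 50, nodes=8, pool_size_per_node=10))
```

In this sketch the second scenario is the one that runs out of connections: compute capacity kept up with CPU load, but the per-node pool size stayed fixed, so aggregate connection capacity lagged behind demand.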
From 6am until 1pm on August 17th, users experienced slower load times in comment widgets. From 1pm to 3pm, users experienced API timeouts and failures in loading comment widgets. The incident affected only V3 comment widgets and the loading of V3 comments in the moderation console. V2 comment APIs and widgets were not impacted.
The performance bottleneck was identified as the database connection pool size, which was increased. The application memory allocated to each compute node was also adjusted to handle more concurrent connections. By 3pm on August 17th, after these adjustments, all services were back to normal operation.
After the increased connection pool sizes were applied, we observed proper autoscaling based on CPU usage. This adjustment will prevent the connection pool size from becoming the bottleneck in future autoscaling events.
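A condition like the one in this incident, where the connection pool is nearly exhausted while CPU stays below the autoscaling trigger, can be checked for directly. The sketch below is an illustration only and not part of the fix described in this report; the `NodeMetrics` fields, the 70% CPU threshold, and the 80% pool ratio are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeMetrics:
    """Hypothetical per-node snapshot; field names are assumptions."""
    cpu_percent: float
    pool_in_use: int
    pool_size: int


def pool_saturation_warnings(nodes: List[NodeMetrics],
                             cpu_scale_threshold: float = 70.0,
                             pool_warn_ratio: float = 0.8) -> List[int]:
    """Return indices of nodes whose pool is nearly full while CPU is
    still below the (assumed) autoscaling trigger, so CPU-based scaling
    would not react on its own."""
    return [
        i for i, n in enumerate(nodes)
        if n.pool_in_use / n.pool_size >= pool_warn_ratio
        and n.cpu_percent < cpu_scale_threshold
    ]


# Hypothetical snapshot of three nodes.
snapshot = [
    NodeMetrics(cpu_percent=35.0, pool_in_use=19, pool_size=20),  # pool nearly full, CPU low
    NodeMetrics(cpu_percent=80.0, pool_in_use=20, pool_size=20),  # CPU scaling would trigger
    NodeMetrics(cpu_percent=40.0, pool_in_use=5, pool_size=20),   # healthy
]
print(pool_saturation_warnings(snapshot))
```

Only nodes that are saturated on connections but quiet on CPU are flagged, since those are the cases a CPU-triggered autoscaler would miss.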