Maintenance

Application Availability and Resilience

Feb 23, 2026 at 5:00pm UTC  –  Feb 27, 2026 at 10:00pm UTC
Affected services
Portfolio e Blog

Resolved
Feb 23, 2026 at 5:00pm UTC

  1. Application Availability (Uptime): The percentage of time an application is accessible and functioning correctly for users. The focus is on preventing downtime through redundancy, such as deploying across multiple servers or data centers.

  2. Application Resilience (Recovery): The ability of a system to continue operating, or to gracefully degrade, during unplanned disruptions (e.g., component failures, network outages) and quickly recover normal functionality.
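
To make the availability definition above concrete, here is a minimal Python sketch of the usual arithmetic: availability as a percentage of a time window, and the downtime budget implied by a target. The function names, the 99.9% target, and the 30-day window are illustrative assumptions, not values taken from this notice.

```python
def availability_percent(window_seconds: float, downtime_seconds: float) -> float:
    """Share of the window during which the application was up, as a percentage."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return 100.0 * (window_seconds - downtime_seconds) / window_seconds


def downtime_budget_seconds(target_percent: float, window_seconds: float) -> float:
    """Maximum downtime allowed in the window while still meeting the target."""
    return window_seconds * (1.0 - target_percent / 100.0)


# Hypothetical example: a 99.9% target over a 30-day window.
THIRTY_DAYS = 30 * 24 * 3600
budget = downtime_budget_seconds(99.9, THIRTY_DAYS)
print(round(budget / 60, 1))  # minutes of downtime allowed: 43.2
```

Under these assumed numbers, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, which is why redundancy matters more as the target rises.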

Key Aspects:
- Fault Tolerance: The design of a system to handle errors without interrupting the user experience.
- Graceful Degradation: Maintaining core functionality even when non-essential features fail.
- Automation: Using tools (like Kubernetes) to automatically self-heal and re-route traffic without manual intervention.
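- Core Metrics: Service Level Objectives (SLOs) set the availability target, the Recovery Time Objective (RTO) sets the maximum acceptable time to restore service after a disruption, and Mean Time to Recovery (MTTR) measures how long recovery takes on average.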
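
Two of the aspects above can be sketched in a few lines of Python: graceful degradation as a fallback around a non-essential feature, and MTTR as the average duration of past incidents. Everything here is hypothetical and for illustration only: the helper names, the "related posts" feature, the incident timestamps, and the 30-minute RTO are assumptions, not details of the affected service.

```python
from datetime import datetime, timedelta
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: T) -> T:
    """Return primary() if it succeeds, otherwise a safe default."""
    try:
        return primary()
    except Exception:
        # The non-essential feature failed; the page still renders without it.
        return fallback


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (recovered - started) over all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


def load_related_posts() -> list[str]:
    # Hypothetical non-essential feature that is currently failing.
    raise ConnectionError("recommendation service unavailable")


related = with_fallback(load_related_posts, [])  # degrades to an empty list

# Hypothetical incident log: (started, recovered).
incidents = [
    (datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 10, 10)),
    (datetime(2026, 1, 19, 12, 0), datetime(2026, 1, 19, 12, 20)),
]
mttr = mean_time_to_recovery(incidents)
rto = timedelta(minutes=30)  # assumed objective
print(mttr, mttr <= rto)  # 0:15:00 True
```

The fallback keeps core functionality available when an optional dependency fails, while comparing MTTR against the RTO shows whether actual recoveries stay within the agreed objective.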