Proberix Update: Introducing Performance Thresholds
We are happy to announce new features in Proberix!
Monitoring uptime is not just about detecting failures. For many systems, performance degradation can be just as disruptive. Some of our users manage distributed systems where maintaining low and consistent latency across different geographic locations is critical. If response times vary too much, their systems can start failing even when everything is technically "up."
To address this, we’ve introduced Performance-Based Thresholds, allowing locations to be marked as "down" not just when they fail but also when their performance degrades beyond an acceptable level.
Performance-Based Thresholds
Until now, Proberix has primarily relied on failure-based thresholds. If a location fails a certain number of consecutive checks, it is marked as "down." While this works well for detecting complete outages, it doesn’t catch critical slowdowns that can severely impact operations.
Now, users can define a maximum acceptable response time (in milliseconds). If the rolling average response time over a 10-minute period exceeds this threshold, the location is automatically marked as down, and an alert is triggered.
This ensures that monitoring covers both availability and performance stability, giving teams a more complete picture of system health.
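To make the combined rule concrete, here is a minimal sketch of how a location's status could be decided from both signals. It is an illustration, not Proberix's actual code: the function name `location_is_down` and its parameters are hypothetical, and the post does not specify how the two checks are combined internally, so a simple "either condition marks the location down" rule is assumed.

```python
from typing import Optional


def location_is_down(
    consecutive_failures: int,
    failure_limit: int,
    rolling_avg_ms: Optional[float],
    max_response_ms: Optional[float] = None,
) -> bool:
    """Decide a location's status (hypothetical helper, for illustration).

    Assumption: a location is down if EITHER the failure-based rule
    or the performance-based rule is violated.
    """
    # Failure-based rule: too many consecutive failed checks.
    if consecutive_failures >= failure_limit:
        return True

    # Performance-based rule: only applies when a threshold is configured
    # and an average is available.
    if max_response_ms is not None and rolling_avg_ms is not None:
        return rolling_avg_ms > max_response_ms

    return False
```

Under these assumptions, a location with no failed checks but a 650 ms rolling average against a 500 ms threshold is reported as down, while the same location with no performance threshold configured stays up.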
How It Works
- Users enable this option by selecting the Enable Performance Threshold checkbox in the notification policy settings.
- A threshold value in milliseconds must be specified.
- Proberix continuously calculates a 10-minute rolling average of response times from the probe results.
- If the rolling average exceeds the defined threshold, the location is considered down, triggering an alert.
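The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Proberix implementation: the `RollingAverage` class, its method names, and the use of timestamps in seconds are all hypothetical, and failed probes without a response time are ignored here.

```python
from collections import deque


class RollingAverage:
    """Time-windowed rolling average of response times (illustrative only)."""

    def __init__(self, window_seconds: float = 600.0):
        # 600 seconds corresponds to the 10-minute window described above.
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp_seconds, response_ms), oldest first
        self.total_ms = 0.0

    def add(self, timestamp: float, response_ms: float) -> float:
        """Record one probe result and return the current window average."""
        self.samples.append((timestamp, response_ms))
        self.total_ms += response_ms

        # Evict samples that are no longer inside the window.
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            _, old_ms = self.samples.popleft()
            self.total_ms -= old_ms

        return self.total_ms / len(self.samples)


def exceeds_threshold(average_ms: float, threshold_ms: float) -> bool:
    """True when the rolling average is above the configured threshold."""
    return average_ms > threshold_ms


# Example: one location, with a hypothetical 500 ms threshold.
window = RollingAverage()
window.add(0, 200)
window.add(300, 400)
average = window.add(660, 900)  # the sample from t=0 has left the window
location_down = exceeds_threshold(average, 500)
```

Averaging over a window rather than reacting to a single slow probe means that one outlier does not trigger an alert on its own, while a sustained slowdown does.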
With Performance-Based Thresholds, teams can detect and respond to slowdowns before they cause failures, improving overall system reliability.
These are the most critical features we are adding while keeping the system lean, focused, and free of unnecessary complexity.