Overview of Application Request Routing’s Health Check features
I recently received a question about ARR's health check features, which resulted in this overview. Hopefully it is helpful to those of you setting up new ARR deployments.
URL Test Feature
The URL Test feature sends a request to a specified URL and checks the response for the following conditions. If any condition fails, the server that failed the test is taken offline.
- A response was received within the configured timeout period.
- The HTTP status code falls within the configured acceptable status codes.
- The response body contains the text configured in Response Match.
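Because a server passes only when all three conditions hold, the test can be modeled as a single check. The following is a minimal Python sketch, not ARR's implementation: `ProbeResult` and `url_test_passes` are hypothetical names, and both the case-sensitive substring match and the treatment of an empty Response Match as "no body check" are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One health-check response (hypothetical type for this sketch)."""
    status: int      # HTTP status code
    body: str        # response entity body
    elapsed: float   # seconds until the response arrived


def url_test_passes(result, timeout=30.0, status_range=(200, 399), response_match=""):
    """Return True only if all three URL Test conditions hold."""
    # 1. The response must arrive within the configured timeout.
    if result.elapsed > timeout:
        return False
    # 2. The status code must fall inside the acceptable range (inclusive bounds).
    low, high = status_range
    if not low <= result.status <= high:
        return False
    # 3. The body must contain the Response Match text (case-sensitive here);
    #    an empty match string is treated as "no body check" (assumption).
    return response_match in result.body


print(url_test_passes(ProbeResult(200, "Healthy", 0.2), response_match="Healthy"))  # True
print(url_test_passes(ProbeResult(503, "Healthy", 0.2), response_match="Healthy"))  # False
```

The defaults mirror the 30-second timeout and 200-399 status range listed in the table below.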
If the URL uses the FQDN of the ARR server, the test is performed against every server configured in the farm.
A server that has been taken offline is brought back online either manually or automatically, once the next URL test run determines that it is healthy again.
Because this feature is limited to a single URL, it is recommended to create a smoke test page that represents the overall health of the server. ARR can also be configured to look for specific words in the response body, so that the results of the smoke test are taken into consideration when determining health.
The feature is disabled if no URL is entered in the dialog.
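A smoke test page can aggregate several internal checks behind one URL. The sketch below uses Python's standard `http.server` purely for illustration; `disk_ok`, `cache_ok`, `smoke_test`, and `serve` are hypothetical names, and the checks are placeholders for real probes. It returns the word Healthy with a 200 status only when every check passes, and a 503 with a FAILED body otherwise, so a failing page trips both the status code and the Response Match conditions. The failure text deliberately avoids words such as "Unhealthy", which would contain "Healthy" under a case-insensitive match.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def disk_ok():
    # Hypothetical check; replace with a real probe of a dependency.
    return True


def cache_ok():
    # Hypothetical check; replace with a real probe of a dependency.
    return True


CHECKS = {"disk": disk_ok, "cache": cache_ok}


def smoke_test(checks=CHECKS):
    """Run every check and return (status, body) for the health page."""
    failed = sorted(name for name, check in checks.items() if not check())
    if failed:
        # 503 fails the status code condition; the body never contains "Healthy".
        return 503, "FAILED: " + ", ".join(failed)
    return 200, "Healthy"


class HealthCheckHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthCheck.txt":
            self.send_error(404)
            return
        status, text = smoke_test()
        body = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the smoke test page until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), HealthCheckHandler).serve_forever()
```

Calling `serve()` on a content server would expose the page at `/healthCheck.txt`, with Response Match set to Healthy in ARR.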
UI Settings | Configuration Attribute | Description |
URL | <attribute name="url" type="string"/> | URL of the content to check. Since the test only reads the contents of the response, a text file is sufficient, e.g. http://(server name or FQDN of ARR server)/healthCheck.txt |
Interval | <attribute name="interval" type="timeSpan" defaultValue="00:00:30" validationType="timeSpanRange" validationParameter="0,86400,1"/> | How often the test is performed (default 00:00:30, every 30 seconds). |
Time-out (seconds) | <attribute name="timeout" type="timeSpan" defaultValue="00:00:30" validationType="timeSpanRange" validationParameter="1,86400,1"/> | Time allowed to receive a valid response before the test fails (default 30 seconds). |
Acceptable status codes | <attribute name="statusCodeMatch" type="string" defaultValue="200-399"/> | Range of acceptable HTTP status codes that signify success. |
Response Match | <attribute name="responseMatch" type="string" caseSensitive="true"/> | String that must be contained in the body of a successful response. For example, if the string "Healthy" is entered here, the response body must contain that string. |
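The range and timeSpan values in the table are plain strings. Below is a hedged Python sketch of how they might be interpreted; `parse_status_range`, `status_in`, and `parse_timespan` are hypothetical helpers. It assumes a range is either `low-high`, an open-ended `low-` (as in the `500-` default used by the Live Traffic test), or a single code, and that timeSpan values use the `hh:mm:ss` form shown in the defaults. Comma-separated lists and day components are not handled.

```python
def parse_status_range(spec):
    """Parse a status-code range string into inclusive (low, high) bounds."""
    spec = spec.strip()
    if "-" not in spec:
        code = int(spec)  # single code such as "200" (assumed form)
        return code, code
    low_text, high_text = spec.split("-", 1)
    low = int(low_text)
    # An empty upper bound, as in "500-", is treated as open-ended;
    # 999 covers every three-digit HTTP status code.
    high = int(high_text) if high_text.strip() else 999
    if low > high:
        raise ValueError(f"invalid range: {spec!r}")
    return low, high


def status_in(spec, status):
    """Return True if the status code falls inside the range string."""
    low, high = parse_status_range(spec)
    return low <= status <= high


def parse_timespan(value):
    """Convert an 'hh:mm:ss' timeSpan such as '00:00:30' to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(parse_status_range("200-399"))  # (200, 399)
print(parse_timespan("00:00:30"))     # 30
```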
Live Traffic Test (enabled by default)
The Live Traffic test marks a server unhealthy when all of the following conditions are met:
- The HTTP response status falls within the failure code range configured by liveTrafficFailureCodes,
- and the number of such failures reaches the count configured by maxLiveTrafficFailures,
- and those failures occur within the time window configured by liveTrafficFailurePeriod.
To disable the feature, set the Failover period to 0.
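Taken together, these conditions amount to a sliding-window failure counter. The sketch below is a model under stated assumptions, not ARR's code: `LiveTrafficMonitor` is a hypothetical class, timestamps are plain seconds, and it assumes the server is marked unhealthy once the failure count reaches `maxLiveTrafficFailures` (ARR may instead require the count to exceed it).

```python
from collections import deque


class LiveTrafficMonitor:
    """Sliding-window model of the Live Traffic test (hypothetical)."""

    def __init__(self, failure_codes=(500, 999), max_failures=10, period_seconds=0):
        self.low, self.high = failure_codes  # inclusive failure-code range
        self.max_failures = max_failures
        self.period = period_seconds         # a period of 0 disables the test
        self.failures = deque()              # timestamps of recent failures
        self.healthy = True

    def record(self, status, now):
        """Record one proxied response; return False once marked unhealthy."""
        if self.period == 0:
            return self.healthy
        if self.low <= status <= self.high:
            self.failures.append(now)
        # Forget failures that fell out of the window.
        while self.failures and now - self.failures[0] > self.period:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.healthy = False  # this test never marks a server healthy again
        return self.healthy


monitor = LiveTrafficMonitor(max_failures=3, period_seconds=10)
for t, status in [(0.0, 500), (1.0, 502), (2.0, 200), (3.0, 503)]:
    print(t, monitor.record(status, t))  # the last call prints False
```

Note that the monitor has no path back to healthy; recovery is left to the URL test or an administrator.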
UI Settings | Configuration Attribute | Description |
Failure Codes | <attribute name="liveTrafficFailureCodes" type="string" defaultValue="500-"/> | Range of HTTP status codes that signify failure. |
Maximum Live Traffic Failures | <attribute name="maxLiveTrafficFailures" type="uint" defaultValue="10" validationType="integerRange" validationParameter="1,4294967295"/> | Specifies the maximum number of failures that are allowed during the failover period. |
Failover period (seconds) | <attribute name="liveTrafficFailurePeriod" type="timeSpan" defaultValue="00:00:00" validationType="timeSpanRange" validationParameter="0,86400,1"/> | Specifies the failover period in seconds. To disable the Live Traffic test, set this value to 0. |
Minimum Servers | <attribute name="minServers" type="uint" defaultValue="0"/> | Set as a percentage of healthy servers. When the percentage of healthy servers drops below this value, an event is raised and logged to Event Viewer, and requests are routed to all servers regardless of their health status, except for servers that were taken offline manually. |
What happens if all servers are offline?
The user will receive the following error in the browser:
Failed Request Tracing logs record the error and its reason, as shown below.
Summary
When setting up your environment, it's important to note that the URL Test feature can mark a server both unhealthy and healthy, while the Live Traffic test can only mark a server unhealthy.
The reason is that the URL test has no user impact, so using it to bring a server back online poses no risk to the user experience. If you experience transient outages and want servers to recover automatically through the health checking features, use a combination of the Live Traffic test and the URL test. If you want failed servers to stay offline until an administrator takes action, use only the Live Traffic test.
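The asymmetry described in this summary can be expressed as a small event replay. In this hypothetical Python model (`Event` and `replay` are illustrative names, not ARR APIs), both tests can mark a server unhealthy, but only a passing URL test or an administrator can mark it healthy again, so a farm relying on the Live Traffic test alone keeps failed servers offline.

```python
from enum import Enum


class Event(Enum):
    URL_TEST_FAILED = 1
    URL_TEST_PASSED = 2
    LIVE_TRAFFIC_FAILED = 3
    ADMIN_BROUGHT_ONLINE = 4


# Which events can take a server offline, and which can bring it back.
MARKS_UNHEALTHY = {Event.URL_TEST_FAILED, Event.LIVE_TRAFFIC_FAILED}
MARKS_HEALTHY = {Event.URL_TEST_PASSED, Event.ADMIN_BROUGHT_ONLINE}


def replay(events, healthy=True):
    """Apply events in order and return the server's final health."""
    for event in events:
        if event in MARKS_UNHEALTHY:
            healthy = False
        elif event in MARKS_HEALTHY:
            healthy = True
    return healthy


# Live Traffic test only: the server stays offline.
print(replay([Event.LIVE_TRAFFIC_FAILED]))                         # False
# Live Traffic plus URL test: the next passing URL test recovers it.
print(replay([Event.LIVE_TRAFFIC_FAILED, Event.URL_TEST_PASSED]))  # True
```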