502.3 Bad Gateway “The operation timed out” with IIS Application Request Routing(ARR)
When working with ARR deployments one of the errors you might see is 502.3 Bad Gateway. The 502.3 means while acting as a proxy ARR was unable to complete the request to the upstream server and subsequently send a response back to the client. This can happen for multiple reasons including , failure to connect to the server , no response from the server, or server took too long to respond (time out).
For the purposes of this post we are going to look at a timeout error and the data that can be gathered to help isolate the cause.
If you are looking at this post then you probably have already seen this error or something similar. This is shown in the browser when detailed errors are enabled in IIS.
Another way to identify the source of the 502.3 is with Failed Request Tracing logs in IIS configured to capture Status code 502.
From the message the key details are the ErrorCode which you can use to map to to the Winhttp error message, which in this case is ERROR_WINHTTP_TIMEOUT ( Reference WinHttp Error Codes). You will also see in the next line that this is translated to “The operation timed out”. Note that both the 0x80072ee2 and 2147954402 map to the same error ERROR_WINHTTP_TIMEOUT.
Now that we know its a timeout we need to determine what type of timeout occurred. Here is a list of the timeouts that can occur in Winhttp ( which if you haven't guessed already is what ARR uses to proxy requests)
- ResolveTimeout : This occurs if name resolution takes longer than the specified timeout period.
- ConnectTimeout : This occurs if it takes longer than the specified timeout period to connect to the server after the name resolved.
- SendTimeout : If sending a request takes longer than this time-out value, the send is canceled.
- ReceiveTimeout : If a response takes longer than this time-out value, the request is canceled.
To identify what type of timeout we can use Winhttp’s built in logging,These can be enabled from the command line on the ARR server using NETSH.
- Winhttp Traces:
Following the example below you can search your log for WinHttpOpenRequest to find your request. The calls to WinHttpSetTimeouts are setting the 4 timeout values based on your ARR time settings found in the Proxy Configuration page of your ARR Server Farm or in the Server Proxy settings. Next we find WinHttpSendRequest then WinHttpReceiveResponse , so we know we are now in the receive stage. Finally we see that the error in is in RecvResponse so we know this is a ReceiveTimeout. For a different failure such as ResolveTimeout we would not see the log make it to WinHttpReceiveResponse and the failure would be logged earlier.
10:23:45.100 ::WinHttpOpenRequest(0x35a970, "GET", "/sleep/default.aspx", "HTTP/1.0", "", 0x0, 0x00000080) ………… 10:23:45.100 ::WinHttpSetTimeouts(0x2e42d80, 30000, 30000, 30000, 30000) 10:23:45.100 ::WinHttpSetTimeouts() returning TRUE …………10:23:45.100 ::WinHttpSendRequest(0x2e42d80, "Accept: */*\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-CA\r\nHost: contoso.com", 479, 0x0, 0, 0, 24794c0) …………
10:24:15.397 ::Completing WinHttpSendRequest() with success; Request Handle = 02E42D80, Context = 024794C0, Buffer = 00000000 (0x0), Buffer Length = 0 10:24:15.397 ::WinHttpReceiveResponse(0x2e42d80, 0x0) ………… 10:24:15.397 ::sys-recver failed to receive headers; error = ? (1460) 10:24:15.397 ::ERROR_WINHTTP_FROM_WIN32 mapped (?) 1460 to (ERROR_WINHTTP_TIMEOUT) 12002 10:24:15.397 ::sys-recver returning ERROR_WINHTTP_TIMEOUT (12002) from RecvResponse() 10:24:15.397 ::sys-req completes recv-headers inline (sync); error = ERROR_WINHTTP_TIMEOUT (12002)
Now that we know this is receive timeout we can look at the content server and see how long the request took.
-
IIS logs on the Content Server:
Examine the IIS logs on the content server and check the sc-status and sc-win32-status and time-taken fields. This will give you an idea of whether the request processed successfully ( sc-status = 200) and the time-taken to see if this exceeds your ARR timeout and if this is the expected execution time for your web page . You can determine from this whether you need to troubleshoot a long running application or simply increase the ARR timeout settings. Checking the Win32 field for errors such as 1236 (ERROR_CONNECTION_ABORTED) or 64 (ERROR_NETNAME_DELETED) indicate that something happened on the network layer such as a connection reset.
Using our example the IIS logs here show that the Request took ~35 seconds (time-taken=35615) , the request was processed successfully on the server (sc-status=200) , but there was a problem sending the request (sc-win32-status=64) which means the connection was gone when the content server tried to send the request. This was because the client (ARR) has already timed-out and closed the TCP connection.
#Software: Microsoft Internet Information Services 7.0 #Version: 1.0 #Date: 2010-06-23 20:11:33 #Fields:date time cs-method cs-uri-stem s-port sc-status sc-substatus sc-win32-status time-taken 2010-06-23 20:11:33 GET /sleep/default.aspx 80 200 0 64 35615
Summary
So in this case we can see that the request took >35 seconds which is longer than the default timeout in ARR. When this occurs ARR ( or Winhttp underlying ARR) will close the connection to the content server which is what cause the Win32 error 64.
Now its up to you to determine whether its acceptable that your page is running for 35 seconds and you just need to increase time outs in ARR.
Since the application issues are beyond the scope of this blog I’ll leave you with two command lines for setting ARR timeouts for either a Server Proxy configuration or Server Farms.
Server Proxy : appcmd.exe set config -section:system.webServer/proxy /timeout:"00:00:45" /commit:apphost
Server Farm : appcmd.exe set config -section:webFarms /[name='ArrFarm'].applicationRequestRouting.protocol.timeout:"00:00:45" /commit:apphost
References
Netsh Commands for Windows Hypertext Transfer Protocol (WINHTTP)
Failed Request Tracing in IIS (FREB)
UPDATE : New Post for Winhttp Tracing Methods