Web-Based Link Checker
A client asked me the other day whether there was an easy way to check the links on a page of his that lists over one hundred web sites. Clicking each link one by one is obviously tedious, so I offered to come up with some code to do it for him. While I know it's fairly trivial to write a Windows app that would do this, I wanted to make it web-based so it would be easy to distribute and re-use within our organization.
Well, as it turns out, there's no built-in "quick and easy" way to parse HTML with server-side ASP.NET code! In a Windows app it's easy to use a WebBrowser control and get an HtmlDocument object, but there's no server-side equivalent. After much searching I ran across a library called HtmlAgilityPack, which made it super easy to get a list of the A elements in a web page and pull out each one's text and href.
Here's some simple code if you want to try something like this yourself ...
linkChecker.aspx
<form id="form1" runat="server">
    <div>
        URL to check: <asp:TextBox ID="TextBoxURL" runat="server" Width="550px" /> <asp:Button ID="ButtonGo" runat="server" Text="Go" />
    </div>
    <div>
        <asp:Literal ID="LinkTable" runat="server" />
    </div>
</form>
linkChecker.aspx.vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Runtime.InteropServices
Imports HtmlAgilityPack

Partial Class linkChecker
    Inherits System.Web.UI.Page

    Protected Sub ButtonGo_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles ButtonGo.Click
        Dim pageRequest As WebRequest = WebRequest.Create(TextBoxURL.Text)
        'use this line if you need to authenticate
        pageRequest.Credentials = New NetworkCredential("username", "password", "domain")

        'download the page and load it into an HtmlAgilityPack document
        Dim doc As HtmlDocument = New HtmlDocument
        Using pageResponse As HttpWebResponse = CType(pageRequest.GetResponse(), HttpWebResponse)
            doc.Load(pageResponse.GetResponseStream())
        End Using

        'select all the Anchor elements
        Dim hrefs As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        Dim links As String = "<table style='width:100%;font-family:verdana;font-size:10pt;'>" & vbCrLf

        'SelectNodes returns Nothing when the page has no matching elements
        If hrefs IsNot Nothing Then
            For Each href As HtmlNode In hrefs
                Dim linkColor As String = "blue"
                Dim linkUrl As String = href.Attributes("href").Value

                'only test absolute http/https links; relative links, mailto:, etc. are skipped
                If linkUrl.StartsWith("http", StringComparison.OrdinalIgnoreCase) Then
                    links &= "<tr><td>"
                    Try
                        Dim testReq As WebRequest = WebRequest.Create(linkUrl)
                        Dim myProxy As New WebProxy()
                        myProxy.Address = New Uri("http://proxy.mydomain.com:8000")

                        'if you go through a proxy for external sites but exclude it for internal sites, put your domain here
                        If (InStr(linkUrl, "mydomain.com/", CompareMethod.Text) > 0) Then
                            testReq.Proxy = Nothing
                        Else
                            testReq.Proxy = myProxy
                        End If

                        'dispose each response so its connection is released before the next link is tested
                        Using testRes As HttpWebResponse = CType(testReq.GetResponse(), HttpWebResponse)
                            links &= testRes.StatusDescription
                        End Using
                    Catch ex As Exception
                        'any failure (DNS error, timeout, 4xx/5xx status) marks the link as broken
                        links &= "<span style='color:red;'>Err</span>"
                        linkColor = "red"
                    End Try
                    links &= "</td><td>" & href.InnerHtml & "<br><a style='color:" & linkColor & ";' href='" & linkUrl & "'>" & linkUrl & "</a></td></tr>" & vbCrLf
                End If
            Next
        End If

        links &= "</table>" & vbCrLf
        LinkTable.Text = links
    End Sub
End Class