Web-Based Link Checker

The other day a client asked me whether there was an easy way to check a page of his that lists more than one hundred web sites.  Clicking each link one by one is obviously tedious, so I offered to come up with some code to do it for him.  While it's fairly trivial to write a Windows app for this, I wanted a web-based version so it would be easy to distribute and reuse within our organization.

Well, as it turns out, there's no "quick and easy" built-in way to parse HTML with server-side ASP.NET code!  In a Windows app it's easy to use a WebBrowser control and get an HtmlDocument object, but there's no server-side equivalent.  After much searching I ran across the HtmlAgilityPack, which made it super easy to get a list of A elements in a web page and pull out the text and href of each one.
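
To show the parsing step on its own, here is a minimal sketch that pulls the text and href out of every A element in a string of markup. It assumes the HtmlAgilityPack assembly is referenced; the module name, the ExtractLinks helper and the sample markup are made up for illustration and are not part of the library.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module AnchorListSketch

    ' Hypothetical helper: returns (text, href) pairs for every <a href="..."> in the markup.
    Function ExtractLinks(ByVal markup As String) As List(Of KeyValuePair(Of String, String))
        Dim results As New List(Of KeyValuePair(Of String, String))
        Dim doc As New HtmlDocument()
        doc.LoadHtml(markup)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return results

        For Each anchor As HtmlNode In anchors
            Dim href As String = anchor.GetAttributeValue("href", String.Empty)
            results.Add(New KeyValuePair(Of String, String)(anchor.InnerText.Trim(), href))
        Next
        Return results
    End Function

    Sub Main()
        ' Sample markup invented for this sketch; the second anchor has no href and is skipped.
        Dim sample As String = "<p><a href=""http://example.com/"">Example</a> <a name=""top"">no href</a></p>"
        For Each link As KeyValuePair(Of String, String) In ExtractLinks(sample)
            Console.WriteLine(link.Key & " -> " & link.Value)
        Next
    End Sub

End Module
```

The XPath expression //a[@href] only matches anchors that actually carry an href attribute, so named anchors are ignored.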

Here's some simple code if you want to try something like this yourself ...

linkChecker.aspx

<form id="form1" runat="server">
    <div>
        URL to check: <asp:TextBox ID="TextBoxURL" runat="server" Width="550px" /> <asp:Button ID="ButtonGo" runat="server" Text="Go" />
    </div>
    <div>
        <asp:Literal ID="LinkTable" runat="server" />
    </div>
</form>

linkChecker.aspx.vb

Imports System
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Runtime.InteropServices
Imports HtmlAgilityPack

Partial Class linkChecker
    Inherits System.Web.UI.Page

    Protected Sub ButtonGo_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles ButtonGo.Click
        Dim request As WebRequest = WebRequest.Create(TextBoxURL.Text)

        'use this line if you need to authenticate; otherwise remove it
        request.Credentials = New NetworkCredential("username", "password", "domain")

        'download the page and parse it; Using closes the response when we're done
        Dim doc As HtmlDocument = New HtmlDocument
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            doc.Load(response.GetResponseStream())
        End Using

        'select all the Anchor elements (SelectNodes returns Nothing when there are none)
        Dim hrefs As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If hrefs Is Nothing Then
            LinkTable.Text = "No links found."
            Return
        End If

        Dim links As String = "<table style='width:100%;font-family:verdana;font-size:10pt;'>" & vbCrLf

        'proxy used for external sites; replace the address with your own
        Dim myProxy As New WebProxy()
        myProxy.Address = New Uri("http://proxy.mydomain.com:8000")

        For Each href As HtmlNode In hrefs
            Dim linkColor As String = "blue"
            Dim url As String = href.Attributes("href").Value

            'only test absolute http/https links; relative links are skipped
            If url.StartsWith("http", StringComparison.OrdinalIgnoreCase) Then
                links += "<tr><td>"
                Try
                    Dim testReq As WebRequest = WebRequest.Create(url)

                    'if you go through a proxy for external sites but bypass it for internal sites, put your domain here
                    If (InStr(url, "mydomain.com/", CompareMethod.Text) > 0) Then
                        testReq.Proxy = Nothing
                    Else
                        testReq.Proxy = myProxy
                    End If

                    'Using closes each response so connections aren't left open
                    Using testRes As HttpWebResponse = CType(testReq.GetResponse(), HttpWebResponse)
                        links += testRes.StatusDescription
                    End Using
                Catch ex As Exception
                    'any failure (bad URL, timeout, 4xx/5xx) is reported as an error
                    links += "<span style='color:red;'>Err</span>"
                    linkColor = "red"
                End Try
                links += "</td><td>" & href.InnerHtml & "<br><a style='color:" & linkColor & ";' href='" & url & "'>" & url & "</a></td></tr>" & vbCrLf
            End If
        Next

        links += "</table>" & vbCrLf
        LinkTable.Text = links
    End Sub

End Class
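
The code above reports every failure as "Err". If you'd rather see the actual HTTP status for broken links, something along these lines might work. This is only a sketch: the DescribeLink helper, the module name, the HEAD method and the 10-second timeout are my own assumptions, not part of the page above, and some servers reject HEAD requests, so you may need to fall back to GET.

```vb
Imports System
Imports System.Net

Module LinkStatusSketch

    ' Hypothetical helper: returns a short status string for one URL,
    ' e.g. the numeric status code followed by its description.
    Function DescribeLink(ByVal url As String) As String
        Try
            Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
            req.Method = "HEAD"      ' assumption: headers are enough to judge the link
            req.Timeout = 10000      ' assumption: give up after 10 seconds
            Using res As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
                Return CInt(res.StatusCode).ToString() & " " & res.StatusDescription
            End Using
        Catch ex As WebException
            ' For 4xx/5xx replies the server's response is attached to the exception.
            Dim failed As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If failed IsNot Nothing Then
                Dim status As String = CInt(failed.StatusCode).ToString() & " " & failed.StatusDescription
                failed.Close()
                Return status
            End If
            ' No response at all (DNS failure, timeout, refused connection).
            Return "Err: " & ex.Status.ToString()
        Catch ex As Exception
            ' Malformed or non-HTTP URLs end up here.
            Return "Err: " & ex.Message
        End Try
    End Function

    Sub Main(ByVal args As String())
        ' Pass one or more URLs on the command line to try it out.
        For Each url As String In args
            Console.WriteLine(url & " -> " & DescribeLink(url))
        Next
    End Sub

End Module
```

Catching WebException separately is what makes the status code available, since GetResponse throws for 4xx and 5xx replies instead of returning them.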
