Web Based Link Checker

Posted: Jun 26, 2008

Tags: ASP.NET, HTML, parse

A client asked me the other day whether there was an easy way to check a page of his that lists over one hundred web sites.  Clicking each link one by one is obviously tedious, so I offered to come up with some code to do it for him.  It's fairly trivial to write a Windows app for this, but I wanted something web-based so it would be easy to distribute and re-use within our organization.

Well, as it turns out, there's no "quick and easy" built-in way to parse HTML in server-side ASP.NET code!  In a Windows app it's easy to use a WebBrowser control and get an HtmlDocument object, but there's no server-side equivalent.  After much searching I ran across the Html Agility Pack, which made it super easy to get a list of the A elements in a web page and pull out the text and href of each one.
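To show just the parsing step on its own, here is a minimal console sketch. It assumes the Html Agility Pack assembly is referenced; the module name and the sample markup are made up for illustration, and in a real page the HTML would come from an HTTP response rather than a string.

```vbnet
Imports System
Imports HtmlAgilityPack

Module AnchorListSketch
    Sub Main()
        ' Hypothetical sample markup, used only for this illustration.
        Dim html As String = "<p><a href=""http://example.com/"">Example</a> <a name=""top"">no href</a></p>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' XPath: every <a> element that carries an href attribute.
        Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If anchors Is Nothing Then
            Console.WriteLine("No links found.")
            Return
        End If

        ' Print the link text and the href of each anchor.
        For Each anchor As HtmlNode In anchors
            Console.WriteLine(anchor.InnerText & " -> " & anchor.GetAttributeValue("href", ""))
        Next
    End Sub
End Module
```

The second anchor in the sample has no href, so the XPath filter leaves it out. The Nothing check matters on pages that contain no links at all.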

Here's some simple code if you want to try something like this yourself ...

linkChecker.aspx

<form id="form1" runat="server">
<div>
URL to check: <asp:TextBox ID="TextBoxURL" runat="server" Width="550px" /> <asp:Button ID="ButtonGo" runat="server" Text="Go" />
</div>
<div>
<asp:literal ID="LinkTable" runat="server" />
</div>
</form>

linkChecker.aspx.vb

Imports System
Imports System.Net
Imports System.Text
Imports HtmlAgilityPack

Partial Class linkChecker
    Inherits System.Web.UI.Page

    Protected Sub ButtonGo_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles ButtonGo.Click
        Dim request As WebRequest = WebRequest.Create(TextBoxURL.Text)

        'use this line if you need to authenticate
        request.Credentials = New NetworkCredential("username", "password", "domain")

        Dim doc As HtmlDocument = New HtmlDocument
        'dispose the response so the connection is released
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            doc.Load(response.GetResponseStream())
        End Using

        'select all the Anchor elements; SelectNodes returns Nothing when there are none
        Dim hrefs As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If hrefs Is Nothing Then
            LinkTable.Text = "No links found."
            Return
        End If

        'if you go through a proxy for external sites, put its address here
        Dim myProxy As New WebProxy()
        myProxy.Address = New Uri("http://proxy.mydomain.com:8000")

        Dim links As New StringBuilder()
        links.AppendLine("<table style='width:100%;font-family:verdana;font-size:10pt;'>")

        For Each href As HtmlNode In hrefs
            Dim linkColor As String = "blue"
            'decode entities such as &amp; so the URL is requested as a browser would see it
            Dim linkUrl As String = HtmlEntity.DeEntitize(href.Attributes("href").Value)

            'only absolute http/https links are checked
            If linkUrl.StartsWith("http", StringComparison.OrdinalIgnoreCase) Then
                links.Append("<tr><td>")
                Try
                    Dim testReq As WebRequest = WebRequest.Create(linkUrl)

                    'if you exclude internal sites from the proxy, put your domain here
                    If InStr(linkUrl, "mydomain.com/", CompareMethod.Text) > 0 Then
                        testReq.Proxy = Nothing
                    Else
                        testReq.Proxy = myProxy
                    End If

                    Using testRes As HttpWebResponse = CType(testReq.GetResponse(), HttpWebResponse)
                        links.Append(Server.HtmlEncode(testRes.StatusDescription))
                    End Using
                Catch ex As Exception
                    'GetResponse throws on 4xx/5xx responses, timeouts and DNS failures
                    links.Append("<span style='color:red;'>Err</span>")
                    linkColor = "red"
                End Try

                'encode the URL before writing it back into the page
                Dim safeUrl As String = Server.HtmlEncode(linkUrl)
                links.AppendLine("</td><td>" & href.InnerHtml & "<br><a style='color:" & linkColor & ";' href=""" & safeUrl & """>" & safeUrl & "</a></td></tr>")
            End If
        Next

        links.AppendLine("</table>")
        LinkTable.Text = links.ToString()
    End Sub

End Class
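One limitation of the listing above: it skips relative links such as /about.aspx, because it only tests hrefs written out with http. If you want to check those too, one possible approach (a sketch, not part of the original page) is to resolve each href against the URL of the page being checked with System.Uri. The module name, the ResolveHref helper and the sample URLs are all hypothetical.

```vbnet
Imports System

Module ResolveHrefSketch
    ' Returns an absolute http/https Uri for href, or Nothing if it is not something we can check.
    Function ResolveHref(ByVal pageUrl As Uri, ByVal href As String) As Uri
        Dim resolved As Uri = Nothing

        ' TryCreate combines the page URL with a relative href, or keeps an absolute href as it is.
        If Not Uri.TryCreate(pageUrl, href, resolved) Then
            Return Nothing
        End If

        ' Skip schemes such as mailto: or javascript: that cannot be fetched with a web request.
        If resolved.Scheme = Uri.UriSchemeHttp OrElse resolved.Scheme = Uri.UriSchemeHttps Then
            Return resolved
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Hypothetical page URL and hrefs, used only for this illustration.
        Dim page As New Uri("http://www.example.com/links/list.aspx")
        Dim samples As String() = {"/about.aspx", "other.aspx", "http://example.org/", "mailto:someone@example.com"}

        For Each href As String In samples
            Dim target As Uri = ResolveHref(page, href)
            If target Is Nothing Then
                Console.WriteLine(href & " -> skipped")
            Else
                Console.WriteLine(href & " -> " & target.AbsoluteUri)
            End If
        Next
    End Sub
End Module
```

With the sample page URL, /about.aspx resolves to http://www.example.com/about.aspx and the mailto link is skipped. In the link checker you would pass New Uri(TextBoxURL.Text) as the page URL and give the resolved address to WebRequest.Create.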

Comments

  1. Anonymous
    June 26, 2008

    Look into Xenu link checker

  2. Anonymous
    November 11, 2008

    Nice site, thanks for information!

  3. Anonymous
    November 11, 2008

    Not bad... Not bad.
