Last year I wrote a blog post titled "Using Classic ASP and URL Rewrite for Dynamic SEO Functionality," in which I described how you could combine Classic ASP and the URL Rewrite module for IIS to dynamically create Robots.txt and Sitemap.xml files for your website, thereby helping with your Search Engine Optimization (SEO) results. A few weeks ago I received a follow-up question that I thought was worth answering in a blog post.
Here is the question that I was asked:
"What if I don't want to include all dynamic pages in sitemap.xml but only a select few or some in certain directories because I don't want bots to crawl all of them. What can I do?"
That's a great question, and it wasn't difficult to update my original code samples to address this request. Most of the code from my last blog remains unchanged; here's the file-by-file breakdown of the changes that need to be made:
|File|Changes|
|---|---|
|Sitemap.asp|See the sample later in this blog|
So if you are already using the files from my original blog, you don't need to change your Robots.asp file or the URL Rewrite rules in your Web.config file, because the question only concerns the files that are returned in the output for Sitemap.xml.
Updating the Necessary Files
The good news is that I wrote most of the heavy-duty code in my last blog; only a few changes needed to be made to accommodate the requested functionality. The main difference is that the original Sitemap.asp file had a section that recursively parsed the entire website and listed all of its files, whereas this new version moves that section of code into a separate subroutine, ParseFolder, to which you pass the folder name to parse recursively. This allows you to specify only those folders within your website that you want in the resultant sitemap output.
With that being said, here's the new code for the Sitemap.asp file:
```asp
<%
    Option Explicit
    On Error Resume Next

    Response.Clear
    Response.Buffer = True
    Response.AddHeader "Connection", "Keep-Alive"
    Response.CacheControl = "public"

    Dim strUrlRoot, strPhysicalRoot, strFormat
    Dim objFSO, objFolder, objFile

    strPhysicalRoot = Server.MapPath("/")
    Set objFSO = Server.CreateObject("Scripting.Filesystemobject")

    strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST")

    ' Check for XML or TXT format.
    If UCase(Trim(Request("format")))="XML" Then
        strFormat = "XML"
        Response.ContentType = "text/xml"
    Else
        strFormat = "TXT"
        Response.ContentType = "text/plain"
    End If

    ' Add the UTF-8 Byte Order Mark.
    Response.Write Chr(CByte("&hEF"))
    Response.Write Chr(CByte("&hBB"))
    Response.Write Chr(CByte("&hBF"))

    If strFormat = "XML" Then
        Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf
        Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf
    End if

    ' Always output the root of the website.
    Call WriteUrl(strUrlRoot,Now,"weekly",strFormat)

    ' Output only specific folders.
    Call ParseFolder("/marketing")
    Call ParseFolder("/sales")
    Call ParseFolder("/hr/jobs")

    ' --------------------------------------------------
    ' End of file system loop.
    ' --------------------------------------------------

    If strFormat = "XML" Then
        Response.Write "</urlset>"
    End If

    Response.End

    ' ======================================================================
    '
    ' Recursively walks a folder path and returns URLs based on the
    ' static *.html files that it locates.
    '
    ' strParentFolder = The base path for recursion
    '
    ' ======================================================================

    Sub ParseFolder(strParentFolder)
        On Error Resume Next

        Dim strChildFolders, lngChildFolders
        Dim strUrlRelative, strExt

        ' Get the list of child folders under a parent folder.
        strChildFolders = GetFolderTree(Server.MapPath(strParentFolder))

        ' Loop through the collection of folders.
        For lngChildFolders = 1 to UBound(strChildFolders)
            strUrlRelative = Replace(Mid(strChildFolders(lngChildFolders),Len(strPhysicalRoot)+1),"\","/")
            Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative))
            ' Loop through the collection of files.
            For Each objFile in objFolder.Files
                strExt = objFSO.GetExtensionName(objFile.Name)
                If StrComp(strExt,"html",vbTextCompare)=0 Then
                    If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then
                        Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat)
                    End If
                End If
            Next
        Next
    End Sub

    ' ======================================================================
    '
    ' Outputs a sitemap URL to the client in XML or TXT format.
    '
    ' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never
    ' tmpStrFormat = TXT|XML
    '
    ' ======================================================================

    Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat)
        On Error Resume Next
        Dim tmpDate : tmpDate = CDate(tmpLastModified)
        ' Check if the request is for XML or TXT and return the appropriate syntax.
        If tmpStrFormat = "XML" Then
            Response.Write " <url>" & vbCrLf
            Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf
            Response.Write " <lastmod>" & Year(tmpLastModified) & "-" & Right("0" & Month(tmpLastModified),2) & "-" & Right("0" & Day(tmpLastModified),2) & "</lastmod>" & vbCrLf
            Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf
            Response.Write " </url>" & vbCrLf
        Else
            Response.Write tmpStrUrl & vbCrLf
        End If
    End Sub

    ' ======================================================================
    '
    ' Returns a string array of folders under a root path
    '
    ' ======================================================================

    Function GetFolderTree(strBaseFolder)
        Dim tmpFolderCount,tmpBaseCount
        Dim tmpFolders()
        Dim tmpFSO,tmpFolder,tmpSubFolder

        ' Define the initial values for the folder counters.
        tmpFolderCount = 1
        tmpBaseCount = 0

        ' Dimension an array to hold the folder names.
        ReDim tmpFolders(1)

        ' Store the root folder in the array.
        tmpFolders(tmpFolderCount) = strBaseFolder

        ' Create file system object.
        Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject")

        ' Loop while we still have folders to process.
        While tmpFolderCount <> tmpBaseCount
            ' Set up a folder object to a base folder.
            Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1))
            ' Loop through the collection of subfolders for the base folder.
            For Each tmpSubFolder In tmpFolder.SubFolders
                ' Increment the folder count.
                tmpFolderCount = tmpFolderCount + 1
                ' Increase the array size
                ReDim Preserve tmpFolders(tmpFolderCount)
                ' Store the folder name in the array.
                tmpFolders(tmpFolderCount) = tmpSubFolder.Path
            Next
            ' Increment the base folder counter.
            tmpBaseCount = tmpBaseCount + 1
        Wend

        GetFolderTree = tmpFolders
    End Function
%>
```
As you can see, the code is largely unchanged from my previous blog.
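If the list of folders grows, the repeated ParseFolder calls could be driven from an array instead. The following is only a minimal sketch, not part of the original sample: it assumes that it replaces the three ParseFolder calls in the Sitemap.asp code above (so objFSO and ParseFolder are already defined), and the names arrSitemapFolders and strSitemapFolder are hypothetical.

```vbscript
' Hypothetical replacement for the three ParseFolder calls in Sitemap.asp.
' arrSitemapFolders and strSitemapFolder are illustrative names only.
Dim arrSitemapFolders, strSitemapFolder

' Keep the list of folders to include in the sitemap in one place.
arrSitemapFolders = Array("/marketing", "/sales", "/hr/jobs")

For Each strSitemapFolder In arrSitemapFolders
    ' Skip folders that do not exist on disk.
    If objFSO.FolderExists(Server.MapPath(strSitemapFolder)) Then
        Call ParseFolder(strSitemapFolder)
    End If
Next
```

Because Sitemap.asp uses Option Explicit, the two new variables have to be declared with Dim before they are used.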
One last thing to consider: I didn't make any changes to the Robots.asp file in this blog. That said, when you do not want specific paths crawled, you should add rules to your Robots.txt file to disallow those paths. For example, here is a simple Robots.txt file which allows crawling of your entire website:
```text
# Robots.txt
# For more information on this file see:
# http://www.robotstxt.org/

# Define the sitemap path
Sitemap: http://localhost:53644/sitemap.xml

# Make changes for all web spiders
User-agent: *
Allow: /
Disallow:
```
If you wanted to deny crawling on certain paths, you would add the specific paths that you do not want crawled to your Robots.txt file, as in the following example:
```text
# Robots.txt
# For more information on this file see:
# http://www.robotstxt.org/

# Define the sitemap path
Sitemap: http://localhost:53644/sitemap.xml

# Make changes for all web spiders
User-agent: *
Disallow: /foo
Disallow: /bar
```
If you are using the Robots.asp file from my last blog, you would need to update the section of code that defines the paths so that its output matches the previous example:
```asp
Response.Write "# Make changes for all web spiders" & vbCrLf
Response.Write "User-agent: *" & vbCrLf
Response.Write "Disallow: /foo" & vbCrLf
Response.Write "Disallow: /bar" & vbCrLf
```
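If you expect the list of disallowed paths to change, one option is to keep the paths in an array and write the rules in a loop. This is a hedged sketch rather than code from my original Robots.asp: the names arrDisallowPaths and strDisallowPath are hypothetical, and it assumes that it replaces the four Response.Write statements shown above.

```vbscript
' Hypothetical variation on the Robots.asp section shown above.
' arrDisallowPaths and strDisallowPath are illustrative names only.
Dim arrDisallowPaths, strDisallowPath

' Paths that should not be crawled.
arrDisallowPaths = Array("/foo", "/bar")

Response.Write "# Make changes for all web spiders" & vbCrLf
Response.Write "User-agent: *" & vbCrLf

' Write one Disallow rule per path.
For Each strDisallowPath In arrDisallowPaths
    Response.Write "Disallow: " & strDisallowPath & vbCrLf
Next
```

With the two paths listed, this writes the same four lines as the hard-coded statements, so adding a path only requires changing the array.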
I hope this helps. ;-]

(Cross-posted from http://blogs.msdn.com/robert_mcmurray/)