IIS 7.0 and URL Rewrite, make your Web Site SEO
In the past few days I've been reading a bit about SEO, trying to understand what makes a Web Site Search-Engine-Optimized (SEO), what some of the typical headaches are when trying to achieve that, and how we can address them in IIS.
Today I decided to post how you can make your Web Site running IIS 7.0 a bit "friendlier" to Search Engines without having to modify any code in your application. Being SEO is a big statement since it can include several things, so for now I will scope the discussion to 3 things that can be easily addressed using the IIS URL Rewrite Module:
- Canonicalization
- Friendly URL's
- Site Reorganization
1) Canonicalization
The goal of canonicalization is to ensure that the content of a page is exposed under only one unique URI. This is important because, even though it's easy for humans to tell that http://www.carlosag.net is the same as http://carlosag.net, many search engines will not make that assumption and will keep them as two separate entries, potentially splitting the ranking between them and lowering their relevance. Another example of this is http://www.carlosag.net/default.aspx and http://www.carlosag.net/. You can certainly minimize the impact by always using the canonical form in your own links, for example linking to http://www.carlosag.net/tools/webchart/ and leaving out the default.aspx. However, that only accounts for part of the equation, since you cannot control the links of everyone else referencing your Web Site.
This is when URL Rewrite comes into play and truly solves this problem.
Host name.
URL Rewrite can help you redirect users when they type your URL in a way you don't necessarily want them to, for example just carlosag.net. Choosing between using WWW or not is a matter of taste, but once you choose one you should ensure that you guide everyone to the right one. The following rule will automatically redirect everyone using just carlosag.net to www.carlosag.net. This configuration can be saved in the Web.config file in the root of your Web Site. Note that I'm only including the XML in this blog; I used IIS Manager to generate all of these settings, so you don't need to memorize the XML schema, since the UI includes several friendly capabilities to generate all of this.
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Redirect to WWW" stopProcessing="true">
          <match url=".*" />
          <conditions>
            <!-- The dot is escaped so it only matches a literal "." -->
            <add input="{HTTP_HOST}" pattern="^carlosag\.net$" />
          </conditions>
          <action type="Redirect" url="http://www.carlosag.net/{R:0}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
Note that it is important to use Permanent (301) redirects. This ensures that if anybody links to your page using a non-WWW link, a search engine bot crawling their Web Site will identify the link as permanently moved, treat the new URL as the correct address, and not index the old URL. With Temporary (302) redirects the old URL continues to be indexed. The following shows what the response of the server looks like:
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Location: http://www.carlosag.net/tools/
Server: Microsoft-IIS/7.0
X-Powered-By: ASP.NET
Date: Mon, 01 Sep 2008 22:45:49 GMT
Content-Length: 155
<head><title>Document Moved</title></head>
<body><h1>Object Moved</h1>This document may be found <a HREF=http://www.carlosag.net/tools/>here</a></body>
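If you prefer the bare host name as the canonical one, the same idea can be mirrored. The following is only a minimal sketch, assuming the same carlosag.net host names used above; the rule name is an arbitrary label chosen for illustration. It goes inside the <rules> element, and you would use it instead of (not together with) the "Redirect to WWW" rule:

```xml
<!-- Sketch: redirect www.carlosag.net to carlosag.net (the inverse of the rule above) -->
<rule name="Redirect to non-WWW" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <!-- Only applies when the request came in on the WWW host name -->
    <add input="{HTTP_HOST}" pattern="^www\.carlosag\.net$" />
  </conditions>
  <!-- {R:0} is the full matched URL path, so the path is preserved -->
  <action type="Redirect" url="http://carlosag.net/{R:0}" redirectType="Permanent" />
</rule>
```

Whichever direction you choose, keep only one of the two rules so requests are not bounced back and forth between host names.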
Default Documents
IIS has a feature called Default Document that allows you to specify the content that should be processed when a user enters a URL that is mapped to a directory and not an actual file. In other words, if the user enters http://www.carlosag.net/tools/ then they will actually get the content as if they had entered http://www.carlosag.net/tools/default.aspx. That is all great; the problem is that this feature only works one way, mapping a Directory to a File, and it does not map the File back to the Directory. This means that if some of your links, or other users, use the full URL, search engines will see two different URL's. To solve that problem we can use a configuration very similar to the rule above. The following rule redirects default.aspx to the canonical URL (the folder).
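<!-- The rule name is arbitrary; any unique name will do -->
<rule name="Default Document" stopProcessing="true">
  <!-- Capture everything before default.aspx; the dot is escaped and the match anchored at the end -->
  <match url="(.*)default\.aspx$" />
  <action type="Redirect" url="{R:1}" redirectType="Permanent" />
</rule>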
Again, this uses a Permanent redirect: it extracts everything before default.aspx and redirects to the "parent" URL path. For example, if the user enters http://www.carlosag1.net/Tools/WindowsLiveWriter/default.aspx they will be redirected to http://www.carlosag1.net/Tools/WindowsLiveWriter/, and http://www.carlosag1.net/Tools/default.aspx will be redirected to http://www.carlosag1.net/Tools/. You can place this rule at the root of your site and it will take care of all the default documents (if you have a default.aspx in every folder).
2) Friendly URL's
Asking your users to remember that www.contoso.com/books.aspx?isbn=0735624410 is the URL for the IIS Resource Kit is not the nicest thing to do. First of all, why should they care that this is an ASPX page, that it takes arguments, and what not? Providing them with a URL like www.contoso.com/books/IISResourceKit will resonate with them much better and be easier for them to remember and pass along. Most importantly, it doesn't tie you to any Web technology.
With URL Rewrite and its Rewrite Maps you can easily build this kind of logic without having to modify your code:
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Rewrite for Books" stopProcessing="true">
          <match url="Books/(.+)" />
          <!-- Look up the friendly name captured in {R:1} in the Books map -->
          <action type="Rewrite" url="books.aspx?isbn={Books:{R:1}}" />
        </rule>
      </rules>
      <rewriteMaps>
        <rewriteMap name="Books">
          <add key="IISResourceKit" value="0735624410" />
          <add key="ProfessionalIIS7" value="0470097825" />
          <add key="IIS7AdministratorsPocketConsultant" value="0735623643" />
          <add key="IIS7ImplementationandAdministration" value="0470178930" />
        </rewriteMap>
      </rewriteMaps>
    </rewrite>
  </system.webServer>
</configuration>
The configuration above includes a rule that uses a Rewrite Map to translate a URL like http://www.contoso.com/books/IISResourceKit into http://www.contoso.com/books.aspx?isbn=0735624410 automatically. Using maps is a very convenient way to have a "table" of values that can be transformed into any other value to be used in the resulting URL. Of course there are better ways of doing this when using large catalogs or values that change frequently, but it is extremely useful when you have a stable set of values or when you can't make changes to an existing application. Note that since we use Rewrite, the end users never see the "ugly URL" unless they already knew it and typed it. Of course, this means you can use the inverse approach to ensure the canonicalization is preserved:
<rewrite>
  <rules>
    <rule name="Redirect Books to Canonical URL" stopProcessing="true">
      <match url="books\.aspx" />
      <conditions>
        <!-- Capture the ISBN from the query string as {C:1} -->
        <add input="{QUERY_STRING}" pattern="isbn=(.+)" />
      </conditions>
      <!-- Look up the ISBN in the ISBN map and drop the original query string -->
      <action type="Redirect" url="Books/{ISBN:{C:1}}" appendQueryString="false" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="ISBN">
      <add key="0735624410" value="IISResourceKit" />
      <add key="0470097825" value="ProfessionalIIS7" />
      <add key="0735623643" value="IIS7AdministratorsPocketConsultant" />
      <add key="0470178930" value="IIS7ImplementationandAdministration" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
The rule above does the "inverse": it matches the URL books.aspx, extracts the ISBN query string value, looks it up in the ISBN table, and redirects the client to the canonical URL. So again, if a user enters http://www.contoso.com/books.aspx?isbn=0735624410 they will be redirected to http://www.contoso.com/books/IISResourceKit.
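One thing to keep in mind with the "Rewrite for Books" rule is that a title that is not in the map yields an empty lookup, so the request would be rewritten to books.aspx?isbn= with no value. A possible way to guard against that, shown here only as a sketch, is to move the lookup into a condition so the rule applies only to known titles. The rule name is an arbitrary label, and the sketch assumes the same Books map shown earlier:

```xml
<!-- Sketch: only rewrite when the friendly name exists in the Books map -->
<rule name="Rewrite for Known Books" stopProcessing="true">
  <match url="Books/(.+)" />
  <conditions>
    <!-- The lookup returns an empty string for unknown keys, so (.+) fails and the rule is skipped -->
    <add input="{Books:{R:1}}" pattern="(.+)" />
  </conditions>
  <!-- {C:1} is the ISBN captured by the condition above -->
  <action type="Rewrite" url="books.aspx?isbn={C:1}" />
</rule>
```

With this variant a request for an unknown title falls through to whatever would normally handle it (typically a 404) instead of reaching books.aspx with an empty ISBN.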
To me, Friendly URL's are more of a user feature than an SEO feature. Every SEO guide I've read recommends reducing the number of parameters in your Query String; however, I have not yet found any document that clearly states whether there is truly a limit in the search engine bots that would impact search relevance. I guess it makes sense that they wouldn't keep track of thousands of links to a catalog.aspx that has zillions of permutations based on hundreds of values in the query string (category, department, price range, etc.), even if all of them were linked, but again I don't have any proof.
3) Site Reorganization
One complex task that Web Developers sometimes face is reorganizing their current Web Site structure. Whether it's moving a section to a different path or something as simple as renaming a single file, you need to take into consideration things like: Is this move a temporary thing? How do I ensure old clients get the new URL? How do I prevent losing the search engine relevance? URL Rewrite will help you perform these tasks.
Rename a file
If you rename a file you can very easily write a Rewrite or Redirect rule that ensures that your users continue getting the content. If your intent is to never go back to the old name you should use a Permanent redirect so everyone starts getting the new content with its new "Canonical URL"; however, if this could be a temporary thing you should use a Temporary redirect. Finally, a Rewrite is useful if you still want both URL's to continue to be valid (though this breaks the canonicalization).
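<!-- The rule name is arbitrary; any unique name will do -->
<rule name="Rename File" stopProcessing="true">
  <match url="File\.php" />
  <action type="Redirect" url="MyFile.aspx" redirectType="Permanent" />
</rule>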
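For comparison, the other two options described above could look roughly like the following. These are sketches of alternatives, not rules to combine with the permanent redirect, and the rule names are arbitrary labels. Note that in URL Rewrite the redirectType value that produces a 302 response is "Found"; the value "Temporary" produces a 307 instead:

```xml
<!-- Alternative 1 (sketch): temporary move, the old URL stays the canonical one -->
<rule name="Rename File (temporary)" stopProcessing="true">
  <match url="File\.php" />
  <!-- redirectType="Found" sends a 302; "Temporary" would send a 307 -->
  <action type="Redirect" url="MyFile.aspx" redirectType="Found" />
</rule>

<!-- Alternative 2 (sketch): keep both URLs valid by rewriting on the server -->
<rule name="Rename File (rewrite)" stopProcessing="true">
  <match url="File\.php" />
  <!-- The browser keeps showing File.php while MyFile.aspx serves the content -->
  <action type="Rewrite" url="MyFile.aspx" />
</rule>
```

Only one of these rules should be active for a given file, since the first one that matches stops processing.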
Moving directories
Another common scenario is when you need to move an entire directory to another place in the Web Site. It could also be that, based on some criteria (say Mobile browsers or another User Agent), you want clients to get a different set of pages or images. Either way, URL Rewrite helps with this. The following configuration will redirect every call to the /Images directory to the /NewImages directory.
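<!-- The rule name is arbitrary; any unique name will do -->
<rule name="Move Images Directory" stopProcessing="true">
  <match url="^images/(.*)" />
  <action type="Redirect" url="NewImages/{R:1}" redirectType="Permanent" />
</rule>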
A related scenario: if you wanted to show different, smaller images whenever a Windows CE user was accessing your site, you could have an "img" directory where all the small images are stored and use a rule like the following:
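<!-- The rule name is arbitrary; any unique name will do -->
<rule name="Small Images for Windows CE" stopProcessing="true">
  <match url="^images/(.*)" />
  <conditions>
    <!-- Case-sensitive match on the User-Agent header -->
    <add input="{HTTP_USER_AGENT}" pattern="Windows CE" ignoreCase="false" />
  </conditions>
  <action type="Rewrite" url="/img/{R:1}" />
</rule>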
Note that in this case the use of Rewrite makes sense, since we want the small images to look like the original images to the browser, and it saves a "round-trip" to the client.
Moving multiple files
Another common operation is relocating individual pages for whatever reason (such as Marketing Campaigns, Branding, etc.). In this case, if you have several files that have been moved or renamed, you can have a single rule that catches all of them and redirects them accordingly. Similarly, another example is an incremental migration from one technology to another, say from Classic ASP to ASP.NET, where as you rewrite some of the old ASP pages into ASPX pages you want to start serving them without breaking any links or losing search engine relevance.
<rewrite>
  <rules>
    <rule name="Redirect Old Files and Broken Links" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <!-- The rule only applies when the requested URI has an entry in the OldFiles map -->
        <add input="{OldFiles:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <!-- {C:0} is the new address returned by the map lookup -->
      <action type="Redirect" url="{C:0}" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="OldFiles">
      <add key="/tools/WebChart/sample.asp" value="tools/WebChart/sample.aspx" />
      <add key="/tools/default.asp" value="tools/" />
      <add key="/images/brokenlink.jpg" value="/images/brokenlink.png" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
Now, you can just keep adding to this table any broken link and specify its new address.
Others
Another potential use of URL Rewrite is with RIA applications in the browser, whether they use AJAX, Silverlight or Flash, which are not easy for search engines to parse and index. You could use URL Rewrite to rewrite the URL to static HTML versions of your content; however, you should make sure that the content is consistent so you don't mislead users and search engines. For example, the following rule will rewrite all the files in the RIAFiles table to their static HTML counterparts, but only if the User Agent is MSNBot or Googlebot:
<rewrite>
  <rules>
    <rule name="Rewrite RIA Files" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="MSNBot|Googlebot" />
        <!-- The last condition provides {C:0}: the static page returned by the map lookup -->
        <add input="{RIAFiles:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <action type="Rewrite" url="{C:0}" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="RIAFiles">
      <add key="/samples/Silverlight.aspx" value="/samples/Silverlight.htm" />
      <add key="/samples/MyAjax.aspx" value="/samples/MyAjax.htm" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
Related to this, you might want to prevent search engines from crawling certain files (or your entire site). For that you can use the Robots.txt semantics with a "disallow"; however, you can also use URL Rewrite to do this with more functionality, such as blocking only a specific user agent:
<rewrite>
  <rules>
    <rule name="Prevent access to files" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="SomeRandomBot" />
        <!-- Only files listed in the NonIndexedFiles map are blocked -->
        <add input="{NonIndexedFiles:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <action type="AbortRequest" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="NonIndexedFiles">
      <add key="/profile.aspx" value="block" />
      <add key="/personal.aspx" value="block" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
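AbortRequest simply drops the connection without sending any response. If you would rather give the bot an explicit answer, one option is a CustomResponse action. The following is only a sketch of that alternative: it assumes the same NonIndexedFiles map shown above, and the rule name and the description text are made up for illustration:

```xml
<!-- Sketch: answer with 403 Forbidden instead of aborting the request -->
<rule name="Deny access to files" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <add input="{HTTP_USER_AGENT}" pattern="SomeRandomBot" />
    <add input="{NonIndexedFiles:{REQUEST_URI}}" pattern="(.+)" />
  </conditions>
  <!-- statusReason and statusDescription are free text; adjust to taste -->
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Access to this file is not allowed." />
</rule>
```

Which one is preferable depends on whether you want a misbehaving bot to receive a clear status code or no response at all.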
There are several other things you can do to ensure that your Web Site is friendly to Search Engines. Most of them require changes to your application, but they are certainly worth the effort, for example:
- Ensure your HTML includes a <title> tag.
- Ensure your HTML includes a <meta name="description"> tag.
- Use the correct HTML semantics, use H1 once and only once, use the alt attribute in your <img>, use <noscript> etc.
- Redirect using status code 301 and not 302.
- Provide Site Map's and/or Robots.txt.
- Beware of POST backs and links that require script to run.
Resources
For this entry I read and used some of the resources at several Web Sites, including: