Posted by: sammu on: August 20, 2008
Use custom 404 error pages and ASP to create pseudo-URLs for your site and, hopefully, help both your visitors and the search engines in the process.
Introduction
A lot of search engine optimisation (SEO) techniques are frowned upon. The method I’m about to describe and develop could be considered “spamming” if used in the wrong manner, so it comes with no guarantees. However, if you treat this method as a way to make your URLs more logical, memorable and crawlable, you shouldn’t have any problems.
The problem
Do you run a database-driven website? If so, you may well have ended up with page URLs with a mass of name/value pairs in the querystring.
Let’s take an example of a store selling music CDs. A typical URL may look like this:
http://www.mydomain.com/product.asp?genreID=12&artistID=34&albumID=56
Fine, it works, and you may be crawled by the search engines, but the above address shows no logical structure to your online shop. Similarly, do you expect anybody to remember that address?
Search engines (SEs) *will* crawl these (although there is debate about the use of “ID” in querystrings). However, if you run a search through Google and spot your search terms in the actual URL of a result, you may well be more likely to click through than you would be for a site whose address is full of ID values.
mod_rewrite
Here’s where Apache webservers have an advantage over Windows IIS systems. Using Apache, we can use the mod_rewrite module to powerfully manipulate URLs. This module takes the requested URL and translates it into a valid path to a page, passing the querystring data to the page. It does this according to a set of rules the webmaster writes into the .htaccess file.
ASP to the rescue
Unfortunately, mod_rewrite is an Apache module, so it isn’t available on a Windows IIS host out of the box. So, I’ve come up with some basic ASP logic to attempt to emulate this module. The following is only a basic example of what can be done. It can be adapted and made more powerful to fit your specific site needs.
So, back to our CD store example: how could we generate a more “friendly” URL? Wouldn’t it be more user-friendly (and cooler, right?) to have something like this?
http://www.mydomain.com/products/rock/dire_straits/money_for_nothing/56
This way, people looking at the URL instantly know the content of the page and your shop’s structure (products > genre > artist > album), which gives your URLs a real context. The extra advantage is that your URL now contains keywords, potentially increasing its chances of being picked out in the SERPs (Search Engine Results Pages). Bonus!
The above example, in literal terms, actually points to the root of a directory named “56”, which sits within a directory named “money_for_nothing”, which in turn is a sub-directory of “dire_straits”, as a sub-directory of “rock”, as a sub-directory of “products”. Maintaining a structure of this kind would be a nightmare. Imagine browsing your site using FTP, having to click through all of those directories to reach the files! In reality, those directories don’t exist. They never will. Here’s where the fun begins.
A request for the suggested URL will flag a “404 Not Found” error, naturally. The structure doesn’t exist. So, we can put some cunning ASP code into our custom 404 page to interpret the request. Here is the code for our custom 404 error page. It needs to be saved as an ASP page, and the 404 custom error in IIS needs to be set to the URL type and pointed at this page, so that IIS runs it and passes the requested address in the querystring.
<%
Option Explicit
Dim strQuerystring, aParameters
strQuerystring = Mid(Request.ServerVariables("QUERY_STRING"),12)
aParameters = Split(strQuerystring,"/")
On Error Resume Next
Server.Transfer(aParameters(1) & ".asp")
If Err Then
Response.Status = "404 Not Found"
Server.Transfer("404message.htm")
End If
%>
Let’s go through it line by line:
Option Explicit
This ensures we declare all of our variables explicitly, so a misspelled variable name raises an error instead of silently creating a new, empty variable. Makes debugging a lot easier, and is good practice.
Dim strQuerystring, aParameters
Declares two variables, which we will be using later. The first will hold a string and the second an array.
strQuerystring = Mid(Request.ServerVariables("QUERY_STRING"),12)
Sets the value of our strQuerystring variable. When a custom 404 error page is called, the requested (not found) URL is passed to it in the querystring, prefixed with “404;”. This line strips the leading “404;http://” (11 characters) by taking everything from character 12 onwards.
aParameters = Split(strQuerystring,"/")
Creates an array from the “/”-separated parts of the string. The array is zero-based, so using our example, aParameters(0) is the host name (“www.mydomain.com”), aParameters(1) is “products”, and so on.
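To see what the array actually holds, here is a small throwaway test page. It is only a sketch: the raw string is hard-coded as an assumption of what the custom 404 page receives for our example URL, rather than read from a live request.

```asp
<%
Option Explicit
' Hypothetical raw QUERY_STRING for the example URL (hard-coded for illustration).
Dim strRaw, strPath, aParts, i
strRaw = "404;http://www.mydomain.com/products/rock/dire_straits/money_for_nothing/56"

' "404;http://" is 11 characters long, so the useful part starts at character 12.
strPath = Mid(strRaw, 12)

' Split returns a zero-based array, one element per "/"-separated segment.
aParts = Split(strPath, "/")

For i = 0 To UBound(aParts)
    Response.Write i & ": " & Server.HTMLEncode(aParts(i)) & "<br>"
Next
%>
```

This lists www.mydomain.com at index 0, products at index 1 and 56 at index 5. On a live server the host element may also carry the port (for example www.mydomain.com:80), which is harmless here because the handler only uses index 1 onwards.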
On Error Resume Next
Activates error handling…
Server.Transfer(aParameters(1) & ".asp")
Tries calling our products page. The Server.Transfer method hands processing over to another page on the server without redirecting the browser, so the address bar still shows the friendly URL. It’s almost like a ‘dynamic server side include’. The benefit is that the transferred page can still read the POST or GET data that was submitted to the original page (in this case, the custom 404 page).
In this case, we attempt to load an ASP page. The path of this page is created by the second value in the aParameters array. For this reason, we pass “products” in the example URL. This way, it processes “products.asp”. If your site also has an Articles section, you could pass “articles” in the URL and it will process “articles.asp”.
If Err Then
Response.Status = "404 Not Found"
Server.Transfer("404message.htm")
End If
If an error is flagged, i.e. the Server.Transfer fails, load in our ‘404 message’ HTML page. This page usually displays an apology for the page not being found.
Phew!
So, we now have a custom 404 page set up. If the first Server.Transfer works (i.e. we are using it to ‘mask’ our URL), the corresponding ASP page is called. If it is a genuine 404 error, then the visitor is still given a 404 page by means of including 404message.htm.
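Because the page name comes straight from the requested URL, you may want to restrict it to sections your site really has. The following is only a sketch of one way to do that, written as a variation on the handler above. The IsAllowedSection function and the list of section names are assumptions for illustration.

```asp
<%
Option Explicit

' Returns True only for section names we explicitly allow.
' The names in the list are hypothetical; use your own page names.
Function IsAllowedSection(strName)
    Dim aAllowed, i
    aAllowed = Array("products", "articles")
    IsAllowedSection = False
    For i = 0 To UBound(aAllowed)
        If LCase(strName) = aAllowed(i) Then
            IsAllowedSection = True
            Exit Function
        End If
    Next
End Function

Dim strQuerystring, aParameters, strSection
strQuerystring = Mid(Request.ServerVariables("QUERY_STRING"), 12)
aParameters = Split(strQuerystring, "/")

' Guard against requests with no segment after the host name.
strSection = ""
If UBound(aParameters) >= 1 Then strSection = aParameters(1)

If IsAllowedSection(strSection) Then
    Server.Transfer LCase(strSection) & ".asp"
Else
    Response.Status = "404 Not Found"
    Server.Transfer "404message.htm"
End If
%>
```

With a fixed list there is no need for On Error Resume Next to catch a bad page name, and a request such as /anything/56 can never reach an ASP file you did not intend to expose.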
What about the rest of the querystring?
So far we have only looked at the first part of the requested path. More importantly, the first value after the domain (“products”) needs to be the same as the name of our ASP page which does all of the database processing and so on.
So how do we pass a value to this page so it can select the desired record?
For this, we need to make a slight adaptation to our products page.
Product record page
Our normal product record page would have taken the “productID” from the querystring, and selected this record from the database.
Now that we’re using Server.Transfer, this ID is no longer passed as a named querystring value. If you look at our URL example again, you’ll see the very last ‘querystring’ value is the albumID (“56”).
As I said earlier, using Server.Transfer means that our “products.asp” loaded into our 404 custom error page inherits the Form and Querystring values. So by putting some code into products.asp we can grab this ID value.
It is here I need to stress this example was created as a generic handler page, designed to pass ONE value to a page. This one value must always be the LAST value in the querystring. Make sense? Here’s an example.
http://www.mydomain.com/products/56
http://www.mydomain.com/products/1/2/3/4/56
http://www.mydomain.com/products/rock/dire_straits/money_for_nothing/56
In all instances, only the “products” and “56” values are important. Everything else is discarded.
Therefore our products.asp page only needs to grab this final “56”. To do so, all we have to do is take the querystring again, create an array using each “/” as the delimiter as before, and take the final value.
<%
Option Explicit
Dim strQuerystring, aParameters, strFinalValue
strQuerystring = Mid(Request.ServerVariables("QUERY_STRING"),12)
aParameters = Split(strQuerystring,"/")
strFinalValue = aParameters(Ubound(aParameters))
If strFinalValue = "" Then
strFinalValue = aParameters(Ubound(aParameters)-1)
End If
%>
The above should go into the top of products.asp. It selects the final value in our querystring. If a trailing “/” is found, it merely back-steps to obtain the previous, intended value.
Our SQL to select the product should now use strFinalValue instead of Request.Querystring("productID") as before.
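Since the final value is typed by the visitor as part of the URL, it is worth checking that it really is a number before it goes into any SQL. Here is a minimal, self-contained sketch of that check. The LastSegment and IsDigitsOnly helpers, the lngProductID and strSQL variables, and the table and column names are all assumptions for illustration.

```asp
<%
Option Explicit

' Returns the last non-empty "/"-separated segment of strPath, or "" if none.
Function LastSegment(strPath)
    Dim aParts, i
    LastSegment = ""
    aParts = Split(strPath, "/")
    For i = UBound(aParts) To 0 Step -1
        If aParts(i) <> "" Then
            LastSegment = aParts(i)
            Exit Function
        End If
    Next
End Function

' True only for a string of 1 to 9 digits, which always fits in a Long.
Function IsDigitsOnly(strValue)
    Dim i, ch
    IsDigitsOnly = False
    If Len(strValue) = 0 Or Len(strValue) > 9 Then Exit Function
    For i = 1 To Len(strValue)
        ch = Mid(strValue, i, 1)
        If ch < "0" Or ch > "9" Then Exit Function
    Next
    IsDigitsOnly = True
End Function

Dim strFinalValue, lngProductID, strSQL
strFinalValue = LastSegment(Mid(Request.ServerVariables("QUERY_STRING"), 12))

If IsDigitsOnly(strFinalValue) Then
    lngProductID = CLng(strFinalValue)
    ' Hypothetical table and column names.
    strSQL = "SELECT * FROM products WHERE productID = " & lngProductID
Else
    ' Anything that is not a plain number is treated as a genuine 404.
    Response.Status = "404 Not Found"
    Server.Transfer "404message.htm"
End If
%>
```

A request for /products/rock/dire_straits/money_for_nothing/56/ gives a lngProductID of 56, while /products/rock/abc falls through to the 404 message.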
Conclusions
Obviously there are some advantages, and indeed possible disadvantages in using this method.
A point worthy of note is that because you’re creating a pseudo-URL, the browser will resolve relative paths (to images, for example) against directories that don’t exist, and so will fail to find the files. However, including the BASE tag in the head of your HTML means you can leave the relative links in your pages as they are.
<base href="http://www.mydomain.com">
This means that although your page may look as if it is several layers deep through your structure, your paths can still be relative in the HTML. Sorted.
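One piece left open above is how links in the friendly format get written into your pages in the first place. As a rough sketch, assuming the genre, artist and album names come from your database as plain text, something like the following could build them. MakeSlug and ProductUrl are hypothetical helper names, and the underscore style simply mirrors the example URL used throughout.

```asp
<%
Option Explicit

' Lower-cases a name and replaces each run of characters other than
' a-z or 0-9 with a single underscore.
Function MakeSlug(strName)
    Dim re, strSlug
    Set re = New RegExp
    re.Global = True
    re.Pattern = "[^a-z0-9]+"
    strSlug = re.Replace(LCase(strName), "_")
    re.Pattern = "^_+|_+$"          ' trim underscores left at either end
    MakeSlug = re.Replace(strSlug, "")
End Function

' Builds a root-relative link in the products/genre/artist/album/ID format.
Function ProductUrl(strGenre, strArtist, strAlbum, lngAlbumID)
    ProductUrl = "/products/" & MakeSlug(strGenre) & "/" & MakeSlug(strArtist) & _
                 "/" & MakeSlug(strAlbum) & "/" & lngAlbumID
End Function

Response.Write ProductUrl("Rock", "Dire Straits", "Money for Nothing", 56)
%>
```

This writes /products/rock/dire_straits/money_for_nothing/56. Only the trailing ID matters to products.asp, so the slugs can change (say, after correcting an album title) without breaking old links.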