Thursday, December 17, 2015

The Old Website Challenge - 03 - SEO and old static files

As I wrote previously, the old website I'm modernizing was rewritten in PHP back in 2003. However, the old static HTML pages were never removed from their original folder, so they still appear in search engine results and are still being indexed. With some cleverness, we can use that to get the new URLs we set up in the previous post indexed faster.

In more detail, the old pages are stored in a directory named xhtml/. Their names mostly match the names of the files that serve the content for the PHP version of the website. Knowing this, we can set up a rewrite rule that redirects requests for those old files to the corresponding new PHP pages.
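The most naive version of that rule makes the catch below concrete. It maps every old page to its namesake under p/, whether or not the PHP site has such a page. This sketch assumes two things: RewriteEngine On is already set, and the rule from the previous post turns p/NAME into the actual PHP request.

```apache
# Naive mapping: xhtml/NAME.html -> p/NAME for every NAME.
# Problem: it also fires for old pages that have no new counterpart,
# so those requests end up asking the PHP site for a page it does not have.
RewriteRule ^xhtml/([a-z0-9_]+)\.html$ p/$1 [L]
```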

There is one catch, though: not every old page has the same file name as its new counterpart. We have two ways to solve this:

  1. Rename the old files whose names do not match the new ones
  2. Add a check to the rewrite configuration so that only pages whose names match the new ones are redirected

The first solution goes against the very principle of what we are doing, which is keeping the old pages in the search engines' indexes. It would effectively make the old pages with non-matching names disappear from search engines. So we go with the second option (which also means more fun!).

To implement the second solution, we move the old files to a new directory, which we call xhtml_old/. Then we need two rewrite rules.
First, a rule that redirects the old pages to the new ones:
RewriteRule ^xhtml/([a-z0-9]+)\.html$ p/$1 [L]
Then we need a second rule that maps requests for the old pages to the corresponding files in the new location:
RewriteRule ^xhtml/([a-z0-9_]+)\.html$ xhtml_old/$1.html [L]
...and now comes the most interesting part of today: deciding when one rule applies and when the other does. We can do this by placing a RewriteCond directive before each of the two rules. A RewriteCond applies only to the RewriteRule that immediately follows it.
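The first rule should fire only when a content page for the CMS exists (the -F test) with the same name as the old file. The name comes from $1, the group captured by the rule's own pattern. mod_rewrite matches the pattern before evaluating the conditions, so the back-reference is already available here: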
RewriteCond include/$1.html -F
The second rule is only reached when the first one did not fire. It should apply only if the old file actually exists in the new location:
RewriteCond xhtml_old/$1.html -F
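Both conditions use -F, which checks the file through an internal subrequest. The Apache documentation warns that this can affect server performance. A plain file test with -f avoids the subrequest but is clearest with a full filesystem path. The sketch below shows the two conditions rewritten that way. It assumes the .htaccess file sits in the document root, so that include/ and xhtml_old/ are directly below %{DOCUMENT_ROOT*}; the post does not state this layout.

```apache
# Drop-in replacement for the first condition (guards the p/$1 rule).
# Assumption: include/ lives directly under the document root.
RewriteCond %{DOCUMENT_ROOT}/include/$1.html -f

# Drop-in replacement for the second condition (guards the xhtml_old/ rule).
# Assumption: xhtml_old/ lives directly under the document root.
RewriteCond %{DOCUMENT_ROOT}/xhtml_old/$1.html -f
```

Each line replaces one condition in place. Copied together without their rules, they would chain onto a single RewriteRule, which is not the intent.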
Now we can put all the pieces together in our .htaccess file, adding these lines before the rule defined in the previous post:
RewriteCond include/$1.html -F
RewriteRule ^xhtml/([a-z0-9]+)\.html$ p/$1 [L]
RewriteCond xhtml_old/$1.html -F
RewriteRule ^xhtml/([a-z0-9_]+)\.html$ xhtml_old/$1.html [L]
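One caveat is worth considering. With only the [L] flag, mod_rewrite performs an internal rewrite, so visitors and crawlers keep seeing the old xhtml/ URL and are never told that the content now lives under p/. If the goal is to get the new URLs indexed, a permanent external redirect is the usual way to say so. The variant below is a sketch under two assumptions the post does not state. First, the site is served from the web root, hence the absolute /p/$1 target. Second, RewriteEngine On is already enabled. The second rule stays an internal rewrite, because those pages have no new URL to point to.

```apache
# Same logic as above, but the first rule now issues a permanent (301)
# redirect so crawlers learn the new URL.
# Assumption: the site is served from the web root, so /p/NAME is the
# public URL handled by the rule from the previous post.
RewriteCond include/$1.html -F
RewriteRule ^xhtml/([a-z0-9]+)\.html$ /p/$1 [R=301,L]

# Old pages without a new counterpart keep their old URL and are
# served internally from xhtml_old/.
RewriteCond xhtml_old/$1.html -F
RewriteRule ^xhtml/([a-z0-9_]+)\.html$ xhtml_old/$1.html [L]
```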
...and now let the crawlers re-index the old pages!
