Thursday, December 17, 2015

The Old Website Challenge - 03 - SEO and old static files

As I previously wrote, the old website I'm modernizing was rewritten in PHP back in 2003; yet the old static HTML pages were never removed from their original folder, and search engines still index them and show them in their results. With a little cleverness we can take advantage of that to get the new URLs we set up in the previous post indexed faster.

In more detail, the old pages are stored in a directory named xhtml/, and their names mostly match the names of the files that hold the content for the PHP version of the website. Knowing this, we can set up a rewrite rule that redirects the traffic aimed at those old files to the corresponding new PHP pages.

There is one catch, though: not every old page has the same file name as its new counterpart. We have two ways to solve this:

  1. Change the names of the old files that do not match the new ones
  2. Implement a check in the rewrite configuration so that we only redirect pages whose names match the new ones

The first solution goes against the very principle of what we are doing (keeping the old pages in the search engines' indexes), since in practice it would make the old pages with non-matching names disappear from search engines. Thus we go with the second option (which is also more fun!).

To implement the second solution, we move the old files to a new directory, which we call xhtml_old/, and then we need two rewrite rules.
First we need a rule to redirect the old pages to the new ones:
RewriteRule ^xhtml/([a-z0-9]+)\.html$ p/$1 [L]
Then we need a different rule to send the requests for the remaining old pages to the corresponding files in the new location:
RewriteRule ^xhtml/([a-z0-9_]+)\.html$ xhtml_old/$1.html [L]
...and now comes the most interesting part of today: deciding when one rule applies and when the other does. We can do this by placing a RewriteCond directive before each of the two rules.
For the first rule, we want it to run when a content page for the CMS exists (the -F test, which checks through an internal subrequest that the path resolves to an accessible file) with the same name as the old file (the $1 parameter coming from the rule):
RewriteCond include/$1.html -F
For the second rule, we want it to run after the first one, and only when the old file exists in the new location:
RewriteCond xhtml_old/$1.html -F
Now we can put all the pieces together in our .htaccess file, adding these lines before the rule defined in the previous post:
RewriteCond include/$1.html -F
RewriteRule ^xhtml/([a-z0-9]+)\.html$ p/$1 [L]
RewriteCond xhtml_old/$1.html -F
RewriteRule ^xhtml/([a-z0-9_]+)\.html$ xhtml_old/$1.html [L]
...and now let the crawlers re-index the old pages!
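
If the goal is for crawlers to swap the old addresses for the new ones in their indexes, a variant worth testing is to turn the first rule into an external, permanent redirect. The lines below are only an untested sketch under a few assumptions: .htaccess sits in the document root, include/ and xhtml_old/ are directly below it, and the -F subrequest check is swapped for a plain -f file test on the full path.

```apache
# Sketch only: same two rules, with an explicit 301 on the first one.
# Assumes .htaccess lives in the document root.

# Old page with a matching CMS content file: permanent redirect to /p/<name>.
RewriteCond %{DOCUMENT_ROOT}/include/$1.html -f
RewriteRule ^xhtml/([a-z0-9]+)\.html$ /p/$1 [R=301,L]

# Any other old page that still exists: serve it from its new folder,
# without changing the address the visitor sees.
RewriteCond %{DOCUMENT_ROOT}/xhtml_old/$1.html -f
RewriteRule ^xhtml/([a-z0-9_]+)\.html$ xhtml_old/$1.html [L]
```

With the R=301 flag the browser or crawler is told about the new address, whereas a rule with only [L] rewrites the request internally and keeps the old URL visible.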

Wednesday, December 16, 2015

The Website Challenge - 02 - Hiding variables from the URL

The custom CMS of the website uses a GET variable named pag to go into a specific folder and look for a <pagname>.html file holding the content to be loaded for each page. The result is a URL which looks like this:
http://<domain.tld>/pagina.php?pag=pagname
This kind of URL is not very SEO-friendly, so I decided to use the mod_rewrite module, which is pretty much standard on Apache installations, to turn it into something more readable for both humans and search engines.
The new URLs have the form
http://<domain.tld>/p/pagname
In order to do this I played around with the .htaccess file in the main directory and added the following lines:
RewriteEngine on
RewriteBase /
RewriteRule ^p/([a-z0-9]+)$ pagina.php?pag=$1 [NC,L]
The first two lines simply enable mod_rewrite, which was not previously in use, and tell it to resolve relative addresses against the web root folder. The third line is the actual rewrite rule: it tells Apache to internally translate any URL in the new form into the one using the GET variable; this happens transparently, without the real address being shown to visitors.
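
One detail of the rule above: because the substitution carries its own query string, any query string on the incoming request is dropped. Should extra parameters ever need to survive the rewrite, the QSA flag appends them. A minimal sketch, where the lang parameter is purely hypothetical:

```apache
# Sketch: keep extra query-string parameters (QSA = query string append).
# /p/pagname?lang=en  ->  pagina.php?pag=pagname&lang=en
# "lang" is a made-up parameter, used only to illustrate the flag.
RewriteRule ^p/([a-z0-9]+)$ pagina.php?pag=$1 [NC,QSA,L]
```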

It is worth noting that there is an ongoing debate online about whether it is better to end URLs with a slash or not, with no clear winner. A common strategy among CMSs is to add .html to those addresses, making dynamic pages look like static files. I might do some A/B testing in the future, but SEO is no exact science, so don't expect clear results.
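
Whichever form is preferred, one way to sidestep the question is to serve each page under a single address. A minimal sketch, assuming the slash-less form /p/pagname is kept as the canonical one: a permanent redirect folds the trailing-slash spelling into it.

```apache
# Sketch: send /p/pagname/ to /p/pagname with a permanent redirect.
# Place it before the rule that maps p/pagname to pagina.php.
RewriteRule ^p/([a-z0-9]+)/$ /p/$1 [R=301,NC,L]
```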

Tuesday, December 15, 2015

The old website challenge (01)

One of the reasons that pushed me to go back to updating this blog is a personal challenge I'm taking on: modernizing an old website of mine, dating back to the year 2001! The last big update to the website, namely its port from static HTML to PHP, dates back to exactly twelve years ago (December 15th, 2003); I made some other small updates later, up until September 2004.

Later my attention moved to my personal website (www.xfnet.it), leaving the other one untouched until a few weeks ago, when, for several reasons, I got interested again in WebDev, SEO, and the online development world in general.

I will document in this blog the steps I'm taking to bring the old website up to date. To be fair, I have already taken several actions, so the first few posts will be retroactive, but I believe posting every step of the process will help me get a better idea of where I'm going and of the road I'm taking.

I know, I didn't say which website we are talking about, yet, but let's give it time :)

A new direction

Since I last updated this blog, my professional life has gone in a different direction; as such, it's time to update the direction this blog is taking.

To begin with, it will be more focused on WebDev and online technologies.
Second ...you guessed it... I will be writing in English, as nowadays this is the language I express myself in most of the time.

As always, I cannot promise I will keep updating the blog, but I'll make another effort.

See you soon :)

Monday, September 5, 2011

The ocelot starts appearing in dreams

In other words, the first beta of Ubuntu 11.10 Oneiric Ocelot is out; I installed it on my old desktop PC and some pretty obvious bugs seem to show up, but I'll leave comments and impressions for a later, more detailed post.

HP PlayBook, the never-ending story...

After all the mess in which the PlayBook seemed almost unsellable, to the point of having to be discounted to unbelievable levels, it now turns out that HP will produce another batch to satisfy the orders left unfilled. Congratulations on the market-forecasting skills...
The hopes of seeing a WebOS tablet around here, after HP's exit from the consumer hardware business and Samsung's denial of its supposed interest in WebOS, keep shrinking. We'll see...

Thursday, September 1, 2011

Fedora 16 alpha ...too alpha??

Downloaded the live ISO, burned it to a CD-RW, tried it on the laptop... error... tried it in a virtual machine... error (after several minutes of loading)... tonight I'll try it on the old desktop, but the experience so far is not encouraging! We'll see if there is any hope.

--- Update ---

On the desktop, too, it showed the error screen insisting that I log out; in a moment of desperation I tried closing it with Alt+F4 ...and it worked!! From then on, slowness and a couple of bugs aside, it seemed to work, and I must say Gnome 3 is seriously starting to intrigue me... :)