Sunday, July 25, 2010

Google and our Documentation

As I mentioned in a previous blog post, trying to find pages in the PostgreSQL documentation using Google doesn't work very well: most often, one gets links to older versions.

A recent thread on pgsql-performance (somewhat off-topic for that mailing list, but that's where it was) suggested that perhaps we could use Google's canonical URL feature to work around this problem.

Another suggestion was that we ask people who link to our docs to link to (or some sub-page) rather than linking to a specific version (e.g. the same URL with 8.4 in place of current). That way, as new versions come out, everyone's links will still be pointing at the latest version of the docs, helping the new versions accumulate "Google karma" more quickly than they would otherwise. Or at least, that's the idea: I have no idea whether it would actually work.
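
Both suggestions above depend on mapping a version-specific docs page to its "current" equivalent. Here is a minimal sketch of that mapping, together with the `<link rel="canonical">` tag an older page could carry. It assumes paths shaped like `/docs/<version>/static/<page>`; the helper names, the `base_url` parameter, and the example host and page name are illustrative, not part of the actual site code.

```python
import re
from html import escape

# Assumed layout: /docs/<version>/<rest>, where <version> is "current"
# or a release number such as "8.4".
_DOCS_PATH = re.compile(r"^/docs/(?P<version>[^/]+)/(?P<rest>.*)$")


def to_current(path: str) -> str:
    """Return the docs path with its version segment replaced by 'current'."""
    match = _DOCS_PATH.match(path)
    if match is None:
        return path  # not a docs page; leave it untouched
    return f"/docs/current/{match.group('rest')}"


def canonical_link(path: str, base_url: str) -> str:
    """Render a canonical link tag that points at the 'current' page."""
    href = base_url.rstrip("/") + to_current(path)
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


# Hypothetical host and page name, for illustration only.
print(canonical_link("/docs/8.4/static/sql-select.html", "https://docs.example.org"))
```

One caveat with this approach: a page that was renamed or removed in a later release would end up pointing at a URL that no longer exists, so a real implementation would probably need to check that the target page is there.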


  1. Not to sound naive, but is there no one at Google who could answer the question?

  2. The Django project uses what I think is a nice system for this in their documentation: the top of each page tells you which version the documentation applies to and links to the other versions.

  3. I think linking to the current version's URL at the top of the page would be a better idea. That should increase the page rank for that version, so hopefully it comes up first. People who search for "postgresql 8.1 select" would still find the 8.1 version.

    The other problem is that the same content is under both interactive and static for each version. Perhaps the user comments could be moved to a separate page. With a little bit of JavaScript you could set a cookie and have the comments loaded automatically with an Ajax request.

  4. patryk.kordylewski, July 26, 2010 12:25 PM

    Another approach could be to create an XML Sitemap with a priority for each page: high priority for current, low priority for older versions. Maybe that could help.

    The current XML Sitemap doesn't do what it looks like it does: according to the spec, you have to list _every_ page, not just a starting URL.

    I wrote a patch to generate such a sitemap based on the existing tsearch table, but it got rejected ;p

  5. Robert,

    Whenever I search for PostgreSQL documentation, I always use the site: prefix, including the path for the relevant version; that way, I'm guaranteed to only get documents relevant for my version:





  6. What about simply teaching robots.txt to deny all the documentation subdirectories except /docs/current/static/?
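
The robots.txt idea in the last comment can be tried out offline before anything is deployed. Here is a rough sketch using Python's standard `urllib.robotparser`; the rule set, the helper name `is_crawlable`, and the example page names are assumptions made for illustration, not the site's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: allow only the static "current" docs, deny the rest
# of /docs/. Allow comes first because Python's parser applies the first
# matching rule; crawlers that pick the most specific rule should reach
# the same result.
ROBOTS_TXT = """\
User-agent: *
Allow: /docs/current/static/
Disallow: /docs/
"""


def is_crawlable(path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether a generic crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


for path in (
    "/docs/current/static/sql-select.html",
    "/docs/8.4/static/sql-select.html",
    "/docs/current/interactive/sql-select.html",
):
    print(path, is_crawlable(path))
```

Rules like these would also keep the 8.1 pages out of the index entirely, which conflicts with the wish in comment 3 that a search for "postgresql 8.1 select" still find the 8.1 version.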