How to Fix Your ExpressionEngine RSS Template

I began investigating the fascinating minutiae of RSS when I couldn't find a reasonable answer in the EE forums to why Google Reader was re-posting every updated post on my site even though the entry dates hadn't changed. As I went through the prevalent templates floating around the EE community line by line, I noticed several things that could be improved. The only critical fix, in my opinion, is removing the seconds from the dates in the <item /> section. If you want your feed to validate, you'll also want to add the atom namespace. The rest are optional improvements.

If you want to skip the wheres and whys, I've posted the updated (fixed) and improved RSS templates on Snipplr.com and in their entirety at the end of this article.

Existing RSS templates for ExpressionEngine

Resources

Methodology

I'm not going for a PhD in online syndication. I just wanted my RSS feeds to be error-free, to work as expected in major aggregators and to use best practices that could be determined within a somewhat low threshold of research pain. In addition to referring first to the RSS 2.0 specification, I used the RSS feeds of some really large websites as examples, making the assumption (I know) that these sites have probably done thorough research on this topic. Often my choices were the result of seeing whether these "big players" were all doing the same thing. All the feeds are from major sites, with the exception of the Flickr blog, which uses WordPress.com. I figured that with their huge user base not only would the feeds have been thoroughly vetted, but most aggregators would be able to read them due to the sheer volume of WordPress-powered sites out there. I also chose a WordPress.com feed instead of a self-hosted installation of WP to make sure it was the well-tested standard feed. The feeds I used are:

RSS Feed Breakdown

Feed Format: RSS vs. Atom

As of mid-2005, the two most likely candidates are RSS 2.0 and Atom 1.0. Google Reader fully supports both, and they suggest choosing one or the other (not both) because most RSS readers support all major formats and offering both can confuse users. The Atom syndication format, whose creation was in part motivated by a desire to get a clean start free of the issues surrounding RSS, has been adopted as IETF Proposed Standard RFC 4287 and is used by Google. However, RSS 2.0 was the first to support enclosures, has captured the podcasting audience and is the recommended format in the iTunes podcast specs.

I generally do as Google does when it comes to web optimization, and I am a big fan of standards. In some regards I would call Atom "the higher path". That said, I am also a big fan of simplicity and ease-of-use so I'm going with RSS 2.0 because:

  • I already hand-built an RSS 2.0 feed for podcasting (well, for iTunes) so would rather learn one standard / keep all feeds similarly formatted.
  • One less term that could potentially confuse end-users and "web lite" folk who might inherit my work later on.
  • A lot of really big sites that have probably carefully considered this topic went with RSS 2.0, including NYTimes.com, AListApart.com, Ebay.com, news.BBC.co.uk, and CNN.com.

RSS XML Namespaces

Here's where my first change kicks in. If you aren't actually USING a namespace in your RSS feed there's no need to include it - it's just cruft.

Before

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">

After

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/">

Even Better

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom">

The namespaces sy and content aren't used at all and can just be removed. Optional: I chose to remove xmlns:admin="http://webns.net/mvcb/" and the only tag that used it, <admin:generatorAgent rdf:resource="http://expressionengine.com/" /> - removing the admin namespace also eliminated rdf, which admin namespace depends on.

I'm not saying any of the removed namespaces are bad, just unnecessary. For example, admin is a very common namespace, but if you leave it in you should update the URI to include the EE version number (dynamically) and add the <admin:errorReportsTo rdf:resource="URI"/> tag pointing to a valid email address for errors. It may help aggregators compile statistics on the web frameworks and content management systems that deliver RSS feeds.

I rather admire the efficiency and simplicity of the namespace declarations used by NYTimes.com, CNN.com and others, so again, this is a personal choice.

The third option (even better) adds the atom namespace. The default EE RSS feed will not validate because it lacks the required atom:link tag containing the URI of the feed itself. There is some debate on whether this is actually necessary (some say the validator is wrong, not the feed that lacks the atom tag). Read the article Adding Atom:Link to Your RSS Feed for background on this.
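As a quick sanity check, a short Python sketch like the following can confirm that a feed's channel carries the atom:link self-reference. The sample feed string, the example.com URLs and the function name has_atom_self_link are all made up for illustration. In practice you would pass in your own template's output.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical, trimmed-down feed output used only for this illustration.
SAMPLE_FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example</title>
    <link>http://example.com/</link>
    <description>Example feed</description>
    <atom:link href="http://example.com/rss" rel="self" type="application/rss+xml" />
  </channel>
</rss>"""


def has_atom_self_link(feed_xml):
    """Return True if the channel contains <atom:link rel="self" href="...">."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        return False
    # Namespaced elements are addressed as {namespace-uri}localname.
    for link in channel.findall("{%s}link" % ATOM_NS):
        if link.get("rel") == "self" and link.get("href"):
            return True
    return False


print(has_atom_self_link(SAMPLE_FEED))
```

This only checks that the tag is present. It is not a substitute for running the feed through a validator.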

Depending on your site's content, your SEO practices and target audience, it is likely that you may need additional namespaces. A media-rich site, for example, would benefit from the media namespace. The media namespace is used to syndicate video, images and other media and can open up your feed to consumption by media-rich aggregators and services like Cooliris and Yahoo Video Search.

Channel Area

Most of the default template makes perfect sense. Just take a look at your feed output to make sure all the EE fields used have valid values. Also, make sure you actually want to use the {weblog_foo} tags: if you are providing a site feed that combines multiple sections you will probably want to hand-edit many of the tags in the channel area.

Dates

Important tip: make sure your feed is correctly reporting the timestamp of your entry date. If the seconds are changing on an item whenever you make an update this will cause many aggregators, including Google Reader, to repost the entry to your RSS feed. Either take the seconds off the time or replace '%s' with an arbitrary static value like '15'.

A search for 'RSS Updates' in the EE Forums will reveal that many people have had this problem. I tested all the date fields in my feeds (see this thread for details) and found that although the entry date's day, hour and minute don't change on update (as expected), the seconds do! This weird behavior has something to do with how the dates are stored in the database and/or how the date is interpreted by EE's date tags. What it means is that if you go back and edit a post from three years ago, some aggregators will repost the item even though you did not change the entry date. This can be especially troubling if you like to go back and tweak a post a lot right after publishing: you may go to your feed reader and see it reposted several times in a row.
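To illustrate why dropping the seconds helps, here is a small Python sketch (not EE code) that formats a timestamp at minute resolution. The function name and the two timestamps are hypothetical; they stand in for the same entry date before and after an edit that changed only the seconds. As far as I can tell from RFC 822, the seconds part of the time is optional, so a date without them should still be well-formed.

```python
from datetime import datetime, timezone


def pubdate_without_seconds(dt):
    """Format a datetime as an RFC 822 style date with minute resolution."""
    return dt.astimezone(timezone.utc).strftime("%a, %d %b %Y %H:%M GMT")


# Hypothetical timestamps: same entry date, but the seconds differ after an edit.
before_edit = datetime(2008, 3, 14, 9, 26, 53, tzinfo=timezone.utc)
after_edit = datetime(2008, 3, 14, 9, 26, 7, tzinfo=timezone.utc)

print(pubdate_without_seconds(before_edit))
# True: once the seconds are gone, both versions produce the same date string.
print(pubdate_without_seconds(before_edit) == pubdate_without_seconds(after_edit))
```

The same idea applies to any date that ends up in the item section of the feed, whether it lives in the date tag or inside the guid.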

pubDate vs. dc:date

<pubDate> is part of the RSS 2.0 spec. A lot of feeds out there still use <dc:date>, and this is either because they kept it from their RSS 1.0 template (for which dc:date was the only option), or they really like the very popular Dublin Core namespace, or they prefer the ISO 8601 date format, which is much more prevalent than the (really old, as in ARPANET old!) RFC 822 date format that <pubDate> uses. On one hand it makes sense to stay with the spec and pull in namespace elements only as required. On the other hand, it makes sense to provide output in the most reusable way (the updated date format). Feed readers parse either just fine, so this is a judgment call on your part. Here's an argument for each:

Based on my own survey of the feeds referenced above, I opted to switch to <pubDate>, replacing <dc:date /> in the channel with:

<pubDate>{gmt_date format="%D, %d %M %Y %H:%i:%s %T"}</pubDate>

And replacing <dc:date /> in the item declaration(s) with:

<pubDate>{gmt_entry_date format="%D, %d %M %Y %H:%i %T"}</pubDate>

Item <title ... />

The default tag is fine, but if your content people keep putting special characters in their titles (like mine do) then you might want to add the protect_entities="yes" attribute to the {exp:xml_encode} tag. For example the main EE site I work on uses &#187; (») and &amp; (&) a lot in titles.

Even after protecting entities I was still having a heck of a time getting a trademark (™) symbol, which is used in many post titles and in a category on one site, to display consistently on both the webpage and in RSS feed aggregators. After some digging I realized the character reference being used was the hexadecimal form (&#x2122;) rather than the decimal form (&#8482;). Both point at the same character, but only the decimal form displayed consistently for me in both the RSS and the XHTML, which are served as UTF-8. So, make sure you (or your content editors) use one consistent form of character reference for special characters!
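For what it's worth, the two references are the hexadecimal and decimal spellings of the same code point. A quick check in Python (not part of the EE template, just a way to see it) confirms that:

```python
import html

hex_ref = "&#x2122;"  # hexadecimal numeric character reference
dec_ref = "&#8482;"   # decimal numeric character reference

# Both decode to the same single character, U+2122 TRADE MARK SIGN.
print(html.unescape(hex_ref) == html.unescape(dec_ref))
print(hex(ord(html.unescape(dec_ref))))
```

So any difference in how they display most likely comes from how a particular tool in the chain handles the hexadecimal form, not from the character itself.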

Item <guid ... />

As formatted in the official EE RSS template, the <guid> is not a permalink and therefore should have the isPermaLink="false" attribute added to it. Of course you could use your actual permalink, and then you could leave the attribute off or change it to "true".

"We recommend the use of the Atom and RSS 2.0 elements to unambiguously identify items. An item that is updated should keep its original ID, and a new item should never reuse an older item's ID. Changing IDs unnecessarily may result in duplicate items, and reusing IDs may cause some items to be hidden. "Tag URIs" make good IDs, since they don't change even when you need to reorganize your links." - Google Reader Tips for Publishers > Implementing Feeds

The above recommendation explains the issue I referred to earlier, where an entry is posted multiple times after an update. Because of this, you will probably want to remove the '%s' from the formatting attribute as well. So, change the gmt_entry_date format string in the <guid> line to "%H:%iZ".

Optionally, you could just use the actual URI of your article and change the isPermaLink attribute to true. EE won't let you post two items with the same URL title within a weblog/channel, so you are pretty safe there (EE adds a number to the end of the URL title if one already exists).
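To make the guid reasoning concrete, here is a toy Python model of how a reader might decide what is new. It is a deliberate simplification and not how Google Reader or any other aggregator is actually implemented. The function name and the example.com guids are invented for the example.

```python
def new_items(seen_guids, feed_guids):
    """Return the guids the reader has not seen yet, and remember them."""
    fresh = [g for g in feed_guids if g not in seen_guids]
    seen_guids.update(fresh)
    return fresh


# Hypothetical guids for one entry, before and after an edit.
with_seconds_before = "http://example.com/site/my-post#When:09:26:53Z"
with_seconds_after = "http://example.com/site/my-post#When:09:26:07Z"
without_seconds = "http://example.com/site/my-post#When:09:26Z"

# Guid includes seconds: the edit changes the guid, so the item looks new.
seen = set()
new_items(seen, [with_seconds_before])
print(new_items(seen, [with_seconds_after]))

# Guid without seconds: the edit leaves the guid alone, so nothing is new.
seen = set()
new_items(seen, [without_seconds])
print(new_items(seen, [without_seconds]))
```

The first print shows the edited entry coming back as a new item. The second prints an empty list.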

Item <description.../>

This line is technically fine, but most people will change it to allow HTML formatting of their entries: <description><![CDATA[{summary}{body}]]></description>. This is what displays the bulk of your entry item and where most of your site-specific customization will happen. Customizing ExpressionEngine RSS 2.0 Template on 'A Blog Not Limited' is a great resource for this.

Categories

The template uses <dc:subject>. Your feed will be more interoperable with other systems and make more sense programmatically if each category is in its own tag. You can do this using the <dc:subject> format, or you can switch to using the <category> tag for each, as provided for in the RSS 2.0 spec.

Original Template

Code:

<dc:subject>{exp:xml_encode}{categories backspace="1"}{category_name}, {/categories}{/exp:xml_encode}</dc:subject>

Result:

<dc:subject>Architecture, Science, Workplace</dc:subject>

Separate using <dc:subject>

Code:

{categories}<dc:subject>{exp:xml_encode}{category_name}{/exp:xml_encode}</dc:subject>{/categories}

Result:

<dc:subject>Architecture</dc:subject>
<dc:subject>Science</dc:subject>
<dc:subject>Workplace</dc:subject>

Separate using <category>

Code:

{categories}<category>{exp:xml_encode}{category_name}{/exp:xml_encode}</category>{/categories}

Result:

<category>Architecture</category>
<category>Science</category>
<category>Workplace</category>
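From a consumer's point of view the difference looks roughly like this. The Python sketch below parses two hand-written item fragments (both hypothetical, modeled on the results above) and lists the subjects it finds. The helper name subjects is made up for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical item fragments modeled on the template output shown above.
combined_item = (
    '<item xmlns:dc="%s">'
    "<dc:subject>Architecture, Science, Workplace</dc:subject>"
    "</item>" % DC_NS
)
separate_item = (
    '<item xmlns:dc="%s">'
    "<dc:subject>Architecture</dc:subject>"
    "<dc:subject>Science</dc:subject>"
    "<dc:subject>Workplace</dc:subject>"
    "</item>" % DC_NS
)


def subjects(item_xml):
    """Return the text of every dc:subject element in an item."""
    item = ET.fromstring(item_xml)
    return [el.text for el in item.findall("{%s}subject" % DC_NS)]


print(subjects(combined_item))  # one string the consumer still has to split
print(subjects(separate_item))  # one list entry per category
```

With the combined form the consumer has to guess at the separator, which breaks as soon as a category name contains a comma.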

Updated EE RSS 2.0 Template

This includes what I consider minimal mandatory fixes to ensure error-free code and to prevent (re)posting problems.

{assign_variable:master_weblog_name="blog"}
{assign_variable:master_weblog_status="open"}
{exp:rss:feed weblog="{master_weblog_name}" status="{master_weblog_status}"}

<?xml version="1.0" encoding="{encoding}"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel>
    <title>{exp:xml_encode}{weblog_name}{/exp:xml_encode}</title>
    <link>{weblog_url}</link>
    <description>{weblog_description}</description>
    <dc:language>{weblog_language}</dc:language>
    <dc:creator>{email}</dc:creator>
    <dc:rights>Copyright {gmt_date format="%Y"}</dc:rights>
    <dc:date>{gmt_date format="%Y-%m-%dT%H:%i:%s%Q"}</dc:date>
    <admin:generatorAgent rdf:resource="http://expressionengine.com/" />
{exp:weblog:entries weblog="{master_weblog_name}" limit="10" rdf="off" dynamic_start="on" disable="member_data|trackbacks" status="{master_weblog_status}"}
    <item>
      <title>{exp:xml_encode}{title}{/exp:xml_encode}</title>
      <link>{title_permalink=site/index}</link>
      <guid isPermaLink="false">{title_permalink=site/index}#When:{gmt_entry_date format="%H:%iZ"}</guid>
      <description>{exp:xml_encode}{summary}{body}{/exp:xml_encode}</description>
      <dc:subject>{exp:xml_encode}{categories backspace="1"}{category_name}, {/categories}{/exp:xml_encode}</dc:subject>
      <dc:date>{gmt_entry_date format="%Y-%m-%dT%H:%i%Q"}</dc:date>
    </item>
{/exp:weblog:entries}
    </channel>
</rss>
{/exp:rss:feed}

Improved EE RSS 2.0 Template

This includes optional changes that I added as a result of various articles, the RSS 2.0 spec and by examining the feeds of major professional news sites.

{assign_variable:master_weblog_name="BLOG"}
{assign_variable:master_weblog_status="OPEN"}
{assign_variable:master_rss_uri="http://PATH/TO/THIS/RSS/FEED"}

{exp:rss:feed weblog="{master_weblog_name}" status="{master_weblog_status}"}
<?xml version="1.0" encoding="{encoding}"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
    <title>{exp:xml_encode}{weblog_name}{/exp:xml_encode}</title>
    <link>{weblog_url}</link>
    <description>{weblog_description}</description>
    <dc:language>{weblog_language}</dc:language>
    <dc:creator>{email}</dc:creator>
    <dc:rights>Copyright {gmt_date format="%Y"}</dc:rights>
    <pubDate>{gmt_date format="%D, %d %M %Y %H:%i:%s %T"}</pubDate>
    <atom:link href="{master_rss_uri}" rel="self" type="application/rss+xml" />   
{exp:weblog:entries weblog="{master_weblog_name}" limit="10" rdf="off" dynamic_start="on" disable="member_data|trackbacks" status="{master_weblog_status}"}
    <item>
      <title>{exp:xml_encode protect_entities="yes"}{title}{/exp:xml_encode}</title>
      <link>{title_permalink=site/index}</link>
      <guid isPermaLink="false">{title_permalink="site/index"}#id:{entry_id}#date:{gmt_entry_date format="%H:%i"}</guid>
      <description><![CDATA[{summary}{body}]]></description>
      {categories}<category>{exp:xml_encode protect_entities="yes"}{category_name}{/exp:xml_encode}</category>
      {/categories}
      <pubDate>{gmt_entry_date format="%D, %d %M %Y %H:%i %T"}</pubDate>
    </item>
{/exp:weblog:entries}
    </channel>
</rss>
{/exp:rss:feed}						

Feedback

Please let me know if you have any suggestions for improvements on the basic template. I have already submitted these suggestions (as have others) on the EE Forums and I hope this article will soon be out-dated. For further information on customizing your RSS feed including adding Google Analytics tracking and additional fields such as author name, see Customizing ExpressionEngine RSS 2.0 Template at 'A Blog Not Limited' (if you use their updated template don't forget to remove the seconds from date fields in the item section).

Drupal 5.x on Ubuntu LAMP

The quick and dirty dev install of Drupal on Ubuntu

USE THIS INFORMATION AT YOUR OWN RISK. Any information found on this website is offered only as informational and includes no warranty, guarantees or support. The author claims no authority on any subject whatsoever.

Why Drupal, Why Ubuntu?

For me it's all about community. I've always enjoyed Apache web development in part because of the active and helpful user groups, forums, IRC channels, etc. I use Ubuntu as the operating system for my LAMP stack because it's really popular right now: it has a very active forum and pretty good documentation. Drupal is an open-source content management system, or you could look at it as a framework, since it was built to make it easy for coders to override almost anything it does without hacking the core. This means you could make it do anything you want if you happen to be good enough at PHP, and still take advantage of core development and security updates no matter how much you modify the product.

Why write installation instructions?

Good question. Well, the installation instructions at Drupal.org are good but they cover all sorts of environments (who wants to slog through all that?), and those in the Ubuntu Community Docs are great and pretty specific but cover Drupal 4.6.7 and 5.1. I probably should update the docs at Ubuntu; perhaps I will after I hash it out here and after a few people let me know whether these instructions worked or what to change. Also, I like to search for instructions specific to my situation whenever I approach a new installation. It's good to see what other people in similar circumstances have encountered. I call it due diligence. I would suggest that anyone doing this install review the documentation mentioned above thoroughly. Also see the related links at the end of this article.

Environment

These instructions don't cover the setup of your server environment. Mine happens to be:
  • Ubuntu 6.06 LTS server
  • Apache 2.0.55
  • PHP 5.1.2
  • MySQL 5.0.22

Get Drupal

wget http://ftp.drupal.org/files/projects/drupal-5.7.tar.gz
tar -zxvf drupal-5.7.tar.gz

I'm a big fan of apt-get, but there were a lot of forum threads started by people having problems with the Drupal package in the repositories. The Community Docs recommend getting the latest package from Drupal.org, which right now happens to be Drupal 5.7. (Drupal 6 is out now as well, and is very cool, but CCK/Views aren't ready for prime time and I'm installing for the purposes of following tutorials written for 5.x.)

Move Drupal

sudo mkdir /var/www/drupaltest
sudo mv drupal-5.7/* drupal-5.7/.htaccess /var/www/drupaltest
sudo mkdir /var/www/drupaltest/files

My Apache install is pretty much set up with the default config. /var/www is my web root; yours may vary. Because I'm just using this particular install as a test which I plan on destroying later, I'm going to put it in the boring old subdirectory 'drupaltest'. Actually I named mine d57_test_01 but thought drupaltest would be more comprehensible in the example.

In the mv command we explicitly move .htaccess because it's a hidden file.

Database Setup

mysqladmin -u root -p create db_drupaltest
mysql -u root -p

Create the database for Drupal to use. You can replace 'db_drupaltest' with whatever you'd like to call the database. You'll need to enter your MySQL root password. If you get an access denied error, make sure you're using the MySQL root password and not your login or Ubuntu root password. The second command puts you in the MySQL monitor, the command line interface for managing your MySQL server. The commands in the next code section are SQL. You could also run them in phpMyAdmin if you'd rather have a GUI.

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, LOCK TABLES ON db_drupaltest.* TO 'drupal_usr'@'localhost' IDENTIFIED BY 'secretpassword';
FLUSH PRIVILEGES;
\q

Change the database name, the username 'drupal_usr' and 'secretpassword' to whatever you like. Just don't forget to write them down somewhere safe because you'll need to know them later.

Edit Settings.php

sudo vi /var/www/drupaltest/sites/default/settings.php

Using vi (or whatevs) change the $db_url line. Note: If you use fancy characters or dashes in your user, password or database names, replace them with their URI hex encodings. This is detailed in the database settings comments section in the settings.php file.

$db_url="mysql://drupal_usr:secretpassword@localhost/db_drupaltest";
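If you're not sure what the hex encodings look like, a few lines of Python can work them out for you. This is only a sketch: the function name build_db_url and the sample password are invented for the example, and Python's quote leaves dashes untouched, so double-check against the list of characters in the comments of your own settings.php.

```python
from urllib.parse import quote


def build_db_url(user, password, host, database):
    """Build a mysql:// URL with user, password and database percent-encoded."""
    return "mysql://%s:%s@%s/%s" % (
        quote(user, safe=""),      # safe="" so that "/" is encoded too
        quote(password, safe=""),
        host,
        quote(database, safe=""),
    )


# Hypothetical password containing characters that would confuse the URL.
print(build_db_url("drupal_usr", "p@ss/w:rd", "localhost", "db_drupaltest"))
```

Paste the printed string into the $db_url line in place of the plain version.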

Up Your PHP Memory Allocation

If you have a new LAMP install, the default memory setting for scripts is 8M. This is redonkulous and Drupal will suck. Open php.ini, look for the 'Resource Limits' section, change memory_limit to 32M and then restart Apache.

sudo vi /etc/php5/apache2/php.ini
sudo /etc/init.d/apache2 restart

Final Steps

Go to http://localhost/drupaltest/install.php (or use your server name instead of localhost if DNS is set up). You should see this:

Screenshot of Drupal Installation Message

One last thing: if you click Administer you will probably get a 'one or more problems were detected' error message. Two things cause it: your files directory isn't writable and your cron job hasn't run. The first one is easy. Just make the files area writable by all:

sudo chmod 777 /var/www/drupaltest/files

As for cron, you can just click 'run cron manually' on the Status report page - but you'll need to do this anytime you want to update the index. For a quick dev install you're likely to trash soon it may not be necessary but for a production or long-term dev install you'll want to set up a cron job to hit http://localhost/drupaltest/admin/logs/status/run-cron every few minutes depending upon your site's traffic and requirements. See Configuring cron jobs in Drupal's Getting Started guide for more.

That's it. Good luck folks, now enjoy surfing the Drupal learning curve...heheh.

Related Links