How to Fix Your ExpressionEngine RSS Template

I began investigating the fascinating minutiae of RSS when I couldn't find a reasonable answer in the EE forums to why Google Reader was re-posting every updated post on my site even though the entry dates hadn't changed. As I went through the prevalent templates floating around the EE community line by line, I noticed several things that could be improved. The only critical fix, in my opinion, is removing the seconds from the dates in the <item /> section. If you want your feed to validate, you'll also want to add the atom namespace. The rest are optional improvements.

If you want to skip the wheres and whys, I've posted the updated (fixed) and improved RSS templates on Snipplr.com and in their entirety at the end of this article.

Existing RSS templates for ExpressionEngine

Resources

Methodology

I'm not going for a PhD in online syndication; I just wanted my RSS feeds to be error-free, to work as expected in major aggregators, and to follow best practices that could be determined within a somewhat low threshold of research pain. In addition to referring first to the RSS 2.0 specification, I used the RSS feeds of some really large websites as examples, making the assumption (I know) that these sites have probably done thorough research on this topic. Often my choices were the result of seeing whether these "big players" were all doing the same thing. All the feeds are from major sites with the exception of the Flickr blog, which uses WordPress.com. I figured that with its huge user base, not only would the feeds have been thoroughly vetted, but most aggregators would be able to read them due to the sheer volume of WordPress-powered sites out there. I also chose a WordPress.com feed instead of a self-hosted installation of WP to make sure it was the well-tested standard feed. The feeds I used are:

RSS Feed Breakdown

Feed Format: RSS vs. Atom

As of mid-2005, the two most likely candidates are RSS 2.0 and Atom 1.0. Google Reader fully supports either, and Google suggests choosing one or the other (not both) because most RSS readers support all major formats and offering both can confuse users. The Atom syndication format, whose creation was in part motivated by a desire to get a clean start free of the issues surrounding RSS, has been adopted as IETF Proposed Standard RFC 4287 and is used by Google. However, RSS 2.0 was the first to support enclosures, has captured the podcasting audience, and is the recommended format in the iTunes podcast specs.

I generally do as Google does when it comes to web optimization, and I am a big fan of standards. In some regards I would call Atom "the higher path". That said, I am also a big fan of simplicity and ease-of-use so I'm going with RSS 2.0 because:

  • I already hand-built an RSS 2.0 feed for podcasting (well, for iTunes) so would rather learn one standard / keep all feeds similarly formatted.
  • One less term that could potentially confuse end-users and "web lite" folk who might inherit my work later on.
  • A lot of really big sites that have probably carefully considered this topic went with RSS 2.0, including NYTimes.com, AListApart.com, Ebay.com, news.BBC.co.uk, and CNN.com.

RSS XML Namespaces

Here's where my first change kicks in. If you aren't actually USING a namespace in your RSS feed there's no need to include it - it's just cruft.

Before

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">

After

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/">

Even Better

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom">

The namespaces sy and content aren't used at all and can just be removed. Optional: I chose to remove xmlns:admin="http://webns.net/mvcb/" and the only tag that used it, <admin:generatorAgent rdf:resource="http://expressionengine.com/" />. Removing the admin namespace also eliminated rdf, which the admin tag depends on.

I'm not saying any of the removed namespaces are bad, just unnecessary. For example, admin is a very common namespace, but if you leave it in you should update the URI to include the EE version number (dynamically) and add the <admin:errorReportsTo rdf:resource="URI"/> tag pointing to a valid email address for errors. It may help aggregators compile statistics on which web frameworks and content management systems are delivering RSS feeds.
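If you do keep the admin namespace, the rendered channel output might look something like the sketch below. This is only an illustration: the version number, the way it is appended to the URI, and the email address are all made-up placeholders, and both xmlns:admin and xmlns:rdf must still be declared on the <rss> element for it to be valid.

```xml
<!-- Hypothetical rendered output; version string and mailto address are placeholders -->
<admin:generatorAgent rdf:resource="http://expressionengine.com/?v=1.6.8" />
<admin:errorReportsTo rdf:resource="mailto:webmaster@example.com" />
```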

I rather admire the efficiency and simplicity of the namespace declarations used by NYTimes.com, CNN.com and others, so again, this is a personal choice.

The third option (even better) adds the atom namespace. The default EE RSS feed will not validate because it lacks the required atom:link tag containing the URI of the feed itself. There is some debate on whether this is actually necessary (some say the validator is wrong, not the feed that lacks the atom tag). Read the article Adding Atom:Link to Your RSS Feed for background on this.

Depending on your site's content, your SEO practices and target audience, it is likely that you may need additional namespaces. A media-rich site, for example, would benefit from the media namespace. The media namespace is used to syndicate video, images and other media and can open up your feed to consumption by media-rich aggregators and services like Cooliris and Yahoo Video Search.
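As a rough sketch of what that could look like, here is a minimal feed that declares the Media RSS namespace and attaches an image to an item. The site, URLs, titles and image dimensions are invented for illustration, and a real EE template would fill them from your own custom fields.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example Site</title>
    <link>http://example.com/</link>
    <description>Example feed with media attached to an item</description>
    <item>
      <title>Photo of the day</title>
      <link>http://example.com/site/index/photo-of-the-day/</link>
      <!-- Full-size image; all attribute values are placeholders -->
      <media:content url="http://example.com/images/photo.jpg"
          type="image/jpeg" medium="image" width="800" height="600" />
      <!-- Smaller preview that media-aware aggregators can show in listings -->
      <media:thumbnail url="http://example.com/images/photo_thumb.jpg"
          width="150" height="113" />
    </item>
  </channel>
</rss>
```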

Channel Area

Most of the default template makes perfect sense. Just take a look at your feed output to make sure all the EE fields used have valid values. Also, make sure you want to use the {weblog_foo} tags: if you are providing a site feed that combines multiple sections, you will probably want to hand-edit many of the tags in the channel area.

Dates

Important tip: make sure your feed is correctly reporting the timestamp of your entry date. If the seconds are changing on an item whenever you make an update this will cause many aggregators, including Google Reader, to repost the entry to your RSS feed. Either take the seconds off the time or replace '%s' with an arbitrary static value like '15'.

A search for 'RSS Updates' in the EE Forums will reveal that many people have had this problem. I tested all the date fields in my feeds (see this thread for details) and found that although the entry date's day, hour and minute don't change on update (as expected), the seconds do! This weird behavior has something to do with how the dates are stored in the database and/or how the date is interpreted by EE's date tags. What it means is that if you go back and edit a post from three years ago, some aggregators will repost the item from your RSS feed even though you did not change the entry date. This can be especially troubling if you like to go back and tweak a post a lot right after publishing: you may go to your feed reader and see it reposted several times in a row.
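To make the problem concrete, here is roughly what an aggregator sees when the seconds are left in. The entry, URLs and timestamps are invented; the point is only that the <guid> and date of the "same" item differ between two fetches, so a reader that compares them treats the second one as new. (This is a fragment, so the dc namespace declaration from the <rss> element is assumed.)

```xml
<!-- First fetch: entry as originally published -->
<item>
  <title>Hello World</title>
  <link>http://example.com/site/index/hello-world/</link>
  <guid isPermaLink="false">http://example.com/site/index/hello-world/#When:18:29:07Z</guid>
  <dc:date>2010-01-05T18:29:07+00:00</dc:date>
</item>

<!-- Later fetch, after a small edit: only the seconds differ, but the guid is now a different string -->
<item>
  <title>Hello World</title>
  <link>http://example.com/site/index/hello-world/</link>
  <guid isPermaLink="false">http://example.com/site/index/hello-world/#When:18:29:42Z</guid>
  <dc:date>2010-01-05T18:29:42+00:00</dc:date>
</item>
```

With the seconds removed from both format strings, the two fetches would produce identical <guid> and date values.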

pubDate vs. dc:date

<pubDate> is part of the RSS 2.0 spec. A lot of feeds out there still use <dc:date>, either because they kept it from their RSS 1.0 template (for which dc:date was the only option), because they really like the very popular Dublin Core namespace, or because they prefer the ISO 8601 date format, which is much more prevalent than the (really old, as in ARPANET old!) RFC 822 date format that <pubDate> uses. On one hand, it makes sense to stay with the spec and pull in namespace elements only as required. On the other hand, it makes sense to provide output in the most reusable way (the more modern date format). Feed readers parse either just fine, so this is a judgment call on your part. Here's an argument for each:

Based on my own survey of the feeds referenced above, I opted to switch to <pubDate>, replacing <dc:date /> in the channel with:

<pubDate>{gmt_date format="%D, %d %M %Y %H:%i:%s %T"}</pubDate>

And replacing <dc:date /> in the item declaration(s) with:

<pubDate>{gmt_entry_date format="%D, %d %M %Y %H:%i %T"}</pubDate>
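For reference, this is approximately what those format strings should render to, assuming EE expands the date codes the way I understand them (%D and %M as abbreviated day and month names, %T as the timezone abbreviation). The date itself is made up. Dropping the seconds is still valid, since RFC 822 treats the seconds field as optional.

```xml
<!-- Channel date, seconds kept: %D, %d %M %Y %H:%i:%s %T -->
<pubDate>Tue, 05 Jan 2010 18:29:07 GMT</pubDate>

<!-- Item date, seconds dropped: %D, %d %M %Y %H:%i %T -->
<pubDate>Tue, 05 Jan 2010 18:29 GMT</pubDate>

<!-- The same item instant in the ISO 8601 style that dc:date uses -->
<dc:date>2010-01-05T18:29+00:00</dc:date>
```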

Item <title ... />

The default tag is fine, but if your content people keep putting special characters in their titles (like mine do) then you might want to add the protect_entities="yes" attribute to the {exp:xml_encode} tag. For example the main EE site I work on uses &#187; (») and &amp; (&) a lot in titles.

Even after protecting entities I was still having a heck of a time getting a trademark (™) symbol, which is used on a site in many post titles and in a category, to display consistently on both the webpage and in RSS feed aggregators. After some digging I realized the character reference that was being used (&#x2122;, the hexadecimal form) was not the decimal reference (&#8482;) that displayed reliably in both the RSS and the XHTML, which are both encoded as UTF-8. The two references point at the same Unicode character, so in theory either should work, but only the decimal form displayed consistently for me. So, make sure you (or your content editors) use character references for special characters that survive your whole pipeline!
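As a small illustration, a title that mixes these characters could end up in the feed like this (the title text is invented). Plain XML only predefines five named entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot; and &amp;apos;), so HTML names such as &amp;trade; or &amp;raquo; are not safe in a feed, while numeric references are.

```xml
<!-- Trademark and right-angle-quote as decimal numeric references; the ampersand as the predefined &amp; entity -->
<title>Acme&#8482; Widgets &#187; News &amp; Notes</title>
```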

Item <guid ... />

As formatted in the official EE RSS template, the <guid> is not a permalink and therefore should have the isPermaLink="false" attribute added to it. Of course, you could use your actual permalink, and then you could leave the attribute off or change it to "true".

"We recommend the use of the Atom and RSS 2.0 elements to unambiguously identify items. An item that is updated should keep its original ID, and a new item should never reuse an older item's ID. Changing IDs unnecessarily may result in duplicate items, and reusing IDs may cause some items to be hidden. "Tag URIs" make good IDs, since they don't change even when you need to reorganize your links." - Google Reader Tips for Publishers > Implementing Feeds

The above recommendation explains the issue I referred to earlier, where an entry is posted multiple times after an update. Because of this, you will probably want to remove the '%s' from the formatting attribute as well. So, change the gmt_entry_date format string in the <guid> line to "%H:%iZ".

Optionally, you could just use the actual URI of your article and change the isPermaLink attribute to true. EE won't let you post two items with the same URL title within a weblog/channel, so you are pretty safe there (EE adds a number to the end of the URL title if one already exists).
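Side by side, the choices for the rendered <guid> look roughly like this. The domain, URL title, entry ID and date are placeholders; the third form is the "tag URI" style mentioned in the Google Reader quote above, which I have not used in my own templates.

```xml
<!-- Option 1: an opaque ID built from the link plus extra data; not meant to be opened as a URL -->
<guid isPermaLink="false">http://example.com/site/index/hello-world/#id:123</guid>

<!-- Option 2: the real article URL doubles as the ID (isPermaLink defaults to true if omitted) -->
<guid isPermaLink="true">http://example.com/site/index/hello-world/</guid>

<!-- Option 3: a tag URI, which stays the same even if the site's links are reorganized -->
<guid isPermaLink="false">tag:example.com,2010-01-05:entry-123</guid>
```

Whichever form you pick, the important part is that the value never changes once the item has been published.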

Item <description.../>

This line is technically fine, but most people will change it to allow HTML formatting of their entries: <description><![CDATA[{summary}{body}]]></description>. This is what displays the bulk of your entry item and where most of your site-specific customization will happen. Customizing ExpressionEngine RSS 2.0 Template on 'A Blog Not Limited' is a great resource for this.

Categories

The template uses <dc:subject>. Your feed will be more interoperable with other systems and make more sense programmatically if each category is in its own tag. You can do this using the <dc:subject> format, or you can switch to using the <category> tag for each, as provided for in the RSS 2.0 spec.

Original Template

Code:

<dc:subject>{exp:xml_encode}{categories backspace="1"}{category_name}, {/categories}{/exp:xml_encode}</dc:subject>

Result:

<dc:subject>Architecture, Science, Workplace</dc:subject>

Separate using <dc:subject>

Code:

{categories}<dc:subject>{exp:xml_encode}{category_name}{/exp:xml_encode}</dc:subject>{/categories}

Result:

<dc:subject>Architecture</dc:subject>
<dc:subject>Science</dc:subject>
<dc:subject>Workplace</dc:subject>

Separate using <category>

Code:

{categories}<category>{exp:xml_encode}{category_name}{/exp:xml_encode}</category>{/categories}

Result:

<category>Architecture</category>
<category>Science</category>
<category>Workplace</category>

Updated EE RSS 2.0 Template

This includes what I consider minimal mandatory fixes to ensure error-free code and to prevent (re)posting problems.

{assign_variable:master_weblog_name="blog"}
{assign_variable:master_weblog_status="open"}
{exp:rss:feed weblog="{master_weblog_name}" status="{master_weblog_status}"}

<?xml version="1.0" encoding="{encoding}"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel>
    <title>{exp:xml_encode}{weblog_name}{/exp:xml_encode}</title>
    <link>{weblog_url}</link>
    <description>{weblog_description}</description>
    <dc:language>{weblog_language}</dc:language>
    <dc:creator>{email}</dc:creator>
    <dc:rights>Copyright {gmt_date format="%Y"}</dc:rights>
    <dc:date>{gmt_date format="%Y-%m-%dT%H:%i:%s%Q"}</dc:date>
    <admin:generatorAgent rdf:resource="http://expressionengine.com/" />
{exp:weblog:entries weblog="{master_weblog_name}" limit="10" rdf="off" dynamic_start="on" disable="member_data|trackbacks" status="{master_weblog_status}"}
    <item>
      <title>{exp:xml_encode}{title}{/exp:xml_encode}</title>
      <link>{title_permalink=site/index}</link>
      <guid isPermaLink="false">{title_permalink=site/index}#When:{gmt_entry_date format="%H:%iZ"}</guid>
      <description>{exp:xml_encode}{summary}{body}{/exp:xml_encode}</description>
      <dc:subject>{exp:xml_encode}{categories backspace="1"}{category_name}, {/categories}{/exp:xml_encode}</dc:subject>
      <dc:date>{gmt_entry_date format="%Y-%m-%dT%H:%i%Q"}</dc:date>
    </item>
{/exp:weblog:entries}
    </channel>
</rss>
{/exp:rss:feed}

Improved EE RSS 2.0 Template

This includes optional changes that I added as a result of various articles, the RSS 2.0 spec and by examining the feeds of major professional news sites.

{assign_variable:master_weblog_name="BLOG"}
{assign_variable:master_weblog_status="OPEN"}
{assign_variable:master_rss_uri="http://PATH/TO/THIS/RSS/FEED"}

{exp:rss:feed weblog="{master_weblog_name}" status="{master_weblog_status}"}
<?xml version="1.0" encoding="{encoding}"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
    <title>{exp:xml_encode}{weblog_name}{/exp:xml_encode}</title>
    <link>{weblog_url}</link>
    <description>{weblog_description}</description>
    <dc:language>{weblog_language}</dc:language>
    <dc:creator>{email}</dc:creator>
    <dc:rights>Copyright {gmt_date format="%Y"}</dc:rights>
    <pubDate>{gmt_date format="%D, %d %M %Y %H:%i:%s %T"}</pubDate>
    <atom:link href="{master_rss_uri}" rel="self" type="application/rss+xml" />
{exp:weblog:entries weblog="{master_weblog_name}" limit="10" rdf="off" dynamic_start="on" disable="member_data|trackbacks" status="{master_weblog_status}"}
    <item>
      <title>{exp:xml_encode protect_entities="yes"}{title}{/exp:xml_encode}</title>
      <link>{title_permalink=site/index}</link>
      <guid isPermaLink="false">{title_permalink="site/index"}#id:{entry_id}#date:{gmt_entry_date format="%H:%i"}</guid>
      <description><![CDATA[{summary}{body}]]></description>
      {categories}<category>{exp:xml_encode protect_entities="yes"}{category_name}{/exp:xml_encode}</category>
      {/categories}
      <pubDate>{gmt_entry_date format="%D, %d %M %Y %H:%i %T"}</pubDate>
    </item>
{/exp:weblog:entries}
    </channel>
</rss>
{/exp:rss:feed}

Feedback

Please let me know if you have any suggestions for improvements on the basic template. I have already submitted these suggestions (as have others) on the EE Forums, and I hope this article will soon be outdated. For further information on customizing your RSS feed, including adding Google Analytics tracking and additional fields such as author name, see Customizing ExpressionEngine RSS 2.0 Template at 'A Blog Not Limited' (if you use their updated template, don't forget to remove the seconds from the date fields in the item section).

16 Comments »

  1. Florian Schroiff said,

    January 5, 2010 @ 12:11 am

    Great summary!

    Did you test if your feed template works when you give it more than one “weblog”?

  2. Lea said,

    January 5, 2010 @ 10:29 am

    This is excellent. I’ve had issues with visitors telling me that my feed keeps pinging them, not realizing it had something to do with the seconds. I’ve taken your improved template and implemented it to my site. Thanks!

  3. mahalie said,

    January 5, 2010 @ 10:38 am

    @Florian – its EE-specific code is no different from any other EE template. This is a basic template which you will likely need to modify to suit your specific install. Use bars to include more than one weblog in the variable (master_weblog_name="foo|bar"), and of course if your fields have different names in different weblogs you're going to need to detect which weblog the item belongs to and offer different description tags.

  4. Brian said,

    January 5, 2010 @ 7:08 pm

    Thanks for this information. This answered a few question I had regarding how up to date the RSS 2 templates were. I knew some of the information could be reduced!

  5. vanni said,

    January 6, 2010 @ 10:12 am

    I have never managed to successfully get three weblogs to output rss feeds. Would love to see an example that works.

  6. mahalie said,

    January 6, 2010 @ 3:18 pm

    @vanni – if you are saying three weblogs into one rss feed then you can use the same templates as shown here (even the default EE rss templates work, just not perfectly). Include multiple weblogs as I mentioned to Florian, and then you'll need to wrap the description tags in {if weblog == "Weblog Name"}{/if} for each weblog unless your weblogs happen to have the exact same field names used in the description tag.

  7. Michael Fraase said,

    January 8, 2010 @ 11:53 am

    I’ve implemented your improved RSS 2.0 template, successfully I think. Problem is, it feeds a “This feed contains no entries” (title and body) every hour.

    Please advise.

  8. Jim Arment said,

    January 9, 2010 @ 4:12 am

    Great article. I found it quite informational and educational. I implemented your “Improved EE RSS 2.0 Template” (with some minor customization) for two different feeds and it’s working perfectly.

    Your template is getting added to my inventory file for future sites. Thanks for the article.

  9. Jeremy Bise said,

    June 20, 2010 @ 6:23 am

    Thanks so much for this. I’ve been going nuts trying to figure out why I was getting duplicate entries in various readers and hopefully the removal of the seconds will fix it right up for me! Thanks!

  10. Anonymous said,

    July 28, 2011 @ 3:24 am

    Thank you so much for this. I can’t believe this post has been here since 2009. I’ve been getting around this issue by creating a “Don’t Go On RSS” post status. Which is a pain because it’s easy to forget about changing the status whenever updating an entry. This solution is a lot more elegant!

  11. Andrew said,

    June 20, 2012 @ 9:20 am

    Thanks for this great information. We’ve been having these feed-entry update issues you described for a long time. We even banned making updates to posted entries so as not to re-ping the feed. Your research is invaluable. Thanks!

  12. mahalie said,

    June 20, 2012 @ 9:27 am

    I’m glad this information is still useful. Aquariumofpacific.org is a beautiful website by the way!

  13. seo consulta said,

    July 5, 2012 @ 10:10 am

    Thanks so much for this. Your template is getting added to my inventory file for future sites.

  14. Helbal Product said,

    August 17, 2012 @ 8:14 am

    Thanks a lot for your invaluable information. I’ve implemented your improved RSS 2.0 template. Thanks!

  15. Donovan said,

    November 14, 2012 @ 2:38 am

    Thanks for the post. Question is it still viable? I know it’s been a few years and the weblog tag is being used so I’m hoping that with a few changes it can be used.

    Thanks

  16. mahalie said,

    December 28, 2012 @ 2:52 pm

    Hi Donovan, I haven’t used EE for a long time and this is not updated for EE2 even so it’s likely out dated. Although you can’t copy and paste you should be able to extrapolate any fixes that might still be necessary to current template tags. Good luck & feel free to post back any updated markup.
