
How to Fix Your ExpressionEngine RSS Template

I began investigating the fascinating minutiae of RSS when I couldn't find a reasonable answer in the EE forums to why Google Reader was re-posting every updated post on my site even though the entry dates hadn't changed. As I went through the prevalent templates floating around the EE community line by line, I noticed several things that could be improved. The only critical fix, in my opinion, is removing the seconds from the dates in the <item /> section. If you want your feed to validate, you'll also want to add the atom namespace. The rest are optional improvements.

If you want to skip the wheres and whys, I've posted the updated (fixed) and improved RSS templates on Snipplr.com and in their entirety at the end of this article.

Existing RSS templates for ExpressionEngine

Resources

Methodology

I'm not going for a PhD in online syndication; I just wanted my RSS feeds to be error-free, to work as expected in major aggregators and to use best practices that could be determined within a somewhat low threshold of research pain. In addition to referring first to the RSS 2.0 specification, I used the RSS feeds of some really large websites as examples, making the assumption (I know) that these sites have probably done thorough research on this topic. Often my choices were the result of seeing whether these "big players" were all doing the same thing. All the feeds are from major sites with the exception of the Flickr blog, which uses WordPress.com. I figured that with their huge user base, not only would the feeds have been thoroughly vetted, but most aggregators would be able to read them due to the sheer volume of WordPress-powered sites out there. I also chose a WordPress.com feed instead of a self-hosted installation of WP to make sure it was the well-tested standard feed. The feeds I used are:

RSS Feed Breakdown

Feed Format: RSS vs. Atom

As of mid-2005, the two most likely candidates are RSS 2.0 and Atom 1.0. Google Reader fully supports both, and Google suggests choosing one or the other (not both) because most RSS readers support all major formats and offering both can confuse users. The Atom syndication format, whose creation was in part motivated by a desire to get a clean start free of the issues surrounding RSS, has been adopted as IETF Proposed Standard RFC 4287 and is used by Google. However, RSS 2.0 was the first to support enclosures, has captured the podcasting audience, and is the recommended format in the iTunes podcast specs.

I generally do as Google does when it comes to web optimization, and I am a big fan of standards. In some regards I would call Atom "the higher path". That said, I am also a big fan of simplicity and ease-of-use so I'm going with RSS 2.0 because:

  • I already hand-built an RSS 2.0 feed for podcasting (well, for iTunes) so would rather learn one standard / keep all feeds similarly formatted.
  • One less term that could potentially confuse end-users and "web lite" folk who might inherit my work later on.
  • A lot of really big sites that have probably carefully considered this topic went with RSS 2.0, including NYTimes.com, AListApart.com, Ebay.com, news.BBC.co.uk, and CNN.com.

RSS XML Namespaces

Here's where my first change kicks in. If you aren't actually USING a namespace in your RSS feed there's no need to include it - it's just cruft.

Before

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">

After

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/">

Even Better

<rss version="2.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom">

The sy and content namespaces aren't used at all and can simply be removed. Optional: I chose to remove xmlns:admin="http://webns.net/mvcb/" and the only tag that used it, <admin:generatorAgent rdf:resource="http://expressionengine.com/" />. Removing the admin namespace also eliminated the need for rdf, which the admin tag depends on.
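If you would rather check for cruft than eyeball it, here is a minimal Python sketch that lists namespace prefixes a feed declares but never uses on any element or attribute. The sample feed is a cut-down stand-in for the "Before" declaration above, not real EE output, and the function name is just illustrative.

```python
import io
import xml.etree.ElementTree as ET


def unused_namespace_prefixes(feed_xml):
    """Return declared namespace prefixes that no element or attribute uses."""
    declared = {}      # prefix -> namespace URI
    used_uris = set()  # namespace URIs seen on element or attribute names
    source = io.BytesIO(feed_xml.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            declared[prefix] = uri
        else:
            # ElementTree writes qualified names as "{uri}local"
            for name in [payload.tag, *payload.attrib.keys()]:
                if name.startswith("{"):
                    used_uris.add(name[1:].split("}", 1)[0])
    return {p for p, uri in declared.items() if p and uri not in used_uris}


# Cut-down sample feed (an assumption for illustration only)
SAMPLE = """<rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example</title>
    <dc:language>en</dc:language>
    <admin:generatorAgent rdf:resource="http://expressionengine.com/" />
  </channel>
</rss>"""

print(sorted(unused_namespace_prefixes(SAMPLE)))  # ['content', 'sy']
```

Note that rdf only counts as used because of the rdf:resource attribute on the admin tag, which is why dropping admin lets you drop rdf too.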

I'm not saying any of the removed namespaces are bad, just unnecessary. For example, admin is a very common namespace, but if you leave it in you should update the URI to include the EE version number (dynamically) and add the <admin:errorReportsTo rdf:resource="URI"/> tag pointing to a valid email address for errors. It may be beneficial to aggregators for statistical reporting on the web frameworks and content management systems delivering aggregated RSS feeds.

I rather admire the efficiency and simplicity of the namespace declarations used by NYTimes.com, CNN.com and others, so again, this is a personal choice.

The third option (even better) adds the atom namespace. The default EE RSS feed will not validate because it lacks the required atom:link tag containing the URI of the feed itself. There is some debate on whether this is actually necessary (some say it is the validator that is wrong, not the feed that lacks the atom tag); read the article Adding Atom:Link to Your RSS Feed for background on this.
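To confirm that a feed actually carries the self-referencing atom:link, a small check like the following Python sketch can help. It assumes the link lives directly inside <channel>, as in the improved template later in this article; the feed URL in the sample is a placeholder.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def self_link(feed_xml):
    """Return the href of the channel's atom:link rel="self", or None."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return None
    for link in channel.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "self":
            return link.get("href")
    return None


# Placeholder feeds for illustration
WITH_LINK = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example</title>
    <atom:link href="http://example.com/rss" rel="self" type="application/rss+xml" />
  </channel>
</rss>"""

WITHOUT_LINK = """<rss version="2.0">
  <channel><title>Example</title></channel>
</rss>"""

print(self_link(WITH_LINK))     # http://example.com/rss
print(self_link(WITHOUT_LINK))  # None
```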

Depending on your site's content, your SEO practices and target audience, it is likely that you may need additional namespaces. A media-rich site, for example, would benefit from the media namespace. The media namespace is used to syndicate video, images and other media and can open up your feed to consumption by media-rich aggregators and services like Cooliris and Yahoo Video Search.

Channel Area

Most of the default template makes perfect sense; just take a look at your feed output to make sure all the EE fields used have valid values. Also, make sure you actually want to use the {weblog_foo} tags. If you are providing a site feed that combines multiple sections, you will probably want to hand-edit many of the tags in the channel area.

Dates

Important tip: make sure your feed is correctly reporting the timestamp of your entry date. If the seconds are changing on an item whenever you make an update this will cause many aggregators, including Google Reader, to repost the entry to your RSS feed. Either take the seconds off the time or replace '%s' with an arbitrary static value like '15'.

A search for 'RSS Updates' in the EE Forums will reveal that many people have had this problem. I tested all the date fields in my feeds (see this thread for details) and found that although the entry date's day, hour and minute don't change on update (as expected), the seconds do! This weird behavior has something to do with how the dates are stored in the database and/or how the date is interpreted by EE's date tags. What it means is that if you go back and edit a post from three years ago, some aggregators will repost the item even though you did not change the entry date. This can be especially troubling if you like to go back and tweak a post a lot right after publishing: you may go to your feed reader and see it reposted several times in a row.

pubDate vs. dc:date

<pubDate> is part of the RSS 2.0 spec. A lot of feeds out there still use <dc:date>, either because they kept it from their RSS 1.0 template (for which dc:date was the only option), because they really like the very popular Dublin Core namespace, or because they prefer the ISO 8601 date format, which is much more prevalent than the (really old, as in ARPANET old!) RFC 822 date format that <pubDate> uses. On one hand it makes sense to stay with the spec and pull in namespace elements only as required. On the other hand, it makes sense to provide output in the most reusable way (the newer date format). Feed readers parse either just fine, so this is a judgment call on your part. Here's an argument for each:

Based on my own survey of the feeds referenced above, I opted to switch to <pubDate>, replacing <dc:date /> in the channel with:

<pubDate>{gmt_date format="%D, %d %M %Y %H:%i:%s %T"}</pubDate>

And replacing <dc:date /> in the item declaration(s) with:

<pubDate>{gmt_entry_date format="%D, %d %M %Y %H:%i %T"}</pubDate>
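To see what the item-level format buys you, here is a minimal Python sketch that builds the same RFC 822 style string without seconds (seconds are optional in RFC 822). It is only a model of the output, not how EE renders dates; the day and month names are hard-coded so the result doesn't depend on locale, and the two timestamps are made up.

```python
from datetime import datetime, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def item_pub_date(entry_date):
    """Format an aware datetime as an RFC 822 style GMT date without seconds."""
    utc = entry_date.astimezone(timezone.utc)
    return (f"{DAYS[utc.weekday()]}, {utc.day:02d} {MONTHS[utc.month - 1]} "
            f"{utc.year} {utc:%H:%M} GMT")


# Same entry, but the stored seconds drifted after an edit (made-up values)
original = datetime(2009, 3, 5, 14, 7, 12, tzinfo=timezone.utc)
edited = datetime(2009, 3, 5, 14, 7, 47, tzinfo=timezone.utc)

print(item_pub_date(original))                           # Thu, 05 Mar 2009 14:07 GMT
print(item_pub_date(original) == item_pub_date(edited))  # True
```

Because the seconds never reach the output, an edit that only nudges the seconds leaves the published date byte-for-byte identical.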

Item <title ... />

The default tag is fine, but if your content people keep putting special characters in their titles (like mine do) then you might want to add the protect_entities="yes" attribute to the {exp:xml_encode} tag. For example the main EE site I work on uses &#187; (») and &amp; (&) a lot in titles.

Even after protecting entities I was still having a heck of a time getting a trademark (™) symbol that is used on a site in many post titles and in a category to consistently display on both the webpage and in RSS feed aggregators - after some digging I realized the character entity that was being used (&#x2122;) for it was not the UTF-8 reference (&#8482;) specified as the encoding for both the RSS and XHTML. So, make sure you (or your content editors) use the correct character encoding entities for special characters!
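The double-encoding problem that protect_entities is meant to avoid can be modeled in a few lines of Python. This is a rough sketch of the idea only; it is not EE's actual implementation of {exp:xml_encode}, and the function names and sample title are invented for illustration.

```python
import re

# An ampersand that does NOT start a numeric or named entity
BARE_AMP = re.compile(r"&(?!#\d+;|#x[0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")


def encode_everything(text):
    """Naive XML encoding: every ampersand is escaped, even inside entities."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def encode_protecting_entities(text):
    """Escape only bare ampersands, leaving existing entities untouched."""
    text = BARE_AMP.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


title = "Tips &amp; Tricks &#187; R&D"

print(encode_everything(title))           # Tips &amp;amp; Tricks &amp;#187; R&amp;D
print(encode_protecting_entities(title))  # Tips &amp; Tricks &#187; R&amp;D
```

In the first result a feed reader would show the literal text "&amp;" and "&#187;" instead of the characters the editor intended.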

Item <guid ... />

As formatted in the official EE RSS template, the <guid> is not a permalink and therefore should have the isPermaLink="false" attribute added to it. Of course, you could use your actual permalink instead, in which case you could leave the attribute off or change it to "true".

"We recommend the use of the Atom and RSS 2.0 elements to unambiguously identify items. An item that is updated should keep its original ID, and a new item should never reuse an older item's ID. Changing IDs unnecessarily may result in duplicate items, and reusing IDs may cause some items to be hidden. "Tag URIs" make good IDs, since they don't change even when you need to reorganize your links." - Google Reader Tips for Publishers > Implementing Feeds

The above recommendation explains the issue I referred to earlier, where an entry is posted multiple times after an update. Because of this, you will probably want to remove the '%s' from the format attribute here as well. So, change the gmt_entry_date format string in the <guid> line to "%H:%iZ".

Optionally, you could just use the actual URI of your article and change the isPermaLink attribute to "true". EE won't let you post two items with the same URL title within a weblog/channel, so you are pretty safe there (EE adds a number to the end of the URL title if one already exists).
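The reposting behavior described above is easy to reproduce with a toy model. The Python sketch below assumes a reader that identifies items purely by their guid string, which is a simplification of what real aggregators do; the permalink and timestamps are hypothetical.

```python
def make_guid(permalink, entry_time, keep_seconds):
    """Build a guid shaped like the template's '#When:' guid from (h, m, s)."""
    hours, minutes, seconds = entry_time
    stamp = f"{hours:02d}:{minutes:02d}"
    if keep_seconds:
        stamp += f":{seconds:02d}"
    return f"{permalink}#When:{stamp}Z"


def count_posts_shown(guids_over_time):
    """Count how many distinct items a guid-keyed reader would display."""
    return len(set(guids_over_time))


PERMALINK = "http://example.com/site/index/my-post/"  # hypothetical URL
BEFORE_EDIT = (14, 7, 12)
AFTER_EDIT = (14, 7, 47)  # same entry date, but the seconds drifted on update

with_seconds = [make_guid(PERMALINK, t, True) for t in (BEFORE_EDIT, AFTER_EDIT)]
without_seconds = [make_guid(PERMALINK, t, False) for t in (BEFORE_EDIT, AFTER_EDIT)]

print(count_posts_shown(with_seconds))     # 2: the edit looks like a new item
print(count_posts_shown(without_seconds))  # 1: the edit keeps the same ID
```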

Item <description.../>

This line is technically fine, but most people will change it to allow HTML formatting of their entries: <description><![CDATA[{summary}{body}]]></description>. This is what displays the bulk of your entry item and where most of your site-specific customization will happen. Customizing ExpressionEngine RSS 2.0 Template on 'A Blog Not Limited' is a great resource for this.

Categories

The template uses <dc:subject>. Your feed will be more interoperable with other systems and make more sense programmatically if each category is in its own tag. You can do this using the <dc:subject> format, or you can switch to using the <category> tag for each, as provided for in the RSS 2.0 spec.

Original Template

Code:

<dc:subject>{exp:xml_encode}{categories backspace="1"}{category_name}, {/categories}{/exp:xml_encode}</dc:subject>

Result:

<dc:subject>Architecture, Science, Workplace</dc:subject>

Separate using <dc:subject>

Code:

{categories}<dc:subject>{exp:xml_encode}{category_name}{/exp:xml_encode}</dc:subject>{/categories}

Result:

<dc:subject>Architecture</dc:subject>
<dc:subject>Science</dc:subject>
<dc:subject>Workplace</dc:subject>

Separate using <category>

Code:

{categories}<category>{exp:xml_encode}{category_name}{/exp:xml_encode}</category>{/categories}

Result:

<category>Architecture</category>
<category>Science</category>
<category>Workplace</category>
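To illustrate why one tag per category is easier to consume, here is a short Python sketch comparing the two shapes from a feed consumer's side. The category name "Science, Technology" is made up to show what happens when a name itself contains a comma.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Two categories, "Architecture" and "Science, Technology", in each style
JOINED = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:subject>Architecture, Science, Technology</dc:subject>
</item>"""

SEPARATE = """<item>
  <category>Architecture</category>
  <category>Science, Technology</category>
</item>"""


def split_joined(item_xml):
    """Recover categories from a comma-joined dc:subject by splitting on commas."""
    subject = ET.fromstring(item_xml).findtext(DC + "subject", default="")
    return [name.strip() for name in subject.split(",") if name.strip()]


def read_separate(item_xml):
    """Read one category per <category> element; no guessing required."""
    return [c.text for c in ET.fromstring(item_xml).findall("category")]


print(split_joined(JOINED))     # ['Architecture', 'Science', 'Technology']
print(read_separate(SEPARATE))  # ['Architecture', 'Science, Technology']
```

With the joined form the consumer has to guess where one category ends and the next begins; with separate tags the structure carries that information.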

Updated EE RSS 2.0 Template

This includes what I consider minimal mandatory fixes to ensure error-free code and to prevent (re)posting problems.

{assign_variable:master_weblog_name="blog"}
{assign_variable:master_weblog_status="open"}
{exp:rss:feed weblog="{master_weblog_name}" status="{master_weblog_status}"}

<?xml version="1.0" encoding="{encoding}"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel>
    <title>{exp:xml_encode}{weblog_name}{/exp:xml_encode}</title>
    <link>{weblog_url}</link>
    <description>{weblog_description}</description>
    <dc:language>{weblog_language}</dc:language>
    <dc:creator>{email}</dc:creator>
    <dc:rights>Copyright {gmt_date format="%Y"}</dc:rights>
    <dc:date>{gmt_date format="%Y-%m-%dT%H:%i:%s%Q"}</dc:date>
    <admin:generatorAgent rdf:resource="http://expressionengine.com/" />
{exp:weblog:entries weblog="{master_weblog_name}" limit="10" rdf="off" dynamic_start="on" disable="member_data|trackbacks" status="{master_weblog_status}"}
    <item>
      <title>{exp:xml_encode}{title}{/exp:xml_encode}</title>
      <link>{title_permalink=site/index}</link>
      <guid isPermaLink="false">{title_permalink=site/index}#When:{gmt_entry_date format="%H:%iZ"}</guid>
      <description>{exp:xml_encode}{summary}{body}{/exp:xml_encode}</description>
      <dc:subject>{exp:xml_encode}{categories backspace="1"}{category_name}, {/categories}{/exp:xml_encode}</dc:subject>
      <dc:date>{gmt_entry_date format="%Y-%m-%dT%H:%i%Q"}</dc:date>
    </item>
{/exp:weblog:entries}
    </channel>
</rss>
{/exp:rss:feed}

Improved EE RSS 2.0 Template

This includes optional changes that I added as a result of various articles, the RSS 2.0 spec and by examining the feeds of major professional news sites.

{assign_variable:master_weblog_name="BLOG"}
{assign_variable:master_weblog_status="OPEN"}
{assign_variable:master_rss_uri="http://PATH/TO/THIS/RSS/FEED"}

{exp:rss:feed weblog="{master_weblog_name}" status="{master_weblog_status}"}
<?xml version="1.0" encoding="{encoding}"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
    <title>{exp:xml_encode}{weblog_name}{/exp:xml_encode}</title>
    <link>{weblog_url}</link>
    <description>{weblog_description}</description>
    <dc:language>{weblog_language}</dc:language>
    <dc:creator>{email}</dc:creator>
    <dc:rights>Copyright {gmt_date format="%Y"}</dc:rights>
    <pubDate>{gmt_date format="%D, %d %M %Y %H:%i:%s %T"}</pubDate>
    <atom:link href="{master_rss_uri}" rel="self" type="application/rss+xml" />   
{exp:weblog:entries weblog="{master_weblog_name}" limit="10" rdf="off" dynamic_start="on" disable="member_data|trackbacks" status="{master_weblog_status}"}
    <item>
      <title>{exp:xml_encode protect_entities="yes"}{title}{/exp:xml_encode}</title>
      <link>{title_permalink=site/index}</link>
      <guid isPermaLink="false">{title_permalink="site/index"}#id:{entry_id}#date:{gmt_entry_date format="%H:%i"}</guid>
      <description><![CDATA[{summary}{body}]]></description>
      {categories}<category>{exp:xml_encode protect_entities="yes"}{category_name}{/exp:xml_encode}</category>
      {/categories}
      <pubDate>{gmt_entry_date format="%D, %d %M %Y %H:%i %T"}</pubDate>
    </item>
{/exp:weblog:entries}
    </channel>
</rss>
{/exp:rss:feed}

Feedback

Please let me know if you have any suggestions for improvements on the basic template. I have already submitted these suggestions (as have others) on the EE Forums and I hope this article will soon be outdated. For further information on customizing your RSS feed, including adding Google Analytics tracking and additional fields such as author name, see Customizing ExpressionEngine RSS 2.0 Template at 'A Blog Not Limited' (if you use their updated template, don't forget to remove the seconds from the date fields in the item section).

Installing Netbeans PHP IDE on Ubuntu


Nothing amazing going on, just a few tips that might save you some time:

  • You need a Java runtime installed and working; it’s probably worth an apt-cache search to make sure you’re putting in the most recent version (6 as of this writing)
    • sudo apt-get update
    • sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts
  • The netbeans in the repo is for the Java IDE, so don’t bother with apt-get
  • Download the install file here: http://www.netbeans.org/downloads/index.html
    • Be sure to pick the PHP bundle
  • If clicking on the netbeans-x.x-ml-php-linux.sh file gives you an error or tries to open in gedit or something, right-click > properties > permissions and check ‘allow executing file as a program’
  • Select Run (not ‘Run in Terminal’); running in the terminal will throw some GTK errors

Next you might want to head over to the Netbeans website and watch the intro vid and orient yourself to the plethora of PHP-centric features.

As a fair newb to programming in PHP I can’t say I’m qualified to suggest an IDE. So why Netbeans? It’s free. It provides syntactic and semantic code highlighting for PHP and debugging through Xdebug. Folks in my Seattle PHP meetup group who know a lot more about programming than I do seem to really like it, and every time I go to install Eclipse I am daunted by the website, instructions and innumerable options. Finally, it was recommended in the recent Smashing Magazine article The Big PHP IDE Test: Why Use One And Which To Choose (2009.02.11), so I stopped resisting.

Do you like it, or would you recommend others over it?