RSS Tutorial
This tutorial explains the features and benefits of a Web format called RSS, and provides a quick technical overview of it. It also includes information on a similar format called Atom. The reader is assumed to have some familiarity with XML and other Web technologies. It’s not meant to be exhaustive; for more information, see the ‘More Information about RSS’ section.
Think about all of the information that you access on the Web on a day-to-day basis: news headlines, search results, “What’s New”, job vacancies, and so forth. A large amount of this content can be thought of as a list; although it probably isn’t in HTML list elements, the information is list-oriented.
More Information
Syndicated content – Good list of best practices for creating an RSS feed.
RSS Workshop – A well-regarded introduction to publishing RSS feeds, from the State of Utah Online Services division.
RSS Devcenter – O’Reilly’s Web portal for all things RSS.
Most people need to track a number of these lists, but it becomes difficult once there are more than a few sources. This is because they have to go to each page, load it, remember how it’s formatted, and find where they last left off in the list.
RSS is an XML-based format that allows the syndication of lists of hyperlinks, along with other information, or metadata, that helps viewers decide whether they want to follow the link.
This allows people’s computers to fetch and understand the information, so that all of the lists they’re interested in can be tracked and personalized for them. It is a format that’s intended to be used by computers on behalf of people, rather than being directly presented to them (like HTML).
To enable this, a Web site will make a feed, or channel, available, just like any other file or resource on the server. Once a feed is available, computers can regularly fetch the file to get the most recent items on the list. Most often, people will do this with an aggregator, a program that manages a number of lists and presents them in a single interface.
Feeds can also be used for other kinds of list-oriented information, such as syndicating the content itself (often weblogs) along with the links. However, this tutorial focuses on the use of RSS for syndication of links.
What’s in a feed?
A feed contains a list of items or entries, each of which is identified by a link. Each item can have any amount of other metadata associated with it as well.
The most basic metadata for an entry includes a title for the link and a description of it; when syndicating news headlines, these fields might be used for the story title and the first paragraph or a summary. For example, a simple entry might look like:
The earth was attacked by an invasion fleet
from halfway across the galaxy; luckily, a fatal
miscalculation of scale resulted in the entire armada
being eaten by a small dog.
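To make the shape of an entry concrete, here is a minimal Python sketch that reads the three basic fields from one item using the standard library. The title, link and description values are placeholders invented for illustration, not taken from a real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical item: the title, link and description are placeholders.
ITEM_XML = """
<item>
  <title>Example headline</title>
  <link>http://news.example.com/story</link>
  <description>The first paragraph or a summary of the story.</description>
</item>
"""

def read_item(xml_text):
    """Return the basic metadata of one item as a dictionary."""
    item = ET.fromstring(xml_text.strip())
    return {
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        "description": item.findtext("description"),
    }

entry = read_item(ITEM_XML)
print(entry["title"], "->", entry["link"])
```

An aggregator would typically show the title and description, and use the link both as the destination and as a way to recognise the entry later.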
Additionally, the feed itself can have metadata associated with it, so that it can be given a title (e.g., “Bob’s news headlines”), description, and other fields such as publisher and copyright terms.
For an idea of what full feeds look like, see ‘RSS Versions and Modules’.
How do people use feeds?
Aggregators are the most common use of feeds, and there are several types. Web aggregators (sometimes called portals) make this view available in a Web page; My Yahoo is a well-known example of this. Aggregators have also been integrated into e-mail clients, users’ desktops, or standalone, dedicated software.
Aggregators can offer a variety of special features, including combining several related feeds into a single view, hiding entries that the viewer has already seen, and categorizing feeds and entries.
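Two of the features mentioned above, combining feeds into one view and hiding entries that have already been seen, can be sketched in a few lines of Python. The entry dictionaries and the `merge_feeds` helper are hypothetical; a real aggregator would fill them from parsed feeds.

```python
def merge_feeds(feeds, seen_links):
    """Combine entries from several feeds into one view, newest first.

    feeds: a list of lists of entry dicts with "link", "title" and
    "date" keys (a hypothetical in-memory shape, not a feed format).
    seen_links: a set of links the viewer has already read.
    """
    combined = {}
    for entries in feeds:
        for entry in entries:
            if entry["link"] in seen_links:
                continue  # hide entries the viewer has already seen
            # Keep one copy when two feeds carry the same link.
            combined.setdefault(entry["link"], entry)
    return sorted(combined.values(), key=lambda e: e["date"], reverse=True)

world = [
    {"link": "http://example.com/a", "title": "A", "date": "2004-12-17"},
    {"link": "http://example.com/b", "title": "B", "date": "2004-12-18"},
]
local = [
    {"link": "http://example.com/b", "title": "B", "date": "2004-12-18"},
    {"link": "http://example.com/c", "title": "C", "date": "2004-12-16"},
]
view = merge_feeds([world, local], seen_links={"http://example.com/a"})
print([e["title"] for e in view])
```

Using the link as the key is a simplification; feeds that supply a separate identifier per entry give an aggregator a more reliable way to tell entries apart.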
Other uses of feeds include site tracking by search engines and other software; because the feed is machine-readable, the search software doesn’t have to figure out which parts of the site are important and which parts are just the navigation and presentation. You may also choose to allow people to republish your feeds on their Web sites, giving them the ability to represent your content as they require.
While this seems bad at first glance, it actually improves your site’s visibility; by making it easier for your users to keep up with your site (allowing them to see it the way they want to), it’s more likely that they’ll know when something that interests them is available on your site.
For example, imagine that your company announces a new product or feature every month or two. Without a feed, your viewers have to remember to come to your site and see if they find anything new, if they have time. If you provide a feed for them, they can point their aggregator or other software at it, and it will give them a link and a description of developments at your site almost as soon as they happen.
News is similar; because there are so many sources of news on the Web, most of your viewers won’t come to your site every day. By providing a feed, you are in front of them constantly, improving the chances that they’ll click through to an article that catches their eye.
You also control what information is syndicated in the feed, whether it’s a full article or just a teaser. Your content can still be protected by your current access control mechanisms; only the links and metadata are distributed. You can also protect the RSS feed itself with SSL encryption and HTTP username/password authentication, if you’d like.
In many ways, syndication is similar to the subscription newsletters that many sites offer to keep viewers up-to-date. The big difference is that viewers don’t have to supply an e-mail address, lowering the barrier of privacy concerns, while still giving you a direct channel to them. Also, they get to see the content in the manner that’s most convenient to them, which means that you get more eyes looking at your content.
News & Announcements – headlines, notices and any list of announcements that are added to over time
Document listings – lists of added or changed pages, so that people don’t need to constantly check for different content
Bookmarks and other external links – while most people use RSS for sharing links from their own sites, it’s a natural fit for sharing lists of external links
Calendars – listings of past or upcoming events, deadlines or holidays
Mailing lists – to complement a Web-based archive of public or private e-mail lists
Search results – to let people track changing or new results to their searches
Databases – job listings, software releases, etc.
While it’s a good start to have a “master feed” for your site that lists recent news and events, don’t stop there. Generally, each area of your site that features a changing list of information should have a corresponding feed; this allows viewers to precisely target their interests.
For example, if your news site has pages for world news, national news, local news, business, sports, etc., there should be a feed for each of those sections.
If your site offers a personalized view of information (e.g., people can choose categories of information that will show up on their home page), offer this as a feed, so that viewers’ pages match the content of their feeds.
A great example of this is the variety of feeds that Netflix provides; not only can you keep track of new releases, but also personalised recommendations and even a list of the movies in your queue.
Another good example is Apple’s iTunes Music Store RSS feed generator; you can customize it based on your preferences, and the views it allows match those provided in the Music Store itself.
Finally, remember that feeds are just as useful (if not more so) on an intranet as they are on the Internet. Syndication can be a powerful tool for sharing and integrating information inside a company.
If that option isn’t available, you have a number of choices:
Self-scraping – the easiest way to publish a feed from existing content. Scraping tools fetch your Web page and pull out the relevant parts for the feed, so that you don’t have to change your publishing system. Some use regular expressions or XPath expressions, while others require you to mark up your page with minimal hints (usually using particular HTML tags) that help the tool decide what should be put into the feed.
Feed integration – If your site is dynamically generated (using languages like Perl, Python or PHP), it may have an RSS library available, so that you can integrate the feed into your publishing process.
Starting with the feed – Alternatively, you can manage the list-oriented parts of your content in the RSS feed itself, and generate your Web pages (as well as other content, like e-mail lists) from the feed. This has the advantage of always having the correct information in the feed, and tools like XSLT make this option easy, especially if you’re starting from scratch.
Third party scraping – If none of these options work for you, some people on the Web will scrape your site for you and make the feed available. Be warned, however, that this is never as reliable or accurate as doing it yourself, because they don’t know the details of your content or your system. Also, using third parties introduces another point of failure in the delivery process; problems there (network, server or business) will cause your feed to be unavailable.
For more information about all of these options, see “Feed Tools” and “More Information”.
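As an illustration of the feed integration option, the following Python sketch turns a list of content records into an RSS 2.0 document using only the standard library. The `build_rss` helper, the record keys (`title`, `url`, `summary`) and the example.com URLs are assumptions made for this example; a real publishing system would supply its own data, and may prefer a dedicated RSS library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def build_rss(channel, records, built_at):
    """Serialize content records as an RSS 2.0 document (a string)."""
    rss = ET.Element("rss", version="2.0")
    chan = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description"):
        ET.SubElement(chan, name).text = channel[name]
    # RSS 2.0 dates use the RFC 822 style produced by format_datetime.
    ET.SubElement(chan, "lastBuildDate").text = format_datetime(built_at)
    for record in records:
        item = ET.SubElement(chan, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["url"]
        ET.SubElement(item, "description").text = record["summary"]
        # Reuse the URL as the item's unique identifier.
        ET.SubElement(item, "guid").text = record["url"]
    return ET.tostring(rss, encoding="unicode")

feed_xml = build_rss(
    {"title": "Example Channel", "link": "http://example.com/",
     "description": "My example channel"},
    [{"title": "News for September the Second",
      "url": "http://example.com/news/2",
      "summary": "other things happened today"}],
    datetime(2004, 12, 17, 12, 0, tzinfo=timezone.utc),
)
print(feed_xml)
```

Because the serializer escapes text for you, titles and summaries containing ampersands or angle brackets stay well-formed without extra work.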
Pages that have an associated RSS feed should clearly indicate this to viewers by using a link containing text like ‘RSS feed’. For example,
<a type="application/rss+xml" href="feed.rss">RSS feed for this page</a>
where ‘feed.rss’ is the URL for the feed. The ‘type’ attribute tells browsers that this is a link to an RSS feed in a way that they understand.
Additionally, some programs look for a link in the <head> section of your HTML. To support this, include a <link> tag:
<link rel="alternate" type="application/rss+xml"
href="feed.rss" title="RSS feed for My Page">
These links should be placed on the Web page that’s most similar to the feed content; this allows people to find them as they browse.
Note that Atom feeds should use application/atom+xml instead of application/rss+xml in both kinds of link.
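To show what software does with these links, here is a rough Python sketch of feed discovery: it scans a page for `<link rel="alternate">` tags whose type is one of the two media types named above. The `FeedLinkFinder` class and the sample page are made up for illustration; real discovery code usually also resolves relative URLs.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect (href, title) pairs from feed <link> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        media_type = (attr.get("type") or "").lower()
        if "alternate" in rel and media_type in FEED_TYPES:
            self.feeds.append((attr.get("href"), attr.get("title")))

PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="site.css">
<link rel="alternate" type="application/rss+xml"
      href="feed.rss" title="RSS feed for My Page">
</head><body>...</body></html>"""

finder = FeedLinkFinder()
finder.feed(PAGE)
print(finder.feeds)
```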
Finally, there are a number of guides and registries for RSS feeds that people can search and browse through, much like the Yahoo directory for Web sites; it’s a good idea to register your feed. See More Information.
There’s another choice; Atom is an effort in the IETF (an Internet standards body) to come up with a well-documented, standard syndication format. Although it has a different name, it has the same basic functions as RSS, and many people use the term “RSS” to refer to RSS or Atom syndication.
This section presents a quick overview of each; for more information, see their specifications and supporting materials.
This branch of RSS is based on RSS 0.91, which was first documented at Netscape and later refined by Userland.
Included in 2.0.1, the latest stable version of this branch, are channel metadata such as link, title and description; image, which allows you to specify a thumbnail image to display with the feed; webMaster and managingEditor, to identify who’s responsible for the feed; and lastBuildDate, which shows when the feed was last updated.
Items have the standard link, title and description metadata, as well as other, more experimental facilities such as enclosure, which allows attachments to be automatically downloaded (don’t expect these features to be supported by all aggregators, however). Finally, items can have a guid element that identifies the item uniquely; this enables some advanced functionality in some aggregators.
Here’s an example of a minimal RSS 2.0 feed:
<description>My example channel</description>
<title>News for September the Second</title>
<description>other things happened today</description>
<title>News for September the First</title>
In the RSS 2.0 roadmap, Winer states that this branch is, for all practical purposes, frozen, apart from clarifications to the specification.
However, extensions to the format are allowed in separate modules, using XML Namespaces to avoid conflicts in their names. For example, if you had an ISBN module to track books, it might look like this:
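The ISBN module is imaginary, so the following Python sketch is only an illustration of the mechanism. The namespace URI, the `book` prefix, the `isbn` element name and the ISBN value are all invented here; what matters is that the extension element lives in its own namespace and therefore cannot clash with core RSS element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix for an imaginary ISBN module.
BOOK_NS = "http://example.com/ns/book"
ET.register_namespace("book", BOOK_NS)

item = ET.Element("item")
ET.SubElement(item, "title").text = "An example book"
ET.SubElement(item, "link").text = "http://example.com/books/1"
# The {uri}name form puts the element in the module's namespace.
ET.SubElement(item, "{%s}isbn" % BOOK_NS).text = "0000000000"

item_xml = ET.tostring(item, encoding="unicode")
print(item_xml)

# A consumer looks the element up by namespace URI, not by prefix.
parsed = ET.fromstring(item_xml)
print(parsed.findtext("{%s}isbn" % BOOK_NS))
```

Software that doesn’t know the module can simply ignore the namespaced element, which is what makes this style of extension safe.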
Generally, though, you should look for available RSS Modules, rather than defining your own, unless you’re sure that what you need doesn’t exist.
RSS 1.0 stands for “RDF Site Summary.” This flavor of RSS incorporates RDF, a Web standard for metadata. Because RSS 1.0 uses RDF, any RDF processor can understand RSS without knowing anything about it in particular. This allows syndicated feeds to easily become part of the Semantic Web.
RSS 1.0 also uses XML Namespaces to allow extensions, in a manner similar to RSS 2.0.
RSS 1.0 feeds look very similar to RSS 2.0 feeds, with a few key differences:
The entire feed is wrapped in <rdf:RDF> … </rdf:RDF> elements (so that processors know that it’s RDF)
Each <item> has an rdf:about attribute that usually, but not always, matches the <link>; this assigns an identifier to each item
There’s an <items> element in the channel metadata that contains a list of the items in the channel, so that RDF processors can keep track of the relationship between the items
Some metadata uses the rdf:resource attribute to carry links, rather than putting them inside the element.
RSS 1.0 is developed and maintained by an ad hoc group of interested people; see their Web site for more information about RSS 1.0 and RSS Modules. See below for an example of an RSS 1.0 feed.
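The differences listed above show up clearly when reading an RSS 1.0 document with a namespace-aware parser. This Python sketch uses a small hand-written sample (the example.com URLs are placeholders) and pulls out the `rdf:resource` links from the channel’s item list and the `rdf:about` identifier of each item.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}
RDF = "{%s}" % NS["rdf"]

# Hand-written sample; the example.com URLs are placeholders.
RSS1 = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
 <channel rdf:about="http://example.com/feed.rdf">
  <title>Example Channel</title>
  <link>http://example.com/</link>
  <description>My example channel</description>
  <items>
   <rdf:Seq>
    <rdf:li rdf:resource="http://example.com/news/1"/>
   </rdf:Seq>
  </items>
 </channel>
 <item rdf:about="http://example.com/news/1">
  <title>News for September the First</title>
  <link>http://example.com/news/1</link>
 </item>
</rdf:RDF>"""

root = ET.fromstring(RSS1)
# Links listed in the channel's <items> sequence.
listed = [li.get(RDF + "resource")
          for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS)]
# Identifier and title of each <item>, which sits beside the channel.
items = {item.get(RDF + "about"): item.findtext("rss:title", namespaces=NS)
         for item in root.findall("rss:item", NS)}
print(listed)
print(items)
```

Note that in RSS 1.0 the items are siblings of the channel rather than children of it, which is why the second lookup starts from the root element.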
Dublin Core Module
The most well-known example of an RSS 1.0 Module is the Dublin Core Module. The Dublin Core is a set of metadata developed by librarians and information scientists that standardizes a set of common metadata that’s useful for describing documents, among other things. The Dublin Core Module uses these metadata to attach information to both feeds (in the channel metadata) and to individual items.
This module includes useful elements like dc:date, for associating dates with items; dc:subject, which can be useful for categorizing items or feeds; and dc:rights, for stating the property rights associated with an item or a feed.
Here’s an example of a minimal RSS 1.0 feed that uses the Dublin Core Module:
<description>My example channel</description>
<title>News for September the First</title>
<description>other things happened today</description>
<title>News for September the Second</title>
As you can see, RSS 1.0 is a bit more verbose than 2.0, mostly because it needs to be compatible with other versions of RSS while containing the markup that RDF processors need.
Some people are concerned by this, because such specifications can be changed at the whim of the people who control them. Standards bodies bring stability, by limiting change and having well-established procedures for introducing it. To introduce such stability to syndication, a group of people established an IETF Working Group to standardise a format called Atom.
Atom is functionally similar to both branches of RSS, and is also an XML-based format.
<?xml version="1.0" encoding="utf-8"?>
<title>Atom-Powered Robots Run Amok</title>
As you can see, Atom has a feed element that contains both the feed-level metadata and the entry elements (analogous to RSS’ items), and an entry can contain similar metadata, like title, link, id (instead of RSS 1.0’s rdf:about or RSS 2.0’s guid), and a short textual summary (instead of RSS’ description).
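The same fields can be read from an Atom document in much the same way as from RSS. The Python sketch below uses a short hand-written sample (the ids and URLs are placeholders) and a hypothetical `read_entries` helper; note that Atom elements live in the Atom namespace, and that the link is carried in an href attribute.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample; the ids and URLs are placeholders.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:example:feed</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:example:entry-1</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>"""

def read_entries(atom_bytes):
    """Return the basic metadata of every entry in an Atom feed."""
    feed = ET.fromstring(atom_bytes)
    entries = []
    for entry in feed.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        entries.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            # Atom carries the URL in the href attribute, not as text.
            "link": link.get("href") if link is not None else None,
            "summary": entry.findtext("atom:summary", namespaces=ATOM),
        })
    return entries

entries = read_entries(SAMPLE.encode("utf-8"))
print(entries[0]["title"], "->", entries[0]["link"])
```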
Generally, Atom isn’t as widely supported as RSS 1.0 or 2.0 right now, because it’s relatively new. However, it should catch up quickly, thanks to the broad base of vendors supporting the standardisation effort.
Which Format Should I Choose?
One of the most confusing and unfortunate problems in syndication is the large number of formats in use. In addition to those listed above, there are many other formats (e.g., RSS 0.9, 0.91, 0.92) that are commonly encountered on the Web.
For better or worse, the choice isn’t as critical as you might think. Most aggregators and other software use syndication libraries which abstract out the particular format that a feed is in, so that they can consume any popular syndication feed.
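The first step such a library takes is usually to work out which format it has been handed. The Python sketch below is a simplified guess based only on the root element; the `detect_format` helper is hypothetical, and real libraries inspect much more (for example, other RDF-based documents would also match the RSS 1.0 branch here).

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
ATOM_ROOT = "{http://www.w3.org/2005/Atom}feed"

def detect_format(xml_text):
    """Guess the syndication format from the document's root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # The 0.9x and 2.0 branch states its version as an attribute.
        return "RSS " + root.get("version", "(unknown version)")
    if root.tag == RDF_ROOT:
        # Simplification: any RDF document is treated as RSS 1.0.
        return "RSS 1.0"
    if root.tag == ATOM_ROOT:
        return "Atom"
    return "unknown"

print(detect_format('<rss version="2.0"><channel/></rss>'))
print(detect_format('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
```

Once the format is known, a library can map each format’s fields onto one common entry structure, which is why consumers rarely need to care which format a publisher chose.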
As a result, which format to choose is a matter of personal taste. RSS 1.0 is very extensible, and useful if you want to integrate it into Semantic Web systems. RSS 2.0 is very simple and easy to author by hand. Atom is now an IETF Standard, bringing stability and a natural community to support its use.
Distinct Entries – make sure that aggregators can tell your entries apart, by using different identifiers in rdf:about (RSS 1.0), guid (RSS 2.0) and id (Atom). This will save a lot of headaches down the road.
Meaningful Metadata – try to make the metadata useful on its own; for example, if you only include a short <title>, people may not know what the link is about. By the same token, if you shove a whole article into the description, it will crowd people’s view of the feed, and they’re less likely to stay interested in what you have to say. Generally, you want to put enough into the feed to help someone decide whether they should follow the link.
Encoding HTML – Although it’s tempting, refrain from including HTML markup in your RSS feed; because you don’t know how it will be presented, doing so can prevent your feed from being displayed correctly. If you need to include a tag in the text of the feed (e.g., the title of an entry is “Ode to <title>”), make sure you escape ampersands and angle brackets (so that it would be “Ode to &lt;title&gt;”).
XML Entities – Remember that XML predefines only a handful of entities (&lt;, &gt;, &amp;, &apos; and &quot;), unlike HTML; therefore, you won’t have &copy; and other common HTML entities available. You can define them in the XML, or alternatively just use a character encoding that makes what you need available.
Character Encoding – Some software generates feeds using Windows character sets, and sometimes mislabels them. The safest thing to do is to encode your feed as UTF-8 and check it by parsing it with an XML parser.
Communicating with Viewers – Don’t use entries in your feed to communicate with your users; for example, some feeds have been known to use the description to dictate copyright terms. Use the appropriate element or module.
Communicating with Machines – Likewise, use the appropriate HTTP status codes if your feed has relocated (usually, 301 Moved Permanently) or is no longer available (410 Gone or 404 Not Found).
Making your Feed Cache-Friendly – Successful feeds see a fair amount of traffic because clients poll them often to see if they’ve changed. To support the load, Web caching can help; see the caching tutorial.
Validate – use the Feed Validator to catch any problems in your feed; it works with RSS and Atom. Also, don’t just run it once; make sure you check your feed regularly, so that you can catch transient errors.
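A few of these tips can be checked mechanically before a feed is published. The following Python sketch is not a substitute for a real validator; it only escapes text for use in a feed, confirms that the feed parses as XML (which catches undefined entities such as &copy; and badly encoded bytes), and looks for repeated guid values. The `check_feed` helper and the sample feeds are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def check_feed(feed_bytes):
    """Return a list of problems found in an RSS 2.0 feed (may be empty)."""
    try:
        # A strict XML parser rejects undefined entities and bytes
        # that do not match the feed's encoding.
        root = ET.fromstring(feed_bytes)
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]
    problems = []
    guids = [g.text for g in root.iter("guid")]
    if len(guids) != len(set(guids)):
        problems.append("duplicate guid values")
    return problems

# escape() turns &, < and > into their XML entities.
title = escape("Ode to <title>")
good = ("<rss version='2.0'><channel><item><title>%s</title>"
        "<guid>http://example.com/1</guid></item></channel></rss>" % title)
bad = "<rss version='2.0'><channel><title>&copy; Bob</title></channel></rss>"

print(title)
print(check_feed(good.encode("utf-8")))
print(check_feed(bad.encode("utf-8")))
```

Running a check like this as part of the publishing process complements, rather than replaces, regular runs of the Feed Validator.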
This is an incomplete list of tools for creating feeds and checking them to make sure that you’ve done so correctly. Note that there are many more libraries that help with parsing feeds; these haven’t been included here because this tutorial focuses on the Webmaster, not consumers of feeds.
Site Summaries in XHTML – Online service (also available as an XSLT stylesheet) that uses hints in your HTML to generate a feed.
RSS.py – Python library for generating and parsing RSS.
ROME – Java library for parsing and generating RSS and Atom feeds, as well as translating between formats.
XML::RSS – Perl module for generating and parsing RSS.
Online Validator – Check your RSS 1.0, 2.0 and Atom feeds.