How to Add Meta Noindex to Your Feeds
Want to make sure that your feeds are not indexed by Google and other compliant search engines? Add the following code to the channel element of your XML-based (RSS, etc.) feeds:
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
Here is an example of how I use this tag for Perishable Press feeds (vertical spacing added for emphasis):
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
  <title>Perishable Press</title>
  <link>https://perishablepress.com/</link>
  <description>Digital Design and Dialogue ~</description>
  <pubDate>Mon, 29 Oct 2007 21:38:24</pubDate>
  <language>en</language>

  <xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />

  <image>
    <link>https://perishablepress.com/</link>
    <url>https://perishablepress.com/_/perishable-press.jpeg</url>
    <title>Perishable Press</title>
  </image>
  <item>
    <title>Welcome to Perishable Press</title>
    <link>https://perishablepress.com/</link>
    <dc:creator>Perishable</dc:creator>
    <dc:subject>WordPress</dc:subject>
.
.
.
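If you want to confirm that the tag actually made it into your feed, a quick script can check for it. The following is only a minimal sketch in Python, assuming an RSS 2.0 feed with a channel element; the sample feed and the function name feed_has_noindex are made up for illustration and are not part of any feed tooling.

```python
import xml.etree.ElementTree as ET

# Namespace used by the xhtml:meta element shown in the article.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, trimmed-down feed used only for this illustration.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Example Feed</title>
<link>https://example.com/</link>
<description>Sample feed for testing</description>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
</channel>
</rss>"""


def feed_has_noindex(feed_xml: str) -> bool:
    """Return True if the channel holds an xhtml:meta robots tag listing noindex."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    channel = root.find("channel")
    if channel is None:
        return False
    for meta in channel.findall(f"{{{XHTML_NS}}}meta"):
        if meta.get("name", "").lower() != "robots":
            continue
        # The content attribute may carry several comma-separated directives.
        directives = [d.strip().lower() for d in meta.get("content", "").split(",")]
        if "noindex" in directives:
            return True
    return False


if __name__ == "__main__":
    print(feed_has_noindex(SAMPLE_FEED))
```

The check only tells you the element is present and namespaced correctly; whether a given search engine honors it is a separate question.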
Of course, other meta elements may be added as well, including this one that disallows Yahoo! Pipes from processing your feed:
<meta xmlns="http://pipes.yahoo.com" name="pipes" content="noprocess" />
While we’re at it, what do you think are some other useful meta elements to add to XML/RSS feeds?
About the Author
Jeff Starr = Web Developer. Book Author. Secretly Important.
15 responses to “How to Add Meta Noindex to Your Feeds”
Another solution is to use the robots.txt file to forbid the indexing of feeds and the like.
In robots.txt:
(Note that I’ve commented out the last two because the HeadSpace plugin already puts a noindex meta tag on theme and tag pages.)
If someone wants to debate the choice of the robots.txt technique vs. the meta noindex technique, I’m highly interested!
Well, I don’t know about debating you, but it should be pointed out that robots.txt directives function differently than those of the meta noindex variety. As far as I know, disallow rules specified via robots.txt forbid compliant search engines from accessing matching resources entirely. On the other hand, meta noindex rules do not prevent search engines from accessing and crawling the page. This enables search engines to follow links contained within noindex content. A subtle distinction, perhaps, but important nonetheless.
Yes, “debate” was not the word I should have used. That’s not easy to express in another language.
Thanks for pointing out the fact that noindex allows crawlers to follow links, whereas robots.txt strictly forbids access to those pages.
Isn’t the link to your feed image broken?
<image>
<link>https://perishablepress.com/</link>
<url>https://perishablepress.com/pressburner.jpe</url>
<title>Perishable Press</title>
</image>
https://perishablepress.com/pressburner.jpe leads to a 404.
Oh, I’ve just come across a blog that says that Google would understand the noindex statement in robots.txt files. You would write something like:
Disallow: /wp-
Noindex: /feed/
It would be awesome to fight duplicate content from a single robots.txt file!
That would be awesome, especially on a larger scale than a WordPress weblog. Imagine the SEO work on a website like Flickr!
Though I’ve been thinking a lot, since I read your post, about this follow/noindex (meta noindex) versus nofollow/noindex (robots.txt) dilemma.
My point is that on a typical WordPress weblog, why would one need the crawlers to access the categories, tags, and search pages, or the feed, if they offer the same content as the blog?
All the links that are on those pages are already in the posts. Also, crawlers digging through duplicate content waste bandwidth. On a big website with a lot of crawling, that represents a lot of money.
So again, why would you want bots to crawl the links of your duplicate content pages?
Hi Louis,
The image path was changed during my latest site overhaul/upgrade project. I consolidated all of the miscellaneous site logos and icons into a single location. These images are available to the public at the official “Link to Perishable Press” page.
As for the robots.txt noindex trick, yes, that would be awesome, however, as of now Google would be the only search engine supporting it. And, until the others join in, adding meta noindex to your feeds and pages remains highly useful, especially for SEO purposes.
Eventually, I suspect, robots.txt will evolve into a full-fledged, highly flexible protocol that will replace noindex, noarchive, nofollow, disallow, and other crawl-related directives with its own, specifically developed language.. kind of like CSS for spiders ;)
When it comes to controlling link equity and indexing of content, we have three primary tools, each of which serves a different function.
Robots.txt directives prevent compliant search engines from accessing specified resources. This is useful for admin pages and other directories that do not need to be included in the search listings.
Meta tags such as noindex and noarchive assume search-engine access and enable spiders to crawl the pages and follow links. Link equity will also be passed through such pages.
Nofollow tags as applied directly to links allow search engine access, but forbid the passing of link equity to the target pages. This method is useful for controlling directly the flow of link juice throughout a site.
Depending on your SEO goals, manipulating the ebb and flow of link juice is greatly facilitated by the functional variety provided by these three techniques.
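To see the access-blocking side of robots.txt in action, here is a small sketch using Python’s standard urllib.robotparser module. The rules, the example.com URLs, the ExampleBot agent name, and the can_crawl helper are all assumptions made up for this illustration; they are not the rules from any commenter’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /wp-
"""


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/feed/",
        "https://example.com/wp-admin/",
        "https://example.com/about/",
    ):
        print(url, can_crawl(ROBOTS_TXT, url))
```

A crawler that respects these rules never requests the disallowed URLs at all, so it never sees any meta noindex tag inside them and never follows their links. That is the practical difference from the meta approach, which depends on the page or feed being fetched first.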
Very useful, thank you!
I’m always on the look out for useful tips like this, and your site is full of them! I’ll be bookmarking you for sure!
Thank you, John! I am glad to be of service ;)
I am trying to use this with a Google/Yahoo sitemap. This validates, but will it really work the way it appears?
Thanks for the great post–only one I could find on the topic.
Yes, I think this method will work.. hence the article ;) I am glad you found the information useful — thanks for the feedback!