How to Add Meta Noindex to Your Feeds

Updated July 7, 2018 • 15 comments

Want to make sure that your feeds are not indexed by Google and other compliant search engines? Add the following code to the channel element of your XML-based (RSS, etc.) feeds:

<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />

Here is an example of how I use this tag for Perishable Press feeds (vertical spacing added for emphasis):

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">

<channel>
	<title>Perishable Press</title>
	<link>https://perishablepress.com/</link>
	<description>Digital Design and Dialogue ~</description>
	<pubDate>Mon, 29 Oct 2007 21:38:24</pubDate>
	<language>en</language>


	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />


	<image>
	   <link>https://perishablepress.com/</link>
	   <url>https://perishablepress.com/_/perishable-press.jpeg</url>
	   <title>Perishable Press</title>
	</image>
	<item>
	   <title>Welcome to Perishable Press</title>
	   <link>https://perishablepress.com/</link>
	   <dc:creator>Perishable</dc:creator>
	   <dc:subject>WordPress</dc:subject>
	   .
	   .
	   .
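
For feeds that are generated or post-processed by a script, the same tag can be added programmatically. Here is a minimal sketch in Python, assuming an RSS 2.0 feed with a single channel element; the function name `add_noindex` and the sample feed are hypothetical and used only for illustration:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
META_TAG = f"{{{XHTML_NS}}}meta"

# Keep the "xhtml" prefix when serializing instead of an auto-generated one.
ET.register_namespace("xhtml", XHTML_NS)


def add_noindex(feed_xml: str) -> str:
    """Return the RSS feed with a robots noindex meta element in <channel>."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("feed has no <channel> element")

    # Leave the feed untouched if a robots meta element already exists.
    for existing in channel.findall(META_TAG):
        if existing.get("name") == "robots":
            return feed_xml

    meta = ET.Element(META_TAG, {"name": "robots", "content": "noindex"})

    # Place the element before the first <item>, alongside the other
    # channel-level metadata; append it if the feed has no items yet.
    children = list(channel)
    position = next(
        (i for i, child in enumerate(children) if child.tag == "item"),
        len(children),
    )
    channel.insert(position, meta)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample feed; example.com is a placeholder domain.
    sample = (
        '<rss version="2.0"><channel>'
        "<title>Example Feed</title>"
        "<link>https://example.com/</link>"
        "<item><title>First post</title></item>"
        "</channel></rss>"
    )
    print(add_noindex(sample))
```

Note that ElementTree declares the xhtml namespace on the root element rather than inline on the meta element as in the example above; the two forms are equivalent to an XML parser. Prefixes for any other namespaces used in the feed would likely need to be registered the same way to survive the round trip.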

Of course, other meta elements may be added as well, including this one that disallows Yahoo! Pipes from processing your feed:

<meta xmlns="http://pipes.yahoo.com" name="pipes" content="noprocess" />

While we’re at it, what do you think are some other useful meta elements to add to XML/RSS feeds?

About the Author

Jeff Starr = Web Developer. Book Author. Secretly Important.

15 responses to “How to Add Meta Noindex to Your Feeds”

Louis 2007/12/02 1:56 pm

Another solution is to use the robots.txt file to forbid the indexing of feeds and the like.

In robots.txt:

User-agent: *
Disallow: /wp-
Disallow: /feed
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$
Disallow: /*?*
Disallow: /*?
#Disallow: /theme/*/*
#Disallow: /tag/*/*

(Note that I’ve commented out the last two because the HeadSpace plugin already puts a noindex meta tag on theme and tag pages.)

If someone wants to debate the choice of the robots.txt technique versus the meta noindex technique, I’m highly interested!

Perishable 2007/12/02 2:33 pm • Post Author

Well, I don’t know about debating you, but it should be pointed out that robots.txt directives function differently than those of the meta noindex variety. As far as I know, disallow rules specified via robots.txt forbid compliant search engines from accessing matching resources entirely. On the other hand, meta noindex rules do not prevent search engines from accessing and crawling the page. This enables search engines to follow links contained within noindex content. A subtle distinction, perhaps, but important nonetheless.
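
To make the distinction concrete, here is a rough Python sketch using the standard library's `urllib.robotparser`. The rules, the example.com URLs, and the helper names `may_crawl` and `may_index` are assumptions made up for illustration. The standard-library parser does plain prefix matching, so the sketch avoids the `*` and `$` wildcard patterns shown earlier in the thread:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; example.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed
"""


def may_crawl(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the robots.txt rules let this agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def may_index(meta_robots: str) -> bool:
    """Return False when a robots meta value contains the noindex directive."""
    directives = {part.strip().lower() for part in meta_robots.split(",")}
    return "noindex" not in directives


if __name__ == "__main__":
    # Disallowed by robots.txt: a compliant crawler never requests the feed,
    # so it never sees the links inside it.
    print(may_crawl("https://example.com/feed/"))

    # Not covered by any rule: the page can be fetched and its links followed.
    print(may_crawl("https://example.com/about/"))

    # Meta noindex is only seen after the resource has been fetched; it asks
    # the engine to keep the resource out of its index.
    print(may_index("noindex"))
```

The two checks happen at different stages: `may_crawl` decides whether the resource is requested at all, while `may_index` only applies to a resource that has already been fetched.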

Louis 2007/12/02 10:35 pm

Yes, “debate” was not the word I should have used. It’s not easy to express myself in another language.

Thanks for pointing out that noindex allows crawlers to follow links, whereas robots.txt strictly forbids access to those pages.

Louis 2007/12/03 3:31 am

Isn’t the link to your feed image broken?

<image>
     <link>https://perishablepress.com/</link>
     <url>https://perishablepress.com/pressburner.jpe</url>
     <title>Perishable Press</title>
</image>

https://perishablepress.com/pressburner.jpe leads to a 404.

Louis 2007/12/03 3:48 am

Oh, I’ve just come across a blog that says Google would understand a noindex statement in robots.txt files. You would write something like:

Disallow: /wp-
Noindex: /feed/

It would be awesome to fight duplicate content from a single robots.txt file!

Louis 2007/12/03 11:45 am

That would be awesome, especially at a larger scale than a WordPress weblog. Imagine the SEO work on a website like Flickr!

Though I’ve been thinking a lot since I read your post about the dilemma of follow/noindex (meta noindex) versus nofollow/noindex (robots.txt).

My point is that on a typical WordPress weblog, why would one need crawlers to access the category, tag, and search pages, or the feed, if they contain the same content the blog already offers?

All the links on those pages already appear in the posts. Also, crawlers digging through duplicate content waste bandwidth. On a big website with a lot of crawling, that represents a lot of money.

So again, why would you want bots to crawl the links of your duplicate-content pages?

Perishable 2007/12/03 11:01 am • Post Author

Hi Louis,

The image path was changed during my latest site overhaul/upgrade project. I consolidated all of the miscellaneous site logos and icons into a single location. These images are available to the public at the official “Link to Perishable Press” page.

As for the robots.txt noindex trick, yes, that would be awesome. However, as of now, Google would be the only search engine supporting it. Until the others join in, adding meta noindex to your feeds and pages remains highly useful, especially for SEO purposes.

Eventually, I suspect, robots.txt will evolve into a full-fledged, highly flexible protocol that will replace noindex, noarchive, nofollow, disallow, and other crawl-related directives with its own, specifically developed language... kind of like CSS for spiders ;)

Perishable 2007/12/03 12:07 pm • Post Author

When it comes to controlling link equity and indexing of content, we have three primary tools, each of which serves a different function.

Robots.txt directives prevent compliant search engines from accessing specified resources. This is useful for admin pages and other directories that do not need to be included in the search listings.

Meta tags such as noindex and noarchive assume search-engine access and enable spiders to crawl the pages and follow links. Link equity will also be passed through such pages.

Nofollow tags as applied directly to links allow search engine access, but forbid the passing of link equity to the target pages. This method is useful for controlling directly the flow of link juice throughout a site.

Depending on your SEO goals, manipulating the ebb and flow of link juice is greatly facilitated by the functional variety provided by these three techniques.

John Wilberforce 2007/12/04 5:14 am

Very useful, thank you!
I’m always on the lookout for useful tips like this, and your site is full of them! I’ll be bookmarking you for sure!

Perishable 2007/12/04 8:57 am • Post Author

Thank you, John! I am glad to be of service ;)

custom web design 2008/05/02 12:47 pm

I am trying to use this with a Google/Yahoo sitemap. This validates, but will it really work the way it appears?

Thanks for the great post, the only one I could find on the topic.

Custom web design

Perishable 2008/05/04 7:21 am • Post Author

Yes, I think this method will work.. hence the article ;) I am glad you found the information useful — thanks for the feedback!
