Forums / Setup & design / prevent Google from indexing media/images
Pascal Specht
Thursday 16 April 2009 3:00:44 am
Hi there,
I've seen recently that Google indexed some folders with image content inside media/images. What can I do to prevent search engines from indexing the media content? Has anybody had success with a robots rule on www/var/ezwebin_site/storage/images/media, for example? Or is there another place I should look at?
Thanks in advance,
</Pascal>
Gaetano Giunta
Thursday 16 April 2009 5:23:06 am
robots.txt is surely your friend here.
Please note that it will not prevent malicious bots from indexing images - the only way to achieve that is to do for "content" images what is normally done for other binary content:
a - allow access to them via links to "content/download" instead of direct access (might involve creating a custom template operator and download handler)
b - set up a web server rule to block direct access to images
Principal Consultant International Business Member of the Community Project Board
Andreas Kaiser
Thursday 16 April 2009 6:47:38 am
Some ways are:
1. to have a robot rule for not indexing "Media" directory in the robots.txt
User-agent: *
Disallow: /var/
Disallow: /Media/
2. Adding
<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">
in the head of the media pages. You can use section id for adding this tag only in the media section...
eZ Partner in Madrid (Spain) Web: http://www.atela.net/
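The robots.txt rules from option 1 can be sanity-checked before they go live by feeding them to the robots.txt parser in Python's standard library. This is only a sketch: the host name www.example.com, the sample image file name and the "Googlebot" agent string are made up for illustration, and the result only reflects how a well-behaved crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested in option 1, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /var/
Disallow: /Media/
"""

def is_allowed(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if a compliant crawler may fetch `path`.

    The host name is a placeholder; only the path is matched against the rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)

# Hypothetical paths: a stored image, a page in the Media section, a normal page.
print(is_allowed("/var/ezwebin_site/storage/images/media/photo.jpg"))
print(is_allowed("/Media/Images"))
print(is_allowed("/News"))
```

Matching is by path prefix and is case-sensitive, so a rule for /Media/ would not cover URLs that start with /media/.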
Thursday 16 April 2009 8:15:34 am
Thank you both for your help!
</P>
Michael Gross
Saturday 11 July 2009 12:03:08 am
To follow up the original poster's question, what is the surest way to prevent any spiders from searching an entire site? Would a robots.txt with the following do it?
User-Agent: *
Disallow: [File Name a]
Thanks,
You've got a noob here, so I also need to know where in the site hierarchy the robots.txt file goes.
As far as editing the site configuration files, is there a mandatory or recommended text encoding that should be used?
Michael
André R.
Saturday 11 July 2009 3:59:15 am
robots.txt is not eZ Publish specific, so if you want to learn about it, the best way is to google it (Wikipedia has a good entry, at least the English one does*). In short: just like favicon.ico, place it in the root of your installation and make sure you can access it in your browser, like http://ez.no/robots.txt
Rewrite rules in Apache are key here, but the ones recommended in the doc** already allow access to it by default. And if you use .htaccess (a shared server where you don't have access to the Apache config), you just need to uncomment the lines about robots.txt / favicon.ico to allow access***.
*: http://en.wikipedia.org/wiki/Robots_exclusion_standard
**: http://ez.no/doc/ez_publish/technical_manual/4_0/installation/virtual_host_setup
***: http://pubsvn.ez.no/nextgen/trunk/.htaccess_root
eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription @: http://twitter.com/andrerom
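To make the "root of your installation" point concrete, and to touch on the earlier question about keeping crawlers off a whole site, here is a minimal sketch using only Python's standard library. The `Disallow: /` rule set is the commonly documented way to ask all compliant crawlers to skip every path; it is an assumption added here, not something posted in this thread, and the page URL is just an example. As noted earlier in the thread, none of this stops bots that ignore robots.txt.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def robots_url(page_url):
    """Crawlers request robots.txt from the root of the host,
    regardless of which page they start from."""
    return urljoin(page_url, "/robots.txt")

# Assumed rule set: ask every compliant crawler to skip the entire site.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

def may_crawl(url, rules=BLOCK_ALL, agent="*"):
    """Return True if a compliant crawler may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)

print(robots_url("http://ez.no/doc/ez_publish"))
print(may_crawl("http://ez.no/doc/ez_publish"))
```

The first call shows where the file has to be reachable; the second shows that, with these rules, a compliant crawler is told not to fetch the page.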
Heath
Saturday 11 July 2009 4:17:46 am
But using a robots.txt file most often requires a few eZ Publish configuration changes in order for the file to be recognized by search engines.
If you are using virtual host mode you will want to add a similar exclusion rule for the /robots.txt file. Otherwise eZ Publish will try to resolve the URL internally and fail. To avoid this problem, exclude the file with mod_rewrite rules.
Snippet of the mod_rewrite rules from an Apache vhost configuration file:
RewriteRule ^/robots\.txt - [L]
RewriteRule .* /index.php
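Why the order of these two rules matters can be illustrated with a small Python model. This is a deliberately simplified simulation, not how Apache is implemented: it only mimics the two rules above, treating `-` as "leave the URL unchanged" and `[L]` as "stop processing further rules". The function name and the sample content URL are made up for illustration.

```python
import re

# Simplified model of the two rules: (pattern, substitution, is_last).
# A substitution of None stands for "-", i.e. do not change the URL.
RULES = [
    (re.compile(r"^/robots\.txt"), None, True),   # RewriteRule ^/robots\.txt - [L]
    (re.compile(r".*"), "/index.php", False),     # RewriteRule .* /index.php
]

def rewrite(path):
    """Apply the rules in order and return the resulting URL path."""
    for pattern, substitution, is_last in RULES:
        if pattern.search(path):
            if substitution is not None:
                path = substitution
            if is_last:
                break  # [L]: no later rule gets to see this request
    return path

print(rewrite("/robots.txt"))           # stays as the static file
print(rewrite("/content/view/full/2"))  # handed to index.php
```

If the robots.txt rule came second, or were missing, the catch-all rule would send the request to index.php and eZ Publish would try to resolve it as content.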
Technically it would be more eZ Publish compatible to have the robots.txt file stored within an extension (extension/mydesign/design/standard/files/robot.txt) and use a transparent mod_rewrite/mod_proxy redirection to the actual destination contents. I don't have a handy example of these rules though.
Cheers,
Heath
Brookins Consulting | http://brookinsconsulting.com/ Certified | http://auth.ez.no/certification/verify/380350 Solutions | http://projects.ez.no/users/community/brookins_consulting eZpedia community documentation project | http://ezpedia.org
Monday 13 July 2009 10:32:05 pm
Thanks for your replies. Based on the technical nature of the replies, and my lack of experience, I think I'll just put up some summary content and allow the indexing. I have put your replies into my notebook for further research.