noindex: Controlling Search Engine Index Removals
My site has some particular pages that are:
- already indexed in search engines, but I want to remove them from the indexes.
- numerous and dynamic (based on the query string).
- a bit "heavy." (An overzealous bot can strain the server more than I'd like.)
Because of #2, I'm just going to let them be removed naturally, but I need to settle on a plan.
I started out by doing the following:
- Bots: abort execution using user-agent detection in the application, and send a nearly blank response. (I don't mind if some bots slip through and render the real page; I'm just blocking the common ones.)
- Bots: throw a 403 (Forbidden) response code.
- All clients: send an "X-Robots-Tag: noindex" header.
- All clients: added rel="nofollow" to the links that lead to these pages.
- Did not disallow bots from these pages in robots.txt. (I think it's only useful to disallow bots if you do so from the very beginning, or else only after the pages have been completely removed from the search engines; otherwise, the engines can't crawl/access the pages to discover/honor the noindex header, so they wouldn't remove them. I mention this because I think robots.txt is commonly misunderstood, and it might get suggested as an inappropriate silver bullet here.)
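The steps above could be sketched as a small request handler. This is only a minimal illustration, assuming a hypothetical handle_request() entry point and a BOT_PATTERN that covers a few common crawlers (a real block list would be longer):

```python
import re

# Assumed pattern for the "common" bots; not an exhaustive list.
BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp|duckduckbot", re.IGNORECASE)

def handle_request(user_agent, render_page):
    """Return (status, headers, body) for one of the heavy pages.

    Every client gets the noindex header; detected bots get a 403 with a
    blank body, so the expensive page is never rendered for them.
    """
    headers = {"X-Robots-Tag": "noindex"}
    if BOT_PATTERN.search(user_agent or ""):
        return 403, headers, ""          # abort early: no heavy rendering
    return 200, headers, render_page()   # everyone else gets the real page
```

Bots that aren't in the pattern simply fall through to the normal 200 path, which matches the "I don't mind if some slip through" stance.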
However, since then, I've come to think some of those steps are either useless toward the goal, or even problematic.
- I'm not sure if throwing a 403 at bots is a good idea. Do search engines see that and disregard the X-Robots-Tag? Is it better to just let them get a 200 response?
- I think rel="nofollow" only potentially affects the target page's rank, and doesn't affect crawling at all.
The rest of the plan seems okay (correct me if I'm wrong), but I'm not sure how the above bullets fit into the grand scheme.
Here's the plan I've come up with:
- Bots: abort execution using user-agent detection in the application, and send a nearly blank response. (I don't mind if some bots slip through and render the real page; I'm just blocking the common ones.)
- Bots: send a 410 (Gone) response code.
"In general, webmasters get a little too caught up in the tiny little details, and if the page is gone, it's fine to serve a 404; if you know it's gone for real, it's fine to serve a 410." (http://goo.gl/awjdez)
- All clients: send an "X-Robots-Tag: noindex" header. I think this is extraneous for the known bots, which got the 410, but it would cover unknown engines' bots.
- All clients: add rel="nofollow" to the links that lead to these pages. This isn't necessary, but it wouldn't hurt.
- Do not disallow bots from these pages in robots.txt. (It's only useful to disallow bots if you do so from the very beginning, or else only after the pages have been completely removed from the search engines; otherwise, the engines can't crawl/access the pages to discover/honor the noindex header, so they wouldn't remove them. I mention this because I think robots.txt is commonly misunderstood, and it might get suggested as an inappropriate silver bullet here.)
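The revised plan above can be sketched as a small request handler; the key change is that detected bots now get a 410 (Gone) instead of a 403. Again, this is only an illustration, assuming a hypothetical handle_request() entry point and a BOT_PATTERN covering a few common crawlers:

```python
import re

# Assumed pattern for the "common" bots; not an exhaustive list.
BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp|duckduckbot", re.IGNORECASE)

def handle_request(user_agent, render_page):
    """Return (status, headers, body) under the revised plan."""
    headers = {"X-Robots-Tag": "noindex"}  # covers unknown engines' bots
    if BOT_PATTERN.search(user_agent or ""):
        return 410, headers, ""            # 410 Gone: the page is gone for real
    return 200, headers, render_page()     # humans still see the real page
```

Since both paths send the noindex header, even a crawler that escapes the user-agent check still receives the removal signal when it fetches the page.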