noindex - Controlling Search Engine Index Removals


My site has particular pages that are:

  1. Already indexed in search engines, but I want to remove them from the indexes.
  2. Numerous, as they are dynamic (based on the query string).
  3. A bit "heavy." (An overzealous bot can strain the server more than I'd like.)

Because of #2, I'm going to let them be removed naturally, but I need to settle on a plan.

I started out by doing the following:

  1. Bots: Abort execution using user-agent detection in the application, and send a blank response. (I don't mind if some bots slip through and render the real page, as I'm only blocking the common ones.)
  2. Bots: Throw a 403 (Forbidden) response code.
  3. All clients: Send the "X-Robots-Tag: noindex" header.
  4. All clients: Added rel="nofollow" to the links that lead to these pages.
  5. Did not disallow bots from those pages in robots.txt. (I think it's only useful to disallow bots if you do so from the very beginning, or else after the pages are completely removed from search engines; otherwise, engines can't crawl/access those pages to discover/honor the noindex header, so they wouldn't remove them. I mention this because I think robots.txt might commonly be misunderstood, and it might get suggested as an inappropriate silver bullet.)
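
To illustrate point 5, here is a minimal sketch using Python's standard urllib.robotparser. The host, the /heavy path, and the helper name can_crawl are assumptions made up for the example. It only shows that a compliant crawler would never request a disallowed URL, and therefore would never get the chance to see an X-Robots-Tag header on it.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the dynamic pages.
DISALLOWING = """User-agent: *
Disallow: /heavy
"""

# Hypothetical robots.txt that leaves them crawlable (empty Disallow = allow all).
PERMISSIVE = """User-agent: *
Disallow:
"""

# Hypothetical URL of one of the query-string-driven pages.
URL = "https://example.com/heavy?id=123"

# Blocked: the crawler never requests the page, so it cannot see "noindex".
print(can_crawl(DISALLOWING, "Googlebot", URL))

# Crawlable: the crawler requests the page and can honor the header.
print(can_crawl(PERMISSIVE, "Googlebot", URL))
```

Whether a given engine then drops an already-indexed URL is up to that engine; the sketch only covers the crawl-permission side.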

However, since then, I think some of those steps are either useless toward my goal, or problematic.

  • I'm not sure if throwing a 403 to bots is a good idea. Do search engines see that and disregard the X-Robots-Tag? Is it better to let them respond with 200?
  • I think rel="nofollow" only potentially affects the target page's rank, and doesn't affect crawling at all.

The rest of the plan seems okay (correct me if I'm wrong), but I'm not sure about the above bullets in the grand scheme.

I think this is a good plan:

  1. Bots: Abort execution using user-agent detection in the application, and send a blank response. (I don't mind if some bots slip through and render the real page, as I'm only blocking the common ones.)
  2. Bots: Send a 410 (Gone) response code.
    "In general, webmasters get a little caught up in the tiny little details, and if a page is gone, it's fine to serve a 404; if you know it's gone for real, it's fine to serve a 410,"
    - http://goo.gl/awjdez
  3. All clients: Send the "X-Robots-Tag: noindex" header. I think this is extraneous for the known bots that got the 410, but it would cover unknown engines' bots.
  4. All clients: Add rel="nofollow" to the links that lead to these pages. This probably isn't necessary, but it wouldn't hurt.
  5. Do not disallow bots from those pages in robots.txt. (It's only useful to disallow bots if you do so from the very beginning, or else after the pages are completely removed from search engines; otherwise, engines can't crawl/access those pages to discover/honor the noindex header, so they wouldn't remove them. I mention this because I think robots.txt might commonly be misunderstood, and it might get suggested as an inappropriate silver bullet.)
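
Steps 1 to 3 of this plan could be sketched roughly as follows. This is a framework-agnostic illustration, not a tested production setup: the bot token list, the names is_known_bot and plan_response, and the render_page callback are all hypothetical, and user-agent matching like this will miss bots that don't identify themselves.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical, deliberately incomplete list of common crawler tokens.
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|slurp|duckduckbot|baiduspider|yandex",
    re.IGNORECASE,
)


def is_known_bot(user_agent: Optional[str]) -> bool:
    """Crude user-agent check; unknown bots will slip through."""
    return bool(user_agent and BOT_PATTERN.search(user_agent))


def plan_response(
    user_agent: Optional[str],
    render_page: Callable[[], str],
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for one of the heavy pages."""
    # Step 3: every client gets the noindex header.
    headers = {"X-Robots-Tag": "noindex"}

    if is_known_bot(user_agent):
        # Steps 1 and 2: skip the expensive rendering, answer 410 with no body.
        return 410, headers, ""

    # Everyone else (including unrecognized bots) gets the real page.
    return 200, headers, render_page()


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(plan_response(ua, lambda: "<html>real page</html>"))
```

Passing the renderer as a callback keeps the heavy work from running at all for recognized bots, which is the point of step 1.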
