dmiessler.com | grep understanding

The Whitehouse.gov Website’s Robots.txt File Has 1839 Lines In It

Started by Daniel Miessler · 7 months ago

1839?

I’m sure they’re not trying to hide anything, as doing so with a robots.txt file would be asinine, but 1839 entries is just insane. At some point you have to start wondering whether the content should be online at all if you don’t want it indexed. ...

4 comments

  • Looking at most of those entries, it looks like they're excluding pages that appear to be designed for text-only browsers and screen readers. Nearly every directory ends in /text:

    Disallow: /asia/2005/photoessay/china/text
    Disallow: /asia/2005/photoessay/japan/text
    Disallow: /asia/2005/photoessay/korea/text
    Disallow: /asia/2005/photoessay/mongolia/text
    Disallow: /asia/2005/photoessay/mrsbush1/text
    Disallow: /asia/2005/photoessay/mrsbush2/text


    And if you browse up one directory, you get the same story with pictures.

    I'd say it looks like they are doing it to work around a poor file structure, or possibly to keep search engines from finding duplicate text (although without pictures).

    *shrugs* I'm all for pointing out when the administration does something crooked, but I can't see fault in this one. (Granted, I've only checked out 20 or so of the links. The only one that didn't go anywhere for me was /video/text.)
  • Searching Google for 'robots.txt' shows whitehouse.gov at position 5
  • Ooooh, /secret/ directories. *nods head*
  • Yup, even I noted it sometime back as an excellent sitemap. ;-)
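
The first comment's observation that nearly every Disallow entry ends in /text could be checked with a short script rather than by eye. What follows is only a minimal sketch, not anything the commenters ran: the function names `disallow_paths` and `last_segment_counts` are made up for illustration, and the sample input is just the six entries quoted in the comment, not the full 1839-line file.

```python
from collections import Counter


def disallow_paths(robots_txt: str) -> list[str]:
    """Return the path from every non-empty Disallow line, in order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing '#' comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow value blocks nothing, so skip it
                paths.append(value)
    return paths


def last_segment_counts(paths: list[str]) -> Counter:
    """Count paths by their final path segment (for example 'text')."""
    return Counter(p.rstrip("/").rsplit("/", 1)[-1] for p in paths)


# Sample input: only the six entries quoted in the comment above.
sample = """\
Disallow: /asia/2005/photoessay/china/text
Disallow: /asia/2005/photoessay/japan/text
Disallow: /asia/2005/photoessay/korea/text
Disallow: /asia/2005/photoessay/mongolia/text
Disallow: /asia/2005/photoessay/mrsbush1/text
Disallow: /asia/2005/photoessay/mrsbush2/text
"""

paths = disallow_paths(sample)
counts = last_segment_counts(paths)
print(f"{counts['text']} of {len(paths)} Disallow paths end in /text")
```

Run against the full file, `counts.most_common()` would list the other trailing segments as well, which is a quick way to spot the handful of entries that do not follow the /text pattern.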
