Center for Democracy and Technology
Working for Democratic Values in a Digital Age
Hiding In Plain Sight

Sitemapping

The Sitemap protocol is an open and freely available standard for creating a document that helps search engines crawl and index Web sites effectively. Sitemaps are, in some ways, the opposite of robots.txt files. Like robots.txt, the protocol uses a file in a well-known location. However, where robots.txt lists locations that crawlers should not index, sitemap.xml lists locations that crawlers should index but might not otherwise find.
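
To make the idea of a sitemap.xml file concrete, the following minimal Python sketch builds one from a list of page addresses. It is an illustration under stated assumptions, not a prescribed implementation: the function name `build_sitemap` and the `agency.example` addresses are hypothetical, and only the `loc` element that the protocol requires for each entry is included.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the Sitemap protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs.

    Hypothetical helper for illustration: each URL becomes one <url>
    entry containing only the required <loc> element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical database-backed pages that a crawler might not reach
# by following links alone.
example_urls = [
    "http://www.agency.example/records?id=1",
    "http://www.agency.example/search?db=permits&id=42",
]

sitemap_xml = build_sitemap(example_urls)
print(sitemap_xml)
```

In practice, the resulting text would be saved as sitemap.xml on the site so that participating crawlers can read it. Pages generated from a database, like the query-string addresses assumed here, are the kind of content a crawler is unlikely to discover on its own.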

The leading search engines – Ask, Google, Microsoft Live, and Yahoo – have adopted the Sitemap protocol. By implementing the protocol, government agencies can publish exhaustive lists of their content so that all participating search engines can easily find it.

The E-Government regulations have established the Web site at http://www.USA.gov as the portal for government information. The search engine used by USA.gov is provided by a major commercial search engine and is therefore subject to the limitations of all search crawlers: it cannot access most government databases because of the way they have been implemented. While this is merely a complication for the commercial search engines, it is a major problem for the USA.gov search. USA.gov's tagline is "Government Made Easy," but in this case the information is just as hard to find through USA.gov's search as through any other search engine. By implementing the Sitemap protocol, agencies can ensure that the resources on their Web sites are indexed by search engines and are available to the American public through USA.gov and most commercial search engines.

