Frequently asked questions
How do I represent URLs in the Sitemap?
Does it matter which character encoding method I use to generate my Sitemap files?
How do I compute lastmod date?
My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
What do I do after I create my Sitemap?
Do URLs in the Sitemap need to be completely specified?
My site has both "http" and "https" versions of URLs. Do I need to list both?
URLs on my site have session IDs in them. Do I need to remove them?
Does position of a URL in a Sitemap influence its use?
Can I zip my Sitemaps or do they have to be gzipped?
Will the "priority" hint in the XML Sitemap change the ranking of my pages in search results?
Is there an XML schema that I can validate my XML Sitemap against?
What if I have another question about using the protocol or submitting a Sitemap?
Q: How do I represent URLs in the Sitemap?
As with all XML files, any data values (including URLs) must use entity escape codes for the following characters: ampersand (&), single quote ('), double quote ("), less than (<), and greater than (>). You should also make sure that all URLs follow the RFC 3986 standard for URIs, the RFC 3987 standard for IRIs, and the XML standard. If you are using a script to generate your URLs, you can generally URL-escape them as part of that script. You will still need to entity-escape them afterwards. For instance, the following Python session entity-escapes http://www.example.com/view?widget=3&count>2:
$ python
Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
>>> import xml.sax.saxutils
>>> xml.sax.saxutils.escape("http://www.example.com/view?widget=3&count>2")
The resulting URL from the example above is:
http://www.example.com/view?widget=3&amp;count&gt;2
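The session shown above dates from Python 2. A minimal Python 3 sketch of the same step might look like the following. The helper name escape_sitemap_url is hypothetical. Note that xml.sax.saxutils.escape only handles &, < and > by default, so the two quote characters are passed in explicitly here.

```python
from xml.sax.saxutils import escape

# escape() converts &, < and > by default; quotes need extra entities.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_sitemap_url(url: str) -> str:
    """Entity-escape a URL for use as XML data (hypothetical helper)."""
    return escape(url, QUOTE_ENTITIES)

escaped = escape_sitemap_url("http://www.example.com/view?widget=3&count>2")
print(escaped)  # http://www.example.com/view?widget=3&amp;count&gt;2
```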
Q: Does it matter which character encoding method I use to generate my Sitemap files?
Yes. Your Sitemap files must use UTF-8 encoding.
Q: How do I specify time?
Use W3C Datetime encoding for the lastmod timestamps and all other dates and times in this protocol. For example, 2004-09-22T14:12:14+00:00.
This encoding allows you to omit the time portion of the ISO8601 format; for example, 2004-09-22 is also valid. However, if your site changes frequently, you are encouraged to include the time portion so crawlers have more complete information about your site.
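As an illustration, both forms can be produced with Python's standard datetime module. This is a sketch, and the function name w3c_datetime is hypothetical; it assumes you pass a timezone-aware datetime so that the offset is included.

```python
from datetime import datetime, timezone

def w3c_datetime(moment: datetime) -> str:
    """Format a timezone-aware datetime with second precision and UTC offset."""
    return moment.isoformat(timespec="seconds")

moment = datetime(2004, 9, 22, 14, 12, 14, tzinfo=timezone.utc)
full_form = w3c_datetime(moment)       # 2004-09-22T14:12:14+00:00
date_only = moment.date().isoformat()  # 2004-09-22
```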
Q: How do I compute lastmod date?
For static files, this is the actual file update date. You can use the UNIX date command to get this date:
$ date --iso-8601=seconds -u -r /home/foo/www/bar.html
>> 2004-10-26T08:56:39+00:00
For many dynamic URLs, you may be able to easily compute a lastmod date based on when the underlying data was changed or by using some approximation based on periodic updates (if applicable). Using even an approximate date or timestamp can help crawlers avoid crawling URLs that have not changed. This will reduce the bandwidth and CPU requirements for your web servers.
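If your Sitemap is generated by a script rather than from the shell, the file modification time can be read directly. The following is a minimal sketch; lastmod_for_file is a hypothetical helper, and it assumes the file system's modification time reflects the real update date.

```python
import os
from datetime import datetime, timezone

def lastmod_for_file(path: str) -> str:
    """Return the file's modification time as a UTC W3C Datetime string."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    moment = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return moment.isoformat(timespec="seconds")
```

For a file last modified at the time shown in the date example, this would return the same string, 2004-10-26T08:56:39+00:00.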
Q: Where do I place my Sitemap?
It is strongly recommended that you place your Sitemap at the root directory of your HTML server; that is, place it at http://example.com/sitemap.xml.
In some situations, you may want to produce different Sitemaps for different paths on your site — e.g., if security permissions in your organisation compartmentalise write access to different directories.
We assume that if you have the permission to upload http://example.com/path/sitemap.xml, you also have permission to report metadata under http://example.com/path/.
All URLs listed in the Sitemap must reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it cannot include URLs from http://subdomain.example.com. If the Sitemap is located at http://www.example.com/myfolder/sitemap.xml, it cannot include URLs from http://www.example.com.
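The location rule can be checked mechanically before a URL is added. The sketch below is one possible reading of it: the function name in_sitemap_scope is hypothetical, and requiring the same protocol as well as the same host is an assumption that goes slightly beyond the wording above.

```python
from urllib.parse import urlsplit

def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """Check that page_url is on the Sitemap's host, at or below its directory."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Assumption: protocol and host must both match (host compared case-insensitively).
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # Directory of the Sitemap: everything up to and including the last "/".
    directory = sitemap.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(directory)
```

With the examples from the text, a Sitemap at http://www.example.com/myfolder/sitemap.xml accepts http://www.example.com/myfolder/page.html but rejects http://www.example.com/.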
Q: How big can my Sitemap be?
Sitemaps should be no larger than 50MB (52,428,800 bytes) and can contain a maximum of 50,000 URLs. These limits help to ensure that your web server does not get bogged down serving very large files. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 50MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a file size of 50MB. A Sitemap index file can include up to 50,000 Sitemaps and must not exceed 50MB (52,428,800 bytes). You can also use gzip to compress your Sitemaps.
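One way to stay within these limits is to split the URL list into groups of at most 50,000 and check the byte size of each rendered file. The sketch below assumes URLs are plain strings; chunk_urls and render_urlset are hypothetical helper names, and only the required loc element is written.

```python
from itertools import islice
from typing import Iterable, Iterator, List
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50MB
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def chunk_urls(urls: Iterable[str], size: int = MAX_URLS) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` URLs."""
    iterator = iter(urls)
    while chunk := list(islice(iterator, size)):
        yield chunk

def render_urlset(urls: List[str]) -> bytes:
    """Render one Sitemap file as UTF-8 bytes and enforce the size limit."""
    entries = "".join(
        f"<url><loc>{escape(url, QUOTE_ENTITIES)}</loc></url>\n" for url in urls
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    ).encode("utf-8")
    if len(document) > MAX_BYTES:
        raise ValueError("Sitemap exceeds 50MB; split it into smaller files")
    return document
```

Each chunk becomes one Sitemap file, and the resulting file locations are then listed in a Sitemap index file.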
Q: My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
You can list the URLs that change frequently in a small number of Sitemaps and then use the lastmod tag in your Sitemap index file to identify those Sitemap files. Search engines can then incrementally crawl only the changed Sitemaps.
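A Sitemap index carrying lastmod values could be generated as sketched below. The function name render_sitemap_index and the two file names are hypothetical; the lastmod strings are assumed to be valid W3C Datetime values already.

```python
from xml.sax.saxutils import escape

def render_sitemap_index(sitemaps):
    """Render a Sitemap index from (location, lastmod) pairs."""
    entries = "".join(
        f"<sitemap><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></sitemap>\n"
        for loc, lastmod in sitemaps
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</sitemapindex>\n"
    )

index_xml = render_sitemap_index([
    # Frequently changing URLs live in a small file with a fresh lastmod.
    ("http://www.example.com/sitemap-recent.xml", "2004-10-26T08:56:39+00:00"),
    # Rarely changing URLs keep an old lastmod and need not be re-fetched.
    ("http://www.example.com/sitemap-archive.xml", "2004-01-01"),
])
```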
Q: What do I do after I create my Sitemap?
Once you have created your Sitemap, let search engines know about it by submitting directly to them, pinging them, or adding the Sitemap location to your robots.txt file.
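For the robots.txt option, the Sitemap location is given on a line of the form "Sitemap: " followed by the full URL. A small sketch that appends such a line to existing robots.txt content follows; add_sitemap_to_robots is a hypothetical helper and the robots.txt content is only an example.

```python
def add_sitemap_to_robots(robots_txt: str, sitemap_url: str) -> str:
    """Append a Sitemap line to robots.txt content unless it is already present."""
    directive = f"Sitemap: {sitemap_url}"
    lines = robots_txt.splitlines()
    if any(line.strip() == directive for line in lines):
        return robots_txt  # nothing to do
    return "\n".join(lines + [directive]) + "\n"

updated = add_sitemap_to_robots(
    "User-agent: *\nDisallow:\n",
    "http://www.example.com/sitemap.xml",
)
```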
Q: Do URLs in the Sitemap need to be completely specified?
Yes. You need to include the protocol (for instance, http) in your URL. You also need to include a trailing slash in your URL if your web server requires one. For example, http://www.example.com/ is a valid URL for a Sitemap, whereas www.example.com is not.
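A generator script can reject incompletely specified URLs before writing them. The check below is a sketch: is_fully_specified is a hypothetical name, and accepting only http and https as protocols is an assumption.

```python
from urllib.parse import urlsplit

def is_fully_specified(url: str) -> bool:
    """Return True if the URL carries both a protocol and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

is_fully_specified("http://www.example.com/")  # True
is_fully_specified("www.example.com")          # False
```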
Q: My site has both "http" and "https" versions of URLs. Do I need to list both?
No. Please list only one version of a URL in your Sitemaps. Including multiple versions of URLs may result in incomplete crawling of your site.
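If your URL list is collected from sources that mix both versions, it could be reduced to one version per URL as sketched here. The name single_scheme and the default choice of https are assumptions; use whichever version your site actually serves and your Sitemap location uses.

```python
from urllib.parse import urlsplit, urlunsplit

def single_scheme(urls, preferred="https"):
    """Rewrite http/https URLs to one scheme and drop duplicates, keeping order."""
    seen, result = set(), []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme in ("http", "https"):
            url = urlunsplit(parts._replace(scheme=preferred))
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```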
Q: URLs on my site have session IDs in them. Do I need to remove them?
Yes. Including session IDs in URLs may result in incomplete and redundant crawling of your site.
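When session IDs are carried as query parameters, they can be stripped while the Sitemap is generated. The following is a sketch only: the parameter names in SESSION_PARAMS are assumed examples and must be adapted to your site, and session IDs embedded in the path are not handled.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; replace with the ones your site actually uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def strip_session_ids(url: str) -> str:
    """Remove session-ID query parameters, keeping all other parameters in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

The cleaned URL still needs to be entity-escaped before it is written to the Sitemap.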
Q: Does position of a URL in a Sitemap influence its use?
No. The position of a URL in the Sitemap is not likely to affect how search engines use or regard it.
Q: Some of the pages on my site use frames. Should I include the frameset URLs or the URLs of the frame contents?
Please include both URLs.
Q: Can I zip my Sitemaps or do they have to be gzipped?
Please use gzip to compress your Sitemaps. Remember, your Sitemap must not be larger than 50MB (52,428,800 bytes), whether compressed or not.
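Compression with a size check might be scripted as follows. This sketch uses Python's standard gzip module; gzip_sitemap is a hypothetical helper, and checking both the uncompressed and the compressed size is one cautious reading of "whether compressed or not".

```python
import gzip

MAX_BYTES = 52_428_800  # 50MB

def gzip_sitemap(xml_bytes: bytes) -> bytes:
    """Gzip a Sitemap, refusing input or output larger than 50MB."""
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("Uncompressed Sitemap exceeds 50MB")
    compressed = gzip.compress(xml_bytes)
    if len(compressed) > MAX_BYTES:
        raise ValueError("Compressed Sitemap exceeds 50MB")
    return compressed
```

The returned bytes can be written to a file such as sitemap.xml.gz (a hypothetical name) and served in place of the uncompressed file.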
Q: Will the "priority" hint in the XML Sitemap change the ranking of my pages in search results?
The "priority" hint in your Sitemap only indicates the importance of a particular URL relative to other URLs on your own site and does not imply any effect on the ranking of your pages in search results.
Q: Is there an XML schema that I can validate my XML Sitemap against?
Yes. An XML schema is available for Sitemap files at http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd and a schema for Sitemap index files is available at http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd. You can also read more about validating your Sitemap.
Q: What if I have another question about using the protocol or submitting a Sitemap?
See the documentation available from each search engine for more details about submission and usage of Sitemaps.
Last Updated: 21 November 2016