Let’s move on to our second article in this series that provides some practical examples of how to use Google Sheets features for SEO without being so dependent on certain tools.
If you read the first article, where I talk about using Google Sheets for technical SEO validations, I mentioned some examples of how we could apply the IMPORTXML formula, and one of them was extracting data from an XML (in addition to a standard HTML page).
In this case, the idea is the same. We specify which URL we want to get information from and in the next parameter, which XPath corresponds to that information. This will follow a structure similar to this formula in Google Sheets:
With just this structure, it already returns a list with all the URLs present in the specified XML Sitemap. But be careful, depending on the size of the file you query (up to around 1,000 URLs this structure handles well), some errors may occur and you might have to work around it using some script that makes this request progressively.
As I mentioned, you can make these same extractions through other SEO software or custom scripts, but the focus here is using Google Sheets. If you have a high volume of listed pages, software like Screaming Frog Spider might be the best alternative as it allows you to import URLs through a Sitemap URL.
XPath of in an XML Sitemap
The secret to extracting URLs lies in how you use XPath. When I researched to make this URL query, I found several examples, but the XPath that actually worked was: //[local-name() =’url’]/[local-name() =’loc’]”).
And applying it in a practical example:
Remember the single quotes tip from the other article? Well, don’t forget it here!
And to make it easier, I’ve left an example spreadsheet for you to make a copy and test with a real example.
Example spreadsheet with XPath to locate URLs from an XML Sitemap
Getting other Information
The good thing about the XPath from the previous example is that it can be easily adapted to get other information within an XML. For example, if you had the following code in an XML Sitemap:
And I wanted to get the value of the last modification date represented by
Easy, right? Just change loc to lastmod. Take advantage of the template I made available here and test it with other information from a sitemap on your site!
Other Ideas with XML
In the paragraphs above, I followed a line of reasoning based on a standard XML file that is commonly used in SEO. But nothing prevents you from applying this to other XML files such as:
- Index Sitemap.
- Google Shopping XML.
- XML from some aggregator like Buscapé.
- Map competitor URLs (many platforms leave the URLs of these files at a standard address or within robots.txt).
Other Scenarios where this is Useful
I’ve experienced some projects where there was great difficulty (or even no resource) in extracting certain reports with site URLs and often the site only had a Google Shopping XML. Which ended up being a lifesaver, as it comes with additional information and was very useful when I wanted to get some additional information about the URL (e.g., like the category with a friendly name).
You can also use it in migration cases, as a way to verify if all URLs are present in your redirect planning.
And regarding this example spreadsheet from this article, you can advance further and get information such as:
- Status code of pages in the XML (if it shows any error or redirect).
- Use another IMPORTXML to get other information within these pages (e.g., the title and description).
- Evaluate metrics related to the page (doing a VLOOKUP in a Google Analytics or Google Search Console report).
- This last one can help you a lot in prioritizing some activities and evaluating if there are any errors in your file.
So, what did you think of this application for IMPORTXML? Don’t forget to comment if this example was useful to you and share your experience if you’ve used it for any other case.