Welcome To Ask or Share your Answers For Others

Use Google Sheets ImportXML with XPath to import Amazon product title

1 Reply

replied Jan 31, 2022 by 深蓝 (71.8m points)

Much HTML is not valid XML, and Amazon's pages in particular are not, so ImportXML fails on them.

You can use Apps Script through a custom function instead (remove the space before "amazon"; it is there only to prevent SO from rewriting the URL):

=producttitle("https://www. amazon.com/dp/B01MSR8J29")

returns "Army Flag Shirt: Become Brothers Army TShirt", provided that the custom function is entered in the Script Editor as follows:

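function productTitle(url) {
  // Fetch the raw HTML source of the product page.
  var content = UrlFetchApp.fetch(url).getContentText();
  // Capture the text inside <span id="productTitle" ...>; the "/" in the closing tag must be escaped.
  var match = content.match(/<span id="productTitle"[^>]*>([^<]*)<\/span>/);
  // Trim any whitespace surrounding the title in the page source.
  return match && match[1] ? match[1].trim() : 'Title not found';
}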

Here, the first line gets the source of the page; then a regex extracts the item title.
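The extraction step can be tried out without fetching anything. The following is a minimal sketch, not part of the original answer: extractProductTitle is a hypothetical helper name, and sampleHtml is made-up markup that is only assumed to resemble the relevant part of the product page.

```javascript
// Hypothetical helper: pulls the title out of an HTML string so the
// regex can be checked without a network call.
function extractProductTitle(html) {
  // Match the opening span carrying id="productTitle" (other attributes
  // are allowed), then capture everything up to the next "<".
  var match = html.match(/<span[^>]*\bid="productTitle"[^>]*>([^<]*)<\/span>/);
  // Trim, because the markup may pad the title with spaces and newlines.
  return match && match[1].trim() ? match[1].trim() : 'Title not found';
}

// Made-up sample markup (an assumption, not copied from a real page).
var sampleHtml =
  '<div><span id="productTitle" class="a-size-large">\n' +
  '  Army Flag Shirt: Become Brothers Army TShirt\n' +
  '</span></div>';

var sampleTitle = extractProductTitle(sampleHtml);
```

Keeping the regex in its own function means the custom function only has to pass it the fetched page source, and the pattern can be adjusted against saved HTML if the page layout changes.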

You will find a similar post here, which also covers the question of whether this activity complies with Amazon's Terms of Service.
