Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
468 views
in Technique[技术] by (71.8m points)

Use Google Sheets ImportXML with XPath to import Amazon product title

I want to get the Product title from an Amazon product page in Google Sheets.

I have searched and tried different approaches and came up with this

=ImportXML("https://www.amazon.com/dp/B01MSR8J29","//*[@id='productTitle']")

I want to get the title: Army Flag Shirt: Become Brothers Army TShirt

But I get Error, Imported content is empty.

Demo Sheet

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Much of HTML isn't valid XML, in particular Amazon's pages are not valid XML. So, importXML fails on them.

You can use an Apps Script via a custom function as follows (remove the space before "amazon", it's here to prevent SO from rewriting the URL):

=producttitle("https://www. amazon.com/dp/B01MSR8J29")  

returns "Army Flag Shirt: Become Brothers Army TShirt", provided that the custom function is entered in Script Editor as follows:

function productTitle(url) {   
  var content = UrlFetchApp.fetch(url).getContentText();
  var match = content.match(/<span id="productTitle".*>([^<]*)</span>/);
  return match && match [1] ? match[1] : 'Title not found';
}

Here, the first line gets the source of the page; then a regex extracts the item title.

You will find a similar post here, including the question of whether this activity is compliant with Amazon's Terms of Services.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...