
javascript - Crawling multiple URLs in a loop using Puppeteer

I have an array of URLs to scrape data from:

urls = ['url','url','url'...]

This is what I'm doing:

urls.map(async (url)=>{
  await page.goto(url);
  await page.waitForNavigation({ waitUntil: 'networkidle' });
})

This doesn't seem to wait for the pages to load; it visits all the URLs quite rapidly (I even tried using page.waitFor).

I wanted to know whether I'm doing something fundamentally wrong, or whether this type of usage is not advised/supported.


1 Reply

map, forEach, reduce, etc., do not wait for the asynchronous operation inside them before proceeding to the next element of the array they are iterating over.
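An async callback returns a promise the moment it is called, and map just collects those pending promises into an array without awaiting any of them. A minimal sketch of what the code in the question actually does (assuming the same urls array and page object):

const pending = urls.map(async (url) => {
    // each callback starts immediately, so all of the goto calls
    // race against each other on the same page object
    await page.goto(url);
});
// pending is an array of in-flight promises, none of them settled yet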

There are multiple ways to go through each item of an array sequentially while performing an asynchronous operation, but the easiest in this case is a plain for loop, inside which await does pause execution until each operation finishes.

const urls = [...]

// assumes this runs inside an async function with an open page
for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    // page.goto resolves only after the navigation has finished,
    // so no separate waitForNavigation call is needed afterwards
    await page.goto(url, { waitUntil: 'networkidle2' });
}

This will visit one URL after another, as you expect. If you are curious about iterating serially with async/await, you can have a peek at this answer: https://stackoverflow.com/a/24586168/791691
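As an aside, if you actually want the pages crawled concurrently rather than one after another, the map pattern from the question can work, but each callback needs its own page so the navigations don't compete for a single tab. A rough sketch, assuming a browser instance from puppeteer.launch() is in scope:

// one tab per URL; Promise.all resolves once every crawl has finished
await Promise.all(urls.map(async (url) => {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // ...scrape whatever you need here...
    await page.close();
}));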

