Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - Guzzle: Parallel file download using Guzzle's Pool::batch() and `sink` option

You can execute HTTP requests in parallel using Guzzle's Pool::batch() method. It lets you set default options for all requests through the options key in the third parameter.

But what if I need different options for different requests in the pool? I would like to execute GET requests using a pool and stream each response to a different file on disk. There is a sink option for that, but how do I apply a different value of this option to each request?

1 Reply

Rastor's example is almost right, but it is incomplete if you want to pass "options" to the Pool() constructor.

It is missing the critical handling of the Pool options array that Guzzle expects from every yielded function.

The Guzzle docs say:

When a function is yielded by the iterator, the function is provided the "request_options" array that should be merged on top of any existing options, and the function MUST then return a wait-able promise.

Also, if you look at the Pool() code below that comment, you can see that Guzzle's Pool calls the callable and passes the Pool's "options" as the argument, precisely so that you can apply them to your request.

The correct precedence is

Per-request options > Pool options > Client defaults.
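
The ordering can be illustrated without Guzzle at all. The sketch below is only a simplified model: the option values are made up, and the real client applies its own defaults internally when the request is sent, so the only merge you write yourself is the pool/request one. With array_merge() and string keys, later arrays win, so the highest-priority array goes last.

```php
<?php
// Simplified model of option precedence; all values are hypothetical.
$clientDefaults = ['timeout' => 5.0, 'verify' => true];                  // lowest priority
$poolOpts       = ['timeout' => 30.0, 'sink' => '/tmp/pool-default.bin'];
$reqOpts        = ['sink' => '/tmp/0.jpg'];                              // highest priority

// For string keys, array_merge() lets later arrays overwrite earlier ones.
$effective = array_merge($clientDefaults, $poolOpts, $reqOpts);

var_export($effective);
```

Here the effective timeout comes from the pool (30.0), verify comes from the client defaults, and sink comes from the request itself.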

If you don't merge the Pool() object's options array into your request options, you will end up with hard-to-spot bugs. For example, with new Pool($client, $requests(100), ['options'=>['timeout'=>30.0]]); the timeout is never applied: a closure that ignores its argument does not merge the pool options and therefore simply discards them.
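
To see why the options get discarded, it helps to mimic what the Pool does with each yielded closure. The loop below is a stand-in written for illustration, not Guzzle's actual code: runFakePool and both closures are hypothetical, and the closures return the options they would have sent instead of returning a promise.

```php
<?php
// Hypothetical stand-in for the Pool: call every yielded closure with the
// pool-level options and collect whatever options the closure ends up with.
function runFakePool(iterable $requests, array $poolOpts): array
{
    $seen = [];
    foreach ($requests as $key => $fn) {
        $seen[$key] = $fn($poolOpts);
    }
    return $seen;
}

$requests = function () {
    // Ignores its argument, so the pool options never reach the request.
    yield 'ignores' => function () {
        return ['sink' => '/tmp/0.jpg'];
    };
    // Merges the pool options underneath its own per-request options.
    yield 'merges' => function (array $poolOpts) {
        return array_merge($poolOpts, ['sink' => '/tmp/1.jpg']);
    };
};

$seen = runFakePool($requests(), ['timeout' => 30.0]);

var_dump(isset($seen['ignores']['timeout'])); // bool(false)
var_dump($seen['merges']['timeout']);         // float(30)
```

The first closure matches the shape of the simpler example without pool support: it works, but anything passed under 'options' silently disappears.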

So here is the correct code with support for Pool() options:

<?php

require 'vendor/autoload.php'; // assumes Guzzle was installed via Composer

use GuzzleHttp\Client;
use GuzzleHttp\Pool;

$client = new Client();

$requests = function ($total) use ($client) {
    for ($i = 0; $i < $total; $i++) {
        $url = "http://domain.com/picture/{$i}.jpg";
        $filepath = "/tmp/{$i}.jpg";

        yield function($poolOpts) use ($client, $url, $filepath) {
            /** Apply options as follows:
             * Client() defaults are given the lowest priority
             * (they're used for any values you don't specify on
             * the request or the pool). The Pool() "options"
             * override the Client defaults. And the per-request
             * options ($reqOpts) override everything (both the
             * Pool and the Client defaults).
             * In short: Per-Request > Pool Defaults > Client Defaults.
             */
            $reqOpts = [
                'sink' => $filepath
            ];
            if (is_array($poolOpts) && count($poolOpts) > 0) {
                $reqOpts = array_merge($poolOpts, $reqOpts); // req > pool
            }

            return $client->getAsync($url, $reqOpts);
        };
    }
};

$pool = new Pool($client, $requests(100));

// Start the transfers and block until all of them have finished.
$pool->promise()->wait();

Note, however, that you don't have to support the Pool() options if you know that you will never pass any options to the Pool() constructor. In that case, you can just follow the example in the official Guzzle docs.

The official example looks as follows:

// Using a closure that will return a promise once the pool calls the closure.
$client = new Client();

$requests = function ($total) use ($client) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    for ($i = 0; $i < $total; $i++) {
        yield function() use ($client, $uri) {
            return $client->getAsync($uri);
        };
    }
};

$pool = new Pool($client, $requests(100));
