Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


javascript - Number of Web Workers Limit

PROBLEM

I've discovered that there is a limit on the number of Web Workers that can be spawned by a browser.

Example

main HTML / JavaScript

<script type="text/javascript">
$(document).ready(function(){
    var workers = new Array();
    var worker_index = 0;
    for (var i=0; i < 25; i++) {
        workers[worker_index] = new Worker('test.worker.js');
        workers[worker_index].onmessage = function(event) {
            $("#debug").append('worker.onmessage i = ' + event.data + "<br>");
        };
        workers[worker_index].postMessage(i); // start the worker.      

        worker_index++;
    }   
});
</script>
</head>
<body>
<div id="debug">
</div>

test.worker.js

self.onmessage = function(event) {
    var i = event.data; 

    self.postMessage(i);
};

This will generate only 20 output lines in the container when using Firefox (version 14.0.1, Windows 7).

QUESTION

Is there a way around this? The only two ideas I can think of are:

1) Daisy chaining the web workers, i.e., making each web worker spawn the next one

Example:

<script type="text/javascript">
$(document).ready(function(){
    createWorker(0);
});

function createWorker(i) {

    var worker = new Worker('test.worker.js');
    worker.onmessage = function(event) {
        var index = event.data;

        $("#debug").append('worker.onmessage i = ' + index + "<br>");

        if ( index < 25) {
            index++;
            createWorker(index);
        } 
    };
    worker.postMessage(i); // start the worker.
}
</script>
</head>
<body>
<div id="debug"></div>

2) Limit the number of web workers to a finite number and modify my code to work with that limit (i.e., share the work load across a finite number of web workers) - something like this: http://www.smartjava.org/content/html5-easily-parallelize-jobs-using-web-workers-and-threadpool

Unfortunately #1 doesn't seem to work (only a finite number of web workers will get spawned on a page load). Are there any other solutions I should consider?


1 Reply


Old question, let's revive it! (readies epinephrine)

I've been looking into using Web Workers to isolate 3rd party plugins, since web workers can't access the host page. I'll help you out with your methods, which I'm sure you've solved by now, but this is for teh internetz. Then I'll give some relevant information from my research.

Disclaimer: Where I reused your code, I've modified and cleaned it up to provide complete source code without jQuery, so that you and others can run it easily. I've also added a timer that alerts the time, in ms, taken to execute the code.

In all examples, we reference the following genericWorker.js file.

genericWorker.js

self.onmessage = function(event) {
    self.postMessage(event.data);
};

Method 1 (Linear Execution)

Your first method nearly works. It still fails because you never get rid of a worker once you've finished with it, so the same failure happens, just more slowly. To fix it, call worker.terminate() on the old worker before creating a new one, which removes the old one from memory. Note that this makes the application run much slower, since each worker must be created, run, and destroyed before the next one can start.

Linear.html

<!DOCTYPE html>
<html>
<head>
    <title>Linear</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate(); // release this worker before starting the next one
                if (index < totalWorkers) createWorker();
                else alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        createWorker();
    </script>
</body>
</html>

Method 2 (Thread Pool)

Using a thread pool should greatly increase running speed. Instead of using some library with complex lingo, let's simplify it. A thread pool just means having a fixed number of workers running simultaneously. We only need to modify a few lines of the linear example to get a multi-threaded one. The code below reads the number of logical cores from navigator.hardwareConcurrency (if your browser supports it) and otherwise defaults to 4. I found that this code ran about 6x faster than the original on my machine with 8 cores.

ThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate();
                if (index < totalWorkers) createWorker();
                else if(--maxWorkers === 0) alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) createWorker();
    </script>
</body>
</html>
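
One caveat with the pool above: if the browser reports more cores than there are tasks, the start-up loop posts more messages than totalWorkers allows for. Here is a minimal sketch of a guard. pickPoolSize is a hypothetical helper of my own naming, not something from the browser API, and the fallback of 4 simply mirrors the example above.

```javascript
// Hypothetical helper (not part of any browser API): decide how many
// workers to start for a given number of tasks.
function pickPoolSize(nav, taskCount, fallback) {
    // hardwareConcurrency is undefined in browsers that don't expose it
    var cores = (nav && nav.hardwareConcurrency) || fallback || 4;
    // never start more workers than there are tasks, and always at least one
    return Math.max(1, Math.min(cores, taskCount));
}
```

In ThreadPool.html this would replace the maxWorkers line with something like var maxWorkers = pickPoolSize(navigator, totalWorkers);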

Other Methods

Method 3 (Single worker, repeated task)

In your example, you're running the same worker script over and over again. I know you're probably simplifying a more complex use case, but some people viewing this will apply the same method when they could be using just one worker for all the tasks.

Essentially, we'll instantiate a worker, send data, wait for data, then repeat the send/wait steps until all data has been processed.

On my computer, this runs about twice as fast as the thread pool. That actually surprised me. I thought the thread pool's overhead would have made it even slower than half the speed.

RepeatedWorker.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Worker</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();
        var worker = new Worker('genericWorker.js');

        function runWorker() {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker();
                else {
                    alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        runWorker();
    </script>
</body>
</html>

Method 4 (Repeated Worker w/ Thread Pool)

Now, what if we combine the previous method with the thread pool method? Theoretically, it should run quicker than the previous. Interestingly, it runs at just about the same speed as the previous on my machine.

Maybe it's the extra overhead of passing the worker reference along on each call. Maybe it's the extra workers being terminated during execution (all but one worker will have been terminated before we take the time). Who knows. Finding this out is a job for another time.

RepeatedThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function runWorker(worker) {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker(worker);
                else {
                    if(--maxWorkers === 0) alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) runWorker(new Worker('genericWorker.js'));
    </script>
</body>
</html>
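
All four methods above feed workers from a shared counter. For real tasks you'd probably want a queue instead, so here is a rough sketch of the same reuse-the-workers idea with one. createPool and makeWorker are names I made up for illustration, and the sketch assumes every worker answers each message with exactly one reply, as genericWorker.js does.

```javascript
// Sketch of a reusable pool. `makeWorker` is a hypothetical factory supplied
// by the caller, e.g. function () { return new Worker('genericWorker.js'); }.
// Assumes each worker answers every message with exactly one reply.
function createPool(makeWorker, size) {
    var queue = [];   // tasks waiting for a free worker
    var idle = [];    // workers with nothing to do

    for (var i = 0; i < size; i++) idle.push(makeWorker());

    function dispatch() {
        // pair up free workers with waiting tasks until one list runs out
        while (idle.length > 0 && queue.length > 0) {
            start(idle.pop(), queue.shift());
        }
    }

    function start(worker, task) {
        worker.onmessage = function (event) {
            idle.push(worker);        // the worker is free again
            task.done(event.data);    // hand the result to the caller
            dispatch();               // pick up the next queued task, if any
        };
        worker.postMessage(task.data);
    }

    return {
        run: function (data, done) {
            queue.push({ data: data, done: done });
            dispatch();
        },
        close: function () {
            // terminates idle workers only; call it once all results are in
            while (idle.length > 0) idle.pop().terminate();
        }
    };
}
```

Usage would look something like pool.run(42, function (result) { ... }); for each task, followed by pool.close() when the last callback has fired. Workers are created once and never respawned, which is the point of Method 4.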

Now for some real world shtuff

Remember how I said I was using workers to isolate 3rd party plugins? These plugins have state to keep track of. I could start the plugins and hope users don't load so many that the application crashes, or I could keep track of the plugin state within my main thread and send that state back to the plugin if it needs to be reloaded. I like the second one better.

I had written out several more examples of stateful, stateless, and state-restore workers, but I'll spare you the agony and just do some brief explaining and some shorter snippets.

First-off, a simple stateful worker looks like this:

StatefulWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
    }
};

It does some action based on the message it receives and holds data internally. This is great. It allows for mah plugin devs to have full control over their plugins. The main app instantiates their plugin once, then will send messages for them to do some action.

The problem comes when we want to load several plugins at once. With a limit on the number of workers, we can't keep them all alive, so what can we do?

Let's think about a few solutions.

Solution 1 (Stateless)

Let's make these plugins stateless. Essentially, every time we want to have the plugin do something, our application should instantiate the plugin then send it data based on its old state.

data sent

{
    action: 'increment',
    value: 7
}

StatelessWorker.js

self.onmessage = function(e) {
    switch(e.data.action) {
        case 'increment':
            e.data.value++;
            break;
        case 'decrement':
            e.data.value--;
            break;
    }
    self.postMessage({
        value: e.data.value,
        i: e.data.i // passed through unchanged; undefined for the sample message above, which has no i
    });
};

This could work, but if we're dealing with a good amount of data it will start to seem like a less-than-perfect solution, since the whole state gets copied to the worker and back on every call. A similar solution would be to have several smaller workers for each plugin and send only a small amount of data to and from each, but I'm uneasy with that too.
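
For completeness, the main-thread side of the stateless approach might look roughly like this. callStateless and makeWorker are hypothetical names for illustration; the sketch assumes the worker replies once with an object carrying a value field, as StatelessWorker.js does.

```javascript
// Sketch of the main-thread side of a stateless plugin call.
// `makeWorker` is a hypothetical factory, e.g.
//   function () { return new Worker('StatelessWorker.js'); }
function callStateless(makeWorker, action, value, done) {
    var worker = makeWorker();          // fresh worker for every call
    worker.onmessage = function (event) {
        worker.terminate();             // free the worker slot right away
        done(event.data.value);         // the new state lives on the main thread
    };
    worker.postMessage({ action: action, value: value });
}
```

The application keeps the state itself, e.g. callStateless(factory, 'increment', state, function (v) { state = v; });, so no worker outlives a single request.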

Solution 2 (State Restore)

What if we try to keep the worker in memory as long as possible, but if we do lose it, we can restore its state? We can use some sort of scheduler to see what plugins the user has been using (and maybe some fancy algorithms to guess what the user will use in the future) and keep those in memory.

The cool part about this is that we aren't looking at one worker per core anymore. Since a loaded worker will be idle most of the time, we only need to worry about the memory it takes up. For a moderate number of workers (10 to 20 or so), this won't be substantial at all. We can keep the primary plugins loaded while the less frequently used ones get swapped out as needed. All the plugins will still need some sort of state restore.
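
I haven't written the real scheduler, but the simplest version of "keep the recently used ones in memory" is a least-recently-used cache. A rough sketch follows; createWorkerCache, makeWorker and onEvict are all hypothetical names, and plain LRU is my stand-in here for whatever fancier algorithm you end up using.

```javascript
// Sketch of a least-recently-used cache of plugin workers.
// `makeWorker(name)` and `onEvict(name, worker)` are hypothetical callbacks
// supplied by the application; `limit` (at least 1) is how many workers
// may stay alive at once.
function createWorkerCache(limit, makeWorker, onEvict) {
    var live = new Map();   // a Map iterates in insertion order: oldest first

    return {
        get: function (name) {
            var worker = live.get(name);
            if (worker) {
                live.delete(name);      // re-inserted below as most recent
            } else {
                if (live.size >= limit) {
                    var oldest = live.keys().next().value;
                    var victim = live.get(oldest);
                    live.delete(oldest);
                    if (onEvict) onEvict(oldest, victim);   // e.g. note that it needs a restore
                    victim.terminate();
                }
                worker = makeWorker(name);
            }
            live.set(name, worker);
            return worker;
        },
        names: function () { return Array.from(live.keys()); }
    };
}
```

Every time the app talks to a plugin it goes through cache.get(name), so frequently used plugins stay loaded and the one that has sat untouched longest is terminated when room is needed.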

Let's use the following worker and assume we either send 'increment', 'decrement', or an integer containing the state it's supposed to be at.

StateRestoreWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
        default:
            i = e.data;
    }
};
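
On the main thread, a wrapper for this worker could look something like the sketch below. createRestorablePlugin and makeWorker are hypothetical names; the sketch relies on messages to a single worker being delivered in the order they were posted, so the restore message always arrives before the action that follows it.

```javascript
// Sketch of a main-thread wrapper for StateRestoreWorker.js.
// `makeWorker` is a hypothetical factory, e.g.
//   function () { return new Worker('StateRestoreWorker.js'); }
function createRestorablePlugin(makeWorker, onState) {
    var worker = null;
    var lastState = 0;      // mirrors the worker's `i`; 0 matches its initial value

    function ensureWorker() {
        if (worker) return;
        worker = makeWorker();
        worker.onmessage = function (event) {
            lastState = event.data;         // remember every state the worker reports
            if (onState) onState(lastState);
        };
        worker.postMessage(lastState);      // an integer message means "restore to this state"
    }

    return {
        send: function (action) {           // 'increment' or 'decrement'
            ensureWorker();
            worker.postMessage(action);
        },
        unload: function () {               // drop the worker but keep the state
            if (worker) worker.terminate();
            worker = null;
        },
        state: function () { return lastState; }
    };
}
```

One thing to watch: if you unload while a reply is still in flight, that last update is lost, so a real version should wait for outstanding replies before terminating.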

These are all pretty simple examples, but I hope they helped you understand ways of using multiple workers efficiently! I'll most likely be writing a scheduler and optimizer for this stuff, but who knows when I'll get to that point.

Good luck, and happy coding!

