This is a very good question regarding threading and its link to the real work, meaning the available physical CPU(s) and its cores and hyperthreads.
- Multiple threads might allow you to do things in parallel, if your CPU has more than one core available. So in an ideal world, e.g. calulating some primes, might be 4 times faster using 4 threads, if your CPU has 4 cores available and your algorithm work really parallel.
- If you start more threads as cores are available, the thread management of your OS will spend more and more time in Thread-Switches and in such your effiency using your CPU(s) becomes worse.
- If the compiler, CPU cache and/or runtime realized that you run more than one thread, accessing the same data-area in memory, is operates in a different optimization mode: As long as the compile/runtime is sure that only one thread access the data, is can avoid writing data out to extenral RAM too often and might efficently use the L1 cache of your CPU. If not: Is has to activate semaphores and also flush cached data more often from L1/L2 cache to RAM.
So my lessons learned from highly parrallel multithreading have been:
- If possible use single threaded, shared-nothing processes to be more efficient
- If threads are required, decouple the shared data access as much as possible
- Don't try to allocate more loaded worker threads than available cores if possible
Here a small programm (javafx) to play with. It:
- Allocates a byte array of 100.000.000 size, filled with random bytes
- Provides a method, counting the number of bits set in this array
- The method allow to count every 'nth' bytes bits
- count(0,1) will count all bytes bits
- count(0,4) will count the 0', 4', 8' byte bits allowing a parallel interleaved counting
Using a MacPro (4 cores) results in:
- Running one thread, count(0,1) needs 1326ms to count all 399993625 bits
- Running two threads, count(0,2) and count(1,2) in parallel needs 920ms
- Running four threads, needs 618ms
- Running eight threads, needs 631ms
Changing the way to count, e.g. incrementing a commonly shared integer (AtomicInteger or synchronized) will dramatically change the performance of many threads.
public class MulithreadingEffects extends Application {
static class ParallelProgressBar extends ProgressBar {
AtomicInteger myDoneCount = new AtomicInteger();
int myTotalCount;
Timeline myWhatcher = new Timeline(new KeyFrame(Duration.millis(10), e -> update()));
BooleanProperty running = new SimpleBooleanProperty(false);
public void update() {
setProgress(1.0*myDoneCount.get()/myTotalCount);
if (myDoneCount.get() >= myTotalCount) {
myWhatcher.stop();
myTotalCount = 0;
running.set(false);
}
}
public boolean isRunning() { return myTotalCount > 0; }
public BooleanProperty runningProperty() { return running; }
public void start(int totalCount) {
myDoneCount.set(0);
myTotalCount = totalCount;
setProgress(0.0);
myWhatcher.setCycleCount(Timeline.INDEFINITE);
myWhatcher.play();
running.set(true);
}
public void add(int n) {
myDoneCount.addAndGet(n);
}
}
int mySize = 100000000;
byte[] inData = new byte[mySize];
ParallelProgressBar globalProgressBar = new ParallelProgressBar();
BooleanProperty iamReady = new SimpleBooleanProperty(false);
AtomicInteger myCounter = new AtomicInteger(0);
void count(int start, int step) {
new Thread(""+start){
public void run() {
int count = 0;
int loops = 0;
for (int i = start; i < mySize; i+=step) {
for (int m = 0x80; m > 0; m >>=1) {
if ((inData[i] & m) > 0) count++;
}
if (loops++ > 99) {
globalProgressBar.add(loops);
loops = 0;
}
}
myCounter.addAndGet(count);
globalProgressBar.add(loops);
}
}.start();
}
void pcount(Label result, int n) {
result.setText("("+n+")");
globalProgressBar.start(mySize);
long start = System.currentTimeMillis();
myCounter.set(0);
globalProgressBar.runningProperty().addListener((p,o,v) -> {
if (!v) {
long ms = System.currentTimeMillis()-start;
result.setText(""+ms+" ms ("+myCounter.get()+")");
}
});
for (int t = 0; t < n; t++) count(t, n);
}
void testParallel(VBox box) {
HBox hbox = new HBox();
Label result = new Label("-");
for (int i : new int[]{1, 2, 4, 8}) {
Button run = new Button(""+i);
run.setOnAction( e -> {
if (globalProgressBar.isRunning()) return;
pcount(result, i);
});
hbox.getChildren().add(run);
}
hbox.getChildren().addAll(result);
box.getChildren().addAll(globalProgressBar, hbox);
}
@Override
public void start(Stage primaryStage) throws Exception {
primaryStage.setTitle("ProgressBar's");
globalProgressBar.start(mySize);
new Thread("Prepare"){
public void run() {
iamReady.set(false);
Random random = new Random();
random.setSeed(4711);
for (int i = 0; i < mySize; i++) {
inData[i] = (byte)random.nextInt(256);
globalProgressBar.add(1);
}
iamReady.set(true);
}
}.start();
VBox box = new VBox();
Scene scene = new Scene(box,400,80,Color.WHITE);
primaryStage.setScene(scene);
testParallel(box);
GUIHelper.allowImageDrag(box);
primaryStage.show();
}
public static void main(String[] args) { launch(args); }
}
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…