
powershell - How to implement a parallel jobs and queues system

I spent days trying to implement a parallel jobs and queues system, but I can't make it work. Here is the code without any of that implemented, plus an example of the CSV it reads from.

I'm sure this post can help other users in their projects.

Each user has his own PC, so the CSV file looks like:

pc1,user1
pc2,user2
pc800,user800 

CODE:

#Source File:
$inputCSV = '~\desktop\report.csv'
$csv = import-csv $inputCSV -Header PCName, User
echo $csv #debug

#Output File:
$report = "~\desktop\output.csv"

#---------------------------------------------------------------

#Define search:
$findSize = 40GB
Write-Host "Looking for Outlook files larger than $($findSize / 1GB) GB"

#count issues:
$issues = 0 

#---------------------------------------------------------------

foreach($item in $csv){

    if (Test-Connection -Quiet -count 1 -computer $($item.PCname)){

        $w7path = "\\$($item.PCname)\c$\users\$($item.User)\appdata\Local\microsoft\outlook"

        $xpPath = "\\$($item.PCname)\c$\Documents and Settings\$($item.User)\Local Settings\Application Data\Microsoft\Outlook"

            if(Test-Path $W7path){

                if(Get-ChildItem $w7path -Recurse -force -Include *.ost -ErrorAction "SilentlyContinue" | Where-Object {$_.Length -gt $findSize}){

                    $newLine =  "{0},{1},{2}" -f $($item.PCname),$($item.User),$w7path
                    $newLine |  add-content $report

                    $issues ++
                    Write-Host "Issue detected" #debug
                    }
            }

            elseif(Test-Path $xpPath){

                if(Get-ChildItem $xpPath -Recurse -force -Include *.ost -ErrorAction "SilentlyContinue" | Where-Object {$_.Length -gt $findSize}){

                    $newLine =  "{0},{1},{2}" -f $($item.PCname),$($item.User),$xpPath
                    $newLine |  add-content $report

                    $issues ++
                    Write-Host "Issue detected" #debug
                    }
            }

            else{
                write-host "Error! - bad path"
            }
    }

    else{
        write-host "Error! - no ping"
    }
}

Write-Host "All done! detected $issues issues"

1 Reply


Parallel data processing in PowerShell is not quite simple, especially with queueing. Try using an existing tool that already has this done. You may take a look at the module SplitPipeline. The cmdlet Split-Pipeline is designed for parallel processing of input data and supports queueing of input (see the parameter Load). For example, for 4 parallel pipelines with 10 input items processed at a time by each, the code will look like this:

$csv | Split-Pipeline -Count 4 -Load 10, 10 {process{
    <operate on input item $_>
}} | Out-File $outputReport

All you have to do is implement the code <operate on input item $_>. Parallel processing and queueing are done by this command.
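
For instance, a minimal operate block for the CSV above might just ping each machine and emit one CSV line per row; whatever the blocks output is collected from all pipelines and piped to Out-File (the property names PCName and User come from the question's Import-Csv -Header call):

$csv | Split-Pipeline -Count 4 -Load 10, 10 {process{
    # $_ is one CSV row; emit a line per row, the caller collects the output
    $alive = Test-Connection -Quiet -Count 1 -ComputerName $_.PCName
    "{0},{1},{2}" -f $_.PCName, $_.User, $alive
}} | Out-File $outputReport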


UPDATE for the updated question code. Here is prototype code with some remarks; they are important. Doing work in parallel is not the same as doing it directly, and there are some rules to follow.

$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
    # Tips
    # - Operate on the input object $_, i.e. $_.PCname and $_.User
    # - Use the imported variable $findSize
    # - Do not use Write-Host; use Write-Warning (for now)
    # - Do not count issues (for now). This is possible, but get the script
    #   working without it first.
    # - Do not write data to a file; doing that from several parallel
    #   pipelines is not trivial. Just output the data; it will be piped
    #   on to the report file.
    ...
}} | Set-Content $report
# output from all jobs is joined and written to the report file
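
Following those tips, a filled-in sketch might look like the code below. The search logic is lifted from the question code (the UNC paths are the question's Win7 and XP profile paths); issues are not counted yet, and nothing is written to a file from inside the jobs, the matching lines are simply output and collected into $report:

$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
    $pc   = $_.PCName
    $user = $_.User
    if (Test-Connection -Quiet -Count 1 -ComputerName $pc) {
        # candidate Outlook paths for Win7 and XP profiles, as in the question
        $w7path = "\\$pc\c$\users\$user\appdata\Local\microsoft\outlook"
        $xpPath = "\\$pc\c$\Documents and Settings\$user\Local Settings\Application Data\Microsoft\Outlook"
        $path = @($w7path, $xpPath) | Where-Object { Test-Path $_ } | Select-Object -First 1
        if ($path) {
            $big = Get-ChildItem $path -Recurse -Force -Include *.ost -ErrorAction SilentlyContinue |
                Where-Object { $_.Length -gt $findSize }
            if ($big) {
                # just emit the line; Set-Content $report collects output from all pipelines
                "{0},{1},{2}" -f $pc, $user, $path
            }
        }
        else { Write-Warning "$pc - bad path" }
    }
    else { Write-Warning "$pc - no ping" }
}} | Set-Content $report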

UPDATE: How to write progress information

SplitPipeline handled an 800-target CSV pretty well, amazing. Is there any way to let the user know the script is alive...? Scanning a big CSV can take about 20 minutes. Something like "in progress 25%", "50%", "75%"...

There are several options. The simplest is to invoke Split-Pipeline with the switch -Verbose; you will then get verbose messages about the progress and see that the script is alive.

Another simple option is to write and watch verbose messages from the jobs, e.g. Write-Verbose ... -Verbose, which writes messages even if Split-Pipeline is invoked without -Verbose.
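
For example (a sketch; the message text is made up, and ... stands for the same work block as in the prototype above), the first option is just the -Verbose switch on the Split-Pipeline call, and the second is a Write-Verbose call inside the block:

$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize -Verbose {process{
    Write-Verbose "processing $($_.PCName)" -Verbose  # shown even without -Verbose on Split-Pipeline
    ...
}} | Set-Content $report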

And another option is to use proper progress messages with Write-Progress. See the script Test-ProgressTotal.ps1, which also shows how to use a collector updated from jobs concurrently. You can use a similar technique for counting issues (the original question code does this) and show the total number of issues to the user when all is done.
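
A sketch of that idea, assuming a synchronized hashtable imported with -Variable as the concurrently updated collector (Test-ProgressTotal.ps1 may differ in details, and whether Write-Progress output is displayed from the parallel pipelines depends on the host):

# a collector that several pipelines can update at the same time
$collect = [hashtable]::Synchronized(@{Done = 0; Issues = 0})
$total = @($csv).Count

$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize, collect, total {process{
    # ... do the real work here; when an issue is found do: $collect.Issues++
    $collect.Done++  # not strictly atomic, but good enough for progress
    Write-Progress -Activity "Scanning PCs" -Status "$($collect.Done) of $total" `
        -PercentComplete (100 * $collect.Done / $total)
}} | Set-Content $report

Write-Host "All done! detected $($collect.Issues) issues"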

