Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - Linux not respecting SCHED_FIFO priority ? ( normal or GDB execution )

TL;DR

On multiprocessor/multicore machines, more than one real-time SCHED_FIFO thread may be scheduled at the same time, each on a different execution unit. So a thread with priority 60 and a thread with priority 40 may run simultaneously on two different cores.

This may be counter-intuitive, especially when simulating embedded systems that (as is still often the case today) run on single-core processors and rely on strict priority execution.

See my other answer in this post for a summary.


Original problem description

Even with very simple code, I have difficulty making Linux respect the priority of my threads under the SCHED_FIFO scheduling policy.

  • See MCVE at the end of the question.
  • See modified MCVE in answer

This situation comes from the need to simulate embedded code on a Linux PC in order to perform integration tests.

The main thread, with FIFO priority 10, launches the divisor and ratio threads.

The divisor thread should get priority 2 so that the ratio thread, with priority 1, will not evaluate a/b before b gets a decent value (this is a completely hypothetical scenario for the MCVE only, not a real-life case with semaphores or condition variables).

Potential prerequisite: you need to be root or, BETTER, to setcap the program so that it can change the scheduling policy and priority.

sudo setcap cap_sys_nice+ep main

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ getcap main
main = cap_sys_nice+ep
  • First experiments were done in a VirtualBox environment with 2 vCPUs (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git), where the code behaviour was almost OK under normal execution but NOK under GDB.

  • Other experiments on native Ubuntu 20.04 show very frequent NOK behaviour even in normal execution, with an I3-1005 2C/4T (gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1).

Compile basically:

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ g++ main.cc -o main -pthread

Normal execution is sometimes OK, sometimes not, if there is no root or no setcap:

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ ./main
Problem with setschedparam: Operation not permitted(1)  <<-- err msg if no root or setcap
Result: 0.333333 or Result: Inf                         <<-- 1/3 or div by 0

Normal execution OK (e.g. with setcap)

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ ./main
Result: 0.333333

Now if you want to debug this program, you get the error message again.

(gdb) run
Starting program: /home/johndoe/Code/gdb_sched_fifo/main 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f929a6a9700 (LWP 2633)]
Problem with setschedparam: Operation not permitted(1)     <<--- ERROR MSG
Result: inf                                                <<--- DIV BY 0
[New Thread 0x7f9299ea8700 (LWP 2634)]
[Thread 0x7f929a6a9700 (LWP 2633) exited]
[Thread 0x7f9299ea8700 (LWP 2634) exited]
[Inferior 1 (process 2629) exited normally]

This is explained in the question "gdb appears to ignore executable capabilities" (almost all answers may be relevant).

So in my case I did

  • sudo setcap cap_sys_nice+ep /usr/bin/gdb
  • create a ~/.gdbinit with set startup-with-shell off

And as a result I got:

(gdb) run
Starting program: /home/johndoe/Code/gdb_sched_fifo/main 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6e85700 (LWP 2691)]
Result: inf                              <<-- NO ERR MSG but DIV BY 0 
[New Thread 0x7ffff6684700 (LWP 2692)]
[Thread 0x7ffff6e85700 (LWP 2691) exited]
[Thread 0x7ffff6684700 (LWP 2692) exited]
[Inferior 1 (process 2687) exited normally]
(gdb) 

So conclusion and question

  • I thought the only problem came from GDB
  • Testing on another (non-virtual) target showed even worse results under normal execution

I saw other questions related to RT SCHED_FIFO not being respected, but I find that their answers reach either no conclusion or an unclear one. My MCVE is also much smaller, with fewer potential side effects.

Linux SCHED_FIFO not respecting thread priorities

SCHED_FIFO higher priority thread is getting preempted by the SCHED_FIFO lower priority thread?

Comments brought some pieces of an answer, but I am still not convinced ... (... it should work like this)

The MCVE:

#include <chrono>
#include <cstring>
#include <iostream>
#include <thread>

#include <pthread.h>
#include <sched.h>

// Plain shared doubles with no synchronization between the threads.
double a = 1.0;
double b = 0.0;

void ratio(void)
{
    struct sched_param param;
    param.sched_priority = 1;
    // pthread_setschedparam returns the error number; it does not set errno.
    int ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if ( 0 != ret )
        std::cout << "Problem with setschedparam: " << std::strerror(ret) << '(' << ret << ')' << "\n" << std::flush;

    std::cout << "Result: " << a/b << "\n" << std::flush;
}

void divisor(void)
{
    struct sched_param param;
    param.sched_priority = 2;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

    b = 3.0;

    std::this_thread::sleep_for(std::chrono::milliseconds(2000u));
}


int main()
{
    struct sched_param param;
    param.sched_priority = 10;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

    std::thread thr_ratio(ratio);
    std::thread thr_divisor(divisor);

    thr_ratio.join();
    thr_divisor.join();

    return 0;
}

1 Reply


There are a few things obviously wrong with your MCVE:

  1. You have a data race on b, i.e. undefined behavior, so anything can happen.

  2. You are expecting that the divisor thread will have finished its pthread_setschedparam call before the ratio thread gets to computing the ratio.

    But there is absolutely no guarantee that the first thread will not run to completion long before the second thread is even created.

    Indeed that is what's likely happening under GDB: it must trap thread creation and destruction events in order to keep track of all the threads, and so thread creation under GDB is significantly slower than outside of it.

To fix the second problem, add a counting semaphore, and have both threads rendezvous after each has executed its pthread_setschedparam call.

