I think you're seeing some effects unique to the first frame. I repeated your experiment, with the further addition of forcing doRender = false
around line 244 to avoid the sleep calls used to manage the output frame rate. I see:
01-29 14:05:36.552 9115 9224 I Grafika : queueInputBuffer index/pts, 2,0
01-29 14:05:36.562 9115 9224 I Grafika : queueInputBuffer index/pts, 0,66655
01-29 14:05:36.572 9115 9224 I Grafika : queueInputBuffer index/pts, 3,133288
01-29 14:05:36.582 9115 9224 I Grafika : queueInputBuffer index/pts, 1,199955
01-29 14:05:36.602 9115 9224 I Grafika : dequeueOutputBuffer index/pts, 4,0
01-29 14:05:36.602 9115 9224 I Grafika : dequeueOutputBuffer index/pts, 3,66655
01-29 14:05:36.602 9115 9224 I Grafika : dequeueOutputBuffer index/pts, 2,133288
01-29 14:05:36.612 9115 9224 I Grafika : dequeueOutputBuffer index/pts, 4,199955
(Extraneous lines removed for clarity.) This confirms your results. Note that, while there was a 50ms lag between input and output for pts=0, the subsequent output frames were available almost instantly. The video I used was "camera-test.mp4" (720p camera output).
For insight into why this is happening, take a look at the other stuff in the log, and where it appears. Starting from the first queueInputBuffer
log line, count the number of logs that appear between that and the first dequeueOutputBuffer
line. I count about 60 lines of output from OMX-VDEC-1080P on mine. Now count how many OMX-VDEC lines appear after output buffers start appearing. I see none until the video ends.
The video decoder is clearly deferring some expensive initialization until data is available. So the next question is... how much data does it need? I added a 500ms sleep after submitting the second frame (pts==66633). The result: two frames submitted, 500ms pause, two frames submitted, big pile of OMX-VDEC logs. So it seems that the decoder wants several frames before it will start.
This suggests that we can reduce the start-up latency by feeding the first few frames in quickly. To test that, I changed TIMEOUT_USEC
to zero, so it'll respond quickly but burn CPU. New log output (your logs, no sleep, no rendering):
01-29 14:29:04.542 10560 10599 I Grafika : queueInputBuffer index/pts, 0,0
01-29 14:29:04.542 10560 10599 I Grafika : queueInputBuffer index/pts, 2,66633
01-29 14:29:04.542 10560 10599 I Grafika : queueInputBuffer index/pts, 3,133288
...
01-29 14:29:04.572 10560 10599 I Grafika : dequeueOutputBuffer index/pts, 4,0
01-29 14:29:04.572 10560 10599 I Grafika : dequeueOutputBuffer index/pts, 3,66633
01-29 14:29:04.572 10560 10599 I Grafika : dequeueOutputBuffer index/pts, 2,133288
By feeding the initial frames in quickly, we've reduced the initial latency from 50ms to 30ms.
(Notice how all the timestamps end in '2'? The timer used for the log times appears to be rounding to the nearest 10ms, so the actual time delta may be slightly different.)
The reason we're feeding the initial frames slowly is that we're trying to drain output from the decoder after each input buffer is submitted, waiting 10ms for output that never appears. My initial thought is that we want to want to wait for a timeout on either dequeueInputBuffer()
or dequeueOutputBuffer()
, but not both -- maybe use a timeout on input and a quick poll for output at first, then switch to a timeout on output when we run out of input to feed. (For that matter, the initial timeout for input could be -1, since we know nothing is going to happen until the first input buffer is queued.)
I don't know if there is a way to reduce the latency further.