
filesystems - How can I use an SD card for logging 16-bit data at 48 ksamples/s?

Background

My board incorporates an STM32 microcontroller with an SD/MMC card on SPI and samples analogue data at 48 ksamples/s. I am using the Keil Real-time Library RTX kernel, and ELM FatFs.

I have a high-priority task that captures analogue data via DMA in blocks of 40 samples (40 x 16 bit); the data is passed via a queue of length 128 (about 107 ms of sample buffering) to a second, low-priority task that collates sample blocks into a 2560-byte buffer (a multiple of both the 512-byte SD sector size and the 40-sample block size). When this buffer is full (32 blocks, or approximately 27 ms of data), it is written to the file system.
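For illustration, the collation side looks roughly like the sketch below. The queue_get() call is a hypothetical stand-in for the RTX queue between the two tasks; f_write() is the standard ELM FatFs call. This is a sketch of the structure described above, not my exact code.

```c
#include <stdint.h>
#include <string.h>
#include "ff.h"

#define SAMPLES_PER_BLOCK  40u
#define BLOCK_BYTES        (SAMPLES_PER_BLOCK * sizeof(uint16_t))  /* 80 bytes            */
#define WRITE_BUF_BYTES    2560u                                   /* 5 sectors, 32 blocks */

extern const uint8_t *queue_get(void);   /* hypothetical: blocks on the inter-task queue */

static uint8_t write_buf[WRITE_BUF_BYTES];

/* Low-priority collation task: gather 32 DMA blocks, then write 5 sectors at once. */
void logger_task(FIL *file)
{
    UINT written;
    size_t fill = 0;

    for (;;) {
        const uint8_t *block = queue_get();           /* wait for a 40-sample block */
        memcpy(&write_buf[fill], block, BLOCK_BYTES);
        fill += BLOCK_BYTES;

        if (fill == WRITE_BUF_BYTES) {                /* buffer full: 2560 bytes */
            f_write(file, write_buf, WRITE_BUF_BYTES, &written);
            fill = 0;
        }
    }
}
```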

Observation

By instrumenting the code, I can see that the data is written every 32 blocks and that each write takes about 6 ms. This is sustained until (on FAT16) the file size reaches 1 MB, when the write operation takes 440 ms, by which time the queue fills and logging is aborted. If I format the card as FAT32, the file size before the 'long-write' event is 4 MB.

The fact that the file size at which this occurs changes between FAT16 and FAT32 suggests to me that it is not a limitation of the card but rather something that the file system does at the 1 MB or 4 MB boundaries that takes additional time.

It also appears that my tasks are being scheduled in a timely manner, and that the time is consumed in the ELM FatFs code only at the 1 MB (or 4 MB for FAT32) boundary.

The question

Is there an explanation or a solution? Is it a FAT issue, or is it perhaps specific to ELM's FatFs code?

I have considered using multiple files, but in my experience FAT does not handle large numbers of files in a single directory very well, and that approach would likely fail in the same way. Not using a file system at all and writing to the card raw would be a possibility, but ideally I'd like to be able to read the data on a PC with standard drivers and no special software.

It occurred to me to try compiler optimisations to get the write time down; this seems to have an effect, but the write times became much more variable. At -O2 I did get an 8 MB file, but the results were inconsistent. I am now not sure whether there is a direct correlation between the file size and the point at which it fails; I have seen it fail in this way at various file lengths on no particular boundary. Maybe it is a card performance issue.

I further instrumented the code and applied a divide-and-conquer approach. The observation below probably renders the question obsolete, and the previous observations are erroneous or red herrings.

I finally narrowed it down to an instance of a multi-sector write (CMD25) where occasionally the "wait ready" polling of the card takes 174 ms for the first three sectors out of a block of five. The wait-ready timeout is set to 500 ms, so it will happily busy-wait for that long. Using CMD24 (single-sector write) iteratively is much slower in the general case, at around 140 ms per sector, rather than just occasionally.
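The busy-wait in question is the usual SPI-mode polling for the card to release the DO line between sectors. A simplified version, modelled on the wait_ready() helper in ChaN's sample SPI MMC drivers (the names xchg_spi and Timer2 are illustrative, not from my code), looks like this:

```c
extern uint8_t xchg_spi(uint8_t d);      /* low-level SPI byte exchange (driver-specific)    */
extern volatile uint32_t Timer2;         /* 1 ms down-counter maintained by a tick interrupt */

/* Returns 1 when the card releases DO (reads 0xFF), 0 on timeout. */
static int wait_ready(uint32_t timeout_ms)
{
    uint8_t d;

    Timer2 = timeout_ms;
    do {
        d = xchg_spi(0xFF);              /* card holds DO low while programming flash */
    } while (d != 0xFF && Timer2);       /* spin until ready or the timer expires     */

    return (d == 0xFF);
}
```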

So it seems to be a behaviour of the card after all. I shall endeavour to try a range of SD and MMC cards.


1 Reply


The first thing to try could be quite easy: increase the queue depth to 640. That would give you about 533 ms of buffering and should ride out at least this particular file-system event.
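If the queue is an RL-RTX mailbox (an assumption on my part), the depth is fixed in the declaration macro, so the change is a one-liner; bear in mind that 640 outstanding 80-byte blocks also implies roughly 50 KB of sample storage somewhere:

```c
#include <RTL.h>

/* Deeper inter-task mailbox: was 128 (~107 ms), 640 gives ~533 ms of slack. */
os_mbx_declare(sample_mbx, 640);

void logging_init(void)
{
    os_mbx_init(sample_mbx, sizeof(sample_mbx));
}
```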

The second thing to look at is the configuration of ELM FatFs. Many embedded file systems are very stingy with buffer usage by default. I've seen one that used a single 512-byte block buffer for all operations, and it crawled through certain file-system transactions. We gave it a couple of kilobytes and it became orders of magnitude faster.
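For FatFs specifically, the knobs live in ffconf.h; as far as I recall it caches at most one sector per file, so the bigger win is handing f_write() sector-aligned, multi-sector chunks (as the 2560-byte buffer already does), which it passes straight through to the media driver. Option names vary between releases (older copies use the _FS_TINY / _MAX_SS spellings), so treat the following purely as an illustration:

```c
/* In ffconf.h -- check the spelling against your FatFs version. */
#define FF_FS_TINY   0      /* 0: each open file gets its own 512-byte sector buffer */
#define FF_MAX_SS    512    /* SD/MMC cards use 512-byte sectors                     */
```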

Both of the above are dependent on whether you have more RAM available, of course.

A third option would be to preallocate a huge file and then just overwrite the data during data collection. That would eliminate a number of expensive cluster allocation and FAT manipulation operations.
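The usual way to do that with FatFs is to open the file for writing and seek past the end, which forces the cluster allocation up front; later writes then overwrite existing clusters instead of growing the chain. A minimal sketch using standard FatFs calls (the function name is mine):

```c
#include "ff.h"

/* Preallocate 'bytes' of log file at 'path', leaving the file open and rewound. */
FRESULT preallocate_log(FIL *fp, const char *path, DWORD bytes)
{
    FRESULT res = f_open(fp, path, FA_WRITE | FA_CREATE_ALWAYS);
    if (res != FR_OK)
        return res;

    res = f_lseek(fp, bytes);                 /* extend the file: allocates clusters */
    if (res == FR_OK && f_tell(fp) != bytes)
        res = FR_DISK_ERR;                    /* volume too small to preallocate     */

    if (res == FR_OK)
        res = f_lseek(fp, 0);                 /* rewind, ready for logging           */
    if (res == FR_OK)
        res = f_sync(fp);                     /* commit the new size and FAT now     */
    return res;
}
```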

Since compiler optimisation affected this, you must also consider the possibility that it is a multi-threading issue. Are there other threads running that could disturb the lower-priority task that reads the queue and writes the card? You should also try changing the buffering to something other than a multiple of the sample size and the flash block size, in case you are hitting some kind of system resonance.

