I'd like to build a C pre-processor / compiler that allows functions to be collected from local and online sources. ie:
#fetch MP3FileBuilder http://scripts.com/MP3Builder.gz
#fetch IpodDeviceReader http://apple.com/modules/MP3Builder.gz
void mymodule_main() {
MP3FileBuilder(&some_data);
}
That's the easy part.
The hard part is I need a reliable way to "sandbox" the imported code from direct or unrestricted access to disk or system resources (including memory allocation and the stack). I want a way to safely run small snippets of untrusted C code (modules) without the overhead of putting them in separate process, VM or interpreter (a separate thread would be acceptable though).
REQUIREMENTS
- I'd need to put quotas on its access to data and resources including CPU time.
- I will block direct access to the standard libraries
- I want to stop malicious code that creates endless recursion
- I want to limit static and dynamic allocation to specific limits
- I want to catch all exceptions the module may raise (like divide by 0).
- Modules may only interact with other modules via core interfaces
- Modules may only interact with the system (I/O etc..) via core interfaces
- Modules must allow bit ops, maths, arrays, enums, loops and branching.
- Modules cannot use ASM
- I want to limit pointer and array access to memory reserved for the module (via a custom safe_malloc())
- Must support ANSI C or a subset (see below)
- The system must be lightweight and cross-platform (including embedded systems).
- The system must be GPL or LGPL compatible.
I'm happy to settle for a subset of C. I don't need things like templates or classes. I'm primarily interested in the things high-level languages don't do well like fast maths, bit operations, and the searching and processing of binary data.
It is not the intention that existing C code can be reused without modification to create a module. The intention is that modules would be required to conform to a set of rules and limitations designed to limit the module to basic logic and transformation operations (like a video transcode or compression operations for example).
The theoretical input to such a compiler/pre-processor would be a single ANSI C file (or safe subset) with a module_main function, NO includes or pre-processor directives, no ASM, It would allow loops, branching, function calls, pointer maths (restricted to a range allocated to the module), bit-shifting, bitfields, casts, enums, arrays, ints, floats, strings and maths. Anything else is optional.
EXAMPLE IMPLEMENTATION
Here's a pseudo-code snippet to explain this better. Here a module exceeds it's memory allocation quota and also creates infinite recursion.
buffer* transcodeToAVI_main( &in_buffer ) {
int buffer[1000000000]; // allocation exceeding quota
while(true) {} // infinite loop
return buffer;
}
Here's a transformed version where our preprocessor has added watchpoints to check for memory usage and recursion and wrapped the whole thing in an exception handler.
buffer* transcodeToAVI_main( &in_buffer ) {
try {
core_funcStart(__FILE__,__FUNC__); // tell core we're executing this function
buffer = core_newArray(1000000000, __FILE__, __FUNC__); // memory allocation from quota
while(true) {
core_checkLoop(__FILE__, __FUNC__, __LINE__) && break; // break loop on recursion limit
}
core_moduleEnd(__FILE__,__FUNC__);
} catch {
core_exceptionHandler(__FILE__, __FUNC__);
}
return buffer;
}
I realise performing these checks impact the module performance but I suspect it will still outperform high-level or VM languages for the tasks it is intended to solve. I'm not trying to stop modules doing dangerous things outright, I'm just trying to force those dangerous things to happen in a controlled way (like via user feedback). ie: "Module X has exceeded it's memory allocation, continue or abort?".
UPDATE
The best I've got so far is to use a custom compiler (Like a hacked TCC) with bounds checking and some custom function and looping code to catch recursions. I'd still like to hear thoughts on what else I need to check for or what solutions are out there. I imagine that removing ASM and checking pointers before use solves a lot of the concerns expressed in previous answers below. I added a bounty to pry some more feedback out of the SO community.
For the bounty I'm looking for:
- Details of potential exploits against the theoretical system defined above
- Possible optimisations over checking pointers on each access
- Experimental open-source implementations of the concepts (like Google Native Client)
- Solutions that support a wide range of OS and devices (no OS/hardware based solutions)
- Solutions that support the most C operations, or even C++ (if that's possible)
Extra credit for a method that can work with GCC (ie, a pre-processor or small GCC patch).
I'll also give consideration to anyone who can conclusively prove what I'm attempting cannot be done at all. You will need to be pretty convincing though because none of the objections so far have really nailed the technical aspects of why they think it's impossible. In the defence of those who said no this question was originally posed as a way to safely run C++. I have now scaled back the requirement to a limited subset of C.
My understanding of C could be classed as "intermediate", my understanding of PC hardware is maybe a step below "advanced". Try to coach your answers for that level if you can. Since I'm no C expert I'll be going largely based on votes given to an answer as well as how closely the answer comes to my requirements. You can assist by providing sufficient evidence for your claims (respondents) and by voting (everyone else). I'll assign an answer once the bounty countdown reaches 6 hours.
Finally, I believe solving this problem would be a major step towards maintaining C's relevance in an increasingly networked and paranoid world. As other languages close the gap performance-wise and computing power grows it will be harder and harder to justify the added risk of C development (as it is now with ASM). I believe your answers will have a much greater relevance than scoring a few SO points so please contribute what you can, even if the bounty has expired.
question from:
https://stackoverflow.com/questions/980170/how-to-create-a-lightweight-c-code-sandbox