Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
514 views
in Technique[技术] by (71.8m points)

.net - How does the number of classes in an assembly impact performance?

The project I'm working on will generate code for a large number of classes - hundreds to thousands is expected. It is not known at generation time how many of these classes will actually be accessed.

The generated classes could (1) all live in a single assembly (or possibly a handful of assemblies), which would be loaded when toe consuming process starts.

...or (2) I could generate a single assembly for each class, much like Java compiles every class to a single *.class binary file, and then come up with a mechanism to load the assemblies on demand.

The question: which case would yield better (memory and time) performance?

My gut feeling is that for case (1) the load time and used memory are directly proportional to the number of classes that make up the single monolithic assembly. OTOH, case (2) comes with its complications.

If you know any resources pertaining to the internals of loading assemblies, especially what code gets called (if any!?) and what memory is allocated (book-keeping for the newly loaded assembly), please share them.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You are trying to solve a non-existing problem, assembly loading is very heavily optimized in .NET.

Breaking up a large assembly into many smaller ones is without a doubt the very worst thing you can do. By far the biggest expense in loading an assembly is finding the file. This is a cold start problem, the CLR loader is bogged down by the slow disk, directory entries need to be retrieved and searched to locate the disk sectors that contain the file content. This problem disappears on a warm start when the assembly data can be retrieved from the file system cache. Note that Java doesn't do it it this way either, it packages .class files into a .jar. A .jar is the rough equivalent of an assembly.

Once the file is located, .NET uses an operating system facility to making actually loading the assembly data very cheap. It uses a memory mapped file. Which involves merely reserving the virtual memory for the file but not reading from the file.

Reading doesn't happen until later and is done by a page fault. A feature of any demand-paged virtual memory operating system. Accessing the virtual memory produces a page fault, the operating system loads the data from the file and maps the virtual memory page into RAM. After which the program continues, never being aware that it got interrupted by the OS. It will be the jitter that produces these page faults, it accesses the metadata tables in the assembly to locate the IL for a method. From which it then generates executable machine code.

An automatic benefit from this scheme is that you never pay for code that is in an assembly but is not being used. The jitter simply has no reason to look at the section of the file that contains the IL so it never actually gets read.

And note the disadvantage of this scheme, using a class for the first time does involve a perf hit due to the disk read. This needs to be paid for one way or another, in .NET the debt is due at the last possible moment. Which is why attributes have a reputation for being slow.

Bigger assemblies are always better than many smaller assemblies.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...