Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
313 views
in Technique[技术] by (71.8m points)

compiler construction - How exactly does java compilation take place?

Confused by java compilation process

OK i know this: We write java source code, the compiler which is platform independent translates it into bytecode, then the jvm which is platform dependent translates it into machine code.

So from start, we write java source code. The compiler javac.exe is a .exe file. What exactly is this .exe file? Isn't the java compiler written in java, then how come there is .exe file which executes it? If the compiler code is written is java, then how come compiler code is executed at the compilation stage, since its the job of the jvm to execute java code. How can a language itself compile its own language code? It all seems like chicken and egg problem to me.

Now what exactly does the .class file contain? Is it a abstract syntax tree in text form, is it tabular information, what is it?

can anybody tell me clear and detailed way about how my java source code gets converted in machine code.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

OK i know this: We write java source code, the compiler which is platform independent translates it into bytecode,

Actually the compiler itself works as a native executable (hence javac.exe). And true, it transforms source file into bytecode. The bytecode is platform independent, because it's targeted at Java Virtual Machine.

then the jvm which is platform dependent translates it into machine code.

Not always. As for Sun's JVM there are two jvms: client and server. They both can, but not certainly have to compile to native code.

So from start, we write java source code. The compiler javac.exe is a .exe file. What exactly is this .exe file? Isn't the java compiler written in java, then how come there is .exe file which executes it?

This exe file is a wrapped java bytecode. It's for convenience - to avoid complicated batch scripts. It starts a JVM and executes the compiler.

If the compiler code is written is java, then how come compiler code is executed at the compilation stage, since its the job of the jvm to execute java code.

That's exactly what wrapping code does.

How can a language itself compile its own language code? It all seems like chicken and egg problem to me.

True, confusing at first glance. Though, it's not only Java's idiom. The Ada's compiler is also written in Ada itself. It may look like a "chicken and egg problem", but in truth, it's only a bootstrapping problem.

Now what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, what is it?

It's not Abstract Syntax Tree. AST is only used by tokenizer and compiler at compiling time to represent code in memory. .class file is like an assembly, but for JVM. JVM, in turn, is an abstract machine which can run specialized machine language - targeted only at virtual machine. In it's simplest, .class file has a very similar structure to normal assembly. At the beginning there are declared all static variables, then comes some tables of extern function signatures and lastly the machine code.

If You are really curious You can dig into classfile using "javap" utility. Here is sample (obfuscated) output of invoking javap -c Main:

0:   new #2; //class SomeObject
3:   dup
4:   invokespecial   #3; //Method SomeObject."<init>":()V
7:   astore_1
8:   aload_1
9:   invokevirtual   #4; //Method SomeObject.doSomething:()V
12:  return

So You should have an idea already what it really is.

can anybody tell me clear and detailed way about how my java source code gets converted in machine code.

I think it should be more clear right now, but here's short summary:

  • You invoke javac pointing to your source code file. The internal reader (or tokenizer) of javac reads your file and builds an actual AST out of it. All syntax errors come from this stage.

  • The javac hasn't finished its job yet. When it has the AST the true compilation can begin. It's using visitor pattern to traverse AST and resolves external dependencies to add meaning (semantics) to the code. The finished product is saved as a .class file containing bytecode.

  • Now it's time to run the thing. You invoke java with the name of .class file. Now the JVM starts again, but to interpret Your code. The JVM may, or may not compile Your abstract bytecode into the native assembly. The Sun's HotSpot compiler in conjunction with Just In Time compilation may do so if needed. The running code is constantly being profiled by the JVM and recompiled to native code if certain rules are met. Most commonly the hot code is the first to compile natively.

Edit: Without the javac one would have to invoke compiler using something similar to this:

%JDK_HOME%/bin/java.exe -cp:myclasspath com.sun.tools.javac.Main fileToCompile

As you can see it's calling Sun's private API so it's bound to Sun JDK implementation. It would make build systems dependent on it. If one switched to any other JDK (wiki lists 5 other than Sun's) then above code should be updated to reflect the change (since it's unlikely the compiler would reside in com.sun.tools.javac package). Other compilers could be written in native code.

So the standard way is to ship javac wrapper with JDK.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...