Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

assembly - How to create or manipulate GPU assembler?

Does any one have experience in creating/manipulating GPU machine code, possibly at run-time?

I am interested in modifying GPU assembler code, possibly at run time with minimal overhead. Specifically I'm interested in assembler based genetic programming.

I understand ATI has released ISAs for some of their cards, and nvidia recently released a disassembler for CUDA for older cards, but I am not sure if it is possible to modify instructions in memory at runtime or even before hand.

Is this possible? Any related information is welcome.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In the CUDA driver API, the module management functions allow an application to load at runtime a "module", which is (roughly) a PTX or cubin file. PTX is the intermediate language, while cubin is an already compiled set of instructions. cuModuleLoadData() and cuModuleLoadDataEx() appear to be capable of "loading" the module from a pointer in RAM, which means that no actual file is required.

So your problem seems to be: how to programmatically build a cubin module in RAM ? As far as I know, NVIDIA never released details on the instructions actually understood by their hardware. There is, however, an independent opensource package called decuda which includes "cudasm", a assembler for what the "older" NVIDIA GPU understand ("older" = GeForce 8xxx and 9xxx). I do not know how easy it would be to integrate in a wider application; it is written in Python.

Newer NVIDIA GPU use a distinct instruction set (how much distinct, I do not know), so a cubin for an old GPU ("computing capability 1.x" in NVIDIA/CUDA terminology) may not work on a recent GPU (computing capability 2.x, i.e. "Fermi architecture" such as a GTX 480). Which is why PTX is usually preferred: a given PTX file will be portable across GPU generations.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...