Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

packaging - Determining the location of distutils data files programmatically in Python

I'm trying to include data files in my package with distutils and then refer to them using relative paths (following http://docs.python.org/distutils/setupscript.html#distutils-additional-files).

My dir structure is:

myproject/
  mycode.py
  data/
    file1.dat

mycode.py is actually a script in the package. It relies on data/file1.dat and refers to it using that relative path. In setup.py, I have:

setup(
 ...
 scripts = ["myproject/mycode.py"],
 data_files = [('data', ['myproject/data/file1.dat'])],
)

Suppose the user now runs:

python setup.py install --prefix=/home/user/

Then mycode.py will appear in some place like /home/user/bin/. But the reference to data/file1.dat is now broken, since the script lives elsewhere from the data.

How can I find out, from mycode.py, the absolute path to myproject/data/file1.dat, so I can refer to it properly depending on where the user installed the package?

EDIT
When I install this with prefix=/home/user/, data/file1.dat is created in /home/user/, which is exactly what I want. The only missing piece is how to retrieve the absolute path to this file programmatically, given only a relative path and not knowing where the user installed the package. When I try to use package_data instead of data_files, it does not work: data/file1.dat is not created anywhere, even if I delete my MANIFEST file.

I've read all of the current discussions of this apparently very common problem. However, none of the proposed solutions deal with the case I have above, where the code that needs to access data_files is a script whose location might change depending on the --prefix argument to setup.py. The only hack I can think of to resolve this is to add the data file to scripts= in setup(), as in:

setup(
  ...
  scripts = ["myproject/mycode.py", "myproject/data/file1.dat"]
)

This is a horrible hack, but it is the only way I can think of to ensure that file1.dat ends up in the same place as the scripts defined in scripts=. I cannot find any platform-independent, installation-aware API for recovering the location of data_files after the user has run setup.py install (potentially with --prefix= arguments).

1 Reply


I think the confusion arises from the usage of scripts. Scripts should refer to a runnable executable, perhaps a utility script related to your package or perhaps an entry point into functionality for your package. In either case, you should expect that any scripts will not be installed alongside the rest of your package. This expectation is due mainly to the convention that packages are considered libraries (and installed to lib directories) whereas scripts are considered executables (and installed to bin or Scripts directories). Furthermore, data files are neither executables nor libraries and are completely separate.
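To see where these three kinds of files land by default for the running interpreter, the standard-library sysconfig module can be queried. This is only a sketch: the helper name install_layout is made up for illustration, and the paths it reports are the interpreter's defaults, not those of an install done with a custom --prefix.

```python
import sysconfig


def install_layout():
    """Return the default install directories for the running interpreter."""
    paths = sysconfig.get_paths()
    # 'purelib' is where pure-Python packages go (site-packages),
    # 'scripts' is where executables go (bin or Scripts),
    # 'data' is the base directory used for data files.
    return {key: paths[key] for key in ('purelib', 'scripts', 'data')}


if __name__ == '__main__':
    for kind, location in install_layout().items():
        print(kind, '->', location)
```

On most systems the three entries point to different directories, which is why a script cannot assume its data sits next to it.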

So from the script, you need to determine where the data files are located. According to the Python docs,

If directory is a relative path, it is interpreted relative to the installation prefix.

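Therefore, you can write something like the following in the mycode script to locate the data file. Note that sys.prefix is the prefix of the running interpreter, so this works when the package was installed into that interpreter's own prefix (the default, or a virtualenv) and will not match a different --prefix passed to setup.py install: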

import sys
import os

def my_func():
    with open(os.path.join(sys.prefix, 'data', 'file1.dat')) as f:
        print(next(f))

if __name__ == '__main__':
    my_func()
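When the package was installed with a custom --prefix, one possible workaround is to derive the prefix from the script's own location and fall back to sys.prefix. The sketch below assumes that scripts are installed exactly one directory below the prefix (bin on POSIX, Scripts on Windows) and that data_files used a relative directory; candidate_prefixes and find_data_file are hypothetical helper names, not part of distutils.

```python
import os
import sys


def candidate_prefixes(script_path):
    """Yield directories that may be the installation prefix."""
    script_dir = os.path.dirname(os.path.realpath(script_path))
    # Assumption: the script lives in <prefix>/bin or <prefix>/Scripts,
    # so the prefix is the parent of the script's directory.
    yield os.path.dirname(script_dir)
    # Fallback: the prefix of the running interpreter.
    yield sys.prefix


def find_data_file(script_path, *parts):
    """Return the first existing <prefix>/<parts...> file, or None."""
    for prefix in candidate_prefixes(script_path):
        path = os.path.join(prefix, *parts)
        if os.path.isfile(path):
            return path
    return None
```

Inside the installed script this would be called as find_data_file(__file__, 'data', 'file1.dat'). It is a heuristic rather than an API guarantee: install schemes that place scripts elsewhere would break the assumption.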

If you're not pleased that your code and data are not bundled together (and I would not be), then I would restructure your project so that you have an actual Python package (and module). Use packages= and package_data= to inject the data into the package, and then create a simple script that calls into the module in the package.

I did that by creating this tree:

.
│   setup.py
│
├───myproject
│   │   mycode.py
│   │   __init__.py
│   │
│   └───data
│           file1.dat
│
└───scripts
        run-my-code.py

With setup.py:

from distutils.core import setup

setup(
    name='myproject',
    version='1.0',
    scripts=['scripts/run-my-code.py'],
    packages=['myproject'],
    package_data = {
        'myproject': ['data/file1.dat'],
    },
)

run-my-code.py is simply:

from myproject import mycode

mycode.my_func()

__init__.py is empty and mycode.py looks like:

import os

# Directory containing this module (site-packages/myproject once installed).
here = os.path.dirname(os.path.abspath(__file__))

def my_func():
    with open(os.path.join(here, 'data', 'file1.dat')) as f:
        print(next(f))

This latter approach keeps the data and code bundled together (in site-packages/myproject) and only installs the script in a different location (so it shows up in the $PATH).
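As an alternative to building the path from __file__, package data can also be read through the standard-library importlib.resources module. The following is a minimal sketch assuming Python 3.9 or later (where importlib.resources.files exists) and the package layout shown above; first_line is a hypothetical helper name.

```python
from importlib import resources  # resources.files() requires Python 3.9+


def first_line(package, *parts):
    """Return the first line of a data file bundled inside a package."""
    data_file = resources.files(package)
    # Walk down to the resource one path component at a time.
    for part in parts:
        data_file = data_file / part
    with data_file.open('r') as f:
        return next(f)
```

With the layout above, mycode.my_func could then call first_line('myproject', 'data', 'file1.dat'). On older interpreters, pkgutil.get_data('myproject', 'data/file1.dat') offers similar access and returns the file contents as bytes.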

