Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
122 views
in Technique[技术] by (71.8m points)

python - Pandas DataFrame Object Inheritance or Object Use?

I am building a library for working with very specific structured data and I am building my infrastructure on top of Pandas. Currently I am writing a bunch of different data containers for different use cases, such as CTMatrix for Country x Time Data etc. to house methods appropriate for all CountryxTime structured data.

I am currently debating between

Option 1: Object Inheritance

class CTMatrix(pd.DataFrame):
    methods etc. here

or Option 2: Object Use

class CTMatrix(object):
    _data = pd.DataFrame

    then use getter, setter methods to control access to _data etc. 

From a software engineering perspective is there an obvious choice here?

My thoughts so far are:

Option 1:

  1. Can use DataFrame methods directly on the CTMatrix Class (like CTmatrix.sort()) without having to support them via methods on the encapsulated _data object in Option #2
  2. Updates and New methods in Pandas are inherited, except for methods that may be overwritten with local class methods

BUT

  1. Complications with some methods such as __init__() and having to pass the attributes up to the superclass super(MyDF, self).__init__(*args, **kw)

Option 2:

  1. More control over the Class and it's behavior
  2. Possibly more resilient to updates in Pandas?

But

  1. Having to use a getter() or non-hidden attribute to use the object like a dataframe such as (CTMatrix.data.sort())

Are there any additional downsides for taking the approach in Option #1?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I would avoid subclassing DataFrame, because many of the DataFrame methods will return a new DataFrame and not another instance of your CTMatrix object.

There are a few of open issues on GitHub around this e.g.:

More generally, this is a question of composition vs inheritance. I would be especially wary of benefit #2. It might seem great now, but unless you are keeping a close eye on updates to Pandas (and it is a fast moving target), you can easily end up with unexpected consequences and your code will end up intertwined with Pandas.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...