Open-source name: yandex/rep
Open-source URL: https://github.com/yandex/rep
Open-source language: Jupyter Notebook 88.7%
Open-source introduction:

Reproducible Experiment Platform (REP)

REP is an IPython-based environment for conducting data-driven research in a consistent and reproducible way.
REP does not try to replace scikit-learn; it extends it and provides a better user experience.

Howto examples

To get started, look at the notebooks in /howto/. The notebooks can be viewed (not executed) online at nbviewer. The example code is written in Python 2, but the library is compatible with both Python 2 and Python 3.

Installation with Docker

We provide the docker image with

Installation with bare hands

However, if you want to install

Links
License

Apache 2.0; the library is open-source.

Minimal examples

REP wrappers are sklearn-compatible:

    from rep.estimators import XGBoostClassifier, SklearnClassifier, TheanetsClassifier
    # trainX, trainY, testX, testY are your own train/test features and labels
    clf = XGBoostClassifier(n_estimators=300, eta=0.1).fit(trainX, trainY)
    probabilities = clf.predict_proba(testX)

A favorite trick among Kagglers is to run bagging over complex algorithms. This is how it is done in REP:

    from sklearn.ensemble import BaggingClassifier
    clf = BaggingClassifier(base_estimator=XGBoostClassifier(), n_estimators=10)
    # wrapping sklearn to REP wrapper
    clf = SklearnClassifier(clf)

Another useful trick is to use folding instead of splitting the data into train and test sets. This is especially useful when you use some kind of complex stacking:

    from rep.metaml import FoldingClassifier
    clf = FoldingClassifier(TheanetsClassifier(), n_folds=3)
    probabilities = clf.fit(X, y).predict_proba(X)

In the example above, the data are split into 3 folds, and each fold is predicted by a classifier that was trained on the other 2 folds.

REP classifiers also provide a report:

    report = clf.test_on(testX, testY)
    report.roc().plot()  # plot ROC curve
    from rep.report.metrics import RocAuc
    # learning curves are useful when training GBDT!
    report.learning_curve(RocAuc(), steps=10)

You can read about other REP tools (such as smart distributed grid search, folding and factory) in the documentation and the howto examples.
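The folding idea from the minimal examples (each fold is predicted by a model trained only on the other folds) can be sketched in plain Python without REP installed. This is an illustration under stated assumptions, not REP's FoldingClassifier: the helper names (`fold_indices`, `fit_1nn`, `predict_1nn`, `out_of_fold_predictions`), the toy 1-nearest-neighbour model and the deterministic `i % n_folds` fold assignment are all hypothetical, and REP may assign folds differently.

```python
# Sketch of out-of-fold ("folding") prediction in plain Python.
# All names below are hypothetical; this is not REP's FoldingClassifier.


def fold_indices(n_samples, n_folds):
    """Assign sample i to fold i % n_folds (a simple deterministic split)."""
    return [i % n_folds for i in range(n_samples)]


def fit_1nn(train_X, train_y):
    """Toy model: a 1-nearest-neighbour classifier just stores its training data."""
    return list(zip(train_X, train_y))


def predict_1nn(model, x):
    """Return the label of the stored point closest to x (1-D features)."""
    _, nearest_label = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def out_of_fold_predictions(X, y, n_folds=3):
    """Predict every sample with a model trained only on the other folds."""
    folds = fold_indices(len(X), n_folds)
    predictions = [None] * len(X)
    for k in range(n_folds):
        # Train on everything outside fold k ...
        train = [(x, label) for x, label, f in zip(X, y, folds) if f != k]
        model = fit_1nn([x for x, _ in train], [label for _, label in train])
        # ... and predict only the samples inside fold k.
        for i, f in enumerate(folds):
            if f == k:
                predictions[i] = predict_1nn(model, X[i])
    return predictions


X = [0.0, 1.0, 2.0, 3.5, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]
oof = out_of_fold_predictions(X, y, n_folds=3)
print(oof)
```

A 1-nearest-neighbour model always reproduces its own training labels, so scoring it on the data it was trained on would look perfect regardless of quality. Because no sample is ever predicted by a model that has seen it, the out-of-fold predictions give a more honest input for stacking.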