OpenSource Name: ClimbsRocks/auto_ml
OpenSource Url: https://github.com/ClimbsRocks/auto_ml
OpenSource Language: Python 100.0%
OpenSource Introduction: auto_ml
Installation
Getting started

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
df_train, df_test = get_boston_dataset()
column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}
ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
ml_predictor.train(df_train)
ml_predictor.score(df_test, df_test.MEDV)

Show off some more features!

auto_ml is designed for production. Here's an example that includes serializing and loading the trained model, then getting predictions on single dictionaries, roughly the process you'd likely follow to deploy the trained model.

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model
# Load data
df_train, df_test = get_boston_dataset()
# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']
column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}
ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
ml_predictor.train(df_train)
# Score the model on test data
test_score = ml_predictor.score(df_test, df_test.MEDV)
# auto_ml is specifically tuned for running in production
# It can get predictions on an individual row (passed in as a dictionary)
# A single prediction like this takes ~1 millisecond
# Here we will demonstrate saving the trained model, and loading it again
file_name = ml_predictor.save()
trained_model = load_ml_model(file_name)
# .predict and .predict_proba take in either:
# A pandas DataFrame
# A list of dictionaries
# A single dictionary (optimized for speed in production environments)
predictions = trained_model.predict(df_test)
print(predictions)

3rd Party Packages - Deep Learning with TensorFlow & Keras, XGBoost, LightGBM, CatBoost

auto_ml has all of these awesome libraries integrated!
Generally, just pass one of them in for model_names.
Available options are
All of these projects are ready for production. Each makes a single prediction in roughly 1 millisecond, and each can be serialized to disk and loaded into a new environment after training. Depending on your machine, they can occasionally be difficult to install, so they are not included in auto_ml's default installation. You are responsible for installing them yourself. auto_ml will run fine without them installed (we check what's installed before choosing which algorithm to use).

Feature Responses

Get linear-model-esque interpretations from non-linear models. See the docs for more information and caveats.

Classification

Binary and multiclass classification are both supported. Note that for now, labels must be integers (0 and 1 for binary classification). auto_ml will automatically detect whether it is a binary or multiclass classification problem; you just have to pass in

Feature Learning

Also known as "finally found a way to make this deep learning stuff useful for my business". Deep Learning is great at learning important features from your data, but the way it turns these learned features into a final prediction is relatively basic. Gradient boosting is great at turning features into accurate predictions, but it doesn't do any feature learning. In auto_ml, you can now automatically use both types of models for what they're great at. If you pass

Across some problems, we've witnessed this lead to a 5% gain in accuracy, while still making predictions in 1-4 milliseconds, depending on model complexity.
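The 3rd Party Packages section notes that auto_ml checks what's installed before choosing which algorithm to use. A minimal sketch of that kind of check, using only the standard library's `importlib.util.find_spec`, looks like this. The helper name `pick_first_available`, the preference order, and the model-name-to-package mapping are illustrative assumptions, not auto_ml's actual selection logic.

```python
import importlib.util
from typing import Iterable, Optional, Tuple


def pick_first_available(preferences: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Hypothetical helper: return the first model name whose backing
    package can be found, or None if none of them are installed."""
    for model_name, module_name in preferences:
        # find_spec returns None when a top-level module cannot be located;
        # it does not execute the package itself, so the check stays cheap.
        if importlib.util.find_spec(module_name) is not None:
            return model_name
    return None


# Illustrative preference order (an assumption, not auto_ml's real list):
# optional 3rd-party libraries first, scikit-learn as the fallback.
PREFERENCES = [
    ("XGBRegressor", "xgboost"),
    ("LightGBMRegressor", "lightgbm"),
    ("CatBoostRegressor", "catboost"),
    ("GradientBoostingRegressor", "sklearn"),
]

if __name__ == "__main__":
    # Prints whichever model's package is importable in this environment.
    print(pick_first_available(PREFERENCES))
```

Because `find_spec` only locates the package instead of importing it, a heavy dependency such as TensorFlow is not loaded just to find out whether it is available.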
Feature learning currently supports only regression and binary classification; the rest of auto_ml supports multiclass classification as well.

Categorical Ensembling

Ever wanted to train one model for every store/customer, but didn't want to maintain hundreds of thousands of independent models? With

Just tell us which column holds the category you want to split on, and we'll handle the rest. As always, saving the model, loading it in a different environment, and getting speedy predictions live in production are baked right in.
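The categorical ensembling idea (one model per category value, with rows routed to the right model at prediction time) can be sketched in plain Python. This is not auto_ml's implementation: `CategoricalEnsemble` and `MeanModel` are hypothetical names, and `MeanModel`, which just predicts the mean target it saw, stands in for a real estimator so the splitting and routing logic stays visible. Rows are dictionaries, matching the list-of-dictionaries input format described above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class MeanModel:
    """Stand-in estimator (hypothetical): predicts the mean training target."""

    def fit(self, targets: List[float]) -> "MeanModel":
        self.value = mean(targets)
        return self

    def predict_one(self, row: dict) -> float:
        return self.value


class CategoricalEnsemble:
    """Hypothetical sketch: one sub-model per value of `category_col`,
    plus a global fallback for categories never seen during training."""

    def __init__(self, category_col: str, output_col: str):
        self.category_col = category_col
        self.output_col = output_col
        self.models: Dict[object, MeanModel] = {}

    def train(self, rows: List[dict]) -> "CategoricalEnsemble":
        # Group training targets by the value in the category column.
        grouped = defaultdict(list)
        for row in rows:
            grouped[row[self.category_col]].append(row[self.output_col])
        # Train one independent model per category value.
        for category, targets in grouped.items():
            self.models[category] = MeanModel().fit(targets)
        # Fallback model trained on every row, used for unseen categories.
        self.fallback = MeanModel().fit([r[self.output_col] for r in rows])
        return self

    def predict(self, row: dict) -> float:
        # Route a single dictionary to its category's model.
        model = self.models.get(row.get(self.category_col), self.fallback)
        return model.predict_one(row)


if __name__ == "__main__":
    rows = [
        {"store": "A", "sales": 10.0},
        {"store": "A", "sales": 14.0},
        {"store": "B", "sales": 100.0},
    ]
    ensemble = CategoricalEnsemble("store", "sales").train(rows)
    print(ensemble.predict({"store": "A"}))  # 12.0
    print(ensemble.predict({"store": "Z"}))  # unseen store, uses the fallback
```

Keeping every sub-model inside one object is what lets the whole ensemble be saved, loaded, and queried as a single model, instead of maintaining many independent ones.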
More details are available in the docs: http://auto-ml.readthedocs.io/en/latest/

Advice

Before you go any further, try running the code. Load up some data (either a DataFrame, or a list of dictionaries, where each dictionary is a row of data). Make a

Everything else in these docs assumes you have done at least the above. Start there and everything else will build on top. But this part gets you the output you're probably interested in, without unnecessary complexity.

Docs

The full docs are available at https://auto_ml.readthedocs.io

Again, though, I'd strongly recommend running this on an actual dataset before referencing the docs any further.

What this project does

Automates the whole machine learning process, making it super easy to use for both analytics and getting real-time predictions in production.

As a quick overview of buzzwords, this project automates:
Running the tests

If you've cloned the source code and are making any changes (highly encouraged!), or just want to make sure everything works in your environment, run
CI is also set up, so if you're developing on this, you can just open a PR and the tests will run automatically on Travis-CI. The tests are relatively comprehensive, though, as with everything in auto_ml, I happily welcome your contributions here!