CatBoostClassifier
CatBoost can use categorical features directly and scales well.
It is an open-source library contributed by Yandex.
"CatBoost is a high-performance open source library for gradient boosting on decision trees."
It is a ready-made classifier that follows scikit-learn’s conventions and deals with categorical features automatically.
It can work with diverse data types to help solve a wide range of problems (described later) that businesses face today.
Its developers also claim best-in-class accuracy, with two particular strengths:
1. It yields state-of-the-art results without the extensive data training typically required by other machine learning methods, and
2. It provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems.
The name “CatBoost” comes from two words: “Category” and “Boosting”.
“Cat” refers to categories: the library works well with multiple categories of data, such as audio, text, images, and historical data.
“Boost” comes from the gradient boosting machine learning algorithm, on which this library is based. Gradient boosting is a powerful algorithm that is widely applied to business challenges such as fraud detection, item recommendation, and forecasting, and it performs well on them. It can also return very good results with relatively little data, unlike deep learning (DL) models, which need to learn from massive amounts of data.
It reduces the need for extensive hyperparameter tuning and lowers the chance of overfitting, which leads to more generalized models. That said, CatBoost still has multiple parameters to tune, including the number of trees, learning rate, regularization, tree depth, fold size, bagging temperature, and others.
There are multiple boosting libraries, such as XGBoost, H2O, and LightGBM, and all of them perform well on a variety of problems.
The CatBoost developers have compared its performance with these competitors on standard ML datasets.
The comparison shows the log-loss value on test data (lower is better), and it is lowest for CatBoost in most cases. This suggests that CatBoost often performs better for both tuned and default models.
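For readers unfamiliar with the metric, here is a small plain-Python sketch of binary log-loss. The function name and the example probabilities are made up for illustration and have nothing to do with the benchmark numbers.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # probabilities close to the true labels
hesitant = [0.6, 0.4, 0.55, 0.45]  # probabilities close to 0.5

loss_confident = binary_log_loss(labels, confident)
loss_hesitant = binary_log_loss(labels, hesitant)
```

Predictions that put more probability on the correct class get a lower log-loss, which is why the lowest value in a comparison indicates the better model.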
For more:
https://www.kaggle.com/prashant111/catboost-classifier-in-python