
While working on machine learning projects I kept running into questions about hyperparameter tuning, so I want to write up how to do it properly rather than by pure trial and error. Only the core code is collected here.

Seven methods are listed below, but I will only cover the three most widely used ones: GridSearch, BayesianOptimization, and Optuna.

Since this post is mainly for my own reference, see the link below if you want the full workflow.

The full code is available here.

 

Contents

1. GridSearch
2. BayesianOptimization
3. Optuna
4. scikit-optimize
5. Hyperopt
6. Spearmint
7. benderopt

 

1. GridSearch 

- The most basic approach: exhaustively evaluates every combination in the parameter grid with cross-validation. A cost note and a results-inspection sketch follow the code below.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rfc = RandomForestClassifier(random_state=42)

# Every combination of these values is tried
param_grid = {
    'n_estimators': [200, 500],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy']
}

# 5-fold cross-validation for each combination
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(X_train, y_train)

"""
GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=RandomForestClassifier(bootstrap=True, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators='warn', n_jobs=None,
                                              oob_score=False, random_state=42,
                                              verbose=0, warm_start=False),
             iid='warn', n_jobs=None,
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': [4, 5, 6, 7, 8],
                         'max_features': ['auto', 'sqrt', 'log2'],
                         'n_estimators': [200, 500]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
"""

# Best combination found
CV_rfc.best_params_

"""
{'criterion': 'gini',
 'max_depth': 4,
 'max_features': 'auto',
 'n_estimators': 200}
"""

# Test-set accuracy of the refit best model
CV_rfc.score(X_test, y_test)
"""
0.9333333333333333
"""
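With this grid, GridSearchCV fits 2 × 3 × 5 × 2 = 60 combinations, each with 5-fold CV, so 300 models in total (plus one refit of the best combination on the whole training set). That exhaustive cost is exactly what the Bayesian methods below try to avoid. If you want to inspect every combination rather than just the best one, the cv_results_ attribute has the details; a minimal sketch, assuming the fitted CV_rfc from above:

import pandas as pd

# All 60 combinations with their mean/std CV scores, best first
results = pd.DataFrame(CV_rfc.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())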

 

 

2. BayesianOptimization

import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.metrics import mean_squared_error

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)


def xgb_evaluate(max_depth, gamma, colsample_bytree):
    params = {'eval_metric': 'rmse',
              'max_depth': int(max_depth),
              'subsample': 0.8,
              'eta': 0.1,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree}
    # Used around 1000 boosting rounds in the full model
    cv_result = xgb.cv(params, dtrain, num_boost_round=100, nfold=3)

    # Bayesian optimization only knows how to maximize, not minimize, so return the negative RMSE
    return -1.0 * cv_result['test-rmse-mean'].iloc[-1]


# Search bounds for each hyperparameter (continuous ranges)
xgb_bo = BayesianOptimization(xgb_evaluate, {'max_depth': (3, 7),
                                             'gamma': (0, 1),
                                             'colsample_bytree': (0.3, 0.9)})

# 10 random exploration points, then 10 guided iterations (20 rows below)
xgb_bo.maximize(init_points=10, n_iter=10)

"""
|   iter    |  target   | colsam... |   gamma   | max_depth |
-------------------------------------------------------------
|  1        | -0.2626   |  0.4545   |  0.3145   |  4.561    |
|  2        | -0.2643   |  0.4066   |  0.5222   |  5.02     |
|  3        | -0.2801   |  0.3877   |  0.1135   |  5.54     |
|  4        | -0.2188   |  0.5447   |  0.4376   |  4.533    |
|  5        | -0.3082   |  0.3934   |  0.0108   |  4.212    |
|  6        | -0.2225   |  0.6972   |  0.6869   |  5.31     |
|  7        | -0.2235   |  0.6137   |  0.07268  |  4.697    |
|  8        | -0.223    |  0.867    |  0.7894   |  5.302    |
|  9        | -0.2678   |  0.3241   |  0.7919   |  4.154    |
|  10       | -0.2203   |  0.5369   |  0.08622  |  6.587    |
|  11       | -0.2287   |  0.9      |  1.0      |  3.0      |
|  12       | -0.2713   |  0.3      |  1.0      |  7.0      |
|  13       | -0.2229   |  0.9      |  0.0      |  3.0      |
|  14       | -0.2199   |  0.9      |  0.0      |  7.0      |
|  15       | -0.2119   |  0.9      |  0.4256   |  3.862    |
|  16       | -0.2694   |  0.3      |  1.0      |  3.0      |
|  17       | -0.3237   |  0.3      |  0.0      |  7.0      |
|  18       | -0.2164   |  0.9      |  0.0      |  5.875    |
|  19       | -0.2287   |  0.9      |  1.0      |  7.0      |
|  20       | -0.2287   |  0.9      |  1.0      |  4.356    |
=============================================================
Wall time: 7.71 s
"""

print("Final result : ", xgb_bo.max)

"""
Final result :  {'target': -0.21194533333333332, 'params': {'colsample_bytree': 0.9, 'gamma': 0.4255966405409401, 'max_depth': 3.862004501194054}}
"""
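One thing to watch: xgb_bo.max reports max_depth as a float (3.862 above), because the optimizer only searches continuous spaces; you have to cast it back to an int yourself when training the final model. A minimal sketch, reusing dtrain and dtest from above (num_boost_round=100 here just mirrors the CV call; the comment in the objective suggests ~1000 rounds for the full model):

# Rebuild the best params and train a final booster
best = xgb_bo.max['params']
params = {'eval_metric': 'rmse',
          'subsample': 0.8,
          'eta': 0.1,
          'max_depth': int(best['max_depth']),  # cast the continuous suggestion back to int
          'gamma': best['gamma'],
          'colsample_bytree': best['colsample_bytree']}
final_model = xgb.train(params, dtrain, num_boost_round=100)
preds = final_model.predict(dtest)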

 

 

 

3. Optuna

import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

# Iris data, as in the Optuna quickstart (loading added so the snippet runs on its own)
iris_data, iris_label = sklearn.datasets.load_iris(return_X_y=True)


def objective(trial):
    # Optuna samples each hyperparameter through trial.suggest_* calls
    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    max_depth = int(trial.suggest_loguniform('max_depth', 1, 32))

    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth)

    # The returned CV accuracy is what the study maximizes
    return sklearn.model_selection.cross_val_score(
        clf, iris_data, iris_label, n_jobs=-1, cv=3).mean()


study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))

"""
Accuracy: 0.9738562091503268
Best hyperparameters: {'n_estimators': 19, 'max_depth': 27.236798483232246}
"""
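Note that study.best_params returns the raw suggested values, so max_depth comes back as a float and needs the same int() cast used inside the objective. Also, on Optuna 3.x, suggest_loguniform is deprecated in favor of trial.suggest_float('max_depth', 1, 32, log=True). A minimal retraining sketch, assuming the iris_data/iris_label and study from above:

# Retrain a final model with the best trial's parameters
best = study.best_params
clf = sklearn.ensemble.RandomForestClassifier(
    n_estimators=best['n_estimators'],
    max_depth=int(best['max_depth']))  # float from the log-uniform space; cast as in the objective
clf.fit(iris_data, iris_label)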

 

 

 

 

<Source>
