
While working on machine learning projects I kept running into questions about hyperparameter tuning, so I want to write up how to do it properly rather than by pure trial and error. Only the core code is collected here.

Seven methods are listed below, but I will only cover the three most widely used ones: GridSearch, BayesianOptimization, and Optuna.

Since this post is mainly for my own reference, see the link below if you want the full workflow.

The full code is available here.

 

Contents

1. GridSearch
2. BayesianOptimization
3. Optuna
4. scikit-optimize
5. Hyperopt
6. Spearmint
7. benderopt

 

1. GridSearch 

- The most basic approach: exhaustively evaluates every combination in the parameter grid with cross-validation. A cost note and a results-inspection sketch follow the code below.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rfc = RandomForestClassifier(random_state=42)

# Every combination of these values is tried
param_grid = {
    'n_estimators': [200, 500],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy']
}

# 5-fold cross-validation for each combination
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(X_train, y_train)

"""
GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=RandomForestClassifier(bootstrap=True, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators='warn', n_jobs=None,
                                              oob_score=False, random_state=42,
                                              verbose=0, warm_start=False),
             iid='warn', n_jobs=None,
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': [4, 5, 6, 7, 8],
                         'max_features': ['auto', 'sqrt', 'log2'],
                         'n_estimators': [200, 500]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
"""

# Best combination found
CV_rfc.best_params_

"""
{'criterion': 'gini',
 'max_depth': 4,
 'max_features': 'auto',
 'n_estimators': 200}
"""

# Test-set accuracy of the refit best model
CV_rfc.score(X_test, y_test)
"""
0.9333333333333333
"""
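With this grid, GridSearchCV fits 2 × 3 × 5 × 2 = 60 combinations, each with 5-fold CV, so 300 models in total (plus one refit of the best combination on the whole training set). That exhaustive cost is exactly what the Bayesian methods below try to avoid. If you want to inspect every combination rather than just the best one, the cv_results_ attribute has the details; a minimal sketch, assuming the fitted CV_rfc from above:

import pandas as pd

# All 60 combinations with their mean/std CV scores, best first
results = pd.DataFrame(CV_rfc.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())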

 

 

2. BayesianOptimization

import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.metrics import mean_squared_error

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)


def xgb_evaluate(max_depth, gamma, colsample_bytree):
    params = {'eval_metric': 'rmse',
              'max_depth': int(max_depth),
              'subsample': 0.8,
              'eta': 0.1,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree}
    # Used around 1000 boosting rounds in the full model
    cv_result = xgb.cv(params, dtrain, num_boost_round=100, nfold=3)

    # Bayesian optimization only knows how to maximize, not minimize, so return the negative RMSE
    return -1.0 * cv_result['test-rmse-mean'].iloc[-1]


# Search bounds for each hyperparameter (continuous ranges)
xgb_bo = BayesianOptimization(xgb_evaluate, {'max_depth': (3, 7),
                                             'gamma': (0, 1),
                                             'colsample_bytree': (0.3, 0.9)})

# 10 random exploration points, then 10 guided iterations (20 rows below)
xgb_bo.maximize(init_points=10, n_iter=10)

"""
|   iter    |  target   | colsam... |   gamma   | max_depth |
-------------------------------------------------------------
|  1        | -0.2626   |  0.4545   |  0.3145   |  4.561    |
|  2        | -0.2643   |  0.4066   |  0.5222   |  5.02     |
|  3        | -0.2801   |  0.3877   |  0.1135   |  5.54     |
|  4        | -0.2188   |  0.5447   |  0.4376   |  4.533    |
|  5        | -0.3082   |  0.3934   |  0.0108   |  4.212    |
|  6        | -0.2225   |  0.6972   |  0.6869   |  5.31     |
|  7        | -0.2235   |  0.6137   |  0.07268  |  4.697    |
|  8        | -0.223    |  0.867    |  0.7894   |  5.302    |
|  9        | -0.2678   |  0.3241   |  0.7919   |  4.154    |
|  10       | -0.2203   |  0.5369   |  0.08622  |  6.587    |
|  11       | -0.2287   |  0.9      |  1.0      |  3.0      |
|  12       | -0.2713   |  0.3      |  1.0      |  7.0      |
|  13       | -0.2229   |  0.9      |  0.0      |  3.0      |
|  14       | -0.2199   |  0.9      |  0.0      |  7.0      |
|  15       | -0.2119   |  0.9      |  0.4256   |  3.862    |
|  16       | -0.2694   |  0.3      |  1.0      |  3.0      |
|  17       | -0.3237   |  0.3      |  0.0      |  7.0      |
|  18       | -0.2164   |  0.9      |  0.0      |  5.875    |
|  19       | -0.2287   |  0.9      |  1.0      |  7.0      |
|  20       | -0.2287   |  0.9      |  1.0      |  4.356    |
=============================================================
Wall time: 7.71 s
"""

print("Final result : ", xgb_bo.max)

"""
Final result :  {'target': -0.21194533333333332, 'params': {'colsample_bytree': 0.9, 'gamma': 0.4255966405409401, 'max_depth': 3.862004501194054}}
"""
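One thing to watch: xgb_bo.max reports max_depth as a float (3.862 above), because the optimizer only searches continuous spaces; you have to cast it back to an int yourself when training the final model. A minimal sketch, reusing dtrain and dtest from above (num_boost_round=100 here just mirrors the CV call; the comment in the objective suggests ~1000 rounds for the full model):

# Rebuild the best params and train a final booster
best = xgb_bo.max['params']
params = {'eval_metric': 'rmse',
          'subsample': 0.8,
          'eta': 0.1,
          'max_depth': int(best['max_depth']),  # cast the continuous suggestion back to int
          'gamma': best['gamma'],
          'colsample_bytree': best['colsample_bytree']}
final_model = xgb.train(params, dtrain, num_boost_round=100)
preds = final_model.predict(dtest)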

 

 

 

3. Optuna

import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

# Iris data, as in the Optuna quickstart (loading added so the snippet runs on its own)
iris_data, iris_label = sklearn.datasets.load_iris(return_X_y=True)


def objective(trial):
    # Optuna samples each hyperparameter through trial.suggest_* calls
    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    max_depth = int(trial.suggest_loguniform('max_depth', 1, 32))

    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth)

    # The returned CV accuracy is what the study maximizes
    return sklearn.model_selection.cross_val_score(
        clf, iris_data, iris_label, n_jobs=-1, cv=3).mean()


study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))

"""
Accuracy: 0.9738562091503268
Best hyperparameters: {'n_estimators': 19, 'max_depth': 27.236798483232246}
"""
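Note that study.best_params returns the raw suggested values, so max_depth comes back as a float and needs the same int() cast used inside the objective. Also, on Optuna 3.x, suggest_loguniform is deprecated in favor of trial.suggest_float('max_depth', 1, 32, log=True). A minimal retraining sketch, assuming the iris_data/iris_label and study from above:

# Retrain a final model with the best trial's parameters
best = study.best_params
clf = sklearn.ensemble.RandomForestClassifier(
    n_estimators=best['n_estimators'],
    max_depth=int(best['max_depth']))  # float from the log-uniform space; cast as in the objective
clf.fit(iris_data, iris_label)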

 

 

 

 

<Source>
