1. Pima Indians Diabetes Database
2. Code using logistic regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.metrics import f1_score, confusion_matrix, precision_recall_curve, roc_curve
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def get_clf_eval(y_test, pred):
    # Print the confusion matrix and the main binary-classification metrics.
    confusion = confusion_matrix(y_test, pred)
    accuracy = accuracy_score(y_test, pred)
    precision = precision_score(y_test, pred)
    recall = recall_score(y_test, pred)
    f1 = f1_score(y_test, pred)
    # Note: ROC AUC is computed here from the hard 0/1 predictions,
    # not from predicted probabilities.
    roc_score = roc_auc_score(y_test, pred)
    print("Confusion matrix")
    print(confusion)
    print('Accuracy: {0:.4f}, Precision: {1:.4f}, Recall: {2:.4f}, F1: {3:.4f}, ROC AUC: {4:.4f}'.format(accuracy, precision, recall, f1, roc_score))

# Load the Pima Indians Diabetes data and check the class balance of the target.
diabetes_data = pd.read_csv('diabetes.csv')
print(diabetes_data['Outcome'].value_counts())
"""
0    500
1    268
Name: Outcome, dtype: int64
"""
print(diabetes_data.head(5))
"""
   Pregnancies  Glucose  BloodPressure  ...  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72  ...                     0.627   50        1
1            1       85             66  ...                     0.351   31        0
2            8      183             64  ...                     0.672   32        1
3            1       89             66  ...                     0.167   21        0
4            0      137             40  ...                     2.288   33        1

[5 rows x 9 columns]
"""
print(diabetes_data.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
Pregnancies                 768 non-null int64
Glucose                     768 non-null int64
BloodPressure               768 non-null int64
SkinThickness               768 non-null int64
Insulin                     768 non-null int64
BMI                         768 non-null float64
DiabetesPedigreeFunction    768 non-null float64
Age                         768 non-null int64
Outcome                     768 non-null int64
"""

# Features are every column except the last; the target is the last column (Outcome).
X = diabetes_data.iloc[:, :-1]
y = diabetes_data.iloc[:, -1]

# Stratified 80/20 split keeps the 0/1 class ratio the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=156, stratify=y)

# Fit logistic regression on the raw features and evaluate on the test set.
lr_clf = LogisticRegression()
lr_clf.fit(X_train, y_train)
pred = lr_clf.predict(X_test)
get_clf_eval(y_test, pred)
"""
Confusion matrix
[[87 13]
 [22 32]]
Accuracy: 0.7727, Precision: 0.7111, Recall: 0.5926, F1: 0.6465, ROC AUC: 0.7313
"""
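As a sanity check, the reported metrics can be recomputed by hand from the confusion matrix alone. The sketch below is not part of the original code. It assumes scikit-learn's layout (rows are actual classes, columns are predicted classes), and the variable names (`tn`, `fp`, `fn`, `tp`, `roc_auc_hard`) are chosen here for illustration only.

```python
# Confusion matrix reported above, assuming scikit-learn's layout:
# rows are actual classes (0, 1), columns are predicted classes (0, 1).
tn, fp = 87, 13
fn, tp = 22, 32

total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)        # true positive rate
specificity = tn / (tn + fp)   # true negative rate
f1 = 2 * precision * recall / (precision + recall)

# With hard 0/1 predictions the ROC curve has a single interior point,
# so the area under it equals the mean of recall and specificity.
roc_auc_hard = (recall + specificity) / 2

print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, "
      f"Recall: {recall:.4f}, F1: {f1:.4f}, ROC AUC: {roc_auc_hard:.4f}")
```

The ROC AUC of 0.7313 here is simply the average of recall and specificity because `roc_auc_score` was given class labels. Passing the positive-class probabilities from `lr_clf.predict_proba(X_test)[:, 1]` instead would generally give a different value, which is not reported in this post.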
<Source>
1. 파이썬 머신러닝 완벽 가이드 (Python Machine Learning: The Complete Guide)