
Getting Started with TensorFlow: Classification with Keras, by hongtao

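Besides the regression (fitting and prediction) problems covered in the [previous post](https://busy.org/@hongtao/tensorflow-keras), TensorFlow and Keras can also handle classification problems.

This post shows how to quickly build a linear classifier or a neural network with Keras that analyzes patients' physiological data to predict whether they have diabetes.

As before, to make it easier to discuss with readers, all the source code is available here: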

https://github.com/zht007/tensorflow-practice

### 1. Importing the data

The CSV file is already included in the project directory; it can also be downloaded from [Kaggle](https://www.kaggle.com/uciml/pima-indians-diabetes-database).

![](https://ws3.sinaimg.cn/large/006tKfTcgy1g15est7xhdj31540f80v5.jpg)

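### 2. Data preprocessing

#### 2.1 Normalizing the data

You could use scikit-learn's tools for this, but here I compute it directly. The formula used is min-max scaling, which maps each column to the range [0, 1]. Note that age is not normalized here.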

```python
cols_to_norm = ['Number_pregnant', 'Glucose_concentration', 'Blood_pressure', 'Triceps',
       'Insulin', 'BMI', 'Pedigree']

diabetes[cols_to_norm] = diabetes[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
```
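
To make the effect of this scaling concrete, here is a minimal sketch on a tiny made-up frame. The values are illustrative only and are not rows from the diabetes dataset; `toy` and `toy_norm` are names introduced just for this example.

```python
import pandas as pd

# Toy values for illustration only; not rows from the diabetes dataset.
toy = pd.DataFrame({"Glucose_concentration": [80.0, 120.0, 200.0],
                    "BMI": [20.0, 30.0, 40.0]})

# Same formula as above: (x - min) / (max - min), applied column by column.
toy_norm = toy.apply(lambda x: (x - x.min()) / (x.max() - x.min()))
print(toy_norm)
```

After scaling, the smallest value in each column becomes 0 and the largest becomes 1. A column whose values are all identical would divide by zero, so it is worth checking for constant columns first.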

#### 2.2 Bucketing age

Data such as age usually needs to be grouped into ranges. Let's first look at the age distribution in the data.

![](https://ws2.sinaimg.cn/large/006tKfTcgy1g15f5pmw97j30qu0lwac5.jpg)

pandas' built-in `cut` function can split age into buckets. Here we use four ranges, 0-30, 30-50, 50-70, and 70-100, labeled 0, 1, 2, and 3.

```python
bins = [0,30,50,70,100]
labels =[0,1,2,3]
diabetes["Age_buckets"] = pd.cut(diabetes["Age"],bins=bins, labels=labels, include_lowest=True)
```
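
Because the ranges share their edges, it helps to see which bucket an edge value lands in. The sketch below uses a few hypothetical ages placed on and around the bin edges; `ages`, `bucket_labels`, and `buckets` are names chosen for this example.

```python
import pandas as pd

bins = [0, 30, 50, 70, 100]
bucket_labels = [0, 1, 2, 3]

# Hypothetical ages chosen to sit on and around the bin edges.
ages = pd.Series([21, 30, 31, 50, 51, 81])

# Intervals are right-closed: [0, 30], (30, 50], (50, 70], (70, 100].
# include_lowest=True only makes the first interval include 0 itself.
buckets = pd.cut(ages, bins=bins, labels=bucket_labels, include_lowest=True)
print(buckets.tolist())
```

An age of exactly 30 falls into bucket 0 and an age of exactly 50 into bucket 1, since each interval includes its right edge.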

#### 2.3 Train/test split

Nothing special here: we again use `train_test_split` from `sklearn.model_selection`.

```python
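from sklearn.model_selection import train_test_split
# x_data (feature columns) and labels (target column) are prepared in the full notebook.
# Note: `labels` here must hold the target values, not the bucket label list from section 2.2.
X_train, X_test, y_train, y_test = train_test_split(x_data, labels, test_size=0.33, random_state=101)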
```
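
For intuition, a shuffled hold-out split can be sketched in plain NumPy. This is an illustrative approximation, not scikit-learn's implementation, and it will not select the same rows as `train_test_split` with `random_state=101`. The function name `simple_split` and the toy arrays are assumptions made for this example.

```python
import math
import numpy as np

def simple_split(X, y, test_size=0.33, seed=101):
    """Shuffle row indices, then hold out a fraction of them for testing."""
    n = len(X)
    n_test = math.ceil(n * test_size)          # number of held-out rows
    idx = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 samples with 2 features each, and binary targets.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)
X_tr, X_te, y_tr, y_te = simple_split(X, y)
print(X_tr.shape, X_te.shape)
```

With 10 rows and `test_size=0.33`, 4 rows are held out and 6 remain for training. Fixing the seed makes the split reproducible between runs.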

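### 3. Building a linear classifier with Keras

The model is the same as the linear regression model in the [previous post](https://busy.org/@hongtao/tensorflow-keras), except that the output layer has 2 units and needs a "softmax" activation function.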

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.optimizers import SGD,Adam
from tensorflow.keras.utils import to_categorical

model = Sequential()
model.add(Dense(2,input_shape = (X_train.shape[1],),activation = 'softmax'))
```
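
Conceptually, a single `Dense(2, activation='softmax')` layer computes `softmax(x @ W + b)`. The NumPy sketch below illustrates that forward pass. The weights `W`, the bias `b`, and the input `x` are made up for illustration and are not values learned by the model.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# One sample with 3 features (the real input has X_train.shape[1] features).
x = np.array([[0.2, 0.5, 0.1]])
W = np.array([[0.1, -0.1],
              [0.4,  0.3],
              [-0.2, 0.2]])   # shape: (n_features, 2 classes)
b = np.zeros(2)

probs = softmax(x @ W + b)    # shape (1, 2); each row sums to 1
print(probs)
```

The two outputs can be read as the predicted probabilities of the two classes, which is why they always sum to 1.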

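Note that the labels y need to be converted from single integer labels into binary "one-hot" vectors. For example, if the original labels use 1 and 0 to mark whether or not a patient has diabetes, `to_categorical` turns label 0 into [1, 0] and label 1 into [0, 1].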

```python
y_binary_train= to_categorical(y_train)
y_binary_test = to_categorical(y_test)
```
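
For intuition, the same one-hot layout can be produced in plain NumPy. This is only a sketch of the idea, not how `to_categorical` is implemented; the label values and the names `y` and `y_one_hot` are made up for this example.

```python
import numpy as np

y = np.array([0, 1, 1, 0])        # illustrative integer class labels
num_classes = 2

# Row i of the identity matrix is the one-hot vector for class i.
y_one_hot = np.eye(num_classes)[y]
print(y_one_hot)
```

Each row has a single 1 in the column that matches the class index, so class 0 maps to `[1, 0]` and class 1 maps to `[0, 1]`.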

As before, we can use the SGD optimizer, but note that the loss function passed to `compile` must be "categorical_crossentropy".

```python
sgd = SGD(0.005)
model.compile(loss = 'categorical_crossentropy', optimizer = sgd, metrics=['accuracy'])
```
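
To show what this loss measures, here is a small NumPy sketch of categorical cross-entropy on one-hot targets. It illustrates the formula only; the clipping constant `eps` and the example probabilities are assumptions, and Keras' own implementation may differ in details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])      # one-hot targets
confident = np.array([[0.9, 0.1], [0.2, 0.8]])   # mostly right
unsure = np.array([[0.5, 0.5], [0.5, 0.5]])      # coin flip

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is lower when more probability is placed on the correct class. A coin-flip prediction over two classes gives a loss of ln 2, about 0.693.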

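### 4. Training the classifier

During training, the test data can be passed in directly as validation data, which makes it easy to evaluate how training is going.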

```python
H = model.fit(X_train, y_binary_train, validation_data=(X_test, y_binary_test),epochs = 500)
```

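### 5. Checking the training results

The training results can be inspected through the returned history object (`H.history`), which records how the loss and accuracy change over the epochs. The linear classifier does reasonably well.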

![](https://ws1.sinaimg.cn/large/006tKfTcgy1g15g21e2p5j30oy0jc0v8.jpg)

### 6. Trying a neural network instead

Here I build a fully connected network with two hidden layers of 20 and 10 units, and use the Adam optimizer.

```python
model = Sequential()
model.add(Dense(20,input_shape = (X_train.shape[1],), activation = 'relu'))
model.add(Dense(10,activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))

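adam = Adam(0.01)
# Compile and train as for the linear classifier (assumed to mirror that setup).
model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics=['accuracy'])
H = model.fit(X_train, y_binary_train, validation_data=(X_test, y_binary_test),epochs = 500)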
```
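
A rough way to compare the capacity of the two models is to count their trainable parameters. The sketch below does the arithmetic by hand; `n_features = 8` is an assumption for illustration (in practice use `X_train.shape[1]`), and `dense_params` is a helper written for this example. `model.summary()` reports the actual counts.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

n_features = 8  # assumption for illustration; use X_train.shape[1] in practice

linear = dense_params(n_features, 2)
mlp = (dense_params(n_features, 20)
       + dense_params(20, 10)
       + dense_params(10, 2))
print(linear, mlp)
```

Under that assumption the linear classifier has 18 parameters and the network has 412, which gives the network far more room to fit noise in a small dataset.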

As the plot shows, although the accuracy is slightly higher than with the linear classifier, clear overfitting appears after 200 epochs.

![](https://ws4.sinaimg.cn/large/006tKfTcgy1g15g8670u8j30og0jc43p.jpg)
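
One way to locate where overfitting starts is to find the epoch with the lowest validation loss in the training history. The sketch below uses a hand-written dictionary shaped like `H.history`. The numbers are invented for illustration and are not results from this model; Keras records the `val_loss` key when `validation_data` is passed to `fit`.

```python
# Stand-in for H.history; the numbers are made up for illustration.
history = {
    "loss":     [0.70, 0.55, 0.45, 0.38, 0.30, 0.24],
    "val_loss": [0.72, 0.58, 0.50, 0.49, 0.53, 0.60],
}

val_loss = history["val_loss"]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# Training loss keeps falling after this point while validation loss rises,
# which is the usual signature of overfitting.
print(f"lowest val_loss {val_loss[best_epoch]:.2f} at epoch {best_epoch + 1}")
```

Keras also provides the `tensorflow.keras.callbacks.EarlyStopping` callback, which can stop training automatically once a monitored metric stops improving.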

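### 7. Making predictions with the model

Likewise, we can use the trained model to make predictions on the validation data. Note that at the end we need `np.argmax` to convert the two-column softmax output back into single class labels.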

```python
import numpy as np
y_pred_softmax = model.predict(X_test)
y_pred = np.argmax(y_pred_softmax, axis=1)
```
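
To see what the `argmax` step does, here is a small self-contained sketch with made-up softmax outputs. The arrays below are illustrative and do not come from the trained model; `y_true` and `accuracy` are names introduced for this example.

```python
import numpy as np

# Made-up softmax outputs for four samples, plus their true labels.
y_pred_softmax = np.array([[0.8, 0.2],
                           [0.3, 0.7],
                           [0.6, 0.4],
                           [0.1, 0.9]])
y_true = np.array([0, 1, 1, 1])

y_pred = np.argmax(y_pred_softmax, axis=1)   # index of the largest probability per row
accuracy = np.mean(y_pred == y_true)         # fraction of correct predictions
print(y_pred, accuracy)
```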
---
Also published on my Jianshu:
https://www.jianshu.com/u/bd506afc6fc1