The end-to-end process of building a neural network

In a neural network, recognition and prediction both boil down to a functional mapping $$y=f(x)$$. To find this function we need a large amount of training. The process can be summarized in the following steps:

  1. Generate random matrices w1 and b1, pass the result through an activation function, and obtain the hidden layer.

  2. Generate random matrices w2 and b2 and compute the predicted output y.

  3. Define a loss function from the gap between the prediction y and the label y_, using mean squared error, cross entropy, or similar.

  4. Compute the learning rate with exponential decay.

  5. Apply a moving average to the output parameters.

  6. Use gradient descent to reduce the loss.

  7. Initialize all parameters inside a with (session) structure.

  8. Feed in data with a for loop and train the model repeatedly.

The code below uses these eight steps to fit a circular decision boundary; the code and results are as follows:

# coding:utf-8
import os
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

xLine = 300
aLine = 10
BATCHSIZE = 10
regularizer = 0.001
BASE_LEARN_RATE = 0.001
LEARNING_RATE_DECAY = 0.99
MOVING_AVERAGE_DECAY = 0.99

# 1. Generate a random matrix X
rdm = np.random.RandomState(2)
X = rdm.randn(xLine, 2)
print("1. generated X:", X)

# 2. Generate the label set Y
Y = [int(x1 * x1 + x2 * x2 < 2) for (x1, x2) in X]
print("2. generated Y:", Y)
Y_c = [['red' if y else 'blue'] for y in Y]

print("3. generated Y_c:", Y_c)
X = np.vstack(X).reshape(-1, 2)
Y = np.vstack(Y).reshape(-1, 1)
print("4. reshaped X:", X)
print("5. reshaped Y:", Y)
# plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
# plt.show()

# 3. How should this data set be saved as samples?

# 4. Placeholders for the training samples
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
global_step = tf.Variable(0, trainable=False)

# 5. Random matrices w1 and b1 for the hidden layer
w1 = tf.Variable(tf.random_normal(shape=[2, aLine]))  # 2 x 10
tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w1))
b1 = tf.Variable(tf.constant(0.01, shape=[aLine]))

# 6. Hidden layer a through the activation function
a = tf.nn.relu(tf.matmul(x, w1) + b1)  # (N x 2) x (2 x 10) + 10

# 7. Random matrices w2 and b2 to compute the prediction y
w2 = tf.Variable(tf.random_normal(shape=[aLine, 1]))
tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w2))
b2 = tf.Variable(tf.random_normal(shape=[1]))
y = tf.matmul(a, w2) + b2

# 8. Loss function (mean squared error)
loss = tf.reduce_mean(tf.square(y - y_))

# Exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(BASE_LEARN_RATE,
                                           global_step,
                                           BATCHSIZE,
                                           LEARNING_RATE_DECAY,
                                           staircase=True)
# 9. Train with gradient descent
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

# Moving average
# ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
# ema_op = ema.apply(tf.trainable_variables())
# with tf.control_dependencies([train_step, ema_op]):
#     train_op = tf.no_op(name="train")

saver = tf.train.Saver()

# 10. Start training
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    STEPS = 40000
    for i in range(STEPS):
        start = (i * BATCHSIZE) % xLine
        end = start + BATCHSIZE

        sess.run(train_step, feed_dict={
            x: X[start:end],
            y_: Y[start:end]
        })

        if i % 2000 == 0:
            # learning_rate_val = sess.run(learning_rate)
            # print("learning_rate_val:", learning_rate_val)
            loss_mse_v = sess.run(loss, feed_dict={x: X, y_: Y})
            print("After %d training steps, loss on all data is %s" % (i, loss_mse_v))
            saver.save(sess, os.path.join("./model/", "model.ckpt"))

    xx, yy = np.mgrid[-3:3:0.1, -3:3:.01]

    grid = np.c_[xx.ravel(), yy.ravel()]

    probs = sess.run(y, feed_dict={x: grid})
    probs = probs.reshape(xx.shape)

    print("w1:", sess.run(w1))
    print("b1:", sess.run(b1))
    print("w2:", sess.run(w2))
    print("b2:", sess.run(b2))

plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()



Recognizing handwritten digits with the MNIST data set

A brief aside: after studying for so long, I finally get to build something practical. I originally meant to write this article on 2019-04-14, but worried it would slow my learning progress, so I kept putting it off; today (2019-04-24) I am finally writing it.

Goal

The goal of this post is to build a program that recognizes handwritten digits. We prepare the MNIST data set and a handwritten picture:

(figure: handwritten digits image)

To make the test harder, some interference lines were added:

(figure: handwritten digits with interference lines)

In the MNIST data set each sample is a single 28*28-pixel image with white digits on a black background, so before we can train with the MNIST samples we must preprocess our handwritten picture the same way.

The processing can be broken into these steps:

1. Binarize
2. Remove noise
3. Crop
4. Scale to a 28*28 image
5. Train the model
6. Recognize the result

Binarization and denoising

The image can be binarized with the relevant OpenCV modules; binarization turns the picture into pure black and white:

(figure: binarized image)

A few interference lines and noise spots remain; a mean blur removes them:

(figure: image after mean blur)

Binarizing the blurred picture again gives the following result:

(figure: binarized image after blurring)

The corresponding code is as follows:

import cv2 as cv
import numpy as np

def threshold(image):
    gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)  # convert the input image to grayscale
    # Direct thresholding splits the single-channel matrix pixel by pixel.
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY | cv.THRESH_TRIANGLE)
    return binary

img = cv.imread('./pic/01.png')
img = threshold(img)
cv.imwrite("02.png", img)
img = average_value(img)  # mean-blur helper (defined elsewhere) that removes noise
cv.imwrite("03.png", img)
# Binarize again
ret, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY | cv.THRESH_TRIANGLE)
cv.imwrite("04.png", img)

To crop the picture, accumulate the pixel values by row and by column, find the positions whose mean is not 255 (not pure white), and use the minimum and maximum of those positions as the start and end of the crop.

def corp_img(image):
    h, w = image.shape[:2]
    xArr = []
    yArr = []
    imgArr = []
    resultImgArr = []
    # Scan columns: a column whose mean is not 255 contains part of a digit
    for x in range(w):
        xSum = np.transpose(image)[x].sum()
        if xSum / h != 255:
            xArr.append(x)
        else:
            if len(xArr) > 0:
                cropped = image[0:h, min(xArr):max(xArr)]
                imgArr.append(cropped)
                xArr.clear()

    # Scan the rows of each column slice to trim the top and bottom margins
    for img in imgArr:
        for y in range(h):
            yw = img.shape[-1]
            ySum = img[y].sum()
            if (ySum / yw) != 255:
                yArr.append(y)
            else:
                if len(yArr) > 0:
                    cropped = img[min(yArr):max(yArr), 0:yw]
                    resultImgArr.append(cropped)
                    yArr.clear()
    return resultImgArr

The cropped images look like this:

(figure: cropped digit images)

A sample from the MNIST data:

(figure: MNIST sample)

Compared with the sample data:

(figure: comparison with the MNIST sample)

The comparison shows that the cropped pictures have no border, are black digits on a white background, and have a different resolution from the MNIST samples. To use the MNIST model we still need to pad, scale, and invert the images.

# Color inversion function
def revers_color(image):
    h, w = image.shape[0:2]
    for x in range(w):
        for y in range(h):
            image[y, x] = 255 - image[y, x]
    return image

corpImgArr = corp_img(img)
for i in range(len(corpImgArr)):
    cv.imwrite(str(i) + "_real.png", corpImgArr[i])
    wrap = cv.copyMakeBorder(corpImgArr[i], 15, 15, 15, 15, cv.BORDER_CONSTANT, value=255)  # pad 15 pixels
    resImg = cv.resize(wrap, (28, 28), interpolation=cv.INTER_AREA)  # scale to 28*28 pixels
    reversImg = revers_color(resImg)  # invert the colors
    cv.imwrite(str(i) + ".png", reversImg)

After this processing the pictures are essentially in the same format as the MNIST samples:

(figure: processed digit images)

Training the model

Principle: the input to be trained on and recognized is a 28*28 image matrix, i.e. 784 elements, and the desired output is a 1x10 array, so each image is flattened into a 1x784 one-dimensional array X. From the forward-propagation neuron model $$y=wx+b$$ it follows that W is a 784*10 matrix and b is a 1*10 matrix. The training steps are:

  1. Randomly initialize the 784*10 weight matrix W and the 1*10 bias matrix b
  2. Use the neuron relation $$y=wx+b$$ with the relu activation function to compute a (still inaccurate) y
  3. Denote the correct answer for each input as y_ and define the loss with cross entropy
  4. Use regularization to keep the model from overfitting
  5. Use an exponentially decaying learning rate so the model adjusts more smoothly
  6. Use backpropagation training to reduce the loss
  7. Apply a moving average to all parameters to pin down the model parameters more accurately
  8. Read the MNIST data, feed it into the network, and start training

The neural network code is as follows:

Forward propagation code: minst_forward.py

# coding:utf-8

import tensorflow as tf

INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500

# Define the input/output parameters and the forward propagation process
def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    if regularizer != None:
        tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b


def forward(x, regularizer):
    w1 = get_weight([INPUT_NODE, LAYER1_NODE], regularizer)
    b1 = get_bias([LAYER1_NODE])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

    w2 = get_weight([LAYER1_NODE, OUTPUT_NODE], regularizer)
    b2 = get_bias([OUTPUT_NODE])
    y = tf.matmul(y1, w2) + b2
    return y


Backpropagation (training) code: minst_backward.py

# coding:utf-8

import tensorflow as tf
import minst_forward
import os

from tensorflow.examples.tutorials.mnist import input_data

BATCH_SIZE = 200
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
REGULARIZER = 0.0001
STEPS = 100000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH = "./model/"
MODEL_NAME = "minst_model"

def backword(minst):
    x = tf.placeholder(tf.float32, (None, minst_forward.INPUT_NODE))
    y_ = tf.placeholder(tf.float32, (None, minst_forward.OUTPUT_NODE))
    y = minst_forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)

    # Define the loss with cross entropy
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)

    # Regularization to prevent overfitting
    loss = cem + tf.add_n(tf.get_collection("losses"))

    # Exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                               global_step,
                                               minst.train.num_examples / BATCH_SIZE,
                                               LEARNING_RATE_DECAY,
                                               staircase=True)
    # Backpropagation training method whose goal is to reduce the loss
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Apply a moving average to all trainable parameters for a more accurate model
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    emp_op = ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step, emp_op]):
        train_op = tf.no_op(name="train")

    saver = tf.train.Saver()

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)

        for i in range(STEPS):
            xs, ys = minst.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step],
                                           feed_dict={x: xs, y_: ys})  # feed data into the network

            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main():
    minst = input_data.read_data_sets("./data", one_hot=True)  # read the MNIST data
    backword(minst)

if __name__ == "__main__":
    main()


Run the backword.py program against the MNIST data for 100k training steps; the resulting model files look like this:

(figure: saved model checkpoint files)

The code that uses the model to obtain the most likely prediction is as follows:

import imageUtils
import tensorflow as tf
import minst_forward
import minst_backward
import numpy as np
import cv2 as cv


def restore_model(imgArr):
    with tf.Graph().as_default() as tg:
        x = tf.placeholder(tf.float32, [None, minst_forward.INPUT_NODE])
        y = minst_forward.forward(x, None)
        preValue = tf.argmax(y, 1)  # prediction with the highest probability

        # Restore the moving-average shadow variables; MOVING_AVERAGE_DECAY controls how fast they follow
        variable_averages = tf.train.ExponentialMovingAverage(minst_backward.MOVING_AVERAGE_DECAY)
        variable_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variable_to_restore)

        with tf.Session() as sess:
            ckpt = tf.train.get_checkpoint_state(minst_backward.MODEL_SAVE_PATH)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
                preValue = sess.run(preValue, feed_dict={x: imgArr})
                return preValue
            else:
                print("No checkpoint file found")
                return -1


if __name__ == "__main__":
    imgArr = imageUtils.getReadyImage("./pic/01.png")
    result = []
    for img in imgArr:
        try:
            im_arr = np.array(img)
            nm_arr = im_arr.reshape([1, 784])
            nm_arr = nm_arr.astype(np.float32)
            img_ready = np.multiply(nm_arr, 1.0 / 255.0)  # scale pixel values into [0, 1]
            preValue = restore_model(img_ready)
            result.append(str(preValue[0]))
        except Exception as ee:
            print(ee)
    print("".join(result))


The recognition result:

(figure: recognition output for the noisy image)

Feeding in a normal handwritten picture:

(figure: clean handwritten input)

The output:

(figure: recognition output for the clean image)

Most digits are recognized correctly, but some are still wrong; the errors come from differences between the training data and the input, and from too few training steps.

TensorFlow study notes: convolutional networks

Everything so far has been fully connected NN demos (every neuron is connected to every neuron in the adjacent layers; the input is the features and the output is the prediction).

In a fully connected NN, a 28*28 black-and-white image has 784 inputs. With a 500-node hidden layer the total parameter count is:

Layer 1: 784*500+500
Layer 2: 500*10+10

That works out to close to 400k parameters. Too many parameters easily lead to overfitting, and the problem becomes far worse with high-resolution color images. The usual solution is to first extract features from the image, feed the extracted features into the fully connected network, and then optimize the parameters.
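As a quick sanity check of that count, here is the plain arithmetic (ordinary Python, nothing TensorFlow-specific):

``` python
# Parameters of the two fully connected layers described above
layer1 = 784 * 500 + 500   # hidden-layer weights + biases
layer2 = 500 * 10 + 10     # output-layer weights + biases
print(layer1 + layer2)     # 397510, i.e. close to 400k parameters
```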

Convolution

Convolution is a way of extracting image features. A square convolution kernel is slid across every pixel of the image; the pixels covered by the kernel are multiplied by the corresponding kernel weights, the products are summed, a bias is added, and the result is one output pixel. (Image from the internet; contact me for removal if it infringes.)

(figure: convolution illustration)

Zero padding (Padding)

Plain convolution shrinks the image. To keep the output the same size as the input, the border of the image is padded with zeros. (Image from the internet; contact me for removal if it infringes.)

(figure: zero padding illustration)

Convolution in TensorFlow

In TensorFlow, convolution is implemented with the tf.nn.conv2d method. conv2d takes four arguments: a description of the input images, a description of the convolution kernel, the kernel's sliding stride, and the padding mode.

The parameters in detail:

  1. Description of the input images: the batch size (how many images are fed at once), each image's resolution (for example 5 rows by 5 columns), and how many channels the images have: 1 for grayscale, 3 for color.

  2. Description of the kernel: the kernel's row and column resolution, its channel count, and how many kernels are used. The kernel's channel count is determined by the input: it must equal the number of input channels, so for a grayscale input it is 1.

  3. Description of the stride: the first and last entries are fixed at 1; the second and third give the sliding stride in height and width.

  4. Padding mode: padding='SAME' applies zero padding so the output keeps the input's size, while padding='VALID' applies no padding.
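A minimal sketch of how those four arguments map onto a tf.nn.conv2d call (TF 1.x style; the shapes and variable names here are illustrative, not from the original post):

``` python
import tensorflow as tf

# One grayscale 5x5 image: [batch, height, width, channels]
x = tf.placeholder(tf.float32, [1, 5, 5, 1])
# 3x3 kernels over 1 input channel producing 16 feature maps:
# [kernel_height, kernel_width, in_channels, out_channels]
w = tf.Variable(tf.random_normal([3, 3, 1, 16]))
# Stride 1 in height and width; the first and last stride entries are fixed at 1.
# padding='SAME' zero-pads so the output stays 5x5; 'VALID' would shrink it to 3x3.
conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
```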

Convolution on multi-channel images

In most cases the input is an RGB color image. The input then has three layers of data (red, green, blue), and the kernel's depth must equal the number of input channels.

Multi-channel convolution works like single-channel convolution: to match the red, green, and blue channels, a three-layer kernel is laid over the three-layer image, the overlapping products are accumulated, the bias b is added, and the result is the output value.

(figure: multi-channel convolution illustration)
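The same conv2d sketch adapted to a three-channel input (again an assumed toy shape; only the in_channels dimension changes):

``` python
# For an RGB input the kernel's third dimension must match the 3 input channels
x_rgb = tf.placeholder(tf.float32, [1, 5, 5, 3])
w_rgb = tf.Variable(tf.random_normal([3, 3, 3, 16]))
conv_rgb = tf.nn.conv2d(x_rgb, w_rgb, strides=[1, 1, 1, 1], padding='SAME')
```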

Pooling

Pooling reduces and simplifies the image, as illustrated below:

(figure: pooling illustration)

Pooling comes in two flavors, max pooling and average pooling; a small sketch follows.
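A minimal sketch of the two pooling ops in TF 1.x, reusing the conv output from the sketch above (assumed shapes; ksize and strides use the same [batch, height, width, channels] layout as conv2d):

``` python
# A 2x2 window with stride 2 halves the spatial size
pool_max = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
pool_avg = tf.nn.avg_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```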

Introduction to the MNIST data set

The MNIST database is a data set of handwritten digits; official site: http://yann.lecun.com/exdb/mnist/

 > The MNIST data set comes from the US National Institute of Standards and Technology (NIST). The training set consists of digits handwritten by 250 different people, 50% of them high-school students and 50% staff of the Census Bureau. The test set contains handwritten digits in the same proportions.

MNIST consists of four archives; extracted, they look like this:

(figure: extracted MNIST files)

  1. train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz: 60000 28*28-pixel black-background, white-digit images for training
  2. t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz: 10000 28*28-pixel black-background, white-digit images for testing

Each MNIST image has 784 (28*28) pixels. The pixel values are arranged into a one-dimensional array that serves as the input, in this form:

 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.386 0.379 0....... 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]

The label of each image is given as a one-dimensional array in which each element is the probability of the corresponding digit, for example:

`[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]` represents the digit 6
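In other words, the digit is simply the index of the 1 in the one-hot array; a trivial check of my own (not from the original post):

``` python
import numpy as np

label = [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]
print(np.argmax(label))  # 6
```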


`tensorflow` has built-in support for reading this data set; usage:

``` python
# coding:utf-8
from tensorflow.examples.tutorials.mnist import input_data

minst = input_data.read_data_sets('./data/', one_hot=True)
# Print the data set sizes
print("train data size:", minst.train.num_examples)
print("validation data size:", minst.validation.num_examples)
print("test data size:", minst.test.num_examples)

# Print one sample
print(minst.train.labels[0])
print(minst.train.images[0])

# Fetch a batch of 200 samples
BATCH_SIZE = 200
xs, ys = minst.train.next_batch(BATCH_SIZE)
print("xs shape:", xs.shape)
print("ys shape:", ys.shape)

```

Console output:

```
train data size: 55000
validation data size: 5000
test data size: 10000
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0.....0. 0.3529412 0.5411765 0.9215687 0.9215687 0.9215687 0.9215687 0.9215687 0.9215687 0.9843138 0.9843138 0.9725491 0.9960785 0.9607844 0.9215687 0.74509805 0.08235294 0. 0. 0.....]
xs: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
xs shape: (200, 784)
ys: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 1. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 1. ... 0. 0. 0.]
[1. 0. 0. ... 0. 0. 0.]]
ys shape: (200, 10)

```

Since these are image data, we can convert them back into picture files; the code is as follows:

#coding:utf-8
import numpy as np
import struct

from PIL import Image
import os

data_file = 'train-images-idx3-ubyte'  # adjust this path as needed

# The file is 47040016 bytes; the pixel payload is 47040000 bytes (16-byte header)
data_file_size = 47040016
data_file_size = str(data_file_size - 16) + 'B'

data_buf = open(data_file, 'rb').read()

magic, numImages, numRows, numColumns = struct.unpack_from(
    '>IIII', data_buf, 0)
datas = struct.unpack_from(
    '>' + data_file_size, data_buf, struct.calcsize('>IIII'))
datas = np.array(datas).astype(np.uint8).reshape(
    numImages, 1, numRows, numColumns)

label_file = 'train-labels-idx1-ubyte'  # adjust this path as needed

# The file is 60008 bytes; the label payload is 60000 bytes (8-byte header)
label_file_size = 60008
label_file_size = str(label_file_size - 8) + 'B'

label_buf = open(label_file, 'rb').read()

magic, numLabels = struct.unpack_from('>II', label_buf, 0)
labels = struct.unpack_from(
    '>' + label_file_size, label_buf, struct.calcsize('>II'))
labels = np.array(labels).astype(np.int64)

datas_root = './pic/'  # adjust this path as needed
if not os.path.exists(datas_root):
    os.mkdir(datas_root)

# One sub-directory per digit 0-9
for i in range(10):
    file_name = datas_root + os.sep + str(i)
    if not os.path.exists(file_name):
        os.mkdir(file_name)

# Save each image into the directory of its label, file names zero-padded to 5 digits
for ii in range(numLabels):
    img = Image.fromarray(datas[ii, 0, 0:28, 0:28])
    label = labels[ii]
    file_name = datas_root + os.sep + str(label) + os.sep + \
        str(ii).zfill(5) + '.png'
    img.save(file_name)

(figures: digit images reconstructed from the MNIST files)

TensorFlow study notes: common methods and code snippets

Today I am organizing a few commonly used TensorFlow methods and code snippets.

  1. tf.get_collection("") fetches all variables in a collection and returns them as a list

  2. tf.add_n([]) adds the corresponding elements of the tensors in a list

  3. tf.cast(x, dtype) casts x to the type dtype

  4. tf.argmax(x, axis) returns the index of the largest value, e.g. tf.argmax([0, 1, 0]) returns 1

  5. with tf.Graph().as_default() as g: nodes defined inside are placed in computation graph g

  6. Saving a model:

    saver = tf.train.Saver()  # instantiate a saver object
    with tf.Session() as sess:  # inside the with structure, save the model every given number of rounds
        for i in range(STEPS):
            if i % rounds == 0:
                saver.save(sess, os.path.join(""), global_step=global_step)

  7. Loading a model

    with tf.Session() as sess:
        ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
  8. Mean squared error and the backpropagation methods that reduce the loss

    loss = tf.reduce_mean(tf.square(y_ - y))
    # Three training methods that reduce the loss
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    train_step = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss)
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
  9. Loss functions

    # tf.nn.relu, tf.nn.sigmoid, tf.nn.tanh: available activation functions
    loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST * (y - y_), PROFIT * (y_ - y)))  # custom loss function

    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))  # cross entropy
    cem = tf.reduce_mean(ce)  # mean cross entropy
  10. Learning rate


    # Exponentially decaying learning rate
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

  11. Moving average

    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)  # decay rate and current step

    ema_op = ema.apply([])  # apply to an explicit list of variables
    ema_op = ema.apply(tf.trainable_variables())  # each run applies the moving average to all trainable parameters

    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')

    ema.average(param)  # look up a parameter's moving-average (shadow) value

  12. Regularization

    loss(w) = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)  # sum of |w|
    loss(w) = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)  # sum of squared w

    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))

    loss = cem + tf.add_n(tf.get_collection('losses'))
  13. Instantiating a saver that restores the moving-average values

    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
    ema_restore = ema.variables_to_restore()
    saver = tf.train.Saver(ema_restore)
  14. Computing accuracy

    correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
    accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

TensorFlow study notes: wrapping up the basics

These past few days I finally finished the TensorFlow basics. It is still a bit foggy, but getting started is always the hard part. Below is the code from these posts, organized.

generateds.py

import numpy as np
import matplotlib.pyplot as plt

seed = 2

def generateds():
    # Generate random numbers based on seed
    rdm = np.random.RandomState(seed)
    # Return a 300x2 matrix of random numbers, i.e. 300 coordinate points
    X = rdm.randn(300, 2)

    Y_ = [int(x0 * x0 + x1 * x1 < 2) for (x0, x1) in X]

    # For every element of Y_, assign 'red' for 1 and 'blue' otherwise
    Y_c = [['red' if y else 'blue' for y in Y_]]

    # Reshape X and the labels Y_: -1 means the row count follows from the other dimension;
    # X has two columns, Y_ has one
    X = np.vstack(X).reshape(-1, 2)
    Y_ = np.vstack(Y_).reshape(-1, 1)

    return X, Y_, Y_c

if __name__ == "__main__":
    X, Y_, Y_c = generateds()
    print("X:\n", X)      # 300*2
    print("Y_:\n", Y_)    # 300*1
    print("Y_c:\n", Y_c)  # 1*300


`forward.py`
``` python
# coding:utf-8

import tensorflow as tf

def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b


def forward(x, regularizer):
    w1 = get_weight([2, 11], regularizer)
    b1 = get_bias([11])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

    w2 = get_weight([11, 1], regularizer)
    b2 = get_bias([1])
    y = tf.matmul(y1, w2) + b2
    return y

```

Finally, the code that ties all of these pieces together:


``` python
# coding:utf-8

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import generateds
import forward

STEPS = 40000
BATCH_SIZE = 30
LEARNING_RATE_BASE = 0.001
LEARNING_RATE_DECAY = 0.999
REGULARIZER = 0.01


def backward():
    x = tf.placeholder(tf.float32, shape=(None, 2))
    y_ = tf.placeholder(tf.float32, shape=(None, 1))

    X, Y_, Y_c = generateds.generateds()
    y = forward.forward(x, REGULARIZER)

    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, 300 / BATCH_SIZE,
                                               LEARNING_RATE_DECAY, staircase=True)

    # Define the loss function
    loss_mse = tf.reduce_mean(tf.square(y - y_))

    loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))
    # Define the backpropagation method, including regularization
    train_step = tf.train.AdadeltaOptimizer(learning_rate).minimize(loss_total)

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        for i in range(STEPS):
            start = (i * BATCH_SIZE) % 300
            end = start + BATCH_SIZE
            sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})

            if i % 2000 == 0:
                loss_v = sess.run(loss_total, feed_dict={x: X, y_: Y_})
                print("After %d training steps, loss on all data is %f" % (i, loss_v))

        xx, yy = np.mgrid[-3:3:0.01, -3:3:0.01]
        grid = np.c_[xx.ravel(), yy.ravel()]
        probs = sess.run(y, feed_dict={x: grid})
        probs = probs.reshape(xx.shape)

        plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
        plt.contour(xx, yy, probs, levels=[.5])
        plt.show()


if __name__ == "__main__":
    backward()
```


TensorFlow study notes: regularization

Regularization mitigates overfitting

Sometimes a model is highly accurate on the training data yet predicts new data poorly; we say the model is overfitting. Regularization is an effective way to mitigate overfitting.

Regularization adds a model-complexity term to the loss function by penalizing the weights W, which weakens the influence of noise in the training data (b is usually not regularized):

$$loss = loss(y, y') + REGULARIZER \cdot loss(w)$$

Here the first term is the ordinary loss over the model's parameters (cross entropy, mean squared error, etc.), the hyperparameter REGULARIZER gives the weight of the regularization term in the total loss, and w denotes the parameters being regularized.

In TensorFlow:


loss(w) = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)  # sum of |w|
loss(w) = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)  # sum of squared w

tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))

loss = cem + tf.add_n(tf.get_collection('losses'))
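To see what the l2 regularizer actually returns, here is a tiny runnable check of my own (not from the original post; note that TensorFlow's underlying l2_loss includes a factor of 1/2):

``` python
import tensorflow as tf

w = tf.constant([[1.0, -2.0], [3.0, 4.0]])
l2 = tf.contrib.layers.l2_regularizer(0.01)(w)  # 0.01 * sum(w**2) / 2

with tf.Session() as sess:
    print(sess.run(l2))  # 0.15
```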

We generate some random points and then fit a dividing boundary; the code is as follows:

#coding:utf-8
# 0. Import modules and generate the simulated data set
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

BATCH_SIZE = 30
seed = 2

# Generate random numbers based on seed
rdm = np.random.RandomState(seed)
# Return a 300x2 matrix of random numbers, i.e. 300 coordinate points (x0, x1) as the input data set
X = rdm.randn(300, 2)

# For each row of X, assign 1 to the label if the sum of squares of the two coordinates is less than 2, otherwise 0;
# these are the labels of the input data set
Y_ = [int(x0 * x0 + x1 * x1 < 2) for (x0, x1) in X]
# For every element of Y_, assign 'red' for 1 and 'blue' otherwise, so the visualization is easy to read
Y_c = [['red' if y else 'blue'] for y in Y_]
# Reshape X and the labels Y_: -1 means the row count follows from the other dimension;
# X becomes n rows by 2 columns and Y_ becomes n rows by 1 column
X = np.vstack(X).reshape(-1, 2)
Y_ = np.vstack(Y_).reshape(-1, 1)
print(X)
print(Y_)
print(Y_c)
# Use plt.scatter to plot column 0 of X against column 1, i.e. each row's (x0, x1),
# colored by the corresponding value in Y_c (c stands for color)
plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.show()

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))


def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b


w1 = get_weight([2, 11], 0.01)
b1 = get_bias([11])
y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

w2 = get_weight([11, 1], 0.01)
b2 = get_bias([1])
y = tf.matmul(y1, w2) + b2

loss_mse = tf.reduce_mean(tf.square(y - y_))
loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))

train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_mse)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    STEPS = 40000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={
            x: X[start:end], y_: Y_[start:end]
        })
        if i % 2000 == 0:
            loss_mse_v = sess.run(loss_mse, feed_dict={x: X, y_: Y_})
            print("After %d training steps, loss on all data is %s" % (i, loss_mse_v))

        xx, yy = np.mgrid[-3:3:0.1, -3:3:.01]

        grid = np.c_[xx.ravel(), yy.ravel()]

        probs = sess.run(y, feed_dict={x: grid})
        probs = probs.reshape(xx.shape)

    print("w1:", sess.run(w1))
    print("b1:", sess.run(b1))
    print("w2:", sess.run(w2))
    print("b2:", sess.run(b2))

plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

The generated points look like this:

(figure: scatter of the generated points)

The boundary we fit without regularization:

(figure: fitted boundary without regularization)

After enabling regularization, the fitted boundary becomes:

(figure: fitted boundary with regularization)

TensorFlow study notes: moving average

Moving average

The moving average (also called the shadow value) records an average of each parameter's values over a recent period and improves the model's generalization.

It applies to all parameters: w and b. (It is as if each parameter had a shadow: the parameter changes and the shadow slowly follows.) The computation is:

shadow = decay * shadow + (1 - decay) * parameter

initial shadow = initial parameter

$$decay = \min\left(MOVING\_AVERAGE\_DECAY,\ \frac{1+step}{10+step}\right)$$

For example:

MOVING_AVERAGE_DECAY is 0.99, parameter w1 is 0, global_step is 0, and w1's moving average is 0. Then w1 is updated to 1:

w1 moving average = min(0.99, 1/10)*0 + (1 - min(0.99, 1/10))*1 = 0.9

When global_step reaches 100 and w1 is updated to 10:

w1 moving average = min(0.99, 101/110)*0.9 + (1 - min(0.99, 101/110))*10 = 0.826 + 0.818 = 1.644

Running again:

w1 moving average = min(0.99, 101/110)*1.644 + (1 - min(0.99, 101/110))*10 = 2.328

Running again:

w1 moving average = 2.956
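A quick plain-Python check of those numbers, using the min() rule from the formula above (my own sketch, not part of the original post):

``` python
# shadow = decay * shadow + (1 - decay) * parameter, with decay = min(MAD, (1+step)/(10+step))
def shadow_update(shadow, param, step, MAD=0.99):
    decay = min(MAD, (1 + step) / (10 + step))
    return decay * shadow + (1 - decay) * param

s = shadow_update(0.0, 1.0, step=0)     # 0.9
s = shadow_update(s, 10.0, step=100)    # ~1.644
s = shadow_update(s, 10.0, step=100)    # ~2.328
s = shadow_update(s, 10.0, step=100)    # ~2.956
print(s)
```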

In TensorFlow this looks as follows:

ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)  # decay rate and current step

ema_op = ema.apply([])  # apply to an explicit list of variables
ema_op = ema.apply(tf.trainable_variables())  # each run applies the moving average to all trainable parameters

with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')

ema.average(param)  # look up a parameter's moving-average (shadow) value

The following code simulates the calculation above:

#coding:utf-8
# Let the loss function be loss=(w+1)^2 with initial w=5; backpropagation finds the w that minimizes the loss
import tensorflow as tf


# 1. Define the variable and the moving-average class
# Define a 32-bit float variable with initial value 0.0; the code keeps updating w1, and the moving average acts as w1's shadow

w1 = tf.Variable(0, dtype=tf.float32)

# Define num_updates (the number of NN iterations), initial value 0, not trainable
global_step = tf.Variable(0, trainable=False)

# Instantiate the moving-average class with decay rate 0.99 and the current step global_step
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

# ema.apply takes the list of variables to update; every sess.run(ema_op) updates their moving averages

# In practice tf.trainable_variables() collects all trainable parameters into a list automatically

# ema_op = ema.apply([w1])

ema_op = ema.apply(tf.trainable_variables())

with tf.Session() as sess:
    # Initialize
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # ema.average(w1) fetches w1's moving average (several nodes can be run at once by listing them)
    # Print the current w1 and its moving average
    print(sess.run([w1, ema.average(w1)]))

    # Assign 1 to w1
    sess.run(tf.assign(w1, 1))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))

    # Update step and w1: simulate 100 iterations, after which w1 becomes 10
    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))

    for x in range(40):
        # Every sess.run(ema_op) updates w1's moving average once more
        sess.run(ema_op)
        print(sess.run([w1, ema.average(w1)]))


The output:

[0.0, 0.0]
[1.0, 0.9]
[10.0, 1.6445453]
[10.0, 2.3281732]
[10.0, 2.955868]
[10.0, 3.532206]
[10.0, 4.061389]
[10.0, 4.547275]
[10.0, 4.9934072]
[10.0, 5.4030375]
[10.0, 5.7791524]
[10.0, 6.1244946]
[10.0, 6.4415812]
[10.0, 6.7327247]
[10.0, 7.000047]
[10.0, 7.2454977]
[10.0, 7.470866]
[10.0, 7.6777954]
[10.0, 7.867794]
[10.0, 8.042247]
[10.0, 8.202427]
[10.0, 8.349501]
[10.0, 8.484542]
[10.0, 8.608534]
[10.0, 8.722381]
[10.0, 8.826913]
[10.0, 8.922893]
[10.0, 9.01102]
[10.0, 9.091936]
[10.0, 9.166232]
[10.0, 9.234449]
[10.0, 9.297086]
[10.0, 9.354597]
[10.0, 9.407403]
[10.0, 9.455888]
[10.0, 9.500406]
[10.0, 9.541282]
[10.0, 9.578814]
[10.0, 9.613275]
[10.0, 9.644916]
[10.0, 9.673968]
[10.0, 9.700644]
[10.0, 9.725137]

You can see that the moving average keeps approaching w1.

TensorFlow study notes: learning rate

Learning rate

The learning rate (learning_rate) is the magnitude of each parameter update:

$$
W_{n+1}=W_n - learning\_rate \cdot \nabla
$$

where $$W_{n+1}$$ is the updated parameter, $$W_n$$ the current parameter, and $$\nabla$$ the gradient (derivative) of the loss function.

For example, with the loss function $$loss=(w+1)^2$$ the gradient is $$\nabla=\frac{\partial loss}{\partial w}=2w+2$$.

If the parameter w is initialized to 5 and the learning rate is 0.2, then:

step 1: w = 5,     5 - 0.2*(2*5 + 2) = 2.6
step 2: w = 2.6,   2.6 - 0.2*(2*2.6 + 2) = 1.16
step 3: w = 1.16,  1.16 - 0.2*(2*1.16 + 2) = 0.296
step 4: w = 0.296
...

The graph of the function:

(figure: the curve of loss=(w+1)^2)

From the graph, the minimum is at w = -1.

Let's use TensorFlow to see whether it can find this minimum at w = -1.

#coding:utf-8
# Let the loss function be loss=(w+1)^2 with initial w=5; backpropagation finds the w that minimizes the loss
import tensorflow as tf

# Define the parameter to optimize, w, with initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# Define the loss function
loss = tf.square(w + 1)

train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: w is %f,loss is %f" % (i, w_val, loss_val))

The output:

After 0 steps: w is 2.600000,loss is 12.959999
After 1 steps: w is 1.160000,loss is 4.665599
After 2 steps: w is 0.296000,loss is 1.679616
After 3 steps: w is -0.222400,loss is 0.604662
After 4 steps: w is -0.533440,loss is 0.217678
After 5 steps: w is -0.720064,loss is 0.078364
After 6 steps: w is -0.832038,loss is 0.028211
After 7 steps: w is -0.899223,loss is 0.010156
After 8 steps: w is -0.939534,loss is 0.003656
After 9 steps: w is -0.963720,loss is 0.001316
After 10 steps: w is -0.978232,loss is 0.000474
After 11 steps: w is -0.986939,loss is 0.000171
After 12 steps: w is -0.992164,loss is 0.000061
After 13 steps: w is -0.995298,loss is 0.000022
After 14 steps: w is -0.997179,loss is 0.000008
After 15 steps: w is -0.998307,loss is 0.000003
After 16 steps: w is -0.998984,loss is 0.000001
After 17 steps: w is -0.999391,loss is 0.000000
After 18 steps: w is -0.999634,loss is 0.000000
After 19 steps: w is -0.999781,loss is 0.000000
After 20 steps: w is -0.999868,loss is 0.000000
After 21 steps: w is -0.999921,loss is 0.000000
After 22 steps: w is -0.999953,loss is 0.000000
After 23 steps: w is -0.999972,loss is 0.000000
After 24 steps: w is -0.999983,loss is 0.000000
After 25 steps: w is -0.999990,loss is 0.000000
After 26 steps: w is -0.999994,loss is 0.000000
After 27 steps: w is -0.999996,loss is 0.000000
After 28 steps: w is -0.999998,loss is 0.000000
After 29 steps: w is -0.999999,loss is 0.000000
After 30 steps: w is -0.999999,loss is 0.000000
After 31 steps: w is -1.000000,loss is 0.000000
After 32 steps: w is -1.000000,loss is 0.000000
After 33 steps: w is -1.000000,loss is 0.000000
After 34 steps: w is -1.000000,loss is 0.000000
After 35 steps: w is -1.000000,loss is 0.000000
After 36 steps: w is -1.000000,loss is 0.000000
After 37 steps: w is -1.000000,loss is 0.000000
After 38 steps: w is -1.000000,loss is 0.000000
After 39 steps: w is -1.000000,loss is 0.000000

You can see that as the loss shrinks, w approaches -1.

Setting the learning rate

In the code above, if the learning rate is changed to 1, the result keeps oscillating:

After 1 steps: w is 5.000000,loss is 36.000000
After 2 steps: w is -7.000000,loss is 36.000000
....
After 37 steps: w is 5.000000,loss is 36.000000
After 38 steps: w is -7.000000,loss is 36.000000
After 39 steps: w is 5.000000,loss is 36.000000

If the learning rate is changed to 0.001, w changes very slowly.

From these examples, a learning rate that is too large oscillates and never converges, while one that is too small converges slowly.
So how should the learning rate be set?

Instead of a fixed learning rate, an exponentially decaying learning rate is used. It is computed as:

$$learning\_rate = LEARNING\_RATE\_BASE \times LEARNING\_RATE\_DECAY^{\frac{global\_step}{LEARNING\_RATE\_STEP}}$$

where LEARNING_RATE_BASE is the initial learning rate, LEARNING_RATE_DECAY is the decay rate (between 0 and 1), global_step counts how many steps have run, and LEARNING_RATE_STEP is how many steps pass between learning-rate updates (total samples / BATCH_SIZE). In code:

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

The final model trained with the exponentially decaying learning rate:

#coding:utf-8
# Let the loss function be loss=(w+1)^2 with initial w=5; backpropagation finds the w that minimizes the loss
import tensorflow as tf

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning rate decay
LEARNING_RATE_STEP = 1      # how many BATCH_SIZE rounds before updating the learning rate; usually total samples / BATCH_SIZE

# Counter of how many BATCH_SIZE rounds have run; initial value 0, not trainable
global_step = tf.Variable(0, trainable=False)

# Define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

# Define the parameter to optimize, w, with initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# Define the loss function
loss = tf.square(w + 1)

train_step = tf.train.GradientDescentOptimizer(learning_rate
).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s ,global_step is %s,w is %f,learning_rate is %f,loss is %f" % (i, global_step_val, w_val, learning_rate_val, loss_val))

The output:

After 0 ,global_step is 1,w is 3.800000,learning_rate is 0.099000,loss is 23.040001
After 1 ,global_step is 2,w is 2.849600,learning_rate is 0.098010,loss is 14.819419
After 2 ,global_step is 3,w is 2.095001,learning_rate is 0.097030,loss is 9.579033
After 3 ,global_step is 4,w is 1.494386,learning_rate is 0.096060,loss is 6.221961
After 4 ,global_step is 5,w is 1.015167,learning_rate is 0.095099,loss is 4.060896
After 5 ,global_step is 6,w is 0.631886,learning_rate is 0.094148,loss is 2.663051
After 6 ,global_step is 7,w is 0.324608,learning_rate is 0.093207,loss is 1.754587
After 7 ,global_step is 8,w is 0.077684,learning_rate is 0.092274,loss is 1.161403
After 8 ,global_step is 9,w is -0.121202,learning_rate is 0.091352,loss is 0.772287
After 9 ,global_step is 10,w is -0.281761,learning_rate is 0.090438,loss is 0.515867
After 10 ,global_step is 11,w is -0.411674,learning_rate is 0.089534,loss is 0.346128
After 11 ,global_step is 12,w is -0.517024,learning_rate is 0.088638,loss is 0.233266
After 12 ,global_step is 13,w is -0.602644,learning_rate is 0.087752,loss is 0.157891
After 13 ,global_step is 14,w is -0.672382,learning_rate is 0.086875,loss is 0.107334
After 14 ,global_step is 15,w is -0.729305,learning_rate is 0.086006,loss is 0.073276
After 15 ,global_step is 16,w is -0.775868,learning_rate is 0.085146,loss is 0.050235
After 16 ,global_step is 17,w is -0.814036,learning_rate is 0.084294,loss is 0.034583
After 17 ,global_step is 18,w is -0.845387,learning_rate is 0.083451,loss is 0.023905
After 18 ,global_step is 19,w is -0.871193,learning_rate is 0.082617,loss is 0.016591
After 19 ,global_step is 20,w is -0.892476,learning_rate is 0.081791,loss is 0.011561
After 20 ,global_step is 21,w is -0.910065,learning_rate is 0.080973,loss is 0.008088
After 21 ,global_step is 22,w is -0.924629,learning_rate is 0.080163,loss is 0.005681
After 22 ,global_step is 23,w is -0.936713,learning_rate is 0.079361,loss is 0.004005
After 23 ,global_step is 24,w is -0.946758,learning_rate is 0.078568,loss is 0.002835
After 24 ,global_step is 25,w is -0.955125,learning_rate is 0.077782,loss is 0.002014
After 25 ,global_step is 26,w is -0.962106,learning_rate is 0.077004,loss is 0.001436
After 26 ,global_step is 27,w is -0.967942,learning_rate is 0.076234,loss is 0.001028
After 27 ,global_step is 28,w is -0.972830,learning_rate is 0.075472,loss is 0.000738
After 28 ,global_step is 29,w is -0.976931,learning_rate is 0.074717,loss is 0.000532
After 29 ,global_step is 30,w is -0.980378,learning_rate is 0.073970,loss is 0.000385
After 30 ,global_step is 31,w is -0.983281,learning_rate is 0.073230,loss is 0.000280
After 31 ,global_step is 32,w is -0.985730,learning_rate is 0.072498,loss is 0.000204
After 32 ,global_step is 33,w is -0.987799,learning_rate is 0.071773,loss is 0.000149
After 33 ,global_step is 34,w is -0.989550,learning_rate is 0.071055,loss is 0.000109
After 34 ,global_step is 35,w is -0.991035,learning_rate is 0.070345,loss is 0.000080
After 35 ,global_step is 36,w is -0.992297,learning_rate is 0.069641,loss is 0.000059
After 36 ,global_step is 37,w is -0.993369,learning_rate is 0.068945,loss is 0.000044
After 37 ,global_step is 38,w is -0.994284,learning_rate is 0.068255,loss is 0.000033
After 38 ,global_step is 39,w is -0.995064,learning_rate is 0.067573,loss is 0.000024
After 39 ,global_step is 40,w is -0.995731,learning_rate is 0.066897,loss is 0.000018

The results above show how w and learning_rate change over the steps.

TensorFlow study notes: loss functions

Loss function

The previous posts used the simple neuron model shown below:

(figure: neuron model without bias)

with the formula $$Y=\sum_{i}^n X_i W_i$$.

A later neuron model adds an activation function and a bias term:

(figure: neuron model with activation function and bias)

with the formula
$$
Y=f\left(\sum_{i}^n X_i W_i + b\right)
$$
where f is the activation function and b is the bias term.
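As a minimal TF 1.x sketch of that neuron formula (the shapes and the choice of relu here are my own illustration, not from the original post):

``` python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 2))   # inputs X_i
w = tf.Variable(tf.random_normal([2, 1]))         # weights W_i
b = tf.Variable(tf.zeros([1]))                    # bias term
y = tf.nn.relu(tf.matmul(x, w) + b)               # Y = f(XW + b) with f = relu
```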

The loss function (loss) measures the gap between the prediction y' and the known answer y.

Our optimization goal is to make the loss as small as possible.

Activation functions

Introducing an activation function keeps the model from being just the linear combination $$XW$$, making it more accurate and more expressive.

Commonly used activation functions (a small runnable sketch follows the list):

  1. relu (tf.nn.relu):
    $$
    f(x)=\max(x,0)= \begin{cases}
    0, & x \le 0 \\
    x, & x > 0
    \end{cases}
    $$

    Its graph:

(figure: relu curve)

  2. sigmoid (tf.nn.sigmoid):
    $$
    f(x)=\frac{1}{1+e^{-x}}
    $$

    Its graph:

(figure: sigmoid curve)

  3. tanh (tf.nn.tanh):
    $$
    f(x)=\frac{1-e^{-2x}}{1+e^{-2x}}
    $$

    Its graph:

(figure: tanh curve)
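The sketch mentioned above, evaluating the three activations on a few sample inputs (my own toy values):

``` python
import tensorflow as tf

x = tf.constant([-2.0, 0.0, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(x)))     # [0. 0. 2.]
    print(sess.run(tf.nn.sigmoid(x)))  # ~[0.119 0.5 0.881]
    print(sess.run(tf.nn.tanh(x)))     # ~[-0.964 0. 0.964]
```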

An example

Predict daily yogurt sales. x1 and x2 are the factors that influence daily sales. Before modeling we fabricate a data set X, Y: the label is y_ = x1 + x2 (the known answer; in the ideal case production equals sales) plus noise in the range -0.05 to +0.05, and we fit a function that predicts sales.

Following this model we generate random numbers and train; the code is as follows:

41
#coding:utf-8
import tensorflow as tf
import numpy as np

BATCH_SIZE = 8
seed = 23455


rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]


# Define the network input, parameters, output, and forward propagation

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)


# 2. Define the loss function and the backpropagation method
# The loss function is MSE, the backpropagation method is gradient descent

loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

# Create the session and train

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 32
        end = (i * BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})

        if i % 500 == 0:
            print("After %d training steps, w1 is:" % (i))
            print(sess.run(w1))
    print("final w1 is:\n", sess.run(w1))

The output:

After 17500 training steps, w1 is:
[[0.96476096]
[1.0295546 ]]
After 18000 training steps, w1 is:
[[0.9684917]
[1.0262802]]
After 18500 training steps, w1 is:
[[0.9718707]
[1.0233142]]
After 19000 training steps, w1 is:
[[0.974931 ]
[1.0206276]]
After 19500 training steps, w1 is:
[[0.9777026]
[1.0181949]]
final w1 is:
[[0.98019385]
[1.0159807 ]]

Both weights approach 1, consistent with the data-generating rule y = x1 + x2.

Custom loss function

When predicting product sales, predicting too much loses cost and predicting too little loses profit. If profit and cost are not equal, the MSE loss cannot maximize profit.

Define a custom loss $$\sum_{i}^n f(y\_, y)$$, where y is the prediction and y_ is the known answer:

$$
f(y\_, y) =
\begin{cases}
PROFIT \cdot (y\_-y), & y < y\_ \quad \text{(predicted too little: lost profit, PROFIT)} \\
COST \cdot (y-y\_), & y \ge y\_ \quad \text{(predicted too much: lost cost, COST)}
\end{cases}
$$

In TensorFlow this becomes: loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST*(y-y_), PROFIT*(y_-y)))

As in the example above, suppose the cost (COST) of a cup of yogurt is 1 yuan and the profit (PROFIT) is 9 yuan.
Under-predicting then loses 9 yuan of profit per unit, while over-predicting loses only 1 yuan of cost.

Under-prediction is the bigger loss, so we expect the fitted function to predict on the high side.

Replacing the loss function with our custom one, the code becomes:

#coding:utf-8
# Predicting too little loses 9 yuan of profit; predicting too much loses 1 yuan of cost.
import tensorflow as tf
import numpy as np

BATCH_SIZE = 8
seed = 23455
COST = 1
PROFIT = 9


rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]


# Define the network input, parameters, output, and forward propagation

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)


# 2. Define the loss function and the backpropagation method
# The loss function is the custom profit/cost loss, backpropagation is gradient descent

# loss_mse = tf.reduce_mean(tf.square(y_ - y))
loss_mse = tf.reduce_sum(tf.where(tf.greater(y, y_), COST * (y - y_), PROFIT * (y_ - y)))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

# Create the session and train

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 32
        end = (i * BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})

        if i % 500 == 0:
            print("After %d training steps, w1 is:" % (i))
            print(sess.run(w1))
    print("final w1 is:\n", sess.run(w1))


After 18500 training steps, w1 is:
[[1.0232253]
[1.0445153]]
After 19000 training steps, w1 is:
[[1.0171654]
[1.038825 ]]
After 19500 training steps, w1 is:
[[1.0208615]
[1.0454264]]
final w1 is:
[[1.020171 ]
[1.0425103]]

You can see that the predictions lean toward the high side.

Cross entropy

Cross entropy measures the distance between two probability distributions:

$$
H(y', y)=-\sum y' \cdot \log y
$$

For example, given the known answer y'=(1, 0), which prediction is closer to it, y1=(0.6, 0.4) or y2=(0.8, 0.2)? (The values below use the base-10 logarithm.)

$$
H_1((1,0),(0.6,0.4))=-(1\cdot\log 0.6+0\cdot\log 0.4)\approx0.222
$$

$$
H_2((1,0),(0.8,0.2))=-(1\cdot\log 0.8+0\cdot\log 0.2)\approx0.097
$$

So y2 is the more accurate prediction.
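A quick NumPy check of those two numbers (my own sketch; it uses the base-10 log that the values above imply, and clips y so log(0) never occurs):

``` python
import numpy as np

y_true = np.array([1.0, 0.0])
for y_pred in ([0.6, 0.4], [0.8, 0.2]):
    h = -np.sum(y_true * np.log10(np.clip(y_pred, 1e-12, 1.0)))
    print(h)  # ~0.222 for (0.6, 0.4), ~0.097 for (0.8, 0.2)
```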

We can use cross entropy to train the model more precisely:

ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))

where any y below 1e-12 is clipped to 1e-12 so that log(0) never occurs.

When the n outputs (y1, y2, ..., yn) of an n-class classifier are passed through the softmax() function, they satisfy the requirements of a probability distribution:

$$
P(X=x)\in[0,1] \quad\text{and}\quad \sum_x P(X=x)=1
$$

$$
softmax(y_i)=\frac{e^{y_i}}{\sum_{j=1}^n e^{y_j}}
$$

So ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1)) together with

cem = tf.reduce_mean(ce) can replace the hand-written cross entropy; cem is the distance between the current predictions and the correct answers.
