The end-to-end process of building a neural network

In a neural network, recognition and prediction both boil down to a functional mapping $$y=f(x)$$. To find this function we need a large amount of training. The process can be summarized in the following steps:

  1. Generate random matrices w1 and b1, pass the result through an activation function, and obtain the hidden layer.

  2. Generate random matrices w2 and b2 and compute the predicted output y.

  3. Define a loss function from the gap between the prediction y and the label y_, using mean squared error, cross entropy, or similar.

  4. Compute the learning rate with exponential decay.

  5. Apply a moving average to the output parameters.

  6. Use gradient descent to reduce the loss.

  7. Initialize all parameters inside a with (session) structure.

  8. Feed in data with a for loop and train the model repeatedly.

The code below uses these eight steps to fit a circular decision boundary; the code and results are as follows:

# coding:utf-8
import os
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

xLine = 300
aLine = 10
BATCHSIZE = 10
regularizer = 0.001
BASE_LEARN_RATE = 0.001
LEARNING_RATE_DECAY = 0.99
MOVING_AVERAGE_DECAY = 0.99

# 1. Generate a random matrix X
rdm = np.random.RandomState(2)
X = rdm.randn(xLine, 2)
print("1. generated X:", X)

# 2. Generate the label set Y
Y = [int(x1 * x1 + x2 * x2 < 2) for (x1, x2) in X]
print("2. generated Y:", Y)
Y_c = [['red' if y else 'blue'] for y in Y]

print("3. generated Y_c:", Y_c)
X = np.vstack(X).reshape(-1, 2)
Y = np.vstack(Y).reshape(-1, 1)
print("4. reshaped X:", X)
print("5. reshaped Y:", Y)
# plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
# plt.show()

# 3. How should this data set be saved as samples?

# 4. Placeholders for the training samples
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
global_step = tf.Variable(0, trainable=False)

# 5. Random matrices w1 and b1 for the hidden layer
w1 = tf.Variable(tf.random_normal(shape=[2, aLine]))  # 2 x 10
tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w1))
b1 = tf.Variable(tf.constant(0.01, shape=[aLine]))

# 6. Hidden layer a through the activation function
a = tf.nn.relu(tf.matmul(x, w1) + b1)  # (N x 2) x (2 x 10) + 10

# 7. Random matrices w2 and b2 to compute the prediction y
w2 = tf.Variable(tf.random_normal(shape=[aLine, 1]))
tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w2))
b2 = tf.Variable(tf.random_normal(shape=[1]))
y = tf.matmul(a, w2) + b2

# 8. Loss function (mean squared error)
loss = tf.reduce_mean(tf.square(y - y_))

# Exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(BASE_LEARN_RATE,
                                           global_step,
                                           BATCHSIZE,
                                           LEARNING_RATE_DECAY,
                                           staircase=True)
# 9. Train with gradient descent
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

# Moving average
# ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
# ema_op = ema.apply(tf.trainable_variables())
# with tf.control_dependencies([train_step, ema_op]):
#     train_op = tf.no_op(name="train")

saver = tf.train.Saver()

# 10. Start training
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    STEPS = 40000
    for i in range(STEPS):
        start = (i * BATCHSIZE) % xLine
        end = start + BATCHSIZE

        sess.run(train_step, feed_dict={
            x: X[start:end],
            y_: Y[start:end]
        })

        if i % 2000 == 0:
            # learning_rate_val = sess.run(learning_rate)
            # print("learning_rate_val:", learning_rate_val)
            loss_mse_v = sess.run(loss, feed_dict={x: X, y_: Y})
            print("After %d training steps, loss on all data is %s" % (i, loss_mse_v))
            saver.save(sess, os.path.join("./model/", "model.ckpt"))

    xx, yy = np.mgrid[-3:3:0.1, -3:3:.01]

    grid = np.c_[xx.ravel(), yy.ravel()]

    probs = sess.run(y, feed_dict={x: grid})
    probs = probs.reshape(xx.shape)

    print("w1:", sess.run(w1))
    print("b1:", sess.run(b1))
    print("w2:", sess.run(w2))
    print("b2:", sess.run(b2))

plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()



Recognizing handwritten digits with the MNIST data set

A brief aside: after studying for so long, I finally get to build something practical. I originally meant to write this article on 2019-04-14, but worried it would slow my learning progress, so I kept putting it off; today (2019-04-24) I am finally writing it.

Goal

The goal of this post is to build a program that recognizes handwritten digits. We prepare the MNIST data set and a handwritten picture:

(figure: handwritten digits image)

To make the test harder, some interference lines were added:

(figure: handwritten digits with interference lines)

In the MNIST data set each sample is a single 28*28-pixel image with white digits on a black background, so before we can train with the MNIST samples we must preprocess our handwritten picture the same way.

The processing can be broken into these steps:

1. Binarize
2. Remove noise
3. Crop
4. Scale to a 28*28 image
5. Train the model
6. Recognize the result

Binarization and denoising

The image can be binarized with the relevant OpenCV modules; binarization turns the picture into pure black and white:

(figure: binarized image)

A few interference lines and noise spots remain; a mean blur removes them:

(figure: image after mean blur)

Binarizing the blurred picture again gives the following result:

(figure: binarized image after blurring)

The corresponding code is as follows:

import cv2 as cv
import numpy as np

def threshold(image):
    gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)  # convert the input image to grayscale
    # Direct thresholding splits the single-channel matrix pixel by pixel.
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY | cv.THRESH_TRIANGLE)
    return binary

img = cv.imread('./pic/01.png')
img = threshold(img)
cv.imwrite("02.png", img)
img = average_value(img)  # mean-blur helper (defined elsewhere) that removes noise
cv.imwrite("03.png", img)
# Binarize again
ret, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY | cv.THRESH_TRIANGLE)
cv.imwrite("04.png", img)

To crop the picture, accumulate the pixel values by row and by column, find the positions whose mean is not 255 (not pure white), and use the minimum and maximum of those positions as the start and end of the crop.

def corp_img(image):
    h, w = image.shape[:2]
    xArr = []
    yArr = []
    imgArr = []
    resultImgArr = []
    # Scan columns: a column whose mean is not 255 contains part of a digit
    for x in range(w):
        xSum = np.transpose(image)[x].sum()
        if xSum / h != 255:
            xArr.append(x)
        else:
            if len(xArr) > 0:
                cropped = image[0:h, min(xArr):max(xArr)]
                imgArr.append(cropped)
                xArr.clear()

    # Scan the rows of each column slice to trim the top and bottom margins
    for img in imgArr:
        for y in range(h):
            yw = img.shape[-1]
            ySum = img[y].sum()
            if (ySum / yw) != 255:
                yArr.append(y)
            else:
                if len(yArr) > 0:
                    cropped = img[min(yArr):max(yArr), 0:yw]
                    resultImgArr.append(cropped)
                    yArr.clear()
    return resultImgArr

The cropped images look like this:

(figure: cropped digit images)

A sample from the MNIST data:

(figure: MNIST sample)

Compared with the sample data:

(figure: comparison with the MNIST sample)

The comparison shows that the cropped pictures have no border, are black digits on a white background, and have a different resolution from the MNIST samples. To use the MNIST model we still need to pad, scale, and invert the images.

# Color inversion function
def revers_color(image):
    h, w = image.shape[0:2]
    for x in range(w):
        for y in range(h):
            image[y, x] = 255 - image[y, x]
    return image

corpImgArr = corp_img(img)
for i in range(len(corpImgArr)):
    cv.imwrite(str(i) + "_real.png", corpImgArr[i])
    wrap = cv.copyMakeBorder(corpImgArr[i], 15, 15, 15, 15, cv.BORDER_CONSTANT, value=255)  # pad 15 pixels
    resImg = cv.resize(wrap, (28, 28), interpolation=cv.INTER_AREA)  # scale to 28*28 pixels
    reversImg = revers_color(resImg)  # invert the colors
    cv.imwrite(str(i) + ".png", reversImg)

After this processing the pictures are essentially in the same format as the MNIST samples:

(figure: processed digit images)

Training the model

Principle: the input to be trained on and recognized is a 28*28 image matrix, i.e. 784 elements, and the desired output is a 1x10 array, so each image is flattened into a 1x784 one-dimensional array X. From the forward-propagation neuron model $$y=wx+b$$ it follows that W is a 784*10 matrix and b is a 1*10 matrix. The training steps are:

  1. Randomly initialize the 784*10 weight matrix W and the 1*10 bias matrix b
  2. Use the neuron relation $$y=wx+b$$ with the relu activation function to compute a (still inaccurate) y
  3. Denote the correct answer for each input as y_ and define the loss with cross entropy
  4. Use regularization to keep the model from overfitting
  5. Use an exponentially decaying learning rate so the model adjusts more smoothly
  6. Use backpropagation training to reduce the loss
  7. Apply a moving average to all parameters to pin down the model parameters more accurately
  8. Read the MNIST data, feed it into the network, and start training

The neural network code is as follows:

Forward propagation code: minst_forward.py

# coding:utf-8

import tensorflow as tf

INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500

# Define the input/output parameters and the forward propagation process
def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    if regularizer != None:
        tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b


def forward(x, regularizer):
    w1 = get_weight([INPUT_NODE, LAYER1_NODE], regularizer)
    b1 = get_bias([LAYER1_NODE])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

    w2 = get_weight([LAYER1_NODE, OUTPUT_NODE], regularizer)
    b2 = get_bias([OUTPUT_NODE])
    y = tf.matmul(y1, w2) + b2
    return y


Backpropagation (training) code: minst_backward.py

# coding:utf-8

import tensorflow as tf
import minst_forward
import os

from tensorflow.examples.tutorials.mnist import input_data

BATCH_SIZE = 200
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
REGULARIZER = 0.0001
STEPS = 100000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH = "./model/"
MODEL_NAME = "minst_model"

def backword(minst):
    x = tf.placeholder(tf.float32, (None, minst_forward.INPUT_NODE))
    y_ = tf.placeholder(tf.float32, (None, minst_forward.OUTPUT_NODE))
    y = minst_forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)

    # Define the loss with cross entropy
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)

    # Regularization to prevent overfitting
    loss = cem + tf.add_n(tf.get_collection("losses"))

    # Exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                               global_step,
                                               minst.train.num_examples / BATCH_SIZE,
                                               LEARNING_RATE_DECAY,
                                               staircase=True)
    # Backpropagation training method whose goal is to reduce the loss
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Apply a moving average to all trainable parameters for a more accurate model
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    emp_op = ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step, emp_op]):
        train_op = tf.no_op(name="train")

    saver = tf.train.Saver()

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)

        for i in range(STEPS):
            xs, ys = minst.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step],
                                           feed_dict={x: xs, y_: ys})  # feed data into the network

            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main():
    minst = input_data.read_data_sets("./data", one_hot=True)  # read the MNIST data
    backword(minst)

if __name__ == "__main__":
    main()


Run the backword.py program against the MNIST data for 100k training steps; the resulting model files look like this:

(figure: saved model checkpoint files)

The code that uses the model to obtain the most likely prediction is as follows:

import imageUtils
import tensorflow as tf
import minst_forward
import minst_backward
import numpy as np
import cv2 as cv


def restore_model(imgArr):
    with tf.Graph().as_default() as tg:
        x = tf.placeholder(tf.float32, [None, minst_forward.INPUT_NODE])
        y = minst_forward.forward(x, None)
        preValue = tf.argmax(y, 1)  # prediction with the highest probability

        # Restore the moving-average shadow variables; MOVING_AVERAGE_DECAY controls how fast they follow
        variable_averages = tf.train.ExponentialMovingAverage(minst_backward.MOVING_AVERAGE_DECAY)
        variable_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variable_to_restore)

        with tf.Session() as sess:
            ckpt = tf.train.get_checkpoint_state(minst_backward.MODEL_SAVE_PATH)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
                preValue = sess.run(preValue, feed_dict={x: imgArr})
                return preValue
            else:
                print("No checkpoint file found")
                return -1


if __name__ == "__main__":
    imgArr = imageUtils.getReadyImage("./pic/01.png")
    result = []
    for img in imgArr:
        try:
            im_arr = np.array(img)
            nm_arr = im_arr.reshape([1, 784])
            nm_arr = nm_arr.astype(np.float32)
            img_ready = np.multiply(nm_arr, 1.0 / 255.0)  # scale pixel values into [0, 1]
            preValue = restore_model(img_ready)
            result.append(str(preValue[0]))
        except Exception as ee:
            print(ee)
    print("".join(result))


The recognition result:

(figure: recognition output for the noisy image)

Feeding in a normal handwritten picture:

(figure: clean handwritten input)

The output:

(figure: recognition output for the clean image)

Most digits are recognized correctly, but some are still wrong; the errors come from differences between the training data and the input, and from too few training steps.

TensorFlow study notes: convolutional networks

Everything so far has been fully connected NN demos (every neuron is connected to every neuron in the adjacent layers; the input is the features and the output is the prediction).

In a fully connected NN, a 28*28 black-and-white image has 784 inputs. With a 500-node hidden layer the total parameter count is:

Layer 1: 784*500+500
Layer 2: 500*10+10

That works out to close to 400k parameters. Too many parameters easily lead to overfitting, and the problem becomes far worse with high-resolution color images. The usual solution is to first extract features from the image, feed the extracted features into the fully connected network, and then optimize the parameters.
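As a quick sanity check of that count, here is the plain arithmetic (ordinary Python, nothing TensorFlow-specific):

``` python
# Parameters of the two fully connected layers described above
layer1 = 784 * 500 + 500   # hidden-layer weights + biases
layer2 = 500 * 10 + 10     # output-layer weights + biases
print(layer1 + layer2)     # 397510, i.e. close to 400k parameters
```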

Convolution

Convolution is a way of extracting image features. A square convolution kernel is slid across every pixel of the image; the pixels covered by the kernel are multiplied by the corresponding kernel weights, the products are summed, a bias is added, and the result is one output pixel. (Image from the internet; contact me for removal if it infringes.)

(figure: convolution illustration)

Zero padding (Padding)

Plain convolution shrinks the image. To keep the output the same size as the input, the border of the image is padded with zeros. (Image from the internet; contact me for removal if it infringes.)

(figure: zero padding illustration)

Convolution in TensorFlow

In TensorFlow, convolution is implemented with the tf.nn.conv2d method. conv2d takes four arguments: a description of the input images, a description of the convolution kernel, the kernel's sliding stride, and the padding mode.

The parameters in detail:

  1. Description of the input images: the batch size (how many images are fed at once), each image's resolution (for example 5 rows by 5 columns), and how many channels the images have: 1 for grayscale, 3 for color.

  2. Description of the kernel: the kernel's row and column resolution, its channel count, and how many kernels are used. The kernel's channel count is determined by the input: it must equal the number of input channels, so for a grayscale input it is 1.

  3. Description of the stride: the first and last entries are fixed at 1; the second and third give the sliding stride in height and width.

  4. Padding mode: padding='SAME' applies zero padding so the output keeps the input's size, while padding='VALID' applies no padding.
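A minimal sketch of how those four arguments map onto a tf.nn.conv2d call (TF 1.x style; the shapes and variable names here are illustrative, not from the original post):

``` python
import tensorflow as tf

# One grayscale 5x5 image: [batch, height, width, channels]
x = tf.placeholder(tf.float32, [1, 5, 5, 1])
# 3x3 kernels over 1 input channel producing 16 feature maps:
# [kernel_height, kernel_width, in_channels, out_channels]
w = tf.Variable(tf.random_normal([3, 3, 1, 16]))
# Stride 1 in height and width; the first and last stride entries are fixed at 1.
# padding='SAME' zero-pads so the output stays 5x5; 'VALID' would shrink it to 3x3.
conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
```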

Convolution on multi-channel images

In most cases the input is an RGB color image. The input then has three layers of data (red, green, blue), and the kernel's depth must equal the number of input channels.

Multi-channel convolution works like single-channel convolution: to match the red, green, and blue channels, a three-layer kernel is laid over the three-layer image, the overlapping products are accumulated, the bias b is added, and the result is the output value.

(figure: multi-channel convolution illustration)
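The same conv2d sketch adapted to a three-channel input (again an assumed toy shape; only the in_channels dimension changes):

``` python
# For an RGB input the kernel's third dimension must match the 3 input channels
x_rgb = tf.placeholder(tf.float32, [1, 5, 5, 3])
w_rgb = tf.Variable(tf.random_normal([3, 3, 3, 16]))
conv_rgb = tf.nn.conv2d(x_rgb, w_rgb, strides=[1, 1, 1, 1], padding='SAME')
```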

Pooling

Pooling reduces and simplifies the image, as illustrated below:

(figure: pooling illustration)

Pooling comes in two flavors, max pooling and average pooling; a small sketch follows.
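A minimal sketch of the two pooling ops in TF 1.x, reusing the conv output from the sketch above (assumed shapes; ksize and strides use the same [batch, height, width, channels] layout as conv2d):

``` python
# A 2x2 window with stride 2 halves the spatial size
pool_max = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
pool_avg = tf.nn.avg_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```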

Introduction to the MNIST data set

The MNIST database is a data set of handwritten digits; official site: http://yann.lecun.com/exdb/mnist/

 > The MNIST data set comes from the US National Institute of Standards and Technology (NIST). The training set consists of digits handwritten by 250 different people, 50% of them high-school students and 50% staff of the Census Bureau. The test set contains handwritten digits in the same proportions.

MNIST consists of four archives; extracted, they look like this:

(figure: extracted MNIST files)

  1. train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz: 60000 28*28-pixel black-background, white-digit images for training
  2. t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz: 10000 28*28-pixel black-background, white-digit images for testing

Each MNIST image has 784 (28*28) pixels. The pixel values are arranged into a one-dimensional array that serves as the input, in this form:

 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.386 0.379 0....... 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]

The label of each image is given as a one-dimensional array in which each element is the probability of the corresponding digit, for example:

`[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]` represents the digit 6
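In other words, the digit is simply the index of the 1 in the one-hot array; a trivial check of my own (not from the original post):

``` python
import numpy as np

label = [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]
print(np.argmax(label))  # 6
```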


`tensorflow` has built-in support for reading this data set; usage:

``` python
# coding:utf-8
from tensorflow.examples.tutorials.mnist import input_data

minst = input_data.read_data_sets('./data/', one_hot=True)
# Print the data set sizes
print("train data size:", minst.train.num_examples)
print("validation data size:", minst.validation.num_examples)
print("test data size:", minst.test.num_examples)

# Print one sample
print(minst.train.labels[0])
print(minst.train.images[0])

# Fetch a batch of 200 samples
BATCH_SIZE = 200
xs, ys = minst.train.next_batch(BATCH_SIZE)
print("xs shape:", xs.shape)
print("ys shape:", ys.shape)

```

Console output:

```
train data size: 55000
validation data size: 5000
test data size: 10000
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0.....0. 0.3529412 0.5411765 0.9215687 0.9215687 0.9215687 0.9215687 0.9215687 0.9215687 0.9843138 0.9843138 0.9725491 0.9960785 0.9607844 0.9215687 0.74509805 0.08235294 0. 0. 0.....]
xs: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
xs shape: (200, 784)
ys: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 1. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 1. ... 0. 0. 0.]
[1. 0. 0. ... 0. 0. 0.]]
ys shape: (200, 10)

```

Since these are image data, we can convert them back into picture files; the code is as follows:

#coding:utf-8
import numpy as np
import struct

from PIL import Image
import os

data_file = 'train-images-idx3-ubyte'  # adjust this path as needed

# The file is 47040016 bytes; the pixel payload is 47040000 bytes (16-byte header)
data_file_size = 47040016
data_file_size = str(data_file_size - 16) + 'B'

data_buf = open(data_file, 'rb').read()

magic, numImages, numRows, numColumns = struct.unpack_from(
    '>IIII', data_buf, 0)
datas = struct.unpack_from(
    '>' + data_file_size, data_buf, struct.calcsize('>IIII'))
datas = np.array(datas).astype(np.uint8).reshape(
    numImages, 1, numRows, numColumns)

label_file = 'train-labels-idx1-ubyte'  # adjust this path as needed

# The file is 60008 bytes; the label payload is 60000 bytes (8-byte header)
label_file_size = 60008
label_file_size = str(label_file_size - 8) + 'B'

label_buf = open(label_file, 'rb').read()

magic, numLabels = struct.unpack_from('>II', label_buf, 0)
labels = struct.unpack_from(
    '>' + label_file_size, label_buf, struct.calcsize('>II'))
labels = np.array(labels).astype(np.int64)

datas_root = './pic/'  # adjust this path as needed
if not os.path.exists(datas_root):
    os.mkdir(datas_root)

# One sub-directory per digit 0-9
for i in range(10):
    file_name = datas_root + os.sep + str(i)
    if not os.path.exists(file_name):
        os.mkdir(file_name)

# Save each image into the directory of its label, file names zero-padded to 5 digits
for ii in range(numLabels):
    img = Image.fromarray(datas[ii, 0, 0:28, 0:28])
    label = labels[ii]
    file_name = datas_root + os.sep + str(label) + os.sep + \
        str(ii).zfill(5) + '.png'
    img.save(file_name)

(figures: digit images reconstructed from the MNIST files)

TensorFlow study notes: common methods and code snippets

Today I am organizing a few commonly used TensorFlow methods and code snippets.

  1. tf.get_collection("") fetches all variables in a collection and returns them as a list

  2. tf.add_n([]) adds the corresponding elements of the tensors in a list

  3. tf.cast(x, dtype) casts x to the type dtype

  4. tf.argmax(x, axis) returns the index of the largest value, e.g. tf.argmax([0, 1, 0]) returns 1

  5. with tf.Graph().as_default() as g: nodes defined inside are placed in computation graph g

  6. Saving a model:

    saver = tf.train.Saver()  # instantiate a saver object
    with tf.Session() as sess:  # inside the with structure, save the model every given number of rounds
        for i in range(STEPS):
            if i % rounds == 0:
                saver.save(sess, os.path.join(""), global_step=global_step)

  7. Loading a model

    with tf.Session() as sess:
        ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
  8. Mean squared error and the backpropagation methods that reduce the loss

    loss = tf.reduce_mean(tf.square(y_ - y))
    # Three training methods that reduce the loss
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    train_step = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss)
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
  9. Loss functions

    # tf.nn.relu, tf.nn.sigmoid, tf.nn.tanh: available activation functions
    loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST * (y - y_), PROFIT * (y_ - y)))  # custom loss function

    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))  # cross entropy
    cem = tf.reduce_mean(ce)  # mean cross entropy
  10. Learning rate


    # Exponentially decaying learning rate
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

  11. Moving average

    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)  # decay rate and current step

    ema_op = ema.apply([])  # apply to an explicit list of variables
    ema_op = ema.apply(tf.trainable_variables())  # each run applies the moving average to all trainable parameters

    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')

    ema.average(param)  # look up a parameter's moving-average (shadow) value

  12. Regularization

    loss(w) = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)  # sum of |w|
    loss(w) = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)  # sum of squared w

    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))

    loss = cem + tf.add_n(tf.get_collection('losses'))
  13. Instantiating a saver that restores the moving-average values

    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
    ema_restore = ema.variables_to_restore()
    saver = tf.train.Saver(ema_restore)
  14. Computing accuracy

    correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
    accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

TensorFlow study notes: wrapping up the basics

These past few days I finally finished the TensorFlow basics. It is still a bit foggy, but getting started is always the hard part. Below is the code from these posts, organized.

generateds.py

import numpy as np
import matplotlib.pyplot as plt

seed = 2

def generateds():
    # Generate random numbers based on seed
    rdm = np.random.RandomState(seed)
    # Return a 300x2 matrix of random numbers, i.e. 300 coordinate points
    X = rdm.randn(300, 2)

    Y_ = [int(x0 * x0 + x1 * x1 < 2) for (x0, x1) in X]

    # For every element of Y_, assign 'red' for 1 and 'blue' otherwise
    Y_c = [['red' if y else 'blue' for y in Y_]]

    # Reshape X and the labels Y_: -1 means the row count follows from the other dimension;
    # X has two columns, Y_ has one
    X = np.vstack(X).reshape(-1, 2)
    Y_ = np.vstack(Y_).reshape(-1, 1)

    return X, Y_, Y_c

if __name__ == "__main__":
    X, Y_, Y_c = generateds()
    print("X:\n", X)      # 300*2
    print("Y_:\n", Y_)    # 300*1
    print("Y_c:\n", Y_c)  # 1*300


`forward.py`
``` python
# coding:utf-8

import tensorflow as tf

def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b


def forward(x, regularizer):
    w1 = get_weight([2, 11], regularizer)
    b1 = get_bias([11])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

    w2 = get_weight([11, 1], regularizer)
    b2 = get_bias([1])
    y = tf.matmul(y1, w2) + b2
    return y

```

Finally, the code that ties all of these pieces together:


``` python
# coding:utf-8

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import generateds
import forward

STEPS = 40000
BATCH_SIZE = 30
LEARNING_RATE_BASE = 0.001
LEARNING_RATE_DECAY = 0.999
REGULARIZER = 0.01


def backward():
    x = tf.placeholder(tf.float32, shape=(None, 2))
    y_ = tf.placeholder(tf.float32, shape=(None, 1))

    X, Y_, Y_c = generateds.generateds()
    y = forward.forward(x, REGULARIZER)

    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, 300 / BATCH_SIZE,
                                               LEARNING_RATE_DECAY, staircase=True)

    # Define the loss function
    loss_mse = tf.reduce_mean(tf.square(y - y_))

    loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))
    # Define the backpropagation method, including regularization
    train_step = tf.train.AdadeltaOptimizer(learning_rate).minimize(loss_total)

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        for i in range(STEPS):
            start = (i * BATCH_SIZE) % 300
            end = start + BATCH_SIZE
            sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})

            if i % 2000 == 0:
                loss_v = sess.run(loss_total, feed_dict={x: X, y_: Y_})
                print("After %d training steps, loss on all data is %f" % (i, loss_v))

        xx, yy = np.mgrid[-3:3:0.01, -3:3:0.01]
        grid = np.c_[xx.ravel(), yy.ravel()]
        probs = sess.run(y, feed_dict={x: grid})
        probs = probs.reshape(xx.shape)

        plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
        plt.contour(xx, yy, probs, levels=[.5])
        plt.show()


if __name__ == "__main__":
    backward()
```


TensorFlow study notes: regularization

Regularization mitigates overfitting

Sometimes a model is highly accurate on the training data yet predicts new data poorly; we say the model is overfitting. Regularization is an effective way to mitigate overfitting.

Regularization adds a model-complexity term to the loss function by penalizing the weights W, which weakens the influence of noise in the training data (b is usually not regularized):

$$loss = loss(y, y') + REGULARIZER \cdot loss(w)$$

Here the first term is the ordinary loss over the model's parameters (cross entropy, mean squared error, etc.), the hyperparameter REGULARIZER gives the weight of the regularization term in the total loss, and w denotes the parameters being regularized.

In TensorFlow:


loss(w) = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)  # sum of |w|
loss(w) = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)  # sum of squared w

tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))

loss = cem + tf.add_n(tf.get_collection('losses'))
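To see what the l2 regularizer actually returns, here is a tiny runnable check of my own (not from the original post; note that TensorFlow's underlying l2_loss includes a factor of 1/2):

``` python
import tensorflow as tf

w = tf.constant([[1.0, -2.0], [3.0, 4.0]])
l2 = tf.contrib.layers.l2_regularizer(0.01)(w)  # 0.01 * sum(w**2) / 2

with tf.Session() as sess:
    print(sess.run(l2))  # 0.15
```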

We generate some random points and then fit a dividing boundary; the code is as follows:

#coding:utf-8
# 0. Import modules and generate the simulated data set
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

BATCH_SIZE = 30
seed = 2

# Generate random numbers based on seed
rdm = np.random.RandomState(seed)
# Return a 300x2 matrix of random numbers, i.e. 300 coordinate points (x0, x1) as the input data set
X = rdm.randn(300, 2)

# For each row of X, assign 1 to the label if the sum of squares of the two coordinates is less than 2, otherwise 0;
# these are the labels of the input data set
Y_ = [int(x0 * x0 + x1 * x1 < 2) for (x0, x1) in X]
# For every element of Y_, assign 'red' for 1 and 'blue' otherwise, so the visualization is easy to read
Y_c = [['red' if y else 'blue'] for y in Y_]
# Reshape X and the labels Y_: -1 means the row count follows from the other dimension;
# X becomes n rows by 2 columns and Y_ becomes n rows by 1 column
X = np.vstack(X).reshape(-1, 2)
Y_ = np.vstack(Y_).reshape(-1, 1)
print(X)
print(Y_)
print(Y_c)
# Use plt.scatter to plot column 0 of X against column 1, i.e. each row's (x0, x1),
# colored by the corresponding value in Y_c (c stands for color)
plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.show()

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))


def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection("losses", tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b


w1 = get_weight([2, 11], 0.01)
b1 = get_bias([11])
y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

w2 = get_weight([11, 1], 0.01)
b2 = get_bias([1])
y = tf.matmul(y1, w2) + b2

loss_mse = tf.reduce_mean(tf.square(y - y_))
loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))

train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_mse)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    STEPS = 40000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={
            x: X[start:end], y_: Y_[start:end]
        })
        if i % 2000 == 0:
            loss_mse_v = sess.run(loss_mse, feed_dict={x: X, y_: Y_})
            print("After %d training steps, loss on all data is %s" % (i, loss_mse_v))

        xx, yy = np.mgrid[-3:3:0.1, -3:3:.01]

        grid = np.c_[xx.ravel(), yy.ravel()]

        probs = sess.run(y, feed_dict={x: grid})
        probs = probs.reshape(xx.shape)

    print("w1:", sess.run(w1))
    print("b1:", sess.run(b1))
    print("w2:", sess.run(w2))
    print("b2:", sess.run(b2))

plt.scatter(X[:, 0], X[:, 1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

The generated points look like this:

(figure: scatter of the generated points)

The boundary we fit without regularization:

(figure: fitted boundary without regularization)

After enabling regularization, the fitted boundary becomes:

(figure: fitted boundary with regularization)

TensorFlow study notes: moving average

Moving average

The moving average (also called the shadow value) records an average of each parameter's values over a recent period and improves the model's generalization.

It applies to all parameters: w and b. (It is as if each parameter had a shadow: the parameter changes and the shadow slowly follows.) The computation is:

shadow = decay * shadow + (1 - decay) * parameter

initial shadow = initial parameter

$$decay = \min\left(MOVING\_AVERAGE\_DECAY,\ \frac{1+step}{10+step}\right)$$

For example:

MOVING_AVERAGE_DECAY is 0.99, parameter w1 is 0, global_step is 0, and w1's moving average is 0. Then w1 is updated to 1:

w1 moving average = min(0.99, 1/10)*0 + (1 - min(0.99, 1/10))*1 = 0.9

When global_step reaches 100 and w1 is updated to 10:

w1 moving average = min(0.99, 101/110)*0.9 + (1 - min(0.99, 101/110))*10 = 0.826 + 0.818 = 1.644

Running again:

w1 moving average = min(0.99, 101/110)*1.644 + (1 - min(0.99, 101/110))*10 = 2.328

Running again:

w1 moving average = 2.956
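A quick plain-Python check of those numbers, using the min() rule from the formula above (my own sketch, not part of the original post):

``` python
# shadow = decay * shadow + (1 - decay) * parameter, with decay = min(MAD, (1+step)/(10+step))
def shadow_update(shadow, param, step, MAD=0.99):
    decay = min(MAD, (1 + step) / (10 + step))
    return decay * shadow + (1 - decay) * param

s = shadow_update(0.0, 1.0, step=0)     # 0.9
s = shadow_update(s, 10.0, step=100)    # ~1.644
s = shadow_update(s, 10.0, step=100)    # ~2.328
s = shadow_update(s, 10.0, step=100)    # ~2.956
print(s)
```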

In TensorFlow this looks as follows:

ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)  # decay rate and current step

ema_op = ema.apply([])  # apply to an explicit list of variables
ema_op = ema.apply(tf.trainable_variables())  # each run applies the moving average to all trainable parameters

with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')

ema.average(param)  # look up a parameter's moving-average (shadow) value

The following code simulates the calculation above:

#coding:utf-8
# Let the loss function be loss=(w+1)^2 with initial w=5; backpropagation finds the w that minimizes the loss
import tensorflow as tf


# 1. Define the variable and the moving-average class
# Define a 32-bit float variable with initial value 0.0; the code keeps updating w1, and the moving average acts as w1's shadow

w1 = tf.Variable(0, dtype=tf.float32)

# Define num_updates (the number of NN iterations), initial value 0, not trainable
global_step = tf.Variable(0, trainable=False)

# Instantiate the moving-average class with decay rate 0.99 and the current step global_step
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

# ema.apply takes the list of variables to update; every sess.run(ema_op) updates their moving averages

# In practice tf.trainable_variables() collects all trainable parameters into a list automatically

# ema_op = ema.apply([w1])

ema_op = ema.apply(tf.trainable_variables())

with tf.Session() as sess:
    # Initialize
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # ema.average(w1) fetches w1's moving average (several nodes can be run at once by listing them)
    # Print the current w1 and its moving average
    print(sess.run([w1, ema.average(w1)]))

    # Assign 1 to w1
    sess.run(tf.assign(w1, 1))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))

    # Update step and w1: simulate 100 iterations, after which w1 becomes 10
    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print(sess.run([w1, ema.average(w1)]))

    for x in range(40):
        # Every sess.run(ema_op) updates w1's moving average once more
        sess.run(ema_op)
        print(sess.run([w1, ema.average(w1)]))


The output:

[0.0, 0.0]
[1.0, 0.9]
[10.0, 1.6445453]
[10.0, 2.3281732]
[10.0, 2.955868]
[10.0, 3.532206]
[10.0, 4.061389]
[10.0, 4.547275]
[10.0, 4.9934072]
[10.0, 5.4030375]
[10.0, 5.7791524]
[10.0, 6.1244946]
[10.0, 6.4415812]
[10.0, 6.7327247]
[10.0, 7.000047]
[10.0, 7.2454977]
[10.0, 7.470866]
[10.0, 7.6777954]
[10.0, 7.867794]
[10.0, 8.042247]
[10.0, 8.202427]
[10.0, 8.349501]
[10.0, 8.484542]
[10.0, 8.608534]
[10.0, 8.722381]
[10.0, 8.826913]
[10.0, 8.922893]
[10.0, 9.01102]
[10.0, 9.091936]
[10.0, 9.166232]
[10.0, 9.234449]
[10.0, 9.297086]
[10.0, 9.354597]
[10.0, 9.407403]
[10.0, 9.455888]
[10.0, 9.500406]
[10.0, 9.541282]
[10.0, 9.578814]
[10.0, 9.613275]
[10.0, 9.644916]
[10.0, 9.673968]
[10.0, 9.700644]
[10.0, 9.725137]

You can see that the moving average keeps approaching w1.

TensorFlow study notes: learning rate

Learning rate

The learning rate (learning_rate) is the magnitude of each parameter update:

$$
W_{n+1}=W_n - learning\_rate \cdot \nabla
$$

where $$W_{n+1}$$ is the updated parameter, $$W_n$$ the current parameter, and $$\nabla$$ the gradient (derivative) of the loss function.

For example, with the loss function $$loss=(w+1)^2$$ the gradient is $$\nabla=\frac{\partial loss}{\partial w}=2w+2$$.

If the parameter w is initialized to 5 and the learning rate is 0.2, then:

step 1: w = 5,     5 - 0.2*(2*5 + 2) = 2.6
step 2: w = 2.6,   2.6 - 0.2*(2*2.6 + 2) = 1.16
step 3: w = 1.16,  1.16 - 0.2*(2*1.16 + 2) = 0.296
step 4: w = 0.296
...

The graph of the function:

(figure: the curve of loss=(w+1)^2)

From the graph, the minimum is at w = -1.

Let's use TensorFlow to see whether it can find this minimum at w = -1.

#coding:utf-8
# Let the loss function be loss=(w+1)^2 with initial w=5; backpropagation finds the w that minimizes the loss
import tensorflow as tf

# Define the parameter to optimize, w, with initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# Define the loss function
loss = tf.square(w + 1)

train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: w is %f,loss is %f" % (i, w_val, loss_val))

The output:

After 0 steps: w is 2.600000,loss is 12.959999
After 1 steps: w is 1.160000,loss is 4.665599
After 2 steps: w is 0.296000,loss is 1.679616
After 3 steps: w is -0.222400,loss is 0.604662
After 4 steps: w is -0.533440,loss is 0.217678
After 5 steps: w is -0.720064,loss is 0.078364
After 6 steps: w is -0.832038,loss is 0.028211
After 7 steps: w is -0.899223,loss is 0.010156
After 8 steps: w is -0.939534,loss is 0.003656
After 9 steps: w is -0.963720,loss is 0.001316
After 10 steps: w is -0.978232,loss is 0.000474
After 11 steps: w is -0.986939,loss is 0.000171
After 12 steps: w is -0.992164,loss is 0.000061
After 13 steps: w is -0.995298,loss is 0.000022
After 14 steps: w is -0.997179,loss is 0.000008
After 15 steps: w is -0.998307,loss is 0.000003
After 16 steps: w is -0.998984,loss is 0.000001
After 17 steps: w is -0.999391,loss is 0.000000
After 18 steps: w is -0.999634,loss is 0.000000
After 19 steps: w is -0.999781,loss is 0.000000
After 20 steps: w is -0.999868,loss is 0.000000
After 21 steps: w is -0.999921,loss is 0.000000
After 22 steps: w is -0.999953,loss is 0.000000
After 23 steps: w is -0.999972,loss is 0.000000
After 24 steps: w is -0.999983,loss is 0.000000
After 25 steps: w is -0.999990,loss is 0.000000
After 26 steps: w is -0.999994,loss is 0.000000
After 27 steps: w is -0.999996,loss is 0.000000
After 28 steps: w is -0.999998,loss is 0.000000
After 29 steps: w is -0.999999,loss is 0.000000
After 30 steps: w is -0.999999,loss is 0.000000
After 31 steps: w is -1.000000,loss is 0.000000
After 32 steps: w is -1.000000,loss is 0.000000
After 33 steps: w is -1.000000,loss is 0.000000
After 34 steps: w is -1.000000,loss is 0.000000
After 35 steps: w is -1.000000,loss is 0.000000
After 36 steps: w is -1.000000,loss is 0.000000
After 37 steps: w is -1.000000,loss is 0.000000
After 38 steps: w is -1.000000,loss is 0.000000
After 39 steps: w is -1.000000,loss is 0.000000

You can see that as the loss shrinks, w approaches -1.

Setting the learning rate

In the code above, if the learning rate is changed to 1, the result keeps oscillating:

After 1 steps: w is 5.000000,loss is 36.000000
After 2 steps: w is -7.000000,loss is 36.000000
....
After 37 steps: w is 5.000000,loss is 36.000000
After 38 steps: w is -7.000000,loss is 36.000000
After 39 steps: w is 5.000000,loss is 36.000000

If the learning rate is changed to 0.001, w changes very slowly.

From these examples, a learning rate that is too large oscillates and never converges, while one that is too small converges slowly.
So how should the learning rate be set?

Instead of a fixed learning rate, an exponentially decaying learning rate is used. It is computed as:

$$learning\_rate = LEARNING\_RATE\_BASE \times LEARNING\_RATE\_DECAY^{\frac{global\_step}{LEARNING\_RATE\_STEP}}$$

where LEARNING_RATE_BASE is the initial learning rate, LEARNING_RATE_DECAY is the decay rate (between 0 and 1), global_step counts how many steps have run, and LEARNING_RATE_STEP is how many steps pass between learning-rate updates (total samples / BATCH_SIZE). In code:

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

The final model trained with the exponentially decaying learning rate:

#coding:utf-8
# Let the loss function be loss=(w+1)^2 with initial w=5; backpropagation finds the w that minimizes the loss
import tensorflow as tf

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning rate decay
LEARNING_RATE_STEP = 1      # how many BATCH_SIZE rounds before updating the learning rate; usually total samples / BATCH_SIZE

# Counter of how many BATCH_SIZE rounds have run; initial value 0, not trainable
global_step = tf.Variable(0, trainable=False)

# Define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

# Define the parameter to optimize, w, with initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# Define the loss function
loss = tf.square(w + 1)

train_step = tf.train.GradientDescentOptimizer(learning_rate
).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s ,global_step is %s,w is %f,learning_rate is %f,loss is %f" % (i, global_step_val, w_val, learning_rate_val, loss_val))

The output:

After 0 ,global_step is 1,w is 3.800000,learning_rate is 0.099000,loss is 23.040001
After 1 ,global_step is 2,w is 2.849600,learning_rate is 0.098010,loss is 14.819419
After 2 ,global_step is 3,w is 2.095001,learning_rate is 0.097030,loss is 9.579033
After 3 ,global_step is 4,w is 1.494386,learning_rate is 0.096060,loss is 6.221961
After 4 ,global_step is 5,w is 1.015167,learning_rate is 0.095099,loss is 4.060896
After 5 ,global_step is 6,w is 0.631886,learning_rate is 0.094148,loss is 2.663051
After 6 ,global_step is 7,w is 0.324608,learning_rate is 0.093207,loss is 1.754587
After 7 ,global_step is 8,w is 0.077684,learning_rate is 0.092274,loss is 1.161403
After 8 ,global_step is 9,w is -0.121202,learning_rate is 0.091352,loss is 0.772287
After 9 ,global_step is 10,w is -0.281761,learning_rate is 0.090438,loss is 0.515867
After 10 ,global_step is 11,w is -0.411674,learning_rate is 0.089534,loss is 0.346128
After 11 ,global_step is 12,w is -0.517024,learning_rate is 0.088638,loss is 0.233266
After 12 ,global_step is 13,w is -0.602644,learning_rate is 0.087752,loss is 0.157891
After 13 ,global_step is 14,w is -0.672382,learning_rate is 0.086875,loss is 0.107334
After 14 ,global_step is 15,w is -0.729305,learning_rate is 0.086006,loss is 0.073276
After 15 ,global_step is 16,w is -0.775868,learning_rate is 0.085146,loss is 0.050235
After 16 ,global_step is 17,w is -0.814036,learning_rate is 0.084294,loss is 0.034583
After 17 ,global_step is 18,w is -0.845387,learning_rate is 0.083451,loss is 0.023905
After 18 ,global_step is 19,w is -0.871193,learning_rate is 0.082617,loss is 0.016591
After 19 ,global_step is 20,w is -0.892476,learning_rate is 0.081791,loss is 0.011561
After 20 ,global_step is 21,w is -0.910065,learning_rate is 0.080973,loss is 0.008088
After 21 ,global_step is 22,w is -0.924629,learning_rate is 0.080163,loss is 0.005681
After 22 ,global_step is 23,w is -0.936713,learning_rate is 0.079361,loss is 0.004005
After 23 ,global_step is 24,w is -0.946758,learning_rate is 0.078568,loss is 0.002835
After 24 ,global_step is 25,w is -0.955125,learning_rate is 0.077782,loss is 0.002014
After 25 ,global_step is 26,w is -0.962106,learning_rate is 0.077004,loss is 0.001436
After 26 ,global_step is 27,w is -0.967942,learning_rate is 0.076234,loss is 0.001028
After 27 ,global_step is 28,w is -0.972830,learning_rate is 0.075472,loss is 0.000738
After 28 ,global_step is 29,w is -0.976931,learning_rate is 0.074717,loss is 0.000532
After 29 ,global_step is 30,w is -0.980378,learning_rate is 0.073970,loss is 0.000385
After 30 ,global_step is 31,w is -0.983281,learning_rate is 0.073230,loss is 0.000280
After 31 ,global_step is 32,w is -0.985730,learning_rate is 0.072498,loss is 0.000204
After 32 ,global_step is 33,w is -0.987799,learning_rate is 0.071773,loss is 0.000149
After 33 ,global_step is 34,w is -0.989550,learning_rate is 0.071055,loss is 0.000109
After 34 ,global_step is 35,w is -0.991035,learning_rate is 0.070345,loss is 0.000080
After 35 ,global_step is 36,w is -0.992297,learning_rate is 0.069641,loss is 0.000059
After 36 ,global_step is 37,w is -0.993369,learning_rate is 0.068945,loss is 0.000044
After 37 ,global_step is 38,w is -0.994284,learning_rate is 0.068255,loss is 0.000033
After 38 ,global_step is 39,w is -0.995064,learning_rate is 0.067573,loss is 0.000024
After 39 ,global_step is 40,w is -0.995731,learning_rate is 0.066897,loss is 0.000018

The results above show how w and learning_rate change over the steps.

TensorFlow study notes: loss functions

Loss function

The previous posts used the simple neuron model shown below:

(figure: neuron model without bias)

with the formula $$Y=\sum_{i}^n X_i W_i$$.

A later neuron model adds an activation function and a bias term:

(figure: neuron model with activation function and bias)

with the formula
$$
Y=f\left(\sum_{i}^n X_i W_i + b\right)
$$
where f is the activation function and b is the bias term.
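As a minimal TF 1.x sketch of that neuron formula (the shapes and the choice of relu here are my own illustration, not from the original post):

``` python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 2))   # inputs X_i
w = tf.Variable(tf.random_normal([2, 1]))         # weights W_i
b = tf.Variable(tf.zeros([1]))                    # bias term
y = tf.nn.relu(tf.matmul(x, w) + b)               # Y = f(XW + b) with f = relu
```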

The loss function (loss) measures the gap between the prediction y' and the known answer y.

Our optimization goal is to make the loss as small as possible.

Activation functions

Introducing an activation function keeps the model from being just the linear combination $$XW$$, making it more accurate and more expressive.

Commonly used activation functions (a small runnable sketch follows the list):

  1. relu (tf.nn.relu):
    $$
    f(x)=\max(x,0)= \begin{cases}
    0, & x \le 0 \\
    x, & x > 0
    \end{cases}
    $$

    Its graph:

(figure: relu curve)

  2. sigmoid (tf.nn.sigmoid):
    $$
    f(x)=\frac{1}{1+e^{-x}}
    $$

    Its graph:

(figure: sigmoid curve)

  3. tanh (tf.nn.tanh):
    $$
    f(x)=\frac{1-e^{-2x}}{1+e^{-2x}}
    $$

    Its graph:

(figure: tanh curve)
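The sketch mentioned above, evaluating the three activations on a few sample inputs (my own toy values):

``` python
import tensorflow as tf

x = tf.constant([-2.0, 0.0, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(x)))     # [0. 0. 2.]
    print(sess.run(tf.nn.sigmoid(x)))  # ~[0.119 0.5 0.881]
    print(sess.run(tf.nn.tanh(x)))     # ~[-0.964 0. 0.964]
```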

An example

Predict daily yogurt sales. x1 and x2 are the factors that influence daily sales. Before modeling we fabricate a data set X, Y: the label is y_ = x1 + x2 (the known answer; in the ideal case production equals sales) plus noise in the range -0.05 to +0.05, and we fit a function that predicts sales.

Following this model we generate random numbers and train; the code is as follows:

41
#coding:utf-8
import tensorflow as tf
import numpy as np

BATCH_SIZE = 8
seed = 23455


rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]


# Define the network input, parameters, output, and forward propagation

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)


# 2. Define the loss function and the backpropagation method
# The loss function is MSE, the backpropagation method is gradient descent

loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

# Create the session and train

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 32
        end = (i * BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})

        if i % 500 == 0:
            print("After %d training steps, w1 is:" % (i))
            print(sess.run(w1))
    print("final w1 is:\n", sess.run(w1))

The output:

After 17500 training steps, w1 is:
[[0.96476096]
[1.0295546 ]]
After 18000 training steps, w1 is:
[[0.9684917]
[1.0262802]]
After 18500 training steps, w1 is:
[[0.9718707]
[1.0233142]]
After 19000 training steps, w1 is:
[[0.974931 ]
[1.0206276]]
After 19500 training steps, w1 is:
[[0.9777026]
[1.0181949]]
final w1 is:
[[0.98019385]
[1.0159807 ]]

Both weights approach 1, consistent with the data-generating rule y = x1 + x2.

Custom loss function

When predicting product sales, predicting too much loses cost and predicting too little loses profit. If profit and cost are not equal, the MSE loss cannot maximize profit.

Define a custom loss $$\sum_{i}^n f(y\_, y)$$, where y is the prediction and y_ is the known answer:

$$
f(y\_, y) =
\begin{cases}
PROFIT \cdot (y\_-y), & y < y\_ \quad \text{(predicted too little: lost profit, PROFIT)} \\
COST \cdot (y-y\_), & y \ge y\_ \quad \text{(predicted too much: lost cost, COST)}
\end{cases}
$$

In TensorFlow this becomes: loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST*(y-y_), PROFIT*(y_-y)))

As in the example above, suppose the cost (COST) of a cup of yogurt is 1 yuan and the profit (PROFIT) is 9 yuan.
Under-predicting then loses 9 yuan of profit per unit, while over-predicting loses only 1 yuan of cost.

Under-prediction is the bigger loss, so we expect the fitted function to predict on the high side.

Replacing the loss function with our custom one, the code becomes:

#coding:utf-8
# Predicting too little loses 9 yuan of profit; predicting too much loses 1 yuan of cost.
import tensorflow as tf
import numpy as np

BATCH_SIZE = 8
seed = 23455
COST = 1
PROFIT = 9


rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]


# Define the network input, parameters, output, and forward propagation

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)


# 2. Define the loss function and the backpropagation method
# The loss function is the custom profit/cost loss, backpropagation is gradient descent

# loss_mse = tf.reduce_mean(tf.square(y_ - y))
loss_mse = tf.reduce_sum(tf.where(tf.greater(y, y_), COST * (y - y_), PROFIT * (y_ - y)))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

# Create the session and train

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 32
        end = (i * BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})

        if i % 500 == 0:
            print("After %d training steps, w1 is:" % (i))
            print(sess.run(w1))
    print("final w1 is:\n", sess.run(w1))


After 18500 training steps, w1 is:
[[1.0232253]
[1.0445153]]
After 19000 training steps, w1 is:
[[1.0171654]
[1.038825 ]]
After 19500 training steps, w1 is:
[[1.0208615]
[1.0454264]]
final w1 is:
[[1.020171 ]
[1.0425103]]

You can see that the predictions lean toward the high side.

Cross entropy

Cross entropy measures the distance between two probability distributions:

$$
H(y', y)=-\sum y' \cdot \log y
$$

For example, given the known answer y'=(1, 0), which prediction is closer to it, y1=(0.6, 0.4) or y2=(0.8, 0.2)? (The values below use the base-10 logarithm.)

$$
H_1((1,0),(0.6,0.4))=-(1\cdot\log 0.6+0\cdot\log 0.4)\approx0.222
$$

$$
H_2((1,0),(0.8,0.2))=-(1\cdot\log 0.8+0\cdot\log 0.2)\approx0.097
$$

So y2 is the more accurate prediction.
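A quick NumPy check of those two numbers (my own sketch; it uses the base-10 log that the values above imply, and clips y so log(0) never occurs):

``` python
import numpy as np

y_true = np.array([1.0, 0.0])
for y_pred in ([0.6, 0.4], [0.8, 0.2]):
    h = -np.sum(y_true * np.log10(np.clip(y_pred, 1e-12, 1.0)))
    print(h)  # ~0.222 for (0.6, 0.4), ~0.097 for (0.8, 0.2)
```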

We can use cross entropy to train the model more precisely:

ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))

where any y below 1e-12 is clipped to 1e-12 so that log(0) never occurs.

When the n outputs (y1, y2, ..., yn) of an n-class classifier are passed through the softmax() function, they satisfy the requirements of a probability distribution:

$$
P(X=x)\in[0,1] \quad\text{and}\quad \sum_x P(X=x)=1
$$

$$
softmax(y_i)=\frac{e^{y_i}}{\sum_{j=1}^n e^{y_j}}
$$

So ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1)) together with

cem = tf.reduce_mean(ce) can replace the hand-written cross entropy; cem is the distance between the current predictions and the correct answers.
