Handwritten Digit Recognition with the MNIST Dataset

Posted on 2019-04-24

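A quick reflection: after studying for so long, I finally have something hands-on to show. I originally planned to write this post on 2019-04-14, but I worried it would slow down my studies, so I kept putting it off. Today (2019-04-24) I am finally writing it up.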

Goal

The goal of this post is to build a program that recognizes handwritten digits. To start, we prepare the MNIST dataset and a picture of handwritten digits:
[Image: the handwritten input picture]

For testing, some interference lines were added:

[Image: the handwritten picture with interference lines]

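As we know, each MNIST sample is a single 28x28-pixel image of a white digit on a black background. To use a model trained on MNIST samples, our handwritten picture has to be processed into the same form.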

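The processing can be broken into these steps:

1. Binarization
2. Denoising
3. Cropping
4. Scaling to a 28x28 image
5. Training on the samples
6. Recognizing the result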

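Binarization and denoising

Binarization can be done with OpenCV's image-processing functions. It turns the picture into one that contains only two colors, black and white:

[Image: the binarized picture]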

We can see that a few interference lines and noise points are still there. A mean blur removes them:

[Image: the picture after mean blur]

Binarizing the blurred picture again gives the following result:

[Image: the blurred picture after a second binarization]

The corresponding code is:

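import cv2 as cv
import numpy as np

def threshold(image):
    # cv.imread returns pixels in BGR order, so convert from BGR to grayscale
    gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
    # Global thresholding on the single-channel image; the triangle method picks the threshold
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY | cv.THRESH_TRIANGLE)
    return binary

def average_value(image):
    # Mean blur. The original post does not show this helper;
    # cv.blur and the 5x5 kernel size are assumptions.
    return cv.blur(image, (5, 5))

img = cv.imread('./pic/01.png')
img = threshold(img)      # first binarization
cv.imwrite("02.png", img)
img = average_value(img)  # mean blur to remove interference lines and noise
cv.imwrite("03.png", img)
# Binarize again after blurring
ret, img = cv.threshold(img, 0, 255, cv.THRESH_BINARY | cv.THRESH_TRIANGLE)
cv.imwrite("04.png", img)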
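To see why blurring and then thresholding again removes thin lines but keeps thick strokes, here is a minimal NumPy-only sketch on a toy picture. The helper names `mean_blur` and `binarize`, the 3x3 kernel, and the fixed threshold of 128 are assumptions for illustration; the code above uses OpenCV with the triangle method instead.

```python
import numpy as np

def mean_blur(img, k=3):
    """Box filter: each pixel becomes the mean of its k x k neighbourhood."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def binarize(img, thresh=128):
    """Pixels at or above thresh become white (255), the rest black (0)."""
    return np.where(img >= thresh, 255, 0).astype(np.uint8)

# Toy picture: white background, a 5-pixel-wide stroke and a 1-pixel-wide line
img = np.full((12, 12), 255, dtype=np.uint8)
img[2:10, 3:8] = 0   # thick "digit" stroke
img[:, 10] = 0       # thin interference line

cleaned = binarize(mean_blur(img))
print(cleaned[5, 5], cleaned[5, 10])  # stroke centre stays black, the line turns white
```

A 1-pixel line contributes only one dark column out of three to each 3x3 mean, so the blurred value stays above the threshold, while the interior of a thick stroke stays fully dark.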

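Next, crop the picture. Sum the pixels column by column and then row by row, find the columns (or rows) whose mean is not 255, and take the smallest and largest such index as the start and end of each crop.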

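def corp_img(image):
    h, w = image.shape[:2]
    xArr = []
    imgArr = []
    resultImgArr = []
    # Pass 1: scan columns left to right and cut out each run of non-white columns
    for x in range(w):
        xSum = image[:, x].sum()
        if xSum / h != 255:
            xArr.append(x)
        elif len(xArr) > 0:
            # max(xArr) + 1 because the slice end is exclusive
            imgArr.append(image[0:h, min(xArr):max(xArr) + 1])
            xArr.clear()
    if len(xArr) > 0:  # a run that touches the right edge
        imgArr.append(image[0:h, min(xArr):max(xArr) + 1])

    # Pass 2: inside each column strip, cut out each run of non-white rows
    for img in imgArr:
        yArr = []
        yw = img.shape[-1]
        for y in range(h):
            ySum = img[y].sum()
            if ySum / yw != 255:
                yArr.append(y)
            elif len(yArr) > 0:
                resultImgArr.append(img[min(yArr):max(yArr) + 1, 0:yw])
                yArr.clear()
        if len(yArr) > 0:  # a run that touches the bottom edge
            resultImgArr.append(img[min(yArr):max(yArr) + 1, 0:yw])
    return resultImgArr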
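The column scan can also be written without an explicit loop. The sketch below is one possible vectorized version, assuming a binary picture with a white (255) background; `column_runs` is a hypothetical helper name, not part of the original code.

```python
import numpy as np

def column_runs(image):
    """Return (start, end) pairs, end exclusive, for each run of columns that contain ink."""
    has_ink = (image != 255).any(axis=0)
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], has_ink, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

img = np.full((5, 10), 255, dtype=np.uint8)
img[1:4, 1:3] = 0    # first "digit": columns 1..2
img[2:5, 6:10] = 0   # second "digit": columns 6..9, touching the right edge

runs = column_runs(img)
crops = [img[:, start:end] for start, end in runs]
print(runs)
```

Because the ends are exclusive, they can be used directly as slice bounds, and a run that reaches the last column is still closed by the padding.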

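The cropped pictures, once saved, look like this:

[Image: cropped digit pictures]

The MNIST sample data looks like this:

[Image: MNIST sample data]

Side by side with the sample data:

[Image: cropped pictures compared with MNIST sample data]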

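Comparing the two, the cropped pictures have no border, are black on white, and have a different resolution from the MNIST samples. To use a model trained on MNIST, the pictures need further padding, scaling, and color inversion: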

# Color inversion
def revers_color(image):
    # Vectorized equivalent of the per-pixel loop; returns a new array
    return 255 - image

corpImgArr = corp_img(img)
for i in range(len(corpImgArr)):
    cv.imwrite(str(i) + "_real.png", corpImgArr[i])
    # Pad 15 white pixels on every side
    wrap = cv.copyMakeBorder(corpImgArr[i], 15, 15, 15, 15, cv.BORDER_CONSTANT, value=255)
    # Scale to 28x28 pixels
    resImg = cv.resize(wrap, (28, 28), interpolation=cv.INTER_AREA)
    # Invert to white-on-black. Invert once only: a second call would undo it.
    reversImg = revers_color(resImg)
    cv.imwrite(str(i) + ".png", reversImg)
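The padding and inversion steps are easy to check on a small array. This is a NumPy-only sketch that leaves out the `cv.resize` step; `pad_and_invert` is a hypothetical helper name, and the 15-pixel border simply mirrors the value used above.

```python
import numpy as np

def pad_and_invert(digit, border=15):
    """Add a white border, then flip to white-on-black like the MNIST samples."""
    padded = np.pad(digit, border, mode="constant", constant_values=255)
    return 255 - padded  # 255 becomes 0 and 0 becomes 255

digit = np.full((20, 10), 255, dtype=np.uint8)
digit[2:18, 4:6] = 0   # black stroke on a white background

out = pad_and_invert(digit)
print(out.shape, out[0, 0], out[17, 19])
```

After this step the border and background are black (0) and the stroke is white (255), which is the polarity the MNIST samples use.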

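After processing, the pictures look as follows and largely match the format of the MNIST samples:

[Image: processed 28x28 digit pictures]

Training the model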

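Principle: each picture to be trained on or recognized is a 28x28 pixel matrix with 784 elements in total, and the result we need is a 1x10 array. So we flatten the picture into a 1x784 row vector x. From the forward-propagation neuron model $y=xW+b$, it follows that W is a 784x10 matrix and b is a 1x10 matrix. (The code below inserts a 500-node hidden layer, so it uses two weight matrices, 784x500 and 500x10.) The training steps are: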

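1. Randomly initialize the 784x10 weight matrix W and create the 1x10 bias b
2. Apply the neuron relation $y=xW+b$ with the relu activation function to compute an initial, inaccurate y
3. Denote the correct label for the input as y_ and compute the loss with cross entropy
4. Use regularization to prevent the model from overfitting
5. Use an exponentially decaying learning rate so that the updates shrink as training progresses
6. Train with backpropagation to reduce the loss
7. Apply a moving average to all parameters to obtain more stable parameter values
8. Read the MNIST data, feed it into the network, and start training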
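The shapes in the steps above can be checked with a small NumPy sketch of the forward pass and the cross-entropy loss. It follows the two-layer layout of the code further down (784 to 500 to 10); the 0.01 weight scale, the random input, and the label 3 are assumptions chosen only to keep the numbers small, and none of this is the TensorFlow code itself.

```python
import numpy as np

INPUT_NODE, LAYER1_NODE, OUTPUT_NODE = 784, 500, 10

def forward(x, w1, b1, w2, b2):
    hidden = np.maximum(x @ w1 + b1, 0.0)  # relu on the hidden layer
    return hidden @ w2 + b2                # raw scores (logits), shape (batch, 10)

def softmax_cross_entropy(logits, labels):
    # Subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

rng = np.random.default_rng(0)
x = rng.random((1, INPUT_NODE))                              # one flattened picture
w1 = 0.01 * rng.standard_normal((INPUT_NODE, LAYER1_NODE))
b1 = np.full(LAYER1_NODE, 0.01)
w2 = 0.01 * rng.standard_normal((LAYER1_NODE, OUTPUT_NODE))
b2 = np.full(OUTPUT_NODE, 0.01)

logits = forward(x, w1, b1, w2, b2)
loss = softmax_cross_entropy(logits, np.array([3]))
print(logits.shape, loss)
```

With all-zero logits every class gets probability 1/10, so the loss equals ln(10), roughly 2.3; an untrained network typically starts near that value.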

The neural network code is as follows:

Forward propagation code: minst_forward.py

# coding:utf-8

import tensorflow as tf
INPUT_NODE=784
OUTPUT_NODE=10
LAYER1_NODE=500
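# Define the network parameters and the forward pass (TensorFlow 1.x API)
def get_weight(shape,regularizer):
    w=tf.Variable(tf.random_normal(shape),dtype=tf.float32)
    # Add the L2 penalty of w to the "losses" collection when a regularizer scale is given
    if regularizer is not None:
        tf.add_to_collection("losses",tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b=tf.Variable(tf.constant(0.01,shape=shape))
    return b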


def forward(x,regularizer):
    w1=get_weight([INPUT_NODE,LAYER1_NODE],regularizer)
    b1=get_bias([LAYER1_NODE])
    y1=tf.nn.relu(tf.matmul(x,w1)+b1)

    w2=get_weight([LAYER1_NODE,OUTPUT_NODE],regularizer)
    b2=get_bias([OUTPUT_NODE])
    y=tf.matmul(y1,w2)+b2
    return y
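For reference, the L2 term that `get_weight` adds to the "losses" collection is, as far as I know, the scale times half the sum of squared weights. A small NumPy sketch of that quantity follows; `l2_penalty` is a hypothetical name and the 2x2 matrix is made-up data.

```python
import numpy as np

def l2_penalty(w, scale):
    """scale * sum(w ** 2) / 2, the usual L2 weight penalty."""
    return scale * np.sum(np.square(w)) / 2.0

w = np.array([[1.0, -2.0],
              [3.0, 0.0]])
penalty = l2_penalty(w, 0.0001)   # 0.0001 * (1 + 4 + 9) / 2
print(penalty)
```

Large weights increase this term, so minimizing the total loss pushes the weights toward smaller values.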

Backpropagation code: minst_backward.py

# coding:utf-8

import tensorflow as tf
import minst_forward
import os

from tensorflow.examples.tutorials.mnist import input_data

BATCH_SIZE = 200
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
REGULARIZER = 0.0001
STEPS = 100000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH = "./model/"
MODEL_NAME = "minst_model"

def backword(minst):
    x = tf.placeholder(tf.float32, (None, minst_forward.INPUT_NODE))
    y_ = tf.placeholder(tf.float32, (None, minst_forward.OUTPUT_NODE))
    y = minst_forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)

    # Define the loss as cross entropy
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)

    # Add the regularization terms to prevent overfitting
    loss = cem + tf.add_n(tf.get_collection("losses"))


    # Exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                               global_step,
                                               minst.train.num_examples / BATCH_SIZE,
                                               LEARNING_RATE_DECAY,
                                               staircase=True)
    # Backpropagation: gradient descent that minimizes loss
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Keep a moving average of all trainable parameters
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    emp_op = ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step, emp_op]):
        train_op = tf.no_op(name="train")


    saver=tf.train.Saver()

    with tf.Session() as sess:
        init_op=tf.global_variables_initializer()
        sess.run(init_op)

        ckpt=tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess,ckpt.model_checkpoint_path)

        for i in range(STEPS):
            xs,ys =minst.train.next_batch(BATCH_SIZE)
            _,loss_value,step=sess.run([train_op,loss,global_step],feed_dict={x:xs,y_:ys})  # feed one batch into the network

            if i%1000==0:
                print("After %d training step(s),loss on training batch is %g."%(step,loss_value))
                saver.save(sess,os.path.join(MODEL_SAVE_PATH,MODEL_NAME),global_step=global_step)

def main():
    minst=input_data.read_data_sets("./data",one_hot=True)  # read the MNIST data
    backword(minst)

if __name__=="__main__":
    main()
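The two schedules used in the training code can be written out in plain Python to make the numbers concrete. This is a sketch of the formulas as I understand them from the TensorFlow 1.x documentation, not the library code. The value 275 for `decay_steps` assumes 55,000 training examples divided by the batch size of 200, and the function names are hypothetical.

```python
import math

def exponential_decay(base, step, decay_steps, decay_rate, staircase=True):
    """Learning rate after `step` updates: base * decay_rate ** (step / decay_steps)."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # drop the rate once per epoch, not every step
    return base * decay_rate ** exponent

def ema_update(shadow, value, decay, num_updates):
    """One moving-average step; early on a smaller effective decay lets the average catch up."""
    effective = min(decay, (1 + num_updates) / (10 + num_updates))
    return effective * shadow + (1 - effective) * value

print(exponential_decay(0.1, 0, 275, 0.99))       # start of training
print(exponential_decay(0.1, 100000, 275, 0.99))  # after 100,000 steps
print(ema_update(1.0, 2.0, 0.99, 0))              # first update uses decay 0.1
```

With `staircase=True` the learning rate stays constant within an epoch and drops by a factor of 0.99 at each epoch boundary.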

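Running minst_backward.py on the MNIST data for 100,000 training steps produces the following model:

[Image: the saved model]

The code below uses the model to get the most likely prediction: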

import imageUtils
import tensorflow as tf
import minst_forward
import minst_backward
import numpy as np
import cv2 as cv


def restore_model(imgArr):
    with tf.Graph().as_default() as tg:
        x = tf.placeholder(tf.float32, [None, minst_forward.INPUT_NODE])
        y = minst_forward.forward(x, None)
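        preValue = tf.argmax(y, 1)  # index of the largest output, i.e. the most likely digit

        # Restore the moving-average (shadow) values saved during training;
        # MOVING_AVERAGE_DECAY controls how quickly the averages track the parameters
        variable_averages = tf.train.ExponentialMovingAverage(minst_backward.MOVING_AVERAGE_DECAY)
        variable_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variable_to_restore)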

        with tf.Session() as sess:
            ckpt = tf.train.get_checkpoint_state(minst_backward.MODEL_SAVE_PATH)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
                preValue = sess.run(preValue, feed_dict={x: imgArr})
                return preValue
            else:
                print("No checkpoint file found")
                return -1


if __name__ == "__main__":
    # imageUtils is not shown in this post; getReadyImage is assumed to run the
    # preprocessing above and return a list of 28x28 digit images
    imgArr = imageUtils.getReadyImage("./pic/01.png")
    result = []
    for img in imgArr:
        try:
            im_arr = np.array(img)
            nm_arr = im_arr.reshape([1, 784])             # flatten to a 1x784 row
            nm_arr = nm_arr.astype(np.float32)
            img_ready = np.multiply(nm_arr, 1.0 / 255.0)  # scale pixels to [0, 1]
            preValue = restore_model(img_ready)
            result.append(str(preValue[0]))
        except Exception as ee:
            print(ee)
    print("".join(result))
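The input preparation and the final string assembly can be tried without a trained model. In this sketch `to_input_row` and `digits_to_string` are hypothetical helpers, and the two score rows are made-up numbers standing in for the network output.

```python
import numpy as np

def to_input_row(img28):
    """Flatten a 28x28 uint8 image to a 1x784 float32 row scaled to [0, 1]."""
    arr = np.asarray(img28, dtype=np.float32).reshape(1, 784)
    return arr / 255.0

def digits_to_string(scores_per_image):
    """Pick the highest-scoring class for each image and join the digits."""
    return "".join(str(int(np.argmax(scores))) for scores in scores_per_image)

img = np.zeros((28, 28), dtype=np.uint8)
img[10:18, 13:15] = 255   # a white stroke on a black background
row = to_input_row(img)

fake_scores = [
    [0.1, 0.2, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # highest score at index 2
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.0],  # highest score at index 7
]
print(row.shape, digits_to_string(fake_scores))
```

Scaling to [0, 1] matters because the MNIST loader used for training already provides pixel values in that range.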

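The recognition result:

[Image: recognition result]

Now we feed in an ordinary handwritten picture:

[Image: an ordinary handwritten picture]

Output:

[Image: recognition output]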

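Most of the digits are recognized correctly, but some are still misread. These errors likely come from differences between the trained model and my handwritten input, and from too little training.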