[Python-OpenCV] English Letter Recognition with KNN

Feature Set Analysis

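The dataset is letter-recognition.data. It contains 20000 records with comma-separated fields. A sample record is shown below: the first column is the letter label, and the remaining 16 columns are the features.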
T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
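
To make the layout concrete, the sample row above can be split by hand. This is a minimal sketch using only the standard library. The helper name `parse_row` is made up for illustration, and mapping 'A'..'Z' to 0..25 mirrors the converter used in the code later in this post.

```python
def parse_row(line):
    """Split one comma-separated record into (label_index, features)."""
    fields = line.strip().split(',')
    label = ord(fields[0]) - ord('A')          # 'A' -> 0, ..., 'Z' -> 25
    features = [float(v) for v in fields[1:]]  # the numeric attributes
    return label, features

label, features = parse_row('T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8')
print(label, len(features))  # prints: 19 16
```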

Learning Method

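1. Read in the data and strip the delimiters

2. Use the first column as the labels and the remaining columns as the training data

3. Initialize the classifier and train it on the training data

4. Verify the accuracy on the test data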
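
For intuition about what steps 3 and 4 do, here is a minimal brute-force k-nearest-neighbours sketch in plain Python. It is not OpenCV's implementation: the function names `knn_predict` and `accuracy`, the toy points, and the choice of squared Euclidean distance with a simple majority vote are all assumptions made for illustration.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Squared Euclidean distance from the query to every training sample.
    dists = [
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    ]
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k):
    """Percentage of test samples whose predicted label matches the true one."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return 100.0 * hits / len(test_y)

# Toy data (hypothetical): two well-separated clusters labelled 0 and 1.
train_x = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 8], [8, 9]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[1, 1], [8, 8]]
test_y = [0, 1]

print(knn_predict(train_x, train_y, [1, 1], k=3))
print(accuracy(train_x, train_y, test_x, test_y, k=3))
```

Training here only means storing the samples. All the work happens at query time, which is why prediction over a large test set is the slow part of KNN.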

Code

import cv2
import numpy as np

# Written for Python 3 and OpenCV 3+.
# OpenCV 2.4 used cv2.KNearest() and find_nearest() instead.

print('load data')
# Column 0 holds the letter; map 'A'..'Z' to 0..25 so the whole table is numeric.
data = np.loadtxt('letter-recognition.data', dtype='float32', delimiter=',',
                  converters={0: lambda ch: ord(ch) - ord('A')})

print('split as train,test')
# First half of the rows for training, second half for testing.
train, test = np.vsplit(data, 2)

print('train.shape:\t' + str(train.shape))
print('test.shape:\t' + str(test.shape))

print('split train as the response,trainData')
# Column 0 is the label (response); columns 1..16 are the features.
response, trainData = np.hsplit(train, [1])
print('response.shape:\t' + str(response.shape))
print('trainData.shape:\t' + str(trainData.shape))

print('split the test as response,trainData')
restest, testData = np.hsplit(test, [1])

print('Init the knn')
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, response)

print('test the knn')
# Classify every test row by a vote among its 5 nearest training rows.
ret, result, neighbours, dist = knn.findNearest(testData, k=5)

print('the rate:')
correct = np.count_nonzero(result == restest)
# Divide by the actual number of test rows rather than a hard-coded 10000.
accuracy = correct * 100.0 / result.size
print('accuracy is', accuracy, '%')

Results

load data
split as train,test
train.shape:	(10000, 17)
test.shape:	(10000, 17)
split train as the response,trainData
response.shape:	(10000, 1)
trainData.shape:	(10000, 16)
split the test as response,trainData
Init the knn
test the knn
the rate:
accuracy is 93.22 %
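
The classifier works with numeric labels (0 to 25), so reading individual predictions means mapping them back to letters. Below is a small sketch, assuming the predictions arrive as floats, as they do in the `float32` arrays used in the code. The helper `to_letter` and the sample values are hypothetical and are not output from the run shown here.

```python
def to_letter(index):
    """Convert a numeric class label (0..25, possibly a float) back to 'A'..'Z'."""
    return chr(int(round(index)) + ord('A'))

# Made-up predictions and ground truth, only to show the conversion.
predicted = [19.0, 0.0, 25.0, 8.0]
actual = [19.0, 0.0, 25.0, 11.0]

for p, a in zip(predicted, actual):
    marker = 'ok' if p == a else 'wrong'
    print(to_letter(p), to_letter(a), marker)
```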

Dataset

http://download.csdn.net/detail/licong_carp/8612383

