怎么制作一个决策树分类器
1、加载模块。# -*- coding: utf-8 -*-import numpy as npimport scipy as spfrom sklearn import treefrom sklearn.metrics import precision_recall_curvefrom sklearn.metrics import classification_reportfrom sklearn.cross_validation import train_test_split个别模块被别的东西取缔了,python给了我们一个提示。
2、把txt文档里面的数据读取出来:data = []la水瑞侮瑜bels = []with open("稆糨孝汶;D:/HintSoft/Hint-W7/Desktop/data.txt") as ifile: for line in ifile: tokens = line.strip().split(' ') data.append([float(tk) for tk in tokens[:-1]]) labels.append(tokens[-1])x = np.array(data)labels = np.array(labels)print(x)print(labels)
3、把瘦用0代替,把胖用1代替:y = np.zeros(labels.shape)y[labels=='fat']=1
4、构造训练数据集和测试集:x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
5、构造一个未训练的树状分类器:f = tree.DecisionTreeClassifier(criterion='entropy')用训练集来训练这个分类器:f.fit(x_train, y_train)训练的过程,就是拟合。
6、把训练的树状分类器保存下来:with open("D:/HintSoft/Hint-W7/Desktop/.dot", 'w') as g: g = tree.export_graphviz(f, out_file=g)打开dot文件,可以看到这个决策树的各参数。
7、用测试集对f进行测试:answer = f.predict(x_test)print(x_test)print(answer)print(y_test)print(np.mean( answer == y_test))