Decision Tree Algorithm Implementation and Project Application Practice

Reposted from: https://python222.java1234.com/article/1486

Main content:

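1. Decision Tree Algorithm Basics

1.1 Overview of Decision Trees

A decision tree is a basic method for classification and regression. It applies a sequence of decision rules to partition a dataset into progressively smaller subsets. A decision tree is made up of: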

• Root node: contains all of the data

• Internal nodes: split the data on a feature

• Leaf nodes: output the prediction
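The three node roles can be made concrete with a small data structure. The following is only a minimal sketch: the `Node` class, its field names, and the `predict_one` helper are hypothetical names chosen for illustration, and the "go left when the value is less than or equal to the threshold" convention is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical layout for illustration)."""
    feature_index: Optional[int] = None   # feature tested at an internal node
    threshold: Optional[float] = None     # go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[int] = None      # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.prediction is not None


def predict_one(node: Node, sample) -> int:
    """Walk from the root to a leaf, applying one decision rule per internal node."""
    while not node.is_leaf():
        if sample[node.feature_index] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# A hand-built tree: the root splits on feature 0 at 3.0, both children are leaves.
root = Node(feature_index=0, threshold=3.0,
            left=Node(prediction=0), right=Node(prediction=1))

print(predict_one(root, [1.2]))  # 0
print(predict_one(root, [4.1]))  # 1
```

Here the root is also the only internal node. A deeper tree would simply nest further `Node` objects in `left` and `right`.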

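1.2 Information Gain and Entropy

Entropy: a measure of the uncertainty in a system. It is 0 when every sample belongs to the same class and largest when the classes are equally frequent.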
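Written out as a formula, this is the standard Shannon entropy with a base-2 logarithm. The symbols are notation assumed here for illustration: $K$ is the number of classes and $p_k$ is the fraction of samples in class $k$, with the usual convention that a class with $p_k = 0$ contributes nothing.

```latex
H(y) = -\sum_{k=1}^{K} p_k \log_2 p_k,
\qquad
p_k = \frac{\lvert \{\, i : y_i = k \,\} \rvert}{\lvert y \rvert}
```

Three quick checks: two equally frequent classes give $H = -2 \cdot \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit, a single class gives $H = -1 \cdot \log_2 1 = 0$, and four equally frequent classes give $H = -4 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4} = 2$ bits.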

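import numpy as np

def entropy(y):
    """
    Compute the information entropy of a set of labels.

    Parameters:
        y - array of labels (non-negative integers, as required by np.bincount)
    Returns:
        entropy_value - the entropy, in bits
    """
    # Count the samples in each class
    class_counts = np.bincount(y)
    # Convert the counts to class probabilities
    probabilities = class_counts / len(y)
    # Keep only the non-zero probabilities, since log2(0) is undefined
    probabilities = probabilities[probabilities > 0]
    # Compute the entropy
    entropy_value = -np.sum(probabilities * np.log2(probabilities))
    return entropy_value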

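# Example: entropy of a balanced two-class label set
y_binary = np.array([0, 0, 0, 1, 1, 1])
print(f"Entropy of two-class data: {entropy(y_binary):.4f}")

# Example: entropy of a pure (single-class) label set
y_pure = np.array([0, 0, 0, 0])
print(f"Entropy of pure data: {entropy(y_pure):.4f}")

# Example: entropy of four equally frequent classes
y_uniform = np.array([0, 1, 2, 3])
print(f"Entropy of uniformly distributed data: {entropy(y_uniform):.4f}")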

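Information gain: the entropy of a node before a split minus the weighted entropy of its child nodes after the split.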
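For a binary split at a threshold $t$, this difference can be written as below. The notation is assumed here for illustration: $y_L$ holds the labels of the samples whose feature value is at most $t$, $y_R$ holds the rest, $n_L$ and $n_R$ are their sizes, and $n = n_L + n_R$.

```latex
IG(y, t) = H(y) - \left( \frac{n_L}{n}\, H(y_L) + \frac{n_R}{n}\, H(y_R) \right)
```

A split that separates the classes perfectly makes both child entropies 0, so the gain equals the parent entropy $H(y)$. A split that leaves the class proportions unchanged on both sides gives a gain of 0.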

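def information_gain(X_column, y, threshold):
    """
    Compute the information gain from splitting on a feature at a threshold.

    Parameters:
        X_column - feature column
        y - array of labels
        threshold - split threshold
    Returns:
        ig - the information gain
    """
    # Entropy of the parent node
    parent_entropy = entropy(y)
    # Boolean masks selecting the left and right subsets
    left_indices = X_column <= threshold
    right_indices = X_column > threshold
    # Count the samples on each side. len() of a mask always equals len(y),
    # so the emptiness check must use the number of True entries instead.
    n_left, n_right = np.sum(left_indices), np.sum(right_indices)
    if n_left == 0 or n_right == 0:
        return 0
    n_total = len(y)
    # Entropy of the left and right subsets
    e_left, e_right = entropy(y[left_indices]), entropy(y[right_indices])
    # Weighted entropy of the child nodes
    child_entropy = (n_left / n_total) * e_left + (n_right / n_total) * e_right
    # Information gain
    ig = parent_entropy - child_entropy
    return ig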

# Example: information gain of a split on one feature
X_feature = np.array([1.2, 2.3, 3.5, 4.1, 5.2])
y_labels = np.array([0, 0, 1, 1, 1])
threshold = 3.0
ig_value = information_gain(X_feature, y_labels, threshold)
print(f"Information gain of splitting at threshold {threshold}: {ig_value:.4f}")
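The example above evaluates a single, hand-picked threshold. A tree builder normally compares many candidates and keeps the one with the largest gain. The sketch below shows one common way to do that and is not taken from the source: `best_threshold` is a hypothetical helper, and using the midpoints between consecutive sorted unique feature values as candidates is an assumption. `entropy` and `information_gain` are restated compactly so that the block runs on its own.

```python
import numpy as np


def entropy(y):
    """Shannon entropy (base 2) of a non-negative integer label array."""
    counts = np.bincount(y)
    p = counts[counts > 0] / len(y)
    return float(-np.sum(p * np.log2(p)))


def information_gain(x, y, threshold):
    """Entropy reduction from splitting at x <= threshold versus x > threshold."""
    left = x <= threshold
    n_left, n_right = int(left.sum()), int((~left).sum())
    if n_left == 0 or n_right == 0:
        return 0.0  # a split that leaves one side empty gains nothing
    child = (n_left * entropy(y[left]) + n_right * entropy(y[~left])) / len(y)
    return entropy(y) - child


def best_threshold(x, y):
    """Hypothetical helper: try midpoints between sorted unique values, keep the best."""
    values = np.unique(x)                       # sorted unique feature values
    candidates = (values[:-1] + values[1:]) / 2.0
    best_t, best_ig = None, 0.0
    for t in candidates:
        ig = information_gain(x, y, t)
        if ig > best_ig:
            best_t, best_ig = float(t), ig
    return best_t, best_ig


x = np.array([1.2, 2.3, 3.5, 4.1, 5.2])
y = np.array([0, 0, 1, 1, 1])
t, ig = best_threshold(x, y)
print(f"best threshold: {t:.2f}, information gain: {ig:.4f}")
# best threshold: 2.90, information gain: 0.9710
```

On this data the midpoint between 2.3 and 3.5 separates the two classes completely, so the gain equals the parent entropy. If the feature has only one distinct value there are no candidates, and the helper returns `(None, 0.0)`.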