Generating One-Hot Encodings

Created: 2024-06-21
Updated: 2025-01-01

Code example

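import pandas as pd

# Load the training data; the first line of the CSV is used as the header.
train_data = pd.read_csv('./csv/iris_training.csv')
print(train_data.head())

# Column to encode. In this file the class label column is named 'virginica'.
dummy_field = 'virginica'
# dtype=int keeps the 0/1 output shown below; pandas 2.0+ returns booleans by default.
dummies = pd.get_dummies(
    train_data[dummy_field], prefix=dummy_field, drop_first=False, dtype=int)
print(dummies.head())

# Append the one-hot columns to the original frame.
train_data = pd.concat([train_data, dummies], axis=1)
print(train_data.head())

# Drop the original column now that it has been encoded.
train_data = train_data.drop(dummy_field, axis=1)
print(train_data.head())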

Output

Original data

   120    4  setosa  versicolor  virginica
0  6.4  2.8     5.6         2.2          2
1  5.0  2.3     3.3         1.0          1
2  4.9  2.5     4.5         1.7          2
3  4.9  3.1     1.5         0.1          0
4  5.7  3.8     1.7         0.3          0
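
The column names `120`, `4`, `setosa`, `versicolor`, and `virginica` come straight from the first line of the CSV. In this file that line appears to hold dataset metadata (row count, feature count, class names) rather than real column names, which is why the label column ends up being called `virginica`. The sketch below shows one way to read the same layout with descriptive names. The names `sepal_length` through `species` are assumptions based on the usual Iris feature order, and the inline CSV text is a stand-in for the actual file.

```python
import io

import pandas as pd

# Inline stand-in for the first rows of iris_training.csv
# (values copied from the "Original data" output).
csv_text = """120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
5.7,3.8,1.7,0.3,0
"""

# Hypothetical column names (not from the original file).
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

# header=0 consumes the metadata line; names= supplies the replacement names.
named_data = pd.read_csv(io.StringIO(csv_text), header=0, names=columns)
print(named_data.head())
```

With names like these, the field to encode would be `species` instead of `virginica`, and the generated columns would be `species_0`, `species_1`, and `species_2`.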

One-hot columns

   virginica_0  virginica_1  virginica_2
0            0            0            1
1            0            1            0
2            0            0            1
3            1            0            0
4            1            0            0
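
The example passes `drop_first=False`, so every class gets its own column. Setting `drop_first=True` drops the column for the first level, which some linear models prefer because the remaining columns already determine it. The sketch below compares the two on the five label values from the rows above; the `labels` series is an inline stand-in, and `dtype=int` is passed so the result is 0/1 regardless of pandas version.

```python
import pandas as pd

# Label values from the first five rows, used here as stand-in data.
labels = pd.Series([2, 1, 2, 0, 0], name='virginica')

# One column per class.
full = pd.get_dummies(labels, prefix='virginica', drop_first=False, dtype=int)

# Same encoding with the first level (class 0) dropped.
reduced = pd.get_dummies(labels, prefix='virginica', drop_first=True, dtype=int)

print(full.columns.tolist())     # ['virginica_0', 'virginica_1', 'virginica_2']
print(reduced.columns.tolist())  # ['virginica_1', 'virginica_2']
```

In the reduced form, a row with zeros in both remaining columns stands for class 0.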

Merged data

   120    4  setosa  versicolor  virginica  virginica_0  virginica_1  virginica_2
0  6.4  2.8     5.6         2.2          2            0            0            1
1  5.0  2.3     3.3         1.0          1            0            1            0
2  4.9  2.5     4.5         1.7          2            0            0            1
3  4.9  3.1     1.5         0.1          0            1            0            0
4  5.7  3.8     1.7         0.3          0            1            0            0

After dropping the original column

   120    4  setosa  versicolor  virginica_0  virginica_1  virginica_2
0  6.4  2.8     5.6         2.2            0            0            1
1  5.0  2.3     3.3         1.0            0            1            0
2  4.9  2.5     4.5         1.7            0            0            1
3  4.9  3.1     1.5         0.1            1            0            0
4  5.7  3.8     1.7         0.3            1            0            0
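
The three steps (encode, concatenate, drop) can also be collapsed into a single `pd.get_dummies` call by passing the whole frame together with `columns=`. The sketch below is an alternative to the approach in this post, not a replacement for it; the `sample` frame is an inline stand-in built from the five rows shown above.

```python
import pandas as pd

# Inline stand-in for the first five rows of the training data.
sample = pd.DataFrame({
    '120': [6.4, 5.0, 4.9, 4.9, 5.7],
    '4': [2.8, 2.3, 2.5, 3.1, 3.8],
    'setosa': [5.6, 3.3, 4.5, 1.5, 1.7],
    'versicolor': [2.2, 1.0, 1.7, 0.1, 0.3],
    'virginica': [2, 1, 2, 0, 0],
})

# Three-step version, mirroring the example in this post.
dummies = pd.get_dummies(sample['virginica'], prefix='virginica', dtype=int)
manual = pd.concat([sample, dummies], axis=1).drop('virginica', axis=1)

# One-call version: encodes the listed columns, removes the originals,
# and appends the new columns after the untouched ones.
one_call = pd.get_dummies(sample, columns=['virginica'], dtype=int)

print(one_call.head())
```

The one-call form is shorter, while the three-step form makes each intermediate frame available for inspection, as the printed outputs in this post do.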

About

Notes on development experience from a programmer in a small third-tier city.