Code example
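import pandas as pd

# Load the iris training set. Its header row reads 120, 4, setosa, versicolor,
# virginica, so the integer class label (0, 1 or 2) is in the 'virginica' column.
train_data = pd.read_csv('./csv/iris_training.csv')
print(train_data.head())

# One-hot encode the label column; drop_first=False keeps one column per class.
# Note: pandas 2.0+ returns boolean dummies by default; pass dtype=int for 0/1.
dummy_field = 'virginica'
dummies = pd.get_dummies(
    train_data[dummy_field], prefix=dummy_field, drop_first=False)
print(dummies.head())

# Append the one-hot columns to the original frame.
train_data = pd.concat([train_data, dummies], axis=1)
print(train_data.head())

# Drop the original label column now that it has been encoded.
train_data = train_data.drop(dummy_field, axis=1)
print(train_data.head())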
Output
Original data
   120    4  setosa  versicolor  virginica
0  6.4  2.8     5.6         2.2          2
1  5.0  2.3     3.3         1.0          1
2  4.9  2.5     4.5         1.7          2
3  4.9  3.1     1.5         0.1          0
4  5.7  3.8     1.7         0.3          0
One-hot columns
   virginica_0  virginica_1  virginica_2
0            0            0            1
1            0            1            0
2            0            0            1
3            1            0            0
4            1            0            0
Data after concatenation
   120    4  setosa  versicolor  virginica  virginica_0  virginica_1  virginica_2
0  6.4  2.8     5.6         2.2          2            0            0            1
1  5.0  2.3     3.3         1.0          1            0            1            0
2  4.9  2.5     4.5         1.7          2            0            0            1
3  4.9  3.1     1.5         0.1          0            1            0            0
4  5.7  3.8     1.7         0.3          0            1            0            0
After dropping the original column
   120    4  setosa  versicolor  virginica_0  virginica_1  virginica_2
0  6.4  2.8     5.6         2.2            0            0            1
1  5.0  2.3     3.3         1.0            0            1            0
2  4.9  2.5     4.5         1.7            0            0            1
3  4.9  3.1     1.5         0.1            1            0            0
4  5.7  3.8     1.7         0.3            1            0            0
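
Self-contained variant
The listing above depends on ./csv/iris_training.csv being present. As a rough sketch for trying the same three steps (encode, concatenate, drop) without the file, the snippet below builds an in-memory DataFrame from the five rows shown in the output. The helper name one_hot_encode and the dtype=int argument are assumptions added for illustration; they are not part of the original listing.

```python
import pandas as pd

# Hypothetical stand-in for the first five rows of iris_training.csv,
# copied from the "Original data" output. The 'virginica' column holds
# the integer class label (0, 1 or 2).
train_data = pd.DataFrame({
    '120': [6.4, 5.0, 4.9, 4.9, 5.7],
    '4': [2.8, 2.3, 2.5, 3.1, 3.8],
    'setosa': [5.6, 3.3, 4.5, 1.5, 1.7],
    'versicolor': [2.2, 1.0, 1.7, 0.1, 0.3],
    'virginica': [2, 1, 2, 0, 0],
})


def one_hot_encode(df, field):
    """Return a new frame in which `field` is replaced by one-hot columns.

    Illustrative helper (not from the original listing).
    """
    # dtype=int requests 0/1 columns instead of the boolean default
    # used by recent pandas versions.
    dummies = pd.get_dummies(df[field], prefix=field, dtype=int)
    # Drop the source column and append the encoded columns in one step.
    return pd.concat([df.drop(columns=field), dummies], axis=1)


encoded = one_hot_encode(train_data, 'virginica')
print(encoded)
```

Wrapping the steps in a function leaves train_data untouched, so the same call can be repeated on another label column or another frame.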