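I'm using scikit-learn to train a classification model. My training data contains both discrete and continuous features, and I want to do feature selection by maximum mutual information (MMI). Given a feature matrix x and labels y, where the first three features are discrete, I can get the MMI values like this: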
mutual_info_classif(x, y, discrete_features=[0, 1, 2])

Now I want to use the same mutual-information selection inside a pipeline. I'd like to do something like this:
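SelectKBest(score_func=mutual_info_classif).fit(x, y)

But there is no way to pass the discrete-feature mask to SelectKBest. Is there some syntax I'm overlooking, or do I have to write my own wrapper around the score function?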
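To make the setup concrete, here is a small sketch on a hypothetical toy dataset matching that description: three discrete columns followed by one continuous column. The data, the label rule, and the random_state argument (fixed so the estimate is reproducible) are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical toy data: columns 0-2 are discrete, column 3 is continuous
rng = np.random.RandomState(0)
n = 500
x = np.column_stack([
    rng.randint(0, 3, size=n),  # discrete, drives the label
    rng.randint(0, 2, size=n),  # discrete, pure noise
    rng.randint(0, 4, size=n),  # discrete, pure noise
    rng.normal(size=n),         # continuous, drives the label through its sign
])
y = (x[:, 0] + (x[:, 3] > 0)).astype(int)

# Direct call: the mask tells the estimator which columns are discrete
scores = mutual_info_classif(x, y, discrete_features=[0, 1, 2], random_state=0)
print(scores)  # one mutual-information estimate (in nats) per column
```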
Posted on 2017-04-27 10:36:03
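Unfortunately, I couldn't find this functionality in SelectKBest. But we can easily extend SelectKBest with a custom subclass that overrides the fit() method, which is where the score function gets called.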
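One way to confirm that SelectKBest forwards nothing beyond X and y is to give it a score function that records how it is invoked. This is a sketch: recording_score_func and the toy data are hypothetical, and the function simply delegates to mutual_info_classif after logging the number of positional arguments and the keyword names it received.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

calls = []  # each entry: (number of positional args, sorted keyword names)


def recording_score_func(*args, **kwargs):
    # Hypothetical helper: log the call shape, then delegate unchanged
    calls.append((len(args), sorted(kwargs)))
    return mutual_info_classif(*args, **kwargs)


rng = np.random.RandomState(0)
X = np.column_stack([rng.randint(0, 3, size=100), rng.normal(size=100)])
y = rng.randint(0, 2, size=100)

SelectKBest(recording_score_func, k=1).fit(X, y)
print(calls)  # shows that only X and y arrive, with no keyword arguments
```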
Here is the current fit() method of SelectKBest (excerpted from the source on GitHub):
# No provision for extra parameters here
def fit(self, X, y):
    X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
    ....
    ....
    # Here only X and y are passed to the scoring function
    score_func_ret = self.score_func(X, y)
    ....
    ....
    self.scores_ = np.asarray(self.scores_)
    return self

Now we define a new class, SelectKBestCustom, with a modified fit(). I copied everything from the source above and changed only two lines (marked with comments):
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.utils import check_X_y

class SelectKBestCustom(SelectKBest):
    # Changed here: fit() accepts a discrete_features argument
    def fit(self, X, y, discrete_features='auto'):
        X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
        if not callable(self.score_func):
            raise TypeError("The score function should be a callable, %s (%s) "
                            "was passed."
                            % (self.score_func, type(self.score_func)))
        # _check_params is a private SelectKBest helper that validates k
        self._check_params(X, y)
        # Changed here also: forward the mask by keyword
        # (discrete_features is keyword-only in recent scikit-learn versions)
        score_func_ret = self.score_func(X, y, discrete_features=discrete_features)
        if isinstance(score_func_ret, (list, tuple)):
            self.scores_, self.pvalues_ = score_func_ret
            self.pvalues_ = np.asarray(self.pvalues_)
        else:
            self.scores_ = score_func_ret
            self.pvalues_ = None
        self.scores_ = np.asarray(self.scores_)
        return self

This can then be called simply as:
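clf = SelectKBestCustom(mutual_info_classif, k=2)
clf.fit(X, y, discrete_features=[0, 1, 2])

Edit: the solution above also works in a pipeline, because a different discrete_features value can be supplied each time fit() is called. With a Pipeline, pass it as a fit parameter prefixed with the step name, e.g. pipe.fit(X, y, select__discrete_features=[0, 1, 2]) for a step named "select" (pipe and select are illustrative names).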
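The pipeline claim relies on Pipeline routing fit parameters named step__param to that step's fit(). Below is a minimal sketch under stated assumptions: the step name "select", the toy data, and LogisticRegression as the final estimator are illustrative, and the class is a compact variant of SelectKBestCustom (not the answer's exact code) that binds the mask with functools.partial and then delegates to SelectKBest.fit, so it does not depend on private helpers such as _check_params.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class SelectKBestCustom(SelectKBest):
    """Compact variant (assumption): forwards discrete_features to score_func."""

    def fit(self, X, y, discrete_features="auto"):
        original = self.score_func
        # Bind the mask for this fit only ...
        self.score_func = partial(original, discrete_features=discrete_features)
        try:
            super().fit(X, y)
        finally:
            # ... then restore the user's callable for get_params()/clone()
            self.score_func = original
        return self


# Hypothetical toy data: columns 0-2 discrete, column 3 continuous
rng = np.random.RandomState(0)
n = 500
X = np.column_stack([
    rng.randint(0, 3, size=n),
    rng.randint(0, 2, size=n),
    rng.randint(0, 4, size=n),
    rng.normal(size=n),
])
y = (X[:, 0] + (X[:, 3] > 0)).astype(int)

pipe = Pipeline([
    ("select", SelectKBestCustom(partial(mutual_info_classif, random_state=0), k=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# "select__discrete_features" is routed to the "select" step's fit()
pipe.fit(X, y, select__discrete_features=[0, 1, 2])
print(pipe.named_steps["select"].get_support())
```

Restoring self.score_func in the finally block means the fitted selector still reports the score function the user passed in, which keeps cloning and grid search well behaved.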
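Another solution (less preferable): if you only need mutual_info_classif with SelectKBest temporarily (just to analyze the results), you can also write a custom function that calls mutual_info_classif internally with a hard-coded discrete_features. Roughly like this: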
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def mutual_info_classif_custom(X, y):
    # To change discrete_features, you have to redefine the function,
    # because once the function is handed to SelectKBest it can't be changed
    discrete_features = [0, 1, 2]
    return mutual_info_classif(X, y, discrete_features=discrete_features)

Usage of the above function:
selector = SelectKBest(mutual_info_classif_custom).fit(X, y)

Posted on 2018-05-09 22:24:39
You can also use functools.partial, as follows:
from functools import partial
from sklearn.feature_selection import SelectKBest, mutual_info_classif

discrete_mutual_info_classif = partial(mutual_info_classif, discrete_features=[0, 1, 2])
SelectKBest(score_func=discrete_mutual_info_classif).fit(x, y)

https://stackoverflow.com/questions/43643278
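Because partial only pre-fills arguments, the result is an ordinary two-argument callable that SelectKBest accepts inside a Pipeline without any subclassing. The sketch below checks that the selector's scores match a direct mutual_info_classif call and that the fitted pipeline survives pickling. The toy data, the random_state=0 argument (added so both estimates are identical), and the LogisticRegression step are assumptions for illustration.

```python
import pickle
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data: columns 0-2 discrete, column 3 continuous
rng = np.random.RandomState(0)
n = 500
x = np.column_stack([
    rng.randint(0, 3, size=n),
    rng.randint(0, 2, size=n),
    rng.randint(0, 4, size=n),
    rng.normal(size=n),
])
y = (x[:, 0] + (x[:, 3] > 0)).astype(int)

discrete_mutual_info_classif = partial(
    mutual_info_classif, discrete_features=[0, 1, 2], random_state=0
)
pipe = Pipeline([
    ("select", SelectKBest(score_func=discrete_mutual_info_classif, k=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(x, y)

# The selector scored the data exactly as a direct call with the same arguments
direct = mutual_info_classif(x, y, discrete_features=[0, 1, 2], random_state=0)
scores_match = np.allclose(pipe.named_steps["select"].scores_, direct)

# A partial of a module-level function pickles, so the fitted pipeline can be
# saved (a function defined inside another function cannot be pickled)
restored = pickle.loads(pickle.dumps(pipe))
print(scores_match, restored.named_steps["select"].get_support())
```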