Q: ValueError: qk and pk must have same shape - scipy.spatial.distance.jensenshannon

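Stack Overflow user

Asked 2020-05-01 10:30:27

Answers 1, Views 752, Followers 0, Votes 1

I am calling the jensen_shannon(query, matrix) function below to find the documents in a document matrix that are most similar to a query document.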

import numpy as np
from scipy.stats import entropy

def jensen_shannon(query, matrix):
    """
    This function implements a Jensen-Shannon similarity
    between the input query (an LDA topic distribution for a document)
    and the entire corpus of topic distributions.
    It returns an array of length M where M is the number of documents in the corpus
    """
    # let's keep the p, q notation above
    p = query[None,:].T # take transpose: (100,) becomes (100, 1)
    q = matrix.T # transpose matrix: (10804, 100) becomes (100, 10804)
    m = 0.5*(p + q)
    return np.sqrt(0.5*(entropy(p,m) + entropy(q,m)))

Shape of query: (100,)

Shape of matrix: (10804, 100)

Error traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-103-86cb68dd862d> in <module>
      1 # this is surprisingly fast
----> 2 most_sim_ids = get_most_similar_documents(new_doc_distribution,doc_topic_dist)

<ipython-input-102-c0fb95224e87> in get_most_similar_documents(query, matrix, k)
      6     print(query.shape)
      7     print(matrix.shape)
----> 8     sims = jensen_shannon(query,matrix) # list of jensen shannon distances
      9     return sims.argsort()[:k] # the top k positional index of the smallest Jensen Shannon distances

<ipython-input-74-6ffb0ec54e9a> in jensen_shannon(query, matrix)
     10     q = matrix.T # transpose matrix
     11     m = 0.5*(p + q)
---> 12     return np.sqrt(0.5*(entropy(p,m) + entropy(q,m)))

~/venv/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py in entropy(pk, qk, base, axis)
   2668         qk = asarray(qk)
   2669         if qk.shape != pk.shape:
-> 2670             raise ValueError("qk and pk must have same shape.")
   2671         qk = 1.0*qk / np.sum(qk, axis=axis, keepdims=True)
   2672         vec = rel_entr(pk, qk)

ValueError: qk and pk must have same shape.
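
The shapes behind this check can be reproduced with a small sketch. The array values below are placeholders (only the sizes come from the question), and the sketch uses NumPy alone, so it shows the shapes that reach entropy() rather than raising the error itself:

```python
import numpy as np

n_docs, n_topics = 10804, 100                         # sizes reported in the question
query = np.full(n_topics, 1.0 / n_topics)             # placeholder values
matrix = np.full((n_docs, n_topics), 1.0 / n_topics)  # placeholder values

p = query[None, :].T   # (100,) -> (1, 100) -> (100, 1)
q = matrix.T           # (10804, 100) -> (100, 10804)
m = 0.5 * (p + q)      # broadcasting expands p, so m has q's shape

print(p.shape, q.shape, m.shape)
# entropy(p, m) in the traceback compares pk.shape with qk.shape;
# here that is p.shape versus m.shape, which differ.
print(p.shape == m.shape)
```

So entropy(q, m) would pass the shape check, while entropy(p, m) does not, because p is still a single column.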

I tried passing an axis argument to scipy.spatial.distance.jensenshannon, but the function does not accept an axis parameter.
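
For reference, SciPy 1.7.0 and later do accept axis in jensenshannon. A minimal sketch, assuming such a version is installed and using made-up stand-in data (5 documents over 4 topics) in place of the real corpus:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical stand-in data: 5 documents over 4 topics, each row sums to 1.
rng = np.random.default_rng(0)
matrix = rng.dirichlet(np.ones(4), size=5)   # shape (n_docs, n_topics)
query = matrix[2]                            # shape (n_topics,)

# query[None, :] has shape (1, n_topics) and broadcasts against every row.
# axis=1 reduces over the topic axis, leaving one distance per document.
# Assumption: the installed SciPy is new enough for jensenshannon to take `axis`.
sims = jensenshannon(query[None, :], matrix, axis=1)
print(sims.shape)
```

On older releases, such as the one in the traceback, this call is not available and the shapes have to be matched by hand.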

Does anyone know what I am missing? Any pointers are much appreciated. Thanks.

FYI: I am trying this Kaggle code: https://www.kaggle.com/ktattan/lda-and-document-similarity/data

python

scipy

topic-modeling

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-12-11 20:34:16

Try this:

p = query[None,:].T + np.zeros([100, 10804])

10804 = number of documents
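
Put together, the suggestion might look like the sketch below. It takes the sizes from matrix.shape instead of hard-coding 100 and 10804, and it runs on made-up stand-in data (5 documents over 4 topics), so treat it as an illustration under those assumptions rather than the original notebook's code:

```python
import numpy as np
from scipy.stats import entropy


def jensen_shannon(query, matrix):
    """Jensen-Shannon distance between query and every row of matrix.

    query:  shape (n_topics,)
    matrix: shape (n_docs, n_topics)
    Returns an array of shape (n_docs,).
    """
    q = matrix.T                               # (n_topics, n_docs)
    # Adding zeros of q's shape expands the query column to the full shape,
    # so entropy() receives two arrays with identical shapes.
    p = query[None, :].T + np.zeros(q.shape)   # (n_topics, n_docs)
    m = 0.5 * (p + q)
    # With two arguments, entropy() computes the KL divergence along axis 0.
    return np.sqrt(0.5 * (entropy(p, m) + entropy(q, m)))


# Hypothetical stand-in data: 5 documents over 4 topics, each row sums to 1.
rng = np.random.default_rng(0)
doc_topic_dist = rng.dirichlet(np.ones(4), size=5)
new_doc_distribution = doc_topic_dist[2]

sims = jensen_shannon(new_doc_distribution, doc_topic_dist)
most_sim_ids = sims.argsort()[:3]   # indices of the 3 smallest distances
print(sims.shape, most_sim_ids)
```

The query here is a copy of document 2, so that document should come back as the closest match.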

Votes 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/61540751
