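I'm trying to generate every ordering (permutation) of the words in a string column, and then explode the DataFrame so that each ordering gets its own row.
Example DataFrame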
URL Keyword
0 http://www.amazon.com Amazon Lightning Sale
1 https://www.ebay.com Shop eBay Today
Expected output
URL Keyword
0 http://www.amazon.com Amazon Lightning Sale
1 http://www.amazon.com Amazon Sale Lightning
2 http://www.amazon.com Lightning Amazon Sale
3 http://www.amazon.com Sale Amazon Lightning
4 http://www.amazon.com Sale Lightning Amazon
5 http://www.amazon.com Lightning Sale Amazon
6 https://www.ebay.com Shop eBay Today
7 https://www.ebay.com Shop Today eBay
8 https://www.ebay.com eBay Shop Today
9 https://www.ebay.com eBay Today Shop
10 https://www.ebay.com Today eBay Shop
11 https://www.ebay.com Today Shop eBay
Minimal reproducible example
import pandas as pd
# initialize data of lists.
data = {'URL': ['http://www.amazon.com', 'https://www.ebay.com'],
'Keyword': ["Amazon Lightning Sale", "Shop eBay Today"]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
I tried the solution from "Pandas DataFrame Combinations and expand", but it is not quite what I need.
Answered 2022-05-01 14:13:15
Here is an alternative that does not use itertools:
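# Note: the three loop variables i, j, k mean this only works when every keyword has exactly three distinct words.
(df.assign(Keyword = df['Keyword'].str.split().map(lambda x: [[i,j,k] for i in x for j in x for k in x if len({i,j,k})==len(x)]))
.explode('Keyword')
.assign(Keyword = lambda x: x['Keyword'].str.join(' ')))
Output: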
URL Keyword
0 http://www.amazon.com Amazon Lightning Sale
0 http://www.amazon.com Amazon Sale Lightning
0 http://www.amazon.com Lightning Amazon Sale
0 http://www.amazon.com Lightning Sale Amazon
0 http://www.amazon.com Sale Amazon Lightning
0 http://www.amazon.com Sale Lightning Amazon
1 https://www.ebay.com Shop eBay Today
1 https://www.ebay.com Shop Today eBay
1 https://www.ebay.com eBay Shop Today
1 https://www.ebay.com eBay Today Shop
1 https://www.ebay.com Today Shop eBay
1 https://www.ebay.com Today eBay Shop
Answered 2022-05-01 11:41:31
# Create an ID column:
df['ID'] = range(df.shape[0])

URL Keyword ID
0 http://www.amazon.com Amazon Lightning Sale 0
1 https://www.ebay.com Shop eBay Today 1

import re
import itertools

def create_combinations(id, kw):
    # Split the keyword into words:
    words = re.split(r'\W+', kw)
    return pd.DataFrame({'ID': id, 'Combination': ' '.join(x)} for x in itertools.permutations(words))

# Create the combinations:
data = []
for id, kw in zip(df.ID, df.Keyword):
    data.append(create_combinations(id, kw))
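The snippet in this answer stops once the per-row frames have been collected in a list. One possible way to finish it, offered only as a sketch, is to concatenate the frames and merge them back onto the URLs through the helper ID column. The column name `Combination` and the final merge/rename step are assumptions made for illustration and are not taken from the answer.

```python
import itertools
import re

import pandas as pd

df = pd.DataFrame({
    'URL': ['http://www.amazon.com', 'https://www.ebay.com'],
    'Keyword': ['Amazon Lightning Sale', 'Shop eBay Today'],
})
df['ID'] = range(df.shape[0])


def create_combinations(id_, kw):
    # Split on runs of non-word characters and drop any empty pieces.
    words = [w for w in re.split(r'\W+', kw) if w]
    # One row per ordering of the words ('Combination' is an assumed name).
    return pd.DataFrame(
        [{'ID': id_, 'Combination': ' '.join(p)}
         for p in itertools.permutations(words)]
    )


frames = [create_combinations(i, kw) for i, kw in zip(df['ID'], df['Keyword'])]
combos = pd.concat(frames, ignore_index=True)

# Assumed final step: attach each ordering to its URL via the helper ID.
result = (
    df[['ID', 'URL']]
    .merge(combos, on='ID')
    .drop(columns='ID')
    .rename(columns={'Combination': 'Keyword'})
)
print(result)
```

The helper ID is only needed because the permutations are built in separate frames; the explode-based answers avoid that bookkeeping.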
Joost Döbken's answer is a bit more elegant.
Answered 2022-05-01 11:53:17
from itertools import permutations
df['Keyword'] = df['Keyword'].apply(lambda x: list(permutations(x.split())))
df.explode('Keyword', ignore_index=True)
First, applying itertools.permutations to the Keyword column produces, for each row, a list of every possible ordering of its words.
Next, pandas.DataFrame.explode turns each item of those lists into its own row.
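To illustrate what explode does to the index, here is a small sketch on a made-up two-row frame (the `demo` data is hypothetical and not part of the question). By default each new row keeps the index label of the row it came from, which is why the output of the earlier answer shows repeated 0s and 1s. Passing `ignore_index=True`, available in recent pandas versions, renumbers the rows instead.

```python
import pandas as pd

# Hypothetical toy data: one list-valued cell per row.
demo = pd.DataFrame({'URL': ['a', 'b'], 'Keyword': [['x', 'y'], ['z']]})

# Default behaviour: exploded rows keep the original index labels.
kept = demo.explode('Keyword').index.tolist()
print(kept)      # [0, 0, 1]

# ignore_index=True renumbers the exploded rows from 0.
renumbered = demo.explode('Keyword', ignore_index=True).index.tolist()
print(renumbered)  # [0, 1, 2]
```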
If you really want a full string rather than a tuple of keywords, you can replace the list(permutations(x.split())) part with a string join: [" ".join(t) for t in permutations(x.split())].
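Putting the pieces of this answer together, a minimal end-to-end sketch on the question's sample data might look like the following. It uses `assign` so that the original `df` is left unchanged, which is a stylistic choice and not something the answer prescribes. The rows for each URL come out in the order `itertools.permutations` yields them, which can differ slightly from the ordering listed in the question.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({
    'URL': ['http://www.amazon.com', 'https://www.ebay.com'],
    'Keyword': ['Amazon Lightning Sale', 'Shop eBay Today'],
})

out = (
    # Replace each keyword string with a list of all its word orderings.
    df.assign(Keyword=df['Keyword'].apply(
        lambda x: [' '.join(t) for t in permutations(x.split())]))
    # One row per ordering, with a fresh 0..n-1 index.
    .explode('Keyword', ignore_index=True)
)
print(out)
```

Because `permutations` treats positions as distinct, a keyword with a repeated word would produce duplicate rows; calling `drop_duplicates()` on the result is one way to remove them.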
https://stackoverflow.com/questions/72076327