
Q: FeatureTools - How to add 2 columns together?

Stack Overflow user

Asked 2020-07-12 04:47:19

Answers: 1 | Views: 274 | Followers: 0 | Votes: 1

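I'm stuck. Using Featuretools, all I want to do is create a new column that adds two columns from my dataset together, producing a "stacked" kind of feature, and do this for every pair of columns in the dataset.

My code looks like this: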
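For reference, the operation described above (one new column per pair of numeric columns) can be sketched in plain pandas with `itertools.combinations`, independent of Featuretools. This is a minimal sketch, not the Featuretools approach: the helper name `add_pairwise_sums` and the toy data are hypothetical, and only numeric columns are paired, so string columns such as `Ticker` would be skipped.

```python
from itertools import combinations

import pandas as pd


def add_pairwise_sums(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with one "a + b" column per numeric pair.

    Plain-pandas sketch, not Featuretools.
    """
    numeric_cols = df.select_dtypes(include="number").columns
    new_cols = {
        f"{a} + {b}": df[a] + df[b]
        for a, b in combinations(numeric_cols, 2)  # each unordered pair exactly once
    }
    # Build all new columns in one frame and append them to the originals
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Hypothetical toy data for illustration
toy = pd.DataFrame({"open": [1.0, 2.0], "high": [1.5, 2.5], "low": [0.5, 1.5]})
result = add_pairwise_sums(toy)
print(list(result.columns))
# ['open', 'high', 'low', 'open + high', 'open + low', 'high + low']
```

Building the new columns in a dict and concatenating once avoids inserting thousands of columns one at a time, which pandas handles poorly.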

import warnings

import featuretools as ft
import pandas as pd

# Define the function
def feature_engineering_dataset(df):

    es = ft.EntitySet(id='stockdata')

    # Make the "Date" index an actual column, because defining it as the index below
    # throws a "can't find Date in index" error for some reason.
    df = df.reset_index()

    # Save some columns not used in Featuretools to concat back later
    dates = df['Date']
    tickers = df['Ticker']
    dailychange = df['DailyChange']
    classes = df['class']

    dataframe = df.drop(['Date', 'Ticker', 'DailyChange', 'class'], axis=1)

    # Define the entity. "Date" was dropped above, so Featuretools won't find it and
    # creates a new integer index instead; Date is re-set as the index later.
    es.entity_from_dataframe(entity_id='data', dataframe=dataframe, index='Date')

    # Silence noisy warnings
    warnings.filterwarnings("ignore", category=RuntimeWarning)
    warnings.filterwarnings("once", category=ImportWarning)

    # Run deep feature synthesis
    feature_matrix, feature_defs = ft.dfs(
        n_jobs=-2,
        entityset=es,
        target_entity='data',
        chunk_size=0.015,
        max_depth=2,
        verbose=True,
        agg_primitives=['sum'],
        trans_primitives=[],
    )

    # Re-add the columns set aside earlier, since Featuretools never saw them
    df = pd.concat([dates, tickers, feature_matrix, dailychange, classes], axis=1)

    df = df.set_index(['Date'])

    # Return our new dataset!
    return df

# Now run that defined function
df = feature_engineering_dataset(df)

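I don't know exactly what's going on here, but I set a depth of 2, so my understanding was that for every pair of columns in the dataset it would create a new column that adds the two together?

My original dataframe has 3101 columns. When I run this, it reports Built 3098 features, and the final df has 3098 columns after concatenation. That isn't right: it should contain all of my original features plus the engineered ones.

How can I achieve what I want? The examples on the Featuretools site and in the API docs are very confusing and deal with a lot of outdated cases, such as the "time_since_last" transform primitive and other things that don't seem to apply here. Thanks!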

python-3.x

featuretools

1 answer

Stack Overflow user

Accepted answer

Answered 2020-07-15 20:42:22

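Thanks for the question. You can use the transform primitive add_numeric to create new columns that add two columns together. I'll walk through a quick example using this data.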

id                time      open      high       low     close
 0 2019-07-10 07:00:00  1.053362  1.053587  1.053147  1.053442
 1 2019-07-10 08:00:00  1.053457  1.054057  1.053457  1.053987
 2 2019-07-10 09:00:00  1.053977  1.054192  1.053697  1.053917
 3 2019-07-10 10:00:00  1.053902  1.053907  1.053522  1.053557
 4 2019-07-10 11:00:00  1.053567  1.053627  1.053327  1.053397

First, we create an entity set for the data.

import featuretools as ft

# df is the sample DataFrame shown above
es = ft.EntitySet('stockdata')

es.entity_from_dataframe(
    entity_id='data',
    dataframe=df,
    index='id',
    time_index='time',
)

Now we run DFS with the add_numeric transform primitive to add the numeric columns together.

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity='data',
    trans_primitives=['add_numeric'],
)

The new engineered features are then returned alongside the original features.

feature_matrix

        open      high       low     close  close + high  low + open  high + low  close + open  high + open  close + low
id
0   1.053362  1.053587  1.053147  1.053442      2.107029    2.106509    2.106734      2.106804     2.106949     2.106589
1   1.053457  1.054057  1.053457  1.053987      2.108044    2.106914    2.107514      2.107444     2.107514     2.107444
2   1.053977  1.054192  1.053697  1.053917      2.108109    2.107674    2.107889      2.107894     2.108169     2.107614
3   1.053902  1.053907  1.053522  1.053557      2.107464    2.107424    2.107429      2.107459     2.107809     2.107079
4   1.053567  1.053627  1.053327  1.053397      2.107024    2.106894    2.106954      2.106964     2.107194     2.106724
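The output above has four numeric input columns and six new sum columns, which suggests add_numeric builds one feature per unordered pair of numeric columns, i.e. n(n-1)/2 features for n columns. Assuming that holds in general, a quick check of the arithmetic shows how fast this grows for a dataset the size of the one in the question (3098 columns); the helper name `n_pairwise_features` is hypothetical.

```python
from math import comb


def n_pairwise_features(n_numeric_columns: int) -> int:
    # One feature per unordered pair of columns: n * (n - 1) / 2
    return comb(n_numeric_columns, 2)


print(n_pairwise_features(4))     # 6, matching the six "x + y" columns above
print(n_pairwise_features(3098))  # 4797253
```

At roughly 4.8 million candidate features, it is probably worth applying add_numeric to a chosen subset of columns rather than to the whole dataset.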

You can see a list of all built-in primitives by calling ft.list_primitives().

Votes: 3

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/62857188
