
The data needs to be split from the SOID column into Circ, Language, and Words, as shown in the image above. When I try the following logic:
SELECT SOID,
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 ]*)(,\\s|$)', 1, 1, 'e') AS "Circuit",
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language\\(s\\):\\s([a-zA-Z0-9, ]+)(,\\s|$)', 1, 1, 'e') AS "Language",
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Words:\\s([a-zA-Z0-9 ]*)(,\\s|$)', 1, 1, 'e') AS "Words"
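FROM XYZ;
most of the data is processed correctly, but the logic above fails to capture some values, which are highlighted in yellow in the image. Where English is expected the result is null, and where Biotechnology... is expected the result is empty, as shown in the image. Your input would be appreciated.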
Posted on 2021-08-02 12:40:27
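The problem appears to be related to handling the "(s)" part. The query below makes it optional in the Language pattern and also adds square brackets to the Circuit character class, so that values such as "Biotechnology Newsline [National]" match: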
with XYZ as (
select 'Attachments: 1, Circuit: North America, Language: English, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 500' as SALES_ORDER_ITEM_DESCRIPTION
union all
select 'Attachments: 1, Circuit: Biotechnology Newsline [National], Language(s): English, Words: 600' as SALES_ORDER_ITEM_DESCRIPTION
)
SELECT
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language[()s]*:\\s([a-zA-Z0-9\\, ]+)(,\\s|$)', 1, 1, 'e') AS "Language"
FROM XYZ;
+-----------------------------------+-----------------+
| Circuit                           | Language        |
+-----------------------------------+-----------------+
| North America                     | English         |
| North America                     | English,Spanish |
| Biotechnology Newsline [National] | English         |
+-----------------------------------+-----------------+
https://stackoverflow.com/questions/68621405
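The answer's query leaves out the Words column and handles "(s)" with a character class, `[()s]*`, which would also accept strings such as "Languages:" or "Language)(:". A slightly stricter variant might look like the sketch below. It assumes the dialect is Snowflake, which the 'e' parameter and the doubled backslashes suggest but the post does not state, and it reuses the answer's made-up sample rows. Because "(s)" becomes an optional capture group of its own, the Language value moves to group 2, which is requested through the sixth argument of regexp_substr.

```sql
-- Sketch only: same illustrative sample rows as in the answer above.
WITH XYZ AS (
    SELECT 'Attachments: 1, Circuit: North America, Language: English, Words: 400' AS SALES_ORDER_ITEM_DESCRIPTION
    UNION ALL
    SELECT 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 500'
    UNION ALL
    SELECT 'Attachments: 1, Circuit: Biotechnology Newsline [National], Language(s): English, Words: 600'
)
SELECT
    -- Square brackets are part of the character class, so "[National]" is accepted.
    regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
    -- "(s)" is optional capture group 1, so the value is read from group 2.
    regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language(\\(s\\))?:\\s([a-zA-Z0-9, ]+)(,\\s|$)', 1, 1, 'e', 2) AS "Language",
    -- Pattern unchanged from the question; the value is followed by end of string.
    regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Words:\\s([a-zA-Z0-9 ]*)(,\\s|$)', 1, 1, 'e') AS "Words"
FROM XYZ;
```

If Circuit values can contain other punctuation, such as hyphens or ampersands, those characters would need to be added to the character class in the same way.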