新的Neo4j,但可以看到如此多的可能性,在图形数据库,特别是IT数据工作流和系统影响。但不确定正确的设计是否能达到最大的效率。
考虑一个接收文件、处理文件、将它们存储在数据库中并在各种报表中提供数据的系统。但是,根据文件的不同,数据可能在一个报表中,而不是在另一个报表中。
一个重要的用例是,如果缺少上游文件或处理这些文件的组件失败,则能够报告对下游报告的影响。
我已经提出了4种设计,其中3种似乎有效,但不确定哪一种是最好的。
希望能在这方面提供任何帮助或建议。
所用代码:
---------------------------------------------------------------------------
-- Design Experiments
---------------------------------------------------------------------------
// 1. Combination of the Workflows with shared nodes where they interact
with same Process or DataStore
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[r*]->(rp:Report) RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[r*]->(rp:Report) RETURN rp
// 2. Same node relationship design as #1, but assign a workflow property
to each node and relationship as a property array
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1", workflow: ["workflow1","workflow2"]})
CREATE (p2:Provider {name: "Provider 2", workflow: ["workflow3"]})
CREATE (f1:File {name: "File 1", workflow: ["workflow1"]})
CREATE (f2:File {name: "File 2", workflow: ["workflow2"]})
CREATE (f3:File {name: "File 3", workflow: ["workflow3"]})
CREATE (pp:PreProcess {name: "PreProcess", workflow: ["workflow1"]})
CREATE (p:Process {name: "Process", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (d:DataStore {name: "DataStore", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (rA:Report {name: "Report A", workflow: ["workflow1","workflow3"]})
CREATE (rB:Report {name: "Report B", workflow: ["workflow2"]})
CREATE (p1)-[:PROVIDES{workflow: ["workflow1"]}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: ["workflow2"]}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: ["workflow3"]}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: ["workflow3"]}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: ["workflow1","workflow2","workflow3"]}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow1","workflow3"]}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(rB)
// Show individual workflows
MATCH (p) WHERE filter(x in p.workflow WHERE x = "workflow1") RETURN p
MATCH (p) WHERE filter(x in p.workflow WHERE x = "workflow2") RETURN p
MATCH (p) WHERE filter(x in p.workflow WHERE x = "workflow3") RETURN p
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"}) WITH a.workflow AS workflows
MATCH (r:Report) WHERE filter(x in r.workflow WHERE x in workflows)
RETURN r
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"}) WITH a.workflow AS workflows
MATCH (r:Report) WHERE filter(x in r.workflow WHERE x in workflows)
RETURN r
// 3. Same node relationship design as #1, but create a relationship
with a workflow property for each workflow, resulting in multiple
relatinships between nodes.
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: "workflow1"}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow1"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow2"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow3"}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow3"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow2"}]->(rB)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp
// 4. Distinct set of nodes and relationships for each workflow, but all
with same node type so can still be matched
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 1"})
CREATE (p3:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp1:PreProcess {name: "PreProcess"})
CREATE (pc1:Process {name: "Process"})
CREATE (pc2:Process {name: "Process"})
CREATE (pc3:Process {name: "Process"})
CREATE (d1:DataStore {name: "DataStore"})
CREATE (d2:DataStore {name: "DataStore"})
CREATE (d3:DataStore {name: "DataStore"})
CREATE (rA1:Report {name: "Report A"})
CREATE (rB2:Report {name: "Report B"})
CREATE (rA3:Report {name: "Report A"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p2)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p3)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp1)
CREATE (pp1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pc1)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(pc2)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(pc3)
CREATE (pc1)-[:DELIVERS_TO{workflow: "workflow1"}]->(d1)
CREATE (pc2)-[:DELIVERS_TO{workflow: "workflow2"}]->(d2)
CREATE (pc3)-[:DELIVERS_TO{workflow: "workflow3"}]->(d3)
CREATE (d1)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA1)
CREATE (d2)-[:DELIVERS_TO{workflow: "workflow3"}]->(rB2)
CREATE (d3)-[:DELIVERS_TO{workflow: "workflow2"}]->(rA3)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j*]->(rp:Report) RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j*]->(rp:Report) RETURN rp
按照建议,已经扩展了设计1,以包括文件和报告之间的直接链接。
// 1a. Combination of the Workflows with shared nodes where they interact
with same Process or DataStore.
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)
CREATE (f1)-[:USED_BY{}]->(rA)
CREATE (f2)-[:USED_BY{}]->(rB)
CREATE (f3)-[:USED_BY{}]->(rA)
// Show impacted reports (and path) if Provider 1 is down
MATCH path = (:Provider{name:'Provider 1'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report
// Show impacted reports (and path) if Provider 2 is down
MATCH path = (:Provider{name:'Provider 2'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report
发布于 2018-04-07 21:09:48
您已经在这里做了一些彻底的探索,您已经找到了您的查询所针对的设计。然而,他们也要付出代价。
Design2根本不使用关系,所以解决方案看上去不太严谨。它还要求您确保相关节点上的工作流列表保持同步和最新。这似乎有一个较高的维护费用。
设计3有类似的成本,但现在属性在关系上,而且您还必须在整个模型中提供冗余关系,因此成本更高。
设计4需要过程中使用的每个步骤的冗余,其中每个子图都是从提供者到报表的单一路径。虽然这很容易理解和查询,但冗余节点和关系可能并不适合。
设计1很有趣,因为它提供了正确的答案,但只对某些questions...questions提供了有关路径中的处理器、预处理器和数据存储的影响的答案,当这些硬件和软件组件出现故障时会发生什么。
但是,它不适用于数据沿袭/依赖。还没。您可能需要考虑更改设计1,以便在数据依赖和管道过程中已经拥有的数据之间有单独的路径。
数据依赖可能是另一回事。如果您对此提出问题,那么您主要关注的是输入和输出、报表文件。在这种情况下,您可以考虑在相关文件和报表节点之间创建一个:DEPENDS_ON关系。
请考虑将其添加到设计1的创建脚本中:
match (f:File), (r:Report{name:'Report A'})
where f.name in ['File 1', 'File 3']
create (r)<-[:USED_BY]-(f)和
match (f:File), (r:Report{name:'Report B'})
where f.name in ['File 2']
create (r)<-[:USED_BY]-(f)对于有关数据血统的问题,您的查询只能使用相关的关系,在本例中是:PROVIDES和:USED_BY。
match path = (:Provider{name:'Provider 1'})-[:PROVIDES|USED_BY*]->(r:Report)
return path, r.name as report或者反过来说,报告的依据是什么?
match path = (p:Provider)-[:PROVIDES|USED_BY*]->(r:Report{name:'Report A')
return path, p.name as report如果您的模型发生了更改,从而对中间报告进行了建模(预处理和流程操作的输出),那么您可以从:File到:Report (而不是直接在:File和:Report之间)创建与链中的:USED_BY关系,以便在处理过程中看到依赖链。
https://stackoverflow.com/questions/49706575
复制相似问题