I have the following dataset:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id str8 drug1 str3(drug2 drug3)
"pat" "thiazide" "BB" "CCB"
"ann" "thiazide" "ace" ""
"mary" "ace" "" ""
"john" "ace" "" ""
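end

I want to create a separate row for each drug for each person. reshape is definitely not what I want: I've been experimenting with expand and think that is the solution, but there are a couple of small things I can't get right. I think I need to expand and then drop duplicates.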
Step 1:
This is the code I'm using to get what I want. It works fine except for pat: his third drug isn't copied into his third row.
expand 3
by id, sort: generate drug = cond(_n == 1,drug1, drug2, drug3)
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id str8 drug1 str3(drug2 drug3) str8 drug
"ann" "thiazide" "ace" "" "thiazide"
"ann" "thiazide" "ace" "" "ace"
"ann" "thiazide" "ace" "" "ace"
"john" "ace" "" "" "ace"
"john" "ace" "" "" ""
"john" "ace" "" "" ""
"mary" "ace" "" "" "ace"
"mary" "ace" "" "" ""
"mary" "ace" "" "" ""
"pat" "thiazide" "BB" "CCB" "thiazide"
"pat" "thiazide" "BB" "CCB" "BB"
"pat" "thiazide" "BB" "CCB" "BB"
end

If anyone could show me how to fix this, that would be great.
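A minimal sketch of one way to fix Step 1, offered as an illustration rather than as the thread's answer. Stata's cond(x, a, b, c) returns a when x is true, b when x is false, and c only when x is missing. Because _n == 1 is never missing, drug3 can never be chosen, so every row after the first gets drug2. Nesting a second cond() sends drug3 to the third row:

```stata
* Toy data from the question
clear
input str4 id str8 drug1 str3(drug2 drug3)
"pat" "thiazide" "BB" "CCB"
"ann" "thiazide" "ace" ""
"mary" "ace" "" ""
"john" "ace" "" ""
end

* Three copies of every row
expand 3

* Row 1 of each id takes drug1, row 2 takes drug2, row 3 takes drug3.
* The copies within an id are identical, so their order after sorting does not matter.
bysort id: generate drug = cond(_n == 1, drug1, cond(_n == 2, drug2, drug3))

list id drug, sepby(id)
```

With this change ann no longer gets a second ace row, so for this toy example the remaining cleanup is just dropping the rows where drug is empty.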
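Step 2: For the second step (assuming pat's rows were correct), I want to drop duplicates so that each person is left with the correct number of rows, depending on how many different drugs they take. For example, none of pat's rows should be duplicates, so I want to keep all of them. But ann has one duplicate row that I need to drop.
This is what I used: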
bys id drug: gen dup2=cond(_N==1,0,_n)
drop if dup2>1

This works, but I still have extra rows for mary and john. I deal with those using:
drop if drug==""

Is this the most efficient / least error-prone way to do it?
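For Step 2, a sketch under the assumption that a repeated drug within the same id is always an artifact of expand (true for this toy example, but no longer true once day enters the data, where the key would need at least id and day). The built-in duplicates drop replaces the hand-rolled dup2 counter, and missing() is the same test as drug == "" for a string variable. The data are the Step 1 output listed above, so pat's CCB is still absent:

```stata
* Step 1 output from the question (pat's CCB still missing)
clear
input str4 id str8 drug1 str3(drug2 drug3) str8 drug
"ann" "thiazide" "ace" "" "thiazide"
"ann" "thiazide" "ace" "" "ace"
"ann" "thiazide" "ace" "" "ace"
"john" "ace" "" "" "ace"
"john" "ace" "" "" ""
"john" "ace" "" "" ""
"mary" "ace" "" "" "ace"
"mary" "ace" "" "" ""
"mary" "ace" "" "" ""
"pat" "thiazide" "BB" "CCB" "thiazide"
"pat" "thiazide" "BB" "CCB" "BB"
"pat" "thiazide" "BB" "CCB" "BB"
end

* Keep one observation per id-drug pair; force is required when a varlist is given
duplicates drop id drug, force

* Remove the rows that received an empty drug
drop if missing(drug)

list id drug, sepby(id)
```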
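Amendment: it turns out my toy dataset was too simple to reflect my real data. My actual data are already long, which is why reshape won't work here. I'm happy to be corrected, but I think expand may be the way forward. Except that now, when I try to expand the more complex data, I can't work out how to make the loop produce the dataset I need (essentially, one observation per drug per person). Here is an example of what I have: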
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"ann" 14 "thiazide" "ace" ""
"ann" 70 "thiazide" "ace" ""
"ann" 1 "CCB" "" ""
"ann" 35 "thiazide" "ace" ""
"ann " 30 "CCB" "" ""
"john" 1 "ace" "" ""
"john" 30 "CCB" "" ""
"john" 150 "ace" "" ""
"john" 60 "ace" "" ""
"john" 60 "CCB" "" ""
"john" 30 "ace" "" ""
"john" 1 "CCB" "" ""
"mary" 30 "ace" "" ""
"mary" 1 "ace" "" ""
"mary" 115 "thiazide" "" ""
"mary" 60 "ace" "" ""
"mary" 90 "ace" "" ""
"mary" 120 "ace" "" ""
"pat" 30 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 1 "thiazide" "BB" "CCB"
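end

After running:

expand 3

this is an example of what I want, but I don't know how to write the code to get this result. I tried variations of Nick Cox's loop below, but couldn't get it right.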
clear
input str4 id int day str8 drug1 str3(drug2 drug3) str8 drug
"ann" 1 "CCB" "" "" "CCB"
"ann" 1 "CCB" "" "" ""
"ann" 1 "CCB" "" "" ""
"ann" 14 "thiazide" "ace" "" "thiazide"
"ann" 14 "thiazide" "ace" "" "ace"
"ann" 14 "thiazide" "ace" "" ""
"ann" 35 "thiazide" "ace" "" "thiazide"
"ann" 35 "thiazide" "ace" "" "ace"
"ann" 35 "thiazide" "ace" "" ""
"ann" 70 "thiazide" "ace" "" "thiazide"
"ann" 70 "thiazide" "ace" "" "ace"
"ann" 70 "thiazide" "ace" "" ""
"ann " 30 "CCB" "" "" "CCB"
"ann " 30 "CCB" "" "" ""
"ann " 30 "CCB" "" "" ""
"john" 1 "CCB" "" "" "CCB"
"john" 1 "CCB" "" "" ""
"john" 1 "CCB" "" "" ""
"john" 1 "ace" "" "" "ace"
"john" 1 "ace" "" "" ""
"john" 1 "ace" "" "" ""
"john" 30 "CCB" "" "" "CCB"
"john" 30 "CCB" "" "" ""
"john" 30 "CCB" "" "" ""
"john" 30 "ace" "" "" "ace"
"john" 30 "ace" "" "" ""
"john" 30 "ace" "" "" ""
"john" 60 "CCB" "" "" "CCB"
"john" 60 "CCB" "" "" ""
"john" 60 "CCB" "" "" ""
"john" 60 "ace" "" "" "ace"
"john" 60 "ace" "" "" ""
"john" 60 "ace" "" "" ""
"john" 150 "ace" "" "" "ace"
"john" 150 "ace" "" "" ""
"john" 150 "ace" "" "" ""
"mary" 1 "ace" "" "" "ace"
"mary" 1 "ace" "" "" ""
"mary" 1 "ace" "" "" ""
"mary" 30 "ace" "" "" "ace"
"mary" 30 "ace" "" "" ""
"mary" 30 "ace" "" "" ""
"mary" 60 "ace" "" "" "ace"
"mary" 60 "ace" "" "" ""
"mary" 60 "ace" "" "" ""
"mary" 90 "ace" "" "" "ace"
"mary" 90 "ace" "" "" ""
"mary" 90 "ace" "" "" ""
"mary" 115 "thiazide" "" "" "thiazide"
"mary" 115 "thiazide" "" "" ""
"mary" 115 "thiazide" "" "" ""
"mary" 120 "ace" "" "" "ace"
"mary" 120 "ace" "" "" ""
"mary" 120 "ace" "" "" ""
"pat" 1 "ace" "" "" "ace"
"pat" 1 "ace" "" "" ""
"pat" 1 "ace" "" "" ""
"pat" 1 "thiazide" "BB" "CCB" "thiazide"
"pat" 1 "thiazide" "BB" "CCB" "BB"
"pat" 1 "thiazide" "BB" "CCB" "CCB"
"pat" 30 "ace" "" "" "ace"
"pat" 30 "ace" "" "" ""
"pat" 30 "ace" "" "" ""
"pat" 30 "thiazide" "BB" "CCB" "thiazide"
"pat" 30 "thiazide" "BB" "CCB" "BB"
"pat" 30 "thiazide" "BB" "CCB" "CCB"
end

At that point I can drop the observations with missing values and clean up the dataset to get the following:
drop if missing(drug)
drop drug?
clear
input str4 id int day str8 drug
"ann" 1 "CCB"
"ann" 14 "thiazide"
"ann" 14 "ace"
"ann" 35 "thiazide"
"ann" 35 "ace"
"ann" 70 "thiazide"
"ann" 70 "ace"
"ann " 30 "CCB"
"john" 1 "CCB"
"john" 1 "ace"
"john" 30 "CCB"
"john" 30 "ace"
"john" 60 "CCB"
"john" 60 "ace"
"john" 150 "ace"
"mary" 1 "ace"
"mary" 30 "ace"
"mary" 60 "ace"
"mary" 90 "ace"
"mary" 115 "thiazide"
"mary" 120 "ace"
"pat" 1 "ace"
"pat" 1 "thiazide"
"pat" 1 "BB"
"pat" 1 "CCB"
"pat" 30 "ace"
"pat" 30 "thiazide"
"pat" 30 "BB"
"pat" 30 "CCB"
end

Posted 2016-11-14 19:43:29

I'm puzzled by the dismissal of reshape without any argument or evidence. reshape gets you there directly, apart from one line of cleanup.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id str8 drug1 str3(drug2 drug3)
"pat" "thiazide" "BB" "CCB"
"ann" "thiazide" "ace" ""
"mary" "ace" "" ""
"john" "ace" "" ""
end
reshape long drug, i(id) j(seq)
drop if missing(drug)
list, sepby(id)
     +-----------------------+
     |   id   seq       drug |
     |-----------------------|
  1. |  ann     1   thiazide |
  2. |  ann     2        ace |
     |-----------------------|
  3. | john     1        ace |
     |-----------------------|
  4. | mary     1        ace |
     |-----------------------|
  5. |  pat     1   thiazide |
  6. |  pat     2         BB |
  7. |  pat     3        CCB |
     +-----------------------+

EDIT:
Your idea of starting with expand can easily be made to work. Under the hood, reshape does something similar.
clear
input str4 id str8 drug1 str3(drug2 drug3)
"pat" "thiazide" "BB" "CCB"
"ann" "thiazide" "ace" ""
"mary" "ace" "" ""
"john" "ace" "" ""
end
expand 3
sort id
gen drug = ""
quietly forval j = 1/3 {
by id: replace drug = drug`j' if _n == `j'
}
drop if missing(drug)
drop drug?
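list, sepby(id)

EDIT 2

The extra complications are just that, complications; they don't imply a different method. You need more faith, and to see that reshape is more flexible than you think! See, for example, the FAQ here, as well as the help and manual entries.
For simplicity, I'll assume that "ann " is just a typo for "ann". Then, for the same person, we have not only different days but also, for some people, more than one row on the same day. That means spelling out the identifiers more fully; in effect, we need an extra variable. The principle, discussed in the FAQ, is that sometimes a new identifier variable is needed to spell out an implicit order, even an arbitrary one. Going "longer" than long is possible, and that too is a standard idea.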
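The identifier point can be checked directly. A sketch using only pat's four rows from the data below: id and day together fail isid, which is why reshape long needs the extra SEQ variable, and adding an arbitrary counter within id and day makes the identifiers complete:

```stata
* pat's four rows from the data below: two on day 1 and two on day 30
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"pat" 30 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 1 "thiazide" "BB" "CCB"
end

* id and day do not uniquely identify observations, so isid exits with an error
capture isid id day
assert _rc != 0

* An arbitrary counter within id and day completes the identifier
bysort id day: generate SEQ = _n
isid id day SEQ

* reshape long now accepts i(); empty drug slots are dropped afterwards
reshape long drug, i(id day SEQ) j(seq)
drop if missing(drug)
list, sepby(day)
```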
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"ann" 14 "thiazide" "ace" ""
"ann" 70 "thiazide" "ace" ""
"ann" 1 "CCB" "" ""
"ann" 35 "thiazide" "ace" ""
"ann " 30 "CCB" "" ""
"john" 1 "ace" "" ""
"john" 30 "CCB" "" ""
"john" 150 "ace" "" ""
"john" 60 "ace" "" ""
"john" 60 "CCB" "" ""
"john" 30 "ace" "" ""
"john" 1 "CCB" "" ""
"mary" 30 "ace" "" ""
"mary" 1 "ace" "" ""
"mary" 115 "thiazide" "" ""
"mary" 60 "ace" "" ""
"mary" 90 "ace" "" ""
"mary" 120 "ace" "" ""
"pat" 30 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 1 "thiazide" "BB" "CCB"
end
replace id = trim(id)
bysort id day : gen SEQ = _n
reshape long drug, i(id day SEQ) j(seq)
drop if missing(drug)
list, sepby(id)
     +-----------------------------------+
     |   id   day   SEQ   seq       drug |
     |-----------------------------------|
  1. |  ann     1     1     1        CCB |
  2. |  ann    14     1     1   thiazide |
  3. |  ann    14     1     2        ace |
  4. |  ann    30     1     1        CCB |
  5. |  ann    35     1     1   thiazide |
  6. |  ann    35     1     2        ace |
  7. |  ann    70     1     1   thiazide |
  8. |  ann    70     1     2        ace |
     |-----------------------------------|
  9. | john     1     1     1        ace |
 10. | john     1     2     1        CCB |
 11. | john    30     1     1        ace |
 12. | john    30     2     1        CCB |
 13. | john    60     1     1        ace |
 14. | john    60     2     1        CCB |
 15. | john   150     1     1        ace |
     |-----------------------------------|
 16. | mary     1     1     1        ace |
 17. | mary    30     1     1        ace |
 18. | mary    60     1     1        ace |
 19. | mary    90     1     1        ace |
 20. | mary   115     1     1   thiazide |
 21. | mary   120     1     1        ace |
     |-----------------------------------|
 22. |  pat     1     1     1        ace |
 23. |  pat     1     2     1   thiazide |
 24. |  pat     1     2     2         BB |
 25. |  pat     1     2     3        CCB |
 26. |  pat    30     1     1        ace |
 27. |  pat    30     2     1   thiazide |
 28. |  pat    30     2     2         BB |
 29. |  pat    30     2     3        CCB |
     +-----------------------------------+

Posted 2016-11-15 16:24:25
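This is my attempt with the more complex data. It seems to work, but I'm happy to be corrected. Or, if there is a better way to do this, please post it!
Toy data here: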
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"pat" 1 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 30 "thiazide" "BB" "CCB"
"ann" 1 "CCB" "" ""
"ann" 14 "thiazide" "ace" ""
"ann " 30 "CCB" "" ""
"ann" 35 "thiazide" "ace" ""
"ann" 70 "thiazide" "ace" ""
"mary" 1 "ace" "" ""
"mary" 30 "ace" "" ""
"mary" 60 "ace" "" ""
"mary" 90 "ace" "" ""
"mary" 115 "thiazide" "" ""
"mary" 120 "ace" "" ""
"john" 150 "ace" "" ""
"john" 1 "CCB" "" ""
"john" 1 "ace" "" ""
"john" 30 "CCB" "" ""
"john" 30 "ace" "" ""
"john" 60 "CCB" "" ""
"john" 60 "ace" "" ""
end

Code here:
expand 3
gen drug=""
sort id day
egen group=group(id day drug1)
bys id group: gen count=_n
forval j = 1/3 {
bys id group: replace drug = drug`j' if count == `j'
}
drop if missing(drug)
drop drug? count group

NJC simplification:
expand 3
gen drug = ""
sort id day drug1
forval j = 1/3 {
by id day drug1: replace drug = drug`j' if _n == `j'
}
drop if missing(drug)
drop drug?

https://stackoverflow.com/questions/40596434
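A sketch of a variant of the expand approach, assuming the aim is to avoid relying on drug1 to tell the original rows apart: record each row's position in a variable (obs, a name chosen only for this sketch) before expanding, then work within obs. This is the same idea as SEQ in the reshape answer. Both the egen group() version and the by id day drug1 version implicitly assume that no two original rows share id, day and drug1; keying on obs does not. The data are a few rows of the toy data:

```stata
* A few rows of the toy data from the question
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"pat" 1 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 30 "thiazide" "BB" "CCB"
"ann" 1 "CCB" "" ""
"ann" 14 "thiazide" "ace" ""
end

* obs (hypothetical name) records which original row each copy came from
generate long obs = _n
expand 3

* Within each original row, copy j takes drugj; sort once so -by- can be used
sort obs
generate drug = ""
forvalues j = 1/3 {
    by obs: replace drug = drug`j' if _n == `j'
}

drop if missing(drug)
drop drug? obs
sort id day drug
list, sepby(id day)
```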