2014-12-09 30 views
7

Benzersiz durumlar belirten birden çok sütun içeren bir data.table nesnesine sahibim. Aşağıdaki küçük örnekte, "name", "job" ve "sex" değişkenleri benzersiz kimlikleri belirtmektedir. Her satırın, "from" (from) benzeri her bir olası örneği için (expand.grid benzeri) bir satırı olması için eksik satırlar eklemek istiyorum. İşte Birden çok anahtarlı sütuna göre data.table için eksik satırları ekleyin

> mydata 
    name  job sex from  score 
1: john teacher male NYT -0.6264538 
2: john teacher male USAT 0.1836433 
3: john teacher male BG -0.8356286 
4: john teacher male TIME 1.5952808 
5: mary police female USAT 0.3295078 
6: chris lawyer female BG -0.8204684 
7: chris lawyer male NYT 0.4874291 
8: chris doctor male NYT 0.7383247 

sonucudur İsterdim:

İşte
> mydata 
    name  job sex from  score 
1: john teacher male NYT -0.6264538 
2: john teacher male USAT 0.1836433 
3: john teacher male BG -0.8356286 
4: john teacher male TIME 1.5952808 
5: mary police female NYT NA 
6: mary police female USAT 0.3295078 
7: mary police female BG NA 
8: mary police female TIME NA 
9: chris lawyer female NYT -NA 
10: chris lawyer female USAT -NA 
11: chris lawyer female BG -0.8204684 
12: chris lawyer female TIME -NA 
13: chris lawyer male NYT 0.4874291 
14: chris lawyer male USAT NA 
15: chris lawyer male BG NA 
16: chris lawyer male TIME NA 
17: chris doctor male NYT 0.7383247 
18: chris doctor male USAT NA 
19: chris doctor male BG NA 
20: chris doctor male TIME NA 

Ben çalıştığım buydu:

setkeyv(mydata, cols=c("name","job","sex")) 
mydata[CJ(unique(name, job, sex), unique(from))] 
İşte
library(data.table) 
set.seed(1) 
mydata <- data.table(name = c("john","john","john","john","mary","chris","chris","chris"), 
       job = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"), 
       sex = c("male","male","male","male","female","female","male","male"), 
       from = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"), 
       score = rnorm(8)) 

setkeyv(mydata, cols=c("name","job","sex")) 

mydata[CJ(unique(name, job, sex), unique(from))] 

akım data.table nesne var

Ancak şu hatayı alıyorum ve fromLast = T değerini ekliyorum REK (veya YANLIŞ) Bana doğru çözümü vermez:

Error in unique.default(name, job, sex) : 
    'fromLast' must be TRUE or FALSE 
İşte

genelinde geldim alakalı cevaplar (ama hiçbiri birden anahtarlı sütunların başa görünür): add missing rows to a data table

Efficiently inserting default missing rows in a data.table

Fastest way to add rows for missing values in a data.frame?

cevap

4

olasılıklar Birkaç burada - https://github.com/Rdatatable/data.table/pull/814

CJ.dt = function(...) { 
    rows = do.call(CJ, lapply(list(...), function(x) if(is.data.frame(x)) seq_len(nrow(x)) else seq_along(x))); 
    do.call(data.table, Map(function(x, y) x[y], list(...), rows)) 
} 

setkey(mydata, name, job, sex, from) 

mydata[CJ.dt(unique(data.table(name, job, sex)), unique(from))] 
#  name  job sex from  score 
# 1: chris doctor male NYT 0.7383247 
# 2: chris doctor male BG   NA 
# 3: chris doctor male TIME   NA 
# 4: chris doctor male USAT   NA 
# 5: chris lawyer female NYT   NA 
# 6: chris lawyer female BG -0.8204684 
# 7: chris lawyer female TIME   NA 
# 8: chris lawyer female USAT   NA 
# 9: chris lawyer male NYT 0.4874291 
#10: chris lawyer male BG   NA 
#11: chris lawyer male TIME   NA 
#12: chris lawyer male USAT   NA 
#13: john teacher male NYT -0.6264538 
#14: john teacher male BG -0.8356286 
#15: john teacher male TIME 1.5952808 
#16: john teacher male USAT 0.1836433 
#17: mary police female NYT   NA 
#18: mary police female BG   NA 
#19: mary police female TIME   NA 
#20: mary police female USAT 0.3295078 
0

Bir olasılık paste için sütunlar name, job olacak veediyorum 0 birlikte unique değerlerini alın ve 'u from değerleriyle yapın. Bundan sonra, sütununu üç sütuna ayırmak için'dan cSplit kullanın, setnames ve joinjoin ileile key ayarlandıktan sonra yeniden adlandırın.

library(splitstackshape) 
library(data.table) 
mydata1 <- setnames(cSplit(mydata[,CJ(unique(paste(name, job, sex)), 
      from=unique(from))], 'V1', ' '), 2:4, c('name', 'job', 'sex'))[, 
        c(2:4,1), with=FALSE] 
setkeyv(mydata, cols=colnames(mydata)[1:4]) 
mydata[mydata1] 
#  name  job sex from  score 
#1: chris doctor male BG   NA 
#2: chris doctor male NYT 0.7383247 
#3: chris doctor male TIME   NA 
#4: chris doctor male USAT   NA 
#5: chris lawyer female BG -0.8204684 
#6: chris lawyer female NYT   NA 
#7: chris lawyer female TIME   NA 
#8: chris lawyer female USAT   NA 
#9: chris lawyer male BG   NA 
#10: chris lawyer male NYT 0.4874291 
#11: chris lawyer male TIME   NA 
#12: chris lawyer male USAT   NA 
#13: john teacher male BG -0.8356286 
#14: john teacher male NYT -0.6264538 
#15: john teacher male TIME 1.5952808 
#16: john teacher male USAT 0.1836433 
#17: mary police female BG   NA 
#18: mary police female NYT   NA 
#19: mary police female TIME   NA 
#20: mary police female USAT 0.3295078 
+0

Oldukça yoğun. Sütun seçimi için 2: 4 mi? – jazzurro

+0

@jazzurro Evet, bu sütunları 'setnames' kullanarak yeniden adlandırmak için – akrun

+1

Anladım. Çok teşekkür ederim. :) – jazzurro

4

expand() işlevi artık yuvalama ve geçişi destekleyen çünkü tidyr ait dev sürümü artık bunu yapmanın harika bir yolu var:

library(dplyr) 

mydata <- data_frame(
    name = c("john","john","john","john","mary","chris","chris","chris"), 
    job = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"), 
    sex = c("male","male","male","male","female","female","male","male"), 
    from = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"), 
    score = rnorm(8) 
) 

mydata %>% 
    expand(c(name, job, sex), from) %>% 
    left_join(mydata) 

#> Joining by: c("name", "job", "sex", "from") 
#> Source: local data frame [20 x 5] 
#> 
#>  name  job sex from  score 
#> 1 chris doctor male BG   NA 
#> 2 chris doctor male NYT 0.5448206 
#> 3 chris doctor male TIME   NA 
#> 4 chris doctor male USAT   NA 
#> 5 chris lawyer female BG 1.2015173 
#> 6 chris lawyer female NYT   NA 
#> 7 chris lawyer female TIME   NA 
#> 8 chris lawyer female USAT   NA 
#> 9 chris lawyer male BG   NA 
#> 10 chris lawyer male NYT -1.0930237 
#> 11 chris lawyer male TIME   NA 
#> 12 chris lawyer male USAT   NA 
#> 13 john teacher male BG 1.1345461 
#> 14 john teacher male NYT 1.3032946 
#> 15 john teacher male TIME 2.4901830 
#> 16 john teacher male USAT -1.6449096 
#> 17 mary police female BG   NA 
#> 18 mary police female NYT   NA 
#> 19 mary police female TIME   NA 
#> 20 mary police female USAT -0.2443080 
İlgili konular