Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


R - Generalizing dplyr aggregation for hierarchical data

I am working with a hierarchical dataset and I need to solve one issue:


library(tidyverse)
library(matrixStats)


df <- 
  tibble(
  LV = c('0.1', '0.1.1', '0.1.2', '0.1.2.1'),
  A = c(0.5, 1.2, 20000, 100),
  B = c(192, 18, 18, 5)
)

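df_step1 <-
  df %>%
  # SUB = depth of each row (number of dots in LV); MAX_DEPTH = deepest level
  mutate(SUB = str_count(LV, "[.]"),
         MAX_DEPTH = max(SUB)) %>%
  # one indicator column per depth: SUB_1, SUB_2, SUB_3
  fastDummies::dummy_cols(select_columns = 'SUB') %>%
  # funs() is deprecated; across() with a lambda applies the same A / B scaling
  mutate(across(starts_with('SUB'), ~ .x * A / B)) %>%
  rename(HEAD = SUB_1) %>%
  mutate(HEAD = HEAD[1]) %>%
  # zeros (rows that are not at that depth) are overwritten with 1
  mutate(across(starts_with('SUB'), ~ ifelse(.x == 0, 1, .x)))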


# Here is the issue: the script writes 1 to this position
df_step1 %>% 
  .[3, 8]

There are 3 sub-hierarchies in the data.

The second and third rows are children of the 0.1 level, and row 4 (0.1.2.1) is a child of 0.1.2, which makes it a grandchild of 0.1.
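To make the parent/child relationships explicit, here is a minimal sketch that derives each row's depth and parent label from LV. The column names DEPTH, PARENT and HAS_PARENT_ROW are only illustrative and not part of the original script; the parent is assumed to be the label with its last dot-separated segment removed.

```r
library(dplyr)
library(stringr)

lv <- c("0.1", "0.1.1", "0.1.2", "0.1.2.1")

hierarchy <- tibble(LV = lv) %>%
  mutate(
    # depth = number of literal dots in the label
    DEPTH = str_count(LV, fixed(".")),
    # parent label = label without its last ".segment"
    PARENT = str_remove(LV, "[.][^.]*$"),
    # TRUE when the parent label is itself a row in the data
    HAS_PARENT_ROW = PARENT %in% LV
  )

print(hierarchy)
```

The top row (0.1) gets the parent label "0", which is not a row in the data, so HAS_PARENT_ROW is FALSE there and it can be treated as the head of the hierarchy.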

In row 3, column 8 (SUB_3), the current script calculates the metric as 1, but it should really be 20000/18, because row 3 is a sub-hierarchy (child) at depth 2.

Is there any way to automate this in dplyr?
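One possible reading of the expected result, sketched under an assumption that is mine and not confirmed anywhere in the question: the SUB_k column should hold A/B for rows that are at depth k, and also for rows that are the direct parent of a depth-k row, with 1 everywhere else. The helper sub_col() and the columns PARENT and RATIO are hypothetical names introduced for this illustration.

```r
library(dplyr)
library(stringr)

df <- tibble(
  LV = c("0.1", "0.1.1", "0.1.2", "0.1.2.1"),
  A  = c(0.5, 1.2, 20000, 100),
  B  = c(192, 18, 18, 5)
)

lvl <- df %>%
  mutate(
    SUB    = str_count(LV, fixed(".")),        # depth of the row
    PARENT = str_remove(LV, "[.][^.]*$"),      # label without its last segment
    RATIO  = A / B
  )

# Hypothetical helper: values of the SUB_<k> column under the assumed rule
# (row is at depth k, or row is the direct parent of a depth-k row).
sub_col <- function(data, k) {
  parents_of_k <- data$PARENT[data$SUB == k]
  if_else(data$SUB == k | data$LV %in% parents_of_k, data$RATIO, 1)
}

lvl$SUB_3 <- sub_col(lvl, 3)

lvl %>% select(LV, SUB, SUB_3)
```

With this rule, row 3 (0.1.2) gets 20000/18 in SUB_3 because it is the parent of 0.1.2.1, while rows 1 and 2 keep the value 1. If the intended rule is different (for example, any ancestor rather than only the direct parent), the condition inside sub_col() is the part to change.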

I was initially thinking about making some string and substring checks using:

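str <- '12.1.1'
# parent label: drop the last two characters (assumes a one-character last segment)
substring(str, 1, str_length(str) - 2)
# fixed() makes the dots match literally rather than as regex wildcards
str_detect(str, fixed(substring(str, 1, str_length(str) - 2)))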
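A note on the substring idea: cutting a fixed two characters only works while the last segment is a single character, and str_detect() finds the parent anywhere in the string, not only at the start. A hedged alternative is sketched here; parent_of() and is_descendant() are hypothetical helper names, not existing stringr functions.

```r
library(stringr)

# Hypothetical helper: strip the last ".segment", whatever its length
parent_of <- function(lv) str_remove(lv, "[.][^.]*$")

# Hypothetical helper: TRUE when `lv` sits somewhere below `ancestor`,
# i.e. it starts with the ancestor label followed by a literal dot
is_descendant <- function(lv, ancestor) {
  str_starts(lv, fixed(paste0(ancestor, ".")))
}

parent_of("12.1.1")                 # "12.1"
parent_of("12.1.10")                # "12.1" (a fixed offset of 2 would give "12.1.")
is_descendant("12.1.10", "12.1")    # TRUE
is_descendant("12.10", "12.1")      # FALSE, although "12.10" contains "12.1"
```

Appending the dot before the prefix check is what keeps 12.10 from being mistaken for a child of 12.1.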

Question from: https://stackoverflow.com/questions/66049359/generalizing-dplyr-aggregation-for-hierarchical-data
