I am working with a hierarchical dataset and I need to solve one issue:
library(tidyverse)
library(matrixStats)

df <-
  tibble(
    LV = c('0.1', '0.1.1', '0.1.2', '0.1.2.1'),
    A = c(0.5, 1.2, 20000, 100),
    B = c(192, 18, 18, 5)
  )

df_step1 <-
  df %>%
  mutate(SUB = str_count(LV, "[.]"),
         MAX_DEPTH = max(SUB)) %>%
  fastDummies::dummy_cols(select_columns = 'SUB') %>%
  # mutate_at()/funs() are superseded/deprecated; across() with a ~ lambda does the same
  mutate(across(starts_with('SUB'), ~ .x * A / B)) %>%
  rename(HEAD = SUB_1) %>%
  mutate(HEAD = HEAD[1]) %>%
  mutate(across(starts_with('SUB'), ~ ifelse(.x == 0, 1, .x)))

# Here is the issue: the script writes 1 to this position
df_step1 %>%
  .[3, 8]
There are 3 sub-hierarchies in the data.
The second and third rows are children of the 0.1 level, and row 4 (0.1.2.1) is a child of 0.1.2, which makes it a grandchild of 0.1.
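The parent/child relationships described above can be made explicit by deriving each row's depth and parent label from LV. This is a minimal sketch, assuming dplyr and stringr are available. The columns DEPTH, PARENT and PARENT_RATIO are hypothetical additions, and PARENT_RATIO assumes the metric of interest is A / B, as in the question's code:

```r
library(dplyr)
library(stringr)

df <- tibble(
  LV = c('0.1', '0.1.1', '0.1.2', '0.1.2.1'),
  A = c(0.5, 1.2, 20000, 100),
  B = c(192, 18, 18, 5)
)

# DEPTH counts the dots (same idea as SUB in the question);
# PARENT drops the last dot-separated component, e.g. "0.1.2.1" -> "0.1.2"
tree <- df %>%
  mutate(
    DEPTH = str_count(LV, fixed(".")),
    PARENT = str_remove(LV, "\\.[^.]+$")
  )

# Self-join: bring each row's parent A / B ratio alongside the row itself
tree <- tree %>%
  left_join(
    tree %>% transmute(PARENT = LV, PARENT_RATIO = A / B),
    by = "PARENT"
  )

tree
```

The top-level row (0.1) has no parent in the data (its PARENT is "0"), so its PARENT_RATIO is NA.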
In row 3, column 8, the current script calculates the metric as 1, but it should really be 20000/18, since that row is a sub-hierarchy (child) of depth 2.
Is there any way to automate this in dplyr?
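One possible way to automate this, assuming each depth column SUB_d should hold the A / B ratio of the row's ancestor at depth d, or the row's own ratio when the row is shallower than d. This is an interpretation of the requirement, not a confirmed spec; ancestor_at() and ratio_of are hypothetical names, and dplyr, stringr and purrr are assumed to be installed:

```r
library(dplyr)
library(stringr)
library(purrr)

df <- tibble(
  LV = c('0.1', '0.1.1', '0.1.2', '0.1.2.1'),
  A = c(0.5, 1.2, 20000, 100),
  B = c(192, 18, 18, 5)
)

# Hypothetical helper: label of the ancestor of each `lv` at depth `d`
# (depth = number of dots, so "0.1" is depth 1). head() returns the whole
# label when the row is shallower than `d`, so the row falls back to itself.
ancestor_at <- function(lv, d) {
  parts <- str_split(lv, fixed("."))
  map_chr(parts, ~ paste(head(.x, d + 1), collapse = "."))
}

# Named lookup vector: label -> A / B ratio (assumes LV values are unique)
ratio_of <- setNames(df$A / df$B, df$LV)
max_depth <- max(str_count(df$LV, fixed(".")))

out <- df
for (d in seq_len(max_depth)) {
  # SUB_d = ratio of the ancestor (or the row itself) at depth d
  out[[paste0("SUB_", d)]] <- unname(ratio_of[ancestor_at(out$LV, d)])
}

out
```

Under this reading, 0.1.2 carries its own ratio (20000/18) into SUB_3, and 0.1.2.1 picks up its parent's ratio in SUB_2. If rows without a node at a given depth should get 1 instead, an extra condition comparing d with the row's depth would be needed.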
I was initially thinking about doing some string and substring checks, using:
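str <- '12.1.1'
# drop the last two characters (".1") to get the parent label "12.1"
substring(str, 1, str_length(str) - 2)
# fixed() makes "." match literally instead of acting as a regex wildcard
str_detect(str, fixed(substring(str, 1, str_length(str) - 2)))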
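The substring idea can be made a little more robust. A minimal sketch, assuming stringr; parent_of() and is_descendant() are hypothetical helpers. Removing the last dot-separated component with a regex does not assume the last part is a single character (as `str_length(str) - 2` does), and base R's startsWith() compares literally, whereas str_detect() treats "." in the pattern as a regex wildcard:

```r
library(stringr)

# Hypothetical helper: parent label, dropping the last dot-separated component
parent_of <- function(lv) str_remove(lv, "\\.[^.]+$")

# Hypothetical helper: TRUE when `child` lies below `ancestor` in the
# dotted hierarchy. Appending "." stops "0.1" from matching "0.10.3".
is_descendant <- function(child, ancestor) {
  startsWith(child, paste0(ancestor, "."))
}

parent_of("12.1.1")              # "12.1"
parent_of("12.10.11")            # "12.10" (the substring() version gives "12.10.")
is_descendant("0.1.2.1", "0.1")  # TRUE
is_descendant("0.10.3", "0.1")   # FALSE
```

Both helpers are vectorised, so they can be used directly inside mutate() on the LV column.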
Question from:
https://stackoverflow.com/questions/66049359/generalizing-dplyr-aggregation-for-hierarchical-data