dplyr - How to fill down a given text up to another given text and so on in R? - Stack Overflow


This has probably been answered already, but I'm struggling to find it: in a new column 'new_text', how can I fill a given text down until the next given text appears, and so on?

In the example below, how can I fill 'Potter' down to 'Wisley', then 'Wisley' down to 'Granger', etc.?

The idea is to apply the proposed solution to data frames of thousands of rows (obtained with pdftools::pdf_data) by selecting a sequence of specific words to fill down in this way.

Thanks for your help.

> dat0
      text new_text
1   Potter   Potter
2     hj7d   Potter
3    kl8ep   Potter
4      f3d   Potter
5   rtyzs2   Potter
6   Wisley   Wisley
7     lq6s   Wisley
8      2fg   Wisley
9  Granger  Granger
10    r8ka  Granger
11      h9  Granger
12   qm9ne  Granger  

Data:

dat0 <-
structure(list(text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2", 
"Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne"), new_text = c("Potter", 
"Potter", "Potter", "Potter", "Potter", "Wisley", "Wisley", "Wisley", 
"Granger", "Granger", "Granger", "Granger")), class = "data.frame", row.names = c(NA, 
-12L))

asked Feb 4 at 6:02 by denis
  • What output are you expecting? It'd be helpful to see an example of what the third column you're trying to make should look like. – Russ, Feb 4 at 6:54
  • One could also do dat0$filled <- zoo::na.locf(ifelse(dat0$text %in% c("Potter", "Wisley", "Granger"), dat0$text, NA)) – Tim G, Feb 4 at 10:20

2 Answers

One way is to convert the non-names to NA and then use fill from tidyr. You'll need to set up the specific words (names) that you want to keep first.

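library(tidyr)

# Keywords to carry forward
Names <- c("Potter", "Wisley", "Granger")

# Replace every non-keyword in `text` with NA, then fill the NAs downwards
transform(dat0, text=ifelse(text %in% Names, text, NA)) |>
  fill(text)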
      text new_text
1   Potter   Potter
2   Potter   Potter
3   Potter   Potter
4   Potter   Potter
5   Potter   Potter
6   Wisley   Wisley
7   Wisley   Wisley
8   Wisley   Wisley
9  Granger  Granger
10 Granger  Granger
11 Granger  Granger
12 Granger  Granger
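
Note that the call above overwrites text itself rather than writing into new_text. If you want to keep text untouched, a minimal sketch along the same lines could look like this. It assumes tidyr is installed; the data frame name dat is made up here and stands in for dat0 without its new_text column.

```r
library(tidyr)

# Keywords to carry forward (same vector as in the answer above)
Names <- c("Potter", "Wisley", "Granger")

# Stand-in for dat0 with only the text column (`dat` is a hypothetical name)
dat <- data.frame(
  text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
           "Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne"),
  stringsAsFactors = FALSE
)

# Mark keywords in a new column, leaving `text` as it is
dat$new_text <- ifelse(dat$text %in% Names, dat$text, NA)

# Fill the NAs downwards with the last keyword seen
dat <- fill(dat, new_text)
```

With fill's default downward direction, any rows that come before the first keyword stay NA in new_text.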

@Edward's fill solution is clearly the most concise option for your problem, so don't miss it.

My solution uses only base R (in case you are interested and want to try it for fun): you can combine cumsum, %in% and ave as below.

nms <- c("Potter", "Wisley", "Granger")
transform(
    df,
    new_text = nms[ave(
        match(text, nms),
        cumsum(text %in% nms),
        FUN = na.omit
    )]
)

which gives

      text new_text
1   Potter   Potter
2     hj7d   Potter
3    kl8ep   Potter
4      f3d   Potter
5   rtyzs2   Potter
6   Wisley   Wisley
7     lq6s   Wisley
8      2fg   Wisley
9  Granger  Granger
10    r8ka  Granger
11      h9  Granger
12   qm9ne  Granger
13  Potter   Potter
14    abcd   Potter
15    d9k2   Potter
16    89kx   Potter
17    dkdi   Potter

data

df <- structure(list(text = c(
    "Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
    "Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne",
    "Potter", "abcd", "d9k2", "89kx", "dkdi"
)), row.names = c(
    NA,
    -17L
), class = "data.frame")

> df
      text
1   Potter
2     hj7d
3    kl8ep
4      f3d
5   rtyzs2
6   Wisley
7     lq6s
8      2fg
9  Granger
10    r8ka
11      h9
12   qm9ne
13  Potter
14    abcd
15    d9k2
16    89kx
17    dkdi
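
To see why the cumsum + %in% + ave combination works, it may help to look at the intermediate vectors. The sketch below rebuilds them on a shortened vector; the names x, idx, grp and filled are introduced here for illustration only. It assumes the first element is one of the keywords. If rows came before the first keyword, na.omit would return a zero-length vector for that group, and I'd expect the ave call to fail.

```r
nms <- c("Potter", "Wisley", "Granger")

# Shortened example vector (hypothetical, not the full data above)
x <- c("Potter", "hj7d", "kl8ep", "Wisley", "lq6s",
       "Granger", "r8ka", "Potter", "abcd")

# Position of each element in nms; NA for non-keywords
idx <- match(x, nms)
# 1 NA NA 2 NA 3 NA 1 NA

# Group id that increases by one at every keyword
grp <- cumsum(x %in% nms)
# 1 1 1 2 2 3 3 4 4

# Within each group, na.omit keeps the single non-NA position,
# and ave recycles it across the whole group
filled <- nms[ave(idx, grp, FUN = na.omit)]
print(data.frame(text = x, new_text = filled))
```

Because the grouping comes from cumsum, a keyword that appears again later (the second "Potter" here) simply starts a new group.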