I have R code to extract information from one document. How do I loop that for all the documents in my folder?

I have a folder of txt files, and I want to extract specific pieces of text from them and arrange them in separate columns of a new data frame. I wrote the code for one file, but I can't work out how to turn it into a loop that runs across all the documents in my folder.

This is my code for the one txt file:

    library(dplyr)    # mutate, filter, rename, %>%
    library(tidyr)    # fill, unnest
    library(stringr)  # str_replace_all

    # text$text is assumed to hold the raw contents of one file as a single string
    clean_text <- as.data.frame(strsplit(text$text, "\\*"), col.names = "text") %>%
      # tidy line breaks, hyphenation and a leading space
      mutate(text = str_replace_all(text, "\n", " "),
             text = str_replace_all(text, "- ", ""),
             text = str_replace_all(text, "^\\s", "")) %>%
      filter(text != " ") %>%
      # rows starting with a digit are paragraphs; the rest are category headings
      mutate(paragraphs = ifelse(grepl("^[[:digit:]]", text), text, NA)) %>%
      rename(category = text) %>%
      mutate(category = ifelse(grepl("^[[:digit:]]", category), NA, category)) %>%
      fill(category) %>%
      filter(!is.na(paragraphs)) %>%
      # split on the paragraph numbering, one row per paragraph
      mutate(paragraphs = strsplit(paragraphs, "^[[:digit:]]{1,3}\\.|\\t\\s[[:digit:]]{1,3}\\.")) %>%
      unnest(paragraphs) %>%
      mutate(paragraphs = strsplit(paragraphs, "Download as PDF")) %>%
      unnest(paragraphs) %>%
      # drop tabs, javascript residue, leading whitespace and empty rows
      mutate(paragraphs = str_replace_all(paragraphs, "\t", "")) %>%
      mutate(paragraphs = ifelse(grepl("javascript", paragraphs), "", paragraphs)) %>%
      mutate(paragraphs = str_replace_all(paragraphs, "^\\s+", "")) %>%
      filter(paragraphs != "")

How do I make this into a loop? I realise there are similar questions, but none of the solutions have worked for me. Thanks in advance for the help!
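
For reference, one common pattern that might fit here is to wrap the single-file code in a function and map it over `list.files()`. The following is only a minimal sketch under assumptions: `clean_one_file` is a hypothetical helper name, `"my_folder"` is a placeholder path, and the short body inside the function is a stand-in for the full pipeline above (with `raw` taking the place of `text$text`).

```r
library(dplyr)
library(purrr)
library(readr)
library(tibble)

# Hypothetical helper: read one file and return a data frame.
# The steps after read_file() are a simplified stand-in; the full
# cleaning pipeline would go here, using `raw` instead of text$text.
clean_one_file <- function(path) {
  raw <- read_file(path)  # whole file as a single string
  tibble(text = strsplit(raw, "\\*")[[1]]) %>%
    mutate(text = trimws(text)) %>%
    filter(text != "")
}

# "my_folder" is an assumed folder name; adjust to the real location.
files <- list.files("my_folder", pattern = "\\.txt$", full.names = TRUE)

# Run the helper on every file and stack the results into one data frame,
# recording which file each row came from in a `source_file` column.
all_text <- files %>%
  set_names(basename(files)) %>%
  map(clean_one_file) %>%
  bind_rows(.id = "source_file")
```

Keeping the per-file logic in its own function means it can be tested on one document first, and the `source_file` column makes it possible to trace each row back to its txt file.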
