Scrape YouTube comments with R (RSelenium + rvest)
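devtools::install_github("ropensci/RSelenium") # install from GitHub
library(RSelenium)
library(rvest)
# start PhantomJS; on Windows, point pjs_cmd at phantomjs.exe
# (phantom() is deprecated in newer RSelenium releases; wdman::phantomjs() is the suggested replacement)
pJS <- phantom(pjs_cmd = "PATH TO phantomjs.exe")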
Sys.sleep(5) # give the binary a moment
remDr <- remoteDriver(browserName = 'phantomjs')
remDr$open()
remDr$navigate("https://www.youtube.com/watch?v=qRC4Vk6kisY")
remDr$getTitle()[[1]] # [1] "YouTube"
# scroll down several times so YouTube loads more comments
for (i in 1:5) {
  remDr$executeScript(paste0("scroll(0, ", i * 10000, ");"))
  Sys.sleep(3) # wait for newly loaded comments to render
}
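# Get the page source and parse it once with rvest
# (read_html() replaces html(), which is deprecated in current rvest)
page <- read_html(remDr$getPageSource()[[1]])
# these CSS classes matched YouTube's comment markup when this was written;
# they may need updating if the page layout changes
author <- page %>% html_nodes(".user-name") %>% html_text()
text <- page %>% html_nodes(".comment-text-content") %>% html_text()
# combine the data in a data.frame (author and text must have the same length)
dat <- data.frame(author = author, text = text, stringsAsFactors = FALSE)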
Result:
> head(dat)
author text
1 Kikyo bunny simpie Omg I love fluffy puff she's so adorable when she was dancing on a rainbow it's so cute!!!
2 Tatjana Celinska Ciao 0
3 Yvette Austin GET OUT OF MY HEAD!!!!
4 Susan II Watch narhwals
5 Greg Ginger who in the entire fandom never watched this, should be ashamed,\n\nPFFFTT!!!
6 Arnav Sinha LOL what the hell is this?
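The two `data.frame()` columns only line up if every comment yields exactly one author node and one text node. If a comment is missing one of them, the vectors can end up with different lengths (so `data.frame()` fails) or shifted relative to each other. A minimal sketch of a more defensive variant, assuming each comment sits inside its own container element: it selects the containers first, extracts author and text per container, and collapses embedded line breaks like the `\n\n` in row 5. The HTML snippet and the `.comment` container class are hypothetical, made up so the code runs offline; real YouTube markup differs.

```r
library(rvest)

# Hypothetical markup imitating the classes targeted above; the second
# comment has no author element and a line break inside its text
html_snippet <- '<html><body>
<div class="comment">
  <span class="user-name">Alice</span>
  <div class="comment-text-content">First!</div>
</div>
<div class="comment">
  <div class="comment-text-content">should be ashamed,

PFFFTT!!!</div>
</div>
</body></html>'

doc <- read_html(html_snippet)

# One node per comment container (".comment" is an assumed class name)
comments <- html_nodes(doc, ".comment")

# html_node() (singular) returns exactly one match per container, so a
# missing author becomes NA instead of shifting the two columns apart
author <- html_text(html_node(comments, ".user-name"))
text <- html_text(html_node(comments, ".comment-text-content"))

# Collapse runs of whitespace (including newlines) and trim the ends
clean <- function(x) trimws(gsub("[[:space:]]+", " ", x))

dat <- data.frame(author = clean(author), text = clean(text),
                  stringsAsFactors = FALSE)
print(dat)
```

On the scraped page, the same pattern would start from the parsed page source instead of `doc`, with the container selector changed to whatever element actually wraps each comment.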