How to get text from a website in Python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

url = 'https://www.troyhunt.com/the-773-million-record-collection-1-data-reach/'
res = requests.get(url)
html_page = res.content

soup = BeautifulSoup(html_page, 'html.parser')
# Collect every text node in the document
# (string=True is the current name for the older text=True argument).
text = soup.find_all(string=True)

output = ''
# Parent tags whose text should not appear in the output
blacklist = [
    '[document]',
    'noscript',
    'header',
    'html',
    'meta',
    'head',
    'input',
    'script',
    # there may be more elements you don't want, such as "style", etc.
]

for t in text:
    # Keep a text node only if its direct parent tag is not blacklisted
    if t.parent.name not in blacklist:
        output += '{} '.format(t)

print(output)
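
The same idea (keep text nodes, skip the ones inside tags like "script") can also be sketched without any third-party package, using the standard library's html.parser. This is a minimal sketch and not the snippet author's method: the names SKIP_TAGS, VisibleTextParser, and extract_visible_text and the sample HTML string are all hypothetical, and the set of skipped tags is an assumption you would likely adjust per site. Unlike the blacklist above, it skips text nested at any depth inside a skipped tag, not only direct children.

```python
from html.parser import HTMLParser

# Tags whose contents are usually not visible page text
# (an assumed, non-exhaustive set; adjust as needed).
SKIP_TAGS = {"script", "style", "noscript", "head"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when no skipped tag is open
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_visible_text(html):
    """Return the text chunks of an HTML string joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, used instead of a live request
sample = (
    "<html><head><title>T</title><script>var x = 1;</script></head>"
    "<body><h1>Hello</h1><p>World <b>again</b></p><style>p{}</style></body></html>"
)
print(extract_visible_text(sample))  # prints: Hello World again
```

To run it on a real page, you could pass res.text from the requests call above to extract_visible_text instead of the sample string.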