BeautifulSoup - scraping paragraphs from HTML
from bs4 import BeautifulSoup
# Simple HTML
SIMPLE_HTML = '''<html>
<head></head>
<body>
<h1>This is a title</h1>
<p class="subtitle">Lorem ipsum dolor sit amet.</p>
<p>Here's another p without a class</p>
<ul>
<li>Sarah</li>
<li>Mary</li>
<li>Charlotte</li>
<li>Carl</li>
</ul>
</body>
</html>'''
# html.parser is Python's built-in parser; it is sufficient for this simple HTML
simple_soup = BeautifulSoup(SIMPLE_HTML, 'html.parser')

# Find the paragraph that has the class "subtitle"
def find_paragraph():
    print(simple_soup.find('p', {'class': 'subtitle'}).string)

# Find the paragraph that does not have the class "subtitle"
def find_other_paragraph():
    paragraphs = simple_soup.find_all('p')  # all <p> tags in the document
    # Keep only the paragraphs whose class list does not contain 'subtitle'.
    # attrs.get('class', []) returns [] instead of None when a tag has no
    # class attribute, so the `in` test never raises a TypeError.
    other_paragraph = [p for p in paragraphs if 'subtitle' not in p.attrs.get('class', [])]
    print(other_paragraph[0].string)

find_paragraph()
find_other_paragraph()
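
The two lookups above can also be written as functions that return text instead of printing it, and that tolerate a missing match. The following is only a sketch of one possible variant, not part of the original snippet: the function names find_subtitle and find_plain_paragraphs are hypothetical, the HTML is a shortened copy of SIMPLE_HTML, and it assumes a BeautifulSoup 4 install where CSS selectors via select_one() are available.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML used above (assumption: same structure)
HTML = '''<html>
<body>
<p class="subtitle">Lorem ipsum dolor sit amet.</p>
<p>Here's another p without a class</p>
</body>
</html>'''

soup = BeautifulSoup(HTML, 'html.parser')


def find_subtitle(soup):
    """Return the text of the first <p class="subtitle">, or None if absent."""
    tag = soup.select_one('p.subtitle')  # CSS selector: <p> with class "subtitle"
    return tag.get_text(strip=True) if tag is not None else None


def find_plain_paragraphs(soup):
    """Return the text of every <p> whose class list lacks 'subtitle'."""
    return [
        p.get_text(strip=True)
        for p in soup.find_all('p')
        # p.get('class', []) yields [] when the tag has no class attribute
        if 'subtitle' not in p.get('class', [])
    ]


print(find_subtitle(soup))
print(find_plain_paragraphs(soup))
```

Returning None or an empty list leaves the caller to decide what to do when nothing matches, whereas find_paragraph() above raises an AttributeError and other_paragraph[0] raises an IndexError in that case.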