First, install the Beautiful Soup library for Python by running: pip install beautifulsoup4 (pip install bs4 also works; it installs the same package). The script also needs the requests library: pip install requests.
I wrote a script that goes to https://timesofindia.indiatimes.com/home/headlines, extracts all the headlines on the page, prints them in the terminal, and also stores them in a CSV file that Excel can open.
Step-1 : Import the necessary modules: bs4, requests, and csv. (The full script at the end also uses time.)
(csv, short for comma-separated values, is a Python module for reading and writing CSV files, which Excel can open.)
code : from bs4 import BeautifulSoup
import requests
import csv
Step-2 : Define a variable and assign it the page's URL as a string.
code : url = 'https://timesofindia.indiatimes.com/home/headlines'
Step-3 : Fetch the page with the requests module and store the response in a variable.
code : r = requests.get(url)
Step-4 : Store the response body (the raw bytes of the page's HTML) in a variable.
code : c = r.content
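Steps 3 and 4 assume the request always succeeds. Here is a slightly more defensive sketch of the same two steps. The helper name fetch_page, the 10-second timeout, and the User-Agent string are my own assumptions for illustration; they are not part of the original script.

```python
import requests


def fetch_page(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # Some sites reject requests without a browser-like User-Agent.
    # This particular value is an assumption, not something the site documents.
    headers = {"User-Agent": "Mozilla/5.0 (headline-scraper example)"}
    # timeout stops the script from hanging forever if the server never answers.
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

You could then write c = fetch_page(url) in place of the two lines from Steps 3 and 4.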
Step-5 : Create a BeautifulSoup object from the content, using Python's built-in HTML parser.
code : soup = BeautifulSoup(c, 'html.parser')
Step-6 : Loop over all the 'span' tags with the class 'w_tle' and print the text inside each one.
code : for i in soup.find_all('span', class_='w_tle'):
           print(i.get_text())
This prints all the headlines on the page in the terminal.
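To see how find_all and get_text behave without going online, you can try them on a small piece of HTML. The markup below is made up for this example and only imitates the 'w_tle' class; the function name extract_headlines is also just an illustration.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the real page; this markup is invented for the example.
sample_html = """
<div>
  <span class="w_tle"><a href="/a">First headline</a></span>
  <span class="other">Not a headline</span>
  <span class="w_tle"><a href="/b">  Second headline </a></span>
</div>
"""


def extract_headlines(html):
    """Return the text of every <span class="w_tle"> in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with the underscore) is used because class is a reserved word in Python.
    spans = soup.find_all("span", class_="w_tle")
    # get_text(strip=True) trims the whitespace around each piece of text.
    return [span.get_text(strip=True) for span in spans]


headlines = extract_headlines(sample_html)
print(headlines)  # ['First headline', 'Second headline']
```

The span with the class 'other' is skipped, which is why filtering by class matters on a page full of span tags.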
Here is the complete source code of the script.
from bs4 import BeautifulSoup
import requests
import time
import csv

url = 'https://timesofindia.indiatimes.com/home/headlines'
r = requests.get(url)
content = r.content
soup = BeautifulSoup(content, 'html.parser')

# Open the CSV file in a with block so it is closed (and flushed) automatically.
# utf-8-sig lets Excel detect the encoding of non-ASCII headlines.
with open('excel.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    for i in soup.find_all('span', class_='w_tle'):
        text = i.get_text()
        time.sleep(0.01)  # short pause between printed headlines
        print(text, '\n')
        writer.writerow([text])
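If you want to check the CSV part on its own, here is a small sketch that writes a few headlines and reads them back. The function names, the 'headline' header row, the sample headlines, and the file name are all assumptions made for this example.

```python
import csv
import os
import tempfile


def save_headlines(headlines, path):
    """Write one headline per row, preceded by a header row."""
    # newline="" stops the csv module from adding blank lines between rows on Windows.
    # utf-8-sig adds a byte order mark so Excel detects the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["headline"])
        writer.writerows([h] for h in headlines)


def load_headlines(path):
    """Read the headlines back, skipping the header row."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))
    return [row[0] for row in rows[1:] if row]


# Made-up sample data; the comma and the quotes test the CSV escaping.
sample = ["Rain expected, says Met office", 'He said "hello"']
csv_path = os.path.join(tempfile.gettempdir(), "headlines_example.csv")
save_headlines(sample, csv_path)
print(load_headlines(csv_path))
```

The csv module quotes the comma and the double quotes for you, so each headline stays in a single cell when the file is opened in Excel.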
If you have read this far, thank you very much for reading this blog. Have a nice day!