A great deal of data is available on the internet today. What if we could collect that data and use it for decision making and to grow our business? Web scraping is commonly used to gather data for analytics. In this post I will explain two methods of web scraping: 1. Selenium WebDriver 2. Beautiful Soup.
Selenium web driver
We need to install the following before going further:
1. Selenium WebDriver (it uses ChromeDriver to drive the browser)
2. pandas (provides the DataFrame, a table that temporarily holds the data read from the web page)
3. ChromeDriver
The Python libraries are installed with pip:
pip install selenium
pip install pandas
ChromeDriver is a separate executable rather than a Python library, and it must match your installed Chrome version. Recent Selenium releases (4.6 and later) can download a matching driver automatically.
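Let's move on to the Python code.
# Written for Selenium 4 and a recent pandas
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

# Empty table with a single column, "name"
record = pd.DataFrame(columns=["name"])
# Path to the chromedriver executable (placeholder, adjust for your machine)
driver = webdriver.Chrome(service=Service("/path/to/chromedriver"))
# Load the page to scrape (placeholder URL, the original address is not shown)
driver.get("https://example.com")
# Every element with class "stoererwrapper" holds one product name
name = driver.find_elements(By.CLASS_NAME, "stoererwrapper")
for nam in name:
    p_name = nam.find_element(By.TAG_NAME, "span").text
    # One-row temporary table, added to the main table
    temp_record = pd.DataFrame([[p_name]], columns=["name"])
    record = pd.concat([record, temp_record])
print(record)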
The import lines at the top load the libraries.
Next, we create a table with a single column, name.
Then we start ChromeDriver; you need to give the path to the chromedriver executable.
After that, we tell ChromeDriver to load the website.
We then fetch every element with the class name stoererwrapper, which contains the name of the product.
Each name is placed in a temporary one-row table, temp_record, which is then appended to the main table, record.
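Appending a one-row table on every pass of the loop works, but DataFrame.append was removed in pandas 2.0, and growing a table row by row gets slow on large pages. A minimal sketch of an alternative, assuming the product names have already been read from the page as plain strings: collect them in a list and build the table once. The helper name names_to_frame and the sample strings are hypothetical and only stand in for the .text values returned by Selenium.

```python
import pandas as pd


def names_to_frame(names):
    """Build a one-column table from an iterable of product names.

    Hypothetical helper for illustration; not part of Selenium or pandas.
    """
    # Strip surrounding whitespace and skip empty strings
    cleaned = [n.strip() for n in names if n and n.strip()]
    return pd.DataFrame({"name": cleaned})


# Sample strings standing in for the .text values read from the page
sample = ["TOPSELLER!", " XDJ-1000 MK2 ", "", "NEW!"]
record = names_to_frame(sample)
print(record)
```

Built this way, the table gets a running index (0, 1, 2, ...) without any extra step.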
Below is the output of the code above:
print record
   name
0  TOPSELLER!
0  XDJ-1000 MK2
0  The perfect player for digital DJs
0  NEW!
0  LITTLE MARCUS 500
0  Marcus Miller Signature - 500W and two Equalizers
0  MEGADEAL!
0  OPERA 12
0  Active 2-way speaker with DSP presets
0  MEGADEAL!
0  FOG FURY JETT
0  Vertical output, 12 x 3W LEDs, DMX
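Every row in the output carries the index 0. This happens because each temporary table has exactly one row, numbered 0, and that number is kept when the tables are joined. The short sketch below reproduces the effect with three sample names and shows that passing ignore_index=True to pd.concat renumbers the rows instead. The variable names are illustrative only.

```python
import pandas as pd

rows = ["TOPSELLER!", "XDJ-1000 MK2", "NEW!"]

# One single-row table per name, as in the scraping loop
frames = [pd.DataFrame([[r]], columns=["name"]) for r in rows]

# Default behaviour: each row keeps its original index
kept = pd.concat(frames)

# ignore_index=True discards the old indexes and numbers rows from 0
renumbered = pd.concat(frames, ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```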
Beautiful Soup method
To scrape with this method we need the following libraries:
BeautifulSoup (a library for pulling data out of HTML and XML)
requests (a library for fetching the content of a given website)
io (a standard-library module that helps create files)
pandas (a library for building a temporary table in memory)
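To show how these four libraries can fit together, here is a rough sketch rather than a definitive implementation. It assumes the page uses the same stoererwrapper class and span layout as in the Selenium example. The function names fetch_html and extract_names and the sample HTML are made up for illustration. The parsing step runs on an inline HTML string so it can be tried without network access, and io.StringIO is used as an in-memory stand-in for a real file.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text (needs network access)."""
    # Imported lazily so the offline demo below does not require requests
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_names(html, class_name="stoererwrapper"):
    """Return a one-column table of the span text inside matching elements."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for block in soup.find_all(class_=class_name):
        span = block.find("span")
        if span is not None:
            names.append(span.get_text(strip=True))
    return pd.DataFrame({"name": names})


# Made-up HTML standing in for a downloaded page
sample_html = """
<div class="stoererwrapper"><span>TOPSELLER!</span></div>
<div class="stoererwrapper"><span>XDJ-1000 MK2</span></div>
<div class="other"><span>ignored</span></div>
"""
table = extract_names(sample_html)

# Write the table as CSV text into an in-memory buffer
buffer = io.StringIO()
table.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

For a live page, extract_names(fetch_html(url)) would replace the sample string, and passing a file path to to_csv would write the result to disk instead of the buffer.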