Selenium has long been the tool of choice for web developers testing their applications before launch, but it can also be used to collect data.
Some sites on the Internet require so much manual interaction that most traditional scraping methods fail to reproduce it. What to do then? Use Selenium!
Is it easy to use? Pretty much.
Prerequisites:
Installing Selenium:
You can download the Python bindings for Selenium from the PyPI page for the selenium package. However, a better approach is to use pip to install the selenium package. Python 3.6 ships with pip in the standard library. Using pip, you can install selenium like this:
pip install selenium
For Windows (since Linux usually ships with a working Python already):
- Install Python using the MSI available on the python.org download page.
- Start a command prompt using the cmd.exe program and run the pip command as given below to install selenium.
C:\Python36\Scripts\pip.exe install selenium
If the above works, you can now run your test scripts using Python.
Note: You can also install the Selenium remote WebDriver, but in most cases you will not need it! If you wish to do it anyway, here are some instructions.
Installing webdriver:
But wait, what exactly is a web driver?
Good question. The Python code is the pilot and the web driver is the plane. Web drivers look similar to web browsers (Chrome / Firefox / Edge), but they are not exactly browsers. Comment below if you want a separate article on that.
So to scrape a site that requires human-like interaction, you will need a web driver (not just a browser).
Visit Selenium HQ's download page and locate the section "Third Party Drivers, Bindings, and Plugins".
There you'll find a list of the currently available web drivers.
Let's assume you have decided to try out chromedriver. Download the file and place it in a PATH folder.
What are PATH folders, and how do you locate them?
Run this:
echo %PATH:;=&echo.%
Now you'd get a list like this:
C:\Windows\system32
C:\Windows
C:\Windows\System32\Wbem
C:\Windows\System32\WindowsPowerShell\v1.0\
C:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static
If your list looks different from this, don't worry; it almost certainly will. Pick a folder you think will never be deleted, place the driver there, and forget about it! (Actually, no: do write the location down somewhere. You may need to know where it is so that when you update the driver you can replace the old file with the new one.)
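By the way, if you would rather not touch PATH at all, Selenium also lets you point straight at the driver executable when you create it. Here is a minimal sketch, assuming a hypothetical location of C:\drivers\chromedriver.exe:

from selenium import webdriver

# Point Selenium straight at the executable instead of relying on PATH.
# The path below is hypothetical; use wherever you actually saved the file.
driver = webdriver.Chrome(executable_path=r'C:\drivers\chromedriver.exe')
driver.get('http://www.google.com')
driver.quit()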
Let's get down to business.
Your first Selenium script! I am excited, are you?
from time import sleep
from selenium import webdriver

driver = webdriver.Chrome('chromedriver')
driver.get('http://www.google.com')
sleep(3)  # Pause execution to see in slow motion.

search_box = driver.find_element_by_name('q')
search_box.send_keys('I am loving selenium!')  # Wait
search_box.submit()
sleep(5)  # Pause execution to see the search text entered.
driver.quit()
Wait a second! That's a lot of code, what just happened there?
A lot of things, to be honest.
First is the import line. It imports the sleep function so that we can pause execution for a little bit between steps and see what's happening on the page; otherwise the web driver does everything at computer speed and we would not be able to follow along.
The second line imports the web driver.
On the 4th line we instantiate a web driver, and on the next line we order it to go to Google's homepage.
On the 6th line we ask the computer to take a break of 3 seconds while we gaze at Google's homepage.
Here we are locating the search bar by its name (which is q). We assign that element to a variable called search_box and use the send_keys function to send keystrokes, typing in the text 'I am loving selenium!'.
search_box = driver.find_element_by_name('q')
search_box.send_keys('I am loving selenium!')
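The name is just one of several locator strategies. In the same (Selenium 3 style) API you can also look elements up by id, CSS selector, or XPath, depending on what the page's HTML offers. A quick sketch, with made-up selectors purely for illustration:

# Each call returns the first matching element (or raises NoSuchElementException).
box_by_id = driver.find_element_by_id('search')                    # <input id="search">
box_by_css = driver.find_element_by_css_selector('input[name=q]')  # CSS selector
box_by_xpath = driver.find_element_by_xpath('//input[@name="q"]')  # XPath expression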
We also want to hit the search button, but how? The next line of code does exactly that.
It calls the submit function to submit the form! This avoids the hassle of locating the search button's element and clicking on it. (Don't worry, that is definitely possible too if you want; a sketch follows below.)
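For the curious, the click route could look something like the following. On Google's homepage the search button has historically carried the name btnK, but that is an assumption about a page we don't control, so treat it purely as an illustration:

# Alternative to submit(): locate the search button and click it.
# 'btnK' is Google's historical name for the button; it may change at any time.
search_button = driver.find_element_by_name('btnK')
search_button.click()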
On the last line:
driver.quit()
We close the web driver session and release the resources allocated to it. That's it: you just searched Google using a custom bot. Kudos, give yourself a pat on the back!
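One practical habit worth picking up: if your script crashes halfway through, driver.quit() never runs and the browser window stays open. Wrapping the work in try/finally is a simple way to guarantee cleanup; here is a sketch of the same script with that change:

from time import sleep
from selenium import webdriver

driver = webdriver.Chrome('chromedriver')
try:
    driver.get('http://www.google.com')
    search_box = driver.find_element_by_name('q')
    search_box.send_keys('I am loving selenium!')
    search_box.submit()
    sleep(5)
finally:
    driver.quit()  # Runs even if any step above raised an exception.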
Now What?
You have just opened up a new universe of opportunity! There are no limits to what you can do with web scraping. Most startups and established big businesses do it, and it is not going away.
You can build apps around real customer data, win new acquisitions, validate your users, automate routine tasks; the list goes on.
You can explore our blog for more interesting reads, OR you can contact us to learn a bit more over a FREE personal Skype coaching session. Just click on "Leave a message" and reach out to us. We get a lot of volume these days, so the FREE sessions won't be around for long. Grab this opportunity while you can!