Web Scraping

The webbrowser Module

In [1]:
import webbrowser
webbrowser.open('https://automatetheboringstuff.com')
Out[1]:
True

Exercise: Open an address with Google Maps

Use Google Maps to show the location of an address typed on the command line or copied to the clipboard (mapit.py)

In [2]:
import webbrowser, sys, pyperclip

# sys is covered in lesson 20
sys.argv # ['mapit.py', '870', 'Valencia', 'St.']

# Check if command line arguments were passed
if len(sys.argv) > 1:
  # ['mapit.py', '870', 'Valencia', 'St.'] -> '870 Valencia St.'
  address = ' '.join(sys.argv[1:])
else:
  address = pyperclip.paste()

# https://www.google.com/maps/place/<ADDRESS>
webbrowser.open('https://www.google.com/maps/place/' + address)
Out[2]:
True

Save the above code in a file named "mapit.py". Create a batch file "mapit.bat" containing the following line:

@python Z:\IT\Python\MyPythonScripts\mapit.py %*

To run the program, either:

  • Copy the address with Ctrl+C, press Win+R, type mapit, and hit Enter, or
  • Press Win+R, type mapit followed by the address, and hit Enter

Downloading from the Web with the Requests Module

The requests module is a third-party module for downloading web pages and files. Run pip install requests to install it.

requests.get() returns a Response object. The raise_for_status() Response method will raise an exception if the download failed.

You can learn more about the other features of the requests module at https://requests.readthedocs.org

In [3]:
import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
res.status_code
Out[3]:
200
In [4]:
len(res.text)
Out[4]:
174130
In [5]:
print(res.text[:500])
The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org/license


Title: Romeo and Juliet

Author: William Shakespeare

Posting Date: May 25, 2012 [EBook #1112]
Release Date: November, 1997  [Etext #1112]

Language: English


*** S
In [6]:
res.raise_for_status()
In [8]:
import requests
badRes = requests.get('https://automatetheboringstuff.com/files/fail')
# badRes.raise_for_status() # 404
In [9]:
badRes.status_code
Out[9]:
404
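
If you prefer the program not to crash on a failed download, you can wrap raise_for_status() in a try/except block. A minimal sketch, re-requesting the same failing URL as above:

import requests

badRes = requests.get('https://automatetheboringstuff.com/files/fail')
try:
  badRes.raise_for_status()  # raises requests.exceptions.HTTPError for 4xx/5xx responses
except requests.exceptions.HTTPError as exc:
  print('There was a problem: %s' % exc)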

Write-binary mode

You can save the downloaded file to your hard drive by calling the Response object's iter_content() method in a loop. Pass 'wb' (write binary) as the second argument to open() in order to maintain the Unicode encoding of the text.

In [13]:
playFile = open('RomeoAndJuliet.txt', 'wb')

for chunk in res.iter_content(100000):
  playFile.write(chunk)

playFile.close()
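
An equivalent sketch using a with statement, which closes the file automatically even if an error occurs (an alternative to the cell above, assuming res still holds the Response object downloaded earlier):

with open('RomeoAndJuliet.txt', 'wb') as playFile:
  for chunk in res.iter_content(100000):
    playFile.write(chunk)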

The Unicode and bytes data types are explained on the page All about Python & Unicode. You can also watch the video there.
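
As a quick illustration of the two types on a Response object: res.text is the decoded Unicode str, while res.content is the raw bytes written to the 'wb' file above.

print(type(res.text))     # <class 'str'>, decoded Unicode text
print(type(res.content))  # <class 'bytes'>, raw bytes as written to the file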

Parsing HTML with the Beautiful Soup Module

Web pages are plain-text files formatted as HTML, which can be parsed with the BeautifulSoup module. To install BeautifulSoup:

In [14]:
# pip install beautifulsoup4

Exercise: Get the price of a book from Amazon

Import the bs4 (BeautifulSoup) & requests modules. BeautifulSoup is imported under the name bs4

In [15]:
import bs4, requests

Pass the URL of the web page to requests.get() to get a Response object

In [16]:
res = requests.get('https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994')
res.raise_for_status()

Pass the string containing the HTML to the bs4.BeautifulSoup() function to get a BeautifulSoup object

In [17]:
soup = bs4.BeautifulSoup(res.text, "html.parser")

The BeautifulSoup object has a select() method that can be passed a CSS selector string for an HTML tag. You can get a CSS selector string from the browser's developer tools by right-clicking the element and copying its CSS selector (the exact steps are listed below)

To open browser's developer tools:

  • Chrome, IE & Firefox: F12
  • Safari: Cmd+Opt+I

To locate the element:

  • Chrome: right-click the element on the web page & select Inspect
  • Firefox: right-click the element on the web page & select Inspect Element

To get the CSS selector string:

  • Chrome: right-click the element in the developer tools & select Copy > Copy selector
  • Firefox: right-click the element in the developer tools & select Copy > CSS Selector
In [18]:
elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')

The select() method returns a list of matching Tag objects. Each of these objects has a text attribute containing that element's inner text. You can trim the surrounding whitespace with the strip() string method

In [19]:
elems[0].text.strip()
Out[19]:
'$23.96'

The Full Script: amazonPrice.py

In [20]:
import bs4, requests

def getAmazonPrice(productUrl):
  res = requests.get(productUrl)
  res.raise_for_status()

  soup = bs4.BeautifulSoup(res.text, 'html.parser')
  elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
  return elems[0].text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994')
print('The price is ' + price)
The price is $23.96
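
Note that Amazon changes its page layout frequently, so the CSS selector above can stop matching, and requests without a browser-like User-Agent header are sometimes blocked. The variant below is a hedged sketch, not part of the original lesson; the header string is only a placeholder, and it returns None instead of raising an IndexError when the selector matches nothing.

import bs4, requests

def getAmazonPrice(productUrl):
  # Placeholder User-Agent; any recent browser string should work.
  headers = {'User-Agent': 'Mozilla/5.0'}
  res = requests.get(productUrl, headers=headers)
  res.raise_for_status()

  soup = bs4.BeautifulSoup(res.text, 'html.parser')
  # Same selector as above; update it if Amazon changes its markup.
  elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
  return elems[0].text.strip() if elems else None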

Controlling the Browser with the Selenium Module

Selenium is a suite of tools (Selenium + WebDriver) for automating web browsers across many platforms. To install the Selenium module, run pip install selenium. You also need to download the latest geckodriver executable from here to drive the latest Firefox with Selenium. The WebDriver executable must be on your PATH environment variable.

To import Selenium

In [22]:
from selenium import webdriver

To open the browser, run

In [23]:
browser = webdriver.Firefox()
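
If geckodriver is not on your PATH, Selenium 3.x (the API version these notes use) also accepts an explicit path. A sketch, where the path is only a placeholder:

from selenium import webdriver

# Placeholder path; point it at wherever you saved geckodriver.
browser = webdriver.Firefox(executable_path='C:\\path\\to\\geckodriver.exe')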

To send the browser to a URL

In [24]:
browser.get('https://automatetheboringstuff.com')

The browser.find_element_by_css_selector() method returns a single WebElement object, while browser.find_elements_by_css_selector() returns a list of WebElement objects

In [25]:
elem = browser.find_element_by_css_selector('.main > div:nth-child(1) > ul:nth-child(18) > li:nth-child(1) > a:nth-child(1)')

elems = browser.find_elements_by_css_selector('p')
len(elems)
Out[25]:
8

To get the CSS selector string:

  • Chrome: right-click the element in the developer tools & select Copy > Copy selector
  • Firefox: right-click the element in the developer tools & select Copy > CSS Selector

The click() method will click on an element in a browser

In [26]:
elem.click()

Selenium’s WebDriver Methods for Finding Element[s]

Each method below returns a WebElement object (or a list of them, for the plural form):

  • browser.find_element[s]_by_class_name(name): Elements that use the CSS class name
  • browser.find_element[s]_by_css_selector(selector): Elements that match the CSS selector
  • browser.find_element[s]_by_id(id): Elements with a matching id attribute
  • browser.find_element[s]_by_link_text(text): <a> elements that completely match the text provided
  • browser.find_element[s]_by_partial_link_text(text): <a> elements that contain the text provided
  • browser.find_element[s]_by_name(name): Elements with a matching name attribute
  • browser.find_element[s]_by_tag_name(name): Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A')
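
If nothing matches, the singular find_element_* methods raise a NoSuchElementException, while the plural find_elements_* methods return an empty list. A minimal sketch of handling the exception (the link text is only an example; substitute a link that actually exists on the page):

from selenium.common.exceptions import NoSuchElementException

try:
  elem = browser.find_element_by_link_text('Read the free online version')  # example link text
except NoSuchElementException:
  print('Could not find an element with that link text.')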

The send_keys() method will type into a specific element in the browser, usually an input field for search, login, etc.

In [29]:
# searchElem = browser.find_element_by_css_selector('.search-field')
# searchElem.send_keys('zophie')

The submit() method will simulate clicking on the Submit button of a form

In [30]:
# searchElem.submit()
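
Putting send_keys() and submit() together, a minimal sketch of a search interaction, assuming the site still exposes a '.search-field' input (the selector from the commented cells above):

from selenium import webdriver

browser = webdriver.Firefox()
browser.get('https://automatetheboringstuff.com')
searchElem = browser.find_element_by_css_selector('.search-field')  # the search box
searchElem.send_keys('zophie')  # type the search term
searchElem.submit()             # submit the enclosing form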

The browser can also be controlled with these commands:

In [31]:
browser.back()
browser.forward()
browser.refresh()
browser.quit()

Example: Grab a paragraph

In [32]:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://automatetheboringstuff.com')
elem = browser.find_element_by_css_selector('.main > div:nth-child(1) > p:nth-child(7)')
elem.text
Out[32]:
"In Automate the Boring Stuff with Python, you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand-no prior programming experience required. Once you've mastered the basics of programming, you'll create Python programs that effortlessly perform useful and impressive feats of automation to:"

Example: Grab the entire web page

In [34]:
# elem = browser.find_element_by_css_selector('html')
# elem.text

To learn more about Selenium, read this doc or the rest of Chapter 11 of the book