import webbrowser
webbrowser.open('https://automatetheboringstuff.com')
Use Google Maps to show the location of an address typed on the command line or copied to the clipboard (mapit.py)
import webbrowser, sys, pyperclip
# sys is covered in lesson 20
sys.argv  # ['mapit.py', '870', 'Valencia', 'St.']
# Check if command line arguments were passed
if len(sys.argv) > 1:
    # ['mapit.py', '870', 'Valencia', 'St.'] -> '870 Valencia St.'
    address = ' '.join(sys.argv[1:])
else:
    # Otherwise use the text copied to the clipboard
    address = pyperclip.paste()
# https://www.google.com/maps/place/<ADDRESS>
webbrowser.open('https://www.google.com/maps/place/' + address)
Save the above code in a file named "mapit.py". Create a batch file "mapit.bat" containing the following line
@python Z:\IT\Python\MyPythonScripts\mapit.py %*
To run the program, either:
copy an address with Ctrl+C, press Win+R, type mapit, and hit Enter (the address is read from the clipboard), or
press Win+R, type mapit followed by the address, and hit Enter.
The requests module is a third-party module for downloading web pages and files. Run pip install requests to install it.
requests.get() returns a Response object. The raise_for_status() Response method will raise an exception if the download failed.
You can learn more about the other features of the requests module on the website https://requests.readthedocs.org
import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
res.status_code
len(res.text)
print(res.text[:500])
res.raise_for_status()
import requests
badRes = requests.get('https://automatetheboringstuff.com/files/fail')
# badRes.raise_for_status() # 404
badRes.status_code
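If a failed download shouldn't crash the program, the raise_for_status() call can be wrapped in try/except; a minimal sketch:
import requests
res = requests.get('https://automatetheboringstuff.com/files/fail')
try:
    res.raise_for_status()
except requests.exceptions.HTTPError as exc:
    # Report the problem (e.g. a 404) instead of letting the exception propagate
    print('There was a problem: %s' % (exc))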
You can save a downloaded file to your hard drive with a call to the iter_content() method. Pass 'wb' (write binary) as the second argument to open() in order to maintain the Unicode encoding of the text.
playFile = open('RomeoAndJuliet.txt', 'wb')
for chunk in res.iter_content(100000):
    playFile.write(chunk)
playFile.close()
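An equivalent sketch using a with statement, which closes the file automatically:
with open('RomeoAndJuliet.txt', 'wb') as playFile:
    # Write the response to disk in 100,000-byte chunks
    for chunk in res.iter_content(100000):
        playFile.write(chunk)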
Unicode and bytes data types are explained on the page All about Python & Unicode. You can also watch the video there.
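In short, res.text is a str (decoded Unicode text) while res.content is bytes, which is why the file above is opened in binary mode; a quick check:
import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
res.raise_for_status()
type(res.text)     # <class 'str'>   - decoded Unicode text
type(res.content)  # <class 'bytes'> - raw bytes, suitable for 'wb' mode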
Web pages are plaintext files formatted as HTML. HTML can be parsed with the BeautifulSoup module. To install BeautifulSoup:
# pip install beautifulsoup4
Import the BeautifulSoup and requests modules. BeautifulSoup is imported under the module name bs4
import bs4, requests
Pass the URL of the web page to requests.get() to get a Response object
res = requests.get('https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994')
res.raise_for_status()
Pass the string containing the HTML to the bs4.BeautifulSoup() function to get a BeautifulSoup object
soup = bs4.BeautifulSoup(res.text, "html.parser")
The BeautifulSoup object has a select() method that can be passed a string of a CSS selector for an HTML tag. You can get a CSS selector string from the browser's developer tools by right-clicking the element and selecting Copy CSS Path
To open the browser's developer tools: press F12 (Windows/Linux) or Cmd+Opt+I (macOS)
To locate the element: right-click it and choose Inspect (Chrome) or Inspect Element (Firefox)
To get the CSS selector string: Copy > Copy selector (Chrome) or Copy > CSS Selector (Firefox)
elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
The select() method will return a list of matching element objects. Each of these element objects has a text attribute containing a string of that element's inner text. You can trim the surrounding whitespace with the strip() string method
elems[0].text.strip()
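The select() method also accepts general CSS selector strings, not just selectors copied from the developer tools. A few common patterns (the id and class names below are only placeholders), assuming soup has been created as above:
soup.select('div')                    # all <div> elements
soup.select('#author')                # the element with id="author"
soup.select('.notice')                # elements with class="notice"
soup.select('div span')               # <span> elements inside a <div>
soup.select('input[type="button"]')   # <input> elements with type="button"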
import bs4, requests

def getAmazonPrice(productUrl):
    res = requests.get(productUrl)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
    return elems[0].text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994')
print('The price is ' + price)
Selenium is a suite of tools (Selenium + WebDriver) to automate web browsers across many platforms. To install the Selenium module: pip install selenium. You also need to download the latest geckodriver WebDriver executable from here to drive the latest Firefox with Selenium. The WebDriver executable must be on your PATH environment variable.
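If geckodriver is not on PATH, Selenium 3 releases (which match the find_element_by_* API used below) also accept an explicit executable_path argument; a sketch with a placeholder path:
from selenium import webdriver
# Placeholder path - point this at the downloaded geckodriver executable
browser = webdriver.Firefox(executable_path=r'C:\path\to\geckodriver.exe')
browser.get('https://automatetheboringstuff.com')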
To import Selenium
from selenium import webdriver
To open the browser, run
browser = webdriver.Firefox()
To send the browser to a URL
browser.get('https://automatetheboringstuff.com')
The browser.find_element_by_css_selector() method returns a single WebElement object, while browser.find_elements_by_css_selector() returns a list of matching WebElement objects
elem = browser.find_element_by_css_selector('.main > div:nth-child(1) > ul:nth-child(18) > li:nth-child(1) > a:nth-child(1)')
elems = browser.find_elements_by_css_selector('p')
len(elems)
To get the CSS selector string: Copy > Copy selector (Chrome) or Copy > CSS Selector (Firefox)
The click() method will click on an element in the browser
elem.click()
| Method name | WebElement object/list returned |
|---|---|
| browser.find_element[s]_by_class_name(name) | Elements that use the CSS class name |
| browser.find_element[s]_by_css_selector(selector) | Elements that match the CSS selector |
| browser.find_element[s]_by_id(id) | Elements with a matching id |
| browser.find_element[s]_by_link_text(text) | <a> elements that completely match the text provided |
| browser.find_element[s]_by_partial_link_text(text) | <a> elements that contain the text provided |
| browser.find_element[s]_by_name(name) | Elements with a matching name |
| browser.find_element[s]_by_tag_name(name) | Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A') |
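The find_element_* methods (without the s) raise a selenium.common.exceptions.NoSuchElementException if nothing matches, so lookups are often wrapped in try/except; a brief sketch using the link-text method (the link text below is only an example):
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
browser = webdriver.Firefox()
browser.get('https://automatetheboringstuff.com')
try:
    # Look for a link by its exact text (example text; adjust to the page)
    elem = browser.find_element_by_link_text('Read Online for Free')
    elem.click()
except NoSuchElementException:
    print('Was not able to find an element with that link text.')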
The send_keys() method will type into a specific element in the browser, usually an input field for search, login, etc.
# searchElem = browser.find_element_by_css_selector('.search-field')
# searchElem.send_keys('zophie')
The submit() method will simulate clicking on the Submit button of a form
# searchElem.submit()
The browser can also be controlled with these commands:
browser.back()
browser.forward()
browser.refresh()
browser.quit()
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://automatetheboringstuff.com')
elem = browser.find_element_by_css_selector('.main > div:nth-child(1) > p:nth-child(7)')
elem.text
# elem = browser.find_element_by_css_selector('html')
# elem.text
To learn more about Selenium, read this doc or the rest of Chapter 11 of the book.