PY4E Chapter 12: Reading Web Data from Python
What is the ASCII character that is associated with the decimal value 42?
*
ASCII Code (Decimal)
  0 nul    16 dle    32 sp     48 0      64 @      80 P      96 `     112 p
  1 soh    17 dc1    33 !      49 1      65 A      81 Q      97 a     113 q
  2 stx    18 dc2    34 "      50 2      66 B      82 R      98 b     114 r
  3 etx    19 dc3    35 #      51 3      67 C      83 S      99 c     115 s
  4 eot    20 dc4    36 $      52 4      68 D      84 T     100 d     116 t
  5 enq    21 nak    37 %      53 5      69 E      85 U     101 e     117 u
  6 ack    22 syn    38 &      54 6      70 F      86 V     102 f     118 v
  7 bel    23 etb    39 '      55 7      71 G      87 W     103 g     119 w
  8 bs     24 can    40 (      56 8      72 H      88 X     104 h     120 x
  9 ht     25 em     41 )      57 9      73 I      89 Y     105 i     121 y
 10 nl     26 sub    42 *      58 :      74 J      90 Z     106 j     122 z
 11 vt     27 esc    43 +      59 ;      75 K      91 [     107 k     123 {
 12 np     28 fs     44 ,      60 <      76 L      92 \     108 l     124 |
 13 cr     29 gs     45 -      61 =      77 M      93 ]     109 m     125 }
 14 so     30 rs     46 .      62 >      78 N      94 ^     110 n     126 ~
 15 si     31 us     47 /      63 ?      79 O      95 _     111 o     127 del
What is the decimal (Base-10) numeric value for the upper case letter "G" in the ASCII character set?
71
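Both ASCII answers above can be checked from Python itself. A minimal sketch using the built-in ord() and chr() functions:

```python
# ord() gives the numeric code for a character; chr() goes the other way.
code_for_g = ord('G')
char_for_42 = chr(42)

print(code_for_g)    # 71
print(char_for_42)   # *
```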
What ends up in the "x" variable in the following code:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    x = soup('a')
A list of all the anchor tags (<a ...>) in the HTML retrieved from the URL
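BeautifulSoup is a third-party package, so here is a rough standard-library illustration of what "a list of all the anchor tags" means. It is only a stand-in: it is not how BeautifulSoup works internally, and it does not repair broken HTML. The class name AnchorCollector and the sample HTML string are made up for this sketch.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep it as a dict.
        if tag == 'a':
            self.anchors.append(dict(attrs))

# Made-up sample page with two links.
html = ('<p>Please click <a href="http://www.dr-chuck.com">here</a> '
        'or <a href="http://www.py4e.com">there</a></p>')

collector = AnchorCollector()
collector.feed(html)
hrefs = [a.get('href') for a in collector.anchors]
print(hrefs)
```

With BeautifulSoup, each item in x would be a tag object, and tag.get('href', None) would return the link in the same way.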
Which HTTP header tells the browser the kind of document that is being returned?
Content-Type:
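To see the header in context, the sketch below parses a made-up block of HTTP response headers with the standard library's email.parser (HTTP headers share the same "Name: value" layout). The header values are assumptions for illustration, not output captured from a real server.

```python
from email.parser import Parser

# Made-up response headers, as they might appear after the status line.
raw_headers = (
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Length: 167\r\n"
    "\r\n"
)

headers = Parser().parsestr(raw_headers)
print(headers['Content-Type'])        # text/plain; charset=utf-8
print(headers.get_content_type())     # text/plain
print(headers.get_content_charset())  # utf-8
```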
What is the purpose of the BeautifulSoup Python library?
It repairs and parses HTML to make it easier for a program to understand
What should you check before scraping a web site?
That the web site allows scraping (not that the web site returns HTML for all pages)
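One machine-readable signal is the site's robots.txt file, which the standard library can interpret. This is only a sketch: the rules and URLs below are invented, and a robots.txt check does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content; a real one lives at http://<site>/robots.txt
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# 'py4e-example-bot' is a hypothetical user agent name.
blocked = parser.can_fetch('py4e-example-bot', 'http://www.example.com/private/page.html')
allowed = parser.can_fetch('py4e-example-bot', 'http://www.example.com/index.html')
print(blocked, allowed)   # False True
```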
What is the most common Unicode encoding when moving data between systems?
UTF-8. It is the encoding most commonly used on web pages, although UTF-8, UTF-16, and UTF-32 are all standard Unicode encodings.
When reading data across the network (i.e. from a URL) in Python 3, what string method must be used to convert it to the internal format used by strings?
decode()
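A minimal sketch of that conversion, using a made-up byte string in place of data actually received from the network:

```python
# Data arriving from the network is a bytes object.
data = b'But soft what light through yonder window breaks\n'

# decode() turns bytes into a str (UTF-8 is the default encoding).
text = data.decode()
print(type(data).__name__, type(text).__name__)   # bytes str

# encode() goes the other way; non-ASCII characters can take several bytes.
word = 'caf\u00e9'
encoded = word.encode('utf-8')
print(len(word), len(encoded))   # 4 5
```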
Which of the following Python data structures is most similar to the value returned in this line of Python: x = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
file handle
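Because the returned object behaves like a file handle, it can be looped over line by line, with each line arriving as bytes. In the sketch below, the function name count_words is made up, and an in-memory io.BytesIO object stands in for the network handle so the example runs offline.

```python
import io

def count_words(handle):
    """Hypothetical helper: count words from any handle that yields bytes lines."""
    counts = {}
    for line in handle:
        for word in line.decode().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# Stand-in for: handle = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
fake_handle = io.BytesIO(b'But soft what light\nIt is the east and Juliet is the sun\n')
counts = count_words(fake_handle)
print(counts['is'], counts['the'])   # 2 2
```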
What word does the following sequence of numbers represent in ASCII: 108, 105, 110, 101
line
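The same lookup can be done in code. A short sketch showing two equivalent ways to turn the numbers back into text:

```python
codes = [108, 105, 110, 101]

# Convert each number to its character and join them together.
word = ''.join(chr(c) for c in codes)
print(word)   # line

# Or treat the numbers as raw bytes and decode them as ASCII.
same_word = bytes(codes).decode('ascii')
print(same_word)   # line
```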
In this Python code, which line is most like the open() call to read a file:
    import socket
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect(('data.pr4e.org', 80))
    cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
    mysock.send(cmd)
    while True:
        data = mysock.recv(512)
        if (len(data) < 1):
            break
        print(data.decode())
    mysock.close()
mysock.connect()
In this Python code, which line actually reads the data:
    import socket
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect(('data.pr4e.org', 80))
    cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
    mysock.send(cmd)
    while True:
        data = mysock.recv(512)
        if (len(data) < 1):
            break
        print(data.decode())
    mysock.close()
mysock.recv()
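The recv() loop can be tried without a network connection. The sketch below uses socket.socketpair() to create two connected local sockets, one playing the server; the helper name read_all and the response text are made up for illustration.

```python
import socket

def read_all(sock, chunk_size=512):
    """Hypothetical helper: call recv() until the other side closes."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if len(data) < 1:   # empty bytes means the connection was closed
            break
        chunks.append(data)
    return b''.join(chunks)

# Two connected local sockets stand in for the client and the web server.
server_side, client_side = socket.socketpair()
server_side.sendall(b'HTTP/1.0 200 OK\r\n\r\nBut soft what light')
server_side.close()

text = read_all(client_side).decode()
client_side.close()
print(text)
```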
Which of the following regular expressions would extract the URL from this line of HTML: <p>Please click <a href="http://www.dr-chuck.com">here</a></p>
href="(.+)" (not http://.*, which also matches the trailing ">here</a></p>)
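Candidate patterns can be checked by running them against the line with re.findall(). A short sketch comparing two of them:

```python
import re

line = '<p>Please click <a href="http://www.dr-chuck.com">here</a></p>'

# Parentheses mark the part of the match that findall() returns.
with_group = re.findall('href="(.+)"', line)
print(with_group)    # ['http://www.dr-chuck.com']

# Without the surrounding context, .* runs on to the end of the line.
greedy = re.findall('http://.*', line)
print(greedy)        # ['http://www.dr-chuck.com">here</a></p>']
```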
How are strings stored internally in Python 3?
Unicode