CMSC320

tabular operations

1. Select/slicing: select only some rows, some columns, or a combination of both.
2. Aggregate/reduce: combine values across a column into a single value.
3. Map: apply a function to every row, possibly creating more or fewer columns. Variations allow one row to generate multiple rows in the output (sometimes called "flatmap").
4. Group by: group tuples together by column/dimension.
5. Group by + aggregate: compute one aggregate per group. The final result is usually seen as a table.
6. Union/intersection/difference: set operations, valid only if the two tables have identical attributes/columns. Intersection and set difference similarly manipulate tables as sets. IDs may be treated in different ways, resulting in somewhat different behaviors.
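
A minimal pandas sketch of the six operations above, on a made-up three-row cat table (the column names and values are hypothetical). Note that pd.concat keeps duplicate rows, so it behaves like SQL UNION ALL; chaining .drop_duplicates() would give set-style union.

```python
import pandas as pd

# Toy table (hypothetical data)
df = pd.DataFrame({"cat": ["Megabyte", "Tom", "Fuzz"],
                   "color": ["black", "grey", "black"],
                   "weight": [4.0, 5.0, 3.0]})

selected = df.loc[df["weight"] > 3.5, ["cat", "weight"]]   # 1. select/slice rows and columns
total = df["weight"].sum()                                  # 2. aggregate/reduce a column
df["weight_lb"] = df["weight"].map(lambda kg: kg * 2.2)     # 3. map a function over a column
groups = df.groupby("color")                                # 4. group by a column
per_color = groups["weight"].mean()                         # 5. one aggregate per group

# 6. union: stack two tables with identical columns
more = pd.DataFrame({"cat": ["Luna"], "color": ["white"],
                     "weight": [4.5], "weight_lb": [9.9]})
union = pd.concat([df, more], ignore_index=True)

print(selected, total, per_color, union, sep="\n\n")
```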

JSON from twitter

GET https://api.twitter.com/1.1/friends/list.json?cursor=-1&screen_name=twitter_api&skip_status=true&include_user_entities=false

RESTful APIs (Representational State Transfer)

GET: perform a query, return data. POST: create a new entry or object. PUT: update an existing entry or object. PATCH: partially update an existing entry or object. DELETE: delete an existing entry or object. Real APIs can be more intricate, but the verb (e.g., "PUT") aligns with the action.

JSON Files and Strings

JSON is a method for serializing objects: serialization converts an object into a string, and deserialization converts a string back into an object. It is easy for humans to read. JSON is defined by: object (like a Python dict, hash table, or Java map); array (like a Python list, Java array, or vector); value (a string, number (float or int), boolean, null, JSON object, or JSON array).
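
A short sketch of serialization and deserialization with Python's built-in json module; the dictionary contents are made up for illustration. Note how True becomes true and None becomes null in the JSON string.

```python
import json

# A Python object built from dict, list, str, float, int, bool, and None
obj = {"course": "CMSC320", "credits": 3, "scores": [90.5, 88],
       "active": True, "ta": None}

# Serialization: object -> JSON string
s = json.dumps(obj)
print(s)

# Deserialization: JSON string -> object
back = json.loads(s)
print(back == obj)  # True: the round trip preserves the data
```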

Relationships

Primary keys and foreign keys define interactions between different tables (a.k.a. entities). Four types: one-to-one; one-to-one-or-none; one-to-many and many-to-one; many-to-many. Each connects (one, many) of the rows in one table to (one, many) of the rows in another table.

creating lists in a "pythonic way"

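P = [2**x for x in range(17)]
E = [x for x in range(1000) if x % 2 != 0]
map/filter are "lazier" than this: in Python 3 they return iterators that compute each value only when it is requested, whereas a list comprehension builds the whole list immediately.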
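
A sketch contrasting the eager list comprehensions above with map/filter, which in Python 3 return iterators that do no work until consumed.

```python
# List comprehensions build the whole list immediately
P = [2**x for x in range(17)]
E = [x for x in range(1000) if x % 2 != 0]

# map/filter return lazy iterators: nothing is computed yet
P_lazy = map(lambda x: 2**x, range(17))
E_lazy = filter(lambda x: x % 2 != 0, range(1000))

# Values are produced only as the iterator is consumed
print(next(P_lazy))            # 1, i.e. 2**0
print(list(P_lazy) == P[1:])   # True: the first element was already consumed
print(list(E_lazy) == E)       # True
```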

scipy cont

SciPy gives you access to a ton of specialized mathematical functionality. Just know it exists; we won't use it much in this class. Some functionality: special mathematical functions (scipy.special: elliptic, Bessel, etc.), integration (scipy.integrate), optimization (scipy.optimize), interpolation (scipy.interpolate), Fourier transforms (scipy.fftpack), signal processing (scipy.signal), linear algebra (scipy.linalg), compressed sparse graph routines (scipy.sparse.csgraph), spatial data structures and algorithms (scipy.spatial), statistics (scipy.stats), and multidimensional image processing (scipy.ndimage).

Querying a RESTful API

Stateless: with every request, you send along a token/authentication of who you are.
token = "super_secret_token"
r = requests.get("https://api.github.com/user", params={"access_token": token})
print(r.content)
{"login": "Mohammad Nayeem teli", "id": 10536112, "avatar_url": "http..."}
PUT/POST/DELETE requests can edit your repositories. (GitHub has since stopped accepting access_token as a query parameter; it now expects the token in an Authorization header.)

XML, XHTML, HTML

Still hugely popular online, but JSON has replaced XML for asynchronous browser <--> server calls and many newer web APIs.

summary of operations

Tables: a simple, common abstraction. Subsumes a set of "strings" (a common input). Operations: select, map, aggregate, reduce, join/merge, union/concat, group by. In a given system/language, the operations may be named differently: SQL uses "join", whereas pandas uses "merge".

Relation

Simplest relation: a table aka tabular data full of unique tuples

XML

XML is a hierarchical markup language:
<tag attribute="value1">
  <subtag>some cool words or values go here</subtag>
  <openclosetag attribute="value2"/>
</tag>
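
One way to read the XML above in Python is the standard library's xml.etree.ElementTree; this is an illustrative choice, not necessarily the parser used in the course.

```python
import xml.etree.ElementTree as ET

doc = """<tag attribute="value1">
  <subtag>some cool words or values go here</subtag>
  <openclosetag attribute="value2"/>
</tag>"""

root = ET.fromstring(doc)                    # parse the string into an element tree
print(root.tag, root.attrib)                 # tag {'attribute': 'value1'}
print(root.find("subtag").text)              # text inside <subtag>
print(root.find("openclosetag").get("attribute"))  # value2
```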

Pandas: History

Written by Wes McKinney. Started in 2008 to get a high-performance, flexible tool to perform quantitative analysis on financial data. Highly optimized for performance, with critical code paths written in Cython or C. Key constructs: Series (like a NumPy array) and DataFrame (like a table/relation or an R data.frame). Foundation for data wrangling and analysis in Python.

numpy array

A few mechanisms for creating arrays in NumPy: conversion from other Python structures (e.g., lists, tuples), since any sequence-like data can be mapped to an ndarray; built-in NumPy array creation routines (e.g., arange, ones, zeros), which create arrays of all ones, all zeros, evenly spaced increasing numbers, etc.; and reading arrays from disk, in either standard or custom formats (e.g., a CSV file).
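
A sketch of the three creation mechanisms; the CSV contents are made up, and io.StringIO stands in for a file on disk so the example runs anywhere.

```python
import io
import numpy as np

# 1. Conversion from other Python structures
from_list = np.array([1, 2, 3])
from_tuple = np.array(((1, 2), (3, 4)))

# 2. Built-in creation routines
ones = np.ones((2, 3))
zeros = np.zeros(4)       # default dtype is float64
counting = np.arange(5)

# 3. Reading from "disk": StringIO plays the role of a CSV file
csv_text = io.StringIO("1,2,3\n4,5,6\n")
from_csv = np.loadtxt(csv_text, delimiter=",")
print(from_csv)
```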

primary key

a unique identifier for every tuple in a relation. Each tuple has one primary key

resizing array

An array's shape can be manipulated by a number of methods. resize(size) modifies an array in place; reshape(size) returns a new array (a view of the same data when possible) with the new shape, leaving the original unchanged. (The values below are random, so yours will differ.)
a = np.floor(10*np.random.random((3,4)))
print(a)
[[9. 8. 7. 9.]
 [7. 5. 9. 7.]
 [8. 2. 7. 5.]]
a.shape
(3, 4)
a.ravel()
array([9., 8., 7., 9., 7., 5., 9., 7., 8., 2., 7., 5.])
a.shape = (6,2)
print(a)
[[9. 8.]
 [7. 9.]
 [7. 5.]
 [9. 7.]
 [8. 2.]
 [7. 5.]]
a.transpose()
array([[9., 7., 7., 9., 8., 7.],
       [8., 9., 5., 7., 2., 5.]])

Python

An interpreted, dynamically typed, high-level, garbage-collected, multi-paradigm (object-oriented, functional, imperative), and widely used language. Interpreted: instructions are executed by an interpreter rather than compiled ahead of time into native machine code. Dynamically typed: type safety is verified at runtime. High-level: abstracted away from the raw metal and kernel. Garbage-collected: memory management is automated. Object-oriented/functional: you can do bits of OO and functional programming.

map

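Apply a function to every element of a sequence or iterable.
arr = [1,2,3,4,5]
list(map(lambda x: x**2, arr))
[1, 4, 9, 16, 25]
(In Python 3, map returns a lazy iterator, so wrap it in list() to see the values.)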

foreign keys

Attributes (columns) that point to a different table's primary key. A table can have multiple foreign keys.

Exceptions

tweepy (Python Twitter API) returns "Rate limit exceeded"; sqlite (a file-based database) raises an IntegrityError.
from platform import python_version
print('Python', python_version())
try:
    cause_a_NameError
except NameError as err:
    print(err, '-> some extra text')
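
A sketch of catching the sqlite IntegrityError mentioned above, using a throwaway in-memory database and a hypothetical cats table: inserting a duplicate primary key triggers it. The exact error message text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO cats VALUES (1, 'Megabyte')")

error_message = None
try:
    # Reusing primary key 1 violates the primary key's uniqueness
    cursor.execute("INSERT INTO cats VALUES (1, 'Duplicate')")
except sqlite3.IntegrityError as err:
    error_message = str(err)
    print("IntegrityError:", error_message)
finally:
    conn.close()
```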

one-to-one

Two tables have a one-to-one relationship if every tuple in the first table corresponds to exactly one entry in the other (person --> SSN). In general, you won't be using these (why not just merge the rows into one table?). Reasons you might: to split a big row between SSD and HDD, or across distributed storage; to restrict access to part of a row (some DBMSs allow column-level access control, but not all). Caching, partitioning, and other serious stuff: another class.

linear algebra in numpy continued (code)

Here a is the matrix [[1.0, 2.0], [3.0, 4.0]] from "linear algebra in numpy".
u = eye(2)
array([[1., 0.], [0., 1.]])
j = array([[0.0, -1.0], [1.0, 0.0]])
dot(j, j)
array([[-1., 0.], [0., -1.]])
trace(u)  # trace: sum of the diagonal
2.0
y = array([[5.], [7.]])
solve(a, y)  # solve the linear matrix equation a x = y
array([[-3.], [4.]])
eig(j)  # eigenvalues and eigenvectors of the matrix
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j, 0.70710678-0.j], [0.-0.70710678j, 0.+0.70710678j]]))
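
A quick sanity check of the solve and eig results above using np.linalg directly (same matrices as above); np.allclose is used because floating-point results are rarely exact.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])

x = np.linalg.solve(a, y)          # solves a @ x = y
print(x)                           # [[-3.] [ 4.]]
print(np.allclose(a @ x, y))       # True: plugging x back in recovers y

j = np.array([[0.0, -1.0], [1.0, 0.0]])
eigvals, eigvecs = np.linalg.eig(j)
# Column k of eigvecs satisfies j @ v_k == eigvals[k] * v_k
print(np.allclose(j @ eigvecs, eigvecs * eigvals))  # True
```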

example of tidy data

Variable: a measure or attribute (age, sex, weight, height). Value: a measurement of an attribute (12.2, 42.3 kg, 145.1 cm, M/F). Observation: all measurements for an object; a specific person is [12.2, 42.3, 145.1, F].

one-to-one-or-none

Example: we want to keep track of people's cats, where each person has at most one entry in the table (a person has one cat entry or none).

NumPy array code

x = np.array([2, 3, 1, 0])
x = np.array([[1, 2.0], [0, 0], (1+1j, 3.)])  # a mix of lists, tuples, and types
x
array([[1.+0.j, 2.+0.j],
       [0.+0.j, 0.+0.j],
       [1.+1.j, 3.+0.j]])

NumPy arrays cont

zeros(shape): creates an array filled with 0 values with the specified shape. The default dtype is float64.
np.zeros((2,3))
array([[0., 0., 0.], [0., 0., 0.]])
ones(shape): creates an array filled with 1 values.
arange(): like Python's range().
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(2, 10, dtype=float)
array([2., 3., 4., 5., 6., 7., 8., 9.])
np.arange(2, 3, 0.2)
array([2. , 2.2, 2.4, 2.6, 2.8])

array operations

Basic operations apply element-wise; the result is a new array with the resulting elements.
a = np.arange(5)
b = np.arange(5)
a + b
array([0, 2, 4, 6, 8])
a - b
array([0, 0, 0, 0, 0])
a**2
array([0, 1, 4, 9, 16])
a > 3
array([False, False, False, False, True])
10*np.sin(a)
array([0., 8.4147, 9.09297, 1.41120, -7.56802])
a * b
array([0, 1, 4, 9, 16])

five most common problems with messy data

1. Column headers are values, not variable names.
2. Multiple variables are stored in one column.
3. Variables are stored in both rows and columns.
4. Multiple types of observational units are stored in the same table.
5. A single observational unit is stored in multiple tables.

HTTP requests

conda install -c anaconda requests=2.21.0
r = requests.get('cmsc320 website url')
r.status_code
200
r.headers['content-type']
'text/html'
r.content
b'<!DOCTYPE html>\n<html lang="en">\n\n<head>\n\n<meta charset="utf-8">\n<meta name="viewport"...

a web-based application programming interface (API)

A contract between a server and a user stating: "If you send me a specific request, I will return some information in a structured and documented format." More generally, APIs can also perform actions, may not be web-based, and can be a set of protocols for communicating between processes or between an application and an OS.

NumPy

Contains: a powerful n-dimensional array object; sophisticated (broadcasting/universal) functions; tools for integrating C/C++ and Fortran code; useful linear algebra, Fourier transform, and random number capabilities, etc. It can also be used as an efficient multi-dimensional container of generic data.

functions in python

def my_func(x, y):
    if x > y:
        return x
    else:
        return y

def my_func(x, y):
    return (x - 1, y + 2)

(a, b) = my_func(1, 2)
# a = 0, b = 4

HTTP requests cont

https://www.google.com/?q=cmsc320&tbs=qdr:m
HTTP GET request:
GET /?q=cmsc320&tbs=qdr:m HTTP/1.1
Host: google.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1
params = {"q": "cmsc320", "tbs": "qdr:m"}
r = requests.get("https://www.google.com", params=params)
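
requests builds the query string from params for you. To see what that encoding looks like without a network call, here is a sketch with the standard library's urllib.parse.urlencode, which should match what requests produces for simple string values like these. Note that the colon in qdr:m is percent-encoded.

```python
from urllib.parse import urlencode

params = {"q": "cmsc320", "tbs": "qdr:m"}
query = urlencode(params)          # key=value pairs joined by &, values percent-encoded
print(query)                       # q=cmsc320&tbs=qdr%3Am
print("https://www.google.com/?" + query)
```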

printing items in list in PYTHON

idx = 0
while idx < len(arr):
    print(arr[idx])
    idx += 1

for element in arr:
    print(element)

compiling regex

If things are going slowly or you are going to reuse the regular expression, then compile it.
# Compile the regular expression "cmsc320"
regex = re.compile(r"cmsc320")
# Use it repeatedly to search for matches in text
regex.match(text)    # does the start of text match?
regex.search(text)   # find the first match, or None
regex.findall(text)  # find all matches
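
A sketch of the compiled-regex methods above on made-up text, showing the difference between match (only the start of the string) and search (anywhere).

```python
import re

regex = re.compile(r"cmsc320")
text = "cmsc320 meets twice a week; cmsc320 projects use Python"

print(regex.match(text) is not None)            # True: text starts with cmsc320
print(regex.match("I take cmsc320"))            # None: match only checks the start
print(regex.search("I take cmsc320").start())   # 7: search scans the whole string
print(regex.findall(text))                      # ['cmsc320', 'cmsc320']
```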

searching for elements

If we want to find an element in a table without an index (e.g., a plain list of rows, without pandas or a database), it takes O(n): we have to search the whole table.

indexes

Like a hidden sorted map of references to a specific attribute (column) in a table; allows O(log n) lookup instead of O(n). Actually implemented with data structures like B-trees. But indexes are not free: they take memory to store, take time to build, and take time to update (add/delete a row, update the column). But, but: one index is (mostly) free. An index is built automatically on the primary key. Think before you build/maintain an index on other attributes.
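
A toy illustration of the O(n) vs O(log n) difference, with a sorted Python list and the bisect module standing in for a B-tree (real DBMS indexes are far more elaborate). The table rows and helper names are hypothetical.

```python
import bisect

# Rows of a toy table: (id, name); hypothetical data
rows = [(7, "Meowly Cyrus"), (2, "Megabyte"), (9, "Fuzz Aldrin"), (4, "Tom")]

def scan(table, key):
    """O(n): check every row until the key is found."""
    for row in table:
        if row[0] == key:
            return row
    return None

# "Index": sorted (key, position) pairs, built once; costs memory and build time,
# and would need updating whenever rows are added or deleted
index = sorted((row[0], pos) for pos, row in enumerate(rows))
keys = [k for k, _ in index]

def lookup(key):
    """O(log n): binary search the sorted keys, then follow the stored position."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[index[i][1]]
    return None

print(scan(rows, 9), lookup(9))  # both find (9, 'Fuzz Aldrin')
```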

numpy arrays cont pt 2

linspace(): creates arrays with a specified number of elements, spaced equally between the specified beginning and end values.
np.linspace(1., 4., 6)
array([1. , 1.6, 2.2, 2.8, 3.4, 4. ])
random.random(shape): creates arrays with random floats over the half-open interval [0, 1).
np.random.random((2,3))
array([[0.7586, 0.4176, 0.3500], [0.7716, 0.0587, 0.9879]])

list

list(range(10)) [0,1,2,3,4,5,6,7,8,9]

matching sequences and repeating characters

Match 'a' 0 or 1 times: a?
Match 'a' 0 or more times: a*
Match 'a' 1 or more times: a+
Match 'a' exactly n times: a{n}
Match 'a' at least n times: a{n,}

Can match sets of characters or multiple and more elaborate sets and sequences of chars:

Match 'a': a
Match 'a', 'b', or 'c': [abc]
Match any character but 'a', 'b', or 'c': [^abc]
Match any digit: \d (= [0-9])
Match any alphanumeric character: \w (= [a-zA-Z0-9_], including the underscore)
Match any whitespace: \s (= [ \t\n\r\f\v])
Match any character (except newline): .
Special characters must be escaped: $.^*+?{}[]()|\
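
A few quick checks of the quantifiers and character classes from these cards, using Python's re module on made-up strings:

```python
import re

# Quantifiers
print(re.fullmatch(r"colou?r", "color") is not None)  # True: 'u' 0 or 1 times
print(re.findall(r"ba*", "b ba baaa"))                # ['b', 'ba', 'baaa']
print(re.findall(r"a{2,}", "a aa aaa"))               # ['aa', 'aaa']

# Character classes and escaping
print(re.findall(r"[abc]", "cab driver"))             # ['c', 'a', 'b']
print(re.findall(r"\d+", "CMSC320 and CMSC420"))      # ['320', '420']
print(re.findall(r"\w+", "hello_world, 42!"))         # ['hello_world', '42']
print(re.findall(r"3\.14", "pi is 3.14, not 3x14"))   # ['3.14']: \. matches only a literal dot
```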

tools to fix common problems in messy data

Melting, string splitting, casting
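
A small sketch of all three tools with pandas on a made-up table (the religion/income layout mirrors the melting example in these cards; the counts and the sex_age codes are invented for illustration):

```python
import pandas as pd

# Messy table: the column headers are values (income brackets), counts stored as text
df = pd.DataFrame({
    "religion": ["Agnostic", "Atheist"],
    "<$10k": ["27", "12"],
    "$10-20k": ["34", "27"],
})

# Melting: turn the value-headers into an "income" variable column
tidy = pd.melt(df, id_vars=["religion"], var_name="income", value_name="freq")

# Casting: convert the string counts to integers
tidy["freq"] = tidy["freq"].astype(int)

# String splitting: a combined "sex_age" code becomes two variables
codes = pd.Series(["m_014", "f_1524"])
parts = codes.str.split("_", expand=True)
parts.columns = ["sex", "age"]

print(tidy)
print(parts)
```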

ndarray

ndarray object: an n-dimensional array of homogeneous data types, with many operations being performed in compiled code for performance.

NumPy datatypes

The numpy.dtype class includes: intc (same as C int) and intp (used for indexing); int8, int16, int32, int64; uint8, uint16, uint32, uint64; float16, float32, float64; complex64, complex128. bool_, int_, float_, complex_ are shorthand for the defaults (float_ and complex_ were removed in NumPy 2.0; use float64 and complex128). These can be used as functions to cast literals or sequence types, as well as arguments to NumPy functions that accept the dtype keyword argument.

pooling analyses

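The pooled slope estimate is the average of the n imputed estimates; with n = 2, $\hat\beta_{1p} = (\hat\beta_{1,1} + \hat\beta_{1,2}) / 2$. The pooled slope variance is
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} z_i + \left(1 + \frac{1}{n}\right)\frac{1}{n-1}\sum_{i=1}^{n}\left(\hat\beta_{1,i} - \hat\beta_{1p}\right)^2$$
where $z_i$ is the squared standard error (sampling variance) of the i-th imputed slope. Pooled standard error: take the square root, $s = \sqrt{s^2}$.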
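
A minimal sketch of the pooling formulas above in plain Python; pool is a hypothetical helper name, and the two imputed estimates and standard errors in the usage line are made up.

```python
import math

def pool(estimates, std_errors):
    """Pool n imputed slope estimates; returns (pooled estimate, pooled standard error)."""
    n = len(estimates)
    pooled = sum(estimates) / n                                 # average of the estimates
    within = sum(se ** 2 for se in std_errors) / n              # mean of z_i (squared SEs)
    between = sum((b - pooled) ** 2 for b in estimates) / (n - 1)
    total_var = within + (1 + 1 / n) * between                  # s^2
    return pooled, math.sqrt(total_var)                         # s = sqrt(s^2)

beta_p, se_p = pool([2.0, 3.0], [0.5, 0.5])
print(beta_p, se_p)  # 2.5 1.0
```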

basic idea of python

Present code in the order that the logic and flow of human thought demand, not the ordering the machine needs; combine source code, text explanation, and the end results of running the code.

How a relational DB fits into your workflow

raw input --> Python <--> structured output (trained classifiers, JSON for D3, visualizations)
Python <--> SQLite file (SQL) <--> SQLite CLI & GUI frontend (SQL)

filter

Returns the elements for which a predicate is true (in Python 3, as a lazy iterator).
arr = [1,2,3,4,5,6,7]
list(filter(lambda x: x % 2 == 0, arr))
[2, 4, 6]

len

Returns the number of items in a container (list, tuple, string, dict, etc.).
x = len(['c', 'm', 's', 'c', 3, 2, 0])
x is 7

scipy

SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. Basically, SciPy contains various tools and functions for solving common problems in scientific computing.

array operations cont

Since multiplication is done element-wise, you need to explicitly perform a dot product to do matrix multiplication.
a = np.zeros(4).reshape(2,2)
a
array([[0., 0.], [0., 0.]])
a[0,0] = 1
a[1,1] = 1
b = np.arange(4).reshape(2,2)
b
array([[0, 1], [2, 3]])
a * b
array([[0., 0.], [0., 3.]])
np.dot(a, b)
array([[0., 1.], [2., 3.]])

indexing

Single-dimension indexing is accomplished as usual.
x = np.arange(10)
x[2]
2
x[-2]
8
x.shape = (2,5)
x[1,3]
8
x[1,-1]
9

indexing cont

Slicing is possible just as it is for Python sequences.
x = np.arange(10)
x[2:5]
array([2, 3, 4])
x[:-7]
array([0, 1, 2])
x[1:7:2]
array([1, 3, 5])
y = np.arange(35).reshape(5,7)
y[1:5:2, ::3]
array([[ 7, 10, 13],
       [21, 24, 27]])

aside:pandas

So this kinda feels like pandas... and pandas kinda feels like a relational data system. Pandas is not strictly a relational data system: it has no notion of primary/foreign keys. It does have indexes (and multi-column indexes): pandas.Index, an ordered, sliceable set storing axis labels; and pandas.MultiIndex, a hierarchical index. Rule of thumb: do heavy, rough lifting at the relational DB level, then fine-grained slicing, dicing, and visualization with pandas.

hierarchical indexes

Sometimes a more intuitive organization of the data. Makes it easier to understand and analyze higher-dimensional data: instead of a 3-D array, you may only need a 2-D table.
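
A sketch of a pandas MultiIndex holding data that would otherwise need a third dimension; the states, years, and values are made up.

```python
import pandas as pd

# Two index levels (state, year) in a 2-D DataFrame; values are hypothetical
index = pd.MultiIndex.from_tuples(
    [("MD", 2019), ("MD", 2020), ("VA", 2019), ("VA", 2020)],
    names=["state", "year"],
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

print(df.loc["MD"])                   # all years for one state (outer level)
print(df.loc[("VA", 2020), "value"])  # a single cell via both levels
print(df["value"].unstack())          # pivot the inner level (year) into columns
```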

Pandas: series

Built on a NumPy array (early versions of pandas subclassed numpy.ndarray). Data: any type. Index labels need not be ordered. Duplicate labels are possible but result in reduced functionality.

HTML

The specification is fairly pure. We'll use BeautifulSoup:
conda install -c asmeurer beautiful-soup-4.3.2
import requests
from bs4 import BeautifulSoup
r = requests.get("https://cs.umd.edu/class/summer/cmsc320/")
root = BeautifulSoup(r.content, "html.parser")
root.find_all("a")  # links on the CMSC320 page

Array Operations cont

There are also some built-in methods of ndarray objects. Universal functions, which may also be applied, include exp, sqrt, add, sin, cos, etc.
a = np.random.random((2,3))
a
array([[0.682, 0.989, 0.694],
       [0.788, 0.622, 0.405]])
a.sum()
4.1807
a.min()
0.405
a.max(axis=0)
array([0.788, 0.989, 0.694])
a.min(axis=1)
array([0.682, 0.405])
axis=0 collapses the rows, giving one result per column; axis=1 collapses the columns, giving one result per row.
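
A deterministic version of the axis behavior above (a small integer array instead of random values), plus one universal function:

```python
import numpy as np

a = np.array([[1, 5, 2],
              [4, 3, 6]])

print(a.sum())         # 21: all elements
print(a.max(axis=0))   # [4 5 6]: one result per column
print(a.min(axis=1))   # [1 3]: one result per row
print(np.sqrt(np.array([1.0, 4.0, 9.0])))  # [1. 2. 3.]: ufunc applied element-wise
```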

5 ways to get Data

1. Direct download and load from local storage
2. Generate locally via downloaded code (e.g., simulation)
3. Query data from a database
4. Query an API from the intranet/internet
5. Scrape data from a webpage

delete row(s) from the table

# Delete row(s) from the table
cursor.execute("DELETE FROM cats WHERE id = 2")
conn.commit()

Regular Expressions Cont

# Does the start of text match cmsc320?
match = re.match(r"cmsc320", text)
# Iterate over all matches for "cmsc320" in text
for match in re.finditer(r"cmsc320", text):
    print(match.start())
# Find all matches
matches = re.findall(r"cmsc320", text)

crash course in SQL (in Python)

# Make a table
cursor.execute("""
    CREATE TABLE cats(
        id INTEGER PRIMARY KEY,
        name TEXT
    )""")
Capitalization doesn't matter for SQL reserved words: SELECT = select = SeLeCt. Rule of thumb: capitalize keywords for readability.

downloading a bunch of files cont

from os import path
from urllib.parse import urljoin
# Cycle through the href for each anchor, checking whether it's a PDF/PPTX link
for lnk in links:
    href = lnk['href']
    # If it's a PDF/PPTX link, queue a download
    if href.lower().endswith(('.pdf', '.pptx')):
        urld = urljoin(url, href)
        rd = requests.get(urld, stream=True)
        # Write the downloaded file to disk
        outfile = path.join(outbase, href)
        with open(outfile, 'wb') as f:
            f.write(rd.content)
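
The filtering and URL-resolution steps can be tried without any network access. In this sketch the base url and hrefs are hypothetical, and path.basename is used (an assumption, not in the original) so nested link paths don't require subdirectories.

```python
from os import path
from urllib.parse import urljoin

url = "https://www.example.com/class/"  # hypothetical course page
hrefs = ["slides/lec01.pdf", "notes/intro.PPTX", "index.html", "hw/hw1.ipynb"]

downloads = []
for href in hrefs:
    # Case-insensitive check against a tuple of extensions
    if href.lower().endswith((".pdf", ".pptx")):
        full_url = urljoin(url, href)                          # resolve the relative link
        outfile = path.join("downloads", path.basename(href))  # local file name
        downloads.append((full_url, outfile))

print(downloads)
```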

more complicated example cont

# Cleaning out unnecessary rows (first, since NaN values can't be cast to int)
df = df.dropna()
# Formatting
df["week"] = df["week"].str.extract(r"(\d+)", expand=False).astype(int)
df["rank"] = df["rank"].astype(int)
# Create "date" column
df["date"] = pd.to_datetime(df["date.entered"]) + pd.to_timedelta(df["week"], unit="w") - pd.DateOffset(weeks=1)

Inserting into table

# Insert into the table
cursor.execute("INSERT INTO cats VALUES (1, 'Megabyte')")
cursor.execute("INSERT INTO cats VALUES (2, 'Meowly Cyrus')")
cursor.execute("INSERT INTO cats VALUES (3, 'Fuzz Aldrin')")
conn.commit()

more complicated example

# Keep identifier variables
id_vars = ["year", "artist.inverted", "track", "time", "genre", "date.entered", "date.peaked"]
# Melt the rest into week and rank columns
df = pd.melt(frame=df, id_vars=id_vars, var_name="week", value_name="rank")

reading rows

# Read all rows from a table
for row in cursor.execute("SELECT * FROM cats"):
    print(row)
# Read all rows into a pandas DataFrame
pd.read_sql_query("SELECT * FROM cats", conn, index_col="id")
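
An end-to-end sketch of the SQL cards (create, insert, delete, read), using an in-memory database instead of cmsc320.db so it leaves no file behind. ORDER BY is added because SQL does not otherwise guarantee row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for "cmsc320.db"
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE cats (
        id INTEGER PRIMARY KEY,
        name TEXT
    )""")
cursor.execute("INSERT INTO cats VALUES (1, 'Megabyte')")
cursor.execute("INSERT INTO cats VALUES (2, 'Meowly Cyrus')")
cursor.execute("INSERT INTO cats VALUES (3, 'Fuzz Aldrin')")
conn.commit()

cursor.execute("DELETE FROM cats WHERE id = 2")
conn.commit()

rows = cursor.execute("SELECT * FROM cats ORDER BY id").fetchall()
for row in rows:
    print(row)  # (1, 'Megabyte') then (3, 'Fuzz Aldrin')
conn.close()
```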

melting data

f_df = pd.melt(df, ["religion"], var_name="income", value_name="freq")
f_df = f_df.sort_values(by=["religion"])
f_df.head(10)

linear algebra in numpy

from numpy import *
from numpy.linalg import *
a = array([[1.0, 2.0], [3.0, 4.0]])
a.transpose()
array([[1., 3.], [2., 4.]])
inv(a)
array([[-2. ,  1. ], [ 1.5, -0.5]])

downloading a bunch of files

import re
import requests
from bs4 import BeautifulSoup
try:
    import urllib.parse as urlparse  # Python 3
except ImportError:
    import urlparse                  # Python 2
# HTTP GET request sent to the URL url
r = requests.get(url)
# Use BeautifulSoup to parse the GET response
root = BeautifulSoup(r.content, "html.parser")
links = (root.find("div", id="schedule")
             .find("table")
             .find("tbody")
             .find_all("a"))

Crash Course in SQL (in Python)

import sqlite3
# Create a database and connect to it
conn = sqlite3.connect("cmsc320.db")
cursor = conn.cursor()
conn.close()
Cursor: a temporary work area in system memory for manipulating SQL statements and their return values. conn.close() does not commit for you: if you close the connection without calling conn.commit() first, any outstanding transaction is rolled back.

SQL join visual

Inner join: only the keys present in both tables. Full (outer) join: all keys from both the left and right tables. Left join: all left-table keys, plus matching right-table values where present. Right join: all right-table keys, plus matching left-table values where present.

printing items in list in JAVA

int[] arr = new int[10];
for (int idx = 0; idx < arr.length; ++idx) {
    System.out.println(arr[idx]);
}

one scipy example

Integral of sin(x) dx from a to b. We have a function object: np.sin defines the sine function for us. We can compute the definite integral from a to b using the quad function.
res = scipy.integrate.quad(np.sin, 0, np.pi)
print(res)
(2.0, 2.220446049250313e-14)  # 2, with a very small error estimate
res = scipy.integrate.quad(np.sin, -np.inf, +np.inf)
print(res)
(0.0, 0.0)  # misleading: this integral does not converge

Many-to-many

Keeping track of cats' colors with one column per color means too many columns and too many nulls; instead, create a color_id.

Regular expressions

"filename.pdf".endswith((".pdf", ".pptx"))
"fiLNmae.pDf".lower().endswith((".pdf", ".pptx"))
Regular expressions are used to search for specific elements, or groups of elements, that match a pattern.
import re
# Find the index of the first occurrence of "cmsc320"
match = re.search(r"cmsc320", text)
print(match.start())

merge operations

1. Merge or join: combine rows/tuples across two tables if they have the same key. Outer joins can be used to "pad" IDs that don't appear in both tables. Three variants: LEFT, RIGHT, FULL (SQL terminology). Pandas has these operations as well, with missing values padded with NaN.
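
A sketch of the join variants with pandas.merge on two made-up tables; the how argument selects inner, left, right, or outer (full) joins, and unmatched cells are padded with NaN.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3],
                     "name": ["Megabyte", "Meowly Cyrus", "Fuzz Aldrin"]})
right = pd.DataFrame({"id": [2, 3, 4],
                      "color": ["black", "orange", "white"]})

for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="id", how=how)
    print(how, sorted(merged["id"]))
# inner [2, 3]; left [1, 2, 3]; right [2, 3, 4]; outer [1, 2, 3, 4]
```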

RESTful API status code

200: request was successful
201: a new resource was created
202: request was accepted for processing, but processing has not completed
204: request was successful, but the response has no content
400: request was malformed
401: client is unauthorized
404: requested service not found
415: requested data format is not supported
422: request was well-formed, but its data was invalid or missing
500: server threw an error while processing

CSV files

Any CSV reader worth anything can parse files with any delimiter, not just ',' (e.g., "TSV": tab-separated values).

associative tables

Cats in one table and colors in another; then combine them in a third table so you have cat_id and color_id in one table.
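
A sketch of the associative-table design in SQLite, using hypothetical cats, colors, and cat_colors tables; each row of cat_colors is one (cat_id, color_id) pair, and a join recovers cat names with their colors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE colors (id INTEGER PRIMARY KEY, color TEXT)")
# Associative table: one row per (cat, color) pair
cur.execute("""CREATE TABLE cat_colors (
    cat_id INTEGER REFERENCES cats(id),
    color_id INTEGER REFERENCES colors(id),
    PRIMARY KEY (cat_id, color_id))""")

cur.executemany("INSERT INTO cats VALUES (?, ?)", [(1, "Megabyte"), (2, "Fuzz Aldrin")])
cur.executemany("INSERT INTO colors VALUES (?, ?)",
                [(1, "black"), (2, "white"), (3, "orange")])
cur.executemany("INSERT INTO cat_colors VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)])

rows = cur.execute("""
    SELECT cats.name, colors.color
    FROM cat_colors
    JOIN cats ON cats.id = cat_colors.cat_id
    JOIN colors ON colors.id = cat_colors.color_id
    ORDER BY cats.name, colors.color""").fetchall()
print(rows)  # [('Fuzz Aldrin', 'orange'), ('Megabyte', 'black'), ('Megabyte', 'white')]
conn.close()
```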

Data manipulation and computation

Data representation (the natural way to think about data): one-dimensional, like an array or vector; also an n-dimensional array or matrix. Operations: indexing, slicing, filtering; map: apply a function to every element; reduce/aggregate: combine values to get a single scalar (sum, median); given 2 vectors: dot and cross products.

pandas: dataframes

Each column can have a different type. Row and column index. Mutable size: insert and delete columns. Note the use of the word "index" for what we called "key"; relational databases use "index" to mean something else. Non-unique index values are allowed but may raise an exception for some operations.

So we've queried a server using a well-formed GET request via the requests Python module. What comes back?

General structured data: comma-separated values (CSV) files and strings; JavaScript Object Notation (JSON) files and strings; HTML, XHTML, XML files and strings. Domain-specific structured data: shapefiles: geospatial vector data (OpenStreetMap); RVT files: architectural planning (Autodesk Revit).

scipy.integrate

Let's say we do not have a function object; we only have some (x, y) samples of our function. We can estimate the integral using the trapezoidal rule.
sample_x = np.linspace(0, np.pi, 1000)
sample_y = np.sin(sample_x)  # creating 1,000 samples
result = scipy.integrate.trapezoid(sample_y, sample_x)
print(result)
1.99999  # approximately
sample_x = np.linspace(0, np.pi, 1000000)
sample_y = np.sin(sample_x)  # creating a million samples
result = scipy.integrate.trapezoid(sample_y, sample_x)
print(result)
2.0  # approximately
(Older SciPy versions call this function trapz; it has since been renamed trapezoid.)
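
To show what the trapezoidal rule actually computes, here is a hand-rolled NumPy version (trapezoid here is a hypothetical helper, not the SciPy function): each interval contributes its width times the average of its two endpoint heights.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule: sum over intervals of width * average endpoint height."""
    return np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2)

sample_x = np.linspace(0, np.pi, 1000)
result_1k = trapezoid(np.sin(sample_x), sample_x)

sample_x = np.linspace(0, np.pi, 1_000_000)
result_1m = trapezoid(np.sin(sample_x), sample_x)

print(result_1k, result_1m)  # both close to the exact value 2; more samples, smaller error
```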

Tidy Data

Names of files/DataFrames = description of one dataset. Enforce one data type per dataset (ish).

difference between NumPy arrays and Python sequences

NumPy arrays have a fixed size; modifying the size means creating a new array. NumPy array elements must all be of the same data type, but this can include Python objects (in which case you may not get the performance benefits). NumPy arrays support more efficient mathematical operations than built-in sequence types.

Authentication and OAUTH

Old and busted:
r = requests.get("https://api.github.com/user", auth=("nayeemz", "database name"))
New approach: OAuth grants access tokens that give possibly incomplete access to a user or app without exposing a password.

Python 3

Python 3 is intentionally backwards incompatible (but not that incompatible). Biggest changes from Python 2:
print "statement" --> print("function")
1/2 == 0 --> 1/2 == 0.5, and 1//2 == 0
ASCII str default --> Unicode str default
Namespace ambiguity fixed:
i = 1
[i for i in range(5)]
print(i)  # prints 1 in Python 3; the comprehension's i no longer leaks (Python 2 printed 4)

Python vs R

Python is a "full" programming language - easier to integrate with systems in the field R has a more mature set of pure stats libraries Python is catching up and is ahead for ML (machine Learning) Python is used more in the tech industry

JSON in python

Some built-in types: "strings", 1.0, True, False, None
Lists: ["Goodbye", "cruel", "world"]
Dictionaries: {"hello": "bonjour", "goodbye": "au revoir"}

enumerate

list(enumerate(["311", "320", "330"]))
[(0, '311'), (1, '320'), (2, '330')]

CSV code

import csv
with open("schedule.csv", newline="") as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    for row in reader:
        print(row)
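
A sketch of reading tab-separated data with the same csv.reader, just changing the delimiter; io.StringIO stands in for an open file, and the schedule contents are made up.

```python
import csv
import io

# StringIO stands in for open("schedule.tsv", newline="")
data = io.StringIO('week\ttopic\n1\t"Intro, logistics"\n2\tScraping\n')
reader = csv.reader(data, delimiter="\t", quotechar='"')
rows = list(reader)
for row in rows:
    print(row)  # quotes around "Intro, logistics" are stripped by the reader
```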

parsing JSON in python

import json
import requests
r = requests.get('https://api.github.com/search/repositories', params={'q': 'users'})
data = json.loads(r.content)
json.load(some_file)            # loads JSON from a file
json.dump(some_obj, some_file)  # writes JSON to a file
json.dumps(json_obj)            # returns a JSON string

printing arrays numpy

import numpy as np
a = np.arange(3)
print(a)
[0 1 2]
a
array([0, 1, 2])
b = np.arange(9).reshape(3,3)
print(b)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
c = np.arange(8).reshape(2,2,2)
print(c)
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]

Numpy dataype code

import numpy as np
x = np.float32(1.0)
x
y = np.int_([1,2,4])
y
z = np.arange(3, dtype=np.uint8)
z
array([0, 1, 2], dtype=uint8)
z.dtype
dtype('uint8')

SQLite

An on-disk relational database management system (RDBMS): applications connect directly to a file. Most RDBMSs instead have applications connect to a server. Advantages of a server include greater concurrency and less restrictive locking; disadvantages include, for this class, setup time. All interactions use Structured Query Language (SQL).

data processing operations

Take one or more datasets as input and produce one or more datasets as output.

one-to-many and many-to-one

one person can have one nationality in this example, but one nationality can include many people

indexing cont

Using fewer indices than the array has dimensions results in a subarray:
x = np.arange(10)
x.shape = (2,5)
x[0]
array([0, 1, 2, 3, 4])
This means that x[i, j] == x[i][j], but the second method is less efficient because it first creates the intermediate subarray x[i].

