Python Numpy and Pandas
Numpy: Given two arrays, how would you stack them horizontally and vertically?
For horizontal stacking of 2D arrays we can use hstack(), concatenate() with axis=1, or column_stack(). For vertical stacking we can use vstack(), concatenate() with axis=0, or row_stack() (an alias of vstack()).
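A minimal sketch of the stacking calls, assuming two small 2x2 arrays a and b (names and values are illustrative, not from the original notes):

```python
import numpy as np

# Two illustrative 2x2 arrays (values chosen only for this sketch)
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Horizontal stacking: the columns of b are placed to the right of a
h1 = np.hstack((a, b))
h2 = np.concatenate((a, b), axis=1)
h3 = np.column_stack((a, b))

# Vertical stacking: the rows of b are placed below a
v1 = np.vstack((a, b))
v2 = np.concatenate((a, b), axis=0)

print(h1.shape)  # (2, 4)
print(v1.shape)  # (4, 2)
```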
Pandas: What is the use of inplace=True in many pandas methods?
inplace=True modifies the DataFrame itself and the method returns None. With inplace=False (the default) the method returns a modified copy and leaves the original DataFrame unchanged.
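A short sketch of the difference, using sort_values() on a made-up one-column DataFrame (the data is an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# Default (inplace=False): a new sorted DataFrame is returned, df is untouched
sorted_copy = df.sort_values("a")
unchanged = df["a"].tolist()  # still [3, 1, 2]

# inplace=True: df itself is modified and the call returns None
result = df.sort_values("a", inplace=True)

print(result)            # None
print(df["a"].tolist())  # [1, 2, 3]
```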
Numpy: what is notation for "not a number"?
NaN
Define numpy object
NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
What happens when you add two Series objects? series1+series2
Values are aligned by index label before adding. The result's index is the union of the labels of both series; any label that is missing from either series gives NaN.
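A small sketch of label alignment when adding two Series; s1 and s2 are hypothetical examples with only the label "b" in common:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["b", "d"])

# Addition aligns on labels; the result covers the union of both indexes
total = s1 + s2

print(total)
# Only "b" exists in both series, so only "b" has a numeric result (2 + 10);
# "a", "c" and "d" become NaN.
```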
Numpy: How to get a numpy flatiter object, and what is the use of it?
The flat property gives back a numpy.flatiter object. This is the only means to get a flatiter object; we do not have access to a flatiter constructor. The flat iterator enables us to loop through an array as if it were a flat array.
Numpy: How to use dstack()? Give an example.
dstack() takes a tuple of arrays and stacks them depth-wise, that is, along the third axis. For example, we could stack 2D arrays of image data on top of each other as follows (the output shown corresponds to a = arange(9).reshape(3, 3) and b = 2 * a). In: dstack((a, b)) Out: array([[[ 0, 0], [ 1, 2], [ 2, 4]], [[ 3, 6], [ 4, 8], [ 5, 10]], [[ 6, 12], [ 7, 14], [ 8, 16]]])
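A runnable sketch of the dstack() call. The definitions of a and b are an assumption inferred from the output shown in the answer:

```python
import numpy as np

# Assumed inputs: a holds 0..8 in a 3x3 grid, b is a doubled
a = np.arange(9).reshape(3, 3)
b = 2 * a

# Stack along a new third axis: element [i, j] becomes the pair (a[i, j], b[i, j])
stacked = np.dstack((a, b))

print(stacked.shape)  # (3, 3, 2)
print(stacked[0, 1])  # [1 2]
```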
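Numpy: What is broadcasting?
Broadcasting is how NumPy handles operations between arrays of different shapes: the smaller operand (for example a scalar or a single row) is virtually stretched to match the larger one. Using broadcasting one can change multiple values in one go. The example below assigns the scalar '0' to every empty entry in column 4: world_alcohol[:,4][world_alcohol[:,4]=='']='0'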
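A minimal sketch of broadcasting with made-up arrays (matrix, row and values are illustrative names, not from the original data set):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)

# The 1D row is broadcast across both rows of the matrix
shifted = matrix + row

# A scalar is broadcast to every element
scaled = matrix * 2

# Scalar assignment through a boolean mask, as in the answer above
values = np.array(["5", "", "7"])
values[values == ""] = "0"

print(shifted)  # [[11 22 33] [14 25 36]]
print(values)   # ['5' '0' '7']
```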
Numpy: Explain arithmetic operations for numpy arrays...
We can add, subtract, multiply and divide arrays element-wise: arr+arr, arr-arr, arr*arr, arr/arr
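Numpy: Given a 3D array array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]) How to flatten this array? How to transpose it? How to change its shape?
To flatten, use the ravel() or flatten() methods; flatten() always allocates new memory, while ravel() returns a view when possible. To transpose, use the transpose() method (or the T attribute). To change the shape, use reshape(), which returns a reshaped array, or resize(), which modifies the array in place.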
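A sketch of these methods on the array from the question, built here with arange(24).reshape(2, 3, 4); the second array c is an assumption added so that resize() runs on an array that owns its data:

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

flat_copy = b.flatten()     # always a new copy
flat_view = b.ravel()       # a view of the same data when possible
transposed = b.transpose()  # axes reversed: shape (4, 3, 2)
reshaped = b.reshape(6, 4)  # same data, new shape

# resize() changes the shape of the array itself and returns None
c = np.arange(24)
c.resize((4, 6))

print(flat_copy.shape)   # (24,)
print(transposed.shape)  # (4, 3, 2)
print(c.shape)           # (4, 6)
```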
Pandas: How to use multiple filter conditions? "And" condition and "or" condition
We cannot use the Python keywords and / or on pandas boolean Series. Instead we use & for "and" and | for "or", and wrap each condition in parentheses.
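A short sketch of combined conditions on a made-up DataFrame (the column names x and y are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 6, 8, 3], "y": ["a", "b", "a", "b"]})

# "and": both conditions must hold; each one is parenthesised
both = df[(df["x"] > 5) & (df["y"] == "a")]

# "or": at least one condition must hold
either = df[(df["x"] > 5) | (df["y"] == "a")]

print(both["x"].tolist())    # [8]
print(either["x"].tolist())  # [1, 6, 8]
```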
Explain this code -
totals = {}
year = (world_alcohol[:,0]=='1989')
year = world_alcohol[year,:]
for i in countries:
    is_country = (year[:,2]==i)
    country_consumption = year[is_country,:]
    country_consumption[:,4][country_consumption[:,4]=='']='0'
    l = country_consumption[:,4].astype(float)
    isum = l.sum()
    #print(i,l,'-',isum)
    totals[i] = isum
print(totals)
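The list of all countries is assigned to the variable countries, and the code finds the total consumption for each country in countries for the year 1989. It first keeps only the rows whose year column is '1989'. Then, for each country, it selects that country's rows, replaces empty consumption values with '0', converts the consumption column to float and sums it. When it finishes, totals contains all of the country names as keys, with the corresponding alcohol consumption totals for 1989 as values.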
Create a numpy array 3x5?
a = np.arange(15).reshape(3, 5)
Here is an array - a = np.arange(0,10). Change its shape to (2,5)
a.reshape(2,5)
Numpy: Given an array a=array([[ 1., 0., 0., 0., 0.], [ 0., 1., 0., 0., 0.], [ 0., 0., 1., 0., 0.], [ 0., 0., 0., 1., 0.], [ 0., 0., 0., 0., 1.]]) how to know shape of an array
a.shape
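Numpy: What is conditional selection? If a=np.array([1,2,3,4,5,6,2,4,3,4,11,2,33,23]), create an array with the numbers > 5
Conditional selection indexes an array with a boolean mask: a[a>5]. Note that a must be a numpy array, not a plain Python list.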
Pandas: How to use the apply method? How to use apply with a row?
apply can be used to apply any function to each column of a DataFrame (axis=0, the default) or to each row (axis=1).
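A minimal sketch of apply() per column and per row; the DataFrame and the lambda functions are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series
col_max = df.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_max.tolist())  # [3, 30]
print(row_sum.tolist())  # [11, 22, 33]
```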
What are the different ways to create arrays?
np.array([1,2,3,4]), np.arange(...).reshape(...), np.zeros(), np.ones(), np.empty()
How to get index position of max value in array
argmax
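Numpy: Explain the methods below: np.min() np.max() np.argmax() np.argmin()
np.min() and np.max() return the smallest and largest values in the array. np.argmin() and np.argmax() return the index of the smallest and largest values.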
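A quick sketch on a made-up 1D array showing the value-versus-index distinction:

```python
import numpy as np

arr = np.array([4, 9, 1, 7])

smallest = np.min(arr)         # the value 1
largest = np.max(arr)          # the value 9
idx_smallest = np.argmin(arr)  # position of 1, which is index 2
idx_largest = np.argmax(arr)   # position of 9, which is index 1

print(smallest, largest, idx_smallest, idx_largest)  # 1 9 2 1
```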
Numpy: How to change datatype of an array?
astype() function
Numpy: How to convert the data type of an array?
astype() method. vector = numpy.array(["1", "2", "3"]) vector = vector.astype(float)
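Use of the axis parameter in Pandas?
axis=0 refers to the rows (the index) and axis=1 refers to the columns. For example, drop(..., axis=0) drops rows, while sum(axis=0) sums down the rows and gives one result per column.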
How many axes are there in the ndarray below, and what are their lengths? [[ 1., 0., 0.], [ 0., 1., 2.]]
There are 2 axes. The first axis (rows) has length 2 and the second axis (columns) has length 3, so the shape is (2, 3).
List all numpy numerical types
bool, int_ (platform integer), int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64 (float), complex64, complex128 (complex)
Pandas: How to convert all column names into a list?
col_names = food_info.columns.tolist()
Write down attributes of a DataFrame
columns, dtypes, shape, index, etc.
Pandas: How to do a union of data?
pd.concat(), for example pd.concat([df1, df2])
Pandas: How to drop a column? What is the default axis value?
df.drop('a', axis=1) drops column 'a'. The default is axis=0, which drops rows by label.
How to give a name to the index?
df.index.name = 'a' for a single-level index; for a MultiIndex, df.index.names = ['a','b']
Pandas: How to retrieve specific column and rows?
df.loc[['r1','r2'],['c1','c2']]
How to join data using join
df1.join(df2)
Pandas: How to filter data? How to get multiple columns while filtering on a column?
df[df['x'] > 5][['x','y','z']] or, equivalently, df.loc[df['x'] > 5, ['x','y','z']]
Drop all columns in titanic_survival that have missing values and assign the result to drop_na_columns. Drop all rows in titanic_survival where the columns "age" or "sex" have missing values and assign the result to new_titanic_survival.
drop_na_rows = titanic_survival.dropna(axis=0) drop_na_columns = titanic_survival.dropna(axis=1) new_titanic_survival = titanic_survival.dropna(axis=0,subset=["age", "sex"])
Pandas: List functions to handle missing values
dropna(), fillna(), isnull(), notnull(), etc.
Create an array with an explicit complex data type
Pass dtype=complex, for example np.array([1, 2, 3], dtype=complex)
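What is the default datatype of numbers in a Series object?
float64 when the data contains floating-point numbers or NaN; purely integer data is stored as int64 on most platforms.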
Pandas: How to read CSV file?
food_info = pandas.read_csv("food_info.csv")
Numpy: What is the use of the hsplit(), vsplit() and split() functions?
hsplit splits horizontally (column-wise), vsplit splits vertically (row-wise), and split splits along the axis given by the axis parameter (axis=0 by default).
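A small sketch of the three split functions on an assumed 4x4 array:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# hsplit cuts between columns: two (4, 2) pieces
left, right = np.hsplit(a, 2)

# vsplit cuts between rows: two (2, 4) pieces
top, bottom = np.vsplit(a, 2)

# split with axis=1 behaves like hsplit; axis=0 (the default) like vsplit
same_as_hsplit = np.split(a, 2, axis=1)

print(left.shape)  # (4, 2)
print(top.shape)   # (2, 4)
```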
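What is the use of numpy character codes? Why are they still there? What do the character codes d, V, U mean?
Character codes such as i, f, u, b, d, S, V and U are used to specify the data type of a numpy object. They are preserved for backward compatibility with Numeric, the predecessor of NumPy. d is a double-precision float, V is void (raw bytes), and U is a Unicode string.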
Numpy: Which function can be used to create numpy array from file
import numpy as np world_alcohol = np.genfromtxt("world_alcohol.csv", delimiter=",") Signature: np.genfromtxt( fname, dtype=<type 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None)
Pandas: Use of the iloc and iat accessors
iloc selects rows and columns by integer position. iat accesses a single scalar value by integer position and is meant to be faster than iloc for that purpose.
Numpy: Explain the linspace function
np.linspace(a, b, n) generates n evenly spaced numbers over the interval from a to b. Both endpoints are included by default.
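A minimal sketch of linspace with illustrative arguments, including the endpoint parameter that excludes the upper bound:

```python
import numpy as np

# Five evenly spaced points from 0 to 1, both endpoints included
points = np.linspace(0, 1, 5)

# With endpoint=False the stop value is left out
open_points = np.linspace(0, 1, 5, endpoint=False)

print(points)       # [0.   0.25 0.5  0.75 1.  ]
print(open_points)  # [0.  0.2 0.4 0.6 0.8]
```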
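What is the use of np.set_printoptions(threshold=np.nan) while printing an array?
It disables summarization, so the full array is printed instead of being truncated with "...". It has nothing to do with showing NaN values. Recent NumPy versions reject np.nan as a threshold; use threshold=sys.maxsize instead.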
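Pandas: What is the use of the iloc and loc accessors? Write an example...
loc selects by label, e.g. df.loc['r1','c1'] or df.loc[['r1','r2'],['c1','c2']]. iloc selects by integer position, e.g. df.iloc[0,1] or df.iloc[[1,2,3,4]] for rows 1 to 4.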
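A sketch contrasting label-based and position-based selection on a made-up DataFrame (the labels r1..r3 and c1, c2 are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"c1": [1, 2, 3], "c2": [4, 5, 6]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2", "c1"]                     # label-based scalar lookup
block = df.loc[["r1", "r2"], ["c1", "c2"]]        # label-based sub-frame
by_position = df.iloc[1, 0]                       # second row, first column
rows = df.iloc[[0, 2]]                            # first and third rows
fast_scalar = df.iat[1, 0]                        # scalar access by position

print(by_label, by_position, fast_scalar)  # 2 2 2
print(list(rows.index))                    # ['r1', 'r3']
```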
Pandas: How to merge data? How to do an outer join?
Use pd.merge(left, right, on=['key1','key2']); the default is an inner join. For an outer join, pass how='outer'.
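A short sketch of inner versus outer merge on two hypothetical frames that share a single column named key:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "l": [1, 2]})
right = pd.DataFrame({"key": ["b", "c"], "r": [3, 4]})

# Default inner join keeps only keys present in both frames
inner = pd.merge(left, right, on="key")

# Outer join keeps every key; missing values are filled with NaN
outer = pd.merge(left, right, on="key", how="outer")

print(inner["key"].tolist())  # ['b']
print(len(outer))             # 3
```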
Explain below code -
flt = (world_alcohol[:,0]=='1986') & (world_alcohol[:,2]=='Canada')
canada_1986 = world_alcohol[flt]
print(canada_1986)
canada_1986[:,4][canada_1986[:,4]==''] = '0'
canada_alcohol = canada_1986[:,4].astype(float)
print(canada_alcohol)
total_canadian_drinking = canada_alcohol.sum()
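The code builds a boolean mask for the rows where the year column is '1986' and the country column is 'Canada', and selects those rows. It then replaces empty values in column 4 with '0', converts that column to float, and sums it to get the total Canadian consumption for 1986.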
What is NumPy's array class called?
ndarray
List attributes of ndarray
ndarray.ndim: the number of axes (dimensions) of the array. ndarray.shape: the dimensions of the array. ndarray.size: the total number of elements of the array. ndarray.dtype: an object describing the type of the elements in the array. ndarray.itemsize: the size in bytes of each element of the array. ndarray.data: the buffer containing the actual elements of the array.
Define each of the attributes below: ndarray.ndim ndarray.shape ndarray.size ndarray.dtype ndarray.itemsize ndarray.data
ndarray.ndim: the number of axes (dimensions) of the array. ndarray.shape: the dimensions of the array. ndarray.size: the total number of elements of the array. ndarray.dtype: an object describing the type of the elements in the array. ndarray.itemsize: the size in bytes of each element of the array. ndarray.data: the buffer containing the actual elements of the array.
Numpy: Explain the numpy attributes below: ndim size itemsize nbytes .T
ndim gives the number of dimensions. size holds the count of elements. itemsize returns the count of bytes for each element in the array. nbytes gives the total number of bytes the elements occupy (size * itemsize). The T property has the same result as the transpose() function.
Decoding -
passenger_classes = [1, 2, 3]
fares_by_class = {}
for this_class in passenger_classes:
    pclass_rows = titanic_survival[titanic_survival["pclass"] == this_class]
    pclass_fares = pclass_rows["fare"]
    fare_for_class = pclass_fares.mean()
    fares_by_class[this_class] = fare_for_class
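For each passenger class (1, 2 and 3), the code selects the rows of titanic_survival with that pclass, takes the mean of the fare column, and stores it in fares_by_class with the class as the key.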
Numpy: How to generate identity matrix
np.eye(4)
Numpy: how to generate 5x5 random matrix between values 0,1
np.random.rand(5,5)
Pandas: How to use the pivot_table function?
passenger_class_fares = titanic_survival.pivot_table(index="pclass", values="fare", aggfunc=np.mean)
How to manually create a DataFrame? What arguments can be passed?
pd.DataFrame(data, index, columns, dtype)
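What is a Pandas Series object? Explain different ways to create a Series.
A Series is a one-dimensional labeled array. It is created with pd.Series(data=..., index=...), where data can be a list, a numpy array, a dict or a scalar.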
Numpy: Difference between rand and randn
rand generates samples from a uniform distribution over [0, 1). randn returns a sample (or samples) from the "standard normal" distribution (mean 0, variance 1).
Pandas: What is the use of the reset_index() method?
reset_index() moves the current index into a regular column and replaces it with the default integer index (0, 1, 2, ...). Pass drop=True to discard the old index instead.
Pandas: What is the use of set_index() method
set_index() method can be used to set any existing column as index
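A brief sketch of set_index() and reset_index() as inverse operations, on a made-up two-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["x", "y"], "val": [1, 2]})

# Promote the "name" column to be the index
indexed = df.set_index("name")

# Move the index back into a column and restore the 0..n-1 integer index
restored = indexed.reset_index()

print(list(indexed.index))     # ['x', 'y']
print(list(restored.columns))  # ['name', 'val']
```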
Numpy comes with many universal array functions, which are essentially just mathematical operations you can use to perform the operation across the array. List a couple of them...
np.sqrt, np.exp, np.log, np.sin, np.maximum, etc.
Numpy: List five aggregate functions for a numpy array
sum, mean, max, min, std, etc.
Numpy: How to convert numpy array to list
tolist() function
Create a 2-d array from a list
Use the np.array() function with a nested list, for example np.array([[1, 2], [3, 4]])
How to grab the data type of the objects in the array?
Use the array's dtype attribute, for example arr.dtype
Pandas: how to select multiple columns?
zinc_copper = food_info[["Zinc_(mg)", "Copper_(mg)"]] columns = ["Zinc_(mg)", "Copper_(mg)"] zinc_copper = food_info[columns] selenium_thiamin=food_info[["Selenium_(mcg)",'Thiamin_(mg)']]