Advanced Python Methods and Functions
df.set_index()
Method that sets one or more existing columns, or a list, Series, or Index of matching length, as the index of a DataFrame. The index can also be set when the DataFrame is created, but when a DataFrame is assembled from two or more DataFrames, the index can be changed afterwards with this method.
pd.date_range(start, end, periods, freq)
General pandas function that returns a fixed-frequency DatetimeIndex.
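As a minimal sketch of how a date range and set_index() fit together, the example below builds a small DataFrame and then moves its date column into the index. The column names, dates, and prices are made up for illustration only.

```python
import pandas as pd

# Hypothetical data: three daily observations starting on an arbitrary date.
dates = pd.date_range(start="2024-01-01", periods=3, freq="D")
df = pd.DataFrame({"date": dates, "price": [10.0, 10.5, 11.0]})

# set_index returns a new DataFrame; df itself keeps its "date" column.
indexed = df.set_index("date")
print(indexed.index)
```

Because set_index() returns a new object by default, the result has to be assigned to a name if it is needed later.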
plt.legend()
Adds a legend to the plot
plt.title()
Sets the title of a plot/chart
x.sum()
Sum of all elements in x
df.iloc[x,y]
Access variables and observations in df by integer position
df.loc['row', 'column']
Access variables and observations in df by names
df['x'] or df.x
Access x in df
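The three access styles above can be compared on a toy DataFrame. This is only an illustrative sketch; the labels "a", "b", "c" and the column names are assumptions chosen for the example.

```python
import pandas as pd

# Toy DataFrame with string row labels so that position and label differ.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

by_position = df.iloc[1, 0]   # second row, first column (integer positions)
by_label = df.loc["b", "x"]   # row labelled "b", column named "x"
column = df["x"]              # the whole column x as a Series
```

Note that df.x only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so df['x'] is the safer form.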
df.groupby('x').function()
Apply a function to subgroups of df according to x
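A short sketch of the groupby pattern, assuming a made-up grouping column x and a numeric column value, with mean() standing in for function():

```python
import pandas as pd

# Hypothetical data: two groups ("a" and "b") with two observations each.
df = pd.DataFrame({"x": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Split df by the values of x, then take the mean of "value" within each group.
means = df.groupby("x")["value"].mean()
print(means)
```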
pd.crosstab()
Compute a simple cross tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed. Passing normalize= as the argument will normalize by dividing all values by the sum of values. If passed 'all' or True, will normalize over all values. If passed 'index' will normalize over each row. If passed 'columns' will normalize over each column. If margins is True, will also normalize margin values.
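To illustrate the default frequency table and the effect of normalize, here is a small sketch on invented data (the column names and values are assumptions for the example):

```python
import pandas as pd

# Hypothetical survey data with two categorical factors.
df = pd.DataFrame({
    "sex": ["f", "f", "m", "m"],
    "smoker": ["yes", "no", "no", "no"],
})

# Default behaviour: a frequency table of the two factors.
counts = pd.crosstab(df["sex"], df["smoker"])

# normalize="index" divides each row by its row total, giving row shares.
row_shares = pd.crosstab(df["sex"], df["smoker"], normalize="index")
print(counts)
print(row_shares)
```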
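df.to_csv(path, sep='\t')
Converts a DataFrame into a general delimited file. pandas has no to_table() method; the delimiter is passed to to_csv() through the sep argument.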
df['x'].shift(i)
Creates a copy of x shifted down by i rows (a lag of x when i is positive)
pd.DataFrame(dict)
Creates a data frame in pandas with a dictionary as the argument
np.arange()
Creates a sequence of evenly spaced values; with a single integer argument n, the integers from 0 to n minus 1
np.array()
Creates a simple array when a list is provided
df['x'].diff(i)
Creates a variable that contains the difference between x and x shifted by i rows, that is, x minus df['x'].shift(i)
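The relationship between shift() and diff() can be checked on a small series. The sketch below uses invented values; the new column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 4.0, 9.0, 16.0]})

df["x_lag1"] = df["x"].shift(1)    # x moved down one row; first row becomes NaN
df["x_diff1"] = df["x"].diff(1)    # x minus x.shift(1)
df["x_diff2"] = df["x"].diff(2)    # x minus x.shift(2), not a second-order difference
print(df)
```

The first i rows of a shifted or differenced column are NaN because there is no earlier observation to refer to.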
np.linspace()
Creates an array of evenly spaced numbers (floats by default) defined by the arguments start, stop, and the number of points num; stop is included by default
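As a brief sketch of how np.linspace() differs from np.arange(), the example below builds one array of each kind; the end points and lengths are arbitrary choices for illustration.

```python
import numpy as np

# arange: step-based, the stop value is excluded.
ints = np.arange(5)

# linspace: count-based, five evenly spaced points from 0.0 to 1.0 inclusive.
grid = np.linspace(0.0, 1.0, 5)

print(ints)
print(grid)
```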
np.empty()
Creates an uninitialized array; its elements are whatever values happen to be in memory and must be overwritten before use
np.divide(x, y) or x/y
Element-wise division of all elements in x and y
np.exp(x)
Element-wise exponential of all elements in x
np.multiply(x, y) or x*y
Element-wise multiplication of all elements in x and y
np.log(x)
Element-wise natural logarithm of all elements in x
np.sqrt(x)
Element-wise square root of all elements in x
np.subtract(x, y) or x-y
Element-wise subtraction of all elements in x and y
np.add(x, y) or x+y
Element-wise sum of all elements in x and y
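pd.to_sas()
Not available in pandas: SAS files can be read with pd.read_sas(), but pandas provides no method for exporting a DataFrame to SAS format.
pd.to_spss()
Not available in pandas: SPSS files can be read with pd.read_spss(), but pandas provides no method for exporting a DataFrame to SPSS format.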
df.to_stata()
Export a DataFrame to Stata dta format, that is, write it to a Stata dataset (.dta) file.
plt.savefig()
Exports generated plot into designated graphic formats (PNG, PS, EPS, SVG, PDF)
np.unique()
Find the unique elements of an array. Returns the sorted unique elements of an array. There are three optional outputs in addition to the unique elements: 1) The indices of the input array that give the unique values 2) The indices of the unique array that reconstruct the input array 3) The number of times each unique value comes up in the input array
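The three optional outputs correspond to the keyword arguments return_index, return_inverse, and return_counts. A minimal sketch on an arbitrary example array:

```python
import numpy as np

x = np.array([3, 1, 3, 2, 1])

values, first_idx, inverse, counts = np.unique(
    x, return_index=True, return_inverse=True, return_counts=True
)

# values:    sorted unique elements of x
# first_idx: position in x of the first occurrence of each unique value
# inverse:   positions in values that rebuild x, so values[inverse] equals x
# counts:    number of times each unique value occurs in x
rebuilt = values[inverse]
```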
df.head(n)
First n rows of the DataFrame (5 if n is omitted)
plt.axhline()
Function in pyplot module of matplotlib library which is used to add a horizontal line across the axis
plt.axvline()
Function in pyplot module of matplotlib library which is used to add a vertical line across the axis
pdr.data.DataReader()
Function of the pandas_datareader library which makes it straightforward to query online databases
plt.xticks()
Function of the pyplot module of the matplotlib library which is used to get and set the current tick locations and labels of the x-axis
plt.yticks()
Function of the pyplot module of the matplotlib library which is used to get and set the current tick locations and labels of the y-axis
plt.figure()
Function that creates a new figure; its figsize argument sets the width and height of the figure in inches
np.linalg.inv(x)
Gives the inverse of the square matrix x
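A small sketch of the inverse, using an arbitrary invertible 2x2 matrix chosen so the result is easy to verify by hand:

```python
import numpy as np

# Example matrix with determinant 1, so the inverse has integer entries.
x = np.array([[2.0, 1.0],
              [1.0, 1.0]])

x_inv = np.linalg.inv(x)

# Multiplying a matrix by its inverse should give the identity matrix
# (up to floating-point rounding).
identity_check = x @ x_inv
print(x_inv)
```

np.linalg.inv raises LinAlgError when x is singular or not square.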
np.zeros()
Initializes an array with each element set to zero
np.ones()
Initializes an array with each element set to one
plt.xlabel()
Labels x axis of plot
plt.ylabel()
Labels y axis of plot
df.tail(n)
Last n rows of the DataFrame (5 if n is omitted)
pd.read_spss()
Load an SPSS file from the file path, returning a DataFrame.
plt.plot(x,y)
Make a plot of x and y values. You can use multiple x and y values to plot multiple lines in the same function, or you can use the same function multiple times to plot multiple lines.
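The following sketch draws two lines in one figure by calling plt.plot() twice and then adds a title, axis labels, and a legend. The data, labels, and figure size are invented for the example, and the Agg backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]

fig = plt.figure(figsize=(6, 4))            # width and height in inches
plt.plot(x, [0, 1, 4, 9], label="squares")  # first line
plt.plot(x, [0, 1, 2, 3], label="identity") # second line on the same axes
plt.title("Two lines in one plot")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
```

Calling plt.savefig() with a file name after these lines would write the figure to disk.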
plt.bar()
Matplotlib function to make a bar plot. The bars are positioned at x with the given alignment. Their dimensions are given by height and width. The vertical baseline is bottom (default 0).
plt.pie()
Matplotlib function that plots a pie chart of the given numeric data, optionally with labels. It also supports further parameters (for example colors, explode, and autopct) that control how the chart is displayed.
x.dot(y) or x@y
Matrix multiplication of x and y
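Matrix multiplication is easy to confuse with element-wise multiplication, so the sketch below contrasts the two on small example matrices (the values are arbitrary):

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[0, 1],
              [1, 0]])

elementwise = x * y   # multiplies matching entries
matrix_prod = x @ y   # rows of x times columns of y; same as x.dot(y)

print(elementwise)
print(matrix_prod)
```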
x.max()
Maximum of all elements in x
pd.Categorical.from_codes(codes, categories)
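Method that builds a Categorical from integer codes and the list of categories the codes refer to; commonly used to attach labels to a coded variable in a data frame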
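A minimal sketch of from_codes(), assuming a variable coded 0 and 1 whose labels "low" and "high" are made up for the example:

```python
import pandas as pd

codes = [0, 1, 1, 0]  # integer codes, each an index into categories
cat = pd.Categorical.from_codes(codes, categories=["low", "high"])

# The labelled variable can be stored as a DataFrame column.
df = pd.DataFrame({"level": cat})
print(df)
```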
x.min()
Minimum of all elements in x
pd.read_html()
Read HTML tables into a list of DataFrame objects.
pd.read_sas()
Read SAS files stored as either XPORT or SAS7BDAT format files.
pd.read_sql()
Read SQL query or database table into a DataFrame.
pd.read_stata()
Read Stata file into DataFrame.
pd.read_csv()
Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking of the file into chunks.
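The sketch below reads the same CSV content once as a whole and once in chunks. To stay self-contained it reads from an in-memory string through io.StringIO instead of a file on disk; the data and the chunk size are arbitrary.

```python
import io
import pandas as pd

csv_text = "a,b\n1,2\n3,4\n5,6\n"

# Read everything at once.
df = pd.read_csv(io.StringIO(csv_text))

# Read in chunks of two rows; each chunk is itself a DataFrame.
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=2))
chunk_lengths = [len(chunk) for chunk in chunks]
```

Chunked reading is mainly useful for files too large to fit in memory, where each chunk is processed and discarded in turn.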
pd.read_excel()
Read an Excel file into a pandas DataFrame. Supports xls, xlsx, xlsm, xlsb, odf, ods and odt file extensions read from a local filesystem or URL. Supports an option to read a single sheet or a list of sheets.
pd.read_table()
Read general delimited file into DataFrame. Also supports optionally iterating or breaking of the file into chunks.
df.to_html()
Render a DataFrame as an HTML table
pd.DataFrame.value_counts()
Return a series containing counts of unique rows in the DataFrame
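As a short illustration of counting unique rows, with invented data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# Each distinct (a, b) row becomes one entry of the result,
# indexed by a MultiIndex built from the column values.
counts = df.value_counts()
print(counts)
```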
df.set_index(x)
Set the index of df as x
plt.xlim(min, max)
Sets limits of the horizontal axis
plt.ylim(min, max)
Sets limits of the vertical axis
x.transpose() or x.T
Transpose of x
df.to_csv()
Write object to a comma-separated values (csv) file.
df.to_excel()
Write object to an Excel sheet. To write a single object to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to. Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased.
df.to_sql()
Write records stored in a DataFrame to a SQL database. Databases supported by SQLAlchemy [1] are supported. Tables can be newly created, appended to, or overwritten.
df.describe()
Summary statistics for numerical columns
operator.itemgetter(*items)
Returns a callable that fetches an "item" from its operand using the operand's __getitem__() method. If multiple items are specified, the callable returns them in a tuple. It works with Python dictionaries, strings, lists, and tuples.
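A few illustrative uses of itemgetter(); the record and rows below are made-up sample data.

```python
from operator import itemgetter

# Single item: the callable returns that item.
get_second = itemgetter(1)
second_char = get_second("abc")

# Several items: the callable returns a tuple.
record = {"name": "Ada", "age": 36}
name_and_age = itemgetter("name", "age")(record)

# A common use is as a sort key.
rows = [("b", 2), ("a", 1)]
sorted_rows = sorted(rows, key=itemgetter(0))
```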