BIT 5524 Final Exam
Printing w/ Variable Substitution in Python
%s - Placeholder for inserting strings.
Example (%s):
    binary = 'binary'
    do_not = 'do not'
    y = 'Those who know %s, and those who %s' % (binary, do_not)
    print(y)  #prints: Those who know binary, and those who do not
%d - Placeholder for inserting integers.
Example (%d):
    x = 'There are %d types of people' % 10
    print(x)  #prints: There are 10 types of people
%r - Inserts the repr() of a value (quotes and escapes included), which makes it useful for debugging.
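The card gives no example for %r. A minimal sketch of how it differs from %s (the variable names are illustrative, not from the course material):

```python
word = 'binary'

# %s formats the value with str(), %r formats it with repr().
as_string = 'value: %s' % word   # value: binary
as_repr = 'value: %r' % word     # value: 'binary'

print(as_string)
print(as_repr)
```

The quotes shown by %r make it easy to tell a string from a number when debugging.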
What are the best design practices for dashboards?
1) Benchmark key performance indicators w/ Industry Standards 2) Wrap the dashboard metrics w/ contextual metadata 3) Validate the dashboard design by a usability specialist 4) Prioritize and rank alerts / exceptions streamed to the dashboard 5) Enrich the dashboard w/ business-user comments 6) Present information in 3 different levels 7) Pick the right visual using dashboard design principles 8) Provide for guided analytics
What is the natural conceptual hierarchy of Python?
1) Programs - Composed of modules 2) Modules - Contain Statements 3) Statements - Contain Expressions 4) Expressions - Create & Process objects
Explain the differences between 1st, 2nd, and 3rd normal form.
1NF - Each column holds a single (atomic) value, there are no repeating groups of columns, and each row can be uniquely identified. 2NF - The table is in 1NF and no non-key column depends on only part of a composite primary key (no partial dependencies). 3NF - The table is in 2NF and every non-prime attribute depends directly on the primary key, not on another non-key attribute (no transitive dependencies).
What are organizational critical success factors for big data analytics?
A clear business need Strong & committed sponsorship Alignment between business & IT strategy Fact-based decision making culture Strong data infrastructure The right analytics tools Personnel w/ advanced analytical skills
Purpose of Computer Programming
About ultimately trying to solve a problem or provide a function / utility through a program. - Computer: Machine that stores pieces of information and moves, arranges, and controls that information - Program: Detailed set of instructions that tells a computer what to do with the information.
Variables in Python
Allow you to calculate something once, assign it to a name (the variable), and reuse it later. You can keep the same name for a variable but change the value. Example: headmaster = "Dumbledore" #headmaster is the variable
Print Statements
Allows us to retrieve the output for our code. print()
What is a database and what are the values of a database?
An abstraction on top of an operating system's file system to ease creating, reading, updating, and delivering persistent data. Databases are valuable b/c they make structured storage reliable and fast. They also give a mental framework for how the data should be saved and retrieved instead of having to figure out what to do w/ the data every time you build a new application
What's the relationship between big data & business intelligence?
B.I doesn't necessarily require big data, it can use any type of data, but big data just makes it better.
Why is just learning packages not enough to become a data scientist?
B/c there's no single answer or silver bullet to data analytics. No one package can do everything that we need to do as a data scientist, as data science draws on applied mathematics, computer science, statistics, information systems, databases, etc.
Why are relational databases important for data scientists to learn to use?
B/c they're structured storage that is reliable and fast to retrieve and update.
Tuples in Python
Basically like a list, but written w/ regular parentheses ( ) instead of square brackets [ ]. You can read, index, and slice a tuple just like a list; you just can't modify it b/c tuples are immutable.
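A short sketch of what "immutable" means in practice for a tuple (the variable names are made up for illustration):

```python
point = (3, 4)

# Reading works the same way as with a list.
first = point[0]

# Assigning to an item raises TypeError because tuples are immutable.
try:
    point[0] = 10
except TypeError as err:
    error_message = str(err)

print(first)
print(error_message)
```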
Look at Dictionary Items in Python
Basically looking at keys together w/ their value(s). person.items( ) #returns dict_items([('name', 'Nowell'), ('gender', 'male')]) in Python 3
Inputs to the Analytics Continuum
Business Processes Internet / Social Media Machines / Internet of Things
What is business intelligence and how is it related to business analytics?
Business intelligence is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies. It's essentially the use of reporting tools. B.I. is linked to strategy and execution of strategy. Business Analytics serves as a repository and disseminator of the best BI practices between and among different lines of businesses.
Where does data from business analytics come from?
Business transactions or surveys in which data is collected using Internet and / or sensor / RFID-based computerized networks.
Examples of Structured Data
Categorical - Nominal & Ordinal Numerical - Interval & Ratio
String Operators in Python
Concatenation (+) Multiplication (*)
    a = "It is a beautiful day"
    b = "do not go away"
Concatenation =>
    c = a + b
    print(c)
Multiplication =>
    print(c * 2) #prints c two times
Functions in Python
Concise way to group instructions into a bundle. They are defined using DEF. They take parameters and return outputs. PRINT displays info, but doesn't give a value. RETURN gives a value to the caller Example - pot of coffee Functions would be how people think of making the pot of coffee In python the function would be: make_coffee( ) Function Parameters would be: make_coffee(coffee_grounds, coffee_pot, water, filter_paper)
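The difference between PRINT and RETURN can be sketched as follows. The function bodies here are hypothetical and use a shortened parameter list; the card only names make_coffee and its parameters.

```python
def make_coffee(coffee_grounds, water):
    # RETURN hands a value back to the caller.
    return 'coffee brewed from ' + coffee_grounds + ' and ' + water


def announce_coffee():
    # PRINT only displays text; with no return statement the function gives back None.
    print('Coffee is ready!')


cup = make_coffee('dark roast', 'filtered water')
nothing = announce_coffee()

print(cup)
print(nothing)
```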
Floating Point Numbers in Python
Contain decimal points
Looping through Lists or Dictionaries in Python
Create a FOR LOOP. Example:
    the_count = [1, 2, 3, 4, 5]
    for i in the_count:
        print(i)
    #prints:
    #1
    #2
    #3
    #4
    #5
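The same FOR LOOP idea applies to dictionaries. A minimal sketch, reusing the person dictionary from the other cards (the pairs list is only there to collect what the loop sees):

```python
person = {'name': 'Nowell', 'gender': 'male'}

pairs = []
# .items() yields (key, value) pairs, in insertion order.
for key, value in person.items():
    print(key, value)
    pairs.append((key, value))
```

Looping over the dictionary directly (for key in person:) visits only the keys.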
What are some useful applications for predictive analytics?
Customer Retention Direct Marketing Analytical Customer Relationship Clinical Decision Support Systems Cross-Sell Fraud Detection Portfolio, Product, or Economy-Level Prediction Risk Management Underwriting
Data vs. Information vs. Knowledge vs. Wisdom
Data - Raw, unorganized facts that describe the characteristics of an event or object. Information - Data that is processed and organized w/ meaning and value Knowledge - Collection of information and data that's useful in assisting with decision-making Wisdom - The complete understanding of all the information.
What's the difference between Data Richness, Accuracy, Accessibility, and Reliability?
Data Reliability - The originality and appropriateness of the storage medium where the data is obtained Data Richness - All the required data elements are included in the data set. In essence, richness means that the available variables portray a rich enough dimensionality of the underlying subject matter for an accurate and worthy analytics study. Data Accuracy - The cleanliness of the data we're using Data Accessibility - Can we obtain the data required to perform a worthy analytics study?
What are the top 3 data-related challenges for better analytics and why?
Data Source Reliability - Many projects are now biased. In other words, proctors of experiments are manipulating their data in order to make it appear that they're getting the answers that they want. Data Richness - We can't miss any variables or our analyses can be inaccurate. Data Currency / Timeliness - This pertains to relevance. If we have data that's outdated, then it isn't relevant to what we're trying to achieve.
Metadata
Data about data
Describe the Major Metrics for 'Analytics Ready' Data
Data source Reliability Data content Accuracy Data Accessibility Data Security and Privacy Data Richness Data Consistency Data Currency / Data Timeliness Data Granularity Data Validity and Relevance
Data vs. Information vs. Knowledge vs. Wisdom EXAMPLE
Data table w/ Student names, exam scores, attendance DATA - everything inside of the actual table INFORMATION - overall relationships (i.e., Sue did well on the exam, Jack did poorly on the exam, etc.) Can also be analysis results (i.e., mean, median, mode, etc.) KNOWLEDGE - Trend in the data. Students w/ lower attendance have lower exam scores. WISDOM - In the future, we need to encourage students to attend class b/c those who do not end up failing.
Big Data
Data that cannot be stored or processed easily using traditional tools / means. It typically refers to data that comes in many different forms: large, structured, unstructured, continuous, etc. This data is worthless if it doesn't provide any sort of business value.
Dictionary Keys
Describe something within a dictionary. person = {'name': 'Rob', 'gender': 'male'} person.keys( ) #returns dict_keys(['name', 'gender']) in Python 3
What are the 3 major types of Metadata? What are their purposes?
Descriptive Metadata Administrative Metadata Structural Metadata Descriptive - describes a resource for purposes like discovery and identification (i.e., Title, Abstract, Author, Keywords, etc.) Administrative - provides information to help manage a resource (i.e., when/how the resource was created, file type/other technical information, who can access the resource, etc.) Structural - metadata about containers of data and indicates how compound objects are put together (i.e., how pages are ordered to form chapters, types/versions/relationships/other characteristics of digital materials)
Unique Key
Each row in a database table can be accessed w/ this type of key.
What enables real-time B.I. and why?
Enablers of Real-Time BI: - RFID - Web Services - Intelligent Agents Enable real-time B.I. b/c the demand for all of these things is through the roof.
Outputs to Analytics Continuum
End Users Applications Knowledge
Boolean Functions in Python
Functions can return Booleans, which is convenient for hiding complicated tests inside of functions. It's common to give these types of functions names that sound like yes/no questions. Example:
    def is_divisible(x, y):
        if x % y == 0:
            return True
        else:
            return False
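Since the comparison x % y == 0 is already a Boolean, the same function can be written more compactly. This is an equivalent sketch, not a different technique:

```python
def is_divisible(x, y):
    # The comparison itself evaluates to True or False.
    return x % y == 0


six_by_three = is_divisible(6, 3)
seven_by_three = is_divisible(7, 3)
print(six_by_three, seven_by_three)
```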
IF Statements
If some condition is met, then perform the action.
    state = "Texas"
    if state == "Texas":
        print("TX") #prints TX
The colon signifies what to do if the logic is true and applies to everything indented under it. You MUST indent after the colon, b/c indentation is how Python knows which lines belong to the IF.
Delete Anomalies
If we delete one entry, then we might have to delete all of that record from the entire database.
Insert Anomalies
If we enter a new record, we may not have all of the information required
Update Anomalies
If we update an item, we must find/update it in every place that it shows up. If we don't update all of the same entries, we will have conflicting entries.
Foreign Key
Interconnections between multiple tables. It's a column (or set of columns) in one table that references the primary / unique key of a row in another table.
Why use Python for data science?
It's much more forgiving and easier to learn than many other programming techniques. Things that make it ideal for data science: - Run-Time Scripting Language - Allows for Object Orientation - Consists of Shell Tools - Control Language
Conditional Loops in Python
Loops that keep repeating code until a certain thing happens, or as long as some condition is true. Uses the keyword WHILE (usually called WHILE LOOPS).
    count = 0
    while count < 4:
        print('The count is:', count)
        count = count + 1
    #prints:
    #The count is: 0
    #The count is: 1
    #The count is: 2
    #The count is: 3
    #stops here b/c the condition says to keep running the loop only as long as count < 4
What are the 3 Information Layers of Dashboards?
Monitoring Analysis Management
What are key skills a data scientist should have?
Need to have technology skills: - Data Analytics - Algorithms - Neural Networks - Machine Learning - Artificial Intelligence
Strings
Non-numerical statements or words. They are found in quotes: either ' ' or " "
Mutable Objects
Objects whose value can change in place. When you alter these objects, the ID is still the same. Examples of these types of objects include: - List - Dictionary - Set (unordered collection of distinct objects)
Immutable Objects
Objects whose value is unchangeable once they are created. You can't alter them in place; an operation that seems to change one actually creates a new object, so the ID changes. Examples of these types of objects include: - Boolean Values - Integers - Strings - Tuples
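The ID behavior of mutable and immutable objects can be checked w/ the built-in id() function. A small sketch (variable names are illustrative):

```python
# Mutable: a list changed in place keeps the same identity.
numbers = [1, 2]
id_before = id(numbers)
numbers.append(3)
list_kept_id = id(numbers) == id_before

# Immutable: "changing" an integer binds the name to a new object.
count = 1000
original = count
count = count + 1
int_is_new_object = count is not original

print(list_kept_id, int_is_new_object)
```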
What are the key sources of big data?
Online Transactions Mobile Applications Sensors Images, Audio, Video Social Media
What does it mean for parameters and variables to be local to a function, and why is it useful?
Parameters/variables of a function will only exist within that particular function if they are LOCAL. This is useful b/c it encapsulates and protects what's going on inside of the function. In other words, it means that you can use the same names that you used for the parameters of a given function in other places in your code, for different purposes.
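A minimal sketch of local scope, assuming a made-up function called double. The name result is used both inside and outside the function w/out the two interfering:

```python
def double(value):
    # 'value' and 'result' exist only while double() is running.
    result = value * 2
    return result


result = 'unrelated text'   # a different, module-level 'result'
doubled = double(21)

print(doubled)
print(result)   # still 'unrelated text'
```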
Business Analytics
Process of developing actionable decisions or recommendations for actions based on insights generated from historical data. It also represents the combination of computer technology, management science techniques, and statistics to solve real problems.
What's data normalization?
Process of organizing columns (attributes) and tables (relations) of a relational database to reduce redundancy and improve data integrity. Essentially it puts data into tabular form by removing duplicated data from the relation tables.
What are some problems with other languages (besides Python) for data science?
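Languages such as C++ and Java are often said to take roughly 3 to 10 times longer to write the same types of analyses as Python (more code and more development time), even though the compiled programs usually run faster. These languages are better suited for things like developing robust apps, GUI, etc.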
What are dashboards used for?
Provide visual displays of important information that is consolidated and arranged on a single screen so that the information can be digested at a single glance and easily drilled in and further explored.
Why is metadata so useful for data science?
Raw data alone is never good enough. Computers don't know what to do with raw data. We need to describe what specific data means to computer programs. W/out metadata, it can be very difficult to derive knowledge, trends, and ultimate wisdom.
What is the most common kind of analysis in predictive analytics and why?
Regression models are the mainstay of predictive analytics b/c they estimate relationships among different variables. Regression models are also very easy to build, maintain, and use.
Counting Loops in Python
Repeat code a certain number of times, until they get to the end of the count. Uses the keyword FOR to create this type of loop. Thus, these types of loops are usually called FOR LOOPS.
    for my_num in [1, 2, 3, 4, 5]:
        print('Hello', my_num)
    #prints:
    #Hello 1
    #Hello 2
    #Hello 3
    #Hello 4
    #Hello 5
Representing Null
Setting something equal to 'None'
Fixed-Point Numbers in Python
Specific number of decimal points (rounded)
What are some downsides of Python?
Speed is slower than compiled, lower-level languages Not good for mobile development Bad memory consumption - not a good choice for memory intensive tasks. Limitations w/ database access Runtime Errors
What are the key disciplines involved in data mining?
Statistics AI Machine Learning & Pattern Recognition Information Visualization Database Management & Data Warehousing Management Science & Information Systems
Relational Database
Store data in a series of tables
Converting values to float in Python
Syntax: float(value) pie = '3.14159' pie = float(pie) print (pie) #returns 3.14159
Converting values to int in Python
Syntax: int(value) pie = 3.14159 pie = int(pie) print(pie) #prints 3 #if you convert float to int, the decimal part is dropped (truncated toward zero), not rounded
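Note that int() truncates toward zero, which only matches "rounding down" for positive numbers. A quick comparison w/ math.floor() and round():

```python
import math

truncated_pos = int(3.9)          # 3  (decimal part dropped)
truncated_neg = int(-3.9)         # -3 (toward zero, not down)
floored_neg = math.floor(-3.9)    # -4 (always rounds down)
rounded = round(3.9)              # 4  (nearest integer)

print(truncated_pos, truncated_neg, floored_neg, rounded)
```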
Examples of Unstructured Data
Textual Multimedia - Image, Audio, Video XML / JSON (usually classed as semi-structured)
What are some of the most common sources of metadata for data scientists?
The Web Internet of Specific Things Social Media
What is data mining?
The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.
ELSE Statements
These add a choice to our IF statement. Essentially it adds a choice of what to do if the original condition is NOT met.
    state = "Texas"
    if state == "Texas":
        print("TX")
    else:
        print("Terrible State!")
This statement is the ending clause. If we want to add multiple choices to the IF statement, we use the ELIF clause. When we get to the final choice, we use the ELSE clause.
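A sketch of IF / ELIF / ELSE together. The function name abbreviate and the extra state are made up for illustration:

```python
def abbreviate(state):
    if state == 'Texas':
        return 'TX'
    elif state == 'Virginia':   # checked only if the first test fails
        return 'VA'
    else:                       # runs when no condition above matched
        return 'unknown'


texas = abbreviate('Texas')
virginia = abbreviate('Virginia')
other = abbreviate('Atlantis')
print(texas, virginia, other)
```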
Lists in Python
These are a sequence of objects. They can be heterogeneous. They have square brackets: [ ] To append (add one item) to the end => list_name.append( ) To extend (add multiple items) => list_name.extend( ) To find how many things are in a list => len(list_name)
Counters in Python
These are a special kind of dictionary. A Counter turns a sequence of values into a dictionary-like object that maps keys to counts. In other words, it stores each distinct value as a key and tells you how many times that value appears. Can be very useful for creating histograms in python. Example:
    from collections import Counter
    c = Counter([1, 2, 3])
    print(c) #prints Counter({1: 1, 2: 1, 3: 1})
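To show how a Counter supports a simple histogram, here is a sketch w/ made-up grade data:

```python
from collections import Counter

grades = ['A', 'B', 'A', 'C', 'B', 'A']   # illustrative data
c = Counter(grades)

# most_common(n) returns the n most frequent (value, count) pairs.
top = c.most_common(1)

# A text histogram: one star per occurrence.
lines = []
for grade, n in sorted(c.items()):
    lines.append(grade + ' ' + '*' * n)

print(top)
print('\n'.join(lines))
```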
Loops in Python
These are chunks of code that repeat a task over and over again.
Algorithms in Python
These are really just a set of instructions. Example - Making a pot of coffee. 1) Buy coffee grounds 2) Get a coffee maker 3) Get filter paper 4) Get a pot of water 5) & on & on #this is how a computer would process making a cup of coffee #to humans it's as simple as 'make a pot of coffee'
Booleans in Python
These can only be True or False. is_boolean = True is_boolean = False Everything in python can be cast to boolean is_python = bool(any object)
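A few examples of casting to Boolean. Empty or zero values are False and almost everything else is True, including the non-empty string 'False':

```python
zero = bool(0)             # False
number = bool(42)          # True
empty_text = bool('')      # False
text = bool('False')       # True: any non-empty string is truthy
empty_list = bool([])      # False
nothing = bool(None)       # False

print(zero, number, empty_text, text, empty_list, nothing)
```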
What's the purpose of Python modules?
They let you use and share libraries of tools. They essentially allow you to create your own toolkit, as well as use the expansive toolkits that are already out there. They can also provide reusable python code. You can just import a module at the beginning of a new session, and run code alongside it.
Dictionaries in Python
They have curly braces: { } They're set and retrieved by KEYS. Any immutable (hashable) object can be a dictionary key. person = { }
Returning Multiple Values from a Function
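A function can contain multiple return statements, typically one in each branch of a conditional; only the first one reached runs. (To hand back several values at once, return them as a tuple, e.g. return a, b.) Example:
    def absolute_value(x):
        if x < 0:
            return -x
        else:
            return x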
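The absolute_value example uses several return statements but still hands back a single value. To return more than one value from a single call, a common approach is to return a tuple and unpack it. A sketch w/ a hypothetical min_and_max function:

```python
def min_and_max(values):
    # The comma builds a tuple: (smallest, largest).
    return min(values), max(values)


lowest, highest = min_and_max([3, 1, 4, 1, 5])   # tuple unpacking
print(lowest, highest)
```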
What is a Fruitful Function?
This is a function that will return a value. These are crucial to data science. Example:
    import math
    def area(radius):
        temp = math.pi * radius ** 2
        return temp
    print(area(5.9)) #prints the area of a circle that has a radius of 5.9
'Slicing' Through Lists
This is a way for us to pull out a specific value (indexing) or a sub-list (slicing) from a list. A slice always includes the item at the first position, but stops one place before the second position. (i.e., [0:5] would start w/ place [0] on the list, but we would end at place [4] on the list) Example:
    numbers = [1, 2, 3, 4, 5]
    numbers[0] #returns 1
    numbers[0:2] #returns [1, 2]
    numbers[2:] #returns [3, 4, 5]
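A few more slicing forms on the same numbers list: negative positions count from the end, and an optional third number sets the step.

```python
numbers = [1, 2, 3, 4, 5]

last = numbers[-1]          # 5
middle = numbers[1:4]       # [2, 3, 4]
every_other = numbers[::2]  # [1, 3, 5]
last_two = numbers[-2:]     # [4, 5]

print(last, middle, every_other, last_two)
```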
What's a transitive dependency and how do you resolve them?
This is an indirect relationship between values in the same table. We can resolve these type of dependencies by putting our data in 3NF.
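As a rough illustration in Python rather than SQL, suppose a table stores student, advisor, and advisor_office. The office depends on the advisor, not directly on the student, which is a transitive dependency. All names and data below are hypothetical:

```python
# One wide table: office is repeated for every student of the same advisor.
rows = [
    ('Sue', 'Dr. Lee', 'Room 101'),
    ('Jack', 'Dr. Lee', 'Room 101'),
    ('Ann', 'Dr. Kim', 'Room 202'),
]

# 3NF-style split: each fact is stored once, keyed by what it depends on.
students = {student: advisor for student, advisor, _ in rows}
advisors = {advisor: office for _, advisor, office in rows}

# Look up Jack's advisor's office through the two smaller tables.
jack_office = advisors[students['Jack']]
print(jack_office)
```

After the split, changing Dr. Lee's office means updating one entry instead of one per student.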
What is incremental development and why should you do it?
This is when you only add and test small chunks of code at a time. We use incremental development to deal w/ increasingly complex programs and avoid long debugging sessions/searches.
What's the basic purpose of a histogram?
To show the distribution shape of the given data. Can use a histogram to see if data is normally or exponentially distributed.
When should you use a Line Chart vs. a Pie Chart vs. a Bar Chart?
Use a Line Chart to show the relationship between two variables - most often used to track changes or trends over time. Use a Pie Chart to illustrate relative proportions of a specific measure. Use a Bar Chart to compare data across multiple categories (i.e., % of advertising spending by departments or by product categories)
Why use a geographic map? What other types of charts can be combined w/ a geographic map?
Use this when the dataset includes any kind of location data. It's better and more informative to see the data on the map. Maps are often used in conjunction w/ many other charts (i.e., pie charts, histograms, bar charts, line charts, etc.)
Exception Handling
Used to make code cleaner / more elegant, and to handle errors w/out crashing the program. This is also really good for debugging. To do this, use a TRY clause w/ EXCEPT. TRY is for the code that could have a problem; EXCEPT is for what to do if that problem occurs.
    try:
        print(0/0)
    except ZeroDivisionError:
        print('Sorry but you cannot divide 0 by 0')
    #prints Sorry but you cannot divide 0 by 0
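Python's built-in exception for division by zero is named ZeroDivisionError. A sketch that wraps TRY / EXCEPT in a hypothetical helper called safe_divide, which returns None instead of crashing:

```python
def safe_divide(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Only this specific error is caught; other errors still surface.
        return None


ok = safe_divide(10, 4)
failed = safe_divide(0, 0)
print(ok, failed)
```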
Dictionary Sorting
Used to order a list (or a dictionary's keys) in a manner of your liking.
Example - Get a sorted copy w/out changing the original:
    x = ['z', 'c', 'a']
    y = sorted(x)
    print(y) #prints ['a', 'c', 'z']; x is unchanged
Example - Sort x in place:
    x.sort( )
    print(x) #prints ['a', 'c', 'z']
Example - Reverse the sort:
    x.sort(reverse=True)
    print(x) #prints ['z', 'c', 'a']
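The examples on this card sort a list. For an actual dictionary, sorted() can order the keys, or order the (key, value) pairs by value using a key function. A sketch w/ made-up exam scores:

```python
scores = {'Sue': 92, 'Jack': 58, 'Ann': 75}   # illustrative data

# Sorting a dictionary directly sorts its keys.
by_name = sorted(scores)

# Sort (name, score) pairs by score, highest first.
by_score = sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(by_name)
print(by_score)
```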
What is Data Visualization useful for?
Useful for exploring, making sense of, and communicating data. They're not useful, however, if they contain bad visuals and are unclear / confusing to the audience
Which of the four V's is most important?
Variety
Define the four V's of big data
Volume - amount or scale of the data that we have Velocity - Analysis of streaming data (speed that we get the data) Veracity - Uncertainty of data Variety - Different forms of data (i.e., variables, types, etc.)
Descriptive Analytics
What happened or what is happening? These are well-defined business problems and opportunities. Enablers: - Business Reporting - Dashboards - Scorecards - Data Warehousing
Prescriptive Analytics
What should I do/Why should I do it? This pertains to the best possible business decisions and actions. Enablers: - Optimization - Simulation - Decision Modeling - Expert Systems
Predictive Analytics
What will happen/why will it happen? These are accurate projections of future events and outcomes. Enablers: - Data Mining - Text Mining - Web/Media Mining - Forecasting
Recursive Function
When a function calls itself. (One function calling a different function is just an ordinary function call.) Each call should move closer to a base case that stops the recursion. Example of this type of function:
    def countdown(n):
        if n <= 0:
            print('Blastoff!')
        else:
            print(n)
            countdown(n-1)
    countdown(10) #prints 10 down to 1 (one at a time) and after 1 prints Blastoff!
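Recursive functions can also return values. A standard sketch is factorial, where the base case stops the chain of calls (this example is not from the card):

```python
def factorial(n):
    # Base case: stops the recursion.
    if n <= 1:
        return 1
    # Recursive case: the function calls itself on a smaller input.
    return n * factorial(n - 1)


five_factorial = factorial(5)
print(five_factorial)
```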
Keyboard Input in Python
When you tell the user of the computer to put something in for a value or statement, etc. name = input('What is your name?')
Integer Numbers in Python
Whole Number Values
What's the importance of composition and modular code?
You want to take small building blocks and compose them. This type of thinking leads to better design. A good computer scientist builds modular code that is REUSABLE. Basically, composition and modular code help make your code as clean, reusable, and elegant as possible.
Retrieving Things from Dictionaries in Python
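person = {'name': 'Nowell', 'gender': 'male'}
person['name'] #returns 'Nowell'
person.get('name', 'Strice') #returns 'Nowell' b/c the key exists; the second argument ('Strice') is only returned when the key is missing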
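The second argument to get() is a default that is used only when the key is absent. A short sketch ('age' is a key that is deliberately not in the dictionary):

```python
person = {'name': 'Nowell', 'gender': 'male'}

found = person.get('name', 'Strice')      # key exists, default ignored
fallback = person.get('age', 'unknown')   # key missing, default returned
missing = person.get('age')               # no default given: None

print(found, fallback, missing)
```

By contrast, person['age'] would raise a KeyError.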
Look at Dictionary Keys in Python
person.keys( ) #returns dict_keys(['name', 'gender']) in Python 3
Updating Dictionaries in Python
person.update({ 'favorites': [42, 'food'], 'gender': ['male'], })
Look at Dictionary Key Values in Python
person.values( ) #returns dict_values(['Nowell', 'male']) in Python 3
Converting objects to strings in Python
syntax: str(object) a = str(3.14159) print(a) #prints 3.14159 (a now holds the string '3.14159')