BIT 5524 Final Exam


Printing w/ Variable Substitution in Python

%s - Placeholder for inserting a string.
Example (%s):
    binary = 'binary'
    do_not = 'do not'
    y = 'Those who know %s, and those who %s' % (binary, do_not)
    print(y)  # prints: Those who know binary, and those who do not
%d - Placeholder for inserting an integer.
Example (%d):
    x = 'There are %d types of people' % 10
    print(x)  # prints: There are 10 types of people
%r - Placeholder that inserts the repr() of a value (for example, a string shown with its quotes). Mainly used for debugging.
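
The three placeholders on this card can be compared side by side. This is a minimal sketch; the variable names (`name`, `count`, and the `*_example` strings) are made up for illustration.

```python
# printf-style formatting with the % operator
name = "binary"
count = 10

s_example = "Those who know %s" % name   # %s inserts str(value)
d_example = "There are %d types" % count  # %d inserts an integer
r_example = "Raw value: %r" % name        # %r inserts repr(value), quotes included

print(s_example)  # Those who know binary
print(d_example)  # There are 10 types
print(r_example)  # Raw value: 'binary'
```

The quotes in the last line are why %r is handy when debugging: it shows whether a value is a string or a number.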

What are the best design practices for dashboards?

1) Benchmark key performance indicators w/ Industry Standards 2) Wrap the dashboard metrics w/ contextual metadata 3) Validate the dashboard design by a usability specialist 4) Prioritize and rank alerts / exceptions streamed to the dashboard 5) Enrich the dashboard w/ business-user comments 6) Present information in 3 different levels 7) Pick the right visual using dashboard design principles 8) Provide for guided analytics

What is the natural conceptual hierarchy of Python?

1) Programs - Composed of modules 2) Modules - Contain statements 3) Statements - Contain expressions 4) Expressions - Create & process objects

Explain the differences between 1st, 2nd, and 3rd normal form.

1NF - Each column holds a single (atomic) value, there are no repeating groups of columns, and each row can be uniquely identified.
2NF - The table is in 1NF and has no partial dependencies: no non-key column depends on only part of a composite primary key.
3NF - The table is in 2NF and has no transitive dependencies: every non-prime attribute depends directly on the primary key, not on another non-key attribute.

What are organizational critical success factors for big data analytics?

A clear business need Strong & committed sponsorship Alignment between business & IT strategy Fact-based decision making culture Strong data infrastructure The right analytics tools Personnel w/ advanced analytical skills

Purpose of Computer Programming

Programming is ultimately about solving a problem or providing a function / utility through a program. - Computer: Machine that stores pieces of information and moves, arranges, and controls that information - Program: Detailed set of instructions that tells a computer what to do with the information.

Variables in Python

Allow you to calculate something once, assign it to a name (the variable), and reuse it later. You can keep the same name for a variable but change its value. Example: headmaster = "Dumbledore" #headmaster is the variable

Print Statements

Allow us to display the output of our code, using print()

What is a database and what are the values of a database?

An abstraction on top of an operating system's file system to ease creating, reading, updating, and deleting persistent data. Databases are valuable b/c they make structured storage reliable and fast. They also give a mental framework for how the data should be saved and retrieved instead of having to figure out what to do w/ the data every time you build a new application.

What's the relationship between big data & business intelligence?

B.I. doesn't necessarily require big data; it can use any type of data. Big data just makes it better.

Why is just learning packages not enough to become a data scientist?

B/c there's no single answer or silver bullet to data analytics. No one package can do everything that we need to do as a data scientist, as data science draws on applied mathematics, computer science, statistics, information systems, databases, etc.

Why are relational databases important for data scientists to learn to use?

B/c they're structured storage that is reliable and fast to retrieve and update.

Tuples in Python

Like a list, but written with regular parentheses ( ) instead of square brackets [ ]. You can read from a tuple the same way you read from a list (indexing, slicing, looping), but you can't modify it b/c tuples are immutable.
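
A short sketch of what immutability means in practice. The names `point`, `first`, and `modified` are illustrative only.

```python
point = (3, 4)      # a tuple, written with parentheses
first = point[0]    # reading by index works just like a list

try:
    point[0] = 10   # tuples do not support item assignment
    modified = True
except TypeError:
    modified = False

print(first, modified)  # 3 False
```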

Look at Dictionary Items in Python

Looks at each key together with its value. person.items( ) #in Python 3, returns dict_items([('name', 'Nowell'), ('gender', 'male')])

Inputs to the Analytics Continuum

Business Processes Internet / Social Media Machines / Internet of Things

What is business intelligence and how is it related to business analytics?

Business intelligence is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies. It's essentially the use of reporting tools. B.I. is linked to strategy and execution of strategy. Business Analytics serves as a repository and disseminator of the best BI practices between and among different lines of businesses.

Where does data from business analytics come from?

Business transactions or surveys in which data is collected using Internet and / or sensor / RFID-based computerized networks.

Examples of Structured Data

Categorical - Nominal & Ordinal Numerical - Interval & Ratio

String Operators in Python

Concatenation (+) joins strings; multiplication (*) repeats a string.
    a = "It is a beautiful day"
    b = "do not go away"
    c = a + b     # concatenation; no space is added between a and b
    print(c)
    print(c * 2)  # multiplication; prints c two times on one line

Functions in Python

Concise way to group instructions into a bundle. They are defined using the def keyword. They take parameters and return outputs. print displays info, but doesn't give a value. return gives a value to the caller.
Example - pot of coffee: a function would be how people think of making the pot of coffee.
In Python the function would be: make_coffee( )
Function parameters would be: make_coffee(coffee_grounds, coffee_pot, water, filter_paper)
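
The difference between print and return is easiest to see by capturing what each function hands back. This is a sketch; `make_coffee` here takes a single hypothetical `cups` parameter rather than the longer parameter list on the card, and `announce_coffee` is an invented name.

```python
def make_coffee(cups):
    """Return a value to the caller."""
    return "%d cups of coffee" % cups


def announce_coffee(cups):
    """Only display text; with no return statement, the result is None."""
    print("%d cups of coffee" % cups)


returned = make_coffee(2)      # the caller receives a string
printed = announce_coffee(2)   # text is displayed, but nothing is returned

print(returned)  # 2 cups of coffee
print(printed)   # None
```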

Floating Point Numbers in Python

Contain decimal points

Looping through Lists or Dictionaries in Python

Create a FOR LOOP. Example:
    the_count = [1, 2, 3, 4, 5]
    for i in the_count:
        print(i)
    # prints:
    # 1
    # 2
    # 3
    # 4
    # 5

What are some useful applications for predictive analytics?

Customer Retention Direct Marketing Analytical Customer Relationship Clinical Decision Support Systems Cross-Sell Fraud Detection Portfolio, Product, or Economy-Level Prediction Risk Management Underwriting

Data vs. Information vs. Knowledge vs. Wisdom

Data - Raw, unorganized facts that describe the characteristics of an event or object.
Information - Data that is processed and organized w/ meaning and value.
Knowledge - Collection of information and data that's useful in assisting with decision-making.
Wisdom - The complete understanding of all the information.

What's the difference between Data Richness, Accuracy, Accessibility, and Reliability?

Data Reliability - The originality and appropriateness of the storage medium where the data is obtained.
Data Richness - All the required data elements are included in the data set. In essence, richness means that the available variables portray a rich enough dimensionality of the underlying subject matter for an accurate and worthy analytics study.
Data Accuracy - The cleanliness of the data we're using.
Data Accessibility - Can we obtain the data required to perform a worthy analytics study?

What are the top 3 data-related challenges for better analytics and why?

Data Source Reliability - Many projects are now biased. In other words, proctors of experiments are manipulating their data in order to make it appear that they're getting the answers that they want. Data Richness - We can't miss any variables or our analyses can be inaccurate. Data Currency / Timeliness - This pertains to relevance. If we have data that's outdated, then it isn't relevant to what we're trying to achieve.

Metadata

Data about data

Describe the Major Metrics for 'Analytics Ready' Data

Data source Reliability Data content Accuracy Data Accessibility Data Security and Privacy Data Richness Data Consistency Data Currency / Data Timeliness Data Granularity Data Validity and Relevance

Data vs. Information vs. Knowledge vs. Wisdom EXAMPLE

Data table w/ Student names, exam scores, attendance DATA - everything inside of the actual table INFORMATION - overall relationships (i.e., Sue did well on the exam, Jack did poorly on the exam, etc.) Can also be analysis results (i.e., mean, median, mode, etc.) KNOWLEDGE - Trend in the data. Students w/ lower attendance have lower exam scores. WISDOM - In the future, we need to encourage students to attend class b/c those who do not end up failing.

Big Data

Data that cannot be stored or processed easily using traditional tools / means. It typically refers to data that comes in many different forms: large, structured, unstructured, continuous, etc. This data is worthless if it doesn't provide any sort of business value.

Dictionary Keys

The labels used to look up values within a dictionary. person = {'name': 'Rob', 'gender': 'male'} person.keys( ) #in Python 3, returns dict_keys(['name', 'gender'])

What are the 3 major types of Metadata? What are their purposes?

Descriptive Metadata Administrative Metadata Structural Metadata Descriptive - describes a resource for purposes like discovery and identification (i.e., Title, Abstract, Author, Keywords, etc.) Administrative - provides information to help manage a resource (i.e., when/how the resource was created, file type/other technical information, who can access the resource, etc.) Structural - metadata about containers of data and indicates how compound objects are put together (i.e., how pages are ordered to form chapters, types/versions/relationships/other characteristics of digital materials)

Unique Key

Each row in a database table can be accessed w/ this type of key.

What enables real-time B.I. and why?

Enablers of Real-Time BI: - RFID - Web Services - Intelligent Agents Enable real-time B.I. b/c the demand for all of these things is through the roof.

Outputs to Analytics Continuum

End Users Applications Knowledge

Boolean Functions in Python

Functions can return Booleans, which is convenient for hiding complicated tests inside of functions. It's common to give these types of functions names that sound like yes/no questions. Example:
    def is_divisible(x, y):
        if x % y == 0:
            return True
        else:
            return False
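
Because the comparison `x % y == 0` already evaluates to True or False, the if/else can be collapsed. A sketch of that shorter form, which behaves the same as the longer version for integer inputs:

```python
def is_divisible(x, y):
    """Return True when x is evenly divisible by y."""
    return x % y == 0  # the comparison itself is the Boolean result


print(is_divisible(6, 3))  # True
print(is_divisible(7, 3))  # False
```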

IF Statements

If some condition is met, then perform the action.
    state = "Texas"
    if state == "Texas":
        print("TX")  # prints TX
The colon ends the condition, and the indented lines under it run only when the condition is true. You MUST indent after the colon; otherwise Python raises an IndentationError.

Delete Anomalies

If we delete one entry, we may unintentionally lose other information that was stored only in that same record.

Insert Anomalies

If we enter a new record, we may not have all of the information the table requires, so the record can't be inserted cleanly.

Update Anomalies

If we update an item, we must find/update it in every place that it shows up. If we don't update all of the same entries, we will have conflicting entries.

Foreign Key

Interconnections between multiple tables. It's a reference from a row in one relational table to a uniquely identified row in another table.
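
A foreign key can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the `department` and `employee` tables and their columns are invented for illustration, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department(id))"  # the foreign key
)

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Sue', 1)")  # department 1 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Jack', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The second insert is refused because its dept_id does not point at an existing department row.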

Why use Python for data science?

It's much more forgiving and easier to learn than many other programming languages. Things that make it ideal for data science: - Run-Time Scripting Language - Allows for Object Orientation - Consists of Shell Tools - Control Language

Conditional Loops in Python

Loops that keep repeating code as long as some condition is true, and stop once it becomes false. They use the keyword while (so they are usually called WHILE LOOPS).
    count = 0
    while count < 4:
        print('The count is:', count)
        count = count + 1
    # prints:
    # The count is: 0
    # The count is: 1
    # The count is: 2
    # The count is: 3
    # stops here b/c the loop only keeps running as long as count < 4

What are the 3 Information Layers of Dashboards?

Monitoring Analysis Management

What are key skills a data scientist should have?

Need to have technology skills: - Data Analytics - Algorithms - Neural Networks - Machine Learning - Artificial Intelligence

Strings

Non-numerical statements or words. They are found in quotes: either ' ' or " "

Mutable Objects

Objects whose value can change. When you alter these objects in place, the ID is still the same. Examples of these types of objects include: - Dictionary - Set (an unordered collection of distinct objects)

Immutable Objects

Objects whose value is unchangeable once they are created. An operation that appears to alter one actually yields another object with the new value, so the name ends up pointing at a different ID. Examples of these types of objects include: - Boolean Values - Integers
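
The built-in id() function makes the mutable / immutable contrast visible. A minimal sketch; the variable names are illustrative.

```python
# Mutable: changing a dictionary in place keeps the same object identity.
person = {"name": "Rob"}
id_before = id(person)
person["gender"] = "male"
same_dict = id(person) == id_before

# Immutable: "changing" an integer rebinds the name to a different object.
count = 1000
old = count
count = count + 1
same_int = count is old

print(same_dict)  # True
print(same_int)   # False
print(old)        # 1000 (the original integer object was never modified)
```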

What are the key sources of big data?

Online Transactions Mobile Applications Sensors Images, Audio, Video Social Media

What does it mean for parameters and variables to be local to a function, and why is it useful?

Parameters/variables of a function will only exist within that particular function if they are LOCAL. This is useful b/c it encapsulates and protects what's going on inside of the function. In other words, it means that you can use the same names that you used for the parameters of a given function in other places in your code, for different purposes.
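
A sketch of local scope: the name `result` is used both inside the function and at the top level without the two interfering. The function `area` and its parameters are made up for illustration.

```python
def area(width, height):
    result = width * height  # 'result' here is local to area()
    return result


result = "unrelated"  # a separate, module-level variable with the same name
rect = area(3, 4)

print(rect)    # 12
print(result)  # unrelated (calling area() did not overwrite it)
```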

Business Analytics

Process of developing actionable decisions or recommendations for actions based on insights generated from historical data. It also represents the combination of computer technology, management science techniques, and statistics to solve real problems.

What's data normalization?

Process of organizing columns (attributes) and tables (relations) of a relational database to reduce redundancy and improve data integrity. Essentially it puts data into tabular form by removing duplicated data from the relation tables.
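
The idea can be sketched in plain Python, with no database involved. The sample rows and field names below are invented for illustration: a repeated department name is pulled out into its own lookup so that it is stored only once.

```python
# Denormalized: the department name is repeated on every employee row.
rows = [
    {"employee": "Sue", "dept_id": 1, "dept_name": "Sales"},
    {"employee": "Jack", "dept_id": 1, "dept_name": "Sales"},
    {"employee": "Ann", "dept_id": 2, "dept_name": "IT"},
]

# Normalized: each department is stored once; employees refer to it by dept_id.
departments = {row["dept_id"]: row["dept_name"] for row in rows}
employees = [{"employee": r["employee"], "dept_id": r["dept_id"]} for r in rows]

# Renaming a department is now a single update instead of one per employee row.
departments[1] = "Global Sales"
names = [departments[e["dept_id"]] for e in employees]

print(names)  # ['Global Sales', 'Global Sales', 'IT']
```

Storing the name once is what removes the update anomaly: there is no second copy that could be left out of date.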

What are some problems with other languages (besides Python) for data science?

Writing the same types of analyses in languages such as C++ and Java can take 3 to 10 times longer than in Python, even though the resulting programs usually run faster. These languages are better suited for things like developing robust apps, GUIs, etc.

What are dashboards used for?

Provide visual displays of important information that is consolidated and arranged on a single screen so that the information can be digested at a single glance and easily drilled in and further explored.

Why is metadata so useful for data science?

Raw data alone is never good enough. Computers don't know what to do with raw data. We need to describe what specific data means to computer programs. W/out metadata, it can be very difficult to derive knowledge, trends, and ultimate wisdom.

What is the most common kind of analysis in predictive analytics and why?

Regression models are the mainstay of predictive analytics b/c they predict relationships among different parameters. Regression models are also very easy to build, maintain, and use.

Counting Loops in Python

Repeat code a certain number of times, until they get to the end of the count. Uses the keyword for to create this type of loop. Thus, these types of loops are usually called FOR LOOPS.
    for my_num in [1, 2, 3, 4, 5]:
        print('Hello', my_num)
    # prints:
    # Hello 1
    # Hello 2
    # Hello 3
    # Hello 4
    # Hello 5

Representing Null

Setting something equal to None (the keyword, written without quotes)

Fixed-Point Numbers in Python

Specific number of decimal points (rounded)

What are some downsides of Python?

Speed is slower than compiled, lower-level languages Not good for mobile development Bad memory consumption - not a good choice for memory intensive tasks. Limitations w/ database access Runtime Errors

What are the key disciplines involved in data mining?

Statistics AI Machine Learning & Pattern Recognition Information Visualization Database Management & Data Warehousing Management Science & Information Systems

Relational Database

Store data in a series of tables

Converting values to float in Python

Syntax: float(value)
    pie = '3.14159'
    pie = float(pie)
    print(pie)  # prints 3.14159

Converting values to int in Python

Syntax: int(value)
    pie = 3.14159
    pie = int(pie)
    print(pie)  # prints 3
    # converting a float to an int drops the fractional part (truncates toward zero), so positive values round down
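
The difference between truncating and rounding down only shows up with negative numbers. A short sketch comparing int() with math.floor(); the variable names are illustrative.

```python
import math

positive = int(3.99)               # fractional part dropped
negative = int(-3.99)              # truncates toward zero, not downward
floored = math.floor(-3.99)        # floor always rounds down

print(positive)  # 3
print(negative)  # -3
print(floored)   # -4
```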

Examples of Unstructured Data

Textual Multimedia - Image, Audio, Video XML / JSON

What are some of the most common sources of metadata for data scientists?

The Web Internet of Specific Things Social Media

What is data mining?

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.

ELSE Statements

These add a choice to our IF statement. Essentially it adds a choice of what to do if the original condition is NOT met. state = "Texas" if state == "Texas": print("TX") else: print("Terrible State!") This statement is the ending clause. If we want to add multiple choices to the IF statement, we use the ELIF clause. When we get to the final choice, we use the ELSE clause.

Lists in Python

These are a sequence of objects. They can be heterogeneous. They are written with square brackets: [ ] To append (add one item) to the end => list_name.append( ) To extend (add multiple items) => list_name.extend( ) To find how many things are in a list => len(list_name)
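
The three list operations on this card, shown on a small heterogeneous list. The list contents are made up for illustration.

```python
items = [1, "two"]       # lists can mix types

items.append(3.0)        # adds one item to the end
items.extend([4, 5])     # adds each item of another sequence
size = len(items)

print(items)  # [1, 'two', 3.0, 4, 5]
print(size)   # 5
```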

Counters in Python

These are a special kind of dictionary. A Counter turns a sequence of values into a dictionary-like object that maps each distinct value (key) to the number of times it appears. In other words, it will return a key, and tell you how many of this type of key there are. Can be very useful for creating histograms in Python. Example:
    from collections import Counter
    c = Counter([1, 2, 3])
    print(c)  # prints Counter({1: 1, 2: 1, 3: 1})
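
A sketch of the histogram use mentioned on the card, with repeated values so the counts differ. The `grades` list is invented sample data.

```python
from collections import Counter

grades = ["A", "B", "A", "C", "B", "A"]
counts = Counter(grades)

print(counts["A"])            # 3
print(counts.most_common(1))  # [('A', 3)]
print(counts["Z"])            # 0 (missing keys count as zero instead of raising KeyError)

# A crude text histogram: one star per occurrence.
for grade in sorted(counts):
    print(grade, "*" * counts[grade])
```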

Loops in Python

These are chunks of code that repeat a task over and over again.

Algorithms in Python

These are really just a set of instructions. Example - Making a pot of coffee. 1) Buy coffee grounds 2) Get a coffee maker 3) Get filter paper 4) Get a pot of water 5) & on & on #this is how a computer would process making a cup of coffee #to humans it's as simple as 'make a pot of coffee'

Booleans in Python

These can only be True or False.
    is_boolean = True
    is_boolean = False
Any object in Python can be converted to a boolean with bool(some_object); for example, 0, None, and empty containers convert to False.

What's the purpose of Python modules?

They let you use and share libraries of tools. They essentially allow you to create your own toolkit, as well as use the expansive toolkits that are already out there. They also provide reusable Python code: you can import a module at the beginning of a new session and run your code alongside it.

Dictionaries in Python

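They are written with curly braces: { } Values are set and retrieved by KEYS. Any hashable object (in practice, immutable objects such as strings, numbers, and tuples of immutables) can be a dictionary key. person = { } #creates an empty dictionary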

Multiple Return Statements in a Function

A function can contain more than one return statement, typically one per branch. Whichever one is reached first ends the function and supplies its value. Example:
    def absolute_value(x):
        if x < 0:
            return -x
        else:
            return x

What is a Fruitful Function?

This is a function that will return a value. These are crucial to data science. Example:
    import math
    def area(radius):
        temp = math.pi * radius ** 2
        return temp
    print(area(5.9))  # prints the area of a circle that has a radius of 5.9

'Slicing' Through Lists

This is a way for us to pull out a single item (indexing) or a sub-list (slicing). A slice always includes the value at the first position, but stops one place in front of the second position (i.e., [0:5] would start w/ place [0] on the list, but would end at place [4] on the list). Example:
    numbers = [1, 2, 3, 4, 5]
    numbers[0]    # returns 1 (a single item, not a list)
    numbers[0:2]  # returns [1, 2]
    numbers[2:]   # returns [3, 4, 5]
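
A runnable sketch of indexing versus slicing on the same list, including a negative index, which counts from the end. The variable names are illustrative.

```python
numbers = [1, 2, 3, 4, 5]

single = numbers[0]       # indexing returns one item
first_two = numbers[0:2]  # slicing returns a new list; the stop position is excluded
from_third = numbers[2:]  # omitting the stop runs to the end of the list
last = numbers[-1]        # negative indexes count from the end

print(single)      # 1
print(first_two)   # [1, 2]
print(from_third)  # [3, 4, 5]
print(last)        # 5
```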

What's a transitive dependency and how do you resolve them?

This is an indirect relationship between values in the same table: a non-key column depends on another non-key column, which in turn depends on the primary key. We can resolve this type of dependency by putting our data in 3NF.

What is incremental development and why should you do it?

This is when you only add and test small chunks of code at a time. We use incremental development to deal w/ increasingly complex programs and avoid long debugging sessions/searches.

What's the basic purpose of a histogram?

To show the distribution shape of the given data. Can use a histogram to see if data is normally or exponentially distributed.

When should you use a Line Chart vs. a Pie Chart vs. a Bar Chart?

Use a Line Chart to show the relationship between two variables - most often used to track changes or trends over time. Use a Pie Chart to illustrate relative proportions of a specific measure. Use a Bar Chart to compare data across multiple categories (i.e., % of advertising spending by departments or by product categories)

Why use a geographic map? What other types of charts can be combined w/ a geographic map?

Use this when the dataset includes any kind of location data. It's better and more informative to see the data on the map. Maps are often used in conjunction w/ many other charts (i.e., pie charts, histograms, bar charts, line charts, etc.)

Exception Handling

Used to make code cleaner / more elegant. This is also really good for debugging. To do this, use a TRY clause w/ EXCEPT. TRY is for the code that could have a problem; EXCEPT is for what to do if that problem occurs.
    try:
        print(0/0)
    except ZeroDivisionError:
        print('Sorry but you cannot divide 0 by 0')
    # prints Sorry but you cannot divide 0 by 0
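
A sketch of the same pattern wrapped in a function, so the caller gets a value back either way. `safe_divide` is a hypothetical helper name, and returning None on failure is one possible design choice, not the only one.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:  # built-in exception raised on division by zero
        print("Sorry, but you cannot divide by 0")
        return None


ok = safe_divide(6, 3)
bad = safe_divide(0, 0)

print(ok)   # 2.0
print(bad)  # None
```

Catching the specific exception (rather than every exception) keeps other bugs from being silently hidden.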

Dictionary Sorting

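Used to order a list (or a dictionary's keys) in a manner of your liking.
Example - Return a sorted copy w/out changing the original:
    x = ['z', 'c', 'a']
    y = sorted(x)
    print(y)  # prints ['a', 'c', 'z']; x is unchanged
Example - Sorting x in place:
    x.sort( )
    print(x)  # prints ['a', 'c', 'z']
Example - Reverse the sort:
    x.sort(reverse=True)
    print(x)  # prints ['z', 'c', 'a']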
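
Since the card's title mentions dictionaries, here is a sketch that sorts both a list and a dictionary's items. The `scores` data is invented for illustration; `sorted` with a `key` function is one common way to order dictionary entries by value.

```python
x = ["z", "c", "a"]
y = sorted(x)          # new sorted list; x itself is not changed yet
x.sort(reverse=True)   # sorts x in place, largest first

scores = {"Sue": 90, "Jack": 55, "Ann": 72}
by_name = sorted(scores)                                     # sorted keys
by_score = sorted(scores.items(), key=lambda item: item[1])  # (key, value) pairs ordered by value

print(y)         # ['a', 'c', 'z']
print(x)         # ['z', 'c', 'a']
print(by_name)   # ['Ann', 'Jack', 'Sue']
print(by_score)  # [('Jack', 55), ('Ann', 72), ('Sue', 90)]
```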

What is Data Visualization useful for?

Useful for exploring, making sense of, and communicating data. Visualizations are not useful, however, if they contain bad visuals and are unclear / confusing to the audience.

Which of the four V's is most important?

Variety

Define the four V's of big data

Volume - amount or scale of the data that we have Velocity - Analysis of streaming data (speed that we get the data) Veracity - Uncertainty of data Variety - Different forms of data (i.e., variables, types, etc.)

Descriptive Analytics

What happened or what is happening? These are well-defined business problems and opportunities. Enablers: - Business Reporting - Dashboards - Scorecards - Data Warehousing

Prescriptive Analytics

What should I do/Why should I do it? This pertains to the best possible business decisions and actions. Enablers: - Optimization - Simulation - Decision Modeling - Expert Systems

Predictive Analytics

What will happen/why will it happen? These are accurate projections of future events and outcomes. Enablers: - Data Mining - Text Mining - Web/Media Mining - Forecasting

Recursive Function

A function that calls itself. (A function can also call other functions, but that alone isn't recursion.) It needs a base case that stops the chain of calls. Example of this type of function:
    def countdown(n):
        if n <= 0:
            print('Blastoff!')
        else:
            print(n)
            countdown(n-1)
    countdown(10)  # prints 10 down to 1 (one at a time) and after 1 prints Blastoff!
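
A sketch of a recursive function that returns its result instead of printing it, which makes the base case and the recursive step easier to inspect. `countdown_list` is a hypothetical variant of the card's countdown function.

```python
def countdown_list(n):
    """Return [n, n-1, ..., 1, 'Blastoff!'] using recursion."""
    if n <= 0:                             # base case: stops the recursion
        return ["Blastoff!"]
    return [n] + countdown_list(n - 1)     # recursive step: a smaller problem


result = countdown_list(3)
print(result)  # [3, 2, 1, 'Blastoff!']
```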

Keyboard Input in Python

When you prompt the user of the computer to type in a value or statement, etc. The input is always returned as a string. name = input('What is your name?')

Integer Numbers in Python

Whole Number Values

What's the importance of composition and modular code?

You want to take small building blocks and compose them. This type of thinking leads to better design. A good computer scientist builds modular code that is REUSABLE. Basically, composition and modular code help make your code as clean, reusable, and elegant as possible.

Retrieving Things from Dictionaries in Python

    person = {'name': 'Nowell', 'gender': 'male'}
    person['name']                # returns 'Nowell'
    person.get('name', 'Strice')  # returns 'Nowell'; the default 'Strice' is only returned when the key is missing
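
A sketch contrasting the two lookup styles when the key is missing. The 'age' key and the 'unknown' default are made up for illustration.

```python
person = {"name": "Nowell", "gender": "male"}

name = person.get("name", "Strice")  # key exists, so the default is ignored
age = person.get("age", "unknown")   # key missing, so the default is returned

try:
    person["age"]                    # square brackets raise KeyError for a missing key
    raised = False
except KeyError:
    raised = True

print(name)    # Nowell
print(age)     # unknown
print(raised)  # True
```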

Look at Dictionary Keys in Python

person.keys( ) #in Python 3, returns dict_keys(['name', 'gender'])

Updating Dictionaries in Python

person.update({ 'favorites': [42, 'food'], 'gender': ['male'], })

Look at Dictionary Key Values in Python

person.values( ) #in Python 3, returns dict_values(['Nowell', 'male'])

Converting objects to strings in Python

syntax: str(object) a = str(3.14159) print(a) #prints 3.14159 (a now holds the string '3.14159')

