MISY 160 Exam 3 University of Delaware
How to classify data as big data
5 Vs: Volume, Velocity, Variety, Veracity, and Value
Veracity
Accuracy and trustworthiness of the data
Skill a data analyst needs to have
Analytical skills
What does a database administrator do?
Architecting how data is shared
Velocity
Big data is generated at a very high speed
Frameworks to store and process big data
Cassandra, Hadoop, Spark
Second phase in data science project
Data acquisition, which is gathering and scraping data from multiple sources like web servers, log files, databases, APIs, online
Roles/positions of a data scientist
Data analyst, machine learning engineer, deep learning engineer, data engineer, data scientist
Fifth phase in data science project
Data modeling
Third phase in data science project
Data preparation, which involves data cleaning and data transformation
What data cleaning is
Dealing with inconsistent data types, misspelled attributes, missing and duplicate values
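A minimal sketch of the four cleaning problems named above, using plain Python dictionaries as records. The function name `clean_records`, the `key_fixes` mapping, and the sample field names are all hypothetical, chosen only for illustration; a real project would more likely use a library such as pandas.

```python
def clean_records(records, key_fixes, types):
    """Clean a list of dict records (illustrative helper, not a library API).

    key_fixes: maps misspelled attribute names to the correct name.
    types:     maps each required field to the type it should be cast to.
    """
    cleaned = []
    seen = set()
    for rec in records:
        # 1. Misspelled attributes: rename keys such as "nmae" to "name".
        row = {key_fixes.get(k, k): v for k, v in rec.items()}

        # 2. Missing values: skip rows where a required field is absent or empty.
        if any(row.get(field) in (None, "") for field in types):
            continue

        # 3. Inconsistent data types: cast each field to its expected type.
        for field, cast in types.items():
            row[field] = cast(row[field])

        # 4. Duplicate values: keep only the first copy of an identical row.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


raw = [
    {"name": "Ann", "age": "34"},   # age stored as text
    {"nmae": "Ann", "age": 34},     # misspelled attribute, duplicate of row 1
    {"name": "Bob", "age": None},   # missing value
    {"name": "Cy", "age": "41"},
]
result = clean_records(raw, {"nmae": "name"}, {"name": str, "age": int})
print(result)
```

Dropping rows with missing values is only one option assumed here; filling them in with a default or an average is another common choice.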
Seventh phase in data science project
Deploying and maintaining the model
What does a software developer do?
Develop applications
Values of data science
Discovering the best shipping time and route for logistics companies like DHL and FedEx, and predicting employee attrition
What MapReduce is
Doing parallel processing: Lengthy task is broken into smaller tasks, one machine takes up one task, multiple machines complete
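The idea of splitting a lengthy task into smaller tasks can be sketched with a word count. This is a toy illustration, not Hadoop's API: threads stand in for separate machines, and the function names (`split_into_tasks`, `map_task`, `reduce_counts`, `word_count`) are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_tasks(lines, n_tasks):
    """Break the lengthy task (all lines) into n_tasks smaller tasks."""
    return [lines[i::n_tasks] for i in range(n_tasks)]


def map_task(chunk):
    """Map step: one worker counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(lines, n_workers=3):
    tasks = split_into_tasks(lines, n_workers)
    # Each thread plays the role of one machine taking up one task.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(map_task, tasks))
    # Combine the partial results into the final answer.
    return reduce(reduce_counts, partial_counts, Counter())


totals = word_count(["big data", "big value", "data data"])
print(totals)
```

The map step never needs to see another worker's chunk, which is what allows the work to be spread across machines.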
Example of structured data
Excel records
Fourth phase in data science project
Exploratory data analysis
How the Hadoop Distributed File System (HDFS) works
A file is broken into smaller chunks, which are copied and stored on various machines
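A rough sketch of chunking and replication, assuming a tiny block size and a simple round-robin placement. Both are simplifications for illustration: real HDFS uses much larger blocks and its own placement policy, and the names `split_into_blocks`, `place_replicas`, and `read_file` are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Break the file contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(blocks, nodes, replication=2):
    """Store `replication` copies of each block on different nodes (round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            placement[node][block_id] = block
    return placement


def read_file(placement, n_blocks, failed_nodes=()):
    """Reassemble the file, skipping machines that are down."""
    parts = []
    for block_id in range(n_blocks):
        for node, stored in placement.items():
            if node not in failed_nodes and block_id in stored:
                parts.append(stored[block_id])
                break
        else:
            raise IOError(f"block {block_id} has no surviving copy")
    return b"".join(parts)


data = b"abcdefghij"
blocks = split_into_blocks(data, block_size=4)
placement = place_replicas(blocks, ["n1", "n2", "n3", "n4"], replication=2)
recovered = read_file(placement, len(blocks), failed_nodes={"n1"})
print(recovered)
```

Because every chunk lives on more than one machine, the file can still be read after a single machine fails.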
What does a quality assurance analyst do?
Finding problems within applications by testing them against what the system is supposed to accomplish
Benefits of big data
Improving user experience in the game industry, predicting a hurricane landfall earlier in weather forecasting and disaster
Example of semi-structured data
Log files
What does a data analyst do?
Looks at data to find meaning that helps solve real business problems
What does a project manager do?
Managing people, keeping track of schedules, communicating risks
Big data
Massive amounts of data
First phase in data science project
Meeting with clients, asking relevant questions, understanding and defining objectives for the problem that needs to be tackled
Tools for data modeling
Python, R, SAS
Various data types
Structured, semi-structured, unstructured data
Tools for data visualization and communication
Tableau, Power BI, QlikView
Tools for complex data transformation
Talend, Informatica
Value
The benefit from analyzing the data
Skill a software developer needs to have
Understanding programming languages and databases
What does a business analyst do?
Understanding what the business is trying to accomplish, the processes, the stakeholders, the goals, and the current system
Sixth phase in data science project
Visualization and communication
Example of unstructured data
X-ray images