Data Analytics Lifecycle
What are the 6 phases of the data analytics lifecycle?
1. Discovery 2. Data prep 3. Model planning 4. Model building 5. Communicate results 6. Operationalize
Who is responsible for each phase of the data analytics lifecycle?
1. Discovery: business user, primary sponsor, BIA, DS 2. Data prep: data engineer, DBA, BIA, DS 3. Model planning: DS, BIA 4. Model building: DS 5. Communicate results: DS, BIA, business user, primary sponsor 6. Operationalize: data engineer, DBA, BIA, DS. Abbreviations: BIA = business intelligence analyst; DS = data scientist; DBA = database administrator.
What are the steps for the fifth phase of the data analytics lifecycle?
Communicate results: 1. Communicate results to stakeholders 2. Identify key findings (about 3) 3. Judge success or failure against the criteria set in the first phase 4. Deliverables: depend on the audience but could include code, a paper, a presentation, or a technical report
What are the steps for the second phase of the data analytics lifecycle?
Data prep (> 50% of project time): 1. Get data (ETLT) 2. Condition data: clean, normalize, and transform it into quality data 3. Survey/visualize data, e.g. distribution over time, patterns, skewness, unexpected values 4. Prepare the analytic sandbox
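The conditioning and survey steps on this card can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `raw` column, the helper names `condition` and `survey`, and the choice of min-max scaling are hypothetical, and comparing the mean with the median is only a rough hint at skew direction, not a formal skewness statistic.

```python
from statistics import mean, median


def condition(values):
    """Drop missing entries and rescale the rest to the range [0, 1]."""
    cleaned = [v for v in values if v is not None]
    lo, hi = min(cleaned), max(cleaned)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in cleaned]
    return [(v - lo) / (hi - lo) for v in cleaned]


def survey(values):
    """Summarize a column: size, mean, median, and a rough skew direction."""
    m, med = mean(values), median(values)
    if m > med:
        skew = "right"
    elif m < med:
        skew = "left"
    else:
        skew = "none"
    return {"n": len(values), "mean": m, "median": med, "skew": skew}


# Hypothetical column with missing values and one unusually large entry.
raw = [10, None, 20, 30, None, 100]
normalized = condition(raw)
summary = survey(normalized)
print(summary)
```

In practice the survey step would usually also involve plots; the summary dictionary here only stands in for that inspection.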
What are the steps for the first phase of the data analytics lifecycle?
Discovery: 1. Identify the business problem/opportunity and reframe it as an analytics problem/opportunity (e.g. strategy document) 2. Set success/failure criteria 3. Assess resources 4. Deliverable: analytic plan
What is the question for the model planning phase?
Do I have a good idea about the type of model to try? Can I refine the analytic plan?
What is the question for the data prep phase?
Do I have enough good quality data to start building the model?
What is the question for the discovery phase?
Do I have enough information to draft an analytic plan and share for peer review?
What is the question for the model building phase?
Is the model robust enough? Have we failed for sure?
What are the steps for the fourth phase of the data analytics lifecycle?
Model building: 1. Partition data into training/validation/test sets 2. Build model(s) and record modeling decisions and assumptions 3. Evaluate model(s): avoids false positives, makes sense to a domain expert, is accurate
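As a rough sketch of steps 1 and 3 on this card, the following Python splits a dataset into training, validation, and test sets and computes accuracy and the false positive rate. The 60/20/20 split, the fixed seed, the function names, and the label lists are assumptions chosen for illustration, not values the lifecycle prescribes.

```python
import random


def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )


def evaluate(actual, predicted):
    """Return (accuracy, false positive rate) for binary labels 0/1."""
    pairs = list(zip(actual, predicted))
    correct = sum(1 for a, p in pairs if a == p)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    true_neg = sum(1 for a, p in pairs if a == 0 and p == 0)
    negatives = false_pos + true_neg
    fpr = false_pos / negatives if negatives else 0.0
    return correct / len(pairs), fpr


rows = list(range(100))  # hypothetical record ids
train_set, validation_set, test_set = partition(rows)

# Hypothetical labels and predictions on a held-out set.
actual = [1, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1]
accuracy, false_positive_rate = evaluate(actual, predicted)
```

Numbers like these cover only the "accurate" and "avoids false positives" checks; whether the model makes sense to a domain expert still needs a human review.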
What are the steps for the third phase of the data analytics lifecycle?
Model planning: 1. Data exploration, e.g. correlation between variables 2. Variable selection: keep the most essential, reduce the number of variables 3. Model selection, e.g. classification, association, clustering, prediction
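One simple way to illustrate the exploration and variable selection steps is to compute the Pearson correlation of each candidate variable with the target and keep only those above a threshold. This is a minimal sketch, assuming a correlation filter is an acceptable selection method for the problem; the variable names, the data, and the 0.5 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def select_variables(features, target, threshold=0.5):
    """Keep variables whose absolute correlation with the target meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical candidate variables and target.
features = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_id": [3, 1, 4, 1, 5],
}
target = [2, 4, 6, 8, 10]

selected = select_variables(features, target)
print(selected)
```

A filter like this only catches linear relationships with the target, so it is a starting point for reducing variables rather than a full selection procedure.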
What are the steps for the sixth phase of the data analytics lifecycle?
Operationalize: 1. Run a pilot 2. Run the production model/automate 3. Monitor the model, e.g. for degradation; retrain/revise as needed
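The monitoring step can be sketched as a check that compares recent accuracy against the accuracy measured at deployment. The function name, the weekly figures, and the 0.05 tolerance below are hypothetical; a real deployment would pick its own metric and threshold.

```python
def needs_retraining(baseline_accuracy, recent_accuracies, tolerance=0.05):
    """Flag degradation when average recent accuracy falls more than
    `tolerance` below the baseline measured at deployment."""
    recent = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent) > tolerance


# Hypothetical weekly accuracy of the production model.
weekly_accuracy = [0.91, 0.88, 0.84, 0.80]

early_flag = needs_retraining(0.90, weekly_accuracy[:2])   # first two weeks
late_flag = needs_retraining(0.90, weekly_accuracy[-2:])   # last two weeks
print(early_flag, late_flag)
```

Averaging over a window rather than reacting to a single week is a design choice here to avoid retraining on one noisy measurement.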