Data analyst

What is the difference between Data Mining and Data Analysis?(3)

Results of data mining are not always easy to interpret. Data analysts interpret the results and convey them to the stakeholders.

What do you know about the interquartile range as a data analyst?

A measure of the dispersion of data that is shown in a box plot is referred to as the interquartile range. It is the difference between the upper and the lower quartile.
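
As a minimal sketch of the definition above, the interquartile range can be computed as the upper quartile minus the lower quartile. The function name `interquartile_range` and the sample values are illustrative assumptions; quartile conventions differ between tools, and this sketch uses the "inclusive" method of Python's `statistics.quantiles`.

```python
import statistics


def interquartile_range(values):
    """Return the upper quartile minus the lower quartile (Q3 - Q1)."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical sample data, chosen only for illustration.
sample = [1, 2, 3, 4, 5, 6, 7, 8]
iqr = interquartile_range(sample)
print(iqr)
```

Other quartile conventions (for example the "exclusive" method) can give a slightly different value for the same data, so it is worth stating which one is used.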

Explain the typical data analysis process.(4)

Validation - In this step, the model provided by the client and the model developed by the data analyst are validated against each other to find out if the developed model will meet the business requirements.

Mention a few best practices that you have followed while data cleansing.(4)

Tracking all the cleaning operations performed on the data is very important so that you can repeat or remove any operations as necessary.

What are the criteria to say whether a developed data model is good or not?(3&4)

A good data model should scale with any major changes in the data. A good data model is one that can be easily consumed for actionable results.

Mention a few best practices that you have followed while data cleansing.(1)

Developing a data quality plan to identify where maximum data quality errors occur so that you can assess the root cause and design the plan according to that.

How often should you retrain a data model?

A good data analyst understands how changing business dynamics will affect the efficiency of a predictive model. You must be a valuable consultant who can use analytical skills and business acumen to find the root cause of business problems.

Mention some common problems that data analysts encounter during analysis.(3&4)

Common misspellings and duplicate entries are data quality problems that most data analysts face. Other problems are different representations of the same value and misclassified data.

Explain the typical data analysis process.(1)

Data Exploration - Having identified the business problem, a data analyst has to go through the data provided by the client to analyse the root cause of the problem.

What is the difference between Data Mining and Data Analysis?(2)

Data Mining depends on clean and well-documented data. Data analysis involves data cleaning.

What is the difference between Data Mining and Data Profiling?(DM)

Data Mining refers to the analysis of datasets to find relationships that have not been discovered earlier. It focusses on sequenced discoveries or identifying dependencies, bulk analysis, finding various types of attributes, etc.

Explain the typical data analysis process.(3)

Data Modelling - The modelling step begins once the data has been prepared. Modelling is an iterative process wherein the model is run repeatedly for improvements. Data modelling ensures that the best possible result is found for a given business problem.

Explain the typical data analysis process.(2)

Data Preparation - This is the most crucial step of the data analysis process, wherein any anomalies in the data (such as missing values or outliers) have to be handled appropriately before modelling.
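
The preparation step can be sketched as follows. The text does not prescribe a method, so both choices here are assumptions made for illustration: missing values (represented as `None`) are filled with the median, and outliers are flagged with the common 1.5 x IQR fence rule. The function name `prepare` and the sample data are hypothetical.

```python
import statistics


def prepare(values):
    """Fill missing entries with the median and flag outliers.

    Median imputation and the 1.5 * IQR fences are illustrative
    assumptions, not methods taken from the flashcard text.
    """
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread

    # Replace missing entries, then collect values outside the fences.
    filled = [median if v is None else v for v in values]
    outliers = [v for v in filled if v < low or v > high]
    return filled, outliers


# Hypothetical measurements with one missing value and one extreme value.
raw = [10, 12, None, 11, 13, 95, 12]
filled, outliers = prepare(raw)
print(filled, outliers)
```

Whether a flagged value is an error or a genuine extreme still has to be decided case by case.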

What is the difference between Data Mining and Data Profiling?(DP)

Data Profiling, also referred to as Data Archaeology, is the process of assessing the data values in a given dataset for uniqueness, consistency and logic. Data profiling cannot identify incorrect or inaccurate data; it can only detect business rule violations or anomalies. The main purpose of data profiling is to find out if the existing data can be used for various other purposes.

What are the important steps in data validation process?(1)

Data Validation is performed in 2 different steps. Data Screening - In this step, various algorithms are used to screen the entire dataset for erroneous or questionable values. Such values need to be examined and handled.

What are the important steps in data validation process?(2)

Data Validation is performed in 2 different steps. Data Verification - In this step, each suspect value is evaluated on a case-by-case basis, and a decision is made on whether the value is accepted as valid, rejected as invalid, or replaced with a substitute value.
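
The two validation steps can be sketched as a pair of functions: one that screens for suspect values and one that applies a case-by-case decision to each of them. All names (`screen`, `verify`, `plausible_age`, `decide_age`) and the age rules are hypothetical and stand in for whatever checks a real project would use.

```python
def screen(records, is_plausible):
    """Screening: return the positions of values that look questionable."""
    return [i for i, value in enumerate(records) if not is_plausible(value)]


def verify(records, suspects, decide):
    """Verification: accept, reject, or replace each suspect value.

    `decide` returns ("accept", None), ("reject", None) or
    ("replace", new_value) for a single suspect value.
    """
    suspect_positions = set(suspects)
    cleaned = []
    for i, value in enumerate(records):
        if i not in suspect_positions:
            cleaned.append(value)
            continue
        action, new_value = decide(value)
        if action == "accept":
            cleaned.append(value)
        elif action == "replace":
            cleaned.append(new_value)
        # A "reject" decision drops the value.
    return cleaned


def plausible_age(age):
    # Hypothetical screening rule.
    return 0 <= age <= 100


def decide_age(age):
    # Hypothetical decisions an analyst might reach after checking each case.
    if age < 0:
        return ("replace", abs(age))  # assumed sign typo
    if age <= 110:
        return ("accept", None)       # unusual but confirmed
    return ("reject", None)           # impossible value


ages = [34, -5, 41, 212, 104]
suspects = screen(ages, plausible_age)
cleaned = verify(ages, suspects, decide_age)
print(suspects, cleaned)
```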

How will you handle the QA process when developing a predictive model to forecast customer churn?

Data analysts require inputs from the business owners and a collaborative environment to operationalize analytics. To create and deploy predictive models in production, there should be an effective, efficient and repeatable process. Without feedback from the business owner, the model will just be a one-and-done model. The best way to answer this question would be to say that you would first partition the data into 3 different sets: Training, Testing and Validation. You would then show the results of the validation set to the business owner, having eliminated biases from the first 2 sets. The input from the business owner or the client will give you an idea of whether your model predicts customer churn with accuracy and provides the desired results.
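
The three-way partition mentioned in the answer can be sketched with the standard library alone. The 60/20/20 proportions, the fixed seed, and the name `three_way_split` are assumptions for illustration; the text does not specify how the data should be divided.

```python
import random


def three_way_split(rows, train=0.6, test=0.2, seed=0):
    """Shuffle rows and split them into training, testing and validation sets."""
    shuffled = list(rows)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return training, testing, validation


# Hypothetical customer IDs standing in for real churn records.
customers = list(range(100))
train_set, test_set, validation_set = three_way_split(customers)
print(len(train_set), len(test_set), len(validation_set))
```

For churn data, where leavers are often a small minority, a stratified split that keeps the churn rate similar in each set may be preferable to this plain random one.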

What is the difference between Data Mining and Data Analysis?(4)

Data mining algorithms automatically develop equations. Data analysts have to develop their own equations based on the hypothesis.

What is the difference between Data Mining and Data Analysis?(1)

Data mining usually does not require any hypothesis. Data analysis begins with a question or an assumption.

Mention a few best practices that you have followed while data cleansing.(2)

Follow a standard process of verifying the important data before it is entered into the database.

What is data cleansing?

From a given dataset for analysis, it is extremely important to sort the information required for data analysis. Data cleaning is a crucial step in the analysis process wherein data is inspected to find any anomalies, remove repetitive data, eliminate any incorrect information, etc. Data cleansing does not involve deleting any existing information from the database; it just enhances the quality of data so that it can be used for analysis.

Mention some common problems that data analysts encounter during analysis.(1&2)

Having a poorly formatted data file, for instance CSV data with unescaped newlines and commas in columns. Having inconsistent and incomplete data can also be frustrating.
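
To illustrate the CSV problem, the sketch below writes fields that contain a comma and a newline using Python's `csv` module, which quotes such fields by default, and then compares a proper CSV read with a naive line split. The sample rows are invented for the example.

```python
import csv
import io

# Hypothetical rows: one field contains a comma, another a newline.
rows = [
    ["id", "comment"],
    ["1", "Late delivery, item damaged"],
    ["2", "First line\nsecond line"],
]

buffer = io.StringIO()
# The default QUOTE_MINIMAL setting quotes fields containing the
# delimiter, the quote character, or a line break.
csv.writer(buffer).writerows(rows)

# Reading back with csv.reader recovers the original three rows.
buffer.seek(0)
round_trip = list(csv.reader(buffer))

# Splitting on line breaks instead breaks the last record in two.
naive_lines = buffer.getvalue().splitlines()
print(len(round_trip), len(naive_lines))
```

A file written without this quoting cannot be repaired reliably by a parser, because the embedded commas and newlines are indistinguishable from real separators.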

Mention a few best practices that you have followed while data cleansing.(3)

Identify any duplicates and validate the accuracy of the data, as this will save a lot of time during analysis.
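
One way to identify duplicates is to count records by a normalised key, so that entries differing only in case or stray whitespace are treated as the same. This is a sketch under that assumption; the function name `find_duplicates`, the normalisation rule, and the sample addresses are hypothetical.

```python
from collections import Counter


def find_duplicates(records, key):
    """Return the normalised keys that occur more than once."""
    counts = Counter(key(record) for record in records)
    return sorted(k for k, count in counts.items() if count > 1)


# Hypothetical email column with a duplicate hidden by case and whitespace.
emails = [
    "ana@example.com",
    "Bob@Example.com ",
    "bob@example.com",
    "cy@example.com",
]
duplicates = find_duplicates(emails, key=lambda e: e.strip().lower())
print(duplicates)
```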

Explain the typical data analysis process.(5)

Implementation of the Model and Tracking - This is the final step of the data analysis process wherein the model is implemented in production and is tested for accuracy and efficiency.

According to you, what are the qualities/skills that a data analyst must possess to be successful in this position?

Problem solving and analytical thinking are the two important skills to be successful as a data analyst. One needs to be skilled at formatting data so that the gleaned information is available in an easy-to-read manner. Not to forget, technical proficiency is of significant importance. You can also talk about other skills that the interviewer expects in an ideal candidate for the job position based on the given job description.

How often should you retrain a data model?

The best way to answer this question would be to say that you would work with the client to define a time period in advance. However, I would refresh or retrain a model when the company enters a new market, consummates an acquisition or is facing emerging competition. As a data analyst, I would retrain the model as quickly as possible to adjust to the changing behaviour of customers or a change in market conditions.

What are the criteria to say whether a developed data model is good or not?(1&2)

The developed model should have predictable performance. A good data model can adapt easily to any changes in business requirements.

How will you create a classification to identify key customer trends in unstructured data?

You can answer this question by stating that you would first consult with the stakeholder of the business to understand the objective of classifying this data. Then, you would use an iterative process by pulling new data samples and modifying the model accordingly and evaluating it for accuracy. You can mention that you would follow a basic process of mapping the data, creating an algorithm, mining the data, visualizing it and so on. However, you would accomplish this in multiple segments by considering the feedback from stakeholders to ensure that you develop an enriching model that can produce actionable results.

You are assigned a new data analytics project. How will you begin, and what are the steps you will follow?

You can start answering this question by saying that you will start by finding the objective of the given problem and defining it so that there is solid direction on what needs to be done. The next step would be to do data exploration and familiarise yourself with the entire dataset, which is very important when working with a new dataset. The next step would be to prepare the data for modelling, which would include finding outliers, handling missing values and validating the data. Having validated the data, you will start data modelling until you discover any meaningful insights. After this, the final step would be to implement the model and track the output results.

