BUS-S364 Quiz 4


What will be the result of this code?
import pandas as pd
df = pd.DataFrame({'X': [1, 2, 2, 3, 3, 3]})
print(df.duplicated())
A) 0 False 1 False 2 True 3 False 4 True 5 True dtype: bool
B) False False True False True True
C) Error
D) A list of indexes of duplicated rows

A) 0 False 1 False 2 True 3 False 4 True 5 True dtype: bool
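The behaviour can be checked with a minimal sketch, assuming the default `keep='first'` setting, under which only repeats after the first occurrence are flagged. The variable names below are illustrative and not part of the quiz.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 2, 3, 3, 3]})

# duplicated() returns a Boolean Series; with the default keep='first',
# the first occurrence of each value is False and later repeats are True.
mask = df.duplicated()
n_duplicates = int(mask.sum())

# drop_duplicates() keeps exactly the rows where the mask is False.
unique_rows = df.drop_duplicates()

print(mask.tolist())
print(unique_rows['X'].tolist())
```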

What will be the result of this code?
import pandas as pd
import numpy as np
df = pd.DataFrame({'Age': [22, 25, np.nan, 30]})
df_clean = df.dropna()
print(len(df_clean))
A) 3
B) 4
C) 1
D) Error

A) 3

Given this DataFrame, what will be the output of the code?
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})
print(df.fillna(0))
A) A DataFrame with all NaNs replaced with 0
B) A DataFrame unchanged
C) A DataFrame with only column A modified
D) Error

A) A DataFrame with all NaNs replaced with 0

Data that is missing without any pattern (e.g., a survey response lost due to a random technical error) is an example of what type of missing data? A) MCAR B) MAR C) MNAR

A) MCAR

Which of the following are methods for detecting or dealing with outliers in a DataFrame? (Select all that apply) A) Using Interquartile Range (IQR) B) Using mean imputation C) Filtering values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR D) Using string replacement

A) Using Interquartile Range (IQR); C) Filtering values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
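The IQR rule can be sketched as follows. The column name `value` and the data are hypothetical, chosen so that one point is an obvious outlier. The quartiles use pandas' default linear interpolation.

```python
import pandas as pd

# Hypothetical data: 100 sits far away from the other values.
df = pd.DataFrame({'value': [10, 12, 11, 13, 12, 100]})

q1 = df['value'].quantile(0.25)
q3 = df['value'].quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR below Q1 and above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# between() is inclusive on both ends by default.
is_inlier = df['value'].between(lower, upper)
inliers = df[is_inlier]
outliers = df[~is_inlier]

print(outliers)
```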

Which of the following methods can be used to replace specific values in a DataFrame? (Select all that apply) A) df.replace() B) df.map() C) df.sub() D) df.fillna()

A) df.replace(); B) df.map(); D) df.fillna()
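A small sketch of the three approaches, using hypothetical columns `grade` and `score`. It calls `map()` on a single column (a Series), which is an assumption made for portability: the DataFrame-level `map()` is only available in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical data: -1.0 is a sentinel meaning "score not recorded".
df = pd.DataFrame({'grade': ['A', 'B', None], 'score': [90.0, -1.0, np.nan]})

# replace(): swap one specific value for another.
df['score'] = df['score'].replace(-1.0, np.nan)

# map() on a column: translate values through a dictionary;
# values without a dictionary entry become NaN.
df['points'] = df['grade'].map({'A': 4, 'B': 3})

# fillna(): replace only the missing values.
df['points'] = df['points'].fillna(0)

print(df)
```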

Which of the following are valid functions to check for missing values in a DataFrame? (Select all that apply) A) pd.isnull() B) df.has_null() C) pd.notna() D) df.isnull()

A) pd.isnull(); C) pd.notna(); D) df.isnull()
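A quick way to confirm which of these exist and how they relate, as a sketch on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

# The top-level function and the DataFrame method produce the same mask.
same_mask = pd.isnull(df).equals(df.isnull())

# notna() is the element-wise opposite of isnull().
opposite = pd.notna(df).equals(~df.isnull())

# Summing the Boolean mask counts missing values per column.
missing_per_column = df.isnull().sum()

# DataFrames have no has_null attribute.
has_method = hasattr(df, 'has_null')

print(missing_per_column)
```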

Write a line of code that returns all rows in df where the value in the Age column is greater than 30.

df.loc[df['Age'] > 30]

Write a simple code snippet that renames the column old_name to new_name in a DataFrame called df.

df.rename(columns={'old_name': 'new_name'}, inplace=True)

Write a short Python snippet to remove all rows in a DataFrame df that contain any missing values. Name the cleaned df "df_clean".

df_clean = df.dropna()

True or False: If data is Missing Completely at Random (MCAR), dropping those rows will not bias your analysis.

True
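The claim can be illustrated with a small simulation. This is only a sketch under stated assumptions: the income values are synthetic, the 30% loss rate and the cutoff of 60 are arbitrary, and the MNAR case is included purely for contrast.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 10_000

# Synthetic incomes (in thousands), purely illustrative.
df = pd.DataFrame({'income': rng.normal(50, 10, n)})
true_mean = df['income'].mean()

# MCAR: every value has the same 30% chance of being lost.
mcar = df['income'].mask(rng.random(n) < 0.3)

# MNAR (for contrast): the high incomes are the ones that go missing.
mnar = df['income'].mask(df['income'] > 60)

# Dropping the missing rows and recomputing the mean.
mcar_mean = mcar.dropna().mean()
mnar_mean = mnar.dropna().mean()

print(round(true_mean, 2), round(mcar_mean, 2), round(mnar_mean, 2))
```

Under MCAR the mean of the remaining rows stays close to the full-data mean, although the smaller sample reduces precision. In the MNAR case the mean is pulled downward.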

What does this code print?
import numpy as np
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [np.nan, 85, 90]})
print(df['Score'].isnull())
A) 0 False 1 False 2 False Name: Score, dtype: bool
B) 0 True 1 False 2 False Name: Score, dtype: bool
C) True False False
D) Error

B) 0 True 1 False 2 False Name: Score, dtype: bool

What will be the output of the following code?
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan]})
print(df.isnull().sum())
A) A 0 dtype: int64
B) A 1 dtype: int64
C) A NaN dtype: float64
D) Error

B) A 1 dtype: int64

What does df.duplicated() return? A) A new DataFrame without duplicate rows B) A Boolean Series marking duplicates as True C) The number of duplicate rows D) A list of rows with duplicated column names

B) A Boolean Series marking duplicates as True

Which of the following statements about DataFrames is true? A) DataFrames are indexed only by numbers B) Columns in a DataFrame are Pandas Series C) You must use .iloc[] to retrieve columns D) A DataFrame cannot contain different data types

B) Columns in a DataFrame are Pandas Series

What does the following line of code return? pd.notna(np.nan) A) True B) False C) NaN D) Error

B) False

Data that is missing due to a pattern in other variables (e.g., younger people are more likely to skip a question about job title) is an example of what type of missing data? A) MCAR B) MAR C) MNAR

B) MAR

Which types of missing data assume the missingness can be predicted based on other known variables? (Select all that apply) A) MCAR B) MAR C) MNAR

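B) MAR (under MNAR the missingness depends on the unobserved value itself, so other known variables alone cannot predict it)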

What will be the output of this code? print(titanic_df.iloc[0]) A) The first column of the DataFrame B) The first row of the DataFrame C) An error, because rows must be accessed with [] D) A subset of rows starting from index 0

B) The first row of the DataFrame

Which of the following are valid ways to retrieve a row from a DataFrame? (Select all that apply) A) df[0] B) df.loc["Name"] C) df.iloc[0] D) df.loc[0]

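B) df.loc["Name"]; C) df.iloc[0] (df.loc[0] also works when the index contains the label 0, as with the default integer index; df[0] looks up a column, not a row)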
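How each option behaves depends on the index. The sketch below assumes a hypothetical DataFrame indexed by name, which is one setup in which the label `0` does not exist. The names and ages are made up.

```python
import pandas as pd

# Hypothetical frame whose index holds names rather than integers.
df = pd.DataFrame({'Age': [22, 35]}, index=['Alice', 'Bob'])

first_by_position = df.iloc[0]    # position-based lookup
first_by_label = df.loc['Alice']  # label-based lookup; 'Alice' is in the index

# df[0] searches for a COLUMN named 0, which does not exist here.
try:
    df[0]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# df.loc[0] searches for a ROW labelled 0, which does not exist here either.
try:
    df.loc[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# With a default integer index, the label 0 exists and df.loc[0] succeeds.
default_index = df.reset_index(drop=True)
row_zero = default_index.loc[0]
```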

What will the following code return? pd.get_dummies(df['Color']) A) A count of each color B) A pie chart of the color values C) A DataFrame with binary columns for each unique color D) An error, unless Color is numeric

C) A DataFrame with binary columns for each unique color
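A minimal sketch with a made-up `Color` column. The dtype of the indicator columns differs between pandas versions (Boolean in recent releases, integer in older ones), so the example converts to integers before printing.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Red', 'Green']})

# One indicator column per unique value, in sorted order.
dummies = pd.get_dummies(df['Color'])

# Convert to 0/1 so the output looks the same across pandas versions.
as_int = dummies.astype(int)

print(as_int)
```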

Missingness that is related to the value itself (e.g., people with high income choose not to disclose it) is an example of what type of missing data? A) MCAR B) MAR C) MNAR

C) MNAR

What type of missing data is described by the statement: "A person chooses not to answer a survey question because the answer is sensitive"? A) MCAR B) MAR C) MNAR D) None of the above

C) MNAR

Which type of missing data is the most problematic to analyze without special techniques? A) MCAR B) MAR C) MNAR

C) MNAR

You are analyzing a medical dataset where patients with worse symptoms are more likely to skip follow-up questions. What type of missingness is most likely present? A) MCAR B) MAR C) MNAR

C) MNAR

What will this code return?
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, 4, 5]})
print(df[df['A'].notnull()])
A) Only the row with index 2
B) All rows
C) Only rows where column A is not NaN
D) Only rows where column B is not NaN

C) Only rows where column A is not NaN
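As a sketch, the Boolean-mask filter gives the same result as `dropna()` restricted to column `A`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, 4, 5]})

# Keep rows where A is present; the original index labels are preserved.
kept = df[df['A'].notnull()]

# dropna(subset=...) expresses the same filter.
same = kept.equals(df.dropna(subset=['A']))

print(kept)
```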

Which of the following best describes Missing Completely At Random (MCAR)? A) The missing data is related to unobserved variables B) The probability of a value being missing depends on observed data C) The missingness has no relationship to any data, observed or unobserved D) Missing values are caused by participant refusal to answer sensitive questions

C) The missingness has no relationship to any data, observed or unobserved

Which of the following is true about Missing At Random (MAR)? A) The missingness depends only on unobserved variables B) The missingness is completely unrelated to the data C) The missingness is related to other observed variables D) MAR is the same as MCAR but with more missing values

C) The missingness is related to other observed variables

What is the purpose of using the cut() function in pandas? A) To find missing data B) To rename index labels C) To segment numeric values into bins D) To shuffle the DataFrame randomly

C) To segment numeric values into bins
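A short sketch of binning with `pd.cut()`. The ages, bin edges, and labels are made up for illustration. By default each bin includes its right edge and excludes its left edge.

```python
import pandas as pd

ages = pd.Series([5, 17, 30, 64, 80])

# Bins are (0, 18], (18, 65], (65, 120] with the default right=True.
groups = pd.cut(ages, bins=[0, 18, 65, 120], labels=['child', 'adult', 'senior'])

print(groups.tolist())
```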

Which of the following statements is false about missing data in pandas? A) Missing values in pandas are represented as np.nan B) df.isnull().sum() gives total missing values per column C) df.info() shows only the column names and data types D) Missing data can be found using pd.isna()

C) df.info() shows only the column names and data types (false because df.info() also shows non-null counts)
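The non-null counts can be seen by capturing the report in a buffer, as in this sketch on a small made-up DataFrame:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, 4, 5]})

# df.info() writes to stdout by default; buf= redirects it to a string buffer.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()

print(report)
```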

Match each missing data type (MCAR, MAR, MNAR) to its correct example:
A) A participant skips a question about income, but they answered similar demographic questions.
B) A glitch randomly caused some data entries to be lost during transfer.
C) People with higher debt choose not to answer the "Do you owe any money?" question.

MCAR → B; MAR → A; MNAR → C

