DATA1002 Weekly Quizzes


Distortion, in the context of charts, is when: a) Visualisations show items with scale that is not the same. b) You show meaningless interactions. E.g. lines joining intermediate points which are not meaningful c) When your data is grouped, but the grouping isn't clear d) You don't label the axis

A

In a decision tree classifier: a) a decision uses one attribute, and compares the value of that attribute to some threshold(s), to decide which is the next step of the path to follow in the tree b) each decision uses a different attribute from all the other decisions in the tree c) each path through the decision tree ends with a different class/label being assigned to the case d) all the decisions are based on the same attribute, but they use different thresholds for choosing the next step of the path

A

In one iteration of the k-means clustering algorithm... a) each item in the dataset is allocated to the cluster whose current cluster representative is closest to the data item b) each item in the dataset is allocated to the cluster whose current cluster representative is furthest from the data item c) each item in the dataset is labelled with the average of the k nearest cluster representatives d) each item in the dataset is allocated to the same cluster as the data item that is closest to the given data item

A
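
The allocation step described in option (a) can be sketched in plain Python. This is only a minimal illustration, not a full k-means implementation: the data items, the two starting representatives, and the helper name `assign_to_nearest` are made up for the example, and the items are one-dimensional so that distance is just an absolute difference.

```python
def assign_to_nearest(points, representatives):
    """Return, for each point, the index of the closest representative."""
    assignments = []
    for p in points:
        distances = [abs(p - r) for r in representatives]
        assignments.append(distances.index(min(distances)))
    return assignments

points = [1.0, 1.5, 8.0, 9.0]    # hypothetical data items
representatives = [2.0, 7.0]     # hypothetical current cluster representatives

assignments = assign_to_nearest(points, representatives)
print(assignments)               # [0, 0, 1, 1]
```

A complete iteration would then typically recompute each representative as the mean of the items allocated to it.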

In typical data science lifecycles, what should be the first activity? a) determine the goal for the project b) construct a model of the domain c) produce a report of the findings d) explore the data

A

Suppose df is a Pandas dataframe, with 'temp' as one column. What Pandas expression is a dataframe that contains the rows of df in which 'temp' has a value less than 20? a) df[df['temp']<20] b) df.temp<20 c) df['temp']<20 d) df['temp'<20]

A

Suppose dict is a dictionary. When would the assignment statement dict[x] = y cause an error? a) None of these would cause an error b) When y is not already a value in the dictionary dict c) When y is already a value in the dictionary dict d) When x is already a key in the dictionary dict e) When x is not already a key in the dictionary dict

A

Which values are used for the variable x in executing a loop for x in range(5): a) 1, 2, 3, 4 b) 1, 2, 3, 4, 5 c) 0, 1, 2, 3, 4 d) 0, 1, 2, 3, 4, 5

C

Which visual attribute among the following is the best for representing a quantitative data attribute? a) shape b) texture c) y-position d) angle

C

Which of the following is NOT a possible consequence if metadata is missing? a) You accidentally use the data without permission. b) You accidentally misinterpret values in the data. c) You cannot understand the structure of the data. d) You calculate the wrong quantity because some of the data is incorrect

D

Which of the following is a real number that cannot be represented exactly in IEEE 754 floating point representation? a) 0.5 b) 2.0 c) -3.0 d) 0.1

D
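
One way to check this in Python, whose float type is an IEEE 754 double on essentially all current platforms (an assumption about the interpreter, not something the question states), is to convert the float to an exact fraction. The variable names are chosen for this sketch.

```python
from fractions import Fraction

# Fraction(float) gives the exact value that the float actually stores.
half_is_exact = Fraction(0.5) == Fraction(1, 2)     # 0.5 is a power of two
tenth_is_exact = Fraction(0.1) == Fraction(1, 10)   # 0.1 is only approximated
sum_matches = (0.1 + 0.2 == 0.3)                    # the error shows up in arithmetic

print(half_is_exact)    # True
print(tenth_is_exact)   # False
print(sum_matches)      # False
```

Values such as 2.0 and -3.0 are exact for the same reason as 0.5: they can be written as an integer times a power of two.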

Which of the following is not a text file? a) a file with data in XML format b) A file with data in JSON format c) a file with data in CSV format d) A file with spreadsheet data in .xlsx format

D

Which of the following is not likely to be an example of a stakeholder, to whom the findings of the project should be communicated? a) the client who paid for the project b) scholars investigating similar issues c) the subjects whose data was used d) the programmer who wrote the analysis software you used

D

The entries of the confusion matrix for a model give: a) The number of cases where the model predicts class A and the true value is class B b) The average time taken to classify cases where the model predicts class A and the true value is class B c) The accuracy of the model in cases where the model predicts class A and the true value is class B d) The benefit obtained by the user in cases where the model predicts class A and the true value is class B

A
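
As a rough sketch of what those entries are, the counts can be built with the standard library alone. The labels and the six cases below are invented for illustration.

```python
from collections import Counter

# Hypothetical true labels and model predictions for six cases.
actual    = ["spam", "spam", "ham", "ham",  "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "ham", "spam"]

# Each entry counts the cases with a given (predicted, true) pair.
confusion = Counter(zip(predicted, actual))

for (pred, true), count in sorted(confusion.items()):
    print(f"predicted={pred:4} true={true:4} count={count}")
```

The entries where the predicted and true labels agree are the correct classifications; the others are the errors.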

To deploy a predictive model means: a) use the model to calculate the predicted output on a given case based on the input features b) measure how accurate the model is c) find the model within a family that performs well on training data d) measure how long it takes to find the model

A

To write a reference to the cell in row 6 and column D on a worksheet called Data1, a formula would include: a) Data1!D6 b) D6.Data1 c) Data1(D6) d) Sheet(Data1,D,6)

A

What is printed when executing the following Python code?
    cond = "fred"
    if cond:
        print("yes")
    else:
        print("no")
    print(cond)
a) yes fred b) no fred c) yes True d) nothing is printed; this code causes an error when executed

A

What is printed when we execute the following Python code?
    list = [2.3, 4.1, 7.2]
    for x in list:
        print(x+1)
        print("OK")
a) 3.3 OK 5.1 OK 8.2 OK b) 3.3 5.1 8.2 OK c) 3.3 OK 6.1 OK 10.2 OK d) 3.3 6.1 10.2 OK

A

What is the Excel syntax for the logical condition: either D6, or (E7 and E8) a) OR(D6,AND(E7,E8)) b) D6+(E7*E8) c) D6 OR (E7 AND E8) d) D6.OR.(E7.AND.E8)

A

What is the value of alist after the following code: alist = ["hello", "world", "from", "Sydney"] alist[2] = alist[1].upper() a) ["hello", "world", "WORLD", "Sydney"] b) ["hello", "HELLO", "from", "Sydney"] c) ["hello", "world", "FROM", "Sydney"] d) ["hello", "world", "from", "Sydney"]

A

What is the value of alist after the following code: alist = [[3, 2], [4, 0, -1], [8, 5]] blist = alist[0] blist.reverse() a) [[2, 3], [4, 0, -1], [8, 5]] b) [[8, 5], [4, 0, -1], [3, 2]] c) [[3, 2], [4, 0, -1], [8, 5]] d) [[2, 3], [-1, 0, 4], [5, 8]]

A
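
The behaviour follows from aliasing: `blist = alist[0]` gives a second name to the same inner list rather than copying it. The sketch below replays the question's code and then adds a contrasting case with `.copy()`; the name `clist` is introduced only for that contrast.

```python
alist = [[3, 2], [4, 0, -1], [8, 5]]
blist = alist[0]            # another name for the first inner list, not a copy

print(blist is alist[0])    # True: both names refer to one list object
blist.reverse()             # reverses that shared list in place
print(alist)                # [[2, 3], [4, 0, -1], [8, 5]]

# For contrast: reversing a copy leaves alist untouched.
clist = alist[1].copy()
clist.reverse()
print(alist[1])             # [4, 0, -1]
```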

When calling a function with keyword arguments, which of the following is not true? a) the keyword arguments have to be given in the same order as in the function definition b) a keyword argument is written in the calling code like an assignment (eg argname=value) c) the name of the keyword argument must be exactly the same as in the function definition d) if any argument is not given in the calling code, that argument gets its default value

A

When executing the Python code below, what is printed out?
    def fn(x, y):
        z = x*y
        print(z)
        return z
    a = 3
    b = 4
    c = 0
    print(a)
    print(b)
    print(c)
    c = fn(a,b)
    print(c)
a) 3 4 0 12 12 b) 3 4 0 12 0 c) 3 4 12 12 12 d) 12 3 4 0 12

A

Which approach is helpful to support fine-grained access control, where access to some parts of a dataset might be given to different people than access to other parts of the dataset? a) data access is mediated, through some software or application b) file system permissions determine access c) files are immutable, so access is always granted because the contents can't be damaged d) files are kept in a version management system

A

Which of the following cannot be altered by changing the format of a cell? a) the value of the cell b) the colour displayed in the cell c) the alignment used to display the cell d) the number of digits shown when the cell contains a number

A

Which of the following examples of a dataset would be described as coming from a sensor that measures some aspect of the world? a) an astronomy dataset with intensity of radio signal from different galaxies b) a marketing dataset recording which pages of a website users have visited c) a demography dataset with responses from the Australian census d) a political science dataset with indications of which party people plan to vote for

A

Which of the following is NOT likely to be a source of unfairness in the predictive model produced by a machine learning algorithm? a) a training algorithm that was coded to introduce discrimination into the model b) a training set with many more data items from one group of subjects compared to another group c) a training set where the labels were produced by biased people or in a biased society d) a data set that includes information about the protected attribute (eg race or gender)

A

Which of the following is not a feature of a chart? a) The meaning of the data b) Chart title c) Grid lines d) Legend

A

Which of the following is true about Python strings? a) a Python string can include any character, including Unicode, punctuation, and non-printing special characters b) a Python string includes letters, digits, or underscore (_), but the first character must be a letter c) a Python string can include any character except quotes d) a Python string can include letters and underscore, but not other characters such as punctuation or digits

A

Which of the following situations is NOT an issue regarding data quality? a) The volume of data is too large to accurately understand b) There are missing values in the data set c) The values in the dataset are inconsistent. E.g. sometimes spelling a word with American spelling, and sometimes using British spelling for the same word d) The data contains a number of default values

A

Which of these is NOT a goal of charting? a) Make trends difficult to interpret so that this will force the audience to carefully reflect on their understanding of the data b) Reveal the data at multiple levels of detail c) Make the data look interesting d) Abstract away the details and reveal trends

A

A file key is: a) the name of the file b) a column or combination of columns, which hold the most valuable information of the file c) a column or combination of columns, whose values are distinct among the rows of the file d) a row with missing entries that need to be found

C

Which of these is not a visual attribute of a mark on a chart? a) slope b) colour c) the value of the data d) x-position

C

What term is used for a logical file format, in which the same values occur in many rows, because they are directly related to another attribute that is in the rows (for example, the manufacturer of a device might be stored in every row which is an observation from that device)? a) denaturalised b) denormalised c) naturalised d) normalised

B

A content-based recommender system recommends new items: a) By finding items with content that is popular among all users b) By finding items with content similar to those the user has already liked c) By finding items that have new and unseen content d) By finding items that do not contain content the user does not like

B

A version control repository stores: a) All versions of a codebase and/or dataset that were written by any individual with access to the repository b) All versions of a codebase and/or dataset that were written and 'checked in' by any individual with access to the repository c) All versions of a codebase and/or dataset written and 'checked in' by one particular individual user d) All versions of a codebase and/or dataset that were written by one particular individual user

B

After executing the following code, what is printed?
    dict1 = { 5:6, 7:8, 3:10 }
    dict2 = dict1
    dict1[7] = 0
    dict2[7] = 2
    print(dict1[7])
a) 8 b) 2 c) 0 d) [8,0]

B

Collaborative filtering is when the system suggests items by: a) grouping items that are similar and making suggestions from a group if a user has already liked a number of items from that group b) looking at other users who are similar to you and recommending items they have liked c) recommending the items that are most popular, considering the ratings among all the users d) combining data across multiple platforms. E.g. combining a user's iTunes downloads and Spotify playlists

B

Consider the Python code below: x = 0 y = 2 if x < y then: print("Stage 1") z = y-3 if x < y-3: print("Stage 2") w = z print(w) This has some syntax errors. Which of the following is not a description of a syntax problem with this code? a) it is not valid to indent a statement (z = y-3) by more spaces than the previous statement in a case where the previous statement isn't some variety of control-flow b) it is not valid to have a statement (w = z) whose indentation is less than the statement before, but it doesn't line up with any previous statement's indentation c) an if statement should not have the word then before the colon (':') character d) It is not valid to have an if in the true-block of another if

B

Equalised odds is a fairness property of a binary classification where: a) the value of TP/(TP+FP) should be the same in each group of subjects defined by a protected characteristic b) the value of TP/(TP+FN) should be the same in each group of subjects defined by a protected characteristic, and also the value of FP/(FP+TN) should be the same in each group defined by a protected characteristic c) the value of TP/(TP+FP+FN+TN) should be the same for each group of subjects defined by a protected characteristic d) the value of (TP+TN)/(TP+FP+FN+TN) should be the same in each group of subjects defined by a protected characteristic

B
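
The two ratios in option (b) are the true positive rate and the false positive rate. A minimal sketch of the check, using invented confusion counts for two groups and a hypothetical helper `rates`, might look like this:

```python
import math

def rates(tp, fp, fn, tn):
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr

# Hypothetical counts for two groups defined by a protected characteristic.
group_a = rates(tp=40, fp=10, fn=10, tn=40)
group_b = rates(tp=20, fp=5, fn=5, tn=20)

# Equalised odds asks for both rates to match across the groups.
equalised = all(math.isclose(a, b) for a, b in zip(group_a, group_b))
print(group_a, group_b, equalised)
```

With real data the rates rarely match exactly, so in practice a tolerance or a statistical comparison would be needed; the exact-match check here is only for illustration.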

In ASCII encoding, how many bytes are used to store 1 character? a) 8 b) 1 c) the number of bytes varies depending on the character d) 2

B
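
This can be confirmed in Python by encoding a string and counting the bytes. The UTF-8 line is added purely for contrast and is not part of the question.

```python
text = "data"
ascii_bytes = text.encode("ascii")
print(len(text), len(ascii_bytes))    # 4 characters, 4 bytes

# For contrast: in UTF-8 the number of bytes depends on the character.
utf8_bytes = "\u00e9".encode("utf-8")  # the single character e-acute
print(len(utf8_bytes))                 # 2
```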

In Australia, which of the following is not a protected attribute, on which it is illegal to discriminate in education and employment? a) disability b) visa status c) gender d) race

B

Quantitative data is: a) Data that can only be compared for equality b) Data represented by a number c) Data which describes in words how much of something there is. E.g. Small, medium or large d) Data that can be compared by ordering but not for nearness

B

Suppose df is a Pandas dataframe, with 'temp' as one column. What Pandas expression gives the largest value of 'temp' among all the rows of df? a) df[temp].max() b) df['temp'].max() c) df.temp.max d) df.max('temp')

B

The term "over-fitting" refers to a situation where: a) the training process for a predictive model has taken a very long time to execute b) a predictive model performs very well in predicting for the training data, but not well in predicting test data or new cases c) a predictive model performs quite poorly in predicting for the training data, but it performs well on predicting for test data or new cases d) a predictive model has been trained so it can execute very quickly when deployed

B

What is printed when we execute the following Python code? word = 'Fred' print('word') a) 'word' b) word c) Fred d) 'Fred'

B

What is the output when we run the following program?
    dict = { 5:6, 7:8, 3:10 }
    for a in dict:
        print(a)
a) 6 8 10 b) 5 7 3 c) 5:6 7:8 3:10 d) { 5:6, 7:8, 3:10 }

B

What is the term for the marks along an axis of a chart, that show the scale used for that axis? a) title b) ticks c) caption d) legend

B

What is the value of clist after the following code: alist = [3, 2, 8, -1] blist = alist clist = blist alist[0] = 6 blist = alist[0:2] a) [3, 2] b) [6, 2, 8, -1] c) [6, 2] d) [3, 2, 8, -1]

B

What kind of data attribute is the name of a city? a) Visual b) Nominal c) Numeric d) Ordered

B

When a programmer is given a data analysis task to calculate some aggregate value, what should the code do when there are no items to aggregate? a) an error message should be printed in this situation b) it depends; the programmer should ask the users to see what they want done in this situation c) it depends; the code should do whatever the corresponding function does in Python, when called on an empty list d) the value 0 should be the output in this situation

B

When might data be lost, if it is kept in a file system? a) When the machine is shut down b) When a disk failure occurs c) When the program finishes running d) When a new file is created

B

Which of the following about default values is FALSE: a) Default values may be more common than nearby values, in your dataset b) Default values should be treated in the same way as all the other values for that attribute c) Often, it is sensible to treat a default value as a missing value d) You may not be able to detect default values

B

Which of the following is true about the value stored for a Python variable whose identifier is count? a) The value must be an integer; it can be positive, zero or negative b) Any value can be stored for this variable c) The value must be a number; it can be an integer or a float d) The value must be a non-negative integer (counting something)

B

Which of the following statements about sourcing information from sensors or surveys is FALSE? a) You can get missing values from surveys if someone leaves a question unanswered b) You cannot get an incorrect value from a sensor because the sensor is never wrong c) You can get incorrect values from surveys because the people taking them can lie d) You can get missing values from sensors if the equipment is faulty

B

Which of the following statements is true, about the values that occur in a dictionary? a) all the values must be immutable; you can't have a list or dictionary as a value b) each of the values must be associated with a key in the dictionary c) all the values must be distinct from one another d) each of the values must also occur as a key in the dictionary

B

Which of the following tasks is the closest fit for the pattern of filtered-aggregate, given a dataset with prices of various products in various stores? a) find which store has the most products available b) find the lowest price for products in the Broadway store c) find how many products are available, for each store d) find the average price of all products

B
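
The filtered-aggregate pattern in option (b) can be sketched as a filter followed by a single aggregate. The rows below are invented; only the store name "Broadway" comes from the question.

```python
# Hypothetical rows of (store, product, price).
rows = [
    ("Broadway", "kettle", 35.0),
    ("Broadway", "mug", 8.5),
    ("Newtown", "mug", 7.0),
    ("Newtown", "toaster", 49.0),
]

# Filter: keep only the Broadway rows. Aggregate: take the minimum price.
broadway_prices = [price for store, _, price in rows if store == "Broadway"]
lowest = min(broadway_prices)
print(lowest)    # 8.5
```

Note that `min` raises `ValueError` on an empty list, so code like this still needs a decision about what to do when no rows pass the filter.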

Given 1 byte of memory, which of the following is true: a) The smallest value you could represent with a signed integer representation is 0. b) The largest value you can store with a signed integer representation is 16. c) You can store a larger positive number with an unsigned representation than you can store with a signed integer representation. d) You can store more different values using unsigned integer representation, than you can store with signed integer representation

C

In 16 bit signed integer representation, the bit patterns that represent negative numbers are: a) those whose MSB (bit 15) is 0 b) those whose LSB (bit 0) is 0 c) those whose MSB (bit 15) is 1 d) those whose LSB (bit 0) is 1

C
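
A small sketch makes the role of bit 15 visible. It assumes two's complement, which is the usual signed representation but is not named in the question; the helper `as_signed_16` is made up for this example.

```python
def as_signed_16(pattern):
    """Interpret a 16-bit pattern (0..0xFFFF) as a two's complement integer."""
    return int.from_bytes(pattern.to_bytes(2, "big"), "big", signed=True)

for pattern in (0x0001, 0x7FFF, 0x8000, 0xFFFF):
    msb = pattern >> 15    # bit 15, the most significant bit
    print(f"{pattern:016b}  MSB={msb}  value={as_signed_16(pattern)}")
```

The two patterns with MSB 1 come out as -32768 and -1, while the two with MSB 0 are 1 and 32767.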

In the code below, which variables are called "formal parameters" of the function?
    def fn2(x, y):
        z = x - y + 1
        return z
    a = 3
    b = 4
    c = fn2(a,b)
a) a, b b) c c) x, y d) x, y, a, b

C

Integer overflow is: a) When small rounding errors accumulate resulting in large errors in the final calculation. b) When a signed integer is incorrectly interpreted as an unsigned integer. c) When the computer attempts to store a value that is either too large or too small to be represented with the given number of bits. d) When the computer attempts to store too many integers simultaneously.

C
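
Python's own integers grow as needed and do not overflow, so the effect has to be simulated. The sketch below assumes an 8-bit two's complement store that silently keeps the low 8 bits, which is one common behaviour of fixed-width integers but not the only possible one; `store_in_signed_8` is a hypothetical helper.

```python
def store_in_signed_8(value):
    """Simulate storing value in 8 signed bits by keeping only the low 8 bits."""
    low_bits = value & 0xFF
    return low_bits - 256 if low_bits >= 128 else low_bits

print(store_in_signed_8(127))     # fits: 127
print(store_in_signed_8(128))     # too large for 8 signed bits: -128
print(store_in_signed_8(-129))    # too small for 8 signed bits: 127
```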

Machine learning refers to: a) creating a computer system to answer multichoice exam questions b) creating a computer system that simulates a model of the domain as it develops through time c) creating a computer system that performs a task better when it is given more data d) creating a computer system based on asking experts how they do the task, and copying what they do

C

We want to print the message "Equal" in cases where x is equal to y, and not otherwise. Which Python code will do this? a) if x != y: print("Equal") b) if x <> y: print("Equal") c) if x == y: print("Equal") d) if x = y: print("Equal")

C

What is printed when the Python code below is executed?
    x = 3
    y = 4
    z = 5
    if x<y:
        print("Stage1")
    elif y<z:
        print("Stage2")
    else:
        print("Stage3")
    print("Stage4")
a) Stage1 Stage2 Stage3 Stage4 b) Stage1 Stage3 Stage4 c) Stage1 Stage4 d) Stage1 Stage2 Stage4

C

What is printed when the following code is run? alist=[3,4,-1,6,2] blist=[x*2 for x in alist if x<0] print(blist[0]) a) -1 b) 3 c) -2 d) 6

C

What is printed when we execute the following Python code?
    x='Fred'
    y="Fred"
    if x==y:
        print("equal")
    else:
        print("different")
a) this code causes an error when it is run b) fred c) equal d) different

C

What is the Python statement that, when executed inside the body of a loop, immediately stops the loop and transfers control to the lines after the whole loop? a) continue b) exit c) break d) else

C

What is the meaning when we say that a function has a side-effect? a) the function code body contains some assignment to a local variable b) after calling the function, execution will eventually continue in the calling code c) calling the function may lead to a change in some variable of the calling code, that is not explicitly modified in the calling code d) calling the function may lead to calculations being performed by the function code body

C
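
A short illustration of option (c), using a made-up function `add_total` that mutates the list passed to it:

```python
def add_total(values):
    """Append the sum of values to the list that was passed in."""
    values.append(sum(values))    # mutates the caller's list: a side effect

scores = [3, 4]
add_total(scores)    # the calling code never assigns to scores here
print(scores)        # [3, 4, 7]
```

The calling code contains no assignment to `scores`, yet its contents change, which is what makes this a side effect rather than an ordinary return value.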

What is the value of alist after the following code: alist = [3, -1, 6] alist.extend([7, 11]) a) [[7, 11], 3, -1, 6] b) [7, 11, 3, -1, 6] c) [3, -1, 6, 7, 11] d) [3, -1, 6, [7, 11]]

C

What term is used to describe a process of gathering data by browsing to web pages, taking values that are found there, and putting those values into a dataset? a) web saving b) web cleaning c) web scraping d) web quality control

C

When a return statement is encountered during execution of a function call, what happens next? a) execution continues with the following statement in the function code body b) execution continues with the first line of the function code body c) execution continues in the calling code d) the program execution is halted

C

When training machine learning models, supervised learning is when: a) There are a lot of specified constraints on your model. b) You closely monitor the training process, so you can control the process. c) The training data is labelled with the correct value d) You practice training the model on a very small section of the data.

C

Which of the following gives the best definition of metadata? a) Metadata is a table containing statistics relevant to the data. b) Metadata is a description of how the data was obtained. c) Metadata refers to any piece of information describing the data. d) Metadata is a copy of the data in a reduced format.

C

Which of the following is NOT an example of reciprocal recommendation? a) A company hiring an employee b) A person deciding who to date c) A person ordering food d) A person choosing their university degree

C

Which of the following is NOT metadata that would be recorded with a version in a repository? a) The author b) The date and time the version was submitted c) Which merge conflicts were resolved when the version was created d) The name or ID of the version

C

Which of the following is not a suitable way to write code so that it prints "different" in cases when x is not equal to y, but nothing is printed when x and y are the same? a) if x==y: pass else: print("different") b) if x!=y: print("different") c) if x==y: else: print("different") d) if x!=y: print("different") else: pass

C

Which of the following is not a usual expectation for professional conduct? a) respect privacy b) avoid harm c) do not work without payment d) work only when competent

C

Which of the following is not true about Pandas? a) Pandas operations can be quite efficient b) Pandas allows thinking at a high level, because it can perform complex operations in a single function c) Pandas is a programming language, that extends Python with extra features, and requires special software to run d) Pandas offers many ways to do the same thing

C

Which of the following statements is NOT true about hyper-parameters? a) hyper-parameters are needed when the algorithm of the training process is not completely specified b) hyper-parameters determine which predictive model is produced c) hyper-parameters are changed during the period when the predictive model is deployed d) hyper-parameters influence the way the training searches for a good predictive model

C

Consider the following Python code. Which of the variables have scope that is local to the function foo?
    def foo(x):
        a = x+b
        c = a*2
        return c
a) x,a,b,c are the local variables or formal parameters of foo b) x,b are the local variables or formal parameters of foo c) x is the only local variable of foo d) x,a,c are the local variables or formal parameters of foo

D
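
In CPython this can be checked by inspecting the compiled function. The sketch assumes a global `b = 10`, since the question leaves `b` undefined, and relies on the `__code__` attribute, which is an implementation detail rather than part of the course material.

```python
b = 10    # assumed value; b lives outside foo, so foo reads it as a global

def foo(x):
    a = x + b
    c = a * 2
    return c

print(foo.__code__.co_varnames)    # ('x', 'a', 'c'): parameters, then other locals
print(foo.__code__.co_names)       # ('b',): names looked up outside the function
print(foo(1))                      # 22
```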

Given a dataset with prices of various products in various stores, which of the following tasks is most similar in pattern to "find how many products are available, for each store in NSW"? a) how many products are available for less than $20 at some store in NSW b) print the average price of products in the "clothing" category which are available somewhere in NSW c) is there some product that is available in each store in NSW d) what is the highest price charged anywhere, for each product in the "homeware" category

D
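
Both tasks share the same shape: filter the rows, group them, then aggregate within each group. A minimal sketch of option (d), with invented rows, could be:

```python
# Hypothetical rows of (store, category, product, price).
rows = [
    ("Broadway", "homeware", "mug", 8.5),
    ("Newtown", "homeware", "mug", 7.0),
    ("Newtown", "homeware", "kettle", 35.0),
    ("Broadway", "clothing", "socks", 5.0),
]

highest = {}
for store, category, product, price in rows:
    if category != "homeware":    # filter
        continue
    # group by product, aggregate with max
    highest[product] = max(price, highest.get(product, price))

print(highest)    # {'mug': 8.5, 'kettle': 35.0}
```

Swapping the filter for "store is in NSW", the grouping key for the store, and `max` for a count gives the task named in the question.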

How many arguments does an Excel function have? a) exactly two b) exactly one c) one or more d) any number

D

If a cell contains the formula '=SQRT(-1)', which of the following errors will you get? a) #VALUE! b) #N/A! c) #NULL! d) #NUM!

D

If alist = [3,5,-1,6], what is another way to refer to alist[3]? a) end(alist) b) alist[-2] c) alist.last() d) alist[-1]

D

If cell A1 contains the text hello world, what would the formula =LEFT(A1,FIND("r",A1)) evaluate to? a) 9 b) hello wo c) 8 d) hello wor

D

In Pandas, an index refers to: a) the number of rows or columns b) the data type of all the entries in a column c) a dataframe where all the values are Boolean d) the labels used to access the rows or columns

D

Suppose dict is a dictionary. When would the assignment statement y = dict[x] cause an error? a) When x is already a key in the dictionary dict b) None of these causes an error c) When y is not already a value in the dictionary dict d) When x is not already a key in the dictionary dict e) When y is already a value in the dictionary dict

D
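
The contrast between reading and writing a missing key can be seen in a few lines. The dictionary `prices` and its contents are invented for the example.

```python
prices = {"mug": 8.5}         # hypothetical dictionary

prices["kettle"] = 35.0       # assigning to a new key simply adds the entry

try:
    y = prices["toaster"]     # reading a missing key raises KeyError
except KeyError as err:
    print("missing key:", err)

y = prices.get("toaster")     # .get returns None instead of raising
print(y)                      # None
```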

Suppose that the file foo.txt contains the 3 lines:
    hello
    friend
    study
What is printed when the following Python code is executed?
    for word in open("foo.txt"):
        print(word.rstrip("\n") + "!")
    print("silly")
a) hello! silly friend! silly study! silly b) hello friend study silly c) hello! friend! study! silly d) hello! friend! study! silly

D

What is the state after executing the following Python code? x = 1 y = 3 z = y - x x = 4 y = 7 w = z+1 a) x stores 4, y stores 7, z stores 3, w stores 4 b) w stores the value 3 c) w stores the value 3 d) x stores 4, y stores 7, z stores 2, w stores 3

D

What is true about a wide format for a structured file? a) it is not possible to have a structured file in a wide format; that only works for data inside a Python program b) there are one row and one column that hold related information in a list of tuples (for example, information about price in different years is shown by a column with a list of (year,price) pairs) c) there are several rows that hold related information that vary in a way indicated in some column (for example, information about price in different years is shown in several rows each containing the year as a column and the price as a column) d) there are several columns that hold related information that vary in a way indicated in the column name (for example, information about price in different years is shown by columns named as price_<yearvalue>)

D
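
To make the difference concrete, here is a small sketch that converts wide-format rows into the long format described in option (c). The products, years and prices are invented, and the `price_<year>` column naming follows the example in the question.

```python
# Hypothetical wide-format rows: one price column per year.
wide = [
    {"product": "mug", "price_2021": 7.0, "price_2022": 8.5},
    {"product": "kettle", "price_2021": 30.0, "price_2022": 35.0},
]

# The same information in long format: one row per (product, year).
long_rows = []
for row in wide:
    for column, value in row.items():
        if column.startswith("price_"):
            year = int(column[len("price_"):])
            long_rows.append({"product": row["product"], "year": year, "price": value})

for r in long_rows:
    print(r)
```

Two wide rows with two price columns each become four long rows.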

When creating a chart in Excel, what is true about the axis ticks and labels? a) Excel provides axis ticks and labels, which cannot be altered by the user b) Charts always have blank axes (no ticks, no label) c) Excel starts with blank axes (no ticks or label), but the user can provide these aspects if they wish d) Excel provides default choices for axis ticks and labels; but the user can alter these if they wish

D

When creating a model, we have both a training set and a test set. Which of the following is TRUE? a) The accuracy of our model on the training set is an excellent indicator for how well our model will do on the test set. b) Both the training and test set are used to train the model, then at the end, we use the test set to assess the accuracy of the model c) The training set is labelled data with the correct values of the output, and the test set is unlabelled, where we do not know the correct values of the output variable. d) The test set is completely separate from the training set and is used to assess the accuracy of the model

D

Which of the following statements is TRUE? a) Regression analysis should not be used unless the output depends linearly on the input variables b) The output of a regression model is to predict a nominal (discrete) variable c) A regression model should not be used unless all the attributes are quantitative d) The output of a regression model is to predict a quantitative (numerical) variable.

D

Which of the following statements is true about the keys in a dictionary? a) the keys must be simple and unstructured; so no key can be a tuple or list b) the keys must all be the same type as one another c) the keys must all be integers d) the keys must be immutable; so no key can be a list or dictionary

D
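
A quick check of this rule, using a made-up dictionary `grid`:

```python
grid = {}
grid[(2, 3)] = "tuple key works"    # tuples are immutable, so they can be keys

try:
    grid[[2, 3]] = "list key"       # lists are mutable and unhashable
except TypeError as err:
    print("TypeError:", err)

print(grid)    # {(2, 3): 'tuple key works'}
```

Strictly, Python requires keys to be hashable, so a tuple that itself contains a list is rejected as well.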

Why is succession planning important, in connection to persistent data? a) Several system failures may happen in a short period of time b) Cloud providers may increase their charges c) Hardware is expensive while people are not d) Storage/backup processes depend on particular people, who may leave the project

D

Which statement does not mutate a Python list? a) alist b) alist.reverse() c) alist.sort() d) alist.pop() e) alist.copy()

E

