MIS 307 Final Part 1


An object's attributes are references to objects of other classes. Embedding references to objects of other classes is a form of software reusability known as ________ and is sometimes referred to as the ________ relationship. inheritance, "is a" composition, "has a" inheritance, "has a" composition, "is a"

composition, "has a"
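
A minimal sketch of the "has a" relationship. The Address and Customer classes are hypothetical and exist only to illustrate the idea: a Customer object's attributes are references to objects of other classes.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Address:
    """Hypothetical class used only for illustration."""
    city: str
    country: str


@dataclass
class Customer:
    """A Customer "has an" Address and "has a" Decimal balance."""
    name: str
    address: Address   # reference to an object of another class
    balance: Decimal   # reference to a Decimal object


customer = Customer('Ann Lee', Address('Austin', 'USA'), Decimal('10.00'))
print(customer.address.city)  # reach the embedded object through the attribute
```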

To create a Decimal object, we can write: value = Decimal('3.14159'). This is known as a(n) ________ expression because it builds and initializes an object of the class. builder constructor assembler maker

constructor

In the field of NLP, a text collection is generally known as a ________. corpus compilation volume book

corpus

A goal when designing a relational database is to minimize data ________ among the tables. dependency duplication binding None of the above

duplication

Each new class you create becomes a new data type that can be used to create objects. This is one reason why Python is said to be a(n) ________ language. comprehensive extensible None of the above dynamic typing

extensible

Which of the following statements a), b) or c) is false? One of the most common and valuable NLP tasks is sentiment analysis, which determines whether text is positive, neutral or negative. A sentence that contains the word "good" has positive sentiment. All of the above statements are true. Companies might use sentiment analysis to determine whether people are speaking positively or negatively online about their products.

A sentence that contains the word "good" has positive sentiment.

Which of the following statements a), b) or c) is false? All of the above statements are true. When you call a method for a specific object, Python implicitly passes a reference to that object as the method's first argument, so all methods of a class must specify at least one parameter. When an object of a class is created, it does not yet have any attributes. They're added dynamically via assignments, typically of the form self.attribute_name = value. All methods must have a first parameter self; a class's methods must use that reference to access the object's attributes and other methods.

All methods must have a first parameter self; a class's methods must use that reference to access the object's attributes and other methods.
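
One way to see why that statement is too strong, sketched with a hypothetical Temperature class: an instance method receives the object reference implicitly as its first argument, but a static method takes no object reference at all, and the name self is only a convention.

```python
class Temperature:
    """Hypothetical class used only for illustration."""

    def __init__(self, celsius):
        self.celsius = celsius  # attribute added dynamically via self

    def to_fahrenheit(self):
        # instance method: Python passes the object as the first argument
        return self.celsius * 9 / 5 + 32

    @staticmethod
    def freezing_point():
        # static method: no first parameter for the object
        return 0


t = Temperature(100)
implicit = t.to_fahrenheit()             # object passed implicitly
explicit = Temperature.to_fahrenheit(t)  # same call with the object passed explicitly
```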

Given the following Word object: In [1]: from textblob import Word In [2]: happy = Word('happy') which of the following statements a), b) or c) is false? The TextBlob library uses the NLTK library's WordNet interface, enabling you to look up word definitions, and get synonyms and antonyms. The Word class's define method enables you to pass a part of speech as an argument so you can get definitions matching only that part of speech. All of the above statements are true. The Word class's definitions property returns a list of all the word's definitions in the WordNet database: In [3]: happy.definitions Out[3]: ['enjoying or showing or marked by joy or pleasure', 'marked by good fortune', 'eagerly disposed to act or to be of service', 'well expressed and to the point']

All of the above statements are true.

Which of the following statements a), b) or c) is false? A text's meaning can be influenced by its context and the reader's "world view." Nuances of meaning make natural language understanding difficult. All of the above statements are true. Natural language lacks mathematical precision.

All of the above statements are true.

Which of the following statements a), b) or c) is false? All of the above statements are true. Like TSNE, a PCA estimator uses the keyword argument n_components to specify the number of dimensions, as in: from sklearn.decomposition import PCA; pca = PCA(n_components=2, random_state=11) The following snippet trains the PCA estimator and produces the reduced data by calling the PCA estimator's fit and transform methods: pca.fit(iris.data); iris_pca = pca.transform(iris.data) The PCA estimator (from the sklearn.decomposition module), like TSNE, performs dimensionality reduction. The PCA estimator uses an algorithm called principal component analysis to analyze a dataset's features and reduce them to the specified number of dimensions.

All of the above statements are true.

Which of the following statements a), b) or c) is false? All of the above statements are true. Much of today's data is so large that it cannot fit on one system. You can configure a multi-node Hadoop cluster using the Microsoft Azure HDInsight cloud service, then use it to execute a Hadoop MapReduce job implemented in Python. As big data grew, we needed distributed data storage and parallel processing capabilities to process vast amounts of data more efficiently. This led to complex technologies like Apache Hadoop for distributed data processing with massive parallelism among clusters of computers where the intricate details are handled for you automatically and correctly.

All of the above statements are true.

Which of the following statements a), b) or c) is false? All of the above statements are true. The "secret sauce" of machine learning is data, and lots of it. We can make machines learn. With machine learning, rather than programming expertise into our applications, we program them to learn from data.

All of the above statements are true.

Which of the following statements a), b) or c) is false? All of the above statements are true. The Iris dataset is referred to as a "toy dataset" because it has only 150 samples and four features. The dataset describes 50 samples for each of three Iris flower species—Iris setosa, Iris versicolor and Iris virginica. The Iris dataset bundled with scikit-learn is commonly analyzed with both classification and clustering. Although the Iris dataset is labeled, we can ignore those labels to demonstrate clustering. Then, we can use the labels to determine how well the k-means algorithm clusters the samples.

All of the above statements are true.

Which of the following statements a), b) or c) is false? All of the above statements are true. The samples in the Iris dataset each have four features. We cannot graph one feature against the other three in a single graph. But we can plot pairs of features against one another in a pairplot. One way to learn more about your data is to see how the features relate to one another.

All of the above statements are true.

Which of the following statements a), b) or c) is false? All of the above statements are true. You can get a Word's synsets (that is, its sets of synonyms) via the synsets property. The result of applying this property to a Word is a list of Synset objects: In [4]: happy.synsets Out[4]: [Synset('happy.a.01'), Synset('felicitous.s.02'), Synset('glad.s.02'), Synset('happy.s.04')] Each Synset represents a group of synonyms. In the code in Part (a) above, the notation 'happy.a.01': • happy is the original Word's lemmatized form (in this case, it's the same). • a is the part of speech, which can be a for adjective, n for noun, v for verb, r for adverb or s for adjective satellite. • 01 is a 0-based index number. Many words have multiple meanings, and this is the index number of the corresponding meaning in the WordNet database. There's also a get_synsets method that enables you to pass a part of speech as an argument so you can get Synsets matching only that part of speech.

All of the above statements are true.

Which of the following statements a), b) or c) is false? All of the above statements are true. You can merge data from multiple tables, referred to as joining the tables, with INNER JOIN. The INNER JOIN's ON clause uses a primary-key column in one table and a foreign-key column in the other table to determine which rows to merge from each table. Qualified name syntax (tableName.columnName) is required if the columns have the same name in both tables.

All of the above statements are true.
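
A minimal sketch of an INNER JOIN using Python's built-in sqlite3 module. The table names, column names and rows are made up for illustration. Both tables have an id column, so the query uses qualified names (tableName.columnName) in the ON clause.

```python
import sqlite3

# In-memory database with two hypothetical tables that share an id column.
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, last TEXT);
    CREATE TABLE author_isbn (
        id INTEGER,
        isbn TEXT,
        FOREIGN KEY (id) REFERENCES authors (id)
    );
    INSERT INTO authors VALUES (1, 'Green');
    INSERT INTO authors VALUES (2, 'Blue');
    INSERT INTO author_isbn VALUES (1, 'isbn-A');
    INSERT INTO author_isbn VALUES (2, 'isbn-B');
""")

# The ON clause matches the primary key in authors to the foreign key
# in author_isbn; qualified names disambiguate the two id columns.
rows = cursor.execute("""
    SELECT authors.last, author_isbn.isbn
    FROM authors
    INNER JOIN author_isbn ON authors.id = author_isbn.id
    ORDER BY authors.last
""").fetchall()
connection.close()
print(rows)
```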

Which of the following statements a), b) or c) is false? Although the Iris dataset is labeled, we can ignore those labels to demonstrate clustering. Then, we can use the labels to determine how well the k-means algorithm clusters the samples. The Iris dataset bundled with scikit-learn is commonly analyzed with both classification and clustering. The Iris dataset is referred to as a "toy dataset" because it has only 150 samples and four features. The dataset describes 50 samples for each of three Iris flower species—Iris setosa, Iris versicolor and Iris virginica. All of the above statements are true.

All of the above statements are true.

Which of the following statements a), b) or c) is false? An important use of POS tagging is determining a word's meaning among its possibly many meanings; this is important for helping computers "understand" natural language. All of the above statements are true. Parts-of-speech (POS) tagging is the process of evaluating words based on their context to determine each word's part of speech. There are eight primary English parts of speech: nouns, pronouns, verbs, adjectives, adverbs, prepositions, conjunctions and interjections (words that express emotion and that are typically followed by punctuation, like "Yes!" or "Ha!").

All of the above statements are true.

Which of the following statements a), b) or c) is false? Different database users are often interested in different data and different relationships among the data. You use Structured Query Language (SQL) to define queries. Queries specify which subsets of the data to select from a table. All of the above statements are true. Most users require only subsets of a database table's rows and columns.

All of the above statements are true.

Which of the following statements a), b) or c) is false? If predicted and expected are arrays containing the predictions and expected target values, respectively, evaluating the following code snippets in IPython interactive mode displays the predicted and expected target values for the first 20 test samples: predicted[:20] and expected[:20] All of the above statements are true. If predicted and expected are arrays containing the predictions and expected target values, respectively, the following list comprehension locates all the incorrect predictions for the entire test set, that is, the cases in which the predicted and expected values do not match: wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e] Once we've loaded our data into a KNeighborsClassifier, we can use it with the test samples to make predictions. Calling the estimator's predict method with the test samples (X_test) as an argument returns an array containing the predicted class of each sample: predicted = knn.predict(X=X_test)

All of the above statements are true.
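
The list comprehension from this card can be tried without scikit-learn. In the sketch below, predicted and expected are small made-up lists standing in for the arrays an estimator would produce.

```python
# Made-up class labels standing in for an estimator's output.
predicted = [0, 2, 1, 1, 2]
expected = [0, 1, 1, 1, 0]

# Keep only the (predicted, expected) pairs that disagree.
wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]

# Fraction of samples predicted correctly.
accuracy = 1 - len(wrong) / len(expected)
print(wrong, accuracy)
```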

Which of the following statements a), b) or c) is false? Some examples of time series are daily closing stock prices, hourly temperature readings, the changing positions of a plane in flight, annual crop yields, quarterly company profits, and the stream of time-stamped tweets coming from Twitter users worldwide. You can use simple linear regression to make predictions from time series data. All of the above statements are true. Time series are sequences of values called observations associated with points in time.

All of the above statements are true.

Which of the following statements a), b) or c) is false? The LinearRegression estimator is in the sklearn.linear_model module. By default, LinearRegression uses all the numeric features in a dataset, performing a multiple linear regression. Simple linear regression uses one feature as the independent variable. All of the above statements are true.

All of the above statements are true.

Which of the following statements a), b) or c) is false? The LinearRegression estimator is in the sklearn.linear_model module. Simple linear regression uses one feature as the independent variable. All of the above statements are true. By default, LinearRegression uses all the numeric features in a dataset, performing a multiple linear regression.

All of the above statements are true.

Which of the following statements a), b) or c) is false? The following code creates a KMeans object: from sklearn.cluster import KMeans; kmeans = KMeans(n_clusters=3, random_state=11) The keyword argument n_clusters specifies the k-means clustering algorithm's hyperparameter k (in this case, 3), which KMeans requires to calculate the clusters and label each sample. The default value for n_clusters is 8. We can use k-means clustering via scikit-learn's KMeans estimator (from the sklearn.cluster module) to place each sample in a dataset into a cluster. The KMeans estimator hides from you the algorithm's complex mathematical details, making it straightforward to use. All of the above statements are true.

All of the above statements are true.

Which of the following statements a), b) or c) is false? The following code loads a mask image by using the imread function from the imageio module that comes with Anaconda: import imageio; mask_image = imageio.imread('mask_heart.png') This function returns the image as a NumPy array, which is required by WordCloud. To create a word cloud of a given shape, you can initialize a WordCloud object with an image known as a mask. The WordCloud fills non-white areas of the mask image with text. You can use the open source wordcloud module's WordCloud class to generate word clouds with just a few lines of code. By default, wordcloud creates rectangular word clouds, but the library can create word clouds with arbitrary shapes. All of the above statements are true.

All of the above statements are true.

Which of the following statements a), b) or c) is false? The steps mentioned in Part (a) can be performed separately with the TSNE methods fit and transform, or they can be performed in one statement using the fit_transform method, as in:In [5]: reduced_data = tsne.fit_transform(digits.data) All of the above statements are true. Dimensionality reduction in scikit-learn typically involves two steps-training the estimator with the dataset, then using the estimator to transform the data into the specified number of dimensions. TSNE's fit_transform method takes some time to train the estimator then perform the reduction. When the method completes its task, it returns an array with the same number of rows as digits.data, but only the number of columns specified by the n_components argument when you created the estimator object. You can confirm this by checking reduced_data's shape.

All of the above statements are true.

Which of the following statements a), b) or c) is false? You can create your own custom classes. You'll use lots of classes created by other people. All of the above statements are true. Core technologies of object-oriented programming are classes, objects, inheritance and polymorphism.

All of the above statements are true.

Which of the following statements a), b) or c) is false? You can perform hyperparameter tuning to try to determine the optimal value for k. For classification estimators, the score method returns the prediction accuracy for the test data. Each estimator has a score method that returns an indication of how well the estimator performs for the test data you pass as arguments. All of the above statements are true.

All of the above statements are true.

Which of the following statements about code snippets that use class Account is false? The following expressions access an Account object's name and balance attributes: account1.name and account1.balance The following code uses a constructor expression to create an Account object and initialize it with an account holder's name (a string) and balance (a Decimal): account1 = Account('John Green', Decimal('50.00')) The following snippets deposit an amount into an Account and access the new balance: account1.deposit(Decimal('25.53')); account1.balance All of the above statements are true.

All of the above statements are true.
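
The card does not show the Account class itself. The sketch below is one plausible minimal definition, written only so the snippets above can run; the validation rules are assumptions, not the course's actual code.

```python
from decimal import Decimal


class Account:
    """Hypothetical Account class that maintains a name and a balance."""

    def __init__(self, name, balance):
        # Assumed rule: reject a negative opening balance.
        if balance < Decimal('0.00'):
            raise ValueError('Initial balance must be >= 0.00.')
        self.name = name
        self.balance = balance

    def deposit(self, amount):
        # Assumed rule: reject a negative deposit amount.
        if amount < Decimal('0.00'):
            raise ValueError('Deposit amount must be positive.')
        self.balance += amount


account1 = Account('John Green', Decimal('50.00'))  # constructor expression
account1.deposit(Decimal('25.53'))
print(account1.name, account1.balance)
```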

Which of the following statements is false? Once your models are trained, you put them to work making predictions based on data they have not seen. With machine learning, your computer will take on characteristics of intelligence. Although you can specify parameters to customize scikit-learn models and possibly improve their performance, if you use the models' default parameters for simplicity, you'll generally obtain mediocre results. With scikit-learn, you train each model on a subset of your data, then test each model on the rest to see how well your model works.

Although you can specify parameters to customize scikit-learn models and possibly improve their performance, if you use the models' default parameters for simplicity, you'll generally obtain mediocre results.

Which of the following statements is false? Scikit-learn supports many classification algorithms, including the simplest-k-nearest neighbors (k-NN). In the k-nearest neighbors algorithm, the class with the most "votes" wins. The k-nearest neighbors algorithm attempts to predict a test sample's class by looking at the k training samples that are nearest (in distance) to the test sample. Always pick an even value of k for the k-nearest neighbors algorithm.

Always pick an even value of k for the k-nearest neighbors algorithm.
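
A small pure-Python sketch of the voting step, using made-up 2D points and a hypothetical knn_predict helper. It shows why an even k is a poor default: with two classes, an even k can produce a tied vote, while an odd k cannot.

```python
import math
from collections import Counter


def knn_predict(train, sample, k):
    """Return (winning label, vote counts) among the k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], votes


# Made-up training data: (features, class label)
train = [
    ((1.0, 1.0), 'a'),
    ((1.5, 1.0), 'a'),
    ((2.0, 2.0), 'b'),
    ((5.0, 5.0), 'b'),
]
sample = (1.6, 1.6)

_, votes_k2 = knn_predict(train, sample, k=2)      # one vote each: a tie
label_k3, votes_k3 = knn_predict(train, sample, k=3)  # odd k breaks the tie
print(votes_k2, label_k3)
```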

Which of the following statements a), b) or c) is false? All of the above statements are true. An error occurs if any of the features passed to a LinearRegression estimator for training are categorical rather than numeric. If a dataset contains categorical data, you must exclude the categorical features from the training process. A benefit of working with scikit-learn's bundled datasets is that they're already in the correct format for machine learning using scikit-learn's models. By default, a LinearRegression estimator uses all the features in the dataset's data array to perform a multiple linear regression.

An error occurs if any of the features passed to a LinearRegression estimator for training are categorical rather than numeric. If a dataset contains categorical data, you must exclude the categorical features from the training process.

Which of the following statements a), b) or c) is false? All of the above statements are true. To calculate an estimator's R2 score, use the sklearn.metrics module's r2_score function with the arrays representing the expected and predicted results, as in: In [44]: from sklearn import metrics In [45]: metrics.r2_score(expected, predicted) Out[45]: 0.6008983115964333 Among the many metrics for regression estimators is the model's coefficient of determination, which is also called the R2 score. R2 scores range from 0.0 to 1.0 with 1.0 being the best. An R2 score of 1.0 indicates that the estimator perfectly predicts the independent variable's value, given the dependent variable(s) value(s). An R2 score of 0.0 indicates the model cannot make predictions with any accuracy, based on the independent variables' values.

R2 scores range from 0.0 to 1.0 with 1.0 being the best. An R2 score of 1.0 indicates that the estimator perfectly predicts the independent variable's value, given the dependent variable(s) value(s). An R2 score of 0.0 indicates the model cannot make predictions with any accuracy, based on the independent variables' values.
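
Two things make that statement false: it swaps the roles of the dependent and independent variables, and the score is not bounded below by 0.0. The sketch below computes the coefficient of determination by hand with a hypothetical r2 helper (1 minus the residual sum of squares over the total sum of squares) on made-up values.

```python
def r2(expected, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(expected) / len(expected)
    ss_res = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    ss_tot = sum((e - mean) ** 2 for e in expected)
    return 1 - ss_res / ss_tot


expected = [1.0, 2.0, 3.0]

perfect = r2(expected, [1.0, 2.0, 3.0])     # predictions match exactly
mean_only = r2(expected, [2.0, 2.0, 2.0])   # always predicts the mean
reversed_ = r2(expected, [3.0, 2.0, 1.0])   # worse than predicting the mean
print(perfect, mean_only, reversed_)
```

A model that does worse than always predicting the mean of the expected values gets a negative score.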

Which of the following statements is false? K-means clustering works through the data attempting to divide it into that many clusters. Given that it's tedious and error-prone for humans to have to assign labels to unlabeled data, and given that the vast majority of the world's data is unlabeled, unsupervised machine learning is an important tool. K-means clustering can find similarities in unlabeled data. This can ultimately help with assigning labels to that data so that supervised learning estimators can then process it. As with many machine learning algorithms, k-means clustering is recursive and gradually zeros in on the clusters to match the number you specify.

As with many machine learning algorithms, k-means clustering is recursive and gradually zeros in on the clusters to match the number you specify.

Consider the following code: In [18]: blob Out[18]: TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.") In [19]: blob.sentiment Out[19]: Sentiment(polarity=0.07500000000000007, subjectivity=0.8333333333333333) Which of the following statements is false? The subjectivity is a value from 0.0 (objective) to 1.0 (subjective). Based on the values for this TextBlob, the overall sentiment is close to neutral, and the text is mostly objective. A TextBlob's sentiment property returns a Sentiment object indicating whether the text is positive or negative and whether it's objective or subjective. The polarity indicates sentiment with a value from -1.0 (negative) to 1.0 (positive) with 0.0 being neutral.

Based on the values for this TextBlob, the overall sentiment is close to neutral, and the text is mostly objective.

The following code creates and configures a WordCloud object: from wordcloud import WordCloud; wordcloud = WordCloud(colormap='prism', mask=mask_image, background_color='white') Which of the following statements is false? For a mask image, the WordCloud size is the image's size. WordCloud uses Matplotlib under the hood. WordCloud assigns random colors from a color map. You can supply the colormap keyword argument and use one of Matplotlib's named color maps. By default, the word is drawn on a white background. The default WordCloud width and height in pixels is 400×200, unless you specify width and height keyword arguments or a mask image. The mask keyword argument specifies the mask image to use.

By default, the word is drawn on a white background.

Which of the following statements a), b) or c) is false? You can contribute your custom classes to the Python open-source community, but you are not obligated to do so. Organizations often have policies and procedures related to open-sourcing code. All of the above statements are true. Most applications you'll build for your own use will commonly use either no custom classes or just a few. Classes are new function types.

Classes are new function types.

Which of the following statements is false? Each class must provide a descriptive docstring in the line or lines immediately following the class header. To view any class's docstring in IPython, type the class name and a question mark, then press Enter. Every statement in a class's suite is indented. A class definition begins with the keyword class followed by the class's name and a colon (:). This line is called the class header. The Style Guide for Python Code recommends that you begin each word in a multi-word class name with an uppercase letter (e.g., CommissionEmployee).

Each class must provide a descriptive docstring in the line or lines immediately following the class header. To view any class's docstring in IPython, type the class name and a question mark, then press Enter.

Which of the following statements a), b) or c) is false? If data has closely correlated features, some could be eliminated via dimensionality reduction to improve the training performance. It's difficult for humans to think about data with large numbers of dimensions. This is called the curse of dimensionality. Eliminating features with dimensionality reduction improves the accuracy of the model. All of the above statements are true.

Eliminating features with dimensionality reduction improves the accuracy of the model.

In the context of the California Housing dataset, which of the following statements is false? The following code creates a LinearRegression estimator and invokes its fit method to train the estimator using X_train (the samples) and y_train (the targets): from sklearn.linear_model import LinearRegression; linear_regression = LinearRegression(); linear_regression.fit(X=X_train, y=y_train) For positive coefficients, the median house value increases as the feature value increases. For negative coefficients, the median house value decreases as the feature value decreases. Multiple linear regression produces separate coefficients for each feature (stored in coef_) in the dataset and one intercept (stored in intercept_). You can use the coefficient and intercept values with the following equation to make predictions: y = m_1 x_1 + m_2 x_2 + ... + m_n x_n + b where • m_1, m_2, ..., m_n are the feature coefficients • b is the intercept • x_1, x_2, ..., x_n are the feature values (that is, the values of the independent variables) • y is the predicted value (that is, the dependent variable)

For positive coefficients, the median house value increases as the feature value increases. For negative coefficients, the median house value decreases as the feature value decreases.
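
The prediction equation y = m_1 x_1 + ... + m_n x_n + b can be checked by hand. The sketch below uses a hypothetical predict helper with made-up coefficients, not values fitted to the California Housing data, to show why the second sentence is the false part: with a negative coefficient, the prediction decreases as the feature value increases.

```python
def predict(coefficients, intercept, features):
    """Evaluate y = m_1*x_1 + m_2*x_2 + ... + m_n*x_n + b."""
    return sum(m * x for m, x in zip(coefficients, features)) + intercept


coefficients = [2.0, -3.0]  # made-up values: one positive, one negative
intercept = 1.0

baseline = predict(coefficients, intercept, [1.0, 1.0])
more_x1 = predict(coefficients, intercept, [2.0, 1.0])  # raise the positive-coefficient feature
more_x2 = predict(coefficients, intercept, [1.0, 2.0])  # raise the negative-coefficient feature
print(baseline, more_x1, more_x2)
```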

Which of the following statements a), b) or c) is false? In the Iris dataset, the first 50 samples are Iris setosa, the next 50 are Iris versicolor, and the last 50 are Iris virginica. Because the Iris dataset is labeled, we can look at its target array values to get a sense of how well the k-means algorithm clustered the samples for the three Iris species. All of the above statements are true. If the KMeans estimator chose the Iris dataset clusters perfectly, then each group of 50 elements in the estimator's labels_ array should have mostly the same label.

If the KMeans estimator chose the Iris dataset clusters perfectly, then each group of 50 elements in the estimator's labels_ array should have mostly the same label.

Which of the following statements a), b) or c) is false? In Python, as in other major object-oriented programming languages, you can implement polymorphism only via inheritance. All of the above statements are true. With polymorphism, you simply send the same method call to objects possibly of many different types. Each object responds by "doing the right thing" for objects of its type. So the same method call takes on many forms, hence the term "poly-morphism." Polymorphism enables you to conveniently program "in the general" rather than "in the specific."

In Python, as in other major object-oriented programming languages, you can implement polymorphism only via inheritance.
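
A short sketch of why that statement is false for Python: duck typing lets unrelated classes respond to the same method call without sharing a base class. The Duck and Robot classes are hypothetical.

```python
class Duck:
    def speak(self):
        return 'Quack'


class Robot:
    # No common base class with Duck (other than object).
    def speak(self):
        return 'Beep'


# The same method call is sent to objects of different, unrelated types.
sounds = [thing.speak() for thing in (Duck(), Robot())]
print(sounds)
```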

Which of the following statements a), b) or c) is false? In the Digits dataset, every sample has 64 features (and a target value), so there is no way to visualize the dataset. All of the above statements are true. Using Matplotlib, Seaborn and other visualization libraries, you can plot datasets with two or three variables using 2D and 3D visualizations, respectively. Unsupervised machine learning and visualization can help you get to know your data by finding patterns and relationships among unlabeled samples.

In the Digits dataset, every sample has 64 features (and a target value), so there is no way to visualize the dataset.

Which of the following statements a), b) or c) is false? Building a new object from even a large class is simple-you typically write one statement. All of the above statements are true. Everything in Python is an object. Just as houses are built from blueprints, classes are built from objects-one of the core technologies of object-oriented programming.

Just as houses are built from blueprints, classes are built from objects-one of the core technologies of object-oriented programming.

Which of the following statements a), b) or c) is false? Each database management system that has Python support typically provides a module that adheres to Python's Database Application Programming Interface (DB-API), which specifies common object and method names for manipulating any database. The open-source SQLite database management system is included with Python. Only the SQLite database management system has Python support. All of the above statements are true.

Only the SQLite database management system has Python support.

Which of the following statements is false? You'll manipulate relational databases via Structured Query Language (SQL). Databases are critical big-data infrastructure for storing and manipulating the massive amounts of data we're creating. Databases are critical for securely and confidentially maintaining that data, especially in the context of ever-stricter privacy laws such as HIPAA (Health Insurance Portability and Accountability Act) in the United States and GDPR (General Data Protection Regulation) for the European Union. Relational databases store unstructured data in tables with a fixed-size number of columns per row.

Relational databases store unstructured data in tables with a fixed-size number of columns per row.

Which of the following statements a), b) or c) is false? All of the above statements are true. Relational databases are geared to the unstructured and semi-structured data in big-data applications. Most data produced today is unstructured data, like the content of Facebook posts and Twitter tweets, or semi-structured data like JSON and XML documents. Twitter processes each tweet's contents into a semi-structured JSON document with lots of metadata.

Relational databases are geared to the unstructured and semi-structured data in big-data applications.

Which of the following statements is false? SQL can be used only to retrieve data from a relational database. The pandas method read_sql uses a Cursor behind the scenes to execute queries and access the rows of the results. The SQL keywords INSERT INTO are followed by the table in which to insert the new row and a comma-separated list of column names in parentheses. The INSERT INTO statement inserts a row into a table.

SQL can be used only to retrieve data from a relational database.

Which of the following statements a), b) or c) is false? All the above statements are true. Seaborn and Matplotlib auto-scale the axes, based on the data's range of values. Seaborn function regplot's x and y keyword arguments are two-dimensional arrays of the same length representing the x-y coordinate pairs to plot. Pandas automatically creates attributes for each column name if the name can be a valid Python identifier.

Seaborn function regplot's x and y keyword arguments are two-dimensional arrays of the same length representing the x-y coordinate pairs to plot.

Which of the following statements a), b) or c) is false? You can use Spark SQL to query data stored in a Spark DataFrame which, unlike pandas DataFrames, may contain data distributed over many computers in a cluster. Spark was developed to perform certain big-data tasks more efficiently by breaking them into pieces that do lots of disk I/O across many computers. Spark streaming processes streaming data in mini-batches. Spark streaming gathers data for a short time interval you specify, then gives you that batch of data to process. As big-data processing needs grow, the information-technology community is continually looking for ways to increase performance.

Spark was developed to perform certain big-data tasks more efficiently by breaking them into pieces that do lots of disk I/O across many computers.

Which of the following statements is false? Table Query Language is used almost universally with relational database systems to manipulate data and perform queries, which request information that satisfies given criteria. Relational database management systems (RDBMSs) store data in tables and define relationships among the tables. A database is an integrated collection of data. Database management systems allow for convenient access and storage of data without concern for the internal representation of databases.

Table Query Language is used almost universally with relational database systems to manipulate data and perform queries, which request information that satisfies given criteria.

Which of the following statements is false? A relational database is a logical table-based representation of data that allows the data to be accessed without consideration of its physical structure. The following diagram shows a sample Employee table that might be used in a personnel system: Tables are composed of columns, each describing a single entity. In Part (b)'s Employee table, each column represents one employee. Columns are composed of rows containing individual attribute values. Part (b)'s Employee table's primary purpose is to store employees' attributes.

Tables are composed of columns, each describing a single entity. In Part (b)'s Employee table, each column represents one employee. Columns are composed of rows containing individual attribute values.

Which of the following statements is false? Sentences, Words and TextBlobs inherit from BaseBlob, so they have many common methods and properties. TextBlob, Sentences and Words cannot be compared with strings. The following code creates a TextBlob containing two sentences:
from textblob import TextBlob
text = 'Today is a beautiful day. Tomorrow looks like bad weather.'
blob = TextBlob(text)
TextBlob is the fundamental class for NLP with the textblob module.

TextBlob, Sentences and Words cannot be compared with strings.

Which of the following statements is false? The UPDATE keyword is followed by the table to update, the keyword SET and a comma-separated list of column_name:value pairs indicating the columns to change and their new values. An UPDATE statement modifies existing values in a table. An UPDATE's change will be applied to every row if you do not specify a WHERE clause. To make a change to only one row, it's best to use the row's unique primary key in the WHERE clause. For statements that modify the database, the Cursor object's rowcount attribute contains an integer value representing the number of rows that were modified. If this value is 0, no changes were made.

The UPDATE keyword is followed by the table to update, the keyword SET and a comma-separated list of column_name:value pairs indicating the columns to change and their new values.
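The correct form uses column_name = value pairs after SET. As a minimal sketch, the following uses Python's built-in sqlite3 module with an in-memory authors table; the table layout, column names and rows are assumptions made up for illustration. It also shows Cursor.rowcount and the effect of omitting the WHERE clause.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration.
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute(
    'CREATE TABLE authors (id INTEGER PRIMARY KEY, first TEXT, last TEXT)'
)
cursor.executemany(
    'INSERT INTO authors (id, first, last) VALUES (?, ?, ?)',
    [(1, 'Sue', 'Red'), (2, 'Ann', 'Green'), (3, 'Bob', 'Red')],
)

# SET takes column_name = value pairs; filtering on the primary key
# restricts the change to a single row.
cursor.execute("UPDATE authors SET last = 'Black' WHERE id = 1")
rows_changed_one = cursor.rowcount

# With no WHERE clause, the change is applied to every row.
cursor.execute("UPDATE authors SET first = 'Pat'")
rows_changed_all = cursor.rowcount

last_names = [
    row[0] for row in cursor.execute('SELECT last FROM authors ORDER BY id')
]
connection.close()
```

Here rows_changed_one counts the single row matched by the primary key, while rows_changed_all counts every row in the table.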

Which of the following statements about the k-means clustering algorithm is false? The algorithm's results are a one-dimensional array of labels indicating the cluster to which each sample belongs, and a two-dimensional array of centroids representing the center of each cluster. Initially, the algorithm chooses k centroids at random from the dataset's samples. Then the remaining samples are placed in the cluster whose centroid is the closest. Each cluster of samples is grouped around a centroid, the cluster's center point. The centroids are iteratively recalculated and the samples re-assigned to clusters until, for all clusters, the distances from a given centroid to the samples in its cluster are maximized.

The centroids are iteratively recalculated and the samples re-assigned to clusters until, for all clusters, the distances from a given centroid to the samples in its cluster are maximized.
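The iteration actually drives those distances down, not up. The following is a rough pure-Python sketch of the steps the card describes, not scikit-learn's KMeans implementation; the function name k_means, the seed and the sample points are all assumptions for illustration. It stops when the cluster assignments no longer change.

```python
import math
import random


def k_means(samples, k, seed=11, max_iterations=100):
    """Return (labels, centroids) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Start with k centroids chosen at random from the samples.
    centroids = rng.sample(samples, k)
    labels = None
    for _ in range(max_iterations):
        # Assign each sample to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(sample, centroids[c]))
            for sample in samples
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Recalculate each centroid as the mean of its cluster's samples.
        for c in range(k):
            members = [s for s, label in zip(samples, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Two well-separated groups of hypothetical two-dimensional samples.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
```

The labels list plays the role of the one-dimensional array of labels, and centroids plays the role of the two-dimensional array of cluster centers.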

Consider the confusion matrix for the Digits dataset's predictions:
array([[45,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 45,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, 54,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 42,  0,  1,  0,  1,  0,  0],
       [ 0,  0,  0,  0, 49,  0,  0,  1,  0,  0],
       [ 0,  0,  0,  0,  0, 38,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 45,  0,  0],
       [ 0,  1,  1,  2,  0,  0,  0,  0, 39,  1],
       [ 0,  0,  0,  0,  1,  0,  0,  0,  1, 41]])
Which of the following statements is false? Each row represents one distinct class, that is, one of the digits 0-9. The columns within a row specify how many of the test samples were classified incorrectly into each distinct class 0-9. The nonzero values that are not on the principal diagonal indicate incorrect predictions (that is, misses). The correct predictions are shown on the diagonal from top-left to bottom-right; this is called the principal diagonal.

The columns within a row specify how many of the test samples were classified incorrectly into each distinct class 0-9.
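The columns within a row count all of that row's test samples by predicted class, including the correct predictions on the diagonal. As a small sketch, the matrix from this card can be summarized in plain Python; the variable names are assumptions, and the reading of rows as expected classes and columns as predicted classes follows scikit-learn's convention.

```python
# Confusion matrix copied from the card: rows are the expected classes,
# columns are the predicted classes.
confusion = [
    [45, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 45, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 54, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 42, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 49, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 38, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 42, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 45, 0, 0],
    [0, 1, 1, 2, 0, 0, 0, 0, 39, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 41],
]

# Correct predictions sit on the principal diagonal.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
misses = total - correct

# Off-diagonal values in row 8 are 8s that were predicted as other digits.
row_8_misses = sum(
    count for column, count in enumerate(confusion[8]) if column != 8
)
accuracy = correct / total
```

For this matrix, correct is 440 of 450 samples, leaving 10 misses, 5 of which are in row 8.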

Which of the following statements is false? The following code creates a KFold object:
from sklearn.model_selection import KFold
kfold = KFold(n_folds=10, random_state=11, shuffle=True)
The keyword argument shuffle=True causes the KFold object to randomize the data by shuffling it before splitting it into folds. This is particularly important if the samples might be ordered or grouped. Scikit-learn provides the KFold class and the cross_val_score function (both in the module sklearn.model_selection) to help you perform the training and testing cycles. The keyword argument random_state=11 seeds the random number generator for reproducibility.

The following code creates a KFold object:
from sklearn.model_selection import KFold
kfold = KFold(n_folds=10, random_state=11, shuffle=True)

Which of the following statements is false? When you're calculating word frequencies, you might first want to convert all inflected words to the same form for more accurate word frequencies. Words and WordLists each support converting words to their singular or plural forms. Inflections are different forms of the same words, such as singular and plural (like "person" and "people") and different verb tenses (like "run" and "ran"). The following code pluralizes a bunch of nouns:
In [6]: from textblob import TextBlob
In [7]: animals = TextBlob('dog cat fish bird').words
In [8]: animals.plural()
Out[8]: WordList(['dogs', 'cats', 'fish', 'birds'])

The following code pluralizes a bunch of nouns:
In [6]: from textblob import TextBlob
In [7]: animals = TextBlob('dog cat fish bird').words
In [8]: animals.plural()
Out[8]: WordList(['dogs', 'cats', 'fish', 'birds'])

Which of the following statements a), b) or c) is false? A primary key is a column (or group of columns) with a value that's unique for each row. This guarantees that each row can be identified by its primary key. All of the above statements are true. The rows of a relational database table are always listed in ascending order by primary key. Examples of primary keys are social security numbers, employee ID numbers and part numbers in an inventory system; values in each of these are guaranteed to be unique.

The rows of a relational database table are always listed in ascending order by primary key.

Which of the following statements is false? In this era of big data and massive, economical computer power, you should be able to build some pretty accurate machine learning models. If you're developing a computer vision application to recognize dogs and cats, you'll train your model on lots of dog photos labeled "dog" and cat photos labeled "cat." If your model is effective, when you put it to work processing unlabeled photos it will recognize dogs and cats it has never seen before. The more photos you train with, the greater the chance that your model will accurately predict which new photos are dogs and which are cats. The two main types of machine learning are supervised machine learning, which works with unlabeled data, and unsupervised machine learning, which works with labeled data.

The two main types of machine learning are supervised machine learning, which works with unlabeled data, and unsupervised machine learning, which works with labeled data.

Which of the following statements is false? Over the years, the Python open-source community has crafted an enormous number of valuable classes and packaged them into class libraries, available on the Internet at sites like GitHub, BitBucket, SourceForge and more. This makes it easy for you to reuse existing classes rather than "reinventing the wheel." The vast majority of object-oriented programming you'll do in Python is object-based programming in which you primarily use objects of new custom classes you create. To take maximum advantage of Python you must familiarize yourself with lots of preexisting classes. Widely used open-source library classes are more likely to be thoroughly tested, bug free, performance tuned and portable across a wide range of devices, operating systems and Python versions.

The vast majority of object-oriented programming you'll do in Python is object-based programming in which you primarily use objects of new custom classes you create.

Which of the following statements a), b) or c) is false? To visualize a dataset with many features (that is, many dimensions), you must first reduce the data to two or three dimensions. This requires a supervised machine learning technique called dimensionality reduction. In big data, samples can have hundreds, thousands or even millions of features. All of the above statements are true. When you graph the resulting data after dimensionality reduction, you might see patterns in the data that will help you choose the most appropriate machine learning algorithms to use. For example, if the visualization contains clusters of points, it might indicate that there are distinct classes of information within the dataset.

To visualize a dataset with many features (that is, many dimensions), you must first reduce the data to two or three dimensions. This requires a supervised machine learning technique called dimensionality reduction.

________ time series have one observation per time, such as the average of the January high temperatures in New York City for a particular year; ________ time series have two or more observations per time, such as temperature, humidity and barometric pressure readings in a weather application. Single, mixed Univariate, bivariate Single, multivariate Univariate, multivariate

Univariate, multivariate

Which of the following statements a), b) or c) is false? As with the other estimators, the fit method returns the estimator object. All of the above statements are true. We train the KMeans estimator by calling the object's fit method, which performs the k-means algorithm. When the training completes, the KMeans object contains a labels_ array with values from 0 to n_clusters - 1 (in the Iris dataset example, 0-2), indicating the clusters to which the samples belong, and a cluster_centers_ array in which each row represents a cluster.

When the training completes, the KMeans object contains a labels_ array with values from 0 to n_clusters - 1 (in the Iris dataset example, 0-2), indicating the clusters to which the samples belong, and a cluster_centers_ array in which each row represents a cluster.

Which of the following statements is false? Let's assume we meant to type the word "they" but we misspelled it as "theyr." The spell checking results show two possible corrections with the word 'they' having the highest confidence value:
In [1]: from textblob import Word
In [2]: word = Word('theyr')
In [3]: %precision 2
Out[3]: '%.2f'
In [4]: word.spellcheck()
Out[4]: [('they', 0.57), ('their', 0.43)]
When using TextBlob's spellcheck method, the word with the highest confidence value will be the correct word for the given context. You can check a Word's spelling with its spellcheck method, which returns a list of tuples containing possible correct spellings and a confidence value for each. For many natural language processing tasks, it's important that the text be free of spelling errors.

When using TextBlob's spellcheck method, the word with the highest confidence value will be the correct word for the given context.

Which of the following statements is false? To plot the centroids in two-dimensions, you must reduce their dimensions. Each centroid in the KMeans object's cluster_centers_ array has the same number of features as the original dataset (four in the case of the Iris dataset). You can think of a centroid as the "median" sample in its cluster. Each centroid should be transformed using the same PCA estimator used to reduce the other samples in that cluster.

You can think of a centroid as the "median" sample in its cluster.

Which of the following statements a), b) or c) is false? The LinearRegression estimator performs multiple linear regression by default using all of a dataset's numeric features. All of the above statements are true. The California Housing dataset (bundled with scikit-learn) has 20,640 samples, each with eight numerical features. You should expect more meaningful results from simple linear regression than from multiple linear regression on the dataset.

You should expect more meaningful results from simple linear regression than from multiple linear regression on the dataset.

Which of the following statements is false? If you have images of dogs and images of cats, you can classify each image as a "dog" or a "cat." This is a binary classification problem. You train a classification model using unlabeled data. When classifying digit images from the Digits dataset bundled with scikit-learn, our goal is to predict which digit an image represents. Since there are 10 possible digits (the classes), this is a multi-classification problem. Classification in supervised machine learning attempts to predict the distinct class to which a sample belongs.

You train a classification model using unlabeled data.

Which of the following statements a), b) or c) is false? In supervised machine learning, each sample has an associated label called a target (like "spam" or "not spam" for classifying e-mails). This is the value you're trying to predict for new data that you present to your models. You train machine-learning models on datasets that consist of rows and columns. Each row represents a data feature. Each column represents a sample of that feature. All of the above statements are true. Supervised machine learning falls into two categories: classification and regression.

You train machine-learning models on datasets that consist of rows and columns. Each row represents a data feature. Each column represents a sample of that feature.

Given a Fahrenheit temperature, we can calculate the corresponding Celsius temperature using the following formula:
c = 5 / 9 * (f - 32)
In this formula, f (the Fahrenheit temperature) is the ________ variable, and c (the Celsius temperature) is the ________ variable. dependent, independent independent, dependent separated, connected All of the above statements are true.

independent, dependent
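As a quick illustration of the independent and dependent roles, the formula from this card can be written as a function whose argument is the independent variable and whose return value is the dependent one. The function name is an assumption chosen for this sketch.

```python
def fahrenheit_to_celsius(f):
    """Return the Celsius temperature (dependent) for Fahrenheit f (independent)."""
    return 5 / 9 * (f - 32)


# The result changes only because the input changes.
freezing = fahrenheit_to_celsius(32)
boiling = fahrenheit_to_celsius(212)
```

Because of floating-point rounding, compare results such as boiling against 100 with a small tolerance instead of exact equality.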

________ are sets of consecutive words in a corpus for use in identifying words that frequently appear adjacent to one another. Stems Blobs n-grams Inflections

n-grams
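A minimal sketch of producing n-grams from a list of words with plain Python follows. The helper name ngrams and the sample sentence are assumptions for illustration, and this is not how any particular NLP library implements the feature.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = 'Today is a beautiful day'.split()
bigrams = ngrams(words, n=2)
trigrams = ngrams(words)
```

A sequence of five words yields four bigrams and three trigrams; a sequence shorter than n yields an empty list.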

Splitting text into meaningful units, such as words and numbers is called ________. parts-of-speech tagging inflectionization lemmatization tokenization

tokenization
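To make the idea concrete, here is a rough sketch of tokenization using a regular expression from Python's standard library. The pattern is a simplifying assumption (runs of word characters, or single punctuation marks) and is not the tokenizer that TextBlob or NLTK actually uses.

```python
import re

text = 'Today is a beautiful day. Tomorrow looks like bad weather.'

# Assumed simple rule: a token is a run of word characters
# or one non-space punctuation character.
tokens = re.findall(r'\w+|[^\w\s]', text)

# Keep only the word and number tokens, dropping punctuation.
words = [token for token in tokens if token[0].isalnum()]
```

Real tokenizers handle cases this pattern does not, such as contractions, hyphenated words and decimal numbers.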

A(n) ________ in a pattern string indicates a single wildcard character at that position. underscore (_) None of the above. hash sign (#) at sign (@)

underscore (_)
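As a small sketch of the underscore wildcard in a LIKE pattern, the following uses Python's built-in sqlite3 module. The authors table and its rows are hypothetical values invented for illustration.

```python
import sqlite3

# Hypothetical in-memory table with made-up last names.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE authors (last TEXT)')
connection.executemany(
    'INSERT INTO authors VALUES (?)',
    [('Gray',), ('Grey',), ('Grady',), ('Green',)],
)

# Each underscore matches exactly one character at that position,
# so 'Gr_y' matches only four-character names.
matches = [
    row[0]
    for row in connection.execute(
        "SELECT last FROM authors WHERE last LIKE 'Gr_y' ORDER BY last"
    )
]
connection.close()
```

The pattern matches 'Gray' and 'Grey' but not 'Grady' or 'Green', because those names have more than one character between 'Gr' and the final position.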

In the SQL query:
SELECT * FROM authors
the asterisk (*) is a ________ indicating that the query should get all the columns from the authors table. catchall potpourri character None of the above wildcard

wildcard

Assuming you have a TextBlob named blob containing 'Today is a beautiful day. Tomorrow looks like bad weather.', what property should replace the ? in the following snippet to get the output shown below?
In [8]: blob.?
Out[8]: WordList(['Today', 'is', 'a', 'beautiful', 'day', 'Tomorrow', 'looks', 'like', 'bad', 'weather'])
words None of the above wordlist word

words

