APCSP 5
By using your own data, search engines and other sites try to make your web experience more personalized. However, by doing this, certain information is being hidden from you. Which of the following terms is used to describe the virtual environment a person ends up in when sites choose to show them only certain, customized information?
A filter bubble
Which of the following scenarios involve computers making predictions based on the analysis of patterns in large data sets?
A state education board analyzing demographic information and academic records to identify students at risk of dropping out of school. A video streaming service analyzing viewers' habits to suggest videos based on the viewing history of other viewers.
When NASA (National Aeronautics and Space Administration) first tried to launch a man into orbit, many factors had to be considered when making the calculations for the launch (at the time, these were done by hand). Some of the factors that needed to be included were the following weather features: temperature, wind speed, humidity and dew point. Today, these calculations are done with computers, but this data still needs to be accessed for the computer to complete them. Suppose that this data is to be entered manually by the astronauts using information from the site weather.com. How should this information be presented to the astronauts so that they can enter it into the computers easily and correctly?
A structured table containing only the variable and value pairs that need to be entered into the computers, taken from weather.com's forecast for the city from which they are launching.
A social media website has a feature which provides information on the number of posts by users which mention any given term. This post data can be filtered by date & time, geographic region, number of interactions (e.g. replies, shares) and whether images or videos were attached to the post. Which of the following questions is LEAST likely to be answerable using this feature?
About which topic did a given user read the most posts?
Both small and big businesses can benefit from using big data in their organization. Which of the following are ways businesses could use big data to their advantage?
All of these
Web browsers such as Google Chrome and Mozilla Firefox allow users to browse the web in an anonymous session. When the session is ended, any browsing history and cookies created during the session are deleted. Which of the following statements best describes the security situation when browsing in anonymous mode?
Although local browsing data will not be stored, websites can still see your IP address and can track your activity if you log in to accounts on these sites.
The following statements describe hypothetical TEDx presenters and their plans to share data during their presentations. Which of the presenters have appropriate plans for sharing data in this format?
An epidemiologist (someone who studies infectious diseases) who would like to share her knowledge about the spread of Ebola using a map that lights up as new cases of the disease appear in different parts of the world over time. A pollster (someone who collects information about people's preferences) who plans to show his audience a dynamic word cloud of how politics-related search terms appeared among the top ten Twitter topics from week to week during the year before the election, showing the rise and fall of the popularity of two candidates.
Which of the following would be the result if a user were to query this database for any "City" with a population between 100,000 and 1,000,000?
Anaheim, Austin, Charlotte, Tempe
There was a large study conducted on a random sample of 500 students from the United Kingdom, South Africa and Australia. The graph displays a comparison of each student's height and age. Four data points are represented by stars on the graph. Upon further inspection, it was discovered that the 4 students associated with the stars on the graph had rare medical conditions.
Anomaly detection
The statement "If a customer buys a dozen eggs, he or she is 80% likely to also purchase milk." is a conclusion that may be determined from what type of data mining?
Association Rule Mining
After reviewing the service records at a car dealership, the CEO (chief executive officer) discovered that customers who scheduled a service for a transmission fluid exchange and differential fluid exchange also typically scheduled a service for an oil change. This conclusion is obtained using what data analysis technique?
Association rule mining
Students are using data collected from a non-profit organization to try to convince the school board that their school should be in session year-round with several week-long breaks as opposed to the usual 9 months on and 3 months off. Information that was collected by this organization was as follows.
Association rules showing links between motivation and happiness levels and the type of schooling students were receiving. A regression analysis of standardized test scores comparing the two different types of schooling.
A popular site allows users to stream different television shows and movies over the internet in a way similar to Netflix or Hulu. There is a vast assortment of shows and movies in this site's database that users can choose from. Users may locate a show or movie of choice by browsing categories such as the genre of a movie or the television station that broadcasts a show. Each month, this site adds new movies and TV shows to its database and assigns them to the existing categories. Which data analysis technique is applied to movies as they are added to the site?
Classification
The passing grade of the last exam is known to be 65. Which of the following correctly describes a valid data analysis method and information that could be determined from this data through that analysis?
Classification analysis could be used to determine the number of students who passed the exam and studied for less than 3 hours.
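As a minimal sketch of the classification described in this answer, assuming hypothetical student records that pair an exam score with hours studied (all values below are invented, and a score of exactly 65 is assumed to count as passing), each record can be labeled "pass" or "fail" using the known passing grade and then counted:

```python
PASSING_GRADE = 65  # from the question: the passing grade is 65

# Hypothetical records: (exam score, hours studied); values are invented.
students = [
    (72, 2.5),
    (58, 1.0),
    (90, 4.0),
    (65, 2.0),
    (40, 0.5),
    (81, 3.5),
]

def classify(score):
    """Label a score as 'pass' or 'fail' (assumes 65 itself is passing)."""
    return "pass" if score >= PASSING_GRADE else "fail"

# Count students classified as 'pass' who studied for less than 3 hours.
passed_short_study = sum(
    1 for score, hours in students
    if classify(score) == "pass" and hours < 3
)
print(passed_short_study)
```

The known threshold is what makes this classification rather than clustering: the categories are defined in advance, and each record is simply assigned to one of them.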
A snack company is starting an advertising campaign for its new line of tortilla chips. Rather than target specific demographic groups in its commercials, the company has decided to perform market research to determine common characteristics of patrons who prefer their chips. This is an example of what type of data mining strategy?
Cluster analysis
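Cluster analysis groups records by similarity without predefined categories, which is what the snack company's market research is after. A minimal k-means sketch, assuming invented survey data of (age, bags bought per month) and hand-picked starting centroids so the result is repeatable; the features are not rescaled, so age dominates the distance here:

```python
import math

# Hypothetical survey data: (age, bags of chips bought per month); invented values.
customers = [(19, 8), (22, 9), (21, 7), (45, 2), (50, 3), (48, 1)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Each resulting group (here, younger frequent buyers and older occasional buyers) is a set of shared characteristics the company could then describe and target.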
A teacher noticed that they have had a lot of students in their classes that didn't perform very well on homework and classwork assignments, but still seemed to perform similarly to other students on the final exam. In order to check their claim, a statistics teacher helped them to create the following graph from a data set that compares a student's final exam grade (FE_Grade) to their marking period average (Average) in the class. The teacher's observations and the graph are both an example of what data analysis technique?
Cluster analysis
Which of the following tasks best shows an example where the searching and sorting techniques of big data may be involved?
Creating a seating chart for a classroom based on an alphabetized list of student names. Keeping track of all employees' email use to see how many personal or work-related emails are sent during work time to check for productivity.
Which of the following are examples of how people give up some of their privacy in order to gain something in return (utility)?
Customers signing up for "rewards" programs for different grocery stores so that they can get discounts on different items throughout the store. People enabling GPS on their phone so that apps can locate nearby stores, restaurants and hotels.
Which Big Data analysis technique involves the examination of previously collected data sets in an attempt to discover patterns and other knowledge hidden within the data?
Data Mining
Which of the following is true when it comes to usefulness and usability of a data set?
Data can be useful but not usable. Data can be usable but not useful.
In Fantasy Football, participants compete against one another by choosing certain players from different NFL teams that they think will do the best in any particular week. Top Fantasy Football players spend hours every day looking at huge databases of statistics related to the players and the teams, often using spreadsheets and software tools to gain new insights and choose the best players. This process could be considered an example of which of the following?
Data mining
A student in a history class is creating an infographic about the Civil War. He includes information on where battles took place, how long the battles lasted (on average), how many soldiers were involved from both sides, etc. What type of statistical analysis is this student using?
Descriptive Analytics
Which of the following are examples of unstructured data?
Digital image scans of store receipts. Closed-circuit security footage of a bank lobby.
The next generation of high school students will most likely have a digital version of their yearbook. Some students may support the movement because there would be no...
Generation loss
Hannah has become concerned with her Facebook habits. She believes the amount of time she is spending on Facebook is detrimental to the time she spends on homework and studying. In addition she is concerned about other people, including potential employers, gaining access to personal and potentially embarrassing information from old Facebook posts. She has decided to delete her account and wants to know if all her Facebook posts and information will be permanently gone once she does this. Which of the following best describes what will happen to the information on Hannah's account if she chooses to delete her account?
Hannah's previous statuses, photos and profile information will be hidden from view, and will be removed from Facebook's servers within 30 days. She will not be able to restore this information later.
A hospital that has its own pharmacy keeps track of the following information: date prescription is filled, patient name, room number, medication prescribed and cost of medication. At the end of the week, all of the data is summarized into a database that is accessible by the hospital's financial analysts and can be sorted by any column in ascending or descending order. Below is a portion of this database.
How many patients were in the hospital on a given day?
Which of the following questions could not be answered based solely on the information in this data set?
How many registered Republicans voted for Gary Johnson in the 2016 presidential election?
Many universities have multiple campuses which students can attend. For example, The Pennsylvania State University has 24 total campuses. Although there are different campuses, some staff and employees have access to student records from all campuses in a large database. Which of the following is NOT a relevant factor which should be considered by a University in the development and creation of a database of this type?
How to ensure that there is a complete copy of the database stored at every campus
Which of the following can be used to extract structured information from unstructured data? I - Creating frameworks for information II - Identifying patterns in data III - Regression analysis
I and II only
A messaging company keeps track of the identities of the sender and receiver of every message sent, as well as the content of the message being sent. Which of the following could fall under the category of metadata? I - How many words were in a message? II - Was there an emoji used in a message? III - What time was a message received?
I, II and III
A recent computer science graduate is looking to design a computer software tool that helps with big data analysis. Which of the following features would be useful to include in their program? I - A sort tool that can organize the data in numerical or alphabetical order II - A search tool that helps the user to quickly locate specific information from the data III - A graphing tool that will create bar graphs and scatter plots
I, II and III
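Features I and II can be sketched with Python's built-in sorted() and the standard bisect module: sorting once makes every later lookup a fast binary search. The records and the helper name find are hypothetical; feature III (graphing) would need a plotting library and is left out:

```python
import bisect

# Hypothetical data set of (name, score) records; values are invented.
records = [("Diaz", 88), ("Adams", 92), ("Chen", 75), ("Baker", 81)]

# I - Sort tool: alphabetical order by name
# (numerical order would use key=lambda r: r[1]).
by_name = sorted(records, key=lambda r: r[0])

# II - Search tool: binary search over the sorted names, O(log n) per lookup.
names = [name for name, _ in by_name]

def find(name):
    """Return the record for `name`, or None if it is not present."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return by_name[i]
    return None

print(by_name)
print(find("Chen"))
print(find("Evans"))
```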
Which of the following analysis techniques could aid understanding of the relationships present in this chart? I - The data may be classified using high and low GDP per capita classifications and high and low car-ownership classifications. II - A regression analysis may be used to find a mathematical function which models the expected car-ownership based on the GDP per capita. III - A cluster analysis could find groups of countries which shared similar GDP per capita and car-ownership characteristics.
I, II and III
A large data set contains information about all students in a state enrolled in public high schools. The data set contains the following information about each student: the grade of the student (e.g. 12th grade), the name of the high school in which they are enrolled, the gender of the student and the ethnic background of the student. Which of the following questions could be answered by analyzing only the information from this data set?
In how many public high schools are more than 20% of students from a minority ethnic background?
Which of the following is NOT a benefit of making digital information and scientific databases openly available across the internet?
Inaccurate and misleading data can be more easily disseminated to scientific researchers.
Targeted advertising, in which advertisements are shown to individuals based on past purchases, web searches or other demographic data, has advantages and disadvantages. Which of the following are NOT benefits of targeted advertising?
Individual preferences are able to remain completely private. On social media websites people are not exposed to political viewpoints that they do not agree with.
Which of the following is not an example of a source of information that contributes to the accumulation of big data?
Law enforcement officials request the driver's license history for a suspect they recently apprehended.
If you wanted to know more about the relationship between the data in these two charts, which of the following would be most likely to help you better understand the reality represented by these maps?
Look at regions in the United States to determine if there are clusters in percentage of wireless-only households, and see if these clusters match clusters of states defined by population density. Find the numerical data behind the charts to see if you can find out if there is any relationship between population density and household telecommunication type.
Two parents are trying to figure out how tall their child might be by using a formula that was created based on studying large numbers of parents and children. This formula takes into account the heights of the parents along with other key factors. This is an example of what kind of analytics?
Predictive
AMSCO networks plans to conduct a poll of viewers during the Super Bowl. They will conduct analysis to determine which area of the country they should target to maximize their retail broadband sales versus their business Wi-Fi sales. Which classification best describes this type of analysis?
Prescriptive Analytics
A computer science student often uses public forum sites like CodeGuru, DreamInCode and StackOverflow in order to help with learning a new programming language. Which of the following statements about these kinds of learning forums is true?
Public data found on these forums provides widespread access to identified problems and their solutions.
A popular restaurant collects data on the food their patrons are ordering. They hope that this will allow them to be better informed about what items they need to order in preparation for the next week. What would be the best way for the restaurant to collect this data with that end goal in mind?
Record the major meat, vegetable and fruit components of the meals each customer ordered (steak - corn - apples, fish - peas - peaches, etc).
A number of parents have volunteered their children to participate in a developmental study administered by a local child psychologist. The following chart summarizes the results of the psychologist's assessments.
Regression
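Regression fits a mathematical function to data so that one variable can be estimated from another. The psychologist's chart is not reproduced here, so this minimal sketch fits an ordinary least-squares line to invented (age, score) pairs; the function name fit_line and every value are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements: child's age in years vs. an assessment score (invented).
ages = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = fit_line(ages, scores)
print(slope, intercept)        # the fitted model
print(slope * 6 + intercept)   # estimated score for a 6-year-old
```

Using the fitted line to estimate a value for a new input is the same fit-then-predict step behind the predictive analytics described in the parents' height question.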
Which of the following terms describes the conversion of data, formatted for human use, to a format that can be more easily used by automated computer processes?
Screen Scraping
Certain programs are designed to analyze electronic scans of documents such as receipts, business cards and recipes, and allow information from these paper copies to be represented electronically. In order for such a program to understand the difference between, for example, a zip code and a phone number, or between the price of an item and the total bill for a shopping trip, certain rules must be put into place. If a program was trying to identify which part of a receipt is the date of purchase, which of the following probably would NOT be included as part of these rules?
Search for the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 anywhere on the receipt.
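A more useful rule looks for a date-shaped pattern rather than any number from 1 to 12, which would also match quantities, prices and store numbers. A minimal sketch with Python's re module, assuming a hypothetical receipt and US-style MM/DD/YYYY dates (real receipts use many formats, so a real program would need more rules):

```python
import re

# Hypothetical receipt text (invented); assumes MM/DD/YYYY dates.
receipt = """ACME MARKET  Store 12
Milk 2 @ 3.49     6.98
Eggs 1 @ 4.25     4.25
TOTAL            11.23
Date: 03/14/2023  Time: 10:42"""

# Month 01-12, day 01-31, four-digit year, with word boundaries so the
# pattern does not match inside longer numbers.
DATE_PATTERN = re.compile(r"\b(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/(\d{4})\b")

def find_purchase_date(text):
    """Return (month, day, year) for the first date-shaped string, or None."""
    match = DATE_PATTERN.search(text)
    return tuple(int(g) for g in match.groups()) if match else None

print(find_purchase_date(receipt))
```

Note that "12" in "Store 12" and the quantities "2" and "1" are ignored because the rule requires the full month/day/year shape.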
For situations that may be too dangerous, costly, or otherwise too difficult to test in the real world, what do computer scientists create in order to help discover new knowledge and create new hypotheses related to the situation they are studying?
Simulations
A local high school recently won the girls' volleyball championship and has been rewarded with $5,000 to purchase merchandise for the team. The coach is trying to surprise the players with the merchandise, so instead of asking them what sizes they want, he will attempt to figure out how many items of each size need to be ordered based on the information from the volleyball program. The following information is stored in the program for each player:
Sort the data by height and weight and order smaller sizes for the girls that are shorter and weigh less and order larger sizes for the girls that are taller and weigh more.
Based on the information in the table, which of the following tasks is likely to take the longest amount of time when scaled up for a very large company of approximately 100,000 customers?
Sorting data
The World Wide Web is full of unstructured data. Search engines like Google, Bing and Yahoo have been doing a good job of allowing users to search by key term in order to quickly locate links to websites about that particular topic. In order to do this, these search engines use what tool in order to help index and find these results?
Spiderbots
Different kinds of data analytics have differing levels of utility and confidence. In general, as the utility level increases, what happens to the confidence level?
The confidence level decreases.
Which of the following is not something companies can learn about you by collecting digital information from you or from the digital information associated with your computer or mobile device use?
The content of a private conversation held in person.
Many universities have multiple campuses which students can attend. For example, The Pennsylvania State University has 24 total campuses. Although there are different campuses, some staff and employees have access to student records from all campuses in a large database. Which of the following is probably NOT true about the development, creation and structure of this database of student records?
The database was developed by one person, so that he or she did not have conflicting opinions with others on the team.
Which of the following are examples of structured data?
The glossary and index in the back of a textbook. An address book filled with family members' names and addresses.
Which of the following statements accurately describes the graph and the data mining technique it represents?
The graph depicts six clusters using the technique of cluster analysis to show the similarities between certain structures in the data.
Which of the following is applicable to information published both on social media and on online news networks?
The information can be quickly distributed by people other than the original source to reach a wider audience.
Which of the following CANNOT be determined using only the information in the database?
The number of purchases made by an individual customer
A drug company is developing a new drug, and has already achieved some success with laboratory experiments on tissue samples. Before conducting trials of the drug in living organisms, the company will use these results, and medical knowledge about the body, to develop a simulation of how the drug might affect the entirety of a person's body when administered. Which of the following is NOT a valid reason why the drug company might wish to do this?
The simulation will completely eliminate the need to conduct tests in living organisms before the drug is released to market, saving the company money.
A highway has just been enlarged to consist of two lanes in each direction. The government body responsible for the highway is considering two different sets of rules for drivers on the newly upgraded highway. One set of rules will allow drivers to overtake using any of the lanes, while the other will allow drivers to overtake only using the left-hand lane. To help understand the effect these two sets of rules will have on traffic flows and congestion, the government contracts a company to build a computer simulation. Which of the following statements about this simulation is true?
The simulation will likely require some simplifications and assumptions to be made about the behavior of drivers on the road.
Temporal scan thermometers are popular tools used to take a baby's temperature. They use sensors which are slowly moved across the forehead of a baby. A display provides a temperature reading in either degrees Celsius or Fahrenheit. Which of the following is a statement about the usefulness of these thermometers rather than their usability?
The thermometers (when used properly) are accurate to within 0.2 degrees.
A web browser uses locally cached data to speed up load times for websites a user has recently visited. Which of the following is a likely negative consequence of this feature?
The usable storage space of the device on which the browser is running will decrease
Which of the following statements about the infographic is least likely to be true?
This infographic can tell you which news story was most important to you in 2016.
Photographs stored as image files frequently contain "metadata" such as the date the photo was taken, what equipment and settings were used for its creation, and the GPS coordinates of the location at which the photo was taken. This is in addition to the "data" itself, which can be thought of as the pixels of the image. For which of the following purposes would it be more useful to computationally analyze image data rather than the metadata of images?
To determine the relative popularity of photos of landscapes, people and objects.
A certain social media web site allows users to post messages and to comment on other messages that have been posted. When a user posts a message, the message itself is considered data. In addition to the data, the site stores the following metadata.
To determine the topics that many users are posting about
Which of the following is NOT an example of a problem with the collection and analysis of a large data set which may have unintended negative consequences?
Unemployment data collected by the Bureau of Labor Statistics helps policy makers make recommendations about new business development and job training programs.
Which of the following is a risk of obtaining information through the use of crowdsourcing?
Unless independently verified, the results of crowdsourcing may be inaccurate.
Which of the following statements about web crawlers (also known as spiderbots) is true?
Web crawlers visiting a new site can receive information from the site about which pages to visit and index.
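Sites usually give crawlers this information through a robots.txt file listing paths that crawlers should not visit. Python's standard library includes urllib.robotparser for reading such a file; in this minimal sketch the robots.txt contents, the example.com URLs and the crawler name "MyCrawler" are all hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by a site (contents invented for illustration).
robots_txt = """User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler checks each URL before visiting and indexing it.
for url in ["https://example.com/public/page.html",
            "https://example.com/private/report.html"]:
    print(url, parser.can_fetch("MyCrawler", url))
```

Honoring robots.txt is voluntary: the file tells a crawler which pages to visit, but nothing technically prevents a badly behaved crawler from ignoring it.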
Which of the following best describes what happens when we take unstructured data and organize it into structured data?
When unstructured data is organized into structured data there is some loss of data, but the data is in a much more usable format.
There are many computer applications that have been designed to help people search through large data sets to find patterns. However, not all questions require a search for a hidden pattern. Seeking the answer to which of the following questions is least likely to require an investment in software:
Which contestants took the top three prizes in a talent show at a neighborhood block party?
Using only the database, which of the following CANNOT be determined?
Which models of computer stocked by the retailer have not had a single sale
Google has access to a lot of data. One way to make use of the data collected by Google is to examine the relative popularity of search terms using the Google Trends feature. This tool allows users to identify trends across geographies and time, and within categories like Real Estate, Sports, Shopping, Pets & Animals, Books & Literature and Arts & Entertainment. Google Trends would be most helpful for determining which of the following?
Which week is the best week to send advertisements to parents who want fidget spinners or other popular toys for their children around the holidays.
An infographic displays the relative frequencies of the 100 most common emojis used in text messaging for each of the last 12 months. Which of the following conclusions cannot be drawn from such a representation of emoji usage?
You can determine the average age of emoji users based on emoji use.
Association rule mining refers to the discovery of relationships between variables in large databases, such as in a store's transactions. The following association equation often describes how the appearance of one set of antecedent items implies that a consequent set of items will appear: {X, Y} → {Z}.
{Bread, Milk} → {Flour}
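The strength of a rule such as {Bread, Milk} → {Flour} is commonly measured by support (how often all the items appear together) and confidence (how often the consequent appears when the antecedent does). A minimal sketch over invented store transactions; the helper names count, support and confidence are hypothetical:

```python
# Hypothetical store transactions (invented); each basket is a set of items.
transactions = [
    {"Bread", "Milk", "Flour"},
    {"Bread", "Milk", "Flour", "Eggs"},
    {"Bread", "Milk"},
    {"Milk", "Eggs"},
    {"Bread", "Milk", "Flour"},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of all transactions containing every item in `itemset`."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

x, y = {"Bread", "Milk"}, {"Flour"}
print(support(x | y))    # how often Bread, Milk and Flour appear together
print(confidence(x, y))  # how often Flour appears when Bread and Milk do
```

In these terms, the earlier eggs-and-milk statement ("80% likely to also purchase milk") is a claim that confidence({Eggs} → {Milk}) is 0.8.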