BI 2
Which BEST describes the term metadata?
Data about other data. It is often compared to the information that previous generations saw in the card catalog in the library - not the book itself, but information about the book.
Many of us have taken selfies and occasionally applied a filter to the picture. As you may be aware, a filter is either a way to change the color and effect of your picture or a way to add extra elements on top of your picture. Data streams can also have filters applied to them. These filters might restrict the data to only show certain attributes, like age or gender. Will the order of these filters change the result?
No, the result is the same regardless of the order of filters applied
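A minimal sketch of why the order does not matter, assuming each filter keeps or drops a record by looking only at that record. The records, field names, and the two filter functions are hypothetical and exist only for illustration.

```python
# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"name": "Ana", "age": 17, "gender": "F"},
    {"name": "Ben", "age": 22, "gender": "M"},
    {"name": "Cara", "age": 25, "gender": "F"},
    {"name": "Dev", "age": 16, "gender": "M"},
]

def is_adult(record):
    """Age filter: keep records with age 18 or over."""
    return record["age"] >= 18

def is_female(record):
    """Gender filter: keep records marked 'F'."""
    return record["gender"] == "F"

# Apply the age filter first, then the gender filter.
age_then_gender = [r for r in records if is_adult(r)]
age_then_gender = [r for r in age_then_gender if is_female(r)]

# Apply the gender filter first, then the age filter.
gender_then_age = [r for r in records if is_female(r)]
gender_then_age = [r for r in gender_then_age if is_adult(r)]

# Both orders keep exactly the records that pass both tests.
print(age_then_gender == gender_then_age)
```

A record survives only if it passes every filter, so applying the same tests in a different order leaves the same records.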
Data transformation can be a useful tool in communicating the meaning of information to others. However, sometimes once the data transformation is completed, it is impossible to revert it back to the raw data. Which of the following would be the most problematic if later the transformed data needed to be converted back to its original form? Assume that there are no backups or version management that captured the state of the original data.
An international bank manages money from clients across the globe and stores the amounts in the customers' desired currency. The bank decides to be consistent and convert every customer's currency to US dollars, for easier customer management. The original currency is replaced with "USD" and the converted dollar value replaces the original value.
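The sketch below illustrates why this conversion cannot be undone. The exchange rates, the account records, and the function convert_to_usd are all invented for illustration and are not real market data. Two different original accounts end up as identical converted records, so nothing in the result says which currency or amount it started from.

```python
# Hypothetical exchange rates to USD (assumed values, not real rates).
RATES_TO_USD = {"EUR": 1.10, "GBP": 1.25}

def convert_to_usd(account):
    """Return a copy of the account with currency and amount overwritten."""
    converted = dict(account)
    rate = RATES_TO_USD[account["currency"]]
    converted["amount"] = round(account["amount"] * rate, 2)
    converted["currency"] = "USD"  # the original currency label is lost here
    return converted

euro_account = {"currency": "EUR", "amount": 200.0}
pound_account = {"currency": "GBP", "amount": 176.0}

# Two different originals produce the same transformed record.
print(convert_to_usd(euro_account))
print(convert_to_usd(pound_account))
print(convert_to_usd(euro_account) == convert_to_usd(pound_account))
```

Keeping the original currency and amount in separate fields next to the converted value would make the transformation reversible.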
The idea of metadata is not a new term. Which of these scenarios best describes an early use of metadata?
Books having publisher and copyright information in the cover.
Webpages store metadata that helps search engines locate them. Which of the following is NOT an example of metadata stored within the site?
Contact information
The Web Hacking Incident Database is a maintained list of web-application-related security incidents. The data collected is available online and is currently displayed with these two graphs: How could these two data sets be transformed to find new insights into hacking?
Create a separate pie chart of outcomes for each method of hacking. For each method, make a list of the outcomes of that method from most frequent to least frequent.
Those who post videos on YouTube are offered a host of different statistics to help them understand their viewers. This data can be looked at as values or graphs to understand the quantity of users or the categorical information like the location or device type. Which of the following is the best reason a creator would want to know this data?
Device usage may affect the design elements of the video.
Following the 2016 Presidential election, Andy decided to see if there were any patterns that could be observed from the voting data. Below are his two visualizations. The first visualization is a heat map of Florida where red represents a county that voted for the Republican candidate and blue represents a county that voted for the Democratic candidate. The second visualization is a map that shows the state universities in the state of Florida. Which of the following correlations could be drawn from the two visualizations?
Most of the counties that contain a state university voted for the Democratic candidate.
A school digitized its record keeping software and gained the ability to run reports on different topics in the school. The reports could pull from attendance, discipline, and grades. A vice principal is looking to show the community that success is tied to students attending school. Which of the following pairs of attributes should be put together to prove this point?
Number of absences vs. GPA, for each student.
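One way to put these two attributes together is to compute a correlation coefficient between them. The sketch below uses made-up absence and GPA values and a hand-written Pearson correlation helper (the name pearson is an assumption for this example). A strong negative value would support the vice principal's point, although a correlation alone shows an association rather than proving cause.

```python
from math import sqrt

# Hypothetical per-student data: absences and GPA, in the same order.
absences = [0, 2, 5, 8, 12]
gpas = [3.9, 3.6, 3.1, 2.7, 2.0]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

r = pearson(absences, gpas)
# A value close to -1 means GPA tends to fall as absences rise.
print(round(r, 3))
```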
A data set contains the results of a survey where participants were asked to rate movies on a scale from 0 to 5. Their ages, along with their average rating score, were recorded into a chart. In order to share this information in a clear, visual way, the data needs to be processed into a graph/diagram. Which of the following graphs would be the MOST effective way to represent this data?
Scatter Plot
Raw data can quickly become overwhelming. This is particularly important when dealing with big-data collected on the internet and stored cheaply in cloud servers. Which of the following might be a role of a data analyst in transforming the data into a more friendly format for both humans and machines?
Structuring the data into a consistent style.
Richard sent a letter to Juanita. It was an old-fashioned, snail-mail letter sent via the US Postal Service. Which of the following is best described as the metadata about this letter?
The envelope, which has both Richard's and Juanita's addresses, a stamp, and the postmark indicating from where the letter was sent.
Information was collected in a hypothetical study to examine the effectiveness of bullet-proof vests worn by police officers. The data was collected on policemen who were fired upon while on duty. It was summarized and shown as follows: However, the following summary shows the same data in a different way. It shows the data summarized by the policeman's duty (normal patrol or special operations) and whether they were wearing a vest at the time they were being fired upon. When evaluating both of these summarizations, which of the following is the most likely conclusion about this data?
The reason that bullet-proof vests appear to be ineffective is that they are worn more frequently during high-risk special operations than during normal patrols.
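The effect described in this answer can be reproduced with a small worked example. The counts below are invented (the study's actual tables are not shown here), and the rate helper is hypothetical. Within each duty type the vest group survives more often, yet the combined totals make vests look worse because most vest wearers were on special operations.

```python
# (duty, wearing_vest) -> (survived, total fired upon); invented counts.
counts = {
    ("patrol", True): (9, 10),
    ("patrol", False): (80, 100),
    ("special_ops", True): (50, 100),
    ("special_ops", False): (3, 10),
}

def rate(vest, duty=None):
    """Survival rate for vest or no-vest, optionally within one duty type."""
    rows = [
        value
        for (row_duty, row_vest), value in counts.items()
        if row_vest == vest and duty in (None, row_duty)
    ]
    survived = sum(s for s, _ in rows)
    total = sum(t for _, t in rows)
    return survived / total

# Combined view: vests appear to hurt.
print("overall:", rate(True), "vs", rate(False))

# Split by duty: vests help in both groups.
for duty in ("patrol", "special_ops"):
    print(duty + ":", rate(True, duty), "vs", rate(False, duty))
```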
In 2008, Google decided to compare search term trends with the CDC's recorded flu trends. They went to work trying to determine a pattern with previous years' data and use this data to predict future flu outbreaks. Which of the following is the MOST accurate about using search trends, like Google Flu Trends, as predictors of future events?
They are unreliable predictors of future events that might not accurately represent society as a whole.
Mary, a high school administrator, is analyzing a large amount of data about the students at her school to determine the most efficient bus routes. The data includes information about each household: the names of students in the household, the address, which schools the students attend, and whether or not they take the bus. Before Mary analyzes her data, she uses a computer program to clean it. What are the benefits of cleaning the data before she analyzes it?
This eliminates incorrect or irrelevant data that might skew her analysis.
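A minimal sketch of what such a cleaning program might do, assuming household rows with student, address, and takes_bus fields. These field names, the sample rows, and the clean function are hypothetical. It trims stray whitespace, drops rows with no address, normalizes the yes/no answers, and removes duplicates.

```python
# Hypothetical raw rows with typical problems.
raw = [
    {"student": "Lee", "address": "12 Oak St ", "takes_bus": "yes"},
    {"student": "Lee", "address": "12 Oak St", "takes_bus": "Yes"},  # duplicate
    {"student": "Kim", "address": "", "takes_bus": "no"},  # missing address
    {"student": "Ray", "address": "9 Elm Ave", "takes_bus": "NO"},
]

def clean(rows):
    """Return cleaned rows: trimmed, normalized, and de-duplicated."""
    seen = set()
    result = []
    for row in rows:
        address = row["address"].strip()
        if not address:
            continue  # cannot route a bus without an address
        key = (row["student"], address)
        if key in seen:
            continue  # skip repeated entries for the same student
        seen.add(key)
        result.append({
            "student": row["student"],
            "address": address,
            "takes_bus": row["takes_bus"].strip().lower() == "yes",
        })
    return result

cleaned = clean(raw)
for row in cleaned:
    print(row)
```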
Mary has cleaned her data and is ready to determine the most efficient bus route. She starts by splitting the city into four regions on a map, then she uses a computer program to sort her data into four different groups according to the regions. This way she can examine how many students use the bus in each region of the city. Mary starts prioritizing bus routes based on regions, but one of her fellow administrators suggests that she use her computer program to sort her data a different way first. Why might this be useful?
Transforming the data might help Mary notice a different pattern that makes a bigger impact on bus routes than regions of the city.
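As a rough illustration, the same rows can be grouped by a different attribute to reveal a pattern that the regional grouping hides. The rider records and school names below are made up, and grouping by school is only one of many possible alternatives.

```python
from collections import Counter

# Hypothetical bus riders, each tagged with a region and a school.
riders = [
    {"region": "north", "school": "Central High"},
    {"region": "south", "school": "Central High"},
    {"region": "east", "school": "Central High"},
    {"region": "west", "school": "Lakeside High"},
]

# Grouped by region, riders look evenly spread.
by_region = Counter(r["region"] for r in riders)

# Grouped by school, one destination clearly dominates.
by_school = Counter(r["school"] for r in riders)

print(by_region)
print(by_school)
```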
Most stereos and mixing boards provide a graph that shows where the sound levels are for a piece of music being played. A DJ can look at this readout in real time through their equipment. This histogram of various noise levels allows them to adjust settings to get the sound they are looking for at an event. Which of the following is the BEST reason the engineer would have designed the setup in this way?
Visuals can communicate information about data.
The Bureau of Labor Statistics surveys households occasionally to see what they spend their money on. They collect data on the expenditures of approximately 90,000 households and ultimately come up with the average expenditures on a variety of products. The spreadsheet below summarizes the data from the Consumer Expenditure Surveys for 2011 and 2012. Note: Totals may not add to 100% due to rounding. Which of the following representations of the data would likely be most helpful in allowing one to visualize and compare the proportion of expenditures in each category?
pie chart
Which of the following is NOT a reason to represent a large data set in a visualization?
A visualization will always represent all of the data without obscuring the meaning of the data set.
Wireless companies collect metadata for each phone call made on their network, including the number of the caller, the number called, the location of the cell tower used for your call, and the duration of each call. Which of the following can NOT be discovered from the metadata collected by cell phone companies?
What was said in your phone conversation.
The Bureau of Economic Analysis collects data on the size of the US economy. The most widely followed measure of the size of the economy is the Gross Domestic Product, which attempts to measure the value of all goods and services produced in the US in a given year. The spreadsheet below summarizes the data for 2004-2014. Note: Trillions of dollars, adjusted for inflation, 2009 dollars. Which of the following representations of the data would likely be most helpful in allowing one to visualize the change in the US economy over this time period? Select TWO answers.
Bar graph and scatter plot
Several companies are taking advantage of computers to adjust pricing based on demand. Companies like Uber and Lyft started this trend, but other companies have followed this model and we can anticipate seeing this used for many entertainment and ticket companies in the near future. Prices may fluctuate based on demand, quantity already sold, popularity and quantity of tickets purchased. This dynamic pricing is an example of which of the following?
Data driven solutions
When naming a newborn, there is no shortage of books and websites offering lists of names to use for the little one. It is also very popular for analysts to track what are the most popular names for children in a given year. Some sites offer you the ability to see the rank of a given name for a person of a given age. Which of the following might a computer NOT be able to help in collecting this information?
Determining the reason a name gained popularity.
A large data set contains information about all students enrolled in high schools in the state of Georgia. The data set contains the following information about each student. The student's age • The student's gender • The student's grade level • The student's attendance • The name of the student's high school • The student's GPA Which of the following questions could NOT be answered by analyzing only the information above?
Do students who go to a high school in Georgia have higher grade point averages than those who go to a high school in Florida?
Google Trends shows the frequency of specified search terms entered by the user. In the example below, the search terms "apple", "orange", "banana", and "peach" were entered and the date range "2004 to 2015" was identified.
From 2012 to 2015, there is more variation in the number of searches for "apple" than the number of searches for "orange", "banana" or "peach".
Interactive visualizations allow users to explore the data set visually. There are pros and cons to using interactive visualizations. Using the graph above, which of the following statements about interactive visualizations is true? Select TWO answers.
Interactive visualizations allow users to sift through a large set of data without having to oversimplify or omit portions of the data. Interactive visualizations allow users to see patterns in data more easily by allowing the user to adjust the filters themselves.
The following data is collected from a survey of high school seniors. The seniors were asked the following questions: • "What state do you currently live in?" • "How many AP level classes are you currently taking?" • "What is your anticipated college major?" • "What is one after school activity you do?" Below is a summary table of the collected data. In order to analyze the data and create a visualization, the data must be cleaned. Which of the following would NOT be a modification that should be made to the data?
Round up all the non-integer values for "Number of AP level classes."
When photography went digital, manufacturers discovered that they could digitally store information about a photo within the digital file. DSLR cameras were able to store information about the settings used to create the photo and the location of the event all within the photo file stored within the camera. These attributes would stay with the image as it moved from the camera to a computer and potentially out onto the internet as well. Courts and police are often called in to settle disputes over the ownership of property. How might the attributes of a photo be used by these groups to settle disputes?
The owner could use the photo's stored location to confirm where it was taken, and potentially use phone GPS data to support that location.
Companies spend a lot of time and effort collecting information into large data sets which they interrogate to search for patterns in the data. Which of the following are examples where searching for patterns in the data could help answer a question or verify a hypothesis? Select TWO answers.
A high school principal analyzes the current grades across subjects of all junior students to try to determine which students will likely enroll at the local college. An online search engine analyzes the buying habits of its users to determine what types of products the user may be interested in purchasing in the future.
Bar charts are one of the most flexible ways of visualizing information. Which of the following would be best visualized with a bar chart?
A month by month comparison of a company's profit (amount earned minus amount spent) during last year.
In order to search through very large data sets, computer programs are created to analyze the data. These programs search for patterns in the data. Which of the following is NOT a scenario in which you would need this type of computer program to search for patterns involving multiple variables?
A video hosting site needs to analyze the views for all videos and create a list of the top ten most viewed videos.
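This scenario involves only a single variable (view count), so no multi-variable pattern search is needed. A sort or a top-N selection is enough, as the sketch below suggests. The video names and view counts are generated placeholders, not real data.

```python
import heapq

# Hypothetical view counts for 30 videos (placeholder values, all distinct).
views = {f"video_{i}": (i * 37) % 101 for i in range(1, 31)}

# Pick the ten entries with the largest view counts.
top_ten = heapq.nlargest(10, views.items(), key=lambda item: item[1])

for title, count in top_ten:
    print(title, count)
```

`heapq.nlargest` avoids sorting the full list, which helps when there are far more videos than the ten being reported.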
The following data is collected from a survey of high school seniors. The seniors were asked the following questions: • "What state do you currently live in?" • "How many AP level classes are you currently taking?" • "What is your anticipated college major?" • "What is one after school activity you do?" Below is a summary table of the collected data. In order to see trends in the data, a visualization is created. Which visualization would be considered the best visual representation of the data?
D
Many states have adopted programs to allow people to purchase technology that they can attach to their car to be able to pay tolls as they drive through the check lanes. These technologies allow the user to move faster through the exchange and be able to pay the tolls with check, debit, or credit cards and not have to root out loose change in the vehicle. These systems also allow the agency to keep track of the count of cars through the interchange and the times of day that this occurs. This data allows them to amass a large data-set of vehicle movements. Which of the following might NOT be a use for this data?
Determine staff schedules for toll booth.
The chart below from Google Trends shows the prevalence of some universities searched for in the United States between August 2004 and the present. Which of the following statements can best be supported by the data in this graph?
Generally speaking, since 2004, more people have searched "Harvard University" than have searched "University of Texas at Austin", "University of Florida", "Texas A&M University", or "University of Miami".
Jana developed a survey and she asked the following questions to the middle school students at her school: • Who is your favorite teacher? • What was your current grade in your favorite teacher's class? • What was your current grade in your LEAST favorite teacher's class? Jana's goal is to determine the favorite teacher in the school and also to see if there is an association between the current grades for each student's favorite and least favorite teachers. To accomplish her objective, she transferred all of the raw data into a spreadsheet. In this example assume all teachers at the middle school have unique last names and the grading scale is a ten-point scale (A = 90-100, B = 80-89, etc.). A small sample of the data is shown below.
Replace all grades with the midpoint for that grade range (i.e. grades from 90-100 would become 95, grades from 80-89 would become 85, etc.).
The Bureau of Economic Analysis collects data on the size of the US economy. The most widely followed measure of the size of the economy is the Gross Domestic Product, which attempts to measure the value of all goods and services produced in the US in a given year. The spreadsheet below summarizes the data for 2004-2014. Note: Billions of dollars, adjusted for inflation, 2009 dollars. Which of the following is BEST supported by the data in the spreadsheet?
The GDP is trending upwards.
Some web-pages provide 'widgets' that allow you to interact with the web-page beyond simply viewing content. As public employees, teachers' salaries are public knowledge and can be looked up by any member of the community. A newspaper collected all of this information on teachers in a given state and created a widget on their site to allow people to choose a county, then choose a school and look up the average teacher salary at that school through a widget. Why would the site choose to make people select a county and then a school instead of providing direct access to the table?
The data is more easily viewable in the widget.
Scatter plots are a good way of visualizing certain types of information. Of the following scenarios, which would be best visualized with a scatter plot?
A professional sports league wants to see if there is a correlation between the number of minutes a player plays and the amount of money the player makes.
A high school health teacher was frustrated that a few of her 8th period students always seemed to be sleepy. She decided to have each student enter the following information into an online spreadsheet at the beginning of class each day for two weeks: • hours of sleep the student had last night • approximate number of carbohydrates consumed at breakfast • approximate number of carbohydrates consumed at lunch • number of quizzes/tests taken that day • energy level (0-10) at start of 8th period. (0 = no energy, 10 = extreme energy) After two weeks of collecting this data, the teacher now needed to manipulate/view it in such a way that it would help explain the lack of energy of some of her students. Which of the following would be LEAST beneficial in helping the teacher understand why some of her students have a lack of energy?
Filter the information and display the rows which match all three of these criteria: an energy level below 4, the number of tests/quizzes taken less than 2, and the hours of sleep less than 5.
Occasionally a teacher is faced with a situation where a student is handing in an assignment that is suspected of not being genuine. If a teacher suspects that a student has submitted an assignment that they did not generate, they have the option of using metadata to verify the origin of the file. Which of the following could be used as metadata to determine the origin of a file? I. Author of the file II. IP address where file was made III. Date created
I and III only
A powerful hurricane hit the coast of Alabama. In order to better analyze the situation, tweets sent by people in the state of Alabama during the brunt of the storm were accumulated and stored in a large database. The next day, meteorologists used this data to analyze the strength of the hurricane. After analysis of the tweets, the meteorologists concluded that the storm was fairly weak. However, this did not coincide with the devastating aftermath of the storm captured by their own cameramen. Which of the following would be the MOST likely explanations for the discrepancy between the tweets and the video footage? Select TWO answers.
The hurricane disrupted cell service for those people affected most during the brunt of the storm. The people with the most damage were preoccupied during the brunt of the storm, so tweeting was not a high priority for them at that time.