HW 1 Data Fundamentals: BUSN 5000

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

Based on object.size the Card data take up _____ MB in memory. (Round to 3 digits)

0.438

Pro Tips

1. Make sure the first letter in the answer is NOT capitalized. It will say your answer is wrong even if you're right. 2. for questions with multiple answers make sure you put a comma between the answers plus a space after the comma.

A megabyte can store _____ num values, while a terabye can store roughly a _____ times that.

131072, 1000000

The source of Card's data is a survey that began in _____ with _____ young men age 14-24.

1966, 5525

The same young men were surveyed again in selected years through _____ , effectively creating a _____ data set where the unit of observation is the person- _____ .

1981, panel, year

What percentage of the sample are Black? _____ %. Is that representative of the US population in 1976? (Yes/No) _____ .

23, No

Card's analysis is based on the 1976 survey when the youngest respondents are _____. By 1976, attrition had reduced the sample size to _____ observations. After filtering the sample on observations with valid education and wage data, Card is left with an analysis sample of _____ young men.

24, 3694, 3010

The card data set contains _____ observations and _____ variables.

3010, 34

What percentage of young men in the sample are missing IQ test scores? _____ % . (Answer to 1 decimal place, for example: ''99.9'' percent)

31.5

The third person in the data set is _____ years old, has _____ years of education, has _____ years of experience, and reported a wage of $ ______ .

34, 12, 16, 7.21

How many variables have missing data? _____.

6

In the third stage, the workhorse will be the _____.

CEF

Card obtained the data from the _____.

NLSYM

PART B: BELOW THIS CARD

PART B: BELOW THIS CARD

We distinguish 4 stages of data analysis. Name them.

acquisition, transformation, analysis, communication

The variable exper measures labor-market experience as ______.

age - educ - 6

Another reason reproducibility matters is to guard against confirmation _____.

bias

The survey was not a random sample of the US population because men from neighborhoods with a high concentration of _____ residents were over-sampled.

black

The wage variable is measured in _____. The lwage variable is the _____ transformation of wage.

cents, log

We say that data are tidy if each variable corresponds to a _____, each row an _____, and each cell a _____.

column, observation, single value

This representation of the data structure identifies the _____ to which each observation pertains.

entity

Each record is made of _____ that contain measurements of known types.

fields

One reason reproducibility matters is to protect and support your _____ self.

future

The key variable in the data set is _____.

id

This representation of the data also makes clear what are the _____ that identify an observation.

key variables

The skim() function provided by the skimr package is another useful tool for data documentation Load skimr via a library()command and then "skim" the card data. Answer a few more questions based on the skim() output.

library(skimr) skim(card)

First, use the library() and data() functions to load the wooldridge package and card data set.

library(wooldridge) data(card)

R stores real numbers as a _____ data type and allocates _____ bytes of data to each number.

num, 8

What data type is lwage? _____. How about wage? ______. (Use the full-name description of the data type in your answers.)

numeric, integer

Finally, use the object.size function to estimate the amount of memory allocated to store the Card data.

object.size(card)

A quick-serve restaurant chain records sales, staffing and customer traffic every day for each store. You recognize this as a _____ data set where the unit of observation is the store-day.

panel

The term data is (singlular/plural) _____.

plural

You should view a reproducible analysis as a _____ that you should be able to produce again and again.

product

One important component of reproducibility is describing the exact _____ of your raw input data.

provenance

A data table is made up of rows containing _____ and columns containing _____.

records, fields

The first requirement for a data analysis to be reproducible is if the _____ of the original analysis can be generated using the same _____.

results, inputs

A data _____ is a representation of the data structure comprising all of the attributes of the data and their types.

schema

It is advisable to _____ the acquisition, transformation and analysis tasks.

separate

The str() function, which provides an overview of the data type, size, and content in a data set. Apply it to determine the structure of the card data set and answer the questions that follow.

str(card)

Data that is organized in a table of records and fields is _____.

structured

The second stage involves, among other things, making sure the data are _____ (as the Posit folks would say).

tidy

A variable will not have _____ if it does not measure what it is supposed to.

validity

A national company has developed a new product and is offering it for sale at a discount to introduce it to the market. Randomly surveying customer who purchased the product in the initial discount period (would/would not) ______ generate a sample representing the population of typical customers.

would not


Set pelajaran terkait

Legal Concepts of Life Insurance

View Set

Chapter 1 Problem Set, Ten Principles of Economics Homework

View Set

Chapter 12- Electrostatic Phenomena

View Set