PHG 301 Homework

¡Supera tus tareas y exámenes ahora con Quizwiz!

What is the corresponding -log(p) that you should require a statistical significant association to meet or exceed on the Manhattan plot, to chance a 0.05 chance a false positive somewhere overall in a study with a million SNPs being genotyped? Give your number as a decimal.

-log(0.00000005) = 7.3

Assuming that your result from the previous question reaches statistical significance, what does that number from the previous question mean about the relationship between exposure and disease?

0.42, the exposure is protective for this disease

EXPLAIN two possible reasons for the gap between the total heritability attributed to the variants found by GWAS and the total heritability estimated from twin studies.

1. One reason for the gap between total heritability of variants found by GWAS and that which is estimated from twin studies is genotype-environment interactions, where the environment in GWAS is not taken into account in the same manner that twin studies does, and therefore there is a potential confounding environmental factor in a GWAS study and therefore a gap in heritability. 2. ????

1. Looks for patterns in disease distribution to generate hypotheses 2. Compares groups to test hypotheses about what might be causing/determining a disease 3. studies how genes and environmental factors cause disease in human population and families

1. descriptive epi 2. analytic epi 3. genetic epi

1. Generating new hypotheses about disease distributions (not testing them) 2. Establishing causation with the greatest scientific validity, when ethically feasible to expose subjects and ample funding exists 3. Being quick and cheapest to conduct 4. Looking at a very rare disease, to investigate many exposures 5. Following the incidence of disease resulting from exposure in correct time-order, when the exposure is not ethical to assign

1. descriptive epi 2. experimental trial 3. cross-sectional study 4. case-control study 5. cohort study

1. Graph the number of cases of congenital syphilis by year for the country 2. Compare frequency of brain cancer among anatomists (who are exposed to many chemical exposures professionally, i.e. stains and fixatives) with frequency in general population 3. Recommend that close contacts of a child recently reported with meningococcal meningitis receive Rifampin

1. distribution 2. determinants 3. application

Considering genetic relatives only, full siblings are __1__ degree relatives, parents/children are __2__ relatives, and grandparents/grandchildren are __3__ degree relatives.

1. first 2. first 3. second

1. Let's say that segregation analysis informs you that the best model to describe how the trait is passing through families is a polygenic one. What does that mean? 2. And which of the following would be best to track down the disease gene(s) in this situation if you had not very much money and were looking for common variants only? 3. On the other hand, what kind of study would you do to track down the disease gene(s) if you had lots of additional money and were looking for rare variants and copy number variation?

1. multiple genes are involved in determining the trait 2. genome-wide association study 3. sequencing

For schizophrenia, the prevalence in the general population is 1.2%. However, 13% of the full-siblings of diagnosed individual go on to develop schizophrenia themselves. What is λS, the sibling relative risk ratio, and what does it mean in words?

10.83 Full siblings of individuals with schizophrenia have a 10.83 times the risk of developing the disease compared to the general population.

calculation of relative risk and interpretation (hw 2)

232/(648+232)= 0.2636 315/(315+1008)= 0.2381 0.2636/0.2381= 1.107 = 110.7% High levels of noise pollution increases the risk of heart disease by 10.7% or There is 1.107 times the risk of heart disease from high levels of noise pollution compared to low levels of noise pollution. There is an increased risk of heart disease in areas of high levels of noise pollution.

You see a bunch of whole genome sequencing sales this weekend for Rare Disease Day. For under $300, you can get either a 10x coverage whole genome sequence or a 30x coverage whole genome genome sequence from different vendors. Which would be the better use of your money in terms of giving more support and credibility that any variants identified are real?

30x coverage

Extra credit: Evolutionary biologist J.B.S. Haldane famously said, "I would lay down my life for two [siblings] or eight cousins." Going by amount of DNA shared by descent, as J.B.S. was doing here, how many nieces or nephews (or a gender-neutral term for a children of a sibling which we do not have in the English language) do you think he would sacrifice himself for? You should assume full sibling relationship.

4

If a study estimates a heritability of 0.75 for Crohn's disease, what does that mean?

75% of the variation in the Crohn's disease trait in the population is attributed to differences in genetics

Which of the following kinds of disease genes could we help identify the location of through linkage analysis?

A mendelian major effect gene that is X-linked

Can we say there is linkage to the disease across these two families, even though there are different marker alleles associated with the disease in each family (1 pt)? What could we do to combine linkage information across these two different families (1 pt)? (HW 6)

Although there are different marker alleles associated with each family, a linkage to the disease can be evaluated if you look at the specific marker D8S1798. At this location in both families, each individual is affected by the same marker. Therefore, it may be possible to say that this marker is perfectly linked to the disease. To combine or compare linkage information between the two families LOD scores can be used. A LOD score can show if linkage is likely occurring or not, and can be compared across families.

How is ancestry analysis related to GWAS?

Ancestry analysis is necessary as a prerequisite for GWAS to include ancestry in the model as a potential confounder

What is a difference between monozygotic and dizygotic twins that might violate the underlying environment assumption of twin studies?

BOTH :MZ twins can be treated more similarly due to their more similar appearances MZ twins can share a placenta, and thus sharing more of environmental factors than DZ twins

When monozygotic twins are not concordant, this can be because of...

BOTH: Epigenetic factors resulting in different gene expression from the same genome Environmental influences

Why does your teacher not have you look at your own and classmates' cheek cells with stains to see inactivated X chromosomes in females and a lack of inactivated X chromosomes in males?

Because sex chromosome aneuploidies are fairly common and that might be upsetting to find out, and biological sex isn't always the same as gender and outing people would be rude

Extra credit: the below item is on sale on Ebay as a "Bone cell Osteon Biology Science STEM" print dress. What's a better description for the Biology Science STEM concept that is actually being showcased on these dresses?

Dress with print of X-Rayed double stranded DNA

The incidence of testicular cancer in Sweden is over double the rate in Finland. When Finns in their twenties move to Sweden, their testicular cancer risk remains at the Finnish level, but their sons have the Swedish testicular cancer rate. (This is true even if they marry Finnish women so the sons are ethnically 100% Finnish.) What does that mean?

Evidence for environmental etiology of early childhood exposure

T/F Ancestry is synonymous with race.

F

T/F Everyone who has a recessive disease allele must always carry exactly the same nucleotide variant, so there's a simple genotype test for whether they have it or not.

F

T/F It is always possible to infer genotype from phenotype.

F

T/F Most clinicians have dedicated time to take a family health history and document it for you in a summary figure, so individuals don't need to collect their own.

F

What kind of study would be best/cheapest way to identify the genetic risk alleles for a common disease, which you believe (from segregation analysis) to be polygenic in nature?

GWAS

Please explain the difference between genotyping and sequencing (e.g. in terms of what kind of information they produce).

Genotyping: determines allelic forms at a particular locus or loci. Produces information specifically for the presence or absence of a variant, but can only be done if we know where the gene is (must be determined prior) Sequencing: determines all based in either a genome or an exome looking for variation at specific regions of the genome. Can produce information about de novo mutations and rare variants, along with any other information based on base pairs.

Pedigrees

HW 6

Manhattan plots, cluster plots

HW 7

A customer who obtained different results from three different ancestry testing companies is upset, and wants his money back, because he thinks it's ridiculous that his y chromosome and mitochondrial DNA tests do not match up with the results he got for his autosomes. What do you say in response?

He shouldn't get his money back; these tests are reporting different information about different lineages of his ancestors

Which of the following statements is true about recombination in genetic epidemiology study designs?

In linkage more recent recombination taking place within the generations of the family studied, defining a LARGER area of chromosome AND In GWAS longer ago recombination events defining a SMALLER area of chromosome (or locus).

Which of the following is true about carrier status?

It can sometimes be associated with a subtle version of the phenotype it becomes most relevant when two carriers reproduce using their own genomes genetic relatives or isolated population groups might be more likely to be carriers for a particular condition people are increasingly finding out their carrier status without involving a doctor or genetic counselor

Which of these strategies for discovering disease genes can use a family structure (a large pedigree)? Select all correct answers.

Linkage analysis Sequencing

The Doll and Hill case-control study we talked about in class investigated the smoking history of patients hospitalized in London for lung cancer as well as age-matched controls who were hospitalized for other causes. If we use their overall results (both genders grouped together) to calculate an odds ratio, as we did in class, we get an odds ratio (OR) of 3.0. What is the best, most complete interpretation of what this OR means, in English?

Lung cancer patients were 3 times more likely to have been smokers than patients hospitalized for other causes

Why might traits be more complex than the different mendelian inheritance patterns described in the previous questions?

Many different genes contribute to the final phenotype Environmental factors contribute to the final phenotype Interactions between genes and environment can contribute to the phenotype Gene-gene interactions can modify the final phenotype

Whole genome sequencing is...

Most certain to capture the causal variant than exome sequencing because it captures the variation in all regulatory regions, not just the coding regions More expensive than exome sequencing because it attempts to capture all of the genetic material, instead of just the transcribed-and-translated part Harder to interpret than exome sequencing because our ability to predict the phenotype impact of noncoding variants is less goo

If monozygotic twins have a concordance of 28% for Irritable Bowel Syndrome, and dizygotic twins have a concordance of 27%, what does that suggest? (Note that these are real numbers from an actual study, and if asked to describe the meaning of the 1% difference I would say they are very similar and that might not represent a statistically significant difference.)

Mostly environmental etiology

If monozygotic twins have a concordance of 96% (super close to 1!) for recurrent childhood ear infections, and dizygotic twins have a concordance of 51% what does that suggest? Note these are real numbers so you will need to decide which of these is the BEST model to explain the data, not necessarily a completely perfect one.

Mostly genetic etiology

Sequencing can be used to study...

Mutations in body cells leading to cancer development Dominant diseases Recessive diseases disease alleles on the autosomes disease alleles on the sex chromosomes common diseases rare diseases

In a study of familial aggregation, the risk of psoriasis (an autoimmune skin condition) was evaluated in first degree relatives of people with psoriasis, as compared to people who did not have a first degree relative with psoriasis, using a cohort approach. The RR was 5.5. Which, if any, of the following statements are NOT a correct interpretation of that RR in plain English?

NONE: all are correct Having a family history of psoriasis in first-degree relatives is associated with more than 5x the risk of psoriasis. The risk of psoriasis with a first-degree relative who has the disease is more than 5x that without a first-degree relative with the disease. The risk of psoriasis with a first-degree family history is 550% of that without a first-degree family history. A first-degree relative family history is associated with a 4.5-fold INCREASE in the risk of psoriasis. A first-degree family history is associated with a 450% INCREASE in the risk of psoriasis. A family history of psoriasis is a risk factor for developing psoriasis onesself.

Would it be possible to use linkage analysis with these markers in the previous question if one of the parents in generation III were instead a non-relative marrying in? Why or why not?

No we cannot. We need to have to consanguinity because we need to disease allele to be inherited identically by decent from a common ancestor, which only cab be done with consanguineous families. The non-relative marrying in would not share the common ancestor nor the identical markers needed for the recessive trait to be expressed.

Which of these is a mode of inheritance that we look for on pedigrees?

None of the above; Y chromosomes are hemizygous and mitochondria are all inherited from one parent, so it doesn't make sense to think about multiple alleles together for those genes

The role of family history as a risk factor of coronary heart disease was explored in a study of 121 female survivors of a recent heart attack and 130 control women (selected to be good matches for the survivors in age, ancestry, and neighborhood residence). Each woman was asked whether she had first-degree relatives (parents or siblings) who had occurrence of CHD. Here is the collected data: Survivors of recent heart attack Control women Total Family history of CHD in first degree relative (parent/sibling) 21 11 32 No family history of CHD in first-degree relatives 100 119 219 Total 121 130 251 subjects What calculation is appropriate to do to assess the role of family history as a risk factor given this study design?

OR, 2.28

Which of the following are types of genetic variation which could be discovered by sequencing?

SNPs Variable number of tandem repeats Indels Structural rearrangements and segmental duplications (assuming careful analysis of reads)

T/F Familial aggregation studies can take both a case-control and a cohort study approach.

T

T/F Having a family history is a risk factor for developing most diseases.

T

T/F If unaffected people mating with each other have affected offspring, that suggests recessive inheritance of the trait.

T

T/F Potential bias exists in family aggregation studies around patients/cases being more motivated to respond or better informed as a result of their disease status.

T

T/F Sequencing can look for new "de novo" mutations leading to diseases as well as existing variants which have been passed through families.

T

T/F There are standardized notations for pedigrees that have been completely implemented in the major health history toolkits that cover all types of family building, including third-party assisted reproduction and half-sibling twins.

T

T/F To do a segregation analysis involves collecting the same data as familial aggregation studies (pedigrees and phenotypes), just analyzing them differently (fitting to different models of inheritance).

T

When a strong, statistically significant association between a SNP and disease is identified via GWAS, and confirmed by multiple independent, well-designed studies (good quality control measures, good control choices, lack of population stratification), what conclusions should be drawn?

The actually causative SNP might be in linkage disequilibrium (very very close on the same chromosome) with the causative variant

T/F

The hallmark of mitochondrial inheritance include only females being affected.

Which of the following is NOT a correct way to describe a RR of 0.66?

The risk is reduced BY two thirds

If a trait is shown to have familial aggregation, what does that mean?

The trait could be attributed to either genetics OR environment

Explain why carrier females are unaffected in the previous question, given there is Lyonization aka X-inactivation for dosage compensation? (You can consider that blood cells come from bone marrow which exists in many sites in the body, and then they circulate freely in mixing.)

There are two possible correct answers here: 1. It might be in the minority of genes on the X chromosome which are not subject to X-inactivation, along the ends. 2. The cells in blood come from many different bone marrow tissues, and then they mix freely, and that would obscure any mosaicism. Having half the functional blood cells might be enough to generate the appropriate phenotype.

interpretation of risk

There is a *79.2% reduction in risk (or 0.208 times the risk)* of having a cold while social distancing compared to not social distancing. There is a decreased risk of catching a cold while social distancing vs not.

You are trying to identify the gene responsible for the disease trait in the family represented in the pedigree shown, which has thus far eluded genotyping-based efforts. You have funds to do whole-genome sequencing for individual V-I. What kinds of variants are you looking for? (Select all that apply)

Variants for which the individual is homozygous Variants for which healthy people in databases or in this family are NOT homozygous

Individual V-1 in the shown pedigree is not available for sequencing, in your quest to identify the gene responsible for the disease trait in this family. You had funds to do whole-genome sequencing for individual V-1 but now you have decided to try sequencing of individual II-1 instead. Note that the parents of individual II-1 are unrelated to one another. (HW 8, Q 7)

We would still be looking for a gene where both copies are mutated in some way but we wouldn't expect it necessarily to be the same mutation in both copies of the gene (i.e. II-1 could be a compound heterozygote)

For the following genetic epidemiological questions, indicate the type of study that addresses it

What is the disease, how is it distributed (in people, time, place)? descriptive epidemiology Does it cluster in families (or run in families)? familial aggregation study Is there a genetic effect (heritability)? twin/adoption study In what pattern is the disease inherited from parent to offspring? (I.e. is it passed on like there's one gene or many, dominant or recessive, autosomal or sex linked or other?) segregation analysis

is epi a science

Yes, I believe that epidemiology is a science. To collect data, analyze it, and apply it are all aspects of epidemiology. While there are other important aspects to the field such as increasing access to healthcare and policy work, these action tasks are still backed by a need for a scientific method, research papers, and statistical analysis; all of which are key pieces of science.

Briefly define p-hacking, why it's a problem and describe one potential solution. (1.5/2)

Your Answer: P-haking is when you continuously reanalyze data or analyze data using p-values for multiple experiments, searching for associations in hopes that one will give a statistically significant result. Because statistical significance is p=0.05, if you were to perform 100 experiments 5 would be significant data, even if it's by chance. One potential solution is by requiring a much much smaller p value (of ~0.01 or even 0.001 to make certain the data and different associations being looked at truly are statistically significant and not due to repeating the same experiment over and over again.

calculation of 2874/7878=0.367 is done, what is it reporting?

a proportion

y-linked inheritance (HW #5)

all male offspring from affected male also affected

Sex chromosomes...

are passed down differently from autosomes because they determine the biological sex of offspring

A typical GWAS with 1 million SNPS is examining what fraction of the total genome?

around 0.0003 or 0.03% of it

Exome sequencing is examining what fraction of the total genome?

around 0.01 or 1% of it

If a gene is located on the recombining, homologous tips of the X and Y chromosomes, it will have the following inheritance pattern?

autosomal

Why is consanguinity something that that genetic counselors and genetic epidemiology researchers note on pedigrees?

because the risk of recessive conditions in the offspring goes up

disease + exposure/total exposure = 0.025 disease + no exposure/total exposure= 0.121 0.025/0.121= 0.208 = 20.8% There is a 20.8% less chance of having a cold while social distancing than not social distancing. There is a decreased risk of catching a cold while social distancing vs not.

calculating RR

A topic in the news is coffee-drinking. A Japanese study examined the association between coffee consumption and the risk of a particular kind of cancer (endometrial endometrioid adenocarcinoma, or EEA). The researchers first identified 107 women less than 80 years of age from two medical centers who had been diagnosed by cell staining to have EEA. From a cancer-screening program they also identified 214 women who did not have EEA, who were matched for age and for area of residence for the EEA patients. A questionnaire containing questions to determine past beverage consumption was administered, including questions about coffee consumption. What kind of study is this?

case control, can calculate OR

A search for common variants associated with colorectal cancer is conducted by recruiting 1550 individuals diagnosed with colorectal cancer, and 3800 individuals recruited from the same medical establishments who were not cancer patients, and who did not have a previous history of colorectal cancer. All recruited study participants are genotyped. What study design is this? Check all that apply.

case-control GWAS

Your instructor reads this paper and says, "Well, everyone I know who is eating a high-fiber diet rich in fruits and vegetables is also really preoccupied with being healthy in other ways, like exercising regularly. What if it's the exercise that is truly causative in improving lung health?" What explanation does this suggest for the association found?

confounding

You are interested in whether underemployment is a risk factor for depression, but as an undergraduate student you are limited in funds, so you decide to conduct a study by simply conducting a phone/internet survey in which people self-report the exposure (underemployment) and outcome (depression) at that point in time. What kind of study is this?

cross sectional

You are a principal invesigator designing a GWAS to study your favorite health-related trait. Choose which of these QQ plots would be most desirable to obtain at the end of your GWAS study, to show that you had obtained meaningful results (SNPs associated with your trait of interest) with a minimum of bias or confounding. (HW 7)

d

genetic variants environmental exposures infectious disease pathogens

disease outcomes determined by

An adoption study finds that adoptees who have at least one genetic (birth) parent with coronary heart disease were 1.5 times more likely to develop coronary heart disease themselves than adoptees without a genetic parent with CHD. However, adoptees with at least one adoptive parent with CHD were not at increased risk for developing the disease themselves. What does that mean?

evidence of genetic etiology

If an odds ratio for the association between exposure and disease is statistically significant and much bigger than 1, this means that the exposure must be directly causing the disease outcome.

false

the only multiplier used in rates in epi is 100 for percentages

false

Pedigrees (graphical displays of family health history) and the data about genetic relationships that they display can be used in...

familial aggregation studies segregation analyses linkage analyses by primary care physicians in clinical settings by genetic counselors

How can anyone (scientists, curious relatives, twins themselves) reliably tell whether twins are monozygotic, dizygotic full-sibling twins, dizygotic half-sibling twins, or sesquizygotic?

genetic testing

You are equally closely related with a first cousin as with a...

great-grandparent

An article you found for discussion section is reporting a number that the authors describe as "the proportion of variation in a trait that is attributable to variation in genetics". The article gives a number that's between 0 and 1. However, the article doesn't actually give a specific term for this test statistic, so a fellow student is confused about how this number relates to class concepts. You help your fellow student and explain that the study is giving the...

heritability

looking at new covid cases during last quarter and discovered that 598 students contracted covid-19 during autumn quarter (not counting preexisting covid19 cases)

incidence

Rare diseases are...

individually rare, as in the prevalence of each disease is low

Which of the following study designs make use of recombination events that narrow down the interval where a gene is located?

linkage and GWAS

Which of the following study designs is/are a way to identify a new disease gene (i.e. one which has not previously been implicated in the disease)?

linkage study (possibly with sequencing follow-up) genome-wide association study (possibly with sequencing) sequencing (by itself, without being a follow up to any of the above)

x-linked dominant

look @ homework 5

Your instructor has a family member who has a perplexing medical condition (adult onset vision loss that is very atypical compared to other forms of retinitis pigmentosa in the literature). Since one parent and one grandparent also appear to have been affected, this trait seems to transmit in an autosomal dominant inheritance pattern. That individual enrolled in study where exome sequencing was done. Of the 20,000 different places where the individual varies from the reference sequence, a lot of criteria could be used to filter variants to hone in on which variant is causatively involved in the vision loss. Which of the following would NOT be a good strategy?

look for a variant in which the individual is homozygous

GWAS usually identifies....

multiple SNPs in linkage disequilibrium at each disease-involved location in the genome

Familial aggregation is ______ to establish that genetics are involved in a trait (through genetic epidemiology).

necessary but not sufficient

calculation of 598 new cases/31000 students/3 months of autumn quarter or 0.0006 cases/student/month or 300/100k students/fortnight

rate

You are excited to find a nice robust association! However, your classmate provides the criticism, "but what if the reason people aren't employed is because they're depressed?" This would be an instance of:

reverse time order

In which of the following study designs do you definitely for sure identify the causal variant which is directly responsible for the trait, rather than a variant which is nearby enough not to be recombined away often?

sequencing

When looking through L'il Bub's genome sequences to find a variant that might be responsible for her osteopetrosis, which did researchers eliminate from consideration?

synonymous variants in exons which do not change the resulting protein

Let's say the LOD scores from these two microcephalie families put together are less than -4 at markers D8S504 and D8S1824. What does that mean? the disease gene is excluded from being linked to and thus not near those markers What does it mean that the LOD score is 5.75 for markers D8S1798, D8S277 and D8S1819? Probability of these markers being linked to the disease gene is more than 100000 times the probability of them being unlinked

the disease gene is excluded from being linked to and thus not near those markers Probability of these markers being linked to the disease gene is more than 100000 times the probability of them being unlinked

Which of the following is true about twins?

there are twins that are neither dizygotic nor monozygotic

Match the study with what calculations/information you get out of it. Not all answers must be used, answers may be used more than once, and there may be more than one correct answer - you only have to give one.

twin studies: heritability adoption or migrant studies: heritability segregation analysis: stats about which model is best fit for the mode of inheritance familial segregation: OR or RR associated with having a family history

Match variant vs mutation vs haplotype.

variant = a difference in the genome (can be copy number or substitution) SNP = a common difference related to a single basepair mutation = a mistake in replication that happens to create a variant haplotype = run of variation usually inherited as a bundle

A cohort study of 880 study participants that came out earlier this week looked for an association between exposure to a high-fiber diet (as measured by a survey) and then following up over time on lung problems as measured by airway restriction. For this study, can you calculate a RR (0.5)? If so, calculate it (1 pt) and say if the exposure is protective or a risk factor (.5 pt). If not, explain why you are not able to (1 pt) and what you should calculate instead (.5 pt).

yes, becasue it is a cohort study The RR should be ~ .51, which shows that the exposure is protective, because the RR of lung problems is less in participants with high-fiber diets.


Conjuntos de estudio relacionados

CSC 2040 Java Programming Quiz 2

View Set

Ch 26 Management of Patients w/ Dysrhythmias & Conduction Problems

View Set

Gold Coast RE 1001 Cram, Practice Math and Final Exam

View Set

SOCIAL WORK: EXAM #2 STUDY GUIDE

View Set

Chapter 57: Management of Patients with Burn Injury

View Set

Intro to professional nursing 1: chapter 1

View Set