Big Data

Big data examples in healthcare

EMR (EHR) data, insurance claims, clinical trial data, data collected from smartphone applications, wearable devices, and social media, and personal genomics data.

Reduction of dimensionality

When the number of covariates is very large, researchers may decide to eliminate covariates that are unlikely to carry useful information, such as those with little variation or low prevalence.
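
A minimal sketch of this filtering step, assuming a pandas DataFrame of patient-level covariates (the column names and thresholds are illustrative, not from the source):

```python
import pandas as pd

def drop_uninformative(df, min_variance=0.01, min_prevalence=0.01):
    """Drop covariates with little variation or low prevalence (illustrative thresholds)."""
    keep = []
    for col in df.columns:
        series = df[col]
        if series.nunique(dropna=True) <= 1:
            continue  # constant column carries no information
        if set(series.dropna().unique()) <= {0, 1}:
            # binary covariate: require a minimum prevalence of the rarer category
            prevalence = series.mean()
            if min(prevalence, 1 - prevalence) < min_prevalence:
                continue
        elif series.var() < min_variance:
            continue  # continuous covariate with almost no variation
        keep.append(col)
    return df[keep]

# Hypothetical patient-level covariate table
covariates = pd.DataFrame({
    "age": [54, 61, 47, 70, 66],
    "rare_diagnosis": [0, 0, 0, 0, 0],  # zero prevalence, so it is dropped
    "on_statin": [1, 0, 1, 1, 0],
})
print(drop_uninformative(covariates).columns.tolist())
```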

What is EMR

EMRs (EHRs) include real-time clinical data, such as patient identifiers (admission numbers), vital signs, diagnoses, and coded therapies. Administrative data include information from the Centers for Medicare and Medicaid Services or commercial insurance claims.

Why is data set splitting necessary?

One of the biggest limitations of predictive analytics models is their tendency toward overfitting: the overfitted model represents the data rather than predicting it. To guard against this, the original data set can be randomly split into a training set, a validation set, and a testing set prior to model building.
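
A minimal sketch of such a three-way random split, assuming scikit-learn is available and using an illustrative 60/20/20 partition:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a patient-level data set
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First split off the test set, then split the remainder into training and validation
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```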

List steps of predictive analytics

Step 1. Construction of a patient-level data set (data set aggregation)
Step 2. Reduction of dimensionality
Step 3. Model building, validation, and selection

List some general examples of big data?

Stock market data, social media, sensor data

Velocity

The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real-time. Compared to small data, big data are produced more continually. Two kinds of velocity related to big data are the frequency of generation and the frequency of handling, recording, and publishing.

predictive analytics includes

a variety of statistical techniques, from predictive modeling to machine learning to data mining, that analyze historical and current data to predict future or other unknown events.

List some parameters for the model evaluation.

Accuracy, area under the curve (AUC), specificity, and sensitivity.
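
A minimal sketch of computing these metrics with scikit-learn, assuming arrays of observed outcomes and predicted probabilities (the values below are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # observed outcomes
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])   # predicted probabilities
y_pred = (y_prob >= 0.5).astype(int)                          # predicted classes at a 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

print(f"accuracy={accuracy:.2f}, AUC={auc:.2f}, "
      f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```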

List some models commonly used in big data analytics.

Artificial neural networks, support vector machines, discriminant analysis, and classification trees.
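
A minimal sketch instantiating these model families with scikit-learn (MLPClassifier stands in for a simple artificial neural network); the data and hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

models = {
    "artificial neural network": MLPClassifier(max_iter=1000, random_state=0),
    "support vector machine": SVC(probability=True, random_state=0),
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "classification tree": DecisionTreeClassifier(random_state=0),
}

# Compare the model families on the same synthetic data using cross-validated AUC
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```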

What are challenges of the analysis of big data?

Spurious correlation and noise accumulation.

What is univariate selection

the construction of univariate regressions that measure the strength of the association between each of the remaining covariates and the outcome. Then, only the variables that are strongly associated with the outcome (those associated with a p value below a prespecified threshold) are selected for inclusion in the multivariable regression.
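
A minimal sketch of univariate selection using statsmodels logistic regressions, assuming a simulated patient-level data set and the commonly used 0.05 threshold (variable names are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "sbp": rng.normal(130, 15, n),
    "noise_lab": rng.normal(0, 1, n),
})
# Simulated outcome that depends on age only
logit = 0.05 * (df["age"] - 65)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

selected = []
for col in df.columns:
    X = sm.add_constant(df[[col]])               # univariate model: intercept + one covariate
    p_value = sm.Logit(y, X).fit(disp=0).pvalues[col]
    if p_value < 0.05:                           # prespecified threshold
        selected.append(col)

print("Covariates selected for the multivariable model:", selected)
```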

What characteristics can big data be described by?

volume, variety, velocity, veracity

Construction of a patient-level data set

(1) cleansing and normalization of the original data sets so that the information is consistently formatted across different data sources, (2) aggregation of multiple data sources into a unified data set at the patient level, (3) de-identification of the protected information, and (4) validation of the process to ensure the accuracy of the data.
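
A minimal pandas sketch of the aggregation and de-identification steps, assuming two hypothetical source tables keyed on an admission number:

```python
import hashlib
import pandas as pd

# Hypothetical source tables (clinical vitals and insurance claims)
vitals = pd.DataFrame({
    "admission_no": ["A001", "A001", "A002"],
    "sbp": [142, 138, 120],
})
claims = pd.DataFrame({
    "admission_no": ["A001", "A002"],
    "diagnosis_code": ["I10", "E11"],
})

# (1)-(2) normalize and aggregate to one row per patient admission
patient_level = (
    vitals.groupby("admission_no", as_index=False)["sbp"].mean()
    .merge(claims, on="admission_no", how="left")
)

# (3) de-identify: replace the admission number with a one-way hash
patient_level["patient_id"] = patient_level["admission_no"].apply(
    lambda s: hashlib.sha256(s.encode()).hexdigest()[:12]
)
patient_level = patient_level.drop(columns="admission_no")

# (4) simple validation checks on the unified data set
assert patient_level["patient_id"].is_unique
assert patient_level["sbp"].notna().all()
print(patient_level)
```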

Veracity

An extension of the basic definition of big data that refers to data quality and data value. The quality of captured data can vary greatly, which affects the accuracy of the analysis.

Volume

The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.

Variety

The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio, video; plus it completes missing pieces through data fusion.

Spurious correlation

When a large number of variables are evaluated, important variables can be highly correlated with variables with which they have no actual relationship, which raises the probability of a Type I error, i.e., an incorrect rejection of a true null hypothesis (a false positive). This phenomenon can lead to false scientific discoveries and wrong inferences.
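
A minimal simulation of this effect: the covariates below are pure noise with no relationship to the outcome, yet roughly 5% of them appear "significantly" correlated at p < 0.05 by chance alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_patients, n_variables = 200, 1000

outcome = rng.normal(size=n_patients)
covariates = rng.normal(size=(n_patients, n_variables))  # no real relationship to the outcome

false_positives = 0
for j in range(n_variables):
    _, p_value = stats.pearsonr(covariates[:, j], outcome)
    if p_value < 0.05:
        false_positives += 1

# About 5% of the unrelated variables are expected to pass the 0.05 threshold
print(f"{false_positives} of {n_variables} unrelated variables look 'significant' at p < 0.05")
```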

application in pharmacy

Prediction of a patient's therapy success, appropriate treatment, and effective interventions.

List some limitations of the application of predictive analytics in a health system.

First, results are still subject to a high chance of false scientific discovery, both because of the use of large databases and because of the methodological complexity of these methods. Second, the storage and analysis of large quantities of patient data require the development and maintenance of a complex and secure infrastructure with high computing power, as well as experts in multiple domains, including data management, computer science, epidemiology or outcomes research, and systems administration. Finally, the conventional procedures commonly used to assure privacy protection may not be sufficient to prevent patient identification.

Noise accumulation

The accumulation of estimation errors when the prediction is based on a large number of parameters, which leads to poor classification or poor prediction. When the number of predictors is too large, adding more predictors does not improve the predictive power of the model but only accumulates noise, which deteriorates the model's performance.
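
A minimal simulation of noise accumulation, assuming scikit-learn: the outcome depends on only five informative predictors, and adding many pure-noise predictors typically lowers out-of-sample AUC:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Only 5 informative predictors; everything added later is pure noise
X_signal, y = make_classification(n_samples=400, n_features=5, n_informative=5,
                                  n_redundant=0, random_state=0)

for n_noise in (0, 50, 500):
    X = np.hstack([X_signal, rng.normal(size=(400, n_noise))]) if n_noise else X_signal
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    model = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{n_noise:>4} noise predictors added: test AUC = {auc:.3f}")
```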

