Session 5 Green Belt


The process improvement team at Zak's mart has created a pie chart of the number of days it took to complete each phase of their last project. Based on the information in the chart, what percentage of time was spent on the Measure phase? A) 33% B) 24% C) 14% D) 25%

Answer D The total number of days taken to complete the project = 7 + 24 + 13 + 31 + 20 = 95. The Measure phase accounts for (24/95)*100 = 25% of the time.
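
The percentage arithmetic for the pie chart can be checked with a short sketch. The day counts 24, 31 and 20 are tied to Measure, Improve and Control by the worked answers in this set; assigning 7 to Define and 13 to Analyze is an assumption based on DMAIC order, since the chart itself is not reproduced here.

```python
# Days per phase. Measure, Improve and Control values come from the worked answers;
# Define = 7 and Analyze = 13 are assumed from DMAIC order (the chart is not shown).
phase_days = {"Define": 7, "Measure": 24, "Analyze": 13, "Improve": 31, "Control": 20}

total_days = sum(phase_days.values())


def share(*phases):
    """Percentage of total project days spent on the given phases."""
    return 100 * sum(phase_days[p] for p in phases) / total_days


measure_pct = share("Measure")
improve_control_pct = share("Improve", "Control")
print(f"Total days: {total_days}")
print(f"Measure: {measure_pct:.0f}%")
print(f"Improve + Control: {improve_control_pct:.0f}%")
```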

What is the characteristic of a distribution called that is the difference between the highest and lowest values of a sample data set? A) Mean B) Median C) Location D) Standard Deviation E) Spread

Answer E

The team has found that cycle times longer than 15 minutes cause unacceptable delays in the process. If the sample is representative of the process at large, what is the estimated percentage of delays based on the histogram below? (Note, the horizontal axis on the histogram shows the upper limit for the bin) A) 18% B) 5% C) 25% D) 10% E) 38%

Answer E Solution: Cycle times longer than 15 minutes represent delays. The counts of cycle times longer than fifteen minutes sum to 19, representing 38% (100*19/50) delays in the process.

Statistics: Population

The entire process output or collection of objects about which we wish to draw a conclusion. The populations may be finite or infinite, real or hypothetical. In our example, the population consists of all the claims filed last year at the five locations (real, finite). Another study might include all claims from this year onward (infinite, hypothetical)

Statistics

The science (and art) of data-based decision-making. A numerical characteristic calculated from the sample.

Histograms Limitations

* Because the samples are randomly drawn, any time-dependent trends or patterns in the data are hidden. This could paint a misleading picture of the true performance of the process.
* The histogram is a snapshot of the process at a single point in time and may not represent that process in general (unless you use long-term historical data).
* Any changes in the number or width of the bins will result in changes to the appearance of the distribution, which may adversely affect the conclusions drawn from it.
* The histogram gives no information about the stability (state of statistical control) of the process at the time of sampling. If the process is stable, predictions can be made about its future performance. An unstable process changes all the time, so a histogram based on such data would have poor predictive power.

Skewed Histogram

A distribution with a long tail at one end:
* The distribution is truncated (cut off) at one end, such as when only output above or below a specification is considered.
* A natural process limit exists (e.g. age cannot be lower than zero).
* Time-based measurements such as cycle times, wait times, etc.
* Start-up effects: an oven may take a while to heat up, causing many initial failures which taper off as the temperature stabilizes.

Bi-modal Histogram

A histogram with two distinct modes or peaks may indicate that material from two different suppliers, production lines or shifts have been pooled.

Saw-Toothed Histogram

A number of spikes distributed throughout the histogram may be indicative of stratification, measurement (rounding) error or a low-resolution instrument.

Statistics: Parameter

A quantity associated with the population. Designated by Greek or capital Roman letters. Examples: P (proportion), μ (average), σ (standard deviation). The population average or mean of the claims process cycle time is calculated as the sum of the cycle times of all the claims in the population, divided by the total number of claims in the population.

Statistic

A quantity calculated from the sample. Designated by lower-case Roman letters. Examples: p (proportion), ȳ (average), s (standard deviation). The sample average (mean) of the claims processing cycle times is calculated as the sum of the cycle times in the sample, divided by the total number of values in the sample.

Statistics: Random Sample

A sample drawn such that every member of the population has an equal chance of being selected. This method ensures that every item in the population is equally likely to be selected into the sample. As a result, claims are randomly picked from each of the five locations, making it more representative of the population as a whole.

Bell-Shaped Histogram

A symmetric, bell-shaped curve with a distinct peak represents the distribution of sample means from any population. This distribution, called the Normal distribution, plays a central role in many types of statistical analysis.

Enumerative Studies

Aim to answer questions about the current population like "how many?" or "in what proportion?". The focus is historical rather than predictive.

Descriptive Statistics

Aim to describe and summarize the important features of a population or process. Graphs such as pie, bar and line graphs, histograms, boxplots, scatterplots, etc. and numerical summaries like means, variances, etc. are examples of descriptive statistics, as illustrated below.

Results Metrics

Also called dependent or response variables

Inputs

Also called independent or explanatory variables.

The Empirical Rule

Also called the 68-95-99.7% rule, the empirical rule states: 68% of the values under the normal curve fall within one standard deviation of the mean, 95% within two standard deviations and 99.7% within three standard deviations of the mean.
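
The three percentages in the rule can be reproduced from the standard normal curve. This is a minimal sketch using `NormalDist` from Python's standard library; the helper name `within` is illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {100 * p:.1f}%")
```

The exact areas are slightly different from the rounded 68, 95 and 99.7 figures, which is why the rule is described as empirical.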

According to Taguchi's Loss Function, the loss to society from poor quality increases as a function of what? A) Process variability. B) Process centering. C) Specification limits. D) Internal scrap rates.

Answer A

Two suppliers offer a machine part with a required diameter of 1mm. Data indicate that supplier A's product has a standard deviation of 0.12 mm. and supplier B's product has a standard deviation of 0.19 mm. Which of the following statements is true? A) Supplier A offers a more homogeneous product. B) Supplier B offers a more homogeneous product. C) The standard deviation has no relationship to the quality of the product. D) Only the mean is important in buying the machine part. E) The apparent difference between Supplier A and Supplier B is of no practical significance.

Answer A

Which of the following statements about continuous data is TRUE? A) Take values along an infinitely divisible scale. B) Measure the presence or absence of some characteristic in each unit. C) Not preferred for Six Sigma project analysis. D) Process variability is distorted by continuous data.

Answer A

Which term best represents the proportion of ALL islanders who have a college education a. parameter b. statistics c. population d. random sample e. sample

Answer A A numerical characteristic calculated from the entire population is called a parameter.

Using the same chart information, what percentage of the time was spent on the Improve and Control phases together? A) 54% B) 45% C) 51% D) 60%

Answer A The Improve and Control phases together account for (31 + 20) = 51 days out of the 95 days spent on the project. Thus, (51/95)*100 = 54% of the time was spent on the two phases combined.

What is the primary purpose of a Histogram? A) Calculate the standard deviation. B) Display relative frequency by interval. C) Display number of intervals. D) Display process stability. E) Identify the root cause of variability.

Answer B

Which of the following statements is TRUE? A) The variance of a set of negative numbers is negative. B) A small standard deviation implies that the observations are clustered close to the mean. C) The standard deviation is the square of the variance. D) In a positively (right) skewed distribution the mean is smaller than the median.

Answer B

Normal Distribution is important for two reasons:

1. Many widely used statistical techniques are based on the assumption that the distribution underlying the sample data is normal.
2. The distribution of sample means from any distribution is well approximated by the normal distribution, due to the Central Limit Theorem (CLT), which states that as the sample size increases, the distribution of the sample mean tends toward the normal, irrespective of the shape of the original (parent) population. In addition, the distribution of the sample mean has a smaller spread, or variance, compared to the original distribution (due to the effects of averaging).

Pareto

Also known as the 80/20 rule.

Central Limit Theorem (CLT)

States that as the sample size increases, the distribution of the sample mean tends toward the normal, irrespective of the shape of the original (parent) population. In addition, the distribution of the sample mean has a smaller spread, or variance, compared to the original distribution (due to the effects of averaging).
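
The smaller spread of the sample mean can be seen in a quick simulation. This is only a sketch under stated assumptions: the parent population is taken to be exponential with mean 1 (a strongly right-skewed shape), the sample size of 30 and the 2000 repetitions are arbitrary choices, and the seed is fixed so the run is repeatable.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples drawn from a right-skewed (exponential) parent."""
    return [
        mean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


means_n1 = sample_means(1)    # samples of size 1: the parent distribution itself
means_n30 = sample_means(30)  # averages of 30 observations each

spread_n1 = stdev(means_n1)
spread_n30 = stdev(means_n30)
print(f"spread of individual values: {spread_n1:.3f}")
print(f"spread of means (n = 30):    {spread_n30:.3f}")
```

With a parent standard deviation of 1, the spread of the means should land near 1/sqrt(30), or about 0.18, and a histogram of `means_n30` would look far more bell-shaped than one of `means_n1`.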

A machine used in an electrical application produces bolts with diameters following a normal distribution with μ = 0.060 inches and σ = 0.001 inches. While 0.060 inches is the required diameter, any bolt with diameter less than 0.058 inches or greater than 0.062 inches must be scrapped because of the precision required for the application. What proportion of bolts will be scrapped on this machine? (Hint: Area to the left of -2 = P(Z < -2) = 0.0228) A) 0.0456 B) 0.0228 C) 0.9772 D) 0.1667

Answer A Let x1 = 0.058 and x2 = 0.062. Then, z1 = (x1 - μ)/σ = (0.058 - 0.060)/0.001 = -2 and z2 = (0.062 - 0.060)/0.001 = 2. The percent scrap: P(Z < -2) + P(Z > 2) = 2*P(Z < -2), since both areas are equal due to symmetry. Given P(Z < -2) = 0.0228, the required area or probability is 2*0.0228 = 0.0456.
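
The same scrap proportion can be computed without a table. This is a minimal sketch using `NormalDist` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

diameter = NormalDist(mu=0.060, sigma=0.001)  # bolt diameter in inches
lower_limit, upper_limit = 0.058, 0.062

# Scrap = area below the lower limit plus area above the upper limit.
scrap = diameter.cdf(lower_limit) + (1 - diameter.cdf(upper_limit))
print(f"scrap proportion: {scrap:.4f}")
```

The unrounded result is about 0.0455; the answer key's 0.0456 comes from doubling the table value 0.0228, which is already rounded.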

Which of the following is NOT a reason to use continuous (variable) data? A) It provides a better measure of variation. B) It is more convenient for subjective appearance items. C) A greater number of statistical analysis tools can be applied. D) There is less subjectivity involved.

Answer B

Which of the following is an enumerative study and which is analytical? 1. Census survey 2. Climate modeling 3. Inventory valuation 4. Pilot study A) enumerative, enumerative, analytical, analytical B) enumerative, analytical, enumerative, analytical C) analytical, enumerative, analytical, enumerative D) analytical, analytical, enumerative, enumerative

Answer B Census and inventory are taken to understand and make decisions about the existing population, thus they're enumerative studies. Climate modeling and pilot runs are used to make predictions about future states of the weather and solution implementation respectively; they're analytical studies.

Which term best represents the actual measured proportion of Icelanders with a college education? a. random sample b. statistic c. sample d. population e. parameter

Answer B Statistic: the proportion of people in the sample who have a college education.

A dataset of five scores has a median of 12. If the highest score is increased by 3, the new median: A) Will decrease B) Will increase C) Will stay the same D) Cannot be determined without additional information E) None of the above

Answer C
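
A small check of why the median does not move. The five scores below are hypothetical, chosen only so that the median is 12; the question does not list the actual scores.

```python
from statistics import median

scores = [8, 10, 12, 15, 20]             # hypothetical scores with median 12
raised = scores[:-1] + [scores[-1] + 3]  # highest score increased by 3

median_before = median(scores)
median_after = median(raised)
print(median_before, median_after)
```

Raising the highest score leaves the middle position untouched, so the median stays the same, whereas the mean would increase.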

For a normal distribution with mean = 65 and standard deviation = 12, which of the following intervals contains the middle 68% of the values? A) 41 and 89 B) 29 and 101 C) 53 and 77 D) 11 and 119

Answer C By the Empirical Rule, 68% of a normal distribution falls within 1 standard deviation of its mean, in this case within 65-1(12)=53 and 65+1(12)=77.

This problem is based on the dataset used previously, from NewMed hospital. An improvement team at the hospital is investigating the variability in the time it takes for the surgical staff to pull out preference cards from the shelves to prepare the operating tray prior to each surgical procedure. Calculate the range, variance and standard deviation for the sample. (Hint: use (n-1) as the denominator)
16.5 11.7 14.3 11.4 12.2 12.0 12.3 13.7 13.6 21.8
11.9 16.9 20.0 9.6 13.2 10.4 15.1 15.3 18.5 11.2
9.6 25.5 10.1 14.7 14.5 19.9 10.4 15.6 12.7 15.2
27.4 17.3 14.7 16.2 16.8 10.5 14.2 11.3 18.4 18.0
11.8 12.2 11.5 13.0 21.6 9.2 13.2 23.3 13.2 12.9
A) Range = 15.5, variance = 19.23, sd = 3.86 B) Range = 18.2, variance = 17.15, sd = 4.14 C) Range = 18, variance = 35.19, sd = 5.93 D) Range = 19, variance = 15.58, sd = 6.75

Answer B You can use any statistical software package to do the calculations. If using Excel®, the "=MAX()-MIN()", "=VAR()" and "=STDEV()" functions calculate the range, variance and standard deviation respectively.
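
For readers without Excel®, the same three quantities can be sketched in Python. The `statistics.variance` and `statistics.stdev` functions divide by (n - 1), matching the hint in the question; the variable names are illustrative.

```python
from statistics import stdev, variance

# The 50 cycle times (in minutes) listed in the question.
cycle_times = [
    16.5, 11.7, 14.3, 11.4, 12.2, 12.0, 12.3, 13.7, 13.6, 21.8,
    11.9, 16.9, 20.0, 9.6, 13.2, 10.4, 15.1, 15.3, 18.5, 11.2,
    9.6, 25.5, 10.1, 14.7, 14.5, 19.9, 10.4, 15.6, 12.7, 15.2,
    27.4, 17.3, 14.7, 16.2, 16.8, 10.5, 14.2, 11.3, 18.4, 18.0,
    11.8, 12.2, 11.5, 13.0, 21.6, 9.2, 13.2, 23.3, 13.2, 12.9,
]

data_range = max(cycle_times) - min(cycle_times)
sample_variance = variance(cycle_times)  # sample variance, denominator n - 1
sample_sd = stdev(cycle_times)           # square root of the sample variance

print(f"range = {data_range:.1f}")
print(f"variance = {sample_variance:.2f}")
print(f"sd = {sample_sd:.2f}")
```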

The time required to assemble an electronic component is normally distributed with a mean of 8 minutes and a standard deviation of 1.5 min. What is the probability that a particular assembly takes more than 10.25 minutes? (Hint: area under the normal curve to the left of 10.25 is 0.9332) A) 0.9332 B) 0.0668 C) 0.3413 D) 0.4332

Answer B Calculate z = (10.25 - 8)/1.5 = 1.5. We want the greater-than area: P(X > 10.25) = P(Z > 1.5). The area under the normal curve to the right of 1.5 = 1 - the area to the left of 1.5 = 1 - NORMSDIST(1.5) = 1 - 0.9332 = 0.0668. Alternatively, due to symmetry, the area to the right of 1.5 is equal to the area to the left of -1.5, NORMSDIST(-1.5) = 0.0668.
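
The NORMSDIST step has a direct counterpart in Python's standard library. A minimal sketch, with illustrative variable names:

```python
from statistics import NormalDist

assembly_time = NormalDist(mu=8, sigma=1.5)  # assembly time in minutes

# Greater-than area: 1 minus the cumulative area to the left of 10.25.
p_over = 1 - assembly_time.cdf(10.25)
print(f"P(X > 10.25) = {p_over:.4f}")
```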

A sample of 5 persons with hypertension underwent a special treatment program which resulted in the following reductions in systolic blood pressure for these persons: -5, 10, 20, 5, 10. The mean reduction in systolic blood pressure is A) 10 B) 9 C) 8 D) 40 E) None of these

Answer C

A team is sorting through error data from a shipping process and wants to develop priorities for action. What tool should the team use to analyze the data? A) Process Mapping B) Brainstorming C) Pareto Chart D) Corrective Action Matrix E) Trend Chart

Answer C

What is the characteristic of a distribution called that is the difference between the average of a sample set of data and the target value for the sample data? A) Mean B) Median C) Location D) Standard Deviation E) Spread

Answer C

The table below shows the number of surgeries of each of four types of procedures, performed per quarter of the 2006 fiscal year at a regional hospital. Which chart(s) from those listed below would be appropriate to show the trends in surgical procedures over the four quarters? A) Pie chart only B) Pie and Bar chart C) Bar and Line chart D) Line chart only

Answer C Bar charts and Line charts allow comparisons of values across two variables - in this case, Surgical Procedure and Quarter.

The tabled data from the last question is displayed below as a bar chart. Based on the information in the chart, which surgical procedure numbers stayed essentially flat over the year? A) OB/GYN B) Ophthalmology C) Orthopedic D) General

Answer C Orthopedic surgeries show the smallest changes in numbers as evidenced by the blue bars in the chart, all of which seem to have essentially the same height.

The study team selected a group of individuals to represent the people in each of the eight regions. Which is the best description for the data set collected to estimate the proportion of Icelanders with a college education? a. parameter b. population c. sample d. statistics

Answer C The group selected is a subset of all the people in the population and is called a sample.

A 'population' is defined as: A) A number or measurement collected as a result of observation. B) A measurable characteristic of a population. C) A subset of a sample. D) The complete set of individuals, objects, or measurements about which we wish to draw a conclusion. E) none of the above

Answer D

A population has a mean of 50 and a variance of 0 (zero). What is the logical conclusion to be drawn from this information? A) It is an error. B) The population consists of 50 units. C) There are no units in the population. D) All units in the population have the same value of 50. E) None of the above.

Answer D

Construct a histogram using the data set presented. Which shape of distribution best approximates this set of data? A) Normal B) Bi-Modal C) Left Skewed D) Right Skewed E) Uniform

Answer D

Of these measurements, which one is best described as an Indicator/Predictor? A) Net profit margin. B) Customer satisfaction rating. C) Warranty returns. D) Scrap rate of a key input process.

Answer D

Which of the following statements is FALSE? A) When comparing test scores across different locations, the standard deviation will tell you how diverse the test scores are for each location. B) A time-series plot is a graphical representation of the observations in a data set over time. C) The frequency of any particular value of a discrete variable is the number of times that value occurs in the data set. D) The sample median is highly sensitive to extreme values (outliers). E) When the data are categorical, a frequency distribution or relative frequency distribution provides an effective tabular summary of the data.

Answer D

Which of these measurements is taken from a continuous scale? A) The percentage of babies born during the year that are boys. B) The number of blades of grass in your yard. C) Ratings from a Zagat restaurant survey book. D) The air pressure of the left front tire of your vehicle. E) Percentage of election votes for a given candidate.

Answer D

Based on frequency of occurrence, answering the phone appears to be the number one priority defect. But what happens when cost is factored in? Based on the data presented, prioritize the top three defects by total cost (frequency x cost): A) Wrong Toppings, Phone Answering, Driver Not Professional B) Wrong Toppings, Phone Answering, Cold Pizza C) Late Delivery, Wrong Toppings, Cold Pizza D) Late Delivery, Phone Answering, Cold Pizza E) Phone Answering, Driver Not Professional, Late Delivery

Answer D Solution: Late: (40 x $62) = $2,480 #1 Phone: (87 x $25) = $2,175 #2 Cold: (35 x $62) = $2,170 #3
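
The reordering caused by cost weighting can be sketched as follows. Only the three defects whose frequency and unit cost appear in the solution are included; the other defects in the original table are not reproduced here, so this is not the full Pareto.

```python
# (frequency, cost per occurrence in dollars), taken from the solution above.
defects = {
    "Late Delivery": (40, 62),
    "Phone Answering": (87, 25),
    "Cold Pizza": (35, 62),
}

# Weighted Pareto measure: total cost = frequency x cost.
total_cost = {name: freq * cost for name, (freq, cost) in defects.items()}

by_frequency = sorted(defects, key=lambda name: defects[name][0], reverse=True)
by_cost = sorted(total_cost, key=total_cost.get, reverse=True)

print("ranked by frequency:", by_frequency)
print("ranked by total cost:", by_cost)
for name in by_cost:
    print(f"{name}: ${total_cost[name]:,}")
```

Phone answering leads on frequency, but late delivery moves to the top once each occurrence is weighted by its cost.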

Iceland is one of the most developed countries in the world. A study was conducted using a random survey to estimate the proportion of Icelanders who have a college education. Please fill in the blank: By proper study of a subset of data, the results can be generalized to the ...... a. sample b. parameter c. random sample d. population e. statistic

Answer D The study would result in an estimate of the proportion of college educated people in the population.

A Lean Six Sigma team is working on a project in the order entry area to improve order accuracy and reduce the cycle time. The team wants to analyze the variability in cycle time. What tool should be used? A) Trend Chart B) Fishbone Diagram C) Benchmarking D) Metrics E) Histogram

Answer E

What information is conveyed by a histogram? A) The shape of a distribution of data B) The relative frequency of observations within a set of data C) The spread of a set of data D) A & C only E) A, B and C

Answer E

Effectiveness metrics

Are a direct measure of how well customer expectations are met.

Continuous Measurement

Are derived from a scale or continuum that is infinitely divisible. Measurements of time, temperature, weight, height, voltage, etc. are some examples. In general, these measurements are characterized by the presence of measurement units such as seconds/minutes, inches, volts, miles/hour etc. Continuous measurements provide the greatest possible information content (within the boundaries of measurement device resolution), because values are represented directly, not classified into categories.

Discrete Measurement

Are representations of categories or attributes. Examples include good/bad, boy/girl, apple/orange/banana, where the categories have no logical ordering, or rating scales and indices where the categories have a logical order: customer satisfaction rated on a scale of 1 to 5, course grades from F to A. Counts of items or objects that only come in whole units (people, cars, animals, cities, computer terminals, etc.) are also examples of discrete measurements. Thus, if you can find discrete spaces between measurements, then the data are from a discrete scale.

Two Flavors of Measurement

Discrete & Continuous

Data Hierarchy: Level 3 Counts

Discrete counts: number of items or events.
* Number of new car sales in a week
* Number of consumers answering a telephone survey
* Number of home loan applications granted per month
* Number of errors on a page
* Number of accidents at a factory in one year
* Number of imperfections in a yard of cloth

Discrete measurements: Attribute Data

Discrete Measurements are often termed as Attribute data, because they sort or count items based on attributes, such as the presence/absence of defects, quality perceptions, occurrence of an event, etc.

Data Hierarchy: Level 1 Nominal

Discrete, nominal: groups are levels with no order.
* Defective/Non-defective
* Sales region: Northeast/Midwest/Southwest
* Profession: Accountant/Lawyer/Teacher
* Type of paint defect: Cracking/Dirt particles/Flaking/Bleeding
* Color of car: Silver/Blue/Red
* Delivery: On time/Late

This problem is based on the dataset used previously, from NewMed hospital. An improvement team at the hospital is investigating the variability in the time it takes for the surgical staff to pull out preference cards from the shelves to prepare the operating tray prior to each surgical procedure. Calculate the mean and median for the data set below, representing a random sample of 50 cycle times (in minutes).
16.5 11.7 14.3 11.4 12.2 12.0 12.3 13.7 13.6 21.8
11.9 16.9 20.0 9.6 13.2 10.4 15.1 15.3 18.5 11.2
9.6 25.5 10.1 14.7 14.5 19.9 10.4 15.6 12.7 15.2
27.4 17.3 14.7 16.2 16.8 10.5 14.2 11.3 18.4 18.0
11.8 12.2 11.5 13.0 21.6 9.2 13.2 23.3 13.2 12.9
A) mean = 25.36, median = 24.78 B) mean = 32.67, median = 22 C) mean = 14.73, median = 13.65 D) mean = 24.78, median = 22

Answer C To save time from hand calculations, use any statistical software package, or copy and paste the data into Excel®, and use the "=AVERAGE()" and "=MEDIAN()" functions to obtain the answers.
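
As an alternative to the spreadsheet functions, the mean and median can be sketched in Python with the standard library. The variable names are illustrative.

```python
from statistics import mean, median

# The 50 cycle times (in minutes) listed in the question.
cycle_times = [
    16.5, 11.7, 14.3, 11.4, 12.2, 12.0, 12.3, 13.7, 13.6, 21.8,
    11.9, 16.9, 20.0, 9.6, 13.2, 10.4, 15.1, 15.3, 18.5, 11.2,
    9.6, 25.5, 10.1, 14.7, 14.5, 19.9, 10.4, 15.6, 12.7, 15.2,
    27.4, 17.3, 14.7, 16.2, 16.8, 10.5, 14.2, 11.3, 18.4, 18.0,
    11.8, 12.2, 11.5, 13.0, 21.6, 9.2, 13.2, 23.3, 13.2, 12.9,
]

mean_time = mean(cycle_times)
median_time = median(cycle_times)  # average of the 25th and 26th sorted values
print(f"mean = {mean_time:.2f}, median = {median_time:.2f}")
```

The mean sits above the median, which is consistent with a right-skewed set of cycle times.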

Excel of Normal Distribution

For instance, the Excel® function =NORMSDIST() returns the cumulative probability or area to the left of the point, written as P(Z < z). So "=NORMSDIST(-1)" in Excel® gives the area to the left of -1 under the standard normal curve, which is P(Z < -1) = 0.1587. Thus, there is about a 16% chance that a randomly picked snack pack will weigh 1.1 oz. or less.
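
Outside Excel®, the same left-tail area can be obtained from Python's standard library. This sketch treats `NormalDist().cdf` as the counterpart of NORMSDIST; the variable name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Cumulative area to the left of z = -1, i.e. P(Z < -1).
left_area = standard_normal.cdf(-1)
print(f"P(Z < -1) = {left_area:.4f}")
```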

Variance

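Formula: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$, i.e. the sum of the squared deviations of the observations from the mean, divided by the total number of observations minus 1.
Example: Let's calculate the variance for the sales revenue data:
| Observation | y | y² |
| 1 | 113 | 113*113 = 12769 |
| 2 | 114 | 12996 |
| 3 | 125 | 15625 |
| 4 | 97 | 9409 |
| 5 | 123 | 15129 |
| 6 | 113 | 12769 |
| Sum | Σy = 685 | Σy² = 78697 |
Substituting the two sums into the shortcut formula: $s^2 = \frac{78697 - 685^2/6}{6 - 1} = \frac{492.83}{5} \approx 98.57$.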

Histograms can be used to answer the following questions:

How is the process output distributed? How variable is the output of this process? Is the process meeting customer requirements? What part of the process output falls outside specifications? How do the variances of two processes compare?

Weighted Pareto Charts

It is often useful to further quantify the priorities identified in a Pareto chart with a measure that relates to the cost of the error being observed. Common examples of this type of measure are cost ($) to rework or correct, cost ($) to replace and man-hours lost. A Pareto using a cost measure is called a weighted Pareto in that defect quantities are weighted by cost. Very often, errors with a lower frequency of occurrence will have a higher repair or replacement cost. The benefit of a weighted Pareto is that it ranks the errors by the total cost impact. An improvement project that seeks to remove the errors that have the highest cost impact will possibly realize a higher savings than a project that focuses on error frequency alone.

Mean

Mean (Average) of a distribution is the most common measure of central tendency. $\bar{y} = \frac{\sum y}{n}$, i.e. the sum of all the observations divided by the total number of observations.

Efficiency Metrics

Measure the amount of input necessary to achieve a given output.

Truncated at Both Ends Histogram

Might indicate a post-inspection process in which all items not meeting specifications (upper and lower) have been excluded from the dataset.

Data Hierarchy: Level 2 Ordinal

Ordinal: groups in a logical order.
* Compliance level: Not compliant/Somewhat compliant/Fully compliant
* Education level: High School/Graduate/Post Graduate
* Product ratings on a scale of 1 to 10
* Agreement: 1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree
* Evaluation: Poor/Fair/Excellent
* Print quality: Draft/Regular/Premium

Plateau-shaped Histogram

Uniform bin heights suggest:
* A distribution of equally likely events, such as in a dice toss.
* An inadequate operational definition, measurement error or lack of instrument resolution, yielding output that is equally spread throughout the range and not concentrated at the target.

Inferential statistics

Use sample data to help make comparisons among, or draw inferences about the effects of different solutions or treatments on the overall population. When the entire population cannot be measured, a smaller sample of data is used to infer, or estimate, the characteristics of the wider population. Regression analysis, hypothesis tests and experimental design fall into this category. Some tools can be used to both describe and infer: control charts (highlighted below) observe the process over time to establish stability (describe) and signal the effect of a special cause as soon as the process displays instability (inference).

Pareto charts can be used to answer the following questions:

Which defects occur most frequently? What is the relative frequency or relative value of items or categories? How should improvement actions be prioritized?

Histograms

A simple visual display that conveys a lot of information in one compact graphic. Used in the Measure and Analyze phases of the DMAIC process.

Variance (shortcut formula)

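$s^2 = \frac{\sum y^2 - (\sum y)^2 / n}{n - 1}$, i.e. the sum of the squared observations, minus the square of the sum of all observations divided by the total number of observations, all divided by the total number of observations minus 1.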
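
The shortcut formula can be checked against the definition using the six sales revenue values from the Variance card. This is a minimal sketch; `statistics.variance` is used as the reference because it divides by (n - 1), and the variable names are illustrative.

```python
from statistics import variance

revenue = [113, 114, 125, 97, 123, 113]  # sales revenue data from the Variance card

n = len(revenue)
sum_y = sum(revenue)                   # sum of the observations
sum_y2 = sum(y * y for y in revenue)   # sum of the squared observations

# Shortcut formula: (sum of squares - (sum)^2 / n) / (n - 1)
shortcut = (sum_y2 - sum_y ** 2 / n) / (n - 1)

# Definition: sum of squared deviations from the mean / (n - 1)
by_definition = variance(revenue)

print(f"sum y = {sum_y}, sum y^2 = {sum_y2}")
print(f"shortcut = {shortcut:.2f}, definition = {by_definition:.2f}")
```

Both routes give the same sample variance; the shortcut only avoids computing each deviation from the mean by hand.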

Standard Deviations

Standard Deviation = square root of Variance

Data Hierarchy: Level 4 Continuous

Continuous Measurements are made along a continuum Cycle time for call resolution. Gas mileage of a vehicle Mercury concentration in a sample of tuna Electricity usage at a manufacturing plant Speed of a pitched baseball Shelf life of a drug

Continuous Measurement: Variable Data

Continuous measurements on the other hand, are often termed as Variable data, since they can take on infinite values within any two fixed points. In certain cases however, the line between discrete and continuous gets blurred - read more about this in the slide 'The Gray Area'.

Mode

Is the most frequently occurring value in a dataset.

Normal Distribution (Gaussian Distribution)

It is one of the most important distributions in the application of statistical methods to business process improvement. This lesson will discuss the important features and uses of this distribution.

Efficiency Metrics

Measure the amount of input necessary to achieve a given output. Labor productivity is a familiar example, measuring the amount of labor hours or labor dollars per unit of output. Other efficiency metrics include working capital per unit, return on investment and average unit cost. To improve comparability, units are often standardized as "equivalent units" if the organization produces units of varying complexity. Efficiency measurements as a group can be very useful to gage how actions translate into cost savings and net financial impact; however, they do not speak of the customer's experience.

Effectiveness metrics

Metrics are a direct measure of how well customer expectations are met. Defects received by customers, on-time shipments, customer satisfaction, customer loyalty and repeat business are all measures of effectiveness. If effectiveness measurements turn south, improved efficiency will have little long term value because the top line - sales revenue - will be negatively impacted.

Discrete Measurements: Pros & Cons

Pros:
* Discrete measurements are simple, fast, and relatively cheap to obtain.
* They are often used to express subjective factors that are hard to measure directly on a numerical scale, like general appearance of a product, or experience with a service, e.g., restaurant meal.
* Some factors can only be expressed in discrete units, particularly representations of events or opinions. For example, address is either one city or another, a vehicle either starts or doesn't, and a customer may like a product 'not at all, somewhat or very much'.
* In one sense, the Sigma level is defined by a discrete measure - defects - although not all defects are truly discrete events, as we will soon discuss.
Cons:
* Attribute/Discrete data may be subject to greater error if subjective traits are categorized on a scale. Categories of defect type are clear cut, but categories of defect severity may be more judgmental, so that two persons might categorize the same item differently.
* If you aren't producing defects, there is no measure of process variability at all. All output could be barely acceptable, or all output could be close to the target value, and the measurement won't distinguish between the two conditions.
* Process variability is distorted by category distinctions (especially since those categories are often artificial constructs). When variability is hidden, reaction time to process shifts is delayed.
* Many statistical techniques can only be applied to data from variable measurement systems, so using discrete measures can limit analysis.

Analytical Studies

Seek to answer questions like "why?" or "what are the causes of?" and aim to generalize the results to future states of the population. Analytical studies are conducted to understand the behavior of a process over time with the intention of identifying relationships between cause and effect which could impact future performance (predictive as well as historical perspective). For example, the problem of characterizing enrollment in a credit card service over time and predicting future enrollment given certain incentives is an analytical study. However, analytical studies do use descriptive tools to characterize the dataset before analyzing it.

Statistics: Sample

The subset chosen to represent the population. It would be too costly and time-consuming to measure the cycle times of the entire population of claims, so the claims from two locations were selected in order to estimate the overall cycle time. The selected claims (shown in blue) are a sample, but do they represent the population? What if the two locations picked happen to be short-staffed, so that their cycle times are much longer than those of the other locations?

Create A Histogram

Step 1: Calculate the data range, i.e., the difference between the largest and smallest data points in the dataset. For the demo data: Range = 26 - 2 = 24.
Step 2: Determine the number of bins. You can use one of these methods as a guideline: A) Use between 6 and 15 bins as a rule of thumb. For the demo data, we chose 7 bins. OR, B) Take the square root of the sample size and round up to the nearest whole number. Keep in mind that the objective is to clearly see the features of the data. Too many bins and the histogram will appear flat and spread out; too few and the histogram will be artificially tight. Either way you lose meaningful information, so try to find a balance between the two extremes.
Step 3: Determine the bin width using the formula: Bin Width = Range / (Number of Bins - 1). The result may be rounded up to a convenient number. For the demo data: Bin Width = 24/6 = 4. Note: Steps 2 and 3 may be interchanged. If the bin width is fixed, then the Number of Bins is given by: (Range/Bin Width) + 1.
Step 4: Develop a table showing the bin limits: start the first bin at a convenient point below the smallest data value and add the bin width to get the upper limit for the bin. Each successive bin then starts with the upper limit of the previous bin and ends with another increment of the bin width.
Step 5: Create the histogram using a spreadsheet charting function. Enter an appropriate title for your graph and provide any details necessary for its correct interpretation.
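
Steps 1 to 4 can be sketched in code. The demo data set itself is not given on this card, so the values below are hypothetical, chosen only so that the smallest value is 2 and the largest is 26. Starting the first bin half a bin width below the minimum is also an assumption; the card only asks for "a convenient point below the smallest data value".

```python
import math


def bin_edges(data, n_bins):
    """Bin limits using the card's rule: width = range / (number of bins - 1)."""
    low, high = min(data), max(data)
    width = math.ceil((high - low) / (n_bins - 1))  # rounded up to a whole number
    start = low - width / 2  # assumed starting point below the smallest value
    return [start + i * width for i in range(n_bins + 1)]


def bin_counts(data, edges):
    """Count values per bin; each bin includes its upper limit."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        for i in range(len(counts)):
            if edges[i] < value <= edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical data with min = 2 and max = 26, matching the demo range of 24.
demo = [2, 5, 7, 8, 9, 10, 11, 11, 12, 13, 13, 14, 14, 15, 16, 17, 19, 21, 23, 26]

edges = bin_edges(demo, n_bins=7)
counts = bin_counts(demo, edges)
for upper, count in zip(edges[1:], counts):
    print(f"up to {upper:>4}: {'#' * count}")
```

The printed rows give a rough text histogram labelled by each bin's upper limit; Step 5 would replace this with a proper chart.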

Practical Problem

Step 1: State the problem in practical terms: "Billing errors cost us millions in lost business annually and must be reduced/eliminated."

Statistical Problem

Step 2: Express the practical problem as a statistical problem: "What factors are significant predictors of billing errors?"

Statistical Solution

Step 3: Obtain a statistical solution using the appropriate data analysis tools: "Inconsistency in the order filling process and high variability in staff skills contribute significantly to billing errors."

Practical Solution

Step 4: Convert the statistical solution into practical terms, which can be put into an actionable format: "Simplify the billing process and train employees in correct order filling procedures."

Stratification

Stratification is a term derived from the Latin word "stratum", meaning "cover" or "layer". The term originated in a geology context to describe various layers of rock - a way of better understanding the whole by examining the constituent parts. If your objective is to understand the geology of the Grand Canyon, a general description of the rocks visible along the rim wouldn't be very useful. A lot more can be learned by examining the various layers of rock along the canyon wall - the details that determine the whole. So it is with data - stratification is an effective method to break down the whole into meaningful subsets.

Range

The Range is the simplest measure of dispersion of a set of data. It is defined as the difference between the largest value (maximum) and the smallest value (minimum) in the dataset.

Accurate

The closer the average is to the target the more accurate the process performance is said to be.

Precise

The narrower the spread, the more precise the process performance is said to be.

