Posted: March 21st, 2023


Please see attachments for the same instructions. The raw data is attached as well.
Data Analysis Lab Instructions
You will use Microsoft Excel to complete this assignment. We recommend watching the following Excel tutorial videos before you begin:
#1 Getting Started; #1 Getting Started transcript
#2 Finding Stuff and Getting Around; #2 Finding Stuff and Getting Around transcript
#3 Data: Entering and Formatting; #3 Data: Entering and Formatting transcript
Background
Cholesterol is essential to the normal function of our bodies, but too much cholesterol negatively affects health. Cholesterol circulates in the blood, where it can accumulate on the walls of arteries and limit the flow of blood. If cholesterol blocks blood vessels severely, a person can suffer a heart attack or stroke. According to the U.S. Department of Health and Human Services, a concentration of cholesterol greater than 240 mg dL⁻¹ is considered too high.
You work for a company that plans to test a new drug, designed to lower the concentration of cholesterol circulating in blood. To recruit suitable patients for this test, your company plans to advertise the study, targeting people who are most likely to have high cholesterol. To design the advertising, your company needs to know which target audience is most likely to have high cholesterol. In general, advertising is targeted toward people of a particular age.
Fortunately, doctors routinely order tests that quantify the concentration of cholesterol circulating in the blood of their patients. Therefore, your company could cooperate with a nearby lab to obtain anonymous samples of cholesterol concentrations for people of various ages. The sample was divided into two groups: 1) people between the ages of 20 and 44 years, and 2) people between the ages of 45 and 64 years. You have been asked to analyze the data and conclude which group is more likely to have high cholesterol.
Step 1: Anticipate your analysis
To construct a sound argument, one must anticipate the evidence needed to support a claim. In this assignment, you can choose between two claims:
Potential Claim 1: People between the ages of 20 and 44 years are more likely to have high cholesterol than people between the ages of 45 and 64 years.
Potential Claim 2: People between the ages of 45 and 64 years are more likely to have high cholesterol than people between the ages of 20 and 44 years.
In each of the three figures below, the y-axis represents the median concentration of cholesterol in the blood, with higher values indicating more cholesterol. The x-axis compares two categories of age: 20-44 years and 45-64 years.
Question 1: Select the figure that best illustrates what one should expect to observe if people between the ages of 20 and 44 years are more likely to have high cholesterol than people between the ages of 45 and 64 years.
Figure A
Figure B
Figure C
Step 2: Examine the frequency distribution of cholesterol for each category of people.
To determine whether people are more likely to have high cholesterol at ages 20-44 years or at ages 45-64 years, we must first examine the frequency of high cholesterol among people in each age category. For any variable, such as cholesterol concentration, we can use a frequency distribution to visualize the number of observations in a range of values. (Excel tutorial #8 Frequency; #8 Frequency transcript)
To construct a frequency distribution, we must first generate a table of frequencies in each range of values. Let’s consider an example.
Imagine that you asked 20 students to tell you how many hours (h) they worked each day. Let’s call this variable the daily duration of work. Here is a list of the data:
[5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3].
Rather than look at this long list, we can conveniently express the set of values as a frequency table. Recall that a frequency equals the number of times a value occurs in a set of data. Table 1 summarizes the frequencies of values in the set of data listed above.
Table 1. Frequencies of work duration in a sample of 20 students.
Duration of Work (h)    Frequency
1                       0
2                       3
3                       5
4                       3
5                       6
6                       2
7                       1
8                       0
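The tally in Table 1 can be reproduced with a few lines of Python. This is only an illustrative sketch alongside the Excel workflow; the function name `frequency_table` is a hypothetical helper, not part of the assignment.

```python
from collections import Counter

# Daily duration of work (h) reported by the 20 students in the example.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]


def frequency_table(values, lowest, highest):
    """Return {value: frequency} for every whole number from lowest to highest."""
    counts = Counter(values)
    # Counter returns 0 for values that never occur, such as 1 and 8 here.
    return {v: counts[v] for v in range(lowest, highest + 1)}


table = frequency_table(hours, 1, 8)
for value, frequency in table.items():
    print(f"{value} h: {frequency}")
```

The frequencies always sum to the sample size (20 here), which is a quick way to check a table built by hand.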
In some cases, the data span a wide range of values, requiring one to compute the frequency of values in a range rather than the frequency of each value. For example, below is a frequency table for the same data, with values grouped into ranges that each span 2 h. (Excel tutorial #8 Frequency; #8 Frequency transcript)
Table 2. Frequencies of work duration in a sample of 20 students, grouped into ranges of 2 h.
Duration of Work (h)    Frequency
1-2                     3
3-4                     8
5-6                     8
7-8                     1
Note, the range of values is chosen to balance the need to compress the table with the need to resolve any patterns. Sometimes, one needs to use a larger range to compress the table or use a smaller range to resolve subtle patterns.
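The grouped table can be sketched in Python as well. The sketch assumes non-overlapping, inclusive ranges of 1-2, 3-4, 5-6, and 7-8 h, which are the ranges that reproduce the frequencies 3, 8, 8, and 1; `binned_frequencies` is a hypothetical helper name.

```python
# Daily duration of work (h) reported by the 20 students in the example.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]


def binned_frequencies(values, start, width, n_bins):
    """Count whole-number values in consecutive inclusive ranges of equal width."""
    table = {}
    for i in range(n_bins):
        low = start + i * width
        high = low + width - 1  # inclusive upper end, so ranges do not overlap
        table[f"{low}-{high}"] = sum(1 for v in values if low <= v <= high)
    return table


binned = binned_frequencies(hours, start=1, width=2, n_bins=4)
for label, frequency in binned.items():
    print(f"{label} h: {frequency}")
```

Changing `width` shows the trade-off described above: a larger width compresses the table, while a smaller width resolves finer patterns.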
Once we have a frequency table, we can plot a frequency distribution—a bar plot in which the height of each bar represents the frequency of observations within a certain range. For example, the figure below shows a frequency distribution for the data in Table 1.
Figure 1. The frequency distribution of work duration for a sample of 20 students.
Directions: For questions 2 through 4, download the Excel file in Canvas titled “Cholesterol Data”, which contains the concentrations of cholesterol in people at the ages of 20-44 years (sample size = 100) and people at the ages of 45-64 years (sample size = 100). This file contains two columns of data, each of which represents the concentration of cholesterol for people in an age category. Use Excel for calculations, modeling, and graphing. Round all calculated values to the nearest tenth of a decimal place. For example, if you calculate the value as 3.8218, round to 3.8.
Question 2: Complete the frequency table below, listing the frequency of cholesterol concentrations in each range for both age categories. Save your completed table as an image file, PDF, or .xls/.xlsx. Click “Choose a file” and upload it. (Excel tutorial #8 Frequency; #8 Frequency transcript)
Table 3. The frequencies of values for cholesterol concentration of people in two age categories: 20-44 years and 45-64 years.
Cholesterol (mg dL⁻¹)    Frequency among people at 20-44 years of age    Frequency among people at 45-64 years of age
120-140
140-160
160-180
180-200
200-220
220-240
240-260
260-280
280-300
300-320
320-340
Question 3: Plot a frequency distribution (also known as a histogram) of the concentration of cholesterol in people between 20 and 44 years of age. Your plot should follow the formatting guidelines listed below. Save your plot as an image file, PDF, or .xls/.xlsx. Click “Choose a file” and upload your frequency distribution.
Excel tutorials:
#8 Frequency; #8 Frequency transcript
#9 Saving Plots as Images; #9 Saving Plots as Images transcript
Formatting Instructions
Chart type: 2D Column
Quick layout: Layout 1
Y-axis title: “Frequency”; Font size = 16
Y-axis numbers: Font size = 14
X-axis title: “Cholesterol in People at Ages 20-44 Years”; Font size = 16
X-axis numbers: Font size = 14
Bins: Use the following 11 bins: 120-140, 140-160, 160-180, 180-200, 200-220, 220-240, 240-260, 260-280, 280-300, 300-320, 320-340
Question 4: Plot a frequency distribution (also known as a histogram) of the concentration of cholesterol in people between 45 and 64 years of age. Your plot should follow the formatting guidelines listed below. Save your plot as an image file, PDF, or .xls/.xlsx. Click “Choose a file” and upload your frequency distribution.
Excel tutorials:
#8 Frequency; #8 Frequency transcript
#9 Saving Plots as Images; #9 Saving Plots as Images transcript
Formatting Instructions
Chart type: 2D Column
Quick layout: Layout 1
Y-axis title: “Frequency”; Font size = 16
Y-axis numbers: Font size = 14
X-axis title: “Cholesterol in People at Ages 45-64 Years”; Font size = 16
X-axis numbers: Font size = 14
Bins: Use the following 11 bins: 120-140, 140-160, 160-180, 180-200, 200-220, 220-240, 240-260, 260-280, 280-300, 300-320, 320-340
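For readers who want to check their Excel tallies, the 11 bins can be counted with a short Python sketch. Everything here is an assumption made for illustration: the values in `sample` are made up and are not the lab's data, `histogram_counts` is a hypothetical helper, and the bin labels share their boundaries (for example, 240 appears in both 220-240 and 240-260), so the sketch assumes each bin includes its lower edge and excludes its upper edge. Excel's tools may treat boundary values differently.

```python
# Hypothetical cholesterol values (mg/dL), for illustration only; NOT the lab's data.
sample = [131.0, 158.4, 176.2, 180.0, 199.9, 203.5, 221.7, 240.0, 262.3, 305.1]


def histogram_counts(values, first_edge=120, width=20, n_bins=11):
    """Count values per bin; each bin includes its lower edge and excludes its upper edge."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - first_edge) // width)
        if 0 <= index < n_bins:  # ignore anything outside 120-340
            counts[index] += 1
    return counts


# Bin labels as written in the assignment: 120-140, 140-160, ..., 320-340.
labels = [f"{120 + 20 * i}-{140 + 20 * i}" for i in range(11)]
counts = histogram_counts(sample)
for label, count in zip(labels, counts):
    print(f"{label}: {count}")
```

Under this convention a value of exactly 240.0 is counted in the 240-260 bin. Whichever convention is used, it should be applied the same way to both age categories.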
Step 3: Compare the frequency distributions of cholesterol concentration between people of different ages.
Although a frequency distribution enables one to see major features of the data, we often need to quantify these features. For example, we may want to know the most commonly observed value of a variable. Or we may want to know the smallest or largest value observed. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
For example, consider the five observations listed in Column A of the spreadsheet below:
      A             B
1     Variable 1
2     5
3     3
4     7
5     4
6     6
7
You could better understand the distribution of these data by examining such quantities as the minimum, maximum, mode, and median. Microsoft Excel has a built-in function designed to calculate each of these quantities.
Each Excel function requires a set of arguments and returns a value defined by the function. Let’s review four functions that you will use in this assignment.
A) Median: The MEDIAN function requires a list of numbers and returns the median, the central value in the list once it is sorted from smallest to largest. The following syntax would be used to compute the median of Variable 1 in the spreadsheet above:
=median(A2:A6),
where the word “median” tells Excel which function to use. The argument in parentheses, A2:A6, tells Excel to use the list of values in cells A2 through A6. For a small sample, we can also list the numbers directly as follows:
=median(5, 3, 7, 4, 6).
In either case, Excel will return a median value of 5, which is the central value in the sorted list: 3, 4, 5, 6, 7.
B) Minimum: The MIN function requires a list of numbers and returns the minimum value in the list. The following syntax would be used to compute the minimum of Variable 1 in the spreadsheet above, which is 3:
=min(A2:A6)
C) Maximum: The MAX function requires a list of numbers and returns the maximum value in the list. The following syntax would be used to compute the maximum of Variable 1 in the spreadsheet above, which is 7:
=max(A2:A6)
D) Mode: The MODE function requires a list of numbers and returns the most frequent value in the list. The following syntax would be used to compute the mode of Variable 1 in the spreadsheet above. Because no value in Variable 1 repeats, Excel returns #N/A for this particular list; MODE returns a number only when at least one value occurs more than once.
=mode(A2:A6)
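The same four quantities can be cross-checked outside Excel. The sketch below uses Python's standard `statistics` module purely as an illustration; it is not part of the assignment. Because the five values of Variable 1 are all distinct, the mode is shown on the work-duration data from Step 2 instead.

```python
import statistics

# The five observations of Variable 1 from the spreadsheet example.
variable_1 = [5, 3, 7, 4, 6]

median_value = statistics.median(variable_1)  # central value of the sorted list
minimum_value = min(variable_1)
maximum_value = max(variable_1)

# multimode returns every value tied for most frequent. With no repeats,
# it returns all five values, which signals that there is no unique mode.
modes_variable_1 = statistics.multimode(variable_1)

# Work-duration data from Step 2, where one value is clearly most frequent.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
modes_hours = statistics.multimode(hours)

print("median:", median_value)
print("minimum:", minimum_value)
print("maximum:", maximum_value)
print("modes of Variable 1:", modes_variable_1)
print("modes of work duration:", modes_hours)
```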
Directions: To answer questions 5 through 12, download the Excel file in Canvas titled “Cholesterol Data”, which contains the concentrations of cholesterol in people at the ages of 20-44 years (sample size = 100) and people at the ages of 45-64 years (sample size = 100). This file contains two columns of data, each of which represents the concentration of cholesterol for people in an age category. Use Excel for calculations, modeling, and graphing. Round all calculated values to the nearest whole number. For example, if you calculate the value as 3.8218, round to 4.
Question 5: Estimate the median of cholesterol concentration in people at 20-44 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Median =
Question 6: Estimate the minimum of cholesterol concentration in people at 20-44 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Minimum =
Question 7: Estimate the maximum of cholesterol concentration in people at 20-44 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Maximum =
Question 8: Estimate the mode of cholesterol concentration in people at 20-44 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Mode =
Question 9: Estimate the median of cholesterol concentration in people at 45-64 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Median =
Question 10: Estimate the minimum of cholesterol concentration in people at 45-64 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Minimum =
Question 11: Estimate the maximum of cholesterol concentration in people at 45-64 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Maximum =
Question 12: Estimate the mode of cholesterol concentration in people at 45-64 years of age. (Excel tutorial #6 Functions: Describing Data; #6 Functions: Describing Data transcript)
Mode =
Step 4: For each age category, calculate the probability that a person has high cholesterol.
Excel tutorials:
#4 Data: Basic Calculations; #4 Data: Basic Calculations transcript
#5 Data: Sorting; #5 Data: Sorting transcript
#8 Frequency; #8 Frequency transcript
Now that we know the frequency distribution of cholesterol for people in each age category, we can use the frequencies to infer the probabilities of observing certain values. Recall that the drug company wants to know the probability that a person in a certain age category has high cholesterol (> 240 mg dL⁻¹). That way, they can target the advertising for their drug study.
The probability of an outcome equals the relative frequency of that outcome.
To calculate the relative frequency of high cholesterol, follow these steps:
1) Count the number of people with high cholesterol (a concentration greater than 240 mg dL⁻¹). This value equals the frequency.
2) Determine the total number of people in the sample, regardless of their cholesterol concentration. This value equals the sample size.
3) Divide the number of people with high cholesterol (frequency) by the total number of people in the sample (sample size). Convert the resulting value to a percentage by multiplying by 100. This percentage equals the relative frequency.
For example, consider the body masses of 10 horses, shown in the table below.
Horse    Body Mass (kg)
1        387
2        390
3        390
4        391
5        402
6        402
7        405
8        411
9        412
10       416
Let’s calculate the probability that a horse will have a body mass less than 392 kg. First, we count the number of horses with a body mass less than 392 kg. To make this task easier, the data were sorted according to body mass (from least to greatest). Horses 1 through 4 have body masses less than 392 kilograms (kg); therefore, the frequency equals 4.
Next, we divide the frequency of horses with masses less than 392 kg by the total number of horses in the sample (sample size = 10):
Relative frequency = 4 / 10 = 0.4.
This relative frequency of 0.4 can also be expressed as a percentage. Multiplying 0.4 by 100 yields a percentage of 40%:
0.40 ∙ 100 = 40%
Thus, the probability that a horse has a body mass less than 392 kg equals 40%.
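The horse calculation can be written as a small Python sketch that follows the three steps listed earlier: count, divide by the sample size, and convert to a percentage. The helper name `relative_frequency` is hypothetical and is used here only for illustration.

```python
# Body masses (kg) of the 10 horses in the example table.
body_mass_kg = [387, 390, 390, 391, 402, 402, 405, 411, 412, 416]


def relative_frequency(values, condition):
    """Return the fraction of values for which condition(value) is true."""
    frequency = sum(1 for v in values if condition(v))  # step 1: count
    sample_size = len(values)                           # step 2: sample size
    return frequency / sample_size                      # step 3: divide


fraction = relative_frequency(body_mass_kg, lambda mass: mass < 392)
percent = round(fraction * 100, 1)  # rounded to the nearest tenth

print(f"Relative frequency = {fraction}")
print(f"Probability = {percent}%")
```

The same function applies to the cholesterol data by swapping in a condition such as `lambda c: c > 240`.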
Directions: To answer questions 13 and 14, download the Excel file in Canvas titled “Cholesterol Data”, which contains the concentrations of cholesterol in people of ages 20-44 years (sample size = 100) and people of ages 45-64 years (sample size = 100). This file contains two columns of data, each of which represents the concentration of cholesterol for people in an age category. Use Excel for calculations, modeling, and graphing. Round all calculated values to the nearest tenth of a decimal place. For example, if you calculate the value as 3.8218, round to 3.8.
Question 13: Calculate the probability that a person in the age category of 20-44 years has high cholesterol.
Excel tutorials:
#4 Data: Basic Calculations; #4 Data: Basic Calculations transcript
#5 Data: Sorting; #5 Data: Sorting transcript
#8 Frequency; #8 Frequency transcript
Question 14: Calculate the probability that a person in the age category of 45-64 years has high cholesterol.
Excel tutorials:
#4 Data: Basic Calculations; #4 Data: Basic Calculations transcript
#5 Data: Sorting; #5 Data: Sorting transcript
#8 Frequency; #8 Frequency transcript
Step 5: Conclude which category of people is more likely to have high cholesterol: people at ages 20-44 years or people at ages 45-64 years.
Given your analyses in Steps 1 through 4, you are ready to conclude whether the drug company should target their advertising toward people at ages 20-44 years or people at ages 45-64 years. Recall that the company wants to recruit people in the age category with the greater probability of having high cholesterol.
When making a claim about which group of people the drug company should target for their study of high cholesterol, be sure to provide your reasoning, highlighting the relevant evidence supporting your claim.
Question 15: Select the claim that is better supported by the evidence.
The drug company should target advertising toward people at 20-44 years of age.
The drug company should target advertising toward people at 45-64 years of age.
Question 16: Summarize the evidence that supports your claim, including how you determined whether the drug company should target people at 20-44 years of age or people at 45-64 years of age, based on probability. Use quantitative evidence when possible.
Step 6: Consider how you would describe your data with a model.
Now that you have completed your analysis, let’s reflect on what you have accomplished. You started with data describing the cholesterol concentrations of people of different ages. You plotted a frequency distribution of the data for each age category of people. You also quantified the shape of each distribution by calculating a median, mode, minimum, and maximum. Finally, you used the frequencies of cholesterol concentration greater than 240 mg dL⁻¹ to calculate the probability that a person in each age category has high cholesterol.
Although your analyses focused on people with high cholesterol, other scientists might want to know the probability that a person has low cholesterol. It would be a waste of time and effort for other scientists to repeat your calculations every time a new problem arises. For this reason, scientists aim to effectively communicate their results to others.
How would you communicate your knowledge about cholesterol? You might tell someone the median, mode, minimum, and maximum of the data that you analyzed; those few numbers communicate a lot of information to someone who understands a frequency distribution. Still, one cannot calculate a probability from these values alone; one needs to know the frequencies of specific values to calculate a probability.
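To see why the four summary values are not enough, consider the sketch below. The two samples are made-up numbers chosen for illustration, not cholesterol data; they share the same median, mode, minimum, and maximum, yet give different probabilities of exceeding a threshold.

```python
import statistics


def summary(values):
    """Return (median, mode, minimum, maximum) for a list of numbers."""
    return (statistics.median(values), statistics.mode(values),
            min(values), max(values))


def probability_above(values, threshold):
    """Relative frequency of values strictly greater than threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


# Hypothetical samples: same four summary values, different frequencies.
sample_a = [1, 2, 5, 5, 5, 6, 9]
sample_b = [1, 4, 5, 5, 5, 8, 9]

print(summary(sample_a), summary(sample_b))
print(probability_above(sample_a, 7))  # 1 of 7 values exceeds 7
print(probability_above(sample_b, 7))  # 2 of 7 values exceed 7
```

Two people reporting identical summaries could therefore be describing samples with different probabilities, which is why the frequencies themselves, or a model of them, are needed.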
Fortunately, mathematics enables us to model a frequency distribution and share information about probabilities quickly, accurately, and precisely. Specifically, we will use a form of mathematics called a probability distribution.
A probability distribution is a model of a frequency distribution. Think of this model as a cartoon representing the shape of a frequency distribution, except that this cartoon also represents a mathematical function that can predict any probability.
Later in this course, you will learn how to use a probability distribution to describe a frequency distribution. For now, let’s review some common probability distributions.
A) Normal Probability Distribution —This distribution, often called a bell curve, has a single, central mode and a symmetrical tail on each side. The mode is the most probable value.
B) Bimodal Probability Distribution —This distribution has two modes, a lower mode and an upper mode. Both modes may be equally probable.
C) Skewed Probability Distribution —This distribution has a single mode that is shifted to either the left or the right, leaving a long tail on the opposite side. The mode is the most probable value.
D) Uniform Probability Distribution —This distribution is a flat line, such that all values are equally probable. This distribution has no unique mode.
The plots below (A-D) show examples of the four types of probability distributions.
Directions: Use the frequency distributions that you plotted in Step 2 to answer Questions 17 through 20.
Question 17: Which of the probability functions best matches the shape of the frequency distribution of cholesterol concentration in people between the ages of 20 and 44 years?
A. normal distribution
B. bimodal distribution
C. skewed distribution
D. uniform distribution
Question 18: Explain your answer to the previous question. Be sure to discuss features of the frequency distribution that caused you to choose a certain probability distribution.
Question 19: Which of the probability functions best matches the shape of the frequency distribution of cholesterol concentration in people between the ages of 45 and 64 years?
A. normal distribution
B. bimodal distribution
C. skewed distribution
D. uniform distribution
Question 20: Explain your answer to the previous question. Be sure to discuss features of the frequency distribution that caused you to choose a certain probability distribution.
