Authors: Jay L. Devore
ISBN-13: 978-1305251809
See our solution for Question 38E from Chapter 12 of Devore's Probability and Statistics for Engineering and the Sciences.
The data give the values of the ${\rm{N}}{{\rm{O}}_x}$ emission rate (y), measured in ppm, and the burner area liberation rate (x), measured in $\left( {\frac{{{\rm{MBtu}}}}{{{\rm{hr - f}}{{\rm{t}}^{\rm{2}}}}}} \right)$.
a.
Linear regression model:
The fitted simple linear regression model is $\hat y = {\hat \beta _0} + {\hat \beta _1}x$, where $\hat y$ is the predicted value of the response variable and $x$ is the predictor variable. Here ${\hat \beta _0}$ denotes the estimate of the y-intercept of the line and ${\hat \beta _1}$ denotes the estimate of the slope.
The y-intercept is computed as follows,
\[\begin{array}{c} {{\hat \beta }_0} = \bar y - {{\hat \beta }_1}\bar x\\ = \frac{{\sum\limits_i {{y_i}} - {{\hat \beta }_1}\sum\limits_i {{x_i}} }}{n} \end{array}\]The slope coefficient of linear regression is computed as follows,
\[\begin{array}{c} {{\hat \beta }_1} = \frac{{{S_{xy}}}}{{{S_{xx}}}}\\ = \frac{{\left[ {\sum\limits_i {{x_i}{y_i}} - \frac{{\left( {\sum\limits_i {{x_i}} } \right) \times \left( {\sum\limits_i {{y_i}} } \right)}}{n}} \right]}}{{\sum\limits_i {x_i^2} - \frac{{{{\left( {\sum\limits_i {{x_i}} } \right)}^2}}}{n}}} \end{array}\]The sample size is $n = 14$.
The table below is used to compute the slope and intercept:
| $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
| 100 | 150 | 10000 | 22500 | 15000 |
| 125 | 140 | 15625 | 19600 | 17500 |
| 125 | 180 | 15625 | 32400 | 22500 |
| 150 | 210 | 22500 | 44100 | 31500 |
| 150 | 190 | 22500 | 36100 | 28500 |
| 200 | 320 | 40000 | 102400 | 64000 |
| 200 | 280 | 40000 | 78400 | 56000 |
| 250 | 400 | 62500 | 160000 | 100000 |
| 250 | 430 | 62500 | 184900 | 107500 |
| 300 | 440 | 90000 | 193600 | 132000 |
| 300 | 390 | 90000 | 152100 | 117000 |
| 350 | 600 | 122500 | 360000 | 210000 |
| 400 | 610 | 160000 | 372100 | 244000 |
| 400 | 670 | 160000 | 448900 | 268000 |
| $\sum x = 3300$ | $\sum y = 5010$ | $\sum x^2 = 913750$ | $\sum y^2 = 2207100$ | $\sum xy = 1413500$ |
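The column totals in the last row of the table can be checked with a few lines of Python. This is only a sketch: the lists `x` and `y` are transcribed from the table, and the variable names are illustrative choices, not part of the original solution.

```python
# Data transcribed from the table: x = liberation rate, y = NOx emission rate.
x = [100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400]
y = [150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670]

n = len(x)                                       # sample size
sum_x = sum(x)                                   # sum of x
sum_y = sum(y)                                   # sum of y
sum_x2 = sum(xi * xi for xi in x)                # sum of x squared
sum_y2 = sum(yi * yi for yi in y)                # sum of y squared
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # sum of cross products

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
```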
Thus, the slope coefficient of linear regression is:
\[\begin{array}{c} {{\hat \beta }_1} = \frac{{\left[ {\sum\limits_i {{x_i}{y_i}} - \frac{{\left( {\sum\limits_i {{x_i}} } \right) \times \left( {\sum\limits_i {{y_i}} } \right)}}{n}} \right]}}{{\sum\limits_i {x_i^2} - \frac{{{{\left( {\sum\limits_i {{x_i}} } \right)}^2}}}{n}}}\\ = \frac{{\left[ {1413500 - \frac{{3300 \times 5010}}{{14}}} \right]}}{{913750 - \frac{{{{\left( {3300} \right)}^2}}}{{14}}}}\\ = \frac{{232571.4286}}{{135892.8571}}\\ = 1.7114 \end{array}\]Therefore, the point estimate of ${\hat \beta _1}$ is 1.7114.
Thus, the y-intercept is:
\[\begin{array}{c} {{\hat \beta }_0} = \frac{{\sum\limits_i {{y_i}} - {{\hat \beta }_1}\sum\limits_i {{x_i}} }}{n}\\ = \frac{{5010 - \left( {1.7114 \times 3300} \right)}}{{14}}\\ = \frac{{ - 637.62}}{{14}}\\ = - 45.5443 \end{array}\]Therefore, the y-intercept is $ - 45.5443$ .
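The slope and intercept formulas above can be sketched in Python as follows. The function name `fit_line` is a hypothetical helper introduced here for illustration. Because the code carries the slope at full precision, its intercept comes out near $-45.55$; the hand-computed $-45.5443$ reflects rounding the slope to four decimals before substituting it.

```python
def fit_line(x, y):
    """Least-squares slope and intercept via S_xy / S_xx (illustrative helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(a * a for a in x) - sum_x ** 2 / n
    slope = s_xy / s_xx
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Data transcribed from the table: x = liberation rate, y = NOx emission rate.
x = [100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400]
y = [150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670]

slope, intercept = fit_line(x, y)
print(round(slope, 4), round(intercept, 4))
```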
Therefore, the regression line relating the emission rate to the burner area liberation rate is:
\[\begin{array}{c} \hat y = {{\hat \beta }_0} + {{\hat \beta }_1}x\\ = - 45.5443 + 1.7114x \end{array}\]
Next, test whether there is a useful linear relationship between the two rates.
The null and alternative hypotheses are defined as follows.
The null hypothesis is that there is no linear relationship between the variables liberation rate and ${\rm{N}}{{\rm{O}}_x}$ emission rate.
The alternative hypothesis is that there is a useful linear relationship between the variables liberation rate and ${\rm{N}}{{\rm{O}}_x}$ emission rate.
That is,
\[\begin{array}{l} {H_0}:{\beta _1} = 0\\ {H_a}:{\beta _1} \ne 0 \end{array}\]
Here, the test is two-tailed.
The test statistic is,
\[t = \frac{{{{\hat \beta }_1} - {\beta _1}}}{{{s_{{{\hat \beta }_1}}}}} \sim {t_{n - 2}}\]
which follows a t distribution with $n - 2$ degrees of freedom when ${H_0}$ is true. The estimated standard deviation of the slope coefficient is computed as follows, where $s$ is the residual standard deviation and the error sum of squares is divided by its $n - 2$ degrees of freedom:
\[\begin{array}{c} {s_{{{\hat \beta }_1}}} = \frac{s}{{\sqrt {{S_{xx}}} }}\\ = \frac{{\sqrt {\frac{{\sum\limits_i {y_i^2} - {{\hat \beta }_0}\sum\limits_i {{y_i}} - {{\hat \beta }_1}\sum\limits_i {{x_i}{y_i}} }}{{n - 2}}} }}{{\sqrt {\sum\limits_i {x_i^2} - \frac{{{{\left( {\sum\limits_i {{x_i}} } \right)}^2}}}{n}} }}\\ = \frac{{\sqrt {\frac{{2207100 - \left( { - 45.5443 \times 5010} \right) - \left( {1.7114 \times 1413500} \right)}}{{12}}} }}{{\sqrt {913750 - \frac{{{{\left( {3300} \right)}^2}}}{{14}}} }}\\ = \frac{{\sqrt {1351.0869} }}{{\sqrt {135892.8571} }}\\ = 0.0997 \end{array}\]
Therefore, the estimated standard deviation of the slope coefficient is ${s_{{{\hat \beta }_1}}} = 0.0997$.
Thus, the test statistic is:
\[\begin{array}{c} t = \frac{{{{\hat \beta }_1} - {\beta _1}}}{{{s_{{{\hat \beta }_1}}}}}\\ = \frac{{1.7114 - 0}}{{0.0997}}\\ = 17.1655 \end{array}\]
Therefore, the value of the test statistic is approximately 17.17.
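The standard error of the slope and the resulting t statistic can be sketched in Python as below. The sketch assumes the usual unbiased variance estimate, which divides the error sum of squares by $n - 2$. With that divisor and full-precision coefficients the standard error comes out near 0.0997 and the t statistic near 17.17. Variable names are illustrative.

```python
import math

# Data transcribed from the table: x = liberation rate, y = NOx emission rate.
x = [100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400]
y = [150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
sum_xy = sum(a * b for a, b in zip(x, y))

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
slope = s_xy / s_xx
intercept = (sum_y - slope * sum_x) / n

# Error sum of squares via the computational formula used in the text.
sse = sum_y2 - intercept * sum_y - slope * sum_xy
s = math.sqrt(sse / (n - 2))         # residual standard deviation (n - 2 divisor)
se_slope = s / math.sqrt(s_xx)       # estimated standard deviation of the slope
t_stat = (slope - 0) / se_slope      # test statistic for H0: beta_1 = 0

print(round(se_slope, 4), round(t_stat, 2))
```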
Now, the critical value of t is found as follows:
The degrees of freedom are,
\[\begin{array}{c} df = n - 2\\ = 14 - 2\\ = 12 \end{array}\]
The level of significance is,
\[\begin{array}{c} \alpha = 1 - 0.99\\ = 0.01\\ \frac{\alpha }{2} = \frac{{0.01}}{2}\\ = 0.005 \end{array}\]
Using the t-table, the critical value for a two-tailed test with an upper-tail area of 0.005 and 12 degrees of freedom is ${t_{0.005,12}} = 3.055$.
Therefore, the critical values of t are $ \pm 3.055$.
Decision rule:
Reject the null hypothesis if $\left| t \right| > {t_{\alpha /2,n - 2}}$.
Fail to reject the null hypothesis if $\left| t \right| \le {t_{\alpha /2,n - 2}}$.
Conclusion:
Here, the computed $\left| t \right|$ is far greater than 3.055.
Thus, the decision is to reject the null hypothesis at the 1% level of significance.
Therefore, there is sufficient evidence to conclude that there is a useful linear relationship between the variables liberation rate and ${\rm{N}}{{\rm{O}}_x}$ emission rate at the 1% level of significance.
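The tabulated critical value can be cross-checked numerically without a t-table. The sketch below is one possible approach, not the method used in the solution: it integrates the Student t density with Simpson's rule and inverts the tail area by bisection, using only the Python standard library. The function names are hypothetical helpers.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] with Simpson's rule."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(tail_area, df):
    """Find t such that P(T > t) = tail_area, by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid   # tail still too large, so the critical value is further out
        else:
            hi = mid
    return (lo + hi) / 2


crit = t_critical(0.005, 12)   # two-tailed test at alpha = 0.01, df = 12
print(round(crit, 3))          # should be close to the tabulated 3.055
```

The same helper evaluated with a tail area of 0.025 gives the multiplier needed for a 95% confidence interval with 12 degrees of freedom.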
b.
The confidence interval for the slope of the regression line is,
\[CI = {\hat \beta _1} \pm {t_{\frac{\alpha }{2},n - 2}} \times {s_{{{\hat \beta }_1}}}\]
Here, the estimated standard deviation of the slope uses $n - 2$ in the denominator of the residual variance. With $SSE$ denoting the error sum of squares, ${s_{{{\hat \beta }_1}}} = \frac{{\sqrt {SSE/(n - 2)} }}{{\sqrt {{S_{xx}}} }} = \frac{{36.757}}{{368.636}} = 0.0997$. The confidence interval for the expected change in ${\rm{N}}{{\rm{O}}_x}$ emission rate associated with a 10 $\left( {\frac{{{\rm{MBtu}}}}{{{\rm{hr - f}}{{\rm{t}}^{\rm{2}}}}}} \right)$ increase in liberation rate is ten times the interval for the slope:
\[\begin{array}{c} CI = 10 \times \left( {{{\hat \beta }_1} \pm {t_{\frac{\alpha }{2},n - 2}} \times {s_{{{\hat \beta }_1}}}} \right)\\ = 10 \times \left( {1.7114 \pm {t_{\frac{\alpha }{2},n - 2}} \times 0.0997} \right) \end{array}\]
Now, the critical value of t is found as follows:
The sample size is $n = 14$.
The level of significance is,
\[\begin{array}{c} \alpha = 1 - 0.95\\ = 0.05\\ \frac{\alpha }{2} = \frac{{0.05}}{2}\\ = 0.025 \end{array}\]
The degrees of freedom are,
\[\begin{array}{c} df = n - 2\\ = 14 - 2\\ = 12 \end{array}\]
Using the t-table, the critical value of t with an upper-tail area of 0.025 and 12 degrees of freedom is ${t_{0.025,12}} = 2.179$.
Thus,
\[\begin{array}{c} CI = 10 \times \left( {1.7114 \pm {t_{\frac{\alpha }{2},n - 2}} \times 0.0997} \right)\\ = 10 \times \left( {1.7114 \pm 2.179 \times 0.0997} \right)\\ = 10 \times \left( {1.7114 \pm 0.2172} \right)\\ = \left( {10 \times 1.4942,10 \times 1.9286} \right)\\ = \left( {14.942,19.286} \right) \end{array}\]
Therefore, the 95% confidence interval for the expected change in ${\rm{N}}{{\rm{O}}_x}$ emission rate associated with a 10 $\left( {\frac{{{\rm{MBtu}}}}{{{\rm{hr - f}}{{\rm{t}}^{\rm{2}}}}}} \right)$ increase in liberation rate is from 14.942 ppm to 19.286 ppm.
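The interval for a 10-unit increase can be reproduced with the short Python sketch below. It assumes the tabulated multiplier ${t_{0.025,12}} = 2.179$ and a standard error based on the $n - 2$ divisor. Under those assumptions the interval is roughly 14.94 to 19.29 ppm. Variable names are illustrative.

```python
import math

# Data transcribed from the table: x = liberation rate, y = NOx emission rate.
x = [100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400]
y = [150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
s_xx = sum(a * a for a in x) - sum_x ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
s_yy = sum(b * b for b in y) - sum_y ** 2 / n

slope = s_xy / s_xx
sse = s_yy - slope * s_xy                      # error sum of squares
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

T_CRIT = 2.179   # assumed t-table value for upper-tail area 0.025, df = 12
INCREASE = 10    # increase in liberation rate, in MBtu/hr-ft^2

margin = T_CRIT * se_slope
lower = INCREASE * (slope - margin)
upper = INCREASE * (slope + margin)
print(round(lower, 2), round(upper, 2))
```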