Chapter 1. Scatterplots and Regression
- Regression is the study of dependence.
- It asks how the response Y changes, on average, as the value of the predictor X is varied.
- Linear regression is an important instance of regression methodology and is the most commonly used.
- Virtually all other regression methods build upon an understanding of how linear regression works.
- The goal of regression is to understand how the values of Y change as X is varied over its range of possible values.
- One important function of the scatterplot is to decide if we might reasonably assume that the response on the vertical axis is independent of the predictor on the horizontal axis.
- The extreme values on the left and right of the horizontal axis are points that are likely to be important in fitting regression models and are called leverage points.
- Points separated from the rest along the vertical axis are potential outliers.
- Outliers are more easily discovered in residual plots.
- A residual plot is obtained by removing the expected linear or nonlinear trend from the data.
- With the trend removed, the residual plot gains resolution, so deviations from the fit are easier to see.
- On Inheritance of Height data (mheight and dheight) n = 1375
- relation looks to be reasonably linear.
- On Forbes data (atmospheric pressure and boiling point of water) n=17
- relation looks linear at first, but the residuals are not random.
- there is a small systematic deviation of the experimental values from the fitted OLS straight line.
- based on physical theory, log(pres) is expected to be linearly related to bp.
- log10(pres) vs bp is observed to be reasonably linear.
- choice of base has no material effect on the appearance of the graph or on fitted regression models, but interpretation of parameters can depend on the choice of base.
- transformation of variables is key to extending the usefulness of linear regression models.
- On Length at age of smallmouth bass (length at capture in mm vs age at capture) n = 439
- only fish of age 8 or less considered for plot
- annular rings on the scales are used to determine the age of a fish
- data are cross-sectional, meaning that all observations were taken at the same time.
- in a longitudinal study, the same fish would be measured each year, possibly requiring many years of measurements.
- relation is not expected to be linear.
- On predicting the weather (early winter snowfall vs late winter snowfall)
- interest in this regression problem is in testing the hypothesis that the two variables are uncorrelated (fit a horizontal line at the mean) versus the alternative that they are correlated (fit the OLS line).
- On Turkey Growth (weight gain vs dose)
- straight line does not seem to be a reasonable representation of the average dependence of the response on the predictor.
- In general, the straight line estimated by OLS is an estimate of the mean function, and it is appropriate only when the mean function is linear.
- Non-linear mean function might be more appropriate for growth models.
- We may have a parametric model for the mean function and use the data to estimate its parameters.
- The variance function also characterizes the graph, and in many problems we will assume at least at first that the variance function is constant.
- The null plot has a horizontal straight line as its mean function, constant variance function, and no separated points.
- Smoothers for the mean function: we can estimate E(Y|X=x) with a simple nonparametric smoother that averages the repeated observations at each value of X.
- Smoothers can also be defined when we do not have repeated observations at values of the predictor by averaging the observed data for all values of X close to, but not necessarily equal to x.
- The marginal relationships between the response and each predictor are not sufficient to understand the joint relationship between the response and several predictors at once.
- The interrelationships between the predictors are also important.
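
A minimal sketch of three ideas from these notes: fitting an OLS line, looking at residuals, and smoothing by averaging repeated observations at each value of X. The data are synthetic, not the Forbes measurements; they are generated so that log10(y) is linear in x, which only loosely mimics the pressure and boiling point example. The function names (ols_fit, residuals, slice_means) and all constants are assumptions made for illustration.

```python
import math
import random
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values: the quantity a residual plot shows."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def slice_means(x, y):
    """Simple smoother: average y over the repeated observations at each x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return {xi: mean(ys) for xi, ys in sorted(groups.items())}


# Synthetic data (hypothetical constants): three observations at each of
# 13 x values, with log10(y) linear in x plus a little noise.
random.seed(0)
x = [float(v) for v in range(190, 215, 2) for _ in range(3)]
y = [10 ** (0.009 * xi - 0.4 + random.gauss(0, 0.001)) for xi in x]

# Straight-line fit on the raw scale and its residuals.
b0, b1 = ols_fit(x, y)
raw_res = residuals(x, y, b0, b1)

# Same fit after transforming the response to log10(y).
log_y = [math.log10(yi) for yi in y]
c0, c1 = ols_fit(x, log_y)
log_res = residuals(x, log_y, c0, c1)

# Smoothing the residuals exposes any systematic pattern left by the fit.
smooth_raw = slice_means(x, raw_res)
smooth_log = slice_means(x, log_res)

for xi in smooth_raw:
    print(xi, round(smooth_raw[xi], 3), round(smooth_log[xi], 5))
```

On data like these, the smoothed raw-scale residuals trace a curve (positive at both ends, negative in the middle), while the smoothed residuals from the log10 fit stay close to zero. That is the same kind of systematic deviation described for the Forbes data, and it would be hard to see in a plot of y against x alone.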