R

Introduction to R

This introduction was developed by Neil McRoberts and Paul Esker, and presented at the APS 2015 and ISPP 2018 workshops. It is a basic introduction to R. Some online resources:

http://www.introductoryr.co.uk/R_Resources_for_Beginners.html
http://www.statmethods.net/index.html
http://www.ats.ucla.edu/stat/r/
http://www.r-tutor.com/r-introduction

R as a calculator

R has basic operators to add, subtract, multiply, and divide. It also provides built-in constants, such as pi, that are called by name.

7+2
## [1] 9
5/2
## [1] 2.5
6*9
## [1] 54
1000-89
## [1] 911
12/pi
## [1] 3.
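A few more operators and functions work the same way. The sketch below, with results written in the same `## [1]` style, covers exponents, square roots, logarithms, integer division, and storing a result in an object for reuse; the object name `area` is only an illustrative choice.

```r
# Exponents, roots, and logarithms
2^3                    # exponentiation
## [1] 8
sqrt(16)               # square root
## [1] 4
log(100, base = 10)    # base-10 logarithm
## [1] 2
exp(1)                 # e, the base of the natural logarithm
## [1] 2.718282

# Integer division and remainder (modulo)
7 %/% 2
## [1] 3
7 %% 2
## [1] 1

# Results can be stored in an object with <- and reused
area <- pi * 3^2       # area of a circle with radius 3
round(area, 2)
## [1] 28.27
```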

Correlation

Background

Correlation analysis helps identify associations between different variables (measurements). For data sets that combine qualitative and quantitative data, we use it as a preliminary step to understand the likely relationships, or the potential explanatory value of different measurements. The examples here use the tidyverse to estimate correlation coefficients with different methods, and we also visualize the associations graphically. The two primary packages we need for this example are Hmisc and corrplot.
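A minimal sketch of the three common correlation methods in base R, using the built-in `iris` measurements as a stand-in for the workshop data (an assumption; the exercise itself works with tidyverse tools). Pearson measures linear association, while Spearman and Kendall are rank-based. The last part calls `Hmisc::rcorr()` and `corrplot::corrplot()` only if those packages are installed, so the rest runs with base R alone.

```r
# Stand-in data: four numeric measurements from R's built-in iris data set
num_vars <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Correlation matrices under three methods
pearson  <- cor(num_vars, method = "pearson")   # linear association
spearman <- cor(num_vars, method = "spearman")  # rank-based, monotonic association
kendall  <- cor(num_vars, method = "kendall")   # rank-based, concordant pairs
round(pearson, 2)

# Test one pair of variables: estimate, p-value, and 95% confidence interval
ct <- cor.test(iris$Petal.Length, iris$Petal.Width, method = "pearson")
ct$p.value

# Optional: the packages named above, used only if they are installed
if (requireNamespace("Hmisc", quietly = TRUE)) {
  rc <- Hmisc::rcorr(as.matrix(num_vars), type = "spearman")
  print(round(rc$P, 4))   # matrix of p-values for every pair
}
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(pearson, method = "circle")  # graphical correlation matrix
}
```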

Linear regression

Background

This example focuses on modeling via linear regression. We illustrate the concepts with an example, paying particular attention to the model assumptions and to the tools that exist in R to explore the model fit. Our goal is to relate a "dependent variable" to an "independent variable" that explains something about the process. As a simple example, we might relate plant height to an index of crop growth, the leaf area index (LAI).
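A minimal sketch of that workflow, assuming simulated data: the `lai` and `height` values below are generated, not measured, and the true intercept (20) and slope (15) are arbitrary. The model is fitted with `lm()`, and the assumptions are then checked with the standard diagnostic plots and a Shapiro-Wilk test on the residuals.

```r
set.seed(1)
# Hypothetical data: leaf area index (LAI) and plant height (cm)
lai    <- runif(40, min = 0.5, max = 5)
height <- 20 + 15 * lai + rnorm(40, sd = 5)
crop   <- data.frame(lai, height)

# Fit plant height as a linear function of LAI
fit <- lm(height ~ lai, data = crop)
summary(fit)     # estimates, standard errors, R-squared
confint(fit)     # 95% confidence intervals for the coefficients

# Checking the assumptions
par(mfrow = c(2, 2))
plot(fit)        # residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(1, 1))
shapiro.test(residuals(fit))  # formal test of residual normality
```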

Modeling methods for regression

Background

When building a model, we can construct it with different methods, ranging from manual to automated. Each method has strengths and weaknesses, but together they provide a good foundation for those interested in taking their models to a higher level (machine learning, etc.), since in those situations we are often interested in looking for interactions that cannot easily be found with basic approaches.
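To contrast a manual and an automated approach, the sketch below uses R's built-in `mtcars` data purely as a stand-in (it is not the workshop data, and the candidate predictors are an arbitrary choice). `drop1()` reports the effect of removing each term so the analyst can decide by hand, while `step()` automates backward elimination and forward selection using AIC.

```r
# Stand-in data: R's built-in mtcars, used only to show the mechanics
full <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Manual: F tests for dropping each term, one at a time
drop1(full, test = "F")

# Automated: backward elimination and forward selection by AIC
back <- step(full, direction = "backward", trace = 0)
fwd  <- step(null, scope = formula(full), direction = "forward", trace = 0)

formula(back)
formula(fwd)
```

The selected model depends on the criterion and on the search direction, so it is worth comparing `formula(back)` with `formula(fwd)` rather than accepting either one uncritically.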

Multiple regression

Background

Building on the background and tools presented in linear regression, we now extend the modeling approach to include additional variables, as well as more complicated relationships. This exercise provides the jumping-off point for more automated modeling approaches, which we will see in the subsequent example(s). Our assumption in this exercise is that multiple factors help explain the response variable of interest. What does a model of this type look like?
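In general terms, with \(p\) explanatory variables a multiple linear regression model can be written as
\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + e\]
The sketch below fits such a model with \(p = 2\) to simulated data (the variables `x1`, `x2`, `y` and their coefficients are hypothetical, not workshop data), then compares it with a model that adds an \(x_1 x_2\) interaction term.

```r
set.seed(2)
# Hypothetical data: two explanatory variables and one response
n  <- 60
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 3 + 1.5 * x1 - 2 * x2 + rnorm(n, sd = 1)
dat <- data.frame(y, x1, x2)

# Additive model: y = b0 + b1*x1 + b2*x2 + e
m_add <- lm(y ~ x1 + x2, data = dat)
# x1 * x2 expands to x1 + x2 + x1:x2, adding an interaction coefficient
m_int <- lm(y ~ x1 * x2, data = dat)

summary(m_add)
anova(m_add, m_int)   # F test: does the interaction improve the fit?
```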

Polynomial regression

Background

In many studies, for example when looking at the relationship between nitrogen and yield for many cereal crops, the relationship is not linear; instead, the response often reaches a plateau, and beyond a certain amount of nitrogen it decreases. A simpler linear-type model will explain some of the variability, but not very well. In these situations we can consider a polynomial form for the model. In general terms, we define this relationship as the relation between the independent variable, \(x\), and the expected response, \(E(y|x)\).
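For a second-degree (quadratic) polynomial, that relation is
\[E(y|x) = \beta_0 + \beta_1 x + \beta_2 x^2\]
with \(\beta_2 < 0\) when the response curves downward. A minimal sketch on simulated nitrogen-yield data follows (the rates, coefficients, and noise level are invented for illustration); it compares a linear and a quadratic fit and locates the fitted maximum at \(x = -\beta_1 / (2\beta_2)\), where the derivative \(\beta_1 + 2\beta_2 x\) is zero.

```r
set.seed(3)
# Hypothetical nitrogen rates (kg N/ha) and yields (t/ha) with a curved response
nitrogen <- rep(seq(0, 240, by = 30), each = 4)
yield    <- 2 + 0.05 * nitrogen - 0.00012 * nitrogen^2 +
            rnorm(length(nitrogen), sd = 0.4)

lin  <- lm(yield ~ nitrogen)
quad <- lm(yield ~ nitrogen + I(nitrogen^2))   # I() keeps ^ as arithmetic

anova(lin, quad)   # does the squared term improve the fit?

# Nitrogen rate at the fitted maximum of the quadratic curve
b <- coef(quad)
n_opt <- -b[["nitrogen"]] / (2 * b[["I(nitrogen^2)"]])
n_opt
```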

Nonparametric regression

Background

Often we are interested in estimating the relationship between variables in the general form
\[f(x) = E[Y|X=x]\]
where no specific functional form (i.e., no specific model) is assumed for \(f\):
\[Y = f(X) + e\]
We would therefore like to describe the data using the most appropriate model and estimate its parameters. In this introductory exercise, we use nonparametric methods for this task and focus on three possible methods:
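As an illustration only, the sketch below fits three widely used base-R smoothers (`loess()`, `smooth.spline()`, and `ksmooth()`) to simulated data with \(f(x) = \sin(x)\). These are not necessarily the three methods covered in the exercise, and the smoothing settings (`span = 0.3`, `bandwidth = 1`) are arbitrary choices; none of them requires a functional form for \(f\).

```r
set.seed(4)
# Hypothetical data: Y = f(X) + e with f(x) = sin(x)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(length(x), sd = 0.3)

# 1. Local polynomial regression; span sets the fraction of points used locally
lo <- loess(y ~ x, span = 0.3)
# 2. Smoothing spline; smoothness chosen by generalized cross-validation by default
ss <- smooth.spline(x, y)
# 3. Kernel (Nadaraya-Watson) smoother with a normal kernel
ks <- ksmooth(x, y, kernel = "normal", bandwidth = 1, x.points = x)

plot(x, y, col = "grey")
lines(x, fitted(lo), col = "blue")
lines(x, predict(ss, x)$y, col = "red")
lines(ks$x, ks$y, col = "darkgreen")
```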

Nonlinear regression

Background

Nonlinear regression is an important modeling tool for studying more complicated biological, physiological, and other relationships. This introductory exercise describes some of the concepts to consider when analyzing nonlinear data. Model fitting is iterative: starting from initial values, the parameter estimates are updated step by step until the fit converges. In plant pathology this is a useful tool for problems such as disease development over time. These models can be further extended to incorporate additional factors, such as treatments and years, to study the overall behavior and the observed variability.
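A minimal sketch of iterative fitting with `nls()`, assuming a simple logistic disease progress model
\[y = \frac{1}{1 + \exp[-(b_0 + r\,t)]}\]
where \(y\) is disease severity as a proportion, \(t\) is time in days, and \(r\) is the rate parameter. The data are simulated, and the values in `start` are rough guesses that `nls()` refines through Gauss-Newton iterations (its default algorithm) until convergence.

```r
set.seed(5)
# Hypothetical disease progress data: severity (proportion), 3 plots per date
days <- rep(seq(0, 70, by = 7), each = 3)
sev  <- 1 / (1 + exp(-(-4 + 0.12 * days))) + rnorm(length(days), sd = 0.02)

# Logistic model; b0 and r are estimated iteratively from the starting values
fit <- nls(sev ~ 1 / (1 + exp(-(b0 + r * days))),
           start = list(b0 = -3, r = 0.1))
summary(fit)
fit$convInfo$finIter   # number of iterations needed to converge

# Fitted curve over a daily time grid
grid <- data.frame(days = seq(0, 70, by = 1))
pred <- predict(fit, newdata = grid)
```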

Mosquito Dose-Response

Background

This example was developed by Julie Baniszewski, a PhD student in the Department of Entomology at Penn State. Julie participated in our last workshop in Mexico under the INTAD-Tag Along program (International Agriculture and Development graduate program). It is a robust example of using dose-response methods in R based on generalized linear modeling concepts (including a mixed model). Data set: toxicity to mosquitoes was tested on four instars, which were monitored until pupation.
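A minimal sketch of the GLM part of such an analysis, assuming a simple bioassay layout: simulated counts of dead larvae out of 20 per cup at six doses, with four replicate cups (the doses, layout, and coefficients are all hypothetical, and instar is ignored). A binomial GLM on log dose gives the LD50 as \(\exp(-b_0/b_1)\), the dose at which predicted mortality is 0.5.

```r
set.seed(6)
# Hypothetical bioassay: 6 doses x 4 replicate cups, 20 larvae per cup
dose  <- rep(c(0.5, 1, 2, 4, 8, 16), times = 4)
n     <- 20
dead  <- rbinom(length(dose), size = n, prob = plogis(-3 + 2 * log(dose)))
assay <- data.frame(dose, dead, alive = n - dead)

# Binomial GLM with a logit link on log(dose); a probit link is the other common choice
fit <- glm(cbind(dead, alive) ~ log(dose),
           family = binomial(link = "logit"), data = assay)
summary(fit)

# LD50: solve b0 + b1 * log(LD50) = 0 for the dose
b    <- coef(fit)
ld50 <- exp(-b[[1]] / b[[2]])
ld50

# MASS::dose.p gives the same quantity on the log-dose scale, with a standard error
if (requireNamespace("MASS", quietly = TRUE)) {
  print(MASS::dose.p(fit, p = 0.5))
}
```

A fuller analysis could add instar as a fixed factor and, for the mixed-model part, a random effect such as `(1 | replicate)` fitted with `lme4::glmer()`; both are extensions not shown here.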

Resources and references

Here are some useful online references that relate to topics discussed during the workshop.

Modeling
https://www3.nd.edu/~steve/Rcourse/Lecture7v1.pdf
https://tutorials.iq.harvard.edu/R/Rstatistics/Rstatistics.html
http://www.statslab.cam.ac.uk/~pat/redwsheets.pdf

tidyverse
https://www.tidyverse.org/
http://r4ds.had.co.nz/
https://jrnold.github.io/r4ds-exercise-solutions/

Correlation and Regression
https://www.statmethods.net/stats/regression.html
http://rcompanion.org/handbook/F_12.html

Multiple regression
http://www.statmethods.net/stats/regression.html
http://ww2.coastal.edu/kingw/statistics/R-tutorials/multregr.html
http://www.r-bloggers.com/r-tutorial-series-multiple-linear-regression/
https://newonlinecourses.science.psu.edu/stat501/node/283/

Nonlinear regression
https://www.jstatsoft.org/article/view/v066i05/v66i05.pdf
http://www.apsnet.org/edcenter/advanced/topics/EcologyAndEpidemiologyInR/DiseaseProgress/Pages/NonlinearRegression.aspx

Books and other articles
Nonlinear Regression with R: https://www.