This introduction was developed by Neil McRoberts and Paul Esker, and presented at APS 2015 and ISPP 2018 Workshops.
Basic introduction to R.
Some online resources:
http://www.introductoryr.co.uk/R_Resources_for_Beginners.html
http://www.statmethods.net/index.html
http://www.ats.ucla.edu/stat/r/
http://www.r-tutor.com/r-introduction
#### R as a calculator
R has basic operators to add, subtract, multiply and divide. R also provides built-in constants, such as pi, that can be used by name in calculations.
7+2
## [1] 9
5/2
## [1] 2.5
6*9
## [1] 54
1000-89
## [1] 911
12/pi
## [1] 3.819719
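A few more operators and functions that come up often when using R as a calculator are sketched below. The variable names `radius` and `area` are illustrative only; the comments show what R prints for each line.

```r
# Exponentiation, integer division and remainder (modulo)
2^10                  # 1024
17 %/% 5              # 3, integer division
17 %% 5               # 2, remainder after division

# Built-in mathematical functions
sqrt(81)              # 9
log(100, base = 10)   # 2
exp(1)                # 2.718282

# Store a result with the assignment operator and reuse it
radius <- 3
area <- pi * radius^2
area
```

Assigning results to named objects with `<-` is what turns R from a calculator into a tool for building up an analysis step by step.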
Background
Correlation analysis is helpful to identify associations between different variables (measurements). For databases with combinations of qualitative and quantitative data, we use this as a preliminary step to understand the likely relationships, or the potential explanatory value of different measurements. We will work through some examples using the tidyverse to estimate correlation coefficients with different methods. We will also visualize the associations graphically. The two primary packages we need for this example are Hmisc and corrplot.
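As a minimal sketch of estimating correlations with different methods, the code below uses base R's `cor()` and `cor.test()` rather than the Hmisc and corrplot workflow, and the built-in `mtcars` data set as a stand-in for the workshop data. Pearson measures linear association, while Spearman and Kendall are rank-based and capture monotonic association.

```r
# Built-in data set used as a stand-in for the workshop data
data(mtcars)
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Correlation matrices with three common methods
pearson  <- cor(vars, method = "pearson")   # linear association
spearman <- cor(vars, method = "spearman")  # rank-based, monotonic association
kendall  <- cor(vars, method = "kendall")   # rank-based, concordant vs discordant pairs
round(pearson, 2)

# Test whether a single pair has a non-zero correlation
cor.test(vars$mpg, vars$wt, method = "pearson")

# Hmisc and corrplot add matrix-wide p-values and plots, for example:
# Hmisc::rcorr(as.matrix(vars), type = "pearson")
# corrplot::corrplot(pearson)
```

Comparing the three matrices side by side is a quick way to spot relationships that are monotonic but not linear, where Spearman or Kendall will differ noticeably from Pearson.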
Background
This example is focused on modeling via linear regression. We will illustrate the concepts using an example, with particular focus on the assumptions and the tools that exist in R to explore the model fit.
Our goal is to relate a “dependent variable” to an “independent variable” that explains something about the process.
In our simple example, we relate plant height to an index of crop growth (leaf area index).
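A minimal sketch of this kind of model is shown below. The data are simulated and hypothetical (not the workshop data), and plant height is assumed to be the response with leaf area index as the predictor. `lm()` fits the model, and `plot()` on the fitted object together with `shapiro.test()` are standard base R tools for checking the assumptions.

```r
set.seed(42)
# Simulated (hypothetical) data: leaf area index and plant height (cm)
lai <- runif(50, min = 0.5, max = 5)
height <- 20 + 15 * lai + rnorm(50, sd = 3)
crop <- data.frame(lai, height)

# Fit the simple linear regression height = b0 + b1 * lai + e
fit <- lm(height ~ lai, data = crop)
summary(fit)   # estimates, standard errors, R-squared
confint(fit)   # 95% confidence intervals for b0 and b1

# Tools for checking the assumptions of the model
par(mfrow = c(2, 2))
plot(fit)                      # residuals vs fitted, normal Q-Q, scale-location, leverage
shapiro.test(residuals(fit))   # formal test of normality of the residuals
```

The diagnostic plots are usually more informative than the formal test: look for curvature in the residuals vs fitted plot (wrong functional form), a funnel shape (non-constant variance) and points far off the line in the Q-Q plot.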
Background
When building a model, there are different approaches we can take, ranging from manual to automated. Each approach has strengths and weaknesses, but together they provide a good background for those interested in taking their models to a higher level (e.g., machine learning), since in those situations we are often interested in looking for interactions that cannot easily be found with basic approaches.
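One automated approach available in base R is stepwise selection by AIC with `step()`. The sketch below uses the built-in `mtcars` data as a stand-in and runs both backward elimination (starting from all predictors) and forward selection (starting from an intercept-only model); it is an illustration of the idea, not necessarily the method used in the workshop.

```r
data(mtcars)
# Full model with all candidate predictors; null model with intercept only
full <- lm(mpg ~ ., data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Backward elimination: drop terms while AIC improves
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: add terms from the full model's predictors while AIC improves
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

formula(backward)
formula(forward)
```

The two directions do not always end at the same model. Stepwise selection is quick, but it can overstate the significance of the retained terms, so its result is best treated as a starting point for manual refinement.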
Background
Given the background and tools presented in the linear regression example, we will now extend the modeling approach to include additional variables, as well as more complicated relationships. This exercise provides the jumping-off point for more automated modeling approaches, which we will see in the subsequent example(s).
Our assumption in this exercise is that multiple factors help explain the response variable of interest.
What does a model of this type look like?
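One common way to write such a model, using the \(Y\), \(X\) and \(e\) notation that appears elsewhere in these notes, is

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + e\]

where each \(\beta_j\) is the change in the expected response per unit change in \(X_j\) with the other predictors held fixed. In R the same model is written by adding terms to an `lm()` formula. The sketch below uses simulated, hypothetical predictors and also includes an interaction term, which lets the effect of one predictor depend on the value of another.

```r
set.seed(1)
n <- 100
# Hypothetical predictors and response with a true interaction of 0.5
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 1.5 * x1 - 0.8 * x2 + 0.5 * x1 * x2 + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2)

# Additive model: y = b0 + b1*x1 + b2*x2 + e
additive <- lm(y ~ x1 + x2, data = dat)
# Model with interaction: x1 * x2 expands to x1 + x2 + x1:x2
with_int <- lm(y ~ x1 * x2, data = dat)

summary(with_int)
# F-test comparing the two nested models
anova(additive, with_int)
```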
Background
In many studies the relationship is not linear. For example, if one looks at the relationship between nitrogen and yield for many cereal crops, there is often a plateau, and beyond a certain amount of nitrogen the response decreases. A simple linear model will explain some of the variability, but not very well. In these situations we can consider a polynomial form of the model.
We can define this relationship in general terms as the relation between the independent variable, \(x\), and the expected response, \(E(y|x)\).
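The simplest polynomial form is the quadratic

\[E(y|x) = \beta_0 + \beta_1 x + \beta_2 x^2\]

When \(\beta_2 < 0\), the curve rises, peaks at \(x = -\beta_1 / (2\beta_2)\), and then declines. The sketch below fits this model to simulated, hypothetical nitrogen and yield values (not real trial data) and compares it with a straight line.

```r
set.seed(7)
# Hypothetical nitrogen rates (kg/ha) and yields (t/ha) with a curved response
nitrogen <- rep(seq(0, 200, by = 20), each = 4)
yield <- 2 + 0.05 * nitrogen - 0.00015 * nitrogen^2 +
  rnorm(length(nitrogen), sd = 0.3)

linear    <- lm(yield ~ nitrogen)
quadratic <- lm(yield ~ nitrogen + I(nitrogen^2))   # I() keeps ^ as arithmetic

# Does the squared term significantly improve the fit?
anova(linear, quadratic)

# Nitrogen rate at which the fitted curve peaks: -b1 / (2 * b2)
b <- coef(quadratic)
n_opt <- -b[["nitrogen"]] / (2 * b[["I(nitrogen^2)"]])
n_opt
```

`poly(nitrogen, 2)` is an alternative that uses orthogonal polynomials, which reduces the correlation between the linear and squared terms but makes the individual coefficients harder to interpret.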
Background
Often we are interested in estimating a relationship between variables that has the following general form:
\[f(x) = E[Y|X=x]\]
where no specific functional form (i.e., specific model) is assumed for \(f\):
\[Y = f(X) + e\]
As such, we would like to describe the data using the most appropriate model and estimate the parameters. In this introductory exercise, we will use nonparametric methods for this task and focus on three possible methods:
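As a minimal sketch of estimating \(f(x)\) without assuming its form, the code below applies two nonparametric smoothers from base R, local polynomial regression (`loess()`) and a smoothing spline (`smooth.spline()`), to simulated data where the true \(f\) is known to be \(\sin(x)\). These two are illustrations and are not necessarily among the three methods the exercise covers; the `span = 0.3` value is an assumption chosen for this example.

```r
set.seed(3)
# Hypothetical data where the true f(x) = sin(x) is treated as unknown
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(length(x), sd = 0.2)

# Local polynomial regression; span sets the fraction of points in each local fit
lo_fit <- loess(y ~ x, span = 0.3)

# Smoothing spline; the smoothing parameter is chosen by generalized cross-validation
ss_fit <- smooth.spline(x, y)

# Mean squared error of each estimate against the true curve
mse_loess  <- mean((fitted(lo_fit) - sin(x))^2)
mse_spline <- mean((predict(ss_fit, x)$y - sin(x))^2)
c(loess = mse_loess, spline = mse_spline)

# Visual comparison of the two estimates
plot(x, y, col = "grey")
lines(x, fitted(lo_fit), col = "blue")
lines(ss_fit$x, ss_fit$y, col = "red")
```

In both methods a single tuning parameter trades bias for variance: a small span (or little smoothing) follows the noise, while a large one flattens genuine features of \(f\).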
Background
Nonlinear regression is an important modeling tool for looking at more complicated biological, physiological, and other relationships. This introductory exercise describes some of the concepts that one should consider when analyzing nonlinear data. Model fitting is iterative: starting from initial values, the parameter estimates are updated step by step until they converge. In plant pathology this is a useful tool for problems such as disease development over time. These models can be further extended to incorporate additional factors like treatments and years, among other things, to study the overall behavior and observed variability.
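The iterative fitting process can be seen with base R's `nls()`. The sketch below fits a logistic disease progress curve to simulated, hypothetical severity data; the parameter names `K` (maximum severity), `r` (rate) and `t0` (inflection time) and the starting values are assumptions for this example. Setting `trace = TRUE` prints the residual sum of squares at each iteration.

```r
set.seed(10)
# Hypothetical disease progress: proportion of diseased tissue over 30 days
days <- 0:30
severity <- 0.8 / (1 + exp(-0.5 * (days - 10))) + rnorm(length(days), sd = 0.02)
dis <- data.frame(days, severity)

# Logistic model y = K / (1 + exp(-r * (t - t0))), fitted iteratively from starting values
fit <- nls(severity ~ K / (1 + exp(-r * (days - t0))),
           data  = dis,
           start = list(K = 1, r = 0.3, t0 = 8),
           trace = TRUE)

summary(fit)
coef(fit)
```

Poor starting values can cause `nls()` to fail to converge, so it helps to read them off a plot of the data. Base R also provides the self-starting model `SSlogis()`, which computes its own starting values but uses a different parameterization (`Asym`, `xmid`, `scal`, with `scal` equal to \(1/r\)).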
Background
This example was developed by Julie Baniszewski, a PhD student in the Department of Entomology at Penn State. Julie participated in our last workshop in Mexico under the INTAD-Tag Along program (International Agriculture and Development graduate program). This is a robust example of using dose-response methods in R based on generalized linear modeling concepts (including mixed models).
Data set:
Mosquito toxicity was tested on four instars, which were monitored until pupation.
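As a minimal sketch of the generalized linear modeling idea behind dose-response analysis, the code below fits a binomial GLM with a logit link to simulated, hypothetical bioassay counts (not Julie's data) and computes the LD50, the dose at which predicted mortality is 50%. The doses, group size of 50 and the `replicate` column in the final comment are assumptions for illustration.

```r
set.seed(99)
# Hypothetical bioassay: 9 doses, 50 larvae per dose
dose <- 10^seq(0, 2, by = 0.25)
n <- rep(50, length(dose))
dead <- rbinom(length(dose), size = n, prob = plogis(4 * (log10(dose) - 1)))
assay <- data.frame(dose, n, dead)

# Binomial GLM (logit link) on log10(dose); response is (dead, alive) counts
fit <- glm(cbind(dead, n - dead) ~ log10(dose), family = binomial, data = assay)
summary(fit)

# LD50: predicted mortality is 50% where b0 + b1 * log10(dose) = 0
b <- coef(fit)
ld50 <- 10^(-b[[1]] / b[[2]])
ld50

# A mixed model with a random effect for a hypothetical 'replicate' column could use:
# lme4::glmer(cbind(dead, n - dead) ~ log10(dose) + (1 | replicate),
#             family = binomial, data = assay_with_reps)
```

`MASS::dose.p(fit, p = 0.5)` returns the same LD50 on the log10 scale together with a standard error, which is useful for reporting confidence limits.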
Here are some useful online references that relate to topics discussed during the workshop.
Modeling
https://www3.nd.edu/~steve/Rcourse/Lecture7v1.pdf
https://tutorials.iq.harvard.edu/R/Rstatistics/Rstatistics.html
http://www.statslab.cam.ac.uk/~pat/redwsheets.pdf
tidyverse
https://www.tidyverse.org/
http://r4ds.had.co.nz/
https://jrnold.github.io/r4ds-exercise-solutions/
Correlation and Regression
https://www.statmethods.net/stats/regression.html
http://rcompanion.org/handbook/F_12.html
Multiple regression
http://www.statmethods.net/stats/regression.html
http://ww2.coastal.edu/kingw/statistics/R-tutorials/multregr.html
http://www.r-bloggers.com/r-tutorial-series-multiple-linear-regression/
https://newonlinecourses.science.psu.edu/stat501/node/283/
Nonlinear regression
https://www.jstatsoft.org/article/view/v066i05/v66i05.pdf
http://www.apsnet.org/edcenter/advanced/topics/EcologyAndEpidemiologyInR/DiseaseProgress/Pages/NonlinearRegression.aspx
Books and other articles
Nonlinear Regression with R: https://www.