If you want to build a career in data visualization and analytics, R is an excellent place to start learning data science. As a quick recap, R is both a programming language and an interactive environment for statistics.
R is driven by text: you work either from scripts or at a command line (the console). You type a command in the R language and press Enter to execute it, and R prints the result.
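As a small illustration of what typing commands at the console looks like, here are a few lines you could enter one at a time. The particular expressions are arbitrary examples chosen for this sketch.

```r
# Each line is one command; at the console, R evaluates it when you press Enter
# and prints the result on the next line.
1 + 2                 # arithmetic
sqrt(16)              # calling a built-in function
toupper("hello")      # functions also work on text
seq(1, 10, by = 3)    # a sequence of numbers: 1, 4, 7, 10
```

The same lines can also be saved in a script file and run again later.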
RStudio is a popular IDE for R. In addition to the console, it provides panels containing:
• A text editor, where R commands can be recorded for future reference.
• A history of commands that have been typed on the console.
• An “environment” pane with a list of variables, which contain values that R has been told to save from previous commands.
• A file manager.
• Help on the functions available in R.
• A panel to show plots (graphs).
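A rough sketch of how a few commands relate to these panels. The variable names and data values are made up for illustration.

```r
# Assignments create variables; in RStudio they show up in the Environment pane.
heights_cm <- c(150, 162, 171, 180)   # hypothetical example data
avg_height <- mean(heights_cm)        # store the average for later use

ls()               # list the variables R is currently holding
help("mean")       # show the documentation for mean() (Help pane in RStudio)
plot(heights_cm)   # draw a simple plot (Plots pane in RStudio)
```

Commands typed this way are also recorded in the History pane, and anything worth keeping can be copied into the text editor.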
To get started, first install the R language itself, then install RStudio, the IDE in which you will write and run R code.
You can download R from https://www.r-project.org/ by selecting a CRAN mirror of your choice. R is available for Microsoft Windows, Linux and macOS.
Next, download the RStudio IDE from https://www.rstudio.com/products/rstudio/download/
Install and open RStudio; the layout should look similar to the image shown below.
More books on the R programming language and RStudio:
- “R for Data Science” by Garrett Grolemund and Hadley Wickham is a good modern introduction to R, and can be read online. It covers the use of a collection of packages called the Tidyverse. The dplyr package is of particular importance.
- Hadley Wickham also has several excellent books covering specific topics online. See “The R Book” by Michael J. Crawley for general reference.
- “Modern Applied Statistics with S” by W. N. Venables and B. D. Ripley is a well-respected reference covering R and its predecessor S.
- “Linear Models with R” and “Extending the Linear Model with R” by Julian J. Faraway cover linear models, with many practical examples. Linear models, and the linear model formula syntax ~, are core to much of what R has to offer statistically. Many statistical techniques take linear models as their starting point, including limma for differential gene expression, glm for logistic regression (among others), survival analysis with coxph, and mixed models to characterize variation within populations.
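To give a feel for the formula syntax mentioned above, here is a minimal sketch using only base R. The mtcars dataset ships with R and is used purely for illustration; it is not taken from any of the books listed.

```r
# A formula reads "response ~ predictors".
# Model fuel economy (mpg) as a linear function of car weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                      # intercept and slope

# Add a second predictor by extending the right-hand side of the formula.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# glm() uses the same formula syntax; family = binomial gives logistic regression.
# Here the response is am (0 = automatic, 1 = manual transmission).
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(logit_fit)
```

The same response ~ predictors pattern carries over to the other modelling functions named above, although each has its own additional arguments.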
A few more books:
- R-intro,
- Bart Baesens, “Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications”, Wiley (2014),
- “R for Data Science: Import, Tidy, Transform, Visualize, and Model Data”,
- RStudio’s collection of cheat sheets covers newer packages in R.
- An old-school cheat sheet for dinosaurs and people wishing to go deeper: https://cran.r-project.org/doc/contrib/Short-refcard.pdf
- Bioconductor cheat sheet: https://github.com/mikelove/bioc-refcard/blob/master/README.Rmd
• CRAN has thousands of contributed packages, which can be installed with install.packages.
• Bioconductor is another huge collection of packages with a biological focus.
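A minimal sketch of installing packages from these two sources. The helper name install_if_missing is hypothetical and written for this example; install.packages, requireNamespace and BiocManager::install are the real functions it relies on.

```r
# Hypothetical helper: install only those CRAN packages that are not yet present.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when a package is already installed
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!present]
  if (length(to_install) > 0) {
    install.packages(to_install)   # downloads from the configured CRAN mirror
  }
  invisible(to_install)            # return what had to be installed
}

install_if_missing("stats")        # stats ships with R, so nothing is installed

# Typical use for a CRAN package (needs an internet connection):
# install_if_missing("dplyr")

# Bioconductor packages are installed through the BiocManager package:
# install.packages("BiocManager")
# BiocManager::install("limma")
```

Once a package is installed, load it in each session with library(), for example library(dplyr).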
Life outside R
Not all data analysis is done in R. The Software Carpentry workshops give a broader introduction to computing in science.
• Software Carpentry
Stack Overflow-style sites are great for getting help:
• support.bioconductor.org for Bioconductor-related questions.
• biostars.org for general bioinformatics questions.
• stats.stackexchange.com for statistics questions.
• stackoverflow.com for general programming questions.