Review

So far we have covered expressions, data types, vectors, vector operations and subset selection, and finally data sets and their import/export.

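In addition to the R notebooks shared for this course, you can also use the following online courses and tutorials to review these topics:

  * (https://campus.datacamp.com/courses/free-introduction-to-r)
  * (https://cran.r-project.org/doc/manuals/R-intro.html) Chapters 1 and 2
  * (http://www.r-tutor.com/r-introduction)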

Example data sets

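The following is a short selection of UCI datasets that you can use to practice the tools covered below.

  * (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing) Choose the bank_full.csv file inside the bank.zip archive. Note that the separator is a semicolon (;), not a comma (,)!
  * (https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data) Inside the RCdata.zip archive, you can use the userprofile.csv file.
  * (https://archive.ics.uci.edu/ml/datasets/ISTANBUL+STOCK+EXCHANGE) Save the data_akbilgic.xlsx file and import it into RStudio as an Excel file.
  * (https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset) Choose hourly.csv from the zip file.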

Exploring data

Single variable: checks

Check several statistics of a variable:

data(ChickWeight)
summary(ChickWeight$weight)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   35.0    63.0   103.0   121.8   163.8   373.0 
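
The individual numbers reported by summary can also be computed one at a time, which is handy when a single statistic is needed in a later calculation. A minimal sketch using only base R functions; the name w is just a shorthand introduced for this example:

```r
data(ChickWeight)
w <- ChickWeight$weight   # shorthand introduced for this example

mean(w)                              # arithmetic mean
median(w)                            # middle value
sd(w)                                # standard deviation (not reported by summary)
range(w)                             # minimum and maximum
quantile(w, probs = c(0.25, 0.75))   # first and third quartiles
```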

Check normality:

shapiro.test(ChickWeight$weight)

    Shapiro-Wilk normality test

data:  ChickWeight$weight
W = 0.90866, p-value < 2.2e-16
## NOTE THE DIFFERENCE FROM THE TEST BELOW
shapiro.test(ChickWeight$weight[ChickWeight$Time==10&ChickWeight$Diet==1])

    Shapiro-Wilk normality test

data:  ChickWeight$weight[ChickWeight$Time == 10 & ChickWeight$Diet ==     1]
W = 0.98122, p-value = 0.9555
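
The two results differ because the full weight vector mixes all ages and diets, whereas the subset holds a single age and a single diet. The object returned by shapiro.test stores the p-value, so it can be extracted and compared with a significance level. A sketch, assuming the conventional 0.05 threshold (our assumption, not something fixed by the test); the names all_w, sub_w and alpha are introduced only for this example:

```r
data(ChickWeight)
all_w <- ChickWeight$weight
sub_w <- ChickWeight$weight[ChickWeight$Time == 10 & ChickWeight$Diet == 1]

alpha <- 0.05  # conventional significance level, assumed here

res_all <- shapiro.test(all_w)
res_sub <- shapiro.test(sub_w)

# TRUE means that normality is rejected at the chosen level
res_all$p.value < alpha
res_sub$p.value < alpha

# A normal Q-Q plot gives a visual check of the same question
qqnorm(sub_w)
qqline(sub_w)
```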

Summarize all variables in a data set, one by one:

summary(ChickWeight)
     weight           Time           Chick     Diet   
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
 Median :103.0   Median :10.00   20     : 12   3:120  
 Mean   :121.8   Mean   :10.72   10     : 12   4:118  
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
 Max.   :373.0   Max.   :21.00   19     : 12          
                                 (Other):506          
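
In the output above, weight and Time are summarized by quartiles, while Chick and Diet are summarized by counts. The difference comes from the column types: summary treats numerical columns and factors differently. A quick way to check the types, sketched with base R only:

```r
data(ChickWeight)
# TRUE for numerical columns, FALSE otherwise
num_cols <- sapply(ChickWeight, is.numeric)
num_cols

# Factors store categories; nlevels counts the distinct categories
is.factor(ChickWeight$Diet)
nlevels(ChickWeight$Diet)
```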

Single variable visualization

Histogram or stem plot:

hist(ChickWeight$weight)
stem(ChickWeight$weight)
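
Besides drawing the plot, hist returns an object that describes the bins. Inspecting it can help when reading the picture. A small sketch; plot = FALSE is used only to compute the bins without drawing, and the name h is introduced for this example:

```r
data(ChickWeight)
h <- hist(ChickWeight$weight, plot = FALSE)  # compute the bins without drawing

h$breaks       # bin boundaries chosen by hist
h$counts       # number of observations falling in each bin
sum(h$counts)  # adds up to the number of observations
```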

Exercise: Increase the number of bars in the histogram above by using the parameters of the hist function, as described in its help page.

Exercise: Plot a histogram of the weights of the chicks that are on diet 2 during Time period 5.

A box plot shows the quartiles and the extreme values:

boxplot(ChickWeight$weight)
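
The numbers behind the box plot can be obtained with boxplot.stats from base R. A brief sketch; note that the box edges are "hinges", which are close to, but not always identical to, the quartiles reported by summary:

```r
data(ChickWeight)
b <- boxplot.stats(ChickWeight$weight)

b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # values drawn as individual points beyond the whiskers
```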

Two variables: summary and visualization

You may need to check correlation of variables:

data(Titanic)
td <- as.data.frame(Titanic)
summary(td)
  Class       Sex        Age     Survived      Freq       
 1st :8   Male  :16   Child:16   No :16   Min.   :  0.00  
 2nd :8   Female:16   Adult:16   Yes:16   1st Qu.:  0.75  
 3rd :8                                   Median : 13.50  
 Crew:8                                   Mean   : 68.78  
                                          3rd Qu.: 77.00  
                                          Max.   :670.00  
td   # prints all 32 rows of the data frame; output omitted here
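
The summary above describes each column of td separately and does not by itself measure a relationship between columns. As a sketch of two common pairwise checks: cor computes the correlation coefficient of two numerical variables, and xtabs cross-tabulates two categorical ones. The choice of the weight and Time columns, and of Sex and Survived, is ours, for illustration only:

```r
data(ChickWeight)
data(Titanic)
td <- as.data.frame(Titanic)

# Numerical pair: Pearson correlation (the default method of cor)
r <- cor(ChickWeight$weight, ChickWeight$Time)
r

# Categorical pair: passenger counts for each Sex / Survived combination
tab <- xtabs(Freq ~ Sex + Survived, data = td)
tab
```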

Visualizations depend on the data type of the variables. For numerical variables:

plot(ChickWeight$weight ~ ChickWeight$Time)

NOTE ON FORMULAS: You will encounter formulas such as “weight ~ time” in R very often. This syntax represents what we mean in statistics by “weight as a function of time”. We will see more elaborate formulas in regression and advanced exploratory statistics.
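
A formula is an ordinary R object, and many functions accept it together with a data argument, so that the column names do not have to be prefixed with the name of the data frame. A minimal sketch; the names f and m are introduced only for this example:

```r
data(ChickWeight)
f <- weight ~ Time   # store the formula in a variable
class(f)             # the class of the object is "formula"
all.vars(f)          # the variable names mentioned in the formula

# With data =, the names in the formula are looked up in the data frame
plot(f, data = ChickWeight)

# The same notation drives other functions, e.g. the mean weight for each diet
m <- aggregate(weight ~ Diet, data = ChickWeight, FUN = mean)
m
```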

When one of the two variables is categorical, a grouped box plot (one box per category) is very useful:

boxplot(ChickWeight$weight ~ ChickWeight$Diet)
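
The values summarized by each box can also be computed per group. A short sketch with tapply, which applies a function to the weights within each level of Diet; the name med is introduced for this example:

```r
data(ChickWeight)
# Median weight within each diet group (the thick line inside each box)
med <- tapply(ChickWeight$weight, ChickWeight$Diet, median)
med

# Number of observations behind each box
n_per_diet <- table(ChickWeight$Diet)
n_per_diet
```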

The complexity of the old-style graphics in R

R has a well-established system for graphical visualization. You have already used commands like ‘plot’ and ‘hist’.

hist(mtcars$mpg)
plot(mtcars$cyl,mtcars$mpg)

Exercise: Download the İstanbul Stock Exchange data and plot the ISE series.

The traditional commands are indeed powerful and flexible, but they come at the price of complexity. For example, things quickly become complicated if you want to use color coding in your visualizations. See the following example (adapted from http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html#org7628198):

plot(weight ~ Time, data=subset(ChickWeight,Diet=="1"))
points(weight ~ Time, data=subset(ChickWeight,Diet=="4"),pch="x",col="red")
legend("topleft",c("Diet 1","Diet 4"), title="Diet",col=c("black","red"),pch=c("o","x"))
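
One way to reduce the repetition, sketched here with base graphics only, is to map the Diet factor to colors and plotting symbols through its integer codes, so that all four diets are drawn by a single plot call. The palette cols and the name diet_code are arbitrary choices made for this example:

```r
data(ChickWeight)
diet_code <- as.integer(ChickWeight$Diet)       # 1 to 4, one code per diet level
cols <- c("black", "red", "blue", "darkgreen")  # arbitrary palette for this sketch

plot(ChickWeight$Time, ChickWeight$weight,
     col = cols[diet_code], pch = diet_code,
     xlab = "Time", ylab = "weight")
legend("topleft", legend = paste("Diet", levels(ChickWeight$Diet)),
       title = "Diet", col = cols, pch = 1:4)
```

The legend still has to be kept in sync with the plot by hand, which is the kind of bookkeeping that makes the traditional commands laborious.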

Exercise: Plot ISE and one of the other stock exchanges in the same graph. Also try saving the graph to a file.
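
As a hint for the saving part, base R writes graphics to a file by opening a file device, drawing, and then closing the device. A minimal sketch using the pdf device and the mtcars histogram from above; the file name mpg_hist.pdf and the temporary directory are hypothetical choices, and png() or jpeg() can be used in the same way where they are available:

```r
out_file <- file.path(tempdir(), "mpg_hist.pdf")  # hypothetical file name

pdf(out_file, width = 7, height = 5)  # open a PDF graphics device (size in inches)
hist(mtcars$mpg)                      # the plot is drawn into the file, not on screen
dev.off()                             # close the device to finish writing the file

file.exists(out_file)
```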

Home Exercises

  1. Summarize the variables in the ToothGrowth dataset.
  2. Plot len versus dose in the ToothGrowth dataset.
  3. Repeat exercise 2 for the different supplements (VC or OJ) in the supp variable.
  4. Place a legend on your graph from the previous exercise.