## Review

So far we have covered expressions, data types, vectors, vector operations and subset selection, and finally data sets and their import/export.

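In addition to the R notebooks shared for this course, you can also use the following online courses and tutorials to review these topics:

* (https://campus.datacamp.com/courses/free-introduction-to-r)
* (https://cran.r-project.org/doc/manuals/R-intro.html) Chapters 1 and 2
* (http://www.r-tutor.com/r-introduction)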

## Example data sets

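The following is a short selection of UCI datasets that you can use to practice with the tools covered further below.

* (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing) Choose the bank_full.csv file inside the bank.zip archive. Note that the separator is a semicolon (;), not a comma (,)!
* (https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data) Inside the RCdata.zip archive, you can use the userprofile.csv file.
* (https://archive.ics.uci.edu/ml/datasets/ISTANBUL+STOCK+EXCHANGE) Save the data_akbilgic.xlsx file and import it into RStudio as an "excel" file.
* (https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset) Choose hourly.csv from the zip file.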
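Downloaded data files do not always use a comma as the field separator, so it can help to state the separator explicitly when importing. The following is only a sketch: the tiny file and its column names (`age`, `job`, `balance`) are made up for illustration and are not taken from any of the UCI files.

```r
# Hypothetical stand-in for a downloaded file: a tiny semicolon-separated CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c("age;job;balance",
             "30;admin;1200",
             "45;technician;-50"), tmp)

# read.csv() assumes commas by default, so the separator is given explicitly
bank <- read.csv(tmp, sep = ";")

# Check that the columns were split and typed as expected
str(bank)
summary(bank$balance)
```

If `str()` shows a single column whose name contains all the header names, the separator was probably not the one the file uses.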

# Exploring data

## Single variable: checks

Check several statistics of a variable:

data(ChickWeight)
summary(ChickWeight$weight)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   35.0    63.0   103.0   121.8   163.8   373.0

Check normality:

shapiro.test(ChickWeight$weight)

Shapiro-Wilk normality test

data:  ChickWeight$weight
W = 0.90866, p-value < 2.2e-16

Note the difference from the test below, which is restricted to the chicks on diet 1 at time 10:

shapiro.test(ChickWeight$weight[ChickWeight$Time==10&ChickWeight$Diet==1])

Shapiro-Wilk normality test

data:  ChickWeight$weight[ChickWeight$Time == 10 & ChickWeight$Diet == 1]
W = 0.98122, p-value = 0.9555

The Shapiro-Wilk test takes normality as its null hypothesis: the very small p-value for the pooled weights rejects normality, whereas the p-value of 0.9555 for the subgroup gives no evidence against it.

Summarize all variables in a data set one by one:

summary(ChickWeight)

     weight           Time           Chick     Diet
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120
 Median :103.0   Median :10.00   20     : 12   3:120
 Mean   :121.8   Mean   :10.72   10     : 12   4:118
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12
 Max.   :373.0   Max.   :21.00   19     : 12
                                 (Other):506

## Single variable visualization

Histogram or stem plot:

hist(ChickWeight$weight)
stem(ChickWeight$weight)

Exercise: Increase the number of bars in the histogram above by using the parameters of the hist function, as described in its help page.

Exercise: Plot a histogram of the weights of the chicks that are on diet 2 during time period 5.

A box plot shows the quartiles and extreme values:

boxplot(ChickWeight$weight)
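The numbers behind a box plot can also be inspected directly. The sketch below uses `boxplot.stats()` from base R, which returns the values that `boxplot()` draws; the variable names `w` and `bs` are only chosen for this illustration.

```r
data(ChickWeight)
w <- ChickWeight$weight

# Quartiles as reported by summary()
quantile(w, probs = c(0.25, 0.5, 0.75))

# The five numbers that boxplot() draws: lower whisker, lower hinge,
# median, upper hinge, upper whisker
bs <- boxplot.stats(w)
bs$stats

# Points drawn individually beyond the whiskers
# (more than 1.5 times the box length away from the box)
bs$out
```

The hinges may differ slightly from the quartiles printed by `summary()`, because the two are computed with different rules.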

## Two variables: summary and visualization

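You may need to check how the variables in a data set relate to each other. For tabulated categorical data such as Titanic, start by converting the table to a data frame and summarizing it: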

data(Titanic)
td <- as.data.frame(Titanic)
summary(td)
  Class       Sex        Age     Survived      Freq
 1st :8   Male  :16   Child:16   No :16   Min.   :  0.00
 2nd :8   Female:16   Adult:16   Yes:16   1st Qu.:  0.75
 3rd :8                                   Median : 13.50
 Crew:8                                   Mean   : 68.78
                                          3rd Qu.: 77.00
                                          Max.   :670.00
td
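For two numeric variables, the correlation coefficient can be computed with `cor()`. The sketch below uses the built-in ToothGrowth data set, which also appears in the home exercises; the cross-tabulation at the end is one possible counterpart for categorical variables, not the only one.

```r
data(ToothGrowth)

# Pearson correlation between tooth length and dose (both numeric)
r <- cor(ToothGrowth$len, ToothGrowth$dose)
r

# Rank-based alternative, useful when the relation is monotone but not linear
cor(ToothGrowth$len, ToothGrowth$dose, method = "spearman")

# For two categorical variables a cross-tabulation plays a similar role;
# xtabs() sums the Freq column of the Titanic data frame per cell
td <- as.data.frame(Titanic)
tab <- xtabs(Freq ~ Class + Survived, data = td)
tab
```

A value of `r` close to 1 or -1 indicates a strong linear relation, while a value near 0 indicates little linear relation.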

Visualizations depend on the data type of the variables. For numerical variables:

plot(ChickWeight$weight ~ ChickWeight$Time)

NOTE ON FORMULAS: You will encounter formulas such as “weight ~ Time” in R very often. This syntax represents what we mean in statistics by “weight as a function of time”. We will see more elaborate formulas in regression and advanced exploratory statistics.
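A formula is an ordinary R object, which is why the same expression can be passed to many different functions. The following sketch stores a formula in a variable (the names `f` and `agg` are arbitrary) and reuses it for a plot and for a grouped mean.

```r
data(ChickWeight)

# A formula can be stored in a variable like any other value
f <- weight ~ Time
class(f)      # "formula"
all.vars(f)   # the variable names used in the formula

# With a data argument the names are looked up inside the data frame,
# so the ChickWeight$ prefix is not needed
plot(f, data = ChickWeight)

# The same formula can be handed to other functions, for example
# aggregate(), which here computes the mean weight for each Time value
agg <- aggregate(f, data = ChickWeight, FUN = mean)
agg
```

Passing `data = ChickWeight` is usually easier to read than repeating the data frame name on both sides of the `~`.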

When one of the variables is categorical, a two-dimensional boxplot is very useful:

boxplot(ChickWeight$weight ~ ChickWeight$Diet)
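The grouped boxplot has a numerical counterpart: the same statistics can be computed per group. A minimal sketch with `tapply()` from base R, where `med` and `n_per_diet` are names chosen only for this example:

```r
data(ChickWeight)

# Median weight within each diet group (the thick line in each box)
med <- tapply(ChickWeight$weight, ChickWeight$Diet, median)
med

# Full summary of weight for each diet
tapply(ChickWeight$weight, ChickWeight$Diet, summary)

# Number of observations behind each box
n_per_diet <- table(ChickWeight$Diet)
n_per_diet
```

Comparing the group sizes with the boxes helps to judge how much weight to give to differences between groups.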

# The complexity of old-style graphics in R

R has a very well-established system for graphical visualization. You have already used commands like ‘plot’ and ‘hist’.

hist(mtcars$mpg)
plot(mtcars$cyl, mtcars$mpg)

Exercise: Download the İstanbul Stock Exchange data and plot the ISE series.

The traditional commands are indeed powerful and flexible, but they come at the price of complexity. For example, things quickly become complicated if you want to use color coding in your visualizations. See the following example (adapted from http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html#org7628198):

plot(weight ~ Time, data=subset(ChickWeight,Diet=="1"))
points(weight ~ Time, data=subset(ChickWeight,Diet=="4"),pch="x",col="red")
legend("topleft",c("Diet 1","Diet 4"), title="Diet",col=c("black","red"),pch=c("o","x"))
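Extending the example above to all four diets shows where the complexity comes from: colors, plotting symbols and legend entries have to be kept in sync by hand. This is only one possible way to write it; the color choices and the helper names `diets`, `cols` and `pchs` are arbitrary.

```r
data(ChickWeight)

diets <- levels(ChickWeight$Diet)                 # "1" "2" "3" "4"
cols  <- c("black", "red", "blue", "darkgreen")   # arbitrary colour choice
pchs  <- seq_along(diets)                         # plotting symbols 1 to 4

# Draw an empty frame first, so the axes cover the range of all diets
plot(weight ~ Time, data = ChickWeight, type = "n")

# One points() call per diet, each with its own colour and symbol
for (i in seq_along(diets)) {
  points(weight ~ Time, data = subset(ChickWeight, Diet == diets[i]),
         col = cols[i], pch = pchs[i])
}

# The legend has to repeat the same colours and symbols by hand
legend("topleft", legend = paste("Diet", diets), title = "Diet",
       col = cols, pch = pchs)
```

If a color or symbol is changed in one place but not in the legend, the plot silently becomes misleading, which is the kind of bookkeeping this section refers to.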

Exercise: Plot ISE and one of the other stock exchanges in the same graph. Also try saving the graph to a file.
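Saving a base R graph follows an open, draw, close pattern. The sketch below uses the mtcars data rather than the ISE data and writes a PDF into the temporary directory; the file name is a made-up example, and `png()` or `jpeg()` can be used in the same way where those devices are available.

```r
# Open a PDF device; the file name is a hypothetical example
out_file <- file.path(tempdir(), "mpg_vs_cyl.pdf")
pdf(out_file, width = 7, height = 5)   # size in inches

# Everything plotted now is drawn into the file instead of the screen
plot(mtcars$cyl, mtcars$mpg, xlab = "cyl", ylab = "mpg")

# Closing the device finishes writing the file
dev.off()

file.exists(out_file)
```

Forgetting `dev.off()` is a common reason for an empty or unreadable output file.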

## Home Exercises

1. Summarize the variables in the ToothGrowth dataset
2. Plot len versus dose in the ToothGrowth dataset
3. Repeat exercise 2 for the different supplements (VC or OJ) in the supp variable
4. Place a legend on your graph from the previous exercise