Boxplots In R With ggplot2 by dkmathstats

Hi there. I have not posted an R programming post in a long while. Here it is.

This post is on boxplots in R with the `ggplot2` package. 

### Sections
---

* A Quick Overview Of Boxplots
* Example One - Denim Waste Data
* Example Two - Grades In A Statistics Class
* References


### A Quick Overview Of Boxplots
---

Boxplots are a form of data visualization that summarizes the spread of a set of numeric values.

Consider the boxplot summary image below. 

<center><img src="http://www.physics.csbsju.edu/stats/simple.box.defs.gif" /></center>
<center><a href="http://www.physics.csbsju.edu/stats/simple.box.defs.gif">Image Source</a></center>


* The maximum is the largest number in a set of values.
* The minimum is the smallest number in a set of values.
* The median is the "middle number": half of the values are above it and the other half are below it.


* The first quartile (Q1) is the value with 25% of the values below it and 75% of the values above it.
* The third quartile (Q3) is the value with 75% of the values below it and 25% of the values above it.
* The interquartile range (IQR) is the difference between the third quartile and the first quartile (Q3 minus Q1).
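
As a quick check of these definitions, here is a small base R sketch on a made-up vector of nine values (the numbers are purely illustrative and do not come from any dataset in this post). R's `quantile()` supports several quartile conventions; the default one is assumed here.

```r
# Made-up values for illustration only (sorted for readability).
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

x_min    <- min(x)                 # smallest value
x_max    <- max(x)                 # largest value
x_median <- median(x)              # middle value

q1  <- unname(quantile(x, 0.25))   # first quartile
q3  <- unname(quantile(x, 0.75))   # third quartile
iqr <- q3 - q1                     # interquartile range, same as IQR(x)

c(min = x_min, Q1 = q1, median = x_median, Q3 = q3, max = x_max, IQR = iqr)
```

The built-in `summary(x)` reports the same minimum, quartiles, median and maximum (plus the mean) in one call.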


The following two examples feature datasets from the `faraway` package in R. You can install the `faraway` package with the code:

`install.packages("faraway")`


### Example One - Denim Waste Data
---

This first example features denim waste data from the `faraway` package in R. Here is a screenshot of the documentation (from my RStudio).

<center>![denim_doc.PNG](https://steemitimages.com/DQmV88eRW5V688V3wGXW8Nn9duizv14V1FLVDAbMKyZRwV7/denim_doc.PNG)</center>


The `faraway` and `ggplot2` packages are loaded into R. The `head()` and `str()` functions are used to quickly examine the dataset in R.
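
For readers who cannot see the screenshot, the loading and inspection steps described above might look like the following sketch. It assumes the dataset is available under the name `denim` and that both packages are already installed.

```r
library(faraway)   # provides the denim dataset
library(ggplot2)   # plotting package used for the boxplots

data(denim)        # load the dataset into the workspace

head(denim)        # first six rows
str(denim)         # column types, including the supplier factor
```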

<center>![denim01.PNG](https://steemitimages.com/DQmPKyP5ZVXHSnfTwTpdDo5a7RGn4N8DuoYRuaQWbCH11Jb/denim01.PNG)</center>

The variable `supplier` is a factor, so `ggplot2` draws one boxplot for each supplier. Side-by-side boxplots are useful for visual comparisons.

![denim02.PNG](https://steemitimages.com/DQmbWt9h51rppnjPuBXYxjeJTSdjx7WBFJ8S9tMLzaCfDTg/denim02.PNG)

<center>![denim_boxplot.png](https://steemitimages.com/DQmf19x2fN3V5G3M4p9YwwtwygBLXdpzhVKEN6JhgW5wp1t/denim_boxplot.png)</center>

The denim data is passed to the `ggplot()` function. The `supplier` factor goes on the x-axis and the percentage waste, stored in the `waste` variable, goes on the y-axis. Notice that some of the percentages are negative.

The `geom_boxplot()` add-on function is needed to produce the boxplots. Labels and the title can be produced with the `labs()` add-on function.

Label and title aesthetics such as font size, font colour and font style can be modified within the `theme()` function. Inside `theme()`, the argument `plot.title = element_text(hjust = 0.5, colour = "black")` centers the title (`hjust = 0.5`) and sets its colour to black.
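
Putting the pieces above together, a minimal sketch of the plotting code could look like this. The axis labels and title text are placeholders chosen for illustration and may differ from the ones in the screenshot.

```r
library(faraway)
library(ggplot2)

denim_plot <- ggplot(denim, aes(x = supplier, y = waste)) +
  geom_boxplot() +                          # one box per supplier
  labs(x = "Supplier",                      # placeholder labels and title
       y = "Waste (%)",
       title = "Denim Waste By Supplier") +
  theme(plot.title = element_text(hjust = 0.5, colour = "black"))  # centered black title

print(denim_plot)
```

Storing the plot in a variable such as `denim_plot` (a name assumed here) makes it easy to add more layers later or to save the figure with `ggsave()`.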


### Example Two - Grades In A Statistics Class
---

This second boxplot example is on grades from a statistics class. Some data cleaning and reformatting is done in order to prepare the data for ggplot2.

<center>![statsClass_doc.PNG](https://steemitimages.com/DQmYhB1z9Ce2NqnqeCuC5spQRSogbYEsg57Zh9PHRqJvAFa/statsClass_doc.PNG)</center>

I first load the faraway, tidyr and ggplot2 packages into R.

<center>![statsClass_01.PNG](https://steemitimages.com/DQmZBP4iFt7onHQNQfoAX3eu67sgKoXtPyVdEkSRPxqaURU/statsClass_01.PNG)</center>

From the `summary()` output, you can get an idea of the values in the dataset. The `midterm`, `final` and `hw` scores are components of a course grade out of 100, so each one has a different maximum. To compare them on the same scale in the boxplots, each score is converted to a percentage out of 100.

The next lines of code involve renaming column names and rescaling the grades.
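
The rescaling idea can be sketched on a toy data frame. The column names `midterm`, `final` and `hw` follow the dataset, but the scores and the maximum marks (30, 40 and 30) are hypothetical values chosen for illustration; check the dataset documentation for the real maximums.

```r
# Toy stand-in for the class data; scores and maximum marks are hypothetical.
grades <- data.frame(
  midterm = c(24, 18, 27),
  final   = c(30, 22, 36),
  hw      = c(27, 24, 30)
)

# Rescale each component to a percentage and give the columns nicer names.
grades_pct <- data.frame(
  Midterm  = 100 * grades$midterm / 30,
  Final    = 100 * grades$final / 40,
  Homework = 100 * grades$hw / 30
)

grades_pct
```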

![statsClass_02.PNG](https://steemitimages.com/DQmR2ZhwX5B2YqHRY6EmthDUhqvAwqEuUVAxqb44pu2y2Bj/statsClass_02.PNG)

The data is not yet ready for ggplot2. The `gather()` function from the tidyr package is used to convert the data from a wide format to a long format. I then convert the `Assessment` column to an ordered factor. Ordered factors preserve the order of the boxplots along the x-axis.
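
Here is a hedged sketch of the wide-to-long step, using hypothetical percentages rather than the real class data. The column name `Assessment` comes from the text above; `Score` is an assumed name for the value column.

```r
library(tidyr)

# Hypothetical percentages in wide format: one column per assessment.
grades_pct <- data.frame(
  Midterm  = c(80, 60, 90),
  Final    = c(75, 55, 90),
  Homework = c(90, 80, 100)
)

# Wide to long: one row per (assessment, score) pair.
grades_long <- gather(grades_pct, key = "Assessment", value = "Score")

# Ordered factor so the boxes appear as Midterm, Final, Homework
# instead of in alphabetical order.
grades_long$Assessment <- factor(
  grades_long$Assessment,
  levels  = c("Midterm", "Final", "Homework"),
  ordered = TRUE
)

str(grades_long)
```

`gather()` still works, but newer versions of tidyr recommend `pivot_longer()` for the same task.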

![statsClass_03.PNG](https://steemitimages.com/DQmQ4Dr8kHve51VQWn4Wt6Xz9ZsfLeKzH4nHsqSwurJAUuU/statsClass_03.PNG)

Now that the data is ready, producing the boxplot with R's ggplot2 should not be too difficult. This code is very similar to Example One's code.

![statsClass_04.PNG](https://steemitimages.com/DQmXFECASQ7MU1EA6xGUi8HHu61inEB5XZZe5aMHuyK6xiJ/statsClass_04.PNG)
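
In text form, the plotting step might look like the sketch below. It uses a small hypothetical long-format data frame so that it runs on its own; the labels and title are placeholders.

```r
library(ggplot2)

# Hypothetical long-format data: one row per (assessment, score) pair.
grades_long <- data.frame(
  Assessment = factor(
    rep(c("Midterm", "Final", "Homework"), each = 3),
    levels  = c("Midterm", "Final", "Homework"),
    ordered = TRUE
  ),
  Score = c(80, 60, 90, 75, 55, 90, 90, 80, 100)
)

grades_plot <- ggplot(grades_long, aes(x = Assessment, y = Score)) +
  geom_boxplot() +                                  # one box per assessment
  labs(x = "Assessment", y = "Score (%)",           # placeholder labels and title
       title = "Grades In A Statistics Class") +
  theme(plot.title = element_text(hjust = 0.5))     # centered title

print(grades_plot)
```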

<center>![stats500_plot.png](https://steemitimages.com/DQmcF9ZeozAxEbfXhvujMbDWDYLEsT9AU5w8sAGyBnbBu2M/stats500_plot.png)</center>

In the `Final` boxplot, the five data points below the lower whisker are outliers. Those data points represent scores far below average on the final exam (or assignment).

The median final grade, as indicated by the thick black line, appears to be around 75 percent. Do not confuse this with the mean (average).
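
Both points can be illustrated in base R. By default, `geom_boxplot()` draws a point as an outlier when it lies more than 1.5 times the IQR beyond the box. The scores below are made up to mimic a few very low marks; they are not the class data.

```r
# Made-up percentage scores with two very low values.
scores <- c(20, 30, 65, 70, 72, 75, 76, 78, 80, 85, 90)

q1  <- unname(quantile(scores, 0.25))
q3  <- unname(quantile(scores, 0.75))
iqr <- q3 - q1

# Points beyond these fences are drawn individually as outliers.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

outliers <- scores[scores < lower_fence | scores > upper_fence]

outliers          # the two very low scores
median(scores)    # barely affected by the low scores
mean(scores)      # pulled down by the low scores
```

With low outliers like these, the mean falls below the median, which is why the line inside the box should not be read as the average.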

### References
---

* http://www.cookbook-r.com/Manipulating_data/Changing_the_order_of_levels_of_a_factor/
* http://www.physics.csbsju.edu/stats/box2.html
* https://www.mathsisfun.com/data/quartiles.html
* R Graphics Cookbook By Winston Chang is a good book for ggplot2 reference code.
Posted on 2017-12-21.
@cryptogrind ·
this might as well have been written in german for all the sense that i could make of it, keep it up though, we need the front end guys like you to keep pushing the tech forward ... an upvote from me! :)