Module # 8 Correlation Analysis and ggplot2

Create your own visual analytics based on correlation or regression analysis using ggplot2. The visual should follow our textbook's recommendation to use a grid to enhance comparisons between scatter plots of your variables.

Post the result on your blog and express your opinion about Few's recommendation.   



The first visualization is a simple regression plot of the relationship between weight and MPG in the mtcars data set. Below are the code and the visualization. The fitted regression line shows that as the weight of the car increases, the MPG decreases.

library(ggplot2)

# Scatter plot of weight vs. MPG with a fitted linear regression line
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Weight vs. MPG Regression Analysis",
       x = "Weight",
       y = "Miles per Gallon (MPG)")
# Console message: `geom_smooth()` using formula = 'y ~ x'
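To put a number on the trend in the plot, the same straight-line model can be fitted directly with lm(). This is only a minimal sketch in base R; the names fit, slope, and r_squared are picked for illustration and are not part of the original analysis.

```r
# Fit the same linear model that geom_smooth(method = "lm") draws
fit <- lm(mpg ~ wt, data = mtcars)

# Slope for wt: estimated change in MPG per 1,000 lbs of extra weight
# (wt in mtcars is recorded in units of 1,000 lbs)
slope <- coef(fit)[["wt"]]

# Share of the variation in MPG that weight alone accounts for
r_squared <- summary(fit)$r.squared

print(slope)
print(r_squared)
```

A negative slope here matches the downward line in the plot.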





For the next visualization I used the corrplot package. I set addCoef.col, which prints the correlation coefficient in each cell of the plot. This gives a clear view of the relationships between the variables. Red indicates a strong negative correlation, blue indicates a strong positive correlation, and white indicates weak or no correlation. For example, weight and mpg have a negative correlation, which is consistent with the Weight vs. MPG Regression Analysis visualization above.

library(corrplot)
corrplot(cor(mtcars), addCoef.col = "black", method = "color")
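The numbers printed on the plot come from the correlation matrix itself, so a single pair can be checked without drawing anything. A small sketch in base R follows; cor_matrix and wt_mpg are illustrative names only.

```r
# Pearson correlation matrix for every pair of columns in mtcars
cor_matrix <- cor(mtcars)

# Pull out the single coefficient for weight vs. MPG
wt_mpg <- cor_matrix["wt", "mpg"]
print(round(wt_mpg, 2))

# Every variable's correlation with mpg, sorted from most negative upward
print(sort(cor_matrix[, "mpg"]))
```

The sorted list makes it easy to see which variables sit at the red end and which at the blue end of the mpg row in the plot.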




The last visualization uses three variables. Starting from the weight vs. MPG scatter plot at the beginning, I added a third variable, gear. Using ggplot and geom_point, I colored each point by its number of gears: red indicates three gears, green four, and blue five. From this graph, we can see that as the weight of the car increases, the mpg decreases. In addition, heavier cars tend to have fewer gears than lighter cars. In this data set, then, heavier cars with fewer gears tend to get poor mpg, while lighter cars with more gears tend to get better mpg.


# Weight vs. MPG, with points colored by number of gears (treated as a category)
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(gear))) +
  geom_point(size = 4) +
  labs(
    title = "Car Weight vs. Miles per Gallon",
    x = "Weight",
    y = "Miles per Gallon",
    color = "Number of Gears"
  )
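The pattern read off the colors can be checked against group averages. This is a rough sketch in base R, not part of the original post; by_gear is a name chosen for illustration.

```r
# Average weight and MPG within each gear group (3, 4, and 5 gears)
by_gear <- aggregate(cbind(wt, mpg) ~ gear, data = mtcars, FUN = mean)

print(by_gear)
```

Comparing the rows shows how far the "heavier cars, fewer gears, lower mpg" tendency holds for each group, which is worth a look before reading too much into the colors alone.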




Overview and Few's Recommendation:

Few encourages simple graphs that use color to make the graph more effective. I would say the last two graphs meet that criterion. The first graph was simple, so there was little room for color, although I could use a different color for the regression line to emphasize the negative correlation. For the second graph I added text to make the correlations easy to read, but some might argue that it is distracting. That graph also lacked labels, which could cause confusion. A title and some form of legend explaining the scale from -1 to 1 would be useful. The last graph fits Few's recommendation the best. It is simple, clear, and labeled properly to avoid confusion. We can read off a relationship among three variables without hesitation because of effective visualization practices.
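One way to act on the grid idea from the assignment is to split the scatter plot into one panel per gear count. The sketch below assumes that "grid" means a grid of small panels, and using facet_wrap for it is my own choice rather than something the textbook prescribes. It also colors the regression line red, as suggested above for the first graph. The object name p is illustrative.

```r
library(ggplot2)

# One panel per gear count, so the weight vs. MPG trend can be compared side by side
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 2) +
  # Red linear fit in each panel; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  # label_both prints panel titles such as "gear: 3"
  facet_wrap(~ gear, labeller = label_both) +
  labs(title = "Weight vs. MPG by Number of Gears",
       x = "Weight (1,000 lbs)",
       y = "Miles per Gallon (MPG)")

print(p)
```

Because facet_wrap keeps the same axes in every panel by default, the slopes can be compared directly across the three groups.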




