Final Project
After a long semester, we have reached the end of LIS4317, Visual Analytics! Using what I have gained from this course, I will produce work that represents the numerous topics covered. The best way to accomplish this is by answering a question that I am experiencing in my own life. In a few short days, I am expected to graduate from the University of South Florida. It has been a challenging but rewarding experience for which I will be forever grateful. However, it also raised the question of college graduation and the hurdles that come along with completion. What factors play into an individual graduating? How many people actually graduate from college? Do specific locations lead to more college graduates? How much does financial aid impact the graduation process? Using the College Completion Dataset from Kaggle, I will answer these questions through various visualization methods.
The software used for this project was RStudio.
The original dataset holds 3798 observations and 63 variables.
After careful consideration, I removed 10 variables, leaving 3798 observations and 53 variables.
Grouped Bar Chart
The first visualization is a grouped bar chart. To keep the graph clear, I decided that the top 20 colleges (based on student count) would showcase a digestible amount of information. It is also important to note that many colleges had "NA" values in the graduation percentile columns. These needed to be handled in order to prevent skewed results. Using the drop_na function, I was able to keep the top 20 colleges with graduation percentile information. From the graph below, we can see that in many cases, students were taking a bit longer than the standard 4 years to graduate. Many factors could play into this, such as financial circumstances, retaking courses, uncertainty about degree progression, or even prioritizing other areas of life such as family or work. No matter the reason, it was interesting to see how prevalent this is in major schools. I also took longer with my college degree. Different factors played a role, but it also made me realize that others were going through similar situations.
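For anyone who wants to try something similar, here is a minimal sketch of the steps described above: drop the rows with missing graduation percentiles, keep the largest colleges by student count, and draw a grouped bar chart. It is not the exact code behind my figure. The toy rows are made up, and the column names chronname, student_count, and grad_150_percentile are assumptions about how the dataset is laid out.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up placeholder rows (not real dataset values).
# Column names are assumptions about the Kaggle file.
colleges <- data.frame(
  chronname = c("College A", "College B", "College C", "College D"),
  student_count = c(50000, 42000, 38000, 30000),
  grad_100_percentile = c(40, NA, 55, 62),
  grad_150_percentile = c(65, 70, NA, 80)
)

top_n_colleges <- 2  # the post uses 20; 2 keeps the toy example small

top_colleges <- colleges %>%
  # Remove colleges missing either graduation percentile
  drop_na(grad_100_percentile, grad_150_percentile) %>%
  # Keep the largest remaining colleges by student count
  slice_max(student_count, n = top_n_colleges, with_ties = FALSE)

# Long format: one row per college and measure, so bars can sit side by side
top_long <- top_colleges %>%
  pivot_longer(
    cols = c(grad_100_percentile, grad_150_percentile),
    names_to = "measure",
    values_to = "percentile"
  )

bar_plot <- ggplot(
  top_long,
  aes(x = reorder(chronname, -student_count), y = percentile, fill = measure)
) +
  geom_col(position = "dodge") +  # "dodge" places the two measures next to each other
  labs(x = "College", y = "Graduation percentile", fill = "Measure")
```

Printing bar_plot in RStudio draws the chart. Reshaping to long format first is what lets ggplot2 treat the 100% and 150% measures as one fill variable.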
Scatter Plot (With Linear Regression Trend)
The next visualization is a scatter plot with a linear regression trend line. The purpose of this plot is to answer the question of how much financial aid impacts the graduation process. This plot graphs each college's aid value versus the percentage of students who graduated within 100% of the normal time (4 years). With a little bit of data cleaning, rows with "NA" in either of these variables were dropped, allowing us to continue with visualizing. From this plot, we can clearly see that as financial aid increases, the graduation percentile tends to increase too. One thing to note is that there is a high density of points in the $0 to $10,000 range. This indicates that most colleges offer relatively low aid. Even at these lower aid levels, we do see a significant number of on-time graduations. We can also see that colleges that offer high amounts of aid have very few students who do not graduate within the normal amount of time. This does not mean colleges can simply give out large sums of aid after seeing this relationship. It is, however, something they should keep in mind when considering which factors lead to more on-time graduations, and more graduations in general, within their respective schools.
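Here is a minimal sketch of how a plot like this can be built in ggplot2, assuming the aid column is named aid_value (an assumption) and using grad_100_percentile for on-time graduation. The toy rows are made up for illustration and are not values from the dataset.

```r
library(tidyr)
library(ggplot2)

# Made-up placeholder rows; aid_value is an assumed column name.
aid <- data.frame(
  aid_value = c(2000, 4000, NA, 8000, 15000, 30000),
  grad_100_percentile = c(10, 20, 30, NA, 45, 80)
)

# Drop rows where either variable is NA
aid_clean <- drop_na(aid, aid_value, grad_100_percentile)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(grad_100_percentile ~ aid_value, data = aid_clean)
slope <- unname(coef(fit)["aid_value"])

aid_plot <- ggplot(aid_clean, aes(x = aid_value, y = grad_100_percentile)) +
  geom_point(alpha = 0.5) +  # transparency helps where points are dense
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Financial aid value ($)", y = "Graduation percentile (100% of normal time)")
```

Fitting the model separately with lm makes it possible to check the sign of the slope directly instead of judging it by eye from the trend line.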
Multi-variable Scatter Plot
The next visualization is a multi-variable scatter plot. This plot demonstrates the correlation between the median SAT score of each college and the percentage of students who graduated within the normal 4-year period (100%). This plot also shows the type of college by color (Red = Private for-profit, Green = Private not-for-profit, and Blue = Public) and the duration of the college by shape (Circle = 2-year and Triangle = 4-year).
From this visualization, we can observe interesting trends. For example, many Private not-for-profit colleges tend to have higher SAT scores along with a higher share of students graduating within the 4-year period. However, the spread among Private not-for-profit colleges is much wider than for the other college types, and they tend to have more outliers. Public college SAT scores tend to stay within the 800-1200 range. Private for-profit colleges have very few data points, so it is more difficult to identify reliable trends. However, based on the limited data, Private for-profit colleges stay within the higher range of on-time graduation. One thing to note is that after data cleaning, there are more Private not-for-profit colleges than Public and Private for-profit colleges. If more data were available for the other two college types, we would be able to identify clearer trends and patterns.
Seeing that Private not-for-profit colleges tend to have higher SAT scores, we can hypothesize that their students have more resources than students at other types of colleges. They could have had access to private tutors and better schools during their time in high school.
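A minimal sketch of this encoding in ggplot2 is below. The column names med_sat_value, control, and level are assumptions, and the toy rows are made up. With three college types and two durations, ggplot2's default palette and default shapes produce the red, green, and blue colors and the circle and triangle markers described above.

```r
library(tidyr)
library(ggplot2)

# Made-up placeholder rows; med_sat_value, control, and level are assumed column names.
sat <- data.frame(
  med_sat_value = c(900, 1050, 1300, NA, 1000, 1450),
  grad_100_percentile = c(30, 45, 85, 50, NA, 95),
  control = c("Public", "Public", "Private not-for-profit",
              "Private for-profit", "Public", "Private not-for-profit"),
  level = c("2-year", "4-year", "4-year", "4-year", "2-year", "4-year")
)

# Keep only colleges with both an SAT score and a graduation percentile
sat_clean <- drop_na(sat, med_sat_value, grad_100_percentile)

sat_plot <- ggplot(
  sat_clean,
  aes(x = med_sat_value, y = grad_100_percentile, color = control, shape = level)
) +
  geom_point(size = 2) +
  labs(
    x = "Median SAT score",
    y = "Graduation percentile (100% of normal time)",
    color = "College type",
    shape = "Duration"
  )
```

Note that dropping rows with missing SAT scores can remove most of one college type, which is worth checking with table(sat_clean$control) before reading too much into the plot.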
Choropleth Map
The last visualization was a bit challenging to make, but with extra research, I was able to demonstrate trends across the entire United States. Using the variables grad_100_percentile and state, I constructed a choropleth map, or a heatmap by region. The colors I used were purple and orange. If a state is more purple, it indicates a higher percentage of students who graduated within the 4-year period. If a state looks more orange, it indicates a lower percentage of students who graduated within the 4-year period. This visualization answers the question of whether location can lead to more college graduates. From this visualization, we can see that northeastern states tend to have more on-time graduates. States on the west coast such as California, Oregon, and Washington also have a higher percentage of students graduating within the 4-year period. Southern states, on the other hand, have fewer on-time graduations. It is important to note that this could be swayed by the amount of data provided by colleges in each state. However, based on the information provided, it seems reasonable to say that states closer to either coast have a higher likelihood of students graduating on time.
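One way to build a map like this is sketched below. It is not necessarily how my figure was made: averaging grad_100_percentile per state is an assumption, as is the state column holding full state names. The toy rows are made up, and the map step needs the maps package, so the sketch only draws the map if that package is installed.

```r
library(dplyr)
library(ggplot2)

# Made-up placeholder rows; full state names in `state` are an assumption.
grads <- data.frame(
  state = c("Florida", "Florida", "Oregon", "Oregon", "Vermont"),
  grad_100_percentile = c(30, 50, 60, NA, 80)
)

# One value per state: the mean percentile (an assumed choice of summary)
state_means <- grads %>%
  filter(!is.na(grad_100_percentile)) %>%
  group_by(state) %>%
  summarise(mean_percentile = mean(grad_100_percentile), .groups = "drop") %>%
  mutate(region = tolower(state))  # map_data() uses lowercase state names

if (requireNamespace("maps", quietly = TRUE)) {
  us_states <- map_data("state")  # polygon outlines for the lower 48 states
  # left_join keeps the polygon point order intact
  map_df <- left_join(us_states, state_means, by = "region")

  choropleth <- ggplot(
    map_df,
    aes(x = long, y = lat, group = group, fill = mean_percentile)
  ) +
    geom_polygon(color = "white") +
    # Orange = lower on-time graduation, purple = higher; grey = no data
    scale_fill_gradient(low = "orange", high = "purple", na.value = "grey90") +
    coord_quickmap() +
    theme_void() +
    labs(fill = "Mean graduation\npercentile (100%)")
}
```

States with no colleges in the cleaned data show up in grey rather than orange, which keeps missing data from being read as a low graduation rate.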
Conclusion and Reflection
The visualizations constructed for this project answered the questions brought forward in the initial stages of this assignment. We were able to see which variables played a role in graduation. The plots included a Grouped Bar Chart, a Scatter Plot (With Linear Regression Trend), a Multi-variable Scatter Plot, and a Choropleth Map.
In the first visualization, the Grouped Bar Chart, we were able to see that many colleges had students graduating in the 150% time category. This means students took longer to graduate than the traditional 4 years. Certain assumptions were made, but this plot showed how common it is for students to take longer to complete college.
The next visualization, the Scatter Plot (With Linear Regression Trend), focused on portraying the relationship between financial aid given and graduating within the 4-year period. One key aspect to note is that most colleges offer aid within the $0 to $10,000 range, and within that range the share of students who completed college on time varies widely. Still, the linear regression trend line has a positive slope, which suggests that colleges providing more financial aid are likely to see more students complete their degrees on time.
The third visualization, the Multi-variable Scatter Plot, demonstrates the relationships between multiple variables. The variables chosen for this plot include median SAT score, 100% graduation percentile, college duration, and college type. We can see that students who have higher SAT scores tend to go to Private not-for-profit colleges. Students who go to Public and Private for-profit colleges tend to score a bit lower on the SAT. We also see that more students at these not-for-profit colleges tend to graduate within the 4-year time frame. This gives us greater insight into how access to more resources gives students the opportunity to attend these prestigious colleges with a better chance of graduating.
The last visualization, the Choropleth Map, helps us visualize how location can serve as an indicator of how likely a student is to graduate. As mentioned, students who attend colleges on the west coast and in the northeastern section of the United States have a higher likelihood of graduating on time compared to students in other regions of the country.
This project allowed me to understand the various factors that play a role in one's ability to complete their education. Money, time, and resources are just some of the variables that one must consider when pursuing higher education. This course solidified my skills with visualization. ggplot2 is a fundamental tool I hope to continue expanding upon. Data is important but the way you share and portray it is pivotal. As I finish out my time in college, I can't express how grateful I am that I had the opportunity to complete my degree at USF. Thank you all for taking the time to read this post as well as my journey through LIS4317!


