Module #10 Assignment
Review the reading resources and post on your blog a new entry with your work with ggplot2 and time series (try yourself) and discuss the input of visualization on time series analysis.
The dataset I will be using is titled "Video Game Sales" (https://www.kaggle.com/datasets/anandshaw2001/video-game-sales)
The first plot I created shows total video game sales over the years. Using ggplot, I summed Global_Sales within each year and drew the yearly totals as a line, which gives a view of how sales changed over roughly forty years.
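> library(ggplot2)
> # Sum Global_Sales within each year and draw the yearly totals as a line
> ggplot(videoGameSales, aes(x = as.numeric(as.character(Year)), y = Global_Sales)) +
+ geom_line(stat = "summary", fun = sum) +
+ labs(title = "Global Video Game Sales Over the Years", x = "Year", y = "Total Sales (millions)")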
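To check what the line is actually drawing, the yearly totals can be computed directly. The following is a minimal sketch on a small made-up data frame (toy_sales and its values are hypothetical, not taken from the dataset); it only assumes the same column names, Year and Global_Sales.

```r
# Toy data standing in for the real dataset (values are made up)
toy_sales <- data.frame(
  Year = c(2000, 2000, 2001, 2001, 2002),
  Global_Sales = c(1.5, 2.0, 0.5, 3.0, 4.0)
)

# Sum Global_Sales within each year; this is the per-year total
# that geom_line(stat = "summary", fun = sum) computes before drawing
yearly_totals <- aggregate(Global_Sales ~ Year, data = toy_sales, FUN = sum)
print(yearly_totals)
```

Running the same aggregate call on videoGameSales should give the numbers behind the line plot, which is handy for reading exact values off a peak.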
The next plot was created with base R graphics. This one was a little trickier without ggplot, and it needed more steps, such as a call to the aggregate function. I divided the code into sections to make it easy to follow. First, Year is converted to numeric. Then the regional sales are summed by year. The result is converted to a matrix with regions as rows and years as columns, and the column names are set to the years. Finally, the barplot function draws the stacked bars. With ggplot, this would have been much easier.

We can see from the graph that NA (North America) dominated sales in certain years, with EU (Europe) trailing close behind and sometimes neck and neck. JP (Japan) sometimes came close to EU during the late 1990s. Sales in Other regions also roughly matched JP in the mid-2000s.
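The Year conversion step goes through as.character() before as.numeric(). A small sketch of why that matters, assuming Year might be stored as a factor with a non-numeric entry (the vector below is hypothetical; I have not confirmed how the real column is stored):

```r
# Hypothetical Year column stored as a factor, including a non-numeric label
year_factor <- factor(c("1996", "2006", "N/A", "1996"))

# as.numeric() on a factor returns the internal level codes, not the years
codes <- as.numeric(year_factor)

# Going through as.character() first recovers the actual year values;
# non-numeric labels become NA (the coercion warning is suppressed here)
years <- suppressWarnings(as.numeric(as.character(year_factor)))

print(codes)
print(years)
```

If Year is already plain text or numeric, the extra as.character() does no harm, so it is a safe default.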
> # Year to numeric
> videoGameSales$Year <- as.numeric(as.character(videoGameSales$Year))
>
> # Aggregate sales by year
> sales_data <- aggregate(. ~ Year, data = videoGameSales[, c("Year", "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")], sum)
>
> # Convert to matrix
> sales_matrix <- t(as.matrix(sales_data[, -1])) # Exclude 'Year' column from matrix
>
> # Assign column names as years
> colnames(sales_matrix) <- sales_data$Year
>
> # Plot stacked barplot
> barplot(sales_matrix, beside = FALSE, col = c("lightblue", "orange", "red", "purple"),
+ main = "Video Game Sales by Region Over the Years",
+ xlab = "Year", ylab = "Sales (millions)", border = NA,
+ legend.text = c("NA", "EU", "JP", "Other"))
>
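For comparison, here is a rough sketch of how the same regional stacked bar chart might look in ggplot2. It runs on a small made-up data frame (toy_sales and its values are hypothetical) that only borrows the column names from the dataset, and it uses base R's stack() for the reshape; other reshaping tools would work as well.

```r
library(ggplot2)

# Toy data with the same column names as the dataset (values are made up)
toy_sales <- data.frame(
  Year = c(2000, 2000, 2001, 2001),
  NA_Sales = c(1.0, 2.0, 1.5, 0.5),
  EU_Sales = c(0.5, 1.0, 1.0, 0.5),
  JP_Sales = c(0.2, 0.3, 0.4, 0.1),
  Other_Sales = c(0.1, 0.1, 0.2, 0.2)
)

region_cols <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")

# Reshape from wide (one column per region) to long (one row per Year/Region).
# stack() concatenates the columns in order, so Year is repeated once per region.
long_sales <- data.frame(
  Year = rep(toy_sales$Year, times = length(region_cols)),
  stack(toy_sales[, region_cols])
)
names(long_sales) <- c("Year", "Sales", "Region")

# Total sales per year and region
region_totals <- aggregate(Sales ~ Year + Region, data = long_sales, FUN = sum)

# Stacked bars: one bar per year, filled by region
p <- ggplot(region_totals, aes(x = factor(Year), y = Sales, fill = Region)) +
  geom_col() +
  labs(title = "Video Game Sales by Region Over the Years",
       x = "Year", y = "Sales (millions)")
print(p)
```

The reshape still takes a few lines, but the legend, colors, and stacking come from the fill mapping rather than from a hand-built matrix.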
The last plot is similar to the previous one, but it uses ggplot instead. This time the focus is on video game sales by genre.
> ggplot(videoGameSales, aes(x = as.factor(Year), y = Global_Sales, fill = Genre)) +
+ geom_col() +
+ labs(title = "Video Game Sales by Genre", x = "Year", y = "Sales (millions)") +
+ theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7.5))
>