Is global warming a real thing? I haven’t doubted it since I learned about it in elementary school. In recent years, however, for some strange reasons, that fact has become a question. Although many plots and infographics already show how rapidly temperatures are rising, I still found it interesting to test the claim myself and look at the trend, as practice in using Tableau.
Through this project, I intended to examine how temperature changed over time and how it correlates with other weather factors such as humidity, pressure, and wind speed. The project focuses on 27 US cities from 2013 to 2016.
For this project, I used the Historical Hourly Weather Data 2012-2017 dataset from Kaggle.com. It provides hourly data for 30 US and Canadian cities and 6 Israeli cities, covering temperature, humidity, pressure, wind speed, wind direction, and a weather description. Each weather factor is stored in a separate CSV file.
It also thoughtfully provided the cities’ geographic information in a separate CSV file.
Data Cleaning and Preparing
The dataset was well prepared and contained few odd entries. However, because of its size, I had to manipulate it before importing it into Tableau. I used RStudio and OpenRefine for this step.
First, I needed to merge all six weather factor CSV files. After importing the CSV files into R as data frames, I used the merge function to combine them.
# by default, merge() joins on every column the two data frames share
weather <- merge(temperature, humidity)
weather <- merge(weather, pressure)
weather <- merge(weather, wind_direction)
weather <- merge(weather, wind_speed)
weather <- merge(weather, weather_description)
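The same chain of merges can be written more compactly with base R’s Reduce(). This is a minimal sketch under stated assumptions: each data frame is in long format with a city column, a datetime column, and one measurement column, and the tiny data frames below are made up for illustration. Naming the keys explicitly with by = guards against merge() joining on a measurement column that happens to share a name.

```r
# Assumption: long format, one measurement column per data frame
merge_factors <- function(frames) {
  Reduce(function(x, y) merge(x, y, by = c("city", "datetime")), frames)
}

# Tiny made-up example (values are illustrative, not from the dataset)
temperature <- data.frame(city = c("Boston", "Boston"),
                          datetime = c("2013-01-01 00:00:00", "2013-01-01 01:00:00"),
                          temperature = c(270.1, 269.8))
humidity <- data.frame(city = c("Boston", "Boston"),
                       datetime = c("2013-01-01 00:00:00", "2013-01-01 01:00:00"),
                       humidity = c(80, 82))
pressure <- data.frame(city = "Boston",
                       datetime = "2013-01-01 00:00:00",
                       pressure = 1015)

weather <- merge_factors(list(temperature, humidity, pressure))
print(weather)
```

Note that merge() performs an inner join by default, so an hour missing from any one factor disappears from the result (here only one row survives, because pressure has a single hour). Passing all = TRUE inside the anonymous function would keep such rows and fill the gaps with NA.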
Then I imported the geographic data into R to merge it with the weather dataset. Since this dataset’s headers are capitalized, I first had to convert them to lowercase.
geoinfo <- read.csv("your_file_path")
# turn headers to lowercase
names(geoinfo) <- tolower(names(geoinfo))
# print(geoinfo)
weather_geo <- merge(weather, geoinfo)
I later noticed that the temperature values were too large; it turned out the records were in Kelvin. To make them more intuitive, I converted them to Celsius.
weather_geo$temperature <- weather_geo$temperature - 273.15
The merging process jumbled the order of the observations, so I sorted them by city and then by datetime.
weather_geo <- weather_geo[order(weather_geo$city, weather_geo$datetime ),]
After checking the data, I exported it to a new CSV file. I noticed that R had automatically added a first column of row numbers, and that the observations for 2012 and 2017 were incomplete. Therefore, using OpenRefine, I removed the first column, filtered out the rows from 2012 and 2017 with a facet, and exported the result as the prepared dataset. The final dataset has 11 variables and 1,262,304 observations.
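Both OpenRefine steps could also be done in R before export. In this sketch the stand-in data frame is hypothetical, and it assumes datetime strings begin with a four-digit year, as in "2013-01-01 00:00:00". Passing row.names = FALSE to write.csv() prevents the extra row-number column from being written in the first place. The last lines sanity-check the reported row count: 36 cities times the number of hours in 2013-2016 gives exactly 1,262,304, which is consistent with one row per city per hour.

```r
# Hypothetical stand-in for weather_geo with a character datetime column
weather_geo <- data.frame(
  city = "Boston",
  datetime = c("2012-12-31 23:00:00", "2013-01-01 00:00:00",
               "2016-12-31 23:00:00", "2017-01-01 00:00:00"),
  temperature = c(-2.1, -2.3, 1.0, 0.8)
)

# Keep only the complete years 2013-2016; the first four characters hold the year
year <- as.integer(substr(weather_geo$datetime, 1, 4))
weather_clean <- weather_geo[year >= 2013 & year <= 2016, ]

# row.names = FALSE stops write.csv() from adding the row-number column
out_file <- tempfile(fileext = ".csv")
write.csv(weather_clean, out_file, row.names = FALSE)

# Expected size of the full cleaned table: 36 cities x hours in 2013-2016
hours <- (365 * 3 + 366) * 24   # 2016 is a leap year
print(36 * hours)               # 1262304
```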
Examples and Inspiration
To see how experts visualize weather data, I explored some existing visualizations about weather and global warming.
Sonja Kuijpers visualized Eindhoven’s 2014 weather data (also on Behance) in an attractive radial layout that shows multiple weather facts for one city over one year. Although the layout looks complicated, it is actually a combination of several curved histograms. Another notable feature is that the inner ring, which displays temperature, shows both average values and max/min values.
Another example is Global Weather Radials, designed by Timm Kekeritz et al. It is a small-multiples display of 35 radial histograms, each showing one city’s weather in 2013. The designer used a color gradient to reinforce the temperature encoding, even though the histogram bars already depict temperature. The plot also shows precipitation as circles of different sizes, which gives the audience a clear sense of which seasons had more rainy days.
In addition, this GISTEMP visualization is a great example of comparing each month’s temperature change across years. Its color coding clearly shows that the earth is becoming warmer.
Visualization Process and Thoughts
After importing the prepared dataset into Tableau, my first attempt was to put all the cities’ weather data on a map. I soon realized that including the six Israeli cities made little sense, and the three Canadian cities were not necessary either. Therefore, I narrowed the data to the 27 US cities.
The map clearly shows that southern cities are hotter than northern ones. However, it did not show how temperature changed over time. Hence, instead of the yearly average temperature, I used the monthly average and created an animation that shows how temperature changes from month to month. To make the map more informative, I also added the other weather factors as tooltips.
I also added an action to the map that lets the audience click a city at any point in time to view its detailed data.
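For readers who want to check the map’s monthly values outside Tableau, the monthly averages can be computed in R with aggregate(). This is only a sketch: the column names and the few hourly rows below are assumptions for illustration, not the actual data.

```r
# Assumed input: hourly rows with city, datetime ("YYYY-MM-DD HH:MM:SS"),
# and temperature already converted to Celsius (values are made up)
hourly <- data.frame(
  city = c("Boston", "Boston", "Boston", "Miami", "Miami"),
  datetime = c("2013-01-01 00:00:00", "2013-01-01 01:00:00", "2013-02-01 00:00:00",
               "2013-01-01 00:00:00", "2013-01-01 01:00:00"),
  temperature = c(-2, 0, 1, 20, 22)
)

stamps <- as.POSIXct(hourly$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
hourly$year  <- as.integer(format(stamps, "%Y"))
hourly$month <- as.integer(format(stamps, "%m"))

# One row per city, year, and month, holding the mean hourly temperature
monthly <- aggregate(temperature ~ city + year + month, data = hourly, FUN = mean)
print(monthly)
```

The formula interface of aggregate() drops rows with missing values by default, which is usually what a monthly average should do.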
As discussed, one goal of this project is to prove that global warming “is a real thing”. To do that, I formulated several related questions. Did temperature rise over these years overall? Did it rise in each month? Did it rise in every city? The first question is relatively easy: placing average temperature on Rows and the year of the datetime on Columns reveals a clear trend (see the left chart).
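The yearly trend can also be summarized as a single number by fitting a least-squares line to the yearly averages. In this sketch the hourly values are invented so the arithmetic is easy to follow. With only four yearly points, the slope describes the direction of change over this window rather than establishing a long-term trend.

```r
# Invented hourly values, two per year, for illustration only
hourly <- data.frame(
  year = rep(2013:2016, each = 2),
  temperature = c(11.5, 12.5, 12.0, 12.4, 12.8, 13.0, 13.2, 13.0)
)

# Yearly average temperature (the same summary as the Rows/Columns chart)
yearly <- aggregate(temperature ~ year, data = hourly, FUN = mean)

# Slope of the least-squares line, in degrees Celsius per year
fit <- lm(temperature ~ year, data = yearly)
slope <- unname(coef(fit)["year"])
print(slope)
```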
Answering the other two questions, however, is a bit more complicated. I created the following four charts to examine temperature growth from different angles.
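One way to answer the per-city question numerically is to fit a separate line for each city and count the positive slopes; splitting by month instead of city would address the per-month question the same way. The city names and yearly values below are made up, and this slope comparison is an alternative check, not the method behind the charts.

```r
# Made-up yearly means per city (illustrative values only)
yearly_city <- data.frame(
  city = rep(c("Boston", "Denver", "Miami"), each = 4),
  year = rep(2013:2016, times = 3),
  temperature = c(9.8, 10.1, 10.5, 11.0,
                  10.2, 10.0, 10.9, 11.3,
                  24.9, 25.1, 25.0, 24.8)
)

# Fit one least-squares line per city and keep its slope (degrees C per year)
slopes <- sapply(split(yearly_city, yearly_city$city), function(d) {
  unname(coef(lm(temperature ~ year, data = d))["year"])
})
print(slopes)

# Number of cities whose fitted trend is upward
print(sum(slopes > 0))
```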
Another goal of this project is to see whether other weather factors are correlated with temperature. For this purpose, I created a series of scatter plots and examined the relationships between temperature and the other weather factors.
All three plots place average temperature on Columns and the other weather factor’s records (as a dimension) on Rows, which gives the audience a sense of how humidity, pressure, and wind speed behave as temperature varies across months.
Result and Observation
Observation 1: Temperature in the US Rose in 2013-2016
These two charts show clearly that temperature increased in all 27 cities over those four years, and that the average temperature of each individual month mostly increased, except for February and May. The other two graphs show a similar trend.
The box plot shows that the distribution of hourly temperatures also shifted upward over the four years. And the GISTEMP-inspired line graph, although a little vague, shows the red line (temperature of 2016) slightly above the others.
Observation 2: Temperature and Humidity are Negatively Correlated
Analyzing the scatter plots revealed that temperature and humidity are negatively correlated, while the trends for the other two factors are ambiguous. To verify this observation, I added a density plot to see whether the correlation still holds.
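A correlation coefficient gives a quick numerical check on what the plots suggest. The sketch uses invented values; on the real data, the same cor() call would take the temperature and humidity columns of the merged table. Spearman’s rank correlation is shown as well because it only assumes a monotone relationship, not a linear one.

```r
# Illustrative hourly values, not taken from the dataset
obs <- data.frame(
  temperature = c(5, 10, 15, 20, 25, 30),
  humidity    = c(85, 80, 70, 65, 55, 50)
)

# Pearson's r; use = "complete.obs" skips rows with missing values
r <- cor(obs$temperature, obs$humidity, use = "complete.obs")
print(r)

# Spearman's rho is based on ranks, so it is less sensitive to outliers
rho <- cor(obs$temperature, obs$humidity, method = "spearman")
print(rho)
```

With over a million hourly rows, even a weak correlation will be statistically significant, so the size of r matters more than the p-value from cor.test(). Computing r separately for each city may also be more telling, since pooling all cities mixes very different climates.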
Because the plots show that humidity is correlated with temperature, I wondered whether humidity influences or is influenced by global warming, so I did some simple research. According to an article on Yale Environment 360, a recent study indicated that climate change can also increase humidity, that the increased humidity can magnify the effect of heat waves, and that this will be harmful to people who work outdoors (E360 Digest). Another article on seeker.com introduced a concept called “wet bulb”, which captures the combined effect of heat and humidity. The article quoted an explanation from the US National Oceanic and Atmospheric Administration: “a temperature of 92 degrees Fahrenheit will feel like 94 degrees, at a relative humidity level of 40 percent. But it will feel like a scorching 131 degrees if relative humidity is 90 percent.” (Walters)
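The NOAA figures quoted above appear to match the US National Weather Service heat index, which combines air temperature and relative humidity (it is related to, but not the same as, wet-bulb temperature). The sketch below implements the Rothfusz regression that the NWS uses for the heat index. It deliberately leaves out the NWS adjustment terms and the simpler formula for cooler conditions, so it is only meant for hot, humid inputs like those in the quote.

```r
# Rothfusz regression for the heat index (NWS).
# Inputs: air temperature in degrees Fahrenheit, relative humidity in percent.
# Sketch only: omits the NWS low/high-humidity adjustments and the simpler
# formula used when the heat index falls below about 80 F.
heat_index_f <- function(t, rh) {
  -42.379 + 2.04901523 * t + 10.14333127 * rh -
    0.22475541 * t * rh - 6.83783e-3 * t^2 - 5.481717e-2 * rh^2 +
    1.22874e-3 * t^2 * rh + 8.5282e-4 * t * rh^2 - 1.99e-6 * t^2 * rh^2
}

# The two cases from the NOAA quote
print(round(heat_index_f(92, c(40, 90))))   # 94 131

# The project's temperatures are in Celsius, so convert before calling it
celsius_to_f <- function(c) c * 9 / 5 + 32
```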
Since humidity is such an important factor in climate change, I was curious whether my dataset could reveal an increase. I then created the histograms below. However, neither the monthly chart nor the yearly chart shows humidity increasing over those four years. This might be because too few years were included, which could bias the result, especially since humidity actually increased in the first three years.
At the beginning of this project, the hardest thing for me was to formulate proper questions. I started by playing with Tableau and became obsessed with the different patterns I could generate from my dataset. However, without a meaningful question, visualizations are useless. I eventually deleted my favorite visualization, which showed New York City’s weather over the four years, from the storyboard. I found it aesthetically pleasing and kept it in my story until the last minute. When I asked my professor, Dr. Sula, about my confusion, he said that analysis and design are equally important in data visualization, and that “both are less without each other”. Although I totally agree with this view, it was hard to satisfy both aspects in practice.
On the other hand, Tableau has many functions, and I used only a small portion of them in this project. One reason is that I am still confused about the proper use of dimensions, attributes, and measures. Although the article Dimensions and Measures, Blue and Green explains their differences clearly, I still need a lot of practice to really understand when to use which.
In addition, the dataset includes more variables than I actually used in this project. One example is the weather description, which describes the weather in each hour. However, since its values are qualitative, the data is messier, and some descriptions are very ambiguous; clustering them would take much more time. If I get a chance to keep working on this project, adding this variable to my visualization would be very interesting.
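As a possible starting point for that clustering, the descriptions could be mapped to a few coarse groups with keyword rules. This is a hypothetical sketch: the group names, the regular expressions, and the sample strings are assumptions, and real descriptions would need to be checked against the full list of values in the dataset.

```r
# Hypothetical keyword rules; the categories and their priority are assumptions
group_description <- function(desc) {
  d <- tolower(trimws(desc))
  rules <- c(
    thunderstorm = "thunder",
    snow         = "snow|sleet",
    rain         = "rain|drizzle|shower",
    fog          = "fog|mist|haze|smoke|dust",
    cloudy       = "cloud|overcast",
    clear        = "clear"
  )
  out <- rep("other", length(d))
  # Apply rules from last to first so that earlier rules win when several match
  for (name in rev(names(rules))) {
    out[grepl(rules[[name]], d)] <- name
  }
  out
}

print(group_description(c("sky is clear", "light rain", "overcast clouds",
                          "thunderstorm with light rain", "mist", "squalls")))
```

Ordering the rules by priority keeps mixed descriptions such as "thunderstorm with light rain" in a single group, while anything unmatched falls into "other" for manual review.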
Last but not least, during the class critique, Pou Yang pointed out that some headers in my visualization overlapped, which caused confusion. Moreover, by watching her interact with it, I found that some of my headers were actually misleading. Although these are minor details, they can significantly affect the audience’s understanding.
In summary, through this project, I successfully started exploring Tableau and chart design. However, there is still plenty to learn before I can create professional data visualizations, and I should start by formulating insightful questions about data.