TDM 10200: Project 8 — 2024
Motivation: We will continue to introduce functions and visualization
Context: Write functions with visualizations
Scope: python, functions, pandas, matplotlib, Parquet columnar storage file format
Reading and Resources
Datasets
/anvil/projects/tdm/data/whin/weather.parquet
We added eleven new videos to help you with Project 8.
You need to use 3 cores for your Jupyter Lab session for Project 8 this week.
Questions
Question 1 (2 points)
Read the file into a DataFrame called myDF.
- Convert the observation_time column into a datetime type.
- Create 3 new columns for the year, month, and day, based on the column observation_time.
- For a given station_id, calculate the average month-and-year-pair temperatures (from the column temperature) for that station_id. Try this for a few different station_id values.
- Now write a function called get_avg_temp that takes one station_id as input and returns the average month-and-year-pair temperatures (associated with that specific station_id). Make sure that the results of your function match your work from question 1c.
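The steps in Question 1 could be sketched roughly as follows. This is only one possible approach, not the official solution. It runs on a tiny synthetic DataFrame with invented values so that it works anywhere, and it assumes that station_id values are integers, which may not match the real file.

```python
import pandas as pd

# Tiny synthetic stand-in for weather.parquet (all values invented).
# On Anvil you would instead read the real file, for example:
#   myDF = pd.read_parquet("/anvil/projects/tdm/data/whin/weather.parquet")
myDF = pd.DataFrame({
    "station_id": [1, 1, 1, 1, 2, 2],  # assumption: integer station IDs
    "observation_time": [
        "2020-01-05 00:00:00", "2020-01-20 12:00:00",
        "2020-02-03 06:00:00", "2021-01-10 18:00:00",
        "2020-01-07 00:00:00", "2020-01-09 00:00:00",
    ],
    "temperature": [30.0, 34.0, 40.0, 28.0, 50.0, 54.0],
})

# Convert the column to a datetime type, then derive year, month and day.
myDF["observation_time"] = pd.to_datetime(myDF["observation_time"])
myDF["year"] = myDF["observation_time"].dt.year
myDF["month"] = myDF["observation_time"].dt.month
myDF["day"] = myDF["observation_time"].dt.day


def get_avg_temp(station_id):
    """Return the mean temperature per (year, month) pair for one station."""
    station = myDF[myDF["station_id"] == station_id]
    return station.groupby(["year", "month"])["temperature"].mean()


avg_station_1 = get_avg_temp(1)
print(avg_station_1)
```

The result is a Series indexed by (year, month), so a single value can be looked up with something like avg_station_1.loc[(2020, 1)], which makes it easy to compare against the step-by-step work.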
Question 2 (2 points)
For this question, be sure to import matplotlib.pyplot.
We will use the function from question 1d to make some line plots.
- For a given station_id, create a line plot, with one line for each year. Try this for a few different station_id values.
- Now that you are sure your analysis from 2a works well, wrap your work from question 2a into a function that takes a station_id as input, and creates a line plot, with one line for each year (for the average month-and-year-pair temperatures from that station_id).
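One way to get one line per year is to reshape the (year, month) averages so that each year becomes its own column. The sketch below assumes that approach; the name plot_avg_temp is hypothetical, the data is synthetic with invented values, and the year and month columns are assumed to have been derived already.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (invented values); year and month are assumed to
# have been derived from observation_time already.
myDF = pd.DataFrame({
    "station_id": [1] * 6,
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
    "month": [1, 1, 2, 1, 2, 2],
    "temperature": [30.0, 34.0, 40.0, 28.0, 41.0, 45.0],
})


def get_avg_temp(station_id):
    """Return the mean temperature per (year, month) pair for one station."""
    station = myDF[myDF["station_id"] == station_id]
    return station.groupby(["year", "month"])["temperature"].mean()


def plot_avg_temp(station_id):
    """Line plot of monthly mean temperature, with one line per year."""
    # unstack moves 'year' into the columns: rows are months, one column per year.
    by_year = get_avg_temp(station_id).unstack(level="year")
    fig, ax = plt.subplots()
    by_year.plot(kind="line", marker="o", ax=ax)
    ax.set_xlabel("month")
    ax.set_ylabel("average temperature")
    ax.set_title(f"Station {station_id}")
    return ax


ax = plot_avg_temp(1)
```

Returning the Axes object lets the caller adjust labels afterwards. In a Jupyter Lab cell the figure is normally displayed automatically; in a plain script, plt.show() would be needed.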
Question 3 (2 points)
- Revisit the function from question 1d, to find the maximum temperature (instead of the average temperature) in each month-and-year-pair, for a given station. As before, you should test this for several examples before you build the function, and then make sure your function matches your examples.
- Revisit the function from question 2b, to make a function that takes one station_id as input and creates a bar plot (instead of a line plot), depicting the maximum temperature in each month-and-year-pair (instead of the average temperature).
Your work from question 3b can utilize the function you built in question 3a.
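As a rough illustration of how the plotting function can build on the aggregation function, here is a minimal sketch. The names get_max_temp and plot_max_temp are hypothetical, and the data is synthetic with invented values.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (invented values); year and month are assumed to
# have been derived from observation_time already.
myDF = pd.DataFrame({
    "station_id": [1] * 6,
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
    "month": [1, 1, 2, 1, 2, 2],
    "temperature": [30.0, 34.0, 40.0, 28.0, 41.0, 45.0],
})


def get_max_temp(station_id):
    """Return the maximum temperature per (year, month) pair for one station."""
    station = myDF[myDF["station_id"] == station_id]
    return station.groupby(["year", "month"])["temperature"].max()


def plot_max_temp(station_id):
    """Bar plot with one bar per (year, month) pair, built on get_max_temp."""
    max_temp = get_max_temp(station_id)
    fig, ax = plt.subplots()
    max_temp.plot(kind="bar", ax=ax)
    ax.set_xlabel("(year, month)")
    ax.set_ylabel("maximum temperature")
    ax.set_title(f"Station {station_id}")
    fig.tight_layout()
    return ax


max_station_1 = get_max_temp(1)
ax = plot_max_temp(1)
```

Keeping the aggregation in its own function means the numbers can be checked against hand-worked examples before any plotting is involved.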
Question 4 (2 points)
- For a given station_id, create a box plot that shows the month-by-month wind speeds in 2020 for that specified station_id. Try this for a few different station_id values.
- Write a function that takes a year (not necessarily 2020) and a station_id as inputs, and the function creates a box plot about the month-by-month wind speeds in that specific year (not necessarily 2020), at the specified station_id.
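A box plot per month needs the wind speed observations grouped by month after filtering to one year and one station. The sketch below shows one possible way to do that. The column name wind_speed is an assumption (check myDF.columns for the real name), the function names are hypothetical, and the data is synthetic with invented values.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (invented values). The column name "wind_speed" is
# an assumption; the real file may use a different name.
myDF = pd.DataFrame({
    "station_id": [1, 1, 1, 1, 1, 1, 2],
    "year": [2020, 2020, 2020, 2020, 2020, 2021, 2020],
    "month": [1, 1, 1, 2, 2, 1, 1],
    "wind_speed": [3.0, 5.0, 7.0, 4.0, 8.0, 20.0, 50.0],
})


def wind_by_month(year, station_id):
    """Return {month: array of wind speeds} for one year and one station."""
    subset = myDF[(myDF["year"] == year) & (myDF["station_id"] == station_id)]
    return {int(m): g["wind_speed"].to_numpy() for m, g in subset.groupby("month")}


def plot_wind_speed(year, station_id):
    """Box plot of wind speeds, one box per month, for one year and station."""
    groups = wind_by_month(year, station_id)
    fig, ax = plt.subplots()
    ax.boxplot(list(groups.values()))
    # Boxes are drawn at positions 1..n, so relabel the ticks with the months.
    ax.set_xticklabels([str(m) for m in groups])
    ax.set_xlabel("month")
    ax.set_ylabel("wind speed")
    ax.set_title(f"Station {station_id}, {year}")
    return ax


groups = wind_by_month(2020, 1)
ax = plot_wind_speed(2020, 1)
```

This sketch does not handle a year and station combination with no observations; a real function might check for an empty selection first.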
Question 5 (2 points)
- Explore the dataset and find something interesting, like (for instance) something about the wind speed, pressure, soil temperature, etc., and do some analysis.
- Make a visualization that shows one or more plots about your analysis.
- Wrap the work from 5a and 5b into a function that can be used to create the visualizations in a systematic way, and test the function with the same inputs used in 5a and 5b.
Project 08 Assignment Checklist
- Jupyter Lab notebook with your code, comments and output for the assignment
  - firstname-lastname-project08.ipynb
- Python file with code and comments for the assignment
  - firstname-lastname-project08.py
- Submit files through Gradescope
Please double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.