Choosing a Model
Here, we overview ways to think about choosing a modeling technique.
These considerations are optional. Maybe there is only one or two which you are absolutely sure of in your problem; other times you might feel like there are many ways you could approach the problem. Going through each of these ways can at least illuminate your uncertainty; sometimes its helpful to know what we don’t know.
Flexibility vs. Interpretability
There is a tradeoff between highly flexible models and highly interpretable models. We showcase that, as models become more complex and able to accomodate our convoluted datasets, it often becomes difficult to try to discern just exactly how it is able to make the predictions or inferences it does- leading to the well known conception of a "black box", a model which works but we just don’t know how.
Classification vs. Regression
Quantitative data, like someone’s age, birth year, or income is considered regression data. Qualitative data, like what kind of eye color someone has, disease status, or voting pattern would be classification data.
Prediction vs. Inference
Sometimes, we want to make predictions, like if an image is a dog or a cat; other times, we want to infer the properties of something, like consumer shopping patterns. Here we explore the contrast between prediction and inference.
Supervision
You saw in the Starter Guide on the meaning of f(x) in data science that there is some associated output $Y$. Well that’s true…if you are doing supervised learning. Here we explore supervised vs. unsupervised approaches.