This picture reminds me a lot of the poster I used as my original knowledge source. It calls out the same four high-level classes: Canadian, American, Irish and Scottish. It also has many of the sub-classes, but not all of them. For example, it has some of the Scottish single malt regions, but not Campbeltown and Island. And there is no single malt listed under American whiskey, though there are several in my database.
This chart uses what is called a 'radar chart' to depict its information.
There are eighteen axes coming out of the center, one for each taste dimension on which the whiskey is rated: sweet, smoky, grainy, vanilla, honey, spicy, briny, malty, cocoa, buttery, toffee, fruit, bacon fat, oaky, caramel, corny, biscuity, and peaty. A given whiskey can have one of three scores on each of these dimensions: 0, 1 or 2. I decided to change these numbers into written descriptions. If a whiskey scores a 0, that flavor is not mentioned at all. If it scores a 1, the flavor is mentioned, and if it scores a 2, I add the word "very". So single malt scotch from Speyside could be described as "very sweet, very honey, very fruity, very oaky, grainy, cocoa, buttery and caramel."
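As a minimal sketch, the scoring rule above could be written in Python like this. The names `FLAVORS`, `describe` and `speyside` are hypothetical, chosen only for illustration, and the Speyside scores are read back from the description quoted above rather than taken from the chart itself.

```python
# The eighteen taste dimensions, in the order listed above.
FLAVORS = [
    "sweet", "smoky", "grainy", "vanilla", "honey", "spicy", "briny", "malty",
    "cocoa", "buttery", "toffee", "fruit", "bacon fat", "oaky", "caramel",
    "corny", "biscuity", "peaty",
]


def describe(scores):
    """Turn a {flavor: 0, 1 or 2} mapping into a written description.

    Flavors missing from the mapping are treated as a score of 0.
    """
    # Score 2: mention the flavor with "very" in front of it.
    strong = [f"very {flavor}" for flavor in FLAVORS if scores.get(flavor, 0) == 2]
    # Score 1: mention the flavor on its own. Score 0: leave it out.
    mild = [flavor for flavor in FLAVORS if scores.get(flavor, 0) == 1]

    terms = strong + mild
    if not terms:
        return ""
    if len(terms) == 1:
        return terms[0]
    return ", ".join(terms[:-1]) + " and " + terms[-1]


# Scores inferred from the Speyside description in the text.
speyside = {
    "sweet": 2, "honey": 2, "fruit": 2, "oaky": 2,
    "grainy": 1, "cocoa": 1, "buttery": 1, "caramel": 1,
}

print(describe(speyside))
```

Note that this sketch uses the axis labels exactly as listed, so it prints "very fruit" where the hand-written description says "very fruity". A small lookup of display names would be needed to smooth that out.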
Once again, judgment now plays an important part. While I love this graphic, I think most people's taste buds are not refined enough to taste the subtleties conveyed here. Most of us would be lucky to perceive the "very" strong flavors, let alone the secondary mentions. So I made the decision to simplify even further and only mention the flavors that were strongly associated with the whiskey, the "very" flavors.
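The simplification described above amounts to keeping only the flavors that scored a 2. A rough Python sketch, assuming the scores are held in a dictionary (the function name `strong_flavors` and the sample scores, inferred from the Speyside description earlier, are illustrative only):

```python
def strong_flavors(scores):
    """Return only the flavors with the top score of 2, in input order."""
    return [flavor for flavor, score in scores.items() if score == 2]


# Scores inferred from the Speyside description earlier in the post.
speyside = {
    "sweet": 2, "grainy": 1, "honey": 2, "cocoa": 1,
    "buttery": 1, "fruit": 2, "oaky": 2, "caramel": 1,
}

print(strong_flavors(speyside))
```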
I had to decide how to record this information in my data table, because that will have an impact on how it is displayed in the 'baseball card' shown in the final recommendation. I could make a column for each of the 18 flavors and record the score (0, 1 or 2), but that doesn't seem very user-friendly. There would be eighteen attributes listed for each whiskey, many of which would not be applicable. Instead I decided to create a group of columns, label them 'taste1', 'taste2' and so on, and fill in the strong flavors associated with that whiskey. My data table now looks like this:
To fill in the blanks, I did further research. I found several sources that had taste profiles for both Island and Campbeltown single malts, so I filled those in, using the terminology I had already established.
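The 'taste1', 'taste2' column layout described earlier could be sketched in Python as follows. This is only an illustration under stated assumptions: the number of taste columns (four here), the `whiskey` column name and the function name `to_taste_columns` are all hypothetical, and the real data table may be organized differently.

```python
def to_taste_columns(name, flavors, n_columns=4):
    """Spread a list of strong flavors across taste1..tasteN columns.

    Unused columns are left blank. Flavors beyond n_columns are dropped,
    so n_columns should be at least as large as the longest flavor list.
    """
    row = {"whiskey": name}
    for i in range(n_columns):
        row[f"taste{i + 1}"] = flavors[i] if i < len(flavors) else ""
    return row


# One row built from the strong Speyside flavors mentioned in the post.
row = to_taste_columns("Speyside single malt", ["sweet", "honey", "fruity", "oaky"])
print(row)

# A whiskey with fewer strong flavors leaves the trailing columns blank.
print(to_taste_columns("Example whiskey", ["smoky"]))
```

Compared with one column per flavor, this keeps each row short, at the cost of making it slightly harder to query for every whiskey that has a given flavor, since the flavor can appear in any of the taste columns.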
The real challenge was American single malts. I found a New York Times article that described this as an up-and-coming category, so it is important to have good data, but it also said the group is so diverse that it defies being summarized with a single taste profile. So I did what I have been resisting so far in this project: I created individual taste profiles for each whiskey.
I was able to find reviews for each American single malt in my database, and I used these reviews to create a taste profile using my established terminology. Once again, if we were crowd-sourcing the data, or if this project were being funded by a customer, this individualized approach could be used for all of the whiskeys in the database. But this is a demo application that I am building to illustrate the process of building a Recommender, so I am going to keep things as simple as possible. There is a similar tension between effort and completeness in the final step of data preparation: adding images. That will be the topic of my next post.