Thursday, July 17, 2014

Adding Taste Attributes to the Whiskey Data Table

Now that I had a price-level attribute in the table (see my last post), I wanted to add something about taste, but I wasn't sure what information would be available. There are plenty of qualitative descriptions on the web of particular types of whiskey, but I wanted a single source that covered most of the whiskeys in my database. I found this very helpful graphic:


This picture reminds me a lot of the poster I used as my original knowledge source. It calls out the same four high-level classes: Canadian, American, Irish and Scottish. It also has many of the sub-classes, but not all of them. For example, it has some of the Scottish single malt regions, but not Campbeltown and Island. And there is no single malt listed under American whiskey, though there are several in my database.

The graphic uses a format called a 'radar chart' to present its information.


There are eighteen axes coming out of the center, showing the whiskey's rating on eighteen taste dimensions: sweet, smoky, grainy, vanilla, honey, spicy, briny, malty, cocoa, buttery, toffee, fruit, bacon fat, oaky, caramel, corny, biscuity, and peaty. A given whiskey can have one of three scores on each of these dimensions: 0, 1 or 2. I decided to change these numbers into written descriptions. If a whiskey scores a 0, that flavor is not mentioned at all. If it scores a 1, the flavor is mentioned, and if it scores a 2, I add the word "very". So single malt scotch from Speyside could be described as "very sweet, very honey, very fruity, very oaky, grainy, cocoa, buttery and caramel."
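The conversion from scores to words can be sketched in a few lines of Python. This is only an illustration, not the tool I actually used: the function name `describe` and the `speyside` dictionary are hypothetical, the scores are read back from the Speyside sentence above, and any flavor not listed is assumed to score 0.

```python
def describe(scores):
    """Turn a {flavor: score} mapping into a written taste description.

    Score 2 becomes "very <flavor>", score 1 becomes "<flavor>",
    and score 0 (or a missing flavor) is not mentioned.
    """
    strong = [f"very {flavor}" for flavor, s in scores.items() if s == 2]
    mild = [flavor for flavor, s in scores.items() if s == 1]
    terms = strong + mild  # strong flavors are listed first
    if len(terms) <= 1:
        return "".join(terms)
    return ", ".join(terms[:-1]) + " and " + terms[-1]


# Assumed scores, reconstructed from the Speyside description in the post.
speyside = {
    "sweet": 2, "grainy": 1, "honey": 2, "cocoa": 1,
    "buttery": 1, "fruity": 2, "oaky": 2, "caramel": 1,
}

print(describe(speyside))
```

Listing the "very" flavors first keeps the strongest impressions at the front of the sentence, which is how the Speyside example reads.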

Once again, judgment now plays an important part. While I love this graphic, I think most people's taste buds are not refined enough to taste the subtleties conveyed here. Most of us would be lucky to perceive the "very" strong flavors, let alone the secondary mentions. So I made the decision to simplify even further and only mention the flavors that were strongly associated with the whiskey, the "very" flavors.

I had to decide how to record this information for my data table, because that will have an impact on how it is displayed in the 'baseball card' shown in the final recommendation. I could make a column for each of the 18 flavors and record the score -- 0, 1 or 2 -- but that doesn't seem very user friendly. There would be eighteen attributes listed for each whiskey, many of which will not be applicable. Instead I decided to create a group of columns and label them 'taste1', 'taste2' and so on, and fill in the strong flavors associated with that whiskey. My data table now looks like this:
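To make the column layout concrete, here is a rough Python sketch of how one whiskey's scores could be turned into a row of 'taste' columns holding only the strong (score 2) flavors. The function name `taste_columns`, the `max_tastes` parameter and the empty-string filler are my own assumptions for illustration; only the 'taste1', 'taste2' naming comes from the table itself, and the sample scores are the Speyside ones from the description earlier.

```python
def taste_columns(scores, max_tastes=7):
    """Build the taste1..tasteN columns for one row of the data table.

    Only flavors with a score of 2 are kept. Unused columns are left
    as empty strings so every row has the same set of columns.
    """
    strong = [flavor for flavor, s in scores.items() if s == 2]
    row = {}
    for i in range(max_tastes):
        row[f"taste{i + 1}"] = strong[i] if i < len(strong) else ""
    return row


# Assumed scores, taken from the Speyside description earlier in the post.
speyside = {
    "sweet": 2, "grainy": 1, "honey": 2, "cocoa": 1,
    "buttery": 1, "fruity": 2, "oaky": 2, "caramel": 1,
}

row = taste_columns(speyside)
print(row)
```

Compared with eighteen score columns, this keeps each row short and means the 'baseball card' only has to print the columns that are filled in.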


The number of flavors ranges from a high of seven for Highland single malt scotch to a low of one for several types of American whiskey. I'm okay with this; I think this reflects the differing complexities of different types of whiskey. People aren't buying American corn whiskey -- "moonshine" -- for its sophistication. My only real problem is the fact that I have several whiskey types with no taste information at all, because they are not listed in my taste knowledge source.

To fill in the blanks, I did further research. I found several sources that had taste profiles for both Island and Campbeltown single malts, so I filled those in, using the terminology I had already established.

The real challenge was American single malts. I found a New York Times article that talked about how this is an up-and-coming category, so it is important to have good data, but it also said this is such a diverse group it defies being summarized with a single taste profile. So I did what I have been resisting so far in this project: I created individual taste profiles for each whiskey.

I was able to find reviews for each American single malt in my database, and I used these reviews to create a taste profile using my established terminology. If we were crowd-sourcing the data, or if this project were being funded by a customer, this individualized approach could be used for all of the whiskeys in the database. But this is a demo application that I am building to illustrate the process of building a Recommender, so I am going to keep things as simple as possible. There is a similar tension between effort and completeness in the final step of data preparation: adding images. That will be the topic of my next post.
