The Shape of Knowledge: Recommender


Tuesday, April 7, 2015

Revisiting How the Nonprofit Planner Fits Into My Theme

As I get deep in an actual web application, like the Nonprofit Planner and Grant Generator, it is easy to lose track of what my overall goal is with this blog and my book "The Shape of Knowledge".

My main message is, and has been, that there are basic knowledge patterns that capture important types of reasoning. The pattern I am elaborating on here is a Composer, which captures content once and then configures and reconfigures it into various documents to meet a particular need.

The value of identifying these patterns is that a talented developer, like my partner Steve, can write a powerful program that consumes predefined knowledge artifacts created by domain experts and then generates a web app automatically based on that knowledge. I call this type of program an application generator, or an 'apperator', for short.

At the same time, the identification of knowledge patterns empowers us, the knowledge authors who create the knowledge artifacts, because we can use these 'apperator' programs to build our own custom solutions without writing code.

In this particular case, I built a template using KnowtShare (our online collaboration tool), and made it available to charities for developing and documenting their business plans. The shape of that template, and therefore all plans created using it, is a hierarchical tree model. I call this shape a 'triangle' because it grows from a single root to the many 'leaves' of the tree. This tree/triangle is the knowledge artifact that stores the raw material for the Grant Composer. Each charity will create and maintain its own business plan.

There is a second artifact required, and it is built and maintained by me, as a community resource. That is a table -- a 'square' -- containing the information requirements for particular foundations.

The 'apperator' Steve has written is a Composer that knows how to consume these two types of artifacts: a hierarchical KnowtShare file filled with written information items and a .csv file that contains a matrix of communication targets (foundations) and their preferred order for those information items. Context questions are used to identify which branches of the tree and which rows of the table to use when creating a solution at run-time.
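As a rough illustration of how a Composer might combine these two artifacts, here is a minimal Python sketch. It is not Steve's implementation. The nested-dictionary plan, the item names, the foundation names and the `compose` function are all hypothetical stand-ins for a real KnowtShare file and the real .csv matrix, and the one-row-per-foundation layout is an assumption.

```python
import csv
import io

# Hypothetical business plan "triangle": program -> information item -> text.
plan = {
    "After-School Tutoring": {
        "mission": "We help students build reading skills.",
        "need": "Many local students read below grade level.",
        "budget": "Program costs are listed in our annual budget.",
    },
}

# Hypothetical sequencing "square": one row per foundation; each cell gives
# the position of an information item (blank means the item is not wanted).
SEQUENCING_CSV = """foundation,mission,need,budget
Foundation A,1,2,3
Foundation B,2,1,
"""


def compose(plan, sequencing_csv, foundation, program):
    """Return the plan items a foundation wants, in its preferred order."""
    rows = csv.DictReader(io.StringIO(sequencing_csv))
    row = next(r for r in rows if r["foundation"] == foundation)
    # Keep only the items this foundation asks for, sorted by position.
    wanted = sorted(
        (int(pos), item)
        for item, pos in row.items()
        if item != "foundation" and pos
    )
    branch = plan[program]  # context answer: which branch of the tree to use
    return [branch[item] for _, item in wanted]


document = compose(plan, SEQUENCING_CSV, "Foundation B", "After-School Tutoring")
print("\n\n".join(document))
```

The two context answers (which foundation, which program) do the pruning: one selects a row of the table, the other selects a branch of the tree.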

I, as the knowledge author, decided to use business plan information to populate a grant application for foundations, so the actual app that is created is a Grant Application Generator. But the same pattern could be used to collect an individual's CV and generate a customized resume for a particular company -- that would be a Resume Generator. Or a student's transcript and essay questions to generate a customized college application -- that would be a College Application Generator. This makes sense: these examples are very similar to the Grant Generator.

What about something further afield? There's a whole industry evolving around using legal boilerplate and context questions to generate legal documents. The Composer pattern could be used to achieve the same results, and with far less programming.

My last post was dedicated to describing how the very same business planning information used in grant applications can be used to generate other reports and marketing materials. All of these are examples of the Composer Pattern and can be created by the same apperator with only minor tweaks.

The Composer is just one pattern. I have written about the Recommender pattern and the Scoring pattern as well, and I will be describing others going forward.

The challenge, and the opportunity, come from thinking about problem solving at a higher level of abstraction than we normally do.

Most applications are built to solve a particular problem. The IT team interviews their customers to determine the functional specifications for an application they want built. Then the IT people go off and build it, checking back in periodically with the customers to make sure they're still on track.

Typically there is a deadline looming, and everyone is focused on the problem at hand. No one is interested in investing time (and money) in designing a meta solution that will make the next problem easier to solve. In fact, if programming is paid for on an hourly basis, there is a disincentive to look for the patterns that will make application development more efficient. But for knowledge authors, and for problem solvers in general, there are many benefits to identifying and harnessing new patterns.

Knowledge patterns only become visible when you are exposed to many different but similar problems, and when you look for them. It also helps to have a vocabulary and a mental framework that make it easier to discuss the patterns you sense are emerging. My goal is to contribute to the creation of those foundational tools, and to create real world examples that will stimulate discussion.

Tuesday, January 27, 2015

The Grant Application Generator

While the process of creating a 'composer' that generates grant applications highlighted the need for charities to do better planning, I have never lost sight of my original intention. Yes, the Nonprofit Planner has value on its own, but the magic of the tool is that it collects important content in a highly reusable form. The first composer we have built to utilize this content is the Grant Generator, and there will be other composers down the road that can use the planning content as well.

In my book, and in prior posts about the Recommender, I talked about knowledge patterns. I have defined a knowledge pattern as a consistent set of knowledge artifacts that can be used over and over again to solve a certain type of problem. The knowledge pattern for a Recommender looks like this:

I've written a series of posts about applying this pattern to build a Whiskey Recommender.

The pattern for a Composer is remarkably similar to that of a Recommender. The pattern for a generic Composer looks like this:

How this general pattern translates into specifics for the Grant Composer is shown here:

Two knowledge artifacts are required for a Composer: a document content tree, which contains the raw material for the document to be composed, and a table, which contains the assembly instructions.

In earlier posts, I described the Foundation Sequencing Table, which holds the assembly information necessary to generate a grant application. This table will be a community asset that I will build and maintain. There will be only one version of this table because the information requirements for a particular foundation are the same for all charities that apply for grants.

The content to be assembled, however, is unique to each charity; each will have its own business plan, documented using our planning tool. This plan is an asset that will be built and maintained by the charity.

These two knowtifacts, then, are inputs to the composer application. Like all of our applications, the first step in using the tool is to collect the context for the current situation. In general, we run an application to solve a particular problem in the current moment, and the context is the way we describe our current needs.

The Grant Application Generator needs to collect two types of context information.

The first type is information that helps the application 'prune' down the input knowledge artifacts to the relevant columns, tree branches and fields. Questions like 'which foundation', 'which program' and 'long or short answers' fall into this category.

The second type is specific information that refers to this particular grant request only, and therefore cannot come from the business plan. Information in this category includes the request amount and date. This information must be collected at run-time so it can be filled in at the appropriate locations in the generated document.
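To make the second type of context concrete, here is a small Python sketch of filling run-time values into text that otherwise comes from the plan. The sentence, the placeholder names (`amount`, `request_date`) and the values are invented for illustration; the real Grant Generator may handle this quite differently.

```python
from string import Template

# Hypothetical paragraph with placeholders for values that exist only
# for this particular grant request.
paragraph = Template(
    "We respectfully request $amount, submitted on $request_date, "
    "to support our tutoring program."
)

# Second type of context: collected from the user at run-time.
run_time_context = {"amount": "$25,000", "request_date": "April 7, 2015"}

filled = paragraph.substitute(run_time_context)
print(filled)
```

`Template.substitute` raises a `KeyError` if a placeholder has no value, which is one simple way to make sure nothing is left blank in the generated document.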

Once the context is collected and 'OK' is clicked, the Grant Generator creates a document that meets the specifications: just the information the target foundation wants to see, and in the order they want to see it. (I've grayed out this charity's info, to protect their privacy.)

In my next post I'll discuss some of the details about how this report is formatted, and how charities can use it.

Monday, August 18, 2014

Deb's Whiskey Recommender

Earlier posts have described the processes, both mental and physical, that I went through to create the knowtifacts necessary for generating a Whiskey Recommender. I used basic Excel skills to create the Options table. I used KnowtShare to generate the Context Decision Tree. I used Google and Bing's image search tools to find suitable images and placed them in a folder. I uploaded these three components into the Recommender Apperator, and it generated "Deb's Whiskey Recommender" for me.

For each of these steps I needed some basic computer skills, but I never needed to write computer code. What was most essential for me to provide was the knowledge of whiskey and an opinion about which type of whiskey was the best choice in different contexts.

Since I wasn't an actual whiskey expert, I had to use the internet to educate myself. I also realized, too late, that I had taken on a really, really complicated subject for what was supposed to be a simple tutorial! This forced me to make several compromises between completeness and manageability. I had to cut corners. Each time I did, I noted that if I were building a real application I might have made different choices, and suggested how a more complete approach might be pursued.

So what did my efforts get me? Here are some screenshots of Deb's Whiskey Recommender, the app generated by the Recommender Apperator:

The logic I embedded in the Context Decision Tree is transformed into a simple wizard.

As the user makes choices, those choices are placed into the 'breadcrumbs' at the top of the page. This navigation device not only makes it clear which path has been taken through the tree, it makes it simple to backtrack to any one of the decisions and change it. When a user clicks on a prior choice, he or she will be taken to that spot in the wizard.

From a 'Shape of Knowledge' perspective, it is worth noting that the Context Decision knowledge is captured in a triangle shape, a tree, but the user interface serves that knowledge up in a linear fashion. This is an example of one of the best practices I talk about in my book: use the best knowledge shape for capturing the knowledge, but use the simplest shape possible when presenting it to a user. In this case, the triangular knowledge is flattened to a line by the wizard-like UI.
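One way to picture this flattening is as a wizard that stores only the list of choices made so far. The Python sketch below is an assumption about how such a wizard could work, not a description of the Recommender Apperator. The `Wizard` class, the tiny tree and its placeholder recommendations are all hypothetical.

```python
# Hypothetical tree: inner nodes have "question" and "choices";
# leaves have only "recommend" (placeholder text, not real advice).
TREE = {
    "question": "Mix it or drink it straight up?",
    "choices": {
        "mix it": {
            "question": "Flavorless or flavored mixer?",
            "choices": {
                "flavorless": {"recommend": "a whiskey for water or club soda"},
                "flavored": {"recommend": "a whiskey for cocktails"},
            },
        },
        "straight up": {"recommend": "a whiskey to sip on its own"},
    },
}


class Wizard:
    """Walks a decision tree one question at a time, keeping breadcrumbs."""

    def __init__(self, tree):
        self.tree = tree
        self.breadcrumbs = []  # choices made so far, in order

    def current(self):
        """Return the node reached by following the breadcrumbs."""
        node = self.tree
        for choice in self.breadcrumbs:
            node = node["choices"][choice]
        return node

    def choose(self, choice):
        """Record a choice if it is offered at the current step."""
        options = self.current().get("choices", {})
        if choice not in options:
            raise ValueError(f"{choice!r} is not offered at this step")
        self.breadcrumbs.append(choice)

    def back_to(self, index):
        """Clicking breadcrumb `index` reopens that decision."""
        del self.breadcrumbs[index:]


wizard = Wizard(TREE)
wizard.choose("mix it")
wizard.choose("flavored")
print(wizard.breadcrumbs)
wizard.back_to(1)  # reopen the second decision
print(wizard.current()["question"])
```

The tree itself never changes. The breadcrumb list is the "line" the user sees, and backtracking is just truncating that list.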

Based on this particular set of decisions, a long list of whiskeys is recommended. This path takes the user to the bourbons, both Tennessee Whiskey and Kentucky Straight Bourbon. Here is a sample of the recommended list:

The baseball card format lays out information from the Options Table. Shown are three of the many Kentucky Straight Bourbons in the Whiskey Table. At this point, the implications of the compromises I made become clear: everything on these three cards is identical, except for the brand name. Because all three whiskeys are from the same manufacturer, they have the same image. Because they are all the same type of whiskey, they have the same taste profile. Even though the price levels were calculated individually, they all happen to fall into the same price group. It would have been nicer if I had documented this information down to the individual brand level, but with 450 brands in the data table that wasn't feasible.

The options for knowledge authors building future Recommenders are these: pick a simpler topic, with fewer options; crowd-source the information in the Options Table; or be prepared to do a whole lot of work yourself, which is worth it if you are creating a Recommender to sell, establish your expertise, or to create competitive advantage.

Meanwhile, it is important to remember that the amount of effort required to create a complete data table is minor compared to the effort required to build this sort of application from scratch! The data has to be there, in any case -- there's no avoiding that work. But the ease of building the decision logic in KnowtShare is unparalleled, and the user interface and the computer logic that knits the pieces together come for free.

This demo application achieved my goals: to document the process of building a Recommender, and to demonstrate that a potentially sophisticated application can be created based solely on domain expertise, no computer code required.

Tuesday, August 5, 2014

Creating the Whiskey Context Decision Tree - Part Two

My last post described the general challenges of building a context decision tree, and the specific challenges surrounding my search for expertise regarding whiskey selection. In the end, I decided I would have to rely on my own ideas, developed during the many hours of research I conducted while building the other inputs to the Whiskey Recommender. My immediate need was to create a context decision tree so the Recommender Apperator could generate a Whiskey Recommender app. Once that example is published, I hope to find some real, live whiskey experts who can help me build other, alternative Whiskey Recommenders.

I created my decision tree using our free KnowtShare web application. KnowtShare was designed to make it easy to build tree-shaped models. We, at Apprentice Systems, know from our many years of building intelligent systems that trees - hierarchical, non-cyclical models - are essential for capturing several modes of reasoning. Classification, composition and multi-pronged evaluation are all best described using trees. The branching behavior of a decision tree is also easy to document with this sort of model.

While some trees are best built bottom-up, a decision tree is usually built top-down. In other words, you decide the first question you are going to ask, list the possible choices, then determine what choices will be presented based on that first choice, and so on. The tree will document all the possible paths through the context collection process, but users will only travel one of those paths each time they use the application. At the end of each path is a 'leaf' on the tree, which holds a recommended option - in this case, a type of whiskey.

I decided, pretty easily in fact, what the first question in my decision tree would be: "Are you going to mix your whiskey or drink it straight up?" I think this is a good first question because most whiskey connoisseurs agree that if you are going to mix your whiskey with Coke, it really doesn't much matter what you drink! Just don't waste a good Scotch! Of course, there are those who will mix an expensive whiskey with Coke, and I need to allow for that possibility as well.

Here is the 'mix it' side of my whiskey decision tree:

The next choice I present is between a flavorless mixer (water or club soda) and a flavored mixer. You can still be a whiskey purist, of sorts, if you only use water or club soda. If your whiskey is to become part of a cocktail, however, it will be difficult to taste the whiskey itself - which is perhaps one of the goals of drinking a cocktail.

I'm pretty comfortable with these first few questions; I think I am on safe ground. Now the going gets much more difficult. I knew the 'flavored mixer' path was taking me towards the cheaper whiskeys, such as Canadian and American blended whiskeys, so I created the next set of choices: do you care more about price or status? If the user selects 'status', I decided to recommend either a Kentucky Straight Bourbon (like Wild Turkey) or a Tennessee Whiskey (like Jack Daniel's). Picking 'price' leads to a variety of inexpensive American and Canadian options.

The real puzzler is the next set of options for the non-flavored mixer path. Remember, choices have to be presented in a language that represents the user's perspective (this was covered in the last post). Waxing poetic about the process differences between Scotch, Irish Whiskey and Bourbon won't work here! The target audience of the Whiskey Recommender is someone who is new to the world of whiskey, not an expert. Newbies couldn't care less about the finer points of distilling whiskey, especially since their palates probably can't distinguish the taste differences that result. So I decided to keep things simple, and make the next set of choices "do you prefer American products, or not?"

Finally, at the next level, I reached a point in the tree where I relied on taste profiles to make a distinction. American products were split into 'spicy', which leads to American Rye, and 'sweet', which leads to the bourbons. The non-American choice was split into 'light and smooth' (Irish Blended and Canadian Single Malt) and 'smoky and robust' (Scotch Blended and Single Grain).
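For readers who think in code, the 'mix it' branch described above can be written down as a nested structure. This Python sketch is only an illustration of the tree's shape, not the KnowtShare file format. The leaf for 'price' is an assumption based on the "Canadian and American blended" remark, and `all_paths` and `recommend` are hypothetical helper names.

```python
# Inner nodes map a choice to the next node; leaves are lists of
# recommended whiskey types.
MIX_IT = {
    "flavored mixer": {
        "status": ["Kentucky Straight Bourbon", "Tennessee Whiskey"],
        # Assumed leaf: the text only says "inexpensive American and
        # Canadian options".
        "price": ["American Blended", "Canadian Blended"],
    },
    "flavorless mixer": {
        "American": {
            "spicy": ["American Rye"],
            "sweet": ["Kentucky Straight Bourbon", "Tennessee Whiskey"],
        },
        "not American": {
            "light and smooth": ["Irish Blended", "Canadian Single Malt"],
            "smoky and robust": ["Scotch Blended", "Single Grain"],
        },
    },
}


def all_paths(node, path=()):
    """Yield (path, leaf) for every route from the root to a leaf."""
    if isinstance(node, list):
        yield path, node
    else:
        for choice, child in node.items():
            yield from all_paths(child, path + (choice,))


def recommend(node, answers):
    """Follow one user's answers down the tree to a leaf."""
    for answer in answers:
        node = node[answer]
    return node


for path, leaf in all_paths(MIX_IT):
    print(" > ".join(path), "=>", ", ".join(leaf))
```

`all_paths` lists everything the author documented, while `recommend` follows the single path one user actually travels.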

This is the point where I need to reiterate an important message from the last post: this is all subjective! Even if I were a whiskey expert, which I'm not, this would still just reflect my opinion. A recommendation, by definition, exists in the world of subjectivity. Otherwise, this would be an 'Answerer', not a Recommender.

The power of the Recommender pattern is that I can create "Deb's Whiskey Recommender" just by building this tree in KnowtShare and combining it with the Whiskey Data Table in the apperator. Then my friend Richard, a real whiskey expert, can generate "Richard's Whiskey Recommender" by creating his own decision tree. He can even reuse my Whiskey Data Table. Better yet, in the future we can crowd source a much more complete data table and make that a community asset we all tap into when we build our own unique Whiskey Recommenders. That's the vision.

Back to my Recommender. Let's take a look at the other major branch of my decision tree, the 'straight up' branch:

I reused the 'do you prefer American products' choice here, early in the question sequence, because I needed to account for American Single Malts and I can't do it through a taste profile. American Single Malts are all over the board, taste-wise, because American distillers are using a wide variety of processes to create their single malts. Some are following the Scottish methods, and beating high-end Scotches in blind taste tests. Some are creating their own unique processes that are leading to unique taste profiles. All that binds them together is their country of origin, and that American origin is important to some people, including me. I like our underdog status in the Single Malt arena, and I like to support these distillers when I can.

As for the rest of the 'straight up' branch, it is a taste-driven breakdown between Irish Whiskey and Scotch, and within Scotch, a breakdown into the various regions. While some regions have distinct flavor profiles (like Islay) and some are more diverse (like the Highlands), taste is still the easiest way to direct a Scotch drinker to a particular region.

A real Whiskey Recommender, as opposed to this demo application I am building, could recommend specific Scotch brands, like Laphroaig. To keep things manageable, I have decided to stop my recommendations at a type of whiskey, like Islay Single Malt Scotch. If you've read the other posts about building the Whiskey Recommender, you will know that this is one of many simplifying assumptions I have made during this process.

Now that I've built my Whiskey Context Decision Tree and saved my KnowtShare file, I have all of the knowtifacts necessary to generate my app. The next step will be to enter them into the Recommender Apperator and generate my own custom web app.

Tuesday, July 29, 2014

Creating the Whiskey Context Decision Tree - Part One

Now that my Whiskey Data Table (or Whiskey Options Table) is complete, I can turn my attention to creating the context decision tree. The context decision tree in a Recommender contains a series of questions the user will be asked to determine his or her needs and gather any information about the current situation that may impact the decision being made. It also contains the link between each path through the decision tree and the options, or rows, in the data table.
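The link between a path and the rows of the data table can be sketched in a few lines of Python. This is an assumption about the mechanics, not the apperator's actual logic. The column names, the placeholder brand names and the `options_for` function are hypothetical.

```python
import csv
import io

# Hypothetical slice of an options table: one row per brand.
TABLE_CSV = """brand,type
Brand A,Kentucky Straight Bourbon
Brand B,Tennessee Whiskey
Brand C,American Rye
"""


def options_for(leaf_types, table_csv):
    """Return the table rows whose type is named by a decision-tree leaf."""
    rows = csv.DictReader(io.StringIO(table_csv))
    return [row for row in rows if row["type"] in leaf_types]


# Suppose the user's path ended at a leaf pointing to the bourbons.
leaf = ["Kentucky Straight Bourbon", "Tennessee Whiskey"]
for row in options_for(leaf, TABLE_CSV):
    print(row["brand"], "-", row["type"])
```

Because the tree only names types, a different author could write a different tree and still point at the same table.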

Given these two important roles -- determining the user's needs and pointing to a particular set of options or solutions -- it is not surprising that the context decision tree is the knowtifact that contains the most critical 'expertise' in a Recommender application. In a complex subject like whiskey, there are hundreds of questions one might ask before making a recommendation, and even then it is not obvious what the best recommendation should be. In other words, creating a context decision tree is hard and full of subjectivity.

Because it is such a subjective exercise, if you ask ten different domain experts to build a context decision tree you are likely to get ten different results. This diversity of opinion reflects real life; if decisions were simple and obvious, we wouldn't need to consult with experts. The value of capturing expertise in a Recommender is multifaceted: it makes knowledge available to a wider audience by embodying it in a web app, it makes it possible to compare and contrast the decision processes of multiple experts, and multiple experts can leverage the same options table to build their own Recommender just by building different context decision trees.

The single biggest benefit of creating a context decision tree, however, is both subtle and surprising, and that is that it forces experts to actually articulate their knowledge. You might think getting experts to talk about what they know is not such a difficult thing to do, and that is true. But getting them to create a context decision tree can be very tough, and it raises issues like 'tacit vs explicit' knowledge and 'shallow vs deep' knowledge.

First, much of an expert's knowledge is typically tacit, especially when it comes to their decision making process. They just 'know' what to do, and they often can't explain why they do it. Building a context decision tree requires that they make this tacit decision process explicit.

Second, much of what passes for expertise is really 'shallow' knowledge like product and historical information, and rote process knowledge. This sort of expertise only requires a good memory. Building a good context decision tree often requires 'deep' knowledge, knowledge not only about what and how things are but why things are the way they are. It requires the ability to apply knowledge, not just repeat facts.

I've said from the beginning, I am not a whiskey expert. I started down the road on this demo application because I found a really cool classification chart for whiskies. I was able to take the next step and build the Whiskey Data Table because collecting data is more about diligence than expertise. But now I needed to find a whiskey expert.

Once again I turned to the internet, which is chock full of websites that claim to contain whiskey expertise. I started doing searches with phrases like "how to pick a whiskey" and "how to select the right whiskey". The search engines were able to pull up hundreds of links that purported to answer that question, but when I read what they had to say it was always a regurgitation of the same information over and over -- the differences between Irish whiskey and scotch, between Kentucky straight bourbon and Tennessee whiskey, between the bourbon process and the scotch process, between the mash bill for Canadian rye whiskey and American rye... The closest anyone came to providing useful information for the task at hand was the taste profiles for different scotch regions. Yes, taste is something a beginner whiskey drinker would care about. Mash bills? Not so much.

Which brings us to a classic marketing problem. Users/consumers/customers are always looking for benefits -- touchy-feely, use-case specific benefits that will accrue to them as users. Manufacturers/sellers/retailers tend to want to talk about product features -- concrete, well-documented facts about the item or service they are selling. The marketing people are the ones who must make the connection between the two, and advertising and product literature exist to create that bridge.

In a similar fashion, a context decision tree, which is a series of questions that will be answered by the user, needs to be written in the language of benefits and use-cases. My web research surfaced very little knowledge that would help someone new to the world of whiskey decide where to begin. With so many choices, some of which are quite expensive, how can a relatively inexperienced drinker make a selection that is a good fit for his or her current needs? That is what I needed to know to create a good Whiskey Context Decision Tree.

After approximately 100 hours of total research time -- between building the classification tree, the data table, and general whiskey research -- I was going to have to use my own creativity and judgment to build the context decision tree. It was sure to have a lot of flaws, but at least I would be able to generate an initial version of a Whiskey Recommender.

Wednesday, July 23, 2014

Adding Images to the Whiskey Recommender

While images aren't required for a Recommender app, there is a place for a picture in the 'baseball card' template, just as you would expect for any baseball card. Here is an example of the template, as it is used in the Beach Town Recommender:

I want to add images to the Whiskey Recommender, but this is more difficult than it was for beach towns. The challenge arises, once again, because I am working with a much larger set of possible options -- 450 different whiskey brands!

The Beach Town Recommender has about 30 towns in it, so it was relatively easy to search the internet for good medium-sized images of my beach towns and save them to a folder on my computer. I won't be able to do that for 450 whiskeys; it's just too much work. I'll have to find different images for different types of whiskeys, and stop there.

The basic process I used was to (1) locate images using a search engine, and then (2) save those images to a folder using a simple, consistent naming convention. Then I added a "Picture" column to my data table and (3) entered the file name of the image I wanted to display for that particular whiskey.

I started out with a nice, generic picture of whiskey that I could use for my default image. This is the one I selected:

I named this 'whiskey.jpg' and I copied that name into the entire "Picture" column in my data table. I would copy and paste over this name if I found a better/more specific picture.

Now I moved on to my top level classes in my classification tree: American, Scotch, Canadian and Irish. What sort of images could act as generic images for these high level classes?

I learned after doing a bit of image research, which I did in the image section of both Google and Bing, that there were pictures available that showed groups of whiskey bottles, selected to represent different types of whiskeys. Here's an example:

This is the image I decided to use for 'American'. Someone else had already created this for me -- I could see that different bottles had been cut and pasted into the image -- but that's okay, it saved me some time. What I liked about this picture is that it had a selection of Tennessee Whiskey, Kentucky Straight Bourbon and Rye Whiskeys, making it a snapshot of a variety of American types and brands. I found similar images for Canadian, Irish and Scotch. At this point, rather than copying and pasting image names into my data table, I decided to hold off and see what other pictures I could find.

I was able to find similar multi-bottle pictures for several sub-classes: American rye, Kentucky bourbon, Tennessee whiskey, single malt scotch, and blended scotch. I would have liked to have pictures for the other sub-classes as well, like 'American blended' or 'American corn', but I couldn't find anything appropriate. So I used the generic 'American' image for those types, just as I had to settle for only one image for all of the Canadian whiskeys. At this point, I copied the name of the most specific image I had available into the appropriate cells in my "Picture" column.

Now I assessed the situation. I realized that when someone used the Whiskey Recommender, if they were directed to a particular type of whiskey, like single malt scotch, they would see 20+ 'baseball cards' -- all with different information on them but with the same picture, this one:

That's a little too boring, even for a demo! So I knew I needed to push one more level down in the data, if I could.

There were a couple of different options I could pursue. In single malt scotch I had another column in my data table: "Region". I could look for images that went with Speyside, Highlands, Islay, etc. But the other possibility was to look for images that went with the "Company" dimension, images for Diageo, Suntory/Beam, Sazerac, and the like. I decided to look for company-specific images, because that could add visual diversity to all of the whiskey types, not just single malt scotch.

I was able to find images for Beam Irish, Beam American, Forty Creek Canadian, Sazerac Canadian, Diageo Scotch and Pernod Ricard Scotch. Here is the image for Pernod Ricard Scotch:

After adding these images to my image folder, I copied and pasted the image names into the appropriate cells of my data table. I would love to have a picture of every specific whiskey brand, but for this particular demo, I decided to stop at this point. The Whiskey Data Table -- one of the two knowledge artifacts required to create a Recommender app -- was complete.
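I did all of this by hand with copy and paste, but the rule I followed, "use the most specific image available, otherwise fall back to a more general one", can be sketched in Python. Only 'whiskey.jpg' comes from my actual folder. The other file names, the lookup order and the `pick_image` function are assumptions made for illustration.

```python
DEFAULT_IMAGE = "whiskey.jpg"  # the generic default picture

# Hypothetical file names at each level of specificity.
COMPANY_IMAGES = {("Pernod Ricard", "Scotch"): "pernod_ricard_scotch.jpg"}
SUBCLASS_IMAGES = {"single malt scotch": "single_malt_scotch.jpg"}
CLASS_IMAGES = {"American": "american.jpg", "Scotch": "scotch.jpg"}


def pick_image(top_class, sub_class, company):
    """Return the most specific image name available for one whiskey."""
    return (
        COMPANY_IMAGES.get((company, top_class))
        or SUBCLASS_IMAGES.get(sub_class)
        or CLASS_IMAGES.get(top_class)
        or DEFAULT_IMAGE
    )


print(pick_image("Scotch", "single malt scotch", "Pernod Ricard"))
print(pick_image("American", "American corn", "Some Company"))
```

The value returned is what would go into the "Picture" cell for that row.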

When I build the Whiskey Recommender using the Recommender Apperator, I will need to upload my folder of images along with my Whiskey Data Table and Context Decision Tree so that the pictures can appear in my web app.

One final comment about finding appropriate images: I always use my search engine's tools to limit the images I am shown to medium-sized files. That is big enough to provide good resolution in my web app, while saving data storage and reducing load times during execution.

Now I need to move on to creating the final 'knowtifact' that is required to generate a Recommender: the Context Decision Tree.

Thursday, July 17, 2014

Adding Taste Attributes to the Whiskey Data Table

Now that I had a price level attribute in the table (see my last post) I wanted to add something about taste. I wasn't sure what information would be available about taste. There are plenty of qualitative descriptions on the web, describing particular types of whiskeys, but I wanted a single source that covered most of the whiskeys in my database. I found this very helpful graphic:

This picture reminds me a lot of the poster I used as my original knowledge source. It calls out the same four high level classes: Canadian, American, Irish and Scottish. It also has many of the sub-classes but not all of them. For example, it has some of the Scottish single malt regions, but not Campbeltown and Island. And there is no single malt listed under American whiskey, though there are several in my database.

This chart uses what is called a 'radar chart' to depict its information.

There are eighteen axes coming out of the center that show the whiskey's rating on 18 different taste dimensions: sweet, smoky, grainy, vanilla, honey, spicy, briny, malty, cocoa, buttery, toffee, fruit, bacon fat, oaky, caramel, corny, biscuity, and peaty. A given whiskey can have one of three scores on each of these dimensions: 0, 1 or 2. I decided to change these numbers into written descriptions. If a whiskey scored a zero, that flavor is not mentioned at all. If it scores a '1', the flavor is mentioned, and if it scores a '2' I added the word "very". So single malt scotch from Speyside could be described as "very sweet, very honey, very fruity, very oaky, grainy, cocoa, buttery and caramel."
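The conversion rule is simple enough to express as a short Python sketch. The `describe` function is a hypothetical name, and the scores are read back from the Speyside sentence above (2 for each "very" flavor, 1 for each plain mention, with two unmentioned flavors shown as 0). I use "fruity" as the display word for the "fruit" axis.

```python
def describe(scores):
    """Turn 0/1/2 flavor scores into a written taste description."""
    strong = [f"very {flavor}" for flavor, s in scores.items() if s == 2]
    mild = [flavor for flavor, s in scores.items() if s == 1]
    words = strong + mild  # score 0 flavors are not mentioned at all
    if len(words) < 2:
        return "".join(words)
    return ", ".join(words[:-1]) + " and " + words[-1]


speyside = {
    "sweet": 2, "honey": 2, "fruity": 2, "oaky": 2,
    "grainy": 1, "cocoa": 1, "buttery": 1, "caramel": 1,
    "smoky": 0, "peaty": 0,
}

print(describe(speyside))
```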

Once again, judgment now plays an important part. While I love this graphic, I think most people's taste buds are not refined enough to taste the subtleties conveyed here. Most of us would be lucky to perceive the "very" strong flavors, let alone the secondary mentions. So I made the decision to simplify even further and only mention the flavors that were strongly associated with the whiskey, the "very" flavors.

I had to decide how to record this information for my data table, because that will have an impact on how it is displayed in the 'baseball card' shown in the final recommendation. I could make a column for each of the 18 flavors and record the score -- 0, 1 or 2 -- but that doesn't seem very user friendly. There would be eighteen attributes listed for each whiskey, many of which will not be applicable. Instead I decided to create a group of columns and label them 'taste1', 'taste2' and so on, and fill in the strong flavors associated with that whiskey. My data table now looks like this:

The number of flavors ranges from a high of seven for Highland single malt scotch to a low of one for several types of American whiskey. I'm okay with this; I think this reflects the differing complexities of different types of whiskey. People aren't buying American corn whiskey -- "moonshine" -- for its sophistication. My only real problem is the fact that I have several whiskey types with no taste information at all, because they are not listed in my taste knowledge source.
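The 'taste1', 'taste2' layout described above could be sketched in Python roughly as follows. This is only an illustration: I built the actual table by hand in Excel, the function name is hypothetical, and the width of seven simply matches the largest number of strong flavors mentioned above.

```python
def taste_columns(scores, width=7):
    """Spread the strong (score 2) flavors across taste1..taste<width>.

    Unused columns are left as empty strings, so every row has the
    same set of columns no matter how many flavors apply.
    """
    strong = [name for name, score in scores.items() if score == 2]
    if len(strong) > width:
        raise ValueError("more strong flavors than taste columns")
    padded = strong + [""] * (width - len(strong))
    return {f"taste{i + 1}": flavor for i, flavor in enumerate(padded)}


# Illustrative scores only; not copied from the taste graphic.
row = taste_columns({"sweet": 2, "grainy": 1, "honey": 2, "smoky": 0}, width=3)
print(row)
```

The design trade-off is the one discussed above: a fixed column per flavor would be easier to filter on, while the 'taste1', 'taste2' columns keep the 'baseball card' short and readable.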

To fill in the blanks, I did further research. I found several sources that had taste profiles for both Island and Campbeltown single malts, so I filled those in, using the terminology I had already established.

The real challenge was American single malts. I found a New York Times article that talked about how this is an up-and-coming category, so it is important to have good data, but it also said this is such a diverse group it defies being summarized with a single taste profile. So I did what I have been resisting so far in this project: I created individual taste profiles for each whiskey.

I was able to find reviews for each American single malt in my database, and I used these reviews to create a taste profile containing my established terminology. Once again, if we were crowd-sourcing the data, or if this project were being funded by a customer, this individualized approach could be used for all of the whiskeys in the database, but this is a demo application that I am building to illustrate the process of building a Recommender, so I am going to keep things as simple as possible. There is a similar dynamic tension between effort and completeness in the final step of data preparation: adding images. That will be the topic of my next post.

Tuesday, July 15, 2014

Adding the Price Attribute to the Whiskey Table

My next challenge in building the Whiskey Recommender is to add attributes to the data table that will become the core of the information shown on the 'baseball card' for the recommended whiskeys.

I'm going to start with 'price'. I already know I won't be adding actual prices to the table because I've decided to stop my decomposition process at the 'brand' level. Since the variants of a brand, including bottle size, have a significant impact on the actual price, I need to aggregate that pricing into something more conceptual, like 'price level'.

I started by searching the internet for a downloadable source of whiskey prices. I found one pretty close to home: I live in Northern Virginia, and the liquor stores are state run, or at least state controlled. I found a downloadable file with all the current liquor prices. It looks like this:

This type of file is called a "csv" file, which stands for 'comma separated values'. The first row in a csv file usually contains a description of the data that follows. In this case, I learned that each row in the table contains seven items of information: the description of the item, an identifying code of some sort, the brand, the bottle size, the age of the liquor, its proof, and price.

It's pretty simple to import this kind of file into Excel, which will create a table using the first row for column headings and placing the data in the correct cells beneath.
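The same import step can be sketched in Python with the standard csv module, which also uses the first row as column headings. The header names and the two data rows below are invented placeholders; the real price list may label and order its seven columns differently.

```python
import csv
import io

# Invented placeholder content; not copied from the Virginia price list.
SAMPLE = """Description,Code,Brand,Size,Age,Proof,Price
SCOTCH,0001,Example Glen,750 ml,12,86,49.99
VODKA,0002,Example Vodka,750 ml,,80,19.99
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE)
print(rows[0]["Brand"], rows[0]["Price"])
```

Every value comes back as text, so prices need to be converted to numbers before any averaging.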

The next thing I did was eliminate all the rows that did not contain whiskey information. The VA liquor stores sell many different alcoholic products and I needed to pull the irrelevant ones out of the analysis. I also eliminated columns that weren't relevant to my analysis: code, age and proof. If I were decomposing my whiskey brands to a lower level, like different ages, I would need this information, but since I have made the decision to aggregate all the variants into a single price level I can eliminate these columns. Now I was down to description, brand, bottle size and price.

One variant I can't ignore is the impact of bottle size. To make sure I am aggregating 'apples to apples' prices, I need to average prices for one bottle size only; this is called 'controlling' for bottle size. I noticed by looking at the data that 750 ml is the most common bottle size, so I used the Excel data functions to filter the data and keep only the rows for 750 ml bottles. This is now the data set I will use for my analysis.
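Here is a rough Python equivalent of the clean-up I did with Excel filters: drop the non-whiskey rows, drop the code, age and proof columns, and keep only 750 ml bottles. The column names, the description labels and the sample rows are all assumptions made for illustration.

```python
# Hypothetical labels; the real file may describe whiskey categories differently.
WHISKEY_DESCRIPTIONS = {"SCOTCH", "BOURBON"}
KEPT_COLUMNS = ("description", "brand", "size", "price")


def prepare(rows, size="750 ml"):
    """Keep whiskey rows for one bottle size, with only the columns needed."""
    return [
        {column: row[column] for column in KEPT_COLUMNS}
        for row in rows
        if row["description"] in WHISKEY_DESCRIPTIONS and row["size"] == size
    ]


# Invented placeholder rows, not taken from the Virginia file.
rows = [
    {"description": "SCOTCH", "code": "1", "brand": "Brand A",
     "size": "750 ml", "age": "12", "proof": "86", "price": "40.00"},
    {"description": "SCOTCH", "code": "2", "brand": "Brand A",
     "size": "1.75 L", "age": "12", "proof": "86", "price": "85.00"},
    {"description": "VODKA", "code": "3", "brand": "Brand B",
     "size": "750 ml", "age": "", "proof": "80", "price": "20.00"},
]
cleaned = prepare(rows)
print(cleaned)
```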

I then used Excel's average function to calculate averages for each description. For example, the data shown above would be included in the 'scotch' average. It's clear from inspection that there are some outliers -- some bottles are very expensive! But our hope, as data analysts, is that the existence of outliers is spread out across different categories, and if there are clusters of them (like in scotch) that is a fair representation of the relative price for that category. Remember, we are trying to calculate a price rating of some sort, not an actual price. Scotch, and in particular, single malt scotch, is relatively expensive.
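The averaging step amounts to a group-by on the description column. A small Python sketch, with made-up prices chosen to show how a single expensive bottle pulls a category's average up:

```python
from collections import defaultdict
from statistics import mean


def average_by_description(rows):
    """Group prices by description and return the mean price of each group."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["description"]].append(float(row["price"]))
    return {description: mean(values) for description, values in prices.items()}


# Invented prices for illustration; the 250.00 bottle plays the outlier.
rows = [
    {"description": "SCOTCH", "price": "30.00"},
    {"description": "SCOTCH", "price": "50.00"},
    {"description": "SCOTCH", "price": "250.00"},
    {"description": "BOURBON", "price": "20.00"},
    {"description": "BOURBON", "price": "30.00"},
]
averages = average_by_description(rows)
print(averages)
```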

Once I had calculated an average for each description, I created a 5-point scale and a price range for each point, with "5" being the most expensive. I used the description to map as closely as possible into the classification structure I had created in my data table and entered the price rating. To keep with our scotch example, I set the price level for every scotch in the table to "4". This is where judgment comes in.

If I were building this application for a client, I would take the time to break out sub-classes like 'single malt' in the Virginia price list so I could actually calculate the average price for that group. But since I am building this app as a demonstration piece, I was comfortable with sampling the single malts (I mean statistically sample, not literally - though I have done some of that in the course of my research) and giving single malt scotch a '5' on price level.

As a final pass of the data, I looked for brands that were priced significantly lower than other brands in their cohort. Different companies will have portfolios of brands in which one is used as their entry price point and is priced lower than the rest. If I can call those out without too much additional effort, I will. So I gave Dewar's a '3' and a scotch called Passport a '2'. I'm not sure that I caught all of these special cases, because the Virginia liquor stores don't sell every whiskey that's in my database, but I did the best I could with the data that is readily available.
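Putting the last few steps together, the rating logic might be sketched like this in Python. The dollar cutoffs are purely hypothetical, since I haven't listed my actual price ranges here; the brand overrides are the ones named in this post.

```python
from bisect import bisect_left

# Hypothetical upper bounds, in dollars, for levels 1 through 4.
# Anything above the last bound is a level 5.
CUTOFFS = [15, 25, 40, 60]

# Brands priced well below their cohort, as called out in the post.
BRAND_OVERRIDES = {"Dewar's": 3, "Passport": 2, "Speyburn": 3}


def price_level(average_price, cutoffs=CUTOFFS):
    """Map an average price onto the 1 to 5 scale using the cutoffs."""
    return bisect_left(cutoffs, average_price) + 1


def brand_level(brand, class_level):
    """Use a brand-specific override if there is one, else the class level."""
    return BRAND_OVERRIDES.get(brand, class_level)


print(price_level(45.0))          # falls between the 40 and 60 cutoffs
print(brand_level("Dewar's", 4))  # override beats the scotch-wide level
```

Keeping the overrides in their own small lookup means the class-level ratings stay simple, and the special cases are easy to review or extend later.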

Here's a sample of the whiskey data table with the price attribute added:

Notice Speyburn is a single malt scotch that I gave a '3' price rating, based on the actual price being significantly lower than other single malt scotches in the Virginia price list.

Tuesday, July 8, 2014

Creating the Whiskey Data Table

As I mentioned in the last post, the whiskey poster I am using as my initial knowledge source has almost 450 different whiskey brands listed. These brands represent the many options a whiskey drinker has from which to choose. In a Recommender with a smaller number of options, like the Beach Town Recommender, these possible choices can be added as notes to the KnowtShare file and dropped directly into the classification tree, becoming the 'leaves' of that tree. However, 450 notes would be difficult to work with in a visual environment. Data tables are better at that.

So I switched from KnowtShare to Excel and began creating my data table.

I began by creating columns that aligned with the classes and sub-classes I had established in my classification tree.

Each row in the table is an option that can be recommended. The first column should always hold the option name; in this particular example it is a particular brand of whiskey.

I should point out that 'brand' is not the lowest possible level I could go in a whiskey database. A particular brand can have many variants. For example, Macallan, a very fine scotch, comes in Macallan 12, 18 and 21, the number in the name referring to the number of years the whiskey has been aged. There is a basic Macallan, which is aged exclusively in sherry seasoned casks, and a Macallan Fine Oak, which is aged in a combination of sherry and bourbon seasoned oak for a lighter flavor.

In fact, the variants of whiskey seem to be endless, so to make the Recommender manageable I decided to stop at the brand level. This has its own challenges, because my next step will be to add some attributes to the table, attributes like 'taste' and 'price'. Any attributes I add will need to be generalizations, in a sense, because they will be referring to a group of whiskeys that vary in age, process and bottle size. If the data in this table could be crowd-sourced in some way it would be possible to account for this complexity, but as a demonstration app (which is what I am building) I will need to make some simplifying assumptions.

My next task is to expand the table by adding attributes to the whiskey brands.

Tuesday, July 1, 2014

Creating a Whiskey Recommender: The Classification Tree

The whiskey poster I found in an issue of Fast Company (see my last post) showed a complex network of specific whiskey brands and types, but on closer inspection it's clear that it is a single classification tree bound together by class/subclass relationships. There are four major classes of whiskey in this chart: American, Canadian, Irish and Scotch. These classes then break down into sub-classes in a variety of ways, based on the ingredients and process used.

I needed to capture the knowledge from the Whiskey poster in an actionable form, so I used KnowtShare to create a classification tree. This is the American Whiskey branch:

The window in the lower right hand corner is a navigation pane that shows the complete tree.

The KnowtShare app lets me create notes and group them quickly and easily into a hierarchical structure.

The American Whiskey sub-classes are a mix of grain used (wheat, rye, corn) and process (single malt and blended). Bourbon is a unique designation based on both grain requirements (at least 51% corn) and process (charred barrels).

Some of these sub-classes are further broken down based on region and company. The whiskey poster goes on to name specific brands in each of these categories, but with 450 brands listed I opted to stop the KnowtShare tree at this level and handle the brands in another knowtifact: the options data table.

Here are the Canadian and Irish branches of the tree:

The Irish whiskey branch has two new sub-classes: single grain and single pot.

And finally, here is the Scotch whiskey branch:

Scotch is unique because it calls out six different regions for single malt. Each is considered to have its own special taste based on process, but all are made from malted barley.

In KnowtShare, when working with a very large tree such as this, you can use the page itself as the top node of the tree (here it equals "whiskey"). This allows you to arrange the next group of classes in any way that works best visually. The text view will show everything as one comprehensive outline and the .knt file that is saved will combine the four groups into a single hierarchical file.
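To make the idea of a single hierarchical outline concrete, here is a small Python sketch that stores part of the tree as nested dictionaries and prints it as an indented outline. This is not the .knt file format, which I am not describing here; it includes only classes named in these posts, and the exact placement of each node is my simplification.

```python
# Partial tree: only classes named in the posts, with deeper levels omitted.
TREE = {
    "whiskey": {
        "American": {
            "wheat": {}, "rye": {}, "corn": {},
            "single malt": {}, "blended": {}, "bourbon": {},
        },
        "Canadian": {},
        "Irish": {"single grain": {}, "single pot": {}},
        "Scotch": {
            # Four of the six single malt regions are named in the posts.
            "single malt": {
                "Speyside": {}, "Highland": {},
                "Campbeltown": {}, "Island": {},
            },
        },
    }
}


def outline(tree, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(outline(children, depth + 1))
    return lines


lines = outline(TREE)
print("\n".join(lines))
```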

The next task will be to create the options data table.

Friday, June 27, 2014

Creating a Recommender

I am going to do a series of posts that document the process of creating an app using a knowledge pattern. I am going to use a particular pattern we call a 'Recommender'. The Recommender pattern is shown in both The Shape of Knowledge eBook and in Part 2 of the video series that provides an overview of the eBook's content.

You can view the videos at these links:
Part 1 https://www.youtube.com/watch?v=ui8kjxGmjE0
Part 2 https://www.youtube.com/watch?v=Hma_ho2fHck

The Recommender pattern is made up of one optional knowtifact and two required knowtifacts.

The optional knowledge artifact is a classification tree of the options under consideration. The two required knowtifacts are a context decision tree and a data table that lays out all of the options and their attributes.
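As a very rough sketch of how the two required knowtifacts might work together, the Python below walks a tiny context decision tree, collects the answers as filters, and applies them to a data table. All of the structure here (the field names, the questions, the matching rule) is my own assumption for illustration; it is not how the actual Recommender program is written. The scotch price levels come from later posts in this series, and "Example Rye" is a made-up placeholder.

```python
# Hypothetical decision tree: each node asks a question about one table column.
PRICE_QUESTION = {"question": "Which price level?", "attribute": "price_level", "branches": {}}
DECISION_TREE = {
    "question": "Which class of whiskey?",
    "attribute": "class",
    "branches": {"Scotch": PRICE_QUESTION, "American": PRICE_QUESTION},
}

# Option name first, then class and attribute columns.
TABLE = [
    {"option": "Dewar's", "class": "Scotch", "price_level": 3},
    {"option": "Passport", "class": "Scotch", "price_level": 2},
    {"option": "Speyburn", "class": "Scotch", "price_level": 3},
    {"option": "Example Rye", "class": "American", "price_level": 3},  # placeholder
]


def collect_filters(tree, answers):
    """Walk the tree, turning each answered question into a column filter."""
    filters = {}
    node = tree
    while node:
        answer = answers.get(node["question"])
        if answer is None:
            break
        filters[node["attribute"]] = answer
        node = node["branches"].get(answer)
    return filters


def recommend(table, tree, answers):
    """Return the names of options whose rows match every collected filter."""
    filters = collect_filters(tree, answers)
    return [
        row["option"]
        for row in table
        if all(row[column] == value for column, value in filters.items())
    ]


picks = recommend(TABLE, DECISION_TREE,
                  {"Which class of whiskey?": "Scotch", "Which price level?": 3})
print(picks)
```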

In the book I talk about two simple Recommenders: an app that asks a potential knowledge author questions about what they want to achieve and then recommends the best knowledge artifact for the job (the KA Recommender) and a Beach Town Recommender that asks a future traveler a series of questions and then recommends the best beach towns for vacation.

For this example I am going to create a Whiskey Recommender (or "Whisky" Recommender, if you prefer that spelling of the word; I learned in my research that whiskey aficionados feel passionately about this question).

I picked whiskey NOT because I am an expert - I'm not. I picked it because I ran across this great visual in an issue of Fast Company:

This poster, which is very cool looking, appears to be a hopelessly complex constellation of whiskey names and types, but I knew immediately that it was something much more fundamental -- it is a classification tree for whiskeys.

In The Shape of Knowledge, the classification tree is one of my prime examples of the Triangle shape. This whiskey visual doesn't look very much like a triangle, or 'rooted tree', but it is. My first task in building the Whiskey Recommender is to transform this content into a usable form, and I will do that by organizing the basic tree structure shown in the diagram, using KnowtShare.