Beyond Word Clouds - Part 2

If you haven't already, why not check out the first article in this series? It covers the Text Is Beautiful Concept Cloud, whereas this piece focuses on the Concept Web.

Below are a Concept Cloud and a Concept Web for reviews of the Sydney Opera House on TripAdvisor.

At first glance, the Concept Web might appear more complicated than the Concept Cloud. However, we believe it is actually easier to interpret once you are familiar with it. Let's take a closer look...

What is similar to the Cloud?

  • Themes - Like the Concept Cloud, the Concept Web represents distinct theme groupings via text colour.
  • Word Size - Also like the Concept Cloud, text size represents the prevalence of a concept within the data. You may notice that the Concept Web uses a more subtle variation in sizing than the Concept Cloud. This is due to its fixed positional layout.

And what is different?

  • Positional Clustering - In contrast to the Concept Cloud, which arranges concepts for optimal aesthetics, the Concept Web's layout is based on clustering concepts in two-dimensional space. In practical terms, concepts are positioned close to the concepts they are most strongly related to, and unrelated concepts tend to sit further apart. This dependence on proximity is easiest to see in terms of themes: in the Concept Web, concepts with the same colour/theme always appear relatively close to each other.
  • Spanning Tree - Another interesting feature of the Concept Web is its minimum spanning tree, which is built from concept relatedness and can help to provide insight into 'stories' within the data.
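
To make the spanning tree idea more concrete, here is a minimal Python sketch. It is not how Text Is Beautiful builds its tree (the article does not say); it simply assumes each pair of concepts has a relatedness score between 0 and 1, treats 1 - relatedness as a distance, and runs Prim's algorithm. The scores and the function name are made up for illustration.

```python
import heapq

# Hypothetical relatedness scores in [0, 1]; illustrative values only,
# not taken from the Sydney Opera House data.
RELATEDNESS = {
    ("sydney", "harbour"): 0.9,
    ("sydney", "view"): 0.6,
    ("harbour", "view"): 0.7,
    ("sydney", "tour"): 0.3,
    ("tour", "history"): 0.8,
    ("view", "tour"): 0.2,
}


def minimum_spanning_tree(relatedness):
    """Prim's algorithm, assuming distance = 1 - relatedness."""
    # Build an undirected adjacency list of (distance, from, to) edges.
    graph = {}
    for (a, b), score in relatedness.items():
        dist = 1.0 - score
        graph.setdefault(a, []).append((dist, a, b))
        graph.setdefault(b, []).append((dist, b, a))

    start = sorted(graph)[0]  # any concept works as a starting point
    visited = {start}
    frontier = list(graph[start])
    heapq.heapify(frontier)
    edges = []

    # Repeatedly attach the closest concept that is not yet in the tree.
    while frontier and len(visited) < len(graph):
        dist, a, b = heapq.heappop(frontier)
        if b in visited:
            continue
        visited.add(b)
        edges.append((a, b))
        for edge in graph[b]:
            if edge[2] not in visited:
                heapq.heappush(frontier, edge)
    return edges


MST_EDGES = minimum_spanning_tree(RELATEDNESS)
for a, b in MST_EDGES:
    print(a, "-", b)
```

The tree keeps only the strongest links needed to connect every concept, which is why following its edges can read like a 'story' through the data.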

Using the Concept Web

Let's look at a real use case, using the Sydney Opera House data presented above. In the following picture, observe that we can work out the nature of the highlighted theme by finding its largest and most connected concept.

In this case it is 'Sydney', which is obviously the geographical location of the Sydney Opera House. Using this process we can rapidly identify the principal thematic topics within any data set. Additionally, remembering that proximity implies relatedness, we can identify how these broad topics relate to each other by observing which themes are adjacent...
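
The same reading can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's own logic: it assumes each concept has a theme and a prevalence count, that the spanning tree is available as a list of edges, and that 'largest and most connected' means ranking by number of connections first and prevalence second. All names and values below are hypothetical.

```python
from collections import Counter

# Hypothetical concepts: name -> (theme, prevalence). Illustrative values only.
CONCEPTS = {
    "sydney": ("location", 120),
    "harbour": ("location", 80),
    "bridge": ("location", 60),
    "tour": ("tour", 90),
    "history": ("tour", 40),
    "interesting": ("tour", 35),
}

# Hypothetical spanning-tree edges between concepts.
EDGES = [
    ("sydney", "harbour"),
    ("sydney", "bridge"),
    ("sydney", "tour"),
    ("tour", "history"),
    ("tour", "interesting"),
]


def label_themes(concepts, edges):
    """Pick, for each theme, its most connected and then largest concept."""
    # Count how many tree edges touch each concept.
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    best = {}
    for name, (theme, prevalence) in concepts.items():
        # Assumed ranking: connections first, prevalence as tie-breaker.
        score = (degree[name], prevalence)
        if theme not in best or score > best[theme][0]:
            best[theme] = (score, name)
    return {theme: name for theme, (_, name) in best.items()}


THEME_LABELS = label_themes(CONCEPTS, EDGES)
print(THEME_LABELS)
```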

In the context of this data (user reviews) we can see how people describe the location 'Sydney' in relation to the 'Opera House' itself and the 'view'. Looking closer at these particular themes, there seems to be a strong connection between the Sydney Harbour Bridge and an impressive/spectacular view. Helpful if you find yourself in Sydney in the future!

Finally, let's take a look at one last picture...

Intuitively, it seems plausible that people found the tour of the Opera House interesting and informative, and that it covers the history. It also appears to be considered expensive. As you can see, we can use this technique to quickly surface topics of interest to investigate manually in the data.

We think the Concept Web is a powerful tool for exploring data. Hopefully this article has provided you with some ideas on how to get started with it.



© 2012-2014 Kapiche Limited.