A Deeper Look at the Correlation Wheel

It's time to take a deeper look at the Correlation Wheel. We're pretty excited about it. One of the most important statistics used to understand text is co-occurrence. Knowing which things frequently occur together is a useful tool for understanding not just text, but a whole range of other data as well.

Having said that, there is also a danger that co-occurrence can make things even harder to understand. To see why, we need to look at the different ways co-occurrence can be calculated.

When we say concept X co-occurs with concept Y 10 times, we mean they appear together in the text 10 times. Pretty straightforward, right? But is that actually a useful piece of information? Let's assume concept X appears in the text 100 times and concept Y appears 10 times. From the point of view of concept Y, co-occurring with concept X 10 times is very important: every time you see concept Y, you also see concept X (a probability of 100%). From the point of view of concept X, it isn't so important: when you see concept X, there is only a 10% chance you will also see concept Y. That isn't a very high probability. This is because co-occurrence, when calculated as a probability, is asymmetric. Depending on which direction you look at it from, concept X or concept Y in our example, it can mean vastly different things.
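To make the asymmetry concrete, here is a minimal Python sketch that computes both conditional probabilities from the counts in the example. It assumes co-occurrence is counted once per place where the two concepts appear together. The `conditional_probability` helper is hypothetical and is not part of any Kapiche tool.

```python
def conditional_probability(co_count: int, given_count: int) -> float:
    """P(other | given): the fraction of the `given` concept's appearances
    that also include the other concept (hypothetical helper)."""
    return co_count / given_count


# Counts from the example above.
count_x = 100  # appearances of concept X
count_y = 10   # appearances of concept Y
co_xy = 10     # appearances of X and Y together

p_x_given_y = conditional_probability(co_xy, count_y)  # seen from Y
p_y_given_x = conditional_probability(co_xy, count_x)  # seen from X

print(f"P(X | Y) = {p_x_given_y:.0%}")  # P(X | Y) = 100%
print(f"P(Y | X) = {p_y_given_x:.0%}")  # P(Y | X) = 10%
```

The counts are identical in both calls. The only thing that changes is which concept you condition on, and that is enough to move the answer from 100% to 10%.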

With this in mind, our correlation wheel doesn't use an asymmetric co-occurrence probability. Instead, it uses a symmetric statistical measure we call prominence. Two concepts (like our favourite X and Y) will have a high prominence score if they appear together often and apart rarely. It's probably best to illustrate this with another example.

Let's revisit the original X and Y scenario. The relationship between X and Y we described earlier wouldn't get a very high prominence score, because X appears in the text without Y a lot (90 times, in fact). But let's change the frequencies in our imaginary world. Say X appears in the text 12 times, Y appears 11 times, and they co-occur 10 times. Because they appear together often (10 times) and apart rarely (3 times in total: X twice without Y, and Y once without X), this is a prominent relationship. It also doesn't matter whether you look at this relationship from the perspective of concept X or concept Y; the prominence score is the same. That is what makes it a symmetric measure.
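We haven't spelled out the exact formula behind prominence here, so the sketch below uses a stand-in with the same properties. It is a Jaccard-style score: the number of times two concepts appear together, divided by the number of times either one appears (together plus apart). Treat it as an assumption for illustration, not the correlation wheel's actual calculation. It does reproduce both scenarios above: the original X/Y pair scores low, the modified pair scores high, and swapping X and Y changes nothing.

```python
def prominence(count_a: int, count_b: int, co_count: int) -> float:
    """Jaccard-style stand-in for prominence (an assumption, not Kapiche's
    formula): appearances together / appearances of either concept."""
    apart = (count_a - co_count) + (count_b - co_count)  # appearances without the other
    return co_count / (co_count + apart)


# Original scenario: X = 100, Y = 10, together = 10 (apart 90 times).
print(round(prominence(100, 10, 10), 3))  # 0.1

# Modified scenario: X = 12, Y = 11, together = 10 (apart 3 times).
print(round(prominence(12, 11, 10), 3))   # 0.769

# Swapping the arguments gives the same score: the measure is symmetric.
print(prominence(12, 11, 10) == prominence(11, 12, 10))  # True
```

Because the "apart" term adds up both concepts' solo appearances, it penalises a pair whenever either side shows up alone. This is exactly the "together often, apart rarely" idea.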

So, it probably goes without saying that our correlation wheel doesn't just visualise co-occurrence probability. Instead, it draws links between concepts with high prominence scores. Let's see what this actually looks like with some data.
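Here is a rough sketch of how links like these could be drawn. The Python below counts concepts and concept pairs across text segments, then keeps only the pairs whose prominence passes a cut-off. Everything in it is hypothetical: the segments are toy data, and each segment (think of a sentence or paragraph) stands in as the unit of co-occurrence. The `prominent_links` function, the Jaccard-style score and the 0.5 threshold are also assumptions. None of this describes how the correlation wheel is actually built.

```python
from collections import Counter
from itertools import combinations


def prominent_links(segments, threshold=0.5):
    """Return (concept_a, concept_b, score) for every pair whose
    Jaccard-style prominence (an assumed stand-in) is at least `threshold`."""
    counts = Counter()     # segments each concept appears in
    co_counts = Counter()  # segments each pair of concepts shares
    for segment in segments:
        concepts = sorted(set(segment))  # count each concept once per segment
        counts.update(concepts)
        co_counts.update(combinations(concepts, 2))

    links = []
    for (a, b), co in co_counts.items():
        score = co / (counts[a] + counts[b] - co)  # together / either
        if score >= threshold:
            links.append((a, b, round(score, 3)))
    return sorted(links, key=lambda link: -link[2])  # strongest links first


# Toy segments (hypothetical data, not the Evolution article).
segments = [
    {"genetic", "drift", "population"},
    {"genetic", "drift"},
    {"cells", "proteins"},
    {"cells", "proteins", "species"},
    {"species", "population"},
]
for link in prominent_links(segments):
    print(link)
```

With this toy data the sketch prints `('drift', 'genetic', 1.0)` and `('cells', 'proteins', 1.0)`. The weaker pairs involving population or species each score 0.333 and fall below the threshold, so no link is drawn for them.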

The Correlation Wheel for the Wikipedia article on Evolution

As you can see, concepts like evolution, biology and species aren't prominent with any other concepts. You can tell by looking at the concept web or concept cloud that they are important concepts in the text, but they don't have a prominent relationship with any particular concept. In simple terms, these concepts appear often in the text alongside a wide range of different concepts. When you see one of them, the probability of seeing some other concept with it (any concept at all) is quite high, but the probability of seeing it with a specific concept is quite low. Because these concepts are so frequent, they co-occur with many different concepts rather than just one or two, and most of the time they appear apart from any one of those concepts.
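The effect is easy to reproduce with made-up numbers. This sketch uses hypothetical counts and the same Jaccard-style stand-in for prominence (an assumption, not the wheel's real formula). A hub concept appears 200 times and shares 10 appearances with each of ten partners. Every one of those links scores far below a tight pair like the 12/11/10 example from earlier.

```python
def prominence(count_a: int, count_b: int, co_count: int) -> float:
    """Jaccard-style stand-in (an assumption): together / either."""
    return co_count / (count_a + count_b - co_count)


# Hypothetical hub: appears 200 times; ten partners each appear 20 times,
# 10 of which are alongside the hub.
hub_links = {f"partner_{i}": prominence(200, 20, 10) for i in range(1, 11)}

# A rare but tightly linked pair, as in the 12/11/10 example.
tight_pair = prominence(12, 11, 10)

print(max(hub_links.values()) < tight_pair)                   # True
print(f"{max(hub_links.values()):.3f} vs {tight_pair:.3f}")  # 0.048 vs 0.769
```

The hub is never short of company, but its 190 appearances without any given partner drag every individual link down.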

There are a number of other prominent relationships depicted in this visualisation, though. Take the concept genetic, for example. It is strongly correlated with the concepts information and drift. If you didn't know what genetic drift was before visualising this article, you might now be motivated to go and find out. Likewise, the concept cells is strongly correlated with bacteria, body, grow and proteins. Didn't know about the biological relationship between cells and proteins? Now might be the time to google it.

Inspecting which concepts are correlated with the concept cells.

We use the correlation wheel to discover the important prominent relationships within text. We find these relationships help us understand the text and can even be a trigger to dig deeper. Prominent concepts are those that appear together regularly and apart rarely. Prominent concepts don't have to appear frequently in the text (we can use the concept web or concept cloud to find those concepts), they just need to have a strong bidirectional relationship with at least one other concept. Head on over to the create page and see what hidden relationships you can uncover in your text!


© 2012-2014 Kapiche Limited.