textisbeautiful.net - Latest entries
http://textisbeautiful.net/blog/
The latest entries for the site textisbeautiful.net
en-us
Zinnia
Thu, 23 May 2013 00:05:18 +0000

House Cleaning and Publicity http://textisbeautiful.net/blog/2013/05/23/house-cleaning-and-publicity/
<p>It has been a while since we updated the blog, so we thought we'd give everyone a quick update on what has been happening on <a href="/index.html">http://textisbeautiful.net</a> recently.</p> <p>Firstly, we have cleaned up the blog commenting system. We have added a spam filter and turned off trackbacks. The lesson here is: don't publish a blog without appropriate spam filtering on comments. If you do, you will get absolutely inundated with spam. Soon we will also add a captcha to the commenting system to stop our databases filling up.</p> <p>Next, we made some incremental improvements to the site to make it easier to use. This included some tweaking of the help text and addressing our most common error in a more user-friendly way. We get inundated with emails from people asking us why their data won't run on the site. 99.99% of the time it is because the data they have isn't between 5,000 and 100,000 characters long. Obviously our reporting of this error condition wasn't good enough, as it was escaping people's attention, so we have made it much more obvious now.</p> <p>Finally, we were recently contacted by the nice people at the <a href="http://collaborativeservicesinc.wordpress.com/" target="_blank">Collaborative Services Blog</a> wondering if they could interview us for their series on Words. Obviously we were happy to help out! You can <a href="http://collaborativeservicesinc.wordpress.com/2013/05/16/a-more-intelligent-word-cloud/" target="_blank">see the results for yourself here</a>.</p> <p>Let me finish by saying we are continuing to work extremely hard on some dramatic improvements to TextIsBeautiful so that it is much more useful and powerful for all our wonderful users.
We have had over 20,000 unique visitors to the site since we launched it just over 6 months ago, without any real marketing effort on our side, which is fantastic and far exceeds our expectations!</p>
ryan.stuart.85@gmail.com (rstuart) Thu, 23 May 2013 00:05:18 +0000 http://textisbeautiful.net/blog/2013/05/23/house-cleaning-and-publicity/

Beyond Word Clouds - Part 2 http://textisbeautiful.net/blog/2013/01/15/beyond-word-clouds-part-2/
<p>If you haven't already, why not check out the <a href="http://textisbeautiful.net/blog/2012/12/14/beyond-word-clouds-part-1/">first</a> article in this series? It covers the Text Is Beautiful Concept Cloud, whereas this piece is concerned with the Concept Web.</p> <p>Below is a Concept Cloud and Concept Web for reviews of the <a href="http://en.wikipedia.org/wiki/Sydney_Opera_House" target="_blank">Sydney Opera House</a> on <a href="http://www.tripadvisor.com.au/Attraction_Review-g255060-d257278-Reviews-Sydney_Opera_House-Sydney_New_South_Wales.html" target="_blank">TripAdvisor</a>.</p> <a target="_blank" href="http://imgur.com/WAhjO"><img src="http://i.imgur.com/WAhjO.png" alt="" /></a> <br/><br/> <a target="_blank" href="http://imgur.com/ceNOG"><img src="http://i.imgur.com/ceNOG.png" alt="" /></a> <br/><br/> <p>At first glance, the Concept Web might appear more complicated than the Concept Cloud; however, we believe that it is actually easier to interpret once you are familiar with it. Let's take a closer look...</p> <h4>What is similar to the Cloud?</h4> <ul> <li><b>Themes</b> - Like the Concept Cloud, the Concept Web represents distinct theme groupings via text colour.</li> <li><b>Word Size</b> - Also like the Concept Cloud, text size represents the prevalence of a concept within the data. You may notice that the Concept Web uses a more subtle variation in sizing than the Concept Cloud.
This is due to its fixed positional layout.</li> </ul> <h4>And what is different?</h4> <ul> <li><b>Positional Clustering</b> - In contrast to the Concept Cloud, which arranges concepts for optimal aesthetics, the Concept Web's layout is based on clustering of concepts in two-dimensional space. In practical terms, concepts will be positioned close to other concepts that they are highly related to. This also means that unrelated concepts will tend to be positioned further away from each other. This proximal dependence should be intuitively clear when viewed in terms of themes; in the Concept Web, concepts with the same colour/theme will always appear relatively close to each other.</li> <li><b>Spanning Tree</b> - Another interesting feature of the Concept Web is its <a href="http://en.wikipedia.org/wiki/Minimum_spanning_tree" target="_blank">minimum spanning tree</a>, which, based on concept relatedness, can help to provide insight into 'stories' within the data.</li> </ul> <h4>Using the Concept Web</h4> <p>Let's look at a real use case; for this we will be using the Sydney Opera House data already presented above. In the following picture, observe that we can identify the nature of the highlighted theme by identifying its largest and most connected concept.</p> <a target="_blank" href="http://imgur.com/U6krm"><img src="http://i.imgur.com/U6krm.png" alt="" /></a> <br/><br/> <p>In this case it is 'Sydney', which is obviously the geographical location of the Sydney Opera House. Using this process we can rapidly identify the principal thematic topics within any data set.
Additionally, remembering that proximity implies relatedness, we can identify how these broad topics relate to each other by observing which themes are adjacent...</p> <a target="_blank" href="http://imgur.com/ivDQ2"><img src="http://i.imgur.com/ivDQ2.png" alt="" /></a> <br/><br/> <p>In the context of this data (user reviews) we can see how people describe the location 'Sydney' in reference to the 'Opera House' itself and the 'view'. Looking closer at these particular themes, there seems to be a strong connection between the <a href="http://en.wikipedia.org/wiki/Sydney_Harbour_Bridge" target="_blank">Sydney Harbour Bridge</a> and an impressive/spectacular view. Helpful if you find yourself in Sydney in the future!</p> <p>Finally, let's take a look at one last picture...</p> <a target="_blank" href="http://imgur.com/6IkLf"><img src="http://i.imgur.com/6IkLf.png" alt="" /></a> <br/><br/> <p>Intuitively, it seems plausible that people found the tour of the Opera House to be interesting and informative, and that it covered the history. It also appears to be considered expensive. As you can see, we can use this technique to rapidly elicit topics of interest to manually investigate in the data.</p> <p>We think the Concept Web is a powerful tool for exploring data. Hopefully this article has provided you with some ideas on how to get started with it.</p>
kris-blog@textisbeautiful.net (krogers) Tue, 15 Jan 2013 00:07:02 +0000 http://textisbeautiful.net/blog/2013/01/15/beyond-word-clouds-part-2/

Launch Day - A great start http://textisbeautiful.net/blog/2012/12/20/launch-day-great-start/
<p>As you know, we started the public push of Text Is Beautiful yesterday and, besides a few growing pains, we were happy with how things went. Overall, we had about 200 visitors to the site and discovered a few bugs we were quick to remedy.
We were happy with how it went and we think it is a great platform to build from.</p> <p>Besides the actual mechanics of the launch, we also saw a couple of pieces of encouraging feedback, which is always good. The first came from <a href="http://mrfeinberg.com/" target="_blank">Jonathan Feinberg</a>, the inventor of Wordle, via a <a href="http://blog.wordle.net/2012/12/text-is-beautiful.html" target="_blank">blog post</a>.</p> <blockquote><p>Textisbeautiful.net is a beautiful site, and I highly recommend it.</p></blockquote> <p>Jonathan's work with Wordle was probably the biggest reason why we started this site. We loved the aesthetics of the Word Clouds he produced and we wanted to add some text analytics technology to them so they could be used more as a tool to understand the underlying text. It was very encouraging for us to hear that he liked the site.</p> <p>The next good piece of feedback was in the form of a <a href="http://www.forbes.com/sites/haydnshaughnessy/2012/12/19/how-the-apple-vs-samsung-litigation-hurts-apple/" target="_blank">post at Forbes</a> written by <a href="http://blogs.forbes.com/haydnshaughnessy/" target="_blank">Haydn Shaughnessy</a>. The entire post is a great example of one of the places we see this technology being really useful. Haydn collected some news stories via a Google News search and used Text Is Beautiful to visualise and draw insight from their text content. He finished the article with a statement we strongly agree with:</p> <blockquote><p>As information becomes too voluminous I am convinced we need these tools in our daily lives. They are as important as the written word.</p></blockquote> <p>We couldn't have put it better ourselves.
Thanks Jonathan and Haydn.</p>
ryan.stuart.85@gmail.com (rstuart) Thu, 20 Dec 2012 01:24:03 +0000 http://textisbeautiful.net/blog/2012/12/20/launch-day-great-start/

Growing Pains http://textisbeautiful.net/blog/2012/12/19/growing-pains/
<p>Whilst Text Is Beautiful has been online for a couple of weeks already, it is only over the last couple of days that we have started openly sharing it with the public. Since we had no real idea about how many users to expect, you could say this was an attempt to ramp things up gently. Already though, we have encountered some teething issues: particular Wikipedia pages not processing to completion, and problems with our load balancing that have been leading to request timeouts during transitions to and from heavy load.</p> <p style="margin: 7px 0px;"><img src="http://i.imgur.com/vIcjv.png" alt="" title="Hosted by imgur.com" style="height: 200px"/></p><p style="font-size: 0.8em;">* We don't endorse hamster-powered hardware</p> <p>We are committed to providing a high-quality and reliable service, but we ask for your patience, especially as we smooth out the wrinkles revealed by this initial large-scale use. We fund the site personally so must tread a fine line between availability and cost-efficiency, but will always endeavour to support our users as best we can.</p> <p>All important information and updates will always be released first on our blog, so why not <a href="http://feeds.feedburner.com/textisbeautiful" target="_blank">subscribe</a> and keep informed on the continual improvement and evolution of the site?</p>
kris-blog@textisbeautiful.net (krogers) Wed, 19 Dec 2012 05:31:18 +0000 http://textisbeautiful.net/blog/2012/12/19/growing-pains/

A Deeper Look at the Correlation Wheel http://textisbeautiful.net/blog/2012/12/15/deeper-look-correlation-wheel/
<p>It's time to take a deeper look at the Correlation Wheel. We are excited about the correlation wheel.
One of the most important statistics used to understand text is co-occurrence. Knowing which things often co-occur can be quite a useful tool when trying to understand not just text, but <a href="http://en.wikipedia.org/wiki/Co-occurrence_networks">a whole range of other data</a> as well.</p> <p>Having said that, there is also a danger that co-occurrence can make things even harder to understand. To see why, we need to understand the various ways we can calculate co-occurrence.</p> <p>When we say concept X co-occurs with concept Y 10 times, what we are saying is that they appear together in the text 10 times. Pretty straightforward, right? Is that a useful piece of information though? Let's assume concept X appears in the text 100 times and concept Y appears 10 times. In the context of concept Y, the fact that it occurs with concept X 10 times is very important - every time you see concept Y you will also see concept X (that's a probability of 100%). But in the context of concept X, it isn't so important. You have a 10% chance that when you see concept X you will also see concept Y. That isn't a very high probability. This is because a statistic like co-occurrence, when calculated as a probability, is asymmetric. Depending on which direction you look at it from, concept X or concept Y in our example, it can mean vastly different things.</p> <p>With this in mind, our correlation wheel doesn't use an asymmetric co-occurrence probability. Instead, it uses a symmetrical statistical measure we call prominence. Two concepts (like our favourite X &amp; Y) will have a high prominence score if they appear together often and apart rarely. It's probably best to illustrate this with another example.</p> <p>Let's revisit the original X and Y scenario. The relationship between X and Y we described earlier wouldn't get a very high prominence score because X appears a lot in the text without Y (90 times in fact). But let's modify the frequencies in our imaginary world.
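</p> <p>Before changing the numbers, the arithmetic so far can be checked with a short Python sketch. The formula behind prominence is not given in this post, so the <code>jaccard</code> function here is only an assumed stand-in (the Jaccard index: times together divided by times together plus times apart). The function names are hypothetical and are not part of Text Is Beautiful.</p>

```python
def conditional(co_occurrences: int, occurrences: int) -> float:
    """Chance of seeing the other concept, given that this one appears."""
    return co_occurrences / occurrences


def jaccard(x_count: int, y_count: int, co_occurrences: int) -> float:
    """Assumed symmetric stand-in for prominence: together / (together + apart)."""
    apart = (x_count - co_occurrences) + (y_count - co_occurrences)
    return co_occurrences / (co_occurrences + apart)


# Original scenario: X appears 100 times, Y 10 times, together 10 times.
x_count, y_count, together = 100, 10, 10

print(conditional(together, x_count))        # P(Y given X) = 0.1
print(conditional(together, y_count))        # P(X given Y) = 1.0
print(jaccard(x_count, y_count, together))   # 0.1, whichever side you start from
```

<p>The two conditional probabilities disagree (10% versus 100%), while the symmetric score is the same from either side. The same <code>jaccard</code> call can be repeated with the modified frequencies that follow to see how the score changes.</p> <p>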
Let's say X appears in the text 12 times, Y appears 11 times and they co-occur 10 times. Because they appear together often (10 times) but appear apart rarely (3 times in total), this is a prominent relationship. It also doesn't matter whether you are looking at this relationship from the context of concept X or Y, the prominence figure is the same. This is now a symmetrical measure.</p> <p>So, it probably goes without saying that our correlation wheel doesn't just visualise co-occurrence probability. Instead it draws links between concepts with high prominence scores. Let's see what this actually looks like with some data.</p> <p style="text-align:center;"><img class="img-polaroid" src="http://i.imgur.com/41nsa.png" /><br><small>The Correlation Wheel for the <a href="http://en.wikipedia.org/wiki/Evolution">Wikipedia article on Evolution</a></small></p> <p>As you can see, concepts like <em>evolution</em>, <em>biology</em>, <em>species</em> etc. aren't prominent with any other concepts. You can tell by looking at the concept web or concept cloud that they are important concepts in the text - but they aren't very prominent with any other concepts. In simple terms, these concepts appear often in the text with a wide range of different concepts. When you see one of these concepts in the text, the probability you will see another concept with it (any random concept) is quite high, but the probability you will see it with a specific concept is quite low. This is because these concepts are so frequent in the text that they occur with a wide range of different concepts, not just one or two.</p> <p>There are a number of other prominent relationships depicted in this visualisation though. Take the concept <em>Genetic</em> for example. It is strongly correlated with the concepts <em>information</em> and <em>drift</em>. If you didn't know what genetic drift was before visualising this article, you might now be motivated to go find out.
Likewise, the concept <em>cells</em> is strongly correlated with <em>bacteria</em>, <em>body</em>, <em>grow</em> and <em>proteins</em>. Didn't know about the biological relationship between cells and proteins? Now might be the time to <a href="https://www.google.com.au/search?q=cells+proteins">google it</a>.</p> <p style="text-align:center;"><img class="img-polaroid" src="http://i.imgur.com/QxMub.png" /><br><small>Inspecting which concepts are correlated with the concept <em>cells</em>.</small></p> <p>We use the correlation wheel to discover the important prominent relationships within text. We find these relationships help us understand the text and can even be a trigger to dig deeper. Prominent concepts are those that appear together regularly and apart rarely. Prominent concepts don't have to appear frequently in the text (we can use the concept web or concept cloud to find those concepts), they just need to have a strong bidirectional relationship with at least one other concept. Head on over to <a href="/create/index.html">the create page</a> and see what hidden relationships you can uncover in your text!</p>
ryan.stuart.85@gmail.com (rstuart) Fri, 14 Dec 2012 14:00:00 +0000 http://textisbeautiful.net/blog/2012/12/15/deeper-look-correlation-wheel/

Beyond Word Clouds - Part 1 http://textisbeautiful.net/blog/2012/12/14/beyond-word-clouds-part-1/
<p>As we prepare to herald in the new year, let us take a moment to consider the recent, rapid popularisation of data visualisations on the web.
Whether you attribute this to some kind of <a href="http://blog.visual.ly/why-is-data-visualization-so-hot/" target="_blank">biological adaptation to visual processing</a>, short attention spans, another web fad, or any other <a href="https://socialmediachimps.com/2012/03/why-infographics-data-visualization-works/" target="_blank">multitude of reasons</a>, one thing seems to be clear - that there is a great deal of the visualisation terrain yet to be explored.</p> <p>From quantitative to qualitative; simple charts to complex, interactive layouts; few tools seem to have garnered the levels of fame (or notoriety, depending on who you talk to), as the innocuous word cloud. With humble beginnings as frequency-weighted lists of metadata information, services such as <a href="http://www.wordle.net/" target="_blank">Wordle</a> that operate on free-form text have elevated their ubiquity to new heights. Art, qualitative analysis tool, learning apparatus or just pretty filler? Some would <a href="http://www.niemanlab.org/2011/10/word-clouds-considered-harmful/" target="_blank">argue the latter</a>. Here at Text is Beautiful we hold a slightly more nuanced view.</p> <p>Just like any tool there is the potential for abuse and misuse, but let us delve a little deeper...</p> <h4>So what <i>is</i> good about word clouds?</h4> <ul> <li><b>Accessibility</b> - Free, easy to use data visualisation tools encourage people to explore and experiment with information. 
There may be missteps in the beginning, but we need a starting point from which to improve and evolve our processes.</li> <li><b>Engagement</b> - As a pre-reading or pre-writing tool, word clouds can engage and focus our attention within a certain domain, or even potentiate the creative process through visual stimulus.</li> <li><b>Affirmation</b> - Word clouds can be used to summarise and visually reinforce ideas already covered through written text.</li> </ul> <h4>But what about the limitations?</h4> <ul> <li><b>Lack of context</b> - Most word clouds only manage to utilise word-frequency in their presentation, which provides no way to accurately construe relationships between words without pre-existing knowledge of the data.</li> <li><b>Lack of depth</b> - After any initial points of interest are found there are generally no mechanisms to continue exploring and formulating ideas about the data.</li> <li><b>Lack of insight</b> - Because of these kinds of limitations it may be difficult to use these visualisations to elicit new insight from the data.</li> </ul> <h4>Where to go from here?</h4> <p>Well, we think that a good start might be extending the range of information represented by word clouds.</p> <h4>Enter, The Concept Cloud</h4> <p>Let us draw your attention to a <b>Concept Cloud</b> for the <a href="http://www.imdb.com/title/tt0080684/" target="_blank">Star Wars - The Empire Strikes Back</a> movie <a href="http://www.imsdb.com/scripts/Star-Wars-The-Empire-Strikes-Back.html" target="_blank">script</a>.</p> <a href="http://i.imgur.com/jtOMa" target="_blank"><img src="http://i.imgur.com/jtOMa.png" title="Hosted by imgur.com" alt="" style="height:500px;" /></a> <p>At first glance it appears to be no different to other, similar-looking word visualisations, but on closer examination some deeper detail may be revealed:</p> <a href="http://i.imgur.com/yuwMl" target="_blank"><img src="http://i.imgur.com/yuwMl.png" alt="" title="Hosted by imgur.com" style="height:500px;"
/></a> <p>Colours in the Concept Cloud are indicative of distinct <b>themes</b>. Themes themselves represent rough groupings of related <a href="/faq/index.html" target="_blank">concepts</a>. In the above case this knowledge may enable us to make an intuitive leap and theorise that the names Millennium Falcon and Falcon are directly related to some kind of ship.</p> <p>Exploring further and inspecting other concepts within the same theme, such as asteroid, space and laser, may give a clearer picture of the nature of the ship in this story:</p> <a href="http://i.imgur.com/eXDoc" target="_blank"><img src="http://i.imgur.com/eXDoc.png" alt="" title="Hosted by imgur.com" style="height:500px;" /></a> <p>In this way we can begin to generate ideas about relationships in the data, swiftly building a platform from which to launch our manual investigations into the data. We think that's pretty cool.</p> <p>Another interesting use of theme colouring is to select a colour palette that goes from dark to light. The colours will be applied to themes in order of decreasing connectedness. Simply, this means that the colour will fade out to white for the least connected, and generally least pertinent, themes. The Concept Cloud below for the Wikipedia article on the <a href="http://en.wikipedia.org/wiki/Tasmanian_devil" target="_blank">Tasmanian Devil</a> illustrates how this feature can instantly draw attention to the central, important concepts in the data.</p> <a href="http://i.imgur.com/PjAyP" target="_blank"><img src="http://i.imgur.com/PjAyP.png" alt="" title="Hosted by imgur.com" style="height:500px;" /></a> <p>So why not generate some Concept Clouds for your data and experiment with the colour palettes? And please don't hesitate to tell us about any interesting finds you make!</p> <p>Hopefully we've provided some worthy ideas that argue for the potential in further experimentation and exploration with word clouds and other simple, qualitative visualisations.
Why not check out <a href="http://textisbeautiful.net/blog/2013/01/15/beyond-word-clouds-part-2/">Part 2</a>, where we introduce the Concept Web, which utilises its spatial layout to represent relatedness and structure.</p>
kris-blog@textisbeautiful.net (krogers) Fri, 14 Dec 2012 03:51:47 +0000 http://textisbeautiful.net/blog/2012/12/14/beyond-word-clouds-part-1/

TextIsBeautiful.net is Alive! http://textisbeautiful.net/blog/2012/12/09/textisbeautifulnet-alive/
<p>If you are reading this then you have probably already figured out textisbeautiful.net has been launched and is ready to create beautiful visualisations from your text. This website has existed as an idea for approximately 24 months but it wasn't until the end of 2012 that the work was put in to make this site a reality.</p><p>So what is textisbeautiful.net, I hear you ask? It's a place you can come with text in your virtual hand (don't worry if you don't have any, you can visualise Wikipedia articles too) and create some awesome visualisations. We think these visualisations will help you and others gain some quick insight into the text. Hopefully this website is our first step in making modern text analytics available to the people of the internet. If it goes well it might even lead to some bigger and better services to help you analyse text.</p><p>That's the back story, but what does textisbeautiful.net actually give you? We are launching the site with 3 visualisations to start with. You can see these three visualisations below.
They have been created using the <a href="http://www.imsdb.com/Movie%20Scripts/Star%20Wars:%20Return%20of%20the%20Jedi%20Script.html">script of Star Wars: Return of the Jedi</a>.</p><p style="text-align:center;"><img class="img-polaroid" src="http://i.imgur.com/bJesk.png" alt="Concept Cloud" /><br /><small>Concept Cloud: Inspired by Wordle but colour has meaning.</small></p><p style="text-align:center;"><img class="img-polaroid" src="http://i.imgur.com/fL1sX.png" alt="Concept Web" /><br /><small>Concept Web: Position and colour matter in this graphic.</small></p><p><img class="img-polaroid" src="http://i.imgur.com/3rhNd.png" alt="Correlation Wheel" /><br /><small>Correlation Wheel: Visualisation of highly correlated concepts.</small></p><p>I will resist the urge to get deep into each of these visualisations - that will happen in a future blog post - but a range of questions are answered in the FAQ and each visualisation includes some help text.</p><p>So, in summary, we hope you like the new site and the tools it offers. It is <strong>free</strong> and all the functionality you see now will remain free. We do have hosting costs to pay but that won't come at the expense of what we are offering at launch. We hope you enjoy the new site!</p>
ryan.stuart.85@gmail.com (rstuart) Sun, 09 Dec 2012 05:08:28 +0000 http://textisbeautiful.net/blog/2012/12/09/textisbeautifulnet-alive/