Letting the Data Tell the Story


In our previous post, we described a technique for assigning categories to data based on input from content experts within a “training database”. This technique is effective for condensing large, text-heavy data into specific categories for summaries and improved visualization. However, it will not uncover new insights or trends, because it imposes a preconceived, finite set of options: in other words, what we already know. This post describes our approach to using clustering techniques to explore text-heavy data. We applied this technique to two different datasets: scientific journals [...]