The Reddit network - self posts

This graph was based on data collected from public Reddit posts since 2008, kindly provided by Deimorz.

If you aren't familiar with the nomenclature of graph theory: a "node" is a point on a graph (here, one of the subreddits), an "edge" is a link between two nodes, and the "degree" of a node is the number of edges connected to it. To identify the most interesting cliques, the following process was applied to the data:

  • Removed edges between subreddits that have fewer than eight occurrences.
  • Removed nodes with a degree greater than 75. This was enough to get rid of every sub in the top 20 subreddits by subscriber count. Since these subs are likely to link to a wide variety of topics, an association with one of them is not particularly interesting to us.
  • Removed any remaining nodes that were now orphaned (i.e. no edges link to them).
  • Used a ForceAtlas layout in Gephi to define the cliques.
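
The filtering steps above (everything before the Gephi layout) can be sketched in a few lines of Python. This is only an illustration, not the script used to build the graph: the function name `filter_graph`, the `edge_counts` input format, the treatment of edges as undirected, and the choice to measure degree once after the edge filter are all assumptions.

```python
from collections import defaultdict


def filter_graph(edge_counts, min_count=8, max_degree=75):
    """Return {node: set_of_neighbours} after the three filtering steps.

    edge_counts is a hypothetical input format: it maps a pair of
    subreddit names (a, b) to the number of times that link occurred.
    Edges are treated as undirected (an assumption).
    """
    # Step 1: keep only edges with at least min_count occurrences.
    adjacency = defaultdict(set)
    for (a, b), count in edge_counts.items():
        if count >= min_count and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Step 2: find nodes whose degree is greater than max_degree.
    # Degree is measured once, after the edge filter (an assumption).
    hubs = {node for node, nbrs in adjacency.items() if len(nbrs) > max_degree}

    # Step 3: rebuild the graph without the hubs, dropping any node
    # that has no remaining neighbours (an orphan).
    filtered = {}
    for node, nbrs in adjacency.items():
        if node in hubs:
            continue
        kept = nbrs - hubs
        if kept:
            filtered[node] = kept
    return filtered
```

With the defaults this mirrors the thresholds quoted above (eight occurrences, degree 75). The resulting adjacency could then be exported to a format Gephi can read for the layout step.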

The nodes have been sized and coloured according to their degree, so subreddits that link to lots of other parts of Reddit appear bigger and redder.
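
As a rough illustration of sizing and colouring by degree, the sketch below scales each node linearly between the smallest and largest degree in the graph. The function name `style_nodes`, the size range, and the blue-to-red colour ramp are assumptions made for the example; the actual mapping used for this graph is not stated.

```python
def style_nodes(degrees, min_size=2.0, max_size=20.0):
    """Map each node's degree to a size and a hex colour.

    degrees is a non-empty dict of {node: degree}. The size range and
    the blue-to-red ramp are illustrative choices, not the real ones.
    """
    lo = min(degrees.values())
    hi = max(degrees.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all degrees match

    styles = {}
    for node, degree in degrees.items():
        t = (degree - lo) / span  # 0.0 for the lowest degree, 1.0 for the highest
        size = min_size + t * (max_size - min_size)
        red = round(255 * t)
        blue = round(255 * (1 - t))
        styles[node] = {"size": size, "color": f"#{red:02x}00{blue:02x}"}
    return styles
```

Under this scheme the best-connected subreddit gets the largest, pure red node and the least-connected one gets the smallest, pure blue node.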

There are many different ways to define associations and to filter or identify niches, so if you have any other ideas, feel free to post them in the comments for this page. A link to download the graph file displayed here can be found at the bottom of the page.

The graph below is zoomable, draggable and searchable, so have an explore! It covers posts from Thu, 23 Jun 2005 18:50:02 GMT to Thu, 28 Mar 2013 01:24:17 GMT. If you click on a subreddit, it will open in a new tab.

Powered by sigma.js. Best viewed in Google Chrome.