
Tuesday, 18 March 2014

I am not a Node: Rethinking Big Data and Empiricism

Pattern has been of interest to cyberneticians for many years. Bateson coined the term “the pattern that connects” as a metaphor for the ecological perspective of thought necessary to see the world as a whole. Study of patterns within leaves, the patterns of symmetry in the limbs of a beetle, patterns in art, and so on led Bateson to consider the causes of these patterns (he identified 'constraint' as a major cause). Given that we now have the patterns in data analytics, would Bateson seek to explain those in a similar way?

Although such patterns are not 'natural' in the same sense as the patterns on leaves, data is clearly 'out there': both 'big data' and its analytics are now features of the world we live in. The tools and algorithms are there too. But does the presence of data make its analysis empirical? To address this, we have to consider the nature of the regularities that are produced through the combination of data, algorithms and tools. One of the continual complaints about ‘big data’ analytics was that (for example) social network analysis resulted in pretty pictures that appeared to tell us things that we already knew (but in some kind of “objectified” way). This is not, however, an empirical result. It is instead a confirmation of regularities established elsewhere, reinforcing existing preconceptions and explanations about what we think is happening. In a highly politicised environment where the denial of some regularities can be common among those who would prefer they weren’t true, analytic reports of this kind can be powerful because of their own material constitution (and hence status as an 'object'), because there is understandable ‘mathematics’ sitting behind them, and because they appear to show 'evidence'. But in addressing the question of whether this is empirical or ideal, we face some difficult problems.

A typical social network graph shows 'nodes' joined together by 'arcs'. We think of each node in such a diagram as a ‘person’. But this is misleading. In fact each node is no more than a source of "declared relations" to other "sources of declared relations". What's a declared relation? Well, it's a Facebook 'like' or a comment. The important thing about this, though, is that, looked at in this way, the lines within a social network diagram are constitutive of the nodes: the 'arcs' don't actually 'add' any information; the picture as a whole is expressive of a state of affairs concerning 'sources of declared relations'. This gets more interesting because there are many ways in which we can conceive of a ‘source of declared relations’. It really depends on what we think is declared by a source. In social network analysis, we might believe that "affection" or "attachment bonds" are declared. But familial relations (which are likely to be stronger emotional bonds) sit alongside friendship relations, which suggests a wide range in the strength of bonds that isn't necessarily borne out by the positioning of different nodes on the graph.
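One way to make the point about arcs concrete is a small sketch in Python. Everything here is an illustrative assumption rather than anything taken from a real network: the names "a", "b" and "c", the relation kinds, and the two helper functions are all hypothetical. The sketch builds each node from nothing but the relations it declares, and then shows that the arcs of the diagram can be recovered from the nodes alone.

```python
from collections import defaultdict

# Hypothetical declared relations: (source, target, kind).
# The names and kinds are invented purely for illustration.
declarations = [
    ("a", "b", "like"),
    ("a", "c", "comment"),
    ("b", "a", "like"),
    ("c", "a", "like"),
]

def sources_of_declared_relations(declarations):
    """Build each 'node' from nothing but the relations it declares."""
    nodes = defaultdict(set)
    for source, target, kind in declarations:
        nodes[source].add((target, kind))
        nodes[target]  # touching the key registers a target that declares nothing itself
    return dict(nodes)

def arcs(nodes):
    """Recover the arcs of the diagram from the nodes alone."""
    return sorted((s, t, k) for s, rels in nodes.items() for t, k in rels)

nodes = sources_of_declared_relations(declarations)

# The arcs carry nothing that the nodes do not already contain.
assert arcs(nodes) == sorted(declarations)
```

Nothing in this structure says that "a" is a person. It is only a key that collects declarations, which is why the same shape could serve for other kinds of source.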

Seeing a node as a “source of declared relations” to other sources of declared relations means that different characterisations of a ‘node’ can also be made. Instead of seeing a 'source of declared relations' as a person, we might instead see it as a “document”. How does a document declare its relations? It does so through the words and phrases it uses. A simple example of this is the ‘citations’ one document makes of other documents: a citation is a declared relation of this sort. Seeing documents as sources of declared relations in this way means that metrics of the ‘influence’ of documents can be produced, and (in particular) metrics of the influence of the authors of documents (where an author is a source of these sources of relations to other sources of relations). However, a document may also be related to other documents through undeclared relations. These are the words and phrases which are present in one document and also in other documents.
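The difference between declared and undeclared relations among documents can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the three documents, their texts and their citations are invented, and the use of Jaccard overlap of word sets as a measure of shared vocabulary is a choice made here for illustration, not a method proposed in the post.

```python
import re

# Hypothetical documents: invented texts and citations, for illustration only.
documents = {
    "doc1": {"text": "Pattern and constraint in ecological thought", "cites": ["doc2"]},
    "doc2": {"text": "Constraint and pattern in social networks", "cites": []},
    "doc3": {"text": "Notes on beetle limbs", "cites": []},
}

def words(text):
    """Lower-cased set of the words used in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def declared_relations(docs):
    """Citations: relations a document explicitly declares."""
    return [(source, target) for source, d in docs.items() for target in d["cites"]]

def undeclared_relation(docs, a, b):
    """Shared vocabulary as Jaccard overlap (an assumed measure, not the author's)."""
    wa, wb = words(docs[a]["text"]), words(docs[b]["text"])
    return len(wa & wb) / len(wa | wb)

print(declared_relations(documents))                   # [('doc1', 'doc2')]
print(undeclared_relation(documents, "doc1", "doc2"))  # 0.5
print(undeclared_relation(documents, "doc1", "doc3"))  # 0.0
```

Here doc1 and doc2 are linked both by a citation and by shared words, while doc3 shares neither. In real material the two kinds of relation need not coincide in this way.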

Given all of this, we can return to the question about the empirical nature of big data. Empiricism relies on regularities. Where are the regularities which relate data analytics and the world? On the one hand, we might say that data analytics is a variety of statistics, and that like statistics, data analytics produces regularity-by-proxy. In other words, statistics permits us to wash over the details of individual cases, to produce broad-brush averages of results, and then to produce regularities among those broad-brush pictures according to different sorts of interventions (for example, smoking and cancer). It may be that, with data visualisations, the principal question concerns the correlation between the viewing of visualisations and individuals' understanding of the phenomena represented. What might this tell us about what must be going on in the heads of people whose communications, data and language are represented pictorially? Maybe visualisations of data, and our reactions to them, give us a way of exploring the different models of agency that are presupposed by our different ideas about information, or our different ideas about agency...

It certainly appears to be the case that in our excitement at being able to produce pretty pictures from data, we have overlooked the need to identify regularities which require explanation. I am sure that if we look harder, we will see correlations between social environment, practice, declarations of social relations, habits, routines, etc. I am sure that if we seek to explain regularities arising from these, our understanding of ourselves and our relation to technology will increase. 

1 comment:

  1. The post is very much valuable regarding daily improvisation and rethinking.
