Data, whether big, small or indifferent, is really all about counting. Typically, 'big data' involves the counting of words. I rather like the idea of 'indifferent data': how would you count that? Since it is indifferent, one would think there is not much in it that is remarkable, or countworthy. But what makes 'big data' so countable? What, in fact, are we counting?
There are two things to clarify here. If you've only heard of 'big data' as the new scientific buzzword, but not thought about exactly what it is, then my statement that it is simply about "counting" might strike you as a crude oversimplification. But it's not: most techniques of data analysis rely on the probabilistic (hmmm - that's problematic!) theories of Shannon or Bayes, each of which relies on counting like-events and distinguishing them from unlike-events. Yet by counting words in big data - Facebook posts, tweets, blogs and so on - we can indeed create remarkable inferences: Google Translate does a pretty good job of converting one language to another by simply counting words! But the success of Google Translate raises more questions: what is it that happens when we 'count' anything?
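To make the link between counting and Shannon concrete, here is a minimal sketch in Python. It assumes the simplest possible notion of a 'like-event' (two tokens are alike if they are the same string), and the function name `shannon_entropy` and the sample sentence are my own illustrations rather than anything from a particular library. The point is only that the measure is computed from nothing but counts.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Entropy, in bits, of the empirical distribution of the tokens.

    Assumption: two tokens are 'like-events' only if they are identical strings.
    """
    counts = Counter(tokens)          # count like-events
    total = sum(counts.values())      # total number of events
    # H = -sum(p * log2(p)), where each p is just a count divided by the total
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample text, split naively on whitespace
tokens = "the cat sat on the mat the end".split()
print(Counter(tokens).most_common(3))
print(shannon_entropy(tokens))
```

Change what counts as 'the same' token and the number changes with it, which is exactly the question the counting raises.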
David Hume puzzled over this more than two hundred and fifty years ago. His question was how scientific knowledge was possible; our question now is something like "how is big data analysis scientific"? Yeats says "measurement began our might", no doubt partly thinking of Blake's foolish Urizen (whom Blake saw as a demonic Newton) using his dividers to map the heavens, but also acknowledging that some balance had to be struck between Bacon, Newton, Locke and the poets. Hume is perhaps the figure whose scepticism is well-placed to create Yeats's balance. He saw that counting required the identification of analogies: that a 1, by virtue of its similarity to another 1, together with it makes 2, and that knowledge is founded upon the induction that other 1s will also be analogous. Yet, Hume asks, upon what grounds is this similarity determined? The question is more pressing when we try to count words. How many times do I say 'words' in this document? Is each use of the word 'words' an equivalence? Does it mean that each time I mean the same thing? Might I not be like Humpty Dumpty in Through the Looking-Glass and say that when I use a word, it means whatever I want it to mean?! And what if I look for the word "word", rather than the word "words"? What then?
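The "word" versus "words" question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`count_under`, `exact`, `case_folded`, `crude_stem`), the sample sentence and the deliberately crude rule of stripping a final "s" are all hypothetical choices of mine, not a recommended method. Each function is one decision about which tokens are analogous, and the count depends entirely on that decision.

```python
import re
from collections import Counter


def count_under(text, same_as):
    """Count tokens after mapping each one to its equivalence class via same_as."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return Counter(same_as(t) for t in tokens)


def exact(token):
    # 'Words' and 'words' are different things
    return token


def case_folded(token):
    # 'Words' and 'words' are the same thing; 'word' is still different
    return token.lower()


def crude_stem(token):
    # Hypothetical, deliberately naive rule: drop a final 's' from longer tokens
    t = token.lower()
    return t[:-1] if t.endswith("s") and len(t) > 3 else t


sample = "Words are not a word. When I use a word, words mean what I choose."

print(count_under(sample, exact)["words"])        # 1
print(count_under(sample, case_folded)["words"])  # 2
print(count_under(sample, crude_stem)["word"])    # 4
```

None of the three answers is the right one in itself: each is defensible only once we have agreed what is to count as the same word, and none of them touches the question of whether the meaning was the same each time.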
Counting is the determination and aggregation of analogies, and of surprises - or anomalies. Analogies are not determinable without the determination of anomalies. Fundamentally there is a distinction. More fundamentally, these distinctions have to be agreed - at least between scientists. Since Hume's scientific epistemology was all about the agreement between scientists about causes, the agreement about analogies is a pretty central part of that. Actually, Hume wasn't entirely clear on this. It took a 20th-century genius to dig into this problem, as he grappled with the madness of mathematical abstractions in economics. John Maynard Keynes's "A Treatise on Probability" of 1921 is his masterwork (not so much the General Theory, which owes so much to it). The Keynesian twist is to see that the business of 'analogising' in order to count is a continual process of breaking down things that we initially see to be 'the same' (in other words, things that we are indifferent to) and gradually determining new surprises (anomalies) and new analogies.
The point is that the agreement of analogies and anomalies is a conversation between scientists. Without actual embodied participation in the phenomena which produce the analogies and anomalies, there is no way of coordinating the conversation. Without any way of coordinating the conversation, there is an encroaching mysticism: nonsense explanatory principles take over - the 21st-century equivalent of phlogiston, or the 'dormitive principle' of opium in Molière. Data becomes a religion divorced from science. Education driven by data in this way is also divorced from science. We end up in the worst-case scenario: an educational system renouncing the humanities and arts because they are unscientific, whilst embracing a science which is in thrall to quackish data analysis!
Can data restore the scientific balance? Can we answer the question "how is big data analysis scientific"? The trick, I believe, is to see the identification and counting of analogies and anomalies as the identification of constraints - the identification of what is not there. The problem with Western science is that it has become over-focused on causation, or presence and actuality. Education is a domain which shows that causation is clearly a nonsense concept: so much idiocy has been devoted to the 'causes of learning' - including the forthcoming Teaching Excellence Framework. (There's no point in fighting the TEF: it will happen, it will fail - and maybe then people will think harder.) But science really is about constraint, because all living and non-living things realise the possibilities of their existence within constraints. In education, 'realising the possibilities of existence' is something we call "learning". Teachers manipulate the constraints - if they are themselves free enough of constraints to do the job properly.
We can count words in documents and in doing so we can learn something about the constraints we are operating within. In this blog post, English grammar constrains me as much as the meaning I am trying to convey. We can agree the analogies of our counting. We can critique the analogies of our counting, and seek new analogies and anomalies to focus on. Each step of the way we discover more about the conditions within which we live and the ways those conditions are reproduced and transformed by us. We can, of course, do much more than count words in documents: there are analogies to be found everywhere; new defensible "countings" to be performed. At each level, we see what is not. We will see how warped our education system has become, how its ecology is under threat, how the collapse of university education into apparently 'successful' businesses threatens civil society, how the market in education works like CFCs on the ozone of our social fabric. This is the beginning of an educational metatechnology.
So measurement did "begin our might". But the language of poets and musicians can also be counted in ways which show how an aesthetic ordering of constraint - of what is not - might be coordinated for the flourishing of an ecological social fabric.