Thing 19: Text and data mining

So, data mining or, more specifically, text mining is about using a computer program to search for a specific word or phrase in a sea of information. And anyone who can't write such a program themselves might find one on GitHub.
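To make that concrete, here is a minimal sketch in Python of what such a program might look like. It is only an illustration. The function name find_phrase and the sample documents are made up for the example, and real TDM tools do a great deal more than this.

```python
import re


def find_phrase(documents, phrase):
    """Return {document name: [line numbers that contain the phrase]}."""
    # Escape the phrase so it is matched literally, and ignore case.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = {}
    for name, text in documents.items():
        lines = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)
        ]
        if lines:  # only record documents with at least one match
            hits[name] = lines
    return hits


# Hypothetical sample data standing in for a real corpus.
sample = {
    "paper_a.txt": "Text mining finds patterns.\nNothing here.\nMore on text mining.",
    "paper_b.txt": "An unrelated abstract.",
}

print(find_phrase(sample, "text mining"))
```

Run as it stands, this prints {'paper_a.txt': [1, 3]}, meaning the phrase turns up on the first and third lines of the first document and nowhere in the second.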

It’s a world away from my idea of research but an equally valid and potentially incredibly useful one.

The major pitfall (excuse the pun), of course, is that the source material through which you search may be protected by copyright or, in the case of databases, by database rights. This has got to be a huge problem for researchers who rely on TDM to produce results.

I don’t expect it’s a problem I will run into very often, but knowing that it exists, and where my researchers can get help, is enormously important.

I’d love to see the process of TDM in practice. I’m sure I’ve written incredibly basic computer programs in the past that touch on this: that old BBC Micro program in BBC BASIC I built to “stop” a process when it met an “obstacle”, for instance.



