In 1973, when I entered the computer field, the standard format for data was an 80-column punched card and a 132-character line printer. Data was forced to fit into those boundaries, and many creative solutions were crafted to make that happen.
If we needed to store the data, we would put it on paper tape or magnetic tape — 800 bpi, then high density (1600 bpi, 6250 bpi) tape. One day, we installed a disk drive — 5 MB — and could update single records without copying whole files.
Today, we talk about “Big Data” and propose massive reconstruction of our technical mindset to deal with this novel problem. “Big Data” — in quotes and capitalized — is the raging problem of the day.
Does this mean that “Big Data” is more a statement of the limits of our technology (and our skills) than a statement of what is actually going on in the real world?
A member of the LinkedIn discussion group for the TDWI organization posted a question, asking:
“Where BIG DATA starts? 10+TB? 50+TB? 100+TB? 1+PT?”
This question focuses on the size of the data — the amount, or volume, of data under management — as the significant metric. That seems reasonable enough, since the data is characterized as “Big” in the first place. And volume is one of the three V’s that have been the hallmark of describing, if not defining, what “Big Data” is — the other two V’s are variety (the number of different types of data) and velocity (the rate at which data is created, delivered, or changed). Many of the responses to the question attempt to define “Big Data” as something that is unmanageable with traditional tools and techniques. Many also take for granted that because there is more data to be managed, all of it must be managed.
If we recall the transition over the decades — punched cards, paper tape, magnetic tape, “hi-density” mag tape, fixed disk, removable disk, databases, SMP, MPP — each one of these solved the “big data” problem of its day.
“Big Data” (note the capital letters, to indicate that this is something special and new) is the problem of our day. It will, in time, become passé as technology solves for it.
So, applying some metric to it seems pointless. It is, as someone pointed out in the discussion, a “big data” problem when you haven’t got the equipment, the skills, the tools, or the knowledge to deal with all the data you have on hand.
In the past, we solved this problem by reducing the data to what we could handle — and we did that reduction based on the “4th V” — value.
In the past, we actually did business analysis to determine what data provided value and what was just noise. Then, under the pressure of our limited skill, technology, time and budget, we made choices that fit the data to the business.
We seem to have abandoned this hard skill and these hard choices in the quest for “Big Data”. And, perhaps, there is not enough conversation about the “4th V” — the only “V” that really counts for something.