Recently a number of people have emailed asking how they can become data scientists. What degree should they get? What experience? What’s the career path? These are tough questions because, almost by definition, data science crosses multiple disciplines. Whether you analyze business, healthcare, nuclear fusion or weather, the details of data science are always different. Yet what makes something “data science,” far more than any domain or technology, is the approach. It’s a process, not an event. It’s a “how,” not a “what.” Data science is a state of mind.
Anyone can get involved with data science. You don’t have to be a statistician or computer wizard. It isn’t about your finesse with Hadoop, Mahout or Python (and if you don’t know what those are, don’t worry about it). Sure, a deeper understanding of the technical mechanics can really accelerate your efforts, but that’s how to think about those things: as mechanics. Technology is a shoe, not a foot – it makes the journey better but doesn’t take you anywhere by itself. Far more important is the question at hand and your process for tackling it. In fact, if you follow a few simple rules you may be surprised by how far you can get with no technology at all.
For example, we often hear the argument that “visionary” leadership is required for a company to be great. To a non-data-scientist, this is an interesting philosophical point to banter about, bringing to mind visionary examples like Steve Jobs or Jack Welch. A few generations ago the same could have been said for Rockefeller or Carnegie. Yet to a data scientist this “visionary leadership” theory goes in a different direction.
When hearing “visionary leadership is required for a company to be great,” a data scientist sees a hypothesis begging to be tested. Step 1 is to find the “positive.” Ask yourself how you’d know if a company had visionary leadership in the first place. What are the explicit criteria for vision? If your criterion is “I know it when I see it,” that’s not good enough. It needs to be concrete.
Step 2 is to find the “negative.” How would you know if a company did not have visionary leadership? This is key. If you don’t know explicitly when a condition doesn’t exist, there’s no way to test it.
Step 3 is to start counting stuff. Find random examples of companies, some great, some failures, and figure out for yourself which did (or didn’t) have visionary leaders according to the strict criteria. If a lot of the winners had visionary leaders but the losers didn’t, you may be onto something. If there isn’t any pattern one way or another, the whole visionary leadership thing may be revealed as a poor indicator of likely performance.
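To make the counting concrete, here is a minimal sketch of Step 3 in Python. Everything in it is an assumption for illustration: the company names and labels are invented, and the helper names `tally` and `visionary_share` are hypothetical. The same tally works just as well with pencil and paper.

```python
from collections import Counter

# Hypothetical hand-labeled sample: (company, is_great, has_visionary_leader).
# Names and labels are invented for illustration; they are not real findings.
SAMPLE = [
    ("Company A", True, True),
    ("Company B", True, False),
    ("Company C", False, True),
    ("Company D", False, False),
    ("Company E", True, True),
    ("Company F", False, True),
]

def tally(sample):
    """Count companies in each (great, visionary) cell of a 2x2 table."""
    return Counter((great, visionary) for _, great, visionary in sample)

def visionary_share(counts, great):
    """Fraction of companies with the given outcome that had a visionary leader."""
    total = counts[(great, True)] + counts[(great, False)]
    return counts[(great, True)] / total if total else float("nan")

counts = tally(SAMPLE)
winners_share = visionary_share(counts, True)
losers_share = visionary_share(counts, False)
print(f"Visionary share among winners: {winners_share:.2f}")
print(f"Visionary share among losers:  {losers_share:.2f}")
```

In this made-up sample the two shares come out the same, which is what "no pattern one way or another" looks like. A real test would need the strict criteria from Steps 1 and 2 applied to companies chosen at random, and with only a handful of rows any gap between the shares should be read cautiously.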
Theories like “vision” can sound reasonable at first blush, but three steps and even 10 minutes of Googling can quickly shed a lot of light. Most people would be hard-pressed to move past Step 2, since vision is really hard to define – especially without the benefit of hindsight. If a definition of vision survived Steps 1 and 2, it could still run into trouble with Step 3. For every visionary success there are truckloads of visionary failures (think Pets.com). If you can’t really define the positive, or the negative, and there’s no significant relationship between those attributes and the stuff you’ve counted, at the very least you’ve learned not to put much stock in a theory. You don’t need a supercomputer. You don’t need a terabyte of data. You just need to start thinking about things in a slightly different way and voilà… you’re doing it.
These three steps are the Scientific Method at its most basic. Is there more to data science? Yes – of course. It’s an entire discipline, but that’s not to say you have to be a ninja on day one just to get started. Anyone can take the first step. The hard sciences have all been fashioned around this process, and it isn’t unique to “data science” per se. Data science is a broad umbrella defined by its use of the Scientific Method, in conjunction with data and evidence, to try to better understand things. Is a medical researcher using data science when she looks at clinical data to find correlations? Yes. Is a high performance computing expert using data science when writing algorithms to simulate an earthquake? Yes. Is a marketing person using data science when she wants to get to the bottom of the whole “visionary leadership” theory using a pencil, paper and these three basic steps? Absolutely.
Our advice to anyone wanting to become a data scientist is this: just jump in and get started. Don’t be intimidated and don’t wait around for a decree from Mount Olympus. Find a burning question you’re passionate about and start pondering how to begin taking it apart through the simple steps of (1) finding the positive, (2) finding the negative and (3) starting to count stuff. You may not have much data, but even a little is better than nothing. As we say, a small “n” (the number of data points in a sample) is better than an “n” of zero. More than anything, data science is about looking at things from a different perspective with a different set of rules in the search for knowledge. It’s a state of mind we encourage anyone to try.