Data science is a state of mind

Recently a number of people have emailed asking how they can become data scientists. What degree should they get? What experience? What’s the career path? These are tough questions because, almost by definition, data science crosses multiple disciplines. Whether you analyze business, healthcare, nuclear fusion or weather, the details of data science are always different. Yet what makes something “data science,” far more than any domain or technology, is the approach. It’s a process, not an event. It’s a “how,” not a “what.” Data science is a state of mind.

Anyone can get involved with data science. You don’t have to be a statistician or computer wizard. It isn’t about your finesse with Hadoop, Mahout or Python (and if you don’t know what those are, don’t worry about it). Sure, a deeper understanding of the technical mechanics can really accelerate your efforts, but that’s exactly how to think of those tools: as mechanics. Technology is a shoe, not a foot – it makes the journey better but doesn’t take you anywhere by itself. Far more important are the question at hand and your process for tackling it. In fact, if you follow a few simple rules you may be surprised by how far you can get with no technology at all.

For example, we often hear the argument that “visionary” leadership is required for a company to be great. To a non-data-scientist, this is an interesting philosophical point to banter about, bringing to mind visionary examples like Steve Jobs or Jack Welch. A few generations ago the same could have been said for Rockefeller or Carnegie. Yet a data scientist takes this “visionary leadership” theory in a different direction.

When hearing “visionary leadership is required for a company to be great,” a data scientist sees a hypothesis begging to be tested. Step 1 is to find the “positive.” Ask yourself how you’d know if a company had visionary leadership in the first place. What are the explicit criteria for vision? If your criterion is “I know it when I see it,” that’s not good enough. It needs to be concrete.

Step 2 is to find the “negative.” How would you know if a company did not have visionary leadership? This is key. If you don’t know explicitly when a condition doesn’t exist, there’s no way to test it.
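
Steps 1 and 2 amount to writing down two explicit tests and checking that they never both apply to the same company. A minimal Python sketch follows; the Company fields and the sample criteria are hypothetical placeholders chosen for illustration, not a proposed definition of vision.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Label = Literal["positive", "negative", "undetermined"]


@dataclass
class Company:
    name: str
    # Hypothetical observable attributes standing in for your own concrete criteria.
    stated_long_term_strategy: bool
    strategy_documented_before_outcome: bool


def classify(
    company: Company,
    is_positive: Callable[[Company], bool],
    is_negative: Callable[[Company], bool],
) -> Label:
    """Apply explicit positive (Step 1) and negative (Step 2) criteria to one company."""
    pos = is_positive(company)
    neg = is_negative(company)
    if pos and neg:
        # Overlapping rules mean the definitions are not concrete enough yet.
        raise ValueError(f"{company.name}: positive and negative criteria overlap")
    if pos:
        return "positive"
    if neg:
        return "negative"
    # Neither rule fired: the criteria leave a gap, which is itself worth noticing.
    return "undetermined"


# Hypothetical criteria, for illustration only.
def visionary(c: Company) -> bool:
    return c.stated_long_term_strategy and c.strategy_documented_before_outcome


def not_visionary(c: Company) -> bool:
    return not c.stated_long_term_strategy
```

With these placeholder rules, a company whose strategy was only written down after it succeeded lands in “undetermined.” That is the hindsight problem in miniature: the criteria have to be fixed before you look at outcomes.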

Step 3 is to start counting stuff. Find random examples of companies, some great, some failures, and figure out for yourself which did (or didn’t) have visionary leaders according to the strict criteria. If a lot of the winners had visionary leaders but the losers didn’t, you may be onto something. If there isn’t any pattern one way or another, the whole visionary leadership thing may be revealed as a poor indicator of likely performance.
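
Step 3 is just tallying a two-by-two table: leadership (visionary or not) against outcome (great or not). The sketch below assumes each company has already been labeled with strict criteria; the six observations are made-up placeholders, not real data.

```python
from collections import Counter

# Hypothetical placeholder observations, NOT real data:
# each pair is (had_visionary_leader, company_was_great).
observations = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]


def tally(obs):
    """Count how many companies fall into each (leader, outcome) cell."""
    return Counter(obs)


def success_rate(table, visionary):
    """Share of companies in one leadership group that turned out great."""
    great = table[(visionary, True)]          # Counter returns 0 for empty cells
    total = great + table[(visionary, False)]
    return great / total if total else float("nan")


table = tally(observations)
for visionary in (True, False):
    label = "visionary" if visionary else "not visionary"
    print(f"{label}: {success_rate(table, visionary):.2f} great")
```

For the placeholder data this reports 0.67 for the visionary group and 0.33 for the rest. A gap like that is a reason to look closer, not a conclusion; with six companies it could easily be chance.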

Theories like “vision” can sound reasonable at first blush, but these three steps and as little as 10 minutes of Googling can quickly shed a lot of light. Most people would be hard pressed to get past Step 2, since vision is really hard to define – especially without the benefit of hindsight. If a definition of vision survived Steps 1 and 2, it could still run into trouble with Step 3. For every visionary success there are truckloads of visionary failures (think Pets.com). If you can’t really define the positive or the negative, or there’s no significant relationship between those attributes and the stuff you’ve counted, at the very least you’ve learned not to put much stock in the theory. You don’t need a supercomputer. You don’t need a terabyte of data. You just need to start thinking about things in a slightly different way and voilà… you’re doing it.
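
Whether counted data show a “significant relationship” can be checked with a standard test for a two-by-two table. This sketch uses Fisher’s exact test, a technique chosen here for illustration rather than one the article prescribes. In practice scipy.stats.fisher_exact computes this kind of p-value; the standard-library version below only makes the arithmetic visible.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: visionary / not visionary; columns: great / not great.
    """
    row1 = a + b          # companies with a visionary leader
    col1 = a + c          # companies that turned out great
    n = a + b + c + d
    total = comb(n, row1)

    def prob(k: int) -> float:
        # Hypergeometric probability that k of the row1 visionary companies
        # are great, holding all row and column totals fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(k) for k in range(lo, hi + 1)]
    # Sum every possible table at least as unlikely as the observed one
    # (the small tolerance absorbs floating-point rounding).
    p = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(p, 1.0)
```

For example, fisher_exact_two_sided(3, 0, 0, 3), a perfect split across six companies, gives a p-value of about 0.1, so even a clean pattern in a tiny sample is weak evidence on its own.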

These three steps are the Scientific Method at its most basic. Is there more to data science? Yes, of course. It’s an entire discipline, but that’s not to say you have to be a ninja on day 1 just to get started. Anyone can take the first step. The hard sciences have all been fashioned around this process, and it isn’t unique to “data science” per se. Data science is a broad umbrella defined by its use of the Scientific Method, in conjunction with data and evidence, to try to better understand things. Is a medical researcher using data science when she looks at clinical data to find correlations? Yes. Is a high-performance computing expert using data science when writing algorithms to simulate an earthquake? Yes. Is a marketing person using data science when she wants to get to the bottom of the whole “visionary leadership” theory using a pencil, paper and these three basic steps? Absolutely.

Our advice to anyone wanting to become a data scientist is this: just jump in and get started. Don’t be intimidated and don’t wait around for a decree from Mount Olympus. Find a burning question you’re passionate about and start pondering how to take it apart through the simple steps of (1) finding the positive, (2) finding the negative and (3) counting stuff. You may not have much data, but even a little is better than nothing. As we say, a small “n” (the number of data points in a sample) is better than an “n” of zero. More than anything, data science is about looking at things from a different perspective, with a different set of rules, in the search for knowledge. It’s a state of mind we encourage anyone to try.

This Post Has 10 Comments

  1. Kendall

    Nice Thomas. As always you’ve made it all a little less scary and explained it in a way any non-techie can understand. Your articles are positive and get through to experts and nonexperts alike. Nonexperts can’t appreciate the benefits of data science until they realize it is understandable and within their abilities to comprehend.

  2. Ravi

    Yes, thanks. A very nice introduction to getting started in data science.

  3. Nicolien

    Data Science is DEFINITELY NOT a state of mind, nor a watered-down 1-2-3 step analysis. To say that anyone can come to any significant, meaningful and useful conclusion, within ANY field of research or study, is in my opinion careless and irresponsible. The scientific process calls for discipline, diligence, experience and insight within a specific field of knowledge, not to mention the importance of peer review.
    I do believe that knowledge is life changing and would encourage EVERYONE to read, analyse and think – but googling and doing a quick positive/negative analysis of data does not make you a data scientist! Knowledgeable people would soon find loopholes and probably discredit your analysis and your name.

  4. Jean

    Thanks. That’s a very validating article 🙂

  5. Ravi

    This article was clearly intended as an introduction to data science, not a full symposium on the scientific method, statistics, big data and everything else that could possibly be considered. In this respect it did a good job in my view, and some of the comments in this thread may be overreacting beyond the article’s intent. Yes, it could go into more detail on methodology and the philosophy of science, but I didn’t get the sense it was claiming to or overreaching. It did a nice job of what it set out to do: inviting people to see data science as less alienating and to begin seeing more value in it.

  6. Emily

    I get the feeling we all value the scientific method, and this is a nice welcoming starter.

  7. Irma V. Stafford

    The word data is the plural of datum, neuter past participle of the Latin dare, “to give”, hence “something given”. In discussions of problems in geometry, mathematics, engineering, and so on, the terms givens and data are used interchangeably. Such usage is the origin of data as a concept in computer science or data processing: data are numbers, words, images, etc., accepted as they stand.

  8. Milo Boone

    With the 1950s came increasing awareness of the potential of automatic devices for literature searching and information storage and retrieval. As these concepts grew in magnitude and potential, so did the variety of information science interests. By the 1960s and 70s, there was a move from batch processing to online modes, from mainframe to mini and microcomputers. Additionally, traditional boundaries among disciplines began to fade and many information science scholars joined with library programs. They further made themselves multidisciplinary by incorporating disciplines in the sciences, humanities and social sciences, as well as other professional programs, such as law and medicine, in their curriculum. By the 1980s, large databases, such as Grateful Med at the National Library of Medicine, and user-oriented services such as Dialog and CompuServe, were for the first time accessible by individuals from their personal computers. The 1980s also saw the emergence of numerous special interest groups to respond to the changes. By the end of the decade, special interest groups were available involving non-print media, social sciences, energy and the environment, and community information systems. Today, information science largely examines technical bases, social consequences, and theoretical understanding of online databases, widespread use of databases in government, industry, and education, and the development of the Internet and World Wide Web.

  9. Jerry

    What Ravi said, especially the second response. Relax Nic!

  10. Lucille C. Glenn

    Repeat. Science is an endless process. And, like software engineering, data science at its best is agile and iterative.
