Data Science for CEOs: A Terminology Primer

This post is the first in a multi-part series aimed at explaining the core concepts of Data Science to CEOs and executive decision-makers.

Tim McFarland challenged me recently to ideate some small but potent technology primers for the members of the forums at Elevate Performance.

My advice there may end up being more broadly generalized, but it got me thinking about just how many CEOs are currently making decisions about Data Science. That's tricky if your organization is not mathematically or technologically focused, since the business community is inundated with buzzword-heavy sales pitches and impress-you-with-jargon marketing materials that ultimately cloud understanding.

Data Science this.  Analytics that.

For the CEO, it's important to understand the terminology of the field so that initiatives can be communicated effectively. If for no other reason, understanding the terminology serves as a defense against having that same terminology used against the CEO, whether by an eager consultant in a sales pitch or as a hand-waving technique from a colleague.

Let's get to it.

Data Science

Data Science isn't really a new field at all, although the excitement surrounding it has peaked in recent years. It's the intersection of Computer Science (software engineering) and the mathematical field of Statistics, though, as it turns out, statisticians have been using computer models in their field since the advent of computers.

The term has grown to include concepts enabled by relatively recent advances in computing capabilities, such as Machine Learning and other applications of Artificial Intelligence. Again, these things aren't new; they are just tremendously more accessible now that commodity hardware has the processing power to implement them and standardized tools have emerged. This has brought Artificial Intelligence out of research laboratories and into industry.

Metrics

With regard to Data Science, or data in general, a metric is simply a standardized measurement. A metric can be anything from a website page view to financial and accounting figures. If it can be consistently measured, it's a metric.

The collection of metrics is the first step toward Analytics. If you collect enough metrics, by whatever standard a given individual happens to apply, then you've got Big Data, a concept so arbitrary it's become practically meaningless.

Analytics

"Analytics" is perhaps one of the most widely abused terms in the business community. I'd wager that the majority of the time the term is used, it's actually just describing a set of metrics (metrics over time, for instance), which is not, in and of itself, truly analytics. More accurately, Analytics is the process of systematically analyzing a given set of metrics or data points. The term is also loosely applied to the output of an analytics process, such as visualizations of data and collections of observed insights.

If website page views are a metric, then Analytics would be the process of identifying time-series trends in page views, comparing page views in a given time period to those in a past time period, or predicting future page views based on a mathematical model applied to historical data.
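For readers who want to see what that looks like in practice, here is a minimal sketch of the three activities just described: smoothing page views into a trend, comparing one week to the week before, and projecting forward with a simple straight-line model. The page-view numbers are made up for illustration, and the function names (moving_average, period_change, linear_forecast) are hypothetical helpers, not part of any particular analytics product. A real analytics process would likely use a richer model than a straight line.

```python
from statistics import mean

# Hypothetical daily page-view counts for two consecutive weeks (made-up numbers).
page_views = [120, 135, 128, 150, 160, 155, 170,
              165, 180, 175, 190, 200, 195, 210]


def moving_average(values, window):
    """Trend: average each run of `window` consecutive values to smooth out noise."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def period_change(values, period):
    """Comparison: percent change of the latest period's total versus the prior period."""
    previous = sum(values[-2 * period:-period])
    latest = sum(values[-period:])
    return 100.0 * (latest - previous) / previous


def linear_forecast(values, steps_ahead):
    """Prediction: fit a least-squares straight line to the history and extend it."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


trend = moving_average(page_views, window=7)
change = period_change(page_views, period=7)
forecast = linear_forecast(page_views, steps_ahead=3)

print("7-day moving average:", [round(t, 1) for t in trend])
print("Week-over-week change (%):", round(change, 1))
print("Next 3 days (linear projection):", [round(f) for f in forecast])
```

The raw list of page views is the metric. The three derived results are the analytics.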

Big Data

I take it back. Big Data is almost certainly the most widely abused term in the business community. The best way to define the term is vaguely. For this reason, I like to think of Big Data as a collection of metrics that has grown large enough that an organization's analytics process must increase dramatically in complexity to process it.

For a small organization with a manual analysis process, Big Data might be more information than can be processed in an Excel spreadsheet.  For a medium-sized organization, Big Data might be more data than can fit into memory on a single computer, necessitating a switch to a multi-machine distributed architecture.  For Google, Big Data means engineering their own hardware and distributing computing across tens of thousands of computers in data centers around the world.

Thus the term tends to be a moving target, and an organization's concept of what is truly "Big Data" tends to evolve as its analytics process increases in capacity. In our opinion at Lofty Labs, true Big Data starts when datasets grow to many terabytes in size. At this scale, tools like Hadoop and Spark become necessary to distribute and analyze data with any efficiency.


In our next Data Science for CEOs, we'll dive into the field of Machine Learning and how it can be applied across a variety of industries.
