The tech world is home to endless possibilities. While it’s fascinating to see them in action, it can sometimes be challenging to fully understand the technologies behind them.
One such example that is the talk-of-the-town these days is data science. So what really is data science? This is exactly what we are going to talk about today. So stick around.
According to Grand View Research, the data science market is estimated to reach a value of $25.94 billion by 2027 with a CAGR of 26.9%.
Reports by Statista state that the big data market will generate a revenue of $103 billion by 2027.
Promising stats, right? So how is it crunching these staggering numbers? What’s all this hype about? Let’s get into the meat & potatoes of the subject.
Data science is not just about making complicated models & graphs, fancy visualizations, or using programming languages like Python for writing code.
It is about using data to add value to your company, help it make informed decisions, and create as much positive impact as possible.
Now, value addition & impact can take various forms. It could be valuable insights, a product, or product/service recommendations.
It could essentially be anything. To be able to do all that, you need data science skills, fancy visualizations, and coding as well.
Knowing what data scientists do will give you an understanding of what data science really is.
Data science is about solving real company problems using data, and there are no restrictions on which tools are used.
There are a few misconceptions about the subject, and one reason is the huge gap between what’s popular to talk about and what actually goes on in the industry.
Before data science was popularized, the term “data mining” was mostly used. The scholarly article “From Data Mining to Knowledge Discovery in Databases”, published in 1996, used it in the context of the overall process of discovering valuable information from data.
In 2001, William S. Cleveland took data mining to another level by combining it with computer science. He basically made mining data a lot more technical because he believed that would pave the way for more possibilities and powerful innovation.
This is also around the time when the Web 2.0 revolution surfaced. With it, websites were no longer just digital platforms for accessing information but a medium for shared experiences among millions of users around the world. These revolutionary websites include MySpace (2003), Facebook (2004) & YouTube (2005).
We could now interact with the web: share content, throw a like, or comment on posts, leaving digital footprints behind and helping create an ever-growing ecosystem that has become an extension of ourselves and one that we love. And guess what?
All this digital activity is data. Loads of it. It was just too much to handle and we now know it as big data.
Big data opened doors to a whole new world of possibilities in finding valuable insights using electronic information. It also meant that new & more sophisticated infrastructure was required to support handling this unimaginable data.
It required powerful computing technologies like MapReduce, Hadoop, and Spark.
So the rise of big data sparked the rise of data science to support the growing business needs in drawing insights.
Today, data science is almost everything that has something to do with data such as data collection, analysis, & modeling.
The most important aspect, however, is the application. The most interesting & important applications are AI & ML, so let’s briefly talk about them.
Big data made it possible to train machines with a data-driven approach instead of a knowledge-based one, something that has the potential to influence the way humans make decisions and perceive the world.
Deep learning is no longer just a concept. It is now a reality and affects us on a daily basis. Machine Learning & Artificial Intelligence are dominating the world of data science and overshadowing other aspects such as exploratory analysis, experimentation, and even skills like business intelligence.
There is a general perception that data science is a bunch of researchers putting their heads together and focusing on AI & ML only.
But in reality, companies are hiring data scientists as analysts. You must be thinking…
While most data scientists work on the more technical aspects, GAFA companies (Google, Apple, Facebook, Amazon) have so much low-hanging fruit to pick that they don’t need advanced ML engineers only.
Again, a good data scientist isn’t just someone who can make exceptional data models & visuals. It is about the impact you can have with your skills. You’re not just crunching numbers.
You are a problem solver and a strategist. The employers will present you with problems and you are expected to guide them in the right direction.
Now that we have a better understanding of what data science really is, let’s conclude with some real-life examples of what Silicon Valley expects from a data scientist. But first, take a look at this chart.
(Source: Hackernoon)
It is a very useful chart that describes the basic elements of data science. At the foot, we have the “collect” step. This is obvious: we need to collect data before we can do anything with it.
The less well-known step sits between the “aggregate/label” and “learn/optimize” steps.
Everything here is actually among the most important work for companies. It’s the step where the data scientist guides the company on what to do with the product.
Here, the “analytics” give you insights such as: what is happening to my users, and how are they interacting with my product?
The “metrics” tell you whether you’re successful or not. Then we have “A/B testing” & “experimentation”, which show you things like which product version performs best.
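To make the A/B testing idea a little more concrete, here is a minimal sketch in Python. It assumes a simple conversion-rate experiment and uses a two-proportion z-test, which is one common choice and not necessarily what any particular company runs. The function name and the sample numbers are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of versions A and B.

    conv_a / conv_b: number of users who converted in each group.
    n_a / n_b: total number of users shown each version.
    Returns (z, p_value), where p_value is two-sided.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both versions perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in the data, cannot run the test")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 5,000 users per version.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two versions is unlikely to be pure chance. In practice you would also decide the sample size and the success metric before starting the experiment.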
These are crucial aspects of data science and what most Silicon Valley bosses will expect from you. However, the “AI/Deep Learning” part is what takes away a large chunk of the popularity.
But when you research it and give it some thought, it is not what most companies give the most importance to. Or at least, it does not yield the maximum outcome for the lowest amount of input. That’s why AI sits at the top of the chart.
So what data scientists do also depends on the company size. For example, startups mostly lack sufficient resources, and there might only be one data scientist.
That one DS will have to set up the whole data infrastructure. They will probably do everything on the chart except AI, either because of a lack of resources or because it is not the company’s priority. The job can require them to write code to add logging, analyze data, build metrics, and perform A/B testing.
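As a rough idea of what “adding logging and building metrics” can look like at a tiny scale, here is a sketch in Python. Everything in it is hypothetical: the in-memory list stands in for a real event store, and the names log_event and daily_active_users are invented for this example. Daily active users is just one metric a startup might pick.

```python
from collections import defaultdict
from datetime import datetime, timezone


def log_event(event_log, user_id, event_name, timestamp=None):
    """Append one user event to the log (a plain list in this sketch)."""
    event_log.append({
        "user_id": user_id,
        "event": event_name,
        "ts": timestamp or datetime.now(timezone.utc),
    })


def daily_active_users(event_log):
    """Count distinct users per calendar day from the logged events."""
    users_by_day = defaultdict(set)
    for event in event_log:
        users_by_day[event["ts"].date()].add(event["user_id"])
    return {day: len(users) for day, users in users_by_day.items()}


# Hypothetical usage with fixed timestamps.
events = []
log_event(events, "u1", "open_app", datetime(2024, 1, 1, 9, tzinfo=timezone.utc))
log_event(events, "u2", "open_app", datetime(2024, 1, 1, 10, tzinfo=timezone.utc))
log_event(events, "u1", "purchase", datetime(2024, 1, 2, 8, tzinfo=timezone.utc))
for day, count in sorted(daily_active_users(events).items()):
    print(day, count)
```

In a real product the events would go to a database or a data warehouse, but the flow is the same: collect events first, then aggregate them into a number the team can track.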
Medium-sized companies have more resources and can separate responsibilities between software engineers and data scientists.
Here the “collect” process is mostly software-engineer-centric, and the “move/store” & “explore/transform” parts are where data scientists do their magic.
Also, depending on business needs such as product recommendation models and other sophisticated requirements, DS might handle the AI aspect too.
Now let’s talk about large companies. The dynamics here are quite different. Because of the company size and availability of resources, there’s room for more manpower.
That means different people can be working on different things and performing tasks they are good at.
For example: If you are good at analytics, that could be your main focus. So you don’t have to worry about data engineering or AI.
Here the “collect” part is handled by software engineers, and the “explore” & “move” aspects are for data engineers. Between “aggregate” & “learn”, we have data science analytics.
As we move to the top, i.e., “AI and Deep learning”, the work is handled by Research Scientists who are backed by ML Engineers.
So there you have it. A glimpse into the world of Data Science & Data Scientists. Hope you understand it a little better now.
If you are looking to hire some of the best data scientists in the world, get in touch with ARFASOFTECH.