About a year ago, a participant in my Coursera course "Big Data Specialization" asked the title question in one of the forums. When I came across it about four months ago, I tried, along with other participants, to answer it.
Such distinctions are of course inherently difficult. The post asking the question even lumped Business Intelligence and Business Analyst together. I also always find it interesting when I see such distinctions drawn in a Big Data or Data Science book. As a rule, I find them somewhat odd…
I had spent some time reading the discussions and thinking about them. My answer ended up being fairly extensive, which is why I thought it might be worth reproducing it here.
Very interesting discussion. As a BI and DW specialist who is learning more about Data Science and Big Data, let me add my part to this discussion.
Where I live and work (Germany), you can clearly differentiate between a Data Scientist and someone doing BI and DW. This is not primarily about tools, and from my point of view it is changing over time.
Data Scientists I know work with NoSQL, maybe the Hadoop ecosystem and Spark, and more and more in the cloud. Data comes from everywhere and can be structured or unstructured: social media, IoT, business data, … And they work with machine learning, statistics and also visualizations. For example, deep learning with TensorFlow and Keras is very popular, as is Tableau for visualization and storytelling. Some of them are highly specialized in certain domains like IoT/time series or banking (fraud detection, …).
So typical BI/DW tools (DBMS, visualization tools) are also used by Data Scientists. What I would expect from a Data Scientist is experience in working with mathematical methods and machine learning, and knowledge of specialized tools like KNIME or of programming with R or Python.
CRISP-DM is a typical process and can be found in different variants. As a result, Data Scientists find and explain interesting patterns in data and/or implement data-driven solutions to optimize the business or extend existing business models (or create new ones like Uber, Spotify, Google, Amazon, …).
But in the end I’m not a Data Scientist. So these are the things I’ve learned, and what may still be missing for me to become one, if ever…
As a BI/DWH guy I follow the process ETL->DWH->BI, typically with internal business data. My job is to extract, integrate and harmonize data from different sources like ERP systems or databases. We try to create an efficient, current (as needed) and integrated high-quality base of data in a core data warehouse (a database) which delivers transactional and master data based on the business specification.
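A minimal sketch of the extract, integrate and harmonize step, under stated assumptions: the two source systems, their field names (`KUNNR`, `NAME1`, `LAND`, `customer_id`, …) and the rule that the first source to deliver a key wins are all hypothetical and only illustrate the idea, not any particular ETL tool.

```python
# Hypothetical extracts: two systems describe overlapping customers
# with different field names and formats (all names are illustrative).
erp_rows = [
    {"KUNNR": "0001", "NAME1": "Acme GmbH ", "LAND": "DE"},
    {"KUNNR": "0002", "NAME1": "Globex AG", "LAND": "ch"},
]
crm_rows = [
    {"customer_id": 2, "name": "Globex AG", "country": "CH"},
    {"customer_id": 3, "name": "Initech", "country": "us"},
]


def harmonize_erp(row):
    """Map an ERP record onto the shared target structure."""
    return {
        "customer_id": int(row["KUNNR"]),
        "name": row["NAME1"].strip(),
        "country": row["LAND"].upper(),
    }


def harmonize_crm(row):
    """Map a CRM record onto the same target structure."""
    return {
        "customer_id": int(row["customer_id"]),
        "name": row["name"].strip(),
        "country": row["country"].upper(),
    }


def integrate(*sources):
    """Merge harmonized records; the first source to deliver a key wins."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["customer_id"], record)
    return [merged[key] for key in sorted(merged)]


# One integrated master-data list, ready to load into the core DWH.
customers = integrate(
    [harmonize_erp(r) for r in erp_rows],
    [harmonize_crm(r) for r in crm_rows],
)
```

In a real project the precedence rule between sources would come from the business specification rather than from the order of the arguments.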
In the times before in-memory databases, we modeled dimensional schemas that deliver data quickly and flexibly for queries, reports, dashboards, OLAP analysis or further applications like planning and data mining. For reports and dashboards, defining key performance indicators (KPIs) and having a good understanding of the transactional process and master data is very often necessary and part of the project. In the end we deploy the report with BI clients, embedded, in a BI portal, on mobile and so on.
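To make the idea of a dimensional schema and a KPI a bit more concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the sample rows and the KPI "average price" are assumptions chosen for illustration only.

```python
import sqlite3

# In-memory database with a tiny star schema: one fact table, two dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    revenue REAL,
    quantity INTEGER
);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240101, 100.0, 2),
    (1, 20240201, 150.0, 3),
    (2, 20240101, 300.0, 1);
""")

# A typical report query: aggregate facts along dimension attributes
# and derive a KPI (average price = revenue / quantity).
kpi_rows = con.execute("""
    SELECT p.category, d.year,
           SUM(f.revenue) AS revenue,
           SUM(f.revenue) / SUM(f.quantity) AS avg_price
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()

for category, year, revenue, avg_price in kpi_rows:
    print(category, year, revenue, avg_price)
```

The same fact table can be sliced by month or by product simply by changing the `GROUP BY` columns, which is the flexibility the dimensional model is meant to provide.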
While machine learning in DS is rather data-driven, OLAP analysis is hypothesis-driven and manual work. In the end both can be done on a DW.
I think on a high level a lot of the tasks are very similar. Gathering data. Loading data on time or regularly into some kind of database. Integrating data, either before doing analysis (BI/schema on write) or while doing analysis (DS/schema on read). Testing the solution and deploying it. Maybe working on strategy, governance, operations, authorizations, optimization and so on.
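The difference between schema on write and schema on read can be sketched roughly as follows. The sensor events, the field names `temp` and `temperature`, and both function names are hypothetical; the point is only where the interpretation of the data happens.

```python
import json

# Hypothetical raw events as they might arrive from different senders.
raw_events = [
    '{"sensor": "s1", "temp": "21.5", "ts": "2024-01-01T00:00:00"}',
    '{"sensor": "s2", "temp": "n/a", "ts": "2024-01-01T00:00:00"}',
    '{"sensor": "s1", "temperature": 22.0, "ts": "2024-01-01T01:00:00"}',
]


def load_schema_on_write(lines):
    """Schema on write: validate and type each record before storing it."""
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            table.append(
                {"sensor": str(record["sensor"]), "temp": float(record["temp"])}
            )
        except (KeyError, ValueError):
            rejected.append(line)  # does not fit the fixed target schema
    return table, rejected


def read_with_schema(lines):
    """Schema on read: keep raw lines, interpret them only during analysis."""
    for line in lines:
        record = json.loads(line)
        value = record.get("temp", record.get("temperature"))
        try:
            yield record["sensor"], float(value)
        except (TypeError, ValueError):
            continue  # unusable for this particular analysis


table, rejected = load_schema_on_write(raw_events)
readings = list(read_with_schema(raw_events))
```

With schema on write, the third event is rejected at load time because it does not match the agreed structure; with schema on read, the analysis itself decides that `temperature` should be treated like `temp`.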
For both there are a lot of tools, methods and approaches for doing all this. In recent years I have seen, on the one hand, more and more classical BI vendors opening up to Data Science and Big Data approaches, bringing both worlds together. On the other hand, I see in both areas that these are not jobs for a single unicorn but for maybe two people (like a Data Engineer and a Data Scientist) or a whole team. It is the same in BI: very often we have specialists for ETL/DW, for BI clients or for planning.
Hope this helps a little bit for future learners.
Maybe one last point. Data Science attracts much more interest these days 🙂 while BI/DW has been around for a long time and is used in a broad range of businesses today. I’m looking forward to learning more and seeing what happens with these topics in the coming years.
Later in the thread, a mentor also replied with the following opinion:
"In my view, the largest distinction between business intelligence and data science is that the former focuses on reporting what happened in the past, and the latter focuses on predicting the future."
This is a statement I hear again and again, and one I find somewhat odd and, at best, somewhat inadequately put. My answer to it:
I think no one in BI builds a report just to see what happened. This is an interesting discussion which comes up very often. Machine learning also analyzes past data, because you don’t have future data…
In BI you work with planning and forecasting (which may or may not be based on predictive analytics). You analyze past patterns and current trends in data to understand influences and changes, in order to make predictions about the future and support decisions. You simulate and enrich this with expert knowledge, such as changed processes, planned promotions or new logistics technologies, which cannot be predicted but perhaps only calculated or simulated.
In BI you also close the loop and bring analytical information back to ERP/OLTP or other operational systems to support or automate decisions.
The difference between BI and DS here is perhaps that in BI, decisions and analysis are mostly done manually and are hypothesis-driven, while DS implements solutions that learn by machine and are data-driven.