This page explains concepts related to data and their role in a data-driven world.

Data Science

Data science is a relatively new field of science that has only recently begun to be clearly defined. Its emergence is illustrated by the plot below.

The plot above shows that Google searches for the keywords “data science” are at an all-time high. The discipline itself is older than this recent popularity suggests. Moreover, the plot shows that the term was not widely adopted until almost 10 years after the first couple of peaks back in 2004.

According to a discussion hosted by the Royal Statistical Society, data science is not just statistics combined with computer science, but a field that encompasses specialised parts of both statistics and computer science, and is hence a science in its own right. Several topics are closely associated with data science; they are listed below and covered later on this page.

  • Statistics
  • Machine Learning
  • Big Data
  • Data Mining
  • Text Mining
  • Data Visualisations
  • Modelling
  • Web Scraping

Text Mining

Text mining is the process of converting unstructured data in the form of text into structured data, and from there into knowledge and actionable insights. The process often includes cleaning the text data to remove unwanted content and preparing it for analysis. An additional step is sentiment analysis, which adds an understanding of the valence (positive or negative tone) and context of the text to the analysis. Below is a basic example intended to give uninitiated but interested readers an initial understanding. The example exists in two versions: one with R-code and one without. The data set used is the Wine Mag data set from Kaggle.
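The cleaning, structuring and sentiment steps described above can be sketched in a few lines. The page's own example uses R; the Python sketch below only illustrates the general technique. It assumes a small hypothetical stopword list and a toy lexicon in which each word scores +1 or -1 and the scores are summed, a simple stand-in for real sentiment lexicons. The two reviews are invented for illustration and are not taken from the Wine Mag data set.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; real analyses use a much fuller list
STOPWORDS = {"a", "an", "and", "the", "of", "is", "this", "with", "it", "to"}

# Hypothetical toy sentiment lexicon: word -> valence score (+1 or -1)
LEXICON = {"great": 1, "delicious": 1, "fresh": 1, "bitter": -1, "flat": -1, "bad": -1}


def clean(text: str) -> list[str]:
    """Cleaning step: lowercase, keep only letter runs as tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(docs: list[str]) -> list[Counter]:
    """Structuring step: turn each free-text document into word counts."""
    return [Counter(clean(d)) for d in docs]


def sentiment(text: str) -> int:
    """Sentiment step: sum lexicon scores; > 0 leans positive, < 0 negative."""
    return sum(LEXICON.get(t, 0) for t in clean(text))


if __name__ == "__main__":
    # Invented example reviews, for illustration only
    reviews = [
        "A great, fresh wine with delicious fruit.",
        "Flat and bitter. This is a bad wine.",
    ]
    for review, counts in zip(reviews, term_frequencies(reviews)):
        print(dict(counts), sentiment(review))
```

Counting terms is one way to turn free text into a structured table, and summing lexicon scores is about the simplest form of sentiment analysis. Note that it ignores context such as negation ("not great"), which fuller approaches try to capture.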

Example with R-code

Example without R-code