Azure Databricks Series - Part 5
Databricks Jobs

Today’s post will cover Databricks Jobs. There is also the concept of a Spark job, which will be covered briefly to avoid confusion. Spark Job: When running a Spark application, there is the concept of a Spark job. At runtime, the Spark driver converts your Spark application into a job that is transformed…

Azure Databricks Series - Part 1
Intro to Azure Databricks

Many companies today have aging data architectures. As you look to modernize your traditional ETL pipeline, there is a tool you should keep in mind: Azure Databricks. During your move into Azure, there will probably be a place for Azure Databricks. In the past, general DTS/SSIS pipelines and SQL Server engines were sufficient but with…

Beginning Statistics for Data Science: Types of Data

Statistics is becoming a must-learn topic for anyone looking to get into data science. Look at any data scientist job posting, and you will be hard-pressed to find a listing that does not mention a degree in statistics, mathematics, or some experience in analytics as a minimum qualification. Courses in data science are including…

Data Science for Developers Webinar

What exactly is data science? How does one become a Data Scientist? Data Scientist has been labelled by the Harvard Business Review as “the sexiest job of the 21st century.” A quick search of job sites reveals that this field is in high demand. However, no one can agree on a common definition of…

Top Free Data Science Books

There are probably thousands upon thousands of tutorials, articles, videos, and blog posts on all things data science on the internet now. Yet I’m still a big fan of books. Throughout history, books have given wisdom, advice, and knowledge to everyone who wants to read them. Seneca, a Stoic philosopher, mentioned something similar: Men who…

Visualizing Sales Data in Python with Matplotlib

In our last post we interpreted a data set with pandas to gain some insights from it. In this post, we will do the same, but instead of interpreting the raw data we will use visualizations to help us determine patterns in the data. But before we dive into the implementation, let’s review the benefits…

Data Science and the Data Science Process

Before we get into the fun part of working with data, let’s break down how data science involves more than just statistics, why it’s becoming more important, and the data science process. Data Science vs. Statistics: In short, data science is extracting knowledge from data. But how is that different from statistics? Data science encompasses…

Microsoft Offers R Server Free to Developers

Microsoft is releasing a free developer edition of its Microsoft R Server, an analytics platform for enterprises based on R, the most popular programming language for statistical computing and predictive analytics. The company has been working to integrate R into its big data products since acquiring Revolution Analytics, a provider of R software and services,…

Azure Event Hubs Primer

Event Hubs are part of the Azure Service Bus service and are designed for extremely high-speed ingestion of data, such as in an IoT environment. At full scale, they can process 1 million events per second and are the basis for Azure’s Application Insights service. Event Hubs implement a stream model that can be…

Amazon Releases New Cloud Services at re:Invent Conference

This is a big week for Amazon developers: Amazon’s annual web services developer conference kicks off in Las Vegas, and the Day 1 keynote brought with it a slew of new product announcements. Among them are a new Business Intelligence service, an interesting new device for transporting petabytes of data between data centers, and…

Azure Machine Learning Studio Overview

Microsoft has created a new diagram to help provide an overview of the capabilities and features available in Machine Learning Studio. For more information, check out this article on MSDN.

Microsoft Announces New Big Data Azure Services

Microsoft announced several new features targeting Big Data processing, including support for HDInsight on Ubuntu Linux, as well as a set of new features in its Data Lake services. The first announcement was of the availability of HDInsight, Microsoft’s Hadoop service, on Ubuntu Linux virtual machines. Features include the ability to create HDInsight clusters from the…