Why We Chose Python for Analytics

In this blog post, we explain why the Cloud Health Analytics Group at Seagate Technology chose Python for analytics.

For context, we are a group of engineers with widely different programming backgrounds. Some of us have used languages such as C++ and Fortran to solve complex generalized matrix equations, while others have used MATLAB and Mathematica for fast analysis and basic modeling of collected data sets. Our group currently focuses on scalable monitoring software, which has three major components: telemetry, analytics and visualization. A recent blog post described how we visualize the collected data. This post focuses on analytics, and more precisely on why we chose Python as our tool for it. It boils down to three S’s: Speed, Support and Scope.

Speed:

By speed, we mean the speed with which new features can be developed. Python is a high-level language, and that brings several benefits that accelerate our code development. First, it makes prototyping ideas and code fast. Second, Python is relatively easy to learn. That has been ideal for a group like ours where people, as in many data science groups, come from different programming backgrounds. Finally, and most importantly, there is a close correspondence between the code as written and what happens when it runs. This transparency eases both maintenance of the code (rewriting, finding bugs, etc.) and the process of adding to the code base in our multi-user development environment.
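To give a feel for how quickly an idea turns into working code, here is a minimal sketch of a rolling average over a stream of telemetry values. The function name, the sample temperatures and the window size are hypothetical, chosen only for illustration; this is not code from our code base.

```python
from collections import deque


def rolling_mean(values, window):
    """Yield the mean of the most recent `window` values seen so far."""
    buf = deque(maxlen=window)  # the oldest value is dropped automatically
    for v in values:
        buf.append(v)
        yield sum(buf) / len(buf)


# Hypothetical drive temperatures in degrees Celsius.
temps_c = [40.0, 42.0, 44.0, 52.0]
print(list(rolling_mean(temps_c, window=2)))  # [40.0, 41.0, 43.0, 48.0]
```

Because the function is a generator, it works the same way on a list held in memory or on an iterator that yields readings as they arrive.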

Support:

Python is widely used for scientific computing in both academia and industry. As a consequence, a large number of useful (and well-tested!) analytics libraries are available, including packages for numerical computing, data analysis, statistical analysis, visualization, and machine learning. To get going on a new topic, you usually only need to search for ‘Python + [your analytics approach/tool]’. Soon afterwards you can be testing code that does the analysis you want, with plenty of documentation and examples at hand to guide you.
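As a small illustration of leaning on an existing, tested library instead of hand-rolling the math, the sketch below uses the standard library's statistics module to flag readings that lie more than two standard deviations from the mean. The data, the function name and the two-sigma cutoff are assumptions for illustration only; larger third-party packages offer far more, but this example needs nothing beyond a stock Python install.

```python
import statistics


def find_outliers(samples, k=2.0):
    """Return the samples lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample (n - 1) standard deviation
    return [x for x in samples if abs(x - mean) > k * stdev]


# Hypothetical request latencies in milliseconds.
latencies_ms = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 30.5, 12.3]
print(find_outliers(latencies_ms))  # [30.5]
```

Note that `statistics.stdev` raises `StatisticsError` when given fewer than two samples, and a fixed two-sigma rule is only a rough screen for heavy-tailed data.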

Scope:

Python supports object-oriented programming and provides rich built-in data structures such as lists, tuples, sets and dictionaries. Matrix operations are available through the NumPy library, and the pandas package provides data frames. Having these capabilities within the Python ecosystem simplifies and speeds up data operations.
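A minimal sketch of how far the built-in structures alone can go: records arrive as tuples, a dictionary of lists groups them per drive, and a set collects the distinct metric names. The record layout, drive IDs and metric names are hypothetical.

```python
from collections import defaultdict

# Hypothetical telemetry records: (drive_id, metric_name, value) tuples.
records = [
    ("drive-01", "temp_c", 41.5),
    ("drive-02", "temp_c", 55.0),
    ("drive-01", "temp_c", 43.5),
    ("drive-02", "errors", 3),
]

# Group temperature readings per drive in a dict of lists.
temps_by_drive = defaultdict(list)
for drive, metric, value in records:
    if metric == "temp_c":
        temps_by_drive[drive].append(value)

# Average temperature per drive, plus the set of distinct metric names seen.
avg_temp = {d: sum(v) / len(v) for d, v in temps_by_drive.items()}
metrics_seen = {metric for _, metric, _ in records}

print(avg_temp)              # {'drive-01': 42.5, 'drive-02': 55.0}
print(sorted(metrics_seen))  # ['errors', 'temp_c']
```

Once the data outgrows this, the same grouping is a one-liner in pandas, roughly `df[df["metric"] == "temp_c"].groupby("drive_id")["value"].mean()`, assuming a DataFrame `df` with those (hypothetical) column names.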

Another important aspect of Python is that it is freely available, and code developed on one platform is generally portable to others: Python runs under both Windows and Linux.
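Portability still depends on how the code is written: hard-coding path separators, for example, is an easy way to tie a script to one operating system. A minimal sketch, assuming a hypothetical per-drive report layout, that uses the standard library's pathlib so the same code builds correct paths on Windows and Linux:

```python
from pathlib import Path


def report_path(base_dir, drive_id):
    """Build a per-drive report path; pathlib uses the separator of the host OS."""
    return Path(base_dir) / "reports" / f"{drive_id}.csv"


path = report_path("data", "drive-01")
print(path.name)   # drive-01.csv
print(path.parts)  # ('data', 'reports', 'drive-01.csv')
```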

Principles we use for our Python programming

To conclude, here are a few of the principles that guide our coding style. They come from The Zen of Python (PEP 20):

    “Beautiful is better than ugly.

    Explicit is better than implicit.

    Simple is better than complex.

    …

    Readability counts…” 

In short, we use a coding style which enhances readability and maintainability.
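As a hypothetical illustration of what "explicit is better than implicit" and "readability counts" mean in practice, the sketch below computes the same result twice: once as a dense one-liner and once with a named threshold and small, documented helpers. The data, names and threshold are invented for the example.

```python
# Hypothetical input: daily error counts per drive.
daily_errors = {"drive-01": [0, 1, 0], "drive-02": [4, 2, 5], "drive-03": [0, 0, 0]}

# Dense version: correct, but the intent has to be decoded.
flagged_dense = sorted(k for k, v in daily_errors.items() if sum(v) / len(v) > 1)

# Explicit version: named threshold, named helpers, one idea per line.
MAX_MEAN_DAILY_ERRORS = 1.0  # hypothetical alert threshold


def mean_daily_errors(counts):
    """Average number of errors per day."""
    return sum(counts) / len(counts)


def drives_needing_attention(errors_by_drive, limit=MAX_MEAN_DAILY_ERRORS):
    """Return sorted IDs of drives whose mean daily error count exceeds `limit`."""
    flagged = []
    for drive_id, counts in errors_by_drive.items():
        if mean_daily_errors(counts) > limit:
            flagged.append(drive_id)
    return sorted(flagged)


print(drives_needing_attention(daily_errors))  # ['drive-02']
```

Both versions return the same list; the second simply costs a few more lines in exchange for code a colleague can read and change safely.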


Authors: Christian B. Madsen, Estelle Cormier, and Javier Von Stecher

April 29, 2014
