Tuesday, 13 February 2018

Choosing the best programming language for Data Science projects



Big Data and Data Analytics are the hottest tech trends of this decade. As far as the business application is concerned, the language an engineer uses doesn’t matter much. In practice, however, the language an IT professional chooses is shaped by the employer’s IT culture and by personal preference.
The most important factor to consider when choosing a language for a big data project is the goal of the project at hand. If you are manipulating data, testing machine learning models and building analytics, you need to choose the language best suited to that task, and most data science professionals prefer R for this kind of work. A different set of languages excels at operationalizing big data or IoT applications.
If the project focuses on data science exploration and development, the most popular language is Python, which offers a large number of tools and libraries. Professionals exploring big data sets often choose Python over anything else, and IEEE Spectrum recently ranked it the number one language. In fact, Python is also widely used outside of data science as a general-purpose language.
The notebook environment a professional works in often determines the preferred programming language. IPython Notebook and its successor, Jupyter, are closely aligned with Python but also support Julia, R, and Scala. Another popular notebook, Apache Zeppelin, supports SparkSQL, Scala, and Python.
Smith Panchamia, the senior software engineer at MapR said, “Native languages like C/C++ provide a tighter control on memory and performance characteristics of the application than languages with automatic memory management. A well written C++ program that has intimate knowledge of the memory access patterns and the architecture of the machine can run several times faster than a Java program that depends on garbage collection. For these reasons, many enterprise developers with massive scalability and performance requirements tend to use C/C++ in their server applications in comparison to Java.”
Bloomberg relies heavily on Python for its data science projects, but at its core the work is always built on C++. Gideon Mann, the head of data science at Bloomberg, said, “Most of the time when we’re doing data science, it’s really to build machine learning products. And because we have all of these real-time latency constraints, we don’t want to use something like Python or Java, where you’re going to have garbage collection. You need to be a little worried about intermediate lag. By building out everything in C++, you can deploy it and have a fair amount of latency guarantees.”

- Rajat Kabade
