I recently had the pleasure of visiting with Arvind Battula, Sr. Data Scientist at Schlumberger. The following is a transcript of our conversation, in which we discussed his background as a chemical and mechanical engineer, his move onto the Data and Analytics team as a data scientist, his focus areas for data science in oil and gas, and the technologies that he believes will help transform the industry.
Kohlleffel: Arvind, you entered the data science world recently on the Schlumberger Data and Analytics team and have a very interesting background coming from both chemical engineering and mechanical engineering disciplines. Tell me about your experience and engineering background.
Battula: Certainly, my background is diverse. I started my formal training as a chemical engineer. After my bachelor's degree, I applied to graduate school in mechanical engineering to work mostly on computational fluid dynamics. I wanted to pursue a Ph.D. in the same area, but my doctoral work changed direction to focus on nanophotonics, which is the interaction of nanometer-scale objects with light.
Kohlleffel: That makes for quite a compelling base of experience for your data science work. Now that you’ve moved to the Data and Analytics team, where have you focused so far?
Battula: My mechanical engineering background has been very helpful at Schlumberger since we design products that are used in the harshest conditions imaginable on the planet. In everything we do, we must consider very minute design details to ensure the most robust end product. Before we design and build parts and assemblies, we are very thorough in our calculations and modeling to quantify our engineering and physics assumptions. This is where we leverage data and analytics to bring a new rigor to the process and move beyond some standard linear assumptions, which can be obstacles to efficiently modeling complex phenomena across all variables.
For example, factors like high temperature, high pressure, stress, vibration, corrosion, and aging all act in parallel on the mechanical systems. We can look deeply into that data to better understand the combinations of these variables that are causing mechanical failures, and then we can bring together the data streams for both physics and engineering.
This non-linear root cause analysis shows us the real world we deal with on a daily basis. It is ideally suited to leveraging big data and analytics and it benefits multiple groups within our company including engineering, manufacturing, sustaining and maintenance.
In my first 9 months on the Data and Analytics team, I have been involved in a wide variety of projects from downhole tool reliability to recruiting optimization, and we’ve been able to leverage a number of analytics tools in that effort including Spark, Hive, YARN, and other components in the Hadoop ecosystem.
Kohlleffel: Have there been any challenges that can only be solved through data exploration and advanced analytics?
Battula: We had a team that was tasked with correlating a diverse set of inputs for new technical recruits, existing recruiter backgrounds and skillsets, and the overall recruiting process. Data and analytics allowed us to perform deep exploration of different trends and variables which would have been impossible to correlate without advanced machine learning and analytics. We arrived at a conclusion that surprised us and could only have come from data analytics.
Kohlleffel: Arvind, you are one of a new generation of data scientists in the Oil and Gas industry today. What is your perspective on blazing a mostly uncharted trail in the industry?
Battula: Data science is a very exciting field and every day presents a new learning experience with new challenges. Whether we are looking at unsupervised machine learning, text analytics, or predictive models, the industry is advancing at a very rapid pace and our company and customers stand to gain substantial benefit from these new analytics. Additionally, the opportunity to apply the leading toolsets to the most challenging problems in our industry is something that I want to do for a very long time.
Kohlleffel: Earlier, you mentioned Apache Spark, Hive, and HBase as technologies that your team uses. What are the top 2-3 technologies that you see having the biggest impact for you and the Data and Analytics team at Schlumberger over the next couple of years?
Battula: Certainly I have a strong opinion on Spark. Spark is a tool going forward that, especially for data scientists, allows you to build an end-to-end pipeline. Spark is rich in APIs from data frames, to data sources, to machine learning.
For a data scientist, building a Spark-based end-to-end solution is quite fascinating, since Spark has taken some of the burden off of the data scientist that would ordinarily require more extensive software engineering skills.
We also see a lot of use cases for both Hive and HBase and have many analytics applications where most of the tables are in Hive and our data science team can write some Python-based UDFs to do rapid analysis. It’s another example of a tool where a data scientist doesn’t have to have high-end data modeling or software engineering skills. Spark, Hive, and HBase are all quite powerful in my opinion and demonstrate a great deal of value.
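For readers unfamiliar with the pattern, one common way to write a Python-based UDF for Hive is as a streaming script: Hive pipes each row to the script as a tab-separated line on standard input and reads tab-separated rows back from standard output. The following is only a minimal sketch of that general mechanism, not the team's actual code. The column names (`tool_id`, `temp_c`, `pressure_psi`), the thresholds, and the "harsh conditions" flag are all hypothetical and chosen purely for illustration.

```python
import sys

# Hypothetical thresholds, chosen only for illustration.
TEMP_LIMIT_C = 150.0
PRESSURE_LIMIT_PSI = 20000.0


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    tool_id, temp_c, pressure_psi = line.rstrip("\n").split("\t")
    # Hive's streaming interface writes NULL values as the literal \N.
    if temp_c == "\\N" or pressure_psi == "\\N":
        return f"{tool_id}\t\\N"
    # Flag rows where both readings exceed their (hypothetical) limits.
    harsh = float(temp_c) > TEMP_LIMIT_C and float(pressure_psi) > PRESSURE_LIMIT_PSI
    return f"{tool_id}\t{int(harsh)}"


def main(stream=sys.stdin, out=sys.stdout):
    """Read rows from the input stream and write transformed rows, skipping blank lines."""
    for line in stream:
        if line.strip():
            out.write(transform_line(line) + "\n")


# Only stream when input is piped in, as it is when Hive runs the script.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

On the Hive side, a script like this would typically be registered with `ADD FILE` and invoked through a `SELECT TRANSFORM(...) USING 'python <script>' AS (...)` query; the script name and the table it reads from would be whatever the analyst chooses.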
Kohlleffel: What advice would you give someone who is in the business today as an engineer, but is considering a data science role?
Battula: From an engineering perspective, physics-based modeling is clearly required, but it is not the full solution in all cases. There may be a higher-level dimension available to help solve a particular problem, and that’s where data and analytics can be applied. Data and analytics can help drive insights which were not clearly visible looking strictly at the physics aspect of the problem. In addition, there is an opportunity to automate and standardize those insights and their associated decisions. That standardization can be highly valuable to an organization with worldwide operations, an extensive set of processes, and differing client requirements.
Kohlleffel: Ok, we’ve heard from Arvind the data scientist and engineer, now give us a small glimpse into Arvind, the individual. What is your favorite restaurant in Houston?
Battula: Easily, Fogo de Chao, as long as someone else is buying!
Kohlleffel: Favorite movie?
Battula: It depends on the language, but my favorite American movie is Spy Game with Robert Redford.
Kohlleffel: Where would you want to live if money were not an issue and you could pick anywhere in the world?
Battula: No question: my native village in India where I have many relatives and much of my family are rice farmers. Being in the farming culture and close to family is something that I really enjoy.
Kohlleffel: Arvind, I really appreciate the time today, and your perspectives on the great work that you and the Data and Analytics team are doing with data science.
December 14, 2015