Data Scientist
We are seeking a skilled Data Scientist with 2 to 5 years of experience,
specializing in Machine Learning, PySpark, and Databricks, with a proven
track record in long-range demand and sales forecasting. This role is
crucial for the development and implementation of an automotive OEM’s
next-generation Intelligent Forecast Application. The position will involve
building, optimizing, and deploying large-scale machine learning models for
complex, long-term forecasting challenges using distributed computing
frameworks, specifically PySpark on the Databricks platform. The work will
directly support strategic decision-making across the automotive value
chain, including areas like long-term demand planning, production
scheduling, and inventory optimization.
The ideal candidate will have hands-on experience developing and deploying
ML models for forecasting, particularly long-range predictions, in a
production environment using PySpark and Databricks. This role requires
strong technical skills in machine learning, big data processing, and time
series forecasting, combined with the ability to work effectively within a
technical team to deliver robust and scalable long-range forecasting
solutions.
Role & Responsibilities
- Machine Learning Model Development & Implementation for Long-Range Forecasting: Design, develop, and implement scalable and accurate machine learning models specifically for long-range demand and sales forecasting challenges.
- Data Processing and Feature Engineering with PySpark: Build and optimize large-scale data pipelines for ingesting, cleaning, transforming, and engineering features relevant to long-range forecasting from diverse, complex automotive datasets using PySpark on Databricks.
- Deployment and MLOps on Databricks: Develop robust code for training, inference, and deployment of long-range forecasting models directly within the Databricks platform.
- Performance Evaluation & Optimization: Evaluate long-range forecasting model performance using relevant metrics (e.g., MAE, RMSE, MAPE), with attention to metrics suited to longer horizons, and optimize models and data processing pipelines for accuracy and efficiency within the PySpark/Databricks ecosystem.
- Team Collaboration: Work with other data scientists, data engineers, and software developers to integrate long-range ML forecasting solutions into the broader forecasting application built on Databricks.
- Communication: Communicate technical details and forecasting results clearly within the technical team.
Requirements
Qualifications & Skills
- Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Applied Mathematics, or a closely related quantitative field.
- 2 to 5 years of hands-on experience in a Data Scientist or Machine Learning Engineer role.
- Proven experience developing and deploying machine learning models in a production environment.
- Demonstrated experience in long-range demand and sales forecasting.
- Significant hands-on experience with PySpark for large-scale data processing and machine learning.
- Extensive practical experience working with the Databricks platform, including notebooks, jobs, and ML capabilities.
- Technical Skills:
  - Expert proficiency in PySpark and the Databricks platform.
  - Strong proficiency in Python and SQL.
  - Experience with machine learning libraries compatible with PySpark (e.g., MLlib, or integrating other libraries).
  - Experience with advanced time series forecasting techniques and their implementation.
  - Experience with distributed computing concepts and optimization techniques relevant to PySpark.
  - Hands-on experience with a major cloud provider (Azure, AWS, or GCP) in the context of using Databricks.
  - Familiarity with MLOps concepts and tools used in a Databricks environment.
  - Experience with data visualization tools.
- Analytical Skills: Deep understanding of machine learning algorithms and their application to forecasting. Ability to troubleshoot and solve complex technical problems related to big data and machine learning workflows.
- Preferred Location: Kolkata. Candidates should be open to travel to Jaipur and Bangalore.
- Preferred / Good to have:
  - Experience with specific long-range forecasting methodologies and libraries used in a distributed environment.
  - Experience with real-time or streaming data processing using PySpark for near-term forecasting components that might complement long-range models.
  - Familiarity with automotive data types relevant to long-range forecasting (e.g., economic indicators affecting car sales, long-term market trends).
  - Experience with distributed version control systems (e.g., Git).
  - Knowledge of agile development methodologies.
What We Offer
- Impact: Play a pivotal role in shaping a rapidly growing venture studio.
- Culture: Thrive in a collaborative, innovative environment that values creativity and ownership.
- Growth: Access professional development opportunities and mentorship.
- Benefits: Competitive salary, health/wellness packages, and flexible work options.