Role and Responsibilities:
- Develop, test, and deploy scalable, low-latency machine learning solutions and pipelines, accounting for data characteristics, problem complexity, and available computational resources.
- Research the latest advancements in machine learning platform technologies and stay current with industry trends and developments.
- Experiment with new ML platforms tailored to specific environments, building rapid prototypes and proofs of concept.
- Automate ML pipelines using CI/CD principles, promoting consistency, reproducibility, and agility across the development lifecycle.
- Validate model performance on unseen datasets to confirm that models generalize effectively without overfitting.
- Conduct thorough testing to identify and resolve potential issues, including bias or fairness concerns.
- Optimize model deployment processes, including unit, integration, and stress testing, ensuring high engineering quality.
- Design and build the next-generation machine learning infrastructure to support the simultaneous operation of thousands of model training pipelines and billions of daily batch predictions.
- Work closely with internal ML teams (such as Data Scientists and MLOps teams) to enhance codebase quality and overall product health.
Technologies Utilized:
- Programming Languages: Python, Go
- Machine Learning Frameworks: TensorFlow, PyTorch
- Cloud Platforms: AWS
- Big Data Tools: Spark, Snowflake
- CI/CD and Orchestration Tools: GitHub Actions, Airflow
- Monitoring Tools: Grafana
Skills and Qualifications:
- Education: Degree in Computer Science or related field.
- Experience: Minimum of 2 years of proven industry experience.
- Programming Skills: Proficiency in Python, Go, or another general-purpose programming language.
- Strong understanding of data structures, algorithms, and software engineering principles.
- Knowledge of mainstream ML libraries (e.g., TensorFlow, PyTorch, Spark ML) and/or cloud solutions (e.g., AWS, SageMaker).
- Familiarity with CI/CD and orchestration tools (e.g., GitHub Actions, Airflow), big data tools (e.g., MapReduce, Spark, Flink, Kafka), and container tooling (e.g., Docker, Kubernetes).
- Database Skills: Experience in SQL and database management, including SQL query optimization.
- Testing Expertise: Experience with unit testing frameworks.