Job description
Machine Learning Engineer – Advanced Systems & High-Performance Computing
A leading technology-driven organization is seeking a Machine Learning Engineer to design and implement large-scale systems for training and deploying ML models. The role sits at the intersection of machine learning and high-frequency trading, with a focus on accelerating experimentation and optimizing system performance.
Key Responsibilities:
- Develop distributed training pipelines and optimize real-time inference systems.
- Enhance ML frameworks with custom libraries and GPU acceleration tools.
- Build scalable models for high-volume, low-latency data processing.
- Collaborate with researchers and HPC specialists to streamline workflows and reduce costs.
- Evaluate and integrate open-source tools to improve model development and performance.
Qualifications:
- 5+ years of experience in machine learning, with a focus on training and inference systems.
- Expertise in Python, CUDA, or C++, with proficiency in ML frameworks such as PyTorch, TensorFlow, or JAX.
- Hands-on experience with GPU programming and distributed training libraries (e.g., Horovod, NCCL).
- Knowledge of cloud platforms and orchestration tools is a plus.
- Contributions to open-source projects in ML or distributed systems are highly desirable.
Why Join?
This is a rare chance to shape large-scale ML infrastructure and drive innovation in a high-performance environment. If you’re passionate about machine learning and scalable systems, apply today!