Member of Technical Staff (ML Infrastructure) in Menlo Park

Job description

We’re partnering with a cutting-edge AI startup building next-generation infrastructure to power large-scale, intelligent systems. Their mission is to bridge the gap between world-class AI research and production-grade deployment, enabling faster experimentation, high-performance inference, and reliable large-scale training.

As a Member of Technical Staff (ML Infrastructure), you’ll design and scale the systems that keep state-of-the-art AI running, from distributed training clusters and inference engines to agentic frameworks and post-training pipelines. You’ll work alongside a small, elite team of researchers and engineers who move fast, think big, and take full ownership of their work.

What You’ll Do

Design, build, and optimize high-performance ML infrastructure for large-scale training, inference, and evaluation.
Develop and maintain the distributed systems and high-performance networking that power large AI compute clusters.
Streamline research workflows and accelerate experimentation by improving data and post-training pipelines (data collection and loading, SFT, RL).
Enhance inference performance across both open-source and proprietary inference engines.
Establish strong engineering practices for observability, reliability, and scalability.
Collaborate with researchers and product teams to translate cutting-edge ideas into robust, production-ready systems.

What We’re Looking For

Deep expertise in one or more of the following: inference optimization, GPU performance, cluster scheduling, or large-scale infrastructure.
Strong experience with modern ML frameworks (e.g., PyTorch, vLLM, verl).
Startup-ready mindset: high ownership, adaptability, and comfort working in fast-moving environments.
Passion for bridging research and real-world impact.

Why This Role

High impact: You’ll ship meaningful work in weeks, not months.
Elite team: Work alongside ex-founders, top AI researchers, and engineers from leading tech companies.
Momentum: Well-funded, fast-growing, and laser-focused on building and shipping real products powered by cutting-edge AI.
