Build and Automation Engineer
Job Description
We are looking for a hands-on AI/ML Infrastructure Engineer with deep expertise in C++, Python, and build automation. This role is essential to enabling scalable, reliable, and high-performance development and deployment workflows for advanced AI/ML components. You will lead the design and maintenance of automation pipelines, build systems, and deployment environments, especially across Linux and IBM z/OS platforms.
You will collaborate closely with AI researchers, system architects, and mainframe engineers to ensure seamless integration and delivery of mission-critical AI/ML systems.
Key Responsibilities
- Design and implement robust build automation systems for large, distributed C++/Python codebases supporting AI workflows.
- Develop tools and scripts that accelerate iteration, testing, and deployment for engineers and researchers.
- Integrate C++ components with Python-based AI systems using tools like pybind11 or Cython.
- Lead efforts to create portable, reproducible development environments that match production infrastructure.
- Maintain and extend CI/CD pipelines on Linux and z/OS, emphasizing testing, artifact handling, and release validation.
- Proactively monitor build and deployment systems to improve reliability, performance, and automation coverage.
- Collaborate with cross-functional teams to align infrastructure goals with broader AI/ML and system architecture strategies.
- Contribute to documentation, process improvements, and internal knowledge sharing.
Required Technical & Professional Expertise
- Strong programming skills in C++ and Python.
- Expertise with CI/CD tools such as Jenkins, GitLab CI, or similar.
- Experience with build systems such as CMake, Make, Meson, or Ninja, including cross-compilation.
- Multi-platform development experience, particularly on Linux and IBM z/OS.
- Proficiency in integrating native code with Python (e.g., using pybind11 or Cython).
- Strong troubleshooting skills for build-time, runtime, and integration issues.
- Proficiency in shell scripting (Bash, Zsh).
- Familiarity with Docker or other container technologies for development and deployment.
Preferred Technical & Professional Experience
- Working knowledge of AI/ML frameworks (e.g., PyTorch, TensorFlow, ONNX).
- Experience with z/OS build and packaging workflows and with maintaining codebases for IBM mainframes.
- Understanding of system performance tuning in compute- or I/O-intensive environments.
- Experience with GPU computing and low-level performance profiling and debugging.
- Familiarity with managing long-lifecycle enterprise systems that require backward and forward compatibility.
- Contributions to open-source projects in infrastructure, DevOps, or AI tooling.
- Knowledge of microservices, distributed systems, and REST APIs.
- Experience with MLOps integration, connecting model development with CI/CD pipelines.
- Strong communication skills and the ability to explain technical concepts to non-technical audiences.
- Proven track record of maintaining code quality, performance, and security standards in AI projects.
- Demonstrated ability to ensure compliance with industry best practices in AI engineering.
- Strong interpersonal skills with a track record of cross-team collaboration.