Automation & Benchmarking Engineer
Job Description
Key Responsibilities
Design and develop end-to-end automation pipelines for evaluation workflows, including prompt submission, response collection, result aggregation, and reporting.
Integrate evaluation tooling with developer surfaces such as Gemini CLI, VS Code, and GitHub.
Conduct competitive benchmarking against peer AI tools to measure correctness, verbosity, and usefulness.
Build dashboards and visualization reports using Looker Studio, BigQuery, or Python-based tools.
Optimize system performance, automate error logging, and maintain reproducibility across evaluations.
Collaborate with TPMs and data specialists to deliver evaluation automation at scale.
Ensure source code management and deployment compliance in GitLab and Bitbucket environments.
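The evaluation pipeline described in the responsibilities above (prompt submission, response collection, result aggregation, reporting) could be sketched as follows. This is a minimal illustration only: the model call and scoring heuristic are hypothetical stand-ins, not any tool or rubric named in this posting.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class EvalResult:
    prompt: str
    response: str
    score: float

def submit_prompt(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call.
    return f"response to: {prompt}"

def score_response(response: str) -> float:
    # Hypothetical scoring heuristic; a real evaluation would
    # measure correctness, verbosity, and usefulness.
    return 1.0 / (1 + len(response.split()))

def run_pipeline(prompts: list[str]) -> tuple[list[EvalResult], dict]:
    results = []
    for p in prompts:
        r = submit_prompt(p)  # prompt submission + response collection
        results.append(EvalResult(p, r, score_response(r)))
    # Result aggregation into a summary report.
    report = {
        "n": len(results),
        "mean_score": mean(res.score for res in results) if results else 0.0,
    }
    return results, report

if __name__ == "__main__":
    results, report = run_pipeline(["What is 2+2?", "Name a prime number."])
    print(report)
```

In practice the aggregated report would feed a dashboard (e.g. exported to BigQuery for Looker Studio), and the pipeline would log errors and pin inputs so runs stay reproducible.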