Turing

AI Benchmark Software Engineer - 75243

📍 Location
Salvador, Bahia
⏰ Job Type
Full-time
📅 Posted
May 23, 2026

Job Description

Role Overview

We are looking for experienced Software Engineers to design and build high-quality multi-agent benchmark tasks based on real-world software engineering workflows.

In this role, you will create tasks grounded in real open-source code changes such as bug fixes, migrations, and refactors. These tasks are used to evaluate how effectively AI agents can understand large codebases, apply precise modifications, and produce correct, testable outputs.

You will work within a structured evaluation framework (Harbor), define clear task instructions, design verification logic, and decompose complex engineering problems across multiple specialized agents.

What the day-to-day looks like

  • Build multi-agent benchmark tasks based on real-world open-source code changes (bug fixes, migrations, refactors)
  • Work with the Harbor evaluation framework to run and validate tasks inside Docker environments
  • Write clear, precise ...
