Job Description
Locations
Remote, United States
Overview
At GitHub, we’re building the next generation of AI‑powered developer experiences. We’re looking for a Staff Applied Researcher with deep expertise in Large Language Model (LLM) evaluation and LLM agents, strong engineering instincts, and a bias for action to help shape the future of GitHub Copilot and our AI platform.
This is a high‑impact role where you will design evaluation systems that directly influence how millions of developers experience AI every day.
Responsibilities
- Lead Model Quality & Evaluation
  - Design next‑generation evaluation frameworks for code generation, reasoning, safety, multimodal tasks, and agentic workflows.
  - Develop scalable automatic metrics, LLM‑judge systems, reward models, and human‑in‑the‑loop evaluation pipelines.
  - Establish high‑signal, repeatable methodologies that influence product decisions across GitHub AI.