Search Atlas

Kubernetes Reliability Engineer at Search Atlas

📍 Location
Toronto, ON
⏰ Job Type
Full-time
📅 Posted
June 05, 2026

Job Description

Be a key player at Search Atlas, architecting Kubernetes-based platforms that keep AI execution running at 99.99% reliability. This role demands expertise in Terraform, ArgoCD, and high-concurrency systems.

As a Platform Reliability Engineer, you will build and maintain the Autonomous Nervous System for Atlas Brain. You’ll optimize ML inference pipelines, automate infrastructure processes, and design self-healing systems. The position calls for an innovator who can push the boundaries of operational excellence for our autonomous marketing systems.

Key Responsibilities:
• Architect and maintain EKS/GKE-based Kubernetes platforms
• Automate infrastructure deployment with Terraform and ArgoCD
• Optimize high-concurrency crawling systems for real-time decision-making
• Establish SLOs for AI execution and agent task completion
• Implement distributed monitoring solutions with OpenTelemetry and Grafana

Req...
