United States Digital Space LLC
SRE/DevOps Engineer - 66765
Job Description
Function: Cloud & Data Engineering
Job Title: L1 SRE Operations Engineer
Responsibilities
Monitor system health, alerts, dashboards, and logs across cloud and on‑prem infrastructure.
Isolate whether a functional issue lies with the application or the platform.
Execute standardized runbooks for incident resolution, deployments, and routine tasks.
Perform initial triage of incidents and escalate to L2/L2+ as needed to mitigate the issue.
Document new issues, gaps in runbooks, and automation opportunities.
Communicate clearly with stakeholders during incidents.
Support onboarding of new applications into the operations framework.
Mandatory Skills
System & Infrastructure Monitoring
– Ability to use monitoring dashboards and tools such as Grafana, Datadog, Splunk, Argos, and AIOps to identify anomalies, follow alert workflows, and initiate escalation. Example: When a Kubernetes pod crash-loop is flagged in Pro...