
Site Reliability Engineer III
Benefits:
- Fuel Your Growth with Love's: company-funded tuition assistance program
- Paid Time Off
- Flexible Scheduling
- 401(k) – 100% Match up to 5%
- Medical/Dental/Vision Insurance after 30 days
- Competitive Pay
- Career Development
- Hiring Immediately
Welcome to Love's: The ITOC Site Reliability Engineer plays a critical role in ensuring the availability, reliability, and resilience of enterprise applications, systems, and infrastructure. This role designs and implements automation, monitoring, and observability solutions that address current needs while anticipating future challenges through AI-driven insights. By leveraging AI for predictive analytics, anomaly detection, and automated incident response, the engineer enhances efficiency, productivity, and system performance. The position also collaborates with cross-functional teams to introduce innovative practices and establish best practices for the ethical and effective use of emerging technologies in operational reliability.
Job Duties:
- Ensure availability and performance of critical infrastructure through enterprise monitoring, observability, and proactive failover actions.
- Define, measure, and track SLIs, SLOs, and SLAs to align system reliability with business expectations.
- Administer and maintain enterprise systems, including updates, patching, backups, and capacity management.
- Design, implement, and maintain automation for provisioning, deployment, and configuration using tools such as Terraform, Ansible, and Python.
- Leverage AI-enabled monitoring and predictive analytics to detect anomalies, forecast capacity needs, and recommend proactive remediation.
- Build dashboards and metrics that deliver intelligent insights, enabling trend analysis, performance optimization, and cost forecasting.
- Respond to outages and performance incidents, ensuring rapid escalation, resolution, and AI-assisted root cause analysis.
- Implement automated incident classification, prioritization, and remediation workflows to reduce recovery time.
- Collaborate with development and operations teams to design resilient, self-healing applications and infrastructure.
- Research and adopt emerging technologies, including generative AI, to enhance monitoring, automation, and infrastructure management.
- Partner across teams to integrate AI tools into workflows, promote AI literacy, and establish best practices for ethical AI usage, data governance, and bias mitigation.
Experience and Qualifications:
- 5+ years enterprise systems administration.
- 3+ years automation/scripting (Python, PowerShell, JavaScript).
- 3+ years cloud platform experience (AWS, Azure).
- Strong organizational, analytical, and communication skills.
- Familiarity with datacenter operations and core server, storage, and networking concepts.
- Ability to apply AI concepts in monitoring, automation, and data analysis.
- AWS and/or Azure Solution Architect certification preferred.
- Cloud platforms (AWS, Azure, GCP) administration and optimization
- Infrastructure automation (Terraform, Ansible, Python, PowerShell)
- Monitoring and observability tools (Datadog, Prometheus, Grafana, Splunk, or similar)
- AI/ML applications for monitoring, anomaly detection, automation, and predictive analytics
- Datacenter operations and core server, storage, and networking concepts
- Incident management and root cause analysis (including AI-assisted log analysis)
- System performance tuning and capacity planning
- CI/CD pipelines and DevOps tooling (Jenkins, GitHub Actions, GitLab CI/CD)
- Strong organizational, analytical, and problem-solving skills
- Effective written and verbal communication across technical and non-technical stakeholders
- Cross-functional collaboration and teamwork
- Adaptability in fast-paced and evolving technology environments
- Continuous learning and curiosity around emerging technologies (especially AI)
- Accountability and ownership in incident response and system reliability
- Partnering with teams to optimize workflows and enhance efficiency
- Bachelor's degree in Computer Science or a related field preferred
Skills and Demands:
- Prolonged sitting, some bending and stooping
- Eye strain (screen use)
- Manual dexterity sufficient to operate a computer keyboard and calculator
- Occasional lifting of up to 25 pounds
- Requires normal range of hearing and vision
- Additional hours may be necessary.
Our Culture:
Love's has been fueling customers' journeys since 1964, and innovation leads the way for this family-owned and operated business headquartered in Oklahoma City. With nearly 40,000 team members, the company's core business is travel stops, along with products and services that provide value for professional drivers, fleets, the traveling public, RVers, and alternative energy and wholesale fuel customers. Giving back to communities and fostering an inclusive workplace are hallmarks of its award-winning culture.
Love's is an Equal Opportunity Employer. Veterans encouraged to apply.
Job Category: Information Technology