
O’Reilly Media is hiring: Cloud Operations Engineer

TYPE: Full Time
SALARY: $128,000 - $174,000
POSTED: 6h ago

Description

About O’Reilly Media

O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 45 years, we’ve inspired companies and individuals to do new things, and to do things better, by providing them with the skills and understanding necessary for success.

At the heart of our business is a unique network of experts and innovators who share their knowledge through us. O’Reilly Learning offers exclusive live training, interactive learning, a certification experience, books, videos, and more, making it easier for our customers to develop the expertise they need to get ahead. And our books have been heralded for decades as the definitive place to learn about the technologies that are shaping the future. Everything we do is to help professionals from a variety of fields learn best practices and discover emerging trends that will shape the future of the tech industry.

Our customers are hungry to build the innovations that propel the world forward. And we help you do just that.


Diversity

At O’Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.


About the Team

O'Reilly Media's Cloud Operations Engineering team is a diverse group of engineers responsible for the infrastructure, developer platforms, and automation that let our software and business teams focus on delivering business value — without worrying about how or where their code runs.


We operate at the intersection of platform engineering, site reliability, and cloud infrastructure. We're a collaborative, supportive team that believes in "raising the water level" — giving every engineer the opportunity to grow across our full stack and to actively help their teammates do the same.

About the Role

As a Cloud Operations Engineer at O'Reilly, you'll work on the systems and tooling that power our learning platform. This is not a pure ops role — it's a software-forward engineering position where you'll write infrastructure-as-code, build developer tooling, maintain our Kubernetes platform, and contribute to the internal developer experience that hundreds of engineers depend on every day.

You'll operate across what modern organizations call Platform Engineering and SRE: building reusable infrastructure primitives, maintaining production reliability through solid observability practices, and partnering with product engineering teams to enable faster, safer delivery.

Your day-to-day will vary, but you can expect to regularly encounter:

  • Maintaining and updating our Kubernetes cluster to ensure steady-state operations
  • Writing or extending Terraform modules to provision and manage cloud infrastructure
  • Contributing features to the Python CLI tooling we use to manage infrastructure workflows

What You'll Do

Platform & Infrastructure

  • Design, build, and maintain cloud infrastructure using infrastructure-as-code (Terraform) on GCP
  • Manage and evolve our Kubernetes platform, including cluster operations, workload configuration, and service mesh (Istio)
  • Develop and improve internal tooling that abstracts cloud complexity and improves the developer experience
  • Collaborate with product engineering teams to understand service deployment needs and deliver infrastructure solutions

Reliability & Observability

  • Monitor platform health using Datadog; proactively identify and resolve performance, availability, and security issues
  • Participate in on-call rotation and incident response; drive blameless post-mortems and eliminate recurring issues at their root cause
  • Define and track service-level indicators and objectives (SLIs/SLOs) for critical platform components
  • Implement and refine alerting, dashboards, and runbooks that reduce mean time to resolution

Security & Compliance

  • Embed security best practices into infrastructure workflows (DevSecOps) — not as an afterthought, but as a design principle
  • Help maintain cloud security posture, IAM hygiene, and policy guardrails across our cloud environment
  • Stay current with cloud security developments and proactively surface risks to the team
  • Execute and maintain our automated disaster recovery processes

Collaboration & Growth

  • Work closely with product engineering teams to understand their needs and remove infrastructure friction
  • Document systems, processes, and architectural decisions clearly so knowledge is shared, not siloed
  • Recommend improvements to tooling, architecture, and processes — and help drive them to completion
  • Keep current with the evolving cloud-native ecosystem and bring relevant knowledge back to the team

What You'll Have

Required:

  • Bachelor's degree in Computer Science or a related field (equivalent education and/or experience may be considered in lieu of a degree)
  • 5+ years of experience working in cloud infrastructure, platform engineering, or a related discipline
  • Hands-on experience with Kubernetes in production environments (cluster management, workloads, networking)
  • Proficiency with infrastructure-as-code tools, particularly Terraform
  • Experience with at least one major cloud provider (GCP, AWS, or Azure)
  • Solid scripting and automation skills in Python, Bash, or a comparable language
  • Experience with modern observability platforms (Datadog, Grafana, or similar)
  • Strong understanding of Linux systems administration
  • Working knowledge of CI/CD concepts and tools (GitHub Actions, ArgoCD, Jenkins, or similar)
  • Excellent communication skills — you write clearly, ask good questions, and explain complex systems accessibly
  • AI-augmented development: ability to demonstrate the use of AI-enabled development tools (e.g., Claude Code, Cursor) to streamline coding, debugging, and infrastructure-as-code authoring

Preferred:

  • Experience with service mesh technologies such as Istio or Linkerd
  • Familiarity with GitOps workflows and tools (ArgoCD, Flux)
  • Experience with DevSecOps practices and tooling (Snyk, Trivy, OPA, or similar)
  • Working knowledge of SQL databases (PostgreSQL or MySQL)
  • Familiarity with FinOps practices and cloud cost optimization
  • Experience building or consuming internal developer platforms (IDPs)
  • Configuration management experience (Ansible, Chef, or similar)
  • Relevant certifications (CKA, CKAD, AWS/GCP Professional, or similar)

Our Values

We value engineers who are helpful, respectful, and communicate openly. We believe the best work happens when everyone on the team is empowered to grow, to ask questions freely, and to make things better for the people who depend on what we build. If that resonates with you, we'd love to hear from you.

Additional Information:

  • Salary Range: $128,000 - $174,000
  • At this time, O'Reilly Media Inc. is not able to provide visa sponsorship or any other immigration support (i.e., H-1B, STEM, OPT, CPT, EAD, and the Permanent Residency process)

