We're looking for a Senior Site Reliability Engineer who genuinely enjoys the craft. Someone who takes pride in a clean Terraform module, cares about observability because they've felt the pain of flying blind, and believes good documentation is an act of kindness for your teammates. You'll be hands-on with our AWS infrastructure, especially EKS, IAM, and RBAC, building things that are secure by default, not as an afterthought. You'll own our CI/CD pipelines in GitHub Actions, set up guardrails that let engineers ship quickly and confidently, and keep Datadog tuned so we know what's happening in our systems before our customers do. In any given week you might be writing Terragrunt modules, building a Python script to eliminate a tedious manual process, writing a runbook that'll save someone's 2am, or digging through a postmortem with the team, focused on learning, not blame.
We work in an Agile environment with an on-call rotation. We approach our processes thoughtfully, with the intent to keep iterating on them and making them better. You don't need to have all the answers; you just need curiosity, clear communication, and a willingness to own your slice of the system while keeping it accessible and scalable, so we can build together.
What You’ll Do
- Design, scale, and operate resilient, cloud-native infrastructure in AWS with a strong emphasis on EKS, IAM, RBAC, and modern security-first practices.
- Build and optimize CI/CD pipelines with GitHub Actions and GitHub Advanced Security, enabling velocity without compromising safety.
- Own observability across the stack using Datadog (metrics, logging, alerting, and tracing).
- Write and maintain Terragrunt and Terraform modules and other infrastructure-as-code (IaC) automation.
- Develop internal tools and scripts in Python to automate operational workflows and reduce manual overhead.
- Document everything from runbooks to standards so teams stay aligned and systems stay stable.
- Actively contribute to Agile workflows using Jira, with clear tracking of work, priorities, and progress.
- Participate in on-call rotations, postmortems, and continuous improvement efforts — always with a blameless, team-first mindset.
What You’ll Bring
- 4+ years in a Senior SRE or DevOps role supporting production cloud infrastructure at scale, preferably in SaaS, PaaS, high-growth, or fast-paced environments.
- Deep experience with AWS (IAM, EKS, VPC, EC2, Secrets Manager, Serverless) and RBAC.
- Knowledge of compliance standards like HIPAA, HITRUST, or SOC 2.
- Hands-on proficiency with Terraform, Terragrunt, Helm, and container orchestration.
- Proven experience building and maintaining GitHub Actions for CI/CD, including GitHub Advanced Security features like secret scanning and code policy enforcement.
- Strong Datadog experience building dashboards, tuning alerts, setting up monitors, and interpreting telemetry.
- Solid Python scripting experience for automation and internal tools.
- Commitment to clear, accurate documentation as a core part of engineering, not an afterthought.
- Comfortable working in Agile/Scrum environments with well-tracked Jira workflows.
- Practical experience with resource analysis and infrastructure optimization.
Preferred Experience
- AWS Certified DevOps Engineer - Professional certification.
- Familiarity with Lambda, Fargate, and serverless infrastructure.
- Experience with multitenant platforms or customer-isolated deployments.
- Experience with Azure or with migrating from Azure to AWS.

