Kontakt.io

Lead Software Engineer - SRE

Reposted 9 Days Ago

Hybrid

New York City, NY

Senior level

Kontakt.io is building the platform that care operations run on.

We reduce waste, cut costs, and increase revenue by improving throughput, asset utilization, and staff productivity. Our platform uses AI, RTLS, and EHR data to enable self-learning agents that automate workflows, adapt in real time, and orchestrate care delivery operations end to end.

Easy to deploy and scale, the platform gives a clear picture of spaces, equipment, and people, eliminating inefficiencies and enhancing the patient experience. With a measurable 10X ROI and more than 20 use cases, Kontakt.io is the go-to platform for better and faster care delivery operations.

We are looking for a Lead Software Engineer - SRE with a strong software engineering foundation and a strategic mindset to drive the reliability, scalability, and performance of our platform. This role is part of our Infrastructure Engineering team and will play a central part in shaping the architecture and direction of our SRE function.

The ideal candidate brings a deep understanding of software engineering principles applied to infrastructure. Rather than only maintaining systems, you will lead their design and build them, developing the automation, tooling, and resilient architecture that enable high availability and fault tolerance across our entire AWS-based platform.

You’ll work hands-on designing resilient systems, improving deployment pipelines, and driving incident management practices. As a technical leader, you’ll also mentor engineers, shape technical strategy, and help build a culture of accountability, ownership, and continuous improvement across the organization.

Responsibilities

Lead the design and implementation of scalable, fault-tolerant, and self-healing infrastructure and services across AWS and Kubernetes.
Collaborate with Product, Engineering, and Infrastructure teams to align SRE initiatives with business priorities and platform needs.
Define and drive adoption of SLIs, SLOs, and SLAs to ensure consistent performance and high reliability across the platform.
Own and evolve observability strategies using Prometheus, OpenTelemetry, Grafana, and related tooling.
Design and maintain infrastructure as code (Terraform) and drive GitOps best practices.
Oversee major incident response and on-call practices, including incident reviews and long-term remediation planning.
Mentor and support the growth of SRE and platform engineers, fostering a culture of engineering rigor and operational excellence.
Contribute to the long-term reliability roadmap and architecture of high-throughput, real-time systems in healthcare operations.
Drive process improvements in CI/CD, service ownership, chaos engineering, disaster recovery, and secure deployment.

What You Bring

5+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Engineering.
5+ years of software engineering experience building production-grade systems (Java, Python, Go, or similar).
Proven success scaling high-traffic, mission-critical platforms in SaaS, IoT, or healthcare environments.
Deep expertise in cloud platforms (especially AWS), Kubernetes, and distributed system architecture.
Hands-on experience with monitoring, logging, and observability tools (Prometheus, OpenTelemetry, Datadog, etc.).
Extensive knowledge of CI/CD automation, GitOps workflows, and infrastructure-as-code (Terraform, Helm, ArgoCD).
A track record of leading major incident response and running postmortems with a blameless, learning-focused approach.
Strong understanding of networking, access control, and security within regulated environments (HIPAA, SOC 2).
A leadership mindset—able to drive cross-functional alignment, lead initiatives, and mentor a high-performance SRE team.

Why You'll Love It Here

Own Mission-Critical Reliability – Keep hospitals and care facilities online with a healthcare platform that delivers 99.99% uptime.
Scale AI-Powered Infrastructure – Work on real-time automation and self-healing cloud systems that orchestrate care delivery.
Drive Big Impact in Healthcare – Help reduce waste, optimize resources, and improve patient care with technology that delivers 10X ROI.
Automation-First Culture – Minimize manual ops with cutting-edge automation, observability, and incident response strategies.
Join a High-Performing Team – Work with top engineers, AI experts, and healthcare innovators solving real-world challenges.

Ready to Build the Future of Healthcare?

Apply now and help scale the platform that care operations run on. 🚀

Top Skills

AWS

Grafana

Java

Kubernetes

OpenTelemetry

Prometheus

Python

Terraform

133 W 19th Street, New York, New York, United States, 10011

Similar Jobs

TransUnion

Root Cause Support Analyst

3 Minutes Ago

Hybrid

68K-113K Annually

Mid level

Big Data • Fintech • Information Technology • Business Intelligence • Financial Services • Cybersecurity • Big Data Analytics

The Root Cause Support Analyst will create Root Cause Analysis documents, communicate with customers about technical incidents, support RCA development, and provide 24x7 on-call support.

Top Skills: CRM Reporting Tools, Microsoft Office Suite, Salesforce, SharePoint, Splunk

Bilt

Manager, Learning and Development

3 Minutes Ago

In-Office

New York, NY, USA

85K-95K Annually

Mid level

Fintech • Mobile • Real Estate • Financial Services • PropTech

The Manager of Learning and Development at Bilt leads training programs for new customer service reps, monitors their progress, and collaborates with QA to ensure performance standards are met.

Top Skills: Learning Management Systems (LMS)

Samsara

Product Manager

11 Minutes Ago

Remote or Hybrid

United States

131K-176K Annually

Expert/Leader

Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software

The Business Technology Product Manager will lead AI product initiatives, manage sales platform roadmaps, engage stakeholders, maintain product backlogs, ensure agile delivery, and define performance metrics for sales technology solutions.

Top Skills: AI, Confluence, CPQ, CRM, Jira, Salesforce

What you need to know about the NYC Tech Scene

As the undisputed financial capital of the world, New York City is an epicenter of startup funding activity. The city has a thriving fintech scene and is a major player in verticals ranging from AI to biotech, cybersecurity and digital media. It also has universities like NYU, Columbia and Cornell Tech attracting students and researchers from across the globe, providing the ecosystem with a constant influx of world-class talent. And its East Coast location and three international airports make it a perfect spot for European companies establishing a foothold in the United States.

Key Facts About NYC Tech

Number of Tech Workers: 549,200; 6% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Capgemini, Bloomberg, IBM, Spotify
Key Industries: Artificial intelligence, Fintech
Funding Landscape: $25.5 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Greycroft, Thrive Capital, Union Square Ventures, FirstMark Capital, Tiger Global Management, Tribeca Venture Partners, Insight Partners, Two Sigma Ventures
Research Centers and Universities: Columbia University, New York University, Fordham University, CUNY, AI Now Institute, Flatiron Institute, C.N. Yang Institute for Theoretical Physics, NASA Space Radiation Laboratory
