Systems Operations Engineer (SysOps)
About the Role: ActionIQ is looking for a Systems Operations (SysOps) Engineer to join our SysOps team. Our ideal candidate has prior experience in a production support or operations team and understands what it takes to operate a SaaS product.
About the Team:
Focused on incident management, the SysOps team aims to reduce the impact of incidents and to minimize Mean Time To Recovery (MTTR). SysOps also aims to understand and report on real customer impact over time.
As a member of the SysOps team, you will be the bridge that enables seamless collaboration between SysOps, Engineering and Customer Engagement. Your impact will be felt throughout the company and will be visible both internally and externally. Because SysOps is a critical team at a startup, your input on improving the tools and processes your team uses will be both valued and expected!
One year from now you will have:
- Learned how to monitor, log and remediate incidents efficiently and effectively
- Shortened the platform’s mean time to recovery (MTTR)
- Reviewed security dashboard alerts and opened incidents for the security team
- Worked with the team to automate existing manual tasks and reduce false alarms
- Created dashboards and reports in Datadog and PagerDuty that help reduce false alerts and provide insight into trends that may cause downtime
As a SysOps Engineer, you will work on a small team with room to grow in whichever direction you find most interesting, whether that is software development, systems engineering, DevOps, or a future leadership role.
Responsibilities:
- Coordinate the initial response activities for incidents across the AIQ environment, including creating incident records
- Perform first-level triage of events: confirm the issue, assess impact and gather impact details, and review knowledge-base resources such as troubleshooting guides (TSGs) and standard operating procedures (SOPs) for guidance on how to resolve issues
- Manage low- and mid-severity incidents; escalate high-severity incidents to the resolution team as appropriate, ensuring each incident has a JIRA ticket assigned
- Support cloud infrastructure and automation in collaboration with multiple software teams
- Participate in a monthly on-call rotation
- Work with the SysOps Manager to define and report on key performance indicators
- Identify gaps in information flow that affect our mean time to recovery (MTTR) and mean time to detection (MTTD)
- Maintain compliance with security standards and audit requirements
- Help ActionIQ continue to improve incident management and response by clearly documenting response processes and tools and by leading After Action Reviews
- Maintain a status page, providing timely and relevant information to our customers
- Use telemetry and monitoring tools to communicate status to the rest of the organization
What you bring to ActionIQ:
- Interest in performing system administration duties in cloud environments (e.g., AWS Lambda functions, IAM key rotations)
- Basic Linux command-line skills
- Excellent verbal and written communication skills
- Basic network troubleshooting skills
- Excellent general problem solving and troubleshooting skills
- Strong interest in the areas of DevOps, Site Reliability Engineering, Incident Response, Resilience Engineering, and Technical Operations
Tools used by the team include:
- AWS & Google Cloud Platform
- Atlassian suite (Confluence, Jira, Statuspage)
- Git
- Linux command-line utilities and Bash
- Datadog and Prometheus
- PagerDuty
- Python
Our work is broad and complex, so please don't rule yourself out if you do not meet every requirement.
Benefits:
- Work with a fun, inclusive, and smart team of people as we build a NYC-based enterprise software company!
- Competitive compensation package, including significant equity component
- Backed by top-tier VCs (Sequoia, Andreessen Horowitz, FirstMark Capital)
- Top-notch health insurance benefits, plus 12 weeks of paid parental leave for both parents
- We are currently working remotely due to COVID-19, but we will be opening a beautiful new office right on Madison Square Park! All NYC-based employees will have the option to return to the office 3 days per week beginning after Labor Day on an “opt-in” basis. We plan to officially reopen our office in the beginning of 2022.
- Check out our latest blog post to learn how we designed our return-to-work plans.
- Work-from-home stipend to optimize your home office setup.
ActionIQ is committed to building an inclusive, equitable, and diverse organization. We embrace equal opportunity for all applicants and seek to foster a culture of belonging for our employees. We recognize and appreciate that the more inclusive we are, the better we will function as a team. AIQ welcomes qualified applicants of any race, color, ancestry, religion, sex, national origin, gender identity, gender expression, age, marital or family status, disability, military veteran status, and any other status or background. Join us on our journey to build a product that will help our customers deliver memorable experiences that will drive loyalty and growth.