Unqork is the no-code platform that's pioneering a new way for companies to build, deploy, and manage complex, enterprise-grade applications. At this moment, Fortune 100 companies are using Unqork to create and deliver software without writing a single line of code.
Gary Hoberman, former CIO of MetLife, founded Unqork in 2017 with a team of hand-picked industry professionals, and together we're creating a massive paradigm shift in the way software is built. If you want to have a hand in defining the future of application development, we want to hear from you.
The Director, Production Visibility Tooling & Telemetry role is a key position in the newly created platform operations team. The role is responsible for delivering process and tools innovation and continuous improvement initiatives across platform operations from a tooling and telemetry perspective. It covers operationalizing platform and application monitoring and support tools, including but not limited to operational workflow streamlining and monitoring of server/app/db load and availability, disk usage, memory consumption, and performance, in order to maximize application uptime and reduce operational costs. This work will ultimately lead to productionizing a NOC or Mission Control Center (MCC). This role reports to the Head of Platform Operations.
What you’ll do:
- Lead the identification and delivery of the tooling & telemetry strategy across the business to drive monitoring improvement, tools integration and development
- Build automated tools that remove manual work and improve detection, recovery, and risk mitigation to reduce operational costs in platform operations
- Set up the Mission Control Center (NOC/SOC) to alert on exceptions only, progressing from eyes-on-glass monitoring to ticket-driven alerting for live agents and, finally, to automated remediation via a bot
- Lead the framework for setting up System Reliability, Rapid Response, and Continuous Monitoring Engineering governance
- Work with key technology stakeholders to build continuous, server-specific application telemetry trending (including a 3rd-party API inventory), covering CPU, memory and I/O, service limits, license/cert expiry, user requests, and concurrent user activity (usage analytics)
- Work with DevOps/Security to implement application infrastructure telemetry that monitors and quantifies network traffic in terms of bits per second and packet loss (such as HTTP/S traffic vs. back-end database traffic)
- Implement tools and tracking for basic application monitoring and telemetry, including application database access and processing. This covers trends in user-related parameters, transactions per second, open connections, errors, and misconfigurations, all tracked over time
- Operationalize exception monitoring of database query volume, response times, and the quantity of data passed between database and application (averages and outliers)
- Work closely with key technology stakeholders to operationalize cloud-specific application telemetry (cloud availability, internet latency, etc.)
- Implement Dashboards or other visual tools for real-time system telemetry and reporting
- Leverage vendor log-parsing tools for exception detection, intelligence, and consolidation for support groups
- Apply business intelligence to mine logs for insights and patterns
What we're looking for:
- 10+ years of experience preferred, including 8+ years in enterprise-level tools development, continuous improvement, change management, production support, and/or product or program management, owning and driving end-to-end development of complex, large-scale programs from concept to launch across diverse stakeholder teams
- Bachelor's Degree preferred
- Language/application experience is required for this role: Perl or Python, Docker, Hadoop or KDB, MySQL or another database (relational, non-relational, or time series), and Highcharts
- Experience with Unix, Linux, and Windows
- 7+ years of managing production and integration support
- Experience with data center build-outs, etc.
- Excellent problem-solving, conflict resolution, active listening, and time management skills
- Independent and self-driven
- Experience with performance metrics, process improvements, and Lean techniques
- Excellent analytical and quantitative skills with the ability to use data to manage metrics, drive large scale process improvements, and implement business cases
- Previous work experience with JIRA and Zendesk
- Rudimentary knowledge of database structure
Unqork is an equal opportunity employer, and proud to be committed to diversity and inclusiveness. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability or age.