Senior Systems Engineer (High Performance Computing)

Paige

Sorry, this job was removed at 8:45 p.m. (EST) on Wednesday, January 20, 2021

Paige is a software company helping pathologists and clinicians make faster, more informed diagnostic and treatment decisions by mining decades of data from the world’s experts in cancer care. We are leading a digital transformation in pathology by leveraging advanced Artificial Intelligence (AI) technology to create value for the oncology clinical team.

We are the first company to develop clinical-grade AI tools for the pathologist, which resulted in our receiving FDA breakthrough designation for our first product. Paige has also received FDA clearance for our digital viewer, FullFocus™. We have also established multiple relationships with biopharma, laboratory, and equipment manufacturers that enable Paige to develop an ecosystem ready to help patients receive better diagnoses and treatment.

We’re seeking an experienced Senior Systems Engineer (HPC) to administer and support our High Performance Computing cluster. You will work closely with engineering and data management teams on cutting-edge technologies.

This is an extraordinary opportunity to be part of a high-performing team and to pursue a life-changing mission with unique technical challenges!

Responsibilities:

Design, plan, test and implement innovative hardware designs for an HPC environment
Implement, support, and provide technical guidance for engineering team initiatives and projects
Build automation for infrastructure provisioning, configuration management, and account access (emphasis on SaltStack)
Install, provision, and support a complex Cisco Nexus HPC switching environment (RoCE)
Design, structure, and maintain a Pure Storage and Qumulo enterprise network-attached storage (NAS) system
Regularly evaluate and recommend new tools and technologies for use in existing and future clusters
Deploy patches and updates to operating systems and application software

Required Skills and Experience

Master’s in computer science, engineering, information systems, or a related field, or equivalent years of experience
8+ years’ experience in a systems engineering role
Deep knowledge of server components (CPU, SSD, GPU, networking)
Deep knowledge of High Performance Computing (HPC) / cluster technologies with high-speed interconnect fabrics using Ethernet/RoCE and InfiniBand
Expert knowledge of SAN and NAS services (iSCSI, NFS, CIFS)
Expert knowledge of TCP/IP networking, network security, and DNS (BIND, Windows)
Expert knowledge of Linux (Ubuntu, CentOS), common UNIX services, and shell scripting
Strong understanding of high-speed HPC interconnects
Strong knowledge of parallel GPU computing, MPI, and RDMA within containerized environments
Strong knowledge of the NVIDIA software environment (NCCL, NGC, GPU tools)
Strong experience operating and administering workload schedulers such as Slurm, LSF, or PBS
Strong knowledge of virtualization technologies such as KVM/libvirt/QEMU
Experience working with configuration management tools like SaltStack, Chef, or Puppet

Desired Skills

Working knowledge of Kubernetes and Docker containers within an on-prem HPC cluster
Understanding of data pipelines, including ETL and streaming of data such as log data or tool/sensor data to indexes (EMR)
Understanding of cloud platforms and services, particularly AWS
Understanding of Jupyter Notebook technology
Understanding of CI/CD pipelines
Understanding of Agile development methodologies
