Positron AI

Software Engineer, Senior

Reposted 21 Days Ago
Remote
Hiring Remotely in United States
150K-250K Annually
Senior level
The Senior Software Engineer will develop high-performance software for executing open-source LLMs on custom hardware, focusing on optimizations and efficient libraries primarily in C++.
The summary above was generated by AI

About Positron AI

Positron AI specializes in developing custom hardware systems to accelerate AI inference. These inference systems offer significant performance and efficiency gains over traditional GPU-based systems, delivering advantages in both performance per dollar and performance per watt. Positron exists to create the world's best AI inference systems.


Role Overview

Senior Software Engineer – Machine Learning Systems & High-Performance LLM Inference

We are seeking a Senior Software Engineer to contribute to the development of high-performance software that powers the execution of open-source large language models (LLMs) on our custom appliance. This appliance combines FPGAs and x86 CPUs to accelerate transformer-based models. The software stack is written primarily in modern C++ (C++17/20) and relies heavily on templates, SIMD optimizations, and efficient parallel computing techniques.


Key Responsibilities

  • Design and implement high-performance inference software for LLMs on custom hardware.
  • Develop and optimize C++ libraries that make efficient use of SIMD instructions, threading, and the memory hierarchy.
  • Work closely with FPGA and systems engineers to ensure efficient data movement and computational offloading between x86 CPUs and FPGAs.
  • Improve model execution through low-level optimizations, including vectorization, cache efficiency, and hardware-aware scheduling.
  • Contribute to performance profiling tools and methodologies to analyze execution bottlenecks at the instruction and data flow levels.
  • Apply NUMA-aware memory management techniques to optimize memory access patterns for large-scale inference workloads.
  • Implement ML system-level optimizations such as token streaming, KV cache optimizations, and efficient batching for transformer execution.
  • Collaborate with ML researchers and software engineers to integrate model quantization techniques, sparsity optimizations, and mixed-precision execution.
  • Ensure all code contributions include unit, performance, acceptance, and regression tests as part of a continuous integration-based development process.


Required Qualifications

  • 7+ years of professional experience in C++ software development, with a focus on performance-critical applications.
  • Strong understanding of C++ templates and modern memory management.
  • Hands-on experience with SIMD programming (AVX-512, SSE, or equivalent) and intrinsics-based vectorization.
  • Experience in high-performance computing (HPC), numerical computing, or ML inference optimization.
  • Experience with ML model execution optimizations, including efficient tensor computations and memory access patterns.
  • Knowledge of multi-threading, NUMA architectures, and low-level CPU optimization.
  • Proficiency with systems-level software development, profiling tools (Perfetto, VTune, Valgrind), and benchmarking.
  • Experience working with hardware accelerators (FPGAs, GPUs, or custom ASICs) and designing efficient software-hardware interfaces.


Preferred Qualifications

  • Familiarity with LLVM/Clang or GCC compiler optimizations.
  • Experience in LLM quantization, sparsity optimizations, and mixed-precision computation.
  • Knowledge of distributed inference techniques and networking optimizations.
  • Understanding of graph partitioning and execution scheduling for large-scale ML models.

Leveling & Scope

While this role is currently posted at a specific level, we are a growth-oriented organization and are open to hiring at a more senior level for the right candidate. Please note that this job description serves as a focused but generalized overview of the role; specific responsibilities and impact expectations will be tailored to the experience and seniority of the final hire.


Why Join Us?

  • Work on a cutting-edge ML inference platform that redefines performance and efficiency for LLMs.
  • Tackle challenging low-level performance engineering problems in AI and HPC.
  • Collaborate with a team of hardware, software, and ML experts building an industry-first product.
  • Opportunity to contribute to and shape the future of open-source AI inference software.


Compensation and Benefits

The base salary range for this role is $150,000 – $250,000.

Please note that the figures provided represent the base salary range only and do not include other elements of our total compensation package, such as equity and comprehensive benefits.

At Positron AI, we value the unique expertise each candidate brings. While the range above reflects our typical expectation for the position, we reserve the flexibility to exceed this range for candidates whose specialized skills, significant experience, or unique qualifications fall outside the standard scope of the role. Final offers are determined based on a variety of factors, including internal equity and individual impact.


Equal Opportunity Employer. If you’re excited about the role but don’t meet every bullet, we’d still love to hear from you.

