Fusemachines

Data Scientist

Reposted Yesterday

In-Office

New York, NY, USA

140K-190K Annually

Mid level

The role involves building and deploying machine learning solutions, collaborating with stakeholders, conducting data analysis, and ensuring model performance in production.

The summary above was generated by AI

About Fusemachines
Founded in 2013, Fusemachines is a global provider of enterprise AI products and services, on a mission to democratize AI. Leveraging its proprietary AI Studio and AI Engines, the company helps drive its clients’ AI Enterprise Transformation, regardless of where they are in their Digital AI journeys. With offices in North America, Asia, and Latin America, Fusemachines provides a suite of enterprise AI offerings and specialty services that allow organizations of any size to implement and scale AI. Fusemachines serves companies in industries such as retail, manufacturing, and government.

Fusemachines continues to actively pursue the mission of democratizing AI for the masses by providing high-quality AI education in underserved communities and helping organizations achieve their full potential with AI.

Salary Range: US$140,000-190,000/year

Role Overview

We’re hiring a mid-to-senior Machine Learning Engineer / Data Scientist to build and deploy machine learning solutions that drive measurable business impact. You’ll work across the ML lifecycle—from problem framing and data exploration to model development, evaluation, deployment, and monitoring—often in partnership with client stakeholders and internal delivery teams.

You should be strong in core data science and applied machine learning, comfortable working with real-world data, and capable of turning modeling work into production-ready systems.

Key Responsibilities

Problem Framing & Stakeholder Partnership
- Translate business questions into ML problem statements (classification, regression, time series forecasting, clustering, anomaly detection, recommendation, etc.).
- Collaborate with stakeholders to define success metrics, evaluation plans, and practical constraints (latency, interpretability, cost, data availability).
Data Analysis & Feature Engineering
- Use SQL and Python to extract, join, and analyze data from relational databases and data warehouses.
- Perform data profiling, missingness analysis, leakage checks, and exploratory analysis to guide modeling choices.
- Build robust feature pipelines (aggregation, encoding, scaling, embeddings where appropriate) and document assumptions.
Model Development (Core ML)
- Train and tune supervised learning models for tabular data (e.g., logistic/linear models, tree-based methods, gradient boosting such as XGBoost/LightGBM/CatBoost, and neural nets for structured data).
- Apply strong tabular modeling practices: handling missing data, categorical encoding, leakage prevention, class imbalance strategies, calibration, and robust cross-validation.
- Build time series models (statistical and ML/DL approaches) and validate with proper backtesting.
- Apply clustering and segmentation techniques (k-means, hierarchical, DBSCAN, Gaussian mixtures) and evaluate stability and usefulness.
- Apply statistics in practice (hypothesis testing, confidence intervals, sampling, experiment design) to support inference and decision-making.
Deep Learning
- Build and train deep learning models using PyTorch or TensorFlow/Keras.
- Use best practices for training (regularization, calibration, class imbalance handling, reproducibility, sound train/val/test design).
Evaluation, Explainability, and Iteration
- Choose appropriate metrics (AUC/F1/PR, RMSE/MAE/MAPE, calibration, lift, and business KPIs) and create evaluation reports.
- Perform error analysis and interpretation (feature importance/SHAP, cohort slicing) and iterate based on evidence.
Productionization & MLOps (Project-Dependent)
- Package models for deployment (batch scoring pipelines or real-time APIs) and collaborate with engineers on integration.
- Implement practical MLOps: versioning, reproducible training, automated evaluation, monitoring for drift/performance, and retraining plans.
Documentation & Communication
- Communicate tradeoffs and recommendations clearly to technical and non-technical stakeholders.
- Create documentation and lightweight demos that make results actionable.

Success in This Role Looks Like

You deliver models that perform well and move business metrics (revenue lift, cost reduction, risk reduction, improved forecast accuracy, operational efficiency).
Your work is reproducible and production-aware: clear data lineage, robust evaluation, and a credible path to deployment/monitoring.
Stakeholders trust your judgment in selecting methods and communicating uncertainty honestly.

Required Qualifications

3–8 years of experience in data science, machine learning engineering, or applied ML (mid-to-senior).
Strong Python skills for data analysis and modeling (pandas/numpy/scikit-learn or equivalent).
Strong SQL skills (joins, window functions, aggregation, performance awareness).
Solid foundation in statistics (hypothesis testing, uncertainty, bias/variance, sampling) and a practical experimentation mindset.
Hands-on experience across multiple model types, including:
- Classification & regression
- Time series forecasting
- Clustering/segmentation
Experience with deep learning in PyTorch or TensorFlow/Keras.
Strong problem-solving skills: ability to work with ambiguous goals and messy data.
Clear communication skills and ability to translate analysis into decisions.

Preferred Qualifications

Experience with Databricks for applied ML (e.g., Spark, Delta Lake, MLflow, Databricks Jobs/Workflows).
Experience deploying models to production (APIs, batch pipelines) and maintaining them over time (monitoring, retraining).
Experience with orchestration tools (Airflow, Prefect, Dagster) and modern data stacks (Snowflake/BigQuery/Redshift/Databricks).
Experience with cloud platforms (AWS/GCP/Azure/IBM) and containerization (Docker).
Experience with responsible AI and governance best practices (privacy/PII handling, auditability, access controls).
Consulting or client-facing delivery experience.

Certifications (Strong Plus)
Candidates with at least one relevant certification are especially encouraged to apply:

Cloud certifications: AWS, Google Cloud, Microsoft Azure, or IBM (data/AI/ML tracks)
Databricks certifications (Data Scientist, Data Engineer, or related)

Nice-to-Have

Causal inference experience (e.g., quasi-experimental methods, propensity scores, uplift/heterogeneous treatment effects, experimentation beyond A/B tests).
Agentic development experience: designing and evaluating agentic workflows (tool use, planning, memory/state, guardrails) and integrating them into products.
Deep familiarity with agentic coding tools and workflows for accelerated product development (e.g., AI-assisted IDEs, code agents, automated testing/refactoring, repo-aware assistants), including strong judgment on quality, security, and maintainability.

Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.

Important: Immigration Sponsorship Policy

Fusemachines is unable to proceed with candidates who require any form of work authorization or immigration support from the company. This restriction applies to all types of support, including:

Direct Company Sponsorship: Such as H-1B, J-1, or TN visas.
Employer of Record: Listing Fusemachines as the immigration employer on any government documentation.
Written Documentation: Providing letters or other support for any work authorization (e.g., OPT, STEM OPT, CPT).

Top Skills

AWS

Azure

Databricks

Docker

GCP

Keras

Python

PyTorch

SQL

TensorFlow

500 7th Avenue, New York, NY, United States, 10018

Similar Jobs

Capital One

Data Scientist

4 Days Ago

Hybrid

New York, NY, USA

136K-169K Annually

Senior level

Fintech • Machine Learning • Payments • Software • Financial Services

As a Senior Associate Data Scientist, you will apply generative AI to enhance customer experience, working collaboratively to build machine learning models and automate workflows with large datasets.

Top Skills: AWS, Hugging Face, LangGraph, LlamaIndex, PyTorch, Weights & Biases Weave

Capital One

Data Scientist

7 Days Ago

Hybrid

New York, NY, USA

147K-201K Annually

Senior level

Fintech • Machine Learning • Payments • Software • Financial Services

As a Principal Associate Data Scientist, you will lead machine learning model development for credit decisioning, collaborating with cross-functional teams to drive business outcomes through data insights.

Top Skills: AWS, Conda, H2O, Python, Spark

Capital One

Data Scientist

11 Days Ago

Hybrid

New York, NY, USA

269K-335K Annually

Senior level

Fintech • Machine Learning • Payments • Software • Financial Services

As a Director, Data Scientist - Generative AI Systems, you will lead a team to develop AI products using NLP and generative AI technologies, partnering with cross-functional teams. Your role involves leveraging open-source programming and machine learning to convert customer data into actionable insights and enhance customer interactions with financial services.

Top Skills: AWS UltraClusters, Hugging Face, LangChain, Lightning, PyTorch, Vector DBs

What you need to know about the NYC Tech Scene

As the undisputed financial capital of the world, New York City is an epicenter of startup funding activity. The city has a thriving fintech scene and is a major player in verticals ranging from AI to biotech, cybersecurity and digital media. It also has universities like NYU, Columbia and Cornell Tech attracting students and researchers from across the globe, providing the ecosystem with a constant influx of world-class talent. And its East Coast location and three international airports make it a perfect spot for European companies establishing a foothold in the United States.

Key Facts About NYC Tech

Number of Tech Workers: 549,200; 6% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Capgemini, Bloomberg, IBM, Spotify
Key Industries: Artificial intelligence, Fintech
Funding Landscape: $25.5 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Greycroft, Thrive Capital, Union Square Ventures, FirstMark Capital, Tiger Global Management, Tribeca Venture Partners, Insight Partners, Two Sigma Ventures
Research Centers and Universities: Columbia University, New York University, Fordham University, CUNY, AI Now Institute, Flatiron Institute, C.N. Yang Institute for Theoretical Physics, NASA Space Radiation Laboratory
