Senior Software Engineer (Baseball Data + Machine Learning)

Sorry, this job was removed at 5:29 p.m. (EST) on Wednesday, December 11, 2019

Senior Software Engineer

Reporting to: Sr. Director of Engineering, Baseball Data

Location: New York, NY


Launched in 2001 as the tech arm of Major League Baseball, MLB.com is renowned for creating experiences that baseball fans love - and we're just getting started!

Job Overview:

The new Machine Learning team at Major League Baseball builds and maintains models to analyze and evaluate what’s happening on the field. As part of the Baseball Data department, we leverage tracking data from individual plays and players to develop solutions using data science, machine learning, deep learning, and computer vision. While we do work on projects that may not be public, much of our work can regularly be seen on broadcasts. 

We are responsible for taking projects from the initial ideation phase into data exploration and model training in shared Jupyter notebooks and into production. We use specialized toolchains that help with scheduled orchestration of ETL processes, model (re)training, and a large amount of real-time data. We are moving towards Google BigQuery as our Data Warehouse platform. We love Python, and regularly develop complex Python scripts to interact with APIs and SQL datastores.

The Software Engineer will be responsible for accelerating and solidifying the infrastructure and services upon which our work relies. This includes creating, optimizing, and scaling the data pipeline to support this work. The Software Engineer will be crucial to the Machine Learning team’s ability to produce and deliver, and will interact with our team of Data Scientists and Machine Learning Engineers on a daily basis. 


Responsibilities:

  • Produce High-Impact Work - As a core member of the Machine Learning Team, you will build, evolve, and scale the state-of-the-art machine learning infrastructure that powers MLB’s data and ML platform. Your work will have a direct impact on our business’s bottom line.
  • Employ a Broad Range of Technology - You’ll have the freedom to use the right tools for the job, whether it’s vanilla SQL or a distributed processing framework such as Apache Spark. We run our processes within the Amazon Web Services and Google Cloud Platform ecosystems, so we can take advantage of managed services such as Dataproc, Dataflow, and Kubeflow when they help us get the job done better or faster.
  • Leverage Diverse Data Sources - You’ll work with many data sources, including:
    • Player tracking and pose estimation data, as well as the ball tracking data that powers MLB’s Statcast
    • User interactions with our MLB.TV and Video on Demand products
    • Video clips from every pitch of every game
  • Build and Support - You will own and operate deep learning training systems, model serving systems, and dataset management pipelines. You’ll embrace the DevOps mentality to build and support data applications in the cloud. You’ll deploy using infrastructure as code.


Basic Qualifications:

  • Expertise in Python, specifically interacting with data APIs and automating tasks.
  • Expertise in SQL.
  • Experience with standard software engineering methodology, e.g. unit testing, code reviews, design documentation.
  • Experience working with large (terabyte-scale) data sets.
  • Experience with an MPP Data Warehouse such as BigQuery, Redshift, or Teradata.
  • Experience with cloud infrastructure.
  • Comfort in a Linux environment and with basic server administration tasks.
  • Significant experience with Data Engineering or ETL Engineering.

Preferred Qualifications:

  • Previous experience supporting Machine Learning modeling or its product integration

Experience with any/all of the following:

  • Apache Airflow
  • Google BigQuery
  • DevOps - Jenkins/Ansible/Terraform
  • Docker / Kubernetes
  • Exposure to ML techniques and programming 
  • Experience in deep learning model training
  • Experience with data processing and storage frameworks like Google Cloud Dataflow, Hadoop, Spark, Cassandra, Kafka, etc.

Location

Our office in Midtown West Manhattan, across from Radio City Music Hall, provides easy commuter access. Great restaurants in the heart of NYC, let's go!
