Data Engineer at Graphika
Graphika empowers the world to understand and navigate the Cybersocial Terrain. We create large-scale, in-depth maps of social media landscapes and conversations to discover how communities form online and how influence and information flow within large-scale networks. Our interdisciplinary team uses our unique, patented set of technologies and tools to create and apply new, rigorous analytical methods to answer difficult questions about online conversations.
About the Role
Graphika seeks an experienced data engineer to join our technology team. The technology team at Graphika builds the tools that drive our cutting-edge analysis platform. We work with large-scale graph algorithms and streaming data to tackle interesting questions in new ways. The data engineer will contribute to building and scaling our various data pipelines, working closely with our data science and analysis teams. The data engineer will also collaborate with other members of the team (including other backend engineers, frontend engineers, and the product team) to help plan and implement solutions to business problems.
This job is not an analyst or data science role. It is not intended as a stepping stone to either of those roles within the organization. It is not directly involved in the highly publicized reports Graphika generates. Instead, this job ensures that the data underlying those reports and further scientific discovery is robust and clean, so that both can be based on it with integrity.
Areas of Responsibility
- Help create and optimize large-scale batch and real-time data pipelines that ingest large quantities of structured and unstructured data from a variety of sources
- Actively own systems which support diverse applications across Product, Tech, and Labs teams
- Design and implement ETL processes through cloud-based solutions
- Share ownership in ensuring the quality of our data and data infrastructure
- Consistently test code and systems for robustness
- Strategize around new data storage solutions and support existing ones
You have demonstrated the ability to build, deploy and maintain large-scale, data-driven solutions. You love to take on complex data-related problems, and can direct your own work. You have the skills and desire to interrogate data sets to understand their various foibles, and respond accordingly. You have a working knowledge of CS fundamentals like algorithms, data structures, and time complexity. You can imagine and design architectural solutions at scale.
You think beyond the task at hand to deeply understand the 'why' behind what you are doing. You can maintain a focus on shipping software products, understanding that done is often preferable to perfect.
You are an enthusiastic teammate, who engages in collaboration and proactive discussion. You are an effective communicator who can explain technical concepts to product leaders, customer support, and other engineers. You work with confidence and without ego. You have deep knowledge and exercise a high degree of ownership in your daily work. You have loosely-held, defensible ideas, and advocate for what you believe is right. You can surface your unarticulated assumptions. You are also adept at identifying and evaluating trade-offs, willing to be proven wrong, and quick to support your fellow teammates.
Qualifications
Required:
- Experience writing production-quality software in Python that is understandable, testable, and written with an eye towards maintainability
- Familiarity with AWS services (S3, Lambda, Kinesis, SQS, etc.) or similar cloud-based tools
- Knowledge of and ability to interact with DevOps tooling (Terraform, Ansible, Packer, Docker, etc.)
- Knowledge of tradeoffs between different distributed systems architectures
- Comfort with designing and scaling massive munging efforts on unstructured data
- Experience with the Python data science stack (numpy, pandas, matplotlib, sklearn, Jupyter, etc.)
- Ability to lead data architecture discussions
- Knowledge of SQL and common relational database systems such as PostgreSQL and MySQL
- Familiarity with schema design for a variety of domains
- Well-informed about data storage solutions
- Dedication to code quality, automation, and operational excellence: unit/integration tests, scripts, workflows
- Ability to work legally in the US without visa sponsorship
- Hands-on experience with Apache Spark
- Acquaintance with social media data sources and formats
- Experience with workflow management systems (such as Airflow or Luigi)
- Knowledge of NoSQL technologies like Redis
- You understand and appreciate good software engineering practices, including version control, code reviews, testing, and refactoring
- You are comfortable debugging and optimizing code
- You write tests to make sure code is reliable
- You help shape technical decisions within the team
- You collaborate within and across departments to ensure successful product creation
- You have the ability to pick up new tools and technologies as needed
Bachelor's degree or equivalent work experience
Benefits
- Unlimited PTO, with a company-mandated minimum of ten days of vacation time taken per year.
- 100% healthcare (health, vision, dental) premium coverage for employees; 50% premium coverage for families
- For NYers, access to "Graphikafé," our NYC small office setup with bookable hotdesks, meeting rooms, and phone booths
- Remote personal office setup stipend + 20% of home internet costs covered
Graphika is growing! Despite the downturn and accompanying reductions in other sectors and companies, Graphika is retaining current employees and is actively hiring for full-time positions.
In the BeforeTimes, Graphika's Technology Team was fully co-located in our NYC office. On March 12, 2020, Graphika moved to a fully-distributed model, and we've been working together as a company to respond to the changing realities of the AfterTimes. As a result, we are happy to consider applicants who are located in the continental US, with the caveat that the Technology Team works on Eastern time and begins its day at around 10am. Daily Standup is at 10:30am Eastern.