The Trends Data Scientists Are Watching

Artificial intelligence is all around us.

Unlocking your phone with facial recognition requires AI. So does using a voice assistant. As you type an email or text, spell check — which is powered by AI — ensures your content is free of errors. Your recommended streaming options on Netflix are generated by AI, and your bank’s fraud detection system is dependent on it.

AI is not just the imaginary tech of our favorite sci-fi media — it is very real, and it powers our entertainment, security, work and almost everything else we touch.

As an AI subset, machine learning is also all around us: in healthcare, spam filters, commute estimates and more. Allied Market Research points out that machine learning — and AI as a whole — is driving innovation in data science and will drive the entire industry’s growth.

According to Verified Market Research, the AI market was valued at more than $51 billion in 2020. By 2028, it is projected to reach more than $641 billion — a compound annual growth rate of just over 36 percent. Machine learning will grow at a rate of nearly 39 percent between 2021 and 2029, from $21 billion to almost $210 billion.

Data scientists are watching that growth and the tech trends that accompany it. Built In NYC sat down with future-focused data scientists from Northwestern Mutual, Notified, iCapital and Reonomy to learn which developments in AI and machine learning they’re taking note of and how they expect those trends to affect their work and the data science industry as a whole.

Ning Xu

Assistant Director, Data Science and Analytics • Northwestern Mutual

Northwestern Mutual designs tech that helps make finance accessible anywhere and to anyone.

What’s one data science trend you’re watching closely right now, and why?

One data science trend that continues to draw increasing attention is AI bias. It’s not uncommon to see headlines about companies being sued over AI-generated discrimination, like a photo algorithm biased against certain ethnicities or a credit assessment algorithm biased against certain genders.

I believe that bias always exists at some level. Unfortunately, AI systems can magnify those biases and turn them into a bigger concern. We’re seeing machine learning and AI applications implemented across all industries and at a large scale. Within the insurance industry, AI is used to identify prospective customers, assess risk, set premiums, and support underwriting and claims assessment. While AI can have numerous benefits and be a driving engine for critical decision making, it can also increase the potential scale of bias. If these AI systems are trained on biased data or historical inequalities, they could reinforce bias against certain groups by gender, race, income and more.

What influence will this trend have on your industry?

The insurance industry is highly regulated, and we’ve seen a continuous increase in regulatory oversight and guidelines on AI usage. Regulations are being placed on AI-relevant data privacy — the California Consumer Privacy Act went into effect in 2020 and impacts how we collect, store and use data.

Northwestern Mutual plays a lead role in the insurance industry, and we’re at the forefront of driving change to reduce bias. The data science and analytics vertical within Northwestern Mutual has dedicated roles for bias testing and monitoring. Our analytics review committee examines all machine learning models and evaluates potential bias. As data scientists, it’s critical that we review new regulations and guidelines and think through how to build and deploy responsible AI and machine learning models. We pay extra attention to the data to make sure it represents what should be and not just what is.
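To make the idea of bias testing concrete, here is a minimal sketch of one common check: comparing the rate of favorable decisions across groups. It is an illustration only, not Northwestern Mutual's review process. The function names, the sample data and the 0.8 cutoff (borrowed from the widely cited "four-fifths" rule of thumb) are all assumptions made for this example.

```python
from collections import defaultdict


def selection_rates(groups, decisions):
    """Return the share of favorable decisions (1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest group selection rate to the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favorable decision
    return min(rates.values()) / highest


# Made-up decisions for two hypothetical groups, "a" and "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, decisions)
ratio = disparate_impact_ratio(rates)
flagged = ratio < 0.8  # assumed threshold; a real review would set its own
print(rates, round(ratio, 2), flagged)
```

A single ratio like this is only a starting point. A real review would likely look at several metrics and at how the training data was collected.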

Adrian Czebiniak

Chief Data Officer • iCapital

iCapital’s online financial platform changes how alternative investments are bought and sold through the development of tech-based solutions.

What’s one data science trend you’re watching closely right now, and why?

I am closely watching composite artificial intelligence: the practice of breaking down a problem into smaller chunks that can be solved by different AI techniques and then combined to achieve a solution. By combining hyper-specific technologies in a seamless way, we are moving from a one-size-fits-most approach to one that is custom tailored.

Twenty years ago, the norm was large monolithic applications that did everything. Now, the general approach is to break applications into small or even micro-sized services that focus on very specific solutions. Those services are combined into an interface that is easy to build and update. You can then develop solutions much faster. Data science will provide business value quickly and even allow business users to get more involved in the composition of these algorithms. That will lead to a more collaborative approach with more comprehensive results. It also allows us to keep building out our toolkit. This is how we truly scale.

What influence will this trend have on your work?

Composite AI is already influencing how we approach problem solving and building out our data science platform. We built our internal vision platform with a modular approach, which allows us to swap out different engines to suit the document types and data extraction needs. We are taking this a step further by solving data extraction problems with a multi-staged approach: using a pipeline of different algorithms best suited to extract, detect and interpret data. This has yielded a very flexible platform where we can build models in-house, use off-the-shelf models from large providers and integrate with hyper-specific partners.
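The modular, multi-stage idea described above can be sketched in a few lines. This is a hypothetical illustration, not iCapital's platform: the stage functions, the `EXTRACTORS` registry and the sample document are invented for the example, and the "scan" extractor is a trivial stand-in for what would be an OCR engine in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A stage takes a document dict and returns an enriched copy.
Stage = Callable[[dict], dict]


@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)

    def run(self, document: dict) -> dict:
        result = dict(document)
        for stage in self.stages:
            result = stage(result)
        return result


def plain_text_extractor(doc: dict) -> dict:
    return {**doc, "text": doc["raw"].strip()}


def scan_extractor(doc: dict) -> dict:
    # Stand-in for an OCR engine: here it only normalizes case.
    return {**doc, "text": doc["raw"].strip().lower()}


def detect_amounts(doc: dict) -> dict:
    tokens = doc["text"].split()
    return {**doc, "amounts": [t for t in tokens if t.startswith("$")]}


def interpret_total(doc: dict) -> dict:
    total = sum(float(a.lstrip("$").replace(",", "")) for a in doc["amounts"])
    return {**doc, "total": total}


# Swappable extraction engines, keyed by (hypothetical) document type.
EXTRACTORS: Dict[str, Stage] = {
    "text": plain_text_extractor,
    "scan": scan_extractor,
}


def build_pipeline(doc_type: str) -> Pipeline:
    return Pipeline([EXTRACTORS[doc_type], detect_amounts, interpret_total])


result = build_pipeline("scan").run({"raw": "  FEE $1,200 AND TAX $300  "})
print(result["amounts"], result["total"])
```

Because each stage shares the same signature, an in-house model, an off-the-shelf model or a partner's service could each be wrapped as a stage and swapped in without touching the rest of the pipeline.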

Dmitriy Khots

VP Data Science • Notified

Notified offers a suite of communications solutions to help companies reach and engage customers, investors, employees and the media.

What’s one data science trend you’re watching closely right now, and why?

I’m watching machine learning operations (MLOps), particularly a component of it called continuous delivery for machine learning. It’s an automation framework for building, delivering and monitoring machine learning models so that data solutions can be integrated into products, which helps ensure that investments in data teams deliver real value. Everyone has a unique approach to putting data solutions into production, and thought leadership on standardizing such solutions has yet to emerge. Creating industry best practices and tooling is important for companies that want to minimize development costs while realizing return on investment more quickly on machine learning deployments.

What influence will this trend have on your work?

This trend has led to the emergence of MLOps teams within software delivery organizations; their integration into existing software delivery practices such as the Scaled Agile Framework; and the formation of a loosely coupled data exchange framework across portfolio products that turns data into an actual, managed product. These are all great things for data teams. However, a few things need to be kept in mind: data teams will need to adapt their skill sets to evolving ecosystems and processes, and they should be prepared for potential ideation and procedural conflicts with existing development operations teams.

How are you getting ahead of this trend now? Are there any new technologies or approaches you’re adopting?

I have been watching this trend closely as it evolved from batch scoring to online learning, and I defined key actions to get ahead of the game. I involved business stakeholders to create pilots that could succeed quickly, which led to ongoing executive support and funding for next steps. I invested in an MLOps team and created a custom training curriculum for the team. Following the training, the team held hackathons with production deployments. I have also repeatedly demonstrated ROI, which led to more team and tooling investments.

Carlos Espino

Data Science Manager • Reonomy, an Altus Group Business

Reonomy is an AI-powered data platform for the commercial real estate industry. It provides CRE insights to top brokerages, financial institutions and commercial services providers.

What’s one data science trend you’re watching closely right now, and why?

We’re keeping an eye on scalable machine learning. As our organization grows, we are building models that require a lot of data and are flexible enough to be used in different applications. Scalable machine learning architectures are critical for us to support the priorities of our business.

These are some of the key principles we need to consider when building scalable architectures:

Choosing a proper framework, language and hardware. With high volumes of data, we need to rely more on distributed computing frameworks and cloud-based solutions.
Standardizing the process of building and applying models across the organization. This includes using standardized project organization, development of shared features, versioning of code, data and models, and proper modularization of ML pipelines.
Performance monitoring. Once a model is developed, data scientists must implement monitoring metrics that indicate whether the model is performing as expected in production and whether its performance is consistent over time. In a scalable ML system, we should be able to react if the model is drifting by automating retraining and continuously integrating new versions of the model.
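The performance monitoring principle can be illustrated with a small sketch. It is not Reonomy's implementation: the `PerformanceMonitor` class, the accuracy metric, the window size and the tolerance are all assumptions chosen to keep the example short. The idea is to track accuracy over a rolling window of recent predictions and signal when it falls too far below a baseline measured at training time.

```python
from collections import deque


class PerformanceMonitor:
    """Track rolling accuracy and flag when it drops below a baseline."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True where prediction was right

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Wait until the window is full so a few early errors do not trigger.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance


# Made-up production feedback: 7 correct predictions, then 3 wrong ones.
monitor = PerformanceMonitor(baseline_accuracy=0.90, tolerance=0.05, window=10)
for prediction, actual in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(prediction, actual)

retrain = monitor.needs_retraining()
print(monitor.rolling_accuracy(), retrain)
```

In an automated setup, a positive signal like this could kick off a retraining job and a deployment pipeline rather than just printing a flag. Labels often arrive late in production, so a real monitor might also watch input distributions.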

What influence will this trend have on your work or your industry more broadly?

The universe of commercial real estate data is mostly fragmented. There are public datasets, such as local county assessor records and census data; private datasets, such as company profiles; and proprietary datasets.

We are applying the principles of scalable ML to bring together all of this information. Our platform is capable of going through hundreds of millions of records to organize, link, and standardize the information available on properties, transactions, people and companies. This results in a knowledge graph of commercial real estate: an ontology built for our business. We can offer users a holistic view of any given commercial property’s sales and debt history, record of ownership, tenant information, owner portfolios, property details and more.

The ability to connect information from various sources into a knowledge graph enables us to unlock capabilities for the industry that have yet to be realized, including predicting what will sell next, estimating the assessed value of a property, determining its asset type and more. With that, we will deliver faster discovery of valuable intelligence.
