How These Teams Turned Their Data Into a Product

Data is everywhere, but how do you use it?

Written by Rachael Millanta
Published on Jan. 31, 2022

In years gone by, upselling involved a sales representative speaking directly to a customer and suggesting vaguely related items in hopes of enticing them to buy more. Now, with data and analytics products constantly developing in real time, a recommendation algorithm does the heavy lifting.

As of October 2021, Amazon was valued at $1.75 trillion, a huge portion of which can be attributed to its sophisticated recommendation engines. By collecting and utilizing data on when customers purchase items, how they rate those purchases and what other shoppers with similar buying habits are purchasing, Amazon is able to surface upsell suggestions to its customers. This strategy delivers substantial value, but taking the large quantity of data a company collects and turning it into a usable product is no easy feat.
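To make the idea concrete, the "shoppers with similar buying habits" signal can be sketched as simple item co-occurrence counting. The purchase data and item names below are invented for illustration and say nothing about Amazon's actual systems.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "alice": {"kettle", "mug", "tea"},
    "bob":   {"kettle", "mug", "coffee"},
    "carol": {"mug", "tea", "honey"},
}

# Count how often each pair of items is bought by the same customer.
co_counts = defaultdict(lambda: defaultdict(int))
for items in purchases.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def recommend(item, k=2):
    """Return the items most often bought alongside `item`."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: -kv[1])
    return [name for name, _ in ranked[:k]]

print(recommend("kettle"))   # ['mug', ...] -- mug is bought with kettle most often
```

Real recommendation engines add ratings, recency and far more scalable similarity models, but the core signal is the same: items that co-occur in purchase histories.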

“To bring our new product to life, we first had a lot of questions to answer,” explained William Watson, data engineering lead at MayStreet. “Deciding on constraints and making these decisions was a massive team effort, activating our entire organization and client base.”

Brian Duke, VP of data science at Petal, also highlighted the team effort needed to mold data into a product. “We’ve recognized tremendous value in bringing cross-functional teams together with diverse perspectives to drive innovation and scale,” he said. 

From news algorithms on social media to analytics dashboards and forecasting tools, companies that are able to turn their data into value for consumers are setting themselves apart in the industry. Built In New York spoke with Duke and Watson about how they've turned the data their companies collect into products that benefit their customers, and the challenges they overcame on their way to success.

 


 

Brian Duke
VP of Data Science • Petal

 

When did you first realize that your data may have some untapped value?

Petal was founded in 2016, at a time when new sources of financial information — like bank account transaction data — were becoming accessible digitally. We believed these data sources could provide a useful substitute for traditional credit scores for groups who lack credit history and have been underserved as a result. We spent years building the software and models necessary to clean, categorize and process banking history for complex use cases. Only then were we able to build underwriting models, using ML and AI, to make predictions about credit risk.
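As a rough, hypothetical sketch of the kind of step Duke describes — turning raw bank transactions into features an underwriting model could consume — the snippet below uses invented column names, categories and rules; it is not Petal's actual cleaning or categorization pipeline.

```python
import pandas as pd

# Toy transaction data: positive amounts are inflows, negative are outflows.
txns = pd.DataFrame({
    "date":   pd.to_datetime(["2021-06-01", "2021-06-03", "2021-06-15", "2021-07-01"]),
    "amount": [2500.00, -1200.00, -80.50, 2500.00],
    "desc":   ["PAYROLL ACME", "RENT JUNE", "GROCERY MART", "PAYROLL ACME"],
})

def categorize(desc: str) -> str:
    # Toy keyword rules standing in for a real categorization model.
    if "PAYROLL" in desc:
        return "income"
    if "RENT" in desc:
        return "housing"
    return "other"

txns["category"] = txns["desc"].map(categorize)

months_of_data = txns["date"].dt.to_period("M").nunique()   # 2 in this toy sample
features = {
    "avg_monthly_inflow":  txns.loc[txns.amount > 0, "amount"].sum() / months_of_data,
    "avg_monthly_outflow": -txns.loc[txns.amount < 0, "amount"].sum() / months_of_data,
    "months_with_income":  txns.loc[txns.category == "income", "date"].dt.to_period("M").nunique(),
}
print(features)
```

Features like these would then feed a credit-risk model in place of, or alongside, a traditional bureau score.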

With the Petal credit card program, we honed our approach over several years through testing, learning and improving our software and models. Interestingly, the pandemic illuminated the true value of our unique cash flow underwriting approach. Traditional bureau-based credit scores were lagging indicators of credit risk and quickly became unreliable. We realized that cash flow information gave us a far more real-time signal of creditworthiness, and we were able to lean into it. Since then, we've doubled down on this approach, grown the business by more than five times and are now offering this platform and technology to others.

 

How did you bring this product to life? How did you collaborate with other teams to do it?

Petal’s innovative cash flow underwriting methodology was first applied to our credit card product, which is issued by WebBank, Member FDIC. Now we’re furthering our mission by making this unique underwriting technology available to other credit and financial services providers via Prism Data. Developing a new suite of data products and underwriting methodologies within the highly regulated consumer credit category has been made possible only through close collaboration across our product, engineering, data science, risk, legal and compliance teams. We’ve recognized tremendous value in bringing cross-functional teams together with diverse perspectives to drive innovation and scale.


 

What’s the biggest technical challenge you faced along the way? How did you overcome it?

The biggest challenge we’ve faced to date has been taking a set of internal tools and productizing them for use by other businesses. While these products had already been created for Petal, we quickly realized that replicating them as scalable B2B solutions would require new skill sets and practices. We relied heavily on the diverse experience of our data science, engineering and product teams to ensure that we’d design and develop a set of products that we’d happily use if we were customers. We were able to overcome these challenges by channeling our shared experiences using other data products — both good and bad — into developing Prism Data’s products. As an example, we know that many fintechs and banks prefer to have their internal data science teams develop their own ML models, so we created our insights product with that flexibility in mind, allowing some clients to leverage insights within their own models, while others can use the Cash Score off the shelf.
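One way to picture that flexibility is an interface that returns both the derived insights and a ready-made score. The class, field names and scoring rule below are purely illustrative assumptions, not Prism Data's actual API.

```python
from dataclasses import dataclass

@dataclass
class CashFlowInsights:
    # Hypothetical derived attributes a client's own model could consume.
    avg_monthly_inflow: float
    avg_monthly_outflow: float
    months_with_income: int

def cash_score(insights: CashFlowInsights) -> int:
    # Toy scoring rule standing in for a trained model, clamped to a familiar range.
    surplus = insights.avg_monthly_inflow - insights.avg_monthly_outflow
    return max(300, min(850, 500 + int(surplus / 10) + 20 * insights.months_with_income))

insights = CashFlowInsights(2500.0, 1280.5, 2)
print(insights)             # client A feeds these fields into their own model
print(cash_score(insights)) # client B uses the off-the-shelf score directly
```

The design choice is simply to expose both layers, so clients with data science teams take the raw insights and everyone else takes the score.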

 

 

William Watson
Data Engineering Lead • MayStreet

 

When did you first realize that your data may have some untapped value?

Our new vision for MayStreet was formed during a typical company meeting where the topic of discussion was our method of data delivery to customers via binary packet capture files. This vehicle, plus a bit of our proprietary software, enables users to convert these binary files into a human-readable format. 

It is standard operating procedure in our industry for every company to build its own data processing pipelines, but we knew there had to be a better way. We wanted to streamline the process so that our clients did not have to create their own pipelines and manually select which subset of our data to store. Likewise, we wanted them to be able to take full advantage of our data and processing expertise.

We quickly realized MayStreet needed to move from being a software and files company to an analytics company. After some discussion, the question was asked, “What if we use the same software we are giving to customers to generate our normalized format into a data lake?”

A new product was born, enabling users to query our massive datasets directly and get answers to questions without having to mess with files.
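In spirit, that shift replaces file downloads and custom parsers with a query against the normalized lake. The table layout, paths and DuckDB query below are assumptions for illustration, not MayStreet's actual schema or query layer.

```python
import duckdb

con = duckdb.connect()
# Aggregate directly over normalized Parquet partitions instead of parsing capture files.
top_symbols = con.execute("""
    SELECT symbol,
           count(*)   AS messages,
           max(price) AS high,
           min(price) AS low
    FROM read_parquet('/data/normalized/2021-10-01/*.parquet')
    GROUP BY symbol
    ORDER BY messages DESC
    LIMIT 10
""").fetchdf()
print(top_symbols)
```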

 

How did you bring this product to life? How did you collaborate with other teams to do it?

To bring our new product to life, we first had a lot of questions to answer. What type of database would we want? Would we choose to stay in files behind the scenes? If we stayed in files, would we want an open-source or closed-file format? How would we generate the data for our data store each night reliably and efficiently? 

After making some key initial decisions, our data and C++ teams collaborated to get efficient programs and pipelines in place for the data processing. As we progressed, we iterated with product and sales to vet the data and system performance. Then, when we were close to achieving an architecture we could envision in production, our SRE and DevOps teams stepped in. Together, they worked on automating deployments, monitoring and security by leveraging several tools, including Airflow, Parquet, custom C++ toolkits, AWS and a specialized SQL query layer.
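Watson names Airflow among those tools. A minimal sketch of what a nightly normalize-validate-publish job could look like with Airflow's TaskFlow API follows; the task names, schedule and paths are hypothetical rather than MayStreet's real DAG.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="0 2 * * *",              # nightly, after the day's captures land (Airflow 2.4+)
     start_date=datetime(2021, 10, 1),
     catchup=False)
def nightly_normalization():

    @task
    def normalize(ds=None):
        # Run the (hypothetical) normalizer over the day's capture files.
        print(f"normalizing captures for {ds}")
        return f"s3://example-bucket/normalized/{ds}/"

    @task
    def validate(path: str) -> str:
        # Placeholder checks: row counts, schema, venue coverage.
        print(f"validating {path}")
        return path

    @task
    def publish(path: str):
        # Placeholder: register the new partition with the SQL query layer.
        print(f"publishing {path}")

    publish(validate(normalize()))

nightly_normalization()
```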

Gathering the input of our many teams and stakeholders during the development process resulted in a robust product that addressed the needs of our users.


 

What’s the biggest technical challenge you faced along the way? How did you overcome it?

Our biggest technical challenge proved to be data size. MayStreet captures billions of rows per day across hundreds of different data venues. In order to provide a normalized data product with perfect data delivered on time each day, we had to account for the sizable number of data jobs that would have to run on an ongoing and nightly basis. Every microsecond of processing would be multiplied, equating to a huge amount of time spent on normalization. 

We spent hours building performance-debugging tools and a test harness, which enabled us to run every message type through the normalization process and gather metrics. Additionally, we made build-by-build comparisons and were able to see whether a seemingly innocuous change had a negative performance impact.
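A stripped-down version of that idea — time every message type, save the metrics, flag regressions against the previous build — might look like the following. The message types and the Python stub are stand-ins; the real harness wraps a C++ normalizer.

```python
import json
import time

def normalize(message_type: str, payload: bytes) -> dict:
    # Stand-in for the real normalizer under test.
    return {"type": message_type, "size": len(payload)}

def benchmark(message_types, iterations=10_000):
    metrics = {}
    for mtype in message_types:
        start = time.perf_counter()
        for _ in range(iterations):
            normalize(mtype, b"\x00" * 64)
        metrics[mtype] = (time.perf_counter() - start) / iterations * 1e6  # usec per message
    return metrics

current = benchmark(["trade", "quote", "order_add", "order_cancel"])

# Compare against the previous build's saved metrics and flag regressions.
try:
    with open("baseline.json") as f:
        baseline = json.load(f)
    for mtype, usec in current.items():
        if mtype in baseline and usec > baseline[mtype] * 1.10:   # >10% slower
            print(f"regression: {mtype} {usec:.2f}us vs {baseline[mtype]:.2f}us")
except FileNotFoundError:
    pass

with open("baseline.json", "w") as f:
    json.dump(current, f)
```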

This proved to be not just a technical challenge but a financial one. Every megabyte of output data size matters, as it translates to real dollars spent on storage when factored across our system. By using efficient columnar compression and deploying only the feeds we need, just in time, we save money and keep the product profitable from day one.
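The storage side of that trade-off can be illustrated by writing the same synthetic table with different Parquet codecs and comparing file sizes; the data and numbers are invented and say nothing about MayStreet's actual feeds or savings.

```python
import os
import pyarrow as pa
import pyarrow.parquet as pq

# Synthetic, repetitive market-data-like columns (columnar formats compress these well).
table = pa.table({
    "symbol": ["ABC", "XYZ"] * 100_000,
    "price":  [100.25, 100.26] * 100_000,
    "qty":    list(range(200_000)),
})

for codec in ["NONE", "SNAPPY", "ZSTD"]:
    path = f"/tmp/sample_{codec.lower()}.parquet"
    pq.write_table(table, path, compression=codec)
    print(codec, os.path.getsize(path) // 1024, "KiB")
```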

 

 

Responses have been edited for length and clarity. Images via listed companies and Shutterstock.
