Building AI-First Products: A Conversation with Hilary Mason

BBG Ventures
May 7, 2024


At BBG Ventures, we love coffee & conversation, and we love a beta. A few weeks ago, we test-drove our first AI coffee & conversation with some of our oldest NYC tech ecosystem friends, Matt Hartman & Becca Harris Lewy at Factorial. With the ever-present question of “should every start-up be an AI start-up?” lurking in the background, we wanted to explore what AI-first product design really looks like in practice. Who better to kickstart the conversation than ML and data OG Hilary Mason from Hidden Door?

At Hidden Door, Hilary is developing a gameplay platform that uses generative AI and LLMs to allow users to play in the world of their favorite book, movie, or TV show. She also founded Fast Forward Labs, an applied machine learning and AI research and product company, which she sold to Cloudera in 2017. Previously, she was the Chief Scientist at Bitly and Data Scientist in Residence at Accel. Hilary began her career in academic machine learning, where she realized that her inclination to build things people could use made her better suited to the world of startups.

Here are some of the highlights from our conversation with Hilary, which covered everything from AI-first product design and why “the AI” is kind of BS, to the uses of chat as an interface (and what’s next), to how to think about building an AI team:

You were at the forefront of big data and emerging technologies. What did being Chief Scientist at Bitly entail? It was a job title I’d never heard of before. I got to invent the future of possibility for the business through the data we were collecting. That data was the “exhaust” of people around the world communicating with each other on very rapidly scaling social media [Twitter]. It was fascinating at a human level, because it offered a lens into where people’s attention was going globally. A data set like that had never been accessible before, and it was really interesting at a technical level, because we were trying to build systems to infer things at the scale of potentially billions of data points a day. Thousands of requests per second is a non-trivial problem.

What is the process of building a machine learning product like? Building an AI product is a different intellectual process from classic software engineering.

One of the big mental shifts in working in ML versus being a classic software engineer is that as a SWE, you know where you are: you know from the beginning that the thing you’re doing is possible, and more or less how you’re going to accomplish it. There may be pieces of your system you have to play around with to see what works best, but the process is fairly well understood. All of our typical software engineering processes are built around that form of development; they are designed around assumptions that are largely settled before a project begins.

Now, with ML and AI, you come into the world of trying to build something probabilistic. The dirty secret of building ML products is that you start out with one problem you think you’re going to solve; you try to solve it and realize it’s not going to work, not with the data you have or the tools you have. So you have to change your problem to something a little more achievable but still useful. Then you try again, and by the time you’ve got something that actually works, you have typically gone through this loop three, four, or five times. And then you have to build a product interface around it, and design a business model that makes sense with all of that.

So while some of the tool sets are the same as for traditional SWE, the intellectual process is very different. It’s not an easy shift, either. For a software engineer starting to work in a machine learning environment, I think it takes a year to adapt.

What should founders think about in hiring when building AI-first? It’s a mindset shift for everyone working on the problem, not just engineers. It requires the builders to take on the burden of standardizing definitions, so that the same question asked in different ways gets a consistent answer.

How do you think about building AI-first products? Is “the AI” all-powerful? When we talk about “the AI”, it’s funny: sure, it can have a meaning, but it’s not really “a thing”. I like to think about building in this context as building models from data, within a hierarchy of data science versus machine learning versus AI. Machine learning is essentially building models that can make predictions about things that resemble the data the model was trained on. But with generative AI systems, like LLMs or diffusion models, you’re essentially able to create artifacts directly from unstructured data or input.
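
To make that hierarchy concrete, here is a minimal sketch (our illustration, not Hilary’s code) of the machine-learning layer she describes: a model fit to labeled examples that can only make meaningful predictions about inputs resembling its training data. The toy features and labels are invented for the example.

```python
# Illustrative only: a classic ML model fit to labeled examples.
# Its predictions are only meaningful for inputs that resemble the
# data it was trained on.
from sklearn.linear_model import LogisticRegression

# Toy training data: [hour_of_day, link_length] -> clicked (1) or not (0)
X_train = [[9, 20], [10, 25], [22, 80], [23, 90]]
y_train = [1, 1, 0, 0]

model = LogisticRegression().fit(X_train, y_train)

# An input resembling the training data gets a sensible prediction;
# anything far outside that distribution is a guess dressed up as one.
print(model.predict([[11, 30]]))
```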

What is important about this moment, and the way people talk about AI as some powerful, intelligent thing of its own, is that it is not, in fact, all-powerful or intelligent. We take a whole bunch of language and compress the representation of the relationships between things in that language; then we use that compression to generate, in a very crude and uncontrollable way, language that “looks like” the language that went into it in the first place. What are the side effects of this? First, by design, it is mediocre. That is, its output is an averaging of what was underneath it. And I mean way underneath it, like all the shit of the internet, including the not-good stuff. Second, in that compression, by design, you’re magnifying the biases that are in that underlying data.
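
As a toy picture of that compress-then-regenerate loop (ours, and drastically simplified relative to an LLM), a bigram model counts word transitions in a corpus and samples text that statistically “looks like” it. Because it can only re-emit weighted averages of what went in, whatever dominates the corpus, including its biases, dominates the output.

```python
import random
from collections import defaultdict

# Toy "language model": compress a corpus into next-word counts,
# then generate text that statistically resembles the input.
corpus = "the cat sat on the mat the dog sat on the rug".split()

counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    word, out = start, [start]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        # Sample proportionally to training frequency: the model can only
        # average and re-emit what went in, magnifying whatever dominates.
        words, weights = zip(*followers.items())
        word = random.choices(words, weights=weights)[0]
        out.append(word)
    return " ".join(out)

print(generate("the"))  # e.g. "the dog sat on the mat the cat sat"
```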

That is the nuance of it: we can acknowledge that it can be a tremendously useful tool, but that implies a bunch of things you have to do to use it well (i.e., you can’t just plug into the OpenAI chat API and expect everything to work without errors). Those errors have the potential to be fairly catastrophic. Consider a chat interface for how to use a medication, which could be wrong, or people depending on chat for news, which could be fake.

Great point… we do generally interact with AI products through chat today. What is your perspective on chat’s utility, and on a company that uses it well? Chat is a horrible interface for everything except chat! Because natural language queries are implicit, about 90% of what comes out of a chat is not useful. When building a chat product, it is important to be very thoughtful about how someone will actually interact with it. A well-built chat requires the product to take on the burden of standardizing questions and responses.
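
One sketch of taking on that burden, using only the Python standard library (the canonical questions and similarity cutoff are our invented examples): map free-form phrasings onto a small set of canonical intents the product actually knows how to answer, so the same question asked in different ways resolves to the same answer.

```python
import difflib

# Hypothetical canonical questions this product knows how to answer.
CANONICAL = {
    "how do i reset my password": "password_reset",
    "what does this plan cost": "pricing",
    "how do i cancel my account": "cancellation",
}

def standardize(user_question: str) -> str | None:
    cleaned = user_question.lower().strip(" ?!.")
    match = difflib.get_close_matches(cleaned, list(CANONICAL), n=1, cutoff=0.6)
    # Different phrasings of the same question resolve to one canonical
    # intent, so the answer stays consistent; everything else is out of scope.
    return CANONICAL[match[0]] if match else None

print(standardize("How do I reset my password?"))  # -> "password_reset"
print(standardize("Tell me a joke"))               # -> None
```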

A company that does chat really well is Duolingo, because their chat is an unobtrusive sidecar to something else. They have deliberately designed 90- to 120-second chat interactions that users can have in the flow of their language practice. Duolingo chat has one specific goal, and I think they’ve done a beautiful job of building very short, deliberate interactions.

What are new interfaces we haven’t even begun to think about? What’s your wackiest idea? This is a tough one… when people come to any product, they arrive with a set of expectations. They leverage visual metaphors to quickly understand what they’re looking at and form a mental model of what to do, so they don’t have to learn from scratch.

We’re building a game at Hidden Door right now. The team working on it comes from wildly different backgrounds which means that we have a wide variety of perspectives on default metaphors. In the kind of game like Hidden Door you normally have a player (the dungeon master) who has put a ton of prep to plan the adventure and create characters– we won’t have that because our system does it. However, the system doesn’t know what mood you’re in, so we chose cards as the interface, where the card carries certain expectations and as you lay down a card, the story starts to write itself and changes with the laying down of each new card. This could take us in the direction of romance or horror, and they are all very visceral. It could have been text instead (i.e. write a prompt to start your story), but we tried versions of that and it didn’t go well. What we have found is that if you give people a text box, they get writer’s block because it feels like homework. If you want to use text, you have to give really smart defaults that show you the range of potential options.

I love having bad ideas, and exploring the creative spaces they can open up. One of my wacky ideas involves an AI version of me and a 3D print of my head… I envision a mini 3D head with a pico projector inside. If we worked closely together, I would ship you my head, and we could talk. This would remedy the downsides of remote work and bring my teams closer together. I have many more.

What strategies can be employed to ensure transparency and user confidence in AI & ML products? There are always paths to manage transparency and user confidence on the product side, and as the builders of these products, we should never condescend to our users. We should always assume that they’re smart and using the product for a reason. With that in mind, I think the appropriate level of transparency depends on the way the information is being presented and used.

One major pitfall of LLMs is that they will use hallucinated language with great confidence, and most people do not have defenses against that yet. In product design, it is important to highlight your error bars and share your certainty. You can also share whether the information has been verified, or even some of the data that went into the prediction. On the back end, if you have an answer with a low trust score, you can choose not to present the response, or to present something that says “here’s another question we can answer that you may be interested in”.
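
A minimal sketch of that back-end fallback (the threshold, wording, and `trust_score` are all our stand-ins for whatever verification signal a real system produces): surface certainty when an answer clears the bar, and redirect rather than present a low-trust response.

```python
TRUST_THRESHOLD = 0.7  # assumed cutoff; a real product would tune this

def present(answer: str, trust_score: float, alternative: str) -> str:
    if trust_score >= TRUST_THRESHOLD:
        # Surface the certainty alongside the answer instead of hiding it.
        return f"{answer} (confidence: {trust_score:.0%})"
    # Low trust: withhold the answer and offer a question we *can* answer.
    return ("I'm not confident enough to answer that. "
            f"Here's a related question I can answer: {alternative}")

print(present("Take one tablet every 8 hours.", 0.35,
              "What dosing instructions appear on this medication's label?"))
```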

What is the underpinning of your work in ML/AI? What’s really important? Throughout all my experience, I have always focused on doing good data science: building ethics first, and building systems that are aware of the impact they’re having (read more here). (BBGV: Words to live by!)

If you’re a female or diverse founder building an AI first product (we’re especially interested in the future of work, health, climate, consumer & education) we would love to have you join us at our next Coffee & Conversations. You can always pitch us, too.

About BBGV

BBG Ventures is a seed and pre-seed venture fund leading investments in female & diverse founders who are uniquely qualified to build for our polycultural future; it was one of the first funds to put a stake in the ground around this thesis in 2014. These founders are bringing new thinking to sectors where change is overdue, such as healthcare, the future of work, fintech, climate, and consumer, solving problems for millions of Americans via B2B and B2C business models. BBG Ventures has invested in over 100 female-led companies across three funds. 100% of Fund III companies have a female founder, and 81% have a founder of color.
