Saturday, July 12, 2025

Raiza Martin on Building AI Applications for Audio – O’Reilly



Generative AI in the Real World

Generative AI in the Real World: Raiza Martin on Building AI Applications for Audio


Audio is being added to AI everywhere: both in multimodal models that can understand and generate audio and in applications that use audio for input. Now that we can work with spoken language, what does that mean for the applications that we can develop? How do we think about audio interfaces: how will people use them, and what will they want to do? Raiza Martin, who worked on Google’s groundbreaking NotebookLM, joins Ben Lorica to discuss how she thinks about audio and what you can build with it.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O’Reilly learning platform.

Timestamps

0:00: Introduction to Raiza Martin, who cofounded Huxe and formerly led Google’s NotebookLM team. What made you think this was the time to trade the comforts of big tech for a garage startup?

1:01: It was a personal decision for all of us. It was a pleasure to take NotebookLM from an idea to something that resonated so widely. We realized that AI was really blowing up. We didn’t know what it would be like at a startup, but we wanted to try. Seven months down the road, we’re having a great time.

1:54: For the 1% who aren’t familiar with NotebookLM, give a short description.

2:06: It’s basically contextualized intelligence, where you give NotebookLM the sources you care about and NotebookLM stays grounded to those sources. One of our most common use cases was that students would create notebooks and upload their class materials, and it became an expert that you could talk with.

2:43: Here’s a use case for homeowners: put all your user manuals in there.

3:14: We have had a lot of people tell us that they use NotebookLM for Airbnbs. They put all the manuals and instructions in there, and users can talk to it.

3:41: Why do people need a personal daily podcast?

3:57: There are a lot of different ways that I think about building new products. On one hand, there are acute pain points. But Huxe comes from a different angle: What if we could try to build very delightful things? The inputs are a little different. We tried to imagine what the average person’s daily life is like. You wake up, you check your phone, you travel to work; we thought about opportunities to make something more delightful. I think a lot about TikTok. When do I use it? When I’m standing in line. We landed on transit time or commute time. We wanted to do something novel and interesting with that space in time.
So one of the first things was creating really personalized audio content. That was the provocation: What do people want to listen to? Even in this short time, we’ve learned a lot about the amount of opportunity.

6:04: Huxe is mobile first, audio first, right? Why audio?

6:45: Coming from our learnings from NotebookLM, you learn fundamentally different things when you change the modality of something. When I go on walks with ChatGPT, I just talk about my day. I noticed that was a very different interaction from when I type things out to ChatGPT. The flip side is less about interaction and more about consumption. Something about the audio format made the types of sources different as well. The sources we uploaded to NotebookLM were different as a result of wanting audio output. By focusing on audio, I think we’ll learn different use cases than the chat use cases. Voice is still largely untapped.

8:24: Even in text, people started exploring other form factors: long articles, bullet points. What kinds of things are available for voice?

8:49: I think of two formats: one passive and one interactive. With passive formats, there are a lot of different things you can create for the user. The things you end up playing with are (1) what is the content about and (2) how flexible is the content? Is it short, long, malleable to user feedback? With interactive content, maybe I’m listening to audio, but I want to interact with it. Maybe I want to join in. Maybe I want my friends to join in. Both of those contexts are new. I think this is what’s going to emerge in the next few years.
I think we’ll learn that the types of things we’ll use audio for are fundamentally different from the things we use chat for.

10:19: What are some of the key lessons to avoid from smart speakers?

10:25: I’ve owned so many of them. And I love them. My primary use for smart speakers is still a timer. It’s expensive and doesn’t live up to the promise. I just don’t think the technology was ready for what people really wanted to do. It’s hard to think about how that could have worked without AI. Second, one of the most difficult things about audio is that there is no UI. A smart speaker is a physical device. There’s nothing that tells you what to do. So the learning curve is steep. So now you have a user who doesn’t know what they can use the thing for.

12:20: Now it can do so much more. Even without a UI, the user can just try things. But there’s a risk in that it still requires input from the user. How do we think about a system that is so supportive that you don’t have to come up with ways to make it work? That’s the challenge from the smart speaker era.

12:56: It’s interesting that you point out the UI. With a chatbot you have to type something. With a smart speaker, people started getting creeped out by surveillance. So, will Huxe surveil me?

13:18: I think there’s something simple about it, which is the wake word. Because smart speakers are triggered by wake words, they’re always on. If the user says something, it’s probably picking it up, and it’s probably logged somewhere. With Huxe, we want to be really careful about where we believe consumer readiness is. You want to push a little bit but not too far. If you push too far, people get creeped out.

14:32: For Huxe, you have to turn it on to use it.
It’s clunky in some ways, but we can push on that boundary and see if we can push for something that’s more ambiently on. We’re starting to see the emergence of more tools that are always on. There are tools like Granola and Cluely: They’re always on, looking at your screen, transcribing your audio. I’m curious: are we ready for technology like that? In real life, you can probably get the most utility from something that’s always on. But whether consumers are ready is still TBD.

15:25: So you’re ingesting calendars, email, and other things from the users. What about privacy? What are the steps you’ve taken?

15:48: We’re very privacy focused. I think that comes from building NotebookLM. We wanted to make sure we were very respectful of user data. We didn’t train on any user data; user data stayed private. We’re taking the same approach with Huxe. We use the data you share with Huxe to improve your personal experience. There’s something interesting in creating personal recommendation models that don’t go beyond your usage of the app. It’s a little harder for us to build something good, but it respects privacy, and that’s what it takes to get people to trust.

17:08: Huxe may notice that I have a flight tomorrow and tell me that the flight is delayed. To do so, it has had to contact an external service, which now knows about my flight.

17:26: That’s a good point. I think about building Huxe like this: If I were in your pocket, what would I do? If I saw a calendar that said “Ben has a flight,” I can check that flight without leaking your personal information. I can just look up the flight number. There are a lot of ways you can do something that provides utility but doesn’t leak data to another service.
We’re trying to understand things that are much more action oriented. We try to tell you about weather, about traffic; these are things we can do without stepping on user privacy.

18:38: The way you described the system, there’s no social component. But you end up learning things about me. So there is the potential for building a more sophisticated filter bubble. How do you make sure that I’m ingesting things beyond my filter bubble?

19:08: It comes down to what I believe a person should or shouldn’t be consuming. That’s always tricky. We’ve seen what these feeds can do to us. I don’t know the correct formula yet. There’s something interesting about “How do I get enough user input so I can give them a better experience?” There’s signal there. I try to think about a user’s feed from the perspective of relevance and less from an editorial perspective. I think the relevance of information is probably enough. We’ll probably test this once we start surfacing more personalized information.

20:42: The other thing that’s really important is surfacing the correct controls: I like this; here’s why. I don’t like this; why not? Where you inject tension in the system, where you think the system should push back, that takes a little time to figure out how to do right.

21:01: What about the boundary between giving me content and providing companionship?

21:09: How do we know the difference between an assistant and a companion? Fundamentally the capabilities are the same. I don’t know if the question matters. The user will use it how the user intends to use it. That question matters most in the packaging and the marketing. I talk to people who talk about ChatGPT as their best friend. I talk to others who talk about it as an employee. On a capabilities level, they’re probably the same thing.
On a marketing level, they’re different.

22:22: For Huxe, the way I think about this is which set of use cases you prioritize. Beyond a simple conversation, the capabilities will probably start diverging.

22:47: You’re now part of a very small startup. I assume you’re not building your own models; you’re using external models. Walk us through privacy, given that you’re using external models. As that model learns more about me, how much does that model retain over time? To be a really good companion, you can’t be clearing that cache every time I log out.

23:21: That question pertains to where we store data and how it’s handed off. We opt for models that don’t train on the data we send them. The next layer is how we think about continuity. People expect ChatGPT to have knowledge of all the conversations you have.

24:03: To support that you have to build a very robust context layer. But you don’t have to imagine that all of that gets passed to the model. A lot of technical limitations prevent you from doing that anyway. That context is stored at the application layer. We store it, and we try to figure out the right things to pass to the model, passing as little as possible.

25:17: You’re from Google. I know that you measure, measure, measure. What are some of the signals you measure?

25:40: I think about metrics a little differently in the early stages. Metrics in the beginning are nonobvious. You’ll get a lot of trial behavior in the beginning. It’s a little harder to understand the initial user experience from the raw metrics. There are some basic metrics that I care about: the rate at which people are able to onboard.
But as far as crossing the chasm (I think of product building as a series of chasms that never end), you look for people who really love it, who rave about it; you have to listen to them. And then the people who used the product and hated it. When you listen to them, you discover that they expected it to do something and it didn’t. It let them down. You have to listen to these two groups, and then you can triangulate what the product looks like to the outside world. The thing I’m trying to figure out is less “Is it a success?” but “Is the market ready for it? Is the market ready for something this weird?” In the AI world, the reality is that you’re testing consumer readiness and need, and how they’re evolving together. We did this with NotebookLM. When we showed it to students, there was zero time between when they saw it and when they understood it. That’s the first chasm. Can you find people who understand what they think it is and feel strongly about it?

28:45: Now that you’re outside of Google, what would you want the foundation model builders to focus on? What aspects of these models would you like to see improved?

29:20: We share so much feedback with the model providers. I can provide feedback to all the labs, not just Google, and that’s been fun. The universe of things right now is pretty well-known. We haven’t touched the space where we’re pushing for new things yet. We always try to drive down latency. It’s a conversation: you can interrupt. There’s some basic behavior there that the models can get better at. Things like tool-calling, making it better and parallelizing it with voice model synthesis. Even just the variety of voices, languages, and accents; that sounds basic, but it’s actually pretty hard.
Those top three things are pretty well-known, but they will take us through the rest of the year.

30:48: And narrowing the gap between the cloud model and the on-device model.

30:52: That’s interesting too. Today we’re making a lot of progress on the smaller on-device models, but when you think of supporting an LLM and a voice model on top of it, it actually gets a little bit hairy, where most people would just go back to commercial models.

31:26: What’s one prediction in the consumer AI space that you would make that most people would find surprising?

31:37: A lot of people use AI for companionship, and not in the ways that we imagine. Almost everyone I talk to, the utility is very personal. There are a lot of work use cases. But the growing side of AI is personal. There’s a lot more area for discovery. For example, I use ChatGPT as my running coach. It ingests all of my running data and creates running plans for me. Where would I slot that? It’s not productivity, but it’s not my best friend; it’s just my running coach. More and more people are doing these complicated personal things that are closer to companionship than enterprise use cases.

33:02: You were supposed to say Gemini!

33:04: I love all of the models. I have a use case for all of them. But we all use all the models. I don’t know anyone who only uses one.

33:22: What you’re saying about the nonwork use cases is so true. I come across so many people who treat chatbots as their friends.

33:36: I do it all the time now. Once you start doing it, it’s a lot stickier than the work use cases. I took my dog to get groomed, and they wanted me to upload his rabies vaccine. So I started thinking about how well it’s protected.
I opened up ChatGPT, and spent eight minutes talking about rabies. People are becoming more curious, and now there’s an immediate outlet for that curiosity. It’s so much fun. There’s so much opportunity for us to continue to explore that.

34:48: Doesn’t this indicate that these models will get sticky over time? If I talk to Gemini a lot, why would I switch to ChatGPT?

35:04: I agree. We see that now. I like Claude. I like Gemini. But I really like the ChatGPT app. Because the app is a good experience, there’s no reason for me to switch. I’ve talked to ChatGPT so much that there’s no way for me to port my data. There’s data lock-in.
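
The context layer described at 24:03 (context stored at the application layer, with as little as possible passed to the model) can be illustrated with a small sketch. This is not Huxe’s implementation; the episode gives no technical details. The `ContextStore` class, the `build_prompt` helper, and the word-overlap scoring are all hypothetical, chosen only to show the idea of selecting a few relevant snippets instead of sending everything.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical application-layer store for user context snippets."""

    snippets: list[str] = field(default_factory=list)

    def add(self, snippet: str) -> None:
        self.snippets.append(snippet)

    def select(self, query: str, max_items: int = 2) -> list[str]:
        """Return at most max_items snippets that share words with the query."""
        query_words = set(query.lower().split())
        scored = []
        for snippet in self.snippets:
            overlap = len(query_words & set(snippet.lower().split()))
            if overlap > 0:
                scored.append((overlap, snippet))
        # Highest overlap first; list.sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [snippet for _, snippet in scored[:max_items]]


def build_prompt(store: ContextStore, query: str) -> str:
    """Assemble the only text that would leave the application layer."""
    context = store.select(query)
    lines = ["Context:"] + [f"- {c}" for c in context] + ["Question: " + query]
    return "\n".join(lines)


# Assumed example data, invented for illustration.
store = ContextStore()
store.add("user runs three times a week")
store.add("user has a flight on friday")
store.add("user prefers morning briefings")
prompt = build_prompt(store, "plan my runs for this week")
print(prompt)
```

In this sketch the snippet about the flight never reaches the prompt, because it shares no words with the question. A real system would likely use a stronger relevance measure than word overlap, but the design choice is the same: the full history stays in the application, and the model sees only what the current request needs.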


