Kris Gutta – Twilio Media Streams [SIGNAL TV Interviews]

– This is the audio stream
that’s coming through- – Oh my God, do you see this scroll? This looks like the Matrix (laughs) – Hi YouTube, the following interview
took place August 6, 2019 at Twilio SIGNAL
Conference in San Francisco at Moscone West. It previously aired on
our SIGNAL TV broadcast over on In this interview, Kris Gutta from Twilio sits down with Corey Weathers to talk about Twilio’s new
product: Twilio Media Streams. I hope you enjoy this interview
and live coding session. – What’s up, friends! Welcome back to SIGNAL TV. Super glad to have you. It’s your boy, Corey “Hurricane” Weathers. I am here, featuring one of our, we kept this like a secret,
we didn’t tell people what was gonna happen because we wanted the keynote to happen. Shout-out to all of our friends who tuned in to the keynote earlier today. And so, we’ve got our first
Twilio product session and this one is all about media streams. And so I’m super excited to have Kris Gutta with us here right now. What’s up, Kris? – How you doing, Corey? – I’m hanging in there,
how ’bout yourself? – Great. – We’ve been talking a lot about Twilio, we’ve been excited about the keynote, we’re really excited to
see some things happening here on the stream and so, before we jump into that
stuff, I have to ask: how’s your SIGNAL been? – It’s been fantastic. I mean, I just love being at SIGNAL. I get to see customers that I interact with day-to-day in person. – Yeah. – Like I get to see my colleagues that I talk to frequently in person. So it’s just incredible,
I just love SIGNAL just because of how we can
interact with everyone. It’s amazing. – Well, this is our first time
kinda doing this experience where we’re taking pieces of SIGNAL, and taking it home to our
friends from around the world and we’ve got folks joining in from the UK and other parts of Europe. We’ve got folks joining in from Florida and other parts of the United States. So it’s super good to have
all of these folks here which is exciting. – Yeah, I gotta say
when you first mentioned that we were doing this, I thought it was an incredible idea- – Thank you. – To be able to stream content from Twilio at the SIGNAL to the world. – Yeah – And I was first to sign up because I thought it was exciting. – That’s great! And so we’re excited to have you. It’s now time for us to
talk about one of the things that got announced earlier
today during the keynote: Media Streams.
– Yup – Can you describe what
it is for the person who has no context, missed the keynote. – Absolutely. So first, I’m a Product
Manager from Voice Teams, so I work with a lot
of our voice customers and, as you can imagine, if
you’re familiar with Twilio – Oh we’ve got friends joining. – One of those customers. – Hi. – Hey Jonathan, how are you doing? – What’s up, Jonathan? – Oh you know. – Friends this is Jonathan De Jong he’s VP of Engineering over at GLOBO. Kris was just talking about
how we legitimately have, we got to get him in the shot here. – Let’s get in there. – We were just talking
about how he uses SIGNAL to connect up with customers. – Oh yeah. – Who he doesn’t always get to see, and so it’s super good to have you popping into the booth here. – Yeah, it’s great to see you guys. I mean we’re really excited
for this new product. It’s going to help us a
lot with our call center and yeah, I couldn’t be happier to be here talking with Kris. We just finished talking,
actually, at the session. But being part of this has
been a great experience. – We love how excited you are about this. We really can’t wait to
get you on the stream. So, next time we got to make
sure we schedule your own time. – Yeah, of course. (laughing) – Thanks for popping in. – [Jonathan] Thanks guys! – Thanks for stopping by. – [Jonathan] Yeah, good to see you guys. (laughing) – I love how excited he is. We didn’t even really get to
jump into the product yet. This is going to be amazing. You all should have the same excitement. – Jonathan, he’s amazing. He’s a great customer, as you can imagine. This is the kind of relationship
that we like to have with our customers, where
they feel like they can come in and talk with us anytime
and have that interaction. – Absolutely. – That’s why SIGNAL’s great. – That’s what SIGNAL’s for. – It really builds these
types of relationships with the customers and
different developers around the world. Where they can feel like
they can contact us anytime. – No I absolutely love it. I love that he felt so
comfortable to just pop in here. I love that we got him
a little bit on the mic, even though we got lapel mics. This is great.
– I know, yeah. (laughing) – Anyway let’s get back into it. What is media stream? Let’s talk about it. – Sure, so in simple terms, imagine that me and Corey
are talking to each other, and that’s a call that’s
going through Twilio platform let’s say through, connected to Corey. With the media streams,
what you can do is, as we’re talking to each other, you can actually fork the media, the actual RTP of the phone call. – Whoa. – And receive that over web sockets. – Really? – Right, so you can imagine, you can do a lot of different things. – Yes. – Like you can transcribe that call. – Right. – Imagine if the same
conversation that’s taking place between an Uber driver and a passenger or a contact center agent and a customer, you can transcribe these conversations – In real time?
– In real time. – Oh my God! – Right, so once you
have the transcription of what’s actually happening in real time, you can do a lot of different things. – Yeah. – One, you can analyze the
content of the transcription to analyze sentiment. Is there a negative sentiment
in the conversation. – That’s right.
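The idea of scanning call transcripts for negative sentiment can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not what was shown in the interview: the word lists, the function names, and the threshold are all hypothetical, and a real system would likely use a trained model or a cloud language service instead of a word count.

```python
import re
from typing import Dict, List

# Hypothetical word lists for illustration only; a production system would
# call a trained sentiment model or an NLP service instead.
NEGATIVE = {"angry", "disappointing", "terrible", "frustrated", "cancel", "worst"}
POSITIVE = {"great", "thanks", "excited", "love", "perfect", "happy"}


def sentiment_score(transcript: str) -> int:
    """Count positive words minus negative words in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def calls_needing_help(transcripts: Dict[str, str], threshold: int = -1) -> List[str]:
    """Return the call identifiers whose score is at or below the threshold."""
    return [
        call_id
        for call_id, text in transcripts.items()
        if sentiment_score(text) <= threshold
    ]
```

Because the scoring is automatic, the same function could be run over every active call, and only the flagged identifiers would need a human to look at them.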
– Right? And so, you can actually
then, you can do this across thousands of calls – Yeah.
– Because it’s all automated. – Yep. – And so you surface
these conversations where if there’s a stressed
customer or there’s an agent that needs help, pop that, you know pop that up to your supervisors – Yeah. – so they can jump in and help. – Cause today, you do that very manually. – Yeah. – And another thing is
you can extract the intent of what the customer is saying. Push information to the agent. Here’s a classic example,
imagine that you’re in an Uber or a Lyft or
driving down the street and you saw a rent sign
or a For Sale sign, you call the number and say, “Hey, I was driving down Bush
Street at Bush and Powell “and I saw there’s a house for sale, “I’d like to know more about it.” – Oh my God!
– Right? So, as you’re speaking, imagine the agent that is answering the
phone gets the results of all the open houses on Bush Street – On Bush Street
– and Powell – That is amazing!
– So, it basically reduces the
amount of time it takes to help customers, and it just
creates a better experience. – Oh my goodness, this is amazing. – It is, I’m happy to tell
you, it’s been exciting just working with this product
from the very beginning and launching this product. – Yeah. – Just because everything
we’ve done so far has been APIs that gives you functionality. – That’s right. – But the media streams we
are giving you a raw audio. – Absolutely. – So you can just build whatever
the use case that may be. You can build voice authentication, so you can actually authenticate over the sound of your voice
– That’s right. – ends up capturing critical information
– That’s exactly right. – There’s a lot more possibilities. For example, Jonathan
is building something to analyze background noise. – Wow! – So, that he can basically
detect if there’s an agent that has the background,
a very noisy environment. – Wow! – He can remove that
automatically from routing so that agent doesn’t get calls – Wow! – and that situation is addressed, which basically streamlines
your customer experience. – That is amazing! Oh my God, can we see a demo? – Yes! – Okay, so we’re going to
bring up Kris’ computer I want to give a quick
shout out to Layla Codes It who came and it was the hand of God who fixed our panel here. You all were calling it
out, thank you Chat Room. Also, really quickly
before we hop to the demo, I want to say a quick thank
you to a number of followers who just follow the channel,
we’ve got a Straw-hat Raider, we got An-bruta, we’ve got Lord of 96 as well as Money Montano. Thank you so much for
hitting the follow button. If you like what you’re seeing, if you like what you’re hearing, if you want to follow the
SIGNAL conference experience for the next day and a half,
hit that follow button. You’ll know when we go live, you’ll see our schedule of content and all the amazing live
demos and interviews to come. Okay, let’s get back to it, – All right, – So, here’s Kris’s computer. – Yeah, so, I have a couple of demos, that, as Corey was talking, I was like, “Oh what demos should I show?” – Yeah. – So, I’m going to show two demos. One, I’m going to just dive right into something I’ve already built, so it just saves us a little bit of time, so I can do a second demo. What I have here is a simple Studio Flow. So, if you’re not familiar
with the Twilio Studio, it’s our visual builder
that our customers can use to drag and drop and build
your voice application. In case you’re wondering,
I can write code, but I decided to use Studio today just because it’s simple
and easy to visualize. – Well it’s super simple
– I was going to – to follow this.
– Right. Exactly, so here I have a simple Flow. It says when you call, say please hold and what I’m doing is, I’m
basically taking the phone call and forking it three times. – Whoa. – So, three independent forks, right? – Really? – Yes, so one fork is going to go over ngrok, our favorite tool. – Okay, we love ngrok. – And come to my laptop,
so you can actually see the stream going through. Another one is going to go
directly to Google Cloud, so from Twilio it’s
going to go to Google Cloud, it’s going to transcribe.
– That’s right. – It’s going to show the transcription. – That’s exactly right. – And the third one is going to AWS where I’m basically
putting the stream into SQS so that later on I can create multiple consumers to analyze audio in real time – [Corey] This is amazing.
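The three-way fork Kris describes is built in Studio, but the same idea can be sketched as TwiML generated from Python. This is a sketch under assumptions, not the flow from the demo: it uses the TwiML `<Start>` and `<Stream>` verbs, and the stream names and `wss://` URLs below are placeholders that do not come from the interview.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_fork_twiml(stream_urls: Dict[str, str]) -> str:
    """Build TwiML that says "Please hold" and starts one stream per URL."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Please hold"
    for name, url in stream_urls.items():
        # One <Start><Stream/></Start> pair per independent fork.
        start = ET.SubElement(response, "Start")
        ET.SubElement(start, "Stream", {"name": name, "url": url})
    return ET.tostring(response, encoding="unicode")


# Placeholder endpoints: a local tunnel, a transcription service, a queue writer.
twiml = build_fork_twiml({
    "laptop": "wss://example.ngrok.io/media",
    "transcribe": "wss://transcribe.example.com/media",
    "queue": "wss://queue.example.com/media",
})
print(twiml)
```

Each WebSocket endpoint receives its own copy of the audio, so the three consumers can fail or be redeployed independently.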
– [Kris] Yeah. – [Corey] And we’re doing
all of this in Studio? – [Kris] We’re doing all this in Studio, and you can see there’s a state that just protects your speech. Here’s stream one if you can see, I’m zooming in a little bit here. – [Corey] Well zoom just a little bit. – [Kris] Yeah, there you go. And that’s basically
sending it to Google. This one’s going to ngrok and this one’s going to AWS, and I’m parking the call in a queue because if the call ends, the stream disconnects. – Oh, cool. – So for now, I’m parking
the call in a queue. – Got it. – And I’m just going to publish this – And we’re publishing this. What happens when we publish this flow? – So, what happens now
is the, so far everything I’ve been doing is a draft,
so when I publish it, whatever I’ve been
working on becomes live. So that when I make a phone call – Got it.
– Then the new Flow takes effect. – Got it.
– It just allows you to make changes and
save it and publish it later – Right. – Without having to
break your application. – That’s cool.
– That’s really the key here and I already
had wired a phone number here so, I’m going to quickly
make a phone call. – We’re going to call that number? – We’re going to call that number. – Okay. – I’m going to, would you like to call? – Oh, I would love to call. – Do it. So I have ngrok running
here and I have a simple, let’s see where I am right now, okay. – I love that we’re in the terminal. Do you mind bumping up
the zoom a little bit? – Not at all, so by the way, the code I’m showing you
right here is something that’s available for you all to use. We have a public GitHub repo with code samples. – Nice. – So to get started with media streams, once you have the repo, it
takes you like 10 minutes. – Oh my goodness! – Or less to start transcribing your call. – Ten minutes or less,
now you heard it here. Listen, if it’s not 10 or less,
don’t blame me, blame Kris. – That’s right.
(laughing) That’s right. So, I have a simple FH
here, no bells and whistles and sorry, let me just point
you to the number right here. – Yeah, so we’ll call
this number, 505-539-2147. Now we’ve just broadcast
this to the internet. – I know (chuckles)
– Let’s see how this goes. – I’m a product manager, not an engineer, so I don’t write scalable applications. (laughing)
Thankfully, we have engineers for that but as you can
see this is an audio stream that’s coming through
– Oh my God, do you see this scroll?
– my laptop. – This looks like the Matrix. (laughing) – Go ahead and say something Corey, well there you go, your transcription – Oh, look at that, it’s
trying to figure out what I’m saying and it’s doing it live and I am not even doing anything except calling into the number. – [Kris] That’s right. – [Corey] This is crazy. Oh my God, I’m shocked,
we’ve just seen the Matrix. We’re now doing a live call. Let me hang up before we continue. – And here’s the SQS, like I said. You have also the messages
that are actually in SQS. So you can process them at a later time with multiple consumers. – Oh my goodness. – I’ve got to say, I get excited every time
I do a demo of this API because I am super excited. My weekends are now actually consumed on what can I do, what
kind of demos I can build- – Yes, with the media streams. – Yeah, actually, I
should’ve brought it today. I built a Raspberry Pi car. – Wait, wait. – It’s called Pi Car.
– You built a Raspberry Pi car? – Yeah, so we can actually
stream audio to it and then have it move forward,
(laughing) backward, and all that stuff.
(laughing) I just really, I didn’t bring it today, but I just couldn’t. So I have to do something. – Okay, well, I got to ask two questions. Before I do, I gotta say,
you said you got excited, the chat room said,
“Corey just got shook.” I sure did because I saw the Matrix, and I never thought I would
see the Matrix in real life in like a way that actually worked. But I gotta ask the question
because folks are asking, what is it that we see
here in the console? Is it the same thing we see in SQS? – Yes, yes, absolutely right, so what I’m showing you here is, just to give you an example, that when we fork the media and stream, we’re kind of sending
it to you in a developer-friendly fashion, so it’s a
JSON payload that comes to you every 20 milliseconds
with the basic data, so it has enough for you
to know what the call is. There’s a unique SID as
well as the audio packet. By the way- – Everything has its own unique SID. – Yes, the stream
– Oh, the stream has its own unique SID. – And you have a chunk
number that you can use to put the messages
in a specific order. – For a specific chunk, oh my goodness.
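A consumer of these messages might parse each JSON payload and use the chunk number to restore order. The sketch below is an assumption-laden illustration: the field names (`event`, `media`, `chunk`, `payload`) and the base64 encoding of the audio follow the publicly documented message layout as best understood here, not anything shown on screen, and the helper names are hypothetical.

```python
import base64
import json
from typing import Iterable, Optional, Tuple


def parse_media(raw: str) -> Optional[Tuple[int, bytes]]:
    """Return (chunk number, audio bytes) for a media message, else None."""
    message = json.loads(raw)
    if message.get("event") != "media":
        return None  # other events carry no audio
    media = message["media"]
    # Assumed layout: chunk is a numeric string, payload is base64 audio.
    return int(media["chunk"]), base64.b64decode(media["payload"])


def reassemble(raw_messages: Iterable[str]) -> bytes:
    """Sort media messages by chunk number and join their audio bytes."""
    chunks = [c for c in map(parse_media, raw_messages) if c is not None]
    chunks.sort(key=lambda c: c[0])
    return b"".join(audio for _, audio in chunks)
```

Sorting on the chunk number means the messages can pass through a transport that does not preserve order and still be stitched back into continuous audio.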
a distributor system, you can put it in. For example, SQS don’t always go in order so you can always use
that order for items. One thing really cool
is when you are silent you just basically see
something like this, so you can actually see analysis of this. So this is when you’re speaking, right? So, this is actually audio
data when you’re speaking – And then there goes silence. – And then silence, and
this is all silence. – Oh this is amazing. – Right, so you can do signal analysis, you can do lots of different things. We have customers who have said – Yeah. – they’re looking for background analysis. So, one thing I was going
to show was this morning I actually disabled a sentiment
analysis aspect of this. – Okay. – I don’t know if I get to show you that, but what I was going to show you was – I love that we’re doing it live friends. This is how we do the things,
the most dangerous demos, we do them live. – So I just basically disabled it, I was going to show for example,
if I find the right thing. Yeah, it’s right here, I basically said (8-bit music) Do that and then I save and
then all I have to do is… – Now we’re deploying this to G-Cloud. – Yeah. – I should’ve done this
at the very beginning ’cause it does take a little
while to spin up the new stuff but essentially you get the idea. So, now the whole thing is
running, that is auto-scalable and this is a page that
basically comes up from that. So maybe you know for fun, if
you have a few minutes here, I can just show you on my laptop, Corey. – We sure do. Do you want me to call back in? – In a second, actually. – Well, while you’re doing that, there was a good question
that came into the chat room, and I thought it’s a good
question for us to answer, which is, so we did this
off of a phone call, will this also work through like audio on say a WhatsApp call? – So, it will work on any call
that is active within Twilio, – Oh!
– So if you’re familiar with Twilio, as long
as you have a call SID, and the call is active,
you can fork the media. – Okay. – So, that includes WebRTC,
SIP calls, PSTN calls. – That is amazing! – And very soon you’re
going to also be able to do that with video calls. So, you can actually fork
the video stream as well. – That is amazing, thank you
for that question C Sharp Fritz and welcome to the stream,
super glad to see you here. I love that we’ve made this
accessible, easy, kind of to consume and use. – Absolutely, so let me see if this works. We’re doing it live so… – Do it live! – Let’s see, all right, so
go ahead and call back in and perhaps Corey, you
can show some emotions about how you’re feeling about this, and so we can actually, let’s make sure that actually is working. – Absolutely, absolutely,
I am so excited to be here on the screen with Kris. I’m really excited. I’m hoping that we get a happy face. We haven’t gotten a happy face yet. – You’ve got to pause for a second here. – But we’ve paused for a second, – There you go. – And there goes our happy face, now Ellie Face has asked us to get mad. So, I’m going to say that
this is disappointing. (laughing) This is hurting my feelings and now we’ve done this in real time. Oh my God!
– Are you also very angry? – I am very angry! Oh, look at that face! Look at the shock here. So when Corey tries to feign anger, this is what this looks like. (laughing) Thank you, Kris, I love how this goes. This is amazing. – Yeah, so what’s happening is, I was only doing sentiment
analysis on a completed stream, so, if you don’t pause then it’s going to continue to transcribe – As one stream. – As one stream, and
then once they’re done, we send it over, but I think you get the idea of it. The key here is that
you can take the text, and do a lot of different things with it. – That’s exactly right. – As you can see there’s
a profanity filter on. – That’s good, yes, so
it catches the things. – (laughs) Absolutely. – Okay, so I gotta ask, folks are asking, you know as we start to think about how to the next steps here, folks are asking, how can they get their
hands on this today? – Yeah, it’s easy, so two things. One is, we have API
docs publicly available and second, our wonderful DevAngels team has put together a quick tutorial. – Shout out to the
DevAngelists on the team. – That’s right, Craig Dennis,
big shout out, he’s amazing. He helped me put together a tutorial. So, once you start, from the beginning to the end of the tutorial,
five minutes in, you’re out. You’re off and running. – Wait we have another guest. – Oh no, Ricky. How are you doing? Hey!
– Oh my goodness. – Hey!
– It is Ricky Robinett, friends, now if you caught
the keynote earlier, Ricky was our fun friend who
decided to kick off his shoes and throw down his luggage
– I loved that! At the end of the keynote,
it was a ton of fun. Ricky, we’re glad to
have you on SIGNAL TV. – [Ricky] Yeah, I don’t want
to interrupt, but I did. (laughing) Are y’all talking media streams? – We were. – [Ricky] I mean – We were talking media streams. – [Ricky] How amazing are media streams? – We just had a ton of fun
with sentiment analysis on media streams. Talk about the amazing things that you can do
– That’s how Corey looks when he’s angry. – [Ricky] Yeah, yeah, I’ve seen that. (laughing) The Hurricane gets going and – The Hurricane has feigned anger and there, that’s what you get. (laughing) – I was also showing
how we can easily deploy into Google Cloud with one step, but it’s taking a while because it needs to spin up an instance, but I think, yeah. I’m excited, as you can
imagine, with media streams. It’s an amazing product. – Yes, I am excited too. I’m super glad that you
had the chance to stop by. I’ve got to ask Ricky, since he’s here, Ricky, the chat room wants to know, did you get your shoes back? (laughing) – [Ricky] I did, I did,
I was really nervous, but thank you, thank you for caring. (laughing) – They were concerned, I
see the question there. Revertibles, Ricky did get his shoes back, I just wanted to make sure we all knew. – [Ricky] Yes, thank y’all. – Hey, thanks for stopping by. – Thanks for stopping by Ricky.
– Thanks for coming, Ricky. – [Ricky] Congrats on the launch. – Thank you. – And we’re actually about to
move over to our next guest but before we do that, we’re going to say a
big thank you to Kris. – Thank you Corey. It’s amazing.
– Hey listen, you are super popular. Ricky stopped by, Jonathan stopped by. I wish our next guest would be so popular, let’s see friends. – (laughs) Thanks a lot,
thanks for everyone joining. – Listen, we’ve got some
more content coming up. We have, we called a hidden session, our next guest is going
to be another phenomenal Twilio employee, a person
by the name of Ashley Roach. – Oh, yeah.
– Who’s going to spend some time going
over one of the products that we saw earlier today, we
all got really excited about, that is the Twilio CLI. So join us back here in about 12 minutes, we’re going to get started,
2:30 pm, Pacific Time. And until then, it’s your
boy, Hurricane Weathers. I’ll see you soon. – Thanks everyone. – [Host] If you like what
you saw in this interview, please consider heading
over to and giving us a follow there. We’ll have more content
like this there very soon.
