
Google Cloud Platform (GCP) Essentials (Google I/O’19)


ALEXIS MOUSSINE-POUCHKINE: Hello, and good afternoon, Google I/O. Thank you for being here. Thank you for everybody on
the live stream, as well. The cloud is what powers
an endless number of web and mobile apps, and
that makes them magical. The easy access to potentially
huge amounts of compute, storage, and to
machine learning APIs is what makes the
cloud so special. And something that
you probably should try to leverage in your
applications whenever possible. My name is Alexis, and
I’m a developer advocate with Google Cloud. Now, a few questions for you. How many Android developers
do we have in the room? How many web developers? Anybody here has
used Firebase before? A good number, that’s great. Now, to be fair, server side– or otherwise,
back-end development– can look tedious to some folks,
or just not your cup of tea. And that’s fine. Luckily, Google has
this rich, open, and developer-friendly cloud,
Google Cloud Platform, or GCP for short. Let’s take a look and
start with Firebase. You might know Firebase for
all the amazing features it offers to enhance
your client code– authentication, crash
reporting, analytics, A/B testing, and the fairly
recently announced ML Kit. I’d like here to
cover everything that is not a client-side feature. And that’s because Firebase
is a wonderful way for Android and web developers to
leverage Google Cloud Services without exposing them
to all the knobs of GCP. So first, there’s Cloud Storage. This will let you manipulate
user-generated files and application files,
such as pictures, videos. Think of it as a
way to save files to the cloud for later retrieval
from any authorized device or service. Next, Cloud Firestore
is the JSON database for all your application data
with real time notifications. This means that if a specific
value changes in Firestore, all your connected
users can be notified. This is truly amazing. Cloud Functions is the easy
way to add server-side logic without having to manage
any server or any cluster. Simply provide a piece of code and specify the event that triggers
the execution of that code. And that could be, for example,
an image uploaded to a bucket, or some data changing
in Firestore. Finally, ML Kit for
Firebase offers the ability to add machine learning
features to your apps using APIs that run either in
the cloud or on device. So this includes text
recognition, face detection, barcode scanning,
image labeling. And as announced at the
technical keynote yesterday, we now have also something
called AutoML Vision Edge, as well as on device translation. So why am I talking
about Firebase, here? Well, it’s simple. These features that
I’ve just described are all based on Google
Cloud Platform, or GCP. And if you only remember one
thing from this presentation, it should be that you
should start with Firebase– if you haven’t done so already– and then grow to use GCP. When you create a
Firebase project, it is, in fact, a GCP
project in every aspect. Resource grouping, identity
management, and billing. This gives you a great
way to easily use some cloud services– before you graduate
into using GCP– when you need to. So let’s talk about
Google Cloud Platform and give you a sense
of how it can extend what we’ve just briefly seen. So Google Cloud
Platform, GCP, is big. I mean, really big. We recently had a large
event in San Francisco called Cloud Next with tens of
thousands of participants, with some sessions just focusing
on one specific feature. So I can’t possibly be
covering everything here. Instead, I’ll try
to cover what I believe might be useful
to you and give you a few tips along the way. Maybe you already have some
sort of back end software, and you’re just
looking for a place to host it, potentially,
in the cloud. So that will help you spend
less time on infrastructure management, get potentially
better security, and essentially spend more
time writing your app. If you are familiar
with virtual machines, Compute Engine is a great
environment for you. Compute Engine, or GCE, offers
virtual machines, disks, and networking. As you can see, we can go
from very small to very large amounts of CPU and memory. And that can be
done using sliders, as shown here, to get the
perfect configuration. You can also customize
your virtual machines by adding GPUs and Cloud
TPUs, tensor processing units. There are many delightful
features about GCE, but one of my favorites
is the little time that it takes to
provision and boot a VM. But it’s probably best if
I show you this in action. So with that, let’s switch
to the demo machine, and let’s create a VM. Let’s take most of the defaults. I will call this instance
one, just to make it more fun. Create this in Europe. And you can have
different presets. You can see the sliders
that I just talked about. You can change the image,
the OS that you’re using. There are many to choose from. You can set identity and API
access settings and set up a firewall to
allow or disallow– which is the default– any HTTP or HTTPS traffic. So let’s actually go ahead
and create that instance. So we’re actually provisioning
here the instance. We’re creating this, in
this case, with a Debian image. With all the defaults, this
one has one CPU and, I think, four gigabytes of memory. And it is being created,
as I said, in Europe. And that’s it. We have the VM that’s
available, that has an external IP address. And we can, at this point,
SSH into the machine. And notice, I didn’t have
to install anything here. I’m just using a web browser,
clicking through a button, and here I am, actually
connected to that machine that didn’t exist a few seconds ago. So I can look at it, and I
can do things such as apt-get update. And I have a fully
working machine, which I can resize,
which I can delete, which I can recreate
very easily. So, with that, let’s
go back to slides. This is just a few seconds
to get a new machine running in a Google data center. You can also do this with
the command line, which means you can easily automate
the process of spinning off a machine, or multiple machines,
when you need them, and also shut them down when you
no longer need them. There are many features
with Compute Engine, some that I would like to point
out are live migration, which is really a unique feature
to migrate running applications from one machine to another. Preemptible VMs offers
you short-lived, low cost virtual machines, something
that I would recommend that you use if your jobs,
your running processes, can be interrupted. We also have some
automation features, such as instance groups, to
administer virtual machines in batches, to achieve
some level of scalability, and even to do auto repair
on production instances. Now, let’s switch gears
a little bit and talk about storing data, and
databases in particular. Storing data is likely to be
vital for your application. You could, of course, spin
up a VM– as we just did– install your favorite
database there. But here, we’ll go through some
cloud, or GCP native, storage and database options,
just a few to give you a sense of what’s available. Chances are you have files
that you would like to store. And Cloud Storage, in this
case, is a pretty obvious choice here. You can put all of your files
into one or multiple buckets, and prices for storing these
vary depending on what we call the storage class. To give you a sense
of the cost, you can host all of the
internet archive– and that’s about 10 petabytes– for less than $100 per month,
all with no disk to manage. No capacity issue
to worry about. So Cloud Storage offers
different classes, as I said, of storage. These classes range
from having data as close as possible
to the end user– this is multi-regional–
to archival needs. The regional option
here is good for when data needs to be close to the
machine that processes it. The long-term storage,
Nearline and Coldline, are technologies that
come with retrieval cost but are ideal for archival data. But really, best of all here
is that there’s a single API to manage the lifecycle
of your objects, regardless of their
storage class. And you can move objects
from one class to the other, so data that’s no
longer used can be pushed back to
something like, let’s say, the Coldline storage. Now, for user data, such as
their profile information, the transactions associated
with those users, you may want to use a
relational database. And rather than
setting up, managing, and securing your
own installation, you can use Cloud SQL instead. This will free you up
from monitoring uptime of that database,
from managing backups, and from applying
security patches. Cloud SQL also
offers the ability to define replicas for a
highly available setup. The currently supported
databases are MySQL and Postgres, with SQL Server coming soon. Now, NoSQL databases
are known to scale, regardless of the amount of
data that you throw at them. And Cloud Firestore
is no exception. In fact, this database offers a
JSON structure with no schema. It is strongly consistent. It is indexable. And it is serverless, meaning
that there is no infrastructure to size, provision, or manage. And it comes with great SDKs
for mobile and web development. If you’ve ever used the real
time database for Firebase, it’s all the good
things you know with added strong consistency,
better querying capabilities, multiple data centers
around the world, and still the great notification
and offline capabilities that you know. So which one do you choose? Well, for files such as
images, PDF documents, it’s pretty easy. Cloud Storage is most
likely what you need. For user data,
transactional information, good old relational databases
offered in a Cloud SQL package are probably a good choice. If you need horizontal
scalability, a schema-less database, change
notifications, mobile offline support, you’re probably looking
at Cloud Firestore, which you can use, by
the way, regardless of any other Firebase features. Now, let’s go beyond VMs and
into Cloud native solutions. Now, don’t get me wrong. Virtual machines are great. But management remains your
responsibility– provisioning, patching the OS, updating
it, and all the security is left for you to do, which
means time not spent developing your app. And also, the scalability
is pretty much vertical. You can make the VM bigger,
but you can’t really play on the horizontal axis
with clustering technologies. And availability is
also something that is hard to add after the fact. So here, we’ll talk about
cloud native approaches to running your code– Kubernetes, Cloud Functions, App
Engine, and the newly released Cloud Run product. So maybe by a show of
hands here, how many people know and use Docker containers? Great. So containers are solving the
“works on my machine” problem by packaging an app with
all of its dependencies– including its runtime– into a container. But if you use
containers, you also probably know that that
does not solve all problems. You still need to schedule
and scale those containers. You need to manage their
health, monitor them, and more. This is where schedulers,
such as Kubernetes, come into the picture. As the inventor of
Kubernetes, Google offers Google Kubernetes
Engine, or GKE, a fully managed Kubernetes service. If you were starting
from virtual machines, or from your own servers,
setting up Kubernetes clusters would mean that you have to
deal with, well, actually creating the virtual machines,
attaching some storage, installing the actual
Kubernetes software, setting up some
networking and security. And that’s a lot of work. Instead, with Google
Kubernetes Engine, GKE, it takes one command line
and just a few minutes. Once created, the
cluster is ready to host your containerized applications. Kubernetes version upgrades,
auto repair of failing nodes, and other features are provided
out of the box with GKE. Now, while GKE offers a
wonderful, portable platform, it comes with a requirement
to first containerize your application. It also still requires
creating and managing a cluster with worker nodes. So this isn’t really
serverless, which we define as something that
has no server management, which is fully secure by default– this is not your problem– and that has to pay per
use through auto scaling, including scale to zero. This all sounds nice. So let’s take a quick
look at some GCP products that actually qualify
for this definition. So an obvious and popular
example of serverless is Functions as a
Service, or FaaS. And Cloud Functions is
a great implementation of that paradigm. Simply upload some
code written in Python, Go, Node, or even Java,
along with its dependencies, and define the
event that triggers the execution of that code. The events can be
anything from an HTTP request coming in to a file
being uploaded to a bucket. Data changing in Firestore. Messages being posted
to a Pub/Sub topic. Cloud Functions has
been used by customers to implement everything
from what we call glue code to fully fledged microservices-based
applications. But enough talking. Let’s see Cloud
Functions in action. If we could go back
to the demo machine– This is a function
written in Node that will be triggered
when a file– a picture, in this case– is
uploaded to a specific bucket. So we have an event
for a given bucket, for a given file
that has a name. So the first thing we do here is
we download that file locally, and then we use a library
called ImageMagick to actually do the resize of that
picture, and to resize it to a width of 256 pixels
while preserving the ratio. We write those to a local file. And if everything goes
well, we upload the result with a prefix called
resize to that same bucket. So this is the code. There’s some actual metadata
in terms of dependencies, where we declare dependencies on Cloud
Storage, which we listen to, and the ImageMagick
version that we use. We also define what triggers
the execution of this. And in this case, it’s
an upload to this bucket, which I can click on. This bucket is empty, and
I suggest that we actually upload a picture there. So as we do this, we can look at
the picture that was uploaded. And we can go back here. And hopefully– as I
refresh the bucket– we have a second
picture that’s there. And that has been the resized
version of the initial one. So here you go. These are Cloud
Functions in action. And maybe we can move
now back to slides so I can tell you that
these are available in multiple languages, as I said. This was Node, but you can
use Python, Go, and Java. And there are many events that
can trigger them, not just file uploads. I think Cloud Functions
is the easiest way to access one of the
many powerful GCP services, from machine learning APIs, to
other storage and processing solutions. Now, as a developer,
you may want to have yet even more
freedom in the languages and the frameworks that you use. And most importantly,
you may want to hand over a
carefully crafted Docker image instead of source code. Cloud Run was
announced last month at the Cloud Next conference. And it is here to offer you
a truly serverless experience for your stateless
HTTP container images. So the events are
HTTP, and there needs to be no state preserved
by the container for this to work. But if that is something
that works for you, well, simply build your image,
upload it to a registry, and create a Cloud Run
service using that container. At that point, your
app is now deployed and running in the Cloud. And you can forget about the
provisioning and managing of servers. Cloud Run does that for you. It will automatically, and
quickly, scale up and down based on the incoming traffic. It will even scale to zero,
meaning, no traffic, no cost. But even more, you can
use the same container on your own GKE cluster. If you really want to understand
and master the underlying infrastructure, you can
do so since this is all actually written on top of
an open source technology called Knative, which
provides an abstraction layer on top of Kubernetes to
provide a serverless environment. So choice is good, right? Cloud Functions. Cloud Run. Well, guess what? There’s even more choice. App Engine is the mother
of serverless at Google. And it offers the ability
to build and host entire web applications with multiple
services while still retaining the source deployment approach,
all of that, obviously, with serverless benefits. It comes with
versioning built in. It provides out-of-the-box
traffic splitting to implement things such as canary
deployments or A/B testing, all at the click of a button. It also supports a
long list of languages, including recent versions
of Python, Java, Go, PHP. And we recently
announced Ruby, as well. So at this point,
you might be confused about which one to use. Let me suggest that you
think of it this way. Which artifact would
you like to deploy? Would you like to
give me a function? Would you like to give me an
app that has multiple services? Or would you like to
give me a container? All of these are serverless. All of these will scale to zero. And we will manage the entire
infrastructure for you. OK, so intelligence. Now, I’m not trying to imply
here that your apps are not built by smart
developers, but instead, that there are some
amazingly low hanging fruit to make those
applications even smarter. I’m talking here mostly
about machine learning, and specifically about
easy-to-use APIs, which any developer can call,
regardless of their ML skill set. We call these AI
building blocks. And they can be grouped into
the following categories– the Vision API and the
Video Intelligence API, this is the sight category. We have the language category,
with natural language and translation APIs. And we have the
conversation category, with speech-to-text,
text-to-speech, and Dialogflow APIs. I mentioned ML Kit
for Firebase earlier. What you have here is the
server-side machine learning APIs that ML Kit actually uses. So these APIs are available
via RESTful endpoints, making them easy to be
called from any part of your application
or your architecture. Certainly, if you’re building
mobile Android or iOS apps, ML Kit for Firebase is a
great way to use these. So let’s look at the Vision
API, which is one of those APIs. And if we switch back
to the demo machine, I can actually test this
API right from the browser. I can upload a
picture, same picture. Did I click on it? All right, let me refresh this. And I am not a robot, I hope. So I’m sending this
to the Vision API, asking it to return
and tell me everything it finds about that
picture, from entities to landmarks, to text, to
web properties and entities. So it has detected that
this is indeed a landmark. This is Notre Dame, in Paris. There are a bunch of
labels that it found. This is machine
learning working for you with a pre-trained model. There are web entities
with all the things that it finds on that picture. It even finds text. If you were to zoom in, you
could see that the barge here is called Nouvelle Seine. There’s some image properties,
such as dominant colors. And last but not least, there’s
what we call Safe Search. Is this picture safe from an
adult, spoof, medical, violence, or racy point of view? If you have a website that
has user generated content, you probably owe it
to yourself to use something, such
as the Vision API, to make sure everything
that’s uploaded is actually something you
can then show to other users. All of this is actually
the result of a request, asking for landmark
detection, face detection. We don’t have any in this one. Object localization,
we don’t have any. But image properties,
crop hints, web detection. And the result is a JSON
document, which is actually parsed and presented
here, in this UI, but which you typically would
be using in your application to enhance your application. So if we move back to
slides, AI building blocks, in the form of API calls,
can be extremely powerful. And they’re really
easy to set up. Again, they’re really
just an API call. So those APIs, such as
the Vision API, are great. And we call these
pre-trained models, meaning that Google did the
heavy lifting of training a model, leaving you with
the easy prediction part. Send an image. Get a result back. But what if you wanted
to build your own model from your own data to better
fit your business needs? This is where Cloud
AutoML comes in. This is another part of
our AI building blocks. AutoML lets you create
your own custom machine learning models with an
easy-to-use graphical interface. These models can be specific
to your business needs and trained with your own
data with minimal effort and little to no coding. Now, if you’re an ML
developer, a data scientist, you may want to have complete
control over the training and prediction phases. But you probably do not want
to have too much infrastructure overhead. And this is where the
Cloud AI Platform comes in. This is Google’s data science
development environment. We offer AI Platform Notebooks. These are managed
JupyterLab notebooks integrated with all the big
data products you find in GCP. Cloud TPUs, and the newly
announced Cloud TPU Pods, are hardware
accelerators designed to speed up machine learning
workloads for training, and prediction, and inference
programmed with TensorFlow. Deep learning VM images are
pre-configured GCE virtual machines for deep
learning applications that use TensorFlow,
PyTorch, scikit-learn. And it’s trivial to
add Cloud TPUs or GPUs to these virtual machines. So Cloud AI Platform
offers tools and products for probably the entire
lifecycle of machine learning development if you’d
like to control everything. Now, switching gears
a little bit here. We’ve talked about
storing the data. Let’s talk now about
processing your data at scale. Chances are your apps, web apps,
mobile apps, back end apps, generate some valuable data. And you would like to
turn this into insights. This takes potentially
massive amounts of compute power,
data processing resources in general. And the good news is that GCP is a
really great and amazing place to do just that. BigQuery is an amazing product
to make sense of your data. Simply send your data– as much as you like– along with an SQL query. The data will be processed
in just a few seconds, thanks to BigQuery’s
unique and really massive back end architecture. Cloud Dataflow is Google’s
implementation of the Apache Beam programming model. And it offers to
process and transform massive amounts of data both in
batch and in streaming modes. Cloud Dataproc is a hosted
Apache Hadoop and Spark version that will spin up a
fully managed cluster in less than 90 seconds. It will resize it dynamically
and offer, overall, great cost performance for
any Hadoop or Spark job that you have. Let’s take a look at
a quick BigQuery demo and move back to the
demo machine, please. So 400,000 GitHub
repositories, one billion files, and one question. Spaces or tabs? Well, we might have
the answer today. We look at all of those files. We just ignore those that
are less than 10 lines long. And for every file, we
give a plus one if it’s tabs or a plus one if it’s spaces. And if it’s a mix, we just
decide to vote only one for whichever comes more
often, spaces or tabs. This is the GitHub,
the real data. And what we have in this query
is the ability to run a query against that table
and to actually– for every single line
in every single file– run a regular expression,
counting the number of tabs and the number of spaces,
and summing up all of this. So what I suggest is
we actually run this. This will process 133 gigabytes. It shouldn’t take more than,
say, 10, 12, 13 seconds, maybe, and hopefully
give you an answer. Down to 16 seconds. 18. That’s not actually bad. And so for every language,
every popular language– and we base the query on the
extension of the files we found in the repos– we calculate a ratio. Does it have more spaces or
more tabs for every language? And you could see that Java
tends to be more about spaces where Go is all
about tabs, clearly. So there you have it. We actually know
the answer to one of the most crucial
and important questions we’ve had in this industry. Spaces. Now, the amazing thing
here about the query is that we’ve analyzed
each file, again, with a regular expression, for 133 gigabytes of
code in 10-ish seconds, in interactive
time, as we call it. There is no need to
go and grab a coffee or come back the next
morning to get the answer. So you can iterate
on your queries. And this is what the
graph looks like. And credits should
all go to Felipe. And you can query this. All the details are there. You can run the query yourself. And you can look at
all the data and see how it evolves through time. But the answer is spaces. So, to recap, we’ve
talked about how Firebase is a great foray
into Google Cloud Platform. We discussed virtual
machines and databases to bring the software stacks
that you love and know to GCP. Next, we talked about
Cloud Native and serverless and how to choose
the right solution. And finally, we covered adding
machine learning and data processing to your
apps and architectures. Before I close, let me
share a few tips and tricks as you get started with GCP. Google Cloud Console is
where you will likely spend a fair amount of
time exploring and using the platform. This is where you
configure billing accounts. You create and you
manage projects. You manage all
your GCP resources, regardless of the
data center location. Every product and every
service has its own section in the console. It has dashboards, detailed
configuration, and settings. There is Cloud Identity
and Access Management. If it’s a team of people working,
or an entire group of people, you can set people up with
the right permissions. And there’s even a mobile app
for monitoring and managing your apps and
resources on the go. So while the console is
super powerful, flexible, you can also do everything
with the gcloud command line. So for every action
in the console, there is a gcloud equivalent. So gcloud is our scriptable
and almighty CLI. Cloud Shell is a shell
environment hosted on GCP for managing your
projects and resources. It’s accessible
from a web browser. And it’s powered by a small
virtual machine with persistent disk space, and
up-to-date software– Git, Docker,
containers– I mean, compilers, all of these things
for all your development needs. And it even comes with
a web code editor. So GCP resources are the
fundamental components that make up Google Cloud Services. Typical examples include Compute
Engine virtual machines, Cloud Pub/Sub topics,
Cloud Storage buckets, Cloud Functions, and so forth. And those resources
can be organized into projects and folders. This means, for instance, that
once you delete a project, all the resources
attached to it can also be deleted, which is a great
way to keep a clean environment and to keep your
costs under control. So what do you need
to get started? Well, the first thing you
need is a Google account. You can create a
new Google account, or you can use an existing one,
such as your Gmail account. I would recommend that you
enable billing for your project and that you sign up
for the $300 free trial to get started at no charge. If you do not sign up
for the free trial, you can still benefit from the
fairly generous, always free tier that GCP offers. So $300 is actually
quite a bit of money, and enough to kick the tires
of GCP in a number of ways. You could have six VMs
running for one year– they’re fairly small,
but that’s six of them– or a four Node
cluster of bigger VMs for a Container Engine, or
GKE cluster, for three months non-stop. Or you could store ten terabytes
in a multi-regional storage bucket, which is the best
performing storage class, for one month. The billing section
of the Cloud console is where you manage
billing accounts. And you link projects to
those billing accounts. And a billing account is
really a payment method, one or more credit cards or
bank account details. You can change a billing
account for a given project at any point of
time, and you could set budget alerts that help
you manage cost, and set up triggered actions for
projects or accounts, as well. You can also generate billing
exports as well as reports to better understand your spend. This is important when you
start using Cloud at scale. So beyond the web console,
the command line tools, GCP also comes with a number
of built-in additional tools. To start with,
every project comes with a private-by-default
Git repo called Cloud Source Repositories, which is free
for up to five users, and 50 gigabytes of storage. So staying with resources that
are private to your projects and teams, GCP also comes
with a container repository to store your container images. Once your container
image is in the repo, that means it’s on
Google’s network, which means, in turn, that the
deployments to GKE and Cloud Run are really fast. And when it comes to building
container images, or any code, for that matter,
there’s Cloud Build, a fully managed CI/CD platform. This includes building–
as the name implies– but also deploying
to VMs, to GKE, and to serverless products. Now, product naming, for us
and for everybody, is hard. But remembering
all of these names can be overwhelming
for you, as well. So here’s a cheat sheet
with concise definitions of all products. GCP in four words or less. And this covers every
product I’ve talked about, but also every one
I didn’t talk about. Now, as I come to the
end of the session, I’d like to do a shameless
plug for a series of videos on the Google Cloud
YouTube channel called– no surprise– GCP Essentials. The goal here is to cover–
in fairly short episodes– something that is
helpful for people who are actually getting started
with Google Cloud Platform. Check out the videos, please. And if you like
them, do subscribe. So that was a lot
of ground covered. I’m leaving you with some links,
including the one for the four words or less. I hope this was time
well spent for you. I hope you will consider
bringing the awesomeness of GCP to your existing
and upcoming apps. Thank you to everybody
on the live stream. And for everybody, feel free
to hit me up on Twitter. Thank you very much. [APPLAUSE] [MUSIC PLAYING]
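
The voting rule described in the tabs-versus-spaces demo (skip files shorter than 10 lines, then give each file one vote for whichever indentation it uses more often) can be sketched locally in Python. This is only an illustrative sketch, not the SQL query that ran in BigQuery during the talk. The names `file_vote` and `tally`, the choice to look only at each line's first character, and the decision that a tie casts no vote are all assumptions made for the example.

```python
import re
from collections import Counter

MIN_LINES = 10  # the talk ignores files shorter than 10 lines

# Assumption: a line counts toward tabs or spaces based on its first character.
LEADING = re.compile(r"^([ \t])")


def file_vote(text):
    """Return 'tabs', 'spaces', or None (no vote) for one file's contents."""
    lines = text.splitlines()
    if len(lines) < MIN_LINES:
        return None
    counts = Counter()
    for line in lines:
        match = LEADING.match(line)
        if match:
            counts["tabs" if match.group(1) == "\t" else "spaces"] += 1
    if counts["tabs"] == counts["spaces"]:
        # Assumption: unindented files and exact ties cast no vote.
        return None
    return "tabs" if counts["tabs"] > counts["spaces"] else "spaces"


def tally(files):
    """Count votes per file extension.

    files: iterable of (extension, text) pairs.
    Returns a dict mapping extension to a Counter of 'tabs'/'spaces' votes.
    """
    results = {}
    for ext, text in files:
        vote = file_vote(text)
        if vote:
            results.setdefault(ext, Counter())[vote] += 1
    return results
```

Calling `tally` on a list of `(extension, text)` pairs gives per-extension vote counts, from which a spaces-to-tabs ratio per language can be computed in the same spirit as the chart shown in the demo.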
