
Apache Kafka Ecosystem “As-a-Service” on GCP with Confluent Cloud (Cloud Next '18)



Hello — it's 1:55. As we all know, this is the time when your circadian rhythm pushes your body into its most alert state, ready to learn about Kafka. So if you're here to learn about Apache Kafka and Confluent Cloud, the managed Kafka service on GCP, you're in the right place. If you're here to learn about stream analytics and the ingestion and delivery of event streams to real-time applications, you're also in the right place. And if you're not here for any of those things, you're still in the right place, because these topics will surely become important for your organization sooner rather than later.

My name is Kir. I'm a product manager in Google Cloud Platform, and I work on stream ingestion and delivery. That means I spend my days working with users of Google Cloud Pub/Sub, our native ingestion and delivery service, and with users of Kafka, the open-source event ingestion and delivery system. Today I'm finally able to extend a proper welcome to Kafka users on GCP, now that we have a managed service to talk about. This has been a long time coming — and thank you for those snickers; now I don't have to ask you to raise your hands if you run Kafka in production or not. For everyone else, no worries, I'll catch you up in a second. Apologies for the bad joke.

Here's how we'll do this. I'll spend a few minutes setting the stage: what we mean by stream ingestion and delivery, what companies do with these systems, common application patterns, and when it makes sense to use Kafka on GCP even though we have a native service. Then I'll do my real job, get out of the way of the engineers, and bring Boerge up on stage. Boerge is a director of engineering at The New York Times who has put a Kafka cluster in the path of every single piece of published content in that large media company; he'll tell us why, what it did for his company, and the cost of running Kafka in that critical role. Once we've gotten through that, Gwen Shapira, senior architect at Confluent and a longtime Kafka committer, will come and share Confluent's vision for what happens when Kafka and its entire ecosystem of tools become easily accessible within an organization — and she'll give you a demo as well.

So let's get into it: streams of events are everywhere. We're getting increasingly good at capturing interactions with devices and digitizing them — that's no secret — and as engineers we're increasingly often challenged with turning these streams of events into useful, interesting, entertaining things for our users. In gaming and media, we're using in-session user activity to adjust gameplay as users play, we're adjusting content based on browsing history, and we're certainly adjusting the advertisements as they show up. In e-commerce — and if you're in e-commerce, you're also in the media business — you're building recommendation systems, and the recommendations you show me on page two had better know what I did on page one fifteen seconds ago. In finance this has of course been a long-standing problem: say the risk group needs to aggregate events that are trades across various departments of a bank, plus all the market activity happening outside.

What's new is the sheer scale. Twenty years ago you could have a very successful, large business with a thousand-person call center generating a hundred events per second; any messaging middleware kept up with that, and things were great.
Zoom to now, and I'm routinely speaking to GCP customers ingesting hundreds of thousands, even millions, of events a second and doing something useful with them in real time. So what we need is not just messaging middleware but a high-throughput ingestion and delivery system for streams of events, to make sense of all of this. And Kafka has, of course, become the de facto standard for this kind of system in the open-source community.

I want to take a couple of minutes to walk you through some of the prototypical patterns I see when I talk to users of Kafka and Cloud Pub/Sub, using the example of a hypothetical e-commerce business that has gotten its hands on a high-throughput ingestion and delivery system for streams of events.

The journey usually starts with collecting user interaction events for analysis. If you're going to create a useful experience for your customers, you need your analysts and data scientists to go to the warehouse and figure out what's actually working. Here's how we build it: I'm shopping for shorts and flip-flops; I'm viewing the shorts page on my mobile device; I tap on the shorts, and that generates a request to a stateless front end. The front end can stay stateless because it can take advantage of a high-throughput event ingestion API to persist the event — it just publishes it straight through, no longer has to worry about writing it to disk or to a database, and returns to the user. On the other side, we use the delivery side of the API to start building processing pipelines. We'll start with a Dataflow pipeline that simply grabs the events, parses them, turns them into structured data, and puts them in the warehouse — BigQuery, of course. Soon enough you realize you don't actually need all of the raw data; you need some aggregates or some filtering, so you just keep adding pipelines as you understand what you need. The beautiful thing is that your front ends are stateless and you can add back ends whenever you wish, because in the middle there is this thing that scales with the throughput and decouples the two sides.

Once you've done this, your data scientists make their first discovery and you're ready to create your first feedback loop. They discover that users tend to buy things they've seen before, so we want to build a simple feature: when I'm browsing for shorts, I want to see a strip of my history — all the shorts I've looked at before. So we build a "recently viewed" service: every time a request comes in for a product page, the front ends reach into the service with a user profile and get back a list of products I've browsed. Very simple. The question is how to populate the user history, and that's simple as well — just another Dataflow pipeline. A browsing event comes in, a page view; you grab the user ID and product ID and write them into Bigtable, and now you have a profile. So this is working, your revenue is going up, you're happy, your boss is happy.
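As a concrete illustration of the publish step in that first pattern — the stateless front end handing each interaction event to the ingestion layer — here is a minimal sketch, not from the talk, using the Cloud Pub/Sub Python client. The project, topic, and field names are made up for illustration.

```python
# Sketch: a stateless front end persisting a page-view event by publishing it
# to Cloud Pub/Sub (hypothetical project/topic; pip install google-cloud-pubsub).
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-ecommerce-project", "page-views")

def handle_page_view(user_id: str, product_id: str) -> None:
    """Called from the request handler; the event is durably stored by Pub/Sub,
    so the front end keeps no state of its own."""
    event = {"event": "page_view", "user_id": user_id, "product_id": product_id}
    future = publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))
    future.result()  # block until the publish is acknowledged

handle_page_view("user-123", "shorts-42")
```

Downstream, a Dataflow pipeline (or any other subscriber) consumes the same events and writes them to BigQuery or Bigtable, which is what lets the front end stay stateless.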
You're ready to take the next step, which illustrates another common pattern: handling events that are not user interactions but changes in a database — change data capture. The goal now is to capture the changes happening in the database of an existing service and materialize them in something else. What we want to build is an experience where, when I'm browsing for shorts, I get a recommendation for flip-flops, because they're relevant to my browsing history. So we're building a "you might like" service that holds a product graph: you give it a product ID, it gives you other product IDs. We don't get this from the customer activity stream, but we might have an inventory management service, or some similar system, that your buyers use to select and onboard products into your e-commerce store. They're writing content there — this item comes in such-and-such color, it's such-and-such brand — and that's starting to look like useful data for building a product graph. Except that the database is provisioned for your buyers, the ten people responsible for what products are sold, not for the million end users you have. To deal with that, we capture a change stream from the product database, turn it into events, and from there we can build any number of materialized views, such as the product graph.

So those are the common patterns. Out of habit I've been showing you the GCP hexagons and Cloud Pub/Sub, my first product at Google, so forgive me — but of course in reality Kafka does all of these things, and increasingly not just ingestion and delivery but also stream processing, with Kafka Streams and KSQL. So the question is: why would you want to run Kafka on GCP at all? It turns out there are three reasons — there are many others, but these three often make it the right choice.

Reason number one is easy migration. You're rarely migrating to the cloud because you're unhappy with your Kafka cluster; you're probably migrating because you want to change your warehouse, or to take advantage of Google Kubernetes Engine to run your application front ends. Kafka will migrate happily and live in the cloud, so there's no reason to change what doesn't need to be changed. Second, if we take a step back: when you start migrating, you're probably building a hybrid environment — maybe as part of a long-term migration strategy, maybe as the end state where you want to land. In that case you want your data exchange and access layer to be something that works both on premises and across public clouds. A native service like Cloud Pub/Sub is not the natural solution there, and you'll see that Kafka, and Confluent Cloud in particular, is very good at this problem. And finally, if you're doing event sourcing, Kafka is absolutely the right starting place. Event sourcing takes advantage of Kafka's particular way of doing event ingestion and delivery: it is a complete log. It lets you store events as they come in, in an append-only log, and then serve them out, in order. When you have a complete log that you can read with high throughput, many times over, you can use it to store the set of changes to a state or a database, and then materialize that database as many times as you want — without putting load on your source database, as you might if you were experimenting with something that needs a lot of different materializations, like an Elasticsearch cluster. So: event sourcing, hybrid systems, and quick migrations — three very good reasons to run Kafka on GCP.
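To make the event-sourcing point concrete: because the log is complete and replayable, you can rebuild a materialized view at any time just by reading the log from the beginning. A minimal sketch, not from the talk, assuming the confluent-kafka Python client and a hypothetical product-updates changelog topic:

```python
# Sketch: replaying an append-only changelog topic into an in-memory
# materialized view (hypothetical topic/broker; pip install confluent-kafka).
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "product-graph-builder",
    "auto.offset.reset": "earliest",   # start from the beginning of the log
    "enable.auto.commit": False,       # a rebuild doesn't need committed offsets
})
consumer.subscribe(["product-updates"])

products = {}  # the materialized view: product_id -> latest attributes
try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # nothing new right now; keep tailing the log
        if msg.error():
            raise RuntimeError(msg.error())
        # Each record carries the latest state of one product, keyed by product ID.
        products[msg.key().decode()] = msg.value().decode()
finally:
    consumer.close()
```

The same topic can feed any number of such views — an Elasticsearch index, a product graph, a cache — without adding load to the source database.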
Of course, that's easier said than done. Now you have a problem: you have to run Kafka well, and running Kafka well turns out to be hard, as with really any system. But I'm a product manager, remember — I don't get to run anything in production, they don't let me. What I do get to do is meet people who do. So I'd like to invite Boerge to tell you about his experience running Kafka at The New York Times.

Thank you. What we use Kafka for at the Times is making published content available to our readers, and this is what that looks like. We have a blog post out that describes how it actually works; I'm not going to go into detail here, except to say that this is an example of event sourcing — specifically, a log-based architecture. The core principle is that we maintain a log of every asset published by The New York Times, from the beginning in 1851 up to the events being published in real time right now. There are two main requirements for running such a log. We need ordered playback, because the publish events we write to the log have causal relationships between them, so the ordering matters. And we need infinite retention — we want this to be available forever. Only Kafka fulfilled both of those requirements.

Because Kafka is such a critical piece of our infrastructure — the news doesn't get out if our Kafka cluster is down, and that's a very big deal at the Times — we put a lot of work and effort into keeping this cluster available and reliable. We have a dedicated team of engineers running it. More concretely: we run it on GCE (we might have chosen GKE if we were starting over now); we have an Ansible script that handles all our Kafka deployments, rolling updates, and so on; and we have set up multi-region replication using MirrorMaker, so everything published to one Kafka cluster becomes available everywhere. Latency is very important to us, since this is news, so we've also implemented our own log offset monitor that lets us know if one of the log consumers falls behind. And we take full advantage of Kafka's ACLs, specifically to limit writes, because anything written to this Kafka cluster goes directly onto the site, so we have to keep very tight control over who gets to write to it. On top of all this we're also very conservative in how we use the cluster: for instance, we don't allow other teams at the Times to use it for their own things, and as a consequence, using things like Kafka Streams, KSQL, and Kafka Connect is a lot harder than we would want it to be.

In spite of all of this, things still sometimes go wrong. Kafka is highly available in the sense that if one broker goes down, things are still fine — but when all the brokers are logging excessively and run out of disk at the same time, that doesn't really help. It also didn't help us when we configured mutual SSL authentication and all communication between our Kafka brokers suddenly stopped, which also brought the entire cluster down. Maybe the most frightening case we've seen was when one of our consumers had its offsets reset and immediately started creating a version of the New York Times home page with old news — which is really not something we want. Luckily we managed to stop it before it got to readers, but we do remember that day.
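The two requirements Boerge called out — ordered playback and infinite retention — map directly onto ordinary Kafka topic configuration. As a rough illustration (not the Times' actual setup), a log like this could be created as a single-partition topic, which gives a total order, with time- and size-based retention disabled:

```python
# Sketch: creating a changelog-style topic with a total order (one partition)
# and retention disabled, using confluent-kafka's AdminClient.
# Topic name, broker address, and replication factor are illustrative only.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

published_assets = NewTopic(
    "published-assets",
    num_partitions=1,        # a single partition preserves global publish order
    replication_factor=3,    # survive individual broker failures
    config={
        "retention.ms": "-1",     # never expire records by age
        "retention.bytes": "-1",  # never expire records by size
    },
)

# create_topics() is asynchronous and returns {topic_name: future}.
for topic, future in admin.create_topics([published_assets]).items():
    future.result()  # raises if creation failed
    print(f"created topic {topic}")
```

Everything else Boerge describes — MirrorMaker replication, ACLs restricting who can write, consumer-lag monitoring — sits on top of a log shaped like this.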
Basically, as a technology strategy, we at the Times want to run managed services whenever we can. We want to be good at news; we don't want to be good at running infrastructure. The reason we run our own Kafka — and put in all that time, resources, and attention — is that when we started this project two years ago, there was no managed Kafka on GCP, as I had long discussions with Kir about at the time.

So, the Times is good at news and not at running Kafka. We at Confluent, on the other hand, are kind of crappy at news — but we are very, very good at running Kafka. Our team has been running Kafka in production since the first lines of Kafka code were ever written at LinkedIn, and about a year and change ago we announced Confluent Cloud as a way to let us manage your Kafka in the public cloud. Now, to really see the value of what we're doing here, I want to take you a few years back, to the days when Kafka was first invented at LinkedIn, about ten years ago. We started seeing the microservices trend, and the more microservices we had, the more databases each microservice had, and the more lines of communication ran back and forth between microservices and databases — our entire infrastructure became very complex and very hard to manage. This talk is not about simplifying the complexity of IT in general, even though that's a good topic of discussion; it's about how Kafka can help here and really simplify your architecture. Once you run Apache Kafka, you have connections between the different applications and Kafka, and everything published to Kafka is available to everything else. That greatly simplifies things: you build a decentralized architecture that lets you deploy new services and give them easy access to all the data your organization holds, at scale, on a very reliable distributed system.

Of course, the messy infrastructure did not get any better once we started doing cloud migrations and hybrid cloud — in fact it got a lot worse. We took the mess we had in one data center, and now we have a mess in five data centers, and in this cloud provider, and in the other cloud provider. And once it's in the public cloud, the mess has real costs and real security risks, so it just got a lot more expensive. So think about the same architecture that simplified microservice deployment on-prem, now applied both to cloud migration and in the cloud. It gives you one channel of communication to secure; each event is replicated once, which drives down costs; and you get an architecture that's much easier to manage and monitor. But that's not all: because all the data is available in Kafka as streams, you can start experimenting with everything the clouds have to offer. You want to try Dataflow, you want to try TensorFlow, you want to try BigQuery — even if you've never done it before and you're totally a MySQL shop, it's very easy to just stream events from Kafka to BigQuery and run experiments. You really have a lot more possibilities out there.
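To give a sense of what "stream events from Kafka to BigQuery and run experiments" can look like with nothing but the client libraries, here is a minimal sketch — not part of the talk or the demo. Broker, topic, and table names are made up, and error handling is kept to a minimum.

```python
# Sketch: tailing a Kafka topic and streaming its events into a BigQuery table
# (hypothetical names; pip install confluent-kafka google-cloud-bigquery).
import json
from confluent_kafka import Consumer
from google.cloud import bigquery

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "bq-experiment",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["pageviews"])

bq = bigquery.Client()
table_id = "my-project.experiments.pageviews"  # dataset and table must already exist

batch = []
while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    batch.append(json.loads(msg.value()))
    if len(batch) >= 500:  # stream rows into BigQuery in small batches
        errors = bq.insert_rows_json(table_id, batch)
        if errors:
            print("insert errors:", errors)
        batch.clear()
```

In the demo later, the same hop is done with a managed BigQuery sink connector instead of hand-written code, which is the more typical path.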
This is really the vision that inspired Apache Kafka: connect everything to everything, from on-prem to the cloud, microservices and databases, and make all the data available to the entire organization — breaking down all those silos. But when you build a platform, it's not just about the platform itself; that's a lesson we all learned from eBay. You build a platform to sell stuff, but the point is the stuff you actually sell. The power of a centralized platform comes from the things that connect to it — the things that bring data in, take data out, and create value for the business. The value is really in the microservices, in the ecosystem.

So Confluent includes a lot of components that make it very easy to create those lively ecosystems that make data available to the organization. One of the coolest is Schema Registry, which lets you change the schema of the events — the data you're transmitting — independently between the producer of the data and the consumer; the producer can make changes in a very safe way, and our journey is all about making change and evolution safe. Another cool tool is KSQL, which Boerge mentioned: it lets you build stream-processing applications using a very familiar SQL-like language. We also have clients in pretty much every language — you're seeing maybe eight here, and there are something like fifty of them. If you like PHP, here you go, a Kafka PHP client. That's the power of open source: once you have an open protocol, everyone can create clients; you're not depending on anyone to do it for you. And we mentioned databases — integrating microservices and databases is really what lets you run your entire organization off one stream of events. Kafka has the Kafka Connect framework, which lets you plug in connectors, and there are a hundred-plus connectors for getting data from different databases into Kafka, from Kafka into other databases, and from one Kafka to another Kafka. If you want data from Vertica to show up in BigQuery — people actually do this — you use the Vertica connector to get the data in one way and the BigQuery connector the other way, and you're good to go. This is incredibly powerful, especially since it requires almost no coding — and we run it for you.
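Going back to Schema Registry for a moment: to show concretely what it buys you, here is a minimal sketch, not from the talk, of producing Avro records whose schema is registered with Schema Registry. It uses the older AvroProducer helper from confluent-kafka-python 1.x (`pip install "confluent-kafka[avro]"`); the broker and registry URLs and the users schema are illustrative only.

```python
# Sketch: producing Avro records; the schema is registered with Schema Registry
# on first use, under the subject "users-value" by default.
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

value_schema = avro.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "userid", "type": "string"},
    {"name": "region", "type": "string"},
    {"name": "gender", "type": "string"}
  ]
}
""")

producer = AvroProducer(
    {
        "bootstrap.servers": "localhost:9092",
        "schema.registry.url": "http://localhost:8081",
    },
    default_value_schema=value_schema,
)

producer.produce(topic="users",
                 value={"userid": "u-1", "region": "us-east", "gender": "FEMALE"})
producer.flush()
```

Because the registry checks new schema versions for compatibility (backward compatibility by default), a producer can evolve the schema without breaking downstream consumers — the "safe evolution" Gwen describes.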
So that's the whole thing, right? You want to use it: you want to gather your data, you want to build your microservices — but you don't want to actually run any of it. And the reason you don't want to run any of it is that Kafka is a distributed system, and running distributed systems is difficult. We should know: writing a distributed system was difficult enough in the first place. As we all know, debugging is twice as hard as coding — you're not allowed to be as smart as humanly possible while coding, because you need to be twice as smart while debugging — and race conditions and scheduler timing have been a source of pain for a very long time. Operations is all the fun of debugging, plus time pressure, customers yelling at you, and money ticking away. What could be more fun?

So I want to share a few reasons why it's a good idea to let someone else run Kafka for you. The first is that random stuff happens, and random stuff is usually not good stuff. Some failures we expect: we know machines may fail and the network will have issues, and we plan for those. People forgetting to renew their SSL certificates — not so much planned for. The thing is, if you have a strong team and you build a strong system that's capable of overcoming failures — you have the experience, the tools, and the monitoring in place — you can build very resilient systems. But the key is that this requires experience and expertise. You can build it in-house, and a lot of people do; Boerge has an amazing team that has built something very resilient. But as he said, it may not be your business to do that.

Second, even when nothing fails outright, you still have to make changes to the system. You can't just say "it works today, let's never touch it again," because even though every change carries risk — every upgrade can introduce new bugs — there is also a risk in not changing. We know that if you don't evolve and adapt, you'll be left behind. New features actually have value; they let you build things faster. Some people are stuck on old versions and can't use KSQL, even though they know it would let them move faster and deliver value sooner. Why not offload that risk to someone who is being paid to take it, and go on your way? You get the benefits. And this is really the key: you focus on your ideas, we focus on running Kafka, and everything will be amazing.

So I want to jump into a demo — a kind of semi-live demo — where I show how to actually build one of the architectures Kir mentioned: streaming data, joining it, and doing some processing on it, using Confluent Cloud on multiple clouds. So it will be a hybrid-cloud demo. You can see here that I have two clusters in Confluent Cloud — this is the Confluent Cloud UI — one on AWS and one on GCP, and I'm going to move data between them. You can see how big my clusters are — huge, right? Okay, those are demo clusters. If I want to create a new cluster, I can choose a name, how large or small the cluster is going to be, and which cloud it will run on, and as I change the configuration, the pricing on the right automatically adjusts to reflect the new cluster. If you pay attention, you'll notice GCP is slightly cheaper than the competitors, and more storage costs slightly more than less storage. Once I'm happy with the configuration and the price, I go ahead — in this case I'm cancelling — and let's see how you manage the clusters I already have.

Here is the list of topics I already have in my GCP cluster, and if I go to AWS, the list of topics in my AWS cluster. If I want to see what's in a topic, I can look at its schema — I mentioned Schema Registry, and that's what keeps the data compatible. You can see that in this topic I have users, with the usual suspect fields: user ID, region, gender — the kind of things I want to know about those users. Now, suppose I have those users in AWS — maybe they arrived from something that dumps data in S3 — and I also have some pageviews and a bunch of other stuff running on AWS, but I really want to process it all on GCP, because GCP has things like TensorFlow and BigQuery, a lot of stuff I really enjoy using; as a developer you always have your favorite tools, and some of them happen to be on GCP. Getting data from cloud to cloud is actually a hard problem — it's something you think is going to be easy until you go and try to do it, and you figure out that if all you know is the tooling inside your own cloud, it's not going to help you move anywhere else. Confluent Replicator, which is based on Kafka Connect, basically allows you to copy that data. You go into Kafka Connect and choose which topics you want to copy: a whitelist, a blacklist — maybe everything except the test topics — a regular expression, or you can simply type in the topics you want to copy.
A bit further down you configure where the data comes from — in this case, AWS — and further down still there are a lot of security configs and the destination the data is going to, which is GCP. One more configuration worth talking through: I can rename the topics. The topic that used to be called users could be called AWS.users, and I want it to start with GCP, so it makes a bit more sense where the data came from and where it went — qualified topic names like these are pretty common. Once I'm happy with the config, I just go ahead and start this Replicator connector, and then let's go and see the topics in GCP. Here we go — with the rename, it's GCP.AWS.users. The first thing I want to check is that my schema arrived intact and my data is more or less the way I expect it to be, so I browse the schema again and, yep, it looks the same: I have the region, the gender, everything I expected.

Now let's go back and see what the data actually looks like. We replicated two topics, the pageviews and my users; let's go to the pageviews and use KSQL to see what's in the topic. I query it using SQL, and the first thing I have to do is create a stream, which is basically a view on top of the topic — I have one already, so no big deal. Then it's very simple: I do SELECT * and run the query; behind the scenes that starts a stream-processing job — a consumer — and pretty soon you see a stream of events on the screen. This is a live stream: you can see the user ID and the page the user is viewing, you can see the timestamp, and as data keeps coming into the topic you can see more and more events to go through — it's a pretty busy topic.

That's a simple query. Suppose I want only a subset of the events — say, only pageviews from a single gender in the audience. The pageviews topic doesn't have anything about gender, but it does have a user ID, and the users topic has the gender of each user, so I can join the two and then filter on that field — a very normal thing to do. This is a continuous query: a query that basically runs as an ETL job all the time, and the results go into a new topic.
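For reference, the continuous join-and-filter just described looks roughly like this in KSQL. This is a minimal sketch, not the demo's actual statements: it assumes a KSQL server reachable at localhost:8088, Avro values registered in Schema Registry, and the replicated topic names used above; exact syntax (for example the KEY option) varies with the KSQL version.

```python
# Sketch: submitting KSQL statements to a KSQL server's REST endpoint.
# URL, topic names, and column names are illustrative assumptions.
import requests

KSQL_URL = "http://localhost:8088/ksql"
HEADERS = {"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"}

statements = """
CREATE STREAM pageviews WITH (KAFKA_TOPIC='GCP.AWS.pageviews', VALUE_FORMAT='AVRO');
CREATE TABLE users WITH (KAFKA_TOPIC='GCP.AWS.users', VALUE_FORMAT='AVRO', KEY='userid');
-- Continuous query: join each pageview to the user's profile, keep one gender,
-- and write the results to the new topic backing the pageviews_female stream.
CREATE STREAM pageviews_female AS
  SELECT users.userid AS userid, pageid, gender
  FROM pageviews
  LEFT JOIN users ON pageviews.userid = users.userid
  WHERE gender = 'FEMALE';
"""

resp = requests.post(KSQL_URL, headers=HEADERS,
                     json={"ksql": statements, "streamsProperties": {}})
resp.raise_for_status()
print(resp.json())
```

The final CREATE STREAM ... AS SELECT is what turns this into a standing ETL job: it keeps running and keeps writing matching events to the new topic.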
Now, what do I do with the results that go into that new topic? Well, I may want to analyze them — that's why I ended up on GCP in the first place. So I take those events and dump them into, say, BigQuery, and then the sky's the limit: you can use Google Data Studio, and there are tons of other tools. So I go back to Kafka Connect and create a sink connector that uses Google BigQuery: I pick the topic — the subset of pageviews I just created — I pick the BigQuery connector, I give it a name, say "google-bigquery", and I start configuring it. The fields are normal BigQuery configurations that should look super familiar: which tables do I want the topics to go into, which project, which fields — all the things you're already used to. Once I'm happy with my configuration, I click OK and I'm done.

Think about it: this was a seven-minute demo, and we got data out of AWS into GCP, created a continuous transformation, and dumped the data into BigQuery for further analysis. This is, if I may say so myself, kind of a big deal. It used to be incredibly difficult — I know, because I tried to do it for a pretty long time. And this is really one part of the value of having open-source, open-standard software: you are not locked into a vendor; you can deploy and have the same experience in every cloud, and you can easily move between clouds.

If you remember two things from this presentation, the first is that a streaming platform — and the data sharing it enables — gives a lot of value to the business, but it really takes the ecosystem of tools that exists around Apache Kafka; you cannot do it with just a messaging system, you need a lot more than that. The second is that if you don't want to go through the pain of learning to manage it all yourself, which takes a while, you may want to run Confluent Cloud on the public cloud of your choice — hey, GCP is a nice one — and have us do it for you.

If you want to go home and try it yourself, Confluent Cloud comes in two flavors. Confluent Cloud Enterprise means you call a salesperson and talk to them; this is our unlimited-scale, enterprise-level offering, with SLAs, support, VPC peering, and security options — when Boerge wants to move to Confluent Cloud, we think that's probably the option he'll choose. If, on the other hand, you're a team of developers who want to experiment with Kafka — you don't need huge clusters or tons of support, you just want to try a bunch of stuff and spin up a lot of clusters, pretty much what I did here in the demo — Confluent Cloud Professional is the tool for you and your team. So I hope you'll all run home and we'll see an uptick in Cloud Professional users on our dashboard, because you're all trying it. That's it — we'll still be around for the rest of the event, so if you want to catch us with the expert questions you're shy to ask in public, we'll be super happy to help; Confluent has a booth down at Moscone South if you want to catch some of our cloud experts, who will be around. Thank you all for coming. [Applause]

