How PBS makes every penny count... For viewers like you

EPISODE 2 50 mins Dec 02, 2024

About this episode

In 1969, Mr. Rogers went before the US Senate Commerce Committee to request funds to support the growth of Public Broadcasting. His heartwarming request is often remembered as a pivotal moment in American television history, highlighting the importance of educational programming that serves the public good. Today, that mission endures, and Mike Norton and his team at PBS continue to honor it in their own way – by stretching every public dollar as far as it can go. In a world where resources are finite but the demand for quality, accessible content is ever-growing, these engineers work behind the scenes to maximize efficiency, leverage innovative technologies, and make every penny count… for viewers like you. Fred Rogers’ Testimony: https://www.youtube.com/watch?v=fKy7ljRr0AA

HOSTS

Dr. Werner Vogels — CTO, Amazon

Simon Elisha — GM, AWS Podcasts

GUEST

Mike Norton — VP of Cloud Services & Operations, PBS

Episode Transcript

This transcript was generated automatically and may contain minor errors.

Simon Elisha: Welcome to the official AWS Podcast. G’day, Simon Elisha here, and I’m happy to bring you a special series with Werner Vogels. Here’s Werner to tell you all about it.

Werner Vogels: Thanks, Simon. Welcome to the Frugal Architect podcast, where we dive into the journeys of technology leaders building cost-aware, sustainable, and modern architectures. These are longer-form conversations where we explore these topics in depth, and we hope you enjoy them.

SE: Good to have you here, and we’re joined by our very special guest, Mike Norton. Mike is the VP of Cloud Services & Operations at PBS. Mike, welcome to the program.

Mike Norton: Thank you for having me.

SE: So good to have you here. I’m sure most of our listeners, myself included, have watched many PBS products over time, so it’s nice to sort of be on the other side of the table here.

MN: It’s good to be here.

WV: Well, especially because some amazing products have come out of PBS, and the amazing story is, of course, that you are a nonprofit organization and you have a tight budget. So we’re really looking forward to hearing your stories.

MN: Absolutely. I’m happy to share it.

SE: And big, big reach as well, which is interesting. So maybe for those who don’t know who PBS is, or who maybe aren’t clear on the mission of PBS, tell us a bit about that.

MN: Yeah, so, PBS, we are a nonprofit. We were founded in 1969 with a mission to educate, inspire, and entertain. And, you know, if you haven’t seen it, I recommend watching Fred Rogers when he went before Congress and asked for the money to get this going. [It’s a must-watch.] And honestly, I watch it every year because it is inspiring. It’s why I do what I do, and I think it’s why we do what we do.

SE: That’s amazing. And what about your own role? So you’ve had an interesting journey with PBS and your own career, so maybe give us a little bit of a potted history of your career and talk to us about the work you’ve, you’ve sort of entered into at PBS over the years.

MN: I started out as a developer. My dad brought home the original Macintosh in 1985 because he was doing his MBA and got the budget for a computer. I tooled around on that thing for years with my brothers, and it led me into tech. So I started out as a developer and loved writing software, but I was never the kind of developer who liked the whole, I write something and I throw it over the hedge to somebody else. This was back before we had cloud, back before we had terms like DevOps and agile manifestos and any of that. I just knew that if I’m writing this stuff, I should also have a hand in how it runs. All to say, I moved on to PBS and into operations. I spent some time at PBS as a contractor in 2009, which was just as we were starting to use AWS. I went on to do some other things and then came back in an operations role and led our ops team, which, and I’m sure we’ll get into this, was our firefighting team. We were the ones responsible for keeping things running. I took a brief time away from PBS, realized that I missed the mission, and came back in an architectural role, and I’m now running our cloud team.

SE: Amazing, amazing. It’s interesting how many people start their careers diving really deep into the technology, and that, I think, maintains a passion for detail and optimization and things you have to find. I think, Werner, we’re seeing a trend here of inbuilt constraints in folks.

WV: No, I also think that, you know, no matter how much you go up in the stack, after tweaking around with PDB files or tweaking around at the operating system level, eventually you step up and look at the bigger picture. I do think every day you miss it on one hand. On the other hand, it also gives you a much broader view of the whole stack. Even if you’re a domain expert, let’s say in financial systems or something like that, you still understand the file cache, you still understand interrupt schemes and things like that, which gives you a complete view. So I do think it’s sort of a natural progression over time. It enriches you, but you also miss it all the time.

SE: Until you’re in the middle of it and frustrated, and you can’t make anything work. But maybe let’s talk about that, Mike, because you touched on a topic that is close to the hearts of many people who are listening, not necessarily for good reasons, which is firefighting. Something’s broken, it ain’t working, we’ve gotta fix it. And all industries experience different pressures, be they production schedules or a banking run. I would have to guess, and you can tell me, that broadcasting would seem to be even more intense here, because you’ve got audiences and programs and stuff. Tell us about where PBS found itself in the firefighting phase, what that meant for you, and what that looked like for your team at the time.

MN: Absolutely. Broadcast technology is a hard one because you can’t control what people are going to watch and when they’re going to watch it, or what is going to be a quiet hit. Downton Abbey didn’t pick up until, I want to say, season 2 or 3, when all of a sudden it was a thing. And I spent every January for several years not even getting to watch the darn show, because I was sitting there at 9 o’clock on Sunday nights trying to make sure that everything was just running. When it would end and say, To learn more, go to PBS.org/masterpiece, I was sitting there going, no, don’t. It is hard. We joke with the kids team that we probably know the average bedtime of the American child, because we know when the traffic dies off on PBS Kids. But on the general audience side, there are things like our dramas and our news programs and whatnot, and things all of a sudden just become popular. We have some idea about what time people watch things and when things are starting to get popular, but as a technology team we also have to deal with the fact that the business side of the house is changing things. As Netflix started introducing the idea of binge watching, we started changing: OK, if you are a donating member of a PBS station, you can now watch all these things the night they release. That then affects the technology. When I first started at PBS, the streaming of various programs happened the next day. It would air on broadcast, and the next day it would be available on streaming. So Monday nights were actually harder than Sundays for those dramas. Then they started releasing these things in binge packages, and started trying to make sure you could stream a show at the exact same time that it was going out over the air. And that changes the technology needs. We had a lot of thundering herds of traffic, sometimes prepared for, sometimes just out of the blue. It could be a politician mentioning something that happened to be on a NewsHour episode 4 years ago, and suddenly that’s popular. It might only be popular for a couple of days, or a couple of minutes, but we have to be prepared for those sorts of shifts.

SE: So it’s fascinating, Werner. We’re really talking about a shift in business requirements and customer requirements beyond technology here, aren’t we, when we think about it?

WV: Well, in some sense here, you have no control over the customer requirements. I mean, the customer does whatever he or she wants to do. It’s more like, what is the business willing to put its money against? Which are the things that need to be highly available all the time, and which things can maybe be on the back burner, or time shifted, or things like that? Those should all be business decisions, not technologists making the decisions for the business.

MN: Exactly. I have been asked from time to time, can you please predict what our CDN bill will be next month or for the next year? And I have to say, is Ken Burns releasing a documentary this year? Do we know what’s gonna be popular? I can tell you what our price per gigabyte streamed is. We can do some level of estimation: if this is going to be popular, it will probably do that. But we’ve also had some cases where we’ve decided to make some changes around how we encode our video, and those have led to extreme cost savings, because we figured out better encodings that have smaller file sizes but are still quality content, and that throws off all your estimates. There is so much more than technology here. It’s business decisions, it’s user decisions.
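Editor's sketch of the estimate described above: a CDN bill is roughly the data streamed multiplied by the price per gigabyte, so both audience size and encoding bitrate move the number. This is a minimal illustration, not PBS's actual model. The function name, the average-bitrate simplification, and every figure in the usage note are hypothetical.

```python
def monthly_cdn_cost(hours_streamed: float, avg_bitrate_mbps: float, price_per_gb: float) -> float:
    """Estimate CDN egress cost for a month of streaming.

    Assumes every viewer streams at one average bitrate, which is a
    simplification: real adaptive streaming switches between renditions.
    """
    seconds = hours_streamed * 3600
    megabits = avg_bitrate_mbps * seconds
    gigabytes = megabits / 8 / 1000  # megabits -> megabytes -> decimal gigabytes
    return gigabytes * price_per_gb
```

Under these assumptions, halving the average bitrate at the same viewing hours halves the estimate, while an unexpected hit multiplies the hours and the bill with them. That is why the forecast depends on what becomes popular as much as on the unit price.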

WV: By the way, I’m just curious, is HEVC so much more efficient than any of the other encodings?

MN: It is. I want to say we have halved our CDN bill by re-encoding our popular content. It’s insane, and it’s thrown all of our estimates off in a good way. That was both a technology and a business decision, and that’s where I think, Werner, what you’re talking about comes in: you’ve got your roots in the deep technology, but then there’s just so much more to it. That, I think, is what I love about being in my role: it’s not just technology. I’m not doing technology for technology’s sake. I’m sure there are plenty of people out there who love moving bits and bytes around, and that’s what gives them joy. But what I love is being able to see how I’m saving a company money, a company with a mission that doesn’t have a lot of money. So we just do the best we can with what we have.

WV: Well, I also assume that all Sesame Street episodes are not in 4K yet.

MN: No, and those are decisions we have to make around what content is the most convincing to see in 4K, right? An episode of Nature or Nova, or a drama, maybe you want that in 4K. Something like an old kids show that came out 20 years ago, that’s animated, doesn’t need to be in 4K, and frankly, the parents who are streaming those shows over their cell phone in a doctor’s office are probably happy that it’s not in 4K, for their data bill. It’s all these things to weigh and decisions to make around what we can do with what we have, and how we make as good an experience for the viewer as we can, without just doing it because we can.

SE: And it’s interesting, there are a couple of nuanced things you’ve touched on there, which is, just cos you can doesn’t mean you should. As a neophyte to broadcasting, my default position would be, well, let’s get the best resolution we can, and everything should be 4K, and why wouldn’t you do that? And yet when you actually think about the situation, what you’ve articulated is that there are actually many times where you don’t want that, and that has a benefit. The other element is that you’ve also challenged almost a foundational concept, which is the encoding algorithms and mechanisms that you’re using. It’s that classic thing of, well, if it ain’t broke, don’t fix it, just leave it, don’t touch it. Yet you’ve challenged that. Help us think about that mindset a little bit more: what sort of questions were you asking, or what sort of things were coming up, to get you to that point? Cause you didn’t just wake up in the morning and go, you know what we’re gonna do, we’re gonna change our codec, cause that sounds like a really easy project.

MN: No, that was not an easy project. And I think our mindset was, we’ve been doing this for a long time this way. We were based on Apple’s original encoding ladders, because that’s what we did. It is, again, the hard part of being a nonprofit in this space. We’ve got fewer employees across the entire company than I would guess a Netflix or an Amazon Prime has in engineers thinking about these problems. We did at one point realize, OK, we’re spending a lot on our CDN. We’re still using the original encoding ladders. Let’s just take a look. And we’ve got some tooling in place to look at what the user experience is. We don’t do ads, but we have our funding pods at the beginning. If you’ve watched any PBS drama, you’ve probably thought about going on a river cruise on the Danube. We have tooling in place to understand: are people falling off at that point? Where are people’s videos starting to lose quality and then come back? And so we used that tooling to then say, all right, well, let’s just try. Let’s see what happens if we use MediaConvert to try a different encoding. We started with just a video here, a video there, but the key is we had that tooling in place to be able to then measure and say, there’s no change, or it’s actually even better for the user, and it’s also costing us like half as much. So, why not? We had to make some decisions then. We have a large back catalog, and that back catalog is, like I said, sometimes popular, sometimes not. It’s popular when it becomes popular. We have a lot of local content. Not a lot of people are going and watching that eight-year-old clip from NewsHour or that 10-year-old episode about breakfast places in Minnesota, but when they become popular, they become popular. But really, it’s mostly the fresh stuff. That’s what’s costing us, that’s what people are watching. So we started by saying, let’s just change how we’re encoding the stuff coming in. Let’s leave the back catalog as is. And we then spent a lot of time going back through that back catalog and finding the things that are still popular. People still like to watch the entirety of Downton Abbey, so let’s re-encode that. But maybe we don’t worry about that one episode of NewsHour from 8 years ago. It’s not worth the one-time cost to re-encode it if it’s only getting watched twice a month. We ran Athena queries on CloudFront logs out the wazoo, trying to understand what people are watching, and that is hard work, because it’s not like you have a log entry that says somebody watched this show. If you understand streaming video, you’ve got playlist files and then 6-second chunks of video, and you’re trying to piece all that together and understand: what were people watching, and when is this worth the money one time to reconvert for a long-term savings? It’s like forensics. It is, it is, and it’s the same thing you do with server sizes and ElastiCache cluster sizes. It’s trying to just understand what we need. And where can we treat these things? Where can we spend our money to optimize? But where is it not worth it? Yeah, and that’s the huge thing, realizing those places where it’s just not worth the time. Yes, this old ElastiCache cluster, we’re probably spending more money on it than we should, but the cost of optimizing it isn’t worth it for that particular thing. Our money is better spent doing this.
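Editor's sketch of the log forensics described above. CDN logs record requests for playlists and video segments rather than "views", so one way to see what is being watched is to roll segment requests up to the asset they belong to, then ask whether a one-time re-encode would pay for itself. The URL layout `/video/<asset_id>/<file>`, the function names, and the inputs are assumptions for illustration only, not PBS's actual paths or queries.

```python
from collections import defaultdict

def bytes_by_asset(log_rows):
    """Sum bytes served per asset from (uri_path, bytes_sent) rows.

    Assumes a hypothetical layout of /video/<asset_id>/<file>, where <file>
    is a playlist or a media segment. Rows that do not match are ignored.
    """
    totals = defaultdict(int)
    for uri_path, bytes_sent in log_rows:
        parts = uri_path.strip("/").split("/")
        if len(parts) >= 3 and parts[0] == "video":
            totals[parts[1]] += bytes_sent
    return dict(totals)

def months_to_break_even(monthly_gb, price_per_gb, size_reduction, reencode_cost):
    """Months until a one-time re-encode pays for itself, or None if it never does.

    size_reduction is the fraction of bytes saved by the new encoding
    (0.5 means files half the size).
    """
    monthly_saving = monthly_gb * price_per_gb * size_reduction
    if monthly_saving <= 0:
        return None
    return reencode_cost / monthly_saving
```

A title watched twice a month produces a tiny `monthly_gb`, so its break-even time stretches out to years. That matches the decision in the conversation to re-encode only the parts of the back catalog that are still popular.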

SE: Mike, one of the things you talked about earlier was the concept of thundering herds. For those listening who’ve not come across that wonderful experience, it’s basically load that comes unexpectedly and often brings down the backend system it’s loading, and then has that wonderful thing where, as the system’s trying to come up, more people keep hitting refresh, destroying your service, and everyone has a really bad day. Now, you have an awesome example of a thundering herd and how you solved it, and it relates to one of my favorite PBS programs actually, the Ken Burns Vietnam War program. Tell us about what happened when this illustrious producer produced an amazing program about a topic very close to the hearts of not just Americans, but people around the world, from different perspectives. What happened?

MN: We got a lot of load all at the same time. It was rough. It was before we had really entered into our move to containers, and so we were using OpsWorks. I miss it dearly. It was a wonderful product at the time. For those who don’t know, it was Amazon’s, and it was a wonderful move away from building servers by hand. OpsWorks was on our path to containers when we first went into the cloud. Gosh, we were using Amazon before you guys had auto scaling. We had to, just a couple of years ago, move some things out of EC2-Classic because we had things that were running before there were VPCs. We were old school, and in a sense, when we first started, AWS was a data center in the cloud for us. We hadn’t yet moved to that idea that servers are cattle, not pets, and so OpsWorks was a step on that journey. But with our servers, by the time all of the recipes got applied to the base image, the traffic was already gone. In that scenario, what we should have done was probably pre-apply a bunch of those recipes and build an AMI that was baked

SE: partially baked, somewhat baked.

WV: I do think one of the cool things with OpsWorks was that it had time-based scaling. I think that sort of fits your particular case, where basically all your images are still servers, not flexible or whatever, but you could say, at 6:30 tonight we need 10 more of those, and they need to be ready by that time. And I think that was one of the biggest wins of using OpsWorks.

MN: Yes, absolutely. We had a combination. We had time-based scaling, because we knew at this time of day, just scale up, you need some more servers for this thing. That time of day was different for different things: for kids, scale up more in the morning when parents hand their kids their iPads so they can go back to sleep, and for GA it was do that at 5 o’clock at night. We then also had scaling based on metrics. We knew we needed more servers available for load from 5 until 9 Eastern, but you never knew when something was gonna get popular, so we also had metrics-based scaling. But it still couldn’t scale fast enough.
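Editor's sketch of combining the two scaling signals described above: a time-based schedule sets a floor for known busy hours, and a metric-based estimate can raise the count when something unexpectedly gets popular. This is a generic illustration, not the OpsWorks API. The function name, the schedule format, and the 60% CPU target are hypothetical.

```python
import math

def desired_servers(hour, cpu_percent, current, schedule, target_cpu=60.0):
    """Pick a server count from a time-based floor and a metric-based estimate.

    schedule maps an hour of day (0-23) to a minimum server count.
    The metric rule scales the current count by observed/target CPU.
    """
    scheduled_floor = schedule.get(hour, 1)
    metric_based = math.ceil(current * cpu_percent / target_cpu)
    return max(scheduled_floor, metric_based, 1)
```

For example, with `schedule = {17: 10, 18: 10}`, a quiet 5 p.m. still yields 10 servers, while a 3 a.m. spike to 90% CPU on 4 servers yields 6. Neither rule helps if a new server takes longer to become ready than the spike lasts, which is the failure mode described in the conversation.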

SE: Tell us about that scale, because you talked about the fact that you’ve gotta have metrics, you’ve gotta be monitoring, and in this particular situation you had the metrics. You were tracking memory and you were tracking CPU. You were doing the right things, but you weren’t necessarily getting the scaling you thought you were gonna get. Help us understand what was going on and what popped into your head.

MN: We had all the right things, but it then comes back to having some deep knowledge of how the systems work. It’s one thing to say, I can throw more money at this and scale up more servers and whatnot. There was stuff happening at the OS level that we weren’t prepared for. We did a lot of work. I think I spent multiple nights in the office that week just trying to keep it going, and in the end it was containers that was the solution. We started our journey into containers running our own EC2 hosts for them with ECS. I am not ashamed to say we’re not a Kubernetes shop. I don’t feel that we have the level of complexity that makes Kubernetes necessary. ECS has been fine and has done the job, and especially when we moved to Fargate, where we didn’t have to manage underlying host servers, it’s been a huge win. We have cut costs on our compute dramatically, and the fact that I can pay for Fargate with a savings plan means I don’t have to be sitting around trying to divine what kinds of instances we need: are we reserving M5 larges, or are we doing C’s? That’s just out of my hands. I pay for a savings plan, and it’s done.

SE: It makes it nice and easy, and I think the fact that there’s cost savings associated certainly helps. But I’m guessing that when you, after many of those late nights, walked in the office one day and said, hey team, let’s do this thing called containers, the business side and your colleagues didn’t all go, yes Mike, that’s what we’ve always dreamt of doing. Let’s go ahead and do it. What did you have to, like, how did you sell this? How did you explain it? How did you even get them to recognize this was something worth doing.

MN: It was a job. I have been blessed with really good partners on the engineering team who see that we’re suffering. My operations team is the one who gets the page. They’re the ones who get the call at 10 o’clock at night, 11 o’clock at night, whatever time of night. Our engineers are not on that call tree, but they’re the ones who we have to call in ultimately a lot of times. If it’s a scaling problem or an AWS problem, my team can handle it. If it’s that the engineer just wrote a really, really bad function that is not optimized, then we have to call them in. What I think we were able to do was to help the engineering team understand, and I’m thinking about law one of the Frugal Architect, that cost is a non-functional requirement, but guess what, so is performance, so is security. That’s sort of been my mission to the product development team at large at PBS: to help them understand my team. We’re not ogres, but we are gonna be here to say, I know that the powers that be have said, please put this button in place and make it look like this, and that’s wonderful stuff, but my team is there to make sure that it works, that it works under load, that it’s secure, that it doesn’t cost us an arm and a leg. And so my team is really the team of non-functional requirements. With that move to containers, we had teams that were already doing containers for development environments. So what it really was, was a lot of work over time to get ourselves on product roadmaps and to help the product teams understand this is a win. I know it’s not a flashy new feature, but being able to deploy 8 times a day, or 20 times a day, or 1000 times a day, that’s important, because it means that when you have a need come in, deployments are just literally committing code, and the CI/CD takes it on, and it’s deployed, and we can roll back easily. That wasn’t always the case. And so the most work my team had to do was just helping teams get into Docker in general, and from there we were able to say, OK, great, we’ll take this the rest of the way, but let’s then talk about task definitions and all the things in ECS. It was a lot of just working together and communicating.

WV: In that context, Mike, if your team is responsible for, let’s say, cost, performance, and resilience, how do you make sure the engineers also have that on their plate? Because it can’t be just on your plate. I think it’s a shared thing. So how do you make sure that everybody else has their noses in the same direction?

MN: At least once a year, I try to talk with the engineers and just show them their costs. We have lunch-and-learn kinds of things that we do, engineering round tables, and I’ll show them, this is our bill. The number of times that I have had people’s eyes just open up to dinner plate size, when they’re like, wait a minute, I had no idea. Another thing we’ve been doing: my team is the team that’s responsible for looking at our RIs and savings plans and whatnot, and for a long time, for reasons like not having a lot of people, we were just renewing the things that were already out there. And then we’d find ourselves 6 months later with a bunch of unused RIs. Last year, I asked my team to have a conversation with each team before we renew our RIs, and just make sure that they understand what they’re running and that they’re OK with that. I then had to have a conversation with our finance team and say, you’re gonna see a spike in on-demand costs for a little bit, cause I need a few weeks to have my team go and sit down with each of the teams and make sure they’re right-sized. We had a fascinating example where there was a team that was running, I wanna say it was a 8 XLI3 ElastiCache instance. Turns out they didn’t need it. When we talked with them and showed them the metrics, all of a sudden a light bulb went off over somebody’s head, and they’re like, hold on, I turned that up for an annual meeting we had with the stations, where we were told this cannot go down during this meeting. 2 years ago. I wanna say that was a $68,000 a year instance that was running. We downsized it 8X and then reserved it, which means I effectively got about a 15X savings, just by having a conversation that took 20 minutes. And so it’s those kinds of conversations, because the beauty of the cloud is giving engineers the ability to not be blocked and to deploy things, but the curse of it is that they’re going to lick their finger, put it up in the wind, and go, I’m not gonna get fired if this doesn’t go down, so I’m gonna just pick the thing that seems beefy enough to handle load. It’s getting better at observing it, getting better at understanding it, to your point, in time, and it’s getting the teams to understand it. We’re not yet where I want to be. In a perfect world, I would have each team understand exactly what their product costs and be aware of it. I’ve got teams that are more understanding of that, and teams that are less understanding of that, and I’ve got teams that are actually, every quarter, spending 2 points of the sprint on cost, and they will come back to me with $100 a week in savings. And I’m gonna give them the highest praise because it’s something they’re trying,

SE: which shows they’re paying attention, yeah.

MN: It’s crazy, because you could have a team spend several cycles in a sprint and find $100 in savings, and that’s awesome, and then you have a team that goes and re-encodes our content library, and it saves us tens of thousands of dollars a month. But it’s all good, because it’s all about understanding this isn’t free.
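Editor's sketch of the arithmetic behind the ElastiCache example in this exchange: an instance quoted at about $68,000 a year was downsized 8X and then reserved, for roughly a 15X saving overall. The two steps multiply, so the quoted numbers imply a reservation discount of a little under half. The figures are the speaker's approximations and the function names are hypothetical.

```python
def combined_savings_factor(downsize_factor: float, reserved_discount: float) -> float:
    """Overall cost reduction factor from downsizing and then reserving.

    reserved_discount is the fractional discount versus on-demand pricing
    (0.5 means reserved capacity costs half the on-demand price).
    """
    return downsize_factor / (1.0 - reserved_discount)

def implied_discount(downsize_factor: float, total_factor: float) -> float:
    """Reservation discount implied by a downsize factor and a total saving factor."""
    return 1.0 - downsize_factor / total_factor
```

Here `implied_discount(8, 15)` comes out to about 0.47, and $68,000 divided by 15 is roughly $4,500 a year. The right-sizing conversation accounts for most of the saving, with the reservation adding a further factor of about two on top.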

WV: Do you often see that? I mean, you must have a view on the container side of things if you’re running things. Do you ever go back to a team and say, Joe’s set of containers looks remarkably like yours, but yours is twice as expensive? Do you have any insights into things like that?

MN: Yes. We have to be giving people the ability to own their own stuff, their own destiny, and there’s a lot of trade-off that has to happen around how much we force and how much we just say, yeah, you’re all doing things your own way, but it’s not my job necessarily to come in and say, this is the way. It’s trade-offs, right? Everything is trade-offs. I don’t want my team to be the police, but I also don’t want to have a wild west scenario, which is what it was when we moved to the cloud. It was wild west. We were just doing whatever we wanted when we moved to AWS, because we suddenly didn’t have Big Brother IT.

SE: You could move with speed, yeah. But let’s unpack that a little bit more, cos it’s this interesting dichotomy, and I think your organization’s worked really hard to get to sort of the right place in that, which is, you don’t wanna be the department of no, but not everyone can do everything. So you need these guardrails. How do you think about guardrails? How did you implement those? What’s the approach there? Cos I think that’s super valuable for a lot of other organizations too.

MN: Sure. One of the things we’re trying to do is build Terraform modules, for example. This is the PBS way we do S3 buckets: make sure it’s tagged right, you’re not accidentally setting it up with public access, so on and so forth. I haven’t gone all the way down this route yet, but there is a scenario where my team could be building product for the engineers, and looking at it in that way. I think thus far, with our attempts at that, a lot of it has been sort of a Field of Dreams: if you build it, they will come, and they don’t always come. And so, again, it comes back to how much do you lay down the law, and how much do you guide and inspire people to do the right thing? In some ways it’s like parenting. All you can do is try to instill values in your kids and then set them off and hope that that goes well, versus, I’m going to be the police here and you can only do it this way, and you can only do it that way. So a lot of it is just trying to make people aware, trying to make people partners in this venture. The more we do that, I feel like the more people actually amaze me, when all of a sudden they are coming to me saying, hey, I found this cost savings, I found this better way to do things. We’ve had engineers who have built whole deployment systems for applications. We didn’t ask them to do that, we didn’t tell them to do that, but they just realized there are commonalities across these various applications: let’s come up with an opinionated way of deploying applications. The hard thing, being as small and scrappy as we are, is that you then have to look at those things 8 years later and say, this isn’t working anymore, the guys who built this don’t work here, and it was never an actual product, it was just a skunkworks thing that we built. Do we kill it? Do we improve it? Do we replace it? It’s just constant change and constant optimization, and it doesn’t all happen at once.
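Editor's sketch of the kind of guardrail described above for S3 buckets: tagged correctly and not publicly accessible. The conversation mentions Terraform modules. The check below is a generic Python illustration of the same policy, not PBS's module. The required tag names and the configuration keys are hypothetical.

```python
REQUIRED_TAGS = {"team", "product"}  # hypothetical tagging policy

def bucket_violations(config: dict) -> list:
    """Return a list of policy problems for a proposed bucket configuration.

    config is a plain dict with optional "tags" (a dict) and
    "block_public_access" (a bool). Both keys are illustrative only.
    """
    problems = []
    missing = REQUIRED_TAGS - set(config.get("tags", {}))
    if missing:
        problems.append("missing tags: " + ", ".join(sorted(missing)))
    if not config.get("block_public_access", False):
        problems.append("public access is not blocked")
    return problems
```

A check like this can run as advice in code review or as a hard gate in a pipeline. Choosing between the two is the same "guide versus police" trade-off discussed in the conversation.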

WV: It’s often interesting to see what happens if you don’t really track every detailed component of your site or something like that. We had features on Amazon.com that we turned off after 2 years, and no customer ever complained about it. Did we know that? Well, often the ownership in all of this is extremely important: who owns that piece of software, or who owns that particular service? And if nobody’s there, but it’s just running by itself, those are good flags to start looking at it.

MN: Absolutely. There are people who will refer to me as the grim reaper, because I will find old websites that, yes, they’re running. They look terrible. We have got sites that have a fax form to buy the VHS tapes. That’s awesome. And I have sometimes gone to teams and said, hey, so we’ve got this thing. I feel like we should probably kill it. It’s probably gonna be a security problem. It’s just running. And they’ll be like, no, no, no, no, we’ve got all these people who are using it, and I’m like, all right, it costs us X amount of money per month. And they’re like, kill it with fire, fine. It’s gone. It’s being able to say, OK, I’ve looked at this, this is a problem, this is what it’s costing us, and bringing that data to the equation and that conversation. That isn’t always there, and maybe it’s better at companies that are for profit, where they have a better sense of, what is this costing us per user of our SaaS platform, and so on and so forth. We don’t have a lot of that, but I can bring it to teams and say, this thing is running on an old version of PHP, and it’s costing my team time and money to patch it and upgrade it and monitor it and all these things, and then have that conversation: what is the value? That being said, we do all kinds of things just because it’s the right thing, not because it’s cost effective. And so that’s that other balance there: we have a mission to what we’re doing, and sometimes we’re doing things because it’s the right thing to do, not because it’s cost effective. That’s what I love about the role, that I’m not just trying to be the guy shaving every penny, and just

SE: squeezing the dollars and cents all the time, yeah.

WV: Yes. Yeah, plus probably everybody else has the same sense of mission if you work at PBS, whether it’s customer centricity versus at what cost you deliver it. But, you know, if you’re on a mission to be really customer-centric, the bottom line actually doesn’t really matter that terribly much because it’s all about what you do for your customers.

MN: Exactly, exactly. We are here to serve the American public, and that means sometimes you do something that doesn’t necessarily make sense on paper, but you do it. But it also means being able to understand trade-offs and realize this thing just isn’t making sense anymore. I mean, we used to have a game for kids called Kart Kingdom. You could build your go buggy and drive it around in this thing. And there was a point in time where the grant that funded that had ended. And we were still spending a lot of money running it, and we had to make a hard decision: that’s not worth the money it’s costing us to run, it’s not necessarily furthering the mission where it needs to be. So we have to make decisions about what we cut, what we keep, and what I love about it is that there is a mission behind it, and so it’s not just dollars and cents.

SE: It’s not just arbitrary or capricious; there’s values, there’s processes. But it sounds as well here, Mike, that you’re really helping the organization, and your own part of the organization, to move from firefighting to fireproofing. And it’s a different discipline, isn’t it? It’s a different mindset and it’s not obvious to folks straight away, cos often we’ve all come from firefighting rather than fireproofing.

MN: It’s totally different. It requires thinking about things ahead of time. It requires getting into the conversation early. Too often, my team will get called in a day before a launch. And they don’t even know where they’re launching it. They’re like, hey, I heard you’re the guys to talk to about getting this up in the cloud, and, you know.

SE: Tomorrow

MN: We’re having to have some hard conversations about, OK, hold on, why did you build it this way? You know, we do things with containers, you’re not doing that. And so, yeah, fireproofing, to my mind, is about being in the conversation early, early, early, so that you can help guide those technical decisions that are made, because, again, going back to the beauty of the cloud is that everyone can make decisions as needed, but the curse of it is that everyone is making decisions as they need, and now you have a team that is dealing with the fallout of that. And so the more that my team can be involved early and talk through, hey, you’re choosing to use whatever the technology, you’re building this on RDS. Does it really need to be a full database? Is this something that you could do with Dynamo? Those kinds of decisions made early on mean that you don’t have all these sunk costs of developers and people just making decisions that they don’t realize are going to be a problem. So, yeah, it’s that mentality shift around having those conversations early on. What I’ve been trying to help people understand at PBS is that it’s all those non-functional requirements. Yes, you’ve been asked to make a donate button that is blue, and this many pixels by this many pixels, and it’s gonna do X, but there are implications to what you’re doing. And please, please, please have my team involved from the beginning. We could know, you know what, we made a bad RI purchase last year, and we’ve got a fleet of unused RIs for M5 whatevers. So please, let’s just use those. But they don’t know that because they’re not that intimate with the bill every month.

SE: It’s that mind shift change that really is important. Now, Mike, as we come towards the end of time here, there’s a story I’d love you to tell because it’s always fascinating to hear what goes on behind the curtain of things that we all see and almost take for granted. So one thing that I’m sure most of our listeners are familiar with is a little doodle on the Google website that will change from day to day depending on events and things going on, and you had an interaction with the doodle, and the doodle is more than an image. Tell us about that.

MN: We got selected for Google Doodle back in the day, and our on-prem infrastructure was not prepared for it, not in the slightest. And it was to the point where, I want to say from the stories I’ve heard, there were people at PBS who were just saying no, we can’t be featured on the Google Doodle, and the decision was made: move that website to AWS. And problem solved. And it’s the same story for how we ended up in streaming in AWS. The first streaming video app at PBS, we didn’t even have an engineering team at that time, was delivered to us, and you could have exactly 2 people watching a video per server. And our IT department was like, no, we cannot give you 100 servers to handle your load. And so we moved to AWS. Again, that was back before there was even auto scaling in AWS. So we had people writing scripts on cron jobs to scale things up at certain times and scale things down, but it made things work, and what it did was, it meant that we didn’t have those, no, you can’t do this because the hardware is not there or our network can’t handle it. We can just pivot and make this work. And I know from the streaming side of things, that meant we were spending a lot of money on AWS instances from 5 till 9 o’clock at night or whatever, but it gave us time to fix the application to make it actually perform properly, and then we didn’t need 100 servers at one time. We could have a lot less to handle all that load. And the same thing with the Google Doodle, like you want to have that kind of presence and

SE: you want to, we want to take the opportunity,

MN: Right, those opportunities are important. And so being able to pivot, even if it means that this is not the best way to do it long term, we can get some breathing room to fix it and then do it the right way. In order to run this thing, you have to actually rack some servers and run them. That doesn’t work. But in a world where you can say, you know what, we can handle the extra cost for the 48 hours it takes us to get this better, that is the best part. And

SE: It’s interesting hearing about those past stories, I guess. Werner, from your perspective and from mine as a bit of an older hand, this is a great example of the feedback we’re constantly getting from customers saying it would be great if my server could turn on at this time and turn off at that time, etc. These are the constant signals that have always come through, and it’s so gratifying to see customers have this in their hands now.
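
The cron-driven scaling Mike describes, with more servers during the evening viewing window and fewer the rest of the day, might look something like the sketch below. This is only an illustration, not PBS’s actual scripts: the 5-to-9 p.m. window and the figure of 100 servers come from the conversation, while the off-peak baseline and the function names are assumptions made for the example. A real script would pass the computed change to a cloud API rather than print it.

```python
from datetime import datetime

# Assumed values for illustration only.
PEAK_START_HOUR = 17    # 5 p.m., from the "5 till 9 o'clock" window
PEAK_END_HOUR = 21      # 9 p.m.
PEAK_SERVERS = 100      # the "100 servers" mentioned in the episode
OFF_PEAK_SERVERS = 10   # hypothetical baseline, not from the episode


def desired_server_count(now: datetime) -> int:
    """Return how many servers should be running at the given local time."""
    if PEAK_START_HOUR <= now.hour < PEAK_END_HOUR:
        return PEAK_SERVERS
    return OFF_PEAK_SERVERS


def scaling_action(current: int, now: datetime) -> int:
    """Return the change a scheduled run would apply (+ to add, - to remove)."""
    return desired_server_count(now) - current


if __name__ == "__main__":
    # A cron entry would invoke this script every few minutes.
    now = datetime.now()
    delta = scaling_action(current=OFF_PEAK_SERVERS, now=now)
    print(f"{now:%H:%M} target={desired_server_count(now)} change={delta:+d}")
```

Keeping the schedule in one small pure function makes the peak window easy to shrink later, which matches the story: once the application was fixed, the peak number could come down.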

WV: Yeah, there’s a great balance to be had there. Yeah, and it’s difficult. There’s always conflicting opinions. The other thing, of course, is that when you guys first started moving to the cloud, it was still 2008, something like that, 2009, I believe. These days, the expectations for any digital service are that it’s always up and it’s always performant and you can get all content anywhere in the world where you are. The expectations have changed dramatically in the past 10, 15 years, and that puts a lot of stress, I think, on every engineering and every operational organization as well because, you know, what we expected of a website in 1999 was that it would be offline half of the time, or you wouldn’t be surprised. Now, this is something that we do not expect anymore. And you’ll get a call from your board or from your boss if things are offline for 5 minutes. So expectations have changed dramatically, I think, over the past 15 years, which makes us engineers think very differently about engineering and about cost, because those expectations come at a cost.

SE: It’s very true, and you can’t have that construction icon that we used to have in the old days on the websites.

MN: Exactly, and those decisions are the hard ones because it’s not up to a technical team to define uptime. That’s a business requirement. And I have at times been asked, how do we make sure that we’re fully highly available? Like, well, I can double our costs. I can put it in two regions, but I don’t think you need that because, generally speaking, being across two AZs, I mean, we’re in US East One, like we’re old school. We are in US East One, and we’re across two AZs, and it’s fine. It’s good enough until somebody tells me, you cannot absolutely ever be down at any minute. Then I will go and I will build that, and that won’t be all of our infrastructure, that will be the things that you need me to. And so, our streaming video is S3 fronted by CloudFront, that’s always up. It’s already multi-region, multi-PoP, it just works. I don’t have to worry about that. But until somebody comes and tells me, you must absolutely 100% be able to bring a new video from our partners on Friday afternoon, this has to be up, then I’ll have to make some decisions and present them data and say, all right, I can do that, but that means that I have to do these things. And like, you want me to use global load balancers and all these things, sure, I can do that, but a lot of what my team is doing is trying to help communicate: this is up all the time for the most part. Yes, it’s down sometimes, how much does that matter? What do you want us to do? Here’s what it will cost to do it. And then have those conversations.
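
The trade-off Mike lays out, putting a number on downtime and on the cost of removing it, can be roughed out with a little arithmetic. The sketch below is an assumption-laden illustration, not PBS data: the availability percentage and the monthly cost are invented, the only figure taken from the conversation is that a second region roughly doubles cost, and treating the two deployments as failing independently is a simplification that real systems rarely satisfy.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes per year a service may be down at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def combined_availability(single_pct: float, copies: int) -> float:
    """Availability of N redundant copies, assuming independent failures."""
    p_fail = 1 - single_pct / 100
    return (1 - p_fail ** copies) * 100


def option_summary(name: str, availability_pct: float, monthly_cost: float) -> str:
    """One line a team could bring to the uptime conversation."""
    downtime = allowed_downtime_minutes(availability_pct)
    return (f"{name}: {availability_pct:.4f}% up, "
            f"about {downtime:.0f} min/year down, ${monthly_cost:,.0f}/month")


if __name__ == "__main__":
    single_region_pct = 99.9   # hypothetical, not an AWS or PBS figure
    base_cost = 10_000         # hypothetical monthly cost in dollars
    print(option_summary("One region, two AZs", single_region_pct, base_cost))
    print(option_summary("Two regions",
                         combined_availability(single_region_pct, 2),
                         base_cost * 2))  # "I can double our costs"
```

The point of a table like this is the one Mike makes: the technical team supplies the numbers, and the business decides whether the extra minutes of uptime are worth the extra cost.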

SE: Makes sense, makes sense. Now we’re talking frugality, we’ve talked about the mission. Mike, there is a $1 that was spent with PBS that I think you should tell us about because it’s pretty remarkable and probably helps us understand why you clearly have such passion for the work that you do.

MN: Yeah, we had a little boy named Noah who liked our shows and he sent us $1 in the mail and said, I would love to have a show, I think it was called Superheroes to the Rescue, and the kids team got that. And they turned around a quick website and sent it back, hey, go check out this URL, and it still is up and running today. PBSkids.org/superheroes to the rescue. And it has a dear Noah, thank you for sending us this dollar, here’s what we could do. And that’s what it’s all about, is knowing that we are inspiring these people and inspiring children and educating adults and children. It’s that mission that is, I think, what drives all of the people working at PBS. It’s being able to know that we’re doing this not for an extra buck. Not that there’s anything wrong with that, but we’re doing it because we’re making a difference, and it really comes down to the viewers like you.

SE: That’s so awesome, so awesome. Mike, thanks so much for sharing a bit more about your story and the PBS story as well today.

MN: Thank you for having me. This is awesome.

SE: And Werner, as always, fascinated to hear all your stories and insights as well.

WV: Well, I think in this particular case, I should really point to Mr. Rogers’ testimony before Congress because I think if you want to have some dust in your eyes, I think that’s a pretty good one to watch.

SE: Yeah, very dusty rooms, very dusty rooms.

SE: And thanks everyone for listening. We do love to get your feedback. AWS Podcast at Amazon.com is the place to do it. And until next time, keep on building.
