Intuit’s DevSecOps: War Games, Gamification, and Culture Hacking
By Derek Weeks
Wow, if you ever wanted to learn about Rugged DevOps (some call it DevSecOps), sit down for a spell with Shannon Lietz, Ian Allison, and Scott Kennedy from Intuit. We discussed a number of important topics, including internal war games, culture hacking, gamification of Rugged DevOps, and starting as a small team. There are 100 gold nuggets in this conversation for novices and experts alike. Just yesterday Shannon shared her story at the first stop of the Nexus World Tour in Dallas, TX. She'll also be with us in Chicago on April 27th. To catch Shannon as a keynote on the Nexus World Tour, register here.
Derek: We're at the RSA 2016 Conference DevOps Connect Event. I have some of the Intuit DevSecOps team here with me today. We're going to talk to them a little bit about Rugged DevOps and how things work over at Intuit. Let’s start with some introductions.
Ian: I'm Ian Allison. I help run the Red Team at Intuit, which is, I guess you'd say, an interesting way of taking control of security at our company. We try to get ahead of the attackers by basically being the attackers. We're essentially ethical hackers. We go after all of our own stuff to make sure we can find where the deficiencies lie in all of our software.
Shannon: I'm Shannon Lietz. I've been working at Intuit three-and-a-half years and helped to found the 24x7 DevSecOps capability at Intuit, leading the Red Team, our security operations capability, our cyber SOC, and what we also consider “blue teaming”: being able to hunt for defects.
The organization has really had to transform how we do software development, because we're a 30-year-old software company. We are now seeing our traditional way of putting together software really embrace DevOps. For us, it's been exciting to really work in the industry with Rugged DevOps, trying to help build security into the DevOps movement.
Scott: I'm Scott Kennedy. I run the forensics and threat intelligence part of cyber work.
Derek: Shannon (@devsecops), tell me a little bit about software supply chains and how that vision of software development has impacted the way you see things at Intuit.
Shannon: That's really a great question. It was interesting when Josh Corman and I first talked, we had a lot in common. One of those things was the software supply chain. What I really love about the concept is being able to have processes driven a certain way so that you can reduce defects.
Having worked for Toyota in the past and understanding the supply chain mentality, you get a sense of how you could put something together better, incrementing on it, figuring out how to share that process, and then really figuring out what things are important. Having that notion of fewer, better suppliers was really a core concept.
I love the idea of transparency, building things a certain way, and really getting into continuous improvement. You need to look at things from an opportunities perspective -- making sure you're not just looking to make things perfect. You're looking for those opportunities to improve over time.
https://www.youtube.com/watch?v=nRaURhaTH9o
Derek: As we think about Rugged DevOps within your security team, how do you measure the success of what you're doing? What kind of metrics are you looking at that matter to the business?
Shannon: We measure everything. For example, mean time to remediation (MTTR). Once somebody finds a defect, we analyze that defect from the time it got into the supply chain to when it actually gets resolved. We track everything from mean time to remediation, to when did the ticket get created, to being able to look at when the code actually got published, to when it actually got found, and then we work on those things over time. We really try to uplift.
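To make the metric concrete, here is a minimal Python sketch of how the timestamps Shannon lists could be turned into cycle-time numbers. The `Defect` record, its field names, and the sample dates are hypothetical; the interview does not describe Intuit's actual data model or how these timestamps are stored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Defect:
    """Hypothetical lifecycle timestamps for a single defect."""
    introduced: datetime  # when the flaw entered the supply chain
    found: datetime       # when someone discovered it
    resolved: datetime    # when the fix was released


def _average(durations: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return timedelta(seconds=mean(d.total_seconds() for d in durations))


def mean_time_to_remediation(defects: list[Defect]) -> timedelta:
    """MTTR: average time from discovery to resolution."""
    return _average([d.resolved - d.found for d in defects])


def mean_exposure(defects: list[Defect]) -> timedelta:
    """Average time from entering the supply chain to resolution."""
    return _average([d.resolved - d.introduced for d in defects])


defects = [
    Defect(datetime(2016, 3, 1, 9), datetime(2016, 3, 2, 9), datetime(2016, 3, 2, 13)),
    Defect(datetime(2016, 3, 3, 9), datetime(2016, 3, 3, 10), datetime(2016, 3, 3, 12)),
]
print(mean_time_to_remediation(defects))  # 3:00:00
print(mean_exposure(defects))             # 15:30:00
```

Tracking both numbers separates how fast a team fixes a defect once it is known from how long the defect sat in the supply chain before anyone noticed it.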
____
Once somebody finds a defect, we analyze that defect from the time it got into the supply chain to when it actually gets resolved.
____
We leverage JIRA just like our software development teams do. We register our defects and figure out how to get development teams to take responsibility for them. Each fix goes through their process of release and regression testing. As part of that, we look back to see where our opportunities are.
As an example, we started out where things may have taken weeks. We then reduced it down to days and ultimately got it down to hours. We've seen defect resolution where it's now minutes. When it's something we've discovered that was just a mistake by an engineer, we realize “mistakes do happen.” We found that our cycle times also help us to find full-stack vulnerabilities in real time, because we get to do end-to-end testing more aggressively using this method.
Derek: How has consistency in your operations helped with Rugged DevOps, and has it reduced fragility within the organization?
Ian: One of the things we do is to use a golden image for all of the AMIs (Amazon Machine Images) we use, for all of our customers, and we require everybody to use these AMIs. We've also built some really interesting automation around scanning these AMIs. One thing we realized quickly when we first started going native on AWS: when we try to do a full vulnerability scan against a system that's set up to autoscale, all of a sudden we have 50 systems. Right? It's really hard to do a full vulnerability scan against a system like that, so we came up with a way to share all of the AMIs back to a special account. Then we bring those up, we scan them, and we grade them.
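The AMI-sharing step Ian describes could look roughly like the sketch below. It uses the EC2 `ModifyImageAttribute` operation (exposed in boto3 as `modify_image_attribute`) to grant launch permission to a central scanning account. The function name and the scanning account ID are hypothetical, and this illustrates the idea rather than Intuit's actual automation.

```python
def share_amis_with_scanner(ec2, image_ids, scanner_account_id):
    """Grant a central scanning account launch permission on each distinct AMI.

    `ec2` is expected to be a boto3 EC2 client (boto3.client("ec2")); it is
    passed in so the logic can be exercised without AWS credentials.
    Returns the sorted list of AMI IDs that were shared.
    """
    # Fifty autoscaled instances may all come from one image, so share
    # (and later scan) each image once instead of every running copy.
    unique_ids = sorted(set(image_ids))
    for image_id in unique_ids:
        ec2.modify_image_attribute(
            ImageId=image_id,
            LaunchPermission={"Add": [{"UserId": scanner_account_id}]},
        )
    return unique_ids
```

In practice `ec2` would be `boto3.client("ec2")` in the account that owns the images; the scanning account can then launch one instance per shared AMI, scan it, and grade it.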
Based upon the vulnerabilities that are found, you'll get a letter grade, like A through F, based upon the system you have. We always strive to have our base image be an A, but people continue to run on older images. Those images get graded too, and the grades get pushed up, so everybody in their org structure gets to see what the grade is for their account. I think being standardized on these images lets us know what's in everything, and we have a grade for everyone. It helps everyone have a really good idea of where they stand from a security standpoint.
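A letter-grade scheme like this can be sketched in a few lines. The severity weights, score bands, and the rule that an account takes the grade of its worst image are all assumptions made for illustration; the interview only says that grades run from A to F and are rolled up so each account has a grade.

```python
# Hypothetical severity weights and grade bands; the interview does not
# describe how Intuit actually computes its letter grades.
WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}
GRADE_BANDS = [(0, "A"), (5, "B"), (15, "C"), (30, "D")]  # (max score, grade)


def grade_image(severities):
    """Turn the severities of an image's scan findings into a letter grade."""
    score = sum(WEIGHTS.get(s, 0) for s in severities)  # unknown severities count as 0
    for max_score, letter in GRADE_BANDS:
        if score <= max_score:
            return letter
    return "F"


def grade_account(image_grades):
    """Roll up to an account grade: the worst image grade ("A" sorts before "F")."""
    return max(image_grades)


print(grade_image([]))                 # A
print(grade_image(["critical"]))       # C
print(grade_account(["A", "C", "B"]))  # C
```

Under these made-up weights, a single newly reported critical finding is enough to drop a clean image from an A to a C.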
____
Based upon the vulnerabilities that are found, you'll get a letter grade, like A through F...so everybody in their org structure gets to see what the grade is for their account.
____
Derek: So grading plays not only a scoring role but also a policy-enforcement and governance role. How rapid is the feedback loop in that grading system for the teams that you're working with?
Shannon: It's really quick, and we've discovered through some science that having component-based resources like AMIs provides us an advantage when doing things like remediating vulnerabilities. Using AMI-based resources, we have seen that when there’s a defect in one, we can find and remediate all of the defective AMIs quickly. That improves everyone's security across the company.
So if you're just picking out really good components, keeping track of those components and adding security into them, then you'll actually see a different effect across our pipeline. A single change can actually have a dramatic effect on reducing the problems within the pipeline.
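Because every workload starts from a shared base image, a single defective AMI ID is enough to locate every affected system. A minimal sketch, assuming an inventory of (account, instance, AMI) tuples has already been collected, for example from EC2 `DescribeInstances` in each account; the data shape and IDs are assumptions, not Intuit's tooling.

```python
from collections import defaultdict


def affected_accounts(inventory, defective_amis):
    """Group instances launched from a defective AMI by account.

    inventory: iterable of (account_id, instance_id, ami_id) tuples (hypothetical shape).
    Returns {account_id: [instance_id, ...]} for accounts with at least one hit.
    """
    bad = set(defective_amis)
    hits = defaultdict(list)
    for account_id, instance_id, ami_id in inventory:
        if ami_id in bad:
            hits[account_id].append(instance_id)
    return dict(hits)


inventory = [
    ("111111111111", "i-a", "ami-0old"),
    ("111111111111", "i-b", "ami-0new"),
    ("222222222222", "i-c", "ami-0old"),
]
print(affected_accounts(inventory, ["ami-0old"]))
# {'111111111111': ['i-a'], '222222222222': ['i-c']}
```

Fixing the base image once and then rolling these instances onto it is the "single change" that ripples through the pipeline.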
Ian: It's really interesting. This morning I got an email from somebody that said, "Why did our baseline AMI go from an A to a C today?"
We had just received notice of a new vulnerability. Our stuff caught it, we scanned it, we pushed the grade out to our portal where all our customers go to look at the grades. Our customers were able to see that change quickly.
They could now say, "Wow, it changed from an A to a C in less than 12 hours." I think the feedback is really important. The other important thing is that we have people going and looking. I wouldn't be getting emails asking why it changed if people weren't actually looking and wanting to make their grades better.
Derek: You mentioned customers. Are these internal customers?
Ian: Internal.
Shannon: Yeah, for our development teams, we as a security team really have changed how we think about things. It used to be that the security team would go out and govern. Basically you got the fear of the security team coming in, descending upon you.
We've really changed how that happens within our organization. We grade our resource components and we grade the way in which our applications come together. That changes how developers want to operate because they really want to figure out how to get better grades in security. And it creates a learning dynamic that incentivizes somebody to improve continuously.
___
That changes how developers want to operate because they really want to figure out how to get better grades in security. And it creates a learning dynamic that incentivizes somebody to improve continuously.
___
Derek: Does it create a competitiveness or gamification of who has better grades?
Shannon: Absolutely, which is why we did it in the first place. To your point there, gamification is something where when you start to grade components like that, you can actually start to leverage a leaderboard concept. We do have leaderboards as part of this. We have APIs where you can actually pull down your grades and include them in your automation. With these, you can make governance decisions.
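Once grades are available through an API, a leaderboard is mostly a sort. The sketch below assumes a hypothetical JSON response listing each team's grade; the interview does not describe the real API's endpoint or payload.

```python
import json


def parse_grades(payload: str) -> dict[str, str]:
    """Parse a response shaped like [{"team": "...", "grade": "B"}, ...] (hypothetical)."""
    return {row["team"]: row["grade"] for row in json.loads(payload)}


def leaderboard(team_grades: dict[str, str]) -> list[tuple[str, str]]:
    """Rank teams best grade first (A before F), breaking ties alphabetically."""
    return sorted(team_grades.items(), key=lambda item: (item[1], item[0]))


payload = (
    '[{"team": "payments", "grade": "C"},'
    ' {"team": "tax", "grade": "A"},'
    ' {"team": "identity", "grade": "A"}]'
)
for rank, (team, grade) in enumerate(leaderboard(parse_grades(payload)), start=1):
    print(rank, team, grade)
```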
If you sort of have that "game afoot," right, your leaders can then ask for specific grades within their pipeline. That up-levels the system, and you just see a continuous improvement lifecycle come to bear. Ultimately you see fewer defects, and ultimately you get to the notion of Six Sigma in our way of thinking. DevOps is really about continuous improvement and embracing automation. Embracing that concept allows us to get to fewer defects faster.
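A leader asking for a specific grade within the pipeline could translate into a gate step like the one below. It is a sketch that assumes the current grade has already been fetched; the grade ordering and the exit-code convention are choices made here for illustration.

```python
import sys

GRADE_ORDER = "ABCDF"  # best to worst


def meets_minimum(actual: str, required: str) -> bool:
    """True if `actual` is at least as good as `required`."""
    return GRADE_ORDER.index(actual) <= GRADE_ORDER.index(required)


def gate(actual: str, required: str) -> int:
    """Return a CI-style exit code: 0 lets the pipeline continue, 1 stops it."""
    if meets_minimum(actual, required):
        print(f"security grade {actual} meets required {required}")
        return 0
    print(f"security grade {actual} is below required {required}", file=sys.stderr)
    return 1
```

A CI job might end with `sys.exit(gate(current_grade, "B"))` so that anything below a B stops the build; the required grade is the knob a leader would turn.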
___
DevOps is really about continuous improvement and embracing automation. Embracing that concept allows us to get to fewer defects faster.
___
Derek: As you embraced continuous practices and DevOps practices, were there points when you realized certain old ways of doing things weren’t going to enable you to move forward?
Scott: In looking at the progression of what we've been doing, one of the decisions that was made in Intuit and one of the things that I saw was really unique was the way they decided we were going to migrate into AWS. Our idea was to have the chaos team be the first people out, and that's the security team. So the security team was the one that was going out and finding out how to use each of the products that AWS has and creating the concept of whitelisting. Each product was rated as to whether or not it met security’s requirements.
Therefore, no team can go ahead and pull down this new cool tool that AWS released yesterday and use it in production because it's not been “whitelisted.” That can go into their scoring. Their scoring is not only used by the development teams but also is useful when reporting to the Board. When the board asks, "How are we doing as a company across the entire organization?" we can say that product A got a lower score than product B, and then they turn to the VP in charge of it and say, "Well, why?"
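The whitelisting Scott describes amounts to an allowlist check before a service reaches production. A minimal sketch, assuming a hypothetical set of approved service names; the real list, and how it feeds into each team's score, are not described in the interview.

```python
# Hypothetical allowlist of AWS services the security team has already vetted.
APPROVED_SERVICES = frozenset({"ec2", "s3", "iam", "cloudtrail"})


def unapproved_services(requested, approved=APPROVED_SERVICES):
    """Return the requested services that are not yet approved for production."""
    return sorted(set(requested) - set(approved))


blocked = unapproved_services(["s3", "lambda", "ec2", "kinesis"])
print(blocked)  # ['kinesis', 'lambda']
```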
___
When the board asks, "How are we doing as a company across the entire organization?" we can say that product A got a lower score than product B, and then they turn to the VP in charge of it and say, "Well, why?"
___
We decided not to rush into the cloud, but to take a careful, considered approach and migrate in a very intelligent and well-thought-out way. At the same time, we gave the chaos team the ability to make mistakes and grow and learn, so they could immediately turn around and share those mistakes with everyone else. They could say, “Hey, these are the things that didn't work for us. We came across a lot of problems, especially when you look at things like accounts and account roles.”
How do you control when you have thousands of accounts and you need to have some sort of administrative control?
You can either have a gigantic effort to force your namespace and your Active Directory to be the source of control, or you can use the vendor-specific tools like IAM and have each account be its own separate island. With the concept of cross-account roles, you can then do remote administration from a centralized account. You have it locked down. You have the capability to have a restricted group and be able to remotely go in and do the necessary actions.
That also gives you an audit trail. That also gives you multifactor authentication built in, because the AWS products get those things added to them.
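The cross-account pattern can be sketched as an IAM role trust policy in each member account that lets only a central administration account assume the role, and only when MFA is present. The account ID is a placeholder and the helper function is hypothetical; the policy elements themselves (`sts:AssumeRole`, the `aws:MultiFactorAuthPresent` condition key) are standard IAM policy syntax.

```python
import json


def admin_trust_policy(admin_account_id: str) -> dict:
    """Trust policy letting a central admin account assume this role, only with MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trust the central account; that account decides which of its
                # principals (e.g. a restricted admin group) may call AssumeRole.
                "Principal": {"AWS": f"arn:aws:iam::{admin_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Refuse the request unless the caller authenticated with MFA.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }


print(json.dumps(admin_trust_policy("111122223333"), indent=2))
```

The serialized policy would be supplied as `AssumeRolePolicyDocument` when creating the role (for example with boto3's `iam.create_role`). In the central account, permission to call `sts:AssumeRole` can be limited to a restricted group whose members pass their MFA device's `SerialNumber` and `TokenCode` to STS `AssumeRole`; CloudTrail records those calls, which supplies the audit trail.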
Shannon: When it comes down to it, I think culture hacking your environment can have a profound effect, especially when you're going through a DevOps transformation.
Derek: What is culture hacking?
Shannon: That's a great question. We use it when we're really trying to figure out how we as a security team can change and transform. A lot of the things that take place in a company are really based on traditional processes: What has worked before, and why would we change something that is working, right? If you're really going to go into an innovative frame; if you're really going to get into that next-generation innovation; if you are trying to figure out what's going to work in that... it's never going to be the thing that is working. It's going to be the thing that you're going to learn as you go to that next step.
Culture hacking is really about looking at the people who are operating right now and trying to figure out how you're going to help them go from A to B, making that change. What is that experience going to be like?
What we have done, to Scott's point, is we've forced our security team to have empathy for the DevOps teams. We go through the process of developing something in the cloud ourselves, using it as a way of taking the security team's paranoia and balancing it against the need to get something done within a specific time frame. We try to really wrangle what it takes to do those things securely and safely, but ultimately still be able to deliver for the business.
I think that culture hacking really comes into play when you're trying to figure out how to move somebody from the rock they're on to the rock you need them to be on and trying to figure out what those mechanisms are.
___
Culture hacking really comes into play when you're figuring out how to move somebody from the rock they're on to the rock you need them to be on.
___
Derek: Part of your security practice is looking at open-source and third-party components and your own binaries. Can you shed some light on how Intuit is using Sonatype solutions to better manage those vulnerabilities?
Shannon: Yeah, Sonatype Nexus is a fantastic platform. We love the Nexus repositories. We love how you guys put together a community. We learn a lot.
A lot of our DevOps practice is working together with it. We've put together our Nexus repositories to do code signing and figuring out how to really secure our pipelines a certain way. We are taking advantage of the fact that we can pick up components, track them, and then scan them [for known vulnerabilities].
That's allowed us to reduce the defect count that goes to production. Actually scanning and looking for vulnerabilities within our components and our open source libraries allows us to make better decisions about what we're including in our software.
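Nexus performs this kind of check against Sonatype's own vulnerability data. The sketch below only illustrates the underlying idea, matching each component coordinate and version against a table of known-vulnerable versions; the coordinates and advisory table are made up, and none of this is Sonatype's API.

```python
# Made-up coordinates and versions standing in for real advisory data.
ADVISORIES = {
    "com.example:xml-parser": {"1.0.0", "1.0.1"},
    "com.example:http-client": {"2.3.0"},
}


def flag_components(components, advisories=ADVISORIES):
    """Return (coordinate, version) pairs that match a known-vulnerable version."""
    return [
        (coordinate, version)
        for coordinate, version in components
        if version in advisories.get(coordinate, set())
    ]


build = [
    ("com.example:xml-parser", "1.0.1"),
    ("com.example:http-client", "2.4.0"),
    ("com.example:logging", "0.9.2"),
]
print(flag_components(build))  # [('com.example:xml-parser', '1.0.1')]
```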
Derek: When you govern what open source, third-party or proprietary components are being used by developers is there any kind of feedback from the teams saying, "Hey, you're restricting my behavior, not improving my innovation"?
Shannon: What we've found is that the notion of security approvals, exceptions, and gates really doesn't work. Quite often you just create a culture where developers are going to go out and do it anyway, and then you're going to find out about it afterward. When it comes to really partnering and being boundaryless about how you think about security in your business, it's all about transparency. It's all about benefits. It's creating things like a security markdown file within your repository manager. It’s about taking responsibility and accountability for the things that you're doing from a security perspective in your development process. It’s ultimately having an attacks.md file, keeping track of what's out there, keeping track of your open source, understanding what components you're leveraging, and why you made the decisions that you made to bring those things into your project.
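Shannon does not spell out what goes into an attacks.md file, so the following is only one hypothetical shape for it: a small Python helper that renders a Markdown table of third-party components and the reason each was brought in. The title line, column names, and sample component are assumptions.

```python
def render_attacks_md(project, components):
    """Render a hypothetical attacks.md listing components and why they were chosen.

    components: list of (name, version, rationale) tuples.
    """
    lines = [
        f"# {project}: attack surface notes",
        "",
        "| Component | Version | Why we use it |",
        "|---|---|---|",
    ]
    for name, version, rationale in components:
        lines.append(f"| {name} | {version} | {rationale} |")
    return "\n".join(lines) + "\n"


print(render_attacks_md(
    "billing-api",
    [("com.example:json-lib", "2.1.0", "Fast JSON parsing for request bodies")],
))
```

Keeping the file next to the code means the rationale is reviewed in the same pull request that adds or upgrades a component.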
___
It’s about taking responsibility and accountability for the things that you're doing from a security perspective in your development process.
___
At a top level, all of those things work. But having tools that help you evaluate the decisions made by the other open source programmers you're getting contributions from is really necessary. All of the things they decide are also part of your decision tree, and ultimately you're rolling all of that up and bundling it together. The attack surface is not just the decisions your team is making, but also the ones you share across the code base you've got.
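Shannon's point that the attack surface includes other people's decisions is easiest to see in a dependency graph: your direct dependencies pull in their own. A minimal breadth-first walk over a hypothetical graph that collects every component a project actually ships:

```python
from collections import deque


def transitive_dependencies(graph, roots):
    """Return every component reachable from `roots` (breadth-first).

    graph: {component: [direct dependencies]}; a hypothetical structure.
    """
    seen = set()
    queue = deque(roots)
    while queue:
        component = queue.popleft()
        for dep in graph.get(component, ()):
            if dep not in seen:  # skip components already visited (also guards cycles)
                seen.add(dep)
                queue.append(dep)
    return seen


graph = {
    "app": ["web", "json"],
    "web": ["http", "json"],
    "http": ["tls"],
}
print(sorted(transitive_dependencies(graph, ["app"])))  # ['http', 'json', 'tls', 'web']
```

Here the application declares two dependencies but ships four, and `tls` arrives purely through choices made by the authors of `web` and `http`.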
Derek: Your practices are very mature. You've clearly developed them over a long time, and some people watching this might think, "Well, Intuit's a huge organization," and it may be daunting to them if they haven't started down the path of Rugged DevOps. Can you be a small team and have success in these kind of practices?
Shannon: We're not exactly a huge organization, but we are relatively large in size now. When we got started I believe I was one of maybe three people that started this, only a couple years ago. We have hired into our group pretty extensively to help grow it, and some of the things that we've done have really allowed us to operate differently, to bring in people and have them immediately be successful. Our practices allow someone Day One to be able to work with the environment, to be able to develop code, to be able to contribute code that week.
We do things like weekly demos, where we actually do video demos. A person has to come in, program something, secure something, operate it, and create a demo, all within their first week. So having the right bar for those folks is really important. More importantly, our Red Team leader here (points to Ian) came in and has been amazing; he created a Red Team pretty much out of thin air. Having somebody from forensics has been just as important; they've done an incredible job helping us build a lifecycle where we can snapshot something and learn from it when it's actually offline.
___
A person has to come in, program something, secure something, operate it, and create a demo, all within their first week.
___
Those are the types of practices where you start to extend yourself past today's baseline processes, and really think about how you're going to support innovation. You get into it very quickly. You get a learning culture. You get people who know that making mistakes, and figuring out how to learn from them, is okay. That's a really important part of the culture that you're putting in place.
Ian: Yeah, I was going to say, it's all about iteration, right? We started small, and we just continually iterate on what we're doing to try to get better and be better at what we all do.
When I first started this journey, I was a security guy, a pen tester. It was always the developer’s fault. Developers always made the mistakes. I always had to clean up after them. But after six months of developing Ruby APIs and working my butt off in code, the empathy was there.
I really understand what the developers are going through and why they make the choices they do. By helping them, by creating tooling that lets them self-serve, we help them make themselves more secure without their having to become security professionals. I think that's kind of our ultimate goal.
Shannon: Being friendly hackers, right? Basically going out and attacking them so their applications don't get attacked by external attackers is really part of that frame.
Scott: The Red Team shift at the company has been profound, because you see how people react. When the Red Team started, it was not as well shared, and a lot of people suddenly were very upset that they were attacked by the Red Team. But when it was pointed out, “well, what would you rather have happen? Would you rather have somebody in China do this to you, that didn't work with you, didn't sit next to you and help you fix the product, or would you like to have a friend who, by the way, their job is to attack?”
We went through several drills and actually practiced the muscle of defending the company against an attack, and people were upset: "Oh, I had to do all this work."
My response to them: "Well, you did the right work."
"You did the right thing. You saw something bad. You acted on it. You did good. You practiced the muscle. Now when it happens again and it's not the Red Team, I know that you’ll know what to do. You know that the process works, and we can actually defend the company faster and more securely."
___
You know that the process works, and we can actually defend the company faster and more securely.
___
Derek: Yeah. That's an incredible story. Thank you for sharing it.
My final question: if you could pick a superpower in dev, security, or ops that you would have in the organization, what would it be?
Ian: To me they're all like, they're the same, right? That's what we do, DevSecOps, right? We try not to actually separate them out, because I think once you start to separate them out, you start to lose perspective.
Scott: Yep.
Ian: There's a good thing about having them all be one thing, so I'd choose them all.
Scott: It's been pretty consistent. DevSecOps is the answer. What was the question? (Laughter)
Shannon: I think the reason we went out and created DevSecOps was simply to change how we thought about doing development and technology, and to really get ahead of it: to realize that attackers weren't setting up appointments or meetings to help you figure out how they were going to attack your software, so why were we? Why were we operating at a fragile level?
I think that the superpower that I would like to have is DevSecOps, because I know that we are going through the process of creating a less-fragile capability in security that will allow us to get ahead of attackers, make it much harder for them to go after the software that gets built, and we're seeing those improvements. That's actually a great thing.
Derek: It sounds really exciting, and it's very cool, so thank you all very much. I really appreciate it.
All: Thank you!
If you loved this interview and are looking for more great stuff on Rugged DevOps, I invite you to download this awesome research paper from Amy DeMartine at Forrester, “The 7 Habits of Rugged DevOps.”
As Amy notes, “DevOps practices can only increase speed and quality up to a point without security and risk (S&R) pros’ expertise. Old application security practices hinder speedy releases, and security vulnerabilities represent defects that can leave a company open to cyberattacks. But DevOps practitioners can leap forward with both increased speed and quality by including S&R pros in DevOps feedback loops and including security practices in the automated life cycle. These new practices are called Rugged DevOps.”
Derek serves as vice president and DevOps advocate at Sonatype and is the co-founder of All Day DevOps -- an online community of 65,000 IT professionals.