Are Open Source Vulnerabilities Hiding Out in Your Repository?

Abstract:

Bill Ledingham / Black Duck, May 2016: Artifactory streamlines use of open source for your development teams, giving them a central location where they can select and access the latest versions of the open source components they rely on. But unless you are vigilant, open source vulnerabilities hiding in your repository can quickly infiltrate your apps. Here’s what you need to know to stay secure while realizing the benefits of open source. In this session you will learn how open source vulnerabilities can sneak into your codebase, why traditional security testing approaches often miss open source vulnerabilities and what you can do about it, and best practices in mitigating open source vulnerabilities throughout the development lifecycle.

Talk Transcription:

Okay. I think we’ll kick things off. Good afternoon everyone. My name is Bill Ledingham. I’m the CTO for Black Duck Software. Just as a bit of quick background, Black Duck has been in business for about fourteen years now, focused on helping companies manage their use of open source software and manage a lot of the associated risks around open source. Historically, that primarily had to do with license compliance. Making sure you weren’t inadvertently mingling GPL code into your codebase and having problems based on that. These days it’s much more focused around some of the security issues that accompany the use of open source. So, known security vulnerabilities that exist in a lot of the open source components that are out there. Making sure that you’re aware of what you’re using, how you’re using it, and whether or not there are any known vulnerabilities in the code that you’re using.

In terms of the agenda for this afternoon, just briefly, what I’m going to go through is some of the recent trends in open source software, then focus specifically on some of the trends around security vulnerabilities in the area of open source, and finally finish up with some of the best practices around how you can leverage tools like Black Duck in conjunction with Artifactory to keep known vulnerabilities out of your codebase and keep on top of the issue, if you will.

Given that it’s a fairly small group today, please feel free to interrupt me at any point in time if you have any questions and hopefully I’ll be able to jump in and address those.

For the past 10 years, Black Duck has been conducting an annual survey in conjunction with North Bridge Venture Partners. We just recently finished up our most recent survey about a month ago. This was the tenth survey that we’ve done. Just some interesting trends as you look back over those last ten years. When we first started the survey about five percent of the code in use within the Fortune 500 was open source. These days, over 35 percent of the code base that’s out there in use is based on open source. So over the last ten years we’ve really seen a dramatic increase in companies using open source and of interest, you know, recently there’s been much more of an effort and an initiative to give back to the open source community. So not just having companies use the open source, but also looking at having their developers contribute and give back to a number of open source projects.

So just looking at some of the highlights over the last ten years, 2008 really was the realization that open source was no longer just a niche area associated with operating systems, Java, et cetera, but was really much more pervasive in terms of impacting a range of different technologies. Everything from web based applications, to mobile applications, core infrastructure, development tools, et cetera.

In 2011, that was the first time we saw that cost efficiency was no longer the top reason and rationale for why companies were adopting and using open source within their organizations. Companies began to view it much more strategically in terms of how open source could enable them to speed up their time to market, help them deliver new functionality, new applications and services, and really also look at how they can guide the direction of various open source projects that were strategic for them.

In 2011, […] also had the notable quote around the fact that software was beginning to eat the world. In 2013, we saw the corollary within the world of open source around if, you know, software was eating the world, then in 2013, open source was starting to eat the software world. Just in terms of its impact on the direction of technology and all the different areas that it was touching upon.

And then, you know, we’ve seen that trend continue over the past two years, where in all the major new areas of development, everything from cloud computing, to big data, to analytics, to ongoing development in the mobile space, a lot of those technologies are being driven by open source projects. And in fact, in a lot of those technology areas, 60 to 80 percent of the code base these days is open source.

So just, you know, looking at our most recent survey results, which as I mentioned were just completed about a month ago, some of the key trends that here again we’re seeing over the last year. Open source is really the starting point for how you build and deliver applications these days. And it’s not just the applications where open source is being used, it’s really how companies are integrating, building, testing, and deploying those applications. So just looking at the advances in everything from the continuous integration tools to tools like Artifactory and the binary repos, to technologies like Docker and containers and how those applications are getting delivered and deployed to the market.

One of the interesting areas out of the survey was, each year we look at kind of what are the top technologies that people cite as to how they’re using open source. Last year, the top technology was really around big data, followed by cloud computing. This year, interestingly, operating systems was cited, you know, once again as kind of the top area of technology for open source. That didn’t quite make sense initially, but then, you know, once we thought about it, it did make a lot of sense. Just looking at how operating systems are being redefined through the use of container technologies and now, kind of, the notion of a stripped down operating system. Whether it’s Alpine Linux or unikernels, really kind of tailoring the operating system environment to the task at hand. Kind of marrying that to the application, making sure you only have the essential parts of the operating system needed to actually run and deploy the application.

Here again, open source is being looked at as a driver of innovation. Adrien in his keynote yesterday mentioned the use of APIs and how microservices and REST APIs are really driving a lot of the technology around how these services get delivered to the market. So that’s another kind of key area of innovation that we’re seeing. It’s not just the stack that makes up the application, but how that stack is architected and delivered to the market. Increasingly we see applications built using open source components. So the notion is you’re now starting to assemble applications as opposed to writing and creating applications from the ground up. It’s really much more one of assembling the components with a little bit of proprietary code on top as opposed to writing everything from scratch.

That also, you know, goes into this notion of the software supply chain. And as more and more applications and products are sourced from different places, it’s increasingly important to know, kind of, where that software stack is coming from. And what open source is being used in that stack. Because that’s really another big source of how vulnerabilities can get into your organization. It’s not just the software that your developers are creating and writing, but all the other source that you’re bringing in through various applications.

In the area of challenges, while there are many benefits to the use of open source, in terms of speeding up time to market and making it easier to deploy new products and services, we still do see challenges around how companies are managing their use of open source. As I mentioned, historically there are risk factors around license compliance. These days, a key concern is around the security vulnerabilities that may be in the open source that you’re using.

Really, there’s no more notion of kind of a secure network that you can firewall off. If you look at how software’s used, how applications are deployed, increasingly, applications are all either internet facing or, even if they’re not internet facing externally, within your own networks, they’re accessible via the network. And as a result, it’s not just network vulnerabilities you need to worry about, you need to be increasingly aware of any potential application vulnerabilities, as they present a way in which hackers can get into your applications and access sensitive data. A lot of the recent data breaches in the area of, you know, the financial networks, a lot of those have been through application vulnerabilities that exist in some of the web infrastructure that’s been used within financial services.

In terms of how companies are, kind of, managing their use of open source, unfortunately we find only 50 percent of the companies have any sort of policies and procedures in place for how they manage the use of open source. Just looking out, you know, across the audience here, how many of your organizations have policies in place in terms of how you manage it? So, a little over 50 percent. So, you know, thumbs up. You’re actually doing better than a lot of the respondents to the survey.

The other challenge that we saw was that in those companies that did have policies in place, a lot of times they were not really well enforced in terms of making sure developers were going through the process, flagging potential issues, and addressing those issues. So a lot of times it’s pretty easy to get around some of these policies, and in spite of what companies think they have in place, it’s very easy today, as you’re all aware, for developers to go out to GitHub, grab some code, put it into their application, kind of a backdoor channel into the organization. So you really need to have tools in place by which you can understand the open source in your codebase and manage it accordingly.

In terms of security vulnerabilities in particular, only about half the companies, or a little less than half the companies had policies and procedures in place for how they handle and track vulnerabilities that have been identified in the open source software that they’re using. So this is a big area of opportunity going forward for how companies need to get a better handle on this. It really first starts with identifying what open source you’re using, knowing the specific open source components and their associated versions, and then having awareness around whether or not there are any known vulnerabilities that exist in those components and versions.

Another trend that we saw of the companies that, you know, are trying to manage some of these open source vulnerabilities, you know, I mentioned only a third had process in place for how they manage that. In the organizations that were managing it, about half of the organizations had no point person or organization identified for who is going to take the lead on tracking and mitigating these vulnerabilities in their use of open source. In the companies that did have some sort of policy in place, we are seeing kind of more onus being put on the development organization to handle and mitigate some of these vulnerabilities.

So we’re seeing companies trying to push back earlier in the dev cycle the awareness around these vulnerabilities and having the dev team be responsible and have ownership for that. And this will tie into some of the best practices later on of how you give the development organization earlier visibility to some of these problem areas.

Looking at why kind of open source vulnerabilities are a big problem just at a high level, first of all, open source presents an easy target. Obviously it’s widely used, you know, components like OpenSSL, it’s all over the place out there. Used in close to half the web applications that are out there. An increasing number of embedded applications, the IoT space, anywhere where you have connected devices, connected software and applications, you know, you see open source being used. Hence, you know, if there are vulnerabilities in those components, it presents a big area of risk.

In addition, you know, obviously, it’s easy to access the source code behind a lot of these projects. So hackers can access the source code, understand the vulnerabilities in a little bit more detail, and, you know, come up with different ways of potentially exploiting those vulnerabilities.

Also, these vulnerabilities are published in sources such as the National Vulnerability Database that NIST maintains. Of the more than 70,000 vulnerabilities that are in NVD, about half of them apply to open source components and projects.

And finally, in a lot of cases, you can go out onto the web and there are exploits that are already, you know, have been written for a lot of these vulnerabilities that hackers can go out, access, and then use and try against different web applications to see if they’re able to penetrate those applications and services.

Another big problem area for the use of open source is, you know, the responsibility of who maintains code, who patches and fixes the vulnerabilities. As opposed to commercial software, or open source projects that are supported by a commercial organization, like Red Hat in the case of Red Hat Enterprise Linux, in the area of open source, really the onus is upon you as an organization, you as a user of the open source code. In the case of commercial code, it’s up to your vendor to test their software, to issue regular patches for any vulnerabilities that have been uncovered, and it’s their responsibility really to keep that software updated.

In the area of open source code, the community itself around the various projects, we see that community being very responsive, however once they’ve issued a patch or new version of the project, it’s up to you to be aware that patch exists and is available and then, you know, having the onus on you to go out and download that patch or update, test that new version, and incorporate it into your software. So it’s just a lot more effort involved to kind of understand what’s out there, keep track of all these different issues.

Looking at some of the notable vulnerabilities that are out there. Has everyone heard of Heartbleed at this point? How many people have heard of Heartbleed? So most of you, thankfully, at this point have heard of Heartbleed. So this was really the first well publicized vulnerability in an open source component. Publicized about two years ago, it was a vulnerability within the OpenSSL component that, as I mentioned, is widely used in about half the websites that are out there. Since Heartbleed, how many additional vulnerabilities do you think have been discovered in OpenSSL? Over 45 additional vulnerabilities have been uncovered in OpenSSL since Heartbleed.

So it’s not just the notion of understand, be aware that a vulnerability is out there, patch it once, and forget about it, you have to constantly stay on top of new releases of these components that are out there to address the ongoing stream of new vulnerabilities that are being uncovered and discovered.

What we’ve seen is the open source community has really taken notice of a lot of these issues and there’s been much more effort put into uncovering additional vulnerabilities within these open source projects and fixing those vulnerabilities. For example, the Linux Foundation has their CII project, or their Core Infrastructure Initiative, aimed at helping the various project teams with best practices associated with security. So they actually have a badging program in place for those open source projects that are following some of these security best practices. So the good news is the open source community is really moving to address the issue, the bad news is, as a result, you’re seeing many more vulnerabilities discovered and many more patches and versions come out that you have to stay on top of.

Couple of other things to note on this slide. A lot of these vulnerabilities have been in the code base for a number of years. It was only really recently that they were discovered and uncovered. Also, all these vulnerabilities were discovered not through automated tools, you know, static or dynamic analysis tools; they were uncovered by security researchers poring over the code, looking at different issues in the code and figuring out where the vulnerabilities exist.

So that’s another challenge in this area. So even though you may be running a lot of static and dynamic analysis tools within your organization, unfortunately in a lot of cases, those are only uncovering kind of generic issues within the code, and don’t really go after a lot of these more esoteric vulnerabilities that end up getting published in the NVD and other sources.

So oftentimes these are, you know, too complex. It really requires a security researcher to, you know, pore over the code to find these vulnerabilities. And of the more than 3,000 vulnerabilities that were uncovered last year, less than one percent of those were discovered by use of some of these automated tools. So really, while you’re running a lot of these tools in your own software development processes, that’s good, it catches a lot of issues in your own proprietary code, it’s not necessarily going to address a lot of the issues that are being uncovered in the open source software that you’re using.

Just looking at the trend, in terms of number of vulnerabilities that are being disclosed and uncovered each year, over 10 new vulnerabilities are discovered each day. And so here again it’s something that’s, you know, difficult to keep up with in terms of all these new discoveries and then analyzing whether or not your own code is impacted by any of this.

One of the things that Black Duck does as part of our business is we conduct a number of audits of different company codebases, typically as part of a […] transaction. So anytime one company is buying another company, a lot of times they’ll require a […] the code to understand, you know, how much open source is in there, are there any license compliance issues, and likewise, you know, are there any security vulnerabilities that are in there that need to be potentially remediated.

So just looking at some of the data over 200 audits that we’ve done over the last nine months. You know, some of the interesting things pop out of those audit results. First of all, you know, 100 percent of the time, i.e., in every one of those more than 200 audits, we found code or open source code that the company wasn’t aware that they were using. So going back to this issue of developers, you know, finding backchannels to bring open source code into the code base. We find that that is very prevalent out there and in spite of having some process and procedure in place for tracking the open source, most of the time, companies aren’t able to uncover all that through the processes that they have in place.

Of, you know, the applications that we scanned, the typical number of open source components was over 100. So 105 components on average within these commercial applications. So you can see, open source is being fairly widely used within a lot of applications today.

A couple of other things. The average number of vulnerabilities in these commercial applications was over 22. So, you know, even though these were commercial products, there were a large number of vulnerabilities in the open source components that were contained within those applications. And of those 22 vulnerabilities on average, over 40 percent of those were in a severe category, i.e., a CVSS score of greater than seven in terms of the severity of the vulnerability.
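
As a rough illustration of that severity cutoff, here is a minimal sketch in Python. The `Vulnerability` class, the placeholder identifiers, and the sample scores are all made up for the example; only the "greater than seven" threshold comes from the talk.

```python
from dataclasses import dataclass

# Cutoff as stated in the talk: a CVSS score greater than seven is "severe".
SEVERE_THRESHOLD = 7.0


@dataclass(frozen=True)
class Vulnerability:
    vuln_id: str       # placeholder identifier, not a real CVE
    cvss_score: float  # base score on the 0.0 to 10.0 CVSS scale


def is_severe(vuln: Vulnerability) -> bool:
    """True when the score is strictly above the talk's cutoff."""
    return vuln.cvss_score > SEVERE_THRESHOLD


def severe_fraction(vulns) -> float:
    """Share of the given vulnerabilities that fall in the severe category."""
    vulns = list(vulns)
    if not vulns:
        return 0.0
    return sum(is_severe(v) for v in vulns) / len(vulns)


# Made-up sample data for illustration only.
sample = [
    Vulnerability("EXAMPLE-0001", 9.8),
    Vulnerability("EXAMPLE-0002", 7.5),
    Vulnerability("EXAMPLE-0003", 7.0),
    Vulnerability("EXAMPLE-0004", 5.3),
    Vulnerability("EXAMPLE-0005", 2.1),
]
print(f"{severe_fraction(sample):.0%} of the sample is severe")
```

Note that the published CVSS rating bands treat a score of exactly 7.0 as High, so a team following those bands would likely use `>=` rather than `>` in `is_severe`.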

Another interesting thing was the average age of these vulnerabilities, over five years. So a lot of these vulnerabilities had existed in the code base for a long time. Just lying in the code, and even notable vulnerabilities like Heartbleed. Heartbleed, even though it’s two years old, was still in 10 percent of the applications that we scanned. So, you know, the challenge is for companies to know what they’re using in terms of open source and to be a little bit more proactive in terms of how they address it.

So just looking at, kind of, how companies are addressing the problem today, as mentioned, in most cases, not too well. You know, within the group here, as you mentioned, 50 percent of the companies have some process in place. In a lot of organizations that we talk to, there’s simply no process, or the process that they have is one that’s very manual in nature. So at the end of a dev cycle they ask the dev manager, what open source are you using? The dev manager then goes to his or her teams and says, okay, tell me all the open source components that you’re using. Out of that, they get a handful that are listed. The dev manager then creates a spreadsheet that lists all the components and versions and that becomes their tracking mechanism. And that’s before mapping any known vulnerabilities to those components. You know, that’s another level of effort that many organizations simply don’t go through because of the amount of effort that’s required there.

So, what is kind of an ideal solution and best practice for how you should be looking at the problem and hopefully addressing the problem? The first step, really, is around giving guidance to developers early in the dev cycle, around the software, i.e., the open source components that they’re using within their code. So being able to flag early on in the process any issues that exist in either the licenses attached to the open source or the security vulnerabilities that may exist in those application components and versions.

And then, you know, finally, as I mentioned also, being able to identify any and all open source that you have in use. So putting automated scanning tools in place, like Black Duck. Every time you do, for example, a new build of your product, or a new release of your product at a minimum, you want to scan through the code, identify the open source that’s being used within that codebase and then inventory that. So create what’s known as a […] material or a listing of all those open source components and versions and then keep track of that over time. And as part of that process, you know, mapping any known vulnerabilities that exist to those components and versions, putting a remediation process in place, and tracking that over time so that you’re making sure that you’re making forward progress in terms of remediating the issues that have been found. Tie it into some of your standard processes around how you manage issues through applications like Jira, et cetera.
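
To make the inventory step concrete, here is a minimal sketch in Python of deduplicating scanned components and mapping known vulnerabilities onto them. It does not use any Black Duck API; the `Component` class, the component names, and the `KNOWN_VULNS` advisory data are all hypothetical stand-ins for whatever a real scanner and vulnerability feed would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


# Hypothetical advisory data keyed by exact component and version.
KNOWN_VULNS = {
    Component("examplelib", "1.0.1"): ["EXAMPLE-0001", "EXAMPLE-0002"],
    Component("samplehttp", "2.3.0"): ["EXAMPLE-0003"],
}


def build_inventory(scanned):
    """Collapse duplicate scan hits into a sorted list of unique components."""
    return sorted(set(scanned), key=lambda c: (c.name, c.version))


def map_vulnerabilities(inventory, advisories):
    """Return only the inventoried components that have known vulnerabilities."""
    return {c: advisories[c] for c in inventory if c in advisories}


# Made-up scan output; the same component can be found in several places.
scanned = [
    Component("examplelib", "1.0.1"),
    Component("samplehttp", "2.3.0"),
    Component("examplelib", "1.0.1"),
    Component("cleanlib", "4.2.0"),
]

inventory = build_inventory(scanned)
report = map_vulnerabilities(inventory, KNOWN_VULNS)
for component, vuln_ids in report.items():
    print(component.name, component.version, "->", ", ".join(vuln_ids))
```

Keeping the inventory keyed by exact name and version is what makes the later tracking possible: the same list can be re-checked whenever the advisory data changes.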

And then, as I mentioned, because a lot of these vulnerabilities continue to be uncovered in existing components, you need to be able to track it over time. So that, you know, your version of OpenSSL or your version of Tomcat that you’re using today, if a new vulnerability is reported tomorrow or a week from now, you have the visibility to do that impact analysis and map it back to all the places where you’re using that version of that open source component across your codebase and across your applications.
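
That impact analysis amounts to a reverse lookup from a component version to every application that uses it. A minimal sketch in Python, assuming the per-application inventories are already available; the application names, component names, and versions below are invented for illustration.

```python
from collections import defaultdict


def index_usage(app_inventories):
    """Build a reverse index: (component, version) -> set of application names."""
    usage = defaultdict(set)
    for app, components in app_inventories.items():
        for component_version in components:
            usage[component_version].add(app)
    return usage


def impacted_apps(usage, component, affected_versions):
    """List the applications that use any affected version of the component."""
    apps = set()
    for version in affected_versions:
        apps |= usage.get((component, version), set())
    return sorted(apps)


# Hypothetical inventories recorded at scan time.
app_inventories = {
    "billing": [("examplelib", "1.0.1"), ("cleanlib", "4.2.0")],
    "portal": [("examplelib", "1.0.2")],
    "gateway": [("examplelib", "1.0.1")],
}

usage = index_usage(app_inventories)

# A new advisory arrives for examplelib 1.0.1: which applications are affected?
print(impacted_apps(usage, "examplelib", ["1.0.1"]))
```

Because the index is built from stored inventories, answering the question for a newly reported vulnerability does not require rescanning any code.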

So turning to the topic of, you know, process and procedures and kind of what are the best practices around that. First and foremost, you know, it’s very important to have policy in place but you want to make sure that that policy doesn’t get in the way of developers doing their job. If it does, it will be circumvented by developers in terms of how they get their job done. They’ll just bypass the process if it’s too onerous.

Lot of organizations, you know, a number of years ago, they started with a really heavyweight process where they required all their developers to get pre-approval of any open source software that they were using. They wanted to make sure that that software, you know, had the appropriate license attached, that it wasn’t going to represent a risk to their intellectual property. So they forced the developer, before they even touched the code, to issue a request that was then reviewed by the legal department, who looked at the license and made sure it was acceptable for the use that the developer intended for that particular open source component. The problem is, that approach no longer scales. Just through the sheer volume of open source in use, you can’t have developers waiting one to two weeks for the legal department to get back and tell them whether or not it’s acceptable to use those components and versions.

So increasingly, we really recommend an approach which is exception based in terms of how you run process and controls within your organization. So you define certain criteria and as long as that open source meets those criteria, developers are free to use it within your organization. So as long as the license type is of a certain variety, based on how that application’s being deployed. You know, if it’s going out externally versus a SaaS application or just internal use, you evaluate the license type in association with the intended use to know whether or not it’s permissible based on the obligations that that license imposes, or whether you want to flag it on an exception basis and have somebody in legal review that and give the final sign off for that.
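
An exception-based check like this can be sketched as a lookup on license category and intended use, where anything not explicitly pre-approved is routed to review. The following Python is only a sketch: the category names, the use names, and the contents of `PRE_APPROVED` are assumptions made for illustration, not legal guidance and not a policy described in the talk.

```python
# Illustrative policy table only. Which combinations are pre-approved is an
# assumption for this sketch; a real table would come from the legal team.
PRE_APPROVED = {
    ("permissive", "internal"),
    ("permissive", "saas"),
    ("permissive", "distributed"),
    ("weak_copyleft", "internal"),
    ("weak_copyleft", "saas"),
    ("strong_copyleft", "internal"),
}


def review_decision(license_category: str, intended_use: str) -> str:
    """Approve pre-cleared combinations; flag everything else as an exception."""
    if (license_category, intended_use) in PRE_APPROVED:
        return "approved"
    # Unknown categories and uncleared uses both land here by design.
    return "needs_legal_review"


print(review_decision("permissive", "distributed"))
print(review_decision("strong_copyleft", "distributed"))
print(review_decision("unrecognized", "internal"))
```

Defaulting to review for anything outside the table means developers are never blocked on the common cases, while new or unusual licenses still reach a human.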

Likewise, you want to have the same sort of review in place for any security vulnerabilities. So having policies that say, we’re not going to allow the shipment of a new version of an application if there are any severe vulnerabilities that exist in that. And then, you know, tying that back into your process to make sure that the dev team addresses those issues before it ships out the door.

The good news is there are a lot of APIs in all these different tools that make it very easy these days to tie these tools together. So many of these tools have REST APIs that allow easy integration so you can do everything kind of soup to nuts in terms of how these tools plug into your continuous integration and continuous deployment workflows. So for example, you can tie the scanning into a Jenkins build process, invoke the scan, and then flag any issues as part of the scan and part of the build process.

So looking at how kind of that works in conjunction with Artifactory. So if you look at your CI and your build system, you look at your artifacts in Artifactory and you look at tools like the Black Duck Hub that can do the scanning and identification of the open source. So in terms of how the basic workflow operates.

First of all, your build process in your CI environment is going to pull the necessary dependencies from Artifactory as part of the build process. Then, as a step in the build, you can invoke a scan of the new artifacts that have just been generated by the build. Scan the code, identify any open source in the code. Create that bill of materials and then understand the associated risks, whether it’s a license compliance issue or a vulnerability. Then based on policy, you can decide whether or not that build is allowed to proceed.

In a lot of cases, the companies, you know, especially if it’s a release build, they’ll fail the build if the open source components do not meet the policies that they’ve set. As long as they do meet policy, then you allow the build to proceed and then the artifacts are written back to the repo and the process is allowed to continue from there. So that’s one approach that we see as a best practice for keeping open source components and artifacts out of Artifactory, especially the release part of Artifactory if they have any known issues.
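
The policy decision at the end of that workflow can be sketched as a small gate that turns scan findings into an exit status. This Python sketch does not call Jenkins, Artifactory, or Black Duck; the `Finding` class, the sample findings, the CVSS limit of 7.0, and the choice to fail only release builds are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    component: str
    version: str
    max_cvss: float         # highest CVSS score among known vulnerabilities, 0.0 if none
    license_approved: bool  # outcome of the license policy check


def policy_violations(findings, cvss_limit=7.0):
    """Describe every finding that breaks the vulnerability or license policy."""
    violations = []
    for f in findings:
        if f.max_cvss > cvss_limit:
            violations.append(
                f"{f.component} {f.version}: CVSS {f.max_cvss} exceeds {cvss_limit}"
            )
        if not f.license_approved:
            violations.append(f"{f.component} {f.version}: license not approved")
    return violations


def gate_exit_code(findings, is_release_build):
    """Return 1 to fail a release build with violations, otherwise 0."""
    violations = policy_violations(findings)
    for violation in violations:
        print("POLICY:", violation)
    # Assumption: non-release builds only report violations and keep going.
    if violations and is_release_build:
        return 1
    return 0


# Hypothetical scan results for one build.
findings = [
    Finding("examplelib", "1.0.1", 9.8, True),
    Finding("cleanlib", "4.2.0", 0.0, True),
    Finding("samplehttp", "2.3.0", 4.3, False),
]
print("exit code:", gate_exit_code(findings, is_release_build=True))
```

In a real pipeline the build step would pass this value to `sys.exit`, since most CI systems treat a non-zero exit status as a failed step and stop before the artifacts are written back to the repository.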

And finally, kind of another approach is just using various APIs within Artifactory, and some of the new APIs that were just announced yesterday in Xray, and being able to leverage those to pull information out of Artifactory and write metadata back into Artifactory. So another approach here is every time there is a new artifact placed in Artifactory, we can go through a process whereby we get a notification of that artifact. Here again we can scan the artifact to identify the open source code and make sure that it meets various policy criteria. And then if it does, you can promote that artifact from your snapshot repo into your release repo or to go from dev into prod, et cetera. So as long as it meets criteria, you’ll allow it to proceed through the build process and to then be available for use.
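
The notify, scan, and promote flow can be sketched with in-memory stand-ins for the repositories. This Python sketch deliberately avoids the real Artifactory and Xray APIs: the `repos` and `properties` dictionaries, the artifact names, and the `meets_policy` callback are all hypothetical placeholders for the actual repository calls and scan results.

```python
def handle_new_artifact(artifact, repos, properties, meets_policy):
    """React to a new-artifact notification.

    Promotes the artifact from the 'snapshot' repo to the 'release' repo when
    it meets policy, records the outcome as metadata, and returns the name of
    the repo the artifact ends up in.
    """
    if artifact not in repos["snapshot"]:
        raise ValueError(f"{artifact} is not in the snapshot repo")

    passed = meets_policy(artifact)
    # Stand-in for writing scan metadata back onto the artifact.
    properties[artifact] = {"policy": "pass" if passed else "fail"}

    if passed:
        repos["snapshot"].remove(artifact)
        repos["release"].add(artifact)
        return "release"
    return "snapshot"


# In-memory stand-ins for two repositories and their artifact metadata.
repos = {"snapshot": {"app-1.4.0.jar", "app-1.5.0.jar"}, "release": set()}
properties = {}


def example_policy(artifact):
    """Hypothetical scan verdict: pretend only app-1.5.0.jar is clean."""
    return artifact == "app-1.5.0.jar"


print(handle_new_artifact("app-1.5.0.jar", repos, properties, example_policy))
print(handle_new_artifact("app-1.4.0.jar", repos, properties, example_policy))
```

Recording the verdict as metadata, whether or not the artifact is promoted, leaves a trail that later steps can check before moving anything from dev into prod.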

Same thing applies to Docker containers. You can use the same methodology to scan not just applications but also containers, and then as containers are used and deployed, make sure those containers are up to date and meet the different criteria you set for allowing their deployment out there. So you can then constantly monitor for new vulnerabilities. If something’s uncovered that has a high severity, then you can flag that container and say, okay, it no longer meets policy, we’re going to go through this notion of demoting that container or not allowing it to be further deployed.

So through some of these techniques, those are ways in which you can get a handle around this process of understanding the open source, how you’re using it, and putting some controls in place.

So on that note, any questions?