No Internet? No Problem. Use Artifactory with an Air Gap

Virtually all development organizations need access to remote public resources such as JCenter, NuGet Gallery, npmjs.org, Docker Hub, etc., to download dependencies needed for a build. One of the big benefits of using Artifactory is its remote repositories, which proxy these remote resources and cache the artifacts that are downloaded. This way, once any developer or CI server has requested an artifact for the first time, it is cached and directly available from the remote repository in Artifactory on the internal network. This is the usual way to work with remote resources through Artifactory.
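
For example (assuming a remote repository named jcenter that proxies JCenter, as in the snippets later in this post), the first request below makes Artifactory fetch the artifact from JCenter and cache it; any later request for the same artifact is served straight from the cache on the internal network:

curl -uadmin:password -O "http://localhost:8081/artifactory/jcenter/org/webjars/jquery/2.1.4/jquery-2.1.4.pom"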
Accessing remote resources

 

There are, however, organizations such as financial institutions and military installations that have stricter security requirements, in which such a setup, exposing their operations to the internet, is forbidden.

Air gap to the rescue

To accommodate these use cases, we recommend a setup with at least two Artifactory instances: one on the DMZ and another on the internal network, a setup commonly known as an air gap.

We normally see one of two scenarios:

  1. No network connection
  2. One-way connection

No network connection

In this scenario, the two Artifactory instances have no network connection between them. To get dependencies from the internet, the external instance has to download them, export them to an external device (such as a hard drive or USB flash drive), and then the internal instance has to import them for use by the developers and CI servers.

Exporting and Importing

 

Getting Dependencies

Here are two ways you could get dependencies from remote repositories:

  1. Dependencies Declaration
    Strip down your code, leaving only the dependencies declaration, and install the stripped-down code on a virtual machine on the DMZ which has the tools needed to run it (for example, if you’re developing npm packages, you would need the npm client installed on the DMZ machine). The corresponding client requests the dependencies you need through Artifactory, which recursively downloads them along with all of their transitive dependencies (a sketch for the npm case is shown after this list).
  2. Dedicated Script
  2. Dedicated Script
    Implement a script or mechanism that iterates through all the packages you need and sends HEAD requests to Artifactory so that it downloads those packages from the remote resource. For example, to pull versions 2.1.0 to 2.1.5 of jquery from JCenter you might use the following code snippet:
for i in $(seq 0 5); do curl -uadmin:password -I "http://localhost:8081/artifactory/jcenter/org/webjars/jquery/2.1.$i/jquery-2.1.$i.pom"; done
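
For instance, a minimal sketch of option 1 for the npm case (the repository name npm-remote and the package versions are placeholders): the stripped-down project is little more than a package.json declaring the top-level dependencies, and an npm install run on the DMZ machine, resolved through Artifactory, pulls in the whole tree, transitive dependencies included.

# package.json left after stripping the project down to its dependency declarations
cat > package.json <<'EOF'
{
  "name": "dependency-fetcher",
  "version": "1.0.0",
  "dependencies": {
    "express": "4.14.0",
    "lodash": "4.17.2"
  }
}
EOF

# Point npm at the DMZ Artifactory instance and resolve everything through it
npm config set registry http://localhost:8081/artifactory/api/npm/npm-remote
npm install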


Export and Import

Here are two ways you could export new dependencies (i.e., those downloaded since the last time you ran an export) and then import them into the internal instance.

  1. Groovy script with AQL query
    Write a Groovy script that uses an AQL query to find the right artifacts based on the date they were downloaded. This way we can make sure to export only those artifacts that have been downloaded since our last export. For example, to export all artifacts that were created after August 16, 2016, you might use something like the following code snippet:

    // requires the HTTPBuilder library
    import groovyx.net.http.HttpResponseException
    import groovyx.net.http.RESTClient

    // replace this with your AQL query
    def query = 'items.find({"type":"file","created" : {"$gt" : "2016-08-16T19:20:30.45+01:00"},"repo" : "generic-local-archived"})'

    // replace this with your Artifactory server
    def artifactoryURL = 'http://localhost:8081/artifactory/'
    def restClient = new RESTClient(artifactoryURL)
    .
    .
    .
       try {
           // run the AQL query against Artifactory's search API
           response = restClient.post(path: 'api/search/aql',
                   body: query,
                   requestContentType: 'text/plain'
           )
       } catch (Exception e) {
           println(e.message)
       }


    To download all of the artifacts returned by the query to the file system or a portable drive (into a folder named with today’s date):

    // uses groovyx.net.http.RESTClient and groovyx.net.http.HttpResponseException (imported above)
    public download(RESTClient restClient, List itemsToExport, def dryRun) {
       // name the export folder with today's date, e.g. 2016-08-16
       def date = new Date().format('yyyy-MM-dd')
       def folder = "$date"
       def dir = "/path/to/export"
       def file = new File("$dir/$folder")
       file.mkdirs()
       def dryMessage = (dryRun) ? "*** This is a dry run ***" : ""
       itemsToExport.each {
           println("Trying to download artifact: '$it'")
           try {
               if (!dryRun) {
                   def response = restClient.get(path: String.valueOf(it))
                   if (response.status == 200) {
                       // strip the repository name from the item path and write the payload to disk
                       String s = it.toString().substring(it.toString().indexOf("/") + 1)
                       file = new File("$dir/$folder/" + s)
                       file << response.getData()
                       println("Artifact '$it' has been successfully downloaded. $dryMessage")
                   } else {
                       println("response status: '$response.status'")
                   }
               }
           } catch (HttpResponseException e) {
               println("Cannot download artifact '$it': $e.message, $e.statusCode")
           }
       }
    }
    


    Now, we’re ready to take this folder and import it into the chosen repository on our internal Artifactory instance.

  2. Using JFrog CLI
    Since exporting is basically copying files from one location to another, JFrog CLI is the perfect tool to accomplish that. What we’re looking to do is to download all the new packages from the “clean repository” in the external Artifactory instance, and then upload them to the internal instance. The most straightforward way to download the new files is with the following command:

    jfrog rt dl generic-local-archived NewFolder/

    “But wait,” you say. “Doesn’t that download all the files?” Well, it looks like it, but since JFrog CLI is checksum aware, it only downloads new binaries that were added since our last download. Under the hood, JFrog CLI actually runs an AQL query to find the files you need, so your response looks something like this:

    [Info:] Pinging Artifactory...
    [Info:] Done pinging Artifactory.
    [Info:] Searching Artifactory using AQL query: items.find({"repo": "generic-local-archived","$or": [{"$and": [{"path": {"$match":"*"},"name":{"$match":"*"}}]}]}).include("name","repo","path","actual_md5","actual_sha1","size")
    [Info:] Artifactory response: 200 OK
    [Info:] Found 2 artifacts.
    [Info:] [Thread 0] Downloading generic-local-archived/jerseywar.tgz
    [Info:] [Thread 1] Downloading generic-local-archived/plugin.groovy
    [Info:] [Thread 1] Artifactory response: 200 OK
    [Info:] [Thread 0] Artifactory response: 200 OK
    [Info:] Downloaded 2 artifacts from Artifactory.

    Now you can take “NewFolder” to your internal instance and upload its contents, again, using JFrog CLI:

    jfrog rt u NewFolder/ generic-local-archived

    And since JFrog CLI uses checksum deploy (similar to the case of downloading), binaries that already exist at the target in the internal instance will not be deployed. The output below shows that only one new file is checksum deployed, apex-0.3.4.tar.

    [Info:] Pinging Artifactory...
    [Info:] Done pinging Artifactory.
    [Info:] [Thread 2] Uploading artifact: http://localhost:8081/artifactory/generic-local-archived/plugin.groovy
    [Info:] [Thread 1] Uploading artifact: http://localhost:8081/artifactory/generic-local-archived/jerseywar.tgz
    [Info:] [Thread 0] Uploading artifact: http://localhost:8081/artifactory/generic-local-archived/apex-0.3.4.tar
    [Info:] [Thread 1] Artifactory response: 201 Created
    [Info:] [Thread 2] Artifactory response: 201 Created
    [Info:] [Thread 0] Artifactory response: 201 Created (Checksum deploy)
    [Info:] Uploaded 3 artifacts to Artifactory.

A simple way to formulate complex queries in a filespec

But life isn’t always that simple. What if we don’t want to move ALL new files from our external instance to our internal one, but rather only those with some kind of “stamp of approval”? This is where AQL’s ability to formulate complex queries opens up a world of options. Using AQL, it’s very easy to create a query that, for example, only downloads files from our generic-local-archived repository that were created after October 15, 2016 and are annotated with the property workflow.status=PASSED, into our NewFolder directory. And since JFrog CLI can accept parameters as a filespec, we put the following AQL query in a file called newandpassed.json:

{
  "files": [
    {
      "aql": {
        "items.find": {
          "repo": "generic-local-archived",
          "created" : {"$gt" : "2016-10-15"},
          "@workflow.status" : "PASSED"
        }
      },
      "target": "NewFolder/"
    }
  ]
}

…and now feed it to JFrog CLI:

jfrog rt dl --spec newandpassed.json

Now we just upload the contents of NewFolder to the internal instance like we did before.
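
The same command as before does the job (assuming the target repository on the internal instance is again generic-local-archived):

jfrog rt u NewFolder/ generic-local-archived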

One-way connection

Some high-security institutions, while requiring a separation between the internet and their internal network, have slightly more relaxed policies and allow a one way connection. In this case, the internal Artifactory instance may be connected to the external one through a proxy or through a secure, unidirectional HTTP network connection. This kind of setup opens up additional ways for the internal instance to get dependencies:

  • Using a smart remote repository
  • Using pull replication

 

Smart Remote Repositories

A remote repository in Artifactory is one that proxies a remote resource (such as a public repository on the internet like JCenter). A smart remote repository is one in which the remote resource is actually a repository in another Artifactory instance. 

Here’s the setup you can use:

Smart remote repositories

 

The external instance on the DMZ includes:

  • Local repositories which host whitelisted artifacts that have been downloaded, scanned and approved
  • A remote repository that proxies the remote resource from which dependencies need to be downloaded
  • A virtual repository that aggregates all the others

 

The internal instance includes:

  • Local repositories to host local artifacts such as builds and other approved local packages
  • A remote repository – actually, a smart remote repository that proxies the virtual repository in the external instance
  • A virtual repository that aggregates all the others

 

Here’s how it works:

  • A build tool requests a dependency from the virtual repository on the internal Artifactory instance
  • If the dependency can’t be found internally (in any of the local repositories or in the remote repository cache), then the smart remote repository will request it from its external resource, which is, in fact, the virtual repository on the external instance
  • The virtual repository of the external instance tries to provide the requested dependency from one of its aggregated local repositories, or from its remote repository cache. If the dependency can’t be found, then the remote repository downloads it from the remote resource, from which it can then be provisioned back to the internal instance that requested it.

 

There’s a small setting you need to remember. The virtual repository on your internal instance must have the Artifactory Requests Can Retrieve Remote Artifacts checkbox set.

Virtual repository setting
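
As a rough sketch, the repositories on the internal instance could also be created with Artifactory’s Repository Configuration REST API; the repository names, credentials, URL and package type below are placeholders for whatever your setup actually uses:

# Smart remote repository on the internal instance, proxying the virtual
# repository of the external (DMZ) instance
curl -uadmin:password -X PUT "http://internal-artifactory:8081/artifactory/api/repositories/dmz-remote" \
  -H "Content-Type: application/json" \
  -d '{
        "rclass": "remote",
        "packageType": "generic",
        "url": "http://dmz-artifactory:8081/artifactory/dmz-virtual",
        "username": "replicator",
        "password": "<password>"
      }'

# Virtual repository on the internal instance that aggregates the local
# repositories and the smart remote, with the setting mentioned above enabled
curl -uadmin:password -X PUT "http://internal-artifactory:8081/artifactory/api/repositories/internal-virtual" \
  -H "Content-Type: application/json" \
  -d '{
        "rclass": "virtual",
        "packageType": "generic",
        "repositories": ["internal-local", "dmz-remote"],
        "artifactoryRequestsCanRetrieveRemoteArtifacts": true
      }'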

Pull Replication

In this method, you download dependencies to the external Artifactory instance (in the DMZ) using either of the methods described above. Now all you have to do is create a remote repository in your internal instance and configure it to invoke pull replication from the “clean” repository in your external instance on a cron schedule, to pull all those whitelisted dependencies into the internal instance.
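
As a rough sketch (repository name, credentials and cron expression are placeholders), this could be done with the Replication Configuration REST API, assuming the remote repository on the internal instance, here called dmz-remote, already proxies the clean repository on the DMZ instance:

# Configure pull replication on the internal remote repository so it pulls
# from the repository it proxies on the DMZ instance every night at 02:00
curl -uadmin:password -X PUT "http://internal-artifactory:8081/artifactory/api/replications/dmz-remote" \
  -H "Content-Type: application/json" \
  -d '{
        "enabled": true,
        "cronExp": "0 0 2 * * ?",
        "syncDeletes": false,
        "syncProperties": true
      }'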

A note about JFrog Xray

If you are using JFrog Xray with your internal Artifactory instance, you face a similar problem. In order to scan your indexed artifacts, Xray must ingest data on issues and vulnerabilities from the various feeds it is connected to. The primary feed comes from the global database server maintained by JFrog, and you need to synchronize with it periodically to download new data. When you have a live internet connection, the database is synchronized automatically; without one, however, you have to go through a manual process, similar to the one described above for dependencies, using offline mode. Basically, you download the Xray database update using JFrog CLI on your external instance in the DMZ (Xray will generate the required command for you), copy the downloaded files to the internal Xray server, and then run a local update.

Organizations that have to work off the grid pay a price to maintain their enclosed, secure environments that are disconnected from the internet, but that doesn’t mean they can’t use leading tools available on the market. With a bit of tooling and scripting, and maybe some manual interaction, they can use an air gap with JFrog Artifactory (and JFrog Xray) to get a nearly-online experience while maintaining the strict security policies of their development environments.

 
