AQL: A Comprehensive Query Language for Repositories

One of our mantras here at JFrog is that we’re community driven. At each roadmap meeting we discuss features arising from trends in the industry along with feature requests we frequently get from the Artifactory community. AQL is just such a feature. No… nobody said, “Hey guys, why don’t you build a super efficient query language that finds anything in our repositories, and utilizes our database architecture so it will be unique to Artifactory and nobody else can do it.” But what did happen is that we kept getting requests for variations on the REST API calls that perform different search functions.

A classic example is searching for artifacts by creation date. In version 2.2, we released Artifacts Created in Date Range. This REST API call finds artifacts created in the date range you specify. But then, after getting more and more requests asking for the ability to search by date, but on different fields, we released Artifacts with Date in Date Range in version 3.2. So now you can search for artifacts based on when they were created, modified or downloaded. You can see a similar scenario with Artifact Latest Version Search Based on Layout released in version 2.6, which was then followed by Artifact Latest Version Search Based on Properties released in version 3.1.1. And there are more examples with API calls being embellished with more parameters and options.

So eventually we realized you can please some of the people some of the time, but no matter how much we want to keep our community happy, we can’t predict all the different features that everyone needs. No matter what enhancement we add, someone will need something else.

The gift of empowerment
You know that old proverb about giving a man a fish vs. teaching him to fish? We decided that the right approach is not to keep “giving new APIs” to satisfy more and more search needs, but rather to empower the community with the ability to build ANY search query your collective imagination can dream up, and thus… Artifactory Query Language was born.

That’s the essence of AQL. By providing such an extremely flexible language that lets you specify any number of search criteria, combined in any logical configuration, with any set of filters and displaying any set of output fields, not only can we satisfy any request we have ever received, we can also satisfy any future request that nobody has dreamed of yet.

The proof is in the pudding
So let’s see some AQL in action and look at real-world examples.

Example 1

Meet George.
George is a DevOps manager.
George is not happy today. His developers are complaining that the system is getting sluggish and cluttered with a lot of old unused artifacts, which they insultingly call “junk”, but a hardware upgrade is not an option for George at the moment.
It’s time for spring cleaning. Especially from the remote repository caches where large artifacts can live on and on. Do we really need all those terabytes?

First let’s find old artifacts, those that were created over a year ago (assuming today is May 18, 2015). This one’s easy with AQL:

items.find({"type":"file",
 "created":{"$lt":"2014-05-18T"}})

Note how we can use comparison operators like “$lt” with dates, and that we’re taking advantage of the default built into AQL that multiple criteria are compounded using “$and”.
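
The cutoff date in that query is hard-coded. As a minimal sketch (this is ordinary scripting around AQL, not a feature of AQL itself), a small Python helper could compute the cutoff from today's date and assemble the query text. The helper names `one_year_before` and `build_old_items_query` are hypothetical, and the date string simply mirrors the literal form used in this post; your Artifactory version may expect a full timestamp instead.

```python
import json
from datetime import date


def one_year_before(day: date) -> date:
    """Return the same calendar day one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year.
        return day.replace(year=day.year - 1, day=28)


def build_old_items_query(today: date) -> str:
    """Build the AQL text for files created more than a year before `today`."""
    # Mirrors the date literal used in this post (assumption: adjust if your
    # server expects a full timestamp).
    cutoff = one_year_before(today).isoformat() + "T"
    # Sibling keys in one criteria object are implicitly combined with "$and".
    criteria = {"type": "file", "created": {"$lt": cutoff}}
    return "items.find(" + json.dumps(criteria) + ")"


print(build_old_items_query(date(2015, 5, 18)))
```

Generating the query text from a date keeps a scheduled cleanup job from going stale as the calendar moves on.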

But like my grandma used to say, “Old does not necessarily mean useless.” Let’s make sure that we don’t delete any artifacts that are still used. George only wants to delete something if it hasn’t been downloaded in the last 6 months. So let’s add another criterion to our query, this time using the “downloaded” field in the “stat” domain:

items.find({"type":"file",
 "created":{"$lt":"2014-05-18T"},
 "stat.downloaded":{"$lt":"2014-11-18T"}})

But since Murphy lurks at every corner, George wants to make sure nothing important is deleted, so he will skip “release” repositories.

items.find({"type":"file",
 "created":{"$lt":"2014-05-18T"},
 "stat.downloaded":{"$lt":"2014-11-18T"},
 "repo":{"$nmatch":"*release*"}})

Here, we added a criterion that the repository name does not include the pattern “*release*”. Note that if you have any repository with “release” in its name – but it’s not a release repository… it will also be skipped, and you may want to rethink your naming convention.
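
To see which repositories such a pattern would skip before running anything destructive, you could try the pattern against your repository names locally. The sketch below uses Python's `fnmatch` module as a rough stand-in for the AQL wildcard match; it is only an approximation of the server's matching rules, and the repository names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical repository names, for illustration only.
repos = [
    "libs-release-local",
    "libs-snapshot-local",
    "prerelease-notes",
    "remote-cache",
]

# Approximates "repo": {"$nmatch": "*release*"}: keep names that do NOT match.
kept = [name for name in repos if not fnmatchcase(name, "*release*")]
skipped = [name for name in repos if fnmatchcase(name, "*release*")]

print("searched:", kept)
print("skipped: ", skipped)
```

Note that the made-up "prerelease-notes" repository lands in the skipped list even though it is not a release repository, which is exactly the naming pitfall described above.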

And just to complicate matters for this example, let’s say George also wants to limit himself to “large” artifacts bigger than 1000 bytes.

So here is the final AQL query that finds all artifacts in the system that answer all of those criteria:

items.find({"type":"file",
 "created":{"$lt":"2014-05-18T"},
 "stat.downloaded":{"$lt":"2014-11-18T"},
 "repo":{"$nmatch":"*release*"},
 "size":{"$gt":"1000"}})

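A query like this is sent to Artifactory as plain text over REST. The following is a minimal sketch of preparing such a request in Python, assuming the AQL search endpoint at `api/search/aql`; the server address is hypothetical, authentication is left out, and the function name `build_aql_request` is made up for this example. The request is only built here, not sent.

```python
import urllib.request

# The final query from this example, copied verbatim.
AQL_QUERY = """items.find({"type":"file",
 "created":{"$lt":"2014-05-18T"},
 "stat.downloaded":{"$lt":"2014-11-18T"},
 "repo":{"$nmatch":"*release*"},
 "size":{"$gt":"1000"}})"""


def build_aql_request(base_url: str, query: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST of the AQL text to the search endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/search/aql",
        data=query.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# Hypothetical server address; add credentials as your setup requires.
request = build_aql_request("http://localhost:8081/artifactory", AQL_QUERY)
print(request.get_method(), request.full_url)

# To actually run the search against a live server:
# response_body = urllib.request.urlopen(request).read()
```

Review the results before deleting anything; the query only finds candidates, and the cleanup itself is a separate step.
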
So if you’re not salivating yet, let’s look at one more example. Debian is good subject matter because it involves architectures, components and distributions which make for great searchable material, but similar scenarios can come up for YUM, npm and virtually any other type of component.

Example 2

Let’s find all Debian components that

  • are in the “trusty” distribution
  • have been downloaded over 1000 times
  • and are linked to either the amd64 or the i386 architecture

I challenge you to find these components with conventional search tools.

With AQL it’s much easier.

items.find({"$and":[{"$or":[{"@deb.architecture" : "i386"} ,{"@deb.architecture" : "amd64"}]},{"@deb.distribution" : "trusty"},{"stat.downloaded":{"$gt":"1000"}}]})

Since it’s difficult to follow the query logic on a one-liner like that, here is the same query spread out over several lines and properly indented:

items.find(
     {
         "$and":
         [
             {
                 "$or":
                 [
                     {"@deb.architecture" : "i386"} ,
                     {"@deb.architecture" : "amd64"}
                 ]
             },
             {"@deb.distribution" : "trusty"},
             {"stat.downloaded":{"$gt":"1000"}}
         ]
     }
 )
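
Nested criteria like these are easy to get wrong by hand, so one option is to build them from plain data structures and let a JSON serializer handle the brackets. The sketch below does that in Python; the function name `debian_query` and its parameters are hypothetical, and the output is just the query text, ready to be submitted however you normally run AQL.

```python
import json
from typing import Sequence


def debian_query(distribution: str, architectures: Sequence[str], min_downloads: int) -> str:
    """Build the AQL text for Debian items matching any of the given architectures."""
    # One clause per architecture, any of which may match.
    any_arch = {"$or": [{"@deb.architecture": arch} for arch in architectures]}
    criteria = {
        "$and": [
            any_arch,
            {"@deb.distribution": distribution},
            # The post writes the threshold as a quoted string, so we do too.
            {"stat.downloaded": {"$gt": str(min_downloads)}},
        ]
    }
    return "items.find(" + json.dumps(criteria, indent=4) + ")"


print(debian_query("trusty", ["i386", "amd64"], 1000))
```

Adding a third architecture then becomes a one-word change to the list rather than another hand-edited clause.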

Dog food tastes great
At JFrog we live by “Eat your own dog food”. We use Artifactory to continue developing Artifactory, we distribute Artifactory through Bintray, and now we’re happily using AQL in both projects. We use it to implement cleanup policies, ferret out very specific artifacts, and even to find a very specific subset of modules based on different properties in order to assemble a build for Bintray with a relatively simple query that cuts our scripting down by 80%. And that’s just the tip of an infinite iceberg. AQL was designed to be so flexible that it can meet any search requirements you’re likely to dream up in the foreseeable future. Just think of all the cool dashboards you can populate with the results of AQL queries.
[Image: Dashboards populated by AQL]
In fact, we’re so confident that AQL can meet your needs that we’re going to hold an AQL challenge, and defy you to provide us with a search scenario that AQL can’t handle. Stay tuned and give it your best shot.