Where We're Going, We Don't Need Control Flow

02 September 2013

I recently stumbled upon this article from a former neighbor's Twitter feed, which is about the generativity of programming languages, and how that relates (or doesn't) to modern copyright law. It's a great article by a student who understands technology, (some aspects of) philosophy of technology, and communication; hope for the future, n'all that. That said, it's formed more as a question than a roadmap, and it tends to get both lost in some of the technical weeds, and not far out enough into others. I'm starting to do that myself here, but for its deep-dive-into-C-compilation-for-a-short-article, notable non-mentions include Prolog, progenitor of 'version 1' in the article, and Clang, a now-ubiquitous-thanks-to-Apple C language compiler frontend, which adds yet more generative complexity to the same examples given.

So let's look at that first example, which caught my eye:

alice = one_of(1,2,3)
bob = one_of(1,2,3)
claire = one_of(1,2,3)
forbid(alice = bob  or  bob = claire  or  alice = claire)
forbid(bob = 3)
forbid(alice – claire = 1  or  claire – alice = 1)
forbid(claire < bob)
display(alice, bob, claire)

which is accompanied by the statement: "As it turns out, almost no mainstream programming language can easily express a program that looks like the first version." Good thing I don't like mainstream programming languages. So as mentioned, we can start with the fact that Prolog can do this, and more than likely informs the example (though I'm fairly certain this is at least some level of pseudocode, without brew install swi-prolog). But it's cool to note; Prolog is basically a notation for an inference engine.

I've got another blog post broaching the topic, but I've recently moved on to Lumosity, where we're using Cascalog. You can hopefully see the portmanteau coming a mile away. However, it actually refers not to Prolog at large, but Datalog, a super-early internal DSL of Prolog, which strictly refers to the inference of data and their relationships. The other half of the portmanteau is Cascading, which I'm not even going to bother linking, because it's not at issue here, and Get On With It, besides. In sum, Cascalog is a variant of a subset of Prolog, except that it runs on Hadoop, so you can reason over datasets unimaginable in 1977 (much less extant).

Anyway, after reading the article, being newish to Cascalog, and reading the quoted statement above as a challenge, my eyes lit up, and I had to write this trivial motivating example in the declarative version:

(<- 
  [?alice ?bob ?claire]
  ((one_of [1 2 3]) ?alice)
  ((one_of [1 2 3]) ?bob)
  ((one_of [1 2 3]) ?claire)
  (cross-join)

  (forbid ?alice = ?bob) (forbid ?bob = ?claire) (forbid ?alice = ?claire)
  (forbid ?bob = 3)
  (forbid ?alice - ?claire = 1) (forbid ?claire - ?alice = 1)
  (forbid ?claire < ?bob)))

There are some differences, to be sure. The first line is the projecting the values of the tuples out (which is the last line in the first example). The sixth line is a bit thornier. I'm not entirely sure yet (70%, say) of the technical reasoning, but because the primitives of Cascading don't, and weren't intended to, map one-to-one with that of Datalog, you can't just arbitrarily introduce streams of values. Cascading was ultimately built for ETLs, not reasoning over arbitrary data stores, and so it lends itself better to either transforming a single stream of tuples (including tuple generation, like a flatmap), or joining data that has some relation to the main stream of tuples (e.g. an inner or outer join on a lookup table). We really want the cross product of three tuple streams (or generators, in this case, the mystical one_of in the original example), and so we can use cross-join.

That said, using Cascalog 1.10.2, the current stable in Clojars, I wasn't able to get it working. While Cascading most certainly is available for enterprise support, the DSLs, written[ish] and maintained by Twitter, are, well, not. So things can get weird, if you walk into the weeds, like in cross-join--it's simply something that that docs say you shouldn't need in a Hadoop workflow very often and I'm not surprised it may have suddenly stopped working. Fortunately, we're using Leiningen, so this was as easy as bumping down the version of Cascalog, and re-running lein deps. Boom, works. I also introduced a filter operation forbid-sub to encapsulate subtraction (the usage here read like: "operand1 minus operand2 must not equal operand3"). There's probably a cleaner way to handle that, but Cascalog can get funny with Clojure-native operations on vars in a query, in a way that I'm not entirely familiar with yet, and fine, you have to build these out to discrete map/reduce tasks in Java, it's worth some pain.

Edit: Actually, in retrospect, there was a clean[ish] immediate solution here. Clojure supports "arity overloading", which basically means you're free to provide multiple implementations of a function, based on the number (though not type) of arguments. If we're worried about simply mimicking the interface informally laid out by the example, simply adding a five-argument implementation with op1, operator, op2, comparator, comp2 will suffice.

For the interested, the full gist is here.

†: Years ago as a recent grad, I naïvely asked a startup CEO over a phone interview why he'd choose to start the company in Silicon Valley, and not New York or elsewhere. Granted, there are some very, very smart people in New York, but the culture just doesn't hum with the kind of intellectual cross-pollination that goes on here. On the best days, it yields ideas that will change the world for the better; increasingly, just white men wearing Google Glass.

‡: By here, of course, I mean the Bay Area in general. In choosing to live [and love] in Oakland, I've basically resigned myself to never working outside of SF or Oakland for the foreseeable future.

comments powered by Disqus