Hard Boundaries, Soft Boundaries

Computer scientists use filters, ≥ signs, intersections (SQL), and other forms of what I would call “hard boundaries”.

Grep either finds what you’re looking for, or it doesn’t.
The condition inside your while(){ loop either trips true and the interior code runs, or it trips false and it’s skipped.
You either follow someone on Twitter, or you don’t.
You either crawled a webpage, or you didn’t.
In exploring a code tree or other graph, you either look at the node, or you don’t.
Two people either are Facebook friends, or they aren’t.
The tweet either included a word from this list, or it didn’t.

But, one needn’t be so conceptually constrained. Thinking in a fuzzy logic sense, it’s possible to create a “soft” boundary.

To use a classic example from Bart Kosko’s book, although the American legal system imposes a “hard boundary” on adulthood (OK, a series of hard boundaries: 16, 18, 21, 25), one really passes into adulthood gradually over time. (Unless you have your first kid at 16, in which case you grow up real quick. But I’m talking about the upper-middle-class, college-enrolled set here: most of them grow up slowly.)

That’s nice in a philosophical, contemplative way. But can we use the soft-boundary concept for anything useful? I think so.

For example, in this neo4j video (minute 5) Marko Rodriguez gives us the following line of Gremlin code:

g.v(1).outE.filter{it.label=='knows' && it.since > 2006}.count()

We could either be naïve about this and treat 2006 as a hard boundary, or make it a variable and perform sensitivity analysis. In fact, any time we see a number we could turn it into a parameter, ending up with a whole list of them. We could poke about in that parameter space and, by doing so, get a better idea of the shape of things than we would by setting a naïve tripwire.
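
A minimal sketch of that sensitivity analysis in Python, assuming a tiny made-up edge list. The `edges` data and the `count_knows_since` helper are hypothetical; they are not part of Gremlin or of the video. Instead of fixing 2006, the cutoff year is swept and the count is recorded at each value.

```python
# Hypothetical toy data: (label, since) pairs standing in for the out-edges of one vertex.
edges = [
    ("knows", 2004),
    ("knows", 2006),
    ("knows", 2007),
    ("knows", 2009),
    ("created", 2008),
]


def count_knows_since(edges, cutoff):
    """Count 'knows' edges whose 'since' year is strictly after the cutoff."""
    return sum(1 for label, since in edges if label == "knows" and since > cutoff)


# Treat the cutoff as a parameter and sweep it instead of hard-coding 2006.
sweep = {year: count_knows_since(edges, year) for year in range(2003, 2010)}

for year, n in sweep.items():
    print(year, n)
```

How quickly the count falls as the cutoff moves says more about the data than the single number at 2006 does.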

Is there a design pattern for this?

Notice also his gremlins can “be” on multiple nodes at once. That’s certainly not a binary-valued codomain. Other non-binary aspects to his graphs:

different words (“coloured edges” in graph parlance) like “speaks”, “has worked with”, “had a child with” – all of the richness and drama of Quine’s ontology of language wrought in the connectome of the graph
the network structure itself
and of course edge weights

Here’s an example from Unix for Poets:

cat bible | grep Abel | uniq -c

So-called “bright lines” appear also in the law (married vs not), statistical regression (dummy/indicator variables), and tax brackets (under $15,000.00 or ≥ $15,000.01).

They’re frustrating because they’re discontinuous. (Actually, tax owed under marginal brackets is continuous in income; it’s the first derivative, the marginal rate, that is discontinuous.)

Imagine the following (non-existent, stupid) tax system:

If you make £30,000/year or less you pay no tax.
If you make ≥ £30,000.01/year you pay 50% tax on every pound you made (all the way down to £0.01).
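
To make the jump concrete, here is a rough Python sketch of the cliff system above next to a marginal-rate system with the same threshold and rate. The marginal version is an assumption added for comparison, and both function names are made up.

```python
THRESHOLD = 30_000.0  # pounds per year, from the toy system above
RATE = 0.5


def cliff_tax(income):
    """The 'stupid' system: nothing up to the threshold, then 50% of everything."""
    return RATE * income if income > THRESHOLD else 0.0


def marginal_tax(income):
    """Assumed comparison: 50% only on the income above the threshold."""
    return RATE * max(0.0, income - THRESHOLD)


for income in (29_999.0, 30_000.0, 30_000.01, 40_000.0):
    print(income, cliff_tax(income), marginal_tax(income))
```

Under the cliff system, earning one extra penny costs about £15,000. Under the marginal system it costs half a penny: the tax owed is continuous, and only its slope jumps (from 0 to 0.5) at the threshold.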

It’s frustrating because it’s discontinuous. I might not go as far as to say that continuity, smoothness, holomorphicity, analyticity and so on are “natural to the human mind” – if in fact we can just take a monolithic view on “the” mind – but continuity and smoothness certainly seem, to me and to other mathematical writers I’m thinking of, like they’re more fair, just, or sensible.

Imagine you’re trying to catch an email spammer, and you’ve determined that the character ! is a good trigger for spams. You could either

set a hard boundary: more than 3 !’s, flagged for spam; ≤ 3 !’s, not flagged
or you could count the number of !’s in the text

The latter approach is more flexible:

you can change the parameter 3 to something else
you can pass the count through a function (like a sigmoid, monotone convex or monotone concave function, or the cumulative-prospect-theory function)
as in minute 14 of this d3.js video, you can add (something like a) “blending” parameter
you can set a known algorithm (like logistic regression) to find the optimal parameter value for you
you can combine the ! count with other variables (like counts of the word herbal or counts of the forenames of people in the mail user’s address book)
you can combine the ! count with other variables and use a known algorithm (like a backprop net) to set all the optimal values for you
maybe you can find a way to half-instantiate your desired response when the count is “at half mast” or “in a middling range”.
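
A small Python sketch of the hard boundary, the sigmoid option, and the “half mast” response, under stated assumptions: the `midpoint` and `steepness` parameters are made up rather than fitted, and the three-way response (deliver / quarantine / flag) is just one hypothetical way to half-instantiate a reaction.

```python
import math


def hard_flag(text, limit=3):
    """Hard boundary: flagged only when there are more than `limit` exclamation marks."""
    return text.count("!") > limit


def soft_score(text, midpoint=3.0, steepness=1.0):
    """Soft boundary: squash the '!' count through a sigmoid into (0, 1).

    `midpoint` and `steepness` are illustrative parameters, not fitted values.
    """
    count = text.count("!")
    return 1.0 / (1.0 + math.exp(-steepness * (count - midpoint)))


def respond(score, low=0.25, high=0.75):
    """One hypothetical graded response: middling scores get a middling action."""
    if score >= high:
        return "flag"
    if score >= low:
        return "quarantine"
    return "deliver"


for msg in ("See you at lunch", "Great news!!!", "BUY NOW!!!!!!"):
    s = soft_score(msg)
    print(repr(msg), hard_flag(msg), round(s, 3), respond(s))
```

The hard version treats three !’s and zero !’s identically. The soft version puts the three-! message squarely in the middle of the scale, where a middling response is available.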

Back to catching spammers: I drew up an idea for Tumblr to catch its spammers a while ago. I noticed a few telltale markers of spam accounts:

quick liking in succession
squatting on a hashtag
high number of likes
no / low content in the title
at first the spammers were not reblogging stuff (now they not only reblog but post fakey “original” looking text posts … that’s counter-evolution for ya) so they usually had no posts on their blog page
ads on the sidebar

They opted for social proof (let people “block” spammy likers from their dashboard and flag them as suspected spammers), which seems to have worked out very well. So I’m not saying “soft boundaries are always better” or something – just that if a “hard boundary” is preventing you from thinking about a problem like you want to, you can get around it pretty easily!

I think computer scientists do use soft boundaries, although they might not draw the same analogy to the “crisp” > sign as I am.

tag clouds don’t just count words – they increase the display size of the word depending on how large the count is (maybe the sqrt of the count?). That tag clouds count many different words, rather than just one, could also be construed as a “coloured” codomain.
you don’t just return a webpage or not return a webpage in your crawler. You might get a 404, or you might get a 302. Or you might get a 200, 500, 303, 504, and so on. Additionally the page might be in HTML, JSON, or might simply flip a switch (“turn on my remote TV recording device”).
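
The tag-cloud case can be sketched in a few lines of Python. The square-root scaling is only the guess made above, not a documented rule of any particular tag-cloud library, and the point-size range and word list are made up.

```python
import math
from collections import Counter


def tag_sizes(words, min_pt=10.0, max_pt=40.0):
    """Map each word to a font size that grows with the square root of its count.

    Assumes `words` is non-empty.
    """
    counts = Counter(words)
    top = math.sqrt(max(counts.values()))
    return {
        word: min_pt + (max_pt - min_pt) * math.sqrt(n) / top
        for word, n in counts.items()
    }


# Hypothetical word list: 16 x "graph", 4 x "gremlin", 1 x "codomain".
words = ["graph"] * 16 + ["gremlin"] * 4 + ["codomain"]
sizes = tag_sizes(words)
for word, size in sizes.items():
    print(word, round(size, 1))
```

A word that appears sixteen times is drawn larger than one that appears once, but not sixteen times larger; the square root compresses the scale.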

Business people (I’ve found) think naturally in terms of soft boundaries as well. If your client / boss is using the word “score” you can mimic that directly with what I’m calling a “soft boundary”.

All you’ve got to do is make up a functional that “measures stuff” any way you want, and slide your > sign along the resulting smooth scale.
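
As a closing sketch in Python: the account fields, the weights, and the sample accounts below are all invented for illustration (loosely echoing the spam markers listed earlier), and nothing here is how Tumblr actually does it. The point is only that once there is a score, the > sign becomes a dial.

```python
# Hypothetical accounts; every field and value here is made up.
accounts = [
    {"name": "a", "likes_per_minute": 0.2, "posts": 120, "has_sidebar_ads": False},
    {"name": "b", "likes_per_minute": 6.0, "posts": 0, "has_sidebar_ads": True},
    {"name": "c", "likes_per_minute": 2.5, "posts": 3, "has_sidebar_ads": False},
]


def score(account):
    """A made-up 'functional': a weighted sum of suspicious-looking markers."""
    s = 0.0
    s += min(account["likes_per_minute"], 10.0) / 10.0  # rapid liking, capped at 1
    s += 1.0 / (1.0 + account["posts"])                 # few or no posts
    s += 0.5 if account["has_sidebar_ads"] else 0.0     # ads on the sidebar
    return s


def flagged(accounts, threshold):
    """Names of accounts whose score exceeds the threshold."""
    return [a["name"] for a in accounts if score(a) > threshold]


# Slide the '>' sign along the scale and see who falls on which side.
for threshold in (0.25, 0.75, 1.5, 2.5):
    print(threshold, flagged(accounts, threshold))
```

Moving the threshold from 0.25 to 0.75 drops the borderline account from the flagged list, which is exactly the kind of thing a single hard-coded cutoff hides.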