On Duct Tape and Fence Posts
There's a parable in the security industry. A person owns some land in the desert and wants to keep trespassers out, so they start building a fence. They want the area to be as secure as possible, so they build an extremely tall fence post, to make absolutely sure that no one could climb over it. But they never get around to building the rest of the fence; any intruder can simply walk around the post. The height of the fence post is irrelevant.
Consider the scene from Burn Notice in which a well-defended door gets bypassed by simply going through the flimsy wall next to it.
This is fence post security. It doesn't matter how well-defended the door is if the walls around it are flimsy. A smart attacker will generally go after the weakest part of a system; good security requires that every possible method of attack be covered.
This idea is closely related to the principle that a chain is only as strong as its weakest link. A chain is minimalist; there are no redundant links, so every link is playing an essential role, and if any of them fails, the whole system fails. Fence post security is what happens when someone fails to consider this and spends their resources reinforcing a single link, leaving the others untouched.
It's an easy mistake to make. People ask themselves "in the current system, what's the weakest point?" and then dedicate their resources to shoring up the defenses at that point, not realizing that after the first small improvement there, the weakest point is probably now somewhere else.
Fence post security happens preemptively, when the designers of a system fixate on its most salient aspects and don't consider the rest. But this sort of fixation can also happen in retrospect, in which case it manifests a little differently but has similarly deleterious effects.
Consider a car that starts shaking whenever it's driven. It's uncomfortable, so the owner gets a pillow to put on the seat. Items start falling off the dash, so they get a tray to put them in. A crack forms, so they tape over it.
I call these duct tape solutions. They address symptoms of the problem, but not the root cause. The underlying issue still exists and will continue to cause problems until it's addressed directly.
Did you know it's illegal to trade onion futures in the United States? In 1955, some people cornered the market on onions, shorted onion futures, then flooded the market with their saved onions, causing a bunch of farmers to lose money. The government responded by banning the sale of futures contracts on onions.
Not by banning futures trading on all perishable items, which would be equally susceptible to such an exploit. Not by banning market-cornering in general, which is pretty universally disliked. By banning futures contracts on onions specifically. So of course the next time someone wants to try such a thing, they can just do it with tomatoes.
Duct tape fixes are common in the wake of anything that goes publicly wrong. When people get hurt, they demand change, and they pressure whoever is in charge to give it to them. But implementing a proper fix is generally more complicated (since you have to perform a root cause analysis), less visible (and therefore earns the leader no social credit), or just plain unnecessary (if the risk was already priced in). So the incentives favor quickly slapping something together that superficially appears to be a solution, without regard for whether it makes sense.
Of course, not all changes in the wake of a disaster are duct tape fixes. A competent organization treats a disaster as new information about the system in question; it then thinks about how it would design the system from scratch taking that information into account, and proceeds from there to make changes. Proper solutions attempt to fix a general class of issues, not just the exact thing that failed.
- Bad: "Screw #8463 needs to be reinforced."
- Better: "The unexpected failure of screw #8463 demonstrates that the structural simulation we ran before construction contained a bug. Let's fix that bug and re-run the simulation, then reinforce every component that falls below the new predicted failure threshold."
- Even better: "The fact that a single bug in our simulation software could cause a catastrophic failure is unacceptable. We need to implement multiple separate methods of advance modeling and testing that won't all fail in the same way if one of them contains a flaw."
- Ideal: "The fact that we had such an unsafe design process in the first place means we likely have severe institutional disfunction. We need to hire some experienced safety/security professionals and give them the authority necessary to identify any other flaws that may exist in our company, including whatever processes in our leadership and hiring teams led to us not having such a security team working for us already."
As this example shows, there isn't necessarily a single objective "root cause". It's always possible to ask "why" another time, and the investigators have to choose where to cut off the analysis. So a "duct tape fix" doesn't refer to any specific level of abstraction; it refers to when the level at which someone chooses to address a problem is not appropriate for the situation, either because the level at which they addressed it is so narrow that it's obvious something else is going to go wrong, or because there exists a fix on a deeper level that wouldn't cost significantly more.
Duct tape fixes are tempting because they're so easy up front, but they often spiral into higher costs as the cracks keep appearing and you have to keep slapping on more and more pieces of duct tape.
One time I was discussing a simple program that checks the precision of a decimal number, which failed on specific inputs like 0.07 due to floating point error. One person suggested that I fix this by multiplying the input by an arbitrary constant and then dividing that constant out at the end; they recommended a particular constant that they had discovered made the program succeed on the 0.07 example I had given. I pointed out that this didn't actually fix the core problem, it just shifted the errors to other numbers, such as 0.29. Their response was that I should make a list of the numbers most likely to be given as inputs, find a constant that succeeded on everything in that list, and resign myself to occasional errors on the uncommon numbers.
This is not how you design a reliable computer program. Checking a number's precision is not a complicated mathematical concept, and there were various one-line fixes I could have applied that would make the function work properly on all potential input numbers, not just some of them. But this person had anchored on the first solution that came to mind, and insisted on tweaking it to cover each new special case rather than realizing that their whole approach was fundamentally flawed.
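To make this concrete, here's a minimal sketch of the kind of precision check under discussion. The original code isn't shown above, so the function names and the exact naive approach are my own assumptions; the point is only to contrast tweaking around the float error with removing it at the source.

```python
from decimal import Decimal

def decimal_places_naive(x: float) -> int:
    """Count digits after the decimal point by multiplying by 10 until the
    value looks like an integer. Because 0.07 has no exact binary
    representation, the comparison keeps failing and the count comes out
    wildly inflated. Multiplying by a magic constant just moves the
    failures to different inputs."""
    places = 0
    while x != int(x):
        x *= 10
        places += 1
    return places

def decimal_places(x: float) -> int:
    """Count digits after the decimal point from the number's shortest
    decimal string, sidestepping binary rounding error entirely."""
    exponent = Decimal(str(x)).normalize().as_tuple().exponent
    return max(0, -exponent)

print(decimal_places_naive(0.07))  # far more than 2: the float isn't exactly 0.07
print(decimal_places(0.07))        # 2
print(decimal_places(0.29))        # 2
print(decimal_places(0.1))         # 1
```

The `Decimal(str(x))` trick is just one of several possible fixes; the point is that addressing the representation problem once is better than hand-tuning a constant for whichever inputs people happen to test.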
Or consider the current approach to designing AI chatbots. They have a tendency to say inappropriate things, so companies use reinforcement learning from human feedback to try to curb this behavior: they give the model examples of what not to say and train it to avoid saying those things. Every time a new version comes out, someone discovers a new unwanted behavior; the company adds that example to its reinforcement learning dataset and goes "ok, all fixed!"
But of course it hasn't been fixed. Someone else is just going to find a new input prompt that leads to inappropriate behavior.
The core problem is that a large language model is a general text-prediction engine, not an agent with any particular goal system. You can tweak it by penalizing strings of text that look a certain way, and hope that once you give it enough examples it will learn to fully generalize, but this is iffy. Sure, it might work someday, just as continuing to drive additional screws into an unstable structure might eventually make it stop wobbling. But it hasn't worked so far, and it would be better to understand the underlying forces at play.
Duct tape fixes have been around for a long time. Consider this (apocryphal) story:
Plato had defined Man as an animal, biped and featherless, and was applauded. Diogenes plucked a fowl and brought it into the lecture-room with the words, "Here is Plato's man." In consequence of which there was added to the definition, "having broad nails."
If Diogenes were to find a way to squash the chicken's nails down to be broad, would Plato then admit it was a man? Or would he find some other physical characteristic that humans happen to have and chickens happen to not, and append that to the definition?
Plato's mistake here was using a characteristic that is only correlated with the thing in question. When optimization pressure is then applied to this definition, it fails, as per Goodhart's law.
Consider someone who is given a list of photos and asked to write a computer program that identifies when a photo contains a bird. The programmer notices that all the bird photos they were given contain a lot of leaves, and all of the non-bird photos contain no leaves. So they write a program that counts up the green pixels and returns "bird" if the number is high enough.
This program outputs the correct results on all the example photos they were looking at, but it will fail very quickly when applied to any new photo. The programmer successfully found a feature of the photos that divided them into the desired final categories, but it was not the relevant feature.
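As a toy sketch of that failure (the pixel data, threshold, and function name here are invented for illustration), the "bird detector" boils down to a green-pixel ratio check, which is exactly why it only works on the photos it was built around:

```python
def looks_like_bird(pixels: list[tuple[int, int, int]], threshold: float = 0.3) -> bool:
    """Toy 'bird detector': call it a bird photo if enough pixels are green.
    A pixel counts as green when its G channel dominates R and B."""
    green = sum(1 for r, g, b in pixels if g > r and g > b)
    return green / len(pixels) >= threshold

# It scores perfectly on the original examples, where birds and leaves
# happen to go together...
forest_with_bird = [(34, 139, 34)] * 70 + [(120, 120, 120)] * 30
parking_lot = [(90, 90, 95)] * 100
print(looks_like_bird(forest_with_bird))  # True  -> "bird" (correct)
print(looks_like_bird(parking_lot))       # False -> "no bird" (correct)

# ...and falls apart on anything new, because greenness was never the
# relevant feature.
gull_on_beach = [(200, 200, 190)] * 100
leafy_garden_no_birds = [(40, 150, 45)] * 100
print(looks_like_bird(gull_on_beach))          # False (wrong: there is a bird)
print(looks_like_bird(leafy_garden_no_birds))  # True  (wrong: no bird at all)
```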
This is the danger of duct tape fixes. They lull people into a false sense of security, making it look superficially like the problem has been addressed, while the real issue is still there, lurking.