
Knight Capital Error Explained


Beginning on July 27, 2012, Knight deployed the new RLP code in SMARS in stages, placing it on a limited number of SMARS servers on successive days. As of August 1, Knight did not have supervisory procedures concerning incident response.

Knight Capital Trading Error

… and 10:00 a.m. Where is the business rule that says "first do no harm"?
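One way to encode a "first do no harm" rule is a pre-trade kill switch: every order is checked against a hard exposure limit before it leaves the system, and once the limit would be breached, everything is rejected until a human intervenes. This is a minimal sketch, not a description of Knight's systems; the `KillSwitch` class, the `max_gross_notional` limit and the use of gross notional as the exposure measure are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Order:
    symbol: str
    side: str    # "buy" or "sell"
    qty: int
    price: float


class KillSwitch:
    """Hypothetical pre-trade guard: halts all order flow once gross notional would exceed a limit."""

    def __init__(self, max_gross_notional: float) -> None:
        self.max_gross_notional = max_gross_notional
        self.gross_notional = 0.0
        self.tripped = False

    def allow(self, order: Order) -> bool:
        # Once tripped, stay tripped: there is no automatic recovery.
        if self.tripped:
            return False
        notional = order.qty * order.price
        if self.gross_notional + notional > self.max_gross_notional:
            # Reject this order and every later one.
            self.tripped = True
            return False
        self.gross_notional += notional
        return True

    def reset(self) -> None:
        # Human-initiated only. Accumulated exposure is kept, because the
        # positions it represents still exist.
        self.tripped = False


guard = KillSwitch(max_gross_notional=1_000_000)
sent = [guard.allow(Order("XYZ", "buy", 1_000, 100.0)) for _ in range(15)]
print(sent.count(True), guard.tripped)  # 10 True
```

The key design choice is that the switch latches: a runaway strategy cannot work its way back into the market simply by sending a smaller order.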

The RLP (the NYSE's Retail Liquidity Program) essentially sought to create a pool of traders within the NYSE that would be allowed to pay retail investors fractions of a penny more for their stocks. This story is true; it really happened: http://www.bloomberg.com/news/articles/2012-08-02/knight-shows-how-to-lose-440-million-in-30-minutes. The mistakes could be in the instructions, in the interpretation of the instructions, or in the execution of the instructions.

Knight's largest business is market making in U.S. equities. The system should be associated with the Cynefin "Complex" domain: a complex adaptive system. But if your system is wildly broken, this might not be doing what you think it's doing.

I also don't understand why people insist on using integer flags for things that matter. They're a broker that likely has some SLA-ish agreement with clients, or at the very least faces reputational risk. For them not to post bids and offers, and to give up that order flow, would not have been acceptable to management. With real orders and real dollars.
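The alternative to an overloaded integer flag is a type that names every meaning explicitly and rejects values it does not recognize, including retired ones. The sketch below is hypothetical: the `OrderHandler` enum, its members and their wire values are invented for illustration and do not reflect how Knight actually encoded its flag.

```python
from enum import Enum


class OrderHandler(Enum):
    """Hypothetical routing modes: each behaviour gets its own named value."""
    LEGACY_POWER_PEG = "legacy_power_peg"  # retired: must never be selected
    RLP = "rlp"                            # new Retail Liquidity Program logic
    STANDARD = "standard"


RETIRED = {OrderHandler.LEGACY_POWER_PEG}


def parse_handler(raw: str) -> OrderHandler:
    """Parse a wire value, refusing unknown or retired modes instead of guessing."""
    try:
        handler = OrderHandler(raw)
    except ValueError:
        raise ValueError(f"unknown order handler {raw!r}") from None
    if handler in RETIRED:
        raise ValueError(f"order handler {handler.name} is retired")
    return handler


print(parse_handler("rlp"))  # OrderHandler.RLP
try:
    parse_handler("1")       # a bare integer flag no longer means anything
except ValueError as exc:
    print(exc)               # unknown order handler '1'
```

Because the retired mode still has a name, reusing it by accident fails loudly at parse time rather than silently reactivating old behaviour.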

At scale, a single failure cannot be allowed to halt the deployment, either. This is an architectural mistake.
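One way to reconcile "a single failure must not halt the deployment" with "never run mixed versions" is to let the rollout retry and continue past a failing host, but to gate activation of the new behaviour on every host reporting the expected version. A minimal sketch, assuming a hypothetical `deploy_fn` (push code to one host, report success) and `version_fn` (ask a host what it actually runs); the host names and version strings in the example are made up.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RolloutResult:
    activate: bool          # safe to switch the new behaviour on?
    stragglers: List[str]   # hosts not running the target version


def roll_out(
    hosts: List[str],
    target_version: str,
    deploy_fn: Callable[[str], bool],
    version_fn: Callable[[str], str],
    max_attempts: int = 3,
) -> RolloutResult:
    """Deploy to every host, retrying failures, then verify before activation."""
    for host in hosts:
        # A failing host is retried, but never stops the rollout to the others.
        for _ in range(max_attempts):
            if deploy_fn(host):
                break
    # Verify independently of what deploy_fn reported.
    stragglers = [h for h in hosts if version_fn(h) != target_version]
    return RolloutResult(activate=not stragglers, stragglers=stragglers)


# Illustrative fleet with hypothetical host names.
installed = {f"smars-{i}": "old" for i in range(1, 9)}


def fake_deploy(host: str) -> bool:
    if host == "smars-8":
        return True  # claims success but never actually updates
    installed[host] = "rlp-1"
    return True


result = roll_out(list(installed), "rlp-1", fake_deploy, installed.__getitem__)
print(result)  # RolloutResult(activate=False, stragglers=['smars-8'])
```

The verification step deliberately ignores what the deploy step claimed; a straggler that reports success is exactly the case it is meant to catch.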

As Hunsader explained it: "In some stocks like Nokia and Exelon, we saw lots of what looked like wash trades, and we thought this has to be the same player on both sides." Knight did not retest the Power Peg code after moving the cumulative quantity function to determine whether Power Peg would still function correctly if called.

My point is that Wealthfront is doing Continuous Deployment in an environment where even small changes can cause substantial harm, and they seem to be doing fine at it.

The principle is exactly the same as sending a mass customer mailing to a holding pen for validation: to ensure that your macros are correct, that nobody is sent the same message twice, and to throttle delivery. I don't think that's true, and places like Wealthfront and Etsy are good counterexamples. Rather than trying to pretend they will never make mistakes, they assume they will and work to be ready for it.
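The holding-pen principle carries over to order flow: stage what a strategy generates, reject duplicates, and release only as much as a throttle permits per time window, holding the rest for a human to review. A minimal sketch under stated assumptions: the `HoldingPen` class, the per-window cap and the message-ID duplicate check are invented for illustration, and the window is advanced by an explicit `tick()` call rather than a real clock.

```python
from collections import deque
from typing import Deque, Set


class HoldingPen:
    """Hypothetical staging area: drops duplicates, sends at most `cap` items per window, holds the rest."""

    def __init__(self, cap_per_window: int) -> None:
        self.cap = cap_per_window
        self.sent_this_window = 0
        self.seen: Set[str] = set()
        self.held: Deque[str] = deque()

    def submit(self, msg_id: str) -> str:
        if msg_id in self.seen:
            return "duplicate"       # never send the same message twice
        self.seen.add(msg_id)
        if self.sent_this_window < self.cap:
            self.sent_this_window += 1
            return "sent"
        self.held.append(msg_id)     # over the cap: keep it for review
        return "held"

    def tick(self) -> None:
        # Start a new window. Held items are not auto-released.
        self.sent_this_window = 0


pen = HoldingPen(cap_per_window=3)
results = [pen.submit(m) for m in ["a", "b", "a", "c", "d", "e"]]
print(results)          # ['sent', 'sent', 'duplicate', 'sent', 'held', 'held']
print(list(pen.held))   # ['d', 'e']
```

Holding rather than dropping over-limit items keeps the failure visible: a sudden pile-up in `held` is itself a signal that something upstream is misbehaving.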

… equities blew itself up in spectacular fashion and had to remove itself from trading entirely.

However, I found it disappointing when I actually tried to put the best practice into action. Had Knight implemented an automated deployment system, complete with configuration, deployment and test automation, the error that caused the Knightmare would have been avoided. When the market opened at 9:30 AM, people quickly knew something was wrong.

Additionally, their process (or lack thereof) was inherently prone to error. I'm also somewhat surprised that Knight's mark-to-market and realized losses during the day on Wednesday were only $200MM; I would have expected them to be larger. There are a number of tools in use, serving different communities, but most of them operate in the context of a single server.

Basically, Power Peg would keep track of the child orders and stop them once the parent order was completed. So that's sort of like a meta-model factor. What about the people who lost their jobs because, oops, there was a bug? Guy overloaded a flag and in a future upgrade forgot that it was overloaded.
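The stopping mechanism described above can be sketched in a few lines: a parent order is worked by sending child orders, and a cumulative-quantity counter is the only thing that tells the loop to stop. This is a simplified, hypothetical model (the function name, the assumption that every child order fills completely, and the `max_children` demo cap are all invented); it only illustrates why disconnecting the cumulative-quantity tracking turns a bounded loop into an unbounded one.

```python
def work_parent_order(parent_qty: int, child_qty: int,
                      track_cumulative: bool = True,
                      max_children: int = 1_000) -> int:
    """Send child orders until the parent is filled; return how many were sent.

    Simplified model: every child order fills immediately and completely.
    `max_children` exists only so the broken variant terminates in this demo.
    """
    cumulative_filled = 0
    children_sent = 0
    while children_sent < max_children:
        if track_cumulative and cumulative_filled >= parent_qty:
            break                             # parent complete: stop sending
        children_sent += 1
        if track_cumulative:
            cumulative_filled += child_qty    # the counter the stop condition relies on
    return children_sent


print(work_parent_order(1_000, 100))                          # 10
print(work_parent_order(1_000, 100, track_cumulative=False))  # 1000, stopped only by the demo cap
```

In the broken variant nothing in the order logic itself ever says "done"; only an external limit stops it.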

As Baruch's professor Donefer put it: "I'll tell you right now that they tested it dozens of different ways." It looks more like a product failure: a failure to understand what you are really doing. The Cynefin framework provides a better characterization of this 'DevOps' failure.

They are a rapidly growing company, but they aren't NASA building space shuttle software. Another answer relies on observation: there have in fact been several major newsworthy trading software crashes in the … What could possibly go wrong? In the past, there were rules so people don't send money to the wrong place in the stock exchange.

Background

Knight Capital Group is an American global financial services firm engaging in market making, electronic execution, and institutional sales and trading.

On August 1, 2012, Knight Capital went from having $365 million in available cash to $460 million in debt. If the deployment to all servers had worked, they would have been OK. We'll wait to see what happened with Goldman. After all, a configuration represents an executable specification.

What seemed to surprise folks in the industry most about the mishap is that Knight has cultivated a reputation as one of the best market-making firms in the business, with trading systems that are considered among the best. Automation is a tool, but it is only one tool and it still requires a craftsman to wield it appropriately. Similarly, relying on environment variables is extremely risky. Allspaw's post on this incident (http://www.kitchensoap.com/2013/10/29/counterfactuals-knight...) is much better. When they realized they had a problem, the first likely suspect would be the new market-making software.
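If configuration must come from environment variables at all, it can at least be read strictly: fail at startup on a missing or unrecognized value instead of silently falling back to a default. A minimal sketch; the variable name `ORDER_ROUTER_MODE` and its allowed values are hypothetical.

```python
import os
from typing import Mapping, Optional

ALLOWED_MODES = {"rlp", "standard"}  # hypothetical set of valid modes


def read_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the router mode from the environment, or raise; never guess a default."""
    env = os.environ if env is None else env
    raw = env.get("ORDER_ROUTER_MODE")
    if raw is None:
        raise RuntimeError("ORDER_ROUTER_MODE is not set; refusing to start")
    mode = raw.strip().lower()
    if mode not in ALLOWED_MODES:
        raise RuntimeError(
            f"ORDER_ROUTER_MODE={raw!r} is not one of {sorted(ALLOWED_MODES)}"
        )
    return mode


print(read_mode({"ORDER_ROUTER_MODE": "RLP"}))  # rlp
```

Passing the mapping in explicitly (defaulting to `os.environ`) also makes the check easy to exercise in tests without touching the real process environment.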

And it won't tell anyone about it, because that's not its function. You can't leave land mines lying around and then blame the poor guy who steps on one. If you find yourself afraid to pull old code out, you've probably got a problem.