Not to mention deployment scripts that check return codes. We can also only hope that references to "written test procedures" for the unused code refer to systematic tests, as opposed to a

It is worth following the story of Knight Capital to understand the need for orchestration. On August 1, 2012, Knight relied primarily on its technology team to attempt to identify and address the SMARS problem in a live trading environment. Humans make mistakes. https://en.wikipedia.org/wiki/Knight_Capital_Group

What a shock. Unfortunately, the trading algorithm the program was using was a bit eccentric as well. Never re-purpose a variable. Where is the business rule that says "first do no harm"?

There was no kill-switch (and no documented procedures for how to react) so they were left trying to diagnose the issue in a live trading environment where 8 million shares were being traded. That multiplied the problem until the eventual kill switch. Configuration is as much a part of your program as code is, and configuration changes should go through the same lifecycle - pull request, code review, release, deploy to staging
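As a rough illustration of what a kill switch could look like, here is a minimal sketch. It assumes a single gross-exposure limit; the `KillSwitch` class, its field names, and the numbers are hypothetical and are not taken from Knight's systems.

```python
from dataclasses import dataclass


class TradingHalted(Exception):
    """Raised when an order is submitted after the limit has been breached."""


@dataclass
class KillSwitch:
    # Hypothetical limit: maximum gross notional exposure before halting.
    max_gross_exposure: float
    gross_exposure: float = 0.0
    halted: bool = False

    def record_order(self, quantity: int, price: float) -> None:
        """Account for one outgoing order, halting if the limit would be breached."""
        if self.halted:
            raise TradingHalted("kill switch already tripped")
        notional = abs(quantity) * price
        if self.gross_exposure + notional > self.max_gross_exposure:
            # Trip the switch and reject this order and every later one.
            self.halted = True
            raise TradingHalted("exposure limit exceeded; order rejected")
        self.gross_exposure += notional


switch = KillSwitch(max_gross_exposure=1_000_000.0)
switch.record_order(quantity=5_000, price=100.0)  # 500,000 notional, accepted
try:
    switch.record_order(quantity=6_000, price=100.0)  # would reach 1,100,000
except TradingHalted as exc:
    print(f"halted: {exc}")
```

The point of the design is that the switch stays tripped until a human resets it, so nobody has to diagnose the fault while orders keep flowing.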

Although not as damaging as the May 2010 flash crash, the Knight glitch highlights structural problems that have contributed to the botched Facebook initial public offering and sapped investor confidence. Code with unused, and therefore untested, configuration possibilities is a disaster waiting to happen.
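One way to keep unused configuration from lingering is to validate every flag at startup and refuse to run when a retired or unknown name appears. The sketch below assumes a flat dictionary of flags; the flag names and the `validate_flags` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical registry of flags the current release actually understands.
KNOWN_FLAGS = {"route_retail_orders", "enable_dark_pool"}
# Hypothetical list of retired names that must never be set or reused.
RETIRED_FLAGS = {"power_peg"}


def validate_flags(config: Dict[str, bool]) -> List[str]:
    """Return one error message per flag that is retired or unknown."""
    errors = []
    for name in sorted(config):
        if name in RETIRED_FLAGS:
            errors.append(f"{name}: retired flag must not be set or reused")
        elif name not in KNOWN_FLAGS:
            errors.append(f"{name}: unknown flag")
    return errors


problems = validate_flags({"route_retail_orders": True, "power_peg": True})
for problem in problems:
    print(problem)
```

A check like this could run in code review and again at process start, so that a configuration change gets the same scrutiny as a code change.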

Since they were unable to determine what was causing the erroneous orders, they reacted by uninstalling the new code from the servers it was deployed to correctly. There’s always a spread between the two prices, with the "ask" being a few cents or more above the "bid". This will significantly defray much of Knight Capital’s losses for the day, but we don’t know if it’s enough to allow the firm to survive the blow. The rocket's inertial guidance system failed to convert a piece of data from a 64-bit format to a 16-bit format.

Reversing the errant trades cost almost half a billion dollars. In 2012 Knight was the largest trader in US equities, with a market share of around 17% on each of the NYSE and NASDAQ.

I asked why he did that and he replied that he didn't want yet another flag in the system.

They also set the buy/sell points well outside where the markets were currently trading to ensure that nothing would actually execute.

This pushed a high-speed stock trading company that did $21 billion in daily trades to the brink of bankruptcy in 45 minutes. In October 2012, the SEC convened technology and trading experts in Washington, D.C., to discuss best practices and systems used for generating and routing orders, matching trades, confirming transactions, and sending

On August 1, Knight did not have supervisory procedures concerning incident response.

That's foolish. So once they realize that something is wrong, they roll back the code, but the bug still stays… Precious minutes have gone by. An old pal of mine who's following the story closely (and is also deep in both IT and trading) told me that the company set up the software to work with

On the other hand, I don't think that the Knight Capital IT guys would just fire the program up for testing and not put in solid parameters to ensure that it wouldn't execute trades. According to its website, the firm's market-making unit executed a daily average of $19.56 billion worth of equities in June, with a volume of 3.1 billion shares.

Knight's largest business is market making in U.S. equities. As a former manager I know what it's like when a software professional wants to over-engineer the hell out of something, and as a software engineer I'm also aware of the need for proper processes. The remainder of the document is definitely worth a read, but importantly recommends new human processes to avoid a similar tragedy. When computer bugs affect the financial markets -- something that's happening more and more often -- the losses can be tallied precisely.

None of the ops failures leading to the bug were related to humans; rather, they were most likely due to horrible deployment scripts and woeful production monitoring. More specifically, Knight did not have supervisory procedures to guide its relevant personnel when significant issues developed.

That would have triggered enough clues early on to the engineers and ops folks. No hot-hot failover to a cluster with the previous version. Any time your deployment process relies on humans reading and following instructions you are exposing yourself to risk.
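To make the point about automation concrete, here is a minimal sketch of a deployment step that checks return codes and then verifies every server instead of trusting a manual checklist. The `deploy_all` function, its callables, and the server names are hypothetical; in a real script `deploy_one` might wrap `subprocess.run(...)` and return its `returncode`.

```python
from typing import Callable, Dict, List


class DeployError(Exception):
    """Raised when any server fails to deploy or reports the wrong version."""


def deploy_all(
    servers: List[str],
    deploy_one: Callable[[str], int],
    get_version: Callable[[str], str],
    expected_version: str,
) -> None:
    # Step 1: deploy everywhere and stop on the first non-zero return code.
    for server in servers:
        code = deploy_one(server)
        if code != 0:
            raise DeployError(f"{server}: deploy exited with code {code}")
    # Step 2: verify each server independently rather than trusting step 1.
    stale = [s for s in servers if get_version(s) != expected_version]
    if stale:
        raise DeployError(f"servers not on {expected_version}: {stale}")


# Demo with in-memory stand-ins: "srv3" silently keeps the old version.
versions: Dict[str, str] = {"srv1": "2.0", "srv2": "2.0", "srv3": "1.0"}
try:
    deploy_all(list(versions), lambda s: 0, versions.__getitem__, "2.0")
except DeployError as exc:
    print(f"deploy blocked: {exc}")
```

The second step matters because a deploy command can exit cleanly and still leave one machine on old code, so new functionality should only be enabled once every server reports the expected version.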

Initially, Knight Trading group had multiple offices located in the United States and in other cities around the world. What they should have done is kill Power Peg with fire, deploy, verify, and THEN deploy the new functionality. They may get at least a partial reprieve. Background: Knight Capital Group is an American global financial services firm engaging in market making, electronic execution, and institutional sales and trading.

