Anna Savarin - Accidental Bad Guys: Systemic Problems at Scale
Do you know what happens when you add hundreds of nodes to your monster Hadoop cluster? Neither did I, until I embarked on my DevOps journey. Whether related to the NameNode, an external provider, or an internal dependency, the issues that arise can be surprisingly disruptive.
Let's take a look at some (shameful) operational incidents in a massively distributed environment, all with one common theme: the attacker is not who you think. Local mirrors, network and application throttling, more pessimistic network configurations, avoiding cascading effects -- even just paying attention! -- can keep you from pissing off lots of people in your building and on the internet. Whether you are more Dev or more Ops, and no matter the size of your operation, you will find that scaling up linearly can bring exponential trouble.
Find out why every character in this story got angry at some point, and what everyone learned in the end.