[OP] The decay of reliable infrastructure.
I started writing this six months ago as a terrible opinion piece surrounding cloud computing in general. In it, I shredded many cloud conceptions regarding scalability/cost and highlighted the fact that not all needs fit in cloud sized containers. (I’m sure there is a docker joke in there somewhere).
However, my problems ultimately boiled down into two categories;
A) The people who force the statement: “[we] need to cloud!”. (usually directors being courted by the amazon sales team)
B) Cloud providers reliability. (or, their lack of it, and pushing of issues up the stack)
The former is a company problem, and isn’t an issue with the concept of the cloud at all, it’s more of a
nail->hammer issue. However! The latter is certainly where most of my gripes lie.
I was #triggered when I read an internal memo from our Director of Infrastructure at Ubisoft- and while his post and name will remain out of this article for job security and personal accountability reasons- nonetheless I will give you the gist of the article.
“Hello team, as you know we had a pretty big outage a few weeks ago.”
“As a service provider we give you 3 availability zones, don’t expect them to be up”
“Resiliancy is an application problem.”
“Don’t blame us unless all of our availability zones are down”.
Even if you’re new to the server administration industry; the above should look familiar to you. It is basically Amazons policy when it comes to managed hosting.
This is a deep concern of mine because historically if a database server went offline it was the Operation Engineers job to ensure that everything came back properly and that no data was lost during the transition, now they have no access to the infrastructure since it’s all consolidated and the people who are in charge of recovery do not care about individual bits.
If you’ve ever designed highly consistent systems like banking systems or even something a little less consistent like fucking profile saves you understand that from a developer perspective it’s already very difficult to attain consistency without corruption or loss of committed transaction. And I’m underselling that fact.
The point is this; leaky abstractions aside, “the cloud” has successfully made Sysadmins at large not try for consistency or availability. The onus is now on Developers and so far that hasn’t gone down well*.
I’m writing this at 2am in the morning because to be perfectly honest this kind of thing pisses me off, operations staff complain about devops taking their jobs and yet they do nothing to ensure that their role is actually completed properly.
But that’s just my opinion as a person trained by greybeards.
* - and that’s just this year so far.