At some point in the past few years, all of us were sold on the idea that boxing people into roles was not an ideal situation. It was not so much that we wanted every one to be a polyglot but that we wanted people to care about more than just what they had worked on. For example a developer should care about:
1. Page performance, error reporting etc when developing relevant parts of the application.
2. How the application she/he developed is performing in production.
3. How the infrastructure is created and maintained that serves as production.
so on and so forth.
What I have noticed recently (possibly could have existed long since then) is that while the developers are leaning towards understanding the state of the world as it exists post deployment, we are perhaps not allowing sys admins an opportunity to contribute pre deployment. The extent of their involvement as it exists is during the inception phase, designing the infrastructure (Thank you AWS) and possibly informing the development team of setting up endpoints for monitoring. At no point are we allowing them the opportunity to write code.
The unidirectional flow emerges from a reasonable assumption. Knowledge of how an application is going to be deployed and used is very beneficial for structuring the application itself at times. And a developer has a better chance of quickly pinpointing source of any and all errors that emerge upon usage. All these perceptions, and more have correctly started the thought process that a developer _should_ care and know more about things like deployment, monitoring etc. But its a two way street.
One of the other aspects of the DevOPS movement has been the refusal to silo people. And while the above reasons point out why silo’ing developers is bad, siloing ops is still happening to some extent in my opinion. I look around and I can count on one hand the number of sys admins who are writing code that does not have anything to do with an AWS stack, or a monitoring plugin. We are somehow still preventing operations from pairing on straight and simple business logic development. This is a case for many companies. A simple answer to this is usually an explanation of the workload that requires pure operational expertise. That usually happens when:
1. Sys admins are still treated as a separate pool of knowledge which feature teams regularly pull from in order to get work done. They are not embedded into feature teams as a resource for purely that team.
2. A business is doing so much work that it is impossible for a sys admin to be focussed on expanding her/his development skills. So they must concentrate on operational exercises and actually require help from developers to do the small hanging fruit while they do the more important stuff.
There are other reasons but the above are the main ones that I keep hearing. Both of these are not reasons but excuses that can be worked at in a similar manner to how we convinced developers that caring about prod is good. But the lack of that thought process is whats troubling. We are not even willing to think about our day to day work in a different manner which would in the long term allow sys admins to contribute more towards development.
A question was asked recently when I was discussing this issue with some people as to the benefit of a sys admin pairing on development that is not ops focussed. I see many:
1. Continuously working on different areas of the application would give a sys admin an in-depth idea of what was important and what was not. This can feed into monitoring prioritisation.
2. Applications can be structured better to deliver things like faster page times, better DB structures and more efficient use of external dependencies based on the experience and knowledge of the sys admin.
And many more.
There is also the case that we are neglecting the people who might want to learn more about software development. Not all, but there are definitely those who do want to. And if the reason why they cannot or don’t want to participate is that their workload does not allow them to, then that is an issue much bigger than propagation of dev ops culture.
Hiring good sys admins is not hard. Hiring good engineers is hard. Hiring good engineers is hard and both sets of people, developers and system administrators fall into this category, IMHO, of engineers. If you simply wanted to hire anyone, thats not hard. Hiring someone good, well thats a different thing altogether.
I understand that there are obstacles to achieving the goal of helping sys admins contribute to writing non operational code but we should at least acknowledge that there is a gap and do something about it. Treat them as a any development resource just as we are starting to think of developers as possible operational resources.
Its time we made this stream bidirectional.
Some days ago Matt and I had to deploy a rails application. This was an application that both Matt and I had no idea existed about 2 days before we ended up deploying it so we were quite looking forward to it on a very rainy wednesday! (Why does it always rain when I am deploying???)
A day before Matt had tried to deploy the application to staging to see whether we could test it out before production got her hands on it, but the staging environment had not been puppetised for this particular app. When we realised that deployment to staging might not possible, we spoke to our resident friendly release manager Aaron and laid out several plans. The first action was for me to test the shi^H^H^H hell out of the app locally (with production data) as much as I could while Matt tried his best to puppetise staging.
Early wednesday morning, I managed to test the app completely and ensured that the changes some awesome developer *cough* had made worked. Matt had also finished his spike with staging but no goodness. So we both looked at each other and decided to deploy it, to production herself.
Four hours, a plate of sushi and a coffee later, the thing had been deployed and we had ensured that it worked in production and informed the relevant parties!
From this all, I have learnt a few things:
1. Even though we both knew nothing about the app, I ensured that Matt was involved in it from the very beginning, and it helped a great deal. Springing deployment surprises on OPS is not cool and helps no one, least of all you as a developer. JIRA and Zendesk are cool, but nothing, nothing will actually substitute going over and talking to your fellow OPS people.
2. Puppet. Puppet is cool. The fact that we could quickly spike things in staging and decide whether to deploy there or not was simply because puppet allowed us that level of automation. Invest of in such tools, whether they be puppet, chef or babushka
3. We did not panic and delay the release when we found out that we could not test it in staging. We sat with Aaron and figured out the risks of each scenario and associated rollbacks. We did as much testing as we could do before hand and then went ahead with the deployment. To quote my colleague Jon Eaves, “Fear is the mind Killer”.
Above all, it was great fun to work with Matt, and deploy something along the way :). This has made me realise that software development, and deployment, is amazingly rewarding when done with awesome people.
My thoughts on DevOPS on the REA Engineering Blog.
It should not be marketed as position to slot one person in, or even rotate people through. A software developer who writes the code should be as concerned about his/her code working in production as he/she is in development. Definitely operations are more experienced with deployment and maintenance of products in production like environments, but that does not mean a developer should not base his software decisions before being aware of implications of deploying it in production, or be unaware of the status of his code after it passes quality assurance. The thinking behind devops is the critical bit of software development.
On the second and final day of the YOW 2010 conference, there were a few management oriented sessions in the evening, a few technically descriptive ones in the morning and two very nice focussed ones in the middle. These were the DevOPS session presented by Michael .T. Nygard and one about taming large builds by Chris Mountford
Chris started talking about how they worked at Atlassian, particularly on JIRA and how they have problems testing and managing the tests. He showed us a few screen shots from many of the builds and frankly they were monstrously large, especially the one about JIRA, which between the time he had made his presentation and presented it, grew larger. He mentioned some very nice techniques that could be used to prevent and/or maintain monstrously large builds as they are very bad. Why ? Because builds are there to give you better feedback and quick. A build that takes 10 hours is of no use in an Agile world. Another thing, which might interest the business owners more, is that large builds have large turn around time and consequently more time needed by the developer to fix the build, which is less time she or he is spending on the work that the business wants done. The one thing I figured that Atlassian does that we don’t do are Canary Builds. But soon, I think we will. A very good session that was very informative, and slightly reassuring that we are not the only ones wrestling with huge monstrous builds.
It was refreshing to have another session which based on actual events and the conclusions and/or lessons that could be and were drawn from them. I have seen many speakers over the last two days conjure up scenarios to fit the example on the upcoming slide, but this was very nicely structured around real world events, with pictures as proof. The whole session was basically an operational and development postmortem of a few events and why those events demonstrate the invaluable aspect of no barriers between developers and operations.
Then Michael went on about how they diagnosed incidents along with developers after chaotic incidents that led to downtime. How they went step by step through any causes and how they actually constructed a KanBan wall without realising it. How process went out of the window and how that had both good and bad consequences. He also reflected on how that defined, to an extent, the intersection of the roles a system administrator and a software developer.
All in all, of all the talks in these two days, these two are the ones I have enjoyed the most. Practical, simple, and extremely well presented.
Today was fun. A lot of fun! I paired all day today with Jared Hunter from operations and we spent all day deploying about four applications written in three different languages to two different environments using three different deployment tools. And today’s attempts have made me realise a lot of things
OK fine, I am lying. It wasn’t a lot of things!!!
But this experience has made me realise a few things!
Operations are over worked. I knew this fact of course. I knew this in the same sense as I know getting shot or being stabbed hurts. There is a lot of difference between knowing and realising. We spent nearly two hours in total debugging failed deployment because certain assumptions were made in the applications which did not coincide across all environments. Sometimes the environments were at fault, sometimes the applications.
I have learnt that one cannot leave deployment towards the end of the iteration. It must be planned at the same time as when we kick our iteration off. When we discuss the work planned in the iteration, it is at that point, we must write our deployment notes, and keep adding to them as work keeps getting done. The deployment notes should be nothing less than a technical card and must be approached by the business in the exact same way as developer asking to remove some technical debt.
The problem with leaving deployment planning towards the end of the iteration is that people have to now remember work they did at the start. Do it at the same time as you hand over the card to QA and you are done with it; no more chasing up after the fact.
I have learnt that having consistent deployment procedures when you deploy applications written in three different languages is not possible. Layers and layers of decoration buffers must be applied between the deployment procedure and the applications for that to happen and God help you if you have to then debug something. Java is very different from Ruby and Perl. And even though the latter are both interpreted, the way they manage dependencies and interact with other system layers is different. At least the way we do it. Choose one platform, if you can, and stick with it, if you can. It makes deployment, debugging, rollbacks and dependencies much much easier.
I learnt Puppet is very cool. I managed to give access to a bunch of users across different environments, documented the change and kept the history of the change for posterity using puppet in a few simple steps. I will however have to shelve my idea of using Babushka and Puppet together. (I think I had had too much coffee at that time).
Last thing I learnt is that canadians have a very good sense of humour, no matter how much you needle them about maple syrup, or ice hockey or you know, basically being Canadian. They are a lot of fun to work with! Mad props to Jared and Kerri for giving me that belief. Come to think of it, even the tree we associate with Canada, the Maple tree and the tree associated with Kashmir, the Chinar tree are similar.