[The following is a comment I made on Lobsters [1] when asked about our development methods. I think it's good enough to save, and what better place to save it than this here blog. So here it is.]
First off, our stuff is a collection of components that work together. There are two front-end pieces (one for SS7 (Signaling System 7) traffic, one for SIP (Session Initiation Protocol) traffic) that talk to the back-end (which implements the business logic). The back-end makes parallel DNS (Domain Name System) queries [2] to get the required information, mucks with the data according to the business logic, then returns data to the front-ends, which ultimately return the information to the Oligarchic Cell Phone Companies. Since this process happens as a call is being placed, and we are on the Oligarchic Cell Phone Companies' network, we have some pretty short time constraints. And because of that, not only do we have some pretty severe SLAs (Service Level Agreements), but any updates have to be approved by said Oligarchic Cell Phone Companies 10 business days before deployment. As a result, we might get four deployments per year [3].
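To give a feel for the "parallel DNS queries" part: the sketch below (mine, not the production code, which speaks the DNS protocol directly and multiplexes over UDP) just shows the shape of the idea, issuing several lookups at once and waiting for all of them, here with one POSIX thread per blocking `getaddrinfo()` call. The names queried are placeholders.

```c
/* Hedged sketch: parallel name lookups via one thread per query.
   The real back-end speaks DNS directly; this only illustrates
   issuing several queries at once and waiting for all of them. */
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

#define NQUERIES 3

struct query
{
  const char *name;  /* name to resolve */
  int         okay;  /* 1 if the lookup succeeded */
};

static void *do_lookup(void *arg)
{
  struct query    *q = arg;
  struct addrinfo  hints;
  struct addrinfo *res = NULL;

  memset(&hints,0,sizeof(hints));
  hints.ai_family   = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;

  /* getaddrinfo() blocks, so each query runs in its own thread */
  q->okay = (getaddrinfo(q->name,NULL,&hints,&res) == 0);
  if (res != NULL) freeaddrinfo(res);
  return NULL;
}

int main(void)
{
  /* "localhost" resolves without the network, so this runs anywhere */
  struct query queries[NQUERIES] =
  {
    { "localhost" , 0 },
    { "localhost" , 0 },
    { "localhost" , 0 },
  };
  pthread_t threads[NQUERIES];
  int       i;

  for (i = 0 ; i < NQUERIES ; i++)
    pthread_create(&threads[i],NULL,do_lookup,&queries[i]);
  for (i = 0 ; i < NQUERIES ; i++)
    pthread_join(threads[i],NULL);

  for (i = 0 ; i < NQUERIES ; i++)
    printf("query %d: %s\n",i,queries[i].okay ? "resolved" : "failed");
  return 0;
}
```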
And the components are written in a combination of C89, C++98 [4], C99, and Lua [5].
So, now that you have some background, our development process. We do trunk-based development (all work done on one branch, for the most part). We do **NOT** have continuous deployment (as noted above). When working, we developers (who never numbered more than three) would do local testing, either with the regression test, or with another tool that lets us target a particular data configuration (based off the regression test, which starts eight programs, five of which are needed just for the components being tested). Why not test just the business logic? Said logic is spread throughout the back-end process, intermixed with all the I/O (Input/Output) it does (it needs data from multiple sources, queried at the same time).
Anyway, code is written, committed (main line), tested, fixed, committed (main line), repeat, until we feel it's good. And the “tested” part not only includes us developers, but also QA (Quality Assurance) at the same time. Once it's deemed working (using both regression testing and manual testing), we then officially pass it over to QA, who walks it down the line from the QA servers, to the staging servers, and finally (once we get permission from the Oligarchic Cell Phone Companies) into production, where not only devops is involved, but also QA and the developer whose code is being installed (at 2:00 am Eastern, Tuesday, Wednesday or Thursday, never Monday or Friday).
Due to the nature of what we are dealing with, testing at all is damn near impossible (or rather, hideously expensive, because getting actual cell phone traffic through the lab environment involves, well, being a phone company (which we aren't), very expensive and hard-to-get equipment, and a very expensive and hard-to-get laboratory setup (one that will meet FCC (Federal Communications Commission) regulations, blah blah yada yada)), so we do the best we can. We can inject messages as if they were coming from cell phones, but it's still not a real cell phone, so there is testing done during deployment into production.
It's been a 10-year process, and it had been getting better, until this past December.
Now it's all Agile, scrum, stories, milestones, sprints, and unit testing über alles! As I told my new manager, why bother with a two-week sprint when the Oligarchic Cell Phone Companies have a two-year sprint? It's not like we ever did continuous deployment. Could more testing be done automatically? I'm sure, but there are aspects that are very difficult to test automatically [6]. Also, more branch development. I wouldn't mind this so much, except we're using SVN (Subversion) (for reasons that are mostly historical at this point) and branching is … um … not as easy as in git. [7] And the new developer sent me diffs to ensure his work passes the tests. When I asked him why he didn't check the new code in, he said he was told by the new manager not to, as it could “break the build.” But we've broken the build before; all we do is fix the code and check it in [8]. But no, no “breaking the build,” even though we don't do continuous integration, nor continuous deployment, and what deployment process we do have pins the Jenkins build number of whatever does get pushed (what's considered “gold”).
Is there any upside to the new regime? Well, I have rewritten the regression test (for the third time now) to include such features as “delay this response” and “did we not send a notification to this process.” I should note that this is code for us, not for our customer, which, need I remind people, is the Oligarchic Cell Phone Companies. If anyone is interested, I have spent June [9] and July [10] blogging about this (among other things).
[1] https://lobste.rs/s/uqe2ww/kubernetes_maximalism#c_p1o0c9
[2] https://boston.conman.org/
[9] http://boston.conman.org/2021/06
[10] http://boston.conman.org/2021/07
[11] https://lobste.rs/s/uqe2ww/kubernetes_maximalism#c_9uaopq