Ugh, no. I've worked in a codebase where CI would reject changes that had too much "code complexity". You'd constantly have to find clever ways to split up your code, when doing so did not make sense, to appease the complexity checker. Oh yeah, and if you ever make a one-liner change, you might end up being forced to do a full refactor because that one line pushed the complexity threshold over the edge. The results: PITA for developers and worse code. What a crock of shit.
I was removing unused functionality, and the work was held up because the code complexity was too high (never mind that I'd just reduced it). I don't know what tool they were using, and they didn't share the output. When I asked them to document what they found, I got a mouthful about this not being their responsibility.
Sounds like a group that didn't understand the tool and just put it in place. Metrics and tools make good servants but poor managers.
Sounds like a place to leave as soon as possible.
Unless the tool just measured the amount of change and flagged it for review, which might make sense, since you can also mess things up by removing things you _think_ are unused.
Just run.
I'm sorry about your experience, but from the sound of it, the problem was with your team. Any measure can be turned into a bad policy; code complexity is a red herring here.
Couldn't you just raise the threshold by tweaking the config with your PR instead?
I do this all the time with other automated checkers (linters, etc.), and I don't see why this should be different. If another human agrees, it shouldn't be a problem.
Because if you change that config file, the people responsible for it will be added to the review, and will beat you with a stick for touching it without consulting them about the change?
That sounds like a people problem that will manifest in all sorts of terrible ways and not really an indictment of code complexity metrics.
Cyclomatic complexity (which I think is the most commonly used measure) also doesn't really map to what people understand as complex. A giant switch statement with one return per case isn't nearly as complicated as two large nested if/else blocks. Nesting almost inevitably means more contexts you need to keep track of when reading code, particularly nested conditionals.
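For illustration, a minimal Python sketch of that point (names and branches invented): both functions below have the same cyclomatic complexity, but only one forces you to track nested contexts.

```python
# Flat dispatch: four paths, each branch readable in isolation.
def flat_dispatch(code):
    if code == "A":
        return 1
    elif code == "B":
        return 2
    elif code == "C":
        return 3
    else:
        return 0

# Nested conditionals: the same number of decision points, but to understand
# any leaf you must hold every enclosing condition in working memory.
def nested(a, b, c):
    if a:
        if b:
            return 1
        if c:
            return 2
        return 3
    return 0
```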
Use the tool, don't become its slave.
I always had good experiences with it. I found abstractions that really did make sense, instead of treating it like a chore.
Did you also have the same feedback from your IDE?
In other words would it have been less painful if you didn't have to suffer the long iteration times required to get feedback from some remote CI job?
The thing is that cyclomatic complexity, for example (the most popular complexity measure that linters use), doesn't make sense. Most of the time, high cyclomatic complexity in a method is indicative of high business-logic complexity... which is fine.
And dogmatically saying that methods shouldn't have that many lines and branches, and that you should just come up with better abstractions, doesn't help anyone; whereas having closely related functionality concentrated in a single place, rather than synthetically exploded into N different files, does help.
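As a hedged illustration of that point (the rules and field names here are invented), a method like the following scores high on cyclomatic complexity simply because the business has many rules; splitting it across N abstractions would not remove a single essential branch.

```python
def shipping_rate(order):
    # Each branch is one business rule; the branching is essential,
    # not an artifact of bad structure.
    if order["express"]:
        return 25.0
    if order["weight_kg"] > 20:
        return 18.0
    if order["country"] != "US":
        return 15.0
    if order["total"] >= 100:
        return 0.0
    return 7.5
```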
I've found most CI checks like this are a crock. Forgot empty parentheses for that function definition with an arity of 0? Bzzzzzzzt! Sorry! Build failed.
It's just foolish.
The only reason a CI should ever fail is if it catches a defect before it makes it into production.
> The only reason a CI should ever fail is if it catches a defect before it makes it into production.
Mechanical checks for things like formatting rules, linting errors, and reasonable tools to verify code complexity, as long as they don't produce false positives, are all important to run as part of CI.
This talks about code complexity a lot. That, however, is not the chief source of complexity in many code bases. The number of tools needed to build things is the bane of the modern software developer. Also, the use of microservices where none are needed results in an enormous increase in complexity. Code complexity is relatively easy to fix compared to all of this.
I have seen a team be forced to use microservices for things still running on the same machine years later, with absolutely no benefit other than the powerpoint architecture slides looking fancier.
> I have seen a team be forced to use microservices for things still running on the same machine years later, with absolutely no benefit other than the powerpoint architecture slides looking fancier.
The main driving force for microservices is not technical but organizational. Therefore, there are plenty of non-technical issues, such as the budget to grow and split a team, that make or break the adoption of this style of architecture.
Maybe. These particular ones were never properly decoupled.
That may well have got them extra sales, which gave the team pay rises....
Yup. Long gone are the days of concise packaging in single executables.
Quite literally, software engineers today spend most of their time fighting dependencies and poorly built delivery machines.
second this. toolchain complexity is way more of a PITA than code complexity for me these days.
Toolchain simplicity is a key reason for why I like Go. One binary to do it all.
Exactly. Every dependency is a liability.
Indeed; I once tried to compose yet another list of factors contributing to (or approaches to estimating) software project complexity, with code complexity being just one out of 18, and not a particularly outstanding one. Perhaps it's just a bad title.
The real answers about complexity come from thinking about why we even care. It's because our feeble minds have to build internal models of the code so we can work with it. The cognitive aspects of building those models is why complexity matters.
What things make it more difficult to build those models? A partial list, mostly as others have mentioned:
- tool and library dependencies
- nested conditions
- loops and especially nested loops
- asynchronous processing, callbacks, etc
- non-descriptively named variables and functions
- using non-standard code patterns for standard functionality
- delocalized code, as in, you have to navigate somewhere else to see it (throws off your working memory)
By one study, developers using Eclipse for Java spent 27% of their time just doing code navigation.
The starting point for code complexity is about how our minds work.
As many people have said, "it's easier to write a program than read it."
Not just build internal models of the code, but also an internal model of the execution of the code; for example, a mental model of the scoping rules. As a developer you have to read a function and build a model of _when_ variables are assigned values: you have to execute the code in your head to understand what the state will be at a particular point in time. I think a big part of complexity is having to imagine what the current in-flight state is; the larger the in-flight state, the harder it is to reason about how the code will interact with it.
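A small Python example of that in-flight state (entirely illustrative): to know what the variables hold at the marked point, you have to run the loop in your head.

```python
def summarize(readings):
    total = 0
    best = None
    for r in readings:
        if r < 0:
            continue          # skipped values never touch the state
        total += r
        if best is None or r > best:
            best = r
    # <-- here: what are total and best if every reading was negative?
    return total, best

print(summarize([-1, -2]))   # (0, None): an in-flight state easy to miss
```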
I would argue complexity is also the wellspring of corner cases. The more corner cases, the more mercurial the software, and that isn't just due to the limitations of our minds.
Some blog linked from here (maybe jvns.ca?) made the case that the depth of your project's software dependency tree is an important metric. The more crap you have to pull in, the more things can go wrong. You're better off with a large program with no dependencies than with a somewhat smaller program and a ton of dependencies.
Language features on the other hand can let you develop complex programs quickly and reliably, by catching errors before the code gets deployed and so on.
What about the depth of dependencies of any part of the standard libraries for a given language?
I think that doesn't count, as long as the install comes in one piece. That's how Python got its popularity: its "batteries included" approach meant you got lots of stuff in the stdlib instead of having to chase it all over the interweb. Unfortunately, they seem to have abandoned that approach in more recent times.
Python has a lot of batteries included, but nearly everyone finds it necessary to bring in "requests." And then once you have one pip package, what's a few more?
I use urllib instead of requests just to get rid of a dependency. It does all the same stuff with slightly uglier calls. More to the point, Python Central now seems to favor shovelling stuff off to external dependencies.
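For comparison, a stdlib-only GET with urllib (the URL is just a placeholder):

```python
import json
import urllib.request

# Standard library only: slightly uglier calls, zero third-party dependencies.
with urllib.request.urlopen("https://api.example.com/data") as resp:
    data = json.loads(resp.read().decode("utf-8"))

# The `requests` equivalent, at the cost of a pip dependency:
#   import requests
#   data = requests.get("https://api.example.com/data").json()
```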
Isn't the standard library definitionally depth=1?
One thing I hate is having something like 20 different Python repos for one small company. At most places I've worked, you have basically one thing you do business-wise, but it's split into what I believe are arbitrary repo delineations. This causes trouble with dependency resolution in your build systems and IDEs, and it increases cognitive load and merge complexity for changes across repos.
Just put the whole folder structure into one repo! This would reduce build complexity, and you can set up the configs for your tools and CI once rather than 20 times. You can still have several services out of one repo if you want to, but it's easier to reason about, and it's easier to change those service delineations later; with 20 repos you're having to "clone and cut code" to separate things. In one repo, you just move code around as you split or merge services.
I routinely create a "super repo" for myself at these companies using submodules, so that I can actually work with the code more easily, but that still requires me to check in maybe 5 or more PRs for one feature, so it's not ideal. This only solves the developer's problems with local tools and still requires more complex debugging since the services are not actually in one repo under one config for deployment.
I always liked the notions of coupling and cohesion because they are simple to understand, and you can see at a glance whether a particular bit of code would have good metrics for those, without actually bothering to compute them. E.g. a long list of parameters or imports == high coupling; a large number of functions in a module == low cohesion. Specifying exactly how much isn't that useful; it's easier to think in terms of "relative to the rest of the code", and debating what is too high or too low even less so. But if you are struggling with a particular bit of code, being able to identify why it is hard to deal with is useful, especially if you know how to fix it.
But mostly, metrics should not be telling you things you can't already know just by looking at the code; if it looks complicated, it probably is. Metrics only become useful when you need to tell without looking. Sometimes that's useful.
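A quick Python sketch of the "long parameter list == high coupling" heuristic (all names are hypothetical):

```python
from dataclasses import dataclass

# High coupling: callers must supply many loosely related details.
def render_report(title, rows, currency, locale, tz, logo_path, footer):
    return f"{title}: {len(rows)} rows ({currency}, {locale})"

# Grouping related parameters into one cohesive value object makes the
# coupling visible at a glance and easier to manage.
@dataclass
class ReportStyle:
    currency: str
    locale: str
    tz: str
    logo_path: str
    footer: str

def render_report_v2(title, rows, style: ReportStyle):
    return f"{title}: {len(rows)} rows ({style.currency}, {style.locale})"
```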
There were attempts to predict bugs by looking at complexity metrics. As I recall, the research found that when you adjusted for code size, none of the metrics mattered. In other words, just use LOCs as your metric.
This is correct.
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=code...
AFAIK the one unambiguously relevant metric for how many bugs you will find in a codebase is how many its "clients" define as acceptable.
Any kind of internal code quality affects the productivity of the debugging procedure, not the final number of bugs.
The last section on coupling reminded me of the concept of connascence[1], which I've found really helpful when talking about code.
[1]
Thanks for this link. TIL. Also, the videos linked from the site were useful.
My first job as a dev was for a consultancy, and I had to review a code base for a client to support their argument that they required a rewrite. I had no idea about any of this, so I googled metrics, found cyclomatic complexity, and wrote a load of bullshit about how the results of analyzing the code base showed it was complex. It served its purpose (they got corporate to accept a rewrite) but I've never used those metrics again.
Number of lines of code is a good metric for complexity. The hard part is estimating how many lines of code is reasonable for a specific feature. Some complexity is necessary. The problem is unnecessary complexity.
> Number of lines of code is a good metric for complexity.
I don't agree with that premise. LOC are a metric for code size, not for complexity.
I've found that in practice the number of statements is a more reliable indicator for code size than lines of code. (For typical imperative languages anyways.)
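A rough sketch of the distinction using Python's own ast module (the sample snippet is arbitrary):

```python
import ast

def loc_and_statements(source: str):
    # Non-blank lines vs. AST statement nodes in the same source.
    loc = sum(1 for line in source.splitlines() if line.strip())
    stmts = sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))
    return loc, stmts

example = "x = 1; y = 2\nif x:\n    y += x\n"
print(loc_and_statements(example))  # (3, 4): 3 lines, but 4 statements
```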
That's not a premise -- it's a data-driven conclusion. Lines of code correlates strongly with all other complexity metrics, and is way easier to compute and explain to someone else.
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=code...
Lines of code is a very good approximation so long as each line is compiled into a similar number of bits across different systems and languages.
The total size of the compiled source code in bits is literally the entropy of the system, so it's the definition of complexity.
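A hedged sketch of a related idea: approximating information content by how well the source compresses (a crude stand-in for Kolmogorov complexity, not literally the compiled bit count described above).

```python
import zlib

def compressed_size(source: str) -> int:
    # Compressed byte size as a rough proxy for information content.
    return len(zlib.compress(source.encode("utf-8"), 9))

one_copy = "result = []\nfor i in range(10):\n    result.append(i * i)\n"
five_copies = one_copy * 5

# Five verbatim copies add lines but little information; compression shows it.
print(compressed_size(one_copy), compressed_size(five_copies))
```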
I think applying that definition, the way you do, to the issue of source code complexity is outlandish.
Why is it outlandish? You're confusing the reliability of using source lines of code as a metric for measuring the productivity of developers with measuring the complexity of a system. It's a bad metric for measuring productivity but a good metric for measuring complexity.
The problem with using lines of code as a metric for developer productivity is precisely that it leads to developers introducing unnecessary complexity into the system since they try to add as many lines of code as possible for implementing any feature.
There is no drawback in keeping the source lines of code to the minimum amount necessary to get the job done.
It may be outlandish because minimizing the number of bits, when taken to the extreme as in Algorithmic Information Theory, leads to very obfuscated code [1].
[1]
https://www.ioccc.org/2012/tromp/hint.html
I find this to be true even in cases when you have large projects. Sometimes the scale of the project will necessitate adding 'extra complexity' to a specific set of core modules but this is only in order to reduce the complexity of other peripheral modules. IMO, you should never add extra lines of code to a module beyond what is immediately necessary unless it allows you to reduce even more lines of code in other parts of the code.
I've never proposed using LOC as a measure of productivity. I won't entertain your dishonesty any further.
Well in that case I really don't understand your reasoning. I can't think of any other reason why you cannot see that lines of code is the closest and most measurable representation of the information content of the system's logic. Information content/entropy is the most rigorous way to measure complexity.
IMO, lines of code is even better at measuring complexity than compiled bytecode because it accounts for complexity from the developer's point of view (which is what the question is asking).
While some lines of code require more effort from a typical developer to understand than other lines, it doesn't matter so much once they're averaged out over thousands of lines and thousands of different developers (each with their own slightly different perception of complexity). It's reasonable to factor out individual perception of complexity.
I propose a challenge, similar to the Obfuscated C challenge, to devise code that is impenetrable to the human mind and yet fantastically clean to all metrics.
There are a lot of things you can do to lower these sort of metrics without addressing actual complexity, sweeping the problem under the rug. Perhaps a better metric for software complexity would be the amount of work the computer has to do, or the number of instructions it has to execute.
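A hedged sketch of that rug-sweeping, assuming a checker with a per-function complexity cap: the logic is unchanged, every fragment now scores trivially, and the whole is harder to follow than the expression it replaced.

```python
# Original: one readable function that a strict checker might flag.
def check(x):
    return x > 0 and x % 2 == 0

# "Fixed" for the metric: each helper has minimal complexity, but the
# reader must now chase three indirections to see one simple rule.
def _is_positive(x):
    return x > 0

def _is_even(x):
    return x % 2 == 0

def check_v2(x):
    return _is_even(x) if _is_positive(x) else False
```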
Isn't this true for all metrics? They are useful for pointing out problematic areas, but as soon as you start optimizing based only on the metric, things get worse.
Number of instructions is probably not a good metric. Without any loops/jumps you might have a lot of instructions but very low complexity.
An endless loop executes a lot of instructions, but does not have to be complex.
Complexity isn't a single thing though. The problem can be complex, a specific implementation can be complex, the codebase can be complex.
I recall implementing some linear algebra numerical code. The problem was a bit complex, resulting in a bit of code complexity.
However I realized I had some extra information I hadn't used, and I spent half a day going over the math again. After a couple of pages of derivations I could narrow down the result to a couple of dot products.
So, I ended up with a commit where I had 100 or so lines of comments including equations to justify my two lines of code.
The implementation became super-simple, but why it worked was suddenly not so simple. I had effectively moved complexity from code-space to problem-space.
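A hedged sketch of what such a commit can look like (the setting is invented and only mirrors the shape of the change):

```python
import numpy as np

def project_residual(y, u, v):
    # Derivation (abridged): because u and v are constructed to be
    # orthonormal upstream, the full least-squares solve collapses to
    # two dot products. In a real commit of this kind, pages of commented
    # equations would justify exactly this reduction.
    return y - np.dot(u, y) * u - np.dot(v, y) * v
```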
The problem is not computer processing speed. The problem is human cognitive load.
"Perhaps a better metric for software complexity would be the amount of work the computer has to do"
By that definition, a loop doing a simple calculation a billion times would count as very complex, even though it is very easy to understand.
LOC would be better, even though code can be dense and complicated, or very verbose and simple.
> Perhaps a better metric for software complexity would be the amount of work the computer has to do, or the number of instructions it has to execute.
Complexity is a measure of human difficulty in comprehension, not mechanical difficulty in execution...
Our work as developers pushes us to make many decisions, from architectural design to code implementation. How do we make these decisions? Most of the time, we follow what "feels right"; that is, we rely on our intuition.
So, no engineering best practices. That explains the quality of software very well.
The Einstein quote is that "something should be as simple as needed, but no simpler"; but how do we pin down "needed" in an objective, quantifiable way?
Somehow it is more important to measure "gratuitous" complexity: redundant complexity that is not justified by present or plausible future requirements...
The problem is that the code itself does not capture requirements, so code analysis can give you absolute indicators but never an "efficiency" measure (how efficient and justified the measured complexity is).
Isn't the general goal rather to avoid/eliminate overly complex cruft (that every programmer should be able to recognize) than to find the maximally simple solution to a problem?
That seems like a more tractable goal if confined to local analysis (something like lines of code in a function or file unit), but people generally also try to come up with an overall complexity measure, which seems harder.
The number one metric: Consistency. If an app is similar to itself in all places, it's very easy to understand. Better yet, if it's similar and consistent with how other things have been built, we can call it "clean code".
Back in the day we called these things architecture, but I'm old and salty.
What if the code is consistently crappy and complex all around? Would that make it simple? Trick question, old man. It would not. I am not exactly young, but damn I am sweet.
You're not wrong. "Consistency" is a poorly defined principle and basically used to justify "what I'm doing is good (consistent). What you're doing is bad (inconsistent)"
And it was called Uniformity back in the day, not "architecture".
A better name for "consistency" is "following project conventions" which IS well defined.
I agree that consistency is good in general. I don't think "project conventions" is well defined.
In most places, project conventions are either not actually defined anywhere or, if they are defined in writing, they're usually either very old and outdated compared to the conventions everyone is actually using, or it's just one guy updating the text and hitting everyone else over the head with the document to push his opinion through.
I personally like to be 'locally consistent'. I don't care how old and crusty the code base is. If the file that I have to change or add to calls everything a "giraffe", I will call my stuff "giraffe" as well, even if it really is a "gorilla". If I start calling it a gorilla, nobody will understand that the gorilla is the same as the giraffe if they don't have the same background knowledge I have. Unless I do a refactoring and I am changing the giraffes to gorillas. Which might either be a first PR to "clean up" or a follow up PR.
Unfortunately I see so many people not doing that and it wreaks havoc with the code base. Especially if we're now outside of the place that defines the giraffes and gorillas. It's really hard for the caller to figure out that they're one and the same thing.
>In most places, project conventions are either not actually defined anywhere or, if they are defined in writing, they're usually either very old and outdated compared to the conventions everyone is actually using, or it's just one guy updating the text and hitting everyone else over the head with the document to push his opinion through.
So either admit your project doesn't have conventions, in which case, don't nit people who don't follow whatever convention exists in your head but isn't documented, or document the project conventions. You cannot have your cake and eat it too.
That "one guy updating the text" is at least explicitly documenting expectations.
Per your "local consistency" comment, if "giraffe" is called "gorilla" in every other file (aka "global consistency"), it sounds like you're just setting up more work for some developer to take care of. Consider leaving a comment and start using the globally consistent name, rather than propagating more inconsistency (technical debt). Perhaps more aptly put, what you call "local consistency" sounds like "global inconsistency" to me.
Oh I absolutely agree with that statement. If you have someone in your place that is actually just documenting what everyone is doing and what everyone is doing is consistent, that's awesome! More power to them.
At my current place we do a lot through linters and automatic code formatting for example and we collectively agree on when and how to change the configs for that. Eliminates a whole class of "arguments" (X spaces vs. tabs anyone?) and it's relatively easy to "convert" new hires to it as well. They can either adapt to it, make a really good argument for changing the configs via a widely circulated PR or they aren't a cultural fit to us.
My point is that this guy usually isn't. In the vast majority of places I've been or seen, it's the other version of him, i.e. the equivalent of the guy at the regulars' table who edits Wikipedia to prove his point in a discussion.
The number of confused comments from PMs when engineering timelines are provided for "easy tasks." Why does moving an ad above the fold require 3 weeks? Because it implicates 6 different teams.
I didn't see anything on Function Point counts in the article. Not advocating it _per se_, but it was, at one time, considered one of the more useful ways to evaluate codebase complexity.
FP analysis is a measure for the "amount of functionality" not complexity. Maybe in relation to, e.g., the number of statements (statements/FP) it could make sense.
Number of function points is, given implementation language, strongly correlated with lines of code. Lines of code is, in turn, strongly correlated with all other complexity measures.
In other words, function point count is a measure of complexity.
What if I told you that most software complexity doesn't come from the code but from the software requirements?
Verbosity.
Number of unnecessary abstractions.
No mention of function points.
I agree. Certain functions are way more complicated, and harder to show value for, than others, which is really hard for some managers to comprehend.
Functionality like undo/redo takes a lot more planning, coordination, and integration effort than something like a simple export function, but good luck selling that to any marketing or product owner for xx man-days.
I still think this subject is way too specific to be generalized like this, but general rules of thumb still apply, like good estimation and technical planning.
Function points measure the complexity of the problem you are solving, not of the software you are writing.
tldr, blood pressure
If it looks ugly, or you have to navigate too much (long files or a lot of external dependencies/dependency chains)... it's complex.
That's all you need to know, really.
It is a simple, systematic, math-based method, `The Math-based Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with Principle-based Warehouse/Workshop Model`; it makes development a simple task of serial and parallel functional pipelined "CRUD".
- Its mathematical prototype is the simple, classic, vivid elementary-school mathematics problem of "water flowing into and out of a pool", widely used in social production practice.
- The code must meet the following three basic quality requirements before you can talk about anything else. These simple and reliable evaluation criteria are enough to eliminate most unqualified code.
- Function evaluation: just look at the shape of the code (pipeline structure weight), and whether the function is a pure function.
- Functional pipelined dataflow evaluation: a data flow has at most two functions with side effects, and only at the beginning and the end.
- System evaluation: just look at the circuit diagram; you can treat each function as a black box, like an electronic component.
- Code quality visualization:
  - For Lisp languages, the S-expression is a contour graph; it can very simply be transformed into a contour map, or a 3D mountain map.
  - If the mountains are not high, and the altitude values are similar, the quality of the code is good.
  - For non-Lisp languages, you can convert the source code into an abstract syntax tree (AST), and then into a contour map, or a 3D mountain map.
"Simplicity, Unity, order, symmetry and definiteness." ---- Lin Pengcheng, Programming aesthetics
"The chief forms of beauty are order and symmetry and definiteness, which the mathematical sciences demonstrate in a special degree." ---- Aristotle, "Metaphysica"
My programming aesthetic standards are derived from the basic principles of science. Newton, Einstein, Heisenberg, Aristotle and other major scientists hold this view.
The aesthetics of non-art subjects are often complicated and mysterious, making it difficult to understand and learn.
The pure function pipeline data flow provides a simple, clear, scientific and operable demonstration.
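A minimal Python sketch of such a pipeline (illustrative only, not the author's reference implementation): side effects appear only at the two ends, and every middle stage is a pure function.

```python
from functools import reduce

def read_input():            # impure: I/O at the start
    return ["3", "1", "2"]

def parse(items):            # pure
    return [int(s) for s in items]

def square(nums):            # pure
    return [n * n for n in nums]

def total(nums):             # pure
    return sum(nums)

def write_output(value):     # impure: I/O at the end
    print(value)

def pipeline(value, *stages):
    return reduce(lambda acc, stage: stage(acc), stages, value)

write_output(pipeline(read_input(), parse, square, total))  # prints 14
```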
Simplicity and Unity are the two guiding principles of scientific research and industrial production.
- Unification of theories is the long-standing goal of the natural sciences; and modern physics offers a spectacular paradigm of its achievement. It can be found from the knowledge of various disciplines: the more universally applicable a unified theory, the simpler it is, and the more basic it is, the greater it is.
- The more simple and unified things, the more suitable for large-scale industrial production.
- Only the simple can be unified, and only the unified can be truly simple.
In the IT field, only two systems fully comply with these 5 programming aesthetics:
- Binary system
The biggest advantage is that it makes calculation reach the ultimate simplicity and unity, which is how digital logic circuits were produced, and then the large-scale industrial production methods of computer hardware.
- The Math-based Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with Principle-based Warehouse/Workshop Model
- Software and hardware are factories that manufacture data, so they have the same "warehouse/workshop model" and management methods as the manufacturing industry.
- From the perspective of system architecture, it is a warehouse/workshop model fractal system. It abstracts every system architecture into a warehouse/workshop model.
- From the perspective of component, it is a pure function pipeline fractal system. It abstracts everything into a pipeline.
- It adheres strictly to 10 principles and 5 aesthetics, and it consists of 5 basic components.
- It uses "operational research" methods to schedule the workshop so that tasks are completed in the optimal order and with maximum efficiency.
The Math-based Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with Principle-based Warehouse/Workshop Model
https://github.com/linpengcheng/PurefunctionPipelineDataflow