-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Short Description
Service codebases inherited from CSM often make extensive use of package var globals, where a package declares some var PublishedThing and functions both inside and outside the package access and modify it directly. This is an anti-pattern and makes it difficult to debug, refactor, and test service code.
We should refactor service code to use structs with methods, instantiate struct instances, and pass those instances to the subsystems that need them instead.
Definition of Done
Code that currently uses package vars instead uses a struct or simple type instance, which functions either either instantiate or receive as arguments.
For variables derived from the environment, only one package (usually main) pulls from the environment to create instances. This package passes those instances to all downstream package functions.
If there are cases where we wish to continue using a package var, we have a documented design review, expected usage, and appropriate safeguards.
Context
Example
What it looks like
The power-control database is the DSP field of a globals struct, and the service main() builds it from the environment before persisting it to the package var.
Downstream functions then retrieve the global singleton and invoke its methods
power-control uses similar globals for its SMD client, Vault client, Redfish client, etc.
How it breaks
BSS uses a kvstore global to store its etcd connection. We added a no-etcd mode to BSS, but some actions caused it to segfault. This segfault occurred because the checkState() function expected something to set kvstore to a value other than nil, but nothing does so in no-etcd mode. Because checkState() uses the kvstore package var, rather than receiving an argument, it's not obvious to a developer refactoring upstream functions (buildBootScript()) that they're not providing something that downstream code wants.
Philosophy
Why is using package vars bad?
Using package vars saves some up-front design work. You do not have to think about how you'll pass your structs from where they're populated to where they're used; they're simply always available to anything that wants to use them,
Past this quicker up front work, however, package var globals don't provide any benefits I know of. While figuring out how you'll ferry structs to downstream functions isn't necessarily trivial, developer UX within functions that use them is the same once you get there.
Using globals does have a lot of downsides, however:
- No chain of custody: any function with access may set (or fail to set) a package var. If another function tries to use it and finds an unexpected value (or absence of value), you cannot easily trace back to find why. When using argument var, that chain is clear. This makes it easier to identify the root cause of bugs.
- Poor readability: related to the above, an unfamiliar developer cannot understand the flow of data through the code at a glance using globals. This makes refactoring particularly difficult: the compiler is much less able to warn you when you've forgotten to provide important state.
- Concurrency is hard: you cannot handle concurrency with a simple "use separate instances" model where appropriate. The var is a singleton, and if two functions have to interact with it simultaneously, they'll either need to manage locks or pray that race conditions don't happen. We currently rely on the latter, bad option.
- Testing is hard: tests will often want to set up separate instances of structs to test different segments of functionality. To do this, test harnessses must be able to set up differently-configured instances. With a package var, tests must all use the same var, and will likely not be able to run parallel. For example, since power-control only has the one storage driver, we have to run separate CI jobs to test Postgres and etcd. Unit testing is particularly hard--mocking isn't really an option.
I am not alone in this. Others have covered the pitfalls of globals extensively:
- https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil
- https://old.reddit.com/r/golang/comments/15x4qla/am_i_wrong_about_making_everything_global/
- https://embeddedartistry.com/fieldatlas/the-problems-with-global-variables/
- https://dave.cheney.net/2020/02/23/the-zen-of-go#_avoid_package_level_state
How to fix it
Refactoring to remove globals is not trivial. You will need to design new structs to hold state information and change function signatures to pass them. These are breaking changes, and many globals are used extensively througout the codebase. As such, you likely won't want to have other ongoing work while you replace them, as attempting to merge after will be difficult unless the other changes are trivial.
I'd like to eventually provide an example PR to demonstrate what a replacement looks like in code. Pending that, at a high level, the refactoring process is:
- Create an issue in the service repo, link it to this one, and describe the scope you intend to tackle.
- Remove the package var definitions.
go vetshould now yield many complaints about undefined variable access. - If you aren't already familiar with how other code uses the variable, review it. You'll need a high-level understanding of which subsystems use which components to inform the next step.
- Sketch out structs that will contain the former globals and use methods to interact with them. You'll want to group related functions together to avoid always passing former globals as argument, but should take care to avoid bundling too much functionality together.
- Rewrite
main()andTestMain()to instantiate the new structs and initialize them with instances of the former globals.
Refactoring all globals (or at least all globals in a given package) at once will better inform your class struct design, but may be too much work or difficult for colleagues to review all at once. Develop a plan of attack in the review step and consider whether you want to break work into multiple PRs.
Exceptions
There are some cases where a package var may make sense. For example, the standard http package exposes a built-in instance of its Client struct (and friends), to allow consumers to use methods like Get() without instantiating their own client. You can, however, create your own client instance, and probably should for most complex use cases. Cases where we want to continue publishing a var will need to pass through design review, provide a way to instantiate private copies also, and provide for concurrency safety.
Pseudo-constants are another case for vars: Golang does not allow declaration of constant structs with mutable fields (such as a map). However, if you don't need to expose no-instance methods
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Status