cross-posted from: https://programming.dev/post/6513133
Short explanation of the title: imagine you have a legacy mudball codebase in which most service methods query the database (through EF), modify some data, and then save it at the end of the method.
This code is hard to debug, impossible to write unit tests for and generally performs badly because developers often make unoptimized or redundant db hits in these methods.
What I’ve started doing is making all the data loads before the method call, putting the data in a generic cache class (it’s mostly dictionaries internally), and then passing that to the method as a parameter or a member variable. Everything in the method then gets data from or saves data to that cache; it’s not allowed to do db hits on its own anymore.
I can now also unit test this code as long as I manually fill the cache with test data beforehand. I just need to make sure that I actually preload everything in advance (which is not always possible) so I have it ready when I need it in the method.
Is this good practice? Is there a name for it, whether it’s a pattern or an anti-pattern? I’m tempted to say that this is just a janky repository pattern but it seems different since it’s more about how you time and cache data loads for that method individually, rather than the overall implementation of data access across the app.
In either case, I’d like to learn how to improve it or how to replace it.
It sounds like you’re more-or-less describing memoization? A method caches its input and output, often in some sort of dictionary, and utilizes that cache to return if it receives the same input multiple times.
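For comparison, here is a minimal C# sketch of memoization as described here. The `Memoizer` helper is a made-up name for illustration, not a library type, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: wraps a pure function so that repeated inputs
// are answered from a dictionary instead of being recomputed.
public static class Memoizer
{
    public static Func<TIn, TOut> Memoize<TIn, TOut>(Func<TIn, TOut> compute)
        where TIn : notnull
    {
        var results = new Dictionary<TIn, TOut>();
        return input =>
        {
            // Same input seen before: return the stored output.
            if (results.TryGetValue(input, out var cached))
                return cached;

            // First time for this input: compute once and remember it.
            var computed = compute(input);
            results[input] = computed;
            return computed;
        };
    }
}
```

The key point is that the cache maps a method's input to its output, so the second call with the same argument never runs the wrapped function.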
I’m not caching or reusing method results however, and even the inputs are not necessarily cached for multiple uses. I’m just preparing all potentially required input data before the method is actually called so I don’t have to do any loads within the method itself, so the method is just pure code logic and no db interaction.
For example, imagine you have a method that scores the performance of an athlete. The common “pattern” in this legacy code base is to just go through the logic and make a database load whenever you need something, so maybe at the beginning you load the athlete, then you load his tournament records, then a few dozen lines later you load his medical records, then his amateur league matches, etc.
What I do is I just load all of this into a cache before the actual method call, and then send it into the method as a data source. The method will only use the cache and do all the calculations in-memory, and when it’s done the result would be in the cache as well. Then outside of the method I can either trigger a save to persist the result or abandon it. If I want to unit test it, I can easily just manually fill a cache with my data and use it as the data source (usually you’d have to mock a custom response from the repository or something like that, inject an in-memory repository with the same data anyway, or just resign yourself to using an integration test).
It’s like I’m “containerizing” the method in a way? It’s a pretty simple concept but I’m having trouble googling for it since I don’t know what to call it.
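A minimal sketch of how that could look, assuming hypothetical `Athlete`, `TournamentRecord`, `MedicalRecord`, `ScoringData` and `AthleteScorer` types and a made-up scoring rule; none of these names come from the actual codebase, and the database loading and saving steps are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities; in a real project these would be EF entity classes.
public record Athlete(int Id, string Name);
public record TournamentRecord(int AthleteId, int Placement);
public record MedicalRecord(int AthleteId, bool FitToCompete);

// The "cache": plain in-memory data, filled before the scoring method runs.
public class ScoringData
{
    public Dictionary<int, Athlete> Athletes { get; } = new();
    public List<TournamentRecord> Tournaments { get; } = new();
    public List<MedicalRecord> MedicalRecords { get; } = new();

    // Results are written back here; the caller decides whether to persist them.
    public Dictionary<int, int> Scores { get; } = new();
}

public static class AthleteScorer
{
    // Pure logic: reads and writes only the ScoringData, never the database.
    public static void Score(int athleteId, ScoringData data)
    {
        if (!data.Athletes.ContainsKey(athleteId))
            throw new InvalidOperationException($"Athlete {athleteId} was not preloaded.");

        // Made-up rule for illustration: better placements earn more points.
        int points = data.Tournaments
            .Where(t => t.AthleteId == athleteId)
            .Sum(t => Math.Max(0, 10 - t.Placement));

        // Made-up rule: an athlete marked unfit scores zero.
        bool unfit = data.MedicalRecords
            .Any(m => m.AthleteId == athleteId && !m.FitToCompete);

        data.Scores[athleteId] = unfit ? 0 : points;
    }
}
```

In production code a loader would fill `ScoringData` from the database before the call and a separate step would persist `Scores` afterwards. A unit test only has to build a `ScoringData` by hand, call `AthleteScorer.Score`, and assert on `data.Scores`, with no mocks involved.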
What language are you using? It’s a good idea to limit db calls, but maybe we can help with specific techniques idiomatic to your language.
Ah sorry, forgot to mention it here because I originally posted it on csharp and then crossposted. I’m specifically thinking about C#, EF and .NET Core for web dev.
I don’t know .NET and sometimes write quite janky code, but I think in this case I would preload everything I definitely needed, locking the records I’m modifying. Then use ConcurrentDictionary.GetOrAdd(TKey, Func<…>) to load values I might need only when they’re needed.
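A rough sketch of that idea, assuming a hypothetical `LazyRecordCache` wrapper whose `loader` delegate stands in for a database query; record locking is not shown.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache: preloaded entries are added up front,
// everything else is loaded on first use through the loader delegate.
public class LazyRecordCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, TValue> _items = new();
    private readonly Func<TKey, TValue> _loader; // stand-in for a database query

    public LazyRecordCache(Func<TKey, TValue> loader) => _loader = loader;

    // Data that is definitely needed goes in before the method runs.
    public void Preload(TKey key, TValue value) => _items[key] = value;

    // Data that might be needed is fetched via the loader only if the key is missing.
    public TValue Get(TKey key) => _items.GetOrAdd(key, _loader);
}
```

One caveat: `GetOrAdd` does not run the factory under a lock, so with concurrent callers the loader may be invoked more than once for the same key, even though only one value ends up stored. This variant also reintroduces db hits inside the method, which trades away some of the testability of the fully preloaded approach unless the loader is swapped for a fake in tests.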
I work with a code base that is perhaps going through a similar transition. Performance hasn’t really been a consideration and so when new functionality is tacked on we’re frequently making a new API call even though we might already have the data somewhere else.
I don’t have a name for the pattern or anti-pattern, but people’s responses seem to indicate that it’s largely a good thing, the change that you’re making. I’m reminded of a Martin Fowler-esque or TDD idea that a function should either retrieve data or process it. However I wasn’t able to find a blog post about that with a quick search.
If you find or run into that article later please share it, I’d definitely like to read it!
Sounds like Command-Query Separation (CQS)
It states that every method should either be a command that performs an action, or a query that returns data to the caller, but not both.
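As a small illustration in C#, assuming a made-up `ScoreBoard` class: the query returns data without changing anything, and the command changes state without returning anything.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class used only to illustrate command-query separation.
public class ScoreBoard
{
    private readonly Dictionary<int, int> _scores = new();

    // Query: returns data to the caller and has no side effects.
    public int GetScore(int athleteId) =>
        _scores.TryGetValue(athleteId, out var score) ? score : 0;

    // Command: performs an action (changes state) and returns nothing.
    public void RecordScore(int athleteId, int score)
    {
        if (score < 0) throw new ArgumentOutOfRangeException(nameof(score));
        _scores[athleteId] = score;
    }
}
```

A method that both loaded an athlete's data and updated the score in one call would be doing both jobs at once, which is what CQS advises against.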
Sounds like the command pattern to me.
Regardless of what pattern it is, you have a clear performance need and a testable implementation. That’s a win.
Beyond looking for a pattern, I’d look at what you’re doing to make sure you’re not loading a ton of extra dependencies if you know you won’t use them.
Also, you generally want a database transaction to be one logical unit of work that all commits or all rolls back together. If you’re combining multiple transactions, that is likely what you want, but be aware that you might be holding locks for longer, so you might be introducing contention.
By the same token, make sure you’ve got records locked if you need them locked. If you had atomic updates before, or your first update locked the records you needed, you may need to lock records explicitly to keep your database consistent.