  • The distribution is super important here too. Hashing every value to zero (h(x) = 0) is valid, but it’s a terrible distribution. The challenge is hashing real-world values into a mostly uniform distribution to avoid collisions where possible.

    Still, the contents of the article are useful even outside of hashing. It should just note that the width of the output isn’t the only thing that matters in a hash function.
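
    As a quick, runnable sketch of the difference (FNV-1a stands in here for “any reasonably distributed hash”; the function names are mine, not the article’s):

    ```python
    # Both functions map strings to integers, but only one spreads keys out.
    def bad_hash(s: str) -> int:
        return 0  # a valid hash (equal inputs -> equal outputs), terrible distribution

    def fnv1a_hash(s: str) -> int:
        # FNV-1a: a simple non-cryptographic hash with a decent distribution
        h = 0xCBF29CE484222325
        for b in s.encode():
            h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
        return h

    keys = [f"user-{i}" for i in range(10_000)]
    buckets = 256
    for fn in (bad_hash, fnv1a_hash):
        used = len({fn(k) % buckets for k in keys})
        print(f"{fn.__name__}: {used}/{buckets} buckets used")
    ```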




  • Ah yes, one of the major questions of software development: to comment, or not to comment? This is almost as big of a question as tabs vs spaces at this point.

    Personally? I don’t really care. Make the code readable to whoever needs to be able to read it. If you’re working on a team, set the standard with your team. No answer is the universally correct one, nor is any answer going to be eternally the correct one.

    Regardless of whether code comments should or shouldn’t exist, I’m of the opinion that doc comments should exist for functions at the very minimum. Describe preconditions, postconditions, the expected parameters (and their types if needed), etc. I hate seeing undocumented **kwargs in a function, and I’ll almost always block a PR on my team if the valid arguments aren’t blatantly obvious from context.
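
    For instance, a hypothetical Python function whose doc comment spells out the accepted **kwargs instead of leaving callers to guess (the function and its arguments are made up for illustration):

    ```python
    def render_chart(data: list[float], **kwargs) -> str:
        """Render `data` as a simple text bar chart.

        Preconditions: `data` is non-empty and all values are positive.

        Keyword Args:
            title (str): Heading printed above the chart. Defaults to "Chart".
            width (int): Maximum bar width in characters. Defaults to 40.

        Returns:
            The rendered chart as a string.
        """
        title = kwargs.get("title", "Chart")
        width = kwargs.get("width", 40)
        peak = max(data)
        bars = ["#" * max(1, round(v / peak * width)) for v in data]
        return "\n".join([title, *bars])

    print(render_chart([1.0, 3.0, 2.0], title="Demo", width=20))
    ```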



  • It’s less of an issue now, but there were stability issues in the early days of DDR5. Memory instability can lead to a number of problems, including the PC being unable to boot (failing to POST), the PC crashing suddenly during use, applications crashing or behaving strangely, etc. Usually it’s a sign of memory going bad, but since DDR5 is still relatively young, it can also be a sign that the memory is just too fast.

    Always check and verify that the RAM manufacturer has validated their RAM against your CPU.


  • Air cooling is sufficient to cool most consumer processors these days. Make sure to get a good cooler though. I remember Thermalright’s Peerless Assassin being well reviewed, but there may be even better (reasonably priced) options these days.

    If you don’t care about price, Noctua’s air coolers are expensive overkill, or an AIO could be an option too.

    AIOs have the benefit of moving heat directly to your fans via fluid instead of heating up the case interior, but that usually doesn’t matter that much, especially outside of intense gaming.


  • Very few things need 64GB memory to compile, but some do. If you think you’ll be compiling web browsers or clang or something, then 64GB would be the right call.

    Also, higher DDR5 speeds can be unstable at higher capacities. If you’re going with 64GB or more of DDR5, I’d stick to speeds around 6000 MT/s (or less) and not focus too much on overclocking it. If you get a kit of 2x32GB (which you should, rather than buying the sticks individually), then you’ll be fine. You’ll benefit more from capacity than from RAM speed anyway.





  • Where do you draw the line on “smart” features? Should Tab insert spaces for indentation? Handling encodings or newline conventions? Deciding whether to add a newline at EOF?

    For a very basic default editor, I would expect it to include only what I typed, no “smart” features, no IDE features, nothing else, and use CRLF (on Windows) for newlines with at most a setting to configure it in the editor for that session.

    Basically, I wouldn’t expect anything more than what nano does. If I want a fancy CLI editor, I’ll install one. At its core though, it should exist only to edit the text content of a text file and do nothing else. It should be as stable as possible, and have as little scope as possible, in my opinion.

    With that said, basic text editing features, like undo/redo and cut/copy/paste would be nice. Bonus points if it even works with the system clipboard.

    Edit: to add to the question of whether an automatic newline should be added: Windows has no requirement for terminating text documents with newlines, so I would not expect one. What tools written for POSIX environments do seems irrelevant here - if a valid text document in POSIX must be terminated by a newline, then a text editor there would naturally be expected to add one, or at least support adding one, but that has nothing to do with Windows.


  • The only part of this process I’d consider automating with an LLM is summarizing the changes, and even then I’d only be interested in looking at a suggested changelog, not something fully automated.

    It’s amazing to me how far people will go to avoid writing a simple script. Thankfully determinism isn’t a requirement for a release pipeline. You wouldn’t want all of your releases to go smoothly. That would be no fun.
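
    For the record, the “simple script” here can be very simple. A minimal sketch of a changelog draft generator (assumes git is on PATH and releases are tagged; a human still edits the output):

    ```python
    #!/usr/bin/env python3
    # Collect commit subjects since the last tag as a draft changelog.
    import subprocess

    def run(*args: str) -> str:
        return subprocess.run(args, capture_output=True, text=True, check=True).stdout.strip()

    last_tag = run("git", "describe", "--tags", "--abbrev=0")
    log = run("git", "log", f"{last_tag}..HEAD", "--pretty=format:- %s (%h)")
    print(f"## Changes since {last_tag}\n\n{log}")
    ```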




  • But how can we then ensure that I’m not adding/processing products which are already in the “final” table, when I have no knowledge of ALL the products in this final table?

    Without knowledge of your schema, I don’t know enough to answer this. However, the database doesn’t need to scan every row in a table to check whether a value exists, as long as you can build an index on the relevant columns. If your products have some unique ID (or tuple of columns), then you can usually build an index on those values, which means the DB builds what is basically a lookup table for those indexed columns.

    Without going into too much detail, you can think of an index as a way for a DB to make a “contains” (or “retrieve”) operation drop from O(n) (check all rows) to something much faster, like O(log n). The tradeoff is that the index takes extra space.

    This comes with the added benefit that uniqueness constraints can be easily enforced on indexed columns if needed (see the sketch below). And yes, your PK is indexed by default.

    Read more about indexes in Postgres’s docs. Postgres actually has pretty readable documentation in my experience. Or read a book on indexes, watch a video, etc. The concept is universal.
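
    Here’s a minimal, runnable sketch of both points, using sqlite3 from Python’s standard library as a stand-in for Postgres (the `products` table and `sku` column are made up; the CREATE UNIQUE INDEX statement is essentially the same in Postgres):

    ```python
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, name TEXT)")

    # A UNIQUE index both speeds up "does this sku exist?" lookups and lets
    # the database enforce the no-duplicates rule for you.
    db.execute("CREATE UNIQUE INDEX idx_products_sku ON products (sku)")

    db.execute("INSERT INTO products (sku, name) VALUES ('ABC-123', 'Widget')")
    try:
        db.execute("INSERT INTO products (sku, name) VALUES ('ABC-123', 'Dupe')")
    except sqlite3.IntegrityError as e:
        print("duplicate rejected:", e)

    # The planner uses the index instead of scanning every row.
    print(db.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM products WHERE sku = 'ABC-123'"
    ).fetchall())
    ```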

    Could you elaborate on what you mean by read replicas? Storage in memory?

    This highly depends on your needs. I’ll link PG’s docs on replication though.

    If you’re migrating right now, I wouldn’t think about this too much. Replicas are basically duplicates of your database hosted on different servers (ideally in different data centers, or even different regions if possible). Replicas work together to stay in sync, but depending on the kind of replica and the kind of query, any replica may be able to handle an incoming query (rather than a single central database).

    If all you need are backups though, then replicas could be overkill. Either way, you generally don’t want prod data stored on a single machine. I would talk to your management about backup requirements, and potentially availability/uptime requirements.


  • Pronouns are pointers. “Let us (let’s) move it over there.” Both “us” and “it” indirectly refer to something else by a new name. Like pointers, the pointees are defined by some context external to that sentence/statement (usually earlier sentences/statements or some other actions). The meaning of “us” and “it” can change as well in different contexts, and as such, those words are not bound to one value (and “rebinding” those words by changing contexts does not change the values they were previously bound to).
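
    A loose sketch of the analogy in Python (the names and “boxes” are mine): rebinding the name changes what it refers to without mutating what it previously referred to.

    ```python
    box_a = {"contents": "books"}
    box_b = {"contents": "tools"}

    it = box_a    # "it" currently refers to box_a
    it = box_b    # context changes; "it" now refers to box_b
    print(box_a)  # unchanged: {'contents': 'books'}
    ```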


  • This seems like the same problem that lifetimes solve in Rust - tracking when values are no longer used and thus fall “out of scope”. Automated tooling should really be doing lifetime analysis of these values, and that seems to me like it would fall well outside the scope of what GenAI can be trusted to do.

    If this is such a huge problem, are you able to create finalizers that close the resources instead, or better abstractions for managing the lifetimes of these resources? I don’t write Java anymore, but this seems like a problem better solved by other tools.
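
    To illustrate the “better abstraction” idea (in Python rather than Java, with a made-up `Connection` resource): tie the resource’s lifetime to a scope so cleanup is deterministic, instead of hoping callers remember to close it. Java’s try-with-resources expresses the same pattern.

    ```python
    from contextlib import contextmanager

    class Connection:
        def close(self) -> None:
            print("connection closed")

    @contextmanager
    def open_connection():
        conn = Connection()
        try:
            yield conn    # the resource is live only inside the with-block
        finally:
            conn.close()  # closed deterministically when the scope ends

    with open_connection() as conn:
        pass  # use conn; cleanup runs even if this block raises
    ```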


  • If you are new to something and want to learn, seek resources and educate yourself with them. Learning takes time, and there are no shortcuts.

    A hot DB should not run on HDDs. Slap some NVMe storage into that server if you can. If you can’t, consider getting a new server and migrating to it.

    SQL Server can generate execution plans for you. Generate them for your queries and see if you’re doing any operations that scan the entire table. You should avoid scanning an entire table with a huge number of rows when possible, at least during requests.

    If you want to do some kind of dupe protection, let the DB do it for you. Create an index and a table constraint on the relevant columns. If the data is too complex for that, find a way to make it work - generate and store hashes, sort lists/dicts into a canonical form, etc. - just so the DB can do the work for you (see the sketch below). The DB is better at enforcing constraints than you are (when it can do so).
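
    A sketch of the hash approach (table and column names are made up; sqlite3 stands in for SQL Server): canonicalize each record so equal data always hashes the same, then let a UNIQUE constraint on the digest reject duplicates.

    ```python
    import hashlib, json, sqlite3

    def record_digest(record: dict) -> str:
        # sort_keys makes the JSON canonical, so key order doesn't matter
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, digest TEXT UNIQUE)")

    a = {"user": 42, "action": "login"}
    b = {"action": "login", "user": 42}  # same data, different key order

    for rec in (a, b):
        try:
            db.execute("INSERT INTO events (payload, digest) VALUES (?, ?)",
                       (json.dumps(rec), record_digest(rec)))
        except sqlite3.IntegrityError:
            print("duplicate skipped")
    ```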

    For read-heavy workflows, consider whether caches or read replicas will benefit you.

    And finally back to my first point: read. Learn. There are no shortcuts. You cannot get better at something if you don’t take the time to educate yourself on it.


  • For your second part:

    A lot of open source projects exist to make people’s lives easier at work. The people developing these projects are often devs themselves who have a use for the projects in their day jobs. It just so happens that it’s easier to use these libraries at work and share them with others when they’re more permissively licensed, and there are community benefits when people all contribute back.

    There’s nothing wrong with wanting to go the AGPL route and forcing everyone into open source, but that makes it much harder to use these tools at work, which often kills the motivation behind building them in the first place.

    I tend to be of the opinion that community tools should be GPL/AGPL, while libraries can be anything. It works as a compromise for both - so devs can have an easier time at work while also forcing contributions back to community-developed tools.

    Edit: I should also mention dual-licensing: AGPL plus a paid commercial license. That model is probably my favorite, but it’s unfortunately uncommon.