• 2 Posts
  • 494 Comments
Joined 1 year ago
Cake day: October 4th, 2023


  • BitTorrent and Hyphanet have mechanisms that do this.

    Magnet URIs are a standard way of encoding this.
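
    As a rough illustration, a magnet URI carries the hash in its xt (“exact topic”) parameter, and pulling it out only needs Python’s standard library. The helper name magnet_topics is hypothetical, not part of any library, and any URI you feed it in testing would be made up.

```python
from urllib.parse import parse_qs, urlparse


def magnet_topics(uri: str) -> list[tuple[str, str]]:
    """Return (namespace, hash) pairs from the xt parameters of a magnet URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    topics = []
    for xt in parse_qs(parsed.query).get("xt", []):
        # xt looks like "urn:btih:<hash>" or "urn:tree:tiger:<hash>";
        # everything before the last colon is the namespace.
        namespace, _, value = xt.rpartition(":")
        topics.append((namespace, value))
    return topics
```

    Calling magnet_topics() on a URI with several xt parameters returns one pair per hash, so a client can pick whichever hash type it understands.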

    EDIT: You typically want a slightly-more-elaborate approach than just handing the network a hash and then getting a file.

    You typically want to be able to “chunk” a large file, so that you can pull it from multiple sources. The problem is that, with only a hash of the whole file, you can only validate that the data is correct once you have the whole file. So, say you “chunk” the file, get part of it from one source and part from another. A malicious source could feed you incorrect data. You can detect that the end file does not hash to the right value, but then you have no idea which part of the file is invalid, so you don’t know which source’s data to re-fetch.

    What’s more-common is a system where you have the hash of a hash tree of a file. That way, you can take the hash, request the hash tree from the network, validate that the hash tree hashes to the hash, and then start requesting chunks of the file, where a leaf node in the hash tree is the hash of a chunk. That way, you can validate data at a chunk level, and know that a chunk is invalid after requesting no more than one chunk from a given source.

    See Merkle tree, which also mentions Tiger Tree Hash; TTH is typically used as a key in magnet URIs.
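
    A minimal sketch of the hash-tree idea in Python. This is not the TTH format: it assumes SHA-256 (Python’s hashlib has no Tiger), a tiny chunk size, and made-up function names. It only shows how a single root hash lets you validate the list of leaf hashes, and how the leaf hashes then let you validate each chunk on its own.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def leaf_hashes(chunks: list[bytes]) -> list[bytes]:
    # Leaves get a 0x00 prefix and interior nodes a 0x01 prefix, so a leaf
    # hash can never be passed off as an interior node or vice versa.
    return [sha256(b"\x00" + chunk) for chunk in chunks]


def root_hash(leaves: list[bytes]) -> bytes:
    level = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(sha256(b"\x01" + level[i] + level[i + 1]))
            else:
                parents.append(level[i])  # odd node out is promoted unchanged
        level = parents
    return level[0]


def verify_chunk(index: int, chunk: bytes, leaves: list[bytes]) -> bool:
    # Only meaningful once root_hash(leaves) has been checked against the
    # trusted root; after that, each chunk can be checked independently.
    return sha256(b"\x00" + chunk) == leaves[index]
```

    A downloader would start with only the root, fetch the leaf list, check that root_hash(leaves) matches, and then call verify_chunk() on each chunk as it arrives, so a bad chunk is pinned on the source that sent it.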

    EDIT2:

    Can’t think of a way to do it with a DHT

    All of the DHTs that I can think of exist to implement this sort of thing.

    EDIT3: Oh, skimmed over your concern, didn’t notice that you took issue with using a hash tree. I think that one normally does want a hash tree, that it’s a mistake to use a straight hash. I mean, you can generate the hash of a hash tree as easily as the hash of a file, if you have that file, which it sounds like you do. On Linux, rhash(1) can generate hashes of hash trees. So if you already have the file, that’s probably what you want.

    Hypothetically, I guess you could go build some kind of index mapping hashes to hashes of hash trees. Don’t know whether you can pull the hash off BitTorrent or something, but I wouldn’t be surprised if you can. But…you’re probably better off with hash trees, unless you can’t see the file and already are committed to a straight hash of the file.

    EDIT4:

    I mean:

    $ rhash --sha1 --hex pkgs 
    7d3a772009aacfe465cb44be414aaa6604ca1ef0  pkgs
    $ rhash -T --hex pkgs 
    18cab20ffdc55614ed45c5620d85b0230951432cdae2303a  pkgs
    $
    

    Either way, straight hash or hash of a hash tree, you’re getting a hex string that identifies your file uniquely. Just that in the hash tree case, you solve some significant problems related to the other thing that you want to do, fetch your file. Might be more compute-intensive to generate a hash of a hash tree, but unless you’re really compute-constrained…shrugs


  • Jia Tan was the username used by a group — probably a state intelligence agency — on GitHub to try to attack the xz open source package. The effort aimed at trying to take over the project, and lasted for years. They managed to get a compromised package briefly into the unstable versions of some major Linux distros that created a backdoor in the openssh daemon and came close to being widely deployed across Linux servers, which would have been a very severe compromise of a huge range of systems. The account vanished when the compromise was discovered.

    The user here is, as a joke, using the same name.

    EDIT:

    https://en.wikipedia.org/wiki/XZ_Utils_backdoor


  • tal@lemmy.today to Gaming@beehaw.org: need retro game recommendations
    5 days ago

    Those are all mature systems, and I’d say that rankings for games on old systems have reached a reasonable consensus at this point. You can just search for “best system whatever games” and get lists, look for games in genres you like; I’ve had luck doing that in the past, as that avoids a lot of the chaff.

    I personally probably have gone back and played Super Metroid the most on the SNES, but depends on what one likes. If you like RPGs from that era, different set of games.

    For this ranking of SNES games, as an example:

    https://www.ign.com/lists/top-100-snes-games/92

    1. The Legend of Zelda: A Link to the Past
    2. Chrono Trigger
    3. Super Metroid
    4. Final Fantasy VI
    5. Super Mario World

    I’d say that probably those games are going to cluster near the top of any list of SNES games.







  • Hmm.

    For the early titles listed, when the games came out, Linux was pretty irrelevant from a gaming standpoint.

    Later, many games that had cross-platform releases used engines that provided cross-platform compatibility. The early games, by contrast, would have been written directly to the platform, so I’m sure that ports weren’t as easy.

    Now, the games are very elderly. The original team will be long gone. I don’t know if there’s anyone working on those at all – unless a game represents some kind of continued revenue stream, there isn’t a lot of reason to keep engineers on a game.

    WINE runs them fine, so there’s a limited return for Blizzard to do a native port. In fact, as I recall, Starcraft was one of the first notable games that WINE ran…I remember Starcraft support being a big deal around 2001, IIRC. The original Warcraft was for DOS, so you can run that in a DOS emulator.

    I doubt that the investment in a Linux-native port in 2025 is going to get much of a return relative to what other things one could do with the same resources.

    I guess maybe I could see an argument for World of Warcraft, as a very successful, long-running MMORPG that still has players and still represents revenue. But I think that I’d be surprised to see native ports of most of their earlier library.


  • and uses btrfs send/receive to create backups.

    I’m not familiar with that, but if it permits faster identification of data modified since a given time than scanning a filesystem for modified files does, which a filesystem could potentially do, that could also be a useful backup enabler, since now your scan-for-changes time doesn’t need to be linear in the number of files in the filesystem. If you don’t do that, your next best bet on Linux – and this way would be filesystem-agnostic – is gonna require something like having a daemon that runs and uses inotify to build some kind of on-disk index of modifications since the last backup, and a backup system that can understand that index.

    looks at btrfs-send(1) man page

    Ah, yeah, it does do that. Well, the man page doesn’t say what time it runs in, but I assume that it’s better than linear in file count on the filesystem.
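
    For comparison, here is a minimal sketch of the filesystem-agnostic scan that the comment above is contrasting against. The function name modified_since is made up for illustration. It has to stat every file under the root, which is why its running time grows with the number of files, no matter how few of them changed.

```python
import os


def modified_since(root: str, since: float) -> list[str]:
    """Return paths under root whose mtime is newer than the given timestamp."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are judged by their own mtime
                if os.lstat(path).st_mtime > since:
                    changed.append(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
    return sorted(changed)
```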


  • You’re correct, and the person you’re responding to is probably treating one as an alternative to the other.

    However, theoretically filesystem snapshotting can be used to enable backups, because it permits an instantaneous, consistent view of a filesystem. I don’t know if there are backup systems that do this with btrfs today, but this would involve taking a snapshot and then having the backup system back up the snapshot rather than the live view of the filesystem.

    Otherwise, stuff like drive images and database files that are being written to while being backed up can just have a corrupted, inconsistent file in the backup.
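
    A rough sketch of what that snapshot-then-back-up sequence could look like, written as a Python function that only builds the command lines. The paths are placeholders and the function name is made up; rsync is just one possible way to copy the snapshot. Running the commands for real needs root and an actual btrfs subvolume.

```python
import shlex


def snapshot_backup_commands(subvolume: str, snapshot: str, destination: str) -> list[list[str]]:
    """Build the commands for: snapshot, copy the snapshot, drop the snapshot."""
    return [
        # -r makes the snapshot read-only, so it cannot change during the copy
        ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot],
        # copy from the frozen snapshot, not from the live subvolume
        ["rsync", "-a", snapshot.rstrip("/") + "/", destination],
        ["btrfs", "subvolume", "delete", snapshot],
    ]


def as_shell(commands: list[list[str]]) -> str:
    """Render the commands for review before anyone runs them."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

    The point of the ordering is that the copy step never reads the live filesystem, so files that are being written during the backup are captured as they were at the instant of the snapshot.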


  • Wouldnt the sync option also confirm that every write also arrived on the disk?

    If you’re mounting with the NFS sync option, that’ll avoid the “wait until close and probably reorder writes at the NFS layer” issue I mentioned, so that’d address one of the two issues, and the one that’s specific to NFS.

    That’ll force each write to go, in order, to the NFS server, which I’d expect would avoid problems with the network connection being lost while flushing deferred writes. I don’t think that it actually forces it to nonvolatile storage on the server at that time, so if the server loses power, that could still be an issue, but that’s the same problem one would get when running with a local filesystem image with the “less-safe” options for qemu and the client machine loses power.


  • NFS doesn’t do snapshotting, which is what I assumed that you meant and I’d guess ShortN0te also assumed.

    If you’re talking about qcow2 snapshots, that happens at the qcow2 level. NFS doesn’t have any idea that qemu is doing a snapshot operation.

    On a related note: if you are invoking a VM using filesystem images stored on an NFS mount, I would be careful, unless you are absolutely certain that this is safe for the version of NFS and the specific caching options for both NFS and qemu that you are using.

    I’ve tried to take a quick look. There’s a large stack involved, and I’m only looking at it quickly.

    To avoid data loss via power loss, filesystems – and thus the filesystem images backing VMs using filesystems – require write ordering to be maintained. That is, they need to have the ability to do a write and have it go to actual, nonvolatile storage prior to any subsequent writes.

    At a hard disk protocol level, like for SCSI, there are BARRIER operations. These don’t force something to disk immediately, but they do guarantee that all writes prior to the BARRIER are on nonvolatile storage prior to writes subsequent to it.

    I don’t believe that Linux has any userspace way for a process to request a write barrier. There is not an fwritebarrier() call. This means that the only way to impose write ordering is to call fsync()/sync() or use similar-such operations. These force data to nonvolatile storage, and do not return until it is there. The downside is that this is slow. Programs that are frequently doing such synchronizations cannot issue writes very quickly, and are very sensitive to latency to their nonvolatile storage.
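
    A minimal sketch of imposing write ordering that way, using Python’s thin wrappers around the same system calls. The function name ordered_writes is made up. The fsync() between the two writes is what stands in for the missing barrier: the second write is not even issued until the first is reported as being on stable storage.

```python
import os


def ordered_writes(path: str, first: bytes, second: bytes) -> None:
    """Write first, wait for it to be reported durable, then write second."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write may write fewer bytes than asked for large buffers;
        # a real implementation would loop. Small writes keep the sketch short.
        os.write(fd, first)
        os.fsync(fd)  # blocks until the OS reports the data as on stable storage
        os.write(fd, second)
        os.fsync(fd)
    finally:
        os.close(fd)
```

    Each fsync() is a full round trip to the storage device, which is where the slowness described above comes from.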

    From the qemu(1) man page:

             By default, the cache.writeback=on mode is used. It will report data writes as completed as soon as the data is
             present in the host page cache. This is safe as long as your guest OS makes sure to correctly flush disk caches
             where needed. If your guest OS does not handle volatile disk write caches correctly and your host crashes or
             loses power, then the guest may experience data corruption.
    
             For such guests, you should consider using cache.writeback=off. This means that the host page cache will be
             used to read and write data, but write notification will be sent to the guest only after QEMU has made sure to
             flush each write to the disk. Be aware that this has a major impact on performance.
    

    I’m fairly sure that this is a rather larger red flag than it might appear, if one simply assumes that Linux must be doing things “correctly”.

    Linux doesn’t guarantee that a write to position A goes to disk prior to a write to position B. That means that if your machine crashes or loses power, with the default settings, even for drive images stored on a filesystem on a local host, you can potentially corrupt a filesystem image.

    https://docs.kernel.org/block/blk-mq.html

    Note

    Neither the block layer nor the device protocols guarantee the order of completion of requests. This must be handled by higher layers, like the filesystem.

    POSIX does not guarantee that write() operations to different locations in a file are ordered.

    https://stackoverflow.com/questions/7463925/guarantees-of-order-of-the-operations-on-file

    So by default – which is what you might be doing, wittingly or unwittingly – if you’re using a disk image on a filesystem, qemu simply doesn’t care about write ordering to nonvolatile storage. It does writes. It does not care about the order in which they hit the disk. It is not calling fsync() or using analogous functionality (like O_DIRECT).

    NFS entering the picture complicates this further.

    https://www.man7.org/linux/man-pages/man5/nfs.5.html

    The sync mount option

    The NFS client treats the sync mount option differently than some other file systems (refer to mount(8) for a description of the generic sync and async mount options). If neither sync nor async is specified (or if the async option is specified), the NFS client delays sending application writes to the server until any of these events occur:

             Memory pressure forces reclamation of system memory
             resources.
    
             An application flushes file data explicitly with sync(2),
             msync(2), or fsync(3).
    
             An application closes a file with close(2).
    
             The file is locked/unlocked via fcntl(2).
    
      In other words, under normal circumstances, data written by an
      application may not immediately appear on the server that hosts
      the file.
    
      If the sync option is specified on a mount point, any system call
      that writes data to files on that mount point causes that data to
      be flushed to the server before the system call returns control to
      user space.  This provides greater data cache coherence among
      clients, but at a significant performance cost.
    
      Applications can use the O_SYNC open flag to force application
      writes to individual files to go to the server immediately without
      the use of the sync mount option.
    

    So, strictly-speaking, this doesn’t make any guarantees about when NFS sends a write to the server. It says that it’s fine for the NFS client to send nothing to the server at all on write(). With the default NFS mount options, the only times a write() to a file is guaranteed to reach the server are the events listed above. And if the data hasn’t gone to the server, it definitely cannot have been flushed to nonvolatile storage.

    Now, I don’t know this for a fact – would have to go digging around in the NFS client you’re using. But it would be compatible with the guarantees listed, and I’d guess that probably, the NFS client isn’t keeping a log of all the write()s and then replaying them in order. If it did so, for that to meaningfully affect what’s on nonvolatile storage, the NFS server would have to fsync() the file after each replayed write, so that each one reaches nonvolatile storage before the next. Instead, it’s probably just keeping a list of dirty data in the file, and then flushing it to the NFS server at close().

    That is, say you have a program that opens a file filled with all ‘0’ characters, and does:

    1. write ‘1’ to position 1.
    2. write ‘1’ to position 5000.
    3. write ‘2’ to position 1.
    4. write ‘2’ to position 5000.

    At close() time, the NFS client probably doesn’t flush “1” to position 1, then “1” to position 5000, then “2” to position 1, then “2” to position 5000. It’s probably just flushing “2” to position 1, and then “2” to position 5000, because when you close the file, that’s what’s in the list of dirty data in the file.

    The thing is that unless the NFS client retains a log of all those write operations, there’s no way to send the writes to the server in a way that avoids putting the file into a corrupt state if power is lost. It doesn’t matter whether it first writes the “2” at position 1 or the “2” at position 5000. In either case, it’s creating a situation where, for a moment, one of those two positions has a “0”, and the other has a “2”. If there’s a failure at that point – the server loses power, the network connection is severed – that’s the state that the file winds up in. That’s a state that is inconsistent, one that should never have arisen. And if the file is a filesystem image, then the filesystem might be corrupt.
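
    The four-write example can be simulated in a few lines of Python. This is a toy model, not how any real NFS client is implemented: coalesce() stands in for the guessed-at “list of dirty data”, and states_seen() lists every state the file passes through as a sequence of writes is applied. All names here are made up.

```python
def coalesce(writes: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Keep only the last value written to each position, like a dirty list."""
    dirty: dict[int, str] = {}
    for position, value in writes:
        dirty[position] = value
    return list(dirty.items())


def states_seen(initial: dict[int, str], writes: list[tuple[int, str]]) -> list[dict[int, str]]:
    """Every state the file passes through if the writes are applied in order."""
    state = dict(initial)
    seen = [dict(state)]
    for position, value in writes:
        state[position] = value
        seen.append(dict(state))
    return seen


initial = {1: "0", 5000: "0"}
program_writes = [(1, "1"), (5000, "1"), (1, "2"), (5000, "2")]

# States the program itself produced, in the order it produced them.
intended = states_seen(initial, program_writes)
# States the server's copy passes through if only the dirty data is flushed.
flushed = states_seen(initial, coalesce(program_writes))
```

    Here flushed includes the state with “2” at position 1 and “0” at position 5000, which appears nowhere in intended. A failure at that moment leaves the server with a file that the program never wrote.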

    So I’d guess that both of those points in the stack – the NFS client writing data to the server, and the server’s block device scheduler – permit inconsistent state if there’s no fsync()/sync()/etc being issued, which appears to be the default behavior for qemu. And running on NFS probably creates a larger window for a failure to induce corruption.

    It’s possible that using qemu’s iSCSI backend avoids this issue, assuming that the iSCSI target avoids reordering. That’d avoid qemu going through the NFS layer.

    I’m not going to dig further into this at the moment. I might be incorrect. But I felt that I should at least mention it, since filesystem images on NFS sounded a bit worrying.



  • Do you use a macro keyboard for shortcuts?

    No. I think that macro functionality is useful, but I don’t do it via the physical keyboard.

    My general take is that chording (pressing some combination of keys simultaneously) that lets one keep one’s hands on the home row is faster than lifting them to press one dedicated key. So, like, instead of having separate capital and lowercase letter keys, it’s preferable to have “shift” and just one key.

    I think that the main arguments for dedicated keys that one lifts one’s hands for would be for important but relatively-infrequently-used keys that people don’t use enough to remember chorded combinations for – you can just throw the label on the button as a quick reference. Like, we don’t usually have Windows-Alt-7 be power on a laptop keyboard, but instead have a dedicated power button.

    Maybe there’s a use to have keyboard-level-programmed macros with chording, as some keyboards can do…but to me, the use case seems pretty niche. If you’re using multiple software environments (e.g. BIOS, Windows, Linux terminal, whatever) and want the same functionality in all of them (e.g. a way to type your name), that might make some sense. Or maybe if you’re permitted to take a keyboard with you, but are required to use a computer that you can’t configure at the software level, that’d provide configurability at a level that you have control over.

    In general, though, I’m happier with configuring stuff like that on the computer’s software; I don’t hit those two use cases, myself.



  • No, because the DBMS is going to be designed to permit power loss in the middle of a write without being corrupted. It’ll do something vaguely like this, if you are, for example, overwriting an existing record with a new one:

    1. Write that you are going to make a change in a way that does not affect existing data.

    2. Perform a barrier operation (which could amount to just syncing to disk, or could just tell the OS’s disk cache system to place some restrictions on how it later syncs to disk, but in any event will ensure that all writes prior to the barrier operation are on disk prior to those write operations subsequent to it).

    3. Replace the existing record. This may be destructive of existing data.

    4. Potentially remove the data written in Step 1, depending upon database format.

    If the DBMS loses power and comes back up, if the data from Step 1 is present and complete, it’ll consider the operation committed, and simply continue the steps from there. If Step 1 is only partially on disk, it’ll consider it not committed and delete it, treating the commit as not having gone through. From the DBMS’s standpoint, either the change happens as a whole or does not happen at all.
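
    The four steps and the recovery rule can be sketched as a toy single-record store in Python. This is an illustration under heavy assumptions, not how any particular DBMS works: the journal format (one JSON line), the file layout, and the function names are all made up, fsync() is used as the barrier, and a real system would also need checksums and a sync of the containing directory.

```python
import json
import os


def _write_synced(path: str, data: bytes) -> None:
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # acts as the barrier: returns only once data is reported durable
    finally:
        os.close(fd)


def update(data_path: str, journal_path: str, new_value: str) -> None:
    # Steps 1 and 2: record the intended change somewhere that does not
    # touch existing data, and make sure it is durable before going on.
    entry = json.dumps({"value": new_value}).encode() + b"\n"
    _write_synced(journal_path, entry)
    # Step 3: overwrite the existing record (destructive).
    _write_synced(data_path, new_value.encode())
    # Step 4: the change is complete, so the journal entry can go.
    os.remove(journal_path)


def recover(data_path: str, journal_path: str) -> None:
    """Run at startup: finish a complete journal entry, discard a torn one."""
    if not os.path.exists(journal_path):
        return
    with open(journal_path, "rb") as f:
        raw = f.read()
    try:
        if not raw.endswith(b"\n"):
            raise ValueError("journal entry is incomplete")
        entry = json.loads(raw)
    except ValueError:
        os.remove(journal_path)  # Step 1 never finished: not committed
        return
    _write_synced(data_path, entry["value"].encode())  # redo Step 3
    os.remove(journal_path)
```

    Whether the crash lands during Step 1 or during Step 3, recover() leaves the record holding either the old value or the new one, never a mix.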

    That works fine for power loss or if a filesystem is snapshotted at an instant in time. Seeing a partial commit, as long as the DBMS’s view of the system was at an instant in time, is fine; if you start it up against that state, it will either treat the change as complete and committed or throw out an incomplete commit.

    However, if you are a backup program and happily reading the contents of a file, you may be reading a database file with no synchronization, and may wind up with bits of one or multiple commits as the backup program reads the file and the DBMS writes to it, which means a corrupt database after the backup is restored.


  • Some databases support snapshotting (which won’t take the database down), and I believe that backup systems can be aware of the DBMS. I’m not a good person to ask as to best practices, because I don’t admin a DBMS, but it’s an issue that I do mention when people are talking about backups and DBMSes – if you have one, be aware that a backup system is going to have to take into account the DBMS one way or another if you want to potentially avoid backing up a database in inconsistent state.


  • the importation into the United States of artificial intelligence or generative artificial intelligence technology or intellectual property developed or produced in the People’s Republic of China is prohibited.

    This guy might get a bill through that bans Chinese AI stuff, though I think that enforcement is gonna be a pain, but as per the text, this is banning all Chinese intellectual property, AI or not. That’s a non-starter; it’s not going to go anywhere in Congress. Like, you couldn’t even identify all instances of Chinese intellectual property if you wanted to do so.

    EDIT: Okay, they define the phrase elsewhere to specifically be “technology or intellectual property that could be used to contribute to artificial intelligence or generative artificial intelligence capabilities”, which is somewhat-narrower but still not going anywhere, because pretty much any form of intellectual property meets that bar; you can train an AI on whatever to improve its capabilities.