You're viewing the front-end.social public feed.
  • Jul 3, 2026, 6:22 PM

    I think .tar.gz will technically do this to some degree because gzip doesn't know what files are, but this approach is still limited by the size of its sliding window (e.g. it can't deduplicate two files whose size is over 258 bytes or whose distance apart in the tar is larger than 32K)
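
The distance half of that limit can be checked with a rough sketch like the one below. It assumes Python's standard `zlib` module (DEFLATE, the algorithm behind gzip) and `lzma` module (xz) as stand-ins for `.tar.gz` and `.tar.xz`, and it uses random bytes so that the only redundancy is the duplicate copy itself. The helper name `compressed_sizes` is made up for illustration.

```python
import lzma
import os
import zlib


def compressed_sizes(block_size: int) -> tuple[int, int]:
    """Compress two back-to-back copies of one random block.

    Returns (deflate_size, xz_size) in bytes.
    """
    block = os.urandom(block_size)  # random bytes: incompressible on their own
    doubled = block + block         # second copy starts block_size bytes after the first
    return len(zlib.compress(doubled, 9)), len(lzma.compress(doubled))


# Copies 16 KiB apart: the first copy is still inside DEFLATE's 32 KiB window.
near = compressed_sizes(16 * 1024)
# Copies 64 KiB apart: the first copy has already slid out of the window.
far = compressed_sizes(64 * 1024)

print("16 KiB apart (deflate, xz):", near)
print("64 KiB apart (deflate, xz):", far)
```

In this setup DEFLATE should come out near the size of a single block in the first case and near two full blocks in the second, while xz, with its much larger default dictionary, should stay near one block in both.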

    💬 7🔄 0⭐ 0

Replies

  • Jul 3, 2026, 6:26 PM

    @mcc does the ZFS stream format count? Also the brotli sliding window is much larger (up to 16M)

    💬 1🔄 0⭐ 0
  • Jul 3, 2026, 6:39 PM

    @evert @mcc Yeah, if the question is whether *any* such format exists, you could make a ZFS filesystem with dedup and compression enabled, write the data, then ‘zfs send’ it to a file. The file could then be ‘zfs receive’d or mounted. Since dedup only matters on write, you just don’t use dedup on the receiving end, and you don’t pay the RAM penalty.

    That’s more or less how I make base images for zones.

    💬 0🔄 0⭐ 0
  • Jul 3, 2026, 6:48 PM

    @mcc .tar with a more modern algorithm than .gz, such as .tar.xz or .tar.zst. There are probably dedicated file-deduplicating solutions too.

    💬 0🔄 1⭐ 0
  • Willwaffle_iron@nyan.lol
    Jul 3, 2026, 6:48 PM

    @mcc I guess squashfs's deduplication could accomplish that, but I'm sure most people would be confused by squashfs as an archive.

    💬 0🔄 0⭐ 0
  • Jul 3, 2026, 7:01 PM

    @mcc anything that's operating in "solid archive" mode should perform similarly to a tarball, I think?

    I've seen game roms where every region of a game is in one zip, and those compress as if they are de-duped.
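
A rough way to see the solid versus per-file difference, assuming Python's standard `zipfile` and `tarfile` modules: `zipfile` compresses each member independently, while `w:xz` compresses the whole tar stream at once. The member names `game_us.rom` and `game_eu.rom` and both helper functions are hypothetical, and the payload is random bytes so the duplicate is the only redundancy.

```python
import io
import os
import tarfile
import zipfile

payload = os.urandom(64 * 1024)  # one incompressible 64 KiB "file"
members = {"game_us.rom": payload, "game_eu.rom": payload}  # two identical copies


def zip_size(files: dict[str, bytes]) -> int:
    """Size of an in-memory .zip; each member is compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_LZMA) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_xz_size(files: dict[str, bytes]) -> int:
    """Size of an in-memory .tar.xz; the whole tar stream is compressed as one unit."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


print("zip (per-file):", zip_size(members))
print("tar.xz (solid):", tar_xz_size(members))
```

Under these assumptions the zip should come out at roughly two payloads and the tar.xz at roughly one, so an archive that behaves as if it were de-duped is probably compressing across file boundaries.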

    💬 0🔄 0⭐ 0
  • Jul 3, 2026, 7:12 PM

    @mcc Some people have already said "tar + something with a large compression window", but in particular tar + lrzip may be best if you go that way. It should be packaged on most distros.

    It's intended to have a compression window up to the size of your RAM (or 2 GB on a 32-bit OS), or optionally even larger than RAM, though that's slower.

    The "LR" stands for Long Range, even!

    It's also multithreaded, which not everything is.

    It can also be combined with other compression algorithms if you are a hardcore compression algorithm enthusiast.

    (The only thing that bothers me about lrzip is that it doesn't preserve the file's modification time.)

    wiki.archlinux.org/title/Lrzip
    en.wikipedia.org/wiki/Rzip

    💬 0🔄 0⭐ 0