💾 Archived View for dioskouroi.xyz › thread › 29358499 captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content
________________________________________________________________________________
We did a bunch of benchmarking around this when writing nbdcopy (
https://gitlab.com/nbdkit/libnbd/-/tree/master/copy
https://libguestfs.org/nbdcopy.1.html
) which can copy file to file as well as between local files and NBD servers. There was also the goal of avoiding page-cache pollution, which matters for system throughput depending on whether you're going to use the file content immediately afterwards.
Anyway long story short, the best thing we found (for Linux) was Linus's own advice linked from here:
https://stackoverflow.com/a/3756466
A remaining to-do is to combine this with io_uring.
Well, Rust's `copy_regular_files` uses the (Linux) syscall `copy_file_range`
https://man7.org/linux/man-pages/man2/copy_file_range.2.html
Go ran into some trouble because it used that syscall in `io.Copy()`:
https://lwn.net/ml/linux-kernel/20210212044405.4120619-1-drinkcat@chromium.org/
An LWN article about `copy_file_range`:
https://lwn.net/Articles/846403/
Btw.
https://stdrs.dev/nightly/x86_64-unknown-linux-gnu/std/sys/u...
> As the name says, it only works on regular files.
is a bit misleading: on kernels older than 5.3 it also only works on regular files that are on the same filesystem. From the man page:
> EXDEV The files referred to by fd_in and fd_out are not on the same mounted filesystem (pre Linux 5.3).
If you're on Linux, and you want to copy multiple files, the best bet is probably io_uring. I actually wrote a small file copy program / blog post about this:
https://wheybags.com/blog/wcp.html
Unfortunately you can't do ioctls with io_uring, so you can't use it for cloning ranges...
macOS has a couple of platform-specific APIs for copying/cloning files that are worth looking at:
- copyfile:
https://developer.apple.com/library/archive/documentation/Sy...
- clonefile:
https://www.unix.com/man-page/mojave/2/clonefile/
Just wanted to mention those. On modern macOS, using clonefile() directly is likely the fastest option.
What is copy_regular_files? All I found is some Rust code which does not use a single syscall as claimed (it takes file descriptors, so you need to open/close them anyway).
uses copy acceleration when the file system supports it
I thought that referred to clone_file_range, but that doesn't seem to be the case. As such, I would recommend looking into that; I believe it's currently supported on btrfs and XFS on Linux.
There is also sendfile, but it's probably the same as copy_file_range.
If the author is looking at a Rust-specific function, then the actual advice should be to use std::fs::copy(). That function forwards to OS-specific implementations, which use copy_file_range on Linux, fcopyfile on macOS, and CopyFileExW on Windows.
Whoops, author here. That was a mistake, I meant to say the copy_file_range syscall since that's what Rust uses by default under the hood on Linux (or copyfile on macOS). Updated the article with those corrections.
It uses Linux `copy_file_range`.
As I wrote in another post, that syscall had already caused some trouble for Go, they used it with `io.Copy()`
You can only copy 'non special' files, and (pre Linux 5.3) only _in_the_same_filesystem_ (that part is missing from Rust's documentation), with it.
https://stdrs.dev/nightly/x86_64-unknown-linux-gnu/std/sys/u...
Same, and in fact the benchmarks linked are all in Rust so I suspect the author really means "Fastest way to copy a file in Rust".
Before I clicked on the article, I pondered the title and thought it would say "make a symlink."
If the file system implements copy-on-write, wouldn't a hard link even preserve the original file if the file is edited through the original inode?
Copy-on-write and hard links are different things. A file clone would, though (as in cp --reflink).
So if I understand correctly, copy-on-write is implemented using hard links, but the existence of hard links alone doesn't constitute CoW.
You can also have hard links on non-CoW filesystems (e.g. ext3). I assume hard links exist on most filesystems.
Where there is a filename there is a hard link, so that's true in a trivial sense.
Right, indeed. I guess I was thinking about the functionality of having multiple hard links point to the same “file” (inode)
Now we could dig into "What is the fastest way to make a symlink?"
The symlink() system call. I can't think of any other way, actually.
Well, one would start from the symlink libc call, then as you say move on to the system call, then analyze the system call for "redundant" operations (libc calls usually have many, kernel calls much fewer - who needs error checking if all we care about is speed?). Then compare it with non-portable raw fs modifications of a chosen filesystem. I bet there could be some tiny gains at each stage.
I would guess sendfile() on a Linux box would be fast. But sendfile() on MacOS seems to require that the outbound file descriptor is a socket.
I don’t think there’s a syscall named “copy_regular_files”. It seems to be a function provided by Rust which uses the “copy_file_range” syscall.
TIL Microsoft has patented offloaded I/O:
https://patents.google.com/patent/US20120079583A1/en
This describes their ODX tech, whereby a file copy between two machines can be initiated and managed from a third node.
https://docs.microsoft.com/en-us/windows-hardware/drivers/st...
> file copying between two machines can be initiated and managed from a third node.
Isn't that something FTP, a 50-year-old protocol can do too?
Same general idea, yeah. What they patented is their enterprise-oriented implementation of it.
So the same as Novell's ncopy?
In one case, the fastest way would be to not have to copy it, just reuse the same file.
Interesting. I wonder if mmap+vmsplice instead of write would be better. Or plain splice.
With SSDs being CoW, I'm surprised that copying a file actually requires any copying... that is clearly a little naive.
I would expect that SSDs do normally clone data internally when copying. Often when given a write operation by the OS the SSD would not be aware of where it already stores that data. Drive read and write performance is commonly benchmarked - I don't recall seeing _copy_ performance tested as a different category which would have much different results if they automatically 'deduplicated' files.
I believe SSDs are COW in the respect that they cannot make partial changes to a page so need to write a whole new page at a time, but not that the drives can arrange multiple file handles to point to the same data. That should be a responsibility of the filesystem.
X-COPY ;)
The fastest way to copy a file... on this person's Macbook.
These are essentially random benchmarks that measure something that is very much both language-, library- and OS-specific. That should've been mentioned in bold friendly letters at the top of the article, because it makes the whole exercise anecdotal at best.
The bottleneck in copying small files will inevitably be in open(). It will also be peanuts, so any noise added by the language and its libraries will be comparable to the actual syscall. You switch to C, you can get 2x speed increase. Or not. Depends on the language.
Secondly, the copying of file data will depend heavily on caching/buffering, so that will vary by the OS, by the filesys config, by the mode in which file is opened AND by the file's state in OS' file cache. If you are not careful, you run your test once, you have a warm cache. You run it again and, bam, super speeds.
The answer to the "fastest way to copy file" will depend on the file system, the OS and the nature of files being copied. None of which is even remotely covered in the post.
Edit -
Also, the results will depend on the type of storage and the file count.
For HDDs you'll see one thing, for NVMe - another, for flash storage - third, and that's without getting into working with remote shares. Ditto for the file count - 100 vs 10K vs 1M will yield completely different results and each will have its own distinct optimal copying approach. And since you _are_ interested in doing things quickly, most likely it will in fact involve very large file counts. If so, then you can do some things on NVMe that you can't on SSD, so all these variables are _also_ dependent on each other.
This post effectively derives its findings from one specific combo of all of these variables. May make for an interesting (if largely obvious) read, but these findings are all but worthless.
> The only way to get an objective answer is by writing benchmarks, so here they are. The results will be highly dependent on your specific system, so I would recommend simply running those benchmarks yourself if you’re curious. However, I’ll share my findings from running these on my ext4 XPS 17 and an APFS Macbook.
He says exactly that right there in the article.
Love it when someone harshly criticises an article but can't be bothered reading it.