Reimplementing cat
in Rust
The motivation behind this post is twofold. I am not very familiar with Rust's file io. And I need to test my custom web server with an actual post.
GNU Cat
I first decided to how figure out how GNU cat
functions. Below is an extremely stripped-down version of it:
// <various includes> #define IO_BUFSIZE 256 * 1024 void gnu_cat(char* path) { struct stat stats; int fd = open(path, O_RDONLY); fstat(fd, &stats); // blk size that is divisible by st_blksize and >= IO_BUFSIZE ptrdiff_t opt_insize = stats.st_blksize + (IO_BUFSIZE - 1) - (IO_BUFSIZE - 1) % stats.st_blksize; posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL); char* input_buf = aligned_alloc(getpagesize(), opt_insize); while (true) { ptrdiff_t n_read = read(fd, input_buf, opt_insize); ptrdiff_t n_write = 0; while (n_write < n_read) { n_write += write(STDOUT_FILENO, input_buf, n_read); } if (n_read == 0) { break; } } }
The GNU authors stated that they selected 256KiB as the read size by testing on a variety of (now old) CPUs using the following script:
for i in $(seq 0 10); do bs=$((1024*2**$i)) printf "%7s=" $bs timeout --foreground -sINT 2 \ dd bs=$bs if=/dev/zero of=/dev/null 2>&1 \ | sed -n 's/.* \([0-9.]* [GM]B\/s\)/\1/p' done
According to the above, the optimal read size is 512KiB on my laptop.
Rust Comparison
Below is the naive Rust translation,
// <various imports> fn basic_cat(path: &str) -> anyhow::Result<()> { let mut file = File::open(path)?; let mut buf = String::new(); file.read_to_string(&mut buf)?; println!("{}", buf); Ok(()) }
The above diverges from GNU cat
:
- We allocate the entire buffer into memory before writing it.
- Since our buffer is a
String
, it includes manual checks for UTF-8 compliance. - Rust's
io::stdout()
is line-buffered by default. Thus, our program can be either memory-intensive or CPU-intensive depending on frequency of\n
.
Benchmarking gives:
I recorded these results on a Legion Slim 5 16AHP9 employing EXT4 with a 4k block size using the release profile for both programs. The benchmarking software is published to GitHub and can be run with one command. Please try it out! I'm curious what results others get.
Alternatively, we can write:
// <various imports> const IO_BUFSIZE: u64 = 256 * 1024; fn buffered_cat(path: &str) -> anyhow::Result<()> { let mut file = File::open(path)?; let blksize = file.metadata()?.blksize(); let opt_insize = blksize + (IO_BUFSIZE - 1) - (IO_BUFSIZE - 1) % blksize; let mut stdout = unsafe { File::from_raw_fd(1) }; let mut input_buf: Vec<u8> = vec![0; opt_insize as usize]; input_buf.fill_with(Default::default); loop { let n_read = file.read(input_buf.as_mut_slice()).unwrap(); if n_read == 0 { break; } let mut n_write = 0; while n_write < n_read { n_write += stdout.write(&input_buf).unwrap(); } } Ok(()) }
The above is a much-closer representation to GNU cat
. The translation is actually quite nice, and even the singular unsafe block has fairly intuitive behaviour. But we are redoing the work of std
. Instead of buffering manually we can wrap the file descriptors with io::BufReader
and io::BufWriter
.
// <various imports> const IO_BUFSIZE: u64 = 256 * 1024; fn wrapped_cat(path: &str) -> anyhow::Result<()> { let file = File::open(path)?; let blksize = file.metadata()?.blksize(); let opt_insize = blksize + (IO_BUFSIZE - 1) - (IO_BUFSIZE - 1) % blksize; let mut reader = BufReader::with_capacity(opt_insize as usize, file); let mut stdout = BufWriter::new(unsafe { File::from_raw_fd(1) }); io::copy(&mut reader, &mut stdout).unwrap(); Ok(()) }
We are almost done! But we have have one more external code base to compare...
Rust Coreutils
Obviously, I was not the first person to port cat
to Rust. The project uutils/coreutils
is a complete rust port of all of GNU coreutils. Below is a simplified version of their code:
fn uutils_cat(path: &str) -> anyhow::Result<()> { use nix::{ fcntl::{self, OFlag}, sys::stat::Mode, }; const SPLICE_SIZE: usize = 1024 * 128; const BUF_SIZE: usize = 1024 * 16; /// An alias of fcntl that ignores setst the offsets to 0. fn splice( fd1: impl AsFd, fd2: impl AsFd, amount: usize, ) -> anyhow::Result<usize> { Ok(fcntl::splice( &fd1, None, &fd2, None, amount, fcntl::SpliceFFlags::empty(), )?) } let fd = fcntl::open(path, OFlag::O_RDONLY, Mode::S_IWUSR)?; let (read, write) = nix::unistd::pipe()?; loop { let n_read = splice(&fd, &write, SPLICE_SIZE)?; if n_read == 0 { return Ok(()); } let mut n_write = 0; while n_write != n_read { n_write += splice(&read, unsafe { File::from_raw_fd(1) }, n_read)?; } } }
The syscall splice
avoids copying between userspace and kernel space (GNU cat
employed a similar syscall copy_file_range
but it did not function as expected). Note that there is a fallback if splice
does not work.
Benchmarking
The results are as follows:
While the code for gnu_cat
doesn't include error checking, I did include checks in the benchmarks so that it would be at parity with the rust versions. The entire benchmarking repo is available here.
Notes
I removed the syscall copy_file_range()
in the code for gnu_cat
. Intended to copy between two file descriptors — removing the need for a user space copy similar to splice
— I thought it would speed up the code substantially. However, it is finicky and does not seem to work for stdout
on my machine.
In additon, I formatted rust code using two spaces instead of four. I figured that the smaller code blocks would view better on mobile. If not, please let me know!