How to split a file into small parts

Introduction

Today I will present the userland program "split" that is used to split a single file into smaller files.

OpenBSD split(1) manual page

Use case

Split creates several smaller files from a single file. The original file can be recreated by running cat on all the small files, in the correct order.

There are several use cases for this:

- store a single file (like a backup) on multiple media (floppies, 700MB CDs, DVDs, etc.)

- parallelize a file process, for example: split a huge log file into small parts to run analysis on each part

- distribute a file across a few people (I am not sure what the use would be, but I like the idea)
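
The log analysis use case could look like the sketch below. The file name app.log, the chunk_ prefix and the ERROR marker are made up for the illustration, and the "analysis" is only a grep count; a real job would replace it with whatever processing is needed.

```shell
# Work in a scratch directory so the generated parts are easy to clean up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log file: 4000 lines, every fourth line is an error.
awk 'BEGIN {
    for (i = 1; i <= 4000; i++) {
        level = (i % 4 == 0) ? "ERROR" : "INFO"
        print level, "event", i
    }
}' > app.log

# Cut the log into 1000-line parts named chunk_aa, chunk_ab, ...
split -l 1000 app.log chunk_

# Start one grep per part in the background, then wait for all of them.
for part in chunk_*; do
    grep -c '^ERROR' "$part" > "$part.count" &
done
wait

# Add up the per-part results.
total=$(cat chunk_*.count | awk '{ s += $1 } END { print s }')
echo "errors: $total"
```

Splitting by lines rather than by bytes matters here: with -b a log line could be cut in half between two parts.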

Usage

Usage is simple: run split on a file, or feed it data on standard input, and by default it creates files of 1000 lines each. Use -b to set the size of each part in bytes (with a k or m suffix for kilobytes or megabytes), or -l to change the default of 1000 lines. Split can also start a new file each time a line matches a regular expression given with -p, as documented in the OpenBSD manual page.
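
As a rough illustration of the line-based options, here is a small sketch. The file name sample.txt and the prefixes part_ and stream_ are invented for the example; the second argument to split sets the prefix of the output files, which otherwise defaults to x.

```shell
# Work in a scratch directory so the generated parts are easy to clean up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-line sample file (hypothetical name).
awk 'BEGIN { for (i = 1; i <= 2500; i++) print i }' > sample.txt

# Default behaviour: 1000-line parts named xaa, xab, xac.
split sample.txt

# -l changes the line count, the last argument changes the prefix.
split -l 500 sample.txt part_

# A single dash makes split read standard input instead of a file.
head -n 1200 sample.txt | split -l 500 - stream_

# Show how many lines ended up in each part.
wc -l xa? part_* stream_*
```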

Here is a simple example splitting a file into 1300kB parts and then reassembling the file from the parts, using sha256 to compare the checksums of the original and reconstructed files.

solene@kongroo ~/V/pmenu> split -b 1300k pmenu.mp4
solene@kongroo ~/V/pmenu> ls
pmenu.mp4  xab        xad        xaf        xah        xaj        xal        xan
xaa        xac        xae        xag        xai        xak        xam
solene@kongroo ~/V/pmenu> cat x* > concat.mp4
solene@kongroo ~/V/pmenu> sha256 pmenu.mp4 concat.mp4 
SHA256 (pmenu.mp4)  = e284da1bf8e98226dc78836dd71e7dfe4c3eb9c4172861bafcb1e2afb8281637
SHA256 (concat.mp4) = e284da1bf8e98226dc78836dd71e7dfe4c3eb9c4172861bafcb1e2afb8281637
solene@kongroo ~/V/pmenu> ls -l x*
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xaa
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xab
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xac
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xad
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xae
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xaf
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xag
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xah
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xai
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xaj
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xak
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xal
-rw-r--r--  1 solene  wheel   1331200 Mar 21 16:50 xam
-rw-r--r--  1 solene  wheel    810887 Mar 21 16:50 xan
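
The sha256 command used above is the OpenBSD one and may have a different name on other systems. A possible alternative, sketched here with made-up file names (original.bin, rebuilt.bin) and a made-up piece_ prefix, is to compare the two files byte by byte with cmp. The reassembly works because the shell expands piece_* in alphabetical order, which is also the order in which split names its parts.

```shell
# Work in a scratch directory so the generated parts are easy to clean up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 3000 kB of random bytes.
dd if=/dev/urandom of=original.bin bs=1024 count=3000 2>/dev/null

# Same part size as in the transcript above, with an explicit prefix.
split -b 1300k original.bin piece_

# The glob expands in alphabetical order: piece_aa, piece_ab, piece_ac.
cat piece_* > rebuilt.bin

# cmp -s prints nothing and only reports the result through its exit status.
if cmp -s original.bin rebuilt.bin; then
    echo "identical"
else
    echo "files differ"
fi
```

Using a dedicated prefix also avoids the risk that a glob as broad as x* picks up unrelated files in the directory.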

Conclusion

If you ever need to split files into small parts, think of the split command.

For more advanced splitting requirements, the program csplit can be used. I won't cover it here, but I recommend reading its manual page.

csplit manual page