Hi,

I wrote a Gemini server in C and I currently use a hardcoded list of file extension <-> MIME type associations. This isn't great because it relies on the file extension, which can be wrong, and a file without an extension falls back to a default.

I chose text/gemini as the default when the extension is unknown or the file has no extension.

What are the good practices to determine a file's MIME type?

Regards,
Solène
You can use the file(1) command or its library, libmagic. They are excellent for binary files, support some text file formats, and are extensible. There are libmagic bindings for at least Perl, Python, Go, Rust, Lua, Common Lisp, and Chicken Scheme.

On Thu, Dec 10, 2020 at 4:12 PM Solène Rapenne <solene at perso.pw> wrote:

> What are the good practices to determine a file MIME type?
Solène Rapenne <solene at perso.pw> writes:

> I wrote a gemini server in C and I currently use a hardcoded list of
> file extensions <-> MIME type association.
>
> What are the good practices to determine a file MIME type?

I'm using the same approach in my server, but I know of two alternatives:

- using /usr/share/misc/mime.types (still a list, but probably more complete than a hand-written one). I don't know how widespread it is, but it's present in base on OpenBSD :)
- using libmagic: it's a library that detects the MIME type by reading the file. It powers the file(1) command on some Unices. The drawback is that it needs to open and read the file, whereas guessing from the extension doesn't.
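For the first alternative, here is a minimal sketch of reading a mime.types file at run time instead of hardcoding it. It assumes the common line layout "type ext ext ..." with '#' starting a comment; the function names match_mime_line and lookup_mime_types are made up for this example and do not come from any of the servers mentioned in the thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Check one line of a mime.types style file for the extension `ext`
 * (given without the leading dot). Assumed layout: "<type> <ext> <ext> ...",
 * with '#' starting a comment. On a match, copy the type into `out`
 * and return 1; otherwise return 0.
 */
int
match_mime_line(const char *line, const char *ext, char *out, size_t outlen)
{
	const char *ws = " \t";
	const char *stop = " \t#\r\n";
	size_t end = strcspn(line, "#\r\n");	/* ignore comment and newline */
	size_t extlen = strlen(ext);
	size_t pos, typelen, toklen;
	const char *type;

	pos = strspn(line, ws);
	if (pos >= end)
		return 0;			/* blank or comment-only line */

	type = line + pos;
	typelen = strcspn(type, stop);
	pos += typelen;

	for (;;) {
		pos += strspn(line + pos, ws);
		if (pos >= end)
			return 0;		/* no more extensions on this line */
		toklen = strcspn(line + pos, stop);
		if (toklen == extlen && strncmp(line + pos, ext, extlen) == 0) {
			snprintf(out, outlen, "%.*s", (int)typelen, type);
			return 1;
		}
		pos += toklen;
	}
}

/* Scan a whole file such as /usr/share/misc/mime.types for `ext`. */
int
lookup_mime_types(const char *path, const char *ext, char *out, size_t outlen)
{
	char line[1024];
	FILE *fp;
	int found = 0;

	if ((fp = fopen(path, "r")) == NULL)
		return 0;
	while (!found && fgets(line, sizeof(line), fp) != NULL)
		found = match_mime_line(line, ext, out, outlen);
	fclose(fp);
	return found;
}
```

A server would probably load the file once at startup rather than rescan it per request; the per-lookup scan here only keeps the sketch short.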
Hey,

AFAIK, OpenBSD doesn't ship a libmagic implementation by default, but it does ship a version of file(1) as well as a magic(5) database that you can look at. If you look at the source of the file command, you might be able to work out how to make use of the magic(5) database, or just lift the code from there.

Alternatively, libmagic is in ports.

Hope that helps!
On 2020-12-10 22:26, Omar Polo wrote:

> I'm using the same approach in my server, but there are two
> alternatives I know:
>
> - using /usr/share/misc/mime.types (still a list, but probably more
>   complete than a manual one). Don't know if it's widespread, but
>   it's present in base on OpenBSD :)
> - using libmagic: it's a library to detect the MIME type by reading
>   the file. it powers the file(1) command on some unices. The
>   drawback is that it needs to open and read the file, whereas
>   guessing from the extension doesn't.

I already use that exact mime.types file, but I hardcoded it.

I will take a look at the file(1) code. I'm targeting OpenBSD first, but I'll see if it can be ported easily.
libmagic uses the local mime-types file(s) but is conditionalized to know where they are on different operating systems, so it's better to use it, even if it is not installed by default.

On Thu, Dec 10, 2020 at 4:26 PM Omar Polo <op at omarpolo.com> wrote:

> - using /usr/share/misc/mime.types (still a list, but probably more
>   complete than a manual one). Don't know if it's widespread, but
>   it's present in base on OpenBSD :)
> - using libmagic: it's a library to detect the MIME type by reading the
>   file. it powers the file(1) command on some unices. The drawback is
>   that it needs to open and read the file, whereas guessing from the
>   extension doesn't.
> I chose to set a default text/gemini in case the extension is unknown or
> if the file has no extension.

This is not a good idea for unrecognized files, whether or not they have an extension. If you know the file is UTF-8 text, serve it as "text/plain"; otherwise serve it as "application/octet-stream", which indicates a generic binary file.

Jetforce used to default to text/plain for all files[1], and it was a problem because clients would try to display binary data as text, resulting in garbled output. For example, try running `cat /dev/urandom` and see how that looks.

1: https://github.com/michael-lazar/jetforce/issues/38#issuecomment-659688602

Cheers,
makeworld

P.S. The more accurate term is media type, not MIME type. See https://www.iana.org/assignments/media-types/media-types.xhtml, or just Wikipedia :)
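The text/plain versus application/octet-stream fallback described above could be sketched in C roughly as follows. This is only one possible reading of "you know the file is UTF-8 text": it validates a buffer as well-formed UTF-8 and also treats any NUL byte as a sign of binary data, which is an assumption of this sketch. The names is_utf8_text and fallback_media_type are hypothetical.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8 without NUL bytes, else 0. */
int
is_utf8_text(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		unsigned long cp;	/* decoded code point */
		size_t n;		/* continuation bytes expected */

		if (c == 0x00)
			return 0;	/* assumption: NUL means binary */
		else if (c < 0x80) { n = 0; cp = c; }
		else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
		else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
		else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
		else
			return 0;	/* stray continuation or invalid lead byte */

		if (n > len - i - 1)
			return 0;	/* sequence cut off at end of buffer */

		for (size_t k = 1; k <= n; k++) {
			if ((buf[i + k] & 0xC0) != 0x80)
				return 0;
			cp = (cp << 6) | (buf[i + k] & 0x3F);
		}

		/* Reject overlong encodings, surrogates, and out-of-range values. */
		if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
		    (n == 3 && cp < 0x10000) || cp > 0x10FFFF ||
		    (cp >= 0xD800 && cp <= 0xDFFF))
			return 0;

		i += n + 1;
	}
	return 1;
}

/* Media type to use when nothing else identified the file. */
const char *
fallback_media_type(const unsigned char *buf, size_t len)
{
	return is_utf8_text(buf, len) ? "text/plain" : "application/octet-stream";
}
```

If the server only inspects the first few kilobytes of a file, a multi-byte character split at the end of that window would be reported as invalid, so a real implementation would need to tolerate a truncated final sequence. Note also that an empty buffer counts as text here.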
It was thus said that the Great Solène Rapenne once stated:

> What are the good practices to determine a file MIME type?

For my server [1] you can configure a mapping of extensions to MIME types. If a file's extension isn't found in that mapping, then I use libmagic to determine the MIME type.

-spc

[1] https://github.com/spc476/GLV-1.12556
On Thu, 10 Dec 2020 22:12:41 +0100
Solène Rapenne <solene at perso.pw> wrote:

> What are the good practices to determine a file MIME type?

There is no way that will work completely reliably without implementing full parsers for the different file types. From a complete server I'd expect to be able to set the file type of a given served resource myself, without relying on an extension-type mapping; for example, to be able to say that every file under /text/ is text/plain.

John Cowan suggests libmagic and file. AFAIK utilities and libraries like these apply matching rules to a few bytes of a file to determine the file type, with a limited degree of accuracy.

I suggest a procedure like this to determine the file type:

1. Check if there is a configuration rule for this particular file that determines its file type. If so, use that.
2. If there is not, check if the server's extension-type mapping configuration contains the file extension. If so, use that.
3. If it does not, check whether the system-level MIME type database assigns a type to the extension. If so, use that.
4. If it does not, you can now optionally use some heuristic approach to determine the file type. This can be via a library like libmagic, or a simpler approach as suggested by makeworld to determine whether you can fall back to text/plain or not. If that yields a type, use that.
5. If not, assume application/octet-stream and use that.

You could cache the results in memory and drop the cache on e.g. SIGHUP.

As for extension-less files, if you don't want extensions visible to the client, you can still use extensions on the server side, which the server optionally strips off.

Overall I think it's fair to expect some level of effort from the server operator in making sure that the static files have sensible extensions. Any smartness beyond extension mapping is at best a bonus AFAIC, at worst a potentially nasty surprise.

--
Philip
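The five-step procedure above maps naturally onto an ordered list of resolver functions, where the first one that returns a type wins. The following is a rough sketch under stated assumptions: both rule tables are hypothetical configuration, by_system_db and by_heuristic are empty placeholders standing in for steps 3 and 4, and the suggested cache is left out.

```c
#include <stddef.h>
#include <string.h>

struct rule {
	const char *key;
	const char *type;
};

/* Step 1 (hypothetical config): per-path rules, matched by prefix. */
static const struct rule path_rules[] = {
	{ "/text/", "text/plain" },
};

/* Step 2 (hypothetical config): the server's own extension map. */
static const struct rule ext_rules[] = {
	{ "gmi", "text/gemini" },
	{ "png", "image/png" },
};

static const char *
by_path_rule(const char *path)
{
	for (size_t i = 0; i < sizeof(path_rules) / sizeof(path_rules[0]); i++)
		if (strncmp(path, path_rules[i].key,
		    strlen(path_rules[i].key)) == 0)
			return path_rules[i].type;
	return NULL;
}

static const char *
by_extension(const char *path)
{
	/* Only look for a dot in the last path component. */
	const char *slash = strrchr(path, '/');
	const char *dot = strrchr(slash != NULL ? slash : path, '.');

	if (dot == NULL || dot[1] == '\0')
		return NULL;
	for (size_t i = 0; i < sizeof(ext_rules) / sizeof(ext_rules[0]); i++)
		if (strcmp(dot + 1, ext_rules[i].key) == 0)
			return ext_rules[i].type;
	return NULL;
}

/* Step 3 placeholder: a system-level database lookup would go here. */
static const char *
by_system_db(const char *path)
{
	(void)path;
	return NULL;
}

/* Step 4 placeholder: libmagic or a text/binary check would go here. */
static const char *
by_heuristic(const char *path)
{
	(void)path;
	return NULL;
}

typedef const char *(*resolver)(const char *);

/* Try each step in order; step 5 is the application/octet-stream default. */
const char *
media_type_for(const char *path)
{
	static const resolver steps[] = {
		by_path_rule, by_extension, by_system_db, by_heuristic,
	};

	for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		const char *type = steps[i](path);
		if (type != NULL)
			return type;
	}
	return "application/octet-stream";
}
```

With these tables, /text/notes.png resolves to text/plain because the path rule is consulted before the extension map, which is the ordering the procedure asks for.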
Sean Conner <sean at conman.org> writes:

> For my server [1] you can configure a mapping of extensions to MIME
> type. If a file's extension isn't found in that mapping, then I use
> libmagic to determine the MIME type.

My server (Germinal) does the same thing, largely because that's what the MIME library I'm using does by default.

--
Jason McBrayer         | "Strange is the night where black stars rise,
jmcbray at carcosa.net | and strange moons circle through the skies,
                       | but stranger still is lost Carcosa."
                       | -- Robert W. Chambers, The King in Yellow
---