đŸ’Ÿ Archived View for dioskouroi.xyz â€ș thread â€ș 29363054 captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content

View Raw

More Information

-=-=-=-=-=-=-

SQLite Release 3.37.0

Author: massysett

Score: 332

Comments: 110

Date: 2021-11-27 20:27:03

Web Link

________________________________________________________________________________

ComputerGuru wrote at 2021-11-27 20:52:00:

The biggest news, of course, is the addition of the long-awaited STRICT mode which fixes SQLite's biggest shortcoming (at least from a particular perspective) by turning into a proper, strongly-typed datastore.

https://www.sqlite.org/stricttables.html

SPBS wrote at 2021-11-28 03:05:16:

STRICT tables forbid the use of any other types besides INT/INTEGER, REAL, TEXT, BLOB (and ANY).

What about TIMESTAMP? DATETIME? You can't use that as a type name anymore. Either pick INT, REAL or TEXT for that. What about JSON? You can't use it as a type name anymore. Pick TEXT. Basically type names, even if SQLite didn't have explicit support for that type, were helpful documentation hints in what the column contained. Now everything devolves to either INT or TEXT. I have UUID fields, I can't even call them UUID or BINARY(16), just BLOB.

derefr wrote at 2021-11-28 04:34:53:

Presumably the point of the strict typing forbidding aliases here, is to force the developer to realize/acknowledge that SQLite does not in fact have separate internal native types with these names.

It's like the modern C convention of avoiding "int" in favor of u8/u16/u32/u64. Rather than pretending "int" is a separate thing, you have to explicitly choose and acknowledge the concrete primitive type you're really getting, and explicitly opt into the semantics that use of that concrete primitive type implies.

nayuki wrote at 2021-11-28 05:00:22:

Not a great analogy because the semantics of the classic C types still leak through. For example, ((uint16_t)0xFFFF) * ((uint16_t)0xFFFF) would cause undefined behavior if int is 32 bits wide, due to implicit arithmetic promotions.

Also, many APIs of the C standard library are defined to take or give char/int/etc., and it takes effort to translate between those types and u8/u32/etc. in a safe way that works on all platforms.

derefr wrote at 2021-11-28 06:29:08:

That's not because "int is 32 bits wide", though; that's because you're on a machine whose native register size is more than 16 bits, and so your processor's MUL instruction likely expects to always write the product to a 32/64-bit destination register, even if the source registers are both 16-bit.

IMHO C is far too lax with things like this; an ideal systems language would force you to annotate maths operations with

the "working precision" for each operation, making explicit any promotion done to the inputs before the math happens. And then, if your "working precision" exceeded the destination variable's size, you'd also be forced to do an explicit demoting cast, just as if you were assigning from a declared larger-sized variable.

Yes, it'd be mostly irrelevant for everyday int math stuff; but it'd make things _so much clearer_ as soon as you started working with floats (where you'd have to explicitly opt into Intel's 80-bit extended "working precision.")

the_duke wrote at 2021-11-28 08:10:32:

Rust requires explicit conversions for numeric types for exactly the reasons you mention.

derefr wrote at 2021-11-28 17:39:55:

Most systems languages (that aren't C) force you to be explicit about demoting casts _between formal variables_; that's the easy part. But no language I know of (not even Rust) gets rid of implicit promotion during math operations, in favor of 1. forcing you to specify the size of the destination register you want the instruction to use (i.e. the working precision), and then 2. requiring an explicit demoting cast if the destination formal variable is smaller than _the working precision you specified_.

I mean, C's inline assembler, and LLVM IR, both require this; but they also require you to specify _which_ work register you're talking about, rather than just the size, which breaks the conceit of a "portable systems language."

ComputerGuru wrote at 2021-11-28 16:11:34:

Yup. Most newcomers to SQLite are bitten hard by the fake datetime support.

skissane wrote at 2021-11-28 09:34:02:

I think a logical future step would be to add the SQL-99 CREATE DOMAIN statement. That allows you to define a custom type alias, for example:

CREATE DOMAIN UUID AS TEXT;

Or if you prefer:

CREATE DOMAIN UUID AS BINARY;

Without CREATE DOMAIN, for STRICT tables they don't know which basic type to use for an unknown type name. They could use something like "unknown types are equivalent to ANY" but that would mean they couldn't implement CREATE DOMAIN in the future without breaking backward compatibility. If they did implement CREATE DOMAIN I expect it would be supported for STRICT tables only.

What makes it really powerful is you can also define a CHECK constraint, NOT NULL constraint, and DEFAULTs to be automatically applied to table columns of this custom type. Examples:

CREATE DOMAIN UUID AS TEXT CHECK (value REGEXP '[a-f0-9]{8}(-[a-f0-9]{4}){4}[a-f0-9]{8}');

CREATE DOMAIN UUID AS BINARY CHECK (LENGTH(value) = 16);

I think Postgres is the only major relational database to implement CREATE DOMAIN, although there are some more obscure RDBMSes which support it too (such as Firebird, Interbase, Sybase/SAP SQL Anywhere).

The SQLite developers haven't said they plan to do this, but I imagine they are thinking about it and may do it at some point. I think it is the kind of feature likely to appeal to them, it should be relatively simple to implement (especially if they don't implement ALTER DOMAIN, only CREATE and DROP), but provide a lot of power in exchange.

DemocracyFTW wrote at 2021-11-28 13:32:33:

came here to say this (approximately)

dotancohen wrote at 2021-11-28 11:34:49:

    > What about TIMESTAMP? DATETIME?

The same thing as we've always done: proper column names and strong datatypes in the application layer.

SPBS wrote at 2021-11-28 13:42:54:

Yeah, I'm starting to warm to the idea of `CREATE TABLE tbl ( created_at INT ) STRICT;`. The name still hints at what domain type it contains, and enforcing INT timestamps should make any indexes on that column considerably smaller.

StreamBright wrote at 2021-11-28 07:34:40:

> What about TIMESTAMP? DATETIME?

I usually use TEXT and store ISO 8601.

Datetime and timestamp is fine if you need date functions ("- interval '7' day", etc.) but most of the time you can just convert the text to datetime when running the query without much performance penalty in other databases.

What is the usecase for TIMESTAMP and DATETIME that cannot be solved today?

the_duke wrote at 2021-11-28 08:38:42:

You really don't want to store timestamps as text.

It's horrible for performance. A iso8601 timestamp takes at least double the amount of bytes (without a timezone), but more importantly you need to parse and validate the timestamp each time you use it. Filtering and sorting suffer a lot. Iso8601 _without_ timezones can in theory be sorted trivially without parsing, but sorting strings is still a lot more expensive than sorting arrays of int64. And you also have to make sure to never write a different timezone. Why not just use an int?

You then have to parse the string into the native datetime type again in application code.

You also need to add a custom check constraint to prevent writing garbage to the column, which is easy to forget.

littlecranky67 wrote at 2021-11-28 10:15:18:

SQLite is not postgres. Often sqlite dbs span only a couple of MB, ie when used in an App, and performance might not be as important as readability during debugging/development.

hnarn wrote at 2021-11-28 11:07:54:

The argument "this doesn't matter because all SQLite databases are small anyway" does not hold water. There are many large SQLite databases out there. And even if there wasn't, storing timestamps as text is simply and fundamentally A Bad Idea(tm).

littlecranky67 wrote at 2021-11-30 09:10:00:

True, but I think SQLite has a design goal of being an embedded DB.

As for your argument to store timestamps as text, look at DynamoDB who only allows storing dates as text, and it is designed to handle petabytes of data.

hnarn wrote at 2021-11-28 07:46:24:

If you’re going to convert it anyway, why use a human readable format in the first place? If you store it as unix epoch, not only do you never have to worry about time zones, it’s also a sortable integer instead of text.

roelb wrote at 2021-11-28 10:54:03:

Regarding time and timezones, it's best to not prematurely optimise. Ideally store both. It's much more valuable to track the timezone from your source then to have to worry about it later. ISO8601 is the universal standard.

A case where you see this issue play out is the horrible Strava-*device sync. You track in a different timezone and they store+visualise activities against some weird client-side profile setting, which causes morning runs to render at 11pm. Only adding timezone later on. They totally mess this up.

hnarn wrote at 2021-11-28 11:04:55:

First of all, I don't see how using unix epoch timestamps can be called "premature optimization", it's a pretty widely used and standardized way of saving timestamps.

Secondly, if you're in "SQLite Strict Land" and you don't have access to abstractions like "tell the database this is a date", then the best way of storing timestamps _is_ unix epoch, I would be extremely surprised if databases don't already do this behind the scenes when abstracting away things like dates for the user.

Thirdly, what is the value of tracking "source timezone"? This is solving a non-existent problem: if you are getting the timestamp in unix epoch from the source, and you're storing it in unix epoch, the "source timezone" is already known: it's UTC just like all unix timestamps are.

Timezones is fundamentally a data presentation concern, and I strongly believe they should not be a part of the source data.

> You track in a different timezone and they store+visualise activities against some weird client-side profile setting, which causes morning runs to render at 11pm. Only adding timezone later on. They totally mess this up.

This is exactly the type of issue that happens _because_ you involve timezones in your source data. If all applications and databases only concern themselves with unix timestamps, and the conversion to a specific timezone only happens in the application layer upon display, this type of issue simply does not happen, because time "1638097466" when I'm writing this is the exact same time everywhere on the globe.

(Of course, similar issues _can_ happen due to user error if two applications have different time zone settings and the user mistakenly enter a timestamp in the wrong TZ, but that's definitely not solved by making time zones a part of the data itself)

fauigerzigerk wrote at 2021-11-28 14:37:23:

I agree that this is the preferred way of dealing with it. Unfortunately, it's not always possible. Some cases that come to mind:

- Importing event/action data that contains date/time values with a timezone but insufficient information on the place where the event occurred. Converting to UTC and throwing away the timezone means you're losing information. You can no longer recover the local time of the event. You can no longer answer questions like, did this happen in broad daylight or in the dark?

- Importing local date/time data without a timezone or place (I've seen lots of financial records like this). In this case, you simply don't know where on the timeline the event actually took place. The best you can do is store the local date/time info as is. You can't even sort it with date/time data from other sources. It's bad data but throwing it away or incorrectly converting it to UTC may be worse.

- Calendar/reminder/alarm clock kind of apps. You don't want to set your alarm to 7 am, travel to another timezone and have the alarm blare at you at 4 am. Sometimes you really do want local time.

- There are other cases where local times are not strictly necessary but so much more convenient. Think shop opening hours in a mapping app for instance. You don't want to update all opening hours every time something timezone related happens, such as the beginning or end of daylight saving time.

hnarn wrote at 2021-11-28 15:41:35:

You are correct, there are many other reasons for saving dates and times in a database apart from recording "events", and for those it does make sense to use "relative" descriptions of time or date. I'd argue though, that if your data collection of _events_ is imperfect, like if you have no idea which system an event came from or whether you can trust the syntax of the timestamps, those are primary problems that should be fixed and not "worked around" by changing how your timestamps are saved.

For example, if you don't know where a data point originated, that's already a pretty big issue regardless of whether the time is correct or not. If you have financial data with ambiguous timestamps, this is not only a problem but potentially a compliance problem, since banks are heavily regulated. I think it's unlikely that it's acceptable for a bank to be unable to answer the question "when did this event take place", so the fundamental issue should be fixed, not tolerated.

fauigerzigerk wrote at 2021-11-28 16:45:05:

You're not always in a position to fix these issues, nor is it always worth the effort. It may be a complete side-issue that no one cares about. The information may exist somewhere but not in the dataset you're working with.

Having done quite a bit of data integration work in my life, I can tell you that fixing things can range from the unavoidable to the quixotic.

StreamBright wrote at 2021-11-30 10:30:29:

Another use case is the definition of a business day.

A business day is almost always in local time. Example: business is open from 9AM until 6PM. This is affected by day light saving and it moves back and forth in UTC.

Working with financial systems, payments are business day aligned and if you try to have a system with UTC time you might be surprised what sort of issues are surface. I learned these things the hard way. :)

CodesInChaos wrote at 2021-11-29 11:28:31:

That only applies if what you're storing identifies a point in time. In business code we often have to work with date times in a specific timezone. e.g. if you schedule a meeting on 2022-07-01 09:00 in Berlin time, throwing away the timezone would be a mistake which results in incorrect behaviour when the DST rules change.

hnarn wrote at 2021-11-29 11:39:51:

> That only applies if what you're storing identifies a point in time.

A meeting is definitely "a point in time", and your example illustrates my point: the display of "2022-07-01 09:00 UTC+2" should be up to the client (which has a timezone setting) but the _storage_ of this meeting's time should, in my opinion, not be "2022-07-01 09:00 UTC+2" but the timezone-neutral "1656658800".

Since there is technically no such thing as "Berlin time", it should be up to the client to decide whether the date you selected in the future (July 1, 2022) is UTC+1 (winter) or UTC+2 (summer) and then store the time in a neutral way to ensure it's then displayed correctly in other clients that have another timezone set (for example the crazy Indian UTC+5:30).

Meetings are a good example of when this matters because it's important that meetings happen at the same time for everyone, which will always happen when the source data is unix epoch. Of course it also happens with correct time zone conversions, but to me that adds unnecessary complexity. Other examples given, like the opening hours of a store, are probably better to store in a "relative" way (09:00, Europe/Berlin) as the second part of that tuple will vary between UTC+1 and UTC+2.

StreamBright wrote at 2021-11-30 10:27:25:

Great question!

In my experience human readability is pretty important when you are debugging or working with SQL in a raw form (running random, ad-hoc queries). Storing unix epoch is fine and sometimes I do it but more recently I just realised that unless I am working with a database with billions of rows storing a human readable text is fine.

hnarn wrote at 2021-11-30 10:39:03:

> In my experience human readability is pretty important when you are debugging or working with SQL in a raw form (running random, ad-hoc queries).

I agree, but most databases have functions for this. MySQL example:

> SELECT FROM_UNIXTIME(1196440219);

-> '2007-11-30 10:30:19'

I would claim this reinforces the benefit of unix timestamps because now you're getting it back in _your_ local time (or whatever you choose to convert it to: SET time_zone), not whatever time it happened to be put into the database as.

For MySQL there's a more important point though:

* Storing timestamps in MySQL as pure text is _simply wrong_ since MySQL has abstractions for timestamps (like the TIMESTAMP type)[1] and not using this is a completely unnecessary violation of good practice

* In the case we're discussing (no abstractions available, INT only) I would still say that unix timestamps brings the rather huge benefit of ensuring that all data is put in correctly: there is no way to sanitize inputs with a string column and ensure that the same timezone is always used, at least not without a bunch of extra an unnecessary code.

[1]:

https://dev.mysql.com/doc/refman/8.0/en/datetime.html

melony wrote at 2021-11-28 04:08:56:

And more importantly, JSON.

derefr wrote at 2021-11-28 04:36:56:

Note that you don't have to have a JSON-typed column in SQLite in order to do JSON-y things. In SQLite, JSON is just "text that happens to contain JSON", and JSON functions are just functions that operate on "text that happens to contain JSON".

michalc wrote at 2021-11-27 21:01:56:

I have been particularly looking forward to that. Although I have to admit that I'm a bit... disappointed(?)... that previous versions of SQLite won't be able to open such tables (unless they use PRAGMA writable_schema=ON).

I wonder if SQLite didn't store the schema for a table as a "CREATE TABLE..." string, but instead something a bit more... structured, then such things could be added in a more backwards compatible way?

cryptonector wrote at 2021-11-27 21:22:07:

This could never be added in a backwards-compatible way if the existing CHECK() functionality was not sufficient already.

And yes, it'd be very nice to have a relational metaschema that doesn't suck for SQL schema. So far all such metaschemas I've seen leave a lot to be desired.

saurik wrote at 2021-11-27 21:52:59:

Well, in a different concept of how compatibility was thought of with respect to stored schemes, you could have feature flags marked as "this feature is entirely optional to understand", "this feature is required to understand", and a third case that would work here and be extremely useful "this feature is required to understand if you want to write to the database, but if you merely open it read only it is optional".

ComputerGuru wrote at 2021-11-27 21:28:49:

The existing check syntax would cover inserts but wouldn’t have supported the behavior of the new “any” type in a strict table.

What would be nice is if SQLite had separate “compatible with” and “read-only compatible with” fields, since the read-only behavior should not have changed at all (although I suppose describing the table would fail depending on how the presence of the STRICT modifier is codified).

cryptonector wrote at 2021-11-27 22:10:13:

What they could have done is translated a STRICT table w/o ANY-typed columns to an older CREATE TABLE with CHECK()s. But anyways.

smitty1e wrote at 2021-11-27 21:46:45:

> a relational metaschema that doesn't suck for SQL schema

Something like an object-relational mapper, e.g. SQLAlchemy?

porker wrote at 2021-11-27 21:27:14:

What kind of thing do you mean by a relational metaschema?

formerly_proven wrote at 2021-11-27 21:50:03:

Describing SQL tables with an SQL table

jhgb wrote at 2021-11-27 22:23:50:

That's called a catalog. (

https://reldb.org/c/index.php/twelve-rules/

, Rule #4)

chmaynard wrote at 2021-11-28 09:43:20:

As I understand it, STRICT is not a database "mode" but rather a feature of individual tables. In other words, strict and non-strict tables can co-exist in the same database.

Lio wrote at 2021-11-27 20:59:15:

There’s nothing obvious is linked release but is this expected to have an affect on performance either good or bad?

ripley12 wrote at 2021-11-27 21:09:44:

I wouldn't expect much difference in performance. "The on-disk format for the table data is the same."

I would expect the additional checks on writes to be negligible in most cases (dwarfed by other bookkeeping and I/O). I imagine that strict mode could enable some optimizations for read queries, but I can't see those being huge wins.

ComputerGuru wrote at 2021-11-27 21:43:56:

> but I can't see those being huge wins.

I agree from an algorithmic perspective but in the real world, it really depends. The size of a non-text/binary cell within a column is now fixed, meaning guaranteed to never need to resize/reallocate when streaming results. That could translate to non-negligible improvements (multiple percentage point speed up) if it’s implemented separately from the existing code path.

prirun wrote at 2021-11-27 23:42:45:

> The size of a non-text/binary cell within a column is now fixed

I think this is not true since the on-disk format didn't change, ie, ints are still variable-size whether strict or not.

ComputerGuru wrote at 2021-11-28 16:13:18:

The format/spec doesn’t change, the implementation does. “This field can be variably sized” vs “but in practice since this is a strict table it will always be exactly n bytes”

SQLite wrote at 2021-11-28 20:57:20:

The SQLite dev's internal performance testing using cachegrind

shows that STRICT mode uses about 0.34% more CPU cycles. So

STRICT is slightly slower, but not enough that you could measure

the difference reliably in a real-world system.

ComputerGuru wrote at 2021-11-27 21:21:14:

It’s really no more than an additional (hyper optimized) check constraint on the type; with branch prediction it’s probably ~free.

CyberDildonics wrote at 2021-11-27 22:27:11:

What's the difference between a datastore and a database?

ffwd wrote at 2021-11-28 00:29:36:

CREATE TABLE if not exists boxoffice (title ANY, theaters ANY, release_date ANY, year ANY) STRICT;

dkjaudyeqooe wrote at 2021-11-27 21:50:42:

Given that it has the Any type, it's really not that much different, it's mostly the lack of the 'interesting' best-effort automatic conversions which confused and alarmed people and gave SQLite's type system a bad name. This version still does automatic conversions, which IMO are always a bad idea, but justifies them with "but everyone is doing it!".

Some people seem to object to the Any type, but it's incredibly useful since a column may naturally contain a different type per row and it avoids ugly hacks to accommodate them.

brandmeyer wrote at 2021-11-27 23:15:48:

Hot take: The ANY column type will never be used. Users who want to take advantage of it will just avoid the STRICT keyword on the whole table (maybe even the entire database). Users who like STRICT typing will reach for a JSON column for their occasional weakly- or sum-typed needs.

pdimitar wrote at 2021-11-28 18:41:28:

I'll start using it tomorrow for one reason: three STRICT tables feature set will very likely get gradually expanded with time and I believe it's also very likely that sticking to the new syntax will thus net increasing benefits with time.

Of course, migrating current databases is another story. They have plenty of CHECK constraints and app-level validation already so the value in migrating those is small (although I'd still do it because the ANY data type columns will not coerce / convert anything when the table is STRICT which to me is still a win).

brigandish wrote at 2021-11-28 09:02:44:

ANY acts differently between strict and non-strict[1]:

> The behavior of ANY is slightly different in a STRICT table versus an ordinary non-strict table. In a STRICT table, a column of type ANY always preserves the data exactly as it is received. For an ordinary non-strict table, a column of type ANY will attempt to convert strings that look like numbers into a numeric value, and if successful will store the numeric value rather than the original string.

So it has some utility in a strict table that you can't get otherwise.

[1]

https://www.sqlite.org/stricttables.html#the_any_datatype

SQLite wrote at 2021-11-28 20:59:46:

The ANY datatype in STRICT mode behaves the same as a column with no datatype

at all in ordinary SQLite. ANY does not give you any new capabilities.

brigandish wrote at 2021-11-29 00:31:26:

I see. I like SQLite but the idea that there were people leaving the type information off of their columns entirely to take advantage of this hidden(?) "feature" is something I'll wonder about for a while.

dkjaudyeqooe wrote at 2021-11-29 10:02:37:

That's how I've always used SQLite, with no column type information, then I use the API to populate the column with type information as well as the value, thus implying the Any type. This is now explicit in strict tables.

As an embedded SQL engine I think this is the most rational way to use it, making it very predictable and very flexible.

contingencies wrote at 2021-11-27 22:15:55:

Couldn't resist.

https://imgur.com/a/W3nufMT

belter wrote at 2021-11-27 22:28:21:

https://youtu.be/st6-DgWeuos

mastax wrote at 2021-11-27 22:04:48:

Well I definitely prefer the new strict mode, I'll enable that right away. I've long fantasized about a mode that would make SQLite behave like a normal SQL database, but this is not that. You still can't store a float in the database and get it back out unmodified. SQLite will expand it to a double, and it can't store NaNs. And probably more overhead to be able to support storing strings in my REAL column. Oh well.

I tried out DuckDB which is more my style but the .NET libs are poor and I can't shave that yak right now.

dotancohen wrote at 2021-11-28 11:40:06:

  > You still can't store a float in the database and get it back out unmodified. SQLite will expand it to a double,

When would a double not be an acceptable replacement for float? Seriously interested. If the application layer works with floats, then a double-precision float should contain all possible values. I'd really love to hear about the use-case, I'm sure they exist even if I can't think of one.

  > and it can't store NaNs.

Well, neither can other small-time databases such as MySQL. Postgres does, though, which I did not know until googling it now.

mastax wrote at 2021-11-28 13:18:42:

Storing 13,000,000 floats to disk per minute on user machines. Doubling storage usage for no reason is not okay, we're already having problems filling up disks with other data.

dotancohen wrote at 2021-11-28 14:50:17:

Storage concern, I see. Yes, that actually was the only thing I could think off. Thank you.

phiresky wrote at 2021-11-28 01:12:41:

According to the docs, looks like the ANY type in a strict table might do that:

> In a STRICT table, a column of type ANY always preserves the data exactly as it is received. For an ordinary non-strict table, a column of type ANY will attempt to convert strings that look like numbers into a numeric value, and if successful will store the numeric value rather than the original string

There's still much less data types than in most SQL databases though, e.g. float4, boolean, timestamp etc are missing.

mastax wrote at 2021-11-28 03:12:57:

SQLite converts NaNs to NULL, and SQLite libraries tend to throw on encountering NaN.

marcosdumay wrote at 2021-11-28 01:27:42:

> SQLite will expand it to a double

Is that part a real problem? (The NaN thing certainly is.)

Doubles are a strict superset of floats. You don't risk losing any precision on the float to double conversion, and if you then convert the result back to float, you won't lose any precision either.

mastax wrote at 2021-11-28 03:02:59:

It's can be a real problem when you need to store 13,000,000 floats per minute.

Is it hard to work around? No. But every time I use SQLite it requires half a dozen little hacks to work around it's strangeness and I would prefer to not have to do that.

marcosdumay wrote at 2021-11-28 14:20:51:

Are you talking about the increased IO and disk usage due to doubles being twice the size? Because the conversion itself is usually done in hardware and is extremely fast.

Anyway, I've never touched a serialization library that doesn't mess with the data types it gets. There is always some conversion needed, and this one is on the easy end of the scale. The fact that composed types (records and arrays) aren't on the list is a much larger troublemaker.

(Regardless, I'm curious about the applications of 13M inserts per minute on a synchronous single-computer database. Pushing a machine to that extreme looks very interesting. Is that application something you can share?)

mastax wrote at 2021-11-28 17:00:03:

It's a data acquisition system which records various data collected out in the field by operators on standard laptops. They're frequently operating in places with no cell coverage so we can't "just" stream to the cloud, we store the data locally and then bulk upload later. Also want to be able to visualize the data on the laptops which may be larger than ram.

It's not a particularly demanding use-case, especially with respect to querying. Could just use some serialization format, parquet maybe, but I know SQLite works reliably and has good libs. It's nice to have standard tools available, and I know no matter what changes I'll be able to read these databases in 20 years. Didn't have a lot of time to work on this project so I didn't spend much time evaluating alternatives.

marcosdumay wrote at 2021-11-28 17:31:31:

Oh, that's interesting. I didn't think about batch importing local data. Yes, I imagine you want to minimize the time importing data, and so halving the size of your main data would speed things up by a sizeable amount.

cryptonector wrote at 2021-11-27 22:08:14:

Content inserted into the column with a datatype other than ANY must be either a NULL (assuming there is no NOT NULL constraint on the column) or the type specified. SQLite attempts to coerce the data into the appropriate type using the usual affinity rules, [...].

I would like an even stricter mode with no automatic type coercion.

kzrdude wrote at 2021-11-27 23:54:59:

Not trying to be mean, but it's like when someone who doesn't understand the point of the feature implements it.

cryptonector wrote at 2021-11-27 23:59:13:

I don't understand your point; I'm not sure why.

djbusby wrote at 2021-11-28 00:40:45:

I think they mean that your up-thread (is that OP or GP or?) comment is asking for the wrong thing. Like one of those XY type problems

kzrdude wrote at 2021-11-28 01:15:09:

I always underestimate how easy it is to be misunderstood. Why it's worthwhile to spend time on clear communication!

I'm chiding SQLite, a bit unfairly, that they only implemented the strict feature half right, still leaving some type coercion in.

cryptonector wrote at 2021-11-28 02:10:33:

Thanks, I understand your point now, and I agree. The SQLite3 devs seem to think that because other SQL RDBMSes do this, they should too. At this time I'm of the opinion that more strictness is better.

kiwijamo wrote at 2021-11-28 03:50:55:

Which other SQL engines do this? The ones I'm familiar with (MySQL, PostgreSQL, Microsoft SQL, Orcale to name a few) certainly don't behave anywhere close to how SQLite behaves.

alberth wrote at 2021-11-28 01:55:17:

STRICT + BedrockDB will be an interesting pairing.

I wonder if this will provide more

adoption of this client server database.

https://bedrockdb.com/

EDIT: for this not aware, Expensify is a paid sponsor of SQLITE and there’s an official branch (Bedrock) of SQLite that has client / server capabilities.

barosl wrote at 2021-11-28 02:11:33:

To keep things simple, the current parser accepts duplicate options without complaining, but that might change in future releases, so applications should not rely on it.

I love this level of attention to detail and kindness in the SQLite documentation. Very few projects do this.

malkia wrote at 2021-11-28 00:01:10:

Finally - table_list - You can't imagine in how many places sqlite pops up (web too - for example

https://perfetto.dev/

), and having introspection built-in abilities helps a lot!

remram wrote at 2021-11-28 17:23:17:

You could already select from sqlite_master

brigandish wrote at 2021-11-28 08:50:43:

Some developers appreciate the freedom that SQLite's flexible typing rules provide and use that freedom to advantage. But other developers are aghast at SQLite's flagrant rule-breaking and prefer the traditional rigid type system found in all other SQL database engines and in the SQL standard. For this latter group, SQLite supports a strict typing mode, as of version 3.37.0 (2021-11-27), that is enabled separately for each table.

At last.

karteum wrote at 2021-11-27 21:40:01:

Great news.

One remark: "The on-disk format for the table data is the same" => I guess this means there is still some kind of "type/length" header for every single record... (which could theoretically be dropped) ?

cryptonector wrote at 2021-11-27 21:20:19:

The query planner now omits ORDER BY clauses on subqueries and views if removing those clauses does not change the semantics of the query.

That's fair and allowed by the standard, but it will break some things.

cldellow wrote at 2021-11-27 22:11:41:

I think I recall when this feature was discussed and implemented.

If it's what I'm thinking of, this concern was carefully considered:

  If a subquery in the FROM clause has an ORDER BY, that ORDER BY is omitted if all of the following conditions are also true:
  
  1. There is no LIMIT clause in the subquery
  
  2. The subquery was not one of the virtual subqueries added internally by SQLite for window-function processing
  
  3. The subquery is not part of the FROM clause in an UPDATE-FROM statement
  
  4. The outer query does not use any aggregate functions other than the built-in count(), min(), and/or max() functions.
  
  5. Either the outer query has its own ORDER BY clause or else the subquery is one term of a join.

from

https://sqlite.org/forum/forumpost/878ca7a9be0862af?t=h

cryptonector wrote at 2021-11-27 22:14:32:

Thanks! That works for me!

ComputerGuru wrote at 2021-11-27 21:25:32:

It says “if it doesn’t change the semantics of the query” and not “if it isn’t guaranteed by the spec,” which means it actually should only happen in cases where it doesn’t break anything, eg “foo in (
 only now without an implicit order)” but not “foo == (select foo from (
 now without implicit order) limit 1))” - at least, presumably.

cryptonector wrote at 2021-11-27 22:13:34:

Well, I guess I'll have to play with it. It's not entirely clear from the release notes what "semantics" means in this case. In principle in relational algebra ORDER BY is simply not meaningful because it's all unordered sets. In practice ORDER BY is essential. In principle all ORDER BY clauses in sub-queries can just be deleted, but in practice it can affect aggregation functions not built on commutative operations.

emmelaich wrote at 2021-11-27 23:01:41:

> what "semantics" means

An almost Clintonian phrase! :-)

zzzeek wrote at 2021-11-27 22:42:21:

wow, that STRICT thing is going to be a big deal. It's difficult enough to work with databases that have an extremely mature strong typing model (PostgreSQL - bound parameters need super-explicit typing information sent in most cases, and it's pretty unforgiving about implicit casts). A brand new one bolted onto SQLite after decades of "types as a vague suggestion" should be ...interesting! to support in downstream products (for my end, it's SQLAlchemy).

TedShiller wrote at 2021-11-28 00:46:59:

The Postgres documentation should adopt these types of state diagrams to describe the syntax:

https://www.sqlite.org/lang_createtable.html#ckconst

ako wrote at 2021-11-28 08:45:27:

Oracle has been using these in their reference guides for at least 30 years, e.g.:

https://docs.oracle.com/en/database/oracle/oracle-database/2...

Personally i'm more helped by good examples than these syntax diagrams.

pepoluan wrote at 2021-11-30 05:27:34:

"state diagrams"? Did you mean "syntax diagrams"?

tpetry wrote at 2021-11-28 08:44:54:

Many years ago when learning SQL i was looking at these state diagrams despite using another database. So yes, they should be added! They may not be usefull for anyone SQL who is able to read the Postgresql grammar definition but for someone learning it it‘s helping very much.

Keyframe wrote at 2021-11-28 01:23:11:

Syntax diagram, just like ye olde Pascal syntax poster!

cryptonector wrote at 2021-11-28 02:45:29:

They don't look very accessible to me, but yes, they are pretty.

curiousmindz wrote at 2021-11-27 21:24:14:

As long as an app is architectured around having a data layer with "typed" objects, the need for SQLite to support strict column types is not too important.

However, it does make handling more "advanced" types less standard. For example, there isn't a standard way to store a date, especially if we want to preserve high-precision (nanoseconds).

ComputerGuru wrote at 2021-11-27 21:34:14:

It’s actually really important in some edge cases. For instance, I have a column that is declared as text but I read and write to it via the blob api (because the source is utf8 bytes I don’t want to parse client-side for allocation reasons). The flexible typing ended up - silently - storing the fields as blobs, but comparisons with strings continued to work (“foo” == table.col) until I tried changing that to a pattern (table.col like “%foo”) at which point there were runtime exceptions (iirc). I had to update all existing data to cast the columns to the correct type, whereas the original insertion query should have been doing that from the start, but SQLite was too flexible for my own good.

pstuart wrote at 2021-11-27 22:04:37:

> especially if we want to preserve high-precision (nanoseconds).

Why not just by convention as ints, nanoseconds UTC post epoch?

jatone wrote at 2021-11-27 22:34:26:

because the lack of a type annotation via introspection makes tools automatically generating code for your less than useful.

41b696ef1113 wrote at 2021-11-28 01:26:23:

Would this be bundled into existing Python releases or would I have to wait until Python 11 to benefit from strict?

jftuga wrote at 2021-11-28 01:43:32:

If you don't want to wait, you can pre-link a custom Python build with your own-built sqlite3:

https://stackoverflow.com/a/28045550/452281

gigatexal wrote at 2021-11-27 22:16:42:

yup - strict tables is very welcome.

This also is an interesting tidbit:

"The query planner now omits ORDER BY clauses on subqueries and views if removing those clauses does not change the semantics of the query."

Curious how it might improve queries -- I wonder how many folks were doing potentially redundant order by's in subqueries

41b696ef1113 wrote at 2021-11-28 00:03:29:

The order by optimization was a work-around for auto-generated SQL. The original (?) issue [0]

[0]

https://sqlite.org/forum/forumpost/2d76f2bcf65d256a

jokoon wrote at 2021-11-27 21:08:21:

I wonder if Spatialite is still maintained...

durkie wrote at 2021-11-27 23:17:04:

seems to be -- there was a major new version out in february this year i believe. still works great, even though it's kind of weird to switch between spatialite and postgis.

unbanned wrote at 2021-11-28 01:01:59:

Cascade deletes yet?

3dfan wrote at 2021-11-27 21:21:37:

It seems to be a universal law of software projects:

1: Start out lean and simple. Often replacing an old complex one

2: Become more complex over time

3: Be replaced by a new lean and simple one

ComputerGuru wrote at 2021-11-27 21:36:53:

If anything, this behavior is far simpler than trying to automagically coerce types and marshal differing types to/from one column.

nikeee wrote at 2021-11-27 21:47:51:

I think the point here is that there are now two modes of operation, which leads to more complexity overall.

dkjaudyeqooe wrote at 2021-11-27 21:55:50:

Strict tables still do both of those things.

curiousmindz wrote at 2021-11-27 21:28:03:

I think this is a concerning pattern when the core usage of the software gets more complex due to the new features.

If you can take a developer using the old version that was lean and simple, give them the latest version full of extra features, and they can use that version without any need to change anything, then I think it's fine.

And, so far, I think that SQLite has done a good job of keeping their core usage lean and simple.