💾 Archived View for dioskouroi.xyz › thread › 25010802 captured on 2020-11-07 at 00:57:39. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Fast open-source intrusion detection

Author: lelf

Score: 124

Comments: 15

Date: 2020-11-06 20:26:58


________________________________________________________________________________

yabones wrote at 2020-11-06 21:11:36:

From what I understand, this is essentially an extension to Snort that uses high performance NICs with built-in FPGAs to do the heavy lifting.

Honestly, I don't think this is the best way forward. Using Suricata on cheap hardware with OSS rulesets, I can get _very close_ to gigabit throughput. Instead of relying on specialized hardware Suricata can use CUDA making it much more accessible.

https://suricata-ids.org/

syoc wrote at 2020-11-06 22:02:15:

Suricata can do 40Gb/s on a single server using AF_XDP and eBPF for bypassing traffic that will not be inspected. (Think TLS after the handshake)

https://suricon.net/wp-content/uploads/2019/11/SURICON2019_P...
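
The bypass idea in the comment above can be illustrated with a toy model. This is only a sketch in plain Python, not Suricata's eBPF or AF_XDP code: the `Packet` tuple, the `handshake_done` flag, and the `process` function are hypothetical names invented for illustration. The point is that once a flow is marked as not worth inspecting (for example, TLS after the handshake), later packets of that flow are skipped before they reach the expensive inspection path.

```python
from collections import namedtuple

# Hypothetical packet model: a flow key, a payload, and a flag that the
# inspector would set once it decides the rest of the flow can be skipped.
Packet = namedtuple("Packet", "flow payload handshake_done")

def process(packets, inspect):
    """Inspect packets until their flow is marked bypassed; return inspected count."""
    bypassed = set()   # stands in for a kernel-side table of bypassed flows
    inspected = 0
    for pkt in packets:
        if pkt.flow in bypassed:
            continue   # skipped early, never reaches the inspector
        inspect(pkt)
        inspected += 1
        if pkt.handshake_done:
            bypassed.add(pkt.flow)  # everything after the handshake is bypassed
    return inspected

# Made-up example flow: handshake packets are inspected, encrypted data is not.
flow = ("10.0.0.1", 50000, "10.0.0.2", 443, "tcp")
packets = [
    Packet(flow, b"ClientHello", False),
    Packet(flow, b"Finished", True),
    Packet(flow, b"app data 1", False),
    Packet(flow, b"app data 2", False),
]
seen = []
inspected = process(packets, seen.append)
```

With the sample above, only the two handshake packets are handed to the inspector; the two application-data packets are dropped from the inspection path.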

_wldu wrote at 2020-11-06 21:40:05:

I like Suricata, too, but am disappointed that they are dropping Snort's unified2 binary file support. Binary file formats are so much faster and more efficient than JSON. I have no idea why projects use JSON when they expect to produce tens or hundreds of GB of data. When monitoring 100 Gb links, JSON marshaling and unmarshaling is wasteful and costly, and it is one reason why they need clusters of servers that expand easily.

yabones wrote at 2020-11-06 22:19:15:

I don't think anybody disagrees that JSON is wasteful and costly; it's just that most people see it as the necessary evil to make the data portable and easy to integrate with different SIEMs or databases.

I don't have a ton of experience with unified2, but it's certainly not as simple to bring into something like Elasticsearch compared to JSON where you basically just point fluentd at it and let it do all the work. All of the common log management tools are built for text files, so why not play to their strengths?

GordonS wrote at 2020-11-06 23:35:05:

JSON is really useful as a portable, human-readable format that is also (relatively) easily machine-readable.

Under the hood, doesn't Suricata transform the JSON config to an efficient binary representation though?

(I've never used Suricata, and haven't used Snort or seen the code for ~15 years, but it seems the obvious thing to do)

syoc wrote at 2020-11-06 22:06:25:

unified2 is not a great format and does not sound very flexible for further development. The JSON marshaling is expensive but is not the first of my worries when doing line rate inspection on traffic at those rates.

I think most shops have a million integrations that they want to set up for their IDS alerts and JSON is pretty close to a lingua franca.

_wldu wrote at 2020-11-06 22:27:11:

JSON is widely used. There is no arguing that. And for small to medium networks, JSON is fine. However, when you are monitoring multiple 100 Gb links and you produce hundreds of GB of logs a day, JSON is the wrong format. Especially if you want to search in a reasonable period of time.

Zeek can output JSON as well (in addition to plaintext). I've done comparisons. Using jq to search Zeek JSON is five times slower than using the C++ simdjson library (fastest JSON parser known to humankind). And simdjson is three times slower than bro-cut on the plaintext logs. This may not sound like a lot, but it is extremely important when monitoring multiple 100 Gb links and you have 40 GB JSON conn logs every hour and you need queries to run in a reasonable period of time.
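
For readers who have not used bro-cut: part of why the plaintext logs are cheap to query is that picking columns out of a tab-separated line needs no real parsing. Below is a minimal Python sketch of that idea, not the bro-cut implementation. It assumes the usual Zeek TSV layout with a `#fields` header line; the `cut_fields` name and the sample log lines are made up for illustration.

```python
def cut_fields(lines, wanted):
    """Yield the `wanted` columns from each data row of a Zeek-style TSV log."""
    index = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The header names the columns; map wanted names to positions.
            names = line.split("\t")[1:]
            index = [names.index(name) for name in wanted]
            continue
        if not line or line.startswith("#") or index is None:
            continue  # skip other metadata lines and anything before the header
        cols = line.split("\t")
        yield [cols[i] for i in index]

# Made-up sample in the shape of a conn log.
sample_log = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tproto",
    "1604694418.0\tCabc123\t10.0.0.1\t10.0.0.2\ttcp",
    "1604694419.5\tCdef456\t10.0.0.3\t10.0.0.4\tudp",
    "#close\t2020-11-06-21-00-00",
]
rows = list(cut_fields(sample_log, ["id.orig_h", "proto"]))
```

Each data row costs one `split` and a couple of index lookups, whereas the JSON form of the same log has to be tokenized and decoded before any field can be read.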

justinsaccount wrote at 2020-11-06 23:58:23:

jq is just slow. Using github.com/buger/jsonparser to slice out some fields from a large JSON log is about 10x faster than using jq.

The most common mistake I see is people composing pipelines and putting jq before grep. Or worse, using select() in jq. The grep should always come first. Even if you need to do something like

        .. | fgrep value | jq ... | fgrep value

to pre-filter, and then filter again to rule out any false positives.
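
The same "cheap filter first, exact filter second" pattern can be sketched outside the shell. This is a minimal Python illustration, assuming one JSON object per line; the `prefiltered_matches` name and the sample records are hypothetical. The substring test plays the role of the first fgrep, and the field comparison after parsing plays the role of the second one.

```python
import json

def prefiltered_matches(lines, field, value):
    """Yield parsed records whose `field` equals `value`."""
    for line in lines:
        # Cheap substring test first: most lines are rejected without parsing.
        # Assumes `value` appears verbatim in the raw line (no JSON escaping).
        if value not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore malformed lines
        # Exact check on the parsed field rules out false positives, such as
        # the value showing up in a different field.
        if record.get(field) == value:
            yield record

# Made-up sample: the first line contains the value, but in the wrong field.
sample = [
    '{"id.orig_h": "10.0.0.1", "id.resp_h": "10.0.0.2", "service": "dns"}',
    '{"id.orig_h": "10.0.0.2", "id.resp_h": "10.0.0.9", "service": "http"}',
    '{"id.orig_h": "10.0.0.3", "id.resp_h": "10.0.0.4", "service": "ssl"}',
]
hits = list(prefiltered_matches(sample, "id.orig_h", "10.0.0.2"))
```

Here the third line is rejected by the substring test alone, the first line is parsed and then rejected by the exact check, and only the second line is returned.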

I have some tooling built around jsonparser and fastjson that I need to split out and open source. jq just makes me sad: very capable, but overkill for what most people use it for and very slow for the common case.

source: wrote the C version of bro-cut that you like so much :)

Godel_unicode wrote at 2020-11-06 23:43:35:

If you're generating that much logging, though, you're definitely not using bro-cut/jq/etc.; you're going to be well into Kafka/Hadoop/ELK territory. See also Apache Metron.

matsur wrote at 2020-11-06 22:44:03:

As an aside from the paper, I'm a PM at Cloudflare and very interested in hearing from current IDS users on what you'd like to see out of the edge IDS we're building. Reach out to rustam @ cloudflare if you'd like to chat!

Sephr wrote at 2020-11-06 21:40:23:

It looks like CUDA support was removed back in February 2018 with the release of Suricata 4.1.

tw04 wrote at 2020-11-07 02:19:49:

>Instead of relying on specialized hardware Suricata can use CUDA making it much more accessible.

Wait, what? Instead of using specialized hardware... you can use an Nvidia GPU (specialized hardware)?

At least the smartNIC uses about 1/10th the power of most Nvidia GPUs, and I don't have to chew up an extra PCIe slot.

justinsaccount wrote at 2020-11-06 21:29:36:

Small print: limited to 100k concurrent flows and 10k rules.

Just checked a box running Zeek/Suricata that is seeing 25 Gbps with about 50k rules; it's currently tracking a bit north of 1 million flows.

So you'd only need 10 of their 'fastest' systems to handle what this one system is doing.
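
The arithmetic behind that estimate, using only the figures quoted in this comment (variable names are illustrative, and "a bit north of 1 million" is rounded down to 1,000,000):

```python
import math

# Stated limits of the proposed system, per the comment.
limit_flows, limit_rules = 100_000, 10_000
# Load on the existing Zeek/Suricata box, per the comment.
observed_flows, observed_rules = 1_000_000, 50_000

# Systems needed if the flows could be split evenly across them.
systems_for_flows = math.ceil(observed_flows / limit_flows)
# How far the rule count exceeds the stated per-system limit.
rules_ratio = observed_rules / limit_rules
```

That gives 10 systems on flow count alone, with the ruleset also 5x over the stated per-system limit.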

new_realist wrote at 2020-11-07 05:28:20:

To look at more than metadata, do these systems require a MITM attack? Most data these days is encrypted.

pharos92 wrote at 2020-11-07 01:08:25:

If only there was a 1/10/40Gbps option :( - a $10-10k is out of reach... Awesome research though, well done team.