________________________________________________________________________________
Nearly off-topic remark. The pandemic pushed our family to get a Netflix account. Yesterday I did a small review of my router logs: before Netflix we had an average of 60GB of download traffic per month; since Netflix we are at around 400GB per month.
Even though (or perhaps because) I started my Internet life with a 14.4 kbps modem, went through all the upgrades, now have a 70 Mbit/s DSL line at home, and manage servers all over the world, I was surprised by the incredible amount of data streaming moves.
An extra 340GB per month. At 10Mbps that is roughly 75 hours of Netflix per month. And that is for the whole family. So really not that much.
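For anyone who wants to sanity-check that figure, here is the arithmetic as a quick Python sketch; the 10 Mbps bitrate is just the assumption from the comment above, not anything measured.

```
# Sanity check: how many hours of streaming does an extra 340 GB/month
# buy at an assumed bitrate of 10 Mbps?
extra_bytes = 340e9            # 340 GB per month
bitrate_bps = 10e6             # assumed 10 Mbps stream

hours = extra_bytes * 8 / bitrate_bps / 3600
print(f"{hours:.0f} hours/month")   # ~76 hours
```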
But I have been saying this for quite some time: we are fundamentally limited by the time we can spend watching video. Which means our appetite for data won't grow forever.
Resolution and framerate requirements will only expand. If Facebook succeeds at the metaverse idea where everyone lives in VR after work, you’re going to have every member of a household consuming a server-side-rendered, 360-degree, extremely high-resolution, high-framerate feed.
I was also surprised. I was living in Finland for a while, and mobile phone plans there often come with unlimited data.
We used an iPad to watch Netflix while sharing my phone's connection. I checked the data downloaded after 3 or 4 months; it was close to 2TB. It seems... unreal?
3.2TB/month ends up being 24/7 at 10 Mbps
How much Netflix were you watching?
I think the commenter you are replying to meant 2TB after 3 or 4 months. If so, I’m guessing that sounds more reasonable
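A quick back-of-the-envelope check on the numbers in this sub-thread (the 10 Mbps bitrate and the 30-day month are assumptions, not measurements):

```
# Check on the figures above: 10 Mbps running nonstop for a 30-day month,
# versus 2 TB spread over 3-4 months.
bitrate_bps = 10e6
seconds_per_month = 30 * 24 * 3600

tb_per_month = bitrate_bps / 8 * seconds_per_month / 1e12
print(f"{tb_per_month:.2f} TB/month if streaming 24/7")          # ~3.24 TB

print(f"{2 / 4:.2f} to {2 / 3:.2f} TB/month for 2 TB over 3-4 months")
```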
just a quick note here, most of that traffic was probably served from your ISP's network and never reached AWS:
https://openconnect.netflix.com/en/
And yet it still counts against my Xfinity monthly bandwidth cap. Pretty soon our wireline ISPs will be playing the same games we've seen in the mobile space for years, upcharging for data etc.
Surely Netflix pays only a small fraction of list price for AWS.
I wonder how that reduced price might influence the technical advice from Netflix.
Further to this… following the technical advice of Netflix on AWS may well bankrupt you, given how little Netflix pays for AWS.
I wouldn’t be racing to emulate the way they do things.
> I wouldn’t be racing to emulate the way they do things.
Netflix or not, emulating how someone else conducts their business without a firm understanding of the underlying reasons is often a bad idea.
I imagine most medium to large enterprises using AWS have some discount, especially if you are operating at the kinds of scale that this article deals in.
So reading this article am I understanding correctly that multiple VMs are mounting the same EBS volumes, and some write while others read?
What's the underlying technology here? If EBS is implemented over iSCSI, AFAIK this isn't something most distros currently support.
It is interesting to see that they could move 76.8GB of data in 1 hour per instance with the old architecture, while I can move 120GB of data in 1 hour locally between 2 laptops over wifi.
The article mentions that this is due to S3 throttling.
An additional issue is that the same interface is used for serving the data out to the cache customers as is used for fetching from S3, (serving hundreds of gigs to hundreds/thousands of clients from ram) which is something your laptop also probably isn't doing.
S3 is big and reliable. It isn't fast or cheap.
I think it is fair to expect that even with the interface actively serving traffic, the data transfer over S3 should still be faster than the data transfer over wifi. Someone should take a closer look at that S3 throttling?
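For concreteness, converting both figures from that comparison into sustained throughput (a rough sketch, taking the GB numbers at face value):

```
# Converting both figures to sustained throughput for a direct comparison.
def gb_per_hour_to_mbps(gb):
    return gb * 1e9 * 8 / 3600 / 1e6

print(f"{gb_per_hour_to_mbps(76.8):.0f} Mbps")  # ~171 Mbps, old per-instance warming rate
print(f"{gb_per_hour_to_mbps(120):.0f} Mbps")   # ~267 Mbps, two laptops over wifi
```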
Is this to cache huge static files such as videos encoded in multiple resolutions/formats?
No, this isn’t for video data, those tend to just be stored on S3. EVCache is like memcached for small data.
What file system works while being mounted on two different operating systems?
Not the one they’re using, apparently.
> On the destination instance, the EBS volume is mounted with RO (Read-Only) permissions. But the caveat to this is — we do not use a clustered file system on top of EBS. So the destination side can’t see the changes made to EBS volume instantaneously. Instead, the Cache Populator instance will unmount and mount the EBS volume to see the latest changes that happened to the file system. This allows both writers and readers to work concurrently, thereby speeding up the entire warming process.
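For illustration only, a minimal sketch of what the unmount/remount cycle described in that quote could look like on the read-only side. The device path, mount point, mount options, placeholder processing function and polling interval are all assumptions, not Netflix's actual implementation.

```
# A minimal sketch (not Netflix's code) of the remount cycle described in
# the quote: mount the shared EBS volume read-only with the filesystem's
# "norecovery" option, process what's visible, then unmount and remount to
# pick up metadata the writer has flushed since the last pass.
import os, subprocess, time

DEVICE = "/dev/nvme1n1"          # hypothetical multi-attached EBS volume
MOUNT_POINT = "/mnt/cache-data"  # hypothetical mount point

def remount_readonly():
    subprocess.run(["umount", MOUNT_POINT], check=False)  # may not be mounted yet
    subprocess.run(["mount", "-o", "ro,norecovery", DEVICE, MOUNT_POINT],
                   check=True)

def process_complete_files(root):
    # Placeholder for the real work: walk the tree and hand complete files
    # to the cache populator, leaving incomplete ones for the next pass.
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            print("would populate from", os.path.join(dirpath, name))

while True:
    remount_readonly()                  # refresh the read-only view
    process_complete_files(MOUNT_POINT)
    time.sleep(30)                      # writer keeps appending meanwhile
```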
In the before times, I had read about attaching a disk to two different SCSI controllers (in different hosts), and you can also do that with Fibre Channel or extra-fancy double-ended SAS drives. But that was almost always as a way to access the drive from one system at a time.
Solaris has (had?) QFS which allows for multi-host access:
> _Shared QFS adds a multi-writer global filesystem, allowing multiple machines to read from & write to the same disks concurrently through the use of multi-ported disks or a storage area network. (QFS also has a single-writer/multi-reader mode which can be used to share disks between hosts without the need for a network connection.)_
* https://en.wikipedia.org/wiki/QFS
A few jobs ago I inherited a setup where files/artefacts were uploaded to a file server and had to be shared with clients, but for 'legal' reasons the internal host could not be exposed to the outside world, though (some of?) the data could be. So an external disk pack was purchased: one SAS port went to the internal machine and the other SAS port went to the external machine.
Cluster filesystems are designed for this - Ceph, CXFS, Isilon, Gluster; I'm probably forgetting one or two more. (Notably, _not_ ZFS.) It used to be a much rarer need, but with the advent of VMs all over the place it's now a common one, yet the market still hasn't seen fit to produce a quality cluster filesystem that also has Windows and macOS drivers in addition to Linux & BSD.
I'm guessing any filesystem would work when only one server is writing.
"On the destination instance, the EBS volume is mounted with RO (Read-Only) permissions. But the caveat to this is — we do not use a clustered file system on top of EBS. So the destination side can’t see the changes made to EBS volume instantaneously. "
Nah. Because not every intermediate state of a filesystem is self-consistent, and because caching can "tear" updates. So Amazon's multi-attach documentation says:
> Standard file systems, such as XFS and EXT4, are not designed to be accessed simultaneously by multiple servers, such as EC2 instances. Using Multi-Attach with a standard file system can result in data corruption or loss, so this is not safe for production workloads. You can use a clustered file system to ensure data resiliency and reliability for production workloads.
> Multi-Attach enabled volumes do not support I/O fencing. I/O fencing protocols control write access in a shared storage environment to maintain data consistency. Your applications must provide write ordering for the attached instances to maintain data consistency.
Or from the Netflix document:
> It ignores files with partial file system metadata. Cache populators can see incomplete file system metadata due to the way it mounts the filesystem, with “No Recovery” option. It proceeds with complete files, leaving incomplete files to be processed by a subsequent iteration. This is explained further in step 7.
AKA you get random read errors, etc, and need to cope.
Behavior of something like this can be expected to be very kernel version and filesystem dependent, even with the most defensive application access strategy.
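A rough sketch of what "coping" could look like in application code, under the assumptions in this thread (skip the file on any read error and retry it after the next remount); none of the names below come from the article.

```
# Sketch of a defensive read pass: every read from the shared read-only
# mount is allowed to fail, failed files get deferred and retried after
# the next remount. All names are illustrative.
import os

def load_into_cache(path, data):
    # Stand-in for whatever actually pushes the bytes into the cache.
    print(f"loaded {len(data)} bytes from {path}")

def try_read(path):
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError:
        # Partial metadata or a torn update on the writer side can surface
        # here as an I/O error; defer the file instead of failing the pass.
        return None

def warm_pass(root, deferred):
    for dirpath, _dirs, files in os.walk(root, onerror=lambda e: None):
        for name in files:
            path = os.path.join(dirpath, name)
            data = try_read(path)
            if data is None:
                deferred.add(path)        # retry after the next remount
            else:
                deferred.discard(path)
                load_into_cache(path, data)
```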
might be kinda fun to build a "ringbufferfs" that uses multi-attach EBS volumes as slow shared memory for pushing bits around via the EBS side channel.
probably cheaper and more reliable too than provisioning a full-size volume and relying on implementation quirks in existing single-host filesystems and EBS.
It sounds like one would have to code very defensively to make that work safely.
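For fun, a toy sketch of what such a "ringbufferfs" record format might look like if you coded it very defensively: fixed-size slots on the raw device, each carrying a sequence number and checksum so the reader can detect torn or stale slots and simply poll again. The device path, slot layout and sizes are all invented for illustration.

```
# Toy writer/reader record format for a shared block device. Fixed-size
# slots, each with a sequence number, payload length and CRC32 so the
# reader can detect torn or stale slots. Device path and layout are made up.
import os, struct, zlib

DEVICE = "/dev/nvme2n1"          # hypothetical multi-attached EBS volume
SLOT_SIZE = 4096                 # one slot per 4 KiB block
SLOTS = 1024                     # ring capacity
HEADER = struct.Struct("<QII")   # sequence number, payload length, CRC32

def write_slot(fd, seq, payload):
    # Sequence numbers start at 1 so an all-zero (never written) slot
    # can never validate as a real record.
    assert seq >= 1 and len(payload) <= SLOT_SIZE - HEADER.size
    body = HEADER.pack(seq, len(payload), zlib.crc32(payload)) + payload
    os.pwrite(fd, body.ljust(SLOT_SIZE, b"\0"), (seq % SLOTS) * SLOT_SIZE)

def read_slot(fd, seq):
    raw = os.pread(fd, SLOT_SIZE, (seq % SLOTS) * SLOT_SIZE)
    got_seq, length, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]
    if got_seq != seq or zlib.crc32(payload) != crc:
        return None              # stale or torn slot: poll again later
    return payload
```

In practice the reader would likely also need O_DIRECT (with properly aligned buffers) or some other way to bypass its own page cache, otherwise it can keep being served stale copies of blocks it cached earlier.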