Image Scaling Attacks

Author: wendythehacker

Score: 426

Comments: 69

Date: 2020-10-29 05:48:34

________________________________________________________________________________

tgsovlerkhgsel wrote at 2020-10-29 07:15:37:

This obviously works when the image is "scaled" by sampling/nearest-neighbor (e.g. downscaling 2x by taking every second pixel and discarding the rest), not actually scaled through some better method (by doing math that involves all pixel values).

What the article doesn't mention, and what the paper it links to probably covers somewhere amid so much other material that I couldn't find it yet, is whether this also works on some of the better scaling algorithms, and thus whether it's a "duh, OBVIOUSLY" result or actually interesting research.

The blog post gives a cv2.resize example which seems to default to "bilinear", but I'm not sure what this means for _down_scaling, in particular for downscaling by a large factor.

I suspect that the key takeaway is "default downscaling methods are bad".
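
For concreteness, a minimal sketch of the cv2 behavior in question (assuming OpenCV's Python bindings; "attack.png" is a hypothetical crafted input, not a file from the article):

    import cv2

    img = cv2.imread("attack.png")  # hypothetical crafted input

    # cv2.resize defaults to INTER_LINEAR (bilinear), which for large
    # downscale factors samples only a few source pixels per output pixel
    thumb_default = cv2.resize(img, (32, 32))

    # INTER_AREA averages over all covered source pixels instead
    thumb_area = cv2.resize(img, (32, 32), interpolation=cv2.INTER_AREA)

    cv2.imwrite("thumb_default.png", thumb_default)  # may reveal a hidden image
    cv2.imwrite("thumb_area.png", thumb_area)        # hidden pixels are averaged away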

dietrichepp wrote at 2020-10-29 08:06:54:

There's another way to hide the image, and that is to exploit the nonlinearity of the response curves (gamma).

I have an image I crafted a long time ago which looks something like gray noise when you open it up, but when you downscale it, you see an image of Lt Cmdr Data from Star Trek. I wonder if I can dig it up.

The technique itself was not novel when I did it; a more sophisticated version involving embedded gamma values (which you can make quite large or small) was routinely used on image boards some ten or fifteen years ago.
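
The arithmetic behind that gamma trick, as a minimal sketch (using a pure power-law gamma of 2.2 rather than the exact piecewise sRGB curve):

    import numpy as np

    gamma = 2.2
    pair = np.array([0.0, 1.0])  # alternating black/white pixels, as stored (gamma-encoded)

    naive = pair.mean()                             # 0.5: what a gamma-naive scaler computes
    linear = np.mean(pair ** gamma) ** (1 / gamma)  # ~0.73: the physically correct average

    print(naive, linear)
    # A pattern tuned to this gap can look like gray noise at full size
    # yet shift brightness predictably when downscaled naively.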

kuschku wrote at 2020-10-29 11:26:00:

It's ridiculous that so few websites actually handle this well. Even my own self-written imgur clone does it just fine:

https://i.k8r.eu/i/F_XCMA

https://i.k8r.eu/F_XCMAm.png

https://i.k8r.eu/F_XCMAt.png

You just have to go into a linear colorspace and use an area filter.
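
A minimal sketch of that recipe (using a power-law approximation of sRGB for brevity; a real implementation should use the exact piecewise sRGB curve and honor any embedded ICC/gAMA data):

    import cv2
    import numpy as np

    def downscale_linear(img_srgb, size):
        # decode sRGB to linear light (approximate transfer function)
        linear = (img_srgb.astype(np.float32) / 255.0) ** 2.2
        # area filter: every source pixel contributes to the output
        small = cv2.resize(linear, size, interpolation=cv2.INTER_AREA)
        # re-encode to sRGB
        return np.clip(small ** (1 / 2.2) * 255.0, 0, 255).astype(np.uint8)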

bawolff wrote at 2020-10-30 06:20:35:

Fwiw, the reason why Wikipedia doesn't do this when rescaling images (or at least didn't years ago when I was working on image resizing code for Wikipedia) is that doing it (with off-the-shelf software) required keeping the entire image in memory, which was a big no-no. I mean, I guess it would be fine for small images, but then you're using two different algorithms depending on image size, which seems bad.

05greg wrote at 2020-10-29 11:41:41:

Related: you can get an idea of what your browser display is doing in this shadertoy:

https://www.shadertoy.com/view/Wd2yRt

VMG wrote at 2020-10-29 09:17:39:

is it like this?

http://www.ericbrasseur.org/gamma.html?i=1

TeMPOraL wrote at 2020-10-29 10:37:40:

The article links to this browser test page:

http://www.ericbrasseur.org/gamma_dalai_lama.html

On my machine, both Firefox and Chrome display grey rectangles when scaling down. Why do the browsers get this wrong?

tech2 wrote at 2020-10-29 12:31:50:

Because resizing in a linear colorspace is more costly. JPEG can be resized VERY cheaply without shifting colorspaces, but performing a colorspace change (or gamma shift) requires decoding the whole image into RAM. The hit can be quite significant. On a phone or laptop it would hurt battery; on an online service (a dynamic resizer service) it would impact latency.

bawolff wrote at 2020-10-30 06:25:38:

> on an online service (dynamic resizer service) it would impact latency.

If it's even possible at all. Sometimes users upload things like

https://commons.wikimedia.org/wiki/File:“Declaration_of_vict...

choppaface wrote at 2020-10-29 16:26:03:

Can also depend on the monitor? When I drag this page between monitors I see different effects.

bonoboTP wrote at 2020-10-29 09:17:06:

You have to use AREA interpolation for downscaling. Bilinear will only interpolate among the 4 nearest source image pixels. It still ignores most of the source pixels.

This is in essence a special version of sampling artifacts, aliasing artifacts. Anyone writing image processing software should already know about aliasing, the Nyquist theorem etc. Or, well, perhaps not in the current hype, where everyone is a computer vision expert who took one Keras tutorial...

Resizing with nearest neighbor or bilinear (i.e. ignoring aliasing) also hurts ML accuracy, so it's worth fixing regardless of this specific "attack".
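
To see why ignoring most source pixels is exploitable, here is a rough sketch of the attack's core idea (illustrative only; the paper's actual construction solves an optimization problem rather than writing pixels directly):

    import numpy as np

    def poison(cover, hidden):
        """Plant `hidden` on the grid that a naive strided downscale samples."""
        out = cover.copy()
        sy = cover.shape[0] // hidden.shape[0]
        sx = cover.shape[1] // hidden.shape[1]
        out[::sy, ::sx] = hidden  # one payload pixel per block
        return out

    # out[::sy, ::sx] is exactly what "keep every sy-th/sx-th pixel"
    # recovers, while at full size the payload is only a sparse dot pattern.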

contravariant wrote at 2020-10-29 21:06:38:

Bilinear _could_ mean downscaling with a triangle kernel, but it might well be the standard bilinear interpolation that's native to most GPUs and OSs.

Also area interpolation still has some pretty terrible aliasing, since box kernels are terrible at filtering high frequencies.

And of course with downscaling you could still freely manipulate the downscaled image if you're allowed to use ridiculously high or low values, provided you knew the exact kernel used.

bonoboTP wrote at 2020-10-30 09:53:26:

Bilinear uses the triangular kernel over the source image (with size corresponding to the input pixel size).

Area interpolation works very well in practice; it's more sophisticated than just a box filter on the input plus sampling. It calculates the exact intersecting footprint sizes and computes a weighted average. Do you have examples where this causes aliasing, and can you show a better alternative?

contravariant wrote at 2020-10-30 16:17:02:

You can use any image with a high-frequency regular pattern. Wikipedia has the following example:

https://en.wikipedia.org/wiki/File:Moire_pattern_of_bricks_s...

Anything softer than area will help with those kinds of issues, which is why the original

https://en.wikipedia.org/wiki/Aliasing#/media/File:Moire_pat...

looks fine in most browsers even if you resize it. Bicubic tends to do better in this respect. It's a trade-off though.

bonoboTP wrote at 2020-10-30 22:20:13:

Sorry, but this is wrong. Area has no aliasing; all the others introduce aliasing artifacts when DOWNscaling.

https://imgur.com/a/C6utkwr

Now you could use pre-smoothing with a kernel and then resampling, but then we are talking about something else.

It's important to understand that interpolation happens between the source pixels, so it does not help when downscaling. Cubic tends to look nice, yes, but only when UPscaling.

tpoacher wrote at 2020-10-29 08:12:10:

Max pooling could also be targeted extremely easily with this technique, and it is immensely popular as a scale-reduction technique in convolutional neural networks. So, yes, it could very well be a relevant and non-trivial attack in the context of 'dataset poisoning'. (It would also be relatively easy to defend against: just don't use max pooling in the first layer. But the point is that this is a steganographic attack.)
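
A rough sketch of that max-pooling case (my own illustration; assumes integer images and payload values above zero):

    import numpy as np

    def maxpool_poison(cover, hidden):
        """Embed hidden (h x w) in cover (2h x 2w) so 2x2 max pooling recovers it."""
        up = hidden.repeat(2, axis=0).repeat(2, axis=1)
        out = np.minimum(cover, up - 1)  # keep the cover strictly below the payload
        out[::2, ::2] = hidden           # the payload pixel wins each 2x2 window
        return out

    def maxpool2x2(x):
        h, w = x.shape
        return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    # maxpool2x2(maxpool_poison(cover, hidden)) == hidden, pixel for pixel.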

NohatCoder wrote at 2020-10-29 08:23:03:

It is a very common, and often overlooked, issue in image processing. Bilinear is widely used, and not particularly good. For large-factor downscaling it behaves much like nearest-neighbor.

enriquto wrote at 2020-10-29 12:13:01:

> It is a very common (...)

Bilinear interpolation is perfectly acceptable for zooming in on an image (making it larger by adding new pixel values). If you want to zoom out, you can still use bilinear interpolation, but of course you have to low-pass filter the image data beforehand to avoid aliasing.
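
Sketched with OpenCV (the sigma heuristic below is a common rule of thumb, not a prescription):

    import cv2

    def downscale_prefiltered(img, size):
        factor = img.shape[1] / size[0]  # assumes uniform scaling; size is (w, h)
        # low-pass first, so the bilinear resample has nothing left to alias
        blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=factor / 2)
        return cv2.resize(blurred, size, interpolation=cv2.INTER_LINEAR)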

NohatCoder wrote at 2020-10-29 15:53:20:

Most often scaling and filtering is an integrated process, when one says bilinear it is usually implied that it is combined with nothing else.

bonoboTP wrote at 2020-10-29 09:19:55:

Yeah, the default implementation should check the scaling factor and use AREA interpolation when downscaling and bilinear for upscaling.
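
Something like this, sketched with OpenCV (the function name is mine, not a cv2 API):

    import cv2

    def resize_safe(img, size):
        shrinking = size[0] < img.shape[1] or size[1] < img.shape[0]
        interp = cv2.INTER_AREA if shrinking else cv2.INTER_LINEAR
        return cv2.resize(img, size, interpolation=interp)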

jchw wrote at 2020-10-29 09:38:54:

Whether it works or not depends on how many samples are used to downscale. Amusingly, this attack was used for bait-and-switch and “click here to [x]” gimmicks on some websites, especially 4chan, and you can find examples tuned primarily for typical thumbnail generators (which, probably for performance reasons, tend to only sample a small number of pixels.)

https://thume.ca/projects/2012/11/14/magic-png-files/

hailwren wrote at 2020-10-29 16:04:56:

You're looking for section 3.1 in [1], where they analyze the effect of the scaling ratio and kernel size for an arbitrary downscaling kernel.

> Any algorithm is vulnerable to image-scaling attacks if the ratio of pixels with high weight is small enough.

1 -

https://www.usenix.org/system/files/sec20-quiring.pdf

DarkWiiPlayer wrote at 2020-10-29 08:12:01:

Just a quick thought: If you just average the surrounding pixels, you could possibly still add occasional pixels to skew the average and create a different image, though that may be much more noticeable.

JacobiX wrote at 2020-10-29 10:18:22:

If you add occasional pixels to skew the average, it will probably be noticeable in the original image. But an interpolation scheme that uses only the four corner pixels while ignoring the rest can be easily fooled: you can blend an entire lower-resolution image into just those corner pixels.

kevingadd wrote at 2020-10-29 09:08:31:

One key thing to be aware of is that not all "bilinear" scaling algorithms are created equal. If the "bilinear" in question is GPU-accelerated, it's quite possible that it's the Direct3D/OpenGL bilinear filter, which samples _exactly_ 4 taps of the image from the highest appropriate mip level (which may be the only one, unless the application goes out of its way to generate more). That means if the scaling ratio is less than 50%, it becomes something like a smoothed nearest neighbor filter and is vulnerable to this attack.

The introduction of a mip chain + enabling mip mapping mitigates this, because when the scaling ratio is less than 50% the GPU's texture units will select lower mips to sample from, approximating a "correct" bilinear filter. This does also require generating mips with an appropriate algorithm - there are varying approaches to this, so I suspect it is possible to create attacks against mip chain generation as well.

Thankfully, quality-focused rendering libraries are generally not vulnerable to this, because users demand high-quality filtering. A high-quality bilinear filter will use various measures to ensure that it samples an appropriate number of points in order to provide a smooth result that matches expectations.

One other potential attack against applications relying on the GPU to filter textures is that if you can manually provide mip map data, you can use that to hide alternate texture data or otherwise manipulate the result of downscaling. As far as I know the only common formats that allow providing mip data are DDS and Basis, and DDS support in most software is nonexistent. Basis is an increasingly relevant format though and could potentially be a threat, but as a lossy format it poses unique challenges.

genpfault wrote at 2020-10-29 15:42:31:

> This does also require generating mips with an appropriate algorithm - there are varying approaches to this

http://number-none.com/product/Mipmapping,%20Part%201/index....

http://number-none.com/product/Mipmapping,%20Part%202/index....

NohatCoder wrote at 2020-10-29 09:26:04:

Bilinear and trilinear with mipmaps are still relatively poor. 3D also uses anisotropic filtering, which eliminates a lot of artifacts, even in 2D scenarios.

blueflow wrote at 2020-10-29 08:23:44:

I remember seeing this technique 8 or 10 years ago on 4chan. The thumbnail was some innocuous picture; when you clicked on it, it expanded to the larger version with a banana. The larger version also had these kinds of dots on it.

marcan_42 wrote at 2020-10-29 08:43:42:

This is a different, related trick, which I explored in detail in PoC||GTFO 15:13.

https://archive.org/stream/pocorgtfo15#page/n96/mode/1up

This isn't based on attacking scaling algorithms per se, but rather on the fact that most browsers honor the gAMA gamma setting in PNG file headers, while most image processing libraries don't and strip it when downscaling them.

The abuse potential for AI training exists here too, but both attacks are a bit of a stretch.

DominikD wrote at 2020-10-29 13:27:01:

There was a very popular yet useless trick in the late '90s and early 2000s where you'd combine two images in a checkerboard pattern: one at regular intensity, the other very bright (so it didn't stand out that much upon regular viewing).

Internet Explorer had this feature that if you CTRL+A page contents, it would overlay images with 1px grid to indicate selection. If you got your pattern right, the hidden image would appear. This is essentially the same effect, but on steroids.

m12k wrote at 2020-10-29 08:26:06:

I'm curious about the use of the word 'attack' here - is that really what this is? If so, what exactly is being attacked? I thought this kind of thing was called steganography

josefx wrote at 2020-10-29 09:18:22:

The attack part seems to be that Husky AI downscales the images it uses to train its model. If it were vulnerable to this attack, its downscaling would expose the hidden image and use that for training instead of the user-visible image. I think this could be used to trick manual or even automated reviews of the input.

javchz wrote at 2020-10-29 08:34:18:

My guess is an evil actor could contaminate a training data set with hidden images, resulting in a faulty ML model.

... but yeah, it's a stretch as a real-world application; it seems to require a really specific setup to work.

steerablesafe wrote at 2020-10-29 09:04:08:

I guess you can potentially bypass automatic content filters on social media for example.

andreareina wrote at 2020-10-29 08:28:59:

Steganography usually has the recipient intending to get the hidden message. Since this is about fooling the recipient "attack" seems apt.

Mordisquitos wrote at 2020-10-29 08:15:38:

This reminds me that a few years ago (almost two decades?) there was a lot of concern online, almost "moral panic", about the potential of digital steganography to hide information in public image files.

Even if this method is not feasible as an attack vector, at the very least it looks like a very practical way to share information that otherwise would be censored or restricted–all the more so if the hidden image data can be encrypted, which may make it impossible to detect.

On the other hand, I know nothing about steganography and I'm talking out of my arse, so maybe current steganography methods are much more powerful.

Karawebnetwork wrote at 2020-10-29 15:20:53:

I remember in the early '00s that people would share books and movies using a simple command that let you zip an archive into a JPEG. For example, they would put a book's PDF file inside an image of its cover. Someone else could then download the image, unzip it, and get its contents.

I can easily imagine how someone could use this for nefarious purposes.
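
The command in question was plain file concatenation: JPEG decoders stop at the end-of-image marker, while zip tools locate the archive from the end of the file. A sketch (filenames are illustrative):

    # append a zip archive to a JPEG; both formats still parse
    with open("cover.jpg", "rb") as img, open("book.zip", "rb") as zf:
        data = img.read() + zf.read()

    with open("cover_with_book.jpg", "wb") as out:
        out.write(data)

    # the result displays as the cover and unzips as the book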

callamdelaney wrote at 2020-10-29 13:33:22:

Very recently someone created a method to encode files and data into videos; the videos could then be uploaded to YouTube, and distributed / stored permanently there.

stevewodil wrote at 2020-10-29 17:54:31:

How could that be possible, though, as YouTube doesn't serve the original video file back to users? It gets processed to create different video streams, so this seems pretty crazy.

callamdelaney wrote at 2020-10-29 22:45:35:

According to the creator, /u/T0X1K01 on reddit:

> No, that's what's so cool about it. I explain it in more detail in the video, but basically because the videos are created using 1-bit color images, it makes it easy to retrieve data without having to worry about how YouTube changes the video.

There's a video explanation here:

https://www.youtube.com/watch?v=yu_ZIr0q5rU&feature=youtu.be

Source code here:

https://github.com/AlfredoSequeida/fvid/

An example here:

https://www.youtube.com/watch?v=NzZDFxM5Coo

SomeoneFromCA wrote at 2020-10-30 09:07:37:

But you won't be able to download it - youtube-dl is not with us anymore.

metafunctor wrote at 2020-10-29 08:33:19:

I suppose a very stupid thumbnail generator could be attacked with something like this. Proper tools for downscaling images already take this (and also gamma correction) into account.

See

http://www.ericbrasseur.org/gamma.html

steerablesafe wrote at 2020-10-29 09:00:22:

It's one thing to take non-linearity into account. But you also need to take into account the embedded colorspace information of your source image, if it has one. It's not necessarily sRGB.

oarfish wrote at 2020-10-29 12:54:06:

This redirects me to

https://support.google.com/accounts/answer/61416

sleavey wrote at 2020-10-29 06:56:19:

I was expecting the article to mention another use for this attack: to share porn on regular hosting sites and bypass automated detection systems.

mattigames wrote at 2020-10-29 07:44:22:

Mmm, I wonder if it would work for videos too.

qayxc wrote at 2020-10-29 12:38:40:

It would be spectacularly difficult to do for videos:

First, there's lossy compression, which means that there's no guarantee your injected pixels survive the encoding pass.

Then there's the additional hurdle of motion vectors, which will most likely be misaligned between the original video and the injected one.

This would result in hard to predict artefacts after encoding.

Finally, each decoder handles scaling slightly differently, so even if your embedded video trick works on one software/hardware decoder, it might fail on another (sometimes even depending on just the version or additional settings/filters being enabled).

nautilus12 wrote at 2020-10-29 08:51:10:

This is what came to mind for me: breaking major social sites' automated censorship mechanisms, although I feel like that's largely crowdsourced these days?

arendtio wrote at 2020-10-29 09:40:54:

Imagine combining this technique with the encoding of software like youtube-dl into images, as in this Twitter post:

https://twitter.com/GalacticFurball/status/13197659867911577...

Probably hard to get it working in every environment, but if you know what you are up against, it might be possible ;-)

rademacher wrote at 2020-10-29 13:21:08:

Typically when you downsample, you want to low-pass filter and then apply whatever downsampling kernel you like with the correct stride. Since the filter is low-pass (think: take the Fourier transform, keep an inner smaller square of the spectrum, and invert), you can embed the poison image entirely within that surviving frequency band. Now consider the power. If we downsample by a factor of 4, assume roughly that the true image keeps only a quarter of its power while the poison image, living entirely in the surviving band, loses none. So right off the bat the poison image's power relative to the true image is scaled up by the downsampling ratio: it might go from a quarter of the true image's power to parity. And if the interpolation kernel and strides are known, we can additionally concentrate the poison image's values at the specific pixels the kernel weighs heavily, increasing the gain further.
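
A minimal sketch of that frequency-band embedding (my own illustration; the ideal low-pass assumption and the `alpha` knob are simplifications):

    import numpy as np

    def embed_lowband(cover, poison, alpha=0.1):
        """cover: n x n float image; poison: m x m, with m = n // k."""
        n, m = cover.shape[0], poison.shape[0]
        C = np.fft.fftshift(np.fft.fft2(cover))
        P = np.fft.fftshift(np.fft.fft2(poison))
        lo = (n - m) // 2
        # inject the poison spectrum into the band that survives an ideal
        # low-pass + factor-k decimation; (n/m)^2 compensates FFT scaling
        C[lo:lo + m, lo:lo + m] += alpha * (n / m) ** 2 * P
        return np.real(np.fft.ifft2(np.fft.ifftshift(C)))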

pfortuny wrote at 2020-10-29 06:41:30:

Really impressive and ingenious. This looks scary in some sense...

barbegal wrote at 2020-10-29 07:08:24:

I thought almost everyone used some form of interpolation when resizing, which would defeat this attack completely. Or are there use cases for not using interpolation (I know it requires less processing)?

tsbinz wrote at 2020-10-29 07:10:59:

OpenCV does use linear interpolation by default. What you'd need is something that helps against aliasing, for example first blurring the image with a kernel of the appropriate size, or using a scaling method like OpenCV's INTER_AREA.

steerablesafe wrote at 2020-10-29 09:06:39:

What actually helps here is to use linear colorspace for downscaling and to correctly detect the source image's colorspace.

tsbinz wrote at 2020-10-29 12:09:53:

Colorspaces are an issue with scaling/averaging, but it's not what's happening here.

gbh444g wrote at 2020-10-29 08:42:19:

Try filling a large square image with thin vertical lines, interpolated for smoothness, but still visibly separate from each other. The width of each line should be about 1-3 pixels. Then map the image onto polar coordinates, so the lines meet in the middle. Finally, downscale it a couple of times with a basic avg(2x2) -> 1x1 mapping. Observe an elaborate "shadow shape" in the middle that looks like r = cos(4 pi a), but with far more nuanced detail.
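
That recipe reproduces directly (a quick sketch; the constants are arbitrary):

    import numpy as np

    n = 1024
    y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
    img = 0.5 + 0.5 * np.cos(200 * np.arctan2(y, x))  # thin radial lines

    for _ in range(3):  # naive avg(2x2) -> 1x1 downscale, three times
        img = (img[::2, ::2] + img[1::2, ::2] +
               img[::2, 1::2] + img[1::2, 1::2]) / 4

    # img (now 128 x 128) shows a moiré "shadow shape" in the middle
    # instead of the uniform gray a proper low-pass would give there.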

enriquto wrote at 2020-10-29 12:15:05:

This is oddly specific. Can you point to a realistic scenario where this makes sense?

gbh444g wrote at 2020-10-29 23:48:15:

I'm just working on a very particular app in the WebGL rendering space and noticed this mysterious glitch in exactly this case. First I thought I'd discovered something interesting, but it turned out to be just the antialiasing issue being discussed here. I'll share a link to that demo on HN a little later: my account is still green and I'm afraid HN would shadowban me and my domain for sharing links now.

BATHING wrote at 2020-10-29 15:34:11:

A Square, Lines... polarise... A Landscape maybe ? Good News, talking about Art, _um_ someone did... (-;

http://www.bildschirmarbeiter.com/content/images/coole_graff...

maybe it is just my humor but... ^^

miguelmota wrote at 2020-10-29 08:51:02:

More info:

https://scaling-attacks.net/

tgv wrote at 2020-10-29 07:03:01:

For those wondering how it works: it's explained in the article linked in the third paragraph. It takes advantage of aliasing.

lifeisstillgood wrote at 2020-10-29 09:07:00:

So in the same way we build pipelines to sanitise user text input (Little Bobby Tables etc.), we need to treat image data the same way. I guess a pipeline that uses OpenCV to compare the image at full size and at thumbnail size, and flags it for review if the two are widely different?

It's still cool though

Aerroon wrote at 2020-10-29 07:18:34:

It's definitely neat that it works that way, but I don't really see it as a problem.

SeeManDo wrote at 2020-10-29 12:59:59:

In the example "attack image" you can see the husky and the outline of the fence in the sky. "That's amazing!"

DDR0 wrote at 2020-10-29 06:34:29:

Hah, this is kind of brilliant. Hiding in plain sight…

nullc wrote at 2020-10-29 07:25:45:

Oh good, perhaps things will start defaulting to less aliasy downsampling kernels now.

chrisallick wrote at 2020-10-29 16:07:22:

That's incredible!

redgc wrote at 2020-10-29 09:03:11:

I did not see in the article, nor so far here in the comments, one example of this in the wild, which perhaps indicates that such a simple sampling approach isn't common? If someone could successfully execute this against Twitter or Reddit, for example, that would change its newsworthiness completely.