💾 Archived View for g.mikf.pl › gemlog › 2022-11-14-powershell.gmi captured on 2024-09-28 at 23:54:40. Gemini links have been rewritten to link to archived content


PowerShell one-liner to search for files in zips inside zips, and more fun

2022-11-14

Today I had PowerShell at hand instead of Bash, and I had a zip containing zips in which I needed to find files matching a certain prefix, while also recording which inner zip, inside which outer zip, each file came from. What I ended up with:

'filename one.zip','filename two.zip' |
ForEach-Object {
    $topzip = $_;
    [io.compression.zipfile]::OpenRead($_).Entries |
    ForEach-Object {
        [PSCustomObject]@{
            TopZip = $topzip;
            Name = $_.Name;
            Thing = $_;
        }
    }
} | Where-Object { $_.Name -match "\.zip$" } |
ForEach-Object {
    $subzip = $_.Name;
    $topzip = $_.TopZip;
    [io.compression.ziparchive]([io.stream]($_.Thing.Open())) |
    ForEach-Object { $_.Entries.Name } |
    Where-Object { $_ -match "^prefix" } |
    ForEach-Object {
        [PSCustomObject]@{
            TopZip = $topzip;
            SubZip = $subzip;
            Found = $_
        }
    }
}

Before running it, the compression assembly has to be loaded with

Add-Type -Assembly "system.io.compression.filesystem"

The result has three columns: TopZip with the outer zip filename, SubZip with the inner zip filename, and Found with the name of the matching file. The outer zip filenames are given at the beginning as an array to be piped, and the prefix (as a regex) sits in the Where-Object inside the second top-level ForEach-Object of the pipeline. Found holds just the file name, not its relative path within the archive. I could probably change how the entries are read to get that, but I didn't need it much there.
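A variation that reports the in-archive path could look like the sketch below. It is a loop-based rewrite rather than the one-liner itself, and it rests on a few assumptions: the sample archive, its file names and the variables ($work, $pattern, $results) are made up for the demo, FullName is used instead of Name for the found entries, and the archives are disposed explicitly. It builds a throwaway nested zip in the temp directory so it can be run as is, and it passes full paths to OpenRead, since .NET resolves relative paths against the process working directory, which may differ from the current PowerShell location.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Demo data (hypothetical names): top.zip > sub.zip > docs/prefix-a.txt, docs/other.txt
$work  = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$inner = Join-Path $work 'inner'
$outer = Join-Path $work 'outer'
New-Item -ItemType Directory -Path (Join-Path $inner 'docs') | Out-Null
New-Item -ItemType Directory -Path $outer | Out-Null
Set-Content -Path (Join-Path $inner 'docs/prefix-a.txt') -Value 'a'
Set-Content -Path (Join-Path $inner 'docs/other.txt') -Value 'b'
$topPath = Join-Path $work 'top.zip'
[System.IO.Compression.ZipFile]::CreateFromDirectory($inner, (Join-Path $outer 'sub.zip'))
[System.IO.Compression.ZipFile]::CreateFromDirectory($outer, $topPath)

# Matches "prefix" at the start of the file name, i.e. at the start of the
# entry path or right after the last path separator (either slash style).
$pattern = '(^|[\\/])prefix[^\\/]*$'

$results = foreach ($path in @($topPath)) {
    $top = [System.IO.Compression.ZipFile]::OpenRead($path)
    try {
        $subZips = @($top.Entries | Where-Object { $_.Name -match '\.zip$' })
        foreach ($subEntry in $subZips) {
            # Disposing the inner archive also closes the entry stream it wraps.
            $sub = [System.IO.Compression.ZipArchive]::new($subEntry.Open())
            try {
                foreach ($entry in $sub.Entries) {
                    if ($entry.FullName -match $pattern) {
                        [PSCustomObject]@{
                            TopZip = Split-Path $path -Leaf
                            SubZip = $subEntry.FullName
                            Found  = $entry.FullName
                        }
                    }
                }
            } finally { $sub.Dispose() }
        }
    } finally { $top.Dispose() }
}
$results

# Remove the demo files again.
Remove-Item -Recurse -Force $work
```

Depending on the .NET version that created the archive, the path in Found may use backslashes or forward slashes, which is why the pattern accepts both.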

I ran the code in regular Windows PowerShell, i.e. version 5. I do sometimes use PowerShell Core as well, but it's not pleasant to set up with Windows Terminal, so unless I take some time on each new machine, I end up by default having to use the regular PowerShell console window.

That was the first time in quite a while that I sat down with PowerShell again. My inspiration was a pen gal I found on the Slowly app.

Lately I have been using my two Windows machines more and more: a regular laptop, and my very portable 1st-gen ThinkPad Helix, which I can easily pack alongside my work laptop on the occasional trip to the office. So I very much want to move my workflows to PowerShell and not have to use Git Bash, WSL Bash, or some other MinGW/Cygwin whatever.

One of those workflows is my Gemini capsule's sourcehut pages publishing flow, which I need to rewrite. That is the flow described in

2022-08-30-comeback.gmi

along with the tinylog gemfeed generator addition I mentioned in the tinylog entry timestamped Sep 27 2022 9:20 CEST.

So I made an attempt (an illegible trial-and-error mess):

$GemCap = "~/gemcap"
$TotalDirectorySize = (Get-ChildItem $GemCap -File -Recurse |
    Measure-Object -Sum Length |
    Select-Object -ExpandProperty Sum)
$TarStream = ([System.IO.MemoryStream]::new([int]$TotalDirectorySize * 2))
Add-Type -Assembly "system.formats.tar"
$TarWriter = ([System.Formats.Tar.TarWriter]::new($TarStream))

Add-Type -Assembly "system.io.compression.filesystem"
#$GzOut = ([System.IO.MemoryStream]::new($TotalDirectorySize))
$GzOut = [System.IO.File]::OpenWrite("C:\Users\m\helo.tar")
# $Gzipper = ([System.IO.Compression.GZipStream]::new(
    # $GzOut, [System.IO.Compression.CompressionLevel]3))
    #[System.IO.Stream]([System.IO.File]::OpenWrite( "C:\Users\m\helo.tgz" )), [int]81920
# $Promise = ($TarStream.CopyToAsync($Gzipper))
#$Promise = ($TarStream.CopyToAsync($GzOut))
Get-ChildItem $GemCap -File -Recurse -Name |
Where-Object { $_ -NotMatch "^(.*[\\\/])?\..*" } |
ForEach-Object { $TarWriter.WriteEntry( (Join-Path $GemCap $_ -Resolve),$_ ) }
#$Promise.Wait()
$TarStream.CopyTo($GzOut)

That is a chaotic trial-and-error mess from trying to get not just gzip but even the TarWriter alone to work, and so far it doesn't want to work too well.
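One possible culprit, offered as a guess rather than a confirmed diagnosis: after being written to, a MemoryStream has its Position at the end, and CopyTo copies only from the current position onward, so nothing gets copied unless the stream is rewound first. The sketch below shows just that behaviour with plain bytes instead of a TarWriter; the variable names are made up for the demo.

```powershell
# Stand-in for $TarStream: a MemoryStream that has just been written to.
$source = [System.IO.MemoryStream]::new()
$bytes = [System.Text.Encoding]::ASCII.GetBytes('hello tar')
$source.Write($bytes, 0, $bytes.Length)

# Copying right away starts at the end of $source, so nothing arrives.
$withoutRewind = [System.IO.MemoryStream]::new()
$source.CopyTo($withoutRewind)

# Rewinding first makes CopyTo see everything that was written.
$withRewind = [System.IO.MemoryStream]::new()
$source.Position = 0
$source.CopyTo($withRewind)

"without rewind: $($withoutRewind.Length) bytes"
"with rewind:    $($withRewind.Length) bytes"
```

This reports 0 bytes for the first copy and 9 bytes for the second. It may also be worth checking whether the TarWriter and the output file stream are disposed before the result is read, since unflushed or unfinished output is another plausible source of trouble.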

I'm giving the MemoryStream for the plain tar an initial capacity of twice the summed size of all files, and the MemoryStream for the gzip (commented out above) once that size. I am filtering out the dotfiles, especially since, for example, .gitignore is not considered hidden by Get-ChildItem.
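To see what the dotfile filter does, here is a small sketch that runs the same regex over a few made-up relative paths (the paths are hypothetical, not taken from the capsule). The pattern drops any path in which some component starts with a dot, so files inside dot-directories are excluded too, while dots elsewhere in a name are fine.

```powershell
# Hypothetical relative paths, in the shape Get-ChildItem -Recurse -Name returns them.
$samplePaths = @(
    'index.gmi',
    '.gitignore',
    'gemlog/2022-11-14-powershell.gmi',
    'gemlog\.drafts\todo.gmi',
    '.git/config',
    'notes/file.with.dots.gmi'
)

# Same filter as in the attempt above: a dot at the very start, or right after
# a path separator (either slash style), marks the path for exclusion.
$kept = $samplePaths | Where-Object { $_ -NotMatch "^(.*[\\\/])?\..*" }
$kept
```

Of the six sample paths, this keeps index.gmi, gemlog/2022-11-14-powershell.gmi and notes/file.with.dots.gmi.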

I am quite stuck at the moment, but I will see whether I manage to break through it.

More to come, hopefully, in upcoming posts.