Archived View for svmhdvn.name › posts › array-language-io-idioms.gmi captured on 2024-06-16 at 12:03:37.
Around three years ago, I came across a post on Hacker News announcing the first episode of The Array Cast, a podcast about the array programming languages. This post caught my eye quite quickly because I had never heard of the array programming paradigm before and wanted to learn more about it. I was working for NVIDIA at the time, so I was (and still am) enthralled by the performance gains of restructuring code to take advantage of large-scale parallelism. Upon first glance, I understood it as a way to express homogeneous computations over a collection of data without explicitly writing housekeeping code to loop over the collection. It registered in my mind as a SIMD-first (Single Instruction, Multiple Data) paradigm.
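To make that first impression concrete, here is a minimal sketch in Python rather than in an array language. The names `prices`, `quantities`, `total_with_loop`, and `total_array_style` are made up for illustration. The first function spells out the housekeeping by hand; the second treats the two collections as whole values, multiplying element-wise and then reducing with a sum.

```python
from operator import mul

# Hypothetical sample data, chosen only for illustration.
prices = [2.50, 4.00, 1.25]
quantities = [4, 1, 8]


def total_with_loop(prices, quantities):
    """Loop style: the index and the accumulator are managed by hand."""
    total = 0.0
    for i in range(len(prices)):
        total += prices[i] * quantities[i]
    return total


def total_array_style(prices, quantities):
    """Array style: element-wise multiply over both lists, then sum-reduce."""
    return sum(map(mul, prices, quantities))


print(total_with_loop(prices, quantities))
print(total_array_style(prices, quantities))
```

In J, the second form is roughly `+/ prices * quantities`, with no loop in sight.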
Hacker News: The Array Cast – A podcast about the array programming languages
Life got in the way of spending quality time doing a proper deep-dive into the paradigm. I decided to first learn the J language after coming across a nice book called “J for C programmers”, which was definitely applicable to me. I took advantage of a long flight to read through a good chunk of the book, stopping after the “Input and Output” section. It was a good experience for me because the ASCII symbols were easy to type in a normal text editor and the wiki was very detailed (even if it was quite dense) when I was in need of help. The concepts of abstracting out the loops and carefully constructing customized “views” over raw multidimensional data became very attractive.
At this point, I started to follow The Array Cast and listened to every episode as it came out. I really love it! The technical discussion interleaved with humour and stories from all the accomplished array programmers around the world is a great way to learn more about this powerful paradigm. As of now, I'm really interested in BQN and will be spending some free time reading through all the wonderful documentation.
To someone who has been programming for 15 years now, learning the syntax and understanding the parsing/evaluation semantics of an array language is not too difficult. Discovering and internalizing the idioms that experienced programmers use to perform real-world tasks is the hard part. For example, as a POSIX shell enthusiast, I might be able to easily conjure certain blocks of code for chaining together a large pipeline of bite-sized string processing utilities to parse some arbitrary text data. Others might find it to be a daunting task to scour the POSIX specifications to find exactly the flags they are looking for to manipulate the data in a certain way. I currently feel like this with the array programming languages.
What I have noticed is that although the currently available documentation and pedagogical material are really high quality and quite captivating, they miss a lot of the idioms that newcomers might need to start writing programs with real-world data. I'm grateful for the existence of the “J for C programmers” book for this reason, since it does use a practical approach to appeal to procedural programmers like me who are used to consuming data in a certain C-oriented way. But what I'm really looking for (and this may be a bit lazy of me, but this is an unfiltered thought of mine) is for experienced array programmers to teach the best practices of:
This is my question to the panel of The Array Cast. As I am currently learning BQN, my question will be more tailored towards it, but I love the paradigm as a whole and all languages within it.
Conor's YouTube videos and podcast appearances on CoRecursive and ADSP are a great source of content for algorithmic and combinatorial idioms in BQN using practical problem-solving techniques. These are great for solving isolated tasks such as the ones found in LeetCode, Project Euler, Advent of Code, and similar coding challenges.
Marshall and BQN contributors’ documentation on the BQN website is extensive, articulate, and quite captivating with its high-quality content and comedic references. The BQN source code is also full of incredibly high-quality pedagogical material on the topic of data interchange, such as the following:
A Markdown parser used to generate the beautiful website
A multithreaded Make-like build tool used to build CBQN
The BQN-libs project that provides various utilities for data interchange
Going through the last of the above points primitive-by-primitive should really help a newcomer to learn some idioms, and I have put this task on my TODO list. However, this prompts the question in the heading of this section. Hypothetically, in an ideal world where array programming languages are the “lingua franca” of computer programming, what on-disk and in-memory data interchange formats and data-organizational methods are IDEAL for array language consumption? I realize that this is a context-dependent and industry/domain-dependent question, but I observe a few patterns in the industry right now:
These are all examples of widely used formats that are ripe for consumption by the Iversonian array languages. But I don't know where to start. I would absolutely love to listen to a discussion on this topic and learn what the best practices are in the eyes of all the panel members of The Array Cast. Thanks for the incredible discussions so far! I can't wait to listen to the upcoming episodes.