Attribution for the above image: ArnoldReinhold, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons
This chapter is kind of long and glosses over a lot of details because it covers about 25 years of computing history. In this chapter, I'm showing you some examples of source code in older programming languages. Don't worry if you can't understand those code examples. The point of this chapter is not to teach you those languages, just like the point of a dinosaur exhibit is not to teach you bird anatomy. The code snippets here are to show you how programming languages evolved over time to approach the forms they have now; in subsequent chapters, where we talk about modern programming languages, you will see echoes of the design of these older languages.
In the beginning, there was the machine. And the machine's instruction set was a practically arbitrary assortment of binary sequences representing different opcodes. And the programmer saw that it was Not Good. So from the programmer's imagination came the word, and the name of the word was Assembly.
The first assembler was written in 1947 by Kathleen Booth. She indubitably found it very frustrating to have to remember a bunch of arbitrary numbers representing machine instructions. You could certainly look them up in a reference manual, but that would be frustrating too, because it would be very tedious. Isn't the whole point of computers supposed to be that they help save us the tedium and frustration of processing and referencing information? So instead of making the programmer look up or remember the opcodes for machine instructions, why not make the machine do it? That's what an assembler does. Instead of code looking like this:
10010010 11100100 00000011
you can write code that looks like this:
add ra 3
and that is clearly much easier to remember if you want to write code to add the value 3 to the value in register A.
Notwithstanding macho galaxy-brain programmers like the legendary Melvin Kaye of the Royal McBee Computer Company, who took pride in writing raw machine code by hand basically just because it was harder, Assembly programming quickly caught on as a much more pragmatic way to code. From 1947 through the mid-1950s, programming generally meant programming in Assembly, and advancements in programming technology mostly meant assemblers that were more efficient and easier to use. Simple "single-pass" assemblers like Mrs. Booth's soon gave rise to "macro assemblers", which would allow you to define new "words" (or "macros") that would be translated into many Assembly instructions before finally being assembled into machine code. Whereas a single line of plain Assembly would concretely represent a single opcode, "macros" could be more flexible by allowing you to represent many instructions with a single line of code. In other words, they were at a higher level of abstraction.
If you're curious, click here to read the story of Melvin Kaye.
Early computers ran programs in a manner that we would now call "bare metal" programming, because the computer had no software on it except the software you supplied it with. For example, if you wanted to read or write some data on an external magnetic tape drive, you would have to learn the details of how the tape reels and read/write head physically work, then you would have to write some code to precisely direct electrical pulses to the correct physical connector pins to operate the reels without jamming the tape or accidentally overwriting important data. If you wanted to use the tape drive with a different computer, you would have to rewrite all that code from scratch. If you wanted to use a different tape drive with the same computer, you guessed it: you would have to rewrite all that code from scratch. And if you wanted to save yourself some time by getting someone else to write the "driver" code for the tape drive, well, you had better make sure that you *fully* understood the eye-glazing pages upon pages of Assembly code the other person wrote, lest your own code do something that caused the other person's code to misbehave.
And it wasn't just tape drives: do you want your program to interact with a keyboard? You have to write driver code to make your specific computer model interface with the specific electric typewriter you want to use. Do you want your program to interact with a screen? You have to write driver code to make your specific computer model interface with the specific television set you want to use. Do you want your program to interact with another program running on a different computer? May God have mercy on your mortal soul.
In those days, programs were mostly written on punchcards, which were a major improvement over front panels. Punchcards are literally little sheets of cardstock with holes punched in them. They were first used in 1804 by a guy named Joseph Marie Jacquard when he invented the "Jacquard Loom", which weaves fabric according to the pattern described by the cards. Later on, in 1890, Herman Hollerith used them to "tabulate" the results of the 1890 U.S. census; his company went on to become part of IBM. With such an extensive history of use as a medium for giving stored instructions to machines, punchcards seemed like an obvious choice for storing programs to run on computers.
To write programs on punchcards, you would use a machine called a "keypunch". A keypunch is a machine with a keyboard that punches holes in punchcards when you press the keys, with specific patterns of holes representing different letters, numbers, and punctuation marks. It would also print the letters you typed at the top of the cards, to make them more readable. You would take a stack of unpunched cards, write your program on them with the keypunch, and then give the stack of cards to the computer's operator: a person whose job it was to feed your punched cards into the computer, run the assembler on them, tell the computer to execute the assembled program, collect the printed output, and finally hand that output back to you along with your stack of cards.
Computers back then were really slow - like, on the order of a *million* times slower than modern computers. The amount of RAM a computer had would typically be measured in *kilobytes*. When you submitted your stack of punched cards to the operator, they usually wouldn't be able to give the cards and the output back to you until the *next day*. This system was called "batch programming", because the operator would feed a "batch" of programs into the computer, run them overnight, and then return the programs and the output to the programmers the next morning. If you had a syntax error in your program (basically, the programming equivalent of a spelling or grammar mistake), you might not find out about it until the next day; imagine how frustrating that must have been.
An operator's job was to load programs into main memory, assemble them into machine code, execute them, and return the output to the user. In addition to all of this, an operator had to balance the competing demands of different programmers wanting to run different programs on a single machine. To keep things fair, an operator had to have some kind of system in place to determine the priority by which programs should be run and what to do when a program has an error (or when the machine has an error!).
All of that must have made being an operator a tedious and potentially stressful job. What is to be done when a programmer learns that someone is doing a tedious and potentially stressful job, especially when it directly concerns computers? Why, to automate it, of course. People began writing programs to automate the operator's job so that the operator would be able to get more done and enjoy their personal life more. Such programs were called "subroutines" at the time. A subroutine might, for example, automatically determine a fair and balanced order in which to run a batch of programs, or be invoked when a program encounters an unexpected error condition in order to handle the error. An assembler could be considered a kind of subroutine, and a moderately complicated one at that. As time went on, more and more of the operator's job got automated in this way, until there was so little left for the operator to do that there was no longer any need for the job title. A program (or collection of programs) that does all the things that would previously have been an operator's job is called an "operating system".
Until about 1972, operating systems were written pretty much exclusively in Assembly. For that whole time, most people thought that Assembly was the only language you could write an operating system in. An operating system has to be fast, and it has to work with the nitty-gritty details of a specific machine; how could you possibly use any other language for that? But for other programs, programs that ran *on top of* an operating system, programs that didn't necessarily have to be as fast as possible nor worry too much about how a tape drive works, people began to seek out easier solutions.
Early on, computers weren't typically thought of as general-purpose "do everything" devices the way we think of them today. They were viewed more the way we think of calculators: machines that do math. When a computer was big enough to take up a whole room and cost millions of dollars, it had to do math much more quickly and accurately than a person could by hand in order to justify the expense.
An early use-case for computers was calculating trajectories for missiles. Rockets and ballistic missiles alike require calculus to strike a target with accuracy. Computing those trajectories in mere minutes or seconds is not something you can reasonably expect from human mathematicians, especially not in a high-pressure situation like a war. If you want to use an ICBM to nuke Moscow, but your figures are off and it lands in a sparsely inhabited region of Siberia, you've just wasted a nuclear warhead *and* lost the first-strike advantage to the Soviets. It's a very grim, very Dr. Strangelove type of situation.
The thing is, computers typically don't have opcodes built into them for calculating derivatives. Typically, they only support basic arithmetic operations: adding, subtracting, maybe multiplying. Some of them didn't even have hardware support for division; you would have to write subroutines to do that. So Assembly wasn't a great language for expressing complicated mathematical formulas. Programmers still made it *work*, because it was the *only* language you could use that was better than machine code. But it wasn't ideal. If only there were some way you could have a program that would work as, like... a formula translator. So you could write your FORmulas in a language that's more concise and readable than Assembly, and then a program would TRANslate them into Assembly for you.
So thought John Backus in 1954 when he invented Fortran. No longer would programs have to be written in terms of "add such-and-such to this register, then move this value from this address to this register, then store the result of subtracting two from it to this address", on and on for thousands of mind-numbing lines. Instead, you could now write programs like this example taken from Wikipedia:
C AREA OF A TRIANGLE WITH A STANDARD SQUARE ROOT FUNCTION
C INPUT - TAPE READER UNIT 5, INTEGER INPUT
C OUTPUT - LINE PRINTER UNIT 6, REAL OUTPUT
C INPUT ERROR DISPLAY ERROR OUTPUT CODE 1 IN JOB CONTROL LISTING
      READ INPUT TAPE 5, 501, IA, IB, IC
  501 FORMAT (3I5)
C IA, IB, AND IC MAY NOT BE NEGATIVE OR ZERO
C FURTHERMORE, THE SUM OF TWO SIDES OF A TRIANGLE
C MUST BE GREATER THAN THE THIRD SIDE, SO WE CHECK FOR THAT, TOO
      IF (IA) 777, 777, 701
  701 IF (IB) 777, 777, 702
  702 IF (IC) 777, 777, 703
  703 IF (IA+IB-IC) 777, 777, 704
  704 IF (IA+IC-IB) 777, 777, 705
  705 IF (IB+IC-IA) 777, 777, 799
  777 STOP 1
C USING HERON'S FORMULA WE CALCULATE THE
C AREA OF THE TRIANGLE
  799 S = FLOATF (IA + IB + IC) / 2.0
      AREA = SQRTF( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
     +     (S - FLOATF(IC)))
      WRITE OUTPUT TAPE 6, 601, IA, IB, IC, AREA
  601 FORMAT (4H A= ,I5,5H  B= ,I5,5H  C= ,I5,8H  AREA= ,F10.2,
     +        13H SQUARE UNITS)
      STOP
      END
A program that translates one programming language into another is called a "compiler". You could write a program like the example given above, run it through a Fortran compiler, and get executable machine code output. The compiler would ignore lines starting with "C"; those are called "comments", and they exist only to clarify the meaning of the code for human readers. The other lines would be translated into Assembly language: possibly hundreds of lines of Assembly language. Then that would be assembled into binary, and you'd have a runnable program.
This was a serious game-changer. No longer would you have to write your own algorithm by hand for calculating square roots in Assembly; you could just "call" the "function" named "SQRTF". No longer would you have to write your own driver code to write the results to a magnetic tape; you could just say "WRITE OUTPUT TAPE", and the Fortran compiler would handle all those tedious details for you. What's more, you could conceivably write Fortran compilers for multiple different models of computer, which would then allow you to compile that same code for multiple different machines, rather than having to rewrite it from scratch for each one!
Curious readers might wonder what "FLOATF" means. This is a reference to "floating-point" numbers. Short explanation: "floating-point" numbers are numbers that aren't necessarily whole numbers, e.g. 3.14 or 2.007. Longer explanation: they're called "floating-point" in contrast to "fixed-point" numbers, which are decimal numbers with a fixed number of digits after the decimal point; floating-point numbers can have varying numbers of digits after the decimal point, allowing you to store 312.7 and 2.45355 in the same type of variable, instead of having to write them as 312.70000 and 2.45355.
Fortran was great for writing code that does a lot of complicated mathematics - and still is; it has been continuously updated since it was first released, and to this day it remains a language of choice for high-performance number-crunching applications like physics simulations, weather prediction, and even some of the numerical libraries underneath machine learning. But at the same time as Fortran was beginning to be used for calculating trajectories for rockets and ballistic missiles, people were looking for other uses for computers as well.
When you run a very large corporation, it is important to have lots of reports and do lots of accounting. A corporate husk needs to appear to be very busy to advance their career, and nothing makes you look busy like writing lots of reports. Not to mention that those meddling kids at the IRS and the FTC require you to keep lots of accurate, up-to-date records on how much capital you're accumulating and how you're doing it; hence the need for accounting. But accountants require expensive salaries and benefits, especially back when unpaid internships and neoliberalism hadn't been invented yet. And what, exactly, does an accountant do? Mostly they take a bunch of numbers from a spreadsheet, then do a bunch of arithmetic to those numbers, then put them into a report. With computers beginning to be a thing that people are aware of, the solution is obvious.
But the solution is not necessarily *easy*. Fortran isn't really a great language for producing reports and spreadsheets. It can do a bunch of arithmetic, no doubt. But not all the data in a report can be calculated with simple arithmetic. Not all the data in a report is even numeric; sometimes it's people's addresses, and their names, and the addresses and names of companies, and a list of things someone bought, and stock ticker symbols... It's not *impossible* to write programs to work with this kind of information in Fortran, but Fortran is really designed more for heavy-duty number crunching than for manipulating strings of characters and producing nicely formatted output. Not to mention, Fortran is hard to read if you aren't familiar with it; how is management supposed to read a bunch of computer code that is only loosely inspired by English? How is an auditor supposed to check that your Fortran program to compute assets and liabilities isn't fraudulent? How is a crooked finance guy supposed to ensure that it *is* fraudulent in a way the company can get away with?
In 1959, to ensure that America's economic extractivity could stay ahead of the central planning systems used in the Soviet Union, the U.S. Department of Defense ordered the creation of a Common Business-Oriented Language, or COBOL for short, designed by a committee that drew heavily on the earlier work of Navy officer Grace Hopper. Cobol was explicitly designed to closely resemble English, which seemed like a good idea at the time; people didn't realize how misguided that was until it was far too late. In making Cobol resemble English, they made it very *verbose*; i.e. it took a lot of typing to make a program do even the most basic things. For example, here's a sample Cobol program sourced from IBM's website:
//*--------------------------------------------------------------------
//* THE FOLLOWING EXAMPLE IS CODED IN COBOL:
//*--------------------------------------------------------------------
 IDENTIFICATION DIVISION.
*****************************************************************
* MULTIPLY ARRAY A TIMES ARRAY B GIVING ARRAY C                 *
* USE THE REFERENCE PATTERN CALLABLE SERVICES TO IMPROVE THE    *
* PERFORMANCE.                                                  *
*****************************************************************
 PROGRAM-ID. TESTCOB.
 ENVIRONMENT DIVISION.
 DATA DIVISION.
 WORKING-STORAGE SECTION.
* COPY THE INCLUDE FILE (WHICH DEFINES CSRFORWARD, CSRBACKWARD)
 COPY CSRBPCOB.
* DIMENSIONS OF ARRAYS - A IS M BY N, B IS N BY P, C IS M BY P
 1 M PIC 9(9) COMP VALUE 200.
 1 N PIC 9(9) COMP VALUE 200.
 1 P PIC 9(9) COMP VALUE 200.
* ARRAY DECLARATIONS FOR ARRAY A - M = 200, N = 200
 1 A1.
   2 A2 OCCURS 200 TIMES.
     3 A3 OCCURS 200 TIMES.
       4 ARRAY-A PIC S9(8).
* ARRAY DECLARATIONS FOR ARRAY B - N = 200, P = 200
 1 B1.
   2 B2 OCCURS 200 TIMES.
     3 B3 OCCURS 200 TIMES.
       4 ARRAY-B PIC S9(8).
* ARRAY DECLARATIONS FOR ARRAY C - M = 200, P = 200
 1 C1.
   2 C2 OCCURS 200 TIMES.
     3 C3 OCCURS 200 TIMES.
       4 ARRAY-C PIC S9(8).
 1 I PIC 9(9) COMP.
 1 J PIC 9(9) COMP.
 1 K PIC 9(9) COMP.
 1 X PIC 9(9) COMP.
 1 ARRAY-A-SIZE PIC 9(9) COMP.
 1 ARRAY-B-SIZE PIC 9(9) COMP.
 1 UNITSIZE PIC 9(9) COMP.
 1 GAP PIC 9(9) COMP.
 1 UNITS PIC 9(9) COMP.
 1 RETCODE PIC 9(9) COMP.
 1 RSNCODE PIC 9(9) COMP.
 PROCEDURE DIVISION.
     DISPLAY " BPAGE PROGRAM START "
* CALCULATE CSRIRP PARAMETERS FOR INITIALIZING ARRAY A
* UNITSIZE WILL BE THE SIZE OF ONE ROW.
* UNITS WILL BE 25
* SO WE'RE ASKING FOR 25 ROWS TO COME IN AT A TIME
     COMPUTE ARRAY-A-SIZE = M * N * 4
     COMPUTE UNITSIZE = N * 4
     COMPUTE GAP = 0
     COMPUTE UNITS = 25
     CALL "CSRIRP" USING ARRAY-A(1, 1), ARRAY-A-SIZE,
       CSRFORWARD, UNITSIZE, GAP, UNITS, RETCODE, RSNCODE
     DISPLAY "FIRST RETURN CODE IS "
     DISPLAY RETCODE
* CALCULATE CSRIRP PARAMETERS FOR INITIALIZING ARRAY B
* UNITSIZE WILL BE THE SIZE OF ONE ROW.
* UNITS WILL BE 25
* SO WE'RE ASKING FOR 25 ROWS TO COME IN AT A TIME
     COMPUTE ARRAY-B-SIZE = N * P * 4
     COMPUTE UNITSIZE = P * 4
     COMPUTE GAP = 0
     COMPUTE UNITS = 25
     CALL "CSRIRP" USING ARRAY-B(1, 1), ARRAY-B-SIZE,
       CSRFORWARD, UNITSIZE, GAP, UNITS, RETCODE, RSNCODE
     DISPLAY "SECOND RETURN CODE IS "
     DISPLAY RETCODE
* INITIALIZE EACH ARRAY A ELEMENT TO THE SUM OF ITS INDICES
     PERFORM VARYING I FROM 1 BY 1 UNTIL I = M
       PERFORM VARYING J FROM 1 BY 1 UNTIL J = N
         COMPUTE X = I + J
         MOVE X TO ARRAY-A(I, J)
       END-PERFORM
     END-PERFORM
* INITIALIZE EACH ARRAY B ELEMENT TO THE SUM OF ITS INDICES
     PERFORM VARYING I FROM 1 BY 1 UNTIL I = N
       PERFORM VARYING J FROM 1 BY 1 UNTIL J = P
         COMPUTE X = I + J
         MOVE X TO ARRAY-B(I, J)
       END-PERFORM
     END-PERFORM
* REMOVE THE REFERENCE PATTERN ESTABLISHED FOR ARRAY A
     CALL "CSRRRP" USING ARRAY-A(1, 1), ARRAY-A-SIZE,
       RETCODE, RSNCODE
     DISPLAY "THIRD RETURN CODE IS "
     DISPLAY RETCODE
* REMOVE THE REFERENCE PATTERN ESTABLISHED FOR ARRAY B
     CALL "CSRRRP" USING ARRAY-B(1, 1), ARRAY-B-SIZE,
       RETCODE, RSNCODE
     DISPLAY "FOURTH RETURN CODE IS "
     DISPLAY RETCODE
* CALCULATE CSRIRP PARAMETERS FOR ARRAY A
* UNITSIZE WILL BE THE SIZE OF ONE ROW.
* UNITS WILL BE 20
* SO WE'RE ASKING FOR 20 ROWS TO COME IN AT A TIME
     COMPUTE ARRAY-A-SIZE = M * N * 4
     COMPUTE UNITSIZE = N * 4
     COMPUTE GAP = 0
     COMPUTE UNITS = 20
     CALL "CSRIRP" USING ARRAY-A(1, 1), ARRAY-A-SIZE,
       CSRFORWARD, UNITSIZE, GAP, UNITS, RETCODE, RSNCODE
     DISPLAY "FIFTH RETURN CODE IS "
     DISPLAY RETCODE
* CALCULATE CSRIRP PARAMETERS FOR ARRAY B
* UNITSIZE WILL BE THE SIZE OF ONE ELEMENT.
* GAP WILL BE (N-1)*4 (IE. THE REST OF THE ROW).
* UNITS WILL BE 50
* SO WE'RE ASKING FOR 50 ELEMENTS OF A COLUMN TO COME IN
* AT ONE TIME
     COMPUTE ARRAY-B-SIZE = N * P * 4
     COMPUTE UNITSIZE = 4
     COMPUTE GAP = (N - 1) * 4
     COMPUTE UNITS = 50
     CALL "CSRIRP" USING ARRAY-B(1, 1), ARRAY-B-SIZE,
       CSRFORWARD, UNITSIZE, GAP, UNITS, RETCODE, RSNCODE
     DISPLAY "SIXTH RETURN CODE IS "
     DISPLAY RETCODE
* MULTIPLY ARRAY A TIMES ARRAY B GIVING ARRAY C
     PERFORM VARYING I FROM 1 BY 1 UNTIL I = M
       PERFORM VARYING J FROM 1 BY 1 UNTIL J = P
         COMPUTE ARRAY-C(I, J) = 0
         PERFORM VARYING K FROM 1 BY 1 UNTIL K = N
           COMPUTE X = ARRAY-C(I, J) + ARRAY-A(I, K) * ARRAY-B(K, J)
         END-PERFORM
       END-PERFORM
     END-PERFORM
* REMOVE THE REFERENCE PATTERN ESTABLISHED FOR ARRAY A
     CALL "CSRRRP" USING ARRAY-A(1, 1), ARRAY-A-SIZE,
       RETCODE, RSNCODE
     DISPLAY "SEVENTH RETURN CODE IS "
     DISPLAY RETCODE
* REMOVE THE REFERENCE PATTERN ESTABLISHED FOR ARRAY B
     CALL "CSRRRP" USING ARRAY-B(1, 1), ARRAY-B-SIZE,
       RETCODE, RSNCODE
     DISPLAY "EIGHTH RETURN CODE IS "
     DISPLAY RETCODE
     DISPLAY " BPAGE PROGRAM END "
     GOBACK.
All this does is multiply one array (the technical term for a list of numbers) by another array. The lines starting with asterisks are comments; the rest of it is actual Cobol. You can see how it *looks* like English, kind of, but also how it is so long and repetitive that it's actually *harder* to read.
Despite these shortcomings, Cobol was (and remains) a language endorsed by IBM, and consequently, it became (and remains) very widespread for business applications, also known as "enterprise" applications. Nowadays, Cobol is mostly used by banks and very large multinational corporations, where the expense of replacing the Cobol systems with a more modern language outweighs the cost of continuing to maintain the existing Cobol code. It is very common for people who don't write Cobol to think of it as a "dying" language, one that is on its way out, one that is only still used because IBM keeps it on life support; it is also very common for people who learn Cobol and write it for a living to make staggeringly large salaries with top-tier job security.
If you read chapter 0, you'll recall that programming has its origins in mathematics. At the same time the government was sponsoring the development of Fortran and Cobol, eggheads from the mathematics department continued to have a keen interest in computers. In fact, computers were drawing so much interest from a certain subset of mathematicians that studying them became a whole new discipline of mathematical research, and later a whole department unto itself. But because programmers are chronically bad at coming up with good names for things, this discipline wasn't called "computerology" or "computeronomy"; rather, it got the name "computer science". Which is about as silly as if "biology" were instead called "cell science".
Mathematicians famously care far more about their quest for eternally self-evident truth than about anything that concretely exists in real life. So they had little interest in contingent realities like vacuum tubes and magnetic tapes. Rather, they were interested in *algorithms* and *data structures*. An actually existing computer is a machine engineered to approximate a theoretical Turing machine, and what computer scientists are really interested in is the theorems you can prove to be universally true for any system that approximates a Turing machine (in other words, what is true for any system that is "Turing complete"). What Turing et al. argued (the famous Church-Turing thesis) is that any computation that can be done at all can be done on a Turing machine, and anything that can't be computed with a Turing machine can't be computed at all. This raises questions like: what is theoretically the fastest algorithm that can, for example, sort a list of numbers from lowest to highest? Can mathematics tell us how fast a sorting algorithm could possibly be for a list of any length?
Cobol and Assembly are terrible languages for exploring these questions, and Fortran honestly isn't ideal either. Writing Fortran doesn't feel like doing mathematics; it feels like engineering. Mathematicians, generally, are Platonists deep down; they want to crystallize knowledge into its most pure and abstract universal form. So of course they wanted to create languages that express *algorithms* and *data structures* in a clean and beautiful way. An ALGOrithmic Language, if you will.
Work on Algol began in 1958, but computer scientists remained dissatisfied with it, on both aesthetic and technical grounds, for at least another decade, until the initial release of Algol 68; even that version wasn't really considered finished until five years after that, in 1973. Because Algol was in development for so long, there are lots of different dialects of it for different machines, with variations in syntax between them. Here is an example of a program in a version of Algol 68, taken from Wikipedia, which finds the largest element in a matrix (in other words, it finds the largest number in a list of lists of numbers); don't worry if you can't fully understand it:
proc abs max = ([,]real a, ref real y, ref int i, k)real:
comment The absolute greatest element of the matrix a, of size ⌈a by 2⌈a
is transferred to y, and the subscripts of this element to i and k;
comment
begin
   real y := 0;
   i := ⌊a; k := 2⌊a;
   for p from ⌊a to ⌈a do
      for q from 2⌊a to 2⌈a do
         if abs a[p, q] > y then
            y := abs a[p, q];
            i := p; k := q
         fi
      od
   od;
   y
end # abs max #
This is where we begin to see some language design choices that echo the languages that are most popular in programming today, as you'll see when we talk about C in the next chapter. The most significant of these is what's known as "nested scoping", where the code can be understood as statements or expressions "nested" inside other statements or expressions. You can see that the if-statement is nested *inside* the for-loop, which is nested inside another for-loop, which in turn is inside the "body" of the procedure, delineated by the words "begin" and "end". This stands in contrast to the versions of Fortran and Cobol that existed at the time, where a typical program was a long, straight list of statements, and moving around in the program was done by some variation of saying "go to line number such-and-such". The computer scientists who developed Algol felt that using "go to" statements to jump around in a program made it harder for a human reader to follow what was going on, and anyways felt more like engineering a machine than writing a beautiful equation. This idea was put forth in 1968 by one of the computer scientists of all time, Edsger Dijkstra, in one of the essays of all time, "Go To Statement Considered Harmful". You can read more about that here:
https://en.wikipedia.org/wiki/Goto#Criticism
This idea proved extremely influential, and when we say that most modern languages are "Algol-derived" or are "in the Algol family", that's what we mean. Just like with Fortran and Cobol, Algol has to be translated into Assembly language by a compiler, and indeed many languages in widespread use today also have to be translated into Assembly by a compiler; the compiler has the job of translating those nested blocks and statements and loops into Assembly, where the program's structure is boiled down to a sequence of instructions that the program jumps around in; this is still the way the computer works, physically, but thanks in part to Algol it is no longer the way we have to *think* about our programs.
But is there another way? Do you need a compiler to translate your nicely structured nested blocks into Assembly language? Indeed, not all languages have to be compiled. You can make an "interpreter", which is a program that reads a bunch of source code... and then does what the code says to do, with no need to translate it into Assembly. No need to compile your code; you just write it, and then you can run it right away. This approach was pioneered by a language with a funny name: Lisp.
Algol is no longer a popular language, but there's another language that also began development in 1958 that enjoys, shall we say, a cult following. This is a language that makes some programmers uneasy because it is so powerful; one might say it is the most expressive programming language. It is called Lisp, short for "List Processor", and in fact, it isn't so much a specific language as an idea about *how* a language can work.
Fortran, Cobol, and Assembly structure a program as a list of statements navigated by jumping around; Algol structures a program as a bunch of blocks containing statements or more blocks; but Lisp structures a program as a tree, similar to a sentence diagram. As it turns out, this approach makes Lisp easier to implement than practically any other kind of programming language. John McCarthy, who came up with the idea, originally intended it as a theoretical abstraction, like a literal sentence diagram for programming. But his colleague Steve Russell developed it into an actual language in its own right; he did this by writing the first Lisp interpreter, one of the first interpreters ever written, on punchcards, in Assembly language for the IBM 704. Lisp's simplicity was its strength; the basic building blocks of a Lisp are dead simple to implement in code, yet powerful enough that you can use them like Legos to build the rest of the features you'd want from a programming language. I will try to explain this as simply as I can, but it may be kind of tricky to follow if you don't know how to code yet.
Those building blocks are called S-expressions, and they consist of a list of values, variables, or more S-expressions, separated by spaces, enclosed in parentheses. If the first item in an S-expression is a variable, it is generally interpreted as a "function", i.e. a variable that represents an algorithm/procedure. For example, a Lisp interpreter might provide you with a variable named "*" which, when evaluated as the first item in an S-expression, multiplies together the numbers that come after it, like so:
; this is a comment. comments start with semicolons in Lisp
; the following S-expression is equivalent to the number 12
(* 3 4)
And from that you could define a function to find the square of a number like so:
(defun square (n) (* n n))
And having written that:
; the following S-expression is equivalent to the number 16, i.e. 4²
(square 4)
; and then this S-expression is equivalent to 36, i.e. (2 * 3)²
(square (* 2 3))
; and this one is equivalent to 625, i.e. (5²)²
(square (square 5))
And through a similar process of defining new functions, you can build yourself a Lego tower of a programming language from a Lisp interpreter that only has a handful of functions built-in.
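To make that Lego-tower idea a little more concrete, here's a minimal sketch of how you could keep stacking new definitions on top of the ones above. The names "cube" and "sum-of-squares" are just my own illustrative examples, not built-in Lisp functions; the only things assumed here are defun, *, +, and the square function we just defined:

; "cube" is built directly from the built-in multiplication function
(defun cube (n) (* n n n))
; "sum-of-squares" is built from addition plus our own square function
(defun sum-of-squares (a b) (+ (square a) (square b)))
; so (cube 3) is equivalent to 27, and (sum-of-squares 3 4) is equivalent to 25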
That's why we say Lisp is "powerful". But that parentheses-based syntax is weird and unintuitive for many; why, for instance, should it be (* 2 3), instead of (2 * 3)? So Lisp did not end up being as influential as Algol. But it still has influence, particularly in other languages that are implemented using interpreters, like Javascript and Python; their *syntax* (the way they're written) is more like Algol, but their *semantics* (the way they actually work) is much more like Lisp.
Clearly we don't use punchcards anymore. Magnetic tapes are also out of fashion. Fortran and Cobol are still around, but they're niche and archaic; Assembly is still around, but the overwhelming majority of it is generated by compilers, not written by programmers. Notably, modern operating systems are (for the most part) not written in Assembly. In fact, they are written in languages that more closely resemble Algol. How is that possible? How did we get here? Where do the roots of the languages and systems that served you this very text begin? For that we must move beyond the 1960s and talk about: