💾 Archived View for rawtext.club › ~jmq › news › 2021-03-29-excel-vs-pandas.gmi captured on 2023-04-26 at 13:47:24. Gemini links have been rewritten to link to archived content


Monday 29 March 2021

What are reasonable expectations regarding the technological sophistication of "digital natives"?

The shift to remote teaching has put on hold some curriculum innovations proposed within the math department in recent years. Adapting our existing pedagogy to a remote environment is challenging enough. Nobody was eager to compound the challenge with a rapid adoption of partially vetted Open Educational Resources.

Eschewing the clean UI of a publisher's companion website (and the college-licensed suite of Office365 apps) in favor of open educational resources like RStudio and Jupyter Notebooks is probably too much of a lift for the students who just want a general education credit. Therefore I only demonstrated the familiar Excel interface in a live class session, leaving the Python/pandas tutorial as an optional reading assignment. (The college offers an Office365 subscription to all enrolled students, so requiring Excel is not an additional expense that would violate the terms of a "Z course".)

My own assumptions this semester about the typical student's ability to scale the Excel learning curve were mostly accurate. I did have a few students who manually constructed a time series using a rather coarse temporal resolution, but at least they chose the correct type of graph when going through the "Insert > Chart" dialog.

The proliferation of different Excel interfaces is more bothersome to those of us with muscle memory from the era when software releases were less frequent than it is to students who are accustomed to smartphone apps getting trivial facelifts every month. The greater challenge that comes from using Excel is its poorly documented time-series functionality.

One of my students in a previous semester ended up employing substring extraction functions (LEFT, RIGHT, or MID) on the datetime cell contents in order to get the hour and minute of each set of observations. These commands can easily generate garbage if Excel has its own ideas about the format of the cells being referenced. In my demo this semester, I steered my class toward alternatives that would force Excel to treat the cells as integers rather than strings. It occurred to me while doing this demo that students at this level might never have seen practical uses for the FLOOR and MOD functions, and so would not have thought to use them in a formula.
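To illustrate the kind of arithmetic that FLOOR and MOD make possible, here is a minimal Python sketch. It assumes the timestamps are stored as Excel-style serial numbers (whole days plus a fraction of a day), which may not match the layout of the class data. The function name `hour_minute_from_serial` and the sample values are hypothetical.

```python
def hour_minute_from_serial(serial):
    """Split a day-plus-fraction serial number into (hour, minute).

    Assumption: the integer part counts whole days and the fractional
    part is the time of day, as in an Excel date-time cell.
    """
    # Equivalent of MOD(serial, 1): keep only the time-of-day fraction.
    day_fraction = serial % 1
    # Round to the nearest minute to absorb floating-point error,
    # then wrap so that a value rounding up to midnight becomes 0.
    total_minutes = round(day_fraction * 24 * 60) % (24 * 60)
    hour = total_minutes // 60    # like FLOOR(total_minutes / 60, 1)
    minute = total_minutes % 60   # like MOD(total_minutes, 60)
    return hour, minute


# Hypothetical sample: a serial number ending in .5 falls at noon.
print(hour_minute_from_serial(44284.5))  # (12, 0)
```

Because the calculation never touches the displayed text of the cell, it does not depend on whatever display format the spreadsheet has applied, which is the weakness of the LEFT/RIGHT/MID approach.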

In contrast, the pandas module in Python has a more self-explanatory set of commands for these kinds of conversions, all fully documented in the help files for `pd.to_datetime`. The `to_datetime` command can be invoked to tell Python exactly how the timestamps are formatted. You can also add an offset so that timestamps are placed in the correct month and year, rather than in January 1900. These features make Python/pandas almost as user-friendly as the weatherData package in R (for the purpose of creating meteograms).
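A short sketch of that workflow, under some assumptions: the observations carry only an "HH:MM" time of day, and they were all taken on a single known date. The sample strings and the date 2021-03-29 are made up for illustration and are not the class data.

```python
import pandas as pd

# Hypothetical time-of-day strings, as they might appear in a CSV column.
raw = pd.Series(["00:15", "13:45", "23:50"])

# Tell pandas exactly how the timestamps are formatted.
# With no date in the strings, each value lands on 1900-01-01.
parsed = pd.to_datetime(raw, format="%H:%M")

# Assumed observation date; shift every timestamp from the default date.
offset = pd.Timestamp("2021-03-29") - pd.Timestamp("1900-01-01")
shifted = parsed + offset

# Hour and minute are then available without any string slicing.
hours = shifted.dt.hour
minutes = shifted.dt.minute
print(shifted)
```

The `.dt.hour` and `.dt.minute` accessors do the job that LEFT, MID, FLOOR and MOD do in the spreadsheet, and the explicit `format` argument removes any guessing about how the strings are laid out.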

When signing up for a non-laboratory science elective, students are probably not anticipating having to learn Python or R, even though familiarity with either of these ecosystems would have tangible benefits in other disciplines. But transfer of computing skills to other disciplines is also a benefit of getting to know Excel in greater depth. Excel's gentler learning curve, for "digital natives" more accustomed to touchscreens than to interfaces that deserve a full-size keyboard, is a compelling reason to teach only the spreadsheet techniques, however much we might wish to instill an appreciation for general-purpose programming languages.