💾 Archived View for dioskouroi.xyz › thread › 29385006 captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content


-=-=-=-=-=-=-

Asmrepl: REPL for x86 Assembly Language

Author: tekkertje

Score: 223

Comments: 100

Date: 2021-11-29 20:43:15

Web Link

________________________________________________________________________________

rudolfwinestock wrote at 2021-11-29 23:27:42:

I've lost the original reference, but Joe Marshall once wrote in comp.lang.lisp:

Here's an anecdote I heard once about Minsky. He was showing a student how to use ITS to write a program. ITS was an unusual operating system in that the 'shell' was the DDT debugger. You ran programs by loading them into memory and jumping to the entry point. But you can also just start writing assembly code directly into memory from the DDT prompt. Minsky started with the null program. Obviously, it needs an entry point, so he defined a label for that. He then told the debugger to jump to that label. This immediately raised an error of there being no code at the jump target. So he wrote a few lines of code and restarted the jump instruction. This time it succeeded and the first few instructions were executed. When the debugger again halted, he looked at the register contents and wrote a few more lines. Again proceeding from where he left off he watched the program run the few more instructions. He developed the entire program by 'debugging' the null program.

Stratoscope wrote at 2021-11-30 01:05:19:

This is how I write code today!

Everything needs a catchy name, so I call it Debugger Driven Development.

I write the first few lines of a function, however much I feel sure about, then add a dummy statement at the end and set a breakpoint there. When the code stops at the breakpoint, I can see exactly what my code did and what data I now have on hand.

I use that new knowledge to write the next few lines, again until I get to something I'm unsure of or where I'd just like to get a better view of the data. Set a breakpoint there and view that new data.

Repeat as needed until the function is done.
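In the spirit of the thread, here is a hypothetical sketch of that loop at the asm level, where the "dummy statement" is just an int3 that you keep pushing further down as the routine grows:

  mov  rsi, rdi   ; the few lines I'm sure about so far
  lodsb           ; grab the first byte
  int3            ; "dummy statement": stop here, inspect RAX/RSI,
                  ; then write the next few lines and move the int3 down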

At my work we have many internal APIs that are "documented" but the documentation is fairly lacking. With the debugger, I can see not only what the API _claims_ to do, but what it _really_ does with my actual input.

I am bummed that so many developers today eschew debuggers. I even read an article recently along the lines of "These famous programmers don't use debuggers, and you shouldn't either". Why would anyone want to talk people out of using such a useful tool? It makes no sense to me.

mejutoco wrote at 2021-11-30 11:04:56:

Reminds me of Common Lisp with slime.

https://dev.to/kmruiz/working-with-your-running-program-an-i...

userbinator wrote at 2021-11-30 14:32:20:

_I am bummed that so many developers today eschew debuggers. I even read an article recently along the lines of "These famous programmers don't use debuggers, and you shouldn't either". Why would anyone want to talk people out of using such a useful tool? It makes no sense to me._

That would be because of what I call "long vs. short-term". If you're only thinking of the next few lines and doing that a lot, you will have effectively trained yourself out of looking at the bigger picture. As someone who taught programmers, I've seen what "debugger driven development" code looks like (because that's how some of them will try to start writing code.) It's not pretty. There's a reason a lot of highly productive (but not necessarily famous) programmers consider debuggers as a last-resort tool: writing code that needs debugging should be a rare occurrence.

jamesfinlayson wrote at 2021-11-30 04:55:51:

I'd love to use a debugger more, but in my day job at least, the use of Docker makes it a hassle to set up.

For standalone Java applications, or when using Visual C++ though, it's so much better than printing out state.

saagarjha wrote at 2021-11-30 02:07:57:

It’s sadly fallen victim to the “if you’re not using a debugger you’re not a _real_ programmer” reversal.

Annatar wrote at 2021-11-30 07:28:12:

We did that in the 1980’s and the 1990’s on the Commodore 64 and the Amiga, only then the debugger was called a monitor and it was quickly discovered that programming in them was slow and error prone because they lacked the ability to recompute. That is how assemblers were invented: Turbo Assembler on the Commodore 64 and ASM-One, TRASH’M-One, Seka and MasterSeka on the Amiga. ASM-One and TRASH’M-One include an excellent debugger built natively into the integrated development environment, and stepping through the code after assembling it is a joy.

djmips wrote at 2021-11-30 10:59:15:

True! But that's not how assemblers were invented. Assemblers have been around since at least the sixties.

userbinator wrote at 2021-11-30 00:51:13:

Many PC magazines of the late 80s/early 90s had program listings (in Asm) for small utilities that you created by typing them into DEBUG, the very basic debugger that came with DOS. The C64 and ZX magazines had similar listings, although I believe those were more commonly in the platform's variant of BASIC. Unfortunately, I don't think this culture existed around Apple's machines since the Macintosh, or the Lisa that came before it.

deckard1 wrote at 2021-11-30 05:26:36:

The way to low-level format early MFM drives was to run DEBUG, enter some assembly, and call a special routine in the controller chip on the drive. These were instructions that came in the manual with the drive. Pretty wild times. We've come a long way.

lproven wrote at 2021-11-30 12:34:34:

  DEBUG
  g=c800:5

From 30-year-old memory.

Yes, we actually had to do stuff like this. When I started my first job in the late 1980s, it was normal for hard disks to have a little label with a list of the (known) bad blocks on the drive. (Hand-written by someone at the factory on the first (circa 15-20MB) disks I used; later, on things like big ESDI disks, dot-matrix printed.) In some formatting tools, you had to manually enter them; formats took a long time, so eliminating retries on bad blocks could save half an hour.

Novell Netware came with its own low-level formatter called `COMPSURF`: COMPrehensive SURFace analysis. Dozens of people would be sharing a server's hard disk, so data losses would be extra-bad -- and might well bring down the server, losing everyone's work.

_Note: the assumption in the early days of Novell was that workstations didn't have hard disks of their own and booted off the server too, making a LAN tens of thousands of £/$ cheaper than giving everyone their own HDD._

Running COMPSURF before you installed took _hours_. Server HDDs were _big_ -- hundreds of megabytes! Scanning all that took ages.

msravi wrote at 2021-11-30 11:03:09:

That's how I learnt x86 assembly using MS-DOS and "debug" - which was a program that came with DOS. The proper assemblers at that time were Microsoft's MASM and Borland's TASM. With no access to those, the only option was to use the one bundled in DOS. Fun times, where you had to compute relative addresses of JMPs, based on the address where the JMP instruction sat. And then, you could even write to a particular cylinder/sector/offset on the hard disk and replace the boot sector with your own code.
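For example, a short JMP is encoded as EB followed by a signed 8-bit displacement counted from the end of the 2-byte instruction, so the arithmetic looked like this (a DEBUG-style listing; addresses and annotations are illustrative):

  0100  EB 03    JMP 0105   ; disp = 0105 - (0100 + 2) = 03
  0102  90       NOP        ; these three bytes are skipped
  0103  90       NOP
  0104  90       NOP
  0105  CD 20    INT 20     ; the jump lands here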

protomyth wrote at 2021-11-30 03:05:53:

I know a few Smalltalk programmers who did the same thing. They basically wrote their programs in the debugger. It was an interesting technique.

saagarjha wrote at 2021-11-30 03:17:55:

I do a limited form of this with C/C++ projects that take forever to compile, with many conditional breakpoints that alter control flow to my liking that I then "solidify" into actual code when I have it looking like I want.
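For the curious, a sketch of that trick in gdb terms (the file, line, and variable names here are hypothetical):

  (gdb) break request.c:142 if retries > 3
  (gdb) commands
  > silent
  > set var use_fallback = 1
  > continue
  > end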

pjmlp wrote at 2021-11-30 06:21:33:

With VC++ you can solidify them directly.

saagarjha wrote at 2021-11-30 09:52:25:

Oh, really? I'm not very familiar with it, do you have the name of the feature so I can look it up?

pjmlp wrote at 2021-11-30 12:00:04:

Before VS 2022 it was called Edit and Continue; now they are doubling down on it, improving the use cases that are actually supported, and it has been renamed Hot Reload.

https://devblogs.microsoft.com/cppblog/c-edit-and-continue-i...

https://docs.microsoft.com/en-us/visualstudio/debugger/edit-...

https://docs.microsoft.com/en-us/visualstudio/debugger/hot-r...

Here is a demo of its latest state from the VS 2022 launch event:

https://youtu.be/8SP1w7i8r-Y?list=PLReL099Y5nRc9f9Jpo1R7FsdH...

larsbrinkhoff wrote at 2021-11-30 06:48:48:

I tried to recreate such a session:

https://www.youtube.com/watch?v=7Ub36q03vkc

molticrystal wrote at 2021-11-30 00:39:08:

Those who enjoy Asmrepl might also enjoy "Cheap EMUlator: lightweight multi-architecture assembly playground" [0]. It supports 32- and 64-bit variants of the Intel, ARM, MIPS, and SPARC instruction sets, provides a visual experience, and runs on many operating systems.

If you are on Windows and need something in a console, WinREPL [1] is a nice colorful asm REPL, similar to yrp604/rappel (Linux) and Tyilo/asm_repl.

[0]

https://github.com/hugsy/cemu

[1]

https://github.com/zerosum0x0/WinREPL/

deepspace wrote at 2021-11-30 05:44:35:

Wow, this brings back memories of my final project for my "programming for Engineering students" course in the mid-80s.

I wrote a DOS TSR program (remember those?) which would pop up a window when you pressed a key sequence and present you with an ASM86 REPL.

You could selectively 'save' pieces of code, and then when you exited the window, it would paste the saved code as inline assembly code (a hex byte array surrounded by some Turbo Pascal syntax) into your keyboard buffer - the assumption being that you are running the Turbo Pascal IDE, of course.

The TSR itself was written in x86 assembly, which added a level of complexity. I would have given an arm and a leg to be able to do it in a high-level language like Ruby.

djmips wrote at 2021-11-30 11:01:31:

Why not Turbo Pascal?

lproven wrote at 2021-11-30 12:50:17:

I can only take a guess: that for a TSR on an early DOS PC, you really wanted it to be small. TSRs took a significant chunk of your base memory, and as you only got 640 kB of that, you wanted to save as much as you could.

In the later days of DOS, programs grew so big that they wanted all of that 640 kB to themselves. Optional-extra type TSRs went out of fashion and DOS (first DR DOS 5, then MS played copy-cat with MS-DOS 5) gained built-in memory managers to load necessary TSRs (e.g. CD, mouse and keyboard drivers, disk cache, etc.) into Upper Memory Blocks.

UMBs were a 386 thing: you used a 386 memory manager to map any unused bits of the I/O space in the PC's memory map (i.e. from 640 kB up to 1 MB) as RAM. Anywhere that wasn't being used for ROM or memory-mapped I/O, you could put RAM and then load TSRs into these little chunks of it -- 1 or 2 dozen kB each.

https://en.wikipedia.org/wiki/Upper_memory_area

Yes, we were that desperate for base memory. It didn't matter if you had 2 or 4 or 16 MB of RAM, DOS could only run programs in the 1st 1 MB of it, and only freely use the first 640 kB of that first meg. All the rest could only be used for data, disk caches, and other non-executable stuff.
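For reference, the MS-DOS 5 incantation for all this looked something like the following (the specific drivers and paths are illustrative):

  REM CONFIG.SYS
  DEVICE=C:\DOS\HIMEM.SYS
  DEVICE=C:\DOS\EMM386.EXE NOEMS
  DOS=HIGH,UMB
  DEVICEHIGH=C:\DOS\ANSI.SYS

  REM AUTOEXEC.BAT: LOADHIGH (LH) puts a TSR into a UMB
  LH C:\DOS\DOSKEY.COM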

A side-effect of having a 386 memory manager, for real DOS power users, was that fancy 3rd party ones like Quarterdeck QEMM could also offer multitasking. Quarterdeck sold a tool called DESQview that let you run multiple DOS programs side-by-side and switch between them -- radical stuff in the 1980s.

But once you had that, you didn't need TSRs any more.

pavlov wrote at 2021-11-29 21:36:14:

Somebody should wrap this into a VGA-As-A-Service platform so that kids could learn programming the correct way:

  mov ax, 13h
  int 10h

zokier wrote at 2021-11-29 21:49:36:

Duct-taping an assembler into the DOSBox debugger would be an interesting project; it already provides almost the whole UI by itself.

https://zwomp.com/index.php/2020/05/01/understanding-the-dos...

dcveloper wrote at 2021-11-29 22:59:51:

I recently found some of my old Pascal mode 13h projects from when I was a teenager. Is DOSBox the best way to run those and Turbo Pascal?

a-priori wrote at 2021-11-29 21:48:06:

The assembly you’ve listed there assumes it runs in real mode and ring 0. You’d need to use virtualization of some kind to execute that.

StillBored wrote at 2021-11-29 21:55:04:

With a BIOS, or CSM on UEFI.

hoseja wrote at 2021-11-30 07:38:21:

That's the only Holy way to run.

jmmv wrote at 2021-11-29 23:29:47:

Not exactly the same, but

https://www.endbasic.dev/

tries to achieve precisely that: a REPL with built-in graphics for learning purposes, albeit with BASIC instead of asm.

nitrogen wrote at 2021-11-29 23:52:53:

Sort of like this, but with BASIC?

https://en.wikipedia.org/wiki/Logo_(programming_language)

jmmv wrote at 2021-11-30 06:08:35:

Pretty much, though I’m trying to replicate what I got in an Amstrad CPC computer.

kingcharles wrote at 2021-11-29 23:05:46:

I'm guessing you typed that from memory. What were you doing in the 90s? (demo coder, video game developer checking in here)

enriquto wrote at 2021-11-30 05:44:37:

These two lines are deeply ingrained in the minds of a whole generation of programmers. They start a graphics mode of 320x200 pixels with a 256-color palette, and you can start dumping your pixels into the 0xa000 segment right away.

I am yet to find a modern graphics programming environment that is so comfortable and easy to use as this.
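For the record, the complete "getting started" sequence was about this long (real-mode x86, NASM-ish syntax; the coordinates and color are just an example):

  mov ax, 0013h          ; BIOS int 10h: set 320x200 256-color mode
  int 10h
  mov ax, 0a000h         ; point ES at the VGA framebuffer segment
  mov es, ax
  mov di, 100*320+160    ; offset = y*320 + x -> pixel (160, 100)
  mov byte [es:di], 0fh  ; plot one white pixel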

StillBored wrote at 2021-11-30 13:17:53:

Well, you're sorta comparing heavyweight OS graphics stack APIs with old-school firmware ones. Even so, things like SDL2 are dead simple: one requests a window region and it's possible to write bytes to the resulting buffer that show up in a window. That said, modern firmware interfaces are still pretty clean. If you write a UEFI hello world, it's possible to access the raw frame buffer with just a connection to the GOP, which is just a couple lines of code in C. It's conceptually pretty close to what you're describing, except it's designed to work with a slightly more modern programming paradigm.

https://wiki.osdev.org/GOP

enriquto wrote at 2021-11-30 14:05:20:

I wouldn't call SDL2 "dead simple", unless sarcastically. Just opening an empty SDL window requires writing about 20 lines of code that deal with several different abstractions: a "window", a "surface", a "renderer", an "event". I only want an array of pixels that I can edit and see the results in realtime. It is of course possible to do that, but it seems ridiculously overcomplicated.

I was taught as a kid to program simple graphical demos using peek and poke in basic. Then in assembler. In either case, stupid me got colored pixels on the screen after a few minutes of work. Kids these days, how do they start? Please, don't tell me "matplotlib" or I will cry myself to sleep.

webdoodle wrote at 2021-11-29 22:12:45:

I fondly remember writing my first game using assembly that I hand typed from a magazine article on an Amiga. It didn't work because of a reversed peek/poke. It took us all day to figure it out, but we got it working!

SavantIdiot wrote at 2021-11-30 00:24:03:

Apple //e & ][+ had one built in. It was called the "monitor". You typed "CALL -151" and you started typing assembly code. You could run, save, dump memory and read registers. When I got my first 286 I was surprised I couldn't do the same thing.
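From old manuals, a hedged sketch of such a session (`*` is the monitor prompt; `300:...` deposits bytes, `300L` disassembles, `300G` executes; A9 0F 60 is 6502 for LDA #$0F / RTS):

  ]CALL -151
  *300:A9 0F 60
  *300L
  *300G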

Someone wrote at 2021-11-30 08:32:05:

“Monitor” was a common name for such software:

https://en.wikipedia.org/wiki/Machine_code_monitor

It is a step up from having front-panel switches:

https://en.wikipedia.org/wiki/Front_panel

On early mini- and microcomputers, those sometimes had to be used to enter the boot loader:

https://en.wikipedia.org/wiki/Booting#Minicomputers

https://en.wikipedia.org/wiki/Booting#Booting_the_first_micr...

That was a step down from mainframes, which could automatically read in a program to run at boot.

It wouldn’t surprise me much if there were people alive today who still have some muscle memory to rapidly enter such a boot sequence for an Altair.

Narishma wrote at 2021-11-30 02:42:54:

You could do the same thing on your 286. PCs came with DEBUG.COM, which does the same things.

SavantIdiot wrote at 2021-11-30 16:29:05:

I didn't know that until about a decade later, unfortunately!

People forget that in 1984 information wasn't a click away.

The problem with owning a Hong Kong-made 286 clone in 1984, and using pirated software, is that it was extremely hard to learn things. I was limited by the books at my local "Waldenbooks" computer section, which was about 20 books. Computer Shopper and Byte magazine were kinda helpful, but I learned very, very slowly. It wasn't until I entered college that I started learning rapidly, but the focus wasn't on PCs (it was still MTS mainframes). It took until my first job, writing 16-bit drivers, before I finally started learning the nuts and bolts of MS-DOS.

djmips wrote at 2021-11-30 11:02:37:

The Apple ][ with Integer BASIC had a better one, with a built-in mini-assembler. Very fun and useful. It was a real shame it got pushed out by the bloated Microsoft BASIC. ;-)

incanus77 wrote at 2021-11-30 05:17:16:

Built into the Commodore 128, too.

kstrauser wrote at 2021-11-30 01:42:36:

I had the same (SmartMon) on a C64. I didn't know at the time that it was unusual.

panzagl wrote at 2021-11-30 03:03:52:

At the time, it wasn't...

fouc wrote at 2021-11-29 23:19:12:

Does anyone remember Ketman (1997)? A combination assembler interpreter & tutor for MSDOS. That was the first time I saw a REPL for assembly language.

http://web.archive.org/web/20051211022146/http://www.btinter...

nielsbot wrote at 2021-11-29 22:01:24:

Not snark, but a serious question: What would one use this for?

tenderlove wrote at 2021-11-29 22:55:08:

I wrote it because I can never remember what the `test` instruction does to the zero flag. Every time I use the instruction I have to look up the docs. Looking up docs is fine, but running code in a REPL helps me remember things better.

djmips wrote at 2021-11-30 11:06:39:

It's a shame that modern debuggers don't have mini-assemblers included like the original Apple II's. Having a REPL would be real nice. For one, I wouldn't have to type 90 (NOP) into memory windows to blank out code like non-mortal ASSERTs.

tzot wrote at 2021-11-29 23:03:24:

I believe in most (if not all) architectures that have it, a test instruction is the same as comparing to zero; so testing zero sets the zero flag :)

mkup wrote at 2021-11-29 23:15:27:

The TEST instruction in x86/x64 is the same as the AND instruction (bitwise and), but the result of the computation is discarded (only the flags are retained).
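A quick sketch of what that means for the zero flag:

  mov  ax, 5   ; 101b
  test ax, 4   ; 101b AND 100b = 100b (nonzero) -> ZF = 0
  test ax, 2   ; 101b AND 010b = 000b           -> ZF = 1
               ; AX is still 5: TEST never writes its result back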

tzot wrote at 2021-11-30 00:31:55:

Thanks; my assembly experience was with earlier processors, with a single argument for their test instruction (kind of like calling x86 test with two same arguments). I should have checked what the x86 test instruction does before replying.

woodruffw wrote at 2021-11-29 22:06:01:

I do a lot of program analysis work, and it's occasionally useful to see the pre- and post-machine states of arbitrary instructions. I have my own (more? less?) hacky version of this program that I use for that purpose; I know other people use GEF and similar GDB extensions for similar purposes.

qsort wrote at 2021-11-29 22:07:02:

For visually exploring the results of applying instructions. Similar to how you would use jshell.

unbanned wrote at 2021-11-29 22:03:08:

Education

6bfdc1954b8e wrote at 2021-11-29 22:56:05:

Shellcode testing I suppose.

sebow wrote at 2021-11-29 22:20:15:

Learning assembly can be a pain, especially without something like gdb (with `layout regs` and `layout asm`).

This is much simpler and doesn't require you to type 4-5 extra commands (start gdb, set a breakpoint, set layouts, step through the code), thus avoiding the pain that gdb can be for very simple asm programs.

kitd wrote at 2021-11-29 23:07:12:

Cool!

This reminds me of a fun project I once did: writing an x86 assembler in Lotus 1-2-3, using lookup tables. On the odd occasion when it worked, it was immensely fulfilling.

westurner wrote at 2021-11-29 22:56:47:

This could be implemented with Jupyter notebooks as a Jupyter kernel or maybe with just fancy use of explicitly returned objects that support the (Ruby-like, implicit) IPython.display.display() magic.

IRuby is the Jupyter kernel for Rubylang:

iruby/display:

https://github.com/SciRuby/iruby/blob/master/lib/iruby/displ...

iruby/formatter:

https://github.com/SciRuby/iruby/blob/master/lib/iruby/forma...

More links on how Jupyter kernels, implicit display(), and DAP (Debug Adapter Protocol) work: "Evcxr: A Rust REPL and Jupyter Kernel"

https://news.ycombinator.com/item?id=25923123

"ENH: Mixed Python/C debugging (GDB,)"

https://github.com/jupyterlab/debugger/issues/284

... "Ask HN: How did you learn x86-64 assembly?"

https://news.ycombinator.com/item?id=23930335

asimjalis wrote at 2021-11-29 23:14:23:

Neat. This could be embedded into a Lisp/Clojure syntax.

User23 wrote at 2021-11-29 23:18:36:

A toy project I have in mind is bootstrapping a lisp in asm and then using lisp macros as assembler macros to build up a high level language that would effectively be native code.

Jach wrote at 2021-11-30 04:57:02:

Sounds like it'd be cool for the sake of it, but just in case you (or other readers) aren't aware (Edit -- looks like you are very aware ;) SBCL already compiles Lisp code to native code. It's not the same as (asm) macros all the way down, but still. You can even inspect the assembly of a function with the built-in function DISASSEMBLE, and see how it changes with different optimization levels or type declarations or other things.

https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-co...

is worth a read too for a cool experiment in generating custom assembly for a VM idea.

lispm wrote at 2021-11-30 10:05:17:

With various implementations (e.g. Clozure Common Lisp), one can write inline assembler interactively.

praveen9920 wrote at 2021-11-30 00:23:28:

I wonder if there is something like this for wasm. It would be interesting to see something like this in the browser.

woodruffw wrote at 2021-11-30 01:24:47:

My understanding of wasm (which could be very wrong) is that it's a stack-based virtual machine (like CPython), rather than a load/store or register/memory ISA.

You could probably visualize the operand stack and opcode sequence, but it wouldn't be quite as "flashy" as x86's state transitions look when visualized here.

jonny_eh wrote at 2021-11-29 21:59:21:

Is this emulating x86? Can I run it on an M1?

woodruffw wrote at 2021-11-29 22:04:32:

It's not emulating x86: it looks like it's assembling instructions on the fly and executing them in a mmap'd region. In other words, it's a very simple JIT.

But you probably _can_ run it on an M1 anyways, since Apple's Rosetta will do the dynamic binary translation for you under the hood. YMMV.

bdowling wrote at 2021-11-29 23:29:24:

It's a bit more complicated than that. Code is assembled into a shared memory buffer. The application spawns a child process that runs the code in the shared memory buffer. The parent process attaches to the child using ptrace to inspect and manipulate the CPU state and memory of the subprocess.

The app is entirely written in Ruby. So, it might run on Apple M1, but only if you're running an x86 Ruby interpreter through Rosetta.

woodruffw wrote at 2021-11-29 23:32:03:

Ah, great point! I had assumed that the Ruby interpreter would be x86, but that isn’t a reasonable assumption now that native builds are common.

spiffistan wrote at 2021-11-30 00:39:06:

JIT makes sense given his other current project:

https://github.com/tenderlove/tenderjit

a-dub wrote at 2021-11-29 22:45:04:

would it? rosetta is a jit translator isn't it? how would it know to translate the instructions that are being generated on the fly interactively? unless there's hardware support in the m1 for translation or some other interrupt that gets triggered to do translation on the fly...

jcranmer wrote at 2021-11-30 01:58:49:

Dynamic JIT translation for x86 is pretty old-hat at this point; the general state of the art can be summarized in the (now 16 years old) Pin paper:

https://www.cin.ufpe.br/~rmfl/ADS_MaterialDidatico/PDFs/prof...

In general, the way you handle translation of machine code tends to revolve around compiling small dynamic traces (basically, the code from the current instruction pointer to the next branch instruction), with a lot of optimizations on top of that to make very common code patterns much faster than having to jump back to your translation engine every couple of instructions. The interactive generation this article implies is most likely to be effected with use of the x86 trap flag (which causes a trap interrupt after every single instruction is executed), which is infrequent enough that it's likely to be fully interpreted instead of using any sort of dynamic trace caching. In the case of x86 being generated by a JIT of some sort, well, you're already looking at code only when it's being jumped to, so whether the code comes from the program, some dynamic library being loaded later, or being generated on the fly doesn't affect its execution.
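(For reference, the trap flag is bit 8 of EFLAGS; a sketch of a 32-bit program setting it on itself:)

  pushfd                  ; push EFLAGS
  or  dword [esp], 100h   ; set TF, bit 8
  popfd                   ; a #DB trap fires after each following instruction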

woodruffw wrote at 2021-11-29 23:28:21:

Rosetta contains both an AOT static binary translator and a JIT dynamic binary translator. That’s how Apple managed to get JS engines working even when the host browser was running as x86-on-M1.

jsmith45 wrote at 2021-11-29 23:04:27:

I'd assume Rosetta works for newly marked executable pages by not actually flagging them as executable. When control flow attempts to transfer there, a page fault will occur since the page is not actually executable; this is the interrupt that allows Rosetta to step in, see what code was about to be executed, write out a new ARM equivalent of the code to other memory, and redirect execution to the new equivalent ARM code before resuming.

This basic sort of support is needed for any application targeting x86 that uses any form of dynamic code generation, which is probably a whole lot more than most people think (even some forms of dynamic linking utilize small amounts of generated code, due to being more efficient than calling a method through a pointer to a pointer to the method).

a-dub wrote at 2021-11-29 23:26:30:

so every write to an executable page would have to clear that bit then, triggering an interrupt on jump to let the translator jump in?

i'd venture a guess that the rosetta jit stuff probably does some kind of prelinking.

kinda makes me wish i had an m1 mac to play with...

anyfoo wrote at 2021-11-30 00:00:15:

x86 code is never actually marked as executable from the CPU's point of view, since that CPU does not know how to execute x86 code. The pages which contain the translated code are, but those are not something the x86 code knows about.

chrisseaton wrote at 2021-11-30 01:10:36:

> x86 code is never actually marked as executable from the CPU's point of view, since that CPU does not know how to execute x86 code. The pages which contain the translated code are, but those are not something the x86 code knows about.

No, pages and the executable bit are something that the processor knows about.

anyfoo wrote at 2021-11-30 01:40:21:

Sorry, I don't understand what you are trying to say. Of course the CPU knows about pages and the executable bit? But there is no executable bit on a page filled with x86 code running on an ARM CPU, because the ARM CPU cannot execute that. It can only execute the translated ARM code that sits somewhere else, essentially out of sight for the x86 program.

chrisseaton wrote at 2021-11-30 01:45:37:

> Sorry, I don't understand what you are trying to say.

Rosetta implements x86 execution bit semantics.

It does this by invalidating translated pages when the system call to set the execution bit is made.

Which bit do you not understand?

How do you think for example the JVM works today on Rosetta?

brigade wrote at 2021-11-30 02:16:13:

The JIT'd ARM code pages are W^X, and that's not optional on macOS ARM. But W^X was opt-in on x86 macOS, so for backwards compatibility Rosetta can't require the x86 code to implement it in order to function.

So your model of how Rosetta works is off - the translation would need to support remapping the original code page read-only regardless of whether the x86 code did so, and letting a subsequent write invalidate the JIT cache of that page, instead of relying solely on the emulated process to implement W^X.

chrisseaton wrote at 2021-11-30 02:26:31:

Systems that install new machine code without changing page permissions run an instruction cache barrier after installing and before running. Rosetta catches this instruction.

brigade wrote at 2021-11-30 02:36:16:

X86 does not require any explicit barrier if you modify through the same virtual address as execution, so no.

chrisseaton wrote at 2021-11-30 02:44:52:

Not sure which bit you’re saying ‘no’ to.

Most JITs do an icache flush, and Rosetta does catch it to invalidate their code.

For example

https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x...

Otherwise, how do you think it works?

saagarjha wrote at 2021-11-30 02:57:30:

x86 does not require an icache flush because it has a unified cache. Rosetta emulates this correctly, which means it must be able to invalidate its code without encountering such an instruction.

chrisseaton wrote at 2021-11-30 03:39:29:

> x86 does not require an icache flush

It does if you write instructions from one address and execute them from another, which is why they use a flush.

> Rosetta emulates this correctly

Maybe you know more than I do, but my understanding is it does not emulate it correctly if you do not flush or change permissions.

How do you think it detects a change to executable memory without a permissions change or a flush?

saagarjha wrote at 2021-11-30 04:04:15:

Rosetta needs to support code that looks like this:

  char *buffer = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
  *buffer = 0xc3;          /* write a single ret instruction */
  ((void (*)())buffer)();  /* call into the buffer */
  *buffer = 0xc3;          /* write again, with no explicit cache flush... */
  ((void (*)())buffer)();  /* ...and call again */

The region is RWX, and code is put into it and then executed without a cache flush. This requires careful setup by the runtime, and here's how Rosetta does it, line by line:

1. buffer is created and marked as RW-, since the next thing you do with a RWX buffer is obviously going to be to write code into it.

2. buffer is written to directly, without any traps.

3. The indirect function call is compiled to go through an indirect branch trampoline. It notices that this is a call into a RWX region and creates a native JIT entry for it. buffer is marked as R-X (although it is not actually executed from, the JIT entry is.)

4. The write to buffer traps because the memory is read-only. The Rosetta exception server catches this and maps the memory back RW- and allows the write through.

5. Repeat of step 3. (Amusingly, a fresh JIT entry is allocated even though the code is the same
)

As you can see, this allows for pretty acceptable performance for most JITs that are effectively W^X even if they don't signal their intent specifically to the processor/kernel. The first write to the RWX region "signals" (heh) an intent to do further writes to it, then the indirect branch instrumentation lets the runtime know when it's time to do a translation.

chrisseaton wrote at 2021-11-30 04:13:43:

That’s a more limited case than what we’re talking about in this thread.

Think about code that is modified without jumping into it, such as stubs that are modified or certain kinds of yield points.

saagarjha wrote at 2021-11-30 04:21:52:

Writing to an address would invalidate all JIT code associated with it, not just code that starts at that address. Lookup is done on the indirect branch, not on write, so a new entry would be generated once execution runs through it.

anyfoo wrote at 2021-11-30 03:57:41:

> How do you think it detects a change to executable memory without a permissions change or a flush?

One way this could be implemented is the way mentioned above: by making sure all x86-executable pages are marked r/o (in the real page tables, not from "the x86 API"). Whenever any code writes into it, the resulting page fault can flush out the existing translation and transparently return back to the x86 program, which can proceed to write into the region without taking a write fault (the kernel will actually mark them as writable in the page tables now).

When the x86 program then jumps into the modified code, no translation exists anymore, and the resulting page fault from trying to execute can trigger the translation of the newly modified pages. The (real, not-pretend) writable bit is removed from the x86 code pages again.

To the x86 code, the pages still look like they are writable, but in the actual page tables they are not. So the x86 code does not (need to) change the permission of the pages.

I don't know if that's exactly how it is implemented, but it is a way.

anyfoo wrote at 2021-11-30 01:57:36:

> Which bit do you not understand?

How you are disagreeing with me, then? The actual page table entries that the ARM CPU looks at will never mark a page containing x86 code as executable. x86 execution bit semantics are implemented, but on a different layer. From the ARM CPU's POV, the x86 code is always just data.

chrisseaton wrote at 2021-11-30 02:03:27:

> those are not something the x86 code knows about

The implementation of AMD64 is in software. It knows about page executable bits. The 'x86' code knows about them.

Again, how do you think things like V8 and the JVM work on Rosetta otherwise?

anyfoo wrote at 2021-11-30 02:16:27:

> The implementation of AMD64 is in software. It knows about page executable bits. The 'x86' code knows about them.

Where did I claim anything else? The thing I claimed the x86 code does not know about is _the pages that contain the translated ARM code_, which are distinct from the pages that contain the x86 code. The former pages are marked executable in the actual page tables, the latter pages have a software executable bit in the kernel, but are not marked as such in the actual page tables.

> Again, how do you think things like V8 and the JVM work on Rosetta otherwise?

Did I write something confusing that gave the wrong impression? My last answer says: "x86 execution bit semantics are implemented, but on a different layer".

a-dub wrote at 2021-11-30 01:37:05:

you think that x86 pages are marked executable by the arm processor? probably not.

maybe arm pages with an arm wrapper that calls the jit for big literals filled with x86 code are, or arm pages loaded with stubs that jump into the jit to compile x86 code sitting in data pages are... but if the arm processor cannot execute x86 pages directly, then it wouldn't make a lot of sense for them to be marked executable, would it?

chrisseaton wrote at 2021-11-30 01:46:36:

No, the AMD64 page executable bit system is implemented in software by Rosetta.

saagarjha wrote at 2021-11-30 03:53:33:

No, it doesn't need to. Rosetta only emulates userspace, so it just needs to give the _illusion_ of protections to the program.

anyfoo wrote at 2021-11-30 04:10:21:

Ah, in this case I took "x86 execution semantics" just as how it behaves from user space, i.e. what permissions you can set and that they behave the same from an x86 observer (no matter what shenanigans is actually going behind the scenes).

chrisseaton wrote at 2021-11-30 01:09:28:

> rosetta is a jit translator

> how would it know to translate the instructions that are being generated on the fly interactively?

Just answered your own question.

emersonrsantos wrote at 2021-11-29 23:51:52:

It's a modern DOS debug.com

master_yoda_1 wrote at 2021-11-30 00:27:32:

It's better to use the GCC intrinsics API:

https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/x86-Built-in-Fu...

It's really tough to write x86 assembly.

NavinF wrote at 2021-11-30 02:38:05:

Dunno how much things have changed, but intrinsics were kinda useless in the past:

https://danluu.com/assembly-intrinsics/

saagarjha wrote at 2021-11-30 02:59:07:

Using intrinsics correctly generally requires understanding assembly, because they are supposed to match the assembly you'd want to generate. Just sprinkling them around because you're not familiar with x86 assembly is unlikely to be productive.