💾 Archived View for nox.im › posts › 2021 › 1212 › solana-on-chain-programs captured on 2024-09-29 at 00:00:32. Gemini links have been rewritten to link to archived content
In the Solana ecosystem, smart contracts are known as on-chain "programs". These programs are compiled with LLVM into Executable and Linkable Format (ELF) binaries and use a variation of the Berkeley Packet Filter (BPF) as their instruction set. Storage on Solana requires low-level management. For anyone who has worked on embedded systems, this will feel like a breath of familiar air.
This post is a follow-up to my previous notes on the Solana development environment setup[1]. It touches on building and deploying Solana programs with Rust, the eBPF instruction set and what I have learned so far since I started developing on Solana. This is all very new to me, so if you find any mistakes please let me know! I'll keep expanding this article.
1: Solana development environment setup
Solana[1]
A quick overview of some fundamentals. You don't actually need to know these if you are just starting out and use the Anchor framework. They are useful background information, but feel free to skip this section.
Solana BPF programs have a fixed memory map:
    entry point  0x100000000
    stack        0x200000000
    heap         0x300000000
    parameters   0x400000000
Instead of a variable stack pointer, BPF uses stack frames with a fixed size of 4 KB each. The call depth is limited to 64 frames. The heap does not support free or realloc.
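To make the memory map concrete, here is a minimal sketch that classifies a virtual address by the region it falls behind. The names `Region` and `region_of` are made up for this illustration and are not part of any Solana crate. Only the start addresses listed above are modelled, not the region sizes.

```rust
/// Start addresses of the regions in the fixed memory map.
pub const PROGRAM_START: u64 = 0x1_0000_0000;
pub const STACK_START: u64 = 0x2_0000_0000;
pub const HEAP_START: u64 = 0x3_0000_0000;
pub const INPUT_START: u64 = 0x4_0000_0000;

/// Hypothetical region names, used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Region {
    Program,
    Stack,
    Heap,
    Input,
}

/// Classify an address by the last region start at or below it.
/// Region sizes are not modelled, so this says nothing about whether
/// the address is actually mapped.
pub fn region_of(addr: u64) -> Option<Region> {
    if addr >= INPUT_START {
        Some(Region::Input)
    } else if addr >= HEAP_START {
        Some(Region::Heap)
    } else if addr >= STACK_START {
        Some(Region::Stack)
    } else if addr >= PROGRAM_START {
        Some(Region::Program)
    } else {
        None
    }
}
```

For example, `region_of(0x3_0000_0010)` lands in the heap region, while anything below `0x100000000` belongs to no region at all.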
Note that while Rust is the language of choice in the Solana ecosystem, it should not be intimidating for developers unfamiliar with the language. With a bit of experience in any language, we should be able to understand what Rust programs do and how we can extend them.
eBPF is a feature that allows user-space code to run inside a sandboxed, sanity-checking virtual machine in the Linux kernel. The original **Berkeley Packet Filter (BPF)** was designed to capture and filter network packets matching specific rules. An eBPF program is "attached" to a code path in the kernel so that when the code path is traversed, the attached eBPF programs are executed. It enables runtime extension and instrumentation without changing kernel source code or loading kernel modules. Such programs can be attached at run-time. They can access kernel data structures, and tests and debugging code can be deployed without the need to recompile the kernel.
The BPF VM uses its own instruction set and is **architecture-agnostic**: it can run on your kernel regardless of whether it's x86 or ARM. I will showcase this in an x86 to Raspberry Pi article soon. I believe this to be a key reason why Solana chose this approach.
Just to touch on the Linux background here: since commit daedfb22451d in 2014[1] by Alexei Starovoitov, the eBPF virtual machine has been exposed to user space.
Allowing user-space code to run inside the kernel comes with obvious security and stability concerns. A number of checks are performed on every program before it is loaded and ready for execution. The eBPF program has to terminate and is not allowed to contain loops that could cause the kernel to lock up.
The program's control flow graph (CFG) is checked with a depth-first search.
Unreachable instructions are prohibited and will fail to validate. A verifier then simulates all instructions and checks that register and stack states are valid. Uninitialized registers cannot be read and the frame pointer cannot be written. Out-of-bounds jumps and out-of-bounds data accesses are prohibited. All code paths are traversed a single time and repeated branches are pruned.
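The first of these checks can be illustrated with a toy model. The sketch below is not the kernel's verifier and does not use real eBPF opcodes. The instruction set `Insn`, the error type `VerifyError` and the function `verify` are all invented for this example. It walks the control flow graph depth-first from instruction 0, rejects out-of-bounds jumps and back edges (loops), and then reports any instruction the walk never reached.

```rust
/// Toy instruction set, just enough to build a control flow graph.
#[derive(Clone, Copy)]
pub enum Insn {
    Op,            // falls through to the next instruction
    Jump(usize),   // unconditional jump to an instruction index
    Branch(usize), // falls through or jumps to an instruction index
    Exit,          // terminates the program
}

/// Each variant carries the index of the offending instruction.
#[derive(Debug, PartialEq, Eq)]
pub enum VerifyError {
    JumpOutOfBounds(usize),
    FallsOffEnd(usize),
    Loop(usize),
    Unreachable(usize),
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum Mark {
    Unvisited,
    OnPath, // on the current depth-first path
    Done,
}

/// Instruction indexes that may execute directly after `pc`.
fn successors(prog: &[Insn], pc: usize) -> Result<Vec<usize>, VerifyError> {
    let fall_through = if pc + 1 < prog.len() {
        Ok(pc + 1)
    } else {
        Err(VerifyError::FallsOffEnd(pc))
    };
    let target = |t: usize| {
        if t < prog.len() {
            Ok(t)
        } else {
            Err(VerifyError::JumpOutOfBounds(pc))
        }
    };
    match prog[pc] {
        Insn::Op => Ok(vec![fall_through?]),
        Insn::Jump(t) => Ok(vec![target(t)?]),
        Insn::Branch(t) => Ok(vec![fall_through?, target(t)?]),
        Insn::Exit => Ok(vec![]),
    }
}

fn visit(prog: &[Insn], pc: usize, marks: &mut [Mark]) -> Result<(), VerifyError> {
    marks[pc] = Mark::OnPath;
    for next in successors(prog, pc)? {
        match marks[next] {
            // An edge back into the current path is a loop.
            Mark::OnPath => return Err(VerifyError::Loop(pc)),
            Mark::Unvisited => visit(prog, next, marks)?,
            Mark::Done => {}
        }
    }
    marks[pc] = Mark::Done;
    Ok(())
}

/// Depth-first check of the control flow graph, starting at instruction 0.
pub fn verify(prog: &[Insn]) -> Result<(), VerifyError> {
    if prog.is_empty() {
        return Err(VerifyError::FallsOffEnd(0));
    }
    let mut marks = vec![Mark::Unvisited; prog.len()];
    visit(prog, 0, &mut marks)?;
    match marks.iter().position(|m| *m == Mark::Unvisited) {
        Some(pc) => Err(VerifyError::Unreachable(pc)),
        None => Ok(()),
    }
}
```

A program such as `[Op, Jump(0), Exit]` is rejected because the jump closes a loop, and `[Jump(2), Op, Exit]` is rejected because instruction 1 can never run.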
It may become apparent why this instruction set wraps itself nicely around the problem of smart contracts.
The LLVM Clang compiler has grown support for an eBPF backend that compiles C, C++, Rust and other supported languages into bytecode.
Solana likely also uses BPF because of these guarantees, and because the instruction set is architecture-agnostic and performant enough to allow just-in-time (JIT) compilation to the native instruction set, which optimizes performance. When called, a program is passed to a BPF loader, which is responsible for loading and executing BPF programs. All programs export an entrypoint that the runtime looks up and calls when invoking a program.
Transactions, often abbreviated tx in code, on Solana can be made up of multiple instructions. Everything that the transaction needs to process has to be passed as arguments. That includes storage (accounts; more on that in the next section if this is unfamiliar to you).
A transaction contains
- an array of signatures
- a message
Each digital signature is in the ed25519 binary format and consumes 64 bytes.
Signatures signal on-chain programs that the account holder has authorized the transaction.
A message contains
- a header
- an array of account addresses
- a recent blockhash
- an array of instructions
The recent blockhash prevents duplication and gives transactions a lifetime.
This can be understood as request idempotence: of two identical transactions, one is rejected.
An instruction contains
- a program id index, the index into the tx accounts array
- an array of account address indexes, indexing into the tx accounts
- an array of bytes (8-bit values) holding the instruction data
The data array is general purpose and program dependent. Programs are free to decide how this information is encoded into the instruction data byte array and consequently how it is used.
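The structure described above can be sketched with plain Rust types. This is a simplified mirror for illustration, not the definitions from the Solana SDK: the type aliases, the `resolve` helper and the three header counters are written from my understanding and should be treated as assumptions. The point it shows is that an instruction carries only small indexes, which are resolved against the account addresses stored once in the message.

```rust
pub type Pubkey = [u8; 32];
pub type Signature = [u8; 64]; // ed25519 signature, 64 bytes
pub type Hash = [u8; 32];

/// Assumed header layout: counts describing how the accounts are used.
pub struct MessageHeader {
    pub num_required_signatures: u8,
    pub num_readonly_signed_accounts: u8,
    pub num_readonly_unsigned_accounts: u8,
}

pub struct Instruction {
    /// Index into `Message::account_keys` of the program to invoke.
    pub program_id_index: u8,
    /// Indexes into `Message::account_keys` of the accounts to pass.
    pub accounts: Vec<u8>,
    /// Opaque bytes; the program decides how they are encoded.
    pub data: Vec<u8>,
}

pub struct Message {
    pub header: MessageHeader,
    pub account_keys: Vec<Pubkey>,
    pub recent_blockhash: Hash,
    pub instructions: Vec<Instruction>,
}

pub struct Transaction {
    pub signatures: Vec<Signature>,
    pub message: Message,
}

impl Message {
    /// Resolve the indexes of one instruction into addresses.
    /// Returns `None` if any index points outside `account_keys`.
    pub fn resolve(&self, ix: &Instruction) -> Option<(Pubkey, Vec<Pubkey>)> {
        let program_id = *self.account_keys.get(ix.program_id_index as usize)?;
        let accounts = ix
            .accounts
            .iter()
            .map(|&i| self.account_keys.get(i as usize).copied())
            .collect::<Option<Vec<_>>>()?;
        Some((program_id, accounts))
    }
}
```

Storing each address once and referring to it by a one-byte index keeps transactions small when several instructions touch the same accounts.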
Accounts have a balance and data. The data is a vector of bytes. All accounts have an “owner” attribute. The owner is the public key that governs the state transitions for the account. Programs can only change the data of, and withdraw lamports from, accounts they own. If they own an account, they can transfer all lamports out and thereby close the account.
Note that the owner of an account cannot be changed arbitrarily on Solana. Only the system program can assign ownership, and a transfer of ownership can only occur once in the lifetime of an account.
An "account" on Solana is not actually a wallet. Accounts are a way for the ~~smart-contract~~ program to persist data between calls. Everything is actually an account on Solana. Wallets, programs, data. If you want to store data with your program, you need an account. If you want to access data in an instruction, you have to pass the account.
Data is not stored inside programs, except for statics or constants. Programs may create, read, update or delete accounts but require an account for these operations. Programs are special accounts that store their own code, are read-only and are marked as "executable". When designing programs, we have to decide if we want to require a larger account or store information more fine grained. The application will define this tradeoff depending on how often space is allocated and required.
Using anchor, we can use the `#[account]` directive to have the macro expand serializers for our accounts. For example, let's assume we want to store quotes on-chain.
    #[account]
    pub struct Quote {
        pub author: Pubkey,
        pub timestamp: i64,
        pub data: String,
    }
`#[account]` is a custom Rust attribute provided by the Anchor framework. It removes boilerplate for us when defining accounts, such as serializing the account to, and parsing it from, an array of bytes. We include the `author` to keep track of the user that published the quote by storing their public key. The owner of an account will be the program that generated it. In order to let users perform actions such as updating or deleting their own content, we need to store the pubkey of the creator.
Accounts (storage) pay rent in the form of lamports, which are fractions of SOL.
Everybody that adds data to the blockchain is accountable for the amount of storage required. When an account is created, it has to be funded with money (SOL). When there is no rent to draw from an account, it is deleted from the blockchain.
Accounts with two years worth of rent attached are **"rent-exempt"** and can remain on the chain forever. The current cost is 0.01 SOL per MB of data per day, 3.65 SOL per MB per year.
The lamports an account has for rent can be transferred out in order to reclaim funds. This closes the account as no rent is remaining.
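As a worked example of the rent figures, the sketch below estimates the rent-exempt minimum for an account. It assumes the runtime's default parameters as I understand them: the per-byte rate is derived from 0.01 SOL per MiB per day with integer division, and every account is charged for 128 bytes of metadata overhead on top of its data. The constant and function names are my own, and the parameters can differ per cluster.

```rust
pub const LAMPORTS_PER_SOL: u64 = 1_000_000_000;

/// 0.01 SOL per MiB per day over 365 days, per byte.
/// Integer division yields 3480 lamports per byte-year.
pub const LAMPORTS_PER_BYTE_YEAR: u64 = LAMPORTS_PER_SOL / 100 * 365 / (1024 * 1024);

/// Assumed per-account metadata overhead that is also charged rent.
pub const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

/// Accounts holding two years' worth of rent are rent-exempt.
pub const EXEMPTION_YEARS: u64 = 2;

/// Lamports needed to make an account with `data_len` bytes rent-exempt.
pub fn rent_exempt_minimum(data_len: u64) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len) * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_YEARS
}
```

Under these assumptions a 132-byte account needs 1,809,600 lamports, which is 0.0018096 SOL and matches the balance of the data account inspected with `solana account` further down.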
A program creator has to do low level data management on Solana, familiar to operating system and embedded systems people but as I noticed, mildly uncommon to "modern" programmers.
When a new account is created, a discriminator of exactly 8 bytes is added to the very beginning of the data to store the type of the account. The pubkey is an array of 32 unsigned 8-bit integers (`u8`), i.e. 32 bytes. The timestamp is an epoch timestamp stored as an `i64`, requiring 64 bits or 8 bytes.
For the content string, using UTF-8 encoding, a character can use from 1 to 4 bytes. We need to reserve the maximum number of bytes the content could require. Let's store e.g. 40 bytes here. To store the actual length we include a 4 byte prefix. The memory layout will look as follows:
         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     00 | discriminator |                    pubkey                     |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     32 |    pubkey     |     epoch     |  len  |                       |
        |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                       |
     64 |                                                               |
     96 |                                                               |
    128 |                                                               |
    160 |                                                               |
    192 |                            content                            |
    224 |                          (40 bytes)                           |
    256 |                                                               |
    288 |                                                               |
    320 |                                                               |
    352 |                                       +-+-+-+-+-+-+-+-+-+-+-+-+
    384 |                                       |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In total, here we need 8+32+8+4+8*40 bytes = 372 bytes.
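The same sum can be written as a small helper, which makes it easier to see how the content budget drives the account size. This is a sketch with names of my own choosing (`quote_space` and the constants are not Anchor APIs). The fixed part is 8 + 32 + 8 + 4 = 52 bytes, and the content budget is added on top.

```rust
pub const DISCRIMINATOR_LEN: usize = 8; // account type tag
pub const PUBKEY_LEN: usize = 32; // author
pub const TIMESTAMP_LEN: usize = 8; // i64 epoch timestamp
pub const STRING_PREFIX_LEN: usize = 4; // u32 length prefix of the string

/// Bytes to allocate for a `Quote` account that may hold up to
/// `max_content_bytes` bytes of UTF-8 content.
pub const fn quote_space(max_content_bytes: usize) -> usize {
    DISCRIMINATOR_LEN + PUBKEY_LEN + TIMESTAMP_LEN + STRING_PREFIX_LEN + max_content_bytes
}
```

With a content budget of 8 * 40 = 320 bytes, `quote_space` gives the 372 bytes from the sum above. A budget of 80 bytes gives 132 bytes, which is the length of the data account inspected later in this post.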
At its core, Solana operates like a key-value store. Programs can access all accounts passed into an instruction and write to accounts that are marked writable. Accounts are passed by their public key. Account data can be executable or writable. We can inspect a deployed program with the Solana toolchain:
solana program show 7abScw4KgUtHGVqCRNDKFRXMo2jPPs674VSj9JH9eYQz
    Program Id: 7abScw4KgUtHGVqCRNDKFRXMo2jPPs674VSj9JH9eYQz                    <-- program address
    Owner: BPFLoaderUpgradeab1e11111111111111111111111                          <-- system
    ProgramData Address: 44PSFxtkZWdPG8wXsTZHNcUPiUggA2WuhwcyUR7YMCk8           <-- uploaded binary
    Authority: 8jz3nUqvoxBwxeNSJtcPEnu1X65gaVF9Gab5qecuGET1                     <-- uploader identity
    Last Deployed In Slot: 1727
    Data Length: 398752 (0x615a0) bytes
    Balance: 2.776518 SOL
In the Anchor tests, we can print the address of the newly generated data account
    const newDataAccount = anchor.web3.Keypair.generate();
    console.log(newDataAccount.publicKey.toBase58())
and inspect its account after we run the program:
solana account CSMTyYMtx6zqCktYCtEJ1guYeZEA3fYiYyfsgpF1jmiw
    Public Key: CSMTyYMtx6zqCktYCtEJ1guYeZEA3fYiYyfsgpF1jmiw
    Balance: 0.0018096 SOL
    Owner: 7abScw4KgUtHGVqCRNDKFRXMo2jPPs674VSj9JH9eYQz
    Executable: false
    Rent Epoch: 0
    Length: 132 (0x84) bytes
    0000:   a7 ca 14 c6  e4 42 69 d0  64 08 3a c1  9a f0 e2 ec   .....Bi.d.:.....
    0010:   52 2e 97 e1  38 cd c0 36  39 29 2f c3  06 bb 59 79   R...8..69)/...Yy
    0020:   34 69 38 33  18 7a 48 70  9f 13 cf 61  00 00 00 00   4i83.zHp...a....
    0030:   4e 00 00 00  73 65 70 61  72 61 74 65  20 64 61 74   N...separate dat
    0040:   61 20 66 72  6f 6d 20 70  72 6f 67 72  61 6d 73 2c   a from programs,
    0050:   20 62 65 63  61 75 73 65  20 64 61 74  61 20 61 6e    because data an
    0060:   64 20 69 6e  73 74 72 75  63 74 69 6f  6e 73 20 61   d instructions a
    0070:   72 65 20 76  65 72 79 20  64 69 66 66  65 72 65 6e   re very differen
    0080:   74 2e 00 00                                          t...
Here we can find the storage that was written by our program onto the Solana blockchain.
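The dump can be read back by hand using the layout from the previous section. The sketch below is a standalone decoder written for illustration, not Anchor's generated deserializer. It assumes the layout of 8 discriminator bytes, a 32 byte pubkey, a little-endian `i64` and a string with a little-endian `u32` length prefix. The names `Quote` and `decode_quote` are local to this example.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq, Eq)]
pub struct Quote {
    pub author: [u8; 32],
    pub timestamp: i64,
    pub data: String,
}

/// Decode raw account bytes, returning `None` on truncated or invalid input.
pub fn decode_quote(account: &[u8]) -> Option<Quote> {
    // Skip the 8 byte discriminator; its value is not checked here.
    let body = account.get(8..)?;
    let author: [u8; 32] = body.get(..32)?.try_into().ok()?;
    let timestamp = i64::from_le_bytes(body.get(32..40)?.try_into().ok()?);
    let len = u32::from_le_bytes(body.get(40..44)?.try_into().ok()?) as usize;
    // Only `len` bytes belong to the string; the rest of the account is unused.
    let end = 44usize.checked_add(len)?;
    let text = body.get(44..end)?;
    let data = String::from_utf8(text.to_vec()).ok()?;
    Some(Quote { author, timestamp, data })
}
```

In the dump above, the bytes `9f 13 cf 61 00 00 00 00` at offset 0x28 are the timestamp (0x61cf139f, or 1640960927 in decimal), and `4e 00 00 00` at offset 0x30 is the string length of 78 bytes. The two trailing zero bytes are the unused remainder of the 80 byte content budget.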
You can find this example program on GitHub n0x1m/solana-raspberry-pi-test[1] which served as my test program for the Solana on the raspberry-pi post[2].
1: n0x1m/solana-raspberry-pi-test
2: Solana on the raspberry-pi post
If you have executed Solana on-chain programs, chances are you have come across what appear to be obscure error messages. For example
    Error processing Instruction 1: custom program error: 0x65
    ...
    failed: custom program error: 0x3
I didn't find a whole lot of great information on how to deal with these obscure hexadecimal codes at first. According to the solana source file[1] `/src/solana_program/program_error.rs`, some errors are typed with text, while others are custom numeric codes returned by the program.
1: According to the solana source file
    /// Reasons the program may fail
    #[derive(Clone, Debug, Deserialize, Eq, Error, PartialEq, Serialize)]
    pub enum ProgramError {
        /// Allows on-chain programs to implement program-specific error types and see them returned
        /// by the Solana runtime. A program-specific error may be any type that is represented as
        /// or serialized to a u32 integer.
        #[error("Custom program error: {0:#x}")]
        Custom(u32),
        #[error("The arguments provided to a program instruction where invalid")]
        InvalidArgument,
        // ...
These custom program errors always originate from the on-chain program that we're calling. When using the Anchor framework[1], you will likely encounter Anchor-related error codes, e.g. `custom program error: 0xbbf`. We can look up Anchor errors at project-serum/anchor/lang/src/error.rs[2] and see that 0xbbf (decimal 3007) means "The given account is owned by a different program than expected".
2: project-serum/anchor/lang/src/error.rs
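Since the error listings use decimal numbers while the messages print hexadecimal, a tiny helper for the conversion can save some time. This is a minimal sketch of my own, not part of the Solana CLI or SDK. It assumes the message contains the text `custom program error: 0x` followed by hex digits, as in the examples above.

```rust
/// Extract the numeric code from a "custom program error: 0x.." message.
pub fn parse_custom_error(message: &str) -> Option<u32> {
    // Take everything after the last occurrence of the marker.
    let (_, rest) = message.rsplit_once("custom program error: 0x")?;
    // Keep only the leading hex digits, ignoring any trailing text.
    let digits: String = rest.chars().take_while(|c| c.is_ascii_hexdigit()).collect();
    u32::from_str_radix(&digits, 16).ok()
}
```

For the Anchor example, `parse_custom_error("custom program error: 0xbbf")` yields 3007, which is the number to search for in the error definitions.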
We always have to find out the idiosyncrasies of the individual program we're working with, though, if it is not of our own making. For another example, see my Project Serum Errors section[1].
1: Project Serum Errors section
Tokens are instances of the SPL Token Program. An SPL token on Solana is analogous to an ERC20 token on Ethereum. Most open source Solana wallets make it easy to interact with SPL tokens.
You hold a token balance by being the owner (via your private key) of an SPL token account whose mint (the address of the token's mint account) corresponds to the respective SPL token. Instructions signed by the owner can withdraw token funds.
SOL can be transferred from one wallet (public key) to another wallet. For a token transfer however, the recipient must already have a token account for the matching mint. To allow anyone to send someone tokens, "associated token accounts" are deterministically derived addresses for a token mint public key and a user's wallet public key.
Note that we can indeed also create and initialize a new account as a dedicated token account that is owned by the user too, representing the same token, with a different address. But this is not the norm and unnecessary.
Care should be taken if you pass one of your token accounts into a program as part of an instruction that is signed with your private key. When the program has access to both your token account and your signature, it can transfer arbitrary amounts out of your token account.
Similarly to how we derive associated token accounts for our own wallet, program derived addresses (PDAs) are computed for accounts managed by a program. PDAs are derived from seeds and a program id. Solana SDKs provide a function to "find program address" that iteratively calls another function "create program address" until it finds a safe address for the given seed and program id that doesn't lie on the ed25519 curve. It returns that public key and the seed bump to make this reproducible and reusable from multiple entry points as well as from within the program. You may find programs passing in the bump and mirroring the seed.
If we were to use an address that lies on the ed25519 curve, a private key would be associated with it and an attacker could sign off transfers on behalf of our program. This would be a serious security problem.
We can get the current time in on-chain programs from sysvars[1] exposed from the cluster state. See `solana_program::clock::Clock`; it's as easy as this line:
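The search for a bump can be sketched as follows. To keep the example self-contained, the actual "create program address" step is passed in as a closure: in the real SDK that step hashes the seeds and the program id and fails when the result lies on the ed25519 curve, which is not reimplemented here. The function below is my own simplified stand-in, not the SDK function, and it assumes the search starts at bump 255 and counts down.

```rust
pub type Pubkey = [u8; 32];

/// Try bumps from 255 down to 0 and return the first address that
/// `create` accepts, together with the bump that produced it.
/// `create` stands in for the SDK's "create program address" and
/// returns `None` for candidates that would lie on the curve.
pub fn find_program_address<F>(
    seeds: &[&[u8]],
    program_id: &Pubkey,
    create: F,
) -> Option<(Pubkey, u8)>
where
    F: Fn(&[&[u8]], &Pubkey) -> Option<Pubkey>,
{
    for bump in (0..=u8::MAX).rev() {
        let bump_seed = [bump];
        // The bump is appended as one extra single-byte seed.
        let mut with_bump: Vec<&[u8]> = seeds.to_vec();
        with_bump.push(&bump_seed);
        if let Some(address) = create(&with_bump, program_id) {
            return Some((address, bump));
        }
    }
    None
}
```

Because the search is deterministic, a client and the program arrive at the same address and bump for the same seeds, which is why the bump can be passed in and verified rather than recomputed.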
let now_ts = Clock::get().unwrap().unix_timestamp;
More coming up soon when I have time for these notes...
You can further follow my notes on the Solana Project Serum DEX[1] to interact with a real world example of Solana's selling points.