Archived view of pooler.gmi (gemini.ctrl-c.club, ~ssb22), captured on 2024-08-31 at 12:44:16.
Primer Pooler
This program is for geneticists who want to use Multiplex PCR to study DNA samples, and wish to optimise their combinations of primers to minimise the formation of *dimers*. It has been used and cited in oncology, plant science, climatology, COVID-19 and other research.
Primer Pooler can:
- Check through each proposed pool for combinations that are likely to form dimers,
- Automatically move prospective amplicons between proposed pools to reduce dimer formation,
- Automatically search the genome sequence to find which amplicons overlap, and place their corresponding primers in separate pools,
- Optionally keep pool sizes within a specified range,
- Handle thousands of primers without being slow,
- Do all of the above with degenerate primers too.
If your CPU is modern enough to have them, Primer Pooler will take advantage of 64-bit registers and multiple cores. But it also runs on older equipment.
Please note that Primer Pooler does not design primers by itself: you must choose your primers first, whether by using NCBI's Primer BLAST or any other method of your choice. Once you have your primers, Primer Pooler can partition them into pools.
Primer BLAST
Download Primer Pooler
If your lab runs Windows, you'll want:
- 64-bit Windows version for newer computers
64-bit Windows version
- 32-bit Windows version for older computers.
32-bit Windows version
MacPorts users may type sudo port install pooler (alternatively this Intel Mac binary should work on all OS X from 10.7)
Intel Mac binary
FreeBSD users may type pkg install pooler
For all other systems, including GNU/Linux, please compile from source (below).
There is also a version of Primer Pooler inside SeqEditor by Biotechvana. This was converted into Java code from an older copy and is roughly equivalent to Version 1.4 (see under Changes for what that's missing), with a lower limit on primer length, no validation and a slower speed. The conversion process introduced a bug in the bit-pattern handling that may corrupt some primers; this was fixed in their 2021-11-19 version, so please do not use earlier versions.
Example primers file
If you need it, here is an example primers file which references the human genome (download hg38.2bit from UCSC). Primer names in the example file have been changed (the lab wouldn't like the real names released before they publish, so I replaced all their labels with obscure hex codes).
example primers file
hg38.2bit
Source code
To build from source, you will need:
- a C compiler (GCC or Clang) and basic Unix tools,
- MingW compiler(s) if you want to cross-compile for Windows.
Download the source code, unpack it, and type make or make win-crosscompile
source code
Usage
The easiest way to run Primer Pooler for first-time users is to run it interactively. To do this, simply launch the program file (pooler or pooler64), and it should ask you a series of questions to take you through what you want to do.
Questions asked by Primer Pooler when running interactively:
- Would you like to run interactively? (y/n): You should answer y to this question, otherwise Primer Pooler will merely display the command-line help (see below) and exit.
- Please enter the name of the primers file to read. As the program further explains, it is expecting a text file in multiple-sequence FASTA format, such as:
>toySet1-F
AGCTGCTGCTGCGATCT
>toySet1-R
GGCTGAGCGCTCAGTTT
>toySet2-F
ACGGCTTGACACCGTTCGACTG
>toySet2-R
CAGACGTTCAG
(this example does not represent real primers). Degenerate bases are allowed using the normal letters, and both upper and lower case are allowed. Names of amplicons' primers should end with F or R (for Forward and Reverse), and otherwise match. Optionally include tags to apply to all primers (also called *tailed primers* or *barcoding*) using >tagF and >tagR (tags can also be changed part-way through the file). If you also have Taq probes or other primers that don't themselves make amplicons, you can include these ending with other letters, e.g. >toySet1-P. Any set of names differing in only the last character will be kept in the same pool, but you must use F for forward and R or B for reverse (backward) if you also want to check primer-pairs for overlaps in the genome. If you want to re-use the same primer in two amplicons (for example, two amplicons that have the same forward primer but differing reverse primers, to be found on two different genomes), then you should input the shared primer *twice*, once for each amplicon, each time naming it after the corresponding amplicon (e.g. product1-F and product2-F); the corresponding sets will then be kept in the same pool. You can also manually "fix" a primer-set to a predetermined pool number by using a primer name prefix: >@2:myPrimer-F fixes myPrimer-F to pool 2 (in which case Primer Pooler will allocate other primer-sets around these limitations); this can be useful when you don't have a whole-genome file for overlap detection.
- Do you want to use deltaG? (y/n): As the program explains, it will need to be told the temperature and concentration settings if you want it to use deltaG. Temperature is normally the annealing temperature, about 5C below the "Tm" melting point of the primers; 45C is typical. Alternatively you can use the faster and simpler "score" method, but this is less accurate.
- If you opt to use Score when your primers and/or tags are very long, you will be asked if you are really sure you don't want to use deltaG instead.
- If you opt for deltaG, the following questions will be asked:
- Temperature: Enter a number (decimal fractions are allowed). You can enter it in Celsius, Kelvin, Fahrenheit or Rankine. Do not enter the suffix C or K or F or R: Primer Pooler will determine for itself which unit was meant, and ask you to confirm. (Recent versions of Primer Pooler offer 5 additional obscure temperature scales if you decline all of the more probable ones.)
- Magnesium concentration mM/L (0 for no correction): Enter your concentration of magnesium in millimoles per litre (decimal fractions are allowed). Enter 0 if you don't mind the deltaG figures not being corrected for magnesium concentration.
- Monovalent cation (e.g. sodium) concentration mM/L: Enter your concentration of sodium etc in millimoles per litre (decimal fractions are allowed). If in doubt, try 50.
- dNTP concentration mM/L (0 for no correction): Enter your concentration of deoxynucleotide (dNTP) in millimoles per litre (decimal fractions are allowed). If you have been supplied a mixture with separately-specified concentrations of dATP, dCTP, dGTP and dTTP then sum these. Enter 0 if you don't mind the deltaG figures not being corrected for dNTP concentration.
- Shall I count how many pairs have what score/deltaG range? (y/n): Answer "y" if you want a fast summary of how many pairs of primers (in the entire collection, before pooling) have what range of interaction strengths. This could be used for example to check a pool that you have already chosen manually, or if you want a rough idea of the worst-case scenario that pooling aims to avoid.
- If you answered yes to this question, the summary will be displayed on screen, and you will be asked if you also want to save it to a file. If you answer yes to this, you will be asked for a filename.
- These up-front counts will include self-interactions (a primer interacting with itself), and interactions between the pair of primers in any given set. Self-interactions and in-set interactions are *not* counted when summarizing the counts of each pool (below).
- Do you want to see the highest bonds of the whole file? (y/n): Similar to the above question, this can be useful for checking a manual selection or for a rough idea. If you answer Yes, you will be asked for a deltaG or score threshold, and all interactions worse than that threshold will be displayed on-screen with bonds diagrams such as:
 5'-GGCTGAGCGCTCAGTTT-3'
    xx||||||||||||xx
3'-TTTGACTCGCGAGTCGG-5'
and you will then be asked if you wish to save it to a file, and, if so, what file name. You will then be asked if you would like to try another threshold.
- Shall I split this into pools for you? (y/n): Most users will want to say y here, unless you merely wanted to check a batch of primers that you picked some other way. If you say No, Primer Pooler will forget about the primers at hand and ask you if you want to start the program again or exit.
- Shall I check the amplicons for overlaps in the genome? (y/n): If you answer yes to this, Primer Pooler will prompt you for a genome file, either in .2bit format as supplied by UCSC, or in .fa (FASTA) format.
To obtain a .2bit file from UCSC:
1. Go to http://hgdownload.cse.ucsc.edu/downloads.html
2. Choose a species (e.g. Human)
3. Choose "Genome sequence files"
4. If you're under hg38, choose "Standard genome sequence files"
5. Scroll down to the links, and choose the one that ends .2bit (e.g. hg38.2bit)
hg38.2bit
It will then ask "Do you want me to ignore variant chromosomes i.e. sequences with _ or - in their names?" (you'll probably want to answer Yes if you're using hg38.2bit), and will then ask for a maximum amplicon length (in base pairs): this is the maximum length of the *product*; the number does *not* include the length of any tag sequences you have added to the primers. Then it will scan through the genome data to detect where your amplicons start and finish, and which ones overlap.
- After the overlap scan is complete, Primer Pooler will then have enough data to write an input file for MultiPLX if you wish to run that software as well for comparison. If you decline this, it will ask if you want it to write a simple text file with the locations of all amplicons, which you may accept or decline.
- If you do not opt to check for overlaps in the genome, then Primer Pooler will *not* take overlaps into account when generating its pools. This is rarely useful unless you have *already* ensured there are no overlaps in the set of amplicons under consideration. Even then, I would recommend performing a scan anyway, just to double-check: an early version found 11 overlaps in a supposedly overlap-free batch drawn up by an experienced academic (*we all make mistakes*). But bypassing the overlap check might be useful *if* you are sure there are no overlaps and you don't want to download a very large genome file to the workstation you're using.
- How many pools? Enter a number of pools. Before answering this question, you will be given a "computer suggestion", which is the approximate lowest number of pools needed to achieve no worse than a deltaG of -7 (or a score of 7) in each. If you're not sure how many pools, just pick a number and see. You will be allowed to come back to this question later and try a different number if you weren't happy with the result.
- Do you want to set a maximum size of each pool? (y/n): As the program explains, setting a maximum size of each pool can make the pools more even. If you decide to set a maximum, you will be asked to set the maximum number of primer-sets in each pool. Before answering this question you will be given a computer suggestion and a lower limit.
You will not be allowed to set the maximum size of each pool lower than the average size of each pool, since that would make it logically impossible to fit all primer-sets into the pools. It is not advisable to set it *just above* the average either, since being overly strict about the evenness of the pools could hinder Primer Pooler from finding a solution with lower dimer formation. You might want to experiment with different maxima; you will be able to come back to this question and try again.
- Do you want to give me a time limit? (y/n): If you answer y, you will be asked to set a time limit in minutes. Normally 1 or 2 is enough, although you may wish to let it run a long time to see if it can find better solutions. You don't *have* to set a time limit: you may manually interrupt the pooling process at any time and have it give the best solution it has found so far, whether a time limit is in place or not. Additionally, Primer Pooler will stop automatically when it detects better solutions are unlikely to be found.
- Do you want my "random" choices to be 100% reproducible for demonstrations? (y/n): If you answer y, Primer Pooler's random choices will be generated so that they merely *look* random but are in fact completely reproducible. This is useful for demonstration purposes: you'll know how long it will take to find the solution you want. Otherwise, the random choices will be less predictable, as a different sequence will be chosen depending on the exact time at which the pooling was started.
- Pooling display: While pooling is in progress, Primer Pooler will periodically display a brief summary of the best solution found so far, showing the pool sizes, and the counts of interactions (by deltaG range or score) within each pool. As instructed on screen, you may press Ctrl-C (i.e. hold down Ctrl while pressing and releasing C, then release Ctrl) to cancel further exploration and use the best solution found so far.
- Do you want to see the statistics of each pool? (y/n): After the pooling is complete, or after you have interrupted it (by pressing Ctrl-C as instructed on screen), you will be asked if you wish to see the interaction counts of *each* pool (rather than a simple summary of *all* pools as appeared during pooling). If you want this, you will also be asked if you wish to save them to a file, and, if so, what file name.
- Do you want to see the highest bonds of these pools? (y/n): If you answer Yes, you will be asked for a deltaG or score threshold, and all interactions worse than that threshold will be displayed on-screen with bonds diagrams such as:
 5'-GGCTGAGCGCTCAGTTT-3'
    xx||||||||||||xx
3'-TTTGACTCGCGAGTCGG-5'
and you will then be asked if you wish to save it to a file, and, if so, what file name. You will then be asked if you would like to try another threshold.
- Shall I write each pool to a different result file? (y/n): If you answer y to this, you will be asked for a prefix, which will be used to name the individual results files. Otherwise, you will be asked if you wish to save all results to a single file. If you decline saving all results to a single file, the results will not be saved at all; this is for when you weren't happy with the solution and want to go back to try a different number of pools or a different maximum pool size.
- Do you want to try a different number of pools? (y/n): This question is self-explanatory. You can go back as many times as you like, trying different numbers of pools. But many researchers have a pretty good idea of how many pools they want to use, or else are happy with the computer's initial suggestion.
- Would you like another go? (y/n): If you answered No to trying a different number of pools, or if you didn't want the program to do pooling at all, then you will be asked if you want to start the program again. Answering No to this question will exit.
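The "computer suggestion" for the number of pools is described above only as the approximate lowest number that keeps every pool no worse than a score of 7 (or a deltaG of -7). One simple way to arrive at such a number is first-fit thresholding, sketched below in Python. This is an illustration under assumptions, not Primer Pooler's own code: the function name `suggest_pools`, the set names and the interaction scores are all made up for the example.

```python
from typing import Callable, Dict, List, Sequence

def suggest_pools(sets: Sequence[str],
                  badness: Callable[[str, str], float],
                  threshold: float = 7.0) -> List[List[str]]:
    """First-fit thresholding: place each primer set in the first pool where
    no pairwise badness exceeds the threshold, else open a new pool."""
    pools: List[List[str]] = []
    for s in sets:
        for pool in pools:
            if all(badness(s, other) <= threshold for other in pool):
                pool.append(s)
                break
        else:
            # No existing pool could take this set without breaking the threshold.
            pools.append([s])
    return pools

# Hypothetical interaction scores between four made-up primer sets.
TOY_SCORES: Dict[frozenset, float] = {
    frozenset(("set1", "set2")): 9,
    frozenset(("set1", "set3")): 3,
    frozenset(("set2", "set3")): 8,
    frozenset(("set1", "set4")): 2,
    frozenset(("set2", "set4")): 4,
    frozenset(("set3", "set4")): 10,
}

def toy_badness(a: str, b: str) -> float:
    return TOY_SCORES[frozenset((a, b))]

pools = suggest_pools(["set1", "set2", "set3", "set4"], toy_badness)
print(len(pools), pools)
```

With these toy scores the sketch needs two pools. A greedy pass like this depends on the order of the input, which is one reason such a figure is best treated as a starting suggestion that you can override.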
Command-line usage
Besides running interactively (see above), it is also possible to run Primer Pooler with command-line arguments. This section assumes familiarity with the concept of running programs from the command line.
The only *mandatory* argument (if not running interactively) is a filename for the primers file. This should be a text file in multiple-sequence FASTA format, such as:
>toySet1-F
AGCTGCTGCTGCGATCT
>toySet1-R
GGCTGAGCGCTCAGTTT
>toySet2-F
ACGGCTTGACACCGTTCGACTG
>toySet2-R
CAGACGTTCAG
(this example does not represent real primers). Degenerate bases are allowed using the normal letters, and both upper and lower case are allowed. Names of amplicons' primers should end with F or R, and otherwise match. Taq probes etc can end with other letters. If you want to use the same primer sequence as part of two or more amplicons, then you may include two or more copies in the input with different names; they'll be kept in the same pool. Optionally include tags (tails, barcoding) to apply to all primers: >tagF and >tagR (tags can also be changed part-way through the file).
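The naming rules can be illustrated with a short Python sketch that groups FASTA records into primer sets by dropping the last character of each name, and picks up an optional @N: pool prefix. It is only a reading of the conventions described in this manual, not the program's parser: `read_primer_sets` is a hypothetical helper, the toySet1-P probe sequence is invented, and >tagF / >tagR records are simply skipped here.

```python
import re
from collections import defaultdict

def read_primer_sets(fasta_text):
    """Group primers whose names differ only in the last character.

    Returns {set_name: {"fixed_pool": int or None,
                        "primers": {last_char: sequence}}}.
    """
    sets = defaultdict(lambda: {"fixed_pool": None, "primers": {}})
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        if name is None or name in ("tagF", "tagR"):
            continue  # tags are not handled in this sketch
        fixed = re.match(r"@(\d+):(.+)$", name)
        label = fixed.group(2) if fixed else name
        entry = sets[label[:-1]]
        if fixed:
            entry["fixed_pool"] = int(fixed.group(1))
        # Upper and lower case are both allowed, so normalise to upper case;
        # concatenate in case a sequence is wrapped over several lines.
        previous = entry["primers"].get(label[-1], "")
        entry["primers"][label[-1]] = previous + line.upper()
    return dict(sets)

EXAMPLE = """>toySet1-F
AGCTGCTGCTGCGATCT
>toySet1-R
GGCTGAGCGCTCAGTTT
>toySet1-P
acgtacgt
>@2:myPrimer-F
ACGGCTTGACACCGTTCGACTG
>@2:myPrimer-R
CAGACGTTCAG
"""

primer_sets = read_primer_sets(EXAMPLE)
for set_name, info in primer_sets.items():
    print(set_name, info["fixed_pool"], sorted(info["primers"]))
```

Here toySet1-F, toySet1-R and toySet1-P end up in one set (so they would share a pool), and the myPrimer set records that it was fixed to pool 2.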
Processing options should be placed before this filename.âOptions are as follows:
- --help or /help or /? Show a brief help message and exit.
- --counts Show score or deltaG-range pair counts for the whole input. deltaG will be used if the --dg option is set (see below). This option produces a fast summary of how many primer pairs (in the entire collection, before pooling) have what range of interaction strengths. This could be used for example to check a pool that you have already chosen manually, or if you want a rough idea of the worst-case scenario that pooling aims to avoid.
- --self-omit Causes the --counts option to avoid counting self-interactions (a primer interacting with itself), and interactions between the pair of primers in any given set.
- --print-bonds=THRESHOLD Similar to --counts, this can be useful for checking a manual selection or for a rough idea. All interactions worse than the given threshold (deltaG if --dg is in use, otherwise score) will be written to standard output, with bonds diagrams.
- --dg[=temperature[,mg[,cation[,dNTP]]]] Set this option to use deltaG instead of score. Optional parameters are the temperature (normally the annealing temperature, about 5C below the "Tm" melting point of the primers; default 45C), the concentration of magnesium (default 0), the concentration of monovalent cation (e.g. sodium, default 50), and the concentration of deoxynucleotide (dNTP, default 0). Decimal fractions are allowed in all of these. Temperature is specified in kelvin, and all concentrations are specified in millimoles per litre.
- --suggest-pools Outputs a suggested number of pools. This is the approximate lowest number of pools needed to achieve no worse than a deltaG of -7 (or a score of 7) in each.
- --pools[=NUM[,MINS[,PREFIX]]] Splits the primers into pools. Optional parameters are the number of pools (if omitted or set to ? then the suggested number will be calculated and used), a time limit in minutes, and a prefix for the filenames of each pool (set this to - to write all to standard output).
- --max-count=NUM Set the maximum number of pairs per pool. This is optional but can make the pools more even. A maximum lower than the average is not allowed, and it's usually best to allow a generous margin above the average.
- --genome=PATH Check the amplicons for overlaps in the genome, and avoid these overlaps during pooling. The genome file may be in .2bit format as supplied by UCSC, or in .fa (FASTA) format.
- --scan-variants When searching for amplicons in a genome file, scan variant sequences in that file too, i.e. sequences with _ and - in their names. By default such sequences are omitted as they're not normally needed if using hg38.
- --amp-max=LENGTH Sets maximum amplicon length for the overlap check. The default is 220.
- --multiplx=FILE Write a MultiPLX input file after the --genome stage, to assist comparisons with MultiPLX's pooling etc.
- --seedless Don't seed the random number generator
- --version Just show the program version number and exit.
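As a rough illustration of how these options fit together, the Python sketch below assembles an argument list with the options first and the primers file last, converting a Celsius annealing temperature to the kelvin value that --dg expects. It uses only options listed above; the helper names, the file name primers.txt and the prefix "pool" are assumptions made for the example, and the sketch only builds and prints the command rather than running it.

```python
import shlex

def celsius_to_kelvin(celsius):
    """--dg takes its temperature in kelvin, so convert from Celsius."""
    return celsius + 273.15

def build_command(primers_file, program="pooler", annealing_c=45.0,
                  magnesium_mm=0.0, cation_mm=50.0, dntp_mm=0.0,
                  genome=None, amp_max=None, pools="?", minutes=None,
                  prefix=None):
    """Return an argument list: options first, primers file last."""
    if prefix is not None and minutes is None:
        raise ValueError("PREFIX is the third field, so MINS must be given too")
    args = [program]
    kelvin = celsius_to_kelvin(annealing_c)
    args.append(f"--dg={kelvin:g},{magnesium_mm:g},{cation_mm:g},{dntp_mm:g}")
    if genome is not None:
        args.append(f"--genome={genome}")
    if amp_max is not None:
        args.append(f"--amp-max={amp_max}")
    pool_fields = [str(pools)]  # "?" asks for the suggested number of pools
    if minutes is not None:
        pool_fields.append(f"{minutes:g}")
    if prefix is not None:
        pool_fields.append(prefix)
    args.append("--pools=" + ",".join(pool_fields))
    args.append(primers_file)
    return args

command = build_command("primers.txt", genome="hg38.2bit", amp_max=220,
                        minutes=2, prefix="pool")
print(shlex.join(command))  # shell-quoted form, safe to paste into a terminal
```

Building the list first and quoting it with `shlex.join` avoids surprises from the shell treating the ? in --pools=? as a wildcard.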
Changes
Defects fixed
A defective "Version 1.0" was on this site for only 2 days, but I have no access to the download logs so I have no idea if anybody got it. If you did, I strongly recommend re-downloading the current version and re-running your calculation, because Version 1.0 had important bugs that can affect results:
1. an error in incremental-update logic sometimes had the effect of generating suboptimal solutions (in particular, pools could be unnecessarily empty, and/or full beyond any limit that was set);
2. an error in the user-interface loop meant that if you use tags, run interactively, and answer "yes" to the question "Do you want to try a different number of pools", the *second* run will have been done without the tags, and its results will have been de-tagged *twice*, removing some bases from the output; moreover, the resulting truncated versions of your primers will have made it into the interaction calculations for any third run.
These bugs have now been fixed. In addition, Versions 1.1 through 1.13 had a bug related to the first fix, which would cause interaction-checking for pooling purposes to be performed *without* tags when running in interactive mode (command-line mode was not affected). I therefore recommend re-running in the latest version.
Versions prior to 1.17 also had a display bug: the concentrations for the deltaG calculation are in millimoles per litre, not nanomoles as stated on-screen in interactive mode (please ignore the on-screen instruction and enter millimoles, or upgrade to the latest version which fixes that instruction). The manual was fixed in version 1.8 (also noting that it's per litre, not per cubic metre).
Versions prior to 1.34 would round down any decimal fraction you type when in interactive mode (for deltaG temperature, concentration and threshold settings). Internal calculation and command-line use were not affected by this bug.
Versions prior to 1.37 did not ignore whitespace characters after FASTA labels.
Versions 1.74 through 1.79 were accidentally released with only single-core binaries for the Mac.
Version 1.8 was briefly released with a regression that could sometimes result in pairs not being kept in the same pool; this was fixed in version 1.81.
Version 1.83 fixes a crash that could occur on very large servers where the number of CPU cores exceeds the number of primers, and version 1.84 fixes messages like pool sizes under unusual circumstances.
Version 1.85 changes the default annealing temperature from 37C to 45C.
Version 1.87 has an important update to maximum pool size handling. Previous versions accepted pool sizes in primer counts (not product counts), and incorrectly converted this to product counts in some cases where some product groups were not of size 2. The user messages were also confusing: this could cause issues for experimenters who wanted to set the pool size at the lower limit (which is not advisable but supported). Version 1.87 accepts pool sizes in product counts, and the associated messages have been revised. Documentation has also been fixed to clarify that it's the last character (not the last letter) that should be different in labels of non-standard primer groups. Version 1.88 additionally fixes an infinite loop that can occur should the user ignore warnings and fill pools exactly to the maximum.
Notable additions
Version 1.2 added the MultiPLX output option, and Version 1.33 fixed a bug when MultiPLX output was used with tags and multiple chromosomes. Version 1.3 added genome reading from FASTA (not just 2bit), auto-open browser, and suggest number of pools.
Version 1.36 clarified the use of Taq probes, and allowed these to be in the input file during the overlap check. It's consequently stricter about the requirement that reverse primers must end with R or B: previous versions would accept any letter other than F for these.
Version 1.4 allows tags to be changed part-way through a FASTA file. For example, if there are two >tagF sequences, the first >tagF will set the tags for all F primers between the beginning of the file and the point at which the second >tagF is given; the second >tagF will set the tags for all F primers from that point forward. You can change tags as often as you like.
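The part-way tag rule amounts to "the most recent >tagF or >tagR seen so far applies". A minimal Python sketch of that rule follows, assuming the tag is prepended at the 5' end of the primer and that names ending in anything other than F or R are left untagged (the manual does not say how other endings are treated). `apply_tags` and all sequences below are made up for illustration.

```python
def apply_tags(records):
    """records: (name, sequence) pairs in file order.

    Returns (name, tagged_sequence) pairs, using whichever tagF/tagR
    was most recently seen before each primer.
    """
    current = {"F": "", "R": ""}  # no tag until one is given
    tagged = []
    for name, seq in records:
        if name in ("tagF", "tagR"):
            current[name[-1]] = seq  # replaces any earlier tag of that kind
            continue
        # Assumption: only F and R primers receive tags in this sketch.
        tag = current.get(name[-1], "")
        tagged.append((name, tag + seq))
    return tagged

records = [
    ("tagF", "AAAA"),
    ("a-F", "CCGG"),
    ("tagF", "TTTT"),   # from here on, F primers get the new tag
    ("b-F", "GGCC"),
    ("b-R", "ACGT"),    # no tagR was ever given, so this stays untagged
]
result = apply_tags(records)
print(result)
```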
Version 1.5 allows primer sets to be "fixed" to predetermined pools by specifying these as primer name prefixes, e.g. >@2:myPrimer-F fixes myPrimer-F to pool 2.
Version 1.6 detects and warns about alternative products of non-unique PCR. It was followed within hours by Version 1.61 which fixed a regression in the amplicon overlap check. Reporting was improved in version 1.82.
Version 1.7 makes the ignoring of variant sequences in the genome optional, and warns if primers not being found might be due to variant sequences having been ignored.
Version 1.72 changes the license to Apache 2.0.
Version 1.8 allows multiple amplicons to share one primer and to be kept together.
Glossary
- Base The nitrogenous base part of a nucleotide in a DNA sequence, represented by A, C, G or T. Informally, "base" can also be used to refer to the entire nucleotide.
- Complement What the base binds with. T binds with A and C binds with G. Complementing a sequence means swapping A with T and C with G throughout.
- Degenerate base A base we're not sure about because of genetic variation in a population. We can use extra letters to specify which bases are allowable.
IUPAC/IUBMB degenerate-base codes
K - G or T
Y - C or T
S - C or G
W - A or T
R - A or G
M - A or C
B - any except A
D - any except C
H - any except G
V - any except T
N - any
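The table above translates directly into a lookup, and a degenerate primer can then be expanded into every concrete sequence it stands for. The Python sketch below is only an illustration of the codes; the primer "ACRY" is made up, and nothing here reflects how Primer Pooler itself stores degenerate bases.

```python
from itertools import product

# IUPAC/IUBMB codes from the table above, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "Y": "CT", "S": "CG", "W": "AT", "R": "AG", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(primer):
    """List every non-degenerate sequence a degenerate primer can represent."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

variants = expand("ACRY")   # R = A or G, Y = C or T
print(variants)
```

The number of variants multiplies with each degenerate position (three N bases alone give 64), so full expansion like this gets expensive quickly for heavily degenerate primers.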
- Primer *or* Oligo A short string of bases (actually nucleotides) that's used to start copying from the strand of DNA we're testing. The primer matches up with the start of a section of DNA we want to copy. There are also extra structures at the two ends of the primer that set its direction: these are written as 5' (for the phosphate start) and 3' (for the hydroxyl end). The actual copying occurs from the *complementary* strand but we can ignore this. Primers are special cases of molecules called *oligonucleotides*.
- Degenerate primer A primer that has one or more degenerate bases. In practice, this means we manufacture separate primers for each combination of allowable bases and mix them together. So we have to make worst-case assumptions about these when checking for dimers or overlaps.
- Amplicon A section of the DNA we're interested in amplifying (producing copies of). Primers are designed to copy it.
- Primer set Two primers, corresponding to the start and end of an amplicon. They must be kept in the same pool. Sometimes called a "primer pair", but this might be confused with the two participants of a *dimer* (below) so I think "set" is better. The two primers in a set are called "forward" and "reverse" primers, but the reverse primer is *not* a backward copy of the forward one. If you're reading my code, you have to be aware of the distinction between *backward*, which is just a flipped-over copy of any sequence, and *reverse*, which is the second primer of a set. With assistance from an enzyme called polymerase, the forward primer begins copying from the start of the amplicon, while the reverse primer begins from the end of the amplicon. Although these initial copies continue for an indeterminate number of bases (probably not the whole chromosome, but longer than the region we want), the *second* cycle will apply the forward primer to the "end" section of what the reverse primer produced, and conversely the reverse primer to the "start" section of what the forward primer produced, in both cases resulting in exactly the amplicon we want (which is then reduplicated in subsequent cycles).
- Negative strand The complement of the normal (positive) sequence in the genome. If a primer is designed to match the negative strand then you need to complement it and read it backwards to match the (positive) genome data. In a set, *one* of the two primers will be a negative-strand primer, but the primer file won't tell us which one (it's *not necessarily* the "reverse" primer: when a chromosome has a gene on its negative strand, primers are typically labelled in the other direction so we'll see the "reverse" primer on the positive strand followed by the "forward" primer on the negative). You can't put both primers on the *same* strand because collisions would occur during copying.
- Pool *or* Subpool *or* Group *or* Tube *or* Primer set combination (PSC) A bunch of primer-sets all drifting around in the same mixture. When that mixture is added to some of our sample of DNA, the amplicons whose primer-sets are in that pool are copied (amplified) so we can measure them. If we can reduce the number of different pools we need, we can finish the testing more quickly and use up less of the sample, but on the other hand we want to avoid combinations that overlap or form dimers.
- Overlap Two primer-sets that access overlapping sections of the genome. If they are placed in the same pool, an unwanted shorter amplicon is produced. Consider the following toy example:
....1..2..3..4....
    A-----B
       C-----D
       C--B
If primers A and B are designed to obtain an amplicon from position 1 to 3, and C and D are designed to obtain an amplicon from 2 to 4, then placing them in the same pool will result in excessive pairings between C and B, producing a short amplicon from 2 to 3 at the expense of the other two. This is very bad news and we have to pick our pools to avoid it.
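Once the start and end of each amplicon are known, the toy example reduces to an interval test. The Python sketch below assumes each amplicon is given as (chromosome, start, end) with inclusive positions; the chromosome name "chr1" and the third amplicon "EF" are invented for the example, and finding the positions in the genome (the hard part) is not shown.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two (chromosome, start, end) amplicons share any position."""
    same_chromosome = a[0] == b[0]
    return same_chromosome and a[1] <= b[2] and b[1] <= a[2]

def must_separate(amplicons):
    """Pairs of amplicon names that should not share a pool."""
    names = sorted(amplicons)
    return [(x, y) for x, y in combinations(names, 2)
            if overlaps(amplicons[x], amplicons[y])]

toy = {
    "AB": ("chr1", 1, 3),   # primers A and B, positions 1 to 3
    "CD": ("chr1", 2, 4),   # primers C and D, positions 2 to 4
    "EF": ("chr1", 5, 7),   # hypothetical third amplicon, clear of both
}
pairs = must_separate(toy)
print(pairs)
```

Checking every pair like this is quadratic in the number of amplicons; sorting by start position would scale better, but the simple form keeps the idea visible.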
- Dimer Two primers stuck to each other. This is bad news because, if they're stuck to each other, they're not helping us test the sample. But a dimer is not as bad as an overlap: just because two primers *can* form a dimer doesn't mean they *will*, and the experiment might run anyway on the fraction of primers that didn't get stuck. But it's *better* if each pool can have a combination of primers that tends to produce as few dimers as possible.
- Score A number that gives a rough idea of how likely it is that two primers will make a dimer. It's just the number of bases that bond, minus the number of bases that don't, and ignoring any bases that are left dangling off either end. This is repeated for all positions and the worst case is taken.
- Delta G (dG) The change in Gibbs free energy when two primers make a dimer. The more negative this is, the more likely dimers will form. This thermodynamics calculation gives better results than score, while being only a *little* slower (unless you have ridiculous numbers of degenerate bases). It does need to know the temperature and amounts of various chemicals, but if you don't know these, the defaults should still be reasonable for comparisons.
- Genome *All* the DNA in the cell (most species have hundreds of megabytes at the very least). We need data about the whole genome to work out which amplicons will overlap. If some parts are still unknown, we ignore those and hope for the best.
- Tag *or* index sequence *or* barcode *or* tail A constant set of extra bases added to the beginning (5', actually the *end* on the complementary strand) of every forward or reverse primer. This is used for fishing the results out of the pool. If you tell Primer Pooler what tags you are using, it takes them into account when checking for dimers, while ignoring them when checking the genome for amplicon overlaps.
- Efficiency The rate at which amplicons are copied, as a fraction of the ideal rate. Particularly important in quantitative PCR (qPCR) as you need to know the copy rate for the final counts to be meaningful. Efficiency is improved with dimer reduction, but it can also depend on manufacturing quality and equipment quality, so each batch needs to be checked experimentally.
- Massive(ly) parallel sequencing *or* next-generation sequencing *or* second-generation sequencing *or* high-throughput sequencing Base-by-base reading of thousands of short sections of a genome in parallel. Less expensive machines in smaller labs typically need the relevant sections of the genome to be amplified first. If a reference copy of the genome has already been sequenced and we want to re-sequence specific sections to check them for alterations, then we can use multiplex PCR to pull out these sections. This may involve dealing with far more amplicons than is the case with PCR for detecting or counting genes.
- AutoDimer A 2004 program to check a single pool for dimers.âAutoDimer was coded in Visual Basic 6 and its dimer search is several thousand times slower than Primer Poolerâs; re-pooling must be done manually, as must the handling of degenerate bases.
- Thresholding A simple and fast way of grouping primer sets: "don't add a set to a pool if the interaction badness would exceed some threshold" (usually dG<-7 or overlap). The total number of pools required is discovered by the computer, not chosen by the user. Primer Pooler uses thresholding to *suggest* a number of pools, but allows the user to override it for minimisation.
- Minimisation Method used by Primer Pooler to group primer sets into a user-specified number of pools, seeking to minimise the interactions within each pool.
- MPprimer A 2009 GPLd Perl+Python program for finding optimal PSCs by thresholding. Slower than our C bit-patterns code and cannot cope with degenerate primers.
- MultiPLX A 2004 C++ program for grouping primer-sets by thresholding. No overlap checking: you are expected to divide the batches yourself and run them separately. MultiPLX can score on differences between melting temperatures, and also on unwanted extra interactions between primer and product-amplicon (which isn't normally a concern when large numbers of primers are involved); its interaction calculations are slower than ours and it makes up for this by giving you the option of not checking for *every* kind of interaction. Primer Pooler has an option to output your primers and their products (after genome search) in MultiPLX's input format if you wish to compare with MultiPLX's scoring.
- Bit patterns A computer programming technique that involves writing information about different items into different binary digits of the same number, loading that number into the computer's calculation circuitry, and getting it to do something to all its digits in one operation, thus processing many items together. This is even more effective on newer CPUs, because their wider registers can take even more digits at a time. Primer Pooler uses bit-pattern techniques for its bonding calculations.
- C compiler A computer program that takes something written in the C programming language and converts it into machine code that the CPU can run quickly. Modern C compilers can be *frighteningly* good at this, so a well-written C program can easily outpace what can be done in more "beginner-friendly" languages. This doesn't usually matter if you just want to show things on the screen and wait for input, but you *will* notice the difference when big calculations are involved.
- C++ A computer language inspired by C but with many extra features which, if used well, can make programs easier to manage. In theory, well-written C++ can equal the speed of well-written C. In practice there can be problems with some C++ compilers. Since I was handling register-level bit patterns and builtins for specific CPU opcodes, I decided not to risk it and stuck with C even though I *could* have done it in C++.
- Command line A way of interacting with the computer that involves typing commands on the keyboard and seeing the computer's response written below. It might not look as nice as a modern graphical desktop, but it can be quite efficient when you get used to it; moreover, if you're writing in C then the command line tends to be the easiest interface to write for, freeing up the programmer to concentrate on the calculation part instead of having to spend all their time making it look pretty. Sometimes *another* programmer who specialises in pretty front-ends will come along later and add one. (I'm more of a "back-end" than a "front-end" programmer.)
- CRISPR Naturally occurring DNA fragments in unicellular immune systems that have been repurposed for genetic engineering. Widely hailed as the "next big thing" after PCR, but doesn't yet replace it in all cases. CRISPR is more about editing genes like a Unix sed command (you script the edits but don't see them happen), but it can be modified to create a visible signal when a cut is made, thereby becoming a sequence-detection tool for one sequence at a time.
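To make the "Bit patterns" entry above a little more concrete, here is a minimal sketch in C. It is *not* Primer Pooler's actual code or scoring: the layout (one 64-bit mask per base) and the names BaseMasks, encode, popcount64, pairs_at and max_pairs are all assumptions made up for this illustration. It only counts how many positions could pair at each alignment, and it treats N as the only degenerate code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one 64-bit mask per base.
   Bit i is set if position i of the primer may be that base. */
typedef struct {
    uint64_t a, c, g, t;
    int len;
} BaseMasks;

/* Encode a primer of up to 64 bases. If reversed is non-zero, bit 0 holds
   the LAST character, which models the antiparallel partner strand. */
static BaseMasks encode(const char *primer, int reversed) {
    BaseMasks m = {0, 0, 0, 0, 0};
    size_t n = strlen(primer);
    assert(n <= 64);
    m.len = (int)n;
    for (size_t i = 0; i < n; i++) {
        char ch = primer[reversed ? n - 1 - i : i];
        uint64_t bit = (uint64_t)1 << i;
        switch (ch) {
            case 'A': m.a |= bit; break;
            case 'C': m.c |= bit; break;
            case 'G': m.g |= bit; break;
            case 'T': m.t |= bit; break;
            case 'N': /* degenerate base: could be any of the four */
                m.a |= bit; m.c |= bit; m.g |= bit; m.t |= bit; break;
            default: break; /* other codes are ignored in this sketch */
        }
    }
    return m;
}

/* Portable count of set bits (clears the lowest set bit each pass). */
static int popcount64(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Count positions that could pair (A-T or C-G) when y is slid left by
   `shift` places relative to x. All positions are tested in a handful
   of AND/OR operations rather than one comparison per base. */
static int pairs_at(BaseMasks x, BaseMasks y, int shift) {
    uint64_t bonds = (x.a & (y.t << shift)) | (x.t & (y.a << shift))
                   | (x.c & (y.g << shift)) | (x.g & (y.c << shift));
    return popcount64(bonds);
}

/* Try every alignment of p against the reversed q; return the best count. */
static int max_pairs(const char *p, const char *q) {
    BaseMasks x = encode(p, 0), y = encode(q, 1);
    int best = 0;
    for (int s = 0; s < x.len; s++) {       /* slide y along x */
        int n = pairs_at(x, y, s);
        if (n > best) best = n;
    }
    for (int s = 1; s < y.len; s++) {       /* slide x along y */
        int n = pairs_at(y, x, s);
        if (n > best) best = n;
    }
    return best;
}
```

With this sketch, max_pairs("ACGT", "ACGT") gives 4 (the sequence is its own reverse complement, so every position pairs), while max_pairs("AAAA", "AAAA") gives 0. The per-base masks are one way of letting a degenerate position set several bits at once; a real check would go on to weight the pairs (for example by dG) instead of just counting them.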
Citation
Silas S. Brown, Yun-Wen Chen, Ming Wang, Alexandra Clipson, Eguzkine Ochoa, and Ming-Qing Du (2017). PrimerPooler: automated primer pooling to prepare library for targeted sequencing. Biology Methods and Protocols. Oxford University Press. 2(1). doi:10.1093/biomethods/bpx006
License
Primer Pooler is free software, now licensed under the Apache License, version 2.0. Prior to v1.72 it was licensed under the GNU General Public License, version 3 or later; the new Apache 2 license is still GPL-compatible but with added permissions to make it more acceptable in laboratories with blanket legal policies against GPL'd code.
Thanks
I've lost track of how many giants I've stood on the shoulders of for this, but they include:
- All the scientists who figured out how DNA works and sequenced the human genome;
- Martin Richards for his BCPL bit-pattern techniques, which influenced the way I wrote the fast dimer check;
- The free/libre and open source software community for their legal research, a C compiler, editor and debugger;
- my wife Yun-Wen, who needed this for her cancer-research project, provided test data and feedback, and put up with all my silly questions.
Legal
All material © Silas S. Brown unless otherwise stated. Apache is a registered trademark of The Apache Software Foundation. Biotechvana is a trademark of Biotech Vana S.L. (Spain). BLAST is a registered trademark of the National Library of Medicine. FreeBSD is a registered trademark of the FreeBSD Foundation. Intel is a trademark of Intel Corporation or its subsidiaries. Java is a registered trademark of Oracle Corporation in the US and possibly other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Mac is a trademark of Apple Inc. Python is a trademark of the Python Software Foundation. Unix is a trademark of The Open Group. Windows is a registered trademark of Microsoft Corp. Any other trademarks I mentioned without realising are trademarks of their respective holders.