Primer Pooler

This program is for geneticists who want to use Multiplex PCR to study DNA samples, and wish to optimise their combinations of primers to minimise the formation of *dimers*. It has been used and cited in oncology, plant science, climatology, COVID-19 and other research.

Primer Pooler can:

If your CPU is modern enough to have them, Primer Pooler will take advantage of 64-bit registers and multiple cores. But it also runs on older equipment.

Please note that Primer Pooler does not design primers by itself: you must choose your primers first, whether by using NCBI’s Primer BLAST or any other method of your choice. Once you have your primers, Primer Pooler can partition them into pools.

Primer BLAST

Download Primer Pooler

If your lab runs Windows, you’ll want:

64-bit Windows version

32-bit Windows version

MacPorts users may type sudo port install pooler (alternatively this Intel Mac binary should work on all OS X from 10.7)

Intel Mac binary

FreeBSD users may type pkg install pooler

For all other systems, including GNU/Linux, please compile from source (below).

There is also a version of Primer Pooler inside SeqEditor by Biotechvana. This was converted into Java code from an older copy and is roughly equivalent to Version 1.4 (see under Changes for what that’s missing), with a lower limit on primer length, no validation and a slower speed. The conversion process introduced a bug in the bit-pattern handling that may corrupt some primers. This was fixed in their 2021-11-19 version; please do not use earlier versions.

Example primers file

If you need it, here is an example primers file which references the human genome (download hg38.2bit from UCSC). Primer names in the examples file have been changed (the lab wouldn’t like the real names released before they publish, so I replaced all their labels with obscure hex codes).

example primers file

hg38.2bit

Source code

To build from source, you will need:

Download the source code, unpack it, and type make or make win-crosscompile

source code

Usage

The easiest way to run Primer Pooler for first-time users is to run it interactively. To do this, simply launch the program file (pooler or pooler64), and it should ask you a series of questions to take you through what you want to do.

Questions asked by Primer Pooler when running interactively:

>toySet1-F
AGCTGCTGCTGCGATCT
>toySet1-R
GGCTGAGCGCTCAGTTT
>toySet2-F
ACGGCTTGACACCGTTCGACTG
>toySet2-R
CAGACGTTCAG

(this example does not represent real primers). Degenerate bases are allowed using the normal letters, and both upper and lower case is allowed. Names of amplicons’ primers should end with F or R (for Forward and Reverse), and otherwise match. Optionally include tags to apply to all primers (also called *tailed primers* or *barcoding*) using >tagF and >tagR (tags can also be changed part-way through the file). If you also have Taq probes or other primers that don’t themselves make amplicons, you can include these ending with other letters, e.g. >toySet1-P. Any set of names differing in only the last character will be kept in the same pool, but you must use F for forward and R or B for reverse (backward) if you also want to check primer-pairs for overlaps in the genome. If you want to re-use the same primer in two amplicons (for example, two amplicons that have the same forward primer but differing reverse primers, to be found on two different genomes), then you should input the shared primer *twice*, once for each amplicon, each time naming it after the corresponding amplicon (e.g. product1-F and product2-F); the corresponding sets will then be kept in the same pool. You can also manually “fix” a primer-set to a predetermined pool number by using a primer name prefix: >@2:myPrimer-F fixes myPrimer-F to pool 2 (in which case Primer Pooler will allocate other primer-sets around these limitations); this can be useful when you don’t have a whole-genome file for overlap detection.
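The naming rules above can be illustrated with a small Python sketch. It is not Primer Pooler's own code: the function names `read_fasta` and `group_primer_sets` are hypothetical, and the sketch covers only the rules stated above (names that differ in only the last character form one set, `>tagF`/`>tagR` are not primers, and an `@N:` prefix fixes a set to pool N). It does not attempt the shared-primer rule.

```python
from collections import defaultdict

def read_fasta(text):
    """Yield (name, sequence) pairs from multiple-sequence FASTA text."""
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()              # also drops whitespace after labels
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            name, parts = line[1:].strip(), []
        elif name is not None:
            parts.append(line.upper())   # upper and lower case both accepted
    if name is not None:
        yield name, "".join(parts)

def group_primer_sets(records):
    """Group primers whose names differ only in the last character.

    Returns {name_without_last_char: {"members": {last_char: sequence},
                                      "pool": fixed pool number or None}}.
    """
    sets = defaultdict(lambda: {"members": {}, "pool": None})
    for name, seq in records:
        if name in ("tagF", "tagR"):
            continue                     # tags are not primers themselves
        fixed_pool = None
        if name.startswith("@") and ":" in name:
            prefix, name = name[1:].split(":", 1)
            fixed_pool = int(prefix)     # e.g. "@2:myPrimer-F" -> pool 2
        entry = sets[name[:-1]]
        entry["members"][name[-1]] = seq
        if fixed_pool is not None:
            entry["pool"] = fixed_pool
    return dict(sets)

example = """>toySet1-F
AGCTGCTGCTGCGATCT
>toySet1-R
GGCTGAGCGCTCAGTTT
>@2:myPrimer-F
acgt
"""
for set_name, info in group_primer_sets(read_fasta(example)).items():
    print(set_name, info)
```

Keying each set on the name minus its last character mirrors the "differing in only the last character" wording, so a probe such as toySet1-P would land in the same set as toySet1-F and toySet1-R.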

 5'-GGCTGAGCGCTCAGTTT-3'
    xx||||||||||||xx
3'-TTTGACTCGCGAGTCGG-5'

and you will then be asked if you wish to save it to a file, and, if so, what file name. You will then be asked if you would like to try another threshold.
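For readers who want to see how such a diagram relates to the sequences, here is a minimal Python sketch that recomputes the middle line of the example above. It is only an illustration: `bond_markers` and its `shift` parameter are hypothetical names, it handles plain A/C/G/T only (degenerate bases are simply counted as mismatches), and it does not reproduce Primer Pooler's scoring or deltaG calculation.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def bond_markers(top, other, shift):
    """Mark each facing base pair as bonded ('|') or not ('x').

    `top` and `other` are both written 5'->3'. `other` is displayed
    reversed (3'->5') underneath, and top[i] faces reversed_other[i + shift].
    Positions with nothing facing them get no marker.
    """
    bottom = other[::-1]
    marks = []
    for i, base in enumerate(top):
        j = i + shift
        if 0 <= j < len(bottom):
            bonded = COMPLEMENT.get(base) == bottom[j]
            marks.append("|" if bonded else "x")
    return "".join(marks)

# The example diagram is a primer paired against a copy of itself,
# with the lower strand starting one column to the left.
primer = "GGCTGAGCGCTCAGTTT"
print(bond_markers(primer, primer, 1))   # xx||||||||||||xx
```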

To obtain a .2bit file from UCSC:

1. Go to http://hgdownload.cse.ucsc.edu/downloads.html

2. Choose a species (e.g. Human)

3. Choose “Genome sequence files”

4. If you’re under hg38, choose “Standard genome sequence files”

5. Scroll down to the links, and choose the one that ends .2bit (e.g. hg38.2bit)

hg38.2bit

It will then ask “Do you want me to ignore variant chromosomes i.e. sequences with _ or - in their names?” (you’ll probably want to answer Yes if you’re using hg38.2bit), and will then ask for a maximum amplicon length (in base pairs): this is the maximum length of the *product*; the number does *not* include the length of any tag sequences you have added to the primers. Then it will scan through the genome data to detect where your amplicons start and finish, and which ones overlap.
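To see which sequences that question would affect, you can list the names stored in a .2bit file. The Python sketch below is independent of Primer Pooler. It assumes the published UCSC .2bit layout (a 16-byte header of four 32-bit fields beginning with the signature 0x1A412743, followed by an index of name length, name and offset for each sequence). The function names are hypothetical, and the "variant" test is simply the rule quoted above (an underscore or hyphen in the name).

```python
import io
import struct

TWOBIT_SIGNATURE = 0x1A412743

def read_2bit_names(stream):
    """Return the sequence names listed in a .2bit file's index."""
    header = stream.read(16)
    if len(header) < 16:
        raise ValueError("too short to be a .2bit file")
    # The signature tells us which byte order the file was written in.
    for order in ("<", ">"):
        signature, _version, count, _reserved = struct.unpack(order + "4I", header)
        if signature == TWOBIT_SIGNATURE:
            break
    else:
        raise ValueError("not a .2bit file (bad signature)")
    names = []
    for _ in range(count):
        (name_size,) = struct.unpack("B", stream.read(1))
        names.append(stream.read(name_size).decode("ascii"))
        stream.read(4)                   # skip the sequence's file offset
    return names

def is_variant(name):
    """Apply the rule from the question: '_' or '-' in the name."""
    return "_" in name or "-" in name

# Tiny in-memory stand-in for a real file such as hg38.2bit
demo = struct.pack("<4I", TWOBIT_SIGNATURE, 0, 2, 0)
for seq_name in (b"chr1", b"chr1_KI270706v1_random"):
    demo += bytes([len(seq_name)]) + seq_name + struct.pack("<I", 0)

names = read_2bit_names(io.BytesIO(demo))
print([n for n in names if not is_variant(n)])
```

With a real download, pass `open("hg38.2bit", "rb")` instead of the in-memory demo.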

You will not be allowed to set the maximum size of each pool lower than the average size of each pool, since that would make it logically impossible to fit all primer-sets into the pools. It is not advisable to set it *just above* the average either, since being overly strict about the evenness of the pools could hinder Primer Pooler from finding a solution with lower dimer formation. You might want to experiment with different maxima: you will be able to come back to this question and try again.
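The arithmetic behind that lower limit is simple enough to sketch in Python. This is an illustration only: the function names are hypothetical, and rounding the average up to a whole number is an assumption of the sketch rather than a statement about how Primer Pooler words its prompt.

```python
import math

def smallest_allowed_maximum(num_primer_sets, num_pools):
    """Smallest per-pool maximum that can still hold every primer-set."""
    return math.ceil(num_primer_sets / num_pools)

def spare_places(num_primer_sets, num_pools, maximum):
    """How much slack a given maximum leaves across all pools."""
    spare = maximum * num_pools - num_primer_sets
    if spare < 0:
        raise ValueError("maximum is below the average pool size")
    return spare

# 100 primer-sets in 8 pools: the average is 12.5 per pool,
# so a maximum of 12 cannot work and 13 is the tightest setting.
print(smallest_allowed_maximum(100, 8))   # 13
print(spare_places(100, 8, 13))           # 4
print(spare_places(100, 8, 16))           # 28
```

The spare-places figure gives a rough feel for how much room the pooling has to move primer-sets around when it is trying to reduce dimer formation.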

 5'-GGCTGAGCGCTCAGTTT-3'
    xx||||||||||||xx
3'-TTTGACTCGCGAGTCGG-5'

and you will then be asked if you wish to save it to a file, and, if so, what file name. You will then be asked if you would like to try another threshold.

Command-line usage

Besides running interactively (see above), it is also possible to run Primer Pooler with command-line arguments. This section assumes familiarity with the concept of running programs from the command line.

The only *mandatory* argument (if not running interactively) is a filename for the primers file. This should be a text file in multiple-sequence FASTA format, such as:

>toySet1-F
AGCTGCTGCTGCGATCT
>toySet1-R
GGCTGAGCGCTCAGTTT
>toySet2-F
ACGGCTTGACACCGTTCGACTG
>toySet2-R
CAGACGTTCAG

(this example does not represent real primers). Degenerate bases are allowed using the normal letters, and both upper and lower case is allowed. Names of amplicons’ primers should end with F or R, and otherwise match. Taq probes etc can end with other letters. If you want to use the same primer sequence as part of two or more amplicons, then you may include two or more copies in the input with different names; they’ll be kept in the same pool. Optionally include tags (tails, barcoding) to apply to all primers: >tagF and >tagR (tags can also be changed part-way through the file).

Processing options should be placed before this filename. Options are as follows:

Changes

Defects fixed

A defective “Version 1.0” was on this site for only 2 days, but I have no access to the download logs so I have no idea if anybody got it. If you did, I strongly recommend re-downloading the current version and re-running your calculation, because Version 1.0 had important bugs that can affect results:

1. an error in incremental-update logic sometimes had the effect of generating suboptimal solutions (in particular, pools could be unnecessarily empty, and/or full beyond any limit that was set);

2. an error in the user-interface loop meant that if you use tags, run interactively, and answer “yes” to the question “Do you want to try a different number of pools”, the *second* run will have been done without the tags, and its results will have been de-tagged *twice*, removing some bases from the output; moreover, the resulting truncated versions of your primers will have made it into the interaction calculations for any third run.

These bugs have now been fixed. In addition, Versions 1.1 through 1.13 had a bug related to the first fix, which would cause interaction-checking for pooling purposes to be performed *without* tags when running in interactive mode (command-line mode was not affected). I therefore recommend re-running in the latest version.

Versions prior to 1.17 also had a display bug: the concentrations for the deltaG calculation are in millimoles per litre, not nanomoles as stated on-screen in interactive mode (please ignore the on-screen instruction and enter millimoles, or upgrade to the latest version which fixes that instruction). The manual was fixed in version 1.8 (also noting that it’s per litre, not per cubic metre).

Versions prior to 1.34 would round down any decimal fraction you type when in interactive mode (for deltaG temperature, concentration and threshold settings). Internal calculation and command-line use was not affected by this bug.

Versions prior to 1.37 did not ignore whitespace characters after FASTA labels.

Versions 1.74 through 1.79 were accidentally released with only single-core binaries for the Mac.

Version 1.8 was briefly released with a regression that could sometimes result in pairs not being kept in the same pool; this was fixed in version 1.81.

Version 1.83 fixes a crash that could occur on very large servers where the number of CPU cores exceeds the number of primers, and version 1.84 fixes messages like pool sizes under unusual circumstances.

Version 1.85 changes the default annealing temperature from 37°C to 45°C.

Version 1.87 has an important update to maximum pool size handling. Previous versions accepted pool sizes in primer counts (not product counts), and incorrectly converted this to product counts in some cases where some product groups were not of size 2. Plus the user messages were confusing: this could cause issues for experimenters who wanted to set the pool size at the lower limit (which is not advisable but supported). Version 1.87 accepts pool sizes in product counts, and the associated messages have been revised. Documentation has also been fixed to clarify that it’s the last character (not the last letter) that should be different in labels of non-standard primer groups. Version 1.88 additionally fixes an infinite loop that can occur should the user ignore warnings and fill pools exactly to the maximum.

Notable additions

Version 1.2 added the MultiPLX output option, and Version 1.33 fixed a bug when MultiPLX output was used with tags and multiple chromosomes. Version 1.3 added genome reading from FASTA (not just 2bit), auto-open browser, and suggest number of pools.

Version 1.36 clarified the use of Taq probes, and allowed these to be in the input file during the overlap check. It’s consequently stricter about the requirement that reverse primers must end with R or B: previous versions would accept any letter other than F for these.

Version 1.4 allows tags to be changed part-way through a FASTA file. For example, if there are two >tagF sequences, the first >tagF will set the tags for all F primers between the beginning of the file and the point at which the second >tagF is given; the second >tagF will set the tags for all F primers from that point forward. You can change tags as often as you like.
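The "tags apply until the next >tagF or >tagR" behaviour can be sketched in a few lines of Python. This is an illustration, not Primer Pooler's code: `apply_tags` is a hypothetical name, and two things are assumed that the text above does not state. The tag is joined to the 5' end (the front) of the primer, and names ending in B are treated like R. Other names, such as probes, are left untagged in this sketch.

```python
def apply_tags(records):
    """Return (name, tagged_sequence) for each primer in file order.

    `records` is an iterable of (name, sequence) pairs as they appear in
    the FASTA file, including any tagF/tagR entries.
    """
    current = {"F": "", "R": ""}         # no tag until one is given
    tagged = []
    for name, seq in records:
        if name == "tagF":
            current["F"] = seq           # applies from here onward
        elif name == "tagR":
            current["R"] = seq
        else:
            direction = "R" if name[-1] in "RB" else name[-1]
            # Assumption: the tag goes on the 5' end of the primer.
            tagged.append((name, current.get(direction, "") + seq))
    return tagged

records = [
    ("tagF", "AAAA"),
    ("set1-F", "CCGG"),
    ("set1-R", "GGCC"),      # no tagR given yet, so left untagged
    ("tagF", "TTTT"),        # second tagF replaces the first
    ("set2-F", "ACGT"),
]
for name, seq in apply_tags(records):
    print(name, seq)
```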

Version 1.5 allows primer sets to be “fixed” to predetermined pools by specifying these as primer name prefixes, e.g. >@2:myPrimer-F fixes myPrimer-F to pool 2.

Version 1.6 detects and warns about alternative products of non-unique PCR. It was followed within hours by Version 1.61 which fixed a regression in the amplicon overlap check. Reporting was improved in version 1.82.

Version 1.7 makes the ignoring of variant sequences in the genome optional, and warns if primers not being found might be due to variant sequences having been ignored.

Version 1.72 changes the license to Apache 2.0.

Version 1.8 allows multiple amplicons to share one primer and to be kept together.

Glossary

IUPAC/IUBMB degenerate-base codes

K - G or T

Y - C or T

S - C or G

W - A or T

R - A or G

M - A or C

B - any except A

D - any except C

H - any except G

V - any except T

N - any
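As an illustration of what these codes mean, the following Python sketch expands a degenerate primer into every concrete sequence it stands for. The table is copied from the list above; `expand_degenerate` is a hypothetical helper, and this is not a description of how Primer Pooler represents degenerate bases internally.

```python
from itertools import product

# Degenerate-base codes from the list above, plus the four plain bases
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "Y": "CT", "S": "CG", "W": "AT", "R": "AG", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand_degenerate(primer):
    """List every plain A/C/G/T sequence a degenerate primer can stand for."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

print(expand_degenerate("AY"))        # ['AC', 'AT']
print(len(expand_degenerate("NNB")))  # 4 * 4 * 3 = 48
```

The count grows multiplicatively with each degenerate position, so full expansion is only practical for illustration on short examples.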

....1..2..3..4....
    A-----B
       C-----D
       C--B

If primers A and B are designed to obtain an amplicon from position 1 to 3, and C and D are designed to obtain an amplicon from 2 to 4, then placing them in the same pool will result in excessive pairings between C and B, producing a short amplicon from 2 to 3 at the expense of the other two. This is very bad news and we have to pick our pools to avoid it.
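Once amplicon positions are known, the overlap condition itself is a simple interval test. Here is a minimal Python sketch using the positions from the diagram above. The function names and the (chromosome, start, end) tuples are hypothetical, and treating the end points as inclusive is an assumption of the sketch.

```python
def amplicons_overlap(a, b):
    """a and b are (chromosome, start, end) with inclusive positions."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def overlapping_pairs(amplicons):
    """Return name pairs that should not share a pool."""
    names = sorted(amplicons)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if amplicons_overlap(amplicons[x], amplicons[y])
    ]

amplicons = {
    "A-B": ("chr1", 1, 3),   # positions 1 to 3 in the diagram
    "C-D": ("chr1", 2, 4),   # positions 2 to 4 in the diagram
    "E-F": ("chr2", 2, 4),   # same coordinates, different chromosome
}
print(overlapping_pairs(amplicons))   # [('A-B', 'C-D')]
```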

Citation

Silas S. Brown, Yun-Wen Chen, Ming Wang, Alexandra Clipson, Eguzkine Ochoa, and Ming-Qing Du (2017). PrimerPooler: automated primer pooling to prepare library for targeted sequencing. Biology Methods and Protocols. Oxford University Press. 2(1). doi:10.1093/biomethods/bpx006

License

Primer Pooler is free software, now licensed under the Apache License, version 2.0. Prior to v1.72 it was licensed under the GNU General Public License, version 3 or later; the new Apache 2 license is still GPL-compatible but with added permissions to make it more acceptable in laboratories with blanket legal policies against GPL’d code.

Thanks

I’ve lost track of how many giants I’ve stood on the shoulders of for this, but they include:

Legal

All material © Silas S. Brown unless otherwise stated. Apache is a registered trademark of The Apache Software Foundation. Biotechvana is a trademark of Biotech Vana S.L. (Spain). BLAST is a registered trademark of the National Library of Medicine. FreeBSD is a registered trademark of the FreeBSD Foundation. Intel is a trademark of Intel Corporation or its subsidiaries. Java is a registered trademark of Oracle Corporation in the US and possibly other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Mac is a trademark of Apple Inc. Python is a trademark of the Python Software Foundation. Unix is a trademark of The Open Group. Windows is a registered trademark of Microsoft Corp. Any other trademarks I mentioned without realising are trademarks of their respective holders.