Probe level data is used to examine the correlation structure among the
N probes that have the same nominal target. Sometimes several probes
are badly behaved or contain SNPs or indels.
The well-behaved probes could then be used in GN1, at the user's
discretion, to make an eigengene that sometimes performs quite a bit
better than the Affymetrix probeset. Essentially, the user could design
their own probesets. The probe level data is also quite fascinating for
dissecting some types of cis-eQTLs; the COMT story I have attached is a
good example. Here is figure 1 that exploits this unique feature:
Ideally, the probe level data would be in GN2 with the same basic
functions as in GN1.
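The eigengene idea can be sketched in a few lines. This is only an
illustration, not the GN1 code. It assumes that "well-behaved" means a
probe's mean correlation with the other probes of the same probeset is
at least some cutoff (min_mean_r, a made-up parameter), and that the
eigengene is the first principal component of the retained,
standardized probes. The function names and the expression values are
hypothetical, and the sketch expects at least two non-constant probes.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale one probe's values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def correlation_matrix(z):
    """Pearson correlations between rows that are already standardized."""
    n_samples = len(z[0])
    return [[sum(a * b for a, b in zip(p, q)) / n_samples for q in z] for p in z]


def well_behaved(expr, min_mean_r=0.3):
    """Indices of probes whose mean correlation with the other probes
    reaches the (assumed) cutoff."""
    r = correlation_matrix([standardize(p) for p in expr])
    n = len(r)
    return [i for i in range(n) if (sum(r[i]) - r[i][i]) / (n - 1) >= min_mean_r]


def eigengene(expr, iterations=200):
    """First principal component scores (one per sample) of the probes."""
    z = [standardize(p) for p in expr]
    r = correlation_matrix(z)
    n = len(r)
    u = [1.0] * n  # starting vector for power iteration
    for _ in range(iterations):
        v = [sum(r[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    # Project the standardized probes onto the leading eigenvector.
    return [sum(u[i] * z[i][s] for i in range(n)) for s in range(len(z[0]))]


# Made-up values: rows are probes of one probeset, columns are samples.
probes = [
    [7.1, 8.0, 9.2, 10.1, 11.0],  # tracks the target
    [6.9, 8.2, 9.0, 10.3, 10.8],  # tracks the target
    [7.3, 7.9, 9.1, 9.9, 11.2],   # tracks the target
    [9.0, 7.0, 10.5, 6.5, 9.5],   # badly behaved, e.g. overlaps a SNP
]
keep = well_behaved(probes)
signal = eigengene([probes[i] for i in keep])
```

Here the fourth probe is dropped and the eigengene is built from the
remaining three. In GN the choice of probes would stay with the user;
the cutoff only stands in for that decision.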
All we need in GN2/3 is a new table to display the mean probe level
expression together with the probe metadata (melting temperature,
sequence, location, etc.). The probeset ID is the table header and name
(the parent), and the probes in the table are the children. Using our
now standard DataTable format should work well.
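As a rough sketch of what such a table could look like on the server
side: the parent probeset carries a list of child probes, and the
payload uses the `columns` (with `data` and `title`) and `data` keys
that DataTables accepts. The class names, field names, probeset ID, and
all values below are hypothetical and do not reflect the actual GN2/3
schema.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import List


@dataclass
class Probe:
    # Hypothetical metadata fields, following the list in the text.
    name: str
    sequence: str
    melting_temp_c: float
    chromosome: str
    start_bp: int
    mean_expression: float


@dataclass
class Probeset:
    probeset_id: str                      # the parent: table header and name
    probes: List[Probe] = field(default_factory=list)  # the children

    def to_table(self):
        """Build a payload with one row per probe."""
        columns = [
            {"data": f.name, "title": f.name.replace("_", " ").title()}
            for f in fields(Probe)
        ]
        return {
            "title": self.probeset_id,
            "columns": columns,
            "data": [asdict(p) for p in self.probes],
        }


# Made-up example data.
ps = Probeset(
    "example_probeset_at",
    [
        Probe("probe_1", "ACGTTGCAAGCTTGACCTGAAGTCA", 61.5, "16", 18400000, 9.8),
        Probe("probe_2", "TTGACCAGTCGATCGGATACCTGAA", 59.2, "16", 18400150, 7.4),
    ],
)
table = ps.to_table()
```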
We have a similar parent-child relation among traits with peptides and
proteins. All of the peptides of a single protein should have the same
parent probeset/protein. And peptides could be entered as "probes" in
the same way that we did for Affymetrix.
Arun—I wonder whether this hierarchy could be usefully combined to
handle time-series data. Probably not ;-)
In the case of probes and probesets there is almost never any overlap
of probe sequence—all are disjoint. That is also usually true of
peptides and proteins.
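Because the children are disjoint and each has exactly one parent, the
same grouping works for probes under probesets and for peptides under
proteins. A minimal sketch, assuming each child record simply names its
parent (the record layout and all names are made up for illustration):

```python
from collections import defaultdict


def group_by_parent(records):
    """Map each parent (probeset or protein) to the names of its children."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["parent"]].append(rec["name"])
    return dict(groups)


# Hypothetical flat records, as if children had been loaded like probesets.
records = [
    {"name": "probe_1", "parent": "probeset_A", "kind": "probe"},
    {"name": "probe_2", "parent": "probeset_A", "kind": "probe"},
    {"name": "peptide_1", "parent": "protein_X", "kind": "peptide"},
    {"name": "peptide_2", "parent": "protein_X", "kind": "peptide"},
]
tree = group_by_parent(records)
```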
Pjotr, the reason we have not added much probe level data to GN1 or GN2
is that we did not have the bandwidth. Arthur simply did not have time
and I did not push the issue. Instead we just started loading the probe
level data separately, as if the probes were probesets. This is what we
have done for peptide data, and it is the reason that there are now
"parallel" data sets: one labeled "protein" and another "peptide", or
one "gene level" and another "exon level". We just collapse the
hierarchy.