💾 Archived View for thebird.nl › gn-gemtext-threads › topics › data-uploads › inserting-data.gmi captured on 2022-07-16 at 15:51:36. Gemini links have been rewritten to link to archived content


Data Uploads: Inserting Data

Tags

assigned: bonfacem, zachs
type: bug
priority: high
status: in progress
keywords: data uploads

Introduction

The uploader work documented in `editing-data.gmi` currently supports only the following operations, for users with the right access:

- Editing phenotype and probeset metadata

- Editing sample data from a published phenotype

- Deleting sample data from published phenotypes

- Inserting data from strains that already exist in a ".geno" file

However, one of our beta users ran into a problem when attempting to insert new trait data for BDL_10001: at the moment, we can't add new samples. Adding case attributes for new samples is also a manual process.

New samples cannot be added because new genotype files need to be generated whenever new strains/samples are added, and adding these genotype files has always been manual. We can add strain data (inserting new strains into the Strain table(s) if they are not already there) by hacking existing code. However, the new strains will then show up in a separate "sample group" on the trait page and won't be used for mapping until a new .geno file containing them is generated.
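The distinction above, between samples that can already be mapped and samples that would land in a separate sample group, can be sketched as a simple set comparison. This is a minimal illustration, not the uploader's code: `partition_samples` is a hypothetical name, and the two input lists stand in for the sample names in an upload and the strain names listed in the group's .geno file. How those lists are actually read is left out.

```python
def partition_samples(uploaded, geno_strains):
    """Split uploaded sample names into (known, new) relative to a .geno file.

    Both arguments are plain lists of names (assumed inputs for this sketch).
    `known` samples already appear in the .geno file; `new` samples do not,
    so they would need a regenerated .geno file before being used for mapping.
    """
    geno = set(geno_strains)
    known = [name for name in uploaded if name in geno]
    new = [name for name in uploaded if name not in geno]
    return known, new


# Illustrative names only; "Test1" mirrors the test strain used later on.
known, new = partition_samples(["BXD1", "BXD2", "Test1"], ["BXD1", "BXD2", "BXD5"])
print(known)  # ['BXD1', 'BXD2']
print(new)    # ['Test1']
```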

How is this genotype file generated?

[From Rob] Genotypes should be added by code. For some species, like humans, this won't happen; but for experimental animals and plants, many families may grow and spread, e.g. BXDs grow to the BXD Dax, then they expand to the BXDx Collaborative Cross DAX.

New strains are added by entering them based on group (InbredSetId) and SpeciesId. This is why we have the "StrainXRef" table. This is demonstrated below:

MariaDB [db_webqtl]> desc Strain;
+-----------+----------------------+------+-----+---------+----------------+
| Field     | Type                 | Null | Key | Default | Extra          |
+-----------+----------------------+------+-----+---------+----------------+
| Id        | int(20)              | NO   | PRI | NULL    | auto_increment |
| Name      | varchar(100)         | YES  | MUL | NULL    |                |
| Name2     | varchar(100)         | YES  |     | NULL    |                |
| SpeciesId | smallint(5) unsigned | NO   |     | 0       |                |
| Symbol    | varchar(20)          | YES  | MUL | NULL    |                |
| Alias     | varchar(255)         | YES  |     | NULL    |                |
+-----------+----------------------+------+-----+---------+----------------+
6 rows in set (0.001 sec)

MariaDB [db_webqtl]> desc StrainXRef;
+------------------+----------------------+------+-----+---------+-------+
| Field            | Type                 | Null | Key | Default | Extra |
+------------------+----------------------+------+-----+---------+-------+
| InbredSetId      | smallint(5) unsigned | NO   | PRI | 0       |       |
| StrainId         | int(20)              | NO   | PRI | NULL    |       |
| OrderId          | int(20)              | YES  |     | NULL    |       |
| Used_for_mapping | char(1)              | YES  |     | N       |       |
| PedigreeStatus   | varchar(255)         | YES  |     | NULL    |       |
+------------------+----------------------+------+-----+---------+-------+
5 rows in set (0.001 sec)

MariaDB [db_webqtl]> select max(Id) from Strain;
+---------+
| max(Id) |
+---------+
|   66085 |
+---------+
1 row in set (0.000 sec)

MariaDB [db_webqtl]> insert into Strain (Name, Name2,SpeciesId,Symbol,Alias) value ("Test1","Test1",30,"Test1","Test1");
Query OK, 1 row affected (0.000 sec)

MariaDB [db_webqtl]> select max(Id) from Strain;
+---------+
| max(Id) |
+---------+
|   66086 |
+---------+
1 row in set (0.000 sec)

MariaDB [db_webqtl]> select * from Strain where Id=66086;
+-------+-------+-------+-----------+--------+-------+
| Id    | Name  | Name2 | SpeciesId | Symbol | Alias |
+-------+-------+-------+-----------+--------+-------+
| 66086 | Test1 | Test1 |        30 | Test1  | Test1 |
+-------+-------+-------+-----------+--------+-------+
1 row in set (0.000 sec)
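The session above only inserts into Strain. A rough sketch of how an "insert if missing, then link to a group" step might look in code follows. Everything here is an assumption for illustration: `add_strain_to_group` and `make_demo_db` are hypothetical names, the tables are simplified copies of the two schemas shown above, the InbredSetId value 1 is made up, and an in-memory SQLite database stands in for MariaDB so that the example runs on its own. Copying the name into Name2, Symbol, and Alias simply mirrors the Test1 row above.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory SQLite stand-in for the Strain/StrainXRef tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Strain (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        Name TEXT, Name2 TEXT,
        SpeciesId INTEGER NOT NULL DEFAULT 0,
        Symbol TEXT, Alias TEXT
    );
    CREATE TABLE StrainXRef (
        InbredSetId INTEGER NOT NULL DEFAULT 0,
        StrainId INTEGER NOT NULL,
        OrderId INTEGER,
        Used_for_mapping TEXT DEFAULT 'N',
        PedigreeStatus TEXT,
        PRIMARY KEY (InbredSetId, StrainId)
    );
    """)
    return conn


def add_strain_to_group(conn, name, species_id, inbred_set_id):
    """Insert `name` into Strain if it is missing, then link it in StrainXRef.

    Returns the strain's Id. Hypothetical helper, not existing uploader code.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT Id FROM Strain WHERE Name = ? AND SpeciesId = ?",
        (name, species_id),
    )
    row = cur.fetchone()
    if row is None:
        cur.execute(
            "INSERT INTO Strain (Name, Name2, SpeciesId, Symbol, Alias) "
            "VALUES (?, ?, ?, ?, ?)",
            (name, name, species_id, name, name),
        )
        strain_id = cur.lastrowid
    else:
        strain_id = row[0]
    # New strains are linked with Used_for_mapping = 'N' (the column default),
    # since they cannot be mapped until a .geno file includes them.
    # INSERT OR IGNORE is SQLite syntax; MariaDB spells it INSERT IGNORE.
    cur.execute(
        "INSERT OR IGNORE INTO StrainXRef (InbredSetId, StrainId, Used_for_mapping) "
        "VALUES (?, ?, 'N')",
        (inbred_set_id, strain_id),
    )
    conn.commit()
    return strain_id


conn = make_demo_db()
first = add_strain_to_group(conn, "Test1", 30, 1)
second = add_strain_to_group(conn, "Test1", 30, 1)  # no duplicate row is created
print(first == second)  # True
```

Against MariaDB, the placeholder style would depend on the driver (many use `%s` rather than `?`).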

Problems

- Integration with genotype files complicates things, e.g. we can only generate individual BXD genotypes from the "main" BXD genotype files if the strain of each individual is stored somewhere.

- CaseAttributes need to be updated manually. One option is to enable this through the UI. At the moment, we need to query the Case Attribute tables to look up the BXD strain for each individual when generating the genotype files.

- We should ideally be able to generate a set of genotype files or a set of DB tables with all possible 20,000 BXD family genomes. As a reference, see David's smoothed WGS-based genotype files.

- Genotype files will most likely not be static. Soon, we should be able to support changing them at runtime.

- When the sample list for BDL_10001 is updated, will the sample list for all other records be synchronized automatically?
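Once each individual's strain is recorded somewhere, the first problem in the list reduces to a lookup. The sketch below shows that idea under stated assumptions: `individual_genotypes` is a hypothetical name, the strain-level calls are assumed to be already parsed from the "main" genotype file into a dictionary, the individual-to-strain mapping is assumed to come from wherever it is stored (for example, case attributes), and the call letters and mouse identifiers are purely illustrative.

```python
def individual_genotypes(strain_calls, individual_to_strain):
    """Copy each individual's genotype calls from its parent strain.

    strain_calls: {strain name: list of calls, one per marker}
    individual_to_strain: {individual id: strain name}
    Returns (genotypes, unresolved), where `unresolved` lists individuals
    whose strain has no entry in `strain_calls`.
    """
    genotypes, unresolved = {}, []
    for individual, strain in individual_to_strain.items():
        calls = strain_calls.get(strain)
        if calls is None:
            # No strain-level genotype available: this individual cannot be
            # given a genotype until the strain is added to the main file.
            unresolved.append(individual)
        else:
            genotypes[individual] = list(calls)
    return genotypes, unresolved


# Illustrative data only.
strain_calls = {"BXD1": ["B", "D", "B"], "BXD2": ["D", "D", "H"]}
individuals = {"mouse_01": "BXD1", "mouse_02": "BXD2", "mouse_03": "Test1"}

genotypes, unresolved = individual_genotypes(strain_calls, individuals)
print(genotypes["mouse_01"])  # ['B', 'D', 'B']
print(unresolved)             # ['mouse_03']
```

Returning the unresolved individuals separately, rather than raising, keeps the gap visible: those are the samples that still need a strain-level genotype.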

Goal

- (Low-hanging fruit) Ability to insert new strains.

- There are 1000s of mice with computable genomes that have not been born yet. Can we compute their phenotypes? This is as difficult as getting to Mars.