Errors, defects and missing data in the database

There are a number of errors in the database, which we try to track in this issue. These are best fixed directly in the database rather than by working around them in code.

LRS values listed as 0.000

Some LRS values in the database are wrongly listed as 0.000. They should be NULLs.

MariaDB> SELECT COUNT(*) FROM ProbeSetXRef WHERE LRS=0;
+----------+
| COUNT(*) |
+----------+
|   237327 |
+----------+

MariaDB> SELECT COUNT(*) FROM PublishXRef WHERE LRS=0;
+----------+
| COUNT(*) |
+----------+
|     1687 |
+----------+

Years are sometimes blank or other strings

Some years are blank strings or strings that contain text other than the year. These should be fixed and the year field must be made an integer.

MariaDB> SELECT COUNT(*) FROM Publication WHERE CONVERT(Year, UNSIGNED)=0;
+----------+
| COUNT(*) |
+----------+
|      854 |
+----------+

E-mail IDs with spaces

At least one e-mail IDs has spaces in it.

MariaDB> SELECT * FROM Investigators WHERE Email LIKE '% %';

Blank E-mail IDs

At least one e-mail ID is blank.

MariaDB> SELECT * FROM Investigators WHERE Email="";

Encodings and unprintable characters

At least one last name is encoded incorrectly and appear as unprintable characters.

MariaDB> SELECT LastName FROM Investigators WHERE FirstName='Yohan';

Author list is not always a comma-separated string

The list of authors is not always a strictly comma-separated string. Sometimes, there is an "and" at the end instead of a comma.

MariaDB> SELECT Authors FROM Publication WHERE Authors LIKE '% and %';

Sometimes, not all authors are listed by name. They are only mentioned as "and colleagues".

MariaDB> SELECT Authors FROM Publication WHERE Authors LIKE '%colleagues%';

At other times, there is a trailing comma at the end of the string.

MariaDB> SELECT Authors FROM Publication WHERE Authors LIKE '%,';