04 Dec Printing out GenBank: Nucleotide Sequences 1984

I have in front of me a copy of the book “Nucleotide sequences 1984, Part 1: A compilation from the GenBank™ and EMBL data libraries”, published by IRL Press. Wow, what a surreal book for anyone used to dealing with sequence databases today. The idea that DNA sequences would be printed out, in an actual book made of paper, and put on a shelf for people to consult takes some getting used to. To say that its time has passed is something of an understatement. I bought it for almost nothing as a curio, and it is going to sit proudly on my office shelves. I might even buy Part 2 to go with it.

The sequences range from 1967 to late 1983. The paper is not very white and is slightly absorbent; I don't think that is due to age, I think it was just published that way. It weighs 1.55 kg and isn’t a large book. I’ve put a gallery of images below with the book next to a DNA double helix for scale! OK, there is a baseball too; a strange collection of things just came to hand, apparently. Quite a number of sequences are very short (<100 bp) and remind me of second-generation sequencing reads! Despite my incredulity at the start of this post, some of the ideas concerning open access to data that are referred to in the book’s Introduction are very contemporary. The international sequence databases really have been important torch bearers for open access to research data for the last few decades.

There are some nice quotes in the Introduction:

While computerized management of the data is needed to provide accuracy, easy maintenance, and electronic access, it is also important to publish the complete database in printed form. This first annual printed compendium effectively makes the entire collection of information available to every member of the scientific community who wishes to use it, including investigators without access to computers.

One of the goals of the collaboration between GenBank and EMBL is continued movement toward common standards and conventions for the two databases.

This compendium, drawn from the American and European databases, is the first printed compilation of substantially all nucleic acid sequences reported between 1967 and late 1983.

As combined in this compendium, the two databases contain a total of nearly three million bases from over 4000 reported sequences.

Yeast and fungal sequences are in the Plant Sequences section.

The individual entries within each section are arranged alphabetically by entry name.
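As a back-of-envelope check on the totals quoted above, here is a minimal Python sketch. The two input numbers are rounded from the Introduction's wording ("nearly three million bases", "over 4000 reported sequences"), so the result is only a rough upper bound on the mean entry length; the function name `mean_length` is mine, not anything from the book.

```python
# Figures rounded from the Introduction quote; both are approximations.
# "nearly three million bases" -> at most about 3,000,000
# "over 4000 reported sequences" -> at least about 4,000
total_bases = 3_000_000
total_sequences = 4_000


def mean_length(bases: int, sequences: int) -> float:
    """Return the mean sequence length in bases."""
    if sequences <= 0:
        raise ValueError("sequences must be positive")
    return bases / sequences


# Fewer bases and more sequences than the rounded figures both pull the
# true mean down, so this is an upper bound rather than an exact value.
approx_mean = mean_length(total_bases, total_sequences)
print(f"Mean length is at most roughly {approx_mean:.0f} bp")
```

So the average entry is well under 1 kb, which fits with the number of very short sequences in the book.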

The records seem to be closer to EMBL format than GenBank format, although Appendix E (which is in Part 2) “illustrates how the format used in the compendium relates to the formats used in the two databases”. The sequences are grouped into mammalian, other vertebrate, invertebrate, plant, and organelle sequence lists. There is also a table of contents, one record per line, giving the length of each sequence and the page it is on.

The first sequence in the entire book is “APE (CHIMPANZEE) ALU TYPE DNA ACCESSION NUMBERS: J00322” and the last is “YEAST (S. CEREVISIAE) MITOCHONDRIAL VAR1 GENE 3′ FLANK . ACCESSION NUMBERS: K00385”.

Google Books seems to have scanned the entirety of both volumes, but I couldn’t get it to work for me. What a fantastic book.