Question:
"Is it possible to set up a local version of the Genome
Browser that uses my own database rather than UCSC's?"
Response:
The default Genome Browser installation described on
the mirror page includes all the databases and annotation
tracks found on the UCSC Genome Browser website. It is
possible to download a smaller data set to conserve
space on your server. Or, if you prefer, you can load
your local version of the Genome Browser with your own
data rather than using the data supplied by UCSC.
Here are the basic steps to follow to add a new
genome assembly to the Genome Browser (replace
references to newGenome with the name of your
genome assembly):
-
Download your sequences in fasta format. This format
is usually organized as one file per chromosome,
although unfinished assemblies may be grouped into
scaffolds rather than chromosomes. In some cases, the
assembly may consist only of unplaced contigs,
e.g., the C. briggsae assembly. In
these cases, we arbitrarily group the contigs
together, separated by gaps, into
a single chromosome chrUn (this works only for small
organisms).
-
Repeatmask the sequence, then concatenate the masked
fasta files into a single two-bit
file (.2bit) using the utility faToTwoBit:
faToTwoBit newGenome.fa /gbdb/newGenome/newGenome.2bit
For information on the usage options for the
faToTwoBit utility, execute the command with no
arguments (this is true for all utilities in the Genome
Browser source tree). In the example above, faToTwoBit
writes the resulting .2bit file to the path
/gbdb/newGenome/newGenome.2bit.
-
Create a database for the new assembly using the
command:
hgsql -e "create database newGenome;" mysql
-
Create a group table (grp) for the new database, using any of
the existing UCSC grp.sql table definitions together with the
matching, uncompressed grp.txt data file:
hgsql newGenome < grp.sql
hgsql newGenome -e 'load data local infile "grp.txt" into table grp;'
-
Create a chromInfo table from the .2bit file:
twoBitInfo newGenome.2bit stdout | \
awk '{printf "%s\t%s\t/gbdb/newGenome/newGenome.2bit\n", $1,$2}' > chromInfo.tab
hgsql newGenome < $HOME/kent/src/hg/lib/chromInfo.sql
hgsql newGenome -e 'load data local infile "chromInfo.tab" into table chromInfo;'
-
Add an entry for the new assembly to the dbDb and
defaultDb tables in the hgcentral database (hgcentraltest
in this example). For instance, the entry for the
monDom1 opossum assembly was created like this:
# Enter monDom1 into dbDb and defaultDb so test browser knows about it:
hgsql -e 'INSERT INTO dbDb (name, description, nibPath, organism, \
defaultPos, active, orderKey, genome, scientificName, \
htmlPath, hgNearOk, hgPbOk, sourceName) \
VALUES("monDom1", "Oct 2004", "/gbdb/monDom1", "M. domestica", \
"scaffold_13303:1000000-11000000", 1, 33, "Opossum", \
"Monodelphis domestica", \
"/gbdb/monDom1/html/description.html", 0, 0, \
"Broad Inst. Prelim Oct04");' \
-h localhost hgcentraltest
hgsql -e 'INSERT INTO defaultDb (name, genome) \
VALUES("monDom1", "Opossum")' \
-h localhost hgcentraltest
-
Add the new database name to the DBS variable in
src/hg/makeDb/trackDb/makefile.
You can configure the
list of genome databases into which you want to load
trackDb tables by editing the list in this makefile.
You may also want to comment out the 'git pull'
step for the 'alpha' make target in that file.
For the makefile to work correctly,
add a subdirectory in
src/hg/makeDb/trackDb/ corresponding
to the organism name (e.g. opossum), and then a
newGenome subdirectory in the organism
directory. Using the opossum example shown in the
previous step, this would result in the directory
hierarchy
src/hg/makeDb/trackDb/opossum/monDom1/.
-
Run "make alpha" in src/hg/makeDb/trackDb/. This
creates the tables trackDb and hgFindSpec that contain
information about the Genome Browser tracks that will
be displayed for the new assembly. For more information
about creating these tables, see
src/product/README.trackDb in the source tree.
Do not load
the trackDb table manually; instead, use the
trackDb.ra file format found in the source tree at
src/hg/makeDb/trackDb/ and load it with the
command 'make alpha'.
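The steps above can be collected into a dry-run sketch that only prints the commands for a given assembly, so they can be reviewed before anything is run. This is a minimal illustration, not part of the Genome Browser source tree: plan_new_assembly is a hypothetical helper name, the mkdir -p line is an assumed way of creating the trackDb subdirectory, and the twoBitInfo command is pointed at the /gbdb copy of the .2bit file. The dbDb and defaultDb inserts and the loading of the group data are left out because they need values specific to your assembly.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands from the steps above.
# plan_new_assembly is a hypothetical helper, not a kent utility.
plan_new_assembly() {
    db=$1        # assembly/database name, e.g. monDom1
    organism=$2  # trackDb organism directory, e.g. opossum
    gbdb="/gbdb/$db"

    # Unquoted here-document: $db and $gbdb expand now, while \$1, \$2 and
    # \$HOME are escaped so they appear literally in the printed commands.
    cat <<EOF
faToTwoBit $db.fa $gbdb/$db.2bit
hgsql -e "create database $db;" mysql
hgsql $db < grp.sql
twoBitInfo $gbdb/$db.2bit stdout | awk '{printf "%s\t%s\t$gbdb/$db.2bit\n", \$1,\$2}' > chromInfo.tab
hgsql $db < \$HOME/kent/src/hg/lib/chromInfo.sql
hgsql $db -e 'load data local infile "chromInfo.tab" into table chromInfo;'
mkdir -p src/hg/makeDb/trackDb/$organism/$db
(cd src/hg/makeDb/trackDb && make alpha)
EOF
}

# Example: print the plan for the opossum assembly used above.
plan_new_assembly monDom1 opossum
```

Printing the commands rather than running them keeps the sketch safe to try on a machine without the Genome Browser utilities installed; each printed line can then be checked against your own paths and database names.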
The previous steps will create an empty browser for the
new assembly, which can then be populated with
annotation tracks, as desired. For examples of how
tracks are created for the standard Genome Browser
assemblies, see the files src/hg/makeDb/doc/*.txt
in the Genome Browser source tree (these are plain text
files, not MS Word documents). The Gold and Gap
tracks can be created from the genome assembly's AGP
file.
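To show which parts of an AGP file feed which track, here is a small sketch that separates component rows (the material for the Gold track) from gap rows (the material for the Gap track). It relies on the AGP convention that column 5 is the component type, with N and U marking gaps. split_agp and the output file names are hypothetical and chosen for illustration; the sketch only splits the rows and does not load any database table.

```shell
#!/bin/sh
# Sketch: split a tab-separated AGP file into component rows and gap rows.
# split_agp is a hypothetical helper, not a kent utility.
split_agp() {
    agp=$1     # input AGP file
    prefix=$2  # output prefix; writes $prefix.gold.agp and $prefix.gap.agp

    awk -F'\t' -v gold="$prefix.gold.agp" -v gap="$prefix.gap.agp" '
        /^#/ || NF == 0 { next }                    # skip comments and blank lines
        $5 == "N" || $5 == "U" { print > gap; next }  # gap rows
        { print > gold }                            # all other rows are components
    ' "$agp"
}

# Usage (file names are placeholders):
#   split_agp newGenome.agp newGenome
```

Note that an output file is created only if at least one row of that kind is present in the input.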
If you encounter problems or have questions about this
procedure, email the
genome-mirror mailing list.
Messages sent to this address will be posted to the
moderated genome-mirror mailing list, which is archived
on a public Web-accessible pipermail archive. This
archive may be indexed by non-UCSC sites such as Google.