UCSC Genome Bioinformatics: FAQ


Frequently Asked Questions: Mirroring or Licensing the Genome Browser


	Licensing the Genome Browser or Blat
	Downloading the Genome Browser source
	Mirroring the Genome Browser
	Setting up a mirror with a customized data set

Licensing the Genome Browser or Blat

Question:
"Do I need a license to install the Genome Browser or its databases on my own machine?"

Response:
A license is required for commercial use of the Genome Browser or the Blat alignment tool. No license is needed for academic, nonprofit, or personal use. The data displayed by the Genome Browser is freely available for both public and commercial use, with a few exceptions. Check the README.txt file in the assembly download directory for the use restrictions specific to that release.

For information on licensing the Genome Browser or Blat tool, see the licensing page.

Downloading the Genome Browser source

Question:
"Where can I download the source code and executables for the Genome Browser?"

Response:
The Genome Browser source code and executables are freely available for academic, nonprofit, and personal use (see Licensing the Genome Browser or Blat for commercial licensing requirements). The latest version of the source code may be downloaded here.

See Downloading Blat source and documentation for information on Blat downloads.

Mirroring the Genome Browser

Question:
"Our academic institution would like to install and run the Genome Browser and its databases on our local server. How do we do this? Is there a procedure for updating the data when new tables and assemblies are released?"

Response:
Non-commercial organizations are welcome to become Genome Browser mirror sites. A license is required for commercial mirroring of the Genome Browser. For detailed procedures on creating a full or partial mirror browser, see the mirror site procedures page.

Setting up a mirror with a customized data set

Question:
"Is it possible to set up a local version of the Genome Browser that uses my own database rather than UCSC's?"

Response:
The default Genome Browser installation described on the mirror page includes all the databases and annotation tracks found on the UCSC Genome Browser website. It is possible to download a smaller data set to conserve space on your server. Or, if you prefer, you can load your local version of the Genome Browser with your own data rather than using the data supplied by UCSC.

Here are the basic steps to follow to add a new genome assembly to the Genome Browser (replace references to newGenome with the name of your genome assembly):

Download your sequences in fasta format. This format is usually organized as one file per chromosome, although unfinished assemblies may be grouped into scaffolds rather than chromosomes. In some cases, the assembly may consist only of unplaced contigs, e.g. the C. briggsae assembly. In these cases, we arbitrarily group the contigs together, separated by some sort of gap, into a single chromosome named chrUn (this works only for small organisms).
Repeatmask the sequence, then concatenate the masked fasta files into a single two-bit file (.2bit) using the utility faToTwoBit:
faToTwoBit newGenome.fa /gbdb/newGenome/newGenome.2bit
For information on the usage options for the faToTwoBit utility, execute the command with no arguments (this is true for all utilities in the Genome Browser source tree). In the command above, faToTwoBit writes the resulting .2bit file to /gbdb/newGenome/newGenome.2bit, so the directory /gbdb/newGenome must already exist.
Create a database for the new assembly using the command:
hgsql -e "create database newGenome;" mysql
Create a group table for the new database, using the grp.sql table definition and the matching grp.txt data file from any of the existing UCSC database dumps:
hgsql newGenome < grp.sql
hgsql newGenome -e 'load data local infile "grp.txt" into table grp;'
Create a chromInfo table from the .2bit file:
twoBitInfo newGenome.2bit stdout | \
awk '{printf "%s\t%s\t/gbdb/newGenome/newGenome.2bit\n", $1,$2}' > chromInfo.tab
hgsql newGenome < $HOME/kent/src/hg/lib/chromInfo.sql
hgsql newGenome -e 'load data local infile "chromInfo.tab" into table chromInfo;'
Add an entry for the new assembly to the dbDb and defaultDb tables in the hgcentral database. For example, the entry for the monDom1 opossum assembly was created like this:
# Enter monDom1 into dbDb and defaultDb so test browser knows about it:
hgsql -e 'INSERT INTO dbDb (name, description, nibPath, organism, \
defaultPos, active, orderKey, genome, scientificName, \
htmlPath, hgNearOk, hgPbOk, sourceName) \
VALUES("monDom1", "Oct 2004", "/gbdb/monDom1", "M. domestica", \
"scaffold_13303:1000000-11000000", 1, 33, "Opossum", \
"Monodelphis domestica", \
"/gbdb/monDom1/html/description.html", 0, 0, \
"Broad Inst. Prelim Oct04");' \
-h localhost hgcentraltest

hgsql -e 'INSERT INTO defaultDb (name, genome) \
VALUES("monDom1", "Opossum")' \
-h localhost hgcentraltest
Add the new database name to the DBS variable in src/hg/makeDb/trackDb/makefile. This variable lists the genome databases into which trackDb tables are loaded, so you can edit it to match the databases on your mirror. You may also want to comment out the 'git pull' step for the 'alpha' make target in that file. For the makefile to work correctly, add a subdirectory in src/hg/makeDb/trackDb/ corresponding to the organism name (e.g. opossum), and then a newGenome subdirectory in the organism directory. Using the opossum example shown in the previous step, this would result in the directory hierarchy src/hg/makeDb/trackDb/opossum/monDom1/.
Run "make alpha" in src/hg/makeDb/trackDb/. This creates the tables trackDb and hgFindSpec that contain information about the Genome Browser tracks that will be displayed for the new assembly. For more information about creating these tables, see src/product/README.trackDb in the source tree. Do not load the trackDb table manually; instead, use the trackDb.ra file format found in the source tree at src/hg/makeDb/trackDb/ and load it with the command 'make alpha'.
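
The chromInfo step above writes the .2bit path directly into the awk command. As a minimal sketch of the same transformation with the assembly name passed as a parameter, the function below reads two-column name/size lines (the format twoBitInfo writes) and emits chromInfo rows. The function name make_chrom_info is hypothetical and is not part of the Genome Browser source tree; the sequence names and sizes in the usage comment are made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Genome Browser source tree).
# Reads "name<TAB>size" lines on stdin, as written by twoBitInfo, and
# prints "name<TAB>size<TAB>/gbdb/<db>/<db>.2bit" rows to stdout.
# Assumes the .2bit file lives at /gbdb/<db>/<db>.2bit, as in the steps above.
make_chrom_info() {
    db="$1"
    awk -v db="$db" 'BEGIN { FS = "\t"; OFS = "\t" }
        # Skip blank or malformed lines that lack a size column.
        NF >= 2 { print $1, $2, "/gbdb/" db "/" db ".2bit" }'
}

# Intended use (requires twoBitInfo and the .2bit file):
#   twoBitInfo /gbdb/newGenome/newGenome.2bit stdout | make_chrom_info newGenome > chromInfo.tab
# Stand-alone illustration with made-up names and sizes:
#   printf 'chr1\t1000\nchrUn\t250\n' | make_chrom_info newGenome
```

The resulting chromInfo.tab can then be loaded with the same hgsql commands shown in the chromInfo step.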

The previous steps will create an empty browser for the new assembly, which can then be populated with annotation tracks as desired. For examples of how tracks are created for the standard Genome Browser assemblies, see the files src/hg/makeDb/doc/*.txt in the Genome Browser source tree (these are plain text files, not MS Word documents). The Gold and Gap tracks can be created from the genome assembly's AGP file.
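
Before building the Gold and Gap tracks, it can help to see which AGP rows would feed each one. The sketch below is only an inspection aid, not the UCSC loading procedure: it assumes a standard tab-separated AGP file in which column 5 holds the component type and the values N and U mark gap rows. The function name split_agp and the output file suffixes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Genome Browser source tree).
# Splits an AGP file into component rows and gap rows for inspection.
# Assumes tab-separated AGP where column 5 is the component type and
# "N" or "U" marks a gap. Nothing is loaded into any database.
# Usage: split_agp newGenome.agp newGenome
#   writes newGenome.gold.agp (components) and newGenome.gap.agp (gaps)
split_agp() {
    agp="$1"
    prefix="$2"
    # Create both outputs up front so each exists even if it ends up empty.
    : > "${prefix}.gold.agp"
    : > "${prefix}.gap.agp"
    awk -v gold="${prefix}.gold.agp" -v gap="${prefix}.gap.agp" '
        BEGIN { FS = "\t" }
        /^#/ { next }                                  # skip AGP comment lines
        $5 == "N" || $5 == "U" { print > gap; next }   # gap rows
        { print > gold }                               # component rows
    ' "$agp"
}
```

Counting the lines in each output file gives a quick check that the number of gaps and components matches what you expect from the assembly.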

If you encounter problems or have questions about this procedure, email the genome-mirror mailing list. Messages sent to this address will be posted to the moderated genome-mirror mailing list, which is archived on a public Web-accessible pipermail archive. This archive may be indexed by non-UCSC sites such as Google.