Open Enzyme Submission Guide – Open Bioeconomy Lab

Open Enzyme
Submission Guide

Decide what enzyme you would like to express

We prioritise enzymes that can be expressed in Escherichia coli and are useful a) as research tools; b) to a wide range of biologists. We’re also interested in economically useful enzymes from and/or for the global South. If your enzyme is in neither of those categories it will not be prioritised right now but please submit anyway! You can also check in with us on info@openbioeconomy.org.

You can see submitted enzymes here (coming soon) and the current backlog here (coming soon).

We encourage you to gather a group of people who are interested in the enzyme and to locate resources (additional funding, volunteers to test the enzyme, etc.).

Find the sequences

Ideally you should submit both the DNA and protein sequences of your gene of interest. In some cases the sequence can help you find the patent, and in others the patent will be the best source of the sequence, so you may need to cycle through the search strategies below more than once. The best database to look for proteins is the Protein Data Bank (PDB).

Is it patented?

Without the Sequence

Check the data sheet and documentation from the commercial providers; they may say “patented under US patent XXXXXX” or similar.
Read the Wikipedia entry, if available.
Perform a Google Scholar search for mentions of the first paper and/or patent. If the patent number is not provided, the results may mention the inventor or patent holder. Note that sometimes the easiest way to find a reference to the first paper is from the patent!
Search lens.org for the patent using the enzyme name, organism, patent holder and any other information gained.
    - Add a date range to limit the results to early patents, starting with those before 2005. Sort in reverse date order to see the earliest mentioned.
    - If you find a patent that looks likely or is an improvement on the enzyme (even if very recent), check the patents that it cites for earlier ones.
    - The enzyme may have a brand name that does not appear in the literature, so you might need to check the original paper or online descriptions to find out what it is, e.g. Phusion is a Pfu with an sso7d domain but the exact sequence is not patented.

With the sequence

Search lens.org with PatSeq Finder using either the DNA or the amino acid sequence (preferably both).

You can use amino acid sequences to search the amino acid database (blastp) or the nucleotide database (tblastn). You can also use nucleotide sequences to search the nucleotide database, and there are two algorithm choices: blastn does a direct match, but we recommend tblastx, which translates the nucleotides to an amino acid sequence and compares it to other translated nucleotides. This second algorithm should still work with codon-optimised nucleotide sequences, whereas blastn will not as it is looking for an exact match! Note that your nucleotide sequence must be in-frame for tblastx to work.
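To see why a translated search tolerates codon optimisation while a direct nucleotide match does not, the following minimal Python sketch may help. It is only an illustration and not how PatSeq Finder or BLAST work internally: the two short sequences are made up, and `translate` and `is_in_frame` are hypothetical helper names that use the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_in_frame(dna):
    """Whole codons only, ATG start, and no stop codon before the last codon."""
    protein = translate(dna)
    return (
        len(dna) % 3 == 0
        and dna.upper().startswith("ATG")
        and "*" not in protein[:-1]
    )


# Two made-up sequences: different codons, same encoded peptide.
native = "ATGAAACTGGGTTAA"
optimised = "ATGAAGTTAGGCTGA"

print(native == optimised)                        # nucleotide strings differ
print(translate(native) == translate(optimised))  # translations agree
print(is_in_frame(native), is_in_frame(native[1:]))
```

A sequence that fails `is_in_frame` (for example one with extra leading bases from a vector) is worth trimming before using it as a tblastx query.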

Unfortunately, many patents (especially the old ones) report protein sequences with a three-letter code. lens.org has done a great job at reading and digitising sequences in many patents but its coverage will not be 100%. You may still end up having to transcribe. Sorry.
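If you do have to transcribe, converting three-letter residues to the one-letter code can be scripted once the text has been typed or pasted in. The sketch below is one possible approach, not a lens.org feature; `three_to_one` is a hypothetical helper that covers only the 20 standard amino acids and raises an error on anything else, so that typos and OCR mistakes are flagged rather than silently dropped.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(text):
    """Convert three-letter residues to a one-letter string.

    Digits (position numbers) and whitespace are ignored; any alphabetic
    token that is not a standard residue raises ValueError.
    """
    residues = []
    for token in re.findall(r"[A-Za-z]+", text):
        key = token.capitalize()
        if key not in THREE_TO_ONE:
            raise ValueError(f"Unrecognised residue: {token!r}")
        residues.append(THREE_TO_ONE[key])
    return "".join(residues)


# A made-up fragment laid out the way patent sequence listings often are,
# with position numbers on a separate line.
listing = """Met Lys Leu Gly
1               """
print(three_to_one(listing))
```

The resulting one-letter string can then be pasted into PatSeq Finder or aligned against your own protein sequence.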

Interpreting PatSeq Hits:

Identify the patent that focuses on your gene or protein (if any). PatSeq will find all the patents with your sequence in the attachments. However, most of these patents cite your sequence because they use the protein in their invention; the patent itself won’t be about your gene. You can filter in PatSeq for patents where the sequence is found in the “claims” rather than mentioned as background, and you can also filter out primers and other sequences.
Find out whether the patent covers the sequence of the gene/protein itself or its use in an application. Sometimes patents are filed for an application of your protein, not on the protein sequence itself. Read the claims carefully: if the patent covers the sequence itself, the claims state the % of amino acid identity that is covered (e.g. every protein that has more than 80% identity and/or is mutated at specific amino acids). Such a patent relates to the protein sequence.
Has the patent expired? In the unfortunate case that your sequence is covered by a patent, look for the expiration date: is the patent still valid? Patents usually expire 20 years after the filing date, with some exceptions. Sometimes you might be lucky and the patent will have expired earlier than anticipated, because the holders failed to pay the fees or for some other reason. On lens.org you can check the “Legal Info” tab in a patent record.
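As a first-pass estimate of the expiry date, the 20-year rule of thumb above can be written as a short Python sketch. `nominal_expiry` and `is_nominally_expired` are hypothetical names, and the result is only a nominal date: it ignores the exceptions mentioned above and any early lapse, so the “Legal Info” tab remains the place to confirm the actual status.

```python
from datetime import date


def nominal_expiry(filing_date, term_years=20):
    """Filing date plus the nominal term (20 years by default)."""
    target_year = filing_date.year + term_years
    try:
        return filing_date.replace(year=target_year)
    except ValueError:
        # Filed on 29 February and the target year is not a leap year.
        return filing_date.replace(year=target_year, day=28)


def is_nominally_expired(filing_date, today=None):
    """True if the nominal term has run out by `today` (defaults to now)."""
    today = today or date.today()
    return today >= nominal_expiry(filing_date)


# Made-up filing dates, checked against a fixed reference date.
reference = date(2024, 1, 1)
print(nominal_expiry(date(2001, 3, 15)))
print(is_nominally_expired(date(2001, 3, 15), today=reference))
print(is_nominally_expired(date(2010, 6, 1), today=reference))
```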

If your gene is free, proceed! If it is not, we might be able to fix that so get in touch.

Preparing the sequence

We recommend loading the sequences into Benchling.

Check the length: a sequence shorter than 100 bp is incorrect.
Check for an ATG start codon.
Translate the DNA and align it with the amino acid sequence to double-check that they match.
Remove any restriction sites that can interfere with DNA assembly. The Synthetic Biology Gene Standardizer (Shepherd, 2017) will remove most but NOT SapI; you can remove SapI sites in Benchling.
Optimise the codons for expression in E. coli. We recommend using Benchling but other options are available.
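The checks in this list can also be run outside Benchling as a quick sanity test. The Python sketch below is one possible version, assuming the standard genetic code and the SapI recognition site GCTCTTC (searched on both strands). `check_sequence` is a hypothetical helper, the example sequence is made up, and the sketch does not perform codon optimisation or remove any sites.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

SAPI_SITE = "GCTCTTC"


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def check_sequence(dna, protein):
    """Return a dict of pass/fail flags for the checks listed above."""
    dna = dna.upper()
    expected = protein.upper().rstrip("*")
    translated = translate(dna).rstrip("*")
    has_sapi = SAPI_SITE in dna or reverse_complement(SAPI_SITE) in dna
    return {
        "length_ok": len(dna) >= 100,
        "starts_with_atg": dna.startswith("ATG"),
        "translation_matches": translated == expected,
        "sapi_free": not has_sapi,
    }


# Made-up example: Met followed by 40 lysines and a stop codon (126 bp).
example_dna = "ATG" + "AAA" * 40 + "TAA"
example_protein = "M" + "K" * 40
print(check_sequence(example_dna, example_protein))
```

Any flag that comes back `False` points to the step in the list above that needs another look before submission.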

References

Shepherd, Tyson R. “Refactoring gene sequences for broad assembly standards compatibility.” bioRxiv (2017): 225284.