---
language:
- dna
tags:
- biology
- genomics
- foundation-model
license: apache-2.0
---

# Evo 2 (1B Base) - Hugging Face Transformers Format

This repository contains the **Evo 2 (1B Base)** model, converted to the Hugging Face Transformers format.

**Original Repository:** [arcinstitute/evo2_1b_base](https://huggingface.co/arcinstitute/evo2_1b_base)

**Paper:** [Genome modeling and design across all domains of life with Evo 2](https://www.biorxiv.org/content/10.1101/2025.02.18.638918v1)

**Authors:** Garyk Brixi, Matthew G. Durrant, Jerome Ku, Michael Poli, et al.

## Model Description

Evo 2 is a biological foundation model trained on 9.3 trillion DNA base pairs from a curated genomic atlas spanning all domains of life. The Evo 2 family uses the StripedHyena 2 architecture to process long sequences (up to 1 million base pairs) at single-nucleotide resolution; this 1B base checkpoint was pretrained with a shorter context of 8,192 tokens. The model is designed for tasks such as predicting the functional effects of mutations and generating novel genomic sequences.

This version has been converted for compatibility with the `transformers` library, so it can be loaded and run with the standard `from_pretrained` and `generate` calls.
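
Mutation-effect prediction with a causal DNA language model is commonly framed as a likelihood comparison: score the reference sequence and the mutated sequence, then take the difference in log-likelihood. The sketch below illustrates that idea only. It is not code from this repository or from the Evo 2 authors, the function names are hypothetical, and it assumes you have already obtained per-position logits as plain nested lists (for example via `.tolist()` on a model output).

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over the logits of one position."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def sequence_log_likelihood(logits, token_ids):
    """Sum of log P(token[t+1] | tokens[:t+1]) under a causal LM.

    Assumes logits[t] scores the token at position t + 1, so the first
    token is not scored and the last row of logits is unused.
    """
    total = 0.0
    for t in range(len(token_ids) - 1):
        total += log_softmax(logits[t])[token_ids[t + 1]]
    return total


def variant_score(ref_logits, ref_ids, alt_logits, alt_ids):
    """Delta log-likelihood (alt minus ref).

    Negative values mean the model finds the variant less likely
    than the reference.
    """
    return (sequence_log_likelihood(alt_logits, alt_ids)
            - sequence_log_likelihood(ref_logits, ref_ids))


if __name__ == "__main__":
    # Toy 4-symbol vocabulary; these logits are made up for illustration.
    ref_ids = [0, 1, 2]
    alt_ids = [0, 3, 2]
    toy_logits = [[0.0, 2.0, 0.0, 0.0],   # favors symbol 1 at position 1
                  [0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]]
    print(variant_score(toy_logits, ref_ids, toy_logits, alt_ids))
```

In practice the reference and variant sequences would each be passed through the model separately, since a substitution changes the logits at every later position. The toy example reuses one logits table only to keep the sketch short.
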
## Usage

You can load and run this model using the `transformers` library as follows:

```python
import torch
from transformers import Evo2ForCausalLM, Evo2Tokenizer

# Replace with your local path or the Hub repo ID after uploading
model_path = "path/to/this/repo"

print(f"Loading model from {model_path}...")
model = Evo2ForCausalLM.from_pretrained(model_path)
tokenizer = Evo2Tokenizer.from_pretrained(model_path)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Input sequence (DNA)
sequence = "ACGTACGT"
print(f"Input: {sequence}")

# Tokenize
input_ids = tokenizer.encode(sequence, return_tensors="pt").to(device)

# Generate up to 20 new tokens without tracking gradients
print("Generating...")
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20)

# Decode (the output contains the prompt followed by the generated tokens)
generated_sequence = tokenizer.decode(output[0])
print(f"Output: {generated_sequence}")
```

## Citation

If you use this model, please cite the original paper:

```bibtex
@article{brixi2025genome,
  title={Genome modeling and design across all domains of life with Evo 2},
  author={Brixi, Garyk and Durrant, Matthew G and Ku, Jerome and Poli, Michael and others},
  journal={bioRxiv},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}
```