
rsynthbio is an R package that provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. This package enables researchers to easily access AI-generated transcriptomic data for various modalities including bulk RNA-seq, single-cell RNA-seq, microarray data, and more.

Alternatively, you can generate datasets directly on our platform website.

How to install

You can install rsynthbio from CRAN:

install.packages("rsynthbio")

To get the development version, install it from GitHub with the remotes package:

# Install remotes only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("synthesizebio/rsynthbio")

Once installed, load the package:

library(rsynthbio)

Authentication

Before using the Synthesize Bio API, you need to set up your API token. The package provides a secure way to handle authentication:

# Securely prompt for and store your API token
# The token will not be visible in the console
set_synthesize_token()

# You can also store the token in your system keyring for persistence
# across R sessions (requires the 'keyring' package)
set_synthesize_token(use_keyring = TRUE)

Loading your API token for a session

# In future sessions, load the stored token
load_synthesize_token_from_keyring()

# Check if a token is already set
has_synthesize_token()

You can obtain an API token by registering at Synthesize Bio.

Security Best Practices

For security reasons, remember to clear your token when you’re done:

# Clear token from current session
clear_synthesize_token()

# Clear token from both session and keyring
clear_synthesize_token(remove_from_keyring = TRUE)

Never hard-code your token in scripts that will be shared or committed to version control.
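
One way to keep the token out of a script is to read it from an environment variable at run time. The sketch below is an illustration only: the variable name MY_SYNTHESIZE_TOKEN and the helper read_token_from_env() are hypothetical and not part of rsynthbio, and how the value is then handed to the package is deliberately left open.

```r
# Hypothetical helper (not part of rsynthbio): read a secret from an
# environment variable and fail loudly if it is missing or empty.
read_token_from_env <- function(var = "MY_SYNTHESIZE_TOKEN") {
  token <- Sys.getenv(var, unset = NA_character_)
  if (is.na(token) || !nzchar(token)) {
    stop("Environment variable '", var, "' is not set.", call. = FALSE)
  }
  token
}

# Demonstration with a dummy value. In practice the variable would be set
# outside the script (for example in ~/.Renviron), never in shared code.
Sys.setenv(MY_SYNTHESIZE_TOKEN = "dummy-token-for-illustration")
token <- read_token_from_env()
Sys.unsetenv("MY_SYNTHESIZE_TOKEN")
```

Failing early with a clear message avoids sending requests with an empty token when the variable has not been configured on a given machine.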

Basic Usage

Creating a Query

The first step in generating gene expression data is to create a query. The package provides a sample query that you can modify:

# Get a sample query
query <- get_valid_query()

# Inspect the query structure
str(query)

The query consists of:

  1. output_modality: The type of gene expression data to generate (see get_valid_modalities())
  2. mode: The prediction mode (e.g., “mean estimation” or “sample generation”)
  3. inputs: A list of biological conditions to generate data for
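
As a rough picture of that structure, the list below is built by hand from the three field names above. It is an illustrative stand-in, not the actual output of get_valid_query(): the values are taken from examples on this page or assumed, and the real sample query may contain additional fields.

```r
# Hand-built stand-in for a query (assumed shape; inspect the real one
# with str(get_valid_query())).
example_query <- list(
  output_modality = "single_cell_rna-seq",  # modality string that appears on this page
  mode = "mean estimation",                 # one of the two modes named on this page
  inputs = list(
    list(
      metadata = list(sex = "male", sample_type = "primary tissue"),
      num_samples = 3
    )
  )
)

# The three top-level components
names(example_query)

# Each element of inputs pairs a metadata list with a sample count
example_query$inputs[[1]]$num_samples
```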

We train our models with diverse multi-omics datasets. There are two model types/modes available today:

  • Sample generation: This runs in “diffusion” mode and generates different results for each sample requested. Use this mode to understand the distribution of expression across sample groups.

  • Mean estimation: This is deterministic. For a given metadata specification, you will get the same values.

# Request raw counts data
result <- predict_query(query, as_counts = TRUE)

The result is a list of two data frames: metadata and expression.

Modifying a Query

You can customize the query to fit your specific research needs:

# Change output modality
query$output_modality <- "single_cell_rna-seq"

# Adjust number of samples
query$inputs[[1]]$num_samples <- 10

# Add a new condition
query$inputs[[3]] <- list(
  metadata = list(
    sex = "male",
    sample_type = "primary tissue"
  ),
  num_samples = 3
)
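
Assigning to a fixed index such as inputs[[3]] only appends cleanly when the query already holds exactly two conditions. A sketch of an index-free alternative follows. Here add_condition() is a hypothetical helper written for this example, not an rsynthbio function, and the stand-in query only mimics the fields described above.

```r
# Hypothetical helper: append a condition to the end of query$inputs,
# whatever its current length.
add_condition <- function(query, metadata, num_samples) {
  new_input <- list(metadata = metadata, num_samples = num_samples)
  query$inputs <- c(query$inputs, list(new_input))
  query
}

# Minimal stand-in query with a single condition (assumed shape)
demo_query <- list(
  inputs = list(
    list(metadata = list(sex = "female"), num_samples = 5)
  )
)

demo_query <- add_condition(
  demo_query,
  metadata = list(sex = "male", sample_type = "primary tissue"),
  num_samples = 3
)

# Number of conditions after appending
length(demo_query$inputs)
```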

Making Predictions

Once your query is ready, you can send it to the API to generate gene expression data.

# Request raw counts data
result <- predict_query(query, as_counts = TRUE)

To get the full API response rather than only the metadata and expression data frames, set raw_response = TRUE.

Working with Results

# Access metadata and expression matrices
metadata <- result$metadata
expression <- result$expression

# Check dimensions
dim(expression)

# View metadata sample
head(metadata)
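
Before any downstream analysis it can be worth confirming that the two tables line up. The sketch below uses a small hand-made mock_result in place of a real API response. The values are made up, and the layout (one row per sample in both data frames, one column per gene in expression) is an assumption that should be checked against your own result.

```r
# Made-up stand-in for a result; not real model output.
mock_result <- list(
  metadata = data.frame(
    sex = c("male", "male", "female"),
    sample_type = "primary tissue"
  ),
  expression = data.frame(
    GENE_A = c(10, 12, 7),
    GENE_B = c(0, 3, 1)
  )
)

# Assumed layout: rows of expression correspond to rows of metadata
stopifnot(nrow(mock_result$expression) == nrow(mock_result$metadata))

# Mean expression per gene within each metadata group
group_means <- aggregate(
  mock_result$expression,
  by = list(sex = mock_result$metadata$sex),
  FUN = mean
)
group_means
```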

You may want to process the data in chunks or save it for later use:

# Save results to RDS file
saveRDS(result, "synthesize_results.rds")

# Load previously saved results
result <- readRDS("synthesize_results.rds")

# Export as CSV
write.csv(result$expression, "expression_matrix.csv")
write.csv(result$metadata, "sample_metadata.csv")
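
A round trip through a temporary directory is a quick way to check that saved results load back unchanged. This sketch relies only on base R. demo_result is a made-up placeholder with the same two components, and row.names = FALSE is a choice made here to avoid writing an extra index column into the CSV.

```r
# Made-up placeholder with the same two components as a result
demo_result <- list(
  metadata = data.frame(sex = c("male", "female")),
  expression = data.frame(GENE_A = c(10, 7), GENE_B = c(0, 1))
)

out_dir <- tempfile("synthesize_demo_")
dir.create(out_dir)

# RDS keeps the whole list, including column types
rds_path <- file.path(out_dir, "synthesize_results.rds")
saveRDS(demo_result, rds_path)
restored <- readRDS(rds_path)
identical(restored, demo_result)

# CSV without the automatic row-name column
csv_path <- file.path(out_dir, "expression_matrix.csv")
write.csv(demo_result$expression, csv_path, row.names = FALSE)
expression_from_csv <- read.csv(csv_path)

# Clean up the temporary files
unlink(out_dir, recursive = TRUE)
```

RDS is the safer choice for resuming work in R, since CSV export does not preserve the list structure and column types may be re-guessed on import.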

Custom Validation

You can validate your queries before sending them to the API:

# Validate structure
validate_query(query)

# Validate modality
validate_modality(query)
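
When a query is assembled programmatically, it can help to trap problems before sending it. The sketch below assumes that a failed check signals an R error, which is not confirmed on this page. check_query_shape() is a hypothetical local stand-in for illustration and does not reproduce what validate_query() actually checks.

```r
# Hypothetical stand-in validator: only checks for the three
# top-level fields described earlier.
check_query_shape <- function(query) {
  required <- c("output_modality", "mode", "inputs")
  missing_fields <- setdiff(required, names(query))
  if (length(missing_fields) > 0) {
    stop("Query is missing: ", paste(missing_fields, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Run a validator and turn an error into FALSE plus a message
is_valid <- function(query, validator = check_query_shape) {
  tryCatch(
    {
      validator(query)
      TRUE
    },
    error = function(e) {
      message("Validation failed: ", conditionMessage(e))
      FALSE
    }
  )
}

# A query without output_modality fails the stand-in check
incomplete_query <- list(mode = "mean estimation", inputs = list())
is_valid(incomplete_query)
```

With the package loaded, the same wrapper could be given validate_query as the validator argument, again assuming it raises an error on invalid input.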

Session info

Additional Resources