Access

ImmPrint is a dataset, and you can access it in whichever way suits your workflow. Every route serves the same versioned snapshot.

R, the `immprintr` package

The `immprintr` package delivers ImmPrint as a tidy tibble, in the spirit of `msigdbr`.

```r
# install.packages("remotes")
remotes::install_github("akmclz/immprintr")

library(immprintr)

ip <- immprintr()                       # latest snapshot as a tidy tibble
immprintr_versions()                    # which snapshots are bundled

# Build a signature for scoring (pan core plus a cell type refinement)
immprintr_signature("IMMPRINT_TGFB_SIGNALLING", "Fibroblast")

# Many signatures as a named list, ready for UCell or AddModuleScore
immprintr_signatures(c("IMMPRINT_IFNG_SIGNALLING", "IMMPRINT_TNF_SIGNALLING"))
```

Raw data, CSV or JSON

The canonical snapshot is published as downloadable CSV and JSON files for direct, language-agnostic use.

Both are also versioned in the repository under `data/`.

What each column means

The CSV has one row per membership, i.e. one gene in one gene set for one `cell_type` entry:

| column | meaning |
|---|---|
| `gene_set` | ImmPrint set name, e.g. `IMMPRINT_IFNG_SIGNALLING` |
| `gene_symbol` | HGNC gene symbol |
| `pathway_level` | signalling tier: 1 = Receptor, 2 = Transducer, 3 = Effector |
| `regulatory_role` | the gene’s effect on its own pathway: `none` (a forward component or output), `inhibitory`, `activating` (genuine positive feedback) or `context_dependent` |
| `cell_type` | responding lineage; several lineages are semicolon-delimited (e.g. `Epithelial; Fibroblast`); `pan` marks the cell-type-agnostic core |
| `pmids` | semicolon-delimited PubMed ID(s) supporting the membership |
| `reference_gene_set` | corroborating MSigDB set, where available |
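A minimal standard-library sketch of how these columns decode: split the semicolon-delimited fields and map each tier number to its label. The two rows are hypothetical placeholders (gene symbols and PMIDs are invented, not taken from the snapshot), missing `reference_gene_set` values are assumed to be empty cells, and `parse_row` is an illustrative helper rather than part of any ImmPrint tooling.

```python
import csv
import io

# Tier labels as defined for the pathway_level column.
TIER_LABELS = {1: "Receptor", 2: "Transducer", 3: "Effector"}

# Hypothetical rows for illustration only; not real ImmPrint records.
SAMPLE_CSV = """gene_set,gene_symbol,pathway_level,regulatory_role,cell_type,pmids,reference_gene_set
IMMPRINT_IFNG_SIGNALLING,GENE_A,1,none,pan,11111111,
IMMPRINT_TGFB_SIGNALLING,GENE_B,3,inhibitory,Epithelial; Fibroblast,22222222;33333333,
"""

def split_field(value):
    """Split a semicolon-delimited field, trimming whitespace and dropping empties."""
    return [part.strip() for part in value.split(";") if part.strip()]

def parse_row(row):
    """Turn one CSV row (a dict of strings) into typed, split fields."""
    level = int(row["pathway_level"])
    return {
        "gene_set": row["gene_set"],
        "gene_symbol": row["gene_symbol"],
        "pathway_level": level,
        "pathway_level_label": TIER_LABELS[level],
        "regulatory_role": row["regulatory_role"],
        "cell_types": split_field(row["cell_type"]),
        "pmids": split_field(row["pmids"]),
        # Assumption: an empty cell means no corroborating MSigDB set.
        "reference_gene_set": row["reference_gene_set"] or None,
    }

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

The label strings come from the tier definitions in the table above; the JSON's own `pathway_level_label` values may be worded differently.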

The JSON carries the same fields per record (with `pmids` as an array and an added `pathway_level_label`), under a small header giving `version`, `n_gene_sets` and `n_pairs`.
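The JSON can be flattened back into the CSV's shape. This sketch assumes the record array sits under the only list-valued top-level key (the header fields are named here, but that key is not), and the inline document is a hypothetical stand-in with invented values; `load_snapshot` and `to_csv_shape` are illustrative helpers.

```python
import json

# Hypothetical stand-in for the published JSON; the "records" key name and all values are invented.
SAMPLE_JSON = """{
  "version": "0.2.0-alpha",
  "n_gene_sets": 1,
  "n_pairs": 2,
  "records": [
    {"gene_set": "IMMPRINT_IFNG_SIGNALLING", "gene_symbol": "GENE_A",
     "pathway_level": 1, "pathway_level_label": "Receptor",
     "regulatory_role": "none", "cell_type": "pan",
     "pmids": ["11111111"], "reference_gene_set": null},
    {"gene_set": "IMMPRINT_IFNG_SIGNALLING", "gene_symbol": "GENE_B",
     "pathway_level": 3, "pathway_level_label": "Effector",
     "regulatory_role": "inhibitory", "cell_type": "pan",
     "pmids": ["22222222", "33333333"], "reference_gene_set": null}
  ]
}"""

def load_snapshot(text):
    """Return (header, records) from an ImmPrint-style JSON document."""
    doc = json.loads(text)
    # Assumption: exactly one top-level value is a list, and it holds the records.
    list_keys = [k for k, v in doc.items() if isinstance(v, list)]
    if len(list_keys) != 1:
        raise ValueError(f"expected one record array, found {list_keys}")
    header = {k: doc[k] for k in ("version", "n_gene_sets", "n_pairs")}
    return header, doc[list_keys[0]]

def to_csv_shape(record):
    """Re-join the pmids array with ';' and drop the JSON-only label field."""
    flat = dict(record)  # shallow copy, so the original record is untouched
    flat["pmids"] = ";".join(str(p) for p in record["pmids"])
    flat.pop("pathway_level_label", None)
    return flat

header, records = load_snapshot(SAMPLE_JSON)
rows = [to_csv_shape(r) for r in records]
```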

Python

There is no dedicated package yet, so read the published files directly:

```python
import pandas as pd

url = "https://akmclz.github.io/immprint/data/immprint_0.2.0-alpha.csv"
ip = pd.read_csv(url)

# genes of one set, ordered by tier (1 = Receptor, 2 = Transducer, 3 = Effector)
ifng = ip.query("gene_set == 'IMMPRINT_IFNG_SIGNALLING'")
ifng[["gene_symbol", "pathway_level"]].sort_values("pathway_level")

# pan core plus a lineage refinement; cell_type may list several lineages,
# so split on ";" rather than matching the whole string
def has_lineage(ct, want):
    parts = [p.strip() for p in str(ct).split(";")]
    return "pan" in parts or want in parts

# a gene can sit in both a pan row and a lineage row, so drop repeats
tgfb_fibro = ip[(ip.gene_set == "IMMPRINT_TGFB_SIGNALLING")
                & ip.cell_type.apply(has_lineage, want="Fibroblast")]["gene_symbol"].drop_duplicates()

# a named dict of signatures, each gene listed once per set (all lineages pooled)
sigs = ip.groupby("gene_set")["gene_symbol"].apply(lambda g: list(g.unique())).to_dict()
```
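The `sigs` dict pools every lineage row of a set. A per-lineage variant, mirroring what `immprintr_signature()` is described as doing (pan core plus one cell type refinement), can be sketched for all sets at once. Dropping `inhibitory` genes is an optional choice shown for illustration, not something ImmPrint prescribes; `lineage_signatures` is a hypothetical helper and `demo` is a small invented stand-in for `ip`.

```python
import pandas as pd

# Hypothetical stand-in for `ip`; gene symbols are placeholders.
demo = pd.DataFrame({
    "gene_set": ["IMMPRINT_TGFB_SIGNALLING"] * 4,
    "gene_symbol": ["GENE_A", "GENE_B", "GENE_C", "GENE_A"],
    "regulatory_role": ["none", "inhibitory", "none", "none"],
    "cell_type": ["pan", "pan", "Epithelial; Fibroblast", "Fibroblast"],
})

def lineage_signatures(df, lineage, exclude_roles=("inhibitory",)):
    """Map each gene set to its pan core plus `lineage` genes, de-duplicated."""
    # Split "Epithelial; Fibroblast" into a set of trimmed lineage names.
    lineages = df["cell_type"].fillna("").str.split(";").apply(
        lambda parts: {p.strip() for p in parts})
    keep = lineages.apply(lambda s: "pan" in s or lineage in s)
    # Optionally drop genes whose regulatory_role is in exclude_roles.
    keep &= ~df["regulatory_role"].isin(exclude_roles)
    return (df[keep]
            .groupby("gene_set")["gene_symbol"]
            .apply(lambda genes: sorted(set(genes)))
            .to_dict())

sigs_fibro = lineage_signatures(demo, "Fibroblast")
```

Pass `exclude_roles=()` to keep every regulatory role.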

Versioning

Every snapshot is immutable. Pin the version your analysis used: both the package (`immprintr(version = "0.2.0-alpha")`) and the raw files are version-stamped, so results stay reproducible. See the changelog and how to cite.
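For the raw files, a pinned URL can be built from the version string. This sketch assumes every snapshot follows the naming pattern of the CSV URL in the Python example (`immprint_<version>.<ext>` under `/immprint/data/`), which is shown only for the `0.2.0-alpha` CSV; `snapshot_url` and `check_pinned` are hypothetical helpers, and the check compares a loaded JSON header's `version` against the pin.

```python
BASE = "https://akmclz.github.io/immprint/data"
PINNED_VERSION = "0.2.0-alpha"

def snapshot_url(version, fmt="csv"):
    """Build a raw-file URL, assuming the immprint_<version>.<fmt> naming pattern."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE}/immprint_{version}.{fmt}"

def check_pinned(header, pinned=PINNED_VERSION):
    """Fail loudly if a loaded JSON header is not the pinned snapshot."""
    found = header.get("version")
    if found != pinned:
        raise RuntimeError(f"expected snapshot {pinned}, got {found}")
```

For example, `pd.read_csv(snapshot_url(PINNED_VERSION))` reads the same file as the Python example, while keeping the version in one place.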
