My genome is ready: Download & tellmeGen reports
As previously reported, I ordered the Ultra Whole Genome Sequencing 30x kit from tellmeGen about three months ago, on October 23, 2025. On December 10, 48 days later, I finally received an email with the wonderful words: “Your results are already available.” In this post, I would like to briefly report on my experiences with the data so far.
Downloading the raw data
The raw data is linked under “Settings” in the tellmeGen user area. The genetic variants can be downloaded there directly in VCF format. For the sequencing data, you have to request an email with the download links because these files are very large. Because this is paired-end sequencing, two links are provided, one for each read file. I was able to download everything quickly and easily at about 20 MB/s.
The VCF file is 143 MB in size. The two compressed FASTQ files are 39 GB and 44 GB, and each contains 360 million reads of 150 base pairs.
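As a quick sanity check on the “30x” in the product name, these file statistics can be turned into an approximate raw coverage. This is a minimal back-of-the-envelope sketch that assumes a haploid human genome size of about 3.1 billion base pairs and ignores unmapped reads, duplicates and low-quality bases, all of which reduce the usable depth.

```python
# Back-of-the-envelope sequencing depth from the FASTQ statistics above.
# Assumption: haploid human genome size of ~3.1 Gbp; unmapped reads,
# duplicates and low-quality bases are NOT subtracted.

READS_PER_FILE = 360_000_000     # reads in each FASTQ file (R1 and R2)
READ_LENGTH_BP = 150             # bases per read
N_FILES = 2                      # paired-end sequencing: one file per mate
GENOME_SIZE_BP = 3_100_000_000   # assumed haploid genome size


def raw_mean_coverage(reads_per_file: int, read_length: int,
                      n_files: int, genome_size: int) -> float:
    """Mean depth = total sequenced bases / genome size."""
    total_bases = reads_per_file * read_length * n_files
    return total_bases / genome_size


if __name__ == "__main__":
    total = READS_PER_FILE * READ_LENGTH_BP * N_FILES
    depth = raw_mean_coverage(READS_PER_FILE, READ_LENGTH_BP, N_FILES, GENOME_SIZE_BP)
    print(f"total sequenced bases: {total:,}")   # 108,000,000,000
    print(f"raw mean coverage: {depth:.1f}x")    # 34.8x
```

Roughly 35x of raw sequence is consistent with a product sold as 30x, since some depth is typically lost to duplicates and unmapped reads; the real post-alignment depth can only be measured after mapping the reads.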
Besides the VCF and the two FASTQ files, I would have liked to see a sequencing quality report. I will therefore generate one myself later from the raw data.
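Until a full QC pipeline is in place, a few headline numbers can be pulled straight from the compressed FASTQ files. The following is a minimal sketch, not a replacement for dedicated tools such as FastQC. It assumes standard four-line FASTQ records with Phred+33 quality encoding, and the function names are my own.

```python
import gzip
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from four-line FASTQ records.

    A truncated final record raises an error instead of being silently dropped.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue                      # tolerate blank lines at the end of a file
        seq = next(it).rstrip("\r\n")     # line 2: bases
        next(it)                          # line 3: '+' separator, ignored
        qual = next(it).rstrip("\r\n")    # line 4: one quality character per base
        yield seq, qual


def summarize(records: Iterable[Tuple[str, str]], phred_offset: int = 33,
              max_reads: Optional[int] = None) -> Dict[str, float]:
    """Count reads and bases; compute mean base quality and fraction of Q30+ bases."""
    n_reads = n_bases = qual_sum = n_q30 = 0
    for seq, qual in records:
        if max_reads is not None and n_reads >= max_reads:
            break
        scores = [ord(c) - phred_offset for c in qual]   # decode Phred scores
        n_reads += 1
        n_bases += len(seq)
        qual_sum += sum(scores)
        n_q30 += sum(1 for q in scores if q >= 30)
    return {
        "reads": n_reads,
        "bases": n_bases,
        "mean_read_length": n_bases / n_reads if n_reads else 0.0,
        "mean_base_quality": qual_sum / n_bases if n_bases else 0.0,
        "fraction_q30": n_q30 / n_bases if n_bases else 0.0,
    }


def summarize_fastq_gz(path: str, max_reads: Optional[int] = 1_000_000) -> Dict[str, float]:
    """Summarize the first `max_reads` reads of a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return summarize(parse_fastq(handle), max_reads=max_reads)
```

For example, `summarize_fastq_gz("sample_R1.fastq.gz")` (hypothetical file name) would summarize the first million reads of one read file; passing `max_reads=None` processes all 360 million reads, which takes considerably longer in pure Python.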
The tellmeGen reports
tellmeGen provides a variety of reports in English in the user area. Each report relates my genetic profile (i.e., my variants) to a phenotype, such as a disease or trait. The reports are divided into several sections: Genetic vulnerability to health conditions (genetic risk for multifactorial diseases), Hereditary conditions (monogenic diseases and carrier status), Pharmacology, Traits (general characteristics such as height, weight, or hair color), Wellness (characteristics related to health), and Ancestry (macro-ancestry, Neanderthal ancestry).
Nothing was displayed for me in the Pharmacology section. In the other sections (except Ancestry), the phenotypes are listed together with a personal classification (such as “low/high risk” or “low/high levels”). The detailed view provides more information about the variants involved and the phenotype; if several variants are included, the corresponding genes are listed. In addition, there is often a brief biological or clinical classification of the topic. I like the references at the end of each report because they make it possible to trace where a statement comes from.
The Ancestry section is visually appealing and well designed for educational purposes. The visualizations are clear, much of the content is explained in an understandable way, and overall it comes across as well-rounded. At the same time, I found the topic less compelling than I had expected. A large part of it is general classification, and in my opinion, the personal implications are minimal to non-existent.
Validity
Lack of classification
What I found lacking when reviewing the reports was a clear classification of the significance of the variants used in each case. Specifically: How well do these variants actually explain the respective phenotype, and how large is the expected effect for me and my health?
In the case of classic hereditary diseases, the situation is often relatively clear-cut because individual pathogenic variants in a gene can have a major effect (although penetrance and expressivity can still vary). Most other topics in the reports, on the other hand, deal with complex, multifactorial phenotypes. These are typically influenced by a large number of genetic variants, each with small effects, and are often additionally influenced by non-genetic factors such as lifestyle and environment (Boyle 2017; Visscher 2017).
Particularly for these multifactorial phenotypes, genetic studies have found many associated loci, but the variance they explain often falls well short of the estimated heritability. This gap has been discussed for years under the term “missing heritability” and has several plausible causes, such as a large number of variants with very small effects, rare variants, incomplete capture of structural variants, and interactions (Wainschtein 2025; Brandt 2025).
Licensing barriers for polygenic risk scores?
I also noticed that although many reports on multifactorial phenotypes combine several genetic variants into a polygenic risk score (PRS), the report descriptions suggest that these scores rest on relatively few specific risk loci/variants from genome-wide association studies (GWAS). For example, the report on coronary heart disease mentions 179 loci. This looks less like a modern, genome-wide model and more like a PRS built from a manageable number of GWAS top hits.
This is worth mentioning because we now understand quite well that many complex traits are highly polygenic. According to the omnigenic model, in addition to a few “core” genes, many genes expressed in relevant cell types can contribute indirectly via regulatory networks to complex traits such as multifactorial diseases. This helps explain why GWAS find numerous signals with mostly small effects distributed across the entire genome (Boyle 2017).
Accordingly, many current PRS in the literature contain thousands to millions of variants. For example, the Polygenic Score (PGS) Catalog contains scores for coronary heart disease that combine several million variants. A prominent example is the PRS by Khera et al., which uses 6,630,150 variants (compared with 179 loci, as far as can be seen from tellmeGen’s report description).
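Whether a score uses 179 loci or 6,630,150 variants, the standard additive PRS is the same calculation: for each variant, the number of copies of the effect allele a person carries (0, 1 or 2) is multiplied by that variant's weight, and the products are summed, PRS = Σ β_i × g_i. The minimal sketch below shows only this core step; the variant IDs and weights are invented, and variants missing from the genotype data are simply skipped, whereas real scoring tools also handle allele/strand matching and missing data more carefully.

```python
from typing import Dict, List, Tuple

# One scoring-file row: (variant ID, effect allele, effect weight beta).
ScoreRow = Tuple[str, str, float]
# Unphased diploid genotype, e.g. ("A", "G").
Genotype = Tuple[str, str]


def effect_allele_dosage(genotype: Genotype, effect_allele: str) -> int:
    """Number of copies (0, 1 or 2) of the effect allele in a genotype."""
    return sum(1 for allele in genotype if allele == effect_allele)


def polygenic_score(rows: List[ScoreRow],
                    genotypes: Dict[str, Genotype]) -> Tuple[float, int]:
    """Additive PRS = sum(beta_i * dosage_i); also returns how many variants were used."""
    score = 0.0
    n_used = 0
    for variant_id, effect_allele, beta in rows:
        genotype = genotypes.get(variant_id)
        if genotype is None:
            continue  # no genotype for this variant: skipped in this sketch
        score += beta * effect_allele_dosage(genotype, effect_allele)
        n_used += 1
    return score, n_used


if __name__ == "__main__":
    # Hypothetical weights and genotypes, for illustration only.
    rows = [("varA", "A", 0.20), ("varB", "G", -0.10), ("varC", "T", 0.05)]
    genotypes = {"varA": ("A", "A"), "varB": ("A", "G")}
    score, n_used = polygenic_score(rows, genotypes)
    print(f"PRS = {score:.2f} from {n_used} of {len(rows)} variants")
```

One caveat when applying this to a variants-only VCF: a site that is absent usually means homozygous reference rather than missing data, so a real implementation would need the reference allele (or a gVCF with coverage information) to assign the correct dosage at those positions.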
One plausible reason why the tellmeGen scores, as described in the reports, combine relatively few top hits/loci could be licensing and usage terms. The PGS Catalog explicitly points out that individual scores may have specific licenses or restrictions (e.g., non-commercial use only). Lambert et al. also describe specific barriers to the availability of PGS data, including restrictions on sharing variants and weights for commercial reasons, as well as terms and conditions for accessing GWAS summary statistics.
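Because individual scores may carry their own license terms, it can be useful to read a scoring file's metadata before using it. PGS Catalog scoring files begin with comment lines, many of the form `#key=value`; the minimal sketch below collects those into a dictionary. The keys in the example (`pgs_id`, `variants_number`, `license`) are modelled on the PGS Catalog header as I understand it and should be checked against the Catalog's format documentation; all values are invented.

```python
from typing import Dict, Iterable


def read_scoring_header(lines: Iterable[str]) -> Dict[str, str]:
    """Collect '#key=value' metadata lines that precede the column header."""
    meta: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("#"):
            break                          # first non-comment line: metadata is over
        body = line.lstrip("#").strip()
        if "=" in body:                    # section titles without '=' are ignored
            key, value = body.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


if __name__ == "__main__":
    # Invented example header; real files contain more fields.
    example = [
        "###PGS CATALOG SCORING FILE\n",
        "#pgs_id=PGS_EXAMPLE\n",
        "#variants_number=179\n",
        "#license=non-commercial use only (example)\n",
        "rsID\teffect_allele\teffect_weight\n",
        "var1\tA\t0.02\n",
    ]
    meta = read_scoring_header(example)
    print(meta.get("variants_number"), "|", meta.get("license", "no license field"))
```

Since scoring files are typically distributed gzip-compressed, the same function can be fed a handle from `gzip.open(path, "rt")`; it stops reading at the column header, so even very large files are cheap to inspect.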
Not population-specific
Another point is the lack of population or ancestry specificity. Polygenic scores are not “universal” because allele frequencies and linkage disequilibrium differ between populations. This can affect both the accuracy of a PRS, depending on genetic background, and the distribution of scores in the respective reference population. This is precisely why statements such as “you are in the Xth percentile” normally require either population-specific calibration or at least a clearly defined reference distribution from a suitable comparison population. Ding et al., for example, demonstrate that PRS accuracy can vary significantly along a continuum of genetic ancestry.
However, my tellmeGen reports did not give any clear information on this: neither which reference population was used for score calculation or calibration, nor where my score lies within a suitable population. There is also a structural problem: because the GWAS datasets underlying PRS have historically been dominated by people of European descent, the scores are often less well calibrated in other populations and lose predictive power there. Martin et al. explicitly discuss this as a consequence of Eurocentric bias in GWAS.
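A percentile statement only has meaning relative to a reference distribution. The minimal sketch below, using invented numbers, shows how the same raw score lands at very different percentiles (and z-scores) when compared against two populations whose score distributions are shifted relative to each other. In practice, the reference scores would come from a genetically matched cohort; the function names are my own.

```python
from bisect import bisect_left, bisect_right
from statistics import mean, stdev
from typing import Sequence


def empirical_percentile(score: float, reference: Sequence[float]) -> float:
    """Percentage of reference scores below `score`, counting ties as half."""
    ref = sorted(reference)
    below = bisect_left(ref, score)
    ties = bisect_right(ref, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ref)


def z_score(score: float, reference: Sequence[float]) -> float:
    """Standardize a raw score against a reference population's mean and SD."""
    return (score - mean(reference)) / stdev(reference)


if __name__ == "__main__":
    my_score = 0.35
    # Invented reference distributions for two populations with shifted scores.
    population_a = [0.0, 0.1, 0.2, 0.3, 0.4]
    population_b = [0.3, 0.4, 0.5, 0.6, 0.7]
    for name, ref in [("A", population_a), ("B", population_b)]:
        pct = empirical_percentile(my_score, ref)
        print(f"population {name}: percentile {pct:.0f}, z = {z_score(my_score, ref):+.2f}")
```

With these toy numbers, the identical raw score sits near the top of population A but near the bottom of population B, which is why the reference population needs to be stated.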
Technical Quality
The reports also lack technical quality information about the variants used. For example, there is no information about the sequencing depth at the relevant positions, no per-variant quality control fields (e.g., read depth/DP, genotype quality/GQ, allele balance), and no warning flags for regions that are difficult to map. Such quality indicators would be particularly helpful for exactly those difficult regions.
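Until such fields are shown in the reports, they can be checked directly in the downloaded VCF. The minimal sketch below reads one VCF data line and flags a non-PASS filter status, low read depth, low genotype quality and skewed allele balance at heterozygous sites. It assumes the VCF carries the standard `GT`, `AD`, `DP` and `GQ` FORMAT fields (not verified for the tellmeGen file), and the thresholds are illustrative rules of thumb rather than validated cutoffs.

```python
from typing import Dict, List

# Illustrative thresholds (rules of thumb, not tellmeGen settings).
MIN_DP = 10              # minimum read depth at the site
MIN_GQ = 20              # minimum genotype quality (Phred-scaled)
AB_RANGE = (0.2, 0.8)    # acceptable alt-read fraction for heterozygous calls


def qc_flags(vcf_line: str) -> List[str]:
    """Return QC warnings for the first sample of one VCF data line (not '#' headers)."""
    fields = vcf_line.rstrip("\n").split("\t")
    filt, fmt, sample = fields[6], fields[8], fields[9]
    values: Dict[str, str] = dict(zip(fmt.split(":"), sample.split(":")))
    flags: List[str] = []

    if filt not in ("PASS", "."):
        flags.append(f"FILTER={filt}")

    dp = values.get("DP", ".")
    if dp == "." or float(dp) < MIN_DP:
        flags.append("low_DP")

    gq = values.get("GQ", ".")
    if gq == "." or float(gq) < MIN_GQ:
        flags.append("low_GQ")

    # Allele balance only makes sense for heterozygous calls with AD available.
    alleles = values.get("GT", "./.").replace("|", "/").split("/")
    ad_parts = values.get("AD", ".").split(",")
    is_het = len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]
    if is_het and "." not in ad_parts:
        depths = [int(x) for x in ad_parts]
        low, high = sorted(int(a) for a in alleles)
        total = depths[low] + depths[high]
        ab = depths[high] / total if total else 0.0   # fraction of reads for the higher allele
        if not AB_RANGE[0] <= ab <= AB_RANGE[1]:
            flags.append(f"allele_balance={ab:.2f}")
    return flags
```

Flagging difficult-to-map regions would additionally require an interval list of such regions, for example the genome stratification BED files published by Genome in a Bottle, intersected with the variant positions.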
Conclusion
In my opinion, tellmeGen’s main focus is clearly on sequencing. The reports are a nice extra, but overall I find them limited in their informative value, mainly because key information for classification is missing. As a next step, I will run my FASTQ files through my own pipeline to assess the sequencing quality myself. I also want to try out other publicly available services and, if necessary, develop my own reports.