Smoking Likelihood and Your Genetics

What is Smoking Likelihood?

Smoking likelihood describes the heritable component of whether a person begins using tobacco. Genetic epidemiology has established that roughly half of the variation in smoking initiation is attributable to inherited factors. Large genome-wide studies have mapped the contributing loci across the genome, revealing a polygenic architecture in which hundreds of common variants each add a small statistical push toward or away from ever becoming a smoker.

This page presents findings from a multi-trait analysis of GWAS (MTAG), a method that increases the statistical power of genome-wide association studies (GWAS) by jointly analyzing genetically correlated phenotypes. By combining smoking initiation with other substance-use traits that share underlying genetic architecture, MTAG recovers signal that single-trait analyses would miss, particularly when sample sizes are moderate.

Research base: Robust.

The genetics of Smoking Likelihood

Xu et al. (2023), published in Addiction, applied MTAG to genetic data from 49,929 individuals drawn from the Yale-Penn cohort and the Penn Medicine BioBank, spanning European and African ancestry participants. The analysis jointly modeled smoking initiation alongside opioid use disorder, cannabis use disorder, and alcohol use disorder — phenotypes that share substantial genetic architecture with tobacco initiation. By leveraging the genetic correlations among these behaviors, MTAG amplified statistical power beyond what the sample size alone would deliver.
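
The power gain from pooling correlated traits can be illustrated with a deliberately simplified special case. The sketch below is not the MTAG estimator used by Xu et al.; it assumes two traits whose effects at a variant are identical (perfect genetic correlation) and whose samples do not overlap, in which case joint analysis reduces to an inverse-variance-weighted average. The function name and the effect sizes are hypothetical.

```python
import math

def inverse_variance_combine(estimates):
    """Combine (beta, se) pairs into one inverse-variance-weighted estimate.

    Assumes every pair estimates the same underlying effect from
    independent samples; real MTAG relaxes both assumptions.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se

# Hypothetical per-variant results (effect estimate, standard error)
# for two substance-use traits.
smoking = (0.020, 0.005)
alcohol = (0.030, 0.010)

combined_beta, combined_se = inverse_variance_combine([smoking, alcohol])
print(f"combined beta = {combined_beta:.4f}, se = {combined_se:.4f}")
```

The combined standard error is smaller than either input, which is the sense in which borrowing information across traits adds power.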

The cross-phenotype framework identified genome-wide significant loci near ENTPD6, LMO3, and XYLT1 that emerged with higher confidence in the multi-trait analysis than in smoking-only analyses of comparable size. Polygenic risk scores derived from MTAG showed stronger predictive power for each individual substance-use trait than scores from single-trait GWAS, demonstrating the practical value of cross-phenotype enrichment for discovery and for score construction.

Genes identified through MTAG for smoking initiation carry a distinct interpretive nuance: their signals are enriched because multiple addiction-related phenotypes converge on overlapping loci, suggesting involvement in shared neurobiological pathways rather than smoking-specific mechanisms. This makes them informative for understanding addiction vulnerability broadly, not only tobacco use.

49,929 individuals in the Xu et al. (2023) multi-trait GWAS contributed to identifying shared genetic architecture across substance-use behaviors, including smoking initiation.

The refined MTAG signal set for this smoking initiation phenotype contains 182 gene-proximal variants, reflecting the higher specificity of the cross-phenotype approach compared with broader single-trait catalogs.

Key genes: ENTPD6, LMO3, AUTS2, ARID5B, and ATXN1L

The MTAG approach highlighted a set of genes with statistical support across multiple substance-use phenotypes, making them strong candidates for shared neurobiological mechanisms underlying addiction-relevant behavior.

ENTPD6 (ectonucleoside triphosphate diphosphohydrolase 6) is the highest-confidence gene in this MTAG signal set, with particularly strong protein-QTL colocalization evidence at the chromosome 20 locus. ENTPD6 hydrolyzes extracellular nucleotides and regulates purinergic signaling, a neuromodulatory pathway in the brain that intersects with dopaminergic and glutamatergic neurotransmission. Purinergic signaling influences synaptic plasticity in reward-relevant circuits, positioning ENTPD6 as a biologically plausible contributor to addiction vulnerability across substance types.

LMO3 (LIM domain only 3) is a transcriptional regulator expressed in the developing and adult nervous system. LIM-domain proteins form transcription factor complexes that govern neuronal differentiation and circuit formation. Variants near LMO3 that alter its regulatory activity during development or in response to substance exposure could modify the neural substrates underlying reward and impulse control.

AUTS2 (autism susceptibility candidate 2) is a nuclear regulator with roles in transcriptional activation and neuronal migration. Despite the historical name, AUTS2 is not exclusively associated with autism — it appears across GWAS of multiple neurodevelopmental and behavioral traits, consistent with a broad role in establishing the neural architecture underlying a range of behavioral tendencies including substance use initiation.

ARID5B (AT-rich interaction domain 5B) encodes a chromatin-binding protein that participates in transcriptional regulation during development and in response to metabolic and environmental signals. Its recurrence across multiple smoking- and substance-use-related GWAS suggests involvement in the epigenetic regulation of gene networks relevant to reward processing.

ATXN1L (ataxin 1-like) belongs to the ataxin protein family and interacts with chromatin-modifying complexes involved in gene repression. Ataxin-family proteins have well-established roles in cerebellar and cortical neuron function. The appearance of ATXN1L in this MTAG signal reflects the convergence of neurodevelopmental and synaptic maintenance pathways at loci relevant to substance use initiation.

What the research says

The Xu et al. (2023) MTAG analysis in Addiction was designed to address a fundamental challenge in addiction genetics: the substantial genetic overlap among substance-use disorders. By treating this overlap as a resource rather than a confound, MTAG simultaneously improves discovery power for each individual phenotype and reveals the genes most likely to operate through shared mechanisms.

For smoking initiation specifically, the multi-trait analysis recovered loci that would have fallen below genome-wide significance in a single-trait GWAS of comparable sample size. This is particularly valuable for behavior genetics, where sample sizes tend to be smaller than for physiological traits. The enrichment of loci near neurodevelopmental genes — AUTS2, ARID5B, ATXN1L — is consistent with the hypothesis that early-life neural circuit formation sets a biological baseline for addiction susceptibility that persists into adulthood.
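
To make the threshold effect concrete, the sketch below converts a z-score to a two-sided p-value and compares it with the conventional genome-wide significance threshold of 5e-8. It assumes the expected z-score grows with the square root of the effective sample size; the starting z-score and the 20% gain are illustrative numbers, not values from Xu et al.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional GWAS significance cutoff

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def rescale_z(z, sample_size_ratio):
    """Expected z-score after multiplying effective sample size by a ratio."""
    return z * math.sqrt(sample_size_ratio)

single_trait_z = 5.2                            # hypothetical single-trait result
multi_trait_z = rescale_z(single_trait_z, 1.2)  # hypothetical 20% gain in effective N

for label, z in [("single-trait", single_trait_z), ("multi-trait", multi_trait_z)]:
    p = two_sided_p(z)
    significant = p < GENOME_WIDE_THRESHOLD
    print(f"{label}: z = {z:.2f}, p = {p:.2e}, significant = {significant}")
```

With these illustrative inputs, the same locus falls short of the threshold in the single-trait setting and passes it once the effective sample size is modestly larger.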

The cross-ancestry design of the Xu et al. cohort adds further confidence to loci that replicate across European and African ancestry participants. Variants surviving cross-ancestry analysis are less likely to reflect population-stratification artifacts and more likely to represent genuine functional effects at the implicated genes.

The compact signal set from this MTAG analysis, 182 gene-proximal variants compared with over 1,900 in broader smoking initiation meta-analyses, reflects the precision gain from the multi-trait approach. A smaller, higher-confidence list is more tractable for biological follow-up and for understanding which pathways are most centrally involved in the shared genetic risk for substance use behaviors.

How Smoking Likelihood affects you

A higher genetic score based on MTAG-derived smoking initiation loci means that a person carries more of the variants statistically associated with ever starting to smoke. Because these loci were identified through a cross-phenotype approach, the score also partially captures shared genetic predisposition toward substance use behaviors more broadly.
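
A score of this kind is typically a weighted sum of allele counts. The sketch below shows that calculation; the variant IDs, weights, and genotype are made up for illustration and are not loci or effect sizes from the MTAG analysis.

```python
def polygenic_score(weights, dosages):
    """Sum effect-allele dosages (0, 1, or 2) weighted by per-variant effect.

    Variants missing from the genotype are skipped rather than imputed.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)

# Hypothetical variant IDs and effect weights.
weights = {"variant_a": 0.03, "variant_b": -0.02, "variant_c": 0.01}
# Hypothetical genotype: copies of the effect allele at each variant.
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(weights, dosages)
print(f"raw score = {score:.2f}")
```

A raw score only becomes interpretable once it is compared with the distribution of scores in a reference population.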

This is not a clinical prediction of behavior. The genetic tendency described here shapes the biological substrate — not the outcome. Social environment, personal decisions, access to substances, and life circumstances remain co-determinants of behavior that no genetic score can override or predict at the individual level.

For individuals with family histories of smoking or other substance use, a higher genetic score adds one piece of biological context to a clinical picture that should always be interpreted holistically.

Working with your Smoking Likelihood profile

  • MTAG-derived scores integrate signal from multiple substance-use phenotypes; a higher score may warrant attention to a range of addictive behaviors, not only tobacco use.
  • If you currently smoke, evidence-based cessation approaches — behavioral therapy, nicotine replacement, and pharmacotherapy — remain effective across the full spectrum of genetic predisposition captured by this score.
  • For never-smokers with a higher score, the most actionable response is situational awareness, not alarm. The same behavioral factors that protect non-smokers generally apply regardless of genetic profile.
  • Discuss this score alongside your complete genetic and personal health picture with a healthcare provider; it is most informative as part of a broader conversation about addiction risk.

Frequently asked questions

Q: What does MTAG mean for this trait? A: MTAG stands for multi-trait analysis of GWAS. Instead of analyzing smoking initiation in isolation, the method combines it with other genetically correlated substance-use traits. This increases statistical power to detect shared loci and recovers signals that would be missed with the same sample size in a single-trait analysis.

Q: Which genes are highlighted by the MTAG approach for smoking initiation? A: ENTPD6, LMO3, AUTS2, ARID5B, and ATXN1L are among the highest-confidence genes in this MTAG signal set. ENTPD6 stands out for its protein-QTL colocalization evidence and its role in purinergic signaling, a neuromodulatory pathway relevant to reward-circuit plasticity.

Q: Does a higher MTAG score suggest risk for other substance use behaviors? A: A higher MTAG-derived score for smoking initiation partially reflects shared genetic predisposition across substance-use phenotypes, because the method leverages genetic correlations among them. It is not a clinical indicator for any specific disorder, but it provides biological context about addiction-relevant neural pathways that span multiple behaviors.

Q: How does an MTAG-derived score differ from a standard GWAS score? A: An MTAG-derived score is based on loci identified by jointly modeling multiple traits. This makes it more sensitive to variants in shared pathways and potentially more informative than a score based on smoking initiation alone, particularly when the smoking-only sample size is limited.

Q: Is the research quality for this trait strong? A: Yes — this trait carries a robust confidence tier. The underlying GWAS evidence meets a high threshold for replication and statistical confidence. The MTAG methodology adds further credibility by requiring signals to cohere across multiple related phenotypes, reducing the chance that any individual finding is a false positive.


References

Xu K, et al. (2023). Identifying genetic loci and phenomic associations of substance use traits. Addiction. PMID: 37156939.

Data sources: GWAS Catalog, Open Targets, ClinVar, ClinGen, NCBI Gene, dbSNP, PheGenI.
