Smoking Initiation Risk and Your Genetics

What is Smoking Initiation Risk?

Smoking initiation risk describes the heritable component of whether a person crosses the threshold from never having smoked regularly to becoming a regular tobacco user. The phenotype studied here uses a binary ever-regular vs never-regular definition: participants who reported ever smoking regularly are classified as cases, while those who never smoked regularly serve as controls. This precise boundary makes the GWAS signal particularly informative about the genetics of the initiation threshold specifically, rather than smoking quantity or dependence severity.

Twin and family studies consistently place heritability for smoking initiation at roughly 50 to 60 percent. The genetic architecture is polygenic — many loci each contributing a small fraction of the heritable component — with enrichment in neurobiological pathways governing reward sensitivity, impulse control, and behavioral reinforcement.
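As a rough illustration of how a twin-based heritability figure of this size can arise, the sketch below applies Falconer's classical formula, which doubles the difference between identical-twin and fraternal-twin correlations. The text above does not say which estimation method the cited twin studies used, and the two correlations in the code are hypothetical values chosen only to land inside the 50 to 60 percent range.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of heritability: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical liability-scale twin correlations (illustrative only,
# not taken from any study cited on this page).
r_mz = 0.80  # identical (monozygotic) twin pairs
r_dz = 0.52  # fraternal (dizygotic) twin pairs

h2 = falconer_heritability(r_mz, r_dz)
print(f"Estimated heritability: {h2:.2f}")  # prints 0.56
```

Modern twin studies usually fit structural equation models rather than this shortcut, so treat the formula as intuition for where such numbers come from, not as the method behind the 50 to 60 percent estimate.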

Research base: Robust.

The genetics of Smoking Initiation Risk

Liu et al. (2019), published in Nature Genetics, performed one of the largest GWAS of smoking behaviors, studying up to 1.2 million individuals across multiple cohorts. That landmark analysis covered several smoking phenotypes simultaneously, including the ever-regular vs never-regular binary definition. The scale of the cohort enabled discovery of hundreds of genome-wide significant loci and refined the genetic map of smoking initiation to a degree not previously possible.

Brazel et al. (2019), published in Biological Psychiatry, complemented large-scale common-variant GWAS with an exome chip meta-analysis of five substance-use phenotypes, with sample sizes ranging from 152,348 to 433,216 participants depending on the phenotype. By targeting rare coding variants rather than the common, mostly non-coding variants that dominate standard GWAS, that work showed that rare coding variation makes a measurable contribution to the heritability of smoking behavior: roughly 1 to 2 percent of phenotypic variance, or approximately 11 to 18 percent of total SNP heritability. Fine-mapping in that study narrowed putative causal variants to single base-pair resolution at 24 loci, providing unusually precise mechanistic anchors in the smoking initiation architecture.
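The two percentages quoted for rare coding variation are linked by a simple ratio: the share of SNP heritability equals the variance explained divided by the total SNP heritability. The sketch below inverts that ratio to back out the total SNP heritability the figures imply. It assumes that the lower ends of the two ranges belong together and that the upper ends belong together; the text does not state this, and the ranges span several phenotypes, so the result is an order-of-magnitude check only.

```python
def implied_snp_heritability(variance_explained: float, share_of_snp_h2: float) -> float:
    """Total SNP heritability implied by: share = variance_explained / h2_snp."""
    if not 0.0 < share_of_snp_h2 <= 1.0:
        raise ValueError("share_of_snp_h2 must be in (0, 1]")
    return variance_explained / share_of_snp_h2


# Rounded figures from the text; pairing low with low and high with high
# is an assumption made for this illustration.
low = implied_snp_heritability(0.01, 0.11)
high = implied_snp_heritability(0.02, 0.18)

print(f"Implied SNP heritability: about {low:.1%} to {high:.1%}")
```

Under these assumptions the implied SNP heritability is on the order of 10 percent, well below the twin-based estimate of 50 to 60 percent. A gap of that kind is common for complex traits, because SNP-based estimates capture only the variation tagged by genotyped or imputed variants.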

Together, these studies establish smoking initiation as a trait with a rich, multilayered genetic structure — common polygenic variation setting the background level of heritable predisposition, with rarer coding variants adding discrete biological perturbations at specific genes.

Stat block: Up to 1.2 million individuals in the Liu et al. (2019) GWAS of smoking behaviors, including the ever-regular vs never-regular initiation phenotype.

Stat block: 414 gene-proximal variants captured in the current signal landscape for this ever-regular vs never-regular smoking initiation phenotype.

Key genes: IGSF21, ARID5B, ENTPD6, ALK, and ANKRD26

The gene-level evidence for smoking initiation under the ever-regular vs never-regular definition converges on a set of candidates spanning neural adhesion, transcriptional regulation, purinergic signaling, and receptor tyrosine kinase biology.

IGSF21 (immunoglobulin superfamily member 21) holds the highest confidence ranking for this phenotype. Its strong statistical evidence — including proximity to the lead variant and protein-QTL colocalization — is consistent across multiple smoking initiation phenotypes, making it among the most replicated gene-level findings in tobacco use genetics. IGSF21 encodes a synaptic cell adhesion molecule expressed in the nervous system, where synaptic organization shapes how neural circuits encode reinforcement and reward.

ARID5B (AT-rich interaction domain 5B) is the second-ranked gene by confidence for this phenotype, appearing at loci on chromosome 10 with multiple credible sets. It encodes a chromatin-binding protein involved in transcriptional regulation during development and in response to metabolic and environmental signals. ARID5B appears across multiple tobacco-related GWAS phenotypes, consistent with a regulatory role in gene networks relevant to both initiation and dependence.

ENTPD6 (ectonucleoside triphosphate diphosphohydrolase 6) contributes enzymatic regulation of extracellular purinergic signaling. ATP and its metabolites function as neuromodulators that influence dopaminergic and glutamatergic transmission at synapses in reward-processing regions. ENTPD6's appearance near the top of the gene-confidence ranking for both this phenotype and the MTAG smoking initiation dataset suggests that purinergic signaling modulation is a reproducible biological theme across smoking initiation genetics.

ALK (anaplastic lymphoma kinase) is a receptor tyrosine kinase with a well-characterized role in cancer biology, but its function extends to the nervous system: ALK is expressed in the brain during development and regulates neuronal differentiation and axonal guidance. Emerging evidence links ALK to reward-pathway development, and its appearance in the filtered gene set for smoking initiation is consistent with a developmental neurobiological contribution to addiction susceptibility.

ANKRD26 (ankyrin repeat domain 26) encodes a scaffold protein with roles in signal transduction and cytoskeletal organization. Ankyrin-repeat proteins participate in the assembly of protein complexes at postsynaptic densities, where they influence receptor anchoring and signal integration. Variants near ANKRD26 in the smoking initiation landscape may affect synaptic signaling fidelity in circuits relevant to behavioral reinforcement.

What the research says

The Liu et al. (2019) analysis in Nature Genetics was transformative for tobacco use genetics by virtue of its scale. By jointly studying smoking initiation, cessation, cigarettes per day, and age of initiation in up to 1.2 million individuals, it revealed genetic correlations among these behaviors and identified loci with effects spanning multiple smoking phenotypes. For the ever-regular vs never-regular phenotype specifically, the genome-wide signal landscape confirmed the deeply polygenic architecture and pointed toward enrichment in neurological and behavioral gene sets.

Brazel et al. (2019) in Biological Psychiatry added a complementary layer by demonstrating that rare coding variation — typically missed by standard SNP arrays — contributes meaningfully to smoking initiation heritability. Fine-mapping at 24 loci to single-variant resolution provided unusually precise mechanistic targets, connecting the polygenic GWAS signal to specific functional coding changes in defined proteins. This convergence of common-variant and rare-variant evidence strengthens confidence in the biological pathways implicated across both analyses.

IGSF21 and ARID5B rank at the top consistently across multiple smoking initiation GWAS phenotypes, including this ever-regular vs never-regular definition and broader meta-analyses. That consistency is strong cross-study support for their involvement at the gene level.

How Smoking Initiation Risk affects you

A higher genetic score for smoking initiation risk means your genetic profile more closely matches the variant pattern associated with having ever smoked regularly in large population studies. This describes a statistical tendency at the population level — not a prediction of individual behavior.
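To make the idea of a genetic score concrete, here is a minimal sketch of the common additive construction: multiply each variant's effect-allele count (0, 1, or 2) by a GWAS-derived weight, sum the products, and express the result relative to a reference group. This page does not describe how its own score is calculated, so the variant names, weights, genotypes, and reference panel below are all hypothetical placeholders.

```python
from statistics import mean, pstdev
from typing import Mapping, Sequence

# Hypothetical per-variant weights (e.g. log odds ratios); not real GWAS results.
weights = {"variant_A": 0.021, "variant_B": -0.015, "variant_C": 0.008}


def polygenic_score(dosages: Mapping[str, int], weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages; variants missing from dosages are skipped."""
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def z_score(score: float, reference_scores: Sequence[float]) -> float:
    """Express a raw score in standard deviations from the reference mean."""
    sigma = pstdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores have zero variance")
    return (score - mean(reference_scores)) / sigma


# Hypothetical genotypes: one person and a tiny reference panel.
person = {"variant_A": 2, "variant_B": 0, "variant_C": 1}
panel = [
    {"variant_A": 0, "variant_B": 1, "variant_C": 0},
    {"variant_A": 1, "variant_B": 1, "variant_C": 1},
    {"variant_A": 1, "variant_B": 0, "variant_C": 2},
    {"variant_A": 2, "variant_B": 2, "variant_C": 0},
]

raw = polygenic_score(person, weights)
reference = [polygenic_score(d, weights) for d in panel]
z = z_score(raw, reference)
print(f"raw score = {raw:.3f}, relative to reference: z = {z:.2f}")
```

A real score would sum over many thousands of variants and use a large, ancestry-matched reference population. The standardized value only locates a person within that reference distribution, which is why it reflects a population-level tendency and not an individual prediction.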

The biological meaning of a higher score lies in the underlying neuroscience: genetic variation in reward-circuit synaptic organization, transcriptional regulation of neuronal gene expression, and purinergic neuromodulation collectively shape how reinforcing early tobacco exposure is likely to feel. These are upstream biological influences on the probability of initiating regular use, not downstream determinants of a fixed outcome.

Environmental and behavioral factors — social context, household norms, stress, access, and personal choice — remain the dominant proximal determinants of smoking behavior regardless of genetic score.

Working with your Smoking Initiation Risk profile

  • If you currently smoke, this genetic profile provides biological context. All major evidence-based cessation strategies — pharmacological and behavioral — are effective across the genetic spectrum.
  • For those who have never smoked regularly, a higher genetic score is informative about biological predisposition but confers no certainty about future behavior. Social and situational factors remain the primary modifiable influences.
  • Share your genetic profile with a clinician if you are in a cessation program or have a strong family history of tobacco dependence — it can complement a holistic clinical assessment.
  • Because IGSF21 and ARID5B signals appear across multiple smoking phenotypes, a higher score here may be worth considering alongside related trait scores for a fuller picture of tobacco use genetics.

Frequently asked questions

Q: What does 'ever regular vs never regular' mean as a phenotype? A: It is a binary classification: people who self-reported ever smoking on a regular basis are cases, and those who never smoked regularly are controls. This binary captures whether the initiation threshold was ever crossed, a trait that is genetically correlated with, but not identical to, how much someone smokes or whether they can quit.

Q: Why do IGSF21 and ARID5B appear across multiple smoking initiation studies? A: Both genes sit near variants that reach genome-wide significance in multiple smoking initiation GWAS with different phenotype definitions and ancestry compositions. Replication across studies is strong evidence that the gene-level association is genuine rather than a statistical artifact.

Q: What role does ALK play in smoking initiation genetics? A: ALK is best known in oncology, but in the nervous system it is a receptor tyrosine kinase involved in neuronal differentiation and axonal guidance during brain development. Emerging evidence links ALK to reward-pathway formation, and its presence in the smoking initiation gene set points to a developmental neurobiological contribution to addiction susceptibility.

Q: Does this genetic score apply only to cigarette smoking? A: The GWAS underlying this trait defined smoking initiation as ever using tobacco cigarettes regularly. The genetic signals reflect biology relevant to nicotine and reward-circuit development broadly, but the phenotype definition was cigarette-specific. Whether the score generalizes to other tobacco products has not been directly tested in the studies cited here.

Q: Is the research quality for this trait strong? A: Yes. This trait carries a robust confidence tier. The Liu et al. (2019) analysis studied up to 1.2 million individuals, providing exceptional statistical power, and Brazel et al. (2019) showed in a separate exome chip meta-analysis that rare coding variants also contribute. The convergence of common- and rare-variant evidence across large cohorts meets a high bar for confidence.


References

Liu M, et al. (2019). Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat Genet. PMID: 30643251.

Brazel DM, et al. (2019). Exome Chip Meta-analysis Fine Maps Causal Variants and Elucidates the Genetic Architecture of Rare Coding Variants in Smoking and Alcohol Use. Biol Psychiatry. PMID: 30679032.

Data sources: GWAS Catalog, Open Targets, ClinVar, ClinGen, NCBI Gene, dbSNP, PheGenI.
