The human genome contains about 3.2 billion base pairs. Of these, only about 1.5% codes for proteins: the classic “genes” we learned about in school. What does the remaining 98.5% do? For decades, the popular answer came down to two words: nothing useful. It was called junk DNA, genetic garbage. Then a series of discoveries began to change the picture, sparking one of the most heated scientific debates of the 21st century.
Birth of a Label
The term “junk DNA” is usually traced to a 1972 paper by geneticist Susumu Ohno. Ohno observed that genome size doesn't track organismal complexity: an onion has about five times as much DNA as a human. He concluded that most of the genome does nothing. In his view it was evolutionary debris, the remains of defunct genes and other genetic “fossils” we simply carry along.
The idea was logical and grounded in evolutionary reasoning. If a stretch of DNA is harmless, natural selection has no reason to eliminate it, so it remains in the genome generation after generation, like a storage room that fills with things nobody uses but nobody throws away. Later sequence analysis seemed to confirm the view: much of the genome consists of repetitive sequences with no apparent function, pseudogenes that no longer produce proteins, and fragments of ancient retroviruses that inserted themselves into our DNA millions of years ago. The “junk” label stuck.
Transposable Elements: Genetic Parasites
Nearly 45% of human DNA consists of transposable elements: sequences that can copy themselves and “jump” to new positions in the genome. Barbara McClintock discovered them in maize in the 1940s, but it took decades for science to recognize their significance; she received the Nobel Prize only in 1983.
The most common in humans are LINE-1 elements (about 500,000 copies, covering nearly 17% of the genome) and Alu elements (over 1 million copies, roughly one every 3,000 base pairs on average). Most copies are “dead”: mutations disabled them millions of years ago. But roughly a hundred LINE-1 elements in a typical human genome remain active, retaining the ability to move, and they occasionally disrupt critical genes by inserting into them. At least 124 cases of genetic disease have been traced to new transposable element insertions. This isn't junk. It's something more complex: genetic parasites coexisting with their host in a relationship that ranges from dormancy to destruction.
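The copy numbers above can be sanity-checked with simple arithmetic. The sketch below uses only the figures quoted in this section (a 3.2-billion-base-pair genome, about 500,000 LINE-1 copies covering 17% of it, and at least 1 million Alu copies). It assumes, purely for illustration, that copies are spread evenly; the variable names are hypothetical.

```python
# Back-of-envelope arithmetic using only the figures quoted in the text.
GENOME_BP = 3.2e9          # human genome size, base pairs
LINE1_COPIES = 500_000     # approximate LINE-1 copy number
LINE1_FRACTION = 0.17      # share of the genome covered by LINE-1
ALU_COPIES = 1_000_000     # "over 1 million" Alu copies, used as a lower bound

# Average LINE-1 copy length implied by copy number and genome share.
line1_avg_len = GENOME_BP * LINE1_FRACTION / LINE1_COPIES

# Assumption (illustration only): Alu copies treated as evenly spaced,
# so the average gap is simply genome size divided by copy count.
alu_spacing = GENOME_BP / ALU_COPIES

print(f"Average LINE-1 copy: ~{line1_avg_len:,.0f} bp")
print(f"One Alu every ~{alu_spacing:,.0f} bp")
```

This gives an average LINE-1 copy of roughly 1,100 base pairs and one Alu every 3,200 base pairs. Because there are more than a million Alu copies, the real spacing is somewhat shorter, in line with the figure of roughly 3,000. A complete, active LINE-1 element is about 6,000 base pairs long, so an average near 1,000 implies that most copies are truncated fragments, consistent with most of them being dead.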

ENCODE: The Controversial Answer
In 2012, the ENCODE project (Encyclopedia of DNA Elements) reported in Nature that 80% of the human genome shows some “biochemical activity.” The result made headlines everywhere: “There is no junk DNA!” In the media, the figure became “80% of DNA has a function.”
The reaction from evolutionary biologists was immediate and fierce. Dan Graur and colleagues published a scathing critique in Genome Biology and Evolution, pointing out that “biochemical activity” doesn't equal “biological function.” A protein can bind a DNA site by chance, or “slide” along DNA without doing anything meaningful. A sequence can be transcribed into RNA without that RNA performing any useful function. Graur's conclusion was blunt: if ENCODE were right, evolution as we understand it could not explain the human genome. The debate remains open today.
Regulatory DNA: The Invisible Director
If genes are the actors, regulatory DNA is the director deciding what gets performed. Enhancers, silencers, insulators, and promoters are non-coding sequences that determine when, where, and how strongly a gene is expressed. Nearly every cell in the body carries the same DNA, yet a liver cell looks nothing like a neuron. The difference lies not in the genes but in which genes are switched on, and regulatory DNA controls that.
A striking example is an enhancer called ZRS (Zone of Polarizing Activity Regulatory Sequence). It sits about one million base pairs away from the Sonic Hedgehog gene, yet it controls where that gene is switched on in the developing limb, which in turn shapes how fingers and toes form in the embryo. Mutations in ZRS cause polydactyly: extra fingers or toes. The gene itself is unchanged and its protein is normal; only the “switch” controlling when and how much the gene is activated has changed.
Onions and Salamanders: The Size Paradox
If non-coding DNA were all functional, you'd expect the most complex creatures to have the largest genomes. They don't. The common onion (Allium cepa) has a genome of about 16 billion base pairs, five times the size of the human genome. The salamander Necturus reaches 85 billion. If every base pair had a purpose, an onion would by that measure be more complex than we are.
This “C-value paradox,” as geneticists call it, is strong evidence that much of the genome is non-functional, or at least not maintained by natural selection. Evolution doesn't optimize everything. DNA can simply accumulate when selection is too weak to remove it: a bigger genome doesn't kill its owner, so the extra DNA stays. The most compact genomes, by contrast, are found in fast-reproducing organisms such as bacteria and yeasts, where every extra base pair costs energy and replication time.

Junk DNA and Disease
Whatever the outcome of the theoretical debate, one thing is clear: mutations in non-coding regions can cause disease. Genome-wide association studies (GWAS) show that most genetic variants linked to disease lie outside protein-coding sequences, in enhancers and other regulatory elements and in non-coding RNA genes.
Sickle cell disease is one example. It is caused by a well-known point mutation in the adult hemoglobin gene, but its severity also depends on variants in the regulatory regions that control when the body switches from fetal to adult hemoglobin. Fetal hemoglobin is built from a different globin chain that the mutation doesn't affect, so people who keep producing more of it tend to have milder disease. Researchers, including a team at Boston Children's Hospital, traced this switch to a specific regulatory region, and using CRISPR to edit that region in patients' cells produced dramatic results. Casgevy, the first approved CRISPR gene therapy, works this way: it targets non-coding DNA rather than the hemoglobin gene itself.
Non-Coding RNAs: A New World
Until recently, DNA's main purpose was thought to be making proteins via mRNA. But thousands of non-coding RNAs (ncRNAs) take part in critical processes. MicroRNAs, molecules only about 22 nucleotides long, silence genes by binding to messenger RNAs and blocking their translation or marking them for degradation. Long non-coding RNAs (lncRNAs) help regulate chromatin structure, affecting which regions of the genome are accessible. XIST, an lncRNA, inactivates an entire X chromosome in female mammals.
None of these RNAs codes for a protein, yet their function is undeniable, and microRNAs are now at the center of research into cancer, cardiovascular disease, and neurodegenerative conditions. Non-coding DNA is proving far more varied than early geneticists imagined.
Truth Somewhere in the Middle
Reality, as often in biology, is messy and doesn't fit neat labels. ENCODE's 80% doesn't hold up as an estimate of function, and neither does the old view that nearly everything outside genes is junk. Estimates based on evolutionary conservation put the truly functional share somewhere between 8% and 15%: much more than the 1.5% that codes for proteins, but far less than 80%.
The remainder isn't necessarily useless in an absolute sense; it simply isn't under selective pressure. It can change, be deleted, or mutate without noticeably affecting the organism. Biology isn't engineering, and nothing was designed. Sometimes the most honest answer to “what does 98% of your DNA do?” is this: a small part does critical work behind the scenes, and the rest simply exists.
Sources:
- Ohno, “So much 'junk' DNA in our genome,” Brookhaven Symposia in Biology, 1972
- ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in the human genome,” Nature, 2012
