This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic
lifestyle; platypus females
lactate, yet lay eggs; and males are equipped with venom similar to that of
reptiles.
Analysis of the
first monotreme genome
aligned these features with genetic innovations. We find that reptile and
platypus venom proteins
have been co-opted
independently from the same gene families; milk protein genes are conserved
despite platypuses laying
eggs; and immune gene family
expansions are directly related to platypus biology.
Expansions of protein,
non-protein-coding
RNA and microRNA families,
as well as repeat elements, are identified.
The platypus genome, as well
as the animal, is an amalgam of
ancestral reptilian and
derived mammalian characteristics. The
platypus karyotype comprises
52 chromosomes in both sexes14,15,
with a few large and many
small chromosomes, reminiscent of reptilian
macro- and microchromosomes.
Platypuses have multiple sex
chromosomes with some
homology to the bird Z chromosome16.
Males have five X and five Y
chromosomes, which form a chain at
meiosis and segregate into
5X and 5Y sperm17,18.
Non-protein-coding genes
In general, the platypus
genome contains fewer computationally predicted
non-protein-coding (nc)RNAs
(1,220 cases, excluding highly
repetitive small nucleolar
RNA (snoRNA) copies; see below) than
do other mammalian species
(for example, human with 4,421 Rfam
hits), similar to
observations in chicken19 (655 Rfam-based ncRNAs).
This is probably because of
the extensive retrotransposition of
ncRNAs in therian mammals
and the apparent lack of L1-mediated
retrotransposition in
chicken and platypus. The exception to this is
the platypus family of
snoRNAs, which is markedly expanded
(~2,000 matches to the Rfam
covariance models) compared to that
for therian mammals (~200).
snoRNAs are involved in RNA modifications,
in particular of ribosomal
RNA, and are often located in
introns of protein-coding
genes22.
Our investigations revealed a
novel
short-interspersed-element (SINE)-like, snoRNA-related
retrotransposon, which we
have labelled snoRTEs, that has duplicated
in platypus to ~40,000
full-length or truncated copies. It is
retrotransposed by means of
retrotransposon-like non-LTR (long
terminal repeat)
transposable elements (RTE), as opposed to the
L1-mediated transposition
mechanism in therians23.
We constructed
a complementary DNA library
of small ncRNAs and identified 371
consensus sequences of small
RNAs that included 166 snoRNAs23
(Supplementary Table 3).
Ninety-nine of these cloned snoRNAs
are found in paralogous
families, and 21 of them belong to the
snoRTE class. The presence
of both the structural requirements
known to be important in
snoRNA function24 and evidence of their
expression are consistent
with these snoRTE elements being functional
in the platypus. Similar to
other unrelated ncRNAs that have
proliferated in therian
mammals (for example, 7SL RNA-derived
primate Alu elements,
tRNA-derived rodent identifier (ID) elements),
this recent SINE-like
expansion is probably due to chance
events. However, given the
RNA modification activity of snoRNAs,
and our increasing awareness
of the cellular importance of RNA
molecules, it might be that
some of the retrotranspositionally duplicated
RNAs were exapted into new
functions in this species.
Other small RNAs. Overall,
we found commonalities with small
RNA (sRNA) pathways of other
mammals, but also features that
are unique to monotremes.
Components of the RNA interference
machinery are conserved in
platypus, including elements of biogenesis
pathways (Dicer and Drosha)
and RNA-interference effector
complexes (argonaute
proteins; Supplementary Table 4). Of
20,924,799 platypus and
echidna sRNA reads derived from liver,
kidney, brain, lung, heart
and testis, 67% could be assigned to
known microRNA (miRNA)
families. Established patterns of
miRNA expression were
generally recapitulated in monotremes.
To determine the
conservation patterns of miRNAs in platypus, we
identified platypus miRNAs
sharing at least 16-nucleotide identity with
miRNAs in eutherian mammals
(mouse/human) and chicken.
Although most conserved
miRNAs were identified across these vertebrate
lineages (137 miRNAs), 10
miRNAs were shared only with eutherians
(mouse/human) and 4 only
with chicken (Fig. 2a). miRNAs can
be classified into families
based on identity of the functional ‘seed’
region at position 2–8 of
the mature miRNA strand. We identified
miRNA families that were
shared between platypus and eutherians
but not chicken (40
families), or between platypus and chicken but
not eutherians (8 families), suggesting that
for some miRNAs only
the seed region may have
been selectively conserved (Fig. 2a).
Conserved miRNAs tended to
be more robustly expressed in the
platypus tissues analysed
than lineage-restricted miRNAs (Fig. 2b).
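The seed-based family assignment described above can be illustrated with a minimal sketch. It assumes that a family is simply the set of mature miRNAs sharing identical nucleotides at positions 2–8; the function names and the toy sequences are hypothetical and are not taken from the platypus data set.

```python
from collections import defaultdict


def seed(mature):
    """Return the seed region: nucleotides 2-8 (1-based) of the mature strand."""
    return mature[1:8]


def seed_families(mirnas):
    """Group miRNA names by identical seed sequence."""
    families = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return dict(families)


# Hypothetical toy sequences for illustration only.
toy = {
    "toy-miR-a": "UGAGGUAGUAGGUUGUAUAGUU",
    "toy-miR-b": "UGAGGUAGGAGGUUGUAUAGU",
    "toy-miR-c": "UAGCUUAUCAGACUGAUGUUGA",
}
families = seed_families(toy)
for seed_sequence, members in families.items():
    print(seed_sequence, members)
```

In this sketch, toy-miR-a and toy-miR-b differ outside the seed yet fall into one family, which is the situation in which only the seed region appears to have been conserved.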
To identify miRNAs unique to
monotremes we used a heuristic
search that identifies miRNA
candidates in deep-sequencing data
sets25. This method
predicted 183 novel miRNAs in platypus and
echidna (Fig. 2a). Notably,
92 of these lay in 9 large clusters, on
platypus chromosome X1 and
contigs 1754, 7160, 7359, 8388,
11344, 22847, 198872 and
191065. Physical mapping confirmed that
at least five of these
contigs are linked to the long arm of chromosome
X1 (ref. 25). These
abundantly expressed clusters were sequenced
almost exclusively from
platypus and echidna testis (Fig. 2b). The
expansion of this unique
miRNA class and its expression domain
suggest possible roles in
monotreme reproductive biology25.
Piwi-interacting RNAs
(piRNAs) associate with a germline-expressed
clade of argonaute proteins,
known as Piwis26, and have
a role in transposon
silencing and genome methylation26. Monotreme
piRNAs bear strong
structural similarity to those in eutherians.
They are ~29 nucleotides in
length and arise from large testis-specific
genomic clusters with
distinct genomic strand asymmetry, often with
a typical ‘bidirectional’
organization. We identified 50 major platypus
piRNA clusters as well as
numerous smaller clusters25. In contrast
to piRNAs in mouse, platypus
piRNAs are repeat-rich and bear
strong signatures of active
transposon defence.
Gene evolution
Overall this resulted in 18,527
protein-coding genes being
predicted from the current platypus
assembly.
As expected, the majority of
platypus genes (82%; 15,312 out of
18,596) have orthologues in
these five other amniotes (Supplementary
Table 5). The remaining ‘orphan’
genes are expected to primarily
reflect rapidly evolving
genes, for which no other homologues are
discernible, erroneous
predictions, and true lineage-specific genes
that have been lost in each
of the other five species under consideration.
Simple 1:1 orthologues,
which have been conserved without
duplication, deletion or
non-functionalization across the five mammalian
species, were greatly
enriched in housekeeping functions,
such as metabolism, DNA
replication and mRNA splicing.
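The orthologue fraction quoted above follows directly from the two gene counts. The short check below uses only those counts; the variable names are our own.

```python
# Counts as quoted in the text.
genes_with_orthologues = 15_312
genes_considered = 18_596

fraction = genes_with_orthologues / genes_considered
orphans = genes_considered - genes_with_orthologues

print(f"{fraction:.1%} of genes have orthologues; {orphans} are 'orphans'")
```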
Chemoreception. The
semi-aquatic platypus was expected to sense
its terrestrial, but not
aquatic, environment by detecting airborne
odorants using olfactory
receptors and vomeronasal receptors (types
1 and 2: V1Rs, V2Rs).
Nevertheless large numbers of odorant receptor,
V1R and V2R homologues
(approximately 700, 950 and 80,
respectively) are apparent
in the platypus genome assembly, although
for each family only a
minority lack frame disruptions (approximately
333, 270 and 15,
respectively)34. The large expansion
of the platypus V1R gene
family might reflect sensory adaptations
for pheromonal communication
or, more generally, for the detection
of water-soluble,
non-volatile odorants, during underwater
foraging.
The platypus odorant
receptor gene repertoire is roughly one-half
as large as those in other
mammals37. Nevertheless, platypus odorant
receptors fall into class,
family and subfamily structures that are well
represented from across the
mammals, with a few notable exceptions
such as family 14 (Fig. 3a).
Together with the finding that lizard
contains only ~200 odorant
receptor genes and pseudogenes, this
indicates that the platypus
olfactory repertoire is, as expected, more
akin to other mammals than
it is to sauropsids.
Eggs. Fertilization in the
platypus exhibits both sauropsid and therian
characteristics. Platypus
ova are small (4 mm diameter) relative
to comparably sized reptiles
and birds, and eggs hatch at an early
stage of development so that
most growth of the embryo and infant is
dependent on lactation, as
in marsupials. Like all mammals and
many other amniotes, when
fertilization occurs the ovum is invested
with a zona pellucida. The
platypus genome encodes each of the four
proteins of the human zona
pellucida38, as well as two ZPAX genes
(Table 1) that previously
were observed only in birds, amphibians
and fish. The
aspartyl-protease nothepsin is present in platypus, but
has been lost from marsupial
and eutherian genomes (Table 1). In
zebrafish, this gene is
specifically expressed in the liver of females
under the action of
oestrogens, and accumulates in the ovary39.
These are the same
characteristics as those of the vitellogenins, indicating
that nothepsin may be
involved in processing vitellogenin or other
egg-yolk proteins. We find
that platypus has retained a single vitellogenin
gene and pseudogene, whereas
sauropsids such as chicken
have three and the
viviparous marsupials and eutherians have none.
Spermatozoa. Orthologues of
many of the eutherian sperm membrane
proteins related to
fertilization40 are present in platypus (and
marsupial) genomes. These
include the genes for a number of putative
zona pellucida receptors and
proteins implicated in sperm–
oolemma fusion.
Testis-specific proteases, which in eutherians participate
in degradation of the zona
pellucida during fertilization, are
all absent from the platypus
genome assembly.
Monotreme spermatozoa
undergo some post-testicular maturational
changes, including the
acquisition of progressive motility, loss
of cytoplasmic droplets and
aggregation of single spermatozoa into
bundles during passage through
the epididymis11. Nevertheless,
maturational changes in the
sperm surface that are both unique
and essential in other
mammals for fertilization of the ovum have
yet to be identified. Also,
the epididymis of monotremes is not highly
adapted for sperm storage as
in most marsupial and eutherian mammals.
Consistent with these
findings is the absence of platypus genes
for the epididymal-specific
proteins that have been implicated in
sperm maturation and storage
in other mammals. The most abundant
secreted protein in the
platypus epididymis is a lipocalin, the
homologues of which are the
most secreted proteins in the reptilian
epididymis41. Notably, ADAM7,
a protease that is secreted in the
epididymis of eutherians,
has an orthologue in the platypus. This is
a bona fide protease with a
characteristic Zn2+-coordinating
sequence HExxH in the
platypus, in the opossum and the tree shrew
(Tupaia belangeri). However,
loss of its proteolytic activity is predicted
in eutherians42 owing to a
single point mutation within its
active site (E to Q).
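The HExxH criterion used above to judge whether ADAM7 is a bona fide protease can be sketched as a simple motif search. The peptide fragments below are hypothetical and serve only to show how an E to Q substitution within the motif removes the match; they are not ADAM7 sequences.

```python
import re

# Zinc-coordinating motif: His-Glu-x-x-His, where x is any residue.
HEXXH = re.compile(r"HE..H")


def has_zinc_site(protein):
    """Return True if the sequence contains an intact HExxH motif."""
    return HEXXH.search(protein) is not None


# Hypothetical fragments for illustration only.
active_like = "TMAHELGHNLG"
mutated = active_like.replace("HELGH", "HQLGH")  # E -> Q within the motif

print(has_zinc_site(active_like), has_zinc_site(mutated))
```

A motif match is of course only a necessary condition; proteolytic activity would still have to be tested experimentally.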
Lactation and dentition. Lactation
is an ancient reproductive trait
whose origin predates the
origin of mammals. It has been proposed
that early lactation evolved
as a water source to protect porous
parchment-shelled eggs from
desiccation during incubation43 or as
a protection against
microbial infection. Parchment-shelled egg-laying
monotremes also exhibit a
more ancestral glandular mammary
patch or areola without a
nipple that may still possess roles in egg
protection. However, in
common with all mammals, the milk of
monotremes has evolved
beyond primitive egg protection into a true
milk that is a rich
secretion containing sugars, lipids and milk proteins
with nutritional,
anti-microbial and bioactive functions. In a
reflection of this eutherian
similarity, platypus casein genes are tightly
clustered together in the
genome, as they are in other mammals,
although platypus contains a
recently duplicated β-casein gene
(Supplementary Fig. 2).
Mammalian casein genes are
thought to have originally arisen by
duplication of either
enamelin or ameloblastin44, both of which are
tooth enamel matrix protein
genes that are located adjacent to the
casein gene cluster in
eutherians and, we find, also in platypus. Adult
platypuses, as well as
echidnas, lack teeth but the conservation of
these enamel protein genes
is consistent with the presence of teeth
and enamel in the juvenile,
as well as in fossil platypuses45.
Venom. Only a handful of
mammals are venomous, but the male
platypus is unique among
them in delivering its poison not via a bite
but from hind-leg spurs. Despite the obvious
difficulties in obtaining
samples, it is now known
that platypus venom is a cocktail of at least
19 different substances46 including
defensin-like peptides (vDLPs),
C-type natriuretic peptide
(vCNP) and nerve growth factor (vNGF).
When analysed
phylogenetically and mapped to the platypus genome
assembly, these sequences are
revealed to have arisen from local
duplications of genes
possessing very different functions (Fig. 4).
Notably, duplications in
each of the β-defensin, C-type natriuretic
peptide and nerve growth
factor gene families have also occurred
independently in reptiles
during the evolution of their venom47.
Convergent evolution has
thus clearly occurred during the independent
evolution of reptilian and
monotreme venom48.
Immunity. Although the major
organs of the monotreme immune
system are similar to those
of other mammals49, the repertoire of
immunity molecules shows
some important differences from those
of other mammals. In
particular, the platypus genome contains at
least 214 natural killer
receptor genes (Supplementary Notes 18)
within the natural killer complex,
a far larger number than for human
(15 genes50), rat (45 genes50)
or opossum (9 genes51).
Both platypus and opossum
genomes contain gene expansions in
the cathelicidin
antimicrobial peptide gene family (Supplementary
Fig. 3). Among eutherians,
primates and rodents have a single cathelicidin
gene52,53, whereas sheep and
cows have numerous genes that
have been duplicated only
recently54. The expanded repertoire of
cathelicidin genes in both
marsupials and monotremes may arm their
immunologically naive young
with a diverse arsenal of innate
immune responses. In
eutherians, with their increases in length of
gestation and advances in
development in utero of their immune
systems, the diversity of
antimicrobial peptide genes may have
become less critical. The
platypus genome also contains an expansion
in the macrophage
differentiation antigen CD163 gene family
(Supplementary Notes 18).
Genome landscape
First, we analyse the
phylogenetic position of platypus and confirm
that marsupials and
eutherians are more closely related than either is
to monotremes (Supplementary
Notes 19). We then describe platypus
chromosomes and observe some
properties of platypus interspersed
and tandem repeats. We also
discuss a potential relationship between
interspersed repeats and
genomic imprinting and investigate how the
extremely high G+C fraction in
platypus affects the strong association
seen in eutherians between
CpG islands and gene promoters.
Platypus chromosomes. Platypus
chromosomes provide clues to the
relationship between mammal
and reptile chromosomes, and to the
origins of mammal sex
chromosomes and dosage compensation. Our
analysis provides further
insight with the following findings: the 52
platypus chromosomes show no
correlation between the position of
orthologous genes on the
small platypus chromosomes and chicken
microchromosomes; for the
unique 5X chromosomes of platypus we
reveal considerable sequence
alignment similarity to chicken Z and no
orthologous gene alignments
to human X, implying that the platypus X
chromosome evolved directly
from a bird-like ancestral reptilian system55;
and the genes on the
five platypus X chromosomes appear to be
partially dosage compensated
(Supplementary Fig. 5), perhaps parallel
to the incomplete dosage
compensation recently described in birds56.
Repeat elements. About
one-half of the platypus genome consists of
interspersed repeats derived
from transposable elements. The most
abundant and still active
repeats are (severely truncated) copies of the
5-kb
long-interspersed-element (LINE2) and its non-autonomous
SINE-companion
mammalian-wide interspersed repeat (MIR,
Mon-1 in monotremes) that
became extinct in marsupials and in
eutherians 60–100 Myr ago.
We estimate that there are 1.9 and 2.75
million copies of LINE2 and
MIR/Mon-1, respectively, in the 2.3-Gb
platypus genome. DNA transposons and LTR
retroelements are quite
rare in platypus, but there
are thousands of copies of an ancient
gypsy-class LTR element (all
LTR elements previously identified in
mammals, birds, or reptiles
belong to the retrovirus clade). Overall,
the frequency of
interspersed repeats (over 2 repeats per kb) is
higher than in any
previously characterized metazoan genome.
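As a rough consistency check, the copy numbers and genome size quoted above already account for a density of about two repeats per kilobase. The calculation below counts only LINE2 and MIR/Mon-1, so it should be read as a lower bound rather than as the method used for the stated figure; the variable names are our own.

```python
# Figures as quoted in the text.
line2_copies = 1.9e6
mir_mon1_copies = 2.75e6
genome_size_bp = 2.3e9

genome_size_kb = genome_size_bp / 1e3
density_per_kb = (line2_copies + mir_mon1_copies) / genome_size_kb

print(f"{density_per_kb:.2f} LINE2 + MIR/Mon-1 copies per kb")
```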
Population analysis using
LINE2/Mon-1 elements distinguished
the Tasmanian population
from three other mainland clusters
(Supplementary Fig. 4a, b),
in good agreement with tree-based
analysis, physical proximity
and previous knowledge of platypus
population relationships57.
Cluster analysis of all
LINE2 copies revealed a phylogenetic relationship
lacking branches, as if a
single-locus, fast-evolving gene has
steadily spread an
exceptional number of pseudogenes over time
(Supplementary Fig. 6). This
‘master gene’ appearance is, to a lesser
degree, also observed for
LINE1 in eutherians58, but not to the same
extent for MIR/Mon-1 or
other retrotransposons in mammals. The
phylogeny of LINE2 and Mon-1
was also supported by a genome-wide
transposition-in-transposition
(TinT) analysis59 (Supplementary
Tables 7 and 8). LINE2
density is similar on all chromosomes
(Supplementary Fig. 7); it
does not correlate with chromosome
length (and recombination
rate) as the CR1 LINE density does in
the chicken genome19, nor is
it higher on sex chromosomes than on
autosomes, as LINE1 density
is in eutherians (which has led to postulations
on a function in dosage
compensation)60.
We compared microsatellites
in the platypus genome with those of
representative vertebrates
(Supplementary Notes 22). The mean
microsatellite coverage of
platypus genomic sequences assembled
into chromosomes is 2.67 ± 0.34%,
significantly lower than in all other
mammalian genomes sequenced
so far and most similar to that
observed in chicken
(Supplementary Fig. 8). Microsatellites are on
average shorter in platypus
than in other genomes (Supplementary
Table 9), but microsatellite
coverage surpasses chicken owing to very
long tri- and
tetranucleotide repeats (Supplementary Fig. 9). The
platypus has a higher
proportion of microsatellites with high A+T
content, in comparison to
the other vertebrates examined, an abundance
distribution that has more
in common with reptiles than with
mammals (Supplementary Fig.
10).
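To make the notion of microsatellite coverage concrete, the sketch below detects perfect tandem repeats with a regular expression and reports the fraction of sequence they cover. The definition used here (a unit of 1–6 nucleotides repeated at least four times, with no mismatches) is an assumption chosen for illustration and is not necessarily the definition applied in Supplementary Notes 22; the toy sequence is hypothetical.

```python
import re

# Assumed definition: a 1-6 nt unit repeated at least 4 times in tandem.
# The lazy quantifier makes the regex prefer the shortest repeating unit.
MICROSATELLITE = re.compile(r"([ACGT]{1,6}?)\1{3,}")


def microsatellite_spans(seq):
    """Return (start, end) spans of non-overlapping perfect tandem repeats."""
    return [match.span() for match in MICROSATELLITE.finditer(seq.upper())]


def microsatellite_coverage(seq):
    """Fraction of the sequence covered by the detected repeats."""
    if not seq:
        return 0.0
    covered = sum(end - start for start, end in microsatellite_spans(seq))
    return covered / len(seq)


# Hypothetical toy sequence: a (CA)6 and an (AAT)4 repeat with short flanks.
toy = "TTGC" + "CA" * 6 + "GGTC" + "AAT" * 4 + "GC"
spans = microsatellite_spans(toy)
coverage = microsatellite_coverage(toy)
print(spans, f"{coverage:.1%}")
```

Genome-scale analyses typically also allow imperfect repeats and apply unit-specific length thresholds, which this sketch deliberately omits.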
Genomic imprinting. Genomic
imprinting is an epigenetic phenomenon
that results in monoallelic
gene expression. In the vertebrates,
imprinting seems to have
evolved recently and has only been
confirmed in marsupials and
eutherian mammals61,62. The autosomal
localization of some
imprinted orthologues in platypus is known63.
However, we examined the
conservation of synteny and the distribution
of retrotransposed elements
in all orthologous eutherian-imprinted
clustered and non-clustered
genes in the platypus genome.
A representative cluster is
shown in Fig. 5 (see also Supplementary
Fig. 12).
Clusters that became
imprinted in therians (with the exception
of the Prader–Willi–Angelman
locus64) have not been assembled
recently and reside in
ancient syntenic mammalian groups, although
some regions have expanded
by mechanisms such as gene duplication
or transposition. There were
significantly fewer LTR and DNA
elements across all platypus
orthologous regions relative to eutherian
imprinted genes (P < 0.04 and
P < 0.04, respectively), whereas there was
a significant increase in
the sequences masked by SINEs (P < 0.03).
The chicken had fewer total
repeats and no SINEs or sRNAs.
Comparison of all regions in
the platypus with the orthologous
regions in opossum, mouse,
dog and human demonstrates that accumulation
of LTR, DNA elements, and
simple and low complexity
repeats coincides with, and
may be a driving force in, the acquisition
of imprinting in these
regions in therian mammals.
The CpG fraction. The
eutherian and chicken genomes generally
average around 41% G+C
content, although many intervals differ
substantially from the
average, particularly in humans (Supplementary
Notes 23). In contrast, the
platypus genome averages
45.5% G+C content and rarely
deviates far from the average. The
opossum genome averages only
38% G+C content and also has a
narrow distribution
(Supplementary Fig. 13). The source of the elevated
G+C fraction in platypus
remains unclear. It is explained only
in part by monotreme
interspersed repeat elements, as platypus DNA
outside of known
interspersed repeats is 44.7% G+C. Furthermore,
tandem repeats of short DNA
motifs (microsatellites) in platypus
show an A+T bias, as with
other mammals. Recombination-driven
biased gene conversion may
be a factor, in agreement with what has
been shown for eutherians65 and
marsupials66. This is suggested by
the observation that the six
platypus chromosomes where the currently
mapped DNA sequence averages
over 45% G+C content (that
is, 17, 20, 15, 14, 10 and
11 in order of decreasing G+C fraction) are
among the 10 shortest
(Supplementary Fig. 14), together with the general
principle that short chromosomes
have a higher recombination rate67. However, a
direct test
is currently lacking because
platypus recombination rates have not
been measured. A further
examination of the CpG fraction, that
associated with promoter
elements, is found in Supplementary
Notes 24 and Supplementary
Fig. 15.
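The two quantities discussed in this section, the genome-wide G+C average and how far individual intervals deviate from it, can be computed as in the following sketch. The window size, the function names and the toy sequence are assumptions for illustration; real analyses would use far larger intervals than the 8-nucleotide windows shown here.

```python
def gc_fraction(seq):
    """G+C fraction of a sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def windowed_gc(seq, window):
    """G+C fraction of consecutive non-overlapping windows of fixed size."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Hypothetical toy sequence made of three 8-nt blocks.
toy = "GCGCATAT" + "GGCCGGCC" + "ATATATGC"
overall = gc_fraction(toy)
per_window = windowed_gc(toy, 8)
print(f"overall {overall:.1%}", per_window)
```

A narrow spread of the per-window values around the overall mean corresponds to the behaviour described for platypus and opossum, whereas a wide spread corresponds to that described for human.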
Conclusions
The egg-laying platypus is a
remarkable species with many biological
features unique among
mammals. Our sequencing of the
platypus genome now enables
us to compare its sequence characteristics
and organization with those
of birds and therian mammals
in order to address the
questions of platypus biology and to
date the emergence of
mammalian traits. We report here that
sequence characteristics of
the platypus genome show features of
reptiles as well as mammals.
Platypus contains a largely
standard repertoire of non-protein-coding
RNAs, except for the
snoRNAs, which exhibit a marked
expansion associated with at
least one retrotransposed subfamily.
Some of these
retrotransposed snoRNAs are expressed and thus
may have functional roles.
The platypus has fully elaborated
piRNA and miRNA pathways,
the latter including many
monotreme-specific miRNAs and miRNAs
that are shared with either
mammals or chickens. Many
functional assessments of these novel
miRNAs remain to be carried
out and will surely add to our knowledge
of mammalian miRNA
evolution.
The 18,527 protein-coding
genes predicted from the platypus
assembly fall within the
range for therian genomes. Of particular
interest are families of
genes involved in biology that links
monotremes to reptiles, such
as egg-laying, vision and envenomation,
as well as mammal-specific
characters such as lactation,
characters shared with
marsupials such as antibacterial proteins,
and platypus-specific
characters such as venom delivery and underwater
foraging. For instance,
anatomical adaptations for chemoreception
during underwater foraging
are reflected in an unusually
large repertoire of
vomeronasal type 1 receptor genes. However,
the repertoire of milk
protein genes is typically mammalian, and
the arrangement of milk
protein genes seems to have been preserved
since the last common
ancestor of monotremes and therian
mammals.
Since its initial
description, the platypus has stood out as a species
with a blend of reptilian
and mammalian features, which is a characteristic
that penetrates to the level
of the genome sequence. The
density and distribution of
repetitive sequence, for example, reflect
this fact. The high
frequency of interspersed repeats in the platypus
genome, although typical for
mammalian genomes, is in contrast
with the observed mean
microsatellite coverage, which appears more
reptilian. Additionally, the
correlation of parent-of-origin-specific
expression patterns in
regions of reduced interspersed repeats in
the platypus suggests that
the evolution of imprinting in therians is
linked to the accumulation
of repetitive elements.
We find that the mixture of
reptilian, mammalian and unique
characteristics of the
platypus genome provides many clues to the
function and evolution of
all mammalian genomes. The wealth of
new findings and
confirmation of existing knowledge immediately
evident from the release of these data promise
that the availability of
the platypus genome sequence
will provide the critically needed
background to inspire rapid
advances in other investigations of
mammalian biology and evolution.