Two concerns share one chapter because they share one source of truth: the
content tree, public_origin, and the supported-language list. Every page
emits per-language metadata in <head> for crawlers fetching that URL, and a
background writer drops a sitemap.xml that agrees with what those crawlers
see.

The SEO block lives in templates/base/base.html, inside a single
-[ if canonical ]- guard so the entire group either renders or vanishes.
In order of emission:

- <link rel="canonical">: the bare URL for this resource.
- <link rel="alternate" hreflang="…">: one per supported language, plus
  x-default. Driven by the alternates template var.
- <meta property="og:type"> (website), og:title, og:description,
  og:url (same value as canonical), og:site_name, og:locale.
- <meta property="og:locale:alternate">: one per alternate, filtered by
  alt["is_alt"] so the current language isn’t duplicated.
- <meta name="twitter:card"> (summary_large_image), twitter:title,
  twitter:description.
- <meta property="og:image"> and <meta name="twitter:image">, both
  gated on -[ if og_image ]- so a resource with no usable image just
  omits the pair instead of emitting an empty attribute.
- <script type="application/ld+json"> carrying a BreadcrumbList,
  gated on -[ if breadcrumb_jsonld ]-.

The single -[ if canonical ]- block wraps everything from canonical
through twitter:image. When public_origin is unset, canonical_url in
src/content.rs returns an empty string, so the whole block is suppressed
rather than emitting tags with garbage URLs. og:image / twitter:image
have their own inner -[ if og_image ]- because a page may have a valid
canonical without a usable image. The JSON-LD <script> sits under a
separate outer guard: breadcrumbs work fine without a public origin,
just with relative item URLs.
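
The off-switch behaviour of canonical_url can be sketched as follows. This is a minimal illustration, not the code in src/content.rs: the origin is passed as an explicit parameter here (the real function presumably reads public_origin from configuration), and seo_block_enabled is a hypothetical helper that stands in for the template's -[ if canonical ]- guard, assuming the guard treats an empty string as false.

```rust
/// Sketch of canonical_url. The `origin` parameter is an assumption made
/// to keep the example self-contained.
fn canonical_url(origin: &str, path: &str) -> String {
    if origin.is_empty() {
        // An empty return value is what switches the SEO block off.
        return String::new();
    }
    format!("{}{}", origin, path)
}

/// Hypothetical stand-in for the template guard: the block renders
/// only when the canonical URL is non-empty.
fn seo_block_enabled(canonical: &str) -> bool {
    !canonical.is_empty()
}
```

With an origin of https://example.com and a path of /docs/intro, the sketch yields https://example.com/docs/intro; with an empty origin it yields "" and the guard evaluates to false.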

The helper functions all live in src/content.rs:

- canonical_url(path): format!("{}{}", origin, path), or "" when
  origin is empty. The empty return is the off-switch for the entire SEO
  block.
- og_image_for(header, resource_path, lang): the precedence is
  Header.banner for the current language, then Header.icon if absolute
  (/ or http-prefixed), then default_og_image. Relative values are
  resolved against public_origin; if no absolute URL can be formed, it
  returns None and the template suppresses the tag.
- hreflang_alternates(path, current_lang): returns an empty list when
  public_origin is empty. Otherwise emits one entry per supported
  language. The first entry in SUPPORT_LANGS is treated as the default
  language and gets the bare URL <origin><path>; every other language
  gets <origin>/<code><path>. Each entry carries
  is_alt: code != current_lang so the og:locale:alternate loop can skip
  the active language. Finally, an x-default entry pointing at the bare
  URL is appended.
- breadcrumb_jsonld(breadcrumb): flattens a path_value()-shaped array
  into {"@context": "https://schema.org", "@type": "BreadcrumbList",
  "itemListElement": [...]} with position, name, and item per step.

html_escape covers &, <, >, ", '. It runs over every
template-bound string (canonical, alternates’ href, og:image, breadcrumb
names) before akari sees it, because akari does not auto-escape.
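
The hreflang_alternates rules described above can be sketched as follows. Several details are assumptions for illustration only: the Alternate struct and its field names (only is_alt is named in the text), the three placeholder language codes in SUPPORT_LANGS, the explicit origin parameter, and the choice to give the x-default entry is_alt = false.

```rust
/// Assumed shape of one alternates entry; the field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Alternate {
    hreflang: String,
    href: String,
    is_alt: bool,
}

/// Placeholder language list. The first entry is the default language.
const SUPPORT_LANGS: [&str; 3] = ["en", "ja", "zh"];

fn hreflang_alternates(origin: &str, path: &str, current_lang: &str) -> Vec<Alternate> {
    // No public origin means no absolute URLs can be formed.
    if origin.is_empty() {
        return Vec::new();
    }
    let bare = format!("{}{}", origin, path);
    let mut out = Vec::new();
    for (i, code) in SUPPORT_LANGS.iter().enumerate() {
        // The default language gets the bare URL; the rest get a /<code> prefix.
        let href = if i == 0 {
            bare.clone()
        } else {
            format!("{}/{}{}", origin, code, path)
        };
        out.push(Alternate {
            hreflang: code.to_string(),
            href,
            is_alt: *code != current_lang,
        });
    }
    // x-default points at the bare URL. Its is_alt value is an assumption.
    out.push(Alternate {
        hreflang: "x-default".to_string(),
        href: bare,
        is_alt: false,
    });
    out
}
```

For the path /docs viewed in ja, the sketch produces four entries, and only the ja entry among the language entries has is_alt set to false.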

breadcrumb_jsonld does its own escaping with a nested json_escape
that additionally handles < → \u003c, > → \u003e, & → \u0026,
and U+2028 / U+2029. The HTML chars matter because the JSON-LD body sits
inside <script type="application/ld+json">: a breadcrumb name
containing </script> would otherwise break out of the script element.
U+2028 / U+2029 are line separators that historically break inline JS
strings; escaping them is belt-and-suspenders against future inline-JS
rendering. The breadcrumb_html_escape copy used for the visible
breadcrumb <ol> is kept separate; passing the HTML-escaped copy into
breadcrumb_jsonld would double-escape.
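
A script-safe JSON string escaper along these lines might look like the sketch below. It is not the nested function from src/content.rs. It assumes the HTML-significant characters are written as JSON \uXXXX escapes, which is the only form JSON allows for them, and it also covers the standard JSON escapes for quotes, backslashes, and control characters.

```rust
/// Hypothetical script-safe JSON string escaper (contents only, no
/// surrounding quotes).
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            // Standard JSON escapes.
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // HTML-significant characters, so "</script>" cannot close
            // the enclosing script element.
            '<' => out.push_str("\\u003c"),
            '>' => out.push_str("\\u003e"),
            '&' => out.push_str("\\u0026"),
            // Line separators that historically broke inline JS strings.
            '\u{2028}' => out.push_str("\\u2028"),
            '\u{2029}' => out.push_str("\\u2029"),
            // Remaining control characters must be escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

A JSON parser decodes \u003c back to <, so consumers of the JSON-LD see the original name while the HTML parser never sees a literal closing tag.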

src/routes/sitemap.rs runs a background tokio task spawned from
main.rs via routes::sitemap::spawn_writer(). The task:

1. Preloads programfiles/op/sitemap.xml (the SITEMAP_FILE
   const) into the in-process CACHE, so the public endpoint always has
   something to serve during the first build.
2. Calls refresh() immediately, then loops every REGEN_INTERVAL_SECS
   (3600 seconds = one hour).

CACHE is Lazy<RwLock<Arc<String>>> so the /sitemap.xml endpoint can
read-lock briefly, clone the inner Arc, drop the lock, and respond.
Under a request burst the per-request cost is bounded to one Arc clone
plus one String clone for the body, with no filesystem read and no
content-tree walk.
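
The read path described above can be sketched with the standard library alone. This example substitutes std::sync::OnceLock for the Lazy wrapper so it needs no external crate; store and snapshot are hypothetical names, not functions from src/routes/sitemap.rs.

```rust
use std::sync::{Arc, OnceLock, RwLock};

// Stand-in for the Lazy<RwLock<Arc<String>>> cache, using OnceLock so the
// sketch compiles with std only.
static CACHE: OnceLock<RwLock<Arc<String>>> = OnceLock::new();

fn cache() -> &'static RwLock<Arc<String>> {
    CACHE.get_or_init(|| RwLock::new(Arc::new(String::new())))
}

/// Writer side (hypothetical name): swap in a freshly built sitemap body.
fn store(xml: String) {
    let fresh = Arc::new(xml);
    *cache().write().unwrap() = fresh;
}

/// Reader side (hypothetical name): hold the read lock only long enough
/// to clone the Arc, then release it before building a response.
fn snapshot() -> Arc<String> {
    let guard = cache().read().unwrap();
    let body = Arc::clone(&guard);
    drop(guard);
    body
}
```

Because readers hold their own Arc, a snapshot taken before a refresh keeps pointing at the old body until it is dropped, and the writer never waits on a slow client.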

persist writes to SITEMAP_FILE.tmp and then renames it over the real
path. Readers (either the writer’s own preload on next restart, or an
admin running cat) never see a partially-written XML file.
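
The write-then-rename pattern can be sketched as below. The signature is an assumption: the target path is passed in rather than read from the SITEMAP_FILE const, and the sketch does not fsync, which the real function may or may not do.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Sketch of an atomic persist: write the full body to "<target>.tmp",
/// then rename it over the real path.
fn persist(target: &Path, xml: &str) -> io::Result<()> {
    // Keep the temp file in the same directory so the rename stays on
    // one filesystem.
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    fs::write(&tmp, xml)?;
    // rename replaces the destination in one step, so readers see either
    // the old file or the new one.
    fs::rename(&tmp, target)
}
```

If the process dies between the two steps, the previous sitemap.xml is left untouched and only a stale .tmp file remains.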

render_url_entry

Each <url> block contains a <loc> for the default-language URL and
one <xhtml:link rel="alternate" hreflang="…"> per supported language,
followed by an x-default link. Default-lang URLs are bare; non-default
URLs carry the /<code> prefix. This mirrors hreflang_alternates so
that what crawlers see in a page’s <head> agrees with what they see in
the sitemap. Paths come from collect_paths, which walks
programfiles/content/ and emits every folder that has a ctx.json,
except type = "link" folders, which 302-redirect off-site and aren’t
worth indexing.
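
The shape of one <url> entry can be sketched as follows. The signature, the indentation, and the placeholder SUPPORT_LANGS values are assumptions. The sketch also assumes the origin and path contain no characters that need XML escaping, which real code should not rely on.

```rust
/// Placeholder language list. The first entry is the default language.
const SUPPORT_LANGS: [&str; 3] = ["en", "ja", "zh"];

/// Sketch of render_url_entry: one <loc>, one alternate per language,
/// then x-default.
fn render_url_entry(origin: &str, path: &str) -> String {
    let bare = format!("{}{}", origin, path);
    let mut xml = String::new();
    xml.push_str("  <url>\n");
    xml.push_str(&format!("    <loc>{}</loc>\n", bare));
    for (i, code) in SUPPORT_LANGS.iter().enumerate() {
        // Same rule as the <head> alternates: bare URL for the default
        // language, /<code> prefix for every other language.
        let href = if i == 0 {
            bare.clone()
        } else {
            format!("{}/{}{}", origin, code, path)
        };
        xml.push_str(&format!(
            "    <xhtml:link rel=\"alternate\" hreflang=\"{}\" href=\"{}\"/>\n",
            code, href
        ));
    }
    xml.push_str(&format!(
        "    <xhtml:link rel=\"alternate\" hreflang=\"x-default\" href=\"{}\"/>\n",
        bare
    ));
    xml.push_str("  </url>\n");
    xml
}
```

Sharing the URL rule between this function and the per-page alternates is what keeps the sitemap and the <head> metadata in agreement.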

robots.txt

programfiles/op/robots.txt is checked-in static content, not generated.
It’s a minimal allow-all (User-agent: * / Disallow:) plus a Sitemap:
line pointing at /sitemap.xml on the running origin. Update it by hand
when the public origin changes; it’s served as-is.

- public_origin: gates the whole SEO block, drives og:url and
  canonical, and prefixes every sitemap entry. When unset, both the
  per-page SEO block and the sitemap body go away (build_xml returns an
  empty string).
- default_og_image: fallback used by og_image_for when no banner /
  absolute icon is configured.

See Language system for the language-prefix scheme used by
hreflang and sitemap alternates, Backup writer for the parallel
writer / atomic-write pattern, and Common: Header and LangDict
for how og:image, name, and desc resolve per language.