Data Sources & Licenses
Every piece of data on SENTINEL is credited. This page lists every upstream source, its license, and how we attribute it.
Our attribution policy
We follow each source's license requirements to the letter. Every page that uses data from a source includes a visible citation, a link to the original, and the license name. Where a source requires share-alike (such as OpenStreetMap or Wikipedia), we either do not derive content from it or we release the derivative under the matching license. Where a source is public domain, we still credit it voluntarily, because transparency is the whole point.
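As an illustration only, the policy above could be modeled roughly as follows. This is a minimal sketch, not SENTINEL's actual implementation: the Source record, the attribution_line and derivative_license helpers, the "https://www." URL prefixes, and the record ID in the usage lines are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical record describing one upstream data source."""
    name: str
    license_name: str
    homepage: str
    share_alike: bool = False  # True for licenses such as ODbL or CC BY-SA


def attribution_line(source: Source, record_url: str) -> str:
    """Build the visible citation: source name, link to the original, license name."""
    return (
        f"Source: {source.name} | "
        f"Original: {record_url} | "
        f"License: {source.license_name}"
    )


def derivative_license(source: Source, default_license: str) -> str:
    """Share-alike sources force the matching license onto any derivative."""
    return source.license_name if source.share_alike else default_license


# Example values; the URLs and the record ID are assumptions for illustration.
GEONAMES = Source("GeoNames", "CC BY 4.0", "https://www.geonames.org")
OSM = Source("OpenStreetMap", "ODbL", "https://www.openstreetmap.org", share_alike=True)

print(attribution_line(GEONAMES, "https://www.geonames.org/0000000"))
print(derivative_license(OSM, default_license="CC BY 4.0"))
```

Keeping the share-alike flag on the source record means the "matching license" rule is decided once per source rather than on every page that uses it.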
Active sources
GeoNames
Used for: Places module — 854,104 cities, towns, and administrative areas worldwide
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Homepage: geonames.org
Attribution: Every place page on SENTINEL links back to the GeoNames record and credits GeoNames in the page footer.
Project Gutenberg
Used for: Books module — 76,883 classic works in the public domain (US)
License: Public domain in the United States, with the Project Gutenberg trademark policy observed
Homepage: gutenberg.org
Attribution: Every book page links to the original Gutenberg record and includes the Gutenberg ID, author, original publication date, and a "Read on Project Gutenberg" outbound link.
DMOZ / Open Directory Project (historical)
Used for: Seed taxonomy for the directory module
License: CC BY 3.0 Unported (last snapshot 2017; DMOZ closed that year)
Attribution: The directory module honors the spirit of DMOZ as a human-curated web catalog and credits it as the conceptual ancestor.
Planned sources (not yet ingested)
OpenStreetMap POIs
Planned for: Places enrichment — points of interest around cities
License: Open Database License (ODbL) — share-alike applies to the database
Status: Not yet ingested. Will be added with full ODbL-compliant attribution.
USPTO Bulk Patent Data
Planned for: Patents module — 11 million granted US patents
License: Public domain (US Government work)
Status: Planned
OpenAlex
Planned for: Academic papers and authors
License: CC0 1.0 Universal (public domain dedication)
Status: Planned
PubMed
Planned for: Medical research abstracts
License: Generally free for reuse; some abstracts are under publisher copyright, and we observe those per-publisher restrictions
Status: Planned
UK Companies House
Planned for: Company profiles (UK)
License: Open Government Licence v3.0
Status: Planned
Sources we will not use
The following types of content are excluded from SENTINEL by policy, regardless of availability:
- Copyrighted news articles and paywalled journalism
- Song lyrics and music metadata
- Movie and TV synopses from proprietary databases
- Recipes sourced from food blogs
- Any content scraped from Reddit, Twitter/X, Quora, or Stack Overflow in violation of their terms of service
- Any medical, legal, or financial advice ("Your Money or Your Life", or YMYL, categories)
- Any content under a license incompatible with our redistribution model
Want to suggest a source?
If you know of a permissively licensed dataset that belongs on SENTINEL, email [email protected] with the source URL and license. We review suggestions monthly.