API Reference
Types
PaperFetch.BibEntry — Type
BibEntry(key, type, fields)
Stable internal representation of one BibTeX entry.
fields stores lower-case BibTeX-style field names mapped to string values. The input file is never edited; BibEntry is only an analysis view.
Example
entry = BibEntry("smith2020", "article", Dict("doi" => "10.1000/example"))
entry.key
PaperFetch.WorkIdentifier — Type
WorkIdentifier(kind, value)
A normalized identifier extracted from a bibliography entry.
kind is one of :doi, :isbn, :url, :arxiv, :pmid, or :openalex. value is the normalized identifier string.
Example
id = WorkIdentifier(:doi, "10.1000/example")
id.kind == :doi
PaperFetch.CandidateSource — Type
CandidateSource(record, identifier)
A possible authority for a bibliography entry, recording which WorkIdentifier was used to find it.
Example
r = SourceRecord(provider="test", id="x")
id = WorkIdentifier(:doi, "10.1000/x")
cs = CandidateSource(r, id)
cs.identifier.kind == :doi
PaperFetch.SourceRecord — Type
SourceRecord(; provider, id="", title=nothing, authors=String[], year=nothing, doi=nothing,
    url=nothing, journal=nothing, pages=nothing, publisher=nothing, pdf_url=nothing,
    raw=Dict{String,Any}())
Metadata about a work returned by an API, fixture, or landing-page adapter.
authors stores the creator list returned by the provider. For edited books and book chapters this may be compared with a BibTeX editor field when the entry has no author field.
Example
source = SourceRecord(provider="fixture", doi="10.1000/example", title="Example")
source.provider
PaperFetch.FieldComparison — Type
FieldComparison(field, status, input, source, note)
Result of comparing one bibliography field with source metadata.
status is one of :exact, :normalized, :equivalent, :missing_input, :missing_source, :conflict, or :ambiguous.
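As a rough illustration of how some of these statuses relate, the sketch below classifies one input/source pair. The function `sketch_status` and its `norm` keyword are hypothetical and not part of PaperFetch; `:equivalent` and `:ambiguous` are left out because the source does not say how they are decided.

```julia
# Hypothetical classifier, not the PaperFetch implementation.
# `norm` stands in for a field-specific normalizer such as normalize_doi.
function sketch_status(input, source; norm=lowercase)
    input === nothing && return :missing_input
    source === nothing && return :missing_source
    input == source && return :exact                  # identical as written
    norm(input) == norm(source) && return :normalized # equal after normalization
    return :conflict                                  # genuinely different values
end

sketch_status("10.1000/X", "10.1000/x")  # differs only by case
sketch_status("10.1000/x", nothing)      # the source has no value
```

Checking for a literal match before the normalized one keeps `:exact` distinct from `:normalized`, so a report can show which fields matched only after cleanup.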
Example
cmp = FieldComparison("doi", :exact, "10.1000/x", "10.1000/x", "same DOI")
cmp.status
PaperFetch.EntryReport — Type
EntryReport(entry, sources, comparisons, confidence, notes, pdf_candidates)
Review result for a single bibliography entry.
notes contains entry-level diagnostics such as provider errors, discarded candidate sources, and warnings about self-comparison fallback. Field-level diagnostics live in comparisons.
Example
entry = BibEntry("x", "misc", Dict("title" => "Example"))
report = EntryReport(entry, SourceRecord[], FieldComparison[], 0.0, ["no source"], String[])
report.entry.key
PaperFetch.FetchResult — Type
FetchResult(key, status, file, source_url, final_url, note, sha256, bytes)
Manifest record for one PDF fetch attempt.
Example
result = FetchResult("x", "skipped", nothing, nothing, nothing, "no PDF", nothing, 0)
result.status
Input And Normalization
PaperFetch.read_bibtex — Function
read_bibtex(path; check=:warn)
Read a BibTeX file into BibEntry values using BibParser.jl.
Entries are returned sorted by key for stable, reproducible ordering.
Example
entries = read_bibtex("examples/01_exact_article.bib"; check=:none)
length(entries) >= 1
PaperFetch.read_items — Function
read_items(path; check=:warn)
Read bibliography input. BibTeX files are parsed with BibParser; plain text files are interpreted as one DOI or URL per non-comment line.
Item keys for plain-text input are item1, item2, … in line order, skipping blank lines and comments.
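The key numbering for plain-text input can be pictured with the sketch below. `sketch_plain_items` is a hypothetical helper, not the PaperFetch reader, and it assumes that comment lines start with `#`, which the text above does not specify.

```julia
# Hypothetical helper, not part of PaperFetch: assign item1, item2, ... keys
# to the non-blank, non-comment lines of a plain-text identifier list.
# Assumption: a comment line starts with '#'.
function sketch_plain_items(text::AbstractString)
    items = Pair{String,String}[]
    for line in eachline(IOBuffer(text))
        value = strip(line)
        (isempty(value) || startswith(value, "#")) && continue
        # Keys count kept lines only, so skipped lines leave no gaps.
        push!(items, "item$(length(items) + 1)" => String(value))
    end
    return items
end

items = sketch_plain_items("""
# reading list
10.1000/example

https://example.org/paper
""")
first.(items)  # ["item1", "item2"]
```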
Example
items = read_items("examples/11_plain_dois.txt"; check=:none)
length(items) == 2
PaperFetch.extract_identifiers — Function
extract_identifiers(entry)
Extract normalized WorkIdentifier values from a BibEntry.
Checks for DOI, arXiv eprint (when archiveprefix is arXiv), ISBN, PMID, and URL fields in that priority order. DOI-like strings, arXiv identifiers, and URLs are also recovered from common misplaced fields such as note, howpublished, and LaTeX \url{...} macros.
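Recovery from misplaced fields might look roughly like the following. This is a minimal sketch covering DOIs only; `sketch_find_doi`, the field search order, and the DOI pattern are assumptions rather than the package's actual rules.

```julia
# Hypothetical sketch, not the PaperFetch implementation.
# Assumption: a DOI looks like "10.<4-9 digits>/<suffix>", where the suffix
# contains no whitespace or braces.
function sketch_find_doi(fields::Dict{String,String})
    pattern = r"10\.\d{4,9}/[^\s{}]+"
    # Prefer the dedicated field, then fall back to commonly misused ones.
    for name in ("doi", "url", "note", "howpublished")
        haskey(fields, name) || continue
        m = match(pattern, fields[name])
        m === nothing && continue
        # Trailing sentence punctuation is not part of the identifier.
        return lowercase(rstrip(m.match, ['.', ',', ';']))
    end
    return nothing
end

fields = Dict("title" => "Example",
              "note" => "Available at \\url{https://doi.org/10.1000/ABC}.")
sketch_find_doi(fields)  # "10.1000/abc"
```

Excluding braces from the suffix is what lets the match stop at the closing brace of a `\url{...}` macro.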
Example
entry = BibEntry("x", "article", Dict("doi" => "10.1000/example"))
ids = extract_identifiers(entry)
ids[1].kind == :doi
PaperFetch.normalize_doi — Function
normalize_doi(value)
Normalize a DOI to a lower-case bare DOI string.
Example
normalize_doi("https://doi.org/10.1000/ABC") == "10.1000/abc"
PaperFetch.normalize_url — Function
normalize_url(value)
Normalize a URL for tolerant comparison.
DOI resolver URLs (https://doi.org/10.x/y, https://dx.doi.org/10.x/y) are canonicalized to doi:<normalized-doi>. Other HTTP(S) URLs have their scheme stripped, host lowercased, default port removed, and trailing slashes or punctuation removed. URL paths and queries keep their original case because many web servers treat those components as case-sensitive.
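The rules above can be approximated in a few lines. `sketch_normalize_url` is a hypothetical stand-in written only from this description; it simplifies by treating both 80 and 443 as default ports regardless of scheme and by assuming the host is followed by a `/`.

```julia
# Hypothetical sketch of the documented rules, not the PaperFetch source.
function sketch_normalize_url(value::AbstractString)
    s = strip(value)
    # DOI resolver URLs collapse to a canonical "doi:" form.
    m = match(r"^https?://(?:dx\.)?doi\.org/(10\..+)$"i, s)
    m !== nothing && return "doi:" * lowercase(m.captures[1])
    # Drop the scheme, then split the host from the rest of the URL.
    rest = replace(s, r"^https?://"i => "")
    parts = split(rest, '/'; limit=2)
    host = replace(lowercase(parts[1]), r":(?:80|443)$" => "")
    tail = length(parts) == 2 ? "/" * parts[2] : ""
    # Path and query keep their case; trailing slashes and punctuation go.
    return String(rstrip(host * tail, ['/', '.', ',', ';']))
end

sketch_normalize_url("https://dx.doi.org/10.1000/ABC")  # "doi:10.1000/abc"
sketch_normalize_url("https://example.org/")            # "example.org"
```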
Example
normalize_url("https://doi.org/10.1000/ABC") == "doi:10.1000/abc"
normalize_url("https://doi.org/10.1000/abc") == normalize_url("https://dx.doi.org/10.1000/ABC")
normalize_url("https://example.org/") == "example.org"
normalize_url("https://Example.org/Data/File.pdf?ID=ABC") == "example.org/Data/File.pdf?ID=ABC"
PaperFetch.normalize_text — Function
normalize_text(value)
Normalize bibliographic text for tolerant comparison.
Removes BibTeX braces and LaTeX accent commands, applies Unicode normalization (NFD + stripmark), lowercases, and collapses punctuation and whitespace.
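A minimal sketch of such a pipeline, using the `Unicode` standard library, is shown here. `sketch_normalize_text` is hypothetical and handles only symbol-style accent commands such as `\'` or `\~`; letter-style commands such as `\c{c}` are not covered.

```julia
using Unicode

# Hypothetical sketch of the steps described above, not the PaperFetch source.
function sketch_normalize_text(value::AbstractString)
    s = replace(value, r"\\[`'^~=.]" => "")      # drop symbol-style LaTeX accent commands
    s = replace(s, r"[{}]" => "")                # drop BibTeX braces
    s = Unicode.normalize(s; stripmark=true)     # strip combining diacritics
    s = lowercase(s)
    s = replace(s, r"[^\p{L}\p{N}]+" => " ")     # collapse punctuation and whitespace
    return String(strip(s))
end

sketch_normalize_text(raw"{Caf\'e} Data")  # "cafe data"
sketch_normalize_text("Café,  Data!")      # "cafe data"
```

The raw string literal keeps the backslash intact; in an ordinary Julia string, `\'` is an escape for a plain apostrophe.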
Example
normalize_text(raw"{Caf\'e} Data") == "cafe data"
Checking
PaperFetch.compare_entry — Function
compare_entry(entry, sources; fields=nothing)
Compare one BibEntry with candidate source records and return an EntryReport.
By default, proceedings and chapter-style entries compare their container as booktitle; articles compare journal. Books and chapter-style entries with an editor but no author compare editor as the creator field.
The comparison is tolerant for bibliographic formatting, but conflicts are still reported explicitly. DOI values must match after DOI normalization. Author and editor names use the same normalization, including accents and initials.
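To give a feel for tolerant name matching, the sketch below reduces a personal name to a family name plus first initial, so that "Example, Erin" and "Erin Example" compare equal. `sketch_name_key` is a hypothetical helper; it ignores accents, name particles, and multi-author strings, which the real comparison may treat differently.

```julia
# Hypothetical sketch: reduce a personal name to (family name, first initial).
function sketch_name_key(name::AbstractString)
    cleaned = strip(name)
    isempty(cleaned) && return ("", "")
    if occursin(',', cleaned)
        # BibTeX style: "Family, Given"
        family, given = strip.(split(cleaned, ','; limit=2))
    else
        # Display style: "Given Family"
        parts = split(cleaned)
        family, given = parts[end], join(parts[1:end-1], " ")
    end
    initial = isempty(given) ? "" : string(first(given))
    return (lowercase(family), lowercase(initial))
end

sketch_name_key("Example, Erin") == sketch_name_key("Erin Example")  # true
sketch_name_key("E. Example")                                        # ("example", "e")
```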
Example
entry = BibEntry("x", "article", Dict("doi" => "10.1000/x", "title" => "A"))
source = SourceRecord(provider="fixture", doi="10.1000/x", title="A")
compare_entry(entry, [source]).confidence == 1.0
book = BibEntry("edited", "book", Dict("editor" => "Example, Erin", "title" => "Edited"))
src = SourceRecord(provider="fixture", title="Edited", authors=["Erin Example"])
any(cmp -> cmp.field == "editor", compare_entry(book, [src]).comparisons)
PaperFetch.check_bibliography — Function
check_bibliography(path; providers=AbstractProvider[], fixture=nothing,
    email="noreply@example.org", use_apis=false,
    cache_dir=nothing, rate_limit_seconds=0.05,
    ignore_keys=Set(["anon"]), check=:warn)
Read a bibliography, collect source metadata, and return one EntryReport per entry.
The input file is not edited. Reports preserve the original BibTeX keys and are intended to guide a human or a separate editing step.
Provider selection order:
- A FixtureProvider is added when fixture is set.
- Explicitly supplied providers are appended.
- An ApiProvider is added when use_apis=true. It can query Crossref, OpenAlex, Unpaywall, DataCite, arXiv, Semantic Scholar, PubMed, CORE, Figshare, Open Library, Google Books, and URL landing pages as appropriate.
- If still empty, a CandidateProvider is used as a read-only fallback that only echoes each entry's own title/doi/url back as its "source". This cannot detect an incorrect doi, title, or author. A @warn is emitted when this fallback is used, and affected EntryReports carry a matching note.
Set cache_dir to a directory path to cache API responses between runs. Set rate_limit_seconds to the minimum delay between uncached live API requests made by the default ApiProvider. Set ignore_keys=nothing to keep all entries, including review artifacts such as anon.
Identifier recovery is deliberately forgiving: DOI, arXiv, PMID, ISBN, and URL values can be extracted from standard fields and common misplaced fields such as note and howpublished. Later comparison remains explicit about conflicts.
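The provider selection order described above can be sketched as a small function. Here plain strings stand in for the FixtureProvider, ApiProvider, and CandidateProvider types, whose constructors are not shown in this reference; `sketch_select_providers` is hypothetical.

```julia
# Hypothetical sketch: provider names are plain strings standing in for the
# provider types named in the selection order.
function sketch_select_providers(; providers=String[], fixture=nothing, use_apis=false)
    selected = String[]
    fixture === nothing || push!(selected, "FixtureProvider($fixture)")
    append!(selected, providers)               # explicitly supplied providers
    use_apis && push!(selected, "ApiProvider")
    if isempty(selected)
        # Self-comparison fallback: cannot detect a wrong DOI, title, or author.
        @warn "No metadata provider configured; falling back to self-comparison"
        push!(selected, "CandidateProvider")
    end
    return selected
end

sketch_select_providers(fixture="fixture.json", use_apis=true)
sketch_select_providers()  # warns, then returns ["CandidateProvider"]
```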
Example
reports = check_bibliography("examples/01_exact_article.bib";
fixture="examples/metadata_fixture.json", check=:none)
length(reports) == 1
Reports And Fetching
PaperFetch.write_reports — Function
write_reports(reports, outdir; basename="paperfetch_report")
Write Markdown and INC reports for reports.
The default basename is paperfetch_report for direct API calls. CLI-generated reports use the input file stem unless --report-basename is supplied. Pass basename explicitly when a different output name is needed.
Markdown reports include entry-level general flags and field-level comparison flags. INC reports contain one row per compared field, or one red no_comparison row when no source comparison was possible.
Example
entry = BibEntry("x", "misc", Dict("title" => "Example"))
report = EntryReport(entry, SourceRecord[], FieldComparison[], 0.0, String[], String[])
paths = write_reports([report], mktempdir())
haskey(paths, :markdown) && haskey(paths, :inc)
PaperFetch.fetch_pdfs — Function
fetch_pdfs(reports, outdir; cookie_file=nothing, ezproxy=nothing)
Download PDF candidates from reports and write INC and Markdown manifests.
Only explicit PDF candidate URLs are attempted. Missing PDFs are recorded as skipped, not as validation failures. The function returns the fetch results and the path to manifest.inc; manifest.md is written in the same directory for human review.
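Each fetch attempt is recorded as a FetchResult carrying sha256 and bytes fields. One plausible way to compute such values from downloaded bytes uses the `SHA` standard library, as sketched below; `sketch_fingerprint` is a hypothetical helper, and whether PaperFetch computes these fields this way is an assumption.

```julia
using SHA

# Hypothetical helper: hex digest and size for bytes that were already downloaded.
function sketch_fingerprint(data::Vector{UInt8})
    return (digest = bytes2hex(sha256(data)), bytes = length(data))
end

fp = sketch_fingerprint(Vector{UInt8}("%PDF-1.7"))
fp.bytes           # 8
length(fp.digest)  # 64 hex characters
```

Recording a digest alongside the byte count lets a later run detect that a file on disk no longer matches what was fetched.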
Example
entry = BibEntry("x", "misc", Dict("title" => "No PDF"))
report = EntryReport(entry, SourceRecord[], FieldComparison[], 0.0, String[], String[])
results, manifest = fetch_pdfs([report], mktempdir())
results[1].status == "skipped" && basename(manifest) == "manifest.inc"
Command Line
PaperFetch.main — Function
main(args=ARGS)
Command-line entry point.
check writes Markdown and INC reports. fetch also writes manifest.inc, manifest.md, and any successfully downloaded PDFs. Report basenames default to the input file stem unless --report-basename is supplied.
Example
PaperFetch.main(["check", "examples/01_exact_article.bib", "--fixture", "examples/metadata_fixture.json", "--outdir", mktempdir()])