operations-dumps-dcat

Mirror of https://gerrit.wikimedia.org/g/operations/dumps/dcat. See https://www.mediawiki.org/wiki/Developer_access for contributing

MIT License


DCAT-AP for Wikibase

A project aimed at generating a DCAT-AP document for Wikibase installations in general and Wikidata in particular.

Takes into account access through:

  • Content negotiation (various formats)
  • MediaWiki API (various formats)
  • Entity dumps e.g. json, ttl (assumes that these are compressed)

An example result can be found at lokal-profil / dcatap.rdf. The live DCAT-AP description of Wikidata can be found here.

To use

  1. Copy config.example.json to config.json and change the contents
    to match your installation. Refer to the Config section below for
    an explanation of the individual configuration parameters.
  2. Copy catalog.example.json to a suitable place (e.g. on-wiki) and
    update the translations to fit your Wikibase installation. Set the
    resulting URL or path as catalog-i18n in the config file.
  3. Create the dcatap.rdf file by running php DCAT.php or
    php DCAT.php --config="<path_1>" --dumpDir="<path_2>" --outputDir="<path_3>"
    where every option may be omitted.
    The options are:
    1. --config is the relative path to the json file containing the
      configurations, defaults to ./config.json
    2. --dumpDir is the relative path to the directory containing the
      dumps (if any), defaults to the directory parameter in the
      config file
    3. --outputDir is the relative path to the directory where the
      dcatap.rdf file should be created, defaults to the directory
      parameter in the config file
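
Step 3 with all three options spelled out might look like the following minimal sketch. The directory names ./dumps and ./public are hypothetical, and the guard that prints the command when PHP or DCAT.php is missing is added for illustration only; it is not part of the tool.

```shell
# Hypothetical paths; replace them with the ones used by your installation.
CONFIG="./config.json"
DUMP_DIR="./dumps"
OUTPUT_DIR="./public"

# The three documented options; each of them may be left out.
# (Assumes the paths contain no spaces, since the string is word-split below.)
DCAT_CMD="php DCAT.php --config=$CONFIG --dumpDir=$DUMP_DIR --outputDir=$OUTPUT_DIR"

# Illustration-only guard: run the command when PHP and DCAT.php are present,
# otherwise print it so it can be inspected.
if command -v php >/dev/null 2>&1 && [ -f DCAT.php ]; then
    $DCAT_CMD
else
    echo "$DCAT_CMD"
fi
```

Leaving out --dumpDir and --outputDir makes both fall back to the directory parameter of the config file.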

Translations

  • Translations which are generic to the tool are handled by Intuition
    and should be translated through translatewiki.net.
  • Translations which are specific to a project/catalog are added to
    the location specified in the catalog-i18n parameter of the config
    file.
  • To lint translation files: npm install && npm test.

Linting

We use various utilities to lint this repository. First install the dependencies:

composer install
npm install

Then run the tests:

composer test
npm test

Config

Below follows a key by key explanation of the config file.

  • directory: Relative path to the directory containing the dump
    subcategories (if any) and to which the final DCAT file is written.
  • api-enabled: (Boolean) Is API access activated for the MediaWiki
    installation?
  • dumps-enabled: (Boolean) Is JSON dump generation activated for the
    Wikibase installation?
  • uri: URL used as basis for RDF identifiers,
    e.g. http://www.example.org/about
  • catalog-homepage: URL for the homepage of the Wikibase installation,
    e.g. http://www.example.org
  • catalog-issued: ISO date at which the Wikibase installation was
    first issued, e.g. 2000-12-24
  • catalog-license: License of the catalog, i.e. of the DCAT file
    itself (not the contents of the Wikibase installation),
    e.g. http://creativecommons.org/publicdomain/zero/1.0/
  • catalog-i18n: URL or path to JSON file containing i18n strings for
    catalog title and description. Can be an on-wiki page,
    e.g. https://www.example.org/w/index.php?title=MediaWiki:DCAT.json&action=raw
  • keywords: (array) List of keywords applicable to all of the datasets
  • themes: (array) List of thematic ids in accordance with
    Eurovoc, e.g. 2191 for
    http://eurovoc.europa.eu/2191
  • publisher:
    • name: Name of the publisher
    • homepage: URL for the homepage of the publisher
    • email: Contact e-mail for the publisher, should be a function
      address, e.g. [email protected]
    • publisherType: Publisher type according to ADMS,
      e.g. NonProfitOrganisation
  • contactPoint:
    • name: Name of the contact point
    • email: E-mail for the contact point, should ideally be a
      function address, e.g. [email protected]
    • vcardType: Type of contact point, either Organization or
      Individual
  • ld-info:
  • api-info:
    • accessURL: URL to the MediaWiki API endpoint of the wiki,
      e.g. http://www.example.org/w/api.php
    • mediatype: (object) List of non-deprecated formats available
      through the API, see ld-info:mediatype above for formatting
    • license: See ld-info:license above
  • dump-info:
    • accessURL: URL to the directory where the .json.gz files
      reside ($1 is replaced on the fly by the actual filename),
      e.g. http://example.org/dumps/$1
    • mediatype: List of fileformat:media-type pairs where media-type is
      either an IANA media type
      or an object containing one or more of the following keys:
      • "contentType": IANA media type
      • "prefix": prefix used in the filename. Defaults to "all" if not
        specified.
      • "format": overrides the fileformat used as key for the list. The
        list key is still used for i18n description.
        Examples:
        "json": "application/json" or
        "truthy-nt": {"contentType": "application/n-triples", "prefix": "truthy-BETA", "format": "nt"}.
    • compression: (object) List of compression formats, in the
      format name:fileformat e.g. {"gzip": "gz"}
    • license: See ld-info:license above
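
Putting the keys above together, a partial config could be sketched as follows. This is an illustration built from the example values in this section, not a copy of config.example.json: the file name config.sketch.json, the directory value, the keywords, and the e-mail addresses are placeholders, and giving the theme id as a string is an assumption. The ld-info and api-info blocks and the license entries are left out because their exact format is not spelled out here, so api-enabled is set to false.

```shell
# Sketch only: writes a partial config to config.sketch.json (hypothetical name)
# so that an existing config.json is not overwritten.
# Placeholders/assumptions: "directory", "keywords", both e-mail addresses,
# and the theme id being a string. ld-info, api-info and license entries omitted.
# The quoted 'EOF' keeps the shell from expanding $1 inside the JSON.
cat > config.sketch.json <<'EOF'
{
    "directory": "dumps",
    "api-enabled": false,
    "dumps-enabled": true,
    "uri": "http://www.example.org/about",
    "catalog-homepage": "http://www.example.org",
    "catalog-issued": "2000-12-24",
    "catalog-license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "catalog-i18n": "https://www.example.org/w/index.php?title=MediaWiki:DCAT.json&action=raw",
    "keywords": ["knowledge base", "linked data"],
    "themes": ["2191"],
    "publisher": {
        "name": "Example Organisation",
        "homepage": "http://www.example.org",
        "email": "info@example.org",
        "publisherType": "NonProfitOrganisation"
    },
    "contactPoint": {
        "name": "Example Contact",
        "email": "contact@example.org",
        "vcardType": "Organization"
    },
    "dump-info": {
        "accessURL": "http://example.org/dumps/$1",
        "mediatype": {
            "json": "application/json",
            "truthy-nt": {
                "contentType": "application/n-triples",
                "prefix": "truthy-BETA",
                "format": "nt"
            }
        },
        "compression": {"gzip": "gz"}
    }
}
EOF
```

The mediatype object shows both documented forms side by side: a plain media-type string for json, which falls back to the default prefix "all", and an object for truthy-nt that sets its own prefix and overrides the file format.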