Mirror of https://gerrit.wikimedia.org/g/operations/dumps/dcat. See https://www.mediawiki.org/wiki/Developer_access for contributing
MIT License
A project aimed at generating a DCAT-AP document for Wikibase installations in general and Wikidata in particular.
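Takes into account access through content negotiation, the MediaWiki API and dumps (configured through ld-info, api-info and dump-info respectively, see the config explanation below).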
An example result can be found at lokal-profil / dcatap.rdf. The live DCAT-AP description of Wikidata can be found here.
1. Copy config.example.json to config.json and change the contents
2. Copy catalog.example.json to a suitable place (e.g. on-wiki) and set its location as catalog-i18n in the config file.
3. Run php DCAT.php or php DCAT.php --config="<path_1>" --dumpDir="<path_2>" --outputDir="<path_3>", where:
   --config is the relative path to the json file containing the configuration, defaults to ./config.json
   --dumpDir is the relative path to the directory containing the dump, defaults to the directory parameter in the config file
   --outputDir is the relative path to the directory where the dcatap.rdf file should be created, defaults to the directory parameter in the config file
   catalog-i18n parameter of the config

We use various utilities to lint this repository. First install the dependencies:
composer install
npm install
Then run the tests:
composer test
npm test
Below follows a key by key explanation of the config file.
directory: Relative path to the directory containing the dump
api-enabled: (Boolean) Is API access activated for the MediaWiki
dumps-enabled: (Boolean) Is JSON dump generation activated for the
uri: URL used as basis for RDF identifiers,
catalog-homepage: URL for the homepage of the Wikibase installation,
catalog-issued: ISO date at which the Wikibase installation was
catalog-license: License of the catalog, i.e. of the DCAT file
catalog-i18n: URL or path to JSON file containing i18n strings for
keywords: (array) List of keywords applicable to all of the datasets
themes: (array) List of thematic ids in accordance with
publisher:
    name: Name of the publisher
    homepage: URL for the homepage of the publisher
    email: Contact e-mail for the publisher, should be a function
    publisherType: Publisher type according to ADMS,
contactPoint:
    name: Name of the contact point
    email: E-mail for the contact point, should ideally be a
    vcardType: Type of contact point, either Organization or Individual
ld-info:
    accessURL: URL to the content negotiation endpoint of the
    mediatype: (object) List of IANA media types
    license: License of the data in the distribution, e.g.
api-info:
    accessURL: URL to the MediaWiki API endpoint of the wiki,
    mediatype: (object) List of non-deprecated formats available
    license: See ld-info:license above
dump-info:
    accessURL: URL to the directory where the .json.gz files reside ($1 is replaced on the fly by the actual filename),
    mediatype: List of fileformat:media-type pairs where media-type is either a string or an object containing one or more of the following keys:
        "contentType": IANA media types
        "prefix": prefix used in the filename. Defaults to "all" if not
        "format": overrides the fileformat used as key for the list. The
        E.g. "json": "application/json" or "truthy-nt": {"contentType": "application/n-triples", "prefix": "truthy-BETA", "format": "nt"}.
    compression: (object) List of compression formats, in the form {"gzip": "gz"}
    license: See ld-info:license above
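
As a rough illustration of the dump-info keys described above, a fragment of config.json might look like the sketch below. The mediatype and compression entries reuse the examples given in the key explanations. The accessURL host and the license value are placeholders assumed for this sketch (JSON has no comment syntax, so they are flagged here rather than inline); neither is taken from the project, and the exact form expected for license is not confirmed by this document.

```json
{
    "dump-info": {
        "accessURL": "https://example.org/dumps/$1",
        "mediatype": {
            "json": "application/json",
            "truthy-nt": {
                "contentType": "application/n-triples",
                "prefix": "truthy-BETA",
                "format": "nt"
            }
        },
        "compression": {
            "gzip": "gz"
        },
        "license": "https://example.org/license"
    }
}
```

Here the "json" entry uses the short string form, so its prefix falls back to "all", while the "truthy-nt" entry uses the object form to set its own prefix and to override the file format with "nt".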