1. Jena Fuseki SPARQL Endpoint

1.1. Installing Fuseki

Additional documentation can be found at: http://jena.apache.org/documentation/serving_data/index.html

Download and extract the latest version of the Jena Fuseki SPARQL endpoint from http://www.apache.org/dist/jena/binaries/ (look for the most recent jena-fuseki-0.x.x-distribution.tar.gz). The commands below use version 0.2.5 as an example.

Move the extracted Fuseki directory to the installation directory /usr/share/fuseki/jena-fuseki-0.2.5:

$ tar -xzvf jena-fuseki-0.2.5-distribution.tar.gz
$ sudo mkdir /usr/share/fuseki
$ sudo mv jena-fuseki-0.2.5 /usr/share/fuseki

Download the example assembler config (which uses TDB as the triple store) from SourceForge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/tdb-assembler.ttl) and place it in the Fuseki directory you just installed:

$ wget http://sourceforge.net/projects/filteredpush/files/Release_1/misc/tdb-assembler.ttl
$ sudo cp tdb-assembler.ttl /usr/share/fuseki/jena-fuseki-0.2.5/

The Fuseki startup script can be found in the FP-SCAN configuration projects. Download this file from SourceForge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/fuseki) and, if necessary, edit it so that FUSEKI_HOME is set to the directory where you installed Fuseki. Then make it executable with chmod and use update-rc.d to install it as a startup script.

NOTE: On Debian I received the following warning: insserv: warning: script 'fuseki' missing LSB tags and overrides. If you want to add the LSB tags that the warning refers to, you can find more information at http://wiki.debian.org/LSBInitScripts/

$ sudo cp /etc/filteredpush/scripts/fuseki /etc/init.d/
$ sudo chmod +x /etc/init.d/fuseki
$ sudo update-rc.d fuseki defaults
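The LSB tags that the insserv warning refers to are a comment block placed directly below the #! line of the init script. A minimal sketch of such a header for /etc/init.d/fuseki follows. The facilities ($remote_fs, $network), the runlevels, and the description text are assumptions chosen for illustration, not values taken from the FP-SCAN script, so adjust them to your system:

```shell
### BEGIN INIT INFO
# Provides:          fuseki
# Required-Start:    $remote_fs $network
# Required-Stop:     $remote_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Jena Fuseki SPARQL server
# Description:       Starts and stops the Fuseki server installed in FUSEKI_HOME.
### END INIT INFO
```

After adding the header, rerunning update-rc.d should no longer report missing tags.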

Start Fuseki and visit http://localhost:3030/ in your browser:

$ sudo /etc/init.d/fuseki start

1.2. SPARQL Query Primer

If you need to execute SPARQL queries on Fuseki manually, you can do so from the web interface. First select the Control Panel link on the main page (under Server Management) and select the "/AnnotationStore" dataset. The dataset name is supplied as an argument when the startup script in /etc/init.d starts Fuseki, and its files are stored in the directory configured in the tdb-assembler.ttl file in FUSEKI_HOME. On the page that follows, you can launch SPARQL queries and updates or upload RDF/XML. Some useful queries are listed below.

In the query textarea:

1) SELECT * {?s ?p ?o} (select all triples currently in the default, unnamed graph)
2) SELECT * WHERE { GRAPH ?g {?s ?p ?o} } LIMIT 10 (select up to 10 triples from the named graphs; triples in the default graph are not matched)

In the update textarea:

1) CLEAR ALL (clear all triples from the triplestore)
2) CLEAR GRAPH <name of graph here> (clear all triples in the given named graph)
3) LOAD <http://yourhost/data/somefile.rdf> (load all the triples from the RDF file hosted at the specified address)
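The same queries can also be sent from the command line. The following is a minimal sketch using curl, assuming the dataset is served at http://localhost:3030/AnnotationStore and that its query service is named "query" (as in the endpoint URL used in this guide). The function name fuseki_query and the FUSEKI_DATASET variable are hypothetical names introduced for this example:

```shell
#!/bin/sh
# Assumed base URL of the dataset; adjust host, port, and dataset name.
FUSEKI_DATASET="${FUSEKI_DATASET:-http://localhost:3030/AnnotationStore}"

# Send a SPARQL query (first argument) to the query endpoint with HTTP GET
# and ask for the result in the SPARQL XML results format.
fuseki_query() {
    curl --silent --show-error --get \
         --header 'Accept: application/sparql-results+xml' \
         --data-urlencode "query=$1" \
         "$FUSEKI_DATASET/query"
}

# Example call (requires a running Fuseki server):
#   fuseki_query 'SELECT * WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10'
```

Using --data-urlencode together with --get lets curl handle the URL encoding of braces, question marks, and spaces in the query text.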
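Updates can be scripted in a similar way. This sketch assumes that the update service is exposed at /AnnotationStore/update and accepts the form parameter "update" defined by the SPARQL 1.1 Protocol; both depend on your Fuseki version and assembler configuration. The names fuseki_update and FUSEKI_DATASET are hypothetical:

```shell
#!/bin/sh
# Assumed base URL of the dataset; adjust host, port, and dataset name.
FUSEKI_DATASET="${FUSEKI_DATASET:-http://localhost:3030/AnnotationStore}"

# Send a SPARQL Update request (first argument) to the update endpoint.
# --data-urlencode makes curl issue a form-encoded HTTP POST;
# --fail makes curl exit with a non-zero status on an HTTP error response.
fuseki_update() {
    curl --silent --show-error --fail \
         --data-urlencode "update=$1" \
         "$FUSEKI_DATASET/update"
}

# Example call (requires a running Fuseki server started with --update):
#   fuseki_update 'LOAD <http://yourhost/data/somefile.rdf>'
```

Because CLEAR ALL removes every triple, it is worth testing such a script against a scratch dataset first.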

For the queries, selecting XML from the Output dropdown on the form will probably give the most human-readable results. Additionally, you may specify an XSLT file for styling the XML result for browser viewing. A value of /xml-to-html.xsl is supplied by default; this file is located in your FUSEKI_HOME/pages directory.

It may also be useful to enable logging to a file in Fuseki's log4j.properties. If you wish to do this now, uncomment the following lines in the log4j.properties file (located at /usr/share/fuseki/jena-fuseki-0.2.5/log4j.properties):

log4j.rootLogger=INFO, FusekiFileLog
...
log4j.appender.FusekiFileLog=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FusekiFileLog.DatePattern='.'yyyy-MM-dd
log4j.appender.FusekiFileLog.File=logs/fuseki-log
log4j.appender.FusekiFileLog.layout=org.apache.log4j.PatternLayout
log4j.appender.FusekiFileLog.layout.ConversionPattern=%d{HH:mm:ss}%-5p %-20c{1} :: %m%n

Create the logs directory in /usr/share/fuseki/jena-fuseki-0.2.5:

$ sudo mkdir /usr/share/fuseki/jena-fuseki-0.2.5/logs

1.3. Securing Fuseki

The deployment and configuration steps presented in the previous sections do not set up access restrictions on the Fuseki triplestore. Both the update and query endpoints are exposed to external clients. If you want to restrict access to Fuseki from outside of localhost, you can configure Fuseki (which runs in a bootstrapped Jetty servlet container) to accept connections only from localhost on a specified port. Provided that SparqlPuSH is installed on the same host, SparqlPuSH will still have access to the triplestore, and external clients must be authenticated by SparqlPuSH before their updates are made to the triplestore.

If you would like to open access to the Fuseki query endpoint (i.e. http://localhost:3030/AnnotationStore/query) but not to upload or update, you can configure mod_proxy in Apache to expose the endpoints selectively. The steps for this are detailed below.

Obtain the Jetty config for Fuseki from our SourceForge project and put it in the FUSEKI_HOME directory (/usr/share/fuseki/jena-fuseki-0.2.5/). Edit it if necessary, but the defaults should restrict access to localhost on port 3030:

$ sudo cp /etc/filteredpush/sparql/fuseki/jetty.xml /usr/share/fuseki/jena-fuseki-0.2.5/

More detailed information about the Jetty configuration can be found at http://wiki.eclipse.org/Jetty/Reference/jetty.xml

Stop the Fuseki server if it is currently running and edit the startup script (/etc/init.d/fuseki). Add the --jetty-config=jetty.xml option to the invocation of fuseki-server:

./fuseki-server --jetty-config=jetty.xml --desc=tdb-assembler.ttl --update /AnnotationStore 2>&1 &

Restart Fuseki and confirm that external access is no longer possible (http://<hostname>:3030 should not be reachable from another machine).

If you only want applications running on localhost to use Fuseki (via query, upload, or update), you can stop here. Otherwise, if you would like to expose only certain endpoints while restricting others (for example, to allow queries from the outside but no changes or additions to the data), follow the rest of the configuration steps below.
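One way to check this from the command line is to compare the HTTP status seen on the Fuseki host with the status seen from another machine. The sketch below is an illustration only; http_status is a hypothetical helper name and the five-second timeout is an arbitrary choice:

```shell
#!/bin/sh
# Print the HTTP status code returned for the URL given as first argument.
# curl reports 000 when no HTTP response was received, for example when
# the connection is refused or times out.
http_status() {
    curl --silent --output /dev/null --max-time 5 \
         --write-out '%{http_code}' "$1"
}

# Example calls (replace <hostname> with the name of the Fuseki host):
#   http_status http://localhost:3030/     # run on the Fuseki host
#   http_status http://<hostname>:3030/    # run on another machine
```

If the Jetty configuration is in effect, the first call should print a normal HTTP status, while the second should print 000 because no connection can be made.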

The following steps are based on the documentation found at http://wiki.eclipse.org/Jetty/Tutorial/Apache

First enable mod_proxy in Apache with the following command:

$ sudo a2enmod proxy_http

This should have created a proxy.conf file in /etc/apache2/mods-enabled. This file contains some default configuration. Edit it and replace everything within <IfModule>...</IfModule> with the following:

# Turn off support for true Proxy behaviour as we are acting as
# a transparent proxy
ProxyRequests Off
# Turn off VIA header as we know where the requests are proxied
ProxyVia Off
# Turn on Host header preservation so that the servlet container
# can write links with the correct host and rewriting can be avoided.
ProxyPreserveHost On
# Set the permissions for the proxy
<Proxy *>
  AddDefaultCharset off
  Order deny,allow
  Allow from all
</Proxy>
# Turn on Proxy status reporting at /status
# This should be better protected than: Allow from all
ProxyStatus On
<Location /status>
  SetHandler server-status
  Order Deny,Allow
  Allow from all
</Location>

Next, add the following line to the default site in /etc/apache2/sites-available and make sure it is enabled (via a2ensite default):

ProxyPass /fuseki/AnnotationStore/query http://localhost:3030/AnnotationStore/query

(If your dataset is not named AnnotationStore, replace the name in both URLs.)

Restart Apache and check that outside access to the query interface is enabled: http://<hostname>/fuseki/AnnotationStore/query should return an error from Fuseki, since no query is specified in the request. If you get an error from Apache instead, the endpoint is still not accessible.

With outside access no longer possible, you may not be able to reach the Fuseki control panel to test queries and updates. You could instead run the Ruby scripts provided by the Jena Fuseki project (located in your FUSEKI_HOME) on the local host. See http://jena.apache.org/documentation/serving_data/index.html#script-control for how to use the scripts, and http://jena.apache.org/documentation/serving_data/soh.html for more information.
