1. Prerequisites

The following instructions were tested on clean installs of Debian 6.0.5 "squeeze" and Ubuntu 12.04 with the following packages installed via apt-get install (this list also serves as the list of prerequisites):

If you are installing (or plan to install at a later date) the FP-Medium deployment, check out and deploy the FP-Medium-SCAN configuration to /etc/filteredpush. This configuration will also work for FP-Lite (the FP-Lite-SCAN config is a subset of the FP-Medium-SCAN config).

$ sudo svn checkout svn://svn.code.sf.net/p/filteredpush/svn/trunk/FP-Configuration/FP-Medium-SCAN/config/filteredpush /etc/filteredpush

Otherwise, if you only plan to deploy the lightweight annotation system (FP-Lite), check out and deploy the FP-Lite-SCAN configuration:

$ sudo svn checkout svn://svn.code.sf.net/p/filteredpush/svn/trunk/FP-Configuration/FP-Lite-SCAN/config/filteredpush /etc/filteredpush

The next step in the FP-Lite deployment is to install the SPARQL endpoint and triplestore for storing the annotations. We are using the Fuseki SPARQL endpoint, configured to use Jena TDB as the triplestore.

2. Jena Fuseki SPARQL Endpoint

2.1. Installing Fuseki

Additional documentation can be found at: http://jena.apache.org/documentation/serving_data/index.html

Download and extract the latest version of the Jena Fuseki SPARQL endpoint from http://www.apache.org/dist/jena/binaries/ (look for the latest version of jena-fuseki-0.x.x-distribution.tar.gz). The commands below use version 0.2.5 as the example.

Move the extracted Fuseki directory to the installation directory /usr/share/fuseki/jena-fuseki-0.2.5:

$ tar -xzvf jena-fuseki-0.2.5-distribution.tar.gz
$ sudo mkdir /usr/share/fuseki
$ sudo mv jena-fuseki-0.2.5 /usr/share/fuseki

Download the example assembler config (using TDB as the triplestore) from SourceForge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/tdb-assembler.ttl) and place it in the Fuseki directory you just installed:

$ wget http://sourceforge.net/projects/filteredpush/files/Release_1/misc/tdb-assembler.ttl
$ sudo cp tdb-assembler.ttl /usr/share/fuseki/jena-fuseki-0.2.5/

The Fuseki startup script can be found in the FP-SCAN configuration projects. Download this file from SourceForge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/fuseki) and, if necessary, edit it so that FUSEKI_HOME is set to the directory where you installed Fuseki. Then make it executable with chmod and use update-rc.d to install it as a startup script.

NOTE: On Debian I received the following warning: insserv: warning: script 'fuseki' missing LSB tags and overrides. If you want to add the LSB tags that the warning refers to, you can find more information at http://wiki.debian.org/LSBInitScripts/

$ sudo cp /etc/filteredpush/scripts/fuseki /etc/init.d/
$ sudo chmod +x /etc/init.d/fuseki
$ sudo update-rc.d fuseki defaults
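The insserv warning quoted in the note above goes away once the init script carries an LSB header. The following is a minimal sketch, not the project's own script: the helper name add_lsb_header is hypothetical, and the Required-Start/Required-Stop values and the description are assumptions that you should adjust to your system. It also assumes the first line of the script is the #! line.

```shell
#!/bin/sh
# Sketch: insert an LSB header after the first line of an init script.
# The dependency values and description below are assumptions.
add_lsb_header() {
    script="$1"
    # Nothing to do if a header is already present.
    if grep -q '^### BEGIN INIT INFO' "$script"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    {
        head -n 1 "$script"          # keep the #! line first (assumed to be line 1)
        cat <<'EOF'
### BEGIN INIT INFO
# Provides:          fuseki
# Required-Start:    $remote_fs $network
# Required-Stop:     $remote_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Jena Fuseki SPARQL endpoint
### END INIT INFO
EOF
        tail -n +2 "$script"         # rest of the original script
    } > "$tmp"
    cat "$tmp" > "$script"           # overwrite in place, keeping mode and owner
    rm -f "$tmp"
}
```

Run as root, the call would be add_lsb_header /etc/init.d/fuseki, followed by the update-rc.d command again.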

Start Fuseki and visit http://localhost:3030/ in your browser:

$ sudo /etc/init.d/fuseki start
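On a server without a browser, the same check can be made from the command line. This is a minimal sketch that assumes curl is installed; the function name wait_for_fuseki and the retry count are illustrative choices, not part of Fuseki.

```shell
#!/bin/sh
# Sketch: poll the Fuseki front page until it answers or we give up.
# Usage: wait_for_fuseki [url] [attempts]
wait_for_fuseki() {
    url="${1:-http://localhost:3030/}"
    tries="${2:-10}"
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f: treat HTTP errors as failure; -sS: quiet, but still show errors
        if curl -fsS -o /dev/null "$url"; then
            echo "Fuseki is answering at $url"
            return 0
        fi
        # Pause between attempts, but not after the last one.
        [ "$i" -lt "$tries" ] && sleep 1
        i=$((i + 1))
    done
    echo "No response from $url after $tries attempts" >&2
    return 1
}
```

Calling wait_for_fuseki with no arguments checks http://localhost:3030/ up to ten times, which allows for the server taking a few seconds to start.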

2.2. SPARQL Query Primer

If you need to execute SPARQL queries on Fuseki manually, you can do so from the web interface. First select the Control Panel link on the main page (under Server Management) and select the "/AnnotationStore" dataset. The dataset name is supplied as an argument when the startup script in /etc/init.d starts Fuseki, and its files are stored in the directory configured in the tdb-assembler.ttl file in FUSEKI_HOME. On the page that follows, you can run SPARQL queries and updates or upload RDF/XML. Some useful queries are listed below.

In the query textarea:

1) SELECT * {?s ?p ?o} (select all triples currently in the default, unnamed graph)
2) SELECT * WHERE { GRAPH ?g {?s ?p ?o} } LIMIT 10 (select up to 10 triples from the named graphs, excluding the default graph)

In the update textarea:

1) CLEAR ALL (clear all triples from the triplestore)
2) CLEAR GRAPH <name of graph here> (clear all triples in the given named graph)
3) LOAD <http://yourhost/data/somefile.rdf> (load all the triples from the RDF file hosted at the specified address)

For the queries, selecting XML from the Output dropdown on the form will probably give the most human-readable results. Additionally, you may specify an XSLT file for styling the XML result for browser viewing. A value of /xml-to-html.xsl is supplied by default; this file is located in your FUSEKI_HOME/pages directory.
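The queries and updates listed above can also be sent from the command line using the standard SPARQL protocol parameters (query= and update=). The following is a minimal sketch, assuming curl is installed and that the dataset exposes its services at /AnnotationStore/query and /AnnotationStore/update; the helper names sparql_query and sparql_update and the FUSEKI_BASE variable are hypothetical.

```shell
#!/bin/sh
# Sketch: send SPARQL queries and updates to Fuseki with curl.
# FUSEKI_BASE is an assumed default; override it for other hosts or datasets.
FUSEKI_BASE="${FUSEKI_BASE:-http://localhost:3030/AnnotationStore}"

# Run a query and ask for XML results.
# -G sends the URL-encoded data as a GET query string.
sparql_query() {
    curl -sS -G "$FUSEKI_BASE/query" \
        -H 'Accept: application/sparql-results+xml' \
        --data-urlencode "query=$1"
}

# Run an update; --data-urlencode without -G makes curl send a form POST.
sparql_update() {
    curl -sS "$FUSEKI_BASE/update" \
        --data-urlencode "update=$1"
}
```

For example, sparql_query 'SELECT * {?s ?p ?o}' lists the default graph and sparql_update 'CLEAR ALL' empties the store. Quote the argument with single quotes so the shell does not interpret ?, { or *.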

It may also be useful to enable logging to a file in Fuseki's log4j.properties. If you wish to do this now, uncomment the following lines in the log4j.properties file (located at /usr/share/fuseki/jena-fuseki-0.2.5/log4j.properties):

log4j.rootLogger=INFO, FusekiFileLog
...
log4j.appender.FusekiFileLog=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FusekiFileLog.DatePattern='.'yyyy-MM-dd
log4j.appender.FusekiFileLog.File=logs/fuseki-log
log4j.appender.FusekiFileLog.layout=org.apache.log4j.PatternLayout
log4j.appender.FusekiFileLog.layout.ConversionPattern=%d{HH:mm:ss}%-5p %-20c{1} :: %m%n

Create the logs directory in /usr/share/fuseki/jena-fuseki-0.2.5:

$ sudo mkdir /usr/share/fuseki/jena-fuseki-0.2.5/logs

2.3. Securing Fuseki

The deployment and configuration steps presented in the previous sections do not set up access restrictions on the Fuseki triplestore: both the update and query endpoints are exposed to external clients. To restrict access to Fuseki from outside of localhost, you can configure Fuseki (which runs in a bootstrapped Jetty servlet container) to accept connections only from localhost on a specified port. Provided that SparqlPuSH is installed on the same host, only SparqlPuSH will then have access to the triplestore, and external clients must be authenticated by SparqlPuSH before their updates are made to the triplestore.

If you would like to open access to the Fuseki query endpoint (i.e. http://localhost:3030/AnnotationStore/query) but not to the upload or update endpoints, you can configure mod_proxy in Apache to allow access to the endpoints selectively. The steps for this are detailed below:

Obtain the Jetty config for Fuseki from our SourceForge repository and put it in the FUSEKI_HOME directory (/usr/share/fuseki/jena-fuseki-0.2.5/). Make edits if necessary, but the defaults should restrict access to localhost on port 3030:

$ sudo cp /etc/filteredpush/sparql/fuseki/jetty.xml /usr/share/fuseki/jena-fuseki-0.2.5/

More detailed information about the Jetty configuration can be found at http://wiki.eclipse.org/Jetty/Reference/jetty.xml

Stop the Fuseki server if it is currently running and edit the startup script (/etc/init.d/fuseki). Add the --jetty-config=jetty.xml option to the invocation of fuseki-server:

./fuseki-server --jetty-config=jetty.xml --desc=tdb-assembler.ttl --update /AnnotationStore 2>&1 &

Restart Fuseki and confirm that external access is not possible (http://<hostname>:3030).
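Besides trying the URL from another machine, you can look at which local address the server is listening on. The following is a small sketch, assuming netstat (from the net-tools package) is available; the helper name listeners_on_port is hypothetical. It filters the output of netstat -ltn down to the local addresses listening on a given port.

```shell
#!/bin/sh
# Sketch: print the local addresses that are listening on a TCP port.
# Reads `netstat -ltn` output on stdin.
listeners_on_port() {
    awk -v suffix=":$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            addr = $4
            # Compare the tail of the local address with ":<port>".
            if (length(addr) >= length(suffix) &&
                substr(addr, length(addr) - length(suffix) + 1) == suffix)
                print addr
        }'
}

# Typical use:
#   netstat -ltn | listeners_on_port 3030
```

An address such as 127.0.0.1:3030 indicates the port is bound to the loopback interface only, while 0.0.0.0:3030 or :::3030 indicates it is listening on all interfaces. A Java server may also show up with an IPv6-style form of the loopback address.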

If you only want applications running on localhost to be able to call Fuseki (via query, upload or update), you can stop here. Otherwise, if you would like to expose certain endpoints while restricting others (for example, to allow queries from the outside but no changes or additions to the data), follow the rest of the configuration steps below.

The following steps are based on the documentation found at http://wiki.eclipse.org/Jetty/Tutorial/Apache

First enable mod_proxy in Apache with the following command:

$ sudo a2enmod proxy_http

This should have created a proxy.conf file in /etc/apache2/mods-enabled. This file contains some default configuration. Edit it and replace everything within <IfModule>...</IfModule> with the following:

# Turn off support for true Proxy behaviour as we are acting as
# a transparent proxy
ProxyRequests Off
# Turn off VIA header as we know where the requests are proxied
ProxyVia Off
# Turn on Host header preservation so that the servlet container
# can write links with the correct host and rewriting can be avoided.
ProxyPreserveHost On
# Set the permissions for the proxy
<Proxy *>
  AddDefaultCharset off
  Order deny,allow
  Allow from all
</Proxy>
# Turn on Proxy status reporting at /status
# This should be better protected than: Allow from all
ProxyStatus On
<Location /status>
  SetHandler server-status
  Order Deny,Allow
  Allow from all
</Location>

Next, add the following line to the default site in /etc/apache2/sites-available and make sure it is enabled (via a2ensite default):

ProxyPass /fuseki/AnnotationStore/query

(If your dataset is not named AnnotationStore, replace the name in both URLs.)

Restart Apache and check that outside access to the query interface is enabled: http://<hostname>/fuseki/AnnotationStore/query should give an error from Fuseki, since no query is specified in the request. If you get an error from Apache instead, the endpoint is still not accessible.
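Both checks (direct access blocked, proxied access open) can be run from another machine with curl. This is a minimal sketch, assuming curl is installed; the helper name http_status and the FP_HOST variable are hypothetical. It prints only the HTTP status code, where 000 is what curl reports when no HTTP response was received at all.

```shell
#!/bin/sh
# Sketch: print the HTTP status code returned for a URL.
http_status() {
    # -m 5: give up after 5 seconds; -w prints the status code only.
    # "|| true" keeps a failed connection from aborting a calling script.
    curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$1" || true
}

# Typical use from a machine other than the server (FP_HOST is a placeholder):
#   FP_HOST=your.server.example
#   http_status "http://$FP_HOST:3030/"                          # direct access
#   http_status "http://$FP_HOST/fuseki/AnnotationStore/query"   # via Apache
```

If the restriction is working, the direct request should print 000 and the proxied request should print a real status code. Which code Fuseki returns for a request without a query is not shown here, so compare the response body against the Fuseki and Apache error pages if in doubt.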

With outside access no longer possible, you may not be able to reach the Fuseki control panel to test queries and updates. Instead, you can run the Ruby scripts provided by the Jena Fuseki project (located in your FUSEKI_HOME) on the localhost. See http://jena.apache.org/documentation/serving_data/index.html#script-control for how to use the scripts and http://jena.apache.org/documentation/serving_data/soh.html for more information.

3. SparqlPuSH Server

Check out the SparqlPuSH Server PHP project from the FilteredPush SourceForge svn repository:

$ svn checkout svn://svn.code.sf.net/p/filteredpush/svn/trunk/FP-HTTP/FP-SparqlPuSH/spqlpsh-server/

Copy the spqlpsh-server PHP includes to your PHP include path (e.g. /usr/share/php/). These are simplepie.inc, for parsing RSS feeds, and xmlseclibs.php, for XML digital signature authentication of annotations:

$ sudo cp spqlpsh-server/lib/includes/* /usr/share/php/

Edit build.properties to configure the deployment. Replace /var/www in the server.home property value with your document root, replace sparql.endpoint with the Fuseki URL or other endpoint URL, and replace the value for pubsubhubbub.hub with the hub host. The URLs for sparql.endpoint and pubsubhubbub.hub must end in a trailing slash (see the example build.properties provided); server.home must not. The properties prefixed with db are specific to the ARC2 triplestore. You only need to configure these if you are using ARC2 instead of Fuseki; otherwise you can leave the defaults.
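The trailing-slash rules are easy to get wrong, so a quick check before running the build can help. The following is a minimal sketch, assuming build.properties uses plain key=value lines with no spaces around the equals sign; the helper names prop and check_build_properties are hypothetical and are not part of the spqlpsh-server project.

```shell
#!/bin/sh
# Sketch: check the trailing-slash rules described for build.properties.
# Assumes simple "key=value" lines without spaces around "=".

# Print the value of a property (the last definition wins).
prop() {
    awk -F= -v key="$2" '
        $1 == key { sub(/^[^=]*=/, ""); value = $0 }
        END { print value }' "$1"
}

check_build_properties() {
    file="$1"
    rc=0
    # These two URLs must end in "/".
    for key in sparql.endpoint pubsubhubbub.hub; do
        case "$(prop "$file" "$key")" in
            */) ;;
            *) echo "$key must end in a trailing slash" >&2; rc=1 ;;
        esac
    done
    # The document root must not end in "/".
    case "$(prop "$file" server.home)" in
        */) echo "server.home must not end in a trailing slash" >&2; rc=1 ;;
    esac
    return "$rc"
}
```

Running check_build_properties build.properties in the spqlpsh-server directory prints a message for each rule that is violated and returns a non-zero status if any check fails.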

Each client that should be authorized to load annotation RDF/XML into the triplestore via the hub needs to supply a generated certificate to the administrator of SparqlPuSH. Each certificate should be a .pem file that contains the public key, and clients should sign outgoing RDF/XML with the corresponding private key (the PHP libraries for FP contain a class that a client can use for doing this: fp/common/XmlSign.php).

Generate public/private key pairs for each client you wish to be authorized to load annotations into the triple store:

$ openssl req -x509 -nodes -newkey rsa:2048 -out newcert.pem -outform PEM -days 1825

The above will generate two files: privkey.pem, which should reside with the client and be placed somewhere outside the document root of the server, and newcert.pem, which contains the public key. A copy of newcert.pem should be stored on the same server as SparqlPuSH, somewhere the application can access it.
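When several clients each send in a certificate, it is easy to mix up the files. The following is a minimal sketch for confirming that a certificate and a private key belong together, assuming RSA keys as generated above; the helper name cert_matches_key is hypothetical. It compares the RSA modulus reported by openssl for each file.

```shell
#!/bin/sh
# Sketch: succeed only if the certificate and the RSA private key share a modulus.
# Usage: cert_matches_key newcert.pem privkey.pem
cert_matches_key() {
    cert_mod="$(openssl x509 -noout -modulus -in "$1")" || return 2
    key_mod="$(openssl rsa -noout -modulus -in "$2")" || return 2
    [ "$cert_mod" = "$key_mod" ]
}
```

For example, cert_matches_key newcert.pem privkey.pem && echo "pair matches" prints the message only for a matching pair. A return status of 2 means one of the files could not be read by openssl.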

Create the keystore:

$ openssl pkcs12 -export -in newcert.pem -inkey privkey.pem -out keystore.p12 -name keystore

Configure SparqlPuSH with the list of clients who are authorized to load data into the triplestore by editing the certs.txt file (in the spqlpsh-server project) and adding, for each client, an alias (such as symbiota) paired with the path to the pem file that contains the client's public key.
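Entries can be added by hand in an editor, or with a small helper such as the sketch below. Note the assumption: the exact layout of certs.txt is not spelled out here, so the sketch assumes one "alias,path" pair per line, separated by a comma. Check the certs.txt shipped with the project before relying on it. The helper name add_client_cert and the example path are hypothetical.

```shell
#!/bin/sh
# Sketch: append an "alias,pem-path" line to certs.txt.
# ASSUMPTION: certs.txt holds one comma-separated pair per line.
# Usage: add_client_cert ALIAS PEM_PATH [CERTS_FILE]
add_client_cert() {
    alias_name="$1"
    pem_path="$2"
    certs_file="${3:-certs.txt}"
    if [ ! -r "$pem_path" ]; then
        echo "cannot read $pem_path" >&2
        return 1
    fi
    # Refuse to add the same alias twice (exact match on the first field).
    if [ -f "$certs_file" ] &&
       awk -F, -v a="$alias_name" '$1 == a { found = 1 } END { exit !found }' "$certs_file"
    then
        echo "alias $alias_name is already listed in $certs_file" >&2
        return 1
    fi
    printf '%s,%s\n' "$alias_name" "$pem_path" >> "$certs_file"
}
```

For example, add_client_cert symbiota /etc/filteredpush/certs/symbiota.pem would append a line for the symbiota client, where the path shown is only a placeholder for wherever you stored that client's newcert.pem.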

Run the ant build script to deploy and create directories that are prerequisite to deploying the spqlpsh-client project:

$ cd spqlpsh-server
$ vi build.properties    # make changes or use defaults
$ gedit certs.txt        # add alias,certificate pem file pairs for authorized clients
$ ant deploy

4. Annotation Generator

Use svn to check out and deploy the FP-Lite-SCAN configuration to its default location in /etc/filteredpush:

$ sudo svn checkout svn://svn.code.sf.net/p/filteredpush/svn/trunk/FP-Configuration/FP-Lite-SCAN/config/filteredpush /etc/filteredpush

Download the Annotation Generator war file from the SourceForge downloads page. Deploy the war file to the Tomcat webapps directory in /var/lib/tomcat7/webapps/ and restart Tomcat:

$ sudo mv FP-AnnotationGenerator.war /var/lib/tomcat7/webapps/
$ sudo /etc/init.d/tomcat7 restart

Once Tomcat starts up again, load http://localhost:8080/FP-AnnotationGenerator/rest/generate in a web browser to confirm that the application is working. You should see a list of handlers that correspond to the configuration files in /etc/filteredpush/model.

5. Client Helper Libraries and Configuration

The FilteredPush libraries for PHP clients can be checked out from the SourceForge svn repository:

$ svn checkout svn://svn.code.sf.net/p/filteredpush/svn/FP-Tools/FP-PHP-Library/ FP-PHP-Library

Clients must be configured (via edits to fp/FPConfig.php) to use the network components. Edit this file and set the X509_CERTIFICATE, PRIVATE_KEY and NETWORK_FACADE properties.

X509_CERTIFICATE should be set to the file path of the certificate (pem file) containing the public key for the client and PRIVATE_KEY should be set to the file path of the pem file containing the client's private key. Set the NETWORK_FACADE property to either FPLiteFacade (for FP-Lite deployments) or FPMediumFacade (for FP-Medium deployments).

The rest of the defaults should work with the default single-node deployment of FilteredPush. A summary of all the configuration options can be found below:

Once configured, deploy the fp directory (containing the config and the libraries) from FP-PHP-Library/fp to /usr/share/php. Clients such as Symbiota and Morphbank will use these libraries when interacting with both FP-Medium and FP-Lite.

$ sudo cp -r FP-PHP-Library/fp /usr/share/php

With the libraries and configuration deployed, you must enable FilteredPush in the symbini.php configuration file of Symbiota (e.g. /var/www/symbiot/config/symbini.php) by setting the $fpEnabled variable to true. To obtain the FilteredPush modifications to Morphbank, use git to clone the morphbank project from the FilteredPush SourceForge repository:

$ git clone git://git.code.sf.net/p/filteredpush/morphbank filteredpush-morphbank