The following instructions were tested on clean installs of Debian 6.0.5 "squeeze" and Ubuntu 12.04 with the following packages installed via apt-get install (the list also serves as a list of prerequisites):
- General Deployment:
- General PHP Extensions:
- php5-mysql (for filteredpush branch of symbiota and SparqlPuSH)
- php5-curl (for FP-PHP-Library)
- Development/deployment utils:
- SparqlPuSH dependencies:
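As a rough sketch, the packages named above could be installed in one step. Only php5-mysql and php5-curl are named explicitly in the list, so only those appear here; add the remaining packages for your deployment to the same variable. The DRY_RUN variable and the run helper are assumptions made for this example: by default the command is only printed so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: install the prerequisite packages named in the list above.
# DRY_RUN and run() are hypothetical helpers for this example. With
# DRY_RUN=1 (the default) the command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Packages named explicitly in the prerequisites list; extend as needed.
PACKAGES="php5-mysql php5-curl"

# $PACKAGES is left unquoted on purpose so each name is a separate argument.
run apt-get install -y $PACKAGES
```

To perform the installation for real, run the script as root with DRY_RUN=0.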
If you are installing (or plan to install at a later date) the FP-Medium deployment, checkout and deploy the FP-Medium-SCAN configuration to /etc/filteredpush. This configuration will also work for FP-Lite (FP-Lite-SCAN is a subset of the FP-Medium-SCAN config).
Otherwise, if you are only planning on deploying the light-weight annotation system (FP-Lite), checkout and deploy the FP-Lite-SCAN configuration:
The next step in the FP-Lite deployment is to install the sparql endpoint and triplestore for storing the annotations. We are using the Fuseki sparql endpoint configured to use Jena TDB as the triplestore.
2. Jena Fuseki SPARQL Endpoint
2.1. Installing Fuseki
Additional documentation can be found at: http://jena.apache.org/documentation/serving_data/index.html
Download and extract the latest version of the Jena Fuseki sparql endpoint from http://www.apache.org/dist/jena/binaries/ (look for the latest version of
Move the extracted Fuseki directory to the installation directory
Download the example assembler config (which uses TDB as the triple store) from SourceForge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/tdb-assembler.ttl) and place it in the Fuseki directory you just extracted:
The fuseki startup script can be found in the FP-SCAN configuration projects. Download this file from sourceforge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/fuseki) and, if necessary, edit the file so that FUSEKI_HOME is set to the directory you installed fuseki to. Also make it executable via chmod and use update-rc.d to install it as a startup script.
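The edit and install steps for the startup script might look like the sketch below. It assumes the downloaded script contains a line beginning with FUSEKI_HOME=, which is not confirmed here, so inspect the file before relying on it. The set_fuseki_home function is a name invented for this example, and the install directory is the one used elsewhere in this guide.

```shell
#!/bin/sh
# Sketch: point a downloaded init script at the Fuseki install directory.
# Assumes the script has a line starting with "FUSEKI_HOME=" (unverified).

# set_fuseki_home SCRIPT DIR: rewrite the FUSEKI_HOME assignment in SCRIPT
# and make the script executable.
set_fuseki_home() {
    script="$1"
    home_dir="$2"
    tmp="$script.tmp.$$"
    sed "s|^\([[:space:]]*\)FUSEKI_HOME=.*|\1FUSEKI_HOME=$home_dir|" "$script" > "$tmp" &&
        mv "$tmp" "$script" &&
        chmod +x "$script"
}

# Example usage, run from the directory holding the downloaded "fuseki" file.
if [ -f ./fuseki ]; then
    set_fuseki_home ./fuseki /usr/share/fuseki/jena-fuseki-0.2.5
    echo "Now, as root:"
    echo "  cp ./fuseki /etc/init.d/fuseki"
    echo "  update-rc.d fuseki defaults"
fi
```

The last two commands are only printed because they need root privileges and change the system's boot configuration.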
NOTE: On Debian I received the following warning: insserv: warning: script 'fuseki' missing LSB tags and overrides. If you want to add the LSB tags that the warning refers to, you can find more info at http://wiki.debian.org/LSBInitScripts/
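If you do want to add the tags, one possible approach is sketched below. The runlevels and dependencies are typical values and are not taken from the FilteredPush startup script, so treat them as assumptions and adjust them for your system. The add_lsb_header function is a hypothetical helper; it assumes the first line of the script is the interpreter line.

```shell
#!/bin/sh
# Sketch: add an LSB header to an init script so insserv stops warning.
# The tag values below are illustrative assumptions.

# add_lsb_header SCRIPT: insert the header after the first line of SCRIPT,
# unless the script already contains one.
add_lsb_header() {
    script="$1"
    if grep -q '^### BEGIN INIT INFO' "$script"; then
        return 0
    fi
    tmp="$script.tmp.$$"
    {
        head -n 1 "$script"
        cat <<'EOF'
### BEGIN INIT INFO
# Provides:          fuseki
# Required-Start:    $remote_fs $network
# Required-Stop:     $remote_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Jena Fuseki SPARQL endpoint
### END INIT INFO
EOF
        tail -n +2 "$script"
    } > "$tmp" && cat "$tmp" > "$script" && rm -f "$tmp"
}

# Example usage (needs write access to the installed script).
if [ -f /etc/init.d/fuseki ] && [ -w /etc/init.d/fuseki ]; then
    add_lsb_header /etc/init.d/fuseki
fi
```

Writing the result back with cat rather than mv keeps the script's existing permissions and ownership.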
Start fuseki and visit http://localhost:3030/ in your browser:
2.2. SPARQL Query Primer
If you need to execute SPARQL queries on Fuseki manually, you can do so from the web interface. First select the Control Panel link on the main page (under Server Management) and select the "/AnnotationStore" dataset. (The dataset is supplied as an argument when the startup script in /etc/init.d starts fuseki, and the files are stored in the directory configured in the tdb-assembler.ttl file in FUSEKI_HOME.) On the page that follows, you can launch sparql queries and updates or upload rdf/xml. Some useful queries are listed below.
In the query textarea:
in the update textarea:
For the queries, selecting XML from the Output dropdown on the form will probably give the most human readable results. Additionally, you may specify an xslt file for styling the xml result for browser viewing. A value of /xml-to-html.xsl is supplied by default; this file is located in your FUSEKI_HOME/pages directory.
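Queries and updates can also be sent from a shell instead of the web form. The sketch below uses curl and the standard SPARQL protocol parameters (query= and update=). The /query path appears later in this guide; the /update path is Fuseki's usual default service name and is an assumption here. The CURL variable is a hypothetical override added for this example so the request can be previewed without a running server.

```shell
#!/bin/sh
# Sketch: send SPARQL queries and updates to Fuseki with curl.
# FUSEKI_DS is the dataset URL used in this guide; "/update" is assumed.
FUSEKI_DS="${FUSEKI_DS:-http://localhost:3030/AnnotationStore}"
# Hypothetical override: set CURL='printf %s\n' to print the arguments
# instead of contacting the server.
CURL="${CURL:-curl}"

# sparql_query QUERY: run a read-only query and request XML results.
# $CURL is left unquoted on purpose so an override may carry arguments.
sparql_query() {
    $CURL -s -G -H 'Accept: application/sparql-results+xml' \
        --data-urlencode "query=$1" "$FUSEKI_DS/query"
}

# sparql_update UPDATE: send a SPARQL update as a form-encoded POST.
sparql_update() {
    $CURL -s --data-urlencode "update=$1" "$FUSEKI_DS/update"
}

# Example calls (these need a running Fuseki):
#   sparql_query 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10'
#   sparql_query 'SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'
```

The two example queries list a few triples and count all triples in the default graph; they are generic illustrations, not the queries from the original list.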
It may also be useful to enable logging to a file in the log4j.properties for fuseki. If you wish to do this now, uncomment the following lines in the log4j.properties file (located at /usr/share/fuseki/jena-fuseki-0.2.5/log4j.properties):
Create the logs directory in
2.3 Securing Fuseki
The deployment and configuration steps presented in the previous sections do not set up access restrictions on the Fuseki triplestore. Both the update and query endpoints are exposed to external clients. If you want to restrict access to fuseki from outside of localhost, you can configure Fuseki (which is running in a bootstrapped Jetty servlet container) to only accept connections from localhost on a specified port. Provided that SparqlPuSH is installed on the same host, only SparqlPuSH will have access to the triplestore. External clients must be authenticated by SparqlPuSH before the updates are made to the triplestore.
If you would like to open access to the Fuseki query endpoint (i.e. http://localhost:3030/AnnotationStore/query) but not to the upload or update endpoints, you can configure mod_proxy in apache to allow access to the endpoints selectively. The steps for this are detailed below:
Obtain the jetty config for Fuseki from our sourceforge and put this in the FUSEKI_HOME directory (/usr/share/fuseki/jena-fuseki-0.2.5/). Make edits if necessary but the defaults should restrict access to localhost on port 3030:
More detailed info about the jetty configuration can be found at http://wiki.eclipse.org/Jetty/Reference/jetty.xml
Stop the Fuseki server if it is currently running and edit the startup script (/etc/init.d/fuseki). Add the --jetty-config=jetty.xml option to the invocation of fuseki-server
Restart fuseki and confirm that external access is not possible (http://<hostname>:3030).
If you only wish for applications running on localhost to invoke Fuseki (via query, upload, update) then you can stop here. Otherwise, if you would like to expose only certain endpoints while restricting others (for example you want to allow queries from the outside but no changes/additions to data), you can follow the rest of the configuration steps below.
The following steps are based on the documentation found at http://wiki.eclipse.org/Jetty/Tutorial/Apache
First enable mod_proxy in apache via the following:
This should have created a proxy.conf file in /etc/apache2/mods-enabled. This file contains some default configuration; edit it and replace everything within <IfModule>...</IfModule> with the following:
Next, add the following line to the default site in /etc/apache2/sites-available and make sure it is enabled (via a2ensite default):
Restart apache and check that outside access to the query interface is enabled. A request to http://<hostname>/fuseki/AnnotationStore/query should give an error from fuseki, since no query is specified in the request. If you get an error from apache instead, the endpoint is still not accessible.
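For orientation, the sketch below stages one plausible Apache 2.2 configuration for the proxy steps above. The directives are an assumption about what such a setup could contain, not the project's original file contents, and the STAGE directory and file names are invented for this example. The files are written to a staging directory so they can be reviewed before anything under /etc/apache2 is changed.

```shell
#!/bin/sh
# Sketch: stage Apache 2.2 mod_proxy settings that expose only the Fuseki
# query endpoint. Illustrative only; review before copying into place.
STAGE="${STAGE:-./apache-staging}"
mkdir -p "$STAGE"

# Candidate contents for mods-enabled/proxy.conf: forward proxying stays
# off, while reverse-proxied requests are allowed (Apache 2.2 syntax).
cat > "$STAGE/proxy.conf" <<'EOF'
<IfModule mod_proxy.c>
    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>
</IfModule>
EOF

# Candidate line for the default site: only the query service is mapped,
# so upload and update stay reachable from localhost alone.
cat > "$STAGE/default-site-snippet.conf" <<'EOF'
ProxyPass /fuseki/AnnotationStore/query http://localhost:3030/AnnotationStore/query
EOF

echo "Staged files in $STAGE"
echo "Modules can be enabled with: a2enmod proxy proxy_http"
```

Mapping a single path rather than the whole dataset is what keeps the update and upload services private in this sketch.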
With outside access no longer possible, you may not be able to reach the Fuseki control panel to test queries and updates. You can instead run the ruby scripts (located in your FUSEKI_HOME) provided by the Jena Fuseki project on localhost. See http://jena.apache.org/documentation/serving_data/index.html#script-control for how to use the scripts and http://jena.apache.org/documentation/serving_data/soh.html for more info.
3. SparqlPuSH Server
Checkout the SparqlPuSH Server php project from the FilteredPush Sourceforge svn:
Copy the spqlpsh-server PHP includes to your php include path (i.e. /usr/share/php5/). These are simplepie.inc for parsing rss feeds and xmlseclibs.php for xml digital signature authentication of annotations:
Edit build.properties to configure the deployment. Replace /var/www in the server.home property value with your document root, replace sparql.endpoint with the fuseki url or other endpoint url, and replace the value for pubsubhubbub.hub with the hub host. The urls for sparql.endpoint and pubsubhubbub.hub must end in a trailing slash (see the example build.properties provided); server.home, however, should not. The properties prefixed with db are specific to the ARC2 triplestore. You only need to configure these if you are using ARC2 instead of fuseki. Otherwise you can leave the defaults.
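The trailing-slash rules are easy to get wrong, so a small check like the following may help before running the build. It is a sketch that assumes build.properties uses plain key=value lines; get_prop and check_build_properties are names made up for this example.

```shell
#!/bin/sh
# Sketch: check the trailing-slash rules for build.properties.
# Assumes simple "key=value" lines without continuation or escapes.

# get_prop FILE KEY: print the value of KEY (last occurrence wins).
get_prop() {
    sed -n "s|^[[:space:]]*$2[[:space:]]*=[[:space:]]*||p" "$1" | tail -n 1
}

# check_build_properties FILE: return non-zero if a rule is violated.
check_build_properties() {
    file="$1"
    status=0
    # These two URLs must end in a slash (a missing key also fails).
    for key in sparql.endpoint pubsubhubbub.hub; do
        case "$(get_prop "$file" "$key")" in
            */) ;;
            *) echo "$key must end with a trailing slash" >&2; status=1 ;;
        esac
    done
    # The document root must not end in a slash.
    case "$(get_prop "$file" server.home)" in
        */) echo "server.home must not end with a slash" >&2; status=1 ;;
    esac
    return "$status"
}

# Example usage from the spqlpsh-server project directory.
if [ -f build.properties ]; then
    check_build_properties build.properties && echo "build.properties looks OK"
fi
```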
Each client that should be authorized to load annotation rdf/xml into the triplestore via the hub needs to supply a generated certificate to the administrator of SparqlPuSH. These certificates should be a .pem file that contains the public key, and clients should sign outgoing rdf/xml with the corresponding private key (the PHP libraries for fp contain a class that a client can use for doing this: fp/common/XmlSign.php).
Generate public/private key pairs for each client you wish to be authorized to load annotations into the triple store:
The above will generate two files: privkey.pem (which should reside with the client and be placed somewhere outside the directory root of the server) and newcert.pem (contains public key, a copy of this should be stored on the same server as SparqlPuSH somewhere the application can access it)
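One way to produce such a pair of files with openssl is sketched below. The key size, validity period, subject and output directory are illustrative assumptions, as is the choice to leave the private key unencrypted; the original command used for this guide may have differed.

```shell
#!/bin/sh
# Sketch: create a private key and self-signed certificate for one client.
# Key size, lifetime, subject and OUT_DIR are assumptions for illustration.
set -e
CLIENT_ALIAS="${CLIENT_ALIAS:-symbiota}"
OUT_DIR="${OUT_DIR:-./fp-keys/$CLIENT_ALIAS}"
mkdir -p "$OUT_DIR"

# -nodes leaves the private key unencrypted so a client can sign without
# prompting for a passphrase; drop it if you prefer a protected key.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=$CLIENT_ALIAS" \
    -keyout "$OUT_DIR/privkey.pem" \
    -out "$OUT_DIR/newcert.pem"

# Keep the private key readable by its owner only.
chmod 600 "$OUT_DIR/privkey.pem"
echo "Wrote $OUT_DIR/privkey.pem and $OUT_DIR/newcert.pem"
```

Running it once per client with a different CLIENT_ALIAS keeps each client's files in a separate directory.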
Create the keystore:
Configure SparqlPuSH with a list of clients who are authorized to load data into the triple store by editing the certs.txt (in the spqlpsh-server project) file and adding an alias (such as symbiota) paired with the path to the pem file that contains the client's public key.
Run the ant build script to deploy and create directories that are prerequisite to deploying the spqlpsh-client project:
4. Annotation Generator
Use svn to check out and deploy the FP-Lite-SCAN configuration to its default location in /etc/filteredpush:
Download the Annotation Generator war file from the Sourceforge downloads page. Deploy the war file to the tomcat webapps directory in /var/lib/tomcat7/webapps/ and restart tomcat.
Once tomcat starts up again, load http://localhost:8080/FP-AnnotationGenerator/rest/generate in a web browser to confirm that the application is working. You should see a list of handlers that correspond to the configuration files in /etc/filteredpush/model
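Tomcat derives the context path from the war file name, so the file needs to be called FP-AnnotationGenerator.war for the URL above to resolve. The sketch below copies a downloaded war under that name; deploy_war is a hypothetical helper, and the name of the downloaded file is an assumption.

```shell
#!/bin/sh
# Sketch: deploy the Annotation Generator war under the context path used
# in this guide (/FP-AnnotationGenerator).

# deploy_war WAR WEBAPPS_DIR: copy WAR into WEBAPPS_DIR with the file name
# that yields the expected context path.
deploy_war() {
    cp "$1" "$2/FP-AnnotationGenerator.war"
}

# Example usage (as root), assuming the download is in the current directory:
#   deploy_war ./FP-AnnotationGenerator.war /var/lib/tomcat7/webapps
#   service tomcat7 restart
#   curl -s http://localhost:8080/FP-AnnotationGenerator/rest/generate
```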
5. Client Helper Libraries and Configuration
The FilteredPush libraries for php clients can be checked out from the sourceforge svn:
Clients must be configured (via edits to fp/FPConfig.php) to use the network components. Edit this file and set the X509_CERTIFICATE, PRIVATE_KEY and NETWORK_FACADE properties.
X509_CERTIFICATE should be set to the file path of the certificate (pem file) containing the public key for the client and PRIVATE_KEY should be set to the file path of the pem file containing the client's private key. Set the NETWORK_FACADE property to either FPLiteFacade (for FP-Lite deployments) or FPMediumFacade (for FP-Medium deployments).
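Before pointing X509_CERTIFICATE and PRIVATE_KEY at a pair of files, it can be worth confirming that they belong together. The following sketch compares the RSA modulus of the certificate and the key; keypair_matches is a name invented for this example, and the check assumes RSA keys.

```shell
#!/bin/sh
# Sketch: verify that a private key belongs to a certificate (RSA only).

# keypair_matches CERT KEY: succeed if both files share the same modulus.
# Returns 2 if either file cannot be read by openssl.
keypair_matches() {
    cert_mod=$(openssl x509 -in "$1" -noout -modulus 2>/dev/null) || return 2
    key_mod=$(openssl rsa -in "$2" -noout -modulus 2>/dev/null) || return 2
    [ "$cert_mod" = "$key_mod" ]
}

# Example usage with the file names from the key generation step.
if [ -f newcert.pem ] && [ -f privkey.pem ]; then
    if keypair_matches newcert.pem privkey.pem; then
        echo "newcert.pem and privkey.pem match"
    else
        echo "newcert.pem and privkey.pem do NOT match" >&2
    fi
fi
```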
The rest of the defaults should work with the default single-node deployment of FilteredPush. A summary of all the configuration options can be found below:
- RDFHANDLER_ENDPOINT: the url for the annotation webservice, used for creating new identification annotation rdf/xml
- FPNODE_ENDPOINT: this is the AccessPoint SOAP webservice as part of FP-Medium (for an FP-Lite deployment the default can be used for now)
- SPARQLPUSH_SERVER: sparqlpush server uri
- SPARQLPUSH_CLIENT: sparqlpush client uri
- DS: the dataset that fuseki was started with in the startup script
- SPARQL_ENDPOINT: the uri to the fuseki endpoint
- RESULT_XSLT: the xsl for styling query results and the annotations shown on the Annotations tab in Symbiota (on the Occurrence Record form)
- X509_CERTIFICATE: the pem file (newcert.pem from the example above) that contains the public key for this client
- PRIVATE_KEY: the pem file used by this client (privkey.pem) for signing the rdf/xml
- NETWORK_FACADE: current network implementation to use (see classes/fp/facades); choices are FPLiteFacade and FPMediumFacade
Once configured, deploy the fp directory containing the config and the libraries in FP-PHP-Library/fp to /usr/share/php. Clients (such as Symbiota and Morphbank) will use these libraries when interacting with both FP-Medium and FP-Lite.
With the libraries and configuration deployed you must enable FilteredPush in the symbini.php configuration file of Symbiota (i.e. /var/www/symbiot/config/symbini.php) by setting the $fpEnabled variable to true. To obtain the modifications to Morphbank for FilteredPush use git and clone the morphbank project from the FilteredPush sourceforge repository: