Whyis is a nano-scale knowledge graph publishing, management, and analysis framework.
Before starting this tutorial, please make sure that you have created a Whyis extension using the instructions in Installing Whyis.
Whyis is more than just a place to store, manage and query knowledge. The Inference Agent framework allows for post-processing of your uploaded knowledge. Inference Agents are capable of knowledge curation, reasoning, interaction, creation, and much more.
This tutorial will walk through the Inference Agent framework, how to create an inference agent, and invocation of an example inference agent.
The purpose of Whyis Inference Agents is provide additional knowledge or perform some task on your pre-existing knowledge. Each inference agent is itself a SADI service. There is a listener that waits for knowledge graph changes, such as the upload of new knowledge. On knowledge graph change, the listener invokes each agents’ SPARQL query.
Upon invocation of an Inference Agent, the input is used to perform some task and provide output to the knowledge base. For example, if knowledge exists for the Inference Agent’s pre-defined SPARQL query and something exists in the input class, create additional knowledge based on the content of that input. To learn more about this process, take a look through the source code of autonomic.py
.
Agents can also be set up to run on a timed schedule, otherwise known as Inference Tasks. A tutorial on this functionality will be included at a later date.
Example inference agents are provided within the Agents directory as nlp.py
. At the top of the file, you can import files, modules and more. Namespaces can be defined for your inference agents in the file as well. The nlp.py
file provides sample inference agent declarations.
A Whyis agent must consist of the following functions:
All inference agents should be stored in your kgapp. Here is a sample project source directory location: /apps/my_knowledge_graph/my_knowledge_graph/
. In this directory, there are two pre-generated files: agent.py
and __init__.py
.
__init__.py
contains import statements to all python inference agent files in this directory. For example, if you have one file in the directory called agent.py
, __init__.py
should contain: import agent
. If you create additional python files in that directory, you need to add import statements in __init__.py
.
agent.py
contains the framework for sample inference agents. Here is the content from agent.py
:
from whyis import autonomic
from rdflib import *
from slugify import slugify
from whyis import nanopub
sioc_types = Namespace("http://rdfs.org/sioc/types#")
sioc = Namespace("http://rdfs.org/sioc/ns#")
sio = Namespace("http://semanticscience.org/resource/")
dc = Namespace("http://purl.org/dc/terms/")
prov = autonomic.prov
whyis = autonomic.whyis
At the top of the file, we import the relevant classes and packages. These import statements should be included in all inference agent files. Below that are some pre-defined namespaces. You may include as many namespaces as needed. We recommend including prov = autonomic.prov
and whyis = autonomic.whyis
in other inference agent files as well.
After the import statements and namespace definitions, we can declare our Inference Agents! A example inference agent, HTML2Text is provided below. More information about this inference agent is provided later in this tutorial.
class HTML2Text(autonomic.UpdateChangeService):
activity_class = whyis.TextFromHTML
def getInputClass(self):
return sioc.Post
def getOutputClass(self):
return URIRef("http://purl.org/dc/dcmitype/Text")
def get_query(self):
return '''select ?resource where { ?resource <http://rdfs.org/sioc/ns#content> [].}'''
def process(self, i, o):
content = i.value(sioc.content)
soup = BeautifulSoup(content, 'html.parser')
text = soup.get_text("\n")
o.add(URIRef("http://schema.org/text"), Literal(text))
After a new Agent is implemented, it must be declared in your customization package’s whyis.conf
file. For the purpose of explanation, let’s say that I have created a new inference agent. The name of the python file is myAgents.py
, the inference agent (python class) defined in this file is called SampleAgent
, and the customization package’s name is my_knowledge_graph
.
To declare a new inference agent in the configuration, the following two tasks must be completed: 1) Import your new inference agent in the config file. 2) Add your Inference Agent to the inferencers dictionary (located at the end of the base config class). The key should be the name of your Inference Agent and the value should be the function call.
In the case of my example, I would first need to import my inference agent. The code might look like the following:
import my_knowledge_graph.myAgents as myAgents
. Secondly, we must include the inference agent in the inferencers dictionary. A sample inferencers dictionary is provided below, with our new inference agent included.
INFERENCERS = {
"SETLr": autonomic.SETLr(),
"HTML2Text" : nlp.HTML2Text(),
"EntityExtractor" : nlp.EntityExtractor(),
"EntityResolver" : nlp.EntityResolver(),
#"TF-IDF Calculator" : nlp.TFIDFCalculator(),
"SKOS Crawler" : autonomic.Crawler(predicates=[skos.broader, skos.narrower, skos.related]),
"SampleAgent" : myAgents.SampleAgent()
},
INFERENCE_TASKS = [
dict ( name="SKOS Crawler",
service=autonomic.Crawler(predicates=[skos.broader, skos.narrower, skos.related]),
schedule=dict(hour="1")
)
]
For this example, we will be invoking the pre-defined inference agent, HTML2Text. This Agent is defined in the Agents folder as nlp.py
and utilizes the BeautifulSoup python package. To better understand this inference agent, review the source code shown earlier in this tutorial (also located in your directory: /apps/whyis/agents/nlp.py
).
The purpose of HTML2Text is to take an HTML sample and convert it to plain text. HTML2Text takes the URI, http://rdfs.org/sioc/ns#Post, as input. The query used for this inference is included below.
select ?resource where { ?resource <http://rdfs.org/sioc/ns#content> [].}
In order to invoke HTML2Text, we must first define the inference agent in the config.py
file. Open the file and uncomment the line: ‘“HTML2Text” : nlp.HTML2Text(),’. Confirm that all uncommented inference agents end in a comma, except for the last defined inference agent in the list. If there is an error in your config.py
, Whyis will use the default config file instead. Your inferencers dictionary should look like the following:
INFERENCERS = {
"SETLr": autonomic.SETLr(),
"HTML2Text" : nlp.HTML2Text()
#"EntityExtractor" : nlp.EntityExtractor(),
#"EntityResolver" : nlp.EntityResolver(),
#"TF-IDF Calculator" : nlp.TFIDFCalculator(),
#"SKOS Crawler" : autonomic.Crawler(predicates=[skos.broader, skos.narrower, skos.related]),
},
Second, we must upload some knowledge to invoke the HTML2Text Agent. Below is a sample Turtle file. Copy the content and save it to a file in your local directory as ‘htmlKnowledge.ttl’.
@prefix sioc: <http://rdfs.org/sioc/ns#>.
<http://rdfs.org/sioc/ns#Post> <http://rdfs.org/sioc/ns#content> """<!DOCTYPE html><html><body><h1>Website heading template!</h1><p>There is no place like 127.0.0.1.</p></body></html>""".
To load our new knowledge, use the load command:
$ whyis load -i <path to input file directory> -f turtle
As a result of the knowledge upload, there will be a new triple that contains parsed HTML text.
subject | predicate | object |
---|---|---|
http://rdfs.org/sioc/ns#Post | http://schema.org/text | Website heading template! There is no place like 127.0.0.1. |
If you are having issues with creating an Inference Agent, try the following solutions:
$ sudo service celeryd restart