The Resource Description Framework (RDF) is a framework for representing information in the Web.
  
This library implements the W3C Recommendation [[https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/|RDF 1.1 XML Syntax]]. RDF/XML files can be read as a Graph, which is a simple collection of RDF Statements. Likewise, an RDF Graph can be written as RDF/XML. The library also implements reading and writing the simple [[https://www.w3.org/TR/2014/REC-n-triples-20140225/|N-Triples]] format, since the official test cases use it.
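
N-Triples writes one statement per line: subject, predicate and object, followed by a full stop. The following is a minimal Python sketch of such a writer, not the library's own (Smalltalk) code. It assumes terms are represented as tagged tuples (''"iri"'', ''"blank"'', ''"literal"''), a made-up representation for illustration, and it only escapes the characters N-Triples forbids inside quoted literals; IRI validation is left out.

```python
# Minimal N-Triples writer sketch. Terms are hypothetical tagged tuples:
#   ("iri", "http://...")              -> <http://...>
#   ("blank", "b0")                    -> _:b0
#   ("literal", text, lang, datatype)  -> "text"@lang or "text"^^<datatype>

def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw inside a quoted literal."""
    return (text.replace("\\", "\\\\")   # backslash first, so later escapes survive
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def term_to_nt(term: tuple) -> str:
    kind = term[0]
    if kind == "iri":
        return f"<{term[1]}>"
    if kind == "blank":
        return f"_:{term[1]}"
    if kind == "literal":
        _, text, lang, datatype = term
        out = f'"{escape_literal(text)}"'
        if lang:
            return f"{out}@{lang}"
        if datatype:
            return f"{out}^^<{datatype}>"
        return out
    raise ValueError(f"unknown term kind: {kind!r}")


def statement_to_nt(subject: tuple, predicate: tuple, obj: tuple) -> str:
    """One statement per line, terminated by ' .'."""
    return f"{term_to_nt(subject)} {term_to_nt(predicate)} {term_to_nt(obj)} ."


line = statement_to_nt(
    ("iri", "http://example.org/doc"),
    ("iri", "http://purl.org/dc/elements/1.1/title"),
    ("literal", 'A "quoted" title', "en", None),
)
print(line)
# prints: <http://example.org/doc> <http://purl.org/dc/elements/1.1/title> "A \"quoted\" title"@en .
```

The line-per-statement layout is what makes N-Triples convenient for test cases: two graphs can be compared statement by statement without a full parser.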
  
The code is in the bundle ''**{RDF Project}**'' in the public store. The licence is MIT.
===== Motivation =====
  
PDFs can have metadata describing the document. In [[https://www.pdfa.org/publication/iso-32000-2-pdf-2-0/|PDF 2.0]] this is mandatory. The metadata are stored in the [[https://www.adobe.com/products/xmp.html|XMP]] (''Extensible Metadata Platform'') format defined by Adobe, which is now an [[https://www.iso.org/news/2012/03/Ref1525.html|ISO standard]]. XMP is also interesting because many other digital formats (pictures, movies, etc.) embed their metadata as XMP.
  
The impulse to add the metadata feature to the PDF library came from a request by [[http://www.objektfabrik.de/|Joachim Tuchel]], who asked me if I could deal with [[https://www.ferd-net.de/zugferd/definition/index.html|ZUGFeRD]], a new format for electronic invoices. A ZUGFeRD document is a PDF of an invoice with an attached XML document containing the same information in a structured form. For this, I would need to implement PDF attachments (which should be simple) and XMP, because ZUGFeRD requires certain entries in the metadata.
  
XMP is an RDF language: a subset of RDF/XML with restrictions (seemingly the way RDF languages are defined). Many difficult features of RDF/XML are not needed for XMP, so it might have been easier to implement just that subset. But I got interested in RDF and wanted to do the real thing.
  
RDF represents facts in a very basic way: by asserting statements about something. A statement has a subject, a predicate and an object. A subject is usually an IRI (Internationalized Resource Identifier, a URI that may contain Unicode characters), often pointing to some document on the web. A subject can also be an anonymous placeholder, a blank node. Predicates always have to be IRIs. They specify the relation between the subject and the object and are usually defined by a vocabulary. Objects can be any kind of term: an IRI, a blank node or a literal.
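
These position rules (as fixed by RDF 1.1: subjects are IRIs or blank nodes, predicates are IRIs, objects may be any of the three) can be written down as a small lookup table. The Python sketch below is only an illustration; the ''Kind'' enum and ''check_statement'' function are hypothetical names, not part of the library.

```python
# Sketch of the RDF position rules, using hypothetical term kinds.
from enum import Enum


class Kind(Enum):
    IRI = "iri"
    BLANK = "blank"
    LITERAL = "literal"


# Which kinds of term may appear in which position of a statement.
ALLOWED = {
    "subject": {Kind.IRI, Kind.BLANK},
    "predicate": {Kind.IRI},
    "object": {Kind.IRI, Kind.BLANK, Kind.LITERAL},
}


def check_statement(subject: Kind, predicate: Kind, obj: Kind) -> list[str]:
    """Return the violations; an empty list means the statement is well-formed."""
    problems = []
    for position, kind in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if kind not in ALLOWED[position]:
            problems.append(f"{kind.value} is not allowed as {position}")
    return problems


print(check_statement(Kind.BLANK, Kind.IRI, Kind.LITERAL))   # []
print(check_statement(Kind.LITERAL, Kind.BLANK, Kind.IRI))
# ['literal is not allowed as subject', 'blank is not allowed as predicate']
```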
  
This simple representation is interesting because it is more flexible and more expressive than relational databases or objects in object-oriented systems. RDF data need no uniform structure, unlike the tables or classes of the other approaches. Also, because predicates are IRIs, anyone can add their own semantics to existing resources without disturbing other users.
  
These features of RDF are inspiring, and their use on the web is fascinating. [[https://www.wikidata.org/wiki/Wikidata:Database_download|Wikidata]] and [[https://wiki.openstreetmap.org/wiki/OSM_Semantic_Network|OpenStreetMap]], for example, use RDF. Now I dream of using RDF to describe the budget data for [[https://unsere-gelder.de|Unsere Gelder]] or of using it to create a database of PDF examples.
  
Although I want to use RDF for XMP and the PDF library, RDF is valuable on its own. And since the RDF implementation does not depend on PDF, I publish it as a stand-alone library which depends only on XML.
===== Usage =====
  
The namespace is ''RDF''. The core classes are:
  * **Graph** is a collection of ''**Statement**''s
  * **Statement** consists of ''subject'', ''predicate'' and ''object'', each of which is a ''**Term**''
  * **Term** is the superclass, with ''**IRI**'', ''**Blank**'' and ''**Literal**'' as concrete subclasses
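
The relationships between these classes can be pictured with a small Python mirror. This is a sketch only: the library itself is written in Smalltalk, and the field names and the ''objects_of'' helper are assumptions made for illustration, not its API.

```python
# Python mirror of the core classes; names beyond Graph, Statement, Term,
# IRI, Blank and Literal are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Optional


class Term:
    """Abstract superclass of everything that can appear in a statement."""


@dataclass(frozen=True)
class IRI(Term):
    value: str


@dataclass(frozen=True)
class Blank(Term):
    label: str


@dataclass(frozen=True)
class Literal(Term):
    lexical: str
    language: Optional[str] = None
    datatype: Optional[IRI] = None


@dataclass(frozen=True)
class Statement:
    subject: Term
    predicate: Term
    object: Term


@dataclass
class Graph:
    """A plain ordered collection of statements."""
    statements: list = field(default_factory=list)

    def add(self, statement: Statement) -> None:
        self.statements.append(statement)

    def objects_of(self, subject: Term, predicate: Term) -> list:
        # Linear scan: every lookup walks the whole collection.
        return [s.object for s in self.statements
                if s.subject == subject and s.predicate == predicate]


dc_title = IRI("http://purl.org/dc/elements/1.1/title")
doc = IRI("http://example.org/doc")
g = Graph()
g.add(Statement(doc, dc_title, Literal("Invoice 42", language="en")))
g.add(Statement(Blank("b0"), dc_title, Literal("Attachment")))
print(g.objects_of(doc, dc_title))
```

Making the terms immutable value objects means two ''IRI''s with the same string compare equal, which is what statement lookup relies on.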
  
===== Experience =====
  
First, I read the nicely written [[https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/#section-Syntax|introductory chapter]] of the RDF specification. For its examples, I programmed an ad hoc prototype to read and write them. That was fun, but many details were not mentioned, so I used wild guesses for parts of the code.
  
Then I discovered all the [[https://www.w3.org/TR/rdf11-testcases/|tests for RDF]]. Among them was the official test suite, which covers every detail. How cool is that! After importing all tests as SUnit tests into the image, I refactored and refined my implementation with the help of the tests. In the beginning, many of them (about half) failed or led to errors. This was fun too, but again, there was not enough information for a "right" implementation. The tests covered all features, but only in isolation, and didn't explain what a test is testing; they only gave the expected outcome. Sometimes it was not clear how the features would interact with each other. Anyhow, I fiddled with my implementation until all tests (about 350) were green.
  
When an example or feature was not clear, I consulted the [[https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/#section-Infoset-Grammar|formal grammar]]. The grammar consists of syntax productions which clearly describe the form and interplay of all elements. The chapter is not meant for reading, but it is invaluable for deciding on the right way to program things. So I decided to implement the productions as shown in the spec. This was a lot of fun. I had the tests (all green) and all functionality had been implemented already. Therefore it was just a matter of refactoring the code into a form close to the productions in the spec. This led to less and cleaner code. And now I am pretty sure that the implementation is complete. I think that it will handle any legal RDF/XML.
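
To illustrate what implementing the productions "as shown in the spec" can look like, here is a Python sketch (not the library's Smalltalk code) in which each method corresponds to one production of the RDF/XML grammar (''nodeElement'', ''propertyEltList'', ''resourcePropertyElt'' and so on). It handles only a small subset, listed in the comments; the ''Parser'' class and its method names are hypothetical.

```python
# Sketch: a tiny subset of the RDF/XML grammar with one method per production.
# Handled: rdf:about, rdf:resource, nested node elements, typed node elements
# and plain text literals. Omitted: rdf:ID, rdf:nodeID, xml:lang, xml:base
# resolution, rdf:parseType, datatypes, literal escaping and reification.
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def uri_of(tag: str) -> str:
    """ElementTree names look like '{namespace}local'; RDF joins the two parts."""
    namespace, local = tag[1:].split("}")
    return namespace + local


class Parser:
    def __init__(self):
        self.triples = []
        self._ids = itertools.count()

    def new_blank(self) -> str:
        return f"_:b{next(self._ids)}"

    def rdf(self, element):
        # RDF: an rdf:RDF element whose children form a nodeElementList
        if element.tag != f"{{{RDF}}}RDF":
            raise ValueError("expected rdf:RDF")
        for child in element:
            self.node_element(child)

    def node_element(self, element) -> str:
        # nodeElement: returns the subject term the element describes
        about = element.get(f"{{{RDF}}}about")
        subject = f"<{about}>" if about is not None else self.new_blank()
        if element.tag != f"{{{RDF}}}Description":
            # typed node element: the element name becomes an rdf:type
            self.triples.append((subject, f"<{RDF}type>", f"<{uri_of(element.tag)}>"))
        for child in element:  # propertyEltList
            self.property_elt(subject, child)
        return subject

    def property_elt(self, subject: str, element) -> None:
        # propertyElt: choose the alternative that matches the element's shape
        predicate = f"<{uri_of(element.tag)}>"
        children = list(element)
        resource = element.get(f"{{{RDF}}}resource")
        if len(children) > 1:
            raise ValueError("a property element may contain only one node element")
        if children:                    # resourcePropertyElt
            obj = self.node_element(children[0])
        elif resource is not None:      # emptyPropertyElt with rdf:resource
            obj = f"<{resource}>"
        else:                           # literalPropertyElt (no escaping here)
            obj = '"' + (element.text or "") + '"'
        self.triples.append((subject, predicate, obj))


doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/terms/">
  <ex:Invoice rdf:about="http://example.org/invoice/42">
    <dc:title>Invoice 42</dc:title>
    <ex:seller>
      <rdf:Description>
        <ex:name>ACME</ex:name>
      </rdf:Description>
    </ex:seller>
  </ex:Invoice>
</rdf:RDF>"""

parser = Parser()
parser.rdf(ET.fromstring(doc))
for triple in parser.triples:
    print(*triple, ".")
```

Because the code follows the grammar's structure, each branch can be checked directly against the production it names, which is the benefit the text describes.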
  
===== Future directions =====
  
RDF with RDF/XML is one of the basic building blocks of the "semantic web".
==== XMP ====
  
For the PDFtalk library I need XMP to work with the metadata of documents. XMP is an RDF language which restricts the very general RDF. Certain features are forbidden in XMP; luckily, these are the complex ones: XML literals (''rdf:XMLLiteral'') and ''xml:base''. Other restrictions apply to properties and how they are used. This defines the semantics of the language.
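
The two forbidden features are easy to spot syntactically: an XML literal is written with ''rdf:parseType="Literal"'', and ''xml:base'' is an attribute in the XML namespace. The following Python sketch checks only these two; the property-level XMP restrictions are not covered, and the function name ''xmp_violations'' is hypothetical.

```python
# Sketch: report uses of the two RDF/XML features XMP forbids,
# XML literals (rdf:parseType="Literal") and xml:base.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of the xml: prefix


def xmp_violations(xml_text: str) -> list[str]:
    problems = []
    root = ET.fromstring(xml_text)
    for element in root.iter():          # root first, then all descendants
        if element.get(f"{{{XML_NS}}}base") is not None:
            problems.append(f"xml:base on {element.tag}")
        if element.get(f"{{{RDF_NS}}}parseType") == "Literal":
            problems.append(f"XML literal in {element.tag}")
    return problems


packet = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://example.org/">
  <rdf:Description rdf:about="">
    <dc:description rdf:parseType="Literal"><b>bold</b></dc:description>
  </rdf:Description>
</rdf:RDF>"""

for problem in xmp_violations(packet):
    print(problem)
```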
  
I wonder how to do this: defining a language on top of RDF. An interesting task.
==== Triple store ====
  
Eventually, I want to store a lot of statements. For now, a ''Graph'' is just an ''OrderedCollection'' of ''Statement''s. This will not be sufficient for large amounts of data, because finding statements means scanning the whole collection. I will need a [[https://en.wikipedia.org/wiki/Triplestore|triplestore]].
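
One common way to avoid scanning everything is to keep several indexes over the same triples, so that a lookup with any combination of known positions only touches matching entries. The Python sketch below shows one possible design (three nested-dictionary indexes, SPO, POS and OSP), not the author's plan; terms are plain strings and ''TripleStore'' is a hypothetical name.

```python
# Minimal in-memory triplestore sketch with three indexes (SPO, POS, OSP).
# A pattern uses None as a wildcard; each case starts from the index whose
# leading position is bound.
from collections import defaultdict


def _index():
    return defaultdict(lambda: defaultdict(set))


class TripleStore:
    def __init__(self):
        self.spo = _index()
        self.pos = _index()
        self.osp = _index()

    def add(self, s, p, o):
        # Sets make adding the same triple twice a no-op, as in an RDF graph.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield all (s, p, o) triples matching the pattern."""
        if s is not None:
            preds = [p] if p is not None else list(self.spo.get(s, {}))
            for p2 in preds:
                for o2 in self.spo.get(s, {}).get(p2, ()):
                    if o is None or o2 == o:
                        yield (s, p2, o2)
        elif p is not None:
            objs = [o] if o is not None else list(self.pos.get(p, {}))
            for o2 in objs:
                for s2 in self.pos.get(p, {}).get(o2, ()):
                    yield (s2, p, o2)
        elif o is not None:
            for s2, preds in self.osp.get(o, {}).items():
                for p2 in preds:
                    yield (s2, p2, o)
        else:
            for s2, by_pred in self.spo.items():
                for p2, objs in by_pred.items():
                    for o2 in objs:
                        yield (s2, p2, o2)


store = TripleStore()
store.add("ex:alice", "ex:married", "ex:bob")
store.add("ex:alice", "ex:name", '"Alice"')
store.add("ex:carol", "ex:name", '"Carol"')
print(sorted(store.match(p="ex:name")))
```

The price of the indexes is that every triple is stored three times; persistent triplestores make the same trade-off on disk.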
  
=== Other syntaxes: Turtle and SPARQL ===
  
There are other syntaxes for RDF. Besides [[https://www.w3.org/TR/2015/NOTE-rdfa-primer-20150317/|HTML]] and [[https://www.w3.org/TR/json-ld/|JSON]] forms, [[https://www.w3.org/TR/turtle/|Turtle]] is a human-readable form, often used for ontologies. Turtle has a few relatives, of which [[https://www.w3.org/TR/sparql11-query/|SPARQL]] is the most important; its triple patterns are written in Turtle-like syntax. SPARQL resembles SQL and allows variables in any part of a triple. It is used as the query language for triplestores, which usually offer a SPARQL endpoint where queries can be sent. With a nice SPARQL writer, one could talk to any other triplestore. That would be cool! A parser would be nice for one's own triplestore.
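
The idea of "variables in any part of the triple" can be sketched in a few lines of Python. This illustrates only basic graph pattern matching, assuming terms are plain strings and variables start with ''?'' as in SPARQL; it is neither a SPARQL parser nor a writer, and the function names are invented for this example.

```python
# Sketch of matching SPARQL-style triple patterns against a list of triples.
# Only basic graph patterns: no FILTER, OPTIONAL, UNION, etc.

def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that the pattern equals the triple, or return None."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in result and result[p_term] != t_term:
                return None          # variable already bound to something else
            result[p_term] = t_term
        elif p_term != t_term:
            return None              # constant does not match
    return result


def query(patterns, triples):
    """Return all variable bindings that satisfy every pattern (a join)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


data = [
    ("ex:invoice42", "dc:creator", "ex:acme"),
    ("ex:invoice43", "dc:creator", "ex:acme"),
    ("ex:acme", "ex:name", '"ACME"'),
]
# Roughly: SELECT ?doc ?name WHERE { ?doc dc:creator ?org . ?org ex:name ?name }
results = query([("?doc", "dc:creator", "?org"), ("?org", "ex:name", "?name")], data)
for r in results:
    print(r["?doc"], r["?name"])
```

A shared variable such as ''?org'' is what joins the two patterns, much like a join column in SQL.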
  
=== Schemas and inferencing ===
  
Schemas (or vocabularies, or ontologies) for RDF can be easily (?) defined by anyone. A schema itself is defined in RDF. Commonly used (and sometimes mixed) are [[https://www.w3.org/TR/rdf-schema/|RDFS]] (RDF Schema) and [[https://www.w3.org/2001/sw/wiki/OWL|OWL]] (Web Ontology Language). Besides concepts like ''Class'' and ''Property'', predicates and value restrictions, an ontology also defines properties of relationships. For example, you can define that the relation ''married'' between two persons is symmetric, or that the ''subclass'' relationship is transitive.
  
With this, only basic facts have to be entered into a triplestore; many more facts can be deduced (inferred) from them. It would be an interesting project to build such an inferencer. I wonder if GemStone would make a good triplestore.
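
A naive inferencer for the two examples above can be sketched as forward chaining: apply the rules to all known facts until no new fact appears. In OWL these characteristics are declared with ''owl:SymmetricProperty'' and ''owl:TransitiveProperty'', and ''rdfs:subClassOf'' is transitive by definition; here the rules are simply passed in as sets of predicate names. The function ''infer'' and the ''ex:'' names are hypothetical, and the nested loop is deliberately simple, not efficient.

```python
# Naive forward-chaining sketch: apply two rule kinds until a fixpoint.
#   symmetric p:  (a p b)               => (b p a)
#   transitive p: (a p b) and (b p c)   => (a p c)

def infer(triples, symmetric=(), transitive=()):
    facts = set(triples)
    while True:
        new = set()
        for (a, p, b) in facts:
            if p in symmetric:
                new.add((b, p, a))
            if p in transitive:
                for (b2, p2, c) in facts:
                    if p2 == p and b2 == b:
                        new.add((a, p, c))
        new -= facts
        if not new:          # nothing new: all consequences are derived
            return facts
        facts |= new


base = {
    ("ex:alice", "ex:married", "ex:bob"),
    ("ex:Invoice", "rdfs:subClassOf", "ex:Document"),
    ("ex:Document", "rdfs:subClassOf", "ex:Resource"),
}
all_facts = infer(base, symmetric={"ex:married"}, transitive={"rdfs:subClassOf"})
print(len(all_facts))   # 5
```

Each round rescans every pair of facts, which hints at why even small knowledge bases can become expensive once many rules interact.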
  
=== Modeling ===
  
At some point, I will want to model my own fields of knowledge: budget data from all communities I can get hold of, and an index of PDFs by the technical features they use.

The challenge with inferencing systems is that they can get out of hand easily. Even small knowledge bases can become intractable.
Last modified: 2020/03/18 09:09 by christian