SemanGit: A Linked Dataset from git

  • Conference paper
The Semantic Web – ISWC 2019 (ISWC 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11779)

Abstract

The growing interest in free and open-source software over the last decades has accelerated the adoption of version control systems that help developers collaborate on shared projects. As a consequence, tools such as git and specialized open-source online platforms have gained importance. In this study, we introduce and share SemanGit, a resource at the crossroads of the Semantic Web and git-based web version control platforms. SemanGit is the first collection of linked data extracted from GitHub, based on a git ontology that we designed and extended to include GitHub-specific features. In this article, we present the dataset, describe how it is extracted according to the ontology, show some promising analyses of the data, and outline how SemanGit could be linked with external datasets or enriched with new sources to allow for more complex analyses.

Resource type: Dataset

Website: http://www.semangit.de/

Permanent URL: https://doi.org/10.5281/zenodo.2176047

Notes

  1. https://lod-cloud.net/

  2. https://github.com/SemanGit/SemanGit/blob/master/Documentation/ontology/semangitontology.ttl

  3. http://visualdataweb.de/webvowl/#opts=doc=0;editorMode=true;#iri=https://raw.githubusercontent.com/SemanGit/SemanGit/master/Documentation/ontology/semangitontology.ttl

  4. Intel Core i7-5820 CPU @ 6 × 3.3 GHz, 64 GB DDR3, Ubuntu 18.04.

Acknowledgment

This work is partly supported by the German Federal Ministry of Education and Research (BMBF) in the context of the research project “Industrial Data Space Plus” (GA 01IS17031) as well as the Fraunhofer Cluster of Excellence “Cognitive Internet Technologies” (CCIT); by the EU H2020 project “QualiChain” (GA 822404); and by the ADAPT Centre for Digital Content Technology funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.

Author information

Corresponding author

Correspondence to Damien Graux.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Kubitza, D.O., Böckmann, M., Graux, D. (2019). SemanGit: A Linked Dataset from git. In: Ghidini, C., et al. (eds.) The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham. https://doi.org/10.1007/978-3-030-30796-7_14

  • DOI: https://doi.org/10.1007/978-3-030-30796-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30795-0

  • Online ISBN: 978-3-030-30796-7

  • eBook Packages: Computer Science, Computer Science (R0)
