WARNING: this is old. Very, very old! As of 2022 it probably won't work, and much of the information here is out of date. Apologies for that. I will leave the repo live for historical reasons, but please don't expect this to work anymore.
NOTE: it goes without saying that you need a properly configured JDK in your environment.
This is a hands-on guide for JanusGraph. It is organized into sections (each folder is an independent project containing one section), and you are expected to follow them in order.
All code here assumes you are running JanusGraph 0.1.0 locally.
If you are not running it yet, you should be ashamed. BUT, here is a shortcut:
./start_janus.sh
Setting it up by hand is also fairly simple: just download JanusGraph and tell it to start up.
$ wget https://github.com/JanusGraph/janusgraph/releases/download/v0.1.0/janusgraph-0.1.0-hadoop2.zip
$ unzip janusgraph-0.1.0-hadoop2.zip
$ cd janusgraph-0.1.0-hadoop2/
$ ./bin/janusgraph.sh start
The last command should output:
Forking Cassandra...
Running `nodetool statusthrift`.. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300)... OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)..... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
This means Cassandra, Elasticsearch and Gremlin-Server are listening on the loopback interface (127.0.0.1). This is important for the examples to work.
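If an example cannot connect, a quick way to confirm the services are really up is to open a TCP connection to each port shown in the startup output. The sketch below does that in plain Java, with no JanusGraph dependency. The class and method names are made up for illustration, and it only probes the Elasticsearch (9300) and Gremlin-Server (8182) ports printed above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks that the local JanusGraph services accept TCP connections. */
final class LoopbackCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out or unreachable: treat it as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from the janusgraph.sh output above.
        String[] names = {"Elasticsearch", "Gremlin-Server"};
        int[] ports = {9300, 8182};
        for (int i = 0; i < ports.length; i++) {
            boolean up = isListening("127.0.0.1", ports[i], 1000);
            System.out.println(names[i] + " on 127.0.0.1:" + ports[i]
                    + (up ? " is up" : " is NOT reachable"));
        }
    }
}
```

A connection check only proves something is listening on the port; it does not prove the service is healthy, but it is usually enough to catch a service that failed to start.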
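If you need to clean your data (from inside janusgraph-0.1.0-hadoop2/):
- stop JanusGraph: ./bin/janusgraph.sh stop
- delete the data directory: rm -rf db
- start JanusGraph again: ./bin/janusgraph.sh start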
It is also recommended that you read:
- GraphDB - diving into JanusGraph part 1 (3 min read)
- GraphDB - diving into JanusGraph part 2 (4 min read)
I wrote this guide while finding my way through this technology. I had to learn it because traditional tools were not enough for the kind of data processing my task required.
JanusGraph has proven to be a solid and reliable solution for our project, and I hope this guide is useful to you.
This is by no means a complete guide to JanusGraph, but I believe that working through it, with the official documentation as a reference, gives you enough of a framework to really dive into this technology.
In this tutorial we will build the backend database of a Twitter clone. The sections are:
- basic schema
- data loading
- querying
- hadoop integration
- indexing for performance
By the end of this tutorial you should be able to design your own (very simple but functional) database backend using JanusGraph.
A final section suggests some experiments to try once you are done.
All the Java code depends on the main schema class. This design decision lets us reuse code and keep naming consistent; it also lets us avoid hard-coded Strings as much as possible.
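As an illustration of that idea, a schema class for the Twitter clone might look roughly like the sketch below. The class name, labels and property keys are hypothetical, not necessarily the ones used in this repo; the point is that every other class refers to `TwitterSchema.FOLLOWS` instead of typing "follows" by hand.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical central schema: every label and property key is defined exactly once. */
final class TwitterSchema {

    private TwitterSchema() {} // constants holder, never instantiated

    // Vertex labels (illustrative names)
    static final String USER = "user";
    static final String TWEET = "tweet";

    // Edge labels
    static final String FOLLOWS = "follows";
    static final String POSTS = "posts";

    // Property keys
    static final String USER_NAME = "name";
    static final String CONTENT = "content";
    static final String TIMESTAMP = "timestamp";

    // Read-only lists, handy when creating the schema in a loop
    static final List<String> VERTEX_LABELS =
            Collections.unmodifiableList(Arrays.asList(USER, TWEET));
    static final List<String> EDGE_LABELS =
            Collections.unmodifiableList(Arrays.asList(FOLLOWS, POSTS));

    public static void main(String[] args) {
        // In a JanusGraph project these names would be passed to the management API
        // (for example mgmt.makeVertexLabel(label).make()); here we only print them.
        for (String label : VERTEX_LABELS) System.out.println("vertex label: " + label);
        for (String label : EDGE_LABELS) System.out.println("edge label: " + label);
    }
}
```

Renaming a label then means changing one constant, and the compiler flags any code that still uses a removed name.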
To make your life easier, each section has a simple shell script called run.sh that builds and runs the example code for you.
The Java projects follow the standard Gradle application plugin conventions, which means they have the folders:
- src/main/dist
- src/main/resources
- src/main/java
Inside `dist` you will find the JanusGraph configuration files; each section has its own. `resources` holds the `log4j.properties` file, and `java` contains the implementation.
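JanusGraph configuration files are normally Java properties files, so they can be inspected with `java.util.Properties`. The sketch below parses an assumed configuration (keys modelled on JanusGraph's bundled Cassandra/Elasticsearch setup; check the actual files in each section's `dist`) and verifies that both backends point at the loopback interface. The class name and sample contents are hypothetical; in a real section you would load the file from `dist` instead of a string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical check that a JanusGraph config only talks to local services. */
final class ConfigCheck {

    // Assumed contents, for illustration only.
    static final String SAMPLE =
            "storage.backend=cassandrathrift\n"
            + "storage.hostname=127.0.0.1\n"
            + "index.search.backend=elasticsearch\n"
            + "index.search.hostname=127.0.0.1\n";

    /** Parses properties-format text into a Properties object. */
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    /** True if both the storage and the index hostnames are literally 127.0.0.1. */
    static boolean usesLoopback(Properties props) {
        return "127.0.0.1".equals(props.getProperty("storage.hostname"))
                && "127.0.0.1".equals(props.getProperty("index.search.hostname"));
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse(SAMPLE);
        System.out.println("storage.backend = " + props.getProperty("storage.backend"));
        System.out.println("loopback only: " + usesLoopback(props));
    }
}
```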
Our Ruby example code relies on:
- RVM: for Ruby version management (if you use something different, please prepare your environment accordingly).
- bundler (`gem install bundler`): for dependency management.
- the gremlin driver gem: a really simple Ruby driver for JanusGraph.