Recently, working for a new client here at UCI, we began to think in an internal Social Network, focused in educational purposes where students, researchers, teachers and more, could have an online space to discuss trends, make comments, give online talks, etc. My team was in charge to define the Data Model of the platform, and of course, to define the completed strategy to the new Data infrastructure behind the platform. We are a big fans of CentOS (and RHEL of course), for its incredible stability, performance and amazing security features through SELinux, the security project sponsored by Red Hat, available in many Linux distributions today, but in CentOS and Fedora, there is the last policy version (# 28) which is the most updated version of it.
We began to dig in the data model, and we saw an opportunity to create a new kind of data storage solution, using PostgreSQL like the center of the hub; but making interactions with NoSQL solutions focused in fast in-memory read/writes and of course, to be able to store the social interaction among the members of the network like a graph. So, we began to test Redis and Neo4j for these particular tasks. This post is to explain how we did it.
Installing PostgreSQL
Here at UCI, we have a completed mirror of EPEL repository, so, we could installed it easily with a simple:
yum install postgresql-server.x86_64
Then, you should do some configurations before to use PostgreSQL. First, you need to run:
/usr/bin/postgresql-setup initdb
to initiate the PostgreSQL cluster in your machine. After this step, you need to adjust the CentOS 7 firewall to accept connections to the 5432 port:
firewall-cmd — permanent — add-port=5432/tcp firewall-cmd — reload
Then, we can enable to the service to be initiated in the start of the system, and start the service with:
systemctl enable postgresql.service
systemctl start postgresql.service
Then, we need to adjust the /var/lib/pgsql/data/pg_hba.conf file, which is the responsible for authentication to the PostgreSQL server. In my case, I put this:
# “local” is for Unix domain socket connections only local all all peer
local red_social neo4j md5
# IPv4 local connections:
host all all 127.0.0.1/32 ident host all all 10.8.45.191/32 md5
host all all 10.54.20.191/32 md5
host all all 10.35.40.55/32 md5
host red_social neo4j 10.8.45.136/32 md5
and PostgreSQL is ready to work with it. Now, we installed the other systems.
Installing Redis
Installing Redis was even easier, because Redis is part of the EPEL repository too, but it has less config steps than PostgreSQL. The completed sequence of commands was:
yum install redis-server
firewall-cmd — permanent — add-port=6379/tcp firewall-cmd — reload
systemclt enable redis.service
systemclt start redis.service
You can check it doing this:
systemclt status redis.service
and the result should be something similar to this:
Integrating PostgreSQL + Redis using Foreign Data Wrappers (FDW)
Now the integration part. Foreign Data Wrappers in PostgreSQL is part of the implementation of SQL99 standard, particularly SQL/MED, which is focused to interact with remote objects from SQL servers. This is one of the most requested features in the system, and it was introduced in the version 9.1, just with read-only support; and in the version 9.3, was introduced write support. If you make a quick search in the PostgreSQL Extension Network (PGXN) for fdw, you will find many of them. So, I saw the redis_fdw at GitHub, and downloaded it the right branch for my PostgreSQL version (9.2.7). Then, I began the installation of the requirements to install it:
yum install hiredis.x86_64 hiredis-devel.x86_64 gcc.x86_64 postgresql-devel.x86_64
Then, you can compile the code with:
make USE_PGXS=1 make USE_PGXS=1 install
Then, you can load redis_fdw like an extension in PostgreSQL using:
CREATE EXTENSION redis_fdw;
Then, you can use the same example of the GitHub’s page:
CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw OPTIONS (address ‘127.0.0.1’, port ‘6379’);
CREATE FOREIGN TABLE redis_db0 (key text, value text) SERVER redis_server OPTIONS (database ‘0’);
CREATE USER MAPPING FOR PUBLIC SERVER redis_server OPTIONS (password ‘secret’);
CREATE FOREIGN TABLE myredishash (key text, value text[]) SERVER redis_server OPTIONS (database ‘0’, tabletype ‘hash’, tablekeyprefix ‘mytable:’);
I put the same example, because I can’t share the code of the application I’m working, but I want to say that everything works great. I made my external tables in Redis with this example, and until now, everything works in a great shape.
Installing Neo4j
Now, for Neo4j, I tried to use the same approach to use the FDW, but when I compiled the code, nothing worked as expected. So, I tried a new approach, based in Python. To connect from Python to Neo4j, there are a lot of libraries, so I picked py2neo for the job.
To install Neo4j in CentOS is very simple. Download it from the its site, and extract in the system. You have to install Java first. Neo4j’s community recommends to use Oracle’s JDK version, not OpenJDK. In my systems, I have OpenJDK, but this is only for tests; I recommend you that you must use Oracle’s JDK for production systems:
wget http://neo4j.org/downloads/neo4j-2.1M-community.tar.gz mkdir /opt/neo4j tar xvf neo4j-2.1M-community.tar.gz
The main configuration files are: neo4j-server.properties, neo4j.properties and neo4j-wrapper.properties, so open these files and see how it works. To start the service, you can use:
cd /opt/neo4j & bin/neo4j start
Then, you can use the browser to see Neo4j UI admin site using http://localhost:7474/browser link. You should see something like this:
This site is incredible, so embrace it and use it for your needs.
Now, to install the Python’s module py2neo, which you can do it with:
pip install py2neo
PL/Python to the rescue !!!
So, to test the integration between the two systems, I created a simple function to create a node in Neo4j through PostgreSQL. First, you mus install PL/Python in your system:
yum install postgresql-plpython.x86_64
Then, you must do the same thing with redis_fdw, loading PL/Python inside your database with:
CREATE EXTENSION plpythonu;
Then, you can test with this function:
CREATE OR REPLACE FUNCTION create_teacher_node(teacher_name text) RETURNS text AS $BODY$
from py2neo import Graph
from py2neo.cypher import CypherTransaction
graph = Graph(“http://neo4j:neo4j@localhost:7474/db/data")
tx = graph.cypher.begin()
statement = “CREATE (person:Teacher {name:{name}}) RETURN person” tx.append(statement, parameters={‘name’: teacher_name})
teacher= tx.commit()
return teacher
$BODY$ LANGUAGE plpythonu VOLATILE COST 100; ALTER FUNCTION create_teacher_node(text) OWNER TO postgres;
Then, you can check it out in the Neo4j browser if the node was created. This is a very simple function to probe the concept, but the real application does more things. Cypher’s language, the query language in Neo4j is very powerful to make a lot of things, and py2neo has a lot of very good functions to make use of them. It’s a very completed solution to interact with Neo4j using Python.
Some notes about SELinux
To avoid the infamous SocketError [Errno 13] Permission denied in Python, you need to make some changes in CentOS 7 if you have SELinux activated (which is the right approach). This error is raised because PostgreSQL is trying to bind a different port (in this case: 7474, which is the port of Neo4j) of the port that it have configured in the default security policy in SELinux. To enable this, you must make one change in the policy:
setsebool nis_enabled on
This change in the policy allows to other processes in the system to bind sockets in a different port. Like I said to my team, I encourage to enable SELinux, because it’s an amazing way to protect your CentOS systems from malicious users, and the cost to learn how to do it, always will be less comparing with the costs that implies a data security breach in the system, so:
DON’T DISABLE SELINUX AND EMBRACE IT.
Conclusions
I have to say that the systems are working very well with this approach; and the clients are very happy with the results. I just want to finish with an idea: If you need to this in one system, you can use the last version of DataStax Enterprise 4.7, which it has very interesting features like in-memory analytics and graph storage using TitanDB. Unfortunately, we are in Cuba, and we can’t use products from U.S companies for embargo laws, but you can do it. Thanks again for reading and let me know if you have any question.