ShardingSphere

Content:

Introduction
Concept
Config
Code
Evaluation

Introduction:

Apache ShardingSphere is an open source product originally called Sharding-JDBC developed in China in 2016.

It was renamed to its current name and moved to Apache in 2018.

Originally it only did sharding. Today it does both sharding and proxy.

The sharding module does transparent sharding. For brief intro to sharding see here.

The proxy module exposes either a PostgreSQL or MySQL API and proxies to any database.

This article will cover the sharding module.

Concept:

Sharding provides increased scalability for databases. In many cases N sharded database servers can support N times as high volume as 1 database server.

The problem with sharding is that often it requires huge changes to the application code (putting the sharding logic into the application itself).

ShardingsSphere provide transparent sharding - the sharding logic is put in a JDBC driver and Java applications can use that driver like any other JDBC driver.

Simple ShardingSphere JDBC sharding works like:

a config file defines:
- database servers to use (as JDBC connection info)
- tables to split out on database servers
- algorithm used to distribute rows on database servers
a DataSource get created from the config file
INSERT get routed to the correct database server
SELECT retrieve data from all database servers and append result sets

Because ShardingSphere works at the JDBC level then it supports both plain JDBC and all ORM frameworks build on top of JDBC, which practically means all Java based relational database access.

ShardingSphere has specific support for MySQL, PostgreSQL, Oracle DB and MS SQLServer. But in theory it should work with any database having a JDBC driver and support for SQL92.

But see my results at the end.

Config:

The key to ShardingSphere is the configuration.

ShardingSpehere can be configured different ways:

YAML config file
programmatic with Java code
via Spring

This article will use YAML config files.

PostgreSQL:

dataSources:
  ds1:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: org.postgresql.Driver
    url: jdbc:postgresql://localhost/Test?currentSchema=ds1
    username: postgres
    password: hemmeligt
  ds2:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: org.postgresql.Driver
    url: jdbc:postgresql://localhost/Test?currentSchema=ds2
    username: postgres
    password: hemmeligt

rules:
- !SHARDING
  tables:
    t: 
      actualDataNodes: ds${1..2}.t
      databaseStrategy:
        standard:
          shardingColumn: id
          shardingAlgorithmName: t_id_inline
  shardingAlgorithms:
    t_id_inline:
      type: INLINE
      props:
        algorithm-expression: ds${1 + id % 2}
        allow-range-query-with-inline-sharding: true

props:
  sql-show: false

Note that for PostgreSQL demo I am using two schemas on same database server. In a real life scenario it would ofcourse be two database servers.

MySQL:

dataSources:
  ds1:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost/ds1
    username: root
    password: 
  ds2:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost/ds2
    username: root
    password: 

rules:
- !SHARDING
  tables:
    t: 
      actualDataNodes: ds${1..2}.t
      databaseStrategy:
        standard:
          shardingColumn: id
          shardingAlgorithmName: t_id_inline
  shardingAlgorithms:
    t_id_inline:
      type: INLINE
      props:
        algorithm-expression: ds${1 + id % 2}
        allow-range-query-with-inline-sharding: true

props:
  sql-show: false

Note that for MySQL demo I am using two databases on same database server. In a real life scenario it would ofcourse be two database servers.

MS SQLServer:

dataSources:
  ds1:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: com.microsoft.sqlserver.jdbc.SQLServerDriver
    url: jdbc:sqlserver://localhost;database=DS1;integratedSecurity=true; 
    username: sa
    password: hemmeligt
  ds2:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: com.microsoft.sqlserver.jdbc.SQLServerDriver
    url: jdbc:sqlserver://localhost;database=DS2;integratedSecurity=true;
    username: sa
    password: hemmeligt

rules:
- !SHARDING
  tables:
    t: 
      actualDataNodes: ds${1..2}.t
      databaseStrategy:
        standard:
          shardingColumn: id
          shardingAlgorithmName: t_id_inline
  shardingAlgorithms:
    t_id_inline:
      type: INLINE
      props:
        algorithm-expression: ds${1 + id % 2}
        allow-range-query-with-inline-sharding: true

props:
  sql-show: false

Note that for MS SQLServer demo I am using two databases on same database server. In a real life scenario it would ofcourse be two database servers.

Oracle DB:

dataSources:
  ds1:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: oracle.jdbc.OracleDriver
    url: jdbc:oracle:thin:@localhost:1521:xe
    username: ds1
    password: hemmeligt
  ds2:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: oracle.jdbc.OracleDriver
    url: jdbc:oracle:thin:@localhost:1521:xe
    username: ds2
    password: hemmeligt

rules:
- !SHARDING
  tables:
    t: 
      actualDataNodes: ds${1..2}.t
      databaseStrategy:
        standard:
          shardingColumn: id
          shardingAlgorithmName: t_id_inline
  shardingAlgorithms:
    t_id_inline:
      type: INLINE
      props:
        algorithm-expression: ds${1 + id % 2}
        allow-range-query-with-inline-sharding: true

props:
  sql-show: false

Note that for Oracle DB demo I am using two schemas on same database server. In a real life scenario it would ofcourse be two database servers.

IBM DB2:

dataSources:
  ds1:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: com.ibm.db2.jcc.DB2Driver
    url: jdbc:db2://localhost:50000/Test:currentSchema=DS1;
    username: arne
    password: hemmeligt
  ds2:
    dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
    driverClassName: com.ibm.db2.jcc.DB2Driver
    url: jdbc:db2://localhost:50000/Test:currentSchema=DS2;
    username: arne
    password: hemmeligt

rules:
- !SHARDING
  tables:
    t: 
      actualDataNodes: ds${1..2}.t
      databaseStrategy:
        standard:
          shardingColumn: id
          shardingAlgorithmName: t_id_inline
  shardingAlgorithms:
    t_id_inline:
      type: INLINE
      props:
        algorithm-expression: ds${1 + id % 2}
        allow-range-query-with-inline-sharding: true

props:
  sql-show: false

Note that for IBM DB2 demo I am using two schemas on same database server. In a real life scenario it would ofcourse be two database servers.

The algorithm to split rows on two databases/schemas ds1 and ds2 are very simple as rows go to ds${1 + id % 2}.

Code:

ShardingSphere is transparent, so all database access is really just standard Java database access as described here.

Insert plain JDBC:

package demo.sharding;

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.sql.DataSource;

public class InsertJdbc {
    public static void main(String[] args) throws SQLException, IOException {
        Logger.getLogger("").setLevel(Level.OFF);
        DataSource ds = Config.getDataSource();
        try(Connection con = ds.getConnection()) { 
            try(PreparedStatement ins = con.prepareStatement("INSERT INTO t(id,val) VALUES(?,?)")) {
                int n = 0;
                for(int i = 1; i <= 100; i++) {
                    ins.setInt(1,  i);
                    ins.setString(2, "Item #" + i);
                    n += ins.executeUpdate();
                }
                System.out.printf("%d inserted\n", n);
            }
        }
    }
}

Insert JPA:

package demo.sharding;

import java.util.logging.Level;
import java.util.logging.Logger;

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.sql.DataSource;

import org.sapia.ubik.rmi.naming.remote.EmbeddableJNDIServer;

import demo.T;

public class InsertJpa {
    public static void main(String[] args) throws Exception {
        Logger.getLogger("").setLevel(Level.OFF);
        //
        EmbeddableJNDIServer jndi = new EmbeddableJNDIServer("localhost", 9999);
        jndi.start(false);
        //
        Context ctx = new InitialContext();
        DataSource ds = Config.getDataSource();
        ctx.bind("ssds", ds);
        //
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("test");
        EntityManager em = emf.createEntityManager();
        int n = 0;
        for(int i = 101; i <= 200; i++) {
            T o = new T(i,  "Item #" + i);
            em.getTransaction().begin();
            em.persist(o);
            em.getTransaction().commit();
            n++;
        }
        System.out.printf("%d inserted\n", n);
        em.close();
        //
        jndi.stop();
    }
}

Select plain JDBC:

package demo.sharding;

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.sql.DataSource;

public class SelectJdbc {
    public static void main(String[] args) throws SQLException, IOException {
        Logger.getLogger("").setLevel(Level.OFF);
        DataSource ds = Config.getDataSource();
        try(Connection con = ds.getConnection()) {
            try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t WHERE id = ?")) {
                int n = 0;
                for(int i = 0; i < 200; i++) {
                    sel.setInt(1, i + 1);
                    try(ResultSet rs = sel.executeQuery()) {
                        if(rs.next() ) {
                            int id = rs.getInt(1);
                            String val = rs.getString(2);
                            System.out.printf("%d %s\n", id, val);
                            n++;
                        }
                    }
                }
                System.out.printf("Total individual: %d\n", n);
            }
            try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t")) {
                try(ResultSet rs = sel.executeQuery()) {
                    int n = 0;
                    while(rs.next() ) {
                        int id = rs.getInt(1);
                        String val = rs.getString(2);
                        System.out.printf("%d %s\n", id, val);
                        n++;
                    }
                    System.out.printf("Total unordered: %d\n", n);
                }
            }
            try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t WHERE id BETWEEN ? AND ?")) {
                sel.setInt(1, 10);
                sel.setInt(2, 20);
                try(ResultSet rs = sel.executeQuery()) {
                    int n = 0;
                    while(rs.next() ) {
                        int id = rs.getInt(1);
                        String val = rs.getString(2);
                        System.out.printf("%d %s\n", id, val);
                        n++;
                    }
                    System.out.printf("Total interval: %d\n", n);
                }
            }
            try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t ORDER BY id")) {
                try(ResultSet rs = sel.executeQuery()) {
                    int n = 0;
                    while(rs.next() ) {
                        int id = rs.getInt(1);
                        String val = rs.getString(2);
                        System.out.printf("%d %s\n", id, val);
                        n++;
                    }
                    System.out.printf("Total ordered: %d\n", n);
                }
            }
        }
    }
}

Select JPA:

package demo.sharding;

import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.sql.DataSource;

import org.sapia.ubik.rmi.naming.remote.EmbeddableJNDIServer;

import demo.T;

public class SelectJpa {
    public static void main(String[] args) throws Exception {
        Logger.getLogger("").setLevel(Level.OFF);
        //
        EmbeddableJNDIServer jndi = new EmbeddableJNDIServer("localhost", 9999);
        jndi.start(false);
        //
        Context ctx = new InitialContext();
        DataSource ds = Config.getDataSource();
        ctx.bind("ssds", ds);
        //
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("test");
        EntityManager em = emf.createEntityManager();
        int n = 0;
        for(int i = 0; i < 200; i++) {
            T o = em.find(T.class, i + 1);
            if(o != null) {
                System.out.printf("%d %s\n", o.getId(), o.getVal());
                n++;
            }
        }
        System.out.printf("Total individual: %d\n", n);
        List<T> lst1 = em.createQuery("SELECT o FROM T AS o", T.class).getResultList();
        for(T o : lst1) {
            System.out.printf("%d %s\n", o.getId(), o.getVal());
        }
        System.out.printf("Total unordered: %d\n", lst1.size());
        List<T> lst2 = em.createQuery("SELECT o FROM T AS o WHERE o.id BETWEEN :start_id AND :end_id", T.class)
                         .setParameter("start_id", 10).setParameter("end_id", 20).getResultList();
        for(T o : lst2) {
            System.out.printf("%d %s\n", o.getId(), o.getVal());
        }
        System.out.printf("Total interval: %d\n", lst2.size());
        List<T> lst3 = em.createQuery("SELECT o FROM T AS o ORDER BY o.id", T.class).getResultList();
        for(T o : lst3) {
            System.out.printf("%d %s\n", o.getId(), o.getVal());
        }
        System.out.printf("Total ordered: %d\n", lst3.size());
        em.close();
        //
        jndi.stop();
    }
}

JPA uses the data class:

package demo;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name="t")
public class T {
    private int id;
    private String val;
    public T() {
        this(0, "");
    }
    public T(int id, String val) {
        this.id = id;
        this.val = val;
    }
    @Id
    @Column(name="id")
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    @Column(name="val")
    public String getVal() {
        return val;
    }
    public void setVal(String val) {
        this.val = val;
    }
}

and the persistence.xml:

<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
             version="2.0">
   <persistence-unit name="test">
      <provider>org.hibernate.ejb.HibernatePersistence</provider>
      <non-jta-data-source>ssds</non-jta-data-source>
      <class>demo.T</class>
      <exclude-unlisted-classes/>
      <properties>
        <property name="hibernate.dialect" value="org.hibernate.dialect.XxxxDialect"/>
      </properties>
   </persistence-unit>
</persistence>

The code:

        EmbeddableJNDIServer jndi = new EmbeddableJNDIServer("localhost", 9999);
        jndi.start(false);
        //
        Context ctx = new InitialContext();
        DataSource ds = Config.getDataSource();
        ctx.bind("ssds", ds);
        //
        ...
        //
        jndi.stop();

may look a bit unusual, but that is just a workaround to be able JPA with a DataSource in Java SE context. In Java EE context JNDI and DataSource's are done in the server configuration.

Evaluation:

Let us first note what works and what does not work:

	Insert	Select single	Select multiple unordered	Select multiple interval	Select multiple ordered
PostgreSQL	OK	OK	OK	OK	OK
MySQL	OK	OK	OK	OK	OK
MS SQLServer	OK	OK	OK	OK	OK
Oracle DB	OK	OK	OK	OK	Exception
IBM DB2	OK	OK	OK	OK	Exception

Note that the non-working functionality (NullPointerException for Oracle DB and IBM DB2 when trying to ORDER BY) may work in some setups - I only know that it did not work in my setup.

Overall I think this is a very interesting product. It is a way to add horizontal scalability to a only vertical scalable relational database with no or minimal changes to the code.

Article history:

Version	Date	Description
1.0	March 13th 2022	Initial version

Comments:

Please send comments to Arne Vajhøj