Apache ShardingSphere is an open source product originally called Sharding-JDBC developed in China in 2016.
It was renamed to its current name and moved to Apache in 2018.
Originally it only did sharding. Today it does both sharding and proxy.
The sharding module does transparent sharding. For brief intro to sharding see here.
The proxy module exposes either a PostgreSQL or MySQL API and proxies to any database.
This article will cover the sharding module.
Sharding provides increased scalability for databases. In many cases N sharded database servers can support N times as high volume as 1 database server.
The problem with sharding is that often it requires huge changes to the application code (putting the sharding logic into the application itself).
ShardingsSphere provide transparent sharding - the sharding logic is put in a JDBC driver and Java applications can use that driver like any other JDBC driver.
Simple ShardingSphere JDBC sharding works like:
Because ShardingSphere works at the JDBC level then it supports both plain JDBC and all ORM frameworks build on top of JDBC, which practically means all Java based relational database access.
ShardingSphere has specific support for MySQL, PostgreSQL, Oracle DB and MS SQLServer. But in theory it should work with any database having a JDBC driver and support for SQL92.
But see my results at the end.
The key to ShardingSphere is the configuration.
ShardingSpehere can be configured different ways:
This article will use YAML config files.
PostgreSQL:
dataSources:
ds1:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: org.postgresql.Driver
url: jdbc:postgresql://localhost/Test?currentSchema=ds1
username: postgres
password: hemmeligt
ds2:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: org.postgresql.Driver
url: jdbc:postgresql://localhost/Test?currentSchema=ds2
username: postgres
password: hemmeligt
rules:
- !SHARDING
tables:
t:
actualDataNodes: ds${1..2}.t
databaseStrategy:
standard:
shardingColumn: id
shardingAlgorithmName: t_id_inline
shardingAlgorithms:
t_id_inline:
type: INLINE
props:
algorithm-expression: ds${1 + id % 2}
allow-range-query-with-inline-sharding: true
props:
sql-show: false
Note that for PostgreSQL demo I am using two schemas on same database server. In a real life scenario it would ofcourse be two database servers.
MySQL:
dataSources:
ds1:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://localhost/ds1
username: root
password:
ds2:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://localhost/ds2
username: root
password:
rules:
- !SHARDING
tables:
t:
actualDataNodes: ds${1..2}.t
databaseStrategy:
standard:
shardingColumn: id
shardingAlgorithmName: t_id_inline
shardingAlgorithms:
t_id_inline:
type: INLINE
props:
algorithm-expression: ds${1 + id % 2}
allow-range-query-with-inline-sharding: true
props:
sql-show: false
Note that for MySQL demo I am using two databases on same database server. In a real life scenario it would ofcourse be two database servers.
MS SQLServer:
dataSources:
ds1:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: com.microsoft.sqlserver.jdbc.SQLServerDriver
url: jdbc:sqlserver://localhost;database=DS1;integratedSecurity=true;
username: sa
password: hemmeligt
ds2:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: com.microsoft.sqlserver.jdbc.SQLServerDriver
url: jdbc:sqlserver://localhost;database=DS2;integratedSecurity=true;
username: sa
password: hemmeligt
rules:
- !SHARDING
tables:
t:
actualDataNodes: ds${1..2}.t
databaseStrategy:
standard:
shardingColumn: id
shardingAlgorithmName: t_id_inline
shardingAlgorithms:
t_id_inline:
type: INLINE
props:
algorithm-expression: ds${1 + id % 2}
allow-range-query-with-inline-sharding: true
props:
sql-show: false
Note that for MS SQLServer demo I am using two databases on same database server. In a real life scenario it would ofcourse be two database servers.
Oracle DB:
dataSources:
ds1:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: oracle.jdbc.OracleDriver
url: jdbc:oracle:thin:@localhost:1521:xe
username: ds1
password: hemmeligt
ds2:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: oracle.jdbc.OracleDriver
url: jdbc:oracle:thin:@localhost:1521:xe
username: ds2
password: hemmeligt
rules:
- !SHARDING
tables:
t:
actualDataNodes: ds${1..2}.t
databaseStrategy:
standard:
shardingColumn: id
shardingAlgorithmName: t_id_inline
shardingAlgorithms:
t_id_inline:
type: INLINE
props:
algorithm-expression: ds${1 + id % 2}
allow-range-query-with-inline-sharding: true
props:
sql-show: false
Note that for Oracle DB demo I am using two schemas on same database server. In a real life scenario it would ofcourse be two database servers.
IBM DB2:
dataSources:
ds1:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: com.ibm.db2.jcc.DB2Driver
url: jdbc:db2://localhost:50000/Test:currentSchema=DS1;
username: arne
password: hemmeligt
ds2:
dataSourceClassName: org.apache.commons.dbcp2.BasicDataSource
driverClassName: com.ibm.db2.jcc.DB2Driver
url: jdbc:db2://localhost:50000/Test:currentSchema=DS2;
username: arne
password: hemmeligt
rules:
- !SHARDING
tables:
t:
actualDataNodes: ds${1..2}.t
databaseStrategy:
standard:
shardingColumn: id
shardingAlgorithmName: t_id_inline
shardingAlgorithms:
t_id_inline:
type: INLINE
props:
algorithm-expression: ds${1 + id % 2}
allow-range-query-with-inline-sharding: true
props:
sql-show: false
Note that for IBM DB2 demo I am using two schemas on same database server. In a real life scenario it would ofcourse be two database servers.
The algorithm to split rows on two databases/schemas ds1 and ds2 are very simple as rows go to ds${1 + id % 2}.
ShardingSphere is transparent, so all database access is really just standard Java database access as described here.
Insert plain JDBC:
package demo.sharding;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.sql.DataSource;
public class InsertJdbc {
public static void main(String[] args) throws SQLException, IOException {
Logger.getLogger("").setLevel(Level.OFF);
DataSource ds = Config.getDataSource();
try(Connection con = ds.getConnection()) {
try(PreparedStatement ins = con.prepareStatement("INSERT INTO t(id,val) VALUES(?,?)")) {
int n = 0;
for(int i = 1; i <= 100; i++) {
ins.setInt(1, i);
ins.setString(2, "Item #" + i);
n += ins.executeUpdate();
}
System.out.printf("%d inserted\n", n);
}
}
}
}
Insert JPA:
package demo.sharding;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.sql.DataSource;
import org.sapia.ubik.rmi.naming.remote.EmbeddableJNDIServer;
import demo.T;
public class InsertJpa {
public static void main(String[] args) throws Exception {
Logger.getLogger("").setLevel(Level.OFF);
//
EmbeddableJNDIServer jndi = new EmbeddableJNDIServer("localhost", 9999);
jndi.start(false);
//
Context ctx = new InitialContext();
DataSource ds = Config.getDataSource();
ctx.bind("ssds", ds);
//
EntityManagerFactory emf = Persistence.createEntityManagerFactory("test");
EntityManager em = emf.createEntityManager();
int n = 0;
for(int i = 101; i <= 200; i++) {
T o = new T(i, "Item #" + i);
em.getTransaction().begin();
em.persist(o);
em.getTransaction().commit();
n++;
}
System.out.printf("%d inserted\n", n);
em.close();
//
jndi.stop();
}
}
Select plain JDBC:
package demo.sharding;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.sql.DataSource;
public class SelectJdbc {
public static void main(String[] args) throws SQLException, IOException {
Logger.getLogger("").setLevel(Level.OFF);
DataSource ds = Config.getDataSource();
try(Connection con = ds.getConnection()) {
try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t WHERE id = ?")) {
int n = 0;
for(int i = 0; i < 200; i++) {
sel.setInt(1, i + 1);
try(ResultSet rs = sel.executeQuery()) {
if(rs.next() ) {
int id = rs.getInt(1);
String val = rs.getString(2);
System.out.printf("%d %s\n", id, val);
n++;
}
}
}
System.out.printf("Total individual: %d\n", n);
}
try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t")) {
try(ResultSet rs = sel.executeQuery()) {
int n = 0;
while(rs.next() ) {
int id = rs.getInt(1);
String val = rs.getString(2);
System.out.printf("%d %s\n", id, val);
n++;
}
System.out.printf("Total unordered: %d\n", n);
}
}
try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t WHERE id BETWEEN ? AND ?")) {
sel.setInt(1, 10);
sel.setInt(2, 20);
try(ResultSet rs = sel.executeQuery()) {
int n = 0;
while(rs.next() ) {
int id = rs.getInt(1);
String val = rs.getString(2);
System.out.printf("%d %s\n", id, val);
n++;
}
System.out.printf("Total interval: %d\n", n);
}
}
try(PreparedStatement sel = con.prepareStatement("SELECT id,val FROM t ORDER BY id")) {
try(ResultSet rs = sel.executeQuery()) {
int n = 0;
while(rs.next() ) {
int id = rs.getInt(1);
String val = rs.getString(2);
System.out.printf("%d %s\n", id, val);
n++;
}
System.out.printf("Total ordered: %d\n", n);
}
}
}
}
}
Select JPA:
package demo.sharding;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.sql.DataSource;
import org.sapia.ubik.rmi.naming.remote.EmbeddableJNDIServer;
import demo.T;
public class SelectJpa {
public static void main(String[] args) throws Exception {
Logger.getLogger("").setLevel(Level.OFF);
//
EmbeddableJNDIServer jndi = new EmbeddableJNDIServer("localhost", 9999);
jndi.start(false);
//
Context ctx = new InitialContext();
DataSource ds = Config.getDataSource();
ctx.bind("ssds", ds);
//
EntityManagerFactory emf = Persistence.createEntityManagerFactory("test");
EntityManager em = emf.createEntityManager();
int n = 0;
for(int i = 0; i < 200; i++) {
T o = em.find(T.class, i + 1);
if(o != null) {
System.out.printf("%d %s\n", o.getId(), o.getVal());
n++;
}
}
System.out.printf("Total individual: %d\n", n);
List<T> lst1 = em.createQuery("SELECT o FROM T AS o", T.class).getResultList();
for(T o : lst1) {
System.out.printf("%d %s\n", o.getId(), o.getVal());
}
System.out.printf("Total unordered: %d\n", lst1.size());
List<T> lst2 = em.createQuery("SELECT o FROM T AS o WHERE o.id BETWEEN :start_id AND :end_id", T.class)
.setParameter("start_id", 10).setParameter("end_id", 20).getResultList();
for(T o : lst2) {
System.out.printf("%d %s\n", o.getId(), o.getVal());
}
System.out.printf("Total interval: %d\n", lst2.size());
List<T> lst3 = em.createQuery("SELECT o FROM T AS o ORDER BY o.id", T.class).getResultList();
for(T o : lst3) {
System.out.printf("%d %s\n", o.getId(), o.getVal());
}
System.out.printf("Total ordered: %d\n", lst3.size());
em.close();
//
jndi.stop();
}
}
JPA uses the data class:
package demo;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
@Entity
@Table(name="t")
public class T {
private int id;
private String val;
public T() {
this(0, "");
}
public T(int id, String val) {
this.id = id;
this.val = val;
}
@Id
@Column(name="id")
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
@Column(name="val")
public String getVal() {
return val;
}
public void setVal(String val) {
this.val = val;
}
}
and the persistence.xml:
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
version="2.0">
<persistence-unit name="test">
<provider>org.hibernate.ejb.HibernatePersistence</provider>
<non-jta-data-source>ssds</non-jta-data-source>
<class>demo.T</class>
<exclude-unlisted-classes/>
<properties>
<property name="hibernate.dialect" value="org.hibernate.dialect.XxxxDialect"/>
</properties>
</persistence-unit>
</persistence>
The code:
EmbeddableJNDIServer jndi = new EmbeddableJNDIServer("localhost", 9999);
jndi.start(false);
//
Context ctx = new InitialContext();
DataSource ds = Config.getDataSource();
ctx.bind("ssds", ds);
//
...
//
jndi.stop();
may look a bit unusual, but that is just a workaround to be able JPA with a DataSource in Java SE context. In Java EE context JNDI and DataSource's are done in the server configuration.
Let us first note what works and what does not work:
Insert | Select single | Select multiple unordered | Select multiple interval | Select multiple ordered | |
PostgreSQL | OK | OK | OK | OK | OK |
MySQL | OK | OK | OK | OK | OK |
MS SQLServer | OK | OK | OK | OK | OK |
Oracle DB | OK | OK | OK | OK | Exception |
IBM DB2 | OK | OK | OK | OK | Exception |
Note that the non-working functionality (NullPointerException for Oracle DB and IBM DB2 when trying to ORDER BY) may work in some setups - I only know that it did not work in my setup.
Overall I think this is a very interesting product. It is a way to add horizontal scalability to a only vertical scalable relational database with no or minimal changes to the code.
Version | Date | Description |
---|---|---|
1.0 | March 13th 2022 | Initial version |
See list of all articles here
Please send comments to Arne Vajhøj