Teradata-Gimel-API-Benchmarking
Small Dataset- Yelp dataset
Gimel Teradata Write API is tested for Yelp Tip Dataset. The API is tested with varying number of records written into Teradata from 10000 to 5M. The average performnace is measured by taking average of 3 iterations.(Itr=Iteration)
Following parameters are varied while benchmarking:
- BATCHSIZE (kept constant for this case, 10000)
- Number of executors
- Number of executor cores
- Write type(BATCH/FASTLOAD)
Batchsize=10000 | Executor Cores=1 | Executors=5
Batchsize=10000 | Executor Cores=1 | Executors=10
Batchsize=10000 | Executor Cores=1 | Executors=15
Batchsize=10000 | Executor Cores=4 | Executors=10
FASTLOAD | Executor Cores=1 | Executors=1
Average case performance for all of the above settings (Records vs Time-in-seconds)
(NOTE: BATCHSIZE | NUM_CORES | EXECUTORS
)
Records | 10000-1-5 | 10000-1-10 | 10000-1-15 | 10000-4-10 | FASTLOAD-4-10 |
---|---|---|---|---|---|
10000 | 4.22 | 6.37 | 4.10 | 5.27 | 17.39 |
20000 | 2.96 | 3.19 | 2.99 | 3.05 | 14.64 |
30000 | 3.16 | 3.38 | 4.14 | 2.92 | 15.12 |
40000 | 3.92 | 4.47 | 4.00 | 5.18 | 14.80 |
50000 | 4.58 | 4.39 | 4.62 | 4.50 | 14.79 |
60000 | 5.29 | 4.88 | 5.28 | 4.73 | 14.99 |
70000 | 6.28 | 6.80 | 6.10 | 5.32 | 14.91 |
80000 | 6.55 | 7.16 | 7.30 | 7.19 | 15.50 |
90000 | 7.38 | 7.76 | 7.45 | 7.42 | 15.13 |
100000 | 9.23 | 7.93 | 8.80 | 7.03 | 15.40 |
200000 | 15.57 | 13.88 | 15.84 | 14.00 | 16.58 |
300000 | 23.43 | 21.85 | 23.25 | 17.70 | 18.23 |
400000 | 30.25 | 28.20 | 28.83 | 17.92 | 18.97 |
500000 | 22.73 | 36.20 | 35.87 | 18.59 | 19.95 |
600000 | 24.45 | 41.97 | 41.78 | 18.92 | 21.10 |
700000 | 25.05 | 50.67 | 44.62 | 19.69 | 22.09 |
800000 | 24.67 | 54.64 | 41.59 | 25.33 | 23.08 |
900000 | 30.15 | 54.75 | 38.25 | 27.42 | 24.09 |
1000000 | 29.94 | 29.45 | 41.73 | 29.67 | 25.71 |
2000000 | 43.18 | 27.39 | 29.54 | 58.39 | 35.48 |
3000000 | 67.65 | 35.90 | 29.92 | 83.46 | 43.73 |
4000000 | 85.72 | 46.26 | 37.14 | 114.20 | 57.68 |
5000000 | 102.13 | 52.30 | 43.45 | 131.12 | 63.70 |
Inference
From the plots, there are two sharp points for Batch writes to Teradata i.e. Records size=10000 & Records size=1M. After 1M, as the increasing number of executors makes write into Teradata faster.
Write into Teradata with FASTLOAD has sharp point at 1M, till then time required for write remains almost steady.
It is clearly observed that as we increase the parallelism i.e. increase in the number of executors to write into Teradata in batch mode, performace gets better for more records.
Big Dataset
We tested Gimel Teradata Read/Write API on big dataset/table. Below are some details about data and benchmarking.
About table
This table stores payment metrics and attributes. It has one record per payment transaction. It has 212 columns with various data types. The table has primary key with bigint type.
Size of data
Records in Millions | Size of Data in Gb |
---|---|
1M | 0.987g |
5M | 4.6g |
10M | 9.2g |
25M | 21.5g |
45M | 44.4g |
75M | 74.56g |