9. Appendix: creating server mdtmconfig.json

mdtmconfig.json configures an mdtmFTP server’s parameters. It is used by mdtmFTP server versions >= 1.1.1. The configuration file should be located in the mdtmFTP server’s working directory.

9.1. Topology section

The syntax is defined as:

"topology": [
     {
      "type" : Device_Type,
      "name" : Device_Name,
      "numa" : Numa_ID
     },
     ...
 ]

Device_Type refers to MDTM device type. MDTM defines three types of devices: network, block, and virtual.

  • Network refers to a network I/O device.

  • Block refers to a storage/disk I/O device.

  • Virtual refers to a virtual device, which is defined particularly for mdtmFTP server.

Device_Name specifies a device name.

Numa_ID sets which NUMA node a device belongs to (i.e., its NUMA location).

MDTM middleware is typically able to detect physical I/O devices and their locations (i.e., which NUMA node an I/O device belongs to) on a NUMA system. However, there are two cases in which MDTM middleware cannot detect physical I/O devices or their locations correctly:

  1. In a fully virtualized environment, where information on physical I/O devices is not exposed to the guest OS.

  2. Some vendors’ I/O devices may not comply with OS conventions for exposing device information properly.

Under these conditions, the system administrator should manually configure I/O devices and their NUMA locations.

The virtual device is defined specifically for mdtmFTP server to monitor data transfer status. mdtmFTP server spawns a dedicated management thread to collect and record data transfer statistics. The management thread is associated with the virtual device, which will be pinned to the NUMA node specified in the topology section.
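For example, on a system where automatic detection fails, a topology section that manually places a NIC on NUMA node 0, a data disk on NUMA node 1, and the mdtmFTP virtual device on NUMA node 0 might look like the sketch below. The device names eth1, sdb, and vdev are illustrative assumptions; use the names reported on your system.

"topology": [
     {
      "type" : "network",
      "name" : "eth1",
      "numa" : "0"
     },
     {
      "type" : "block",
      "name" : "sdb",
      "numa" : "1"
     },
     {
      "type" : "virtual",
      "name" : "vdev",
      "numa" : "0"
     }
 ]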

9.2. Online section

The syntax is defined as:

"online": [
            Device_Name1,
            Device_Name2,
            ...
          ]

This section specifies the I/O devices that are assigned for data transfer.

For example, assume a DTN has the following I/O devices:

  • Ethernet NIC devices

    • eth0 – configured for management access

    • eth1 – configured for WAN data transfer

  • Block I/O devices

    • /dev/sda – system disk

    • /dev/sdb – data repository for WAN data transfer

In this case, the online section would be defined as:

"online": [
            "eth1",
            "sdb"
          ]

  • For network I/O devices, a user can run ifconfig to list network I/O devices available on the system.

  • For storage/disk I/O devices, a user can run lsblk to list the storage/disk I/O devices available on the system, and then run df to find out which storage/disk I/O device the data transfer folder is located on.

    Assuming that a DTN system’s lsblk output is:

    $ lsblk
    NAME                          MAJ:MIN  RM  SIZE RO  TYPE MOUNTPOINT
    sda                             8:0     0  1.8T  0  disk
    ├─sda1                          8:1     0  500M  0  part /boot
    └─sda2                          8:2     0  1.8T  0  part
      ├─scientific_bde1-root        253:0   0  50G   0  lvm  /
      ├─scientific_bde1-swap        253:1   0  4G    0  lvm  [SWAP]
      └─scientific_bde1-home        253:2   0  1.8T  0  lvm  /home
    loop0                           7:0     0  100G  0  loop
    └─docker-253:0-203522131-pool   253:3   0  100G  0  dm
    loop1                           7:1     0  2G    0  loop
    └─docker-253:0-203522131-pool   253:3   0  100G  0  dm
    nvme0n1                         259:0   0  1.1T  0  disk /data1
    

    And df output is:

    $ df
    Filesystem                       1K-blocks  Used       Available  Use% Mounted on
    /dev/mapper/scientific_bde1-root 52403200   15999428   36403772   31%  /
    devtmpfs                         65855232   0          65855232   0%   /dev
    /dev/nvme0n1                     1153584388 104952744  990009612  10%  /data1
    /dev/mapper/scientific_bde1-home 1895386900 23602284   1871784616 2%   /home
    /dev/sda1                        508588     376264     132324     74%  /boot
    

If /data1 is used as the data transfer folder, the corresponding storage/disk I/O device is nvme0n1.
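Putting this together, and assuming the WAN-facing NIC on this DTN is named enp4s0f0 (an illustrative name; check the ifconfig output), the online section could read:

"online": [
            "enp4s0f0",
            "nvme0n1"
          ]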

9.3. Thread section

The syntax is defined as:

"threads": [
               {
                    "type" : "Device_Type",
                    "name" : "Device_Name",
                    "threads" : Num
               },
               ...
 ]

This section defines the number of threads to be allocated for each I/O device. The number of threads allocated for an I/O device should be proportional to the device’s I/O bandwidth. The rule of thumb is that one thread can handle an I/O rate of 10 Gbps. For example, four threads should be allocated for a 40GE NIC, while one thread is sufficient for a 10GE NIC.

Default_Num sets the default number of threads allocated for each I/O device.

If a different number of threads should be allocated for a particular I/O device, a separate entry for the device should be specified here.

A virtual device should be allocated one thread.
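For example, following the 10 Gbps-per-thread rule of thumb, a 40GE NIC would get four threads and an NVMe drive capable of roughly 20 Gbps would get two, with one thread for the virtual device. The device names eth1, nvme0n1, and vdev below are illustrative assumptions.

"threads": [
               {
                    "type" : "network",
                    "name" : "eth1",
                    "threads" : 4
               },
               {
                    "type" : "block",
                    "name" : "nvme0n1",
                    "threads" : 2
               },
               {
                    "type" : "virtual",
                    "name" : "vdev",
                    "threads" : 1
               }
 ]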

9.4. File section

The syntax is defined as:

"filesegment": File_Size_Threshold

MDTM splits a large file into segments, which are spread across different threads for disk and network operations to increase performance.

File_Size_Threshold sets a file size threshold. A file with a size that exceeds the threshold will be split into multiple segments, which are spread across I/O threads to be transferred in parallel.
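For example, with the setting below, a 10 GB file would be split into segments of roughly 2 GB each that different I/O threads can read and transmit in parallel, while files of 2 GB or less are transferred as a single segment:

"filesegment": "2G"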

9.5. Manually_configured_cpus section

The syntax is defined as:

"manually_configured_cpus" : {
         "storage" : [CPU_index,...],
         "network" : [CPU_index,...]
     }

This section allows users to manually specify core(s) for mdtmFTP I/O threads. It is optional. In some cases, experienced users may want to manually configure cores for mdtmFTP I/O threads to achieve optimum performance.

  • If manually_configured_cpus is not configured, mdtmFTP calls MDTM middleware scheduling service to schedule cores for its threads. For each I/O thread, MDTM middleware first selects a core near the I/O device (e.g., NIC or disk) the thread uses, and then pins the thread to the chosen core.

  • If manually_configured_cpus is configured, mdtmFTP will bypass its normal core scheduling mechanisms. Instead, it assigns and binds its I/O threads to the cores specified in manually_configured_cpus one by one.
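For example, with the sketch below the storage I/O threads are bound one by one to cores 0-3 and the network I/O threads to cores 4-7. The core indices are illustrative; when choosing cores manually, it is reasonable to pick cores on the NUMA node closest to the corresponding device, mirroring what the automatic scheduler does.

"manually_configured_cpus" : {
         "storage" : [0, 1, 2, 3],
         "network" : [4, 5, 6, 7]
     }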

9.6. Server section

The syntax is defined as:

blocksize BLOCK_SIZE
direct DIRECTIO_FLAG
splice SPLICE_FLAG
monitor MONITOR_FLAG

  • blocksize sets the block size for disk I/O operations. The block size should be 4K or a multiple of 4K (e.g., 4M).

  • direct enables direct I/O. When direct I/O is enabled, file reads and writes go directly from mdtmFTP to the storage device(s), bypassing the OS read/write caches. For bulk data transfer, enabling direct I/O can improve performance.

  • splice enables zero-copy by using the Linux splice mechanism.

  • monitor enables MDTM monitoring.

Note

splice is an experimental feature that may not function well in some systems. You can turn this feature off by setting splice to 0.
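In mdtmconfig.json, these directives appear as strings in the server array (see the examples in Section 9.7). A minimal sketch is shown below; the monitor entry is an assumption based on the syntax above, since the Section 9.7 examples set only blocksize, direct, and splice.

"server" : [
    "blocksize 4194304",
    "direct 0",
    "splice 0",
    "monitor 1"
]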

9.7. Example

  • A sample mdtmconfig.json without manually_configured_cpus:

{
     "topology": [
        {
            "type" : "block",
            "name" : "nvme0n1",
            "numa" : "0"
        }
     ],
     "online": [
         "enp4s0f0",
         "nvme0n1"
     ],
     "threads": [
         {
              "type" : "network",
              "name" : "enp4s0f0",
              "threads" : 2
         },
         {
              "type" : "block",
              "name" : "nvme0n1",
              "threads" : 2
         }
     ],
     "filesegment": "2G",
     "server" : [
         "blocksize 4194304",
         "direct 0",
         "splice 0"
     ]
}
  • A sample mdtmconfig.json with manually_configured_cpus:

{
     "topology": [
        {
            "type" : "block",
            "name" : "nvme0n1",
            "numa" : "0"
        }
     ],
     "online": [
         "enp4s0f0",
         "nvme0n1"
     ],
     "threads": [
         {
              "type" : "network",
              "name" : "enp4s0f0",
              "threads" : 2
         },
         {
              "type" : "block",
              "name" : "nvme0n1",
              "threads" : 2
         }
     ],
     "filesegment": "2G",
     "manually_configured_cpus" : {
         "storage" : [0, 1, 2, 3],
         "network" : [4, 5, 6, 7]
     },
     "server" : [
         "blocksize 4194304",
         "direct 0",
         "splice 0"
     ]
}

In general, you need not configure manually_configured_cpus.