9. Appendix: creating server mdtmconfig.json¶
mdtmconfig.json configures an mdtmFTP server's parameters. It is used by mdtmFTP server versions >= 1.1.1. The configuration file should be placed in the mdtmFTP server's working directory.
9.1. Topology section¶
The syntax is defined as:
"topology": [
{
"type" : Device_Type,
"name" : Device_Name,
"numa" : Numa_ID
},
...
]
Device_Type
refers to the MDTM device type. MDTM defines three types of devices: network, block, and virtual.
Network refers to a network I/O device.
Block refers to a storage/disk I/O device.
Virtual refers to a virtual device, which is defined specifically for mdtmFTP server.
Numa_ID
sets which NUMA node a device belongs to (i.e., its NUMA location).
Device_Name
specifies a device name.
MDTM middleware can typically detect physical I/O devices and their locations (i.e., which NUMA node an I/O device belongs to) on a NUMA system. However, there are two cases in which MDTM middleware cannot detect physical I/O devices or their locations correctly:
In a fully virtualized environment, where information on physical I/O devices is not exposed to the guest OS.
When a vendor's I/O devices do not comply with OS rules for exposing device information properly.
Under these conditions, the system admin should manually configure I/O devices and their NUMA locations.
A virtual device is defined specifically for mdtmFTP server to monitor data transfer status. mdtmFTP server spawns a dedicated management thread to collect and record data transfer statistics. The management thread is associated with a virtual device, which will be pinned to a specified NUMA node.
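For illustration, a topology section that manually places a WAN-facing NIC, an NVMe disk, and the virtual device on NUMA node 0 might look as follows (a sketch only; the device names and NUMA assignments are illustrative, and the virtual device name vdev0 is a hypothetical placeholder):
"topology": [
    {
        "type" : "network",
        "name" : "enp4s0f0",
        "numa" : "0"
    },
    {
        "type" : "block",
        "name" : "nvme0n1",
        "numa" : "0"
    },
    {
        "type" : "virtual",
        "name" : "vdev0",
        "numa" : "0"
    }
]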
9.2. Online section¶
The syntax is defined as:
"online": [
Device_Name1,
Device_Name2,
...
]
This section specifies the I/O devices that are assigned for data transfer.
For example, assume a DTN has the following I/O devices:
Ethernet NIC devices
eth0 – configured for management access
eth1 – configured for WAN data transfer
Block I/O devices
/dev/sda – system disk
/dev/sdb – data repository for WAN data transfer
In this case, the online section would be defined as:
"online": [
    "eth1",
    "sdb"
]
For network I/O devices, a user can run ifconfig to list the network I/O devices available on the system. For storage/disk I/O devices, a user can run lsblk to list the storage/disk I/O devices available on the system, and then run df to find out which storage/disk I/O device the data transfer folder is located on.
Assume that a DTN system's lsblk output is:
$ lsblk
NAME                            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                               8:0    0  1.8T  0 disk
├─sda1                            8:1    0  500M  0 part /boot
└─sda2                            8:2    0  1.8T  0 part
  ├─scientific_bde1-root        253:0    0   50G  0 lvm  /
  ├─scientific_bde1-swap        253:1    0    4G  0 lvm  [SWAP]
  └─scientific_bde1-home        253:2    0  1.8T  0 lvm  /home
loop0                             7:0    0  100G  0 loop
└─docker-253:0-203522131-pool   253:3    0  100G  0 dm
loop1                             7:1    0    2G  0 loop
└─docker-253:0-203522131-pool   253:3    0  100G  0 dm
nvme0n1                         259:0    0  1.1T  0 disk /data1
And df output is:
$ df
Filesystem                        1K-blocks      Used  Available Use% Mounted on
/dev/mapper/scientific_bde1-root   52403200  15999428   36403772  31% /
devtmpfs                           65855232         0   65855232   0% /dev
/dev/nvme0n1                     1153584388 104952744  990009612  10% /data1
/dev/mapper/scientific_bde1-home 1895386900  23602284 1871784616   2% /home
/dev/sda1                            508588    376264     132324  74% /boot
If /data1 is used as the data transfer folder, the corresponding storage/disk I/O device is nvme0n1.
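Accordingly, the online section for this DTN would list the WAN-facing NIC together with the NVMe device (the NIC name enp4s0f0 is an assumed example name, matching the sample in Section 9.7):
"online": [
    "enp4s0f0",
    "nvme0n1"
]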
9.3. Thread section¶
The syntax is defined as:
"threads": [
{
"type" : "Device_Type",
"name" : "Device_Name",
"threads" : Num
},
...
]
This section defines the number of threads to be allocated for an I/O device. The number of threads allocated for an I/O device should be proportional to the device's I/O bandwidth. The rule of thumb is that one thread can handle an I/O rate of 10Gbps. For example, four threads should be allocated for a 40GE NIC, while one thread should be allocated for a 10GE NIC.
Default_Num
sets the default number of threads allocated for each I/O device.
If a different number of threads should be allocated for a particular I/O device, a separate entry for the device should be specified here.
A virtual device should be allocated 1 thread.
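For example, a threads section for a DTN with a 40GE NIC and an NVMe data disk might look like the following (a sketch; the device names are illustrative, the virtual device name vdev0 is a hypothetical placeholder, and the thread counts follow the 10Gbps-per-thread rule of thumb):
"threads": [
    {
        "type" : "network",
        "name" : "enp4s0f0",
        "threads" : 4
    },
    {
        "type" : "block",
        "name" : "nvme0n1",
        "threads" : 2
    },
    {
        "type" : "virtual",
        "name" : "vdev0",
        "threads" : 1
    }
]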
9.4. File section¶
The syntax is defined as:
"filesegment": File_Size_Threshold
MDTM splits a large file into segments, which are spread to different threads for disk and network operations to increase performance.
File_Size_Threshold
sets a file size threshold. A file with a
size that exceeds the threshold will be split into multiple segments,
which are spread across I/O threads to be transferred in parallel.
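For example, with the setting below, a 10G file would be split into five 2G segments that are transferred in parallel by multiple I/O threads, while a file of 2G or smaller is transferred as a single segment (the threshold value is illustrative; it matches the sample in Section 9.7):
"filesegment": "2G"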
9.5. Manually_configured_cpus section¶
The syntax is defined as:
"manually_configured_cpus" : {
"storage" : [CPU_index,...],
"network" : [CPU_index,...]
}
This section allows users to manually specify core(s) for mdtmFTP I/O threads. It is optional. In some cases, experienced users may want to manually configure cores for mdtmFTP I/O threads to achieve optimum performance.
If manually_configured_cpus is not configured, mdtmFTP calls the MDTM middleware scheduling service to schedule cores for its threads. For each I/O thread, MDTM middleware first selects a core near the I/O device (e.g., NIC or disk) that the thread uses, and then pins the thread to the chosen core.
If manually_configured_cpus is configured, mdtmFTP bypasses its normal core scheduling mechanisms. Instead, it assigns and binds its I/O threads, one by one, to the cores specified in manually_configured_cpus.
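One way to pick cores that are local to an I/O device is to query sysfs for the device's NUMA node and then list that node's cores with lscpu. This is a hypothetical illustration: the interface name enp4s0f0 and the outputs shown are assumptions, not values from a real system.
$ cat /sys/class/net/enp4s0f0/device/numa_node
0
$ lscpu | grep "NUMA node0 CPU"
NUMA node0 CPU(s):   0-7
Cores from that node's list could then be placed in the network (or storage) array of manually_configured_cpus.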
9.6. Server section¶
The syntax is defined as:
"server" : [
    "blocksize BLOCK_SIZE",
    "direct DIRECTIO_FLAG",
    "splice SPLICE_FLAG",
    "monitor MONITOR_FLAG"
]
blocksize sets the block size for disk I/O operations. The block size should be 4K or a multiple of 4K (e.g., 4M).
direct enables direct I/O. When direct I/O is enabled, file reads and writes go directly from mdtmFTP to the storage device(s), bypassing the OS read/write caches. For bulk data transfer, enabling direct I/O can improve performance.
splice enables zero-copy by using the Linux splice mechanism.
monitor enables MDTM monitoring.
Note
splice is an experimental feature that may not function well on some systems. You can turn this feature off by setting splice to 0.
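For example, a server section that uses a 4M block size, enables direct I/O for bulk transfers, and turns splice and monitoring off might look like this (the flag values are illustrative; blocksize 4194304 matches the sample in Section 9.7):
"server" : [
    "blocksize 4194304",
    "direct 1",
    "splice 0",
    "monitor 0"
]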
9.7. Example¶
A sample mdtmconfig.json without manually_configured_cpus:
{
"topology": [
{
"type" : "block",
"name" : "nvme0n1",
"numa" : "0"
}
],
"online": [
"enp4s0f0",
"nvme0n1"
],
"threads": [
{
"type" : "network",
"name" : "enp4s0f0",
"threads" : 2
},
{
"type" : "block",
"name" : "nvme0n1",
"threads" : 2
}
],
"filesegment": "2G",
"server" : [
"blocksize 4194304",
"direct 0",
"splice 0"
]
}
A sample mdtmconfig.json with manually_configured_cpus:
{
"topology": [
{
"type" : "block",
"name" : "nvme0n1",
"numa" : "0"
}
],
"online": [
"enp4s0f0",
"nvme0n1"
],
"threads": [
{
"type" : "network",
"name" : "enp4s0f0",
"threads" : 2
},
{
"type" : "block",
"name" : "nvme0n1",
"threads" : 2
}
],
"filesegment": "2G",
"manually_configured_cpus" : {
"storage" : [0, 1, 2, 3],
"network" : [4, 5, 6, 7]
},
"server" : [
"blocksize 4194304",
"direct 0",
"splice 0"
]
}
In general, you do not need to configure manually_configured_cpus.