Threadripper 3990X: compiling 1 billion lines of C++ on 64 cores

RAD Studio consists of Delphi and C++Builder. The Object Pascal compiler in Delphi is a single-pass compiler, and the compiler itself is not parallel, but by compiling several projects in parallel it was able to compile 1 billion lines of Object Pascal code in 5 minutes on a machine with a 16-core AMD Ryzen 9 5950X. I wanted to find out whether something similar is possible for C++. This post is part of a series of articles exploring the significant performance gains that can be achieved on the fastest processors available at the beginning of 2021. How much is 1 billion lines of code? Take a look here.

Parallel compilation in C++Builder

C++Builder has several different compilers, including the classic Borland compiler and modern Clang-based compilers for multiple platforms. Embarcadero also sponsors the open-source Dev-C++ environment, which includes the TDM-GCC 9.2.0 compiler. TDM-GCC 9.2.0 ships with MAKE, which supports parallel compilation through the -j (jobs) command-line option. C++Builder has an add-on called TwineCompile that implements parallel compilation in C++Builder. Both C++Builder and Dev-C++ are built using Delphi.

As far as I understood from my research, TwineCompile provides more extensive functionality than MAKE jobs, because it supports background compilation and other performance-enhancing features. Support for additional features such as background compilation depends on the IDE: Dev-C++ does not support it, while C++Builder supports it through TwineCompile. Dev-C++ is a great native C++ IDE for Windows development, and C++Builder improves productivity with a visual designer, the powerful built-in VCL and RTL, and advanced parallel compilation features. They are also based on different C++ compilers, so this is not really a direct comparison; in fact, they complement each other.

Third-party benchmarks (not related to the project from the post) for 3990X with TwineCompile:

  • Machine parameters: AMD Ryzen Threadripper 3990X (2.9 GHz, 64 cores, 128 threads)
  • Configuration: IDE Compile.
  • Result without TwineCompile: 3:35:02
  • Result with TwineCompile: 0:05:44

Parallel compilation in Dev-C++

At the beginning of our quest, Dev-C++ did not support the MAKE -j flag, so this problem had to be solved first. I updated Dev-C++ and released a new version, v6.3, with built-in support for the -j flag for parallel compilation. It is now enabled by default for release builds, which should significantly reduce compilation time for Dev-C++ users. An update was necessary because the flag has to be added to the MAKE command line, not to the compiler’s command line. The implementation took several days, after which version v6.3 was released. Bundled with this release were all the bug fixes from the previous two months and a second new feature for selecting custom built-in console applications. Here are the release notes for Dev-C++ v6.3:

Version 6.3 – January 30, 2021

  • Added: By default, parallel compilation with MAKE Jobs is enabled for release builds.
  • Added: 3 buttons to configure custom command line tabs.
  • Updated: Code completion and menu for dark themes.
  • Updated: wrap-around editor tab selection by CTRL-TAB.
  • Fixed: issue with deleting the Make clean file.
  • Fixed: the status bar doesn’t display all the text.
  • Fixed: hex column issue in the Debug/CPU window.
  • Fixed: closing tabs in the editor in side-by-side mode.

After getting a Dev-C++ IDE that can compile 1 billion lines of C++ in parallel, I needed to get hold of the AMD Threadripper 3990X itself, with its 64 cores and 128 threads. The Threadripper has a lower PassMark score per core than the 5950X, but because it has more cores, its total score is higher. The screenshot below was taken in PassMark and shows a comparison of the two processors. As you can see, the single-core score of the 5950X is 3491, while that of the 3990X is 2553. However, the total multicore score of the 3990X is 80752, while the 5950X reaches only 46045.

Note: the video doesn’t mention the more powerful 64-core Threadripper 3990X used in this post.

There are cloud machines based on the AMD Threadripper 3990X with 256 GB of RAM that meet the requirements of our project. Two Windows configurations are offered: Windows Standard 2016 and Windows Standard 2019. I chose Windows 2016 and tried to install it on the machine, but the installation failed with every release I tried; this is probably related to how Microsoft licenses processors and cores in Windows Standard 2016. In the end, the OS was changed to Windows Standard 2019 and everything installed normally.

So, we have a working machine running Windows 2019 on the Threadripper, C++Builder with TwineCompile, and Dev-C++ v6.3 with built-in parallel compilation support. Everything is tested and works great. C++Builder was able to compile the 1 million lines of C++ from the previous post four times faster than on the 5950X, and Delphi was able to compile the 1 billion lines of Object Pascal projects 2.5 times faster. We’ll leave these two comparisons for later posts.

One of the tools used to measure CPU usage is MiTeC’s Task Manager DeLuxe (TMX). Task Manager DeLuxe provides a surprising amount of information about Windows. TMX has a dark mode (fashionable in 2021) and a light mode. TMX is made by MiTeC, which also produces a wide range of Delphi components that give access to much of the information shown in TMX. You can probably use most of the information from TMX in your own application through the MiTeC System Information Component Suite.

When I first ran Task Manager DeLuxe on the 64-core Threadripper 3990X machine, it couldn’t display graphs for individual CPUs and reported an error. I have a commercial license for Task Manager DeLuxe, so I sent an email to Michal from MiTeC, and he solved the problem quickly. He released a new version of Task Manager DeLuxe, which now runs perfectly on a 64-core machine.

The next task was to create the 1-billion-line C++ project itself so that it could be compiled. I started with the Scimark2 project for Dev-C++ and wrote a Delphi application to quickly generate the required number of lines of code. Ultimately, I wanted to run an application built from 1 billion lines of C++. The Delphi application takes the LU.c and LU.h files and duplicates the last function, LU_factor(), as many times as needed to reach the required number of lines. The function itself consists of 69 lines, and to avoid name collisions, the name of each generated function includes the file name and an iteration number.
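The generator in the post is a Delphi application whose source is not shown. As a rough illustration of the idea only, here is a minimal C++ sketch that copies a function body and renames it on each copy. The helper name duplicate_function and the name_stem_index naming scheme are assumptions made for this example, not the author's actual scheme.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns `count` copies of `body`, with every occurrence
// of `name` replaced by name_<stem>_<i> so the copies do not collide at link time.
// The replacement is purely textual, so `name` should be a distinctive identifier.
std::string duplicate_function(const std::string& body,
                               const std::string& name,
                               const std::string& stem,
                               int count) {
    std::string out;
    for (int i = 0; i < count; ++i) {
        const std::string unique = name + "_" + stem + "_" + std::to_string(i);
        std::string copy = body;
        // Resume the search after the inserted text, because `unique`
        // itself starts with `name` and would otherwise match again.
        for (std::size_t pos = copy.find(name); pos != std::string::npos;
             pos = copy.find(name, pos + unique.size())) {
            copy.replace(pos, name.size(), unique);
        }
        out += copy;
        out += '\n';
    }
    return out;
}
```

Calling duplicate_function(text_of_function, "LU_factor", "LU7", 3) would yield three copies named LU_factor_LU7_0 through LU_factor_LU7_2, which can then be appended to the generated source file.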

I tried several ways of slicing the C++ project: more files with fewer lines each, or fewer files with more lines each. In the Delphi project, I had 250 projects of 4 million lines each. For the C++ project, the first slicing used 32,000 files of 31,250 lines each. I arrived at this number after testing, because it seemed to me that Dev-C++ works better with small files, that a large number of files suits a large number of cores, and that many small files better mimic a real project. The second slicing is 10,666 files of 93,750 lines each. The third is 1,000 files of 1,000,000 lines of C++ each. The list of files is added to the Dev-C++ project file after the files are generated, which means the Dev-C++ environment has to load this list into its project list.
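To sanity-check the three slicings, the arithmetic can be written down in a few lines of C++. This is only a sketch: the Slicing struct and both function names are made up for the example, and the copy count ignores the lines already present in the original source files. Note that 10,666 files of 93,750 lines give 999,937,500 lines, slightly under 1 billion, while the other two slicings give exactly 1,000,000,000.

```cpp
#include <cstdint>

// One way of slicing the project: a number of files with equal line counts.
struct Slicing {
    std::int64_t files;
    std::int64_t lines_per_file;
};

// Total number of lines produced by a slicing.
constexpr std::int64_t total_lines(Slicing s) {
    return s.files * s.lines_per_file;
}

// Number of copies of a function of `function_lines` lines needed to reach
// at least `target_lines` lines (rounded up).
constexpr std::int64_t copies_needed(std::int64_t target_lines,
                                     std::int64_t function_lines) {
    return (target_lines + function_lines - 1) / function_lines;
}
```

With the 69-line function from the post, copies_needed(1000000000, 69) comes out at roughly 14.5 million copies in total.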

I found that the bottleneck in Dev-C++ is the code completion and symbol features. They parse the files in a project when it is opened, and suffice it to say that this process is not yet parallelized. Dev-C++ does load eventually, but it takes quite a long time to process 32 thousand files (or even 10,666). Once I had figured this out, I turned off code completion and symbol parsing, which allowed me to load the 1-billion-line C++ project quickly. Dev-C++ doesn’t seem to have any problems editing a 1-million-line file, and it feels pretty smooth.

The second problem I encountered is that Delphi’s System.CPUCount reports 32 threads instead of 128. Probably, when we wrote System.CPUCount, 32 cores seemed like plenty, but we passed that milestone long ago. On the 5950X, which has 16 cores and 32 threads, it works fine, but on the 3990X the value is wrong. I reported this issue on the Embarcadero Quality Portal; in the meantime, there is a third-party library, NumCPULib4Pascal, that should report the correct value. For now, I created my own build of the Dev-C++ executable with 128 threads hardcoded.
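For comparison, here is how a C++ program might ask for the logical processor count on a machine like this. Windows splits systems with more than 64 logical processors into processor groups, and some older APIs only report the processors of a single group, so the sketch below asks for the count across all groups. Whether processor groups are actually behind the System.CPUCount result is not something the post establishes; the function name logical_processor_count is made up for this example.

```cpp
#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0601  // GetActiveProcessorCount requires Windows 7 or later
#endif
#include <windows.h>
#endif
#include <thread>

// Returns the number of logical processors, or 1 if it cannot be determined.
unsigned logical_processor_count() {
#ifdef _WIN32
    // Counts active logical processors across all processor groups.
    const DWORD all_groups = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    if (all_groups != 0) {
        return static_cast<unsigned>(all_groups);
    }
#endif
    // Portable fallback; the standard allows this to return 0 when unknown.
    const unsigned hint = std::thread::hardware_concurrency();
    return hint != 0 ? hint : 1u;
}
```

A value obtained this way could be passed to MAKE as the job count instead of a hardcoded 128.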

We’re almost ready to start compiling 1 billion lines of code! We have the hardware, the IDEs, the compilers, and the projects (sliced in different ways). Along the way, I compiled several differently sized versions of the 1-billion-line C++ project to identify and fix the issues mentioned above.

Let’s start with the 1-billion-line project divided into 32,000 files of 31,250 lines each. This project compiles and, as it should, uses all the cores, but when it comes to linking 32,000 files into a single executable, it stalls. There is a command-line limit that does not allow 32,000 file names to be passed to the linker. The maximum length of a Windows command line is 32,767 characters (the Windows API stores the length in a USHORT). The second project, with 10,666 files of 93,750 lines each, also compiles but stalls for the same reason.
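A quick estimate shows how far over the limit these projects are. The sketch below assumes the object files are named LU0.o, LU1.o, and so on, as in the link command of the 1,000-file project; the naming for the other slicings is an assumption. Under that assumption the object list alone is 7,889 characters for 1,000 files, 95,549 for 10,666 files, and 308,889 for 32,000 files, before any other linker arguments are counted. GCC can also read arguments from a file passed as @file, so the last function builds the contents of such a response file; whether the Dev-C++ generated makefile can be made to use one was not tested in the post.

```cpp
#include <cstddef>
#include <string>

// Windows limit on a process command line, in characters.
constexpr std::size_t kMaxCommandLine = 32767;

// Length of "LU0.o LU1.o ... LU<n-1>.o" with single spaces between names.
std::size_t object_list_length(int file_count) {
    std::size_t total = 0;
    for (int i = 0; i < file_count; ++i) {
        total += 2 + std::to_string(i).size() + 2;  // "LU" + index + ".o"
        if (i + 1 < file_count) {
            total += 1;  // separating space
        }
    }
    return total;
}

// True if the object list alone stays under the limit.
bool fits_on_command_line(int file_count) {
    return object_list_length(file_count) < kMaxCommandLine;
}

// Contents of a response file listing one object file per line,
// intended for use as: g++ @objects.rsp -o Scimark2.exe
std::string response_file_contents(int file_count) {
    std::string out;
    for (int i = 0; i < file_count; ++i) {
        out += "LU" + std::to_string(i) + ".o\n";
    }
    return out;
}
```

The file name objects.rsp in the comment is only a placeholder.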

The third project, with 1,000 files of 1,000,000 lines each, compiles, but more slowly. It doesn’t use all 128 threads during compilation. Whether you pass -j64, -j128, or -j (automatic selection) to MAKE, only about 34 of the 64 cores actually do any work, even though 64 g++ processes are running. The process uses 81 GB of RAM, so it’s good that the machine has 256 GB. Although all the files compile, the linker then fails with an error when it tries to combine all the object files into an executable. So far, none of the tips found on Stack Overflow, which involve various command-line arguments, have resolved the problem.

g++.exe scimark2.o FFT.o LU.o MonteCarlo.o SOR.o SparseCompRow.o Stopwatch.o Random.o kernel.o array.o LU0.o LU1.o LU2.o LU3.o LU4.o LU5.o LU6.o LU7.o LU8.o LU9.o LU10.o LU11.o LU12.o LU13.o LU14.o LU15.o LU16.o LU17.o LU18.o LU19.o LU20.o LU21.o LU22.o LU23.o LU24.o LU25.o LU26.o LU27.o LU28.o LU29.o LU30.o LU31.o LU32.o LU33.o LU34.o LU35.o LU36.o LU37.o LU38.o LU39.o LU40.o LU41.o LU42.o LU43.o LU44.o LU45.o LU46.o LU47.o LU48.o LU49.o LU50.o LU51.o LU52.o LU53.o LU54.o LU55.o LU56.o LU57.o LU58.o LU59.o LU60.o LU61.o LU62.o LU63.o LU64.o LU65.o LU66.o LU67.o LU68.o LU69.o LU70.o LU71.o LU72.o LU73.o LU74.o LU75.o LU76.o LU77.o LU78.o LU79.o LU80.o LU81.o LU82.o LU83.o LU84.o LU85.o LU86.o LU87.o LU88.o LU89.o LU90.o LU91.o LU92.o LU93.o LU94.o LU95.o LU96.o LU97.o LU98.o LU99.o LU100.o LU101.o LU102.o LU103.o LU104.o LU105.o LU106.o LU107.o LU108.o LU109.o LU110.o LU111.o LU112.o LU113.o LU114.o LU115.o LU116.o LU117.o LU118.o LU119.o LU120.o LU121.o LU122.o LU123.o LU124.o LU125.o LU126.o LU127.o LU128.o LU129.o LU130.o LU131.o LU132.o LU133.o LU134.o LU135.o LU136.o LU137.o LU138.o LU139.o LU140.o LU141.o LU142.o LU143.o LU144.o LU145.o LU146.o LU147.o LU148.o LU149.o LU150.o LU151.o LU152.o LU153.o LU154.o LU155.o LU156.o LU157.o LU158.o LU159.o LU160.o LU161.o LU162.o LU163.o LU164.o LU165.o LU166.o LU167.o LU168.o LU169.o LU170.o LU171.o LU172.o LU173.o LU174.o LU175.o LU176.o LU177.o LU178.o LU179.o LU180.o LU181.o LU182.o LU183.o LU184.o LU185.o LU186.o LU187.o LU188.o LU189.o LU190.o LU191.o LU192.o LU193.o LU194.o LU195.o LU196.o LU197.o LU198.o LU199.o LU200.o LU201.o LU202.o LU203.o LU204.o LU205.o LU206.o LU207.o LU208.o LU209.o LU210.o LU211.o LU212.o LU213.o LU214.o LU215.o LU216.o LU217.o LU218.o LU219.o LU220.o LU221.o LU222.o LU223.o LU224.o LU225.o LU226.o LU227.o LU228.o LU229.o LU230.o LU231.o LU232.o LU233.o LU234.o LU235.o LU236.o LU237.o LU238.o LU239.o LU240.o LU241.o LU242.o LU243.o LU244.o LU245.o LU246.o LU247.o LU248.o LU249.o 
LU250.o LU251.o LU252.o LU253.o LU254.o LU255.o LU256.o LU257.o LU258.o LU259.o LU260.o LU261.o LU262.o LU263.o LU264.o LU265.o LU266.o LU267.o LU268.o LU269.o LU270.o LU271.o LU272.o LU273.o LU274.o LU275.o LU276.o LU277.o LU278.o LU279.o LU280.o LU281.o LU282.o LU283.o LU284.o LU285.o LU286.o LU287.o LU288.o LU289.o LU290.o LU291.o LU292.o LU293.o LU294.o LU295.o LU296.o LU297.o LU298.o LU299.o LU300.o LU301.o LU302.o LU303.o LU304.o LU305.o LU306.o LU307.o LU308.o LU309.o LU310.o LU311.o LU312.o LU313.o LU314.o LU315.o LU316.o LU317.o LU318.o LU319.o LU320.o LU321.o LU322.o LU323.o LU324.o LU325.o LU326.o LU327.o LU328.o LU329.o LU330.o LU331.o LU332.o LU333.o LU334.o LU335.o LU336.o LU337.o LU338.o LU339.o LU340.o LU341.o LU342.o LU343.o LU344.o LU345.o LU346.o LU347.o LU348.o LU349.o LU350.o LU351.o LU352.o LU353.o LU354.o LU355.o LU356.o LU357.o LU358.o LU359.o LU360.o LU361.o LU362.o LU363.o LU364.o LU365.o LU366.o LU367.o LU368.o LU369.o LU370.o LU371.o LU372.o LU373.o LU374.o LU375.o LU376.o LU377.o LU378.o LU379.o LU380.o LU381.o LU382.o LU383.o LU384.o LU385.o LU386.o LU387.o LU388.o LU389.o LU390.o LU391.o LU392.o LU393.o LU394.o LU395.o LU396.o LU397.o LU398.o LU399.o LU400.o LU401.o LU402.o LU403.o LU404.o LU405.o LU406.o LU407.o LU408.o LU409.o LU410.o LU411.o LU412.o LU413.o LU414.o LU415.o LU416.o LU417.o LU418.o LU419.o LU420.o LU421.o LU422.o LU423.o LU424.o LU425.o LU426.o LU427.o LU428.o LU429.o LU430.o LU431.o LU432.o LU433.o LU434.o LU435.o LU436.o LU437.o LU438.o LU439.o LU440.o LU441.o LU442.o LU443.o LU444.o LU445.o LU446.o LU447.o LU448.o LU449.o LU450.o LU451.o LU452.o LU453.o LU454.o LU455.o LU456.o LU457.o LU458.o LU459.o LU460.o LU461.o LU462.o LU463.o LU464.o LU465.o LU466.o LU467.o LU468.o LU469.o LU470.o LU471.o LU472.o LU473.o LU474.o LU475.o LU476.o LU477.o LU478.o LU479.o LU480.o LU481.o LU482.o LU483.o LU484.o LU485.o LU486.o LU487.o LU488.o LU489.o LU490.o LU491.o LU492.o LU493.o LU494.o LU495.o LU496.o LU497.o LU498.o LU499.o 
LU500.o LU501.o LU502.o LU503.o LU504.o LU505.o LU506.o LU507.o LU508.o LU509.o LU510.o LU511.o LU512.o LU513.o LU514.o LU515.o LU516.o LU517.o LU518.o LU519.o LU520.o LU521.o LU522.o LU523.o LU524.o LU525.o LU526.o LU527.o LU528.o LU529.o LU530.o LU531.o LU532.o LU533.o LU534.o LU535.o LU536.o LU537.o LU538.o LU539.o LU540.o LU541.o LU542.o LU543.o LU544.o LU545.o LU546.o LU547.o LU548.o LU549.o LU550.o LU551.o LU552.o LU553.o LU554.o LU555.o LU556.o LU557.o LU558.o LU559.o LU560.o LU561.o LU562.o LU563.o LU564.o LU565.o LU566.o LU567.o LU568.o LU569.o LU570.o LU571.o LU572.o LU573.o LU574.o LU575.o LU576.o LU577.o LU578.o LU579.o LU580.o LU581.o LU582.o LU583.o LU584.o LU585.o LU586.o LU587.o LU588.o LU589.o LU590.o LU591.o LU592.o LU593.o LU594.o LU595.o LU596.o LU597.o LU598.o LU599.o LU600.o LU601.o LU602.o LU603.o LU604.o LU605.o LU606.o LU607.o LU608.o LU609.o LU610.o LU611.o LU612.o LU613.o LU614.o LU615.o LU616.o LU617.o LU618.o LU619.o LU620.o LU621.o LU622.o LU623.o LU624.o LU625.o LU626.o LU627.o LU628.o LU629.o LU630.o LU631.o LU632.o LU633.o LU634.o LU635.o LU636.o LU637.o LU638.o LU639.o LU640.o LU641.o LU642.o LU643.o LU644.o LU645.o LU646.o LU647.o LU648.o LU649.o LU650.o LU651.o LU652.o LU653.o LU654.o LU655.o LU656.o LU657.o LU658.o LU659.o LU660.o LU661.o LU662.o LU663.o LU664.o LU665.o LU666.o LU667.o LU668.o LU669.o LU670.o LU671.o LU672.o LU673.o LU674.o LU675.o LU676.o LU677.o LU678.o LU679.o LU680.o LU681.o LU682.o LU683.o LU684.o LU685.o LU686.o LU687.o LU688.o LU689.o LU690.o LU691.o LU692.o LU693.o LU694.o LU695.o LU696.o LU697.o LU698.o LU699.o LU700.o LU701.o LU702.o LU703.o LU704.o LU705.o LU706.o LU707.o LU708.o LU709.o LU710.o LU711.o LU712.o LU713.o LU714.o LU715.o LU716.o LU717.o LU718.o LU719.o LU720.o LU721.o LU722.o LU723.o LU724.o LU725.o LU726.o LU727.o LU728.o LU729.o LU730.o LU731.o LU732.o LU733.o LU734.o LU735.o LU736.o LU737.o LU738.o LU739.o LU740.o LU741.o LU742.o LU743.o LU744.o LU745.o LU746.o LU747.o LU748.o LU749.o 
LU750.o LU751.o LU752.o LU753.o LU754.o LU755.o LU756.o LU757.o LU758.o LU759.o LU760.o LU761.o LU762.o LU763.o LU764.o LU765.o LU766.o LU767.o LU768.o LU769.o LU770.o LU771.o LU772.o LU773.o LU774.o LU775.o LU776.o LU777.o LU778.o LU779.o LU780.o LU781.o LU782.o LU783.o LU784.o LU785.o LU786.o LU787.o LU788.o LU789.o LU790.o LU791.o LU792.o LU793.o LU794.o LU795.o LU796.o LU797.o LU798.o LU799.o LU800.o LU801.o LU802.o LU803.o LU804.o LU805.o LU806.o LU807.o LU808.o LU809.o LU810.o LU811.o LU812.o LU813.o LU814.o LU815.o LU816.o LU817.o LU818.o LU819.o LU820.o LU821.o LU822.o LU823.o LU824.o LU825.o LU826.o LU827.o LU828.o LU829.o LU830.o LU831.o LU832.o LU833.o LU834.o LU835.o LU836.o LU837.o LU838.o LU839.o LU840.o LU841.o LU842.o LU843.o LU844.o LU845.o LU846.o LU847.o LU848.o LU849.o LU850.o LU851.o LU852.o LU853.o LU854.o LU855.o LU856.o LU857.o LU858.o LU859.o LU860.o LU861.o LU862.o LU863.o LU864.o LU865.o LU866.o LU867.o LU868.o LU869.o LU870.o LU871.o LU872.o LU873.o LU874.o LU875.o LU876.o LU877.o LU878.o LU879.o LU880.o LU881.o LU882.o LU883.o LU884.o LU885.o LU886.o LU887.o LU888.o LU889.o LU890.o LU891.o LU892.o LU893.o LU894.o LU895.o LU896.o LU897.o LU898.o LU899.o LU900.o LU901.o LU902.o LU903.o LU904.o LU905.o LU906.o LU907.o LU908.o LU909.o LU910.o LU911.o LU912.o LU913.o LU914.o LU915.o LU916.o LU917.o LU918.o LU919.o LU920.o LU921.o LU922.o LU923.o LU924.o LU925.o LU926.o LU927.o LU928.o LU929.o LU930.o LU931.o LU932.o LU933.o LU934.o LU935.o LU936.o LU937.o LU938.o LU939.o LU940.o LU941.o LU942.o LU943.o LU944.o LU945.o LU946.o LU947.o LU948.o LU949.o LU950.o LU951.o LU952.o LU953.o LU954.o LU955.o LU956.o LU957.o LU958.o LU959.o LU960.o LU961.o LU962.o LU963.o LU964.o LU965.o LU966.o LU967.o LU968.o LU969.o LU970.o LU971.o LU972.o LU973.o LU974.o LU975.o LU976.o LU977.o LU978.o LU979.o LU980.o LU981.o LU982.o LU983.o LU984.o LU985.o LU986.o LU987.o LU988.o LU989.o LU990.o LU991.o LU992.o LU993.o LU994.o LU995.o LU996.o LU997.o LU998.o LU999.o 
-o Scimark2.exe -L"C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/lib" -L"C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/x86_64-w64-mingw32/lib" -static-libgcc -mcmodel=large -fPIC -Wl,--image-base -Wl,0x10000000

C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: in function `check_managed_app':

relocation truncated to fit: R_X86_64_PC32 against symbol `.refptr.mingw_initltsdrot_force' defined in .rdata$.refptr.mingw_initltsdrot_force[.refptr.mingw_initltsdrot_force] section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o
relocation truncated to fit: R_X86_64_PC32 against symbol `.refptr.mingw_initltsdyn_force' defined in .rdata$.refptr.mingw_initltsdyn_force[.refptr.mingw_initltsdyn_force] section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o
relocation truncated to fit: R_X86_64_PC32 against symbol `.refptr.mingw_initltssuo_force' defined in .rdata$.refptr.mingw_initltssuo_force[.refptr.mingw_initltssuo_force] section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o

relocation truncated to fit: R_X86_64_PC32 against symbol `.refptr.mingw_initcharmax' defined in .rdata$.refptr.mingw_initcharmax[.refptr.mingw_initcharmax] section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o
relocation truncated to fit: R_X86_64_PC32 against symbol `.refptr.__image_base__' defined in .rdata$.refptr.__image_base__[.refptr.__image_base__] section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o
C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: in function `pre_c_init':
relocation truncated to fit: R_X86_64_PC32 against symbol `.refptr.mingw_app_type' defined in .rdata$.refptr.mingw_app_type[.refptr.mingw_app_type] section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o

C:/crossdev/src/mingw-w64-v7-git20191109/mingw-w64-crt/crt/crtexe.c:140:(.text+0x70): relocation truncated to fit: R_X86_64_PC32 against `.bss'
C:/crossdev/src/mingw-w64-v7-git20191109/mingw-w64-crt/crt/crtexe.c:144:(.text+0x80): relocation truncated to fit: R_X86_64_PC32 against symbol `__set_app_type' defined in .text section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/x86_64-w64-mingw32/lib/libmsvcrt.a(dwngs00096.o)

C:/crossdev/src/mingw-w64-v7-git20191109/mingw-w64-crt/crt/crtexe.c:146:(.text+0x85): relocation truncated to fit: R_X86_64_PC32 against symbol `__p__fmode' defined in .text section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/x86_64-w64-mingw32/lib/libmsvcrt.a(lib64_libmsvcrt_os_a-__p__fmode.o)
C:/crossdev/src/mingw-w64-v7-git20191109/mingw-w64-crt/crt/crtexe.c:146:(.text+0x8c): relocation truncated to fit: R_X86_64_PC32 against symbol `.refptr._fmode' defined in .rdata$.refptr._fmode[.refptr._fmode] section in C:/Program Files (x86)/Embarcadero/Dev-Cpp/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o

C:/crossdev/src/mingw-w64-v7-git20191109/mingw-w64-crt/crt/crtexe.c:152:(.text+0x95): additional relocation overflows omitted from the output
collect2.exe: error: ld returned 1 exit status
recipe for target 'Scimark2.exe' failed
mingw32-make.exe: *** [Scimark2.exe] Error 1

After testing, it became clear that the obstacle behind this error is the 2 GB limit on executable size (despite using -mcmodel=medium or -mcmodel=large). I managed to compile 100 files of 1,000,000 lines each, which produced an executable of approximately 1.1 GB. I then started using the -Os (optimize for size) flag, which moved the project forward a bit. Interestingly, the larger the executable, the worse the Scimark2 benchmark result. The first successful compilation of 1 billion lines, as 1,000 files of 1,000,000 lines each with the -Os flag, produced a 359 MB executable in 1483 seconds (24.7 minutes). I also tried 500 files of 2,000,000 lines each, and the compilation took longer. The standard Scimark2 project runs four times faster than the project with the additional 1 billion lines, where the executable is larger and the -Os flag is applied.
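The "relocation truncated to fit: R_X86_64_PC32" message in the linker log is consistent with a 2 GB ceiling: a PC32 relocation stores a signed 32-bit offset relative to the referencing instruction, so the target has to lie within about 2 GB of it. The check below is a simplified sketch of that constraint. It ignores the addend and the exact reference point that a real linker uses, and fits_pc32 is a name made up for this example.

```cpp
#include <cstdint>
#include <limits>

// True if the distance from `site` (where the reference is made) to `target`
// can be stored in a signed 32-bit displacement.
bool fits_pc32(std::int64_t site, std::int64_t target) {
    const std::int64_t delta = target - site;
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}
```

For instance, a reference from address 0 to address 0x80000000 (exactly 2 GiB away) does not fit, while one to 0x7FFFFFFF does.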

With 500 files of 2,000,000 lines each, the build used up to 156 GB of RAM, but not all 64 cores.

I don’t think this compile time accurately characterizes the Threadripper 3990X, since at 1 million and 2 million lines of code per file not all cores were used. I don’t know whether this is a problem with MAKE and G++ or with the -j option, which selects the number of cores automatically. There may even be an I/O bottleneck in the machine that prevents it from handling the load. The smaller the files, the more cores the combination of MAKE/G++ and -j uses. I also tried comparing builds with and without the -pipe flag (which uses pipes instead of temporary files during compilation). What’s also interesting is that TwineCompile in C++Builder doesn’t seem to have this limitation: with it, parallel compilation immediately loads all cores.

Quad compilation

In a further attempt to speed up the compilation of 1 billion lines of C++ code, I loaded four instances of Dev-C++, each with a project of 250 files of 1,000,000 lines, and compiled all four projects simultaneously. This is similar to the 1-billion-line Object Pascal project, which was compiled as 250 projects of 4 million lines each. The results of the quad compilation are shown below.

Four instances of Dev-C++

Note: there is a bug in this screenshot — only 32 cores and 64 threads are displayed, although in fact there should be 64 cores and 128 threads.

Compilation result…

  • Errors: 0
  • Warnings: 0
  • Output file: C:DScimark2-Dev-Cpp-master_250_1m_DScimark2.exe
  • Output size: 90.0009765625 MiB
  • Compilation time: 906.58 s

Compilation result…

  • Errors: 0
  • Warnings: 0
  • Output file: C:DScimark2-Dev-Cpp-master_250_1m_CScimark2.exe
  • Output size: 90.0009765625 MiB
  • Compilation time: 909.45 s

Compilation result…

  • Errors: 0
  • Warnings: 0
  • Output file: C:DScimark2-Dev-Cpp-master_250_1m_AScimark2.exe
  • Output size: 90.0009765625 MiB
  • Compilation time: 915.17 s

Compilation result…

  • Errors: 0
  • Warnings: 0
  • Output file: C:DScimark2-Dev-Cpp-master_250_1m_BScimark2.exe
  • Output size: 90.0009765625 MiB
  • Compilation time: 918.05 s
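Since the four instances ran at the same time, the overall wall-clock time is roughly that of the slowest one: 918.05 seconds, or about 15.3 minutes, which works out to a little under 1.09 million lines per second. The sketch below reproduces that arithmetic. It assumes all four builds started at the same moment, and both function names are made up for the example.

```cpp
#include <algorithm>
#include <vector>

// Wall-clock time of builds that start together: the slowest one decides.
double wall_clock_seconds(const std::vector<double>& parallel_runs) {
    if (parallel_runs.empty()) {
        return 0.0;
    }
    return *std::max_element(parallel_runs.begin(), parallel_runs.end());
}

// Average throughput in lines of code per second.
double lines_per_second(double total_lines, double seconds) {
    return total_lines / seconds;
}
```

Fed with the four times listed above and 1e9 total lines, this gives the 15-minute figure; compared with the 1483-second single-instance build, the quad build is about 1.6 times faster.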

1 billion lines of C++ code in 15 minutes on AMD Threadripper 3990X

This project was very interesting. There are a whole bunch of C++ flags for the TDM-GCC compiler, such as -mtune=native, -mtune=znver2, and -mtune=znver3, that I haven’t tried in this configuration. As this post shows, software support for a modern machine with 64 cores and 128 threads still needs to be improved, but in general it works and provides quite serious computing power.

Valery Radokhleb
Web developer, designer
