Setting up my Ampere Server (Build Log 2)

Today is the day: the Ampere CPU and motherboard finally arrived. I'm still waiting on the GPU, and I haven't yet decided on the SSDs to use for a ZFS pool. So for today, I'm mostly focused on putting the server together and doing some basic setup, such as installing Ubuntu Server for ARM, joining the Docker Swarm, and more.
Overall, the build process was not too bad. Since I replaced the original fans with quieter (but less performant) Noctuas, I wanted to make sure the cables were all nicely tucked away so the airflow wouldn't be obstructed.

Powering on the system was actually quite scary, as I was greeted by a really loud, constant beep. After a bit of digging, I traced the beep to the PSU and figured out it was the redundant PSU alarm. I had only plugged in one of the redundant PSUs because I assumed it would "work", and sure, it did work, it just made a really loud beeping noise. After plugging in the second PSU, I was relieved when the beep disappeared almost immediately.
I was able to boot into the BIOS with no hiccups; I had been quite concerned that something might go wrong, such as a bad memory stick, PCIe issues, etc.

Getting into the boot menu took a bit longer than expected (a solid minute or so after pressing F11), but it did eventually boot from my USB installer.

I went with an HWE kernel install, as it had been recommended to me in case I ran into issues with my somewhat obscure hardware.
The IPMI (OpenBMC) that comes with this motherboard also made installation easy: I could use both the web KVM interface and the BMC host console to go through the setup without a physical keyboard and monitor.

And we're in!

Idle power wasn't the greatest (I was expecting much better): 120 watts measured from my Unifi PDU Pro (each redundant PSU supplies roughly half the load, so I added the two readings together).

The CPU power reported by sensors was only about 10W, so I'm wondering whether it's the fact that I have 8 RAM sticks (vs. 2 in my gaming PC), or whether all of the fanciness of server hardware, such as the redundancy features, just inherently draws more power than consumer gear.
For comparison, my 9900X + RTX 5000 Ada gaming PC idles at 85 watts, as does the combination of my PoE Unifi switch and my Raspberry Pi fleet, both also measured through the PDU. So this machine is officially the noisiest and most power-hungry server in my homelab.
Joining the Docker Cluster
First, I joined the server to the Docker Swarm as a manager node. I'm not planning for the Ampere server to run any of the cluster's services for now, just Portainer.
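For the record, joining as a manager is just the standard Swarm procedure, assuming an existing manager is reachable (the token and IP below are placeholders):

# On an existing manager node, print the manager join token
docker swarm join-token manager
# Then run the printed command on the Ampere server, roughly:
docker swarm join --token <manager-token> <existing-manager-ip>:2377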

Then, I tagged the Ampere server with the portainer label, which I use to determine which node the Portainer container is deployed to.
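Roughly, that tagging looks like this; "ampere" here is just the node's hostname, and the label value is my own convention:

# Add the label to the new node and double-check it
docker node update --label-add portainer=true ampere
docker node inspect ampere --format '{{ .Spec.Labels }}'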

And after restarting Portainer through compose, it was up and running.
I also had to modify all of my existing stacks to add a deployment rule that keeps them off the Ampere server, as I didn't want this machine running any of the web services. After redeploying those stacks through Portainer, my swarm configuration was complete.
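As a minimal sketch, the deployment rule in each stack looks something like this (the service shown is hypothetical; I'm matching on the node hostname here, but a node label would work just as well):

services:
  some-web-service:
    image: nginx:alpine
    deploy:
      placement:
        constraints:
          # Keep this service off the Ampere node
          - node.hostname != ampere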
Issues, issues, issues
Four main issues to figure out:
- SATA disks connected to my backplane not showing up
- I only see 60GB of available space
- The CPU temp issue
- I'm only getting 1GbE
SATA disk issue
I have an LSI3008 controller plugged into the SATA backplane, which I plan to use for the ZFS pool. Although I haven't yet decided on which SSDs to buy, I was testing the peripherals with a spare 2.5" SSD I had lying around, and noticed it wasn't showing up:
$ lshw -class disk
*-namespace:0
description: NVMe disk
physical id: 0
logical name: hwmon0
*-namespace:1
description: NVMe disk
physical id: 2
logical name: /dev/ng0n1
*-namespace:2
description: NVMe disk
physical id: 1
bus info: nvme@0:1
logical name: /dev/nvme0n1
configuration: wwid=eui.002538521191226d
Only my NVMe OS drive was showing up.
I tried a different slot in the backplane, but nope, same issue.
When building the server, I noticed that the backplane has two SATA ports for each drive. I'd heard this was for redundancy, and I'd also read that either of the two ports should technically work.

But nope: after switching the SATA connection to the other port, the drive showed up:
$ sudo lshw -class disk
*-disk
description: ATA Disk
product: Samsung SSD 840
physical id: 0.0.0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: CB6Q
serial: S1DHNSAF636463F
size: 465GiB (500GB)
capacity: 465GiB (500GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=6 logicalsectorsize=512 sectorsize=512 signature=007bfc0d
*-namespace:0
description: NVMe disk
physical id: 0
logical name: hwmon0
*-namespace:1
description: NVMe disk
physical id: 2
logical name: /dev/ng0n1
*-namespace:2
description: NVMe disk
physical id: 1
bus info: nvme@0:1
logical name: /dev/nvme0n1
size: 1863GiB (2TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
Since my chassis uses 2x2 SATA backplanes, I reseated all the SATA cables and tested both backplanes, verifying that both worked. I also verified that hot-swapping the drive into each bay worked.
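If you want to watch the hot-swap happen, standard tooling is enough; nothing here is specific to this backplane:

# Follow kernel messages while inserting/removing the drive
sudo dmesg -w
# In another terminal, confirm the drive shows up
lsblk -o NAME,SIZE,MODEL,SERIAL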
Problem 1 solved.
Temp (and noise) issues
Next was to address the temps. The CPU temps kept climbing, hitting 94 degrees at idle.
So I switched back to the OEM fans.

Instantly there was an improvement, with idle temps now stable around 45 degrees. However, these fans were loud, and I wanted to fine-tune them for a bedroom setting. The typical Linux tools for PWM control, which rely on lm-sensors and pwmcontrol, were not detecting these fans.
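For reference, this is the usual lm-sensors / fancontrol flow I would try on a consumer board; on this machine it didn't turn up any controllable fan outputs:

sudo sensors-detect        # probe for hwmon chips
sensors                    # dump detected temps/fans
sudo pwmconfig             # map PWM outputs to fans; writes /etc/fancontrol
sudo systemctl enable --now fancontrol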
But after some digging, I found that the fans are actually controlled by OpenBMC, which can apply a configurable fan curve.
As per the Ampere community wiki page "Managing Temperature and Fans":
You can configure the fan response curve by editing /usr/share/swampd/config.json on OpenBMC. Note it won't survive reboots so you should store a copy in /etc and copy it over somehow into /usr/share each time the BMC boots.
I went ahead and modified the config.json like so:
{
  "sensors" : [
    ... redacted ...
  ],
  "zones" : [
    {
      "id": 0,
      "minThermalOutput": 15.0,
      "failsafePercent": 15.0,
      "pids": [
        ... redacted ...
        {
          "name": "TEMP_SOC",
          "type": "stepwise",
          "inputs": ["TEMP_SOC"],
          "setpoint": 30.0,
          "failsafePercent": 75.0,
          "pid": {
            "samplePeriod": 1.0,
            "positiveHysteresis": 1.0,
            "negativeHysteresis": 1.0,
            "isCeiling": false,
            "reading": {
              "0": 40,
              "1": 65,
              "2": 75,
              "3": 85,
              "4": 90
            },
            "output": {
              "0": 15,
              "1": 30,
              "2": 40,
              "3": 50,
              "4": 80
            }
          }
        }
      ]
    }
  ]
}
The important pieces are the reading and output values, which define the fan curve, and minThermalOutput and failsafePercent, which set the default minimum fan speed. These are initially set to 30, meaning 30% is the lowest fan speed allowed; I had to reduce them to 15 so that the output values lower than 30 would take effect.
Because of the note saying that changes to this file don't survive reboots, I also copied the file over to /etc/config.json and modified /usr/lib/systemd/system/phosphor-pid-control.service to pass in the new config path:
[Unit]
Description=OpenBMC Fan Control Daemon
[Service]
Type=simple
ExecStart=/usr/bin/swampd --conf /etc/config.json
Restart=always
RestartSec=5
StartLimitInterval=0
[Install]
WantedBy=basic.target
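To apply all of this, assuming SSH access to the BMC shell (OpenBMC runs systemd, so it's the standard reload-and-restart dance):

# On the BMC: keep a persistent copy, per the wiki note above
cp /usr/share/swampd/config.json /etc/config.json
# Pick up the edited unit file and restart the fan control daemon
systemctl daemon-reload
systemctl restart phosphor-pid-control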
The CPU seems to have stabilized at 52 degrees, and I'll keep monitoring to see how much temperature vs. noise I can balance. At 15%, these fans are still audible, but the real villain was not the chassis fans: the jet engine in the room was the redundant PSU and its two tiny 40mm fans running at essentially full blast.
There do seem to be mods to swap these out, but there are about as many horror stories as success stories on the internet, especially for PSUs with "smart" fan fault detection features.
I will continue to tackle the noise problem, as I still need to spend some time researching my options.
Disk Space issue
Essentially, I'm not seeing the full 2TB of usable storage:
root@ampere:/home/teamcity/.BuildServer/system/artifacts/KtorSample/Build# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 26G 3.5M 26G 1% /run
efivarfs 512K 9.3K 503K 2% /sys/firmware/efi/efivars
/dev/mapper/ubuntu--vg-ubuntu--lv 98G 34G 60G 37% /
tmpfs 126G 0 126G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/nvme0n1p2 2.0G 101M 1.7G 6% /boot
/dev/nvme0n1p1 1.1G 6.4M 1.1G 1% /boot/efi
It seems this is related to LVM2 behavior: the OS is installed on LVM, and the root logical volume (/dev/mapper/ubuntu--vg-ubuntu--lv, the 98G filesystem above) was only allocated a fraction of the volume group.
lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
loop0 0 100% /snap/core22/1752
loop1 0 100% /snap/snapd/23772
loop2 0 100% /snap/core22/1804
nvme0n1
├─nvme0n1p1 vfat FAT32 91CB-6122 1G 1% /boot/efi
├─nvme0n1p2 ext4 1.0 3b749397-8da0-49a1-be61-7d7a360a6376 1.7G 5% /boot
└─nvme0n1p3 LVM2_member LVM2 001 biGHBi-AyfB-wYQ5-lyZ0-9Z4a-GnxN-nIsdNo
└─ubuntu--vg-ubuntu--lv ext4 1.0 b128dca1-8bc3-46f4-96a9-145927c54fc3 59.2G 34% /
fdisk does show the full disk size though:
fdisk -l
... redacted ...
Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 2203647 2201600 1G EFI System
/dev/nvme0n1p2 2203648 6397951 4194304 2G Linux filesystem
/dev/nvme0n1p3 6397952 3907026943 3900628992 1.8T Linux filesystem
And the output of pvs:
pvs
PV VG Fmt Attr PSize PFree
/dev/nvme0n1p3 ubuntu-vg lvm2 a-- <1.82t <1.72t
Resizing the LV to consume all of the free space in the VG, then growing the filesystem:
sudo lvresize -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
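As an aside, the two commands can be collapsed into one; lvresize has a --resizefs flag that grows the filesystem along with the LV:

# Equivalent single step (same LV as above)
sudo lvresize -l +100%FREE --resizefs /dev/mapper/ubuntu--vg-ubuntu--lv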
And now I could verify that I was no longer stuck with only 60GB of available space.
df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 26G 3.5M 26G 1% /run
efivarfs 512K 9.3K 503K 2% /sys/firmware/efi/efivars
/dev/mapper/ubuntu--vg-ubuntu--lv 1.8T 34G 1.7T 2% /
Network Speed
Last up was the network speed. Here's what lshw shows for the NIC:
*-network:0
description: Ethernet interface
product: Ethernet Controller X550
vendor: Intel Corporation
physical id: 0
bus info: pci@0003:03:00.0
logical name: enP3p3s0f0
logical name: /dev/fb0
version: 01
serial: 9c:6b:00:4b:11:08
size: 1Gbit/s
capacity: 10Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pm msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt-fd 1000bt-fd 10000bt-fd autonegotiation fb
configuration: autonegotiation=on broadcast=yes depth=32 driver=ixgbe driverversion=6.11.0-19-generic duplex=full firmware=0x8000172d, 1.3105.0 ip=192.168.1.223 latency=0 link=yes mode=1920x1200 multicast=yes port=twisted pair speed=1Gbit/s visual=truecolor xres=1920 yres=1200
resources: iomemory:24000-23fff iomemory:24000-23fff irq:104 memory:240000000000-2400003fffff memory:240000800000-240000803fff memory:11800000-1187ffff memory:11900000-119fffff memory:11a00000-11afffff
The link speed is only showing as 1Gbit/s, whereas I'm expecting 2.5Gbit/s.

The Ethernet device does support a 2.5GbE connection, but it looks like we are only advertising the 100, 1000 (1GbE), and 10000 (10GbE) baseT speeds.
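To double-check what the NIC is advertising, plain ethtool output has an "Advertised link modes" section (standard ethtool, nothing specific to this board):

sudo ethtool enP3p3s0f0 | grep -A 5 'Advertised link modes'

The actual fix is a one-liner: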
$ ethtool -s enP3p3s0f0 speed 2500 duplex full autoneg on
By running this, the advertised link mode changed to 2500baseT/Full, and I can now see the 2.5Gbit/s speeds:

However, this change will not persist across reboots as per this thread, so I also had to add a systemd service to apply it on boot.
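As a sketch, that unit looks something like this; the unit name and target choices are mine, and the ethtool path may differ on other distros:

# Hypothetical unit, e.g. /etc/systemd/system/nic-2g5.service (name is mine)
[Unit]
Description=Force 2.5GbE on enP3p3s0f0
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -s enP3p3s0f0 speed 2500 duplex full autoneg on

[Install]
WantedBy=multi-user.target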
What's next?
Obviously, making the server quieter. It's so noisy, and my rack is in my office.