On 03.12.2023 22:29, Terje J. Hanssen wrote:
I already touched briefly on this topic in another thread:
https://lists.cinelerra-gg.org/pipermail/cin/2023-December/007346.html
But so many SW and HW pieces are mentioned that keeping up is almost
full-time reading and study:
VAAPI, Mesa, Vulkan, Intel Quick Sync Video, etc.
I realize that my aging hardware, which is fast enough for other tasks,
needs some "AV1 upgrade", if possible.
But first I wonder what can be expected from AV1 decoding and encoding
on my existing 64-bit hardware:
1) Laptop (2018): Dell XPS 13-9370, quad-core i7-8550U CPU (8th gen
Kaby Lake) and Intel UHD Graphics
2) WS infinity: MSI Z170A mobo, quad-core i7-6700K CPU (6th gen
Skylake), NVIDIA GeForce GT-730 graphics
A budget-friendly first "AV1 HW upgrade" of workstation 2), if
possible, would be to add a new GPU such as the Intel Arc A380.
But the question is whether this will work at all on that much older
(2015) Skylake platform with the i7-6700K CPU.
I've seen CPU bottlenecks mentioned, and that the Arc A380 is
targeted at newer CPU generations ...
Extracted from the first wikipedia reference below about Intel
Alchemist GPUs:
- Featuring 8 Xe-cores, the A380 supports PCI Express 4.0
and has a total board power (TBP) of 75W. The graphics card is
equipped with 6GB GDDR6 memory and a graphics memory interface
of 96 bits, providing a memory bandwidth of 186GB/s.
- Bus interface A380: PCIe 4.0 x8 and for >=A580:
PCIe 4.0 x16
That is, the keyword here seems to be PCIe 4.0 bus speed as a
requirement to utilize the Arc A380 GPU for HWA AV1 encoding (maybe
also for other GPUs?)
Well, despite so much time spent following the mesa3d development process, I still do not know the full details of the media encoding path. But isn't it like putting uncompressed frames in VRAM (as long as you have enough of it, so ideally the n raw frames between keyframes?), letting the media engine chew on them, and pulling the resulting compressed bitstream out of VRAM via PCI Express?
So I speculate that PCIe bandwidth in itself will only matter if you encode with both a big frame size and long keyframe intervals, so that the DMA engine on the card must constantly pump new raw frame data over the bus.
I saw some mention of big (resizable) BAR as a requirement for good performance, but OpenGL/Vulkan are IMO a bit different, because they often send a large number of tiny objects (vertices) over the bus for each frame. But maybe the default 256 MB BAR feels a bit small for sending something like 1 second of 25 4K frames (about 300 MB/s)?
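
To put rough numbers on that guess, here is a small back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: 8-bit 4:2:0 raw frames (1.5 bytes per pixel), the A380's x8 link running at PCIe 3.0 on the Skylake board, theoretical link rates with only the 128b/130b encoding overhead removed, and a 256 MiB BAR treated as if it were the upload window (whether the driver really stages frames through the BAR is something I do not know). The function names are made up for this example.

```python
# Back-of-the-envelope estimate; all constants are illustrative assumptions.

def raw_frame_bytes(width, height, bits_per_sample=8, chroma="420"):
    """Size in bytes of one uncompressed YUV frame."""
    # Average samples per pixel for common chroma subsampling layouts.
    samples_per_pixel = {"420": 1.5, "422": 2.0, "444": 3.0}[chroma]
    return int(width * height * samples_per_pixel * bits_per_sample / 8)

def pcie_bytes_per_second(gen, lanes):
    """Theoretical one-direction PCIe throughput (128b/130b encoding only)."""
    gigatransfers = {3: 8.0, 4: 16.0}[gen]  # GT/s per lane
    return gigatransfers * 1e9 * (128 / 130) / 8 * lanes

frame = raw_frame_bytes(3840, 2160)  # one 4K UHD frame, 8-bit 4:2:0
stream = frame * 25                  # raw bytes per second at 25 fps
bar = 256 * 1024**2                  # assumed default BAR size, 256 MiB

print(f"one raw frame     : {frame / 1e6:.1f} MB")
print(f"raw stream, 25 fps: {stream / 1e6:.1f} MB/s")
print(f"frames in 256 MiB : {bar // frame}")
for gen in (3, 4):
    share = stream / pcie_bytes_per_second(gen, lanes=8)
    print(f"PCIe {gen}.0 x8 share  : {share:.1%}")
```

Under these assumptions one frame is about 12.4 MB, the raw stream is about 311 MB/s, and only 21 such frames fit in 256 MiB, so a full second of 4K would indeed not fit. The same stream would use only a few percent of a theoretical PCIe 3.0 x8 link, which fits the guess that bus bandwidth alone is unlikely to be the limit here. Real-world throughput is lower than the theoretical figure, and 10-bit or 4:4:4 input would raise the numbers.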