The Final Step to an HD Digital Human: Enhancing Video with GFPGAN

By AiBard123
June 26, 2023 - 2 min read

Author: AI工作流  Source: [AI工作流](https://mp.weixin.qq.com/s?__biz=MjM5NTU3ODAyMg==&mid=2447604474&idx=1&sn=ce5b1f0b771e36711c767d85d00dbc11&chksm=b2e1b33385963a25f32b7485f4ad88bb9ad62aa248de3db43b26ffffb5163171786b439c6e78&scene=21#wechat_redirect)

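In earlier articles we generated the presenter image with Stable Diffusion, turned it into a video with SadTalker, and refined the lip movement with wav2lip. The output of wav2lip is not very sharp, though, so today we use Wav2Lip-GFPGAN to enhance the video.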

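The enhancement process:

Video to images: use OpenCV to extract every frame of the video as an image (the audio is lost at this step);
Image enhancement: use GFPGAN to raise the resolution of each frame;
Images to video: use OpenCV and ffmpeg to merge the enhanced images into a video file;
Add audio to the video: use ffmpeg to merge that HD video with the original audio file, producing an HD video with the spoken narration.

1. Video to images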

Create a new environment for Wav2Lip-GFPGAN and activate it

conda create -n wav2lip-gfpgan python=3.9  
conda activate wav2lip-gfpgan

Download the source code and install the dependencies

cd d:/aiworkflow  
git clone https://github.com/ajay-sainy/Wav2Lip-GFPGAN.git  
cd Wav2Lip-GFPGAN  
pip install -r requirements.txt -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

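Prepare the following materials:

D:\AIWorkFlow\Wav2Lip-GFPGAN\inputs\input.wav the AI-generated narration audio

D:\AIWorkFlow\Wav2Lip-GFPGAN\inputs\output.mp4 the lip-synced video produced by wav2lip (low resolution)

D:\AIWorkFlow\Wav2Lip-GFPGAN\outputs\lowres\ directory for the low-resolution frames (intermediate files)

D:\AIWorkFlow\Wav2Lip-GFPGAN\outputs\restored_imgs\ directory for the enhanced frames (intermediate files)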
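Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the original workflow: the helper name `prepare_workspace` is hypothetical, and it only assumes the two input files and two intermediate directories listed above.

```python
from pathlib import Path


def prepare_workspace(root):
    """Create the intermediate image folders and report missing input files."""
    root = Path(root)
    required = [
        root / "inputs" / "input.wav",   # AI-generated narration audio
        root / "inputs" / "output.mp4",  # low-resolution wav2lip video
    ]
    # Create outputs/lowres and outputs/restored_imgs if they do not exist yet
    for sub in ("lowres", "restored_imgs"):
        (root / "outputs" / sub).mkdir(parents=True, exist_ok=True)
    # Return the inputs that still need to be copied into place
    return [str(p) for p in required if not p.is_file()]
```

Calling `prepare_workspace(".")` from the project root returns an empty list once both input files are present.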

In the project root, create and run the following Python code:

import os  
import cv2  
from tqdm import tqdm  
  
inputVideoPath = '.\\inputs\\output.mp4'  # low-resolution video produced by wav2lip  
imagePath = '.\\outputs\\lowres'  # extracted frames are written here  
  
os.makedirs(imagePath, exist_ok=True)  
  
vidcap = cv2.VideoCapture(inputVideoPath)  
numberOfFrames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))  
fps = vidcap.get(cv2.CAP_PROP_FPS)  
print("FPS: ", fps, "Frames: ", numberOfFrames)  
  
for frameNumber in tqdm(range(numberOfFrames)):  
    success, image = vidcap.read()  
    if not success:  
        # The reported frame count can exceed the number of decodable frames  
        break  
    # Zero-padded names (0000.jpg, 0001.jpg, ...) keep the frames in order when sorted  
    cv2.imwrite(os.path.join(imagePath, str(frameNumber).zfill(4) + '.jpg'), image)  
  
vidcap.release()  
  
print("imagePath:", imagePath)  
print("inputVideoPath:", inputVideoPath)

This quickly produces 831 images (the exact number depends on the length of the video).
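The frame names are zero-padded to four digits, and the merge script later sorts the file names as text, so the padding is what keeps the frames in order. The sketch below illustrates why; the helper names `frame_name` and `min_width` are hypothetical and only reproduce the naming scheme used above.

```python
def frame_name(index, width=4, ext=".jpg"):
    """Build a zero-padded file name such as 0007.jpg."""
    return str(index).zfill(width) + ext


def min_width(frame_count):
    """Smallest padding width that keeps text-sorted names in frame order."""
    return len(str(max(frame_count - 1, 0)))


padded = sorted(frame_name(i) for i in (0, 9, 10, 830))
unpadded = sorted(str(i) + ".jpg" for i in (0, 9, 10, 830))
print(padded)    # ['0000.jpg', '0009.jpg', '0010.jpg', '0830.jpg']
print(unpadded)  # ['0.jpg', '10.jpg', '830.jpg', '9.jpg']
print(min_width(831))  # 3
```

A width of four covers frame indices 0 to 9999, which is 400 seconds at 25 frames per second. For longer videos the width would need to grow, for example by using `min_width` on the frame count.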

2. Image enhancement

2.1 Prepare the GFPGAN environment

Enter the GFPGAN-master directory:

cd GFPGAN-master  
pip install -r requirements.txt -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

2.2 Prepare the model

Place the GFPGANv1.3.pth model in the GFPGAN-master/experiments/pretrained_models directory.

2.3 Enhance the images

Run the following in the GFPGAN-master directory:

python inference_gfpgan.py -i ..\\outputs\\lowres -o ..\\outputs -v 1.3 -s 2 --only_center_face --bg_upsampler None

On the first run, the program starts by downloading two model files; if they are already on the system, the download is skipped:

https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth (104MB)
https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth (83MB)

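Once the downloads finish, the conversion starts right away.

This step is slow. With my material and environment it took about 5 seconds per frame, and Task Manager showed that mostly CPU was being used. The 800-plus images took about an hour.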
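The figures above are consistent with each other, and the same arithmetic gives a rough estimate for other videos. This is only a sketch: `estimate_runtime` is a hypothetical helper, and the 5 seconds per frame is the rate reported for this particular machine, not a general figure.

```python
def estimate_runtime(frame_count, seconds_per_frame):
    """Return (minutes, seconds) for processing every frame at a steady rate."""
    total_seconds = round(frame_count * seconds_per_frame)
    return divmod(total_seconds, 60)


minutes, seconds = estimate_runtime(831, 5)
print(f"{minutes} min {seconds} s")  # 69 min 15 s
```

Since the load was mostly on the CPU, one thing worth checking is whether PyTorch can see a GPU in this environment, for example with `torch.cuda.is_available()`. That is a guess at the cause, not something the original run confirms.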

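3. Images to video

3.1 Merge images into short clips

This step uses cv2 to merge the enhanced images generated above into several short clips.

In the Wav2Lip-GFPGAN directory, create and run the Python file tovideo.py. Note that the frame rate passed to VideoWriter must match the frame rate of your input video; the demo video is 25 frames per second. batchSize is the number of frames per clip; it is set to 250 here, so every 10 seconds of video becomes one clip: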
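To make the batching concrete, here is a small sketch of the arithmetic for the 831-frame demo video. The helper names `batch_ranges` and `clip_seconds` are hypothetical; the batch size of 250 and the 25 frames per second are the values used in this article.

```python
def batch_ranges(frame_count, batch_size=250):
    """Return (start, end) index pairs, end exclusive, one pair per short clip."""
    return [(start, min(start + batch_size, frame_count))
            for start in range(0, frame_count, batch_size)]


def clip_seconds(frames_in_clip, fps=25):
    """Playback length of a clip with the given number of frames."""
    return frames_in_clip / fps


ranges = batch_ranges(831)
print(ranges)  # [(0, 250), (250, 500), (500, 750), (750, 831)]
print([clip_seconds(end - start) for start, end in ranges])  # [10.0, 10.0, 10.0, 3.24]
```

So the demo produces four clips: three of 10 seconds and a final one of 3.24 seconds. If your input is not 25 frames per second, the frame rate printed by the frame-extraction script in step 1 is the value to use.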

import os  
import cv2  
from tqdm import tqdm  
  
outputPath = ".\\outputs"  
imagePath = '.\\outputs\\restored_imgs\\'  
videoOutputPath = outputPath  
fps = 25  # must match the frame rate of the input video  
batchSize = 250  # frames per short clip (10 seconds at 25 fps)  
  
# Zero-padded file names sort into frame order  
dir_list = sorted(os.listdir(imagePath))  
  
batch = 0  
  
for start in tqdm(range(0, len(dir_list), batchSize)):  
  end = start + batchSize  
  print("processing ", start, end)  
  img_array = []  
  for filename in tqdm(dir_list[start:end]):  
      img = cv2.imread(os.path.join(imagePath, filename))  
      if img is None:  
        continue  # skip files that cannot be read as images  
      img_array.append(img)  
  
  if not img_array:  
    continue  # no readable image in this batch, so there is no frame size to use  
  
  height, width, layers = img_array[0].shape  
  size = (width, height)  
  out = cv2.VideoWriter(videoOutputPath+'\\batch_'+str(batch).zfill(4)+'.avi', cv2.VideoWriter_fourcc(*'DIVX'), fps, size)  
  batch = batch + 1  
  
  for img in img_array:  
    out.write(img)  
  out.release()  
  
# List the clips in order for ffmpeg's concat demuxer  
concatTextFilePath = outputPath + "\\concat.txt"  
with open(concatTextFilePath, "w") as concatTextFile:  
  for ips in range(batch):  
    concatTextFile.write("file batch_" + str(ips).zfill(4) + ".avi\n")  
  
# These two files are created by the ffmpeg commands in the next steps  
concatedVideoOutputPath = outputPath + "\\concated_output.avi"  
print("concatedVideoOutputPath:", concatedVideoOutputPath)  
  
finalProcessedOuputVideo = videoOutputPath + '\\final.avi'  
print("finalProcessedOuputVideo:", finalProcessedOuputVideo)

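When the script finishes, the outputs directory contains the short clips (batch_0000.avi, batch_0001.avi, and so on) and concat.txt.

3.2 Merge the short clips into one video

Use ffmpeg to merge the short clips into a single long video. Its duration should match that of the input video.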

cd outputs  
ffmpeg -y -f concat -i .\concat.txt  -c copy .\concated_output.avi
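One way to check that the merged video has the expected length is to compare it with frames divided by frame rate. The sketch below assumes `ffprobe` (shipped with ffmpeg) is on the PATH; the helper names and the 0.1 second tolerance are illustrative assumptions, not part of the original workflow.

```python
import subprocess


def expected_duration(frame_count, fps):
    """Duration in seconds implied by a frame count and a frame rate."""
    return frame_count / fps


def ffprobe_duration_command(video_path):
    """Build an ffprobe call that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]


def probe_duration(video_path):
    """Run ffprobe and return the reported duration as a float."""
    result = subprocess.run(ffprobe_duration_command(video_path),
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip())


def durations_match(expected, actual, tolerance=0.1):
    """True when the two durations differ by at most `tolerance` seconds."""
    return abs(expected - actual) <= tolerance


print(expected_duration(831, 25))  # 33.24
```

For the demo, `durations_match(expected_duration(831, 25), probe_duration("concated_output.avi"))` should be true when run from the outputs directory. A large mismatch usually points to a wrong frame rate in tovideo.py or to skipped frames.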

4. Add audio to the video

Finally, merge the video above with the original audio:

ffmpeg -i .\concated_output.avi -i ..\inputs\input.wav -c:v copy -c:a copy .\final.avi
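If you prefer to script the last two steps, the two ffmpeg commands from sections 3.2 and 4 can be wrapped in Python. This is a sketch under a few assumptions: the function names are hypothetical, ffmpeg is on the PATH, and `-y` is added to the second command (the original command does not have it) so that a rerun overwrites final.avi instead of waiting for confirmation.

```python
import shutil
import subprocess


def concat_command(list_file="concat.txt", output="concated_output.avi"):
    """ffmpeg call that joins the clips listed in concat.txt without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-i", list_file, "-c", "copy", output]


def mux_command(video="concated_output.avi", audio="../inputs/input.wav",
                output="final.avi"):
    """ffmpeg call that copies the video and audio streams into one file."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-c:v", "copy", "-c:a", "copy", output]


def run_all(workdir="outputs"):
    """Run both commands inside the outputs directory, stopping on the first error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    for command in (concat_command(), mux_command()):
        subprocess.run(command, cwd=workdir, check=True)
```

Running `run_all()` from the Wav2Lip-GFPGAN directory mirrors `cd outputs` followed by the two commands. Passing argument lists instead of one shell string avoids quoting problems with Windows paths.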

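Finally, here is the video quality after enhancement:

Result before enhancement

With that, we have completed the full workflow for a narrating digital human.

The complete digital human production workflow:

Generate the digital human image with Stable Diffusion;
Use SadTalker to turn the image plus audio into a narration video;
Use wav2lip to refine the lip movement of the narration;
Use GFPGAN + ffmpeg to enhance the video.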
