Series articles:
Getting Started with FFmpeg - Video Playback
Getting Started with FFmpeg - RTMP Streaming
Getting Started with FFmpeg - Android Porting
Getting Started with FFmpeg - Format Conversion
We can already render the video picture on Android, but the sound is still missing. This post adds the audio playback module to the player.
So that audio and video can be decoded and played independently, we first refactor the previous code to decouple reading the media stream from decoding it:
MediaReader reads AVPackets from the file stream and hands them to VideoStreamDecoder and AudioStreamDecoder for video and audio decoding respectively. We add thread safety to MediaReader so that the video and audio decoders can each run on their own worker thread.
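The split can be sketched in a few lines of plain C++. This is a simplified stand-in, not the demo's actual code: `Packet`, `MediaReaderSketch`, and the `int` stream indices are illustrative placeholders for AVPacket, MediaReader, and the real stream indices; the point is that one mutex-guarded demuxer can feed per-stream queues pulled from different threads.

```cpp
#include <deque>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Simplified stand-in for AVPacket: only the stream index matters here.
struct Packet { int streamIndex; };

// Sketch of the MediaReader idea: one shared demuxer feeds per-stream
// queues, guarded by a mutex so the audio and video decoder threads can
// each pull packets for their own stream safely.
class MediaReaderSketch {
public:
    explicit MediaReaderSketch(std::vector<Packet> file) : mFile(std::move(file)) {}

    // Called concurrently by decoder threads: returns true and fills `out`
    // with the next packet belonging to `streamIndex`, or false at EOF.
    bool NextPacket(int streamIndex, Packet &out) {
        std::lock_guard<std::mutex> lock(mMutex);
        std::deque<Packet> &queue = mQueues[streamIndex];
        // Demux until we either queue a packet for this stream or hit EOF.
        while (queue.empty() && mPos < mFile.size()) {
            Packet pkt = mFile[mPos++];       // stands in for av_read_frame
            mQueues[pkt.streamIndex].push_back(pkt);
        }
        if (queue.empty()) return false;
        out = queue.front();
        queue.pop_front();
        return true;
    }

private:
    std::mutex mMutex;
    std::vector<Packet> mFile;
    size_t mPos = 0;
    std::map<int, std::deque<Packet>> mQueues;
};
```

Packets for the other stream are buffered rather than dropped, which is why each decoder can make progress no matter which thread happens to drive the demuxer.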
Planar vs. packed audio

In a decoded AVFrame, the data field holds the video pixel data or the raw audio PCM data, and the linesize field holds the aligned length of a picture line or the size of an audio plane:
```c
/**
 * For video, size in bytes of each picture line.
 * For audio, size in bytes of each plane.
 *
 * For audio, only linesize[0] may be set. For planar audio, each channel
 * plane must be the same size.
 *
 * For video the linesizes should be multiples of the CPUs alignment
 * preference, this is 16 or 32 for modern desktop CPUs.
 * Some code requires such alignment other code can be slower without
 * correct alignment, for yet other it makes no difference.
 *
 * @note The linesize may be larger than the size of usable data -- there
 * may be extra padding present for performance reasons.
 */
int linesize[AV_NUM_DATA_POINTERS];
```
The video side was covered in an earlier post. For audio, note that only linesize[0] is set, and if there are multiple planes, every plane has the same size.
To understand this plane size, you first need to understand the two storage layouts for audio data: planar and packed. Take common two-channel (stereo) audio as an example:

In the planar layout, the left and right channels are stored separately: the left channel in data[0] and the right channel in data[1], and each channel's buffer has size linesize[0].

In the packed layout, samples are interleaved as LRLRLR... in data[0], and that single buffer has size linesize[0].
AVSampleFormat enumerates the audio sample formats; the ones with a P suffix are stored planar:
```c
AV_SAMPLE_FMT_U8P,  ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar
AV_SAMPLE_FMT_DBLP, ///< double, planar
```
The formats without the P suffix are stored packed:
```c
AV_SAMPLE_FMT_U8,  ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
```
The actual length of the audio data

There is a pitfall here, which the comment above spells out: the size indicated by linesize may be larger than the actual audio data, because extra padding may be present:
> @note The linesize may be larger than the size of usable data -- there
> may be extra padding present for performance reasons.
So the actual length of the audio data has to be computed from the audio parameters:
```cpp
int channelCount = audioStreamDecoder.GetChannelCount();
int bytePerSample = audioStreamDecoder.GetBytePerSample();
int size = frame->nb_samples * channelCount * bytePerSample;
```
Audio format conversion

The earlier demo can already play the video with OpenGL, and the audio can be handed to OpenSL ES. I wrote a post on it before (《OpenSL ES 学习笔记》), so I won't go over the usage details again; we simply copy that code over and use it.
However, OpenSL ES only supports a handful of packed PCM formats:
```c
#define SL_PCMSAMPLEFORMAT_FIXED_8  ((SLuint16) 0x0008)
#define SL_PCMSAMPLEFORMAT_FIXED_16 ((SLuint16) 0x0010)
#define SL_PCMSAMPLEFORMAT_FIXED_20 ((SLuint16) 0x0014)
#define SL_PCMSAMPLEFORMAT_FIXED_24 ((SLuint16) 0x0018)
#define SL_PCMSAMPLEFORMAT_FIXED_28 ((SLuint16) 0x001C)
#define SL_PCMSAMPLEFORMAT_FIXED_32 ((SLuint16) 0x0020)
```
So here we set AudioStreamDecoder's target format to AV_SAMPLE_FMT_S16, and if the source audio format differs we convert the audio:
```cpp
audioStreamDecoder.Init(reader, audioIndex, AVSampleFormat::AV_SAMPLE_FMT_S16);

bool AudioStreamDecoder::Init(MediaReader *reader, int streamIndex, AVSampleFormat sampleFormat) {
    ...
    bool result = StreamDecoder::Init(reader, streamIndex);

    if (sampleFormat == AVSampleFormat::AV_SAMPLE_FMT_NONE) {
        mSampleFormat = mCodecContext->sample_fmt;
    } else {
        mSampleFormat = sampleFormat;
    }

    if (mSampleFormat != mCodecContext->sample_fmt) {
        mSwrContext = swr_alloc_set_opts(
                NULL,
                mCodecContext->channel_layout,
                mSampleFormat,
                mCodecContext->sample_rate,
                mCodecContext->channel_layout,
                mCodecContext->sample_fmt,
                mCodecContext->sample_rate,
                0,
                NULL);
        swr_init(mSwrContext);

        mSwrFrame = av_frame_alloc();
        mSwrFrame->channel_layout = mCodecContext->channel_layout;
        mSwrFrame->sample_rate = mCodecContext->sample_rate;
        mSwrFrame->format = mSampleFormat;
    }

    return result;
}

AVFrame *AudioStreamDecoder::NextFrame() {
    AVFrame *frame = StreamDecoder::NextFrame();
    if (NULL == frame) {
        return NULL;
    }
    if (NULL == mSwrContext) {
        return frame;
    }
    swr_convert_frame(mSwrContext, mSwrFrame, frame);
    return mSwrFrame;
}
```
Here we use swr_convert_frame to do the conversion:
```c
int swr_convert_frame(SwrContext *swr,
                      AVFrame *output, const AVFrame *input);
```
This function requires that both the input and output AVFrame have their channel_layout, sample_rate, and format fields set; it then calls av_frame_get_buffer internally to allocate the data buffer for output.
The SwrContext is the conversion context, created with swr_alloc_set_opts and swr_init; you pass in the channel_layout, sample_rate, and format of the audio before and after conversion:
```c
struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,
                                      int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
                                      int64_t in_ch_layout, enum AVSampleFormat in_sample_fmt, int in_sample_rate,
                                      int log_offset, void *log_ctx);

int swr_init(struct SwrContext *s);
```
Video format conversion

In the earlier demo we simply reported an error if the video format was not AV_PIX_FMT_YUV420P. Here, following the audio conversion example, if the source video format is not AV_PIX_FMT_YUV420P we convert it with sws_scale:
```cpp
bool VideoStreamDecoder::Init(MediaReader *reader, int streamIndex, AVPixelFormat pixelFormat) {
    ...
    bool result = StreamDecoder::Init(reader, streamIndex);

    if (AVPixelFormat::AV_PIX_FMT_NONE == pixelFormat) {
        mPixelFormat = mCodecContext->pix_fmt;
    } else {
        mPixelFormat = pixelFormat;
    }

    if (mPixelFormat != mCodecContext->pix_fmt) {
        int width = mCodecContext->width;
        int height = mCodecContext->height;

        mSwrFrame = av_frame_alloc();
        mSwrFrame->width = width;
        mSwrFrame->height = height;
        mSwrFrame->format = mPixelFormat;
        av_frame_get_buffer(mSwrFrame, 0);

        mSwsContext = sws_getContext(
                mCodecContext->width,
                mCodecContext->height,
                mCodecContext->pix_fmt,
                width,
                height,
                mPixelFormat,
                SWS_BICUBIC,
                NULL,
                NULL,
                NULL);
    }

    return result;
}

AVFrame *VideoStreamDecoder::NextFrame() {
    AVFrame *frame = StreamDecoder::NextFrame();
    if (NULL == frame) {
        return NULL;
    }
    if (NULL == mSwsContext) {
        return frame;
    }
    sws_scale(mSwsContext, frame->data, frame->linesize, 0, mCodecContext->height,
              mSwrFrame->data, mSwrFrame->linesize);
    return mSwrFrame;
}
```
Although sws_scale is named for scaling, it also converts the pixel format; the conversion parameters come from the SwsContext:
```c
struct SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
                                  int dstW, int dstH, enum AVPixelFormat dstFormat,
                                  int flags,
                                  SwsFilter *srcFilter,
                                  SwsFilter *dstFilter,
                                  const double *param);
```
sws_scale supports region-by-region conversion: you can convert the whole picture at once as our demo does, or cut the picture into several slices and convert them separately, which makes it easy to use multiple threads to speed up the conversion:
```c
int sws_scale(struct SwsContext *c,
              const uint8_t *const srcSlice[],
              const int srcStride[],
              int srcSliceY,
              int srcSliceH,
              uint8_t *const dst[],
              const int dstStride[]);
```
srcSlice and srcStride hold the image data of the source region, while srcSliceY and srcSliceH tell the converter the vertical range of that region; from this it computes the offset at which to store the result in dst and dstStride.
For example, the following code converts a complete picture in two halves, top then bottom:
```cpp
int halfHeight = mCodecContext->height / 2;

// Top half: the plane pointers start at the beginning of the frame.
uint8_t *dataTop[AV_NUM_DATA_POINTERS] = {
        frame->data[0],
        frame->data[1],
        frame->data[2]
};
sws_scale(mSwsContext, dataTop, frame->linesize,
          0, halfHeight,
          mSwrFrame->data, mSwrFrame->linesize);

// Bottom half: offset each plane to the first row of the slice. For a
// YUV420P source the chroma planes are subsampled vertically, so their
// offset uses halfHeight / 2 rather than halfHeight.
uint8_t *dataBottom[AV_NUM_DATA_POINTERS] = {
        frame->data[0] + (frame->linesize[0] * halfHeight),
        frame->data[1] + (frame->linesize[1] * (halfHeight / 2)),
        frame->data[2] + (frame->linesize[2] * (halfHeight / 2)),
};
sws_scale(mSwsContext, dataBottom, frame->linesize,
          halfHeight, mCodecContext->height - halfHeight,
          mSwrFrame->data, mSwrFrame->linesize);
```
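The per-plane offset arithmetic is the part that is easy to get wrong, so here it is isolated as a standalone illustration in plain C++ (`SliceOffsets` is a hypothetical helper, not part of the demo). It computes the byte offset of the slice starting at luma row `sliceY` for a YUV420P-style layout, where the chroma planes have half the luma height:

```cpp
// Byte offsets of the start of a slice beginning at luma row `sliceY`
// for a YUV420P-style planar layout: chroma rows advance at half the
// luma rate, so their offsets use sliceY / 2.
void SliceOffsets(const int linesize[3], int sliceY, int outOffset[3]) {
    outOffset[0] = linesize[0] * sliceY;        // Y plane: full-height rows
    outOffset[1] = linesize[1] * (sliceY / 2);  // U plane: vertically subsampled
    outOffset[2] = linesize[2] * (sliceY / 2);  // V plane: vertically subsampled
}
```

Note that the offsets are computed from linesize, not from the visible width, precisely because of the alignment padding discussed earlier.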
AVFrame memory management

We created a new AVFrame to receive the converted image:
```cpp
mSwrFrame = av_frame_alloc();
mSwrFrame->width = width;
mSwrFrame->height = height;
mSwrFrame->format = mPixelFormat;
av_frame_get_buffer(mSwrFrame, 0);
```
The AVFrame created by av_frame_alloc is only a shell; we still have to provide the memory that actually stores the pixel data and line sizes. As shown above, there are two ways to do this:
1. Allocate the storage with av_frame_get_buffer; the data pointers then point into memory owned by buf[0]->data:
```cpp
LOGD("mSwrFrame --> buf : 0x%X~0x%X, data[0]: 0x%X, data[1]: 0x%X, data[2]: 0x%X",
     mSwrFrame->buf[0]->data,
     mSwrFrame->buf[0]->data + mSwrFrame->buf[0]->size,
     mSwrFrame->data[0],
     mSwrFrame->data[1],
     mSwrFrame->data[2]);
```
2. Point it at external storage with av_image_fill_arrays; the data pointers then point into the external memory we supplied, and the buf member stays NULL:
```cpp
LOGD("mSwrFrame --> buffer : 0x%X~0x%X, buf : 0x%X, data[0]: 0x%X, data[1]: 0x%X, data[2]: 0x%X",
     buffer,
     buffer + bufferSize,
     mSwrFrame->buf[0],
     mSwrFrame->data[0],
     mSwrFrame->data[1],
     mSwrFrame->data[2]);
```
av_frame_free only releases the memory referenced by the frame's buf member; for the data pointers it simply resets them to 0. So storage allocated via av_frame_get_buffer is freed for us, while external storage attached via av_image_fill_arrays must be released manually with av_free.
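The two ownership modes can be sketched without FFmpeg at all. Below, `FrameSketch`, `GetBuffer`, and `FillArrays` are hypothetical names, with std::shared_ptr standing in for the reference-counted AVBufferRef; the sketch only illustrates who owns the memory in each mode:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for AVFrame's ownership model: `buf` plays the role of
// AVFrame::buf[0] (owned, reference-counted storage), while `data`
// plays the role of AVFrame::data[0].
struct FrameSketch {
    std::shared_ptr<std::vector<uint8_t>> buf;  // owned storage, or null
    uint8_t *data = nullptr;                    // points into buf or external memory
};

// Mode 1: like av_frame_get_buffer -- the frame owns its storage, and
// data points into it. Freeing the frame releases the buffer.
FrameSketch GetBuffer(size_t size) {
    FrameSketch f;
    f.buf = std::make_shared<std::vector<uint8_t>>(size);
    f.data = f.buf->data();
    return f;
}

// Mode 2: like av_image_fill_arrays -- data points at caller-owned
// memory and buf stays null, so the caller must free it themselves.
FrameSketch FillArrays(uint8_t *external) {
    FrameSketch f;
    f.data = external;  // f.buf stays null
    return f;
}
```

Checking whether `buf` is null is exactly how you can tell, in the real API too, whether av_frame_free will release the pixel memory or leave that to you.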
align

Careful readers may also have noticed that av_image_get_buffer_size and av_image_fill_arrays are both passed an align of 16. This is the linesize byte alignment discussed earlier: padding is added so that linesize becomes a multiple of 16 or 32:
> @param align the value used in src for linesize alignment
Passing 0 here makes the fill fail.
And passing 1 adds no padding at all, so the linesize does not match the one produced by the actual decoder and the picture comes out corrupted.
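The padding itself is just rounding the row length up to the next multiple of the alignment. FFmpeg's FFALIGN macro does exactly this; here it is sketched as a plain function:

```cpp
// Round `value` up to the next multiple of `align` (align must be a
// power of two). This is the arithmetic behind FFmpeg's FFALIGN macro,
// which is what pads linesize to a multiple of 16 or 32.
int AlignUp(int value, int align) {
    return (value + align - 1) & ~(align - 1);
}
```

For a 100-pixel-wide single-byte plane, for instance, a 16-byte alignment pads each row out to 112 bytes, which is why code must always iterate using linesize rather than the visible width.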
av_frame_get_buffer is friendlier: it recommends passing 0 and picks the appropriate alignment by itself:
```c
 * @param align Required buffer size alignment. If equal to 0, alignment will be
 *              chosen automatically for the current CPU. It is highly
 *              recommended to pass 0 here unless you know what you are doing.
```
Complete code

The complete demo code has been pushed to Github; interested readers can download it and take a look.