Saturday, December 31, 2016

Bookmark: Video distribution and streaming (next generation video coding and streaming)

VIDEO DISTRIBUTION AND STREAMING

This chapter mainly discusses adaptive video streaming, in particular HLS and MSS.
In HLS, there are two types of playlist: super and dynamic.
In the dynamic playlist file (a minimal example is sketched after this list):
  • The #EXT-X-MEDIA-SEQUENCE tag identifies the sequence number of the first chunk (101.ts in this example). It is used to align chunks from different quality levels.
  • The #EXT-X-TARGETDURATION:2 tag indicates the maximum expected duration of the chunks.
  • The #EXT-X-KEY:METHOD=NONE tag shows that no encryption is used for this sequence of chunks.
  • The #EXTINF:2 tags indicate the duration of each chunk.
  • On-demand playlists are distinguished from live playlists by the #EXT-X-PLAYLIST-TYPE and #EXT-X-ENDLIST tags.
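A minimal dynamic (media) playlist illustrating these tags might look as follows; the 2-second target duration and the starting sequence number 101 come from the description above, while the remaining chunk names and the version number are assumptions for illustration, not the output of any particular packager:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:101
#EXT-X-KEY:METHOD=NONE
#EXTINF:2,
101.ts
#EXTINF:2,
102.ts
#EXTINF:2,
103.ts

For an on-demand playlist, #EXT-X-PLAYLIST-TYPE:VOD would appear near the top and #EXT-X-ENDLIST would terminate the file; a live playlist omits #EXT-X-ENDLIST and is refreshed as new chunks become available.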
MSS: MICROSOFT SILVERLIGHT SMOOTH STREAMING
Unlike HLS, which uses the MPEG-2 TS format, MSS uses the MP4 file format. MSS employs the MPEG-4 Part 14 (MP4) container and stores each chunk as an MP4 movie fragment within a single contiguous MP4 file for each quality level. Hence, an MSS video is recorded in full length as a single MP4 file (one file per quality level) but is streamed to the client as an ordered sequence of fragmented MP4 chunks.

The chunks are forwarded by an encoder to a Microsoft IIS server, which aggregates them for each quality profile into an “ismv” file for video and an “isma” file for audio. The “ismv” file contains the complete video with the chunks. Each “ismv” file corresponds to a video encoded at a specific quality level. If a video is encoded with different rates, several “ismv” files are produced. Unlike the Apple fragmenter, playback of each chunk is not possible because the chunks are embedded within the “ismv” file. However, the complete “ismv” video can be played back using the Windows Media player. An aggregate file format is used to store all the chunks and extract them when a specific request is made. For instance, the “ism” file specifies the bit rate of the “ismv” file. The “ismc” file contains the number of chunks and the chunk duration is controlled by the key frame interval, which can vary from 1 to 100s. Unlike MPEG-2 TS, the audio and video information can be transported as separate chunks if desired and then combined by the player.

The file starts with file-level metadata (“moov”) that describes the overall video and the bulk of the payload is contained in fragment boxes that carry fragment-level metadata (“moof”) and media data (“mdat”). The file is terminated with an index box (“mfra”) that allows easy and accurate seeking within the video. The IIS server also creates an XML manifest file that contains information about the available quality levels. The HLS playlist specifies URLs. However, the MSS manifest file contains information that allows the client to create a URL request based on timing information in the stream (Figure 8.39). For on-demand service, the manifest files contain timing and sequence information for all the chunks in the video. Because metadata is provided in every chunk (the current chunk holds timestamps of the next chunk or two), this allows the client to access subsequent chunks without a refreshed manifest file. Hence, the manifest file need not be updated frequently. This is in contrast to HLS where, as new chunks become available, the playlist is updated to reflect the latest available chunks.
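For illustration, a Smooth Streaming client typically builds chunk requests of the following RESTful form from the bit rate and timing information in the manifest; the host and file names here are hypothetical:

http://server/video.ism/QualityLevels(1500000)/Fragments(video=40000000)

where 1500000 is the selected bit rate in bits per second and 40000000 is the fragment start time (in 100-ns units), taken from the manifest or from the timing metadata carried in the previous chunk.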

HLS vs MSS
Table 8.22 HLS and MSS 802.11n Streaming

Player startup delay
  MSS: 1–2 s (independent of the number of quality levels)
  HLS: 8–10 s for one quality level, 1–2 s for eight quality levels
Video data buffering
  MSS: about 30 s
  HLS: rate dependent (more buffering for higher rates)
Number of quality levels
  MSS and HLS: more levels reduce player stalling but increase the delay in quality switching
Chunk rate slowdown
  MSS and HLS: one chunk every few seconds
Duplicate chunk(s) on video quality switch
  MSS: none
  HLS: one or more chunks
Playlist or manifest file
  MSS: requested at the start of streaming
  HLS: requested periodically

ADOBE HTTP DYNAMIC STREAMING

 In addition to the play command, RTMP incorporates a play2 command that can switch to a different rate bitstream without changing the timeline of the content played (Figure 8.62). This is useful for implementing random access and trick modes.

MPEG-DASH
MPEG-DASH supports on-demand, live, and time-shifted applications, and ads can be inserted between segments. It specifies the use of either fragmented MP4 or MPEG-2 TS chunks. The media presentation description (MPD) is an XML manifest that is repeatedly downloaded. It is a key element of the standard and describes the accessible segments and their corresponding timing. This enables quality or bitstream switching, where video segments or chunks from different representations can be aligned and combined into a single conforming bitstream.
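As a rough sketch of such a manifest (the segment naming, IDs, bandwidths, and durations below are hypothetical, and a real MPD carries more attributes), a minimal static MPD with two video representations and templated segment URLs could look like this:

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     profiles="urn:mpeg:dash:profile:isoff-live:2011"
     mediaPresentationDuration="PT60S" minBufferTime="PT2S">
  <Period>
    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <SegmentTemplate media="video_$RepresentationID$_$Number$.m4s"
                       initialization="video_$RepresentationID$_init.mp4"
                       duration="2" startNumber="1"/>
      <Representation id="360p" bandwidth="800000" width="640" height="360"/>
      <Representation id="720p" bandwidth="2400000" width="1280" height="720"/>
    </AdaptationSet>
  </Period>
</MPD>

The segmentAlignment flag and the common segment timing are what allow the client to switch between the two representations at segment boundaries and still produce a single conforming bitstream.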

Friday, December 30, 2016

Bookmarks: MULTIVIEW CODING (next generation video coding and streaming)

  MVC adds inter-view prediction to reduce redundancy between views. As an extension of H.264, it supports compression of two or more views. Similarly, HEVC provides a multiview extension that caters for UHD 3D video. It does not change block-level tools; any block-level process that is useful for multiview HEVC can only be enabled using hooks. Motion prediction hooks do not significantly impact single-view HEVC coding because they are designed to improve inter-view coding.
 An MVC stream contains one base view and several non-base views. A non-base view can reference other views, and SEI messages provide detailed information about this. Non-base views are carried in a new slice type (the MVC slice). Anchor frames can be decoded without previous frames and thus serve as random access points. Random access at non-intracoded frames is also possible using gradual decoding refresh (GDR). Anchor and non-anchor frames can have different dependencies, which can be signaled in the SPS.
 The H.264 reference software (JM) includes MVC functions.
 The basic concept of inter-view prediction is to reuse the temporal motion tools, with disparity vectors playing the role of motion vectors. In JM, it is controlled by the MVCEnableInterView option. The base view is encoded in a similar way to single-view H.264 encoding.
MVC does not allow prediction of a frame in one view at a given time using a frame from another view at a different time.
  However, the improvement in coding efficiency for MVC is marginal when compared to H.264 (less than 1% for two-view S3D), and the video quality is hardly affected. Test results suggest that inter-view sample interpolation methods may degrade video quality, especially for higher-resolution videos. In addition, these methods create dependency between the views and require more processing time compared to intra-view methods. Thus, MVC may not provide a significant coding efficiency gain when compared to H.264.

Friday, December 23, 2016

nginx streaming server


  1. download nginx from http://nginx.org
  2. unzip it and run nginx.exe; this is equivalent to running: nginx -c conf\nginx.conf. We can change the default port from 80 to another port in the configuration file (see the snippet after this list). The access and error logs are in the logs folder.
  3. nginx commands: 
  • nginx -s stop (fast shutdown)
  • nginx -s quit (graceful shutdown)
  • nginx -s reload (reload the configuration)
  • nginx -s reopen (reopen the log files)
  • nginx -V (show version and build configuration)
 We can use tasklist /fi "imagename eq nginx.exe" to check whether nginx is running, or taskkill /F /IM nginx.exe > nul to close it.
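To change the default port, edit the listen directive in the server block (inside the http block) of conf\nginx.conf; a minimal sketch, where 8080 is just an example value:

server {
    listen       8080;   # default is 80
    server_name  localhost;
}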

However, this build does not support RTMP streaming, so we can switch to another build:

  1. Download from http://nginx-win.ecsds.eu/, which contains the nginx-rtmp-module. I used nginx 1.7.11.3 Gryphon.
  2. Modify nginx-win.conf and add:
rtmp {
    server {
        listen 1935;
        chunk_size 4000;

        application live {
            live on;
            record off;
        }

        application hls {
            live on;
            hls on;
            hls_path /tmp/hls;
        }

        application dash {
            live on;
            dash on;
            dash_path /tmp/dash;
        }
    }
}
  3. Start nginx: nginx.exe -c ./conf/nginx-win.conf
  4. Broadcast a file: ffmpeg -re -i "C:\myprogram\testfile\test_1M.mp4" -c copy -f flv rtmp://localhost/live/stream
  5. Play it from the RTMP server: ffplay rtmp://localhost/live/stream
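To try the hls application defined above, a hedged sketch (it assumes the input file is already H.264/AAC, and that an http location serving /tmp/hls at /hls has been configured, which is not shown in the rtmp block):

ffmpeg -re -i "C:\myprogram\testfile\test_1M.mp4" -c copy -f flv rtmp://localhost/hls/stream
ffplay http://localhost/hls/stream.m3u8

nginx-rtmp then writes stream.m3u8 and the .ts chunks into /tmp/hls (hls_path); the same pattern applies to the dash application with its .mpd manifest.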

Bookmark: H.264 MULTIVIEW VIDEO CODING

In H.264, stereo coding includes a main view and a side view. Decoding the main view does not need data from the side view; it is independent. The side view references the main view: a side-view frame can refer to the main-view picture at the same time instant.


For multiview coding, the structure is similar.


The base view is view 2.
Note 1: a picture in any view can only reference pictures from other views at the same time instant, or pictures within the same view; it cannot reference a picture from another view at a different time.

Monday, December 12, 2016

bookmarks: Real-time Depth Reconstruction from Stereo Sequences (Hansung Kim)

This paper calculates depth maps separately for the background and for moving objects. It assumes that the background is fixed, so it is not suitable for movie content.

Two ways to get depth:
active: utilizes ultrasound or lasers to illuminate the workspace; yields fast and accurate depth information
passive: based on computer vision; estimates depth information from acquired images and camera parameters

Disparity estimation problem: find the corresponding image points I1 and I2 of a single world point w in the two separate image views.

REAL-TIME DISPARITY ESTIMATION:
We assume that a stereo camera set does not move, and there is no moving object for a few seconds in an initialization step for generating background information. Accurate and detailed disparity information for background is estimated in advance, then only disparities of moving foreground regions are calculated and merged into background disparity fields.

Background disparity estimation:
block-based disparity estimation and spatial correlation

Foreground segmentation:
use a foreground segmentation technique based on background subtraction and inter-frame difference

Foreground disparity estimation:
for speed, disparities are calculated only for boundary blocks

Sunday, December 11, 2016

bookmarks: Obtaining Depth Information from Stereo Images

This paper gives an overview of the main processing steps for depth perception
with a stereo camera.
Depth perception from stereo vision is based on the triangulation principle. We
use two cameras with projective optics and arrange them side by side, such that
their view fields overlap at the desired object distance. By taking a picture with each camera we capture the scene from two different viewpoints.
  
so, 

For each surface point visible in both images, there are two rays in 3D space
connecting the surface point with each camera’s centre of projection. In order to obtain the 3D position of the captured scene we mainly need to accomplish two
tasks: First, we need to identify where each surface point that is visible in the left image is located in the right image. And second, the exact camera geometry must be known to compute the ray intersection point for associated pixels of the left and right camera. As we assume the cameras are firmly attached to each other, the geometry is only computed once during the calibration process.

So, the difference between the positions of the same point in the left and right images can be used to calculate its distance.

The calibration process mainly computes the distortion of the lens (camera optics). We can remove the image distortions by inversely applying the distortion learnt during calibration; the resulting undistorted images have straight epipolar lines, depicted in Figure 2 (middle).
Rectification applies an additional perspective transformation to the images so that the epipolar lines are aligned with the image scanlines; the resulting images are shown in Figure 2 (bottom).
After calibration and rectification, a point from the left image can be searched for along a single scanline in the right image, so it can be found easily. This step is called stereo matching: for each pixel in the left image, we search for the pixel on the same scanline in the right image that captured the same object point.

After stereo matching, we have the offset of every pixel; this is called the disparity map.
Finally, we can again use the camera geometry obtained during calibration to convert the pixel-based disparity values into actual metric X, Y and Z coordinates for every pixel. This conversion is called reprojection: we simply intersect the two rays of each associated pair of left and right image pixels.
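For the ideal rectified case, this reprojection reduces to the standard triangulation formula (a textbook result, stated here for clarity rather than taken from the paper): with focal length f (in pixels), baseline B between the two camera centres, and disparity d = x_left - x_right,

Z = f * B / d

and the metric coordinates follow as X = Z * x / f and Y = Z * y / f for an undistorted image point (x, y) measured relative to the principal point.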

Saturday, December 10, 2016

bookmarks: Optimized implementation of an MVC decoder ---Jochen Britz

focus on: the design and implementation of a real-time decoder based on the FFmpeg framework that is compliant with H.264/AVC Annex H (referred to as MVC).
In order to make FFmpeg support MVC, we need to:

  1. add support for the new NAL unit types
  2. enhance the data structures to support the new SPS and PPS parameters
  3. implement new decoded picture buffer (DPB) management to store MVC reference pictures
Hierarchical B-pictures:
A hierarchical B frame can be referenced by other B frames lower in the hierarchy.
The main gain is accomplished by applying a prediction hierarchy and spreading the quantization parameters QPk over the hierarchy levels k, so that pictures lower in the hierarchy have a higher quantization. One simple way to do so is:
QPk = QPk−1 + (k == 1 ? 4 : 1)
The reason why higher quantization for pictures lower in hierarchy is possible without affecting perceptual quality is that the lower the picture is in the hierarchy, the less it can be referenced such that its impact on other pictures is lower. Since pictures of the same hierarchy level should not be next to each other, according to display order, they do not have a big effect on the viewer.
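For example, starting from a hypothetical QP0 = 24 for the key pictures, the rule above gives QP1 = 24 + 4 = 28, QP2 = 28 + 1 = 29, and QP3 = 29 + 1 = 30 for the successively lower hierarchy levels.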

IDR pictures are intra-coded pictures without spatial or temporal references.
Successors of IDR pictures are prohibited from referencing any predecessor of
the IDR picture. Successors and predecessors are according to both the
display and the decoding order.
Anchor pictures are similar to IDR pictures but they allow spatial references.
Since any kind of temporal references are still prohibited, their reference
pictures have to be either IDR or anchor pictures.
Key pictures are either IDR or anchor pictures. They are called key pictures
because they are the access points for decoding the stream.

Following this, the author describes how to modify FFmpeg to add the new MVC decoder. It is very detailed, so if you want to reproduce the work, read it carefully.

Thursday, December 8, 2016

multiview video coding


MVC (Multiview Video Coding)

Multiview Video Coding was standardized as an extension (annex) to the H.264/AVC standard. In MVC, the views from different cameras are encoded into a single bitstream that is backwards compatible with single-view H.264/AVC. One of the views is encoded as "base view". A single-view (e.g. constrained baseline or progressive high profile) H.264/AVC decoder can decode and output the base view of an MVC bitstream. MVC introduces inter-view prediction between views exemplarily as illustrated in figure 8. MVC is able to compress stereoscopic video in a backwards compatible manner and without compromising the view resolutions. If the server is aware of the UE capabilities, it can omit sending the view components of the non-base view to a device that does not support 3D or does not have enough bitrate to deliver both views.

Assume we have 8 view streams S0, S1, ..., S7, and T0, T1, ..., T11 are time instants. All views at one time instant form an access unit.
We set S0, S2, S4, S6 as base views, because decoding S0, S2, S4, S6 does not need S1, S3, S5, S7; this means a base view is a kind of independent view.
T0 and T8 are anchor access units. Just like I frames, decoding T0 and T8 does not need other access units.
I, B, and P frames are normal. The b frame is different: it can only be referenced by pictures in the same access unit and cannot be referenced by pictures in the same view.
Note: a picture cannot reference a picture that is in another view at another time.

profile:
profile_idc = 118: Multiview High profile (more than two views)
profile_idc = 128: Stereo High profile (two views)

Encode and decode:


1. View-first order

  1. encode the anchor access unit
  2. encode the first view up to the next anchor access unit
  3. encode the other views one by one
 This encoding order needs many frame buffers.
2. Time-first order

  1. encode the anchor access unit
  2. encode all views at the next time instant
 This encoding order needs fewer frame buffers: 8*4+2 = 34.

There are other coding schemes as well.

Tuesday, December 6, 2016

ffmpeg command parameters

1. Push a file to an RTMP server
ffmpeg -re -i "test.3gp" -f flv rtmp://127.0.0.1/live/stream

2. Convert a file into HLS (m3u8) format
ffmpeg -re -i %1 -codec:v libx264 -map 0 -f hls  -hls_list_size 6 -hls_wrap 10 -hls_time 10 playlist.m3u8
3. Remux a video file into MP4
ffmpeg -i %1 -bsf:a aac_adtstoasc -acodec copy -vcodec copy .\out.mp4

4. Broadcast an MP4 file over RTMP
ffmpeg -re -i localFile.mp4 -c copy -f flv rtmp://server/live/streamName

5. Dump an RTMP stream into a file
ffmpeg -i rtmp://server/live/streamName -c copy dump.flv

6. Transcode video to H.264 and send it to another RTMP stream
ffmpeg -i rtmp://server/live/originalStream -c:a copy -c:v libx264 -vpre slow -f flv rtmp://server/live/h264Stream

7. Transcode audio to AAC
ffmpeg -i rtmp://server/live/originalStream -c:a libfaac -ar 44100 -ab 48k -c:v libx264 -vpre slow -vpre baseline -f flv rtmp://server/live/h264Stream

8. Transcode one RTMP stream into several RTMP streams with different parameters
ffmpeg -re -i rtmp://server/live/high_FMLE_stream -acodec copy -vcodec libx264 -s 640x360 -b 500k -vpre medium -vpre baseline rtmp://server/live/baseline_500k -acodec copy -vcodec libx264 -s 480x272 -b 300k -vpre medium -vpre baseline rtmp://server/live/baseline_300k -acodec copy -vcodec libx264 -s 320x200 -b 150k -vpre medium -vpre baseline rtmp://server/live/baseline_150k -acodec libfaac -vn -ab 48k rtmp://server/live/audio_only_AAC_48k

9. Broadcast a camera
ffmpeg -r 25 -f dshow -s 640x480 -i video="video source name":audio="audio source name" -vcodec libx264 -b 600k -vpre slow -acodec libfaac -ab 128k -f flv rtmp://server/application/stream_name

10.