Reverse-Engineering DVR Protocol
Just a quick report on how I tried to find out how a DVR sends video data to the viewer over the network.
I wanted to watch the video stream of an old (Nadatel?) H-0410L 4-channel DVR on GNU/Linux. All it offered was an ActiveX viewer for Internet Explorer.
I found out that there is also a native Windows client, so I downloaded it and it worked. I started tcpdump and looked at the data being transmitted between the client and the server. I realized that three separate connections are made. The first one just starts the communication and then stays idle. The second one carries control commands and the third one carries the video stream.
By watching the communication and content of individual packets I was able to reconstruct the protocol structure.
Here is the packet log (-> marks a client-to-server packet, <- marks a server-to-client one).
FIRST CONNECTION – hmm
-> 00000000 00000000 00000000
SECOND CONNECTION – control connection
-> 14000000 00000C00 00000000 35154525 00000000 00000000 00000000 00000000 (it looks like a beginning sequence with the password)
<- 00000000 (server replies something like ACK)
-> 18000000 00001E00 00000000 (unknown request)
<- 18000000 04000000 04000000 02000000 00000000 45150000 (unknown response)
-> 14000000 00001900 00000000 (requesting protocol or server version)
<- 76352E37 00000000 00000000 00000000 00000000 (response with the version)
-> 00000000 00004600 00000000 (requesting model number)
<- 482D3034 31304C00 00000000 00000000 00000000 (model response: H-0410L)
-> 00000000 00004700 00000000 (requesting number of channels)
<- 04 (4 channels)
-> 00000000 00004800 00000000 (number of channels which are not connected?)
<- 01
-> 00000000 00004900 00000000 (firmware version request)
<- 302E392E 31314C28 32303130 30343236 29204555 00000000 00000000 0000 (firmware response: 0.9.11L(20100426) EU)
-> FF000000 00000F00 00000000 (stream request? after it the stream begins in the third connection)
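The text responses in the log above look like NUL-padded ASCII, which is how the version, model and firmware strings can be read out of the hex dump. A minimal Python sketch of that decoding step follows; the function name `decode_text_response` is made up for this illustration and is not part of any DVR software.

```python
def decode_text_response(hex_dump: str) -> str:
    """Turn a hex dump such as '76352E37 00000000' into its ASCII text.

    Assumption: the response is ASCII padded with NUL bytes, so everything
    from the first NUL byte onwards is padding and can be dropped.
    """
    raw = bytes.fromhex(hex_dump)  # bytes.fromhex() skips the spaces
    return raw.split(b"\x00", 1)[0].decode("ascii")


# Responses copied from the packet log
version = decode_text_response("76352E37 00000000 00000000 00000000 00000000")
model = decode_text_response("482D3034 31304C00 00000000 00000000 00000000")
firmware = decode_text_response(
    "302E392E 31314C28 32303130 30343236 29204555 00000000 00000000 0000"
)
channels = bytes.fromhex("04")[0]  # single-byte reply to the channel request
```

Run against the log, this gives the version string v5.7, the model H-0410L and the firmware string quoted in the log.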
THIRD CONNECTION – data connection
<- 01000000 00000000 BC070000 7C725004 0C000000 (frame header)
   xxxxxxxx - header
   xxxxxxxx - packet ID
   xxxxxxxx - packet size
   xxxxxxxx - frame ID
   xx - 0C if it's a first frame packet
<- DC648550 86021010 408F0000 00000001 6742001E 9A740580 93200000 03000000 000168CE 3C800000 (… 1320 bytes - frame data)
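The header layout above can be sketched as a small Python parser. This rests on assumptions the log does not confirm: that the four 32-bit fields are little-endian, and that the "first packet" flag is the first byte of the last four header bytes. The names `PacketHeader` and `parse_header` are invented for this example.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 20  # five 4-byte groups, as seen in the captured header


class PacketHeader(NamedTuple):
    marker: int         # "header" field, 01000000 in the capture
    packet_id: int
    packet_size: int
    frame_id: int
    first_packet: bool  # True when the flag byte is 0x0C


def parse_header(raw: bytes) -> PacketHeader:
    """Parse one 20-byte data-connection header (little-endian is assumed)."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("need at least %d bytes" % HEADER_SIZE)
    # Four unsigned 32-bit integers followed by the one-byte flag.
    marker, packet_id, size, frame_id, flag = struct.unpack_from("<4IB", raw)
    return PacketHeader(marker, packet_id, size, frame_id, flag == 0x0C)


header = parse_header(bytes.fromhex("01000000 00000000 BC070000 7C725004 0C000000"))
```

Read as little-endian, the size field of the captured header is 0x07BC (1980). How that number relates to the 1320 bytes of frame data noted in the log is not something this sketch tries to settle.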
By watching similarities in the data I worked out how long the packet header is, and from the numbers that change between packets I worked out what the fields mean. With this knowledge I was able to extract the raw frame data.
This was the first time I worked with video streams, but I thought FFmpeg could help here, and I was right. It turned out it’s an H.264 704×576 yuv420p stream and that FFmpeg can decode it. It took a few days to set up the decoder correctly and get correct images.
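One way to see that the payload is H.264 is to look for Annex B start codes (00 00 01, often written with an extra leading 00) and read the NAL unit type from the low five bits of the byte that follows. The Python sketch below does this on the frame data from the packet log. It is only an illustration with a made-up helper name, not the way FFmpeg was actually set up here.

```python
from typing import List, Tuple


def nal_unit_types(payload: bytes) -> List[Tuple[int, int]]:
    """Return (offset, nal_unit_type) for every start code found in payload."""
    found = []
    pos = payload.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(payload):
        nal_header = payload[pos + 3]
        # The low 5 bits of the NAL header byte hold the unit type.
        found.append((pos + 3, nal_header & 0x1F))
        pos = payload.find(b"\x00\x00\x01", pos + 3)
    return found


# First 40 bytes of frame data from the packet log
sample = bytes.fromhex(
    "DC648550 86021010 408F0000 00000001 6742001E "
    "9A740580 93200000 03000000 000168CE 3C800000"
)
units = nal_unit_types(sample)
```

On this sample the scan finds type 7 (sequence parameter set) at offset 16 and type 8 (picture parameter set) at offset 34, which is how an H.264 stream typically begins. The bytes before the first start code are not interpreted here.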
Then I worked on a new task – comparing frame data and storing frames as PNG files on the disk when a larger movement is detected. This turned out to be the most difficult task. It took a few weeks (not full-time) of work to tweak various parameters and techniques. I started with hashing images by pixel data sums. That proved to be really bad. Then I decided to integrate OpenCV, which is a really good library for image processing.
I tried several methods. The first ones used grayscale images as the basis, which wasn’t good enough. Then I compared differences in colors. Finally I ended up with a hybrid solution: masking out non-interesting parts of the image, comparing the change in colors (by using a “dot product” between color values), applying a threshold, filtering out small areas, eroding, finding contours, approximating them and computing the area of the objects marked by the contours.
This way it is able to detect moving people while ignoring most general changes like the amount of sun shining on the grass. It still reports many false movements, but it’s acceptable.
During the reverse-engineering I used several command-line tools such as netcat, xxd and perl, with commands like these:
cat /tmp/commx.txt | perl -ne 'print "$1 " if /.*>([A-Z0-9 ]+)<.*/' | xxd -r -p > /tmp/fr3.bin
echo "14000000 00000C00 00000000 35154525 00000000 00000000 00000000 00000000 FF000000 00000F00 00000000" | xxd -r -p | ncat 34.93.34.146 3113 > /tmp/ctrl.bin
Sorry, I wrote this report after the whole process, so I can’t list all the steps (commands) one by one. What I was doing was storing the communication, trying to separate out some parts of it and then finding out what those parts are. I also tried to replicate the communication by sending previously stored communication data to the DVR and watching its response.
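The replay step can also be sketched in Python instead of xxd and ncat. The helper below sends stored bytes over an already connected socket and collects whatever comes back until the peer goes quiet. The name `replay` and the idle-timeout approach are assumptions of this sketch, and the host and port in the usage comment are placeholders.

```python
import socket

# Login sequence followed by the stream request, as in the ncat command above
LOGIN_AND_STREAM = (
    "14000000 00000C00 00000000 35154525 00000000 00000000 00000000 00000000 "
    "FF000000 00000F00 00000000"
)


def replay(sock: socket.socket, payload: bytes, idle_timeout: float = 1.0) -> bytes:
    """Send payload, then read until the peer closes or stays silent."""
    sock.settimeout(idle_timeout)
    sock.sendall(payload)
    received = bytearray()
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break  # nothing arrived within idle_timeout seconds
        if not chunk:
            break  # connection closed by the peer
        received += chunk
    return bytes(received)


# Usage against a real device (placeholder address, not executed here):
#   with socket.create_connection(("dvr.example", 3113)) as s:
#       reply = replay(s, bytes.fromhex(LOGIN_AND_STREAM))
```

Taking a connected socket rather than a host name keeps the function easy to try against a local test socket before pointing it at the device.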