Hello Readers,
This week I did an interesting (at least to me) experiment on how to compare audio files (audio data).
Background
@Rajesh is my good friend at @Work and my morning-walk buddy. He is a hard-core electronics guy and gives lots of gyan about radio waves, AM signals, etc. Mostly I don't get it, but he is patient enough to explain in plain words that a computer science engineer (that's me :)) can understand.
@Work, we are working on digital radios. To simplify the scenario, imagine:
1. The radio station picks up the song, which may be raw PCM data or compressed as a high-quality MP3
2. The radio station transmits the song over the air
3. Your home radio receives it and reproduces the same song; by now, it won't be in the original file format
4. You hear the song with acceptable quality
5. Great!
@Work, in the labs, we simulate this situation with various equipment and test whether the received song is of acceptable quality.
@Rajesh explained to me the pain of this testing: every time, we literally have to sit and listen to the song to know that the radio receiver is working fine and reproducing the song with good quality.
The main issue is that songs / real-life recorded content are totally different from audio test vectors (e.g. a sine wave tone). Some problems only occur with songs, while a sine wave tone works fine in the same setup.
Obviously, @Rajesh told me more about electronics and the behaviour of radio signals. I won't explain that here (until I understand it properly :)).
We were talking about automating audio listening tests. Due to the nature of audio data, we cannot compare recordings with a simple file comparison. Moreover, in our situation the source and destination audio formats change, and the audio quality changes too. Oops, lots of variables.
@Rajesh asked me to find some way to test it.
I was approaching this problem from the computer science point of view.
My first thought was that we need some model that is similar to the human ear. Irrespective of audio file format and quality changes, humans can recognise that two recordings are the same song.
Obviously, Google is the best friend while learning. I knew there are lots of music recognition applications in the mobile world. I have used Shazam, and it could identify songs properly.
I had never given much thought to how Shazam-like applications work, but now I needed to.
I learnt that Shazam-like music recognition apps work using acoustic fingerprint algorithms.
There are many acoustic fingerprint algorithms available. I chose the open source Chromaprint algorithm developed by Lukáš Lalinský.
Chromaprint converts the song / music into musical notes (chroma features, i.e. how strongly each of the 12 notes of the scale is present over time) and creates fingerprints for various parts of the song. So changes in file format and quality affect this algorithm only slightly. It's a brilliant idea.
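To make the "musical notes" idea a bit more concrete, here is a toy Perl sketch of the chroma concept. It is my own simplification, not Chromaprint's code: each frequency is folded onto one of the 12 pitch classes, so the octave does not matter. The A4 = 440 Hz reference and the `pitch_class` / `chroma_vector` helpers are assumptions for illustration. Chromaprint itself derives such features from the spectrum of short audio frames and does much more on top.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Toy illustration of the chroma idea (NOT Chromaprint's implementation):
# every frequency is folded onto one of the 12 pitch classes, so the octave
# does not matter. Assumes equal temperament with A4 = 440 Hz.
my @NOTE_NAMES = qw(C C# D D# E F F# G G# A A# B);

# Map a frequency in Hz to a pitch class index (0 = C, 9 = A, 11 = B).
sub pitch_class {
    my ($freq_hz) = @_;
    die "frequency must be positive\n" unless $freq_hz > 0;
    # Distance from A4 in semitones, rounded to the nearest semitone.
    my $semitones = floor(12 * log($freq_hz / 440) / log(2) + 0.5);
    # A is pitch class 9; Perl's % with a positive right operand never
    # returns a negative number, so this wraps into 0..11.
    return ($semitones + 9) % 12;
}

# Hypothetical helper: sum the magnitudes of [freq_hz, magnitude] peaks
# into a 12-bin chroma vector.
sub chroma_vector {
    my @peaks  = @_;
    my @chroma = (0) x 12;
    $chroma[ pitch_class($_->[0]) ] += $_->[1] for @peaks;
    return @chroma;
}

# 440 Hz, 261.63 Hz and 392 Hz are A4, C4 and G4.
my @classes = map { pitch_class($_) } 440, 261.63, 392;
print join(' ', @NOTE_NAMES[@classes]), "\n";    # prints "A C G"
```

Note that 440 Hz and 880 Hz land in the same bin. That octave folding is part of why the fingerprint cares about notes rather than the exact waveform.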
Read How chromaprint works if you are really interested in the algorithm.
Chromaprint provides a prebuilt tool called fpcalc, which calculates the acoustic fingerprint of an audio file. To install fpcalc on Fedora Linux:
```
[root@bakkisweety ~]# yum install libchromaprint-devel chromaprint-tools libchromaprint
Loaded plugins: langpacks, show-leaves, upgrade-helper
Package libchromaprint-devel-1.1-3.fc21.x86_64 already installed and latest version
Package chromaprint-tools-1.0-6.fc21.x86_64 already installed and latest version
Package libchromaprint-1.1-3.fc21.x86_64 already installed and latest version
Nothing to do
```
```
[root@bakkisweety Audacity]# fpcalc -version
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
fpcalc version 1.1.0
```
Note: Ignore the libOpenCL warning message from fpcalc.
Now I have a tool to calculate acoustic fingerprints. I also need some test data to proceed. I downloaded a publicly available song from YouTube as MP3 and also recorded my own voice using Audacity.
If you want to work on MP3 files directly in Audacity, you need to install:
```
[root@bakkisweety ~]# yum install audacity-freeworld.x86_64
Loaded plugins: langpacks, show-leaves, upgrade-helper
Package audacity-freeworld-2.0.6-1.fc21.x86_64 already installed and latest version
Nothing to do
```
The normal Audacity package does not support importing MP3 / MP4 audio.
I created the following files using Audacity:
1. hello_10.wav - My "hello" voice. Total length 10 sec.
2. hello_10.mp3 - hello_10.wav converted to an MP3 file
3. hello_10_noise1.wav - hello_10.wav with more noise added in the middle and a small amount at the beginning
4. hello_10_less_noise.wav - hello_10.wav with less noise added
The tools are installed and the test audio files are available. Let's start the experiment:
```
[root@bakkisweety Audacity]# fpcalc hello_10.wav
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
FILE=hello_10.wav
DURATION=10
FINGERPRINT=AQAAQWOiRRojodY0IZG-J0Eq7YLDo3cR86C4i5CbHXkvoXeRqVq4IDkfWL2EpmmK-0HYBUyebpA7HfmFOumERLryJEilXXB49C5iHg2XE1JWpkOuC3VuZKqmLUjOPbB6oQnTVDh9fDkyPd0gdzryC3XSItMVLgkS67hOVHUR82jIE1L2dMh1oc6NtJq2QDL3IFUvNKGPpym-Pch0b5AbIb-E3kWmKgoXIbE-XCeqNsXpIyxPAAGEA8wQQ4BQAgEEjBRAOMAQBEgQAYBBCBgpGBIMEQgQEQIBhIwDDAkAEA
[root@bakkisweety Audacity]# fpcalc hello_10.mp3
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
FILE=hello_10.mp3
DURATION=10
FINGERPRINT=AQAAQWOiKNIYCXUyTUIifVeQSrvQhMddxDwo7iLkZkfeS-hdZKoWLkgsfoF_ognTFKePsMsJJk8HudORX6iTaUKm70mQUNrh8ETvIubRcLkIKWeO_LpQxzXSal2QyJwXWD2aME2F08eXI9PTDXKnI79QJy0yfVyCxPKJ60RVFzGPhssJKSvTIdeF50ZTTVuQyNyDVL3QhD6epvj2INO9QW6E_BJ6F5mqjEuCxPJxneib4j7CkCcAAYRTzBBDAAECAISAkQIYAxgRAgggADAIASMFQIIhCBAiQiCAkHEAIAEQQAA
[root@bakkisweety Audacity]# fpcalc hello_10_noise1.wav
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
FILE=hello_10_noise1.wav
DURATION=10
FINGERPRINT=AQAAQWOiKNIYCXXSCYl05UmQSrvg8Ohd5IezXYTEZjny6hJ6F5mqdUFy7oFVWWjyVPiDNsrRPJWKF0f4HuLH42GPTF-F5xeqoz3chCQ-HPmQ_PhOoaZwfqi1ywhzpobc6cgv9C4yVVG4JEis4zpR1UXMoyFPMHs6JNeF50ZTTSMSmbuCVL3QhMfjIt4O6hYhN0feS-hdZKqicBES6_gl1GmK-wjLAwGEA4YZYQAQxFCAEFMCCGIMAMBoiUQWAiCBBABGAoYEQ8QBhIBwCCFlCJBMIIAA
[root@bakkisweety Audacity]#
```
OK. We have some acoustic fingerprint data, but what do we do with it? These fingerprints are encoded in a specific format, and we need the raw acoustic fingerprints instead. Let's get them with the -raw option:
```
[root@bakkisweety Audacity]# fpcalc hello_10.wav -raw
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
FILE=hello_10.wav
DURATION=10
FINGERPRINT=1445291460,1450701268,1375130111,1369883486,1353107262,1357172526,1348782959,1092933959,1126558149,1445293508,1449616852,1475801597,1375130622,1353106270,1357238126,1365560174,1365571967,1126693191,1109748165,1445293508,1450668500,1375130111,1369883486,1353107262,1357172526,1348782959,1097136479,1126689221,1176857028,1449485780,1475867133,1375130622,1369883486,1357238062,1348782958,1365571950,1126693191,1109748165,1445293508,1449619924,1375130109,1370936190,1353107326,1357172526,1348782959,1097132383,1126689221,1176857028,1449485780,1475867125,1375130623,1369883486,1353172782,1357171502,1365563758,1126553927,1126525381,1445293508,1449616852,1408684541,1370936190,1353107326,1357238062,1348782958,1097132415
[root@bakkisweety Audacity]# fpcalc hello_10.mp3 -raw
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
FILE=hello_10.mp3
DURATION=10
FINGERPRINT=1445293508,1467478484,1375130111,1369883486,1353107246,1357172526,1348782959,1092933959,1126558149,1445293508,1449616852,1475801597,1370936190,1353106270,1357238062,1348782958,1097136511,1126693191,1109748165,1445293508,1450701268,1375130109,1370932062,1353107262,1357172526,1348782959,1092942175,1126689093,1176857028,1449616852,1475867125,1370936318,1369883486,1357238062,1348782958,1365571950,1126693191,1109748165,1445293508,1449619924,1442206205,1370936190,1353107326,1357172526,1348782959,1097136479,1126689221,1176857028,1449485764,1475867124,1375130623,1369883486,1353172782,1357171502,1365563758,1126553927,1126525381,1445293508,1449616852,1375097341,1370936190,1353107326,1357238126,1348782958,1097132383
[root@bakkisweety Audacity]# fpcalc hello_10_noise1.wav -raw
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
FILE=hello_10_noise1.wav
DURATION=10
FINGERPRINT=1445293508,1450668500,1375130111,1369883486,1353107262,1357172526,1348782895,1092933967,1126689221,1445293508,1449616852,1475867133,1375130622,1369867102,1357218606,1365541678,1348777278,-718818034,-718953202,-718951154,-735593185,-197747395,-734078147,-730969324,-781317164,-714208380,-680654444,-696314444,-696307276,-687918668,-692637259,-692767818,-744148490,-744196634,-773554842,1399122294,1126693191,1109748165,1445293508,1449616852,1375130109,1370936190,1353107326,1357172526,1348782959,1097132383,1126689223,1176857028,1449485764,1467478516,1375130623,1369883486,1353107246,1357171502,1348786543,1092999495,1126558149,1445293508,1449616852,1408684541,1375130494,1353106302,1357238126,1348782958,1365567871
[root@bakkisweety Audacity]# fpcalc hello_10_less_noise.wav -raw
fpcalc: /usr/lib64/nvidia-304xx/libOpenCL.so.1: no version information available (required by /lib64/libavutil.so.54)
FILE=hello_10_less_noise.wav
DURATION=10
FINGERPRINT=1445293508,1450668500,1375130111,1369883486,1353107262,1357172526,1348782895,1092933967,1126689221,1445293508,1449616852,1475867133,1375130622,1369883486,1357236014,1365561134,1365571966,1395128654,1445324878,1445325006,1445428447,1429624063,1362515327,1348880221,1357270604,1353075276,1369725916,1474587644,1449422308,1450468836,1475858924,1375130622,1369883486,1357367086,1357171502,1365571950,1126684999,1109748165,1445293508,1449616852,1375130109,1370936190,1353107326,1357172526,1348782959,1097132383,1126689223,1176857028,1449485764,1467478516,1375130623,1369883486,1353107246,1357171502,1348786543,1092999495,1126558149,1445293508,1449616852,1408684541,1375130494,1353106302,1357238126,1348782958,1365567871
```
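If you want to reuse the fpcalc -raw output outside my script, a small parser can turn the KEY=VALUE lines shown above into a hash. The `parse_fpcalc_output` helper below is hypothetical. It assumes the output format shown above (FILE=, DURATION=, FINGERPRINT=) and simply skips any line that is not KEY=VALUE, such as the libOpenCL warning.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the KEY=VALUE lines printed by `fpcalc -raw`
# (FILE=..., DURATION=..., FINGERPRINT=...) into a hash reference.
# Lines that are not KEY=VALUE (e.g. the libOpenCL warning) are skipped.
sub parse_fpcalc_output {
    my ($text) = @_;
    my %result;
    for my $line (split /\r?\n/, $text) {
        next unless $line =~ /^([A-Z]+)=(.*)$/;
        my ($key, $value) = ($1, $2);
        if ($key eq 'FINGERPRINT') {
            # Raw values are comma separated signed 32-bit integers;
            # "+ 0" stores them as numbers rather than strings.
            $result{$key} = [ map { $_ + 0 } split /,/, $value ];
        }
        else {
            $result{$key} = $value;
        }
    }
    return \%result;
}

my $sample = "FILE=hello_10.wav\nDURATION=10\nFINGERPRINT=1445291460,1450701268,1375130111\n";
my $parsed = parse_fpcalc_output($sample);
printf "%s: %d s, %d values\n",
    $parsed->{FILE}, $parsed->{DURATION}, scalar @{ $parsed->{FINGERPRINT} };
# prints "hello_10.wav: 10 s, 3 values"
```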
Sweet, now we have some numerical data, but we still need to make sense of it. In my situation, I need to check that two songs (one from the source, which is the radio station, and the other from the radio receiver) match within a threshold. For sure, the two songs won't be identical.
Let's treat the numerical (signed 32-bit integer) fingerprint data as data sets. So we have two sets in hand. I chose R squared (the coefficient of determination) as the goodness-of-fit measure.
In my situation, since the content of the source song and the destination song is the same, I should get a positive correlation, and in ideal lab conditions R squared should be 1.0. But so many things happen during radio transmission that we can't expect 1.0; still, it should be greater than 0.0. If I get 0.0 as R squared, something went really wrong during the transmission. (R squared itself always lies between 0 and 1 and does not show the sign of the correlation, so a positive correlation has to be checked separately, e.g. from the slope of the fitted line.)
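For reference, R squared for a straight-line fit can also be computed by hand. For an ordinary least-squares line with an intercept, it equals the square of Pearson's correlation coefficient: R² = S_xy² / (S_xx · S_yy), where S_xy, S_xx and S_yy are sums of products of deviations from the means. The `r_squared` helper below is a hypothetical sketch using only core List::Util. It should agree with Statistics::LineFit's rSquared up to floating-point rounding.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: coefficient of determination of a least-squares
# line y = a + b*x, computed as r^2 = Sxy^2 / (Sxx * Syy).
sub r_squared {
    my ($x, $y) = @_;                       # two array refs of equal length
    my $n = @$x;
    die "need two equal-length series with at least 2 points\n"
        unless $n >= 2 && $n == @$y;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        my $dy = $y->[$i] - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return undef if $sxx == 0 || $syy == 0;   # constant series: no defined fit
    return ($sxy * $sxy) / ($sxx * $syy);
}

printf "%.2f\n", r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]);   # prints "0.64"
```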
Don't ask me why I chose a goodness-of-fit statistic instead of something from the electronics domain like FFT and its derivatives... I have yet to test it with real data. I chose it purely on instinct.
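Since I picked R squared on instinct, here is a different technique for comparison: a bit-level comparison. This is not what my script does. As I understand the How chromaprint works article, each raw 32-bit value is a pattern of feature bits. Fingerprints are compared by counting how many bits differ, rather than by treating the values as ordinary numbers. The sketch below computes that bit error rate for two fingerprints. The helper names are hypothetical, and the sketch assumes both fingerprints are already time-aligned; real matching would also have to find the best alignment.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Count the set bits of a value, looking only at its low 32 bits.
# The mask handles negative raw values (two's complement), and
# unpack '%32b*' is Perl's bit-counting checksum idiom.
sub popcount32 {
    my ($v) = @_;
    return unpack('%32b*', pack('N', $v & 0xFFFFFFFF));
}

# Fraction of differing bits between two raw fingerprints, compared
# position by position over their common length. Assumes the two are
# already time-aligned: 0.0 means identical bits, unrelated audio tends
# towards about 0.5.
sub bit_error_rate {
    my ($fp1, $fp2) = @_;
    my $n = min(scalar @$fp1, scalar @$fp2);
    return undef if $n == 0;
    my $diff_bits = 0;
    for my $i (0 .. $n - 1) {
        # "0 +" forces numeric XOR; on strings Perl would XOR characters.
        $diff_bits += popcount32((0 + $fp1->[$i]) ^ (0 + $fp2->[$i]));
    }
    return $diff_bits / (32 * $n);
}

# First three raw values of hello_10.wav and hello_10.mp3 from above.
my @wav = (1445291460, 1450701268, 1375130111);
my @mp3 = (1445293508, 1467478484, 1375130111);
printf "bit error rate: %.4f\n", bit_error_rate(\@wav, \@mp3);   # prints "bit error rate: 0.0208"
```

A lower value means a closer match here, which is the opposite direction from R squared.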
I wrote a Perl script that does the whole process: it runs fpcalc to get the fingerprints, calculates R squared and reports the goodness of fit. Here is the script:
```perl
#!/usr/bin/env perl
#Script to calculate the similarity of two audio files using the chromaprint fingerprint algorithm
#License: LGPL2.1+
#Author: Bakkiaraj M [http://npointsolutions.blogspot.in/]
#Usage: audio_chromaprint_diff.pl audio1_file audio2_file
#Note: Install the fpcalc tool version 1.1 before you run this script
#Refer URLs
#Refer: http://acoustid.org/chromaprint
#Refer: http://en.wikipedia.org/wiki/Coefficient_of_determination

use strict;
use warnings;
use Statistics::LineFit;
use Capture::Tiny ':all';

#Globals
#Length in secs used for fingerprint calculation. Just a rough high number; later enhance based on songs.
my $audioLen = 9999;
#fpcalc command as a list, so system() runs it without a shell (no quoting problems with file names)
my @fpcalcCmd = ('/usr/bin/fpcalc', '-raw', '-length', $audioLen);

#Get the file names from the command line
my ($fn1, $fn2) = @ARGV;

if (!defined($fn1) or !defined($fn2)) {
    print "\n Usage: perl $0 audio_file1 audio_file2\n";
    exit(1);
}

if (! -s $fn1 or ! -s $fn2) {
    print "\n ERROR: $fn1 or $fn2 is not a proper file.\n";
    exit(1);
}

#Run fpcalc on one file and return a reference to the array of raw fingerprint values
sub calcFingerPrint {
    my $fileName = shift @_;

    #capture returns fpcalc's STDOUT, its STDERR and the return value of system()
    my ($fpdata, $stderr, $exit) = capture {
        system(@fpcalcCmd, $fileName);
    };

    unless ($exit == 0) {
        print "\n ERROR: While running fpcalc tool";
        print "\n CMD: ", join(' ', @fpcalcCmd, $fileName);
        print "\n STDERR: ", $stderr, "\n";
        #$exit is the raw wait status ($?); turn it into a non-zero exit code
        my $code = ($exit > 0) ? ($exit >> 8) : 0;
        exit($code || 1);
    }

    if ($fpdata =~ m/^FINGERPRINT=(.*)$/m) {
        #Raw fingerprints are comma separated signed 32-bit integers
        my @fpDataArray = split(/,/, $1);
        return \@fpDataArray;
    }
    return [];
}

my $fp1ArrRef = calcFingerPrint($fn1);
my $fn1FPs    = scalar @$fp1ArrRef;
my $fp2ArrRef = calcFingerPrint($fn2);
my $fn2FPs    = scalar @$fp2ArrRef;

print "\n File1: $fn1 Tot FPs ", $fn1FPs;
print "\n File2: $fn2 Tot FPs ", $fn2FPs;

if ($fn1FPs != $fn2FPs) {
    #Equalise the arrays by truncating the longer one
    if ($fn1FPs > $fn2FPs) {
        splice(@$fp1ArrRef, $fn2FPs);
    }
    else {
        splice(@$fp2ArrRef, $fn1FPs);
    }
    print "\n Note: File1 & File2 have different numbers of fingerprints; the lower number will be used for R2";
}

my $lfit = Statistics::LineFit->new();
#setData() returns false if the data cannot be used for a line fit
unless ($lfit->setData($fp1ArrRef, $fp2ArrRef)) {
    print "\n ERROR: Fingerprint data cannot be used for a line fit\n";
    exit(1);
}

#Guard in case no line can be fitted (e.g. all values of File1 are equal)
my $rSquared = $lfit->rSquared();
unless (defined $rSquared) {
    print "\n ERROR: R2 could not be calculated for File1 & File2\n";
    exit(1);
}

printf "\n\n Goodness of fit R2 for File1 & File2 = %.8f \n", $rSquared;
exit(0);
```
This script takes two audio files and calculates the goodness of fit based on their acoustic fingerprint data. Let's run it:
```
[root@bakkisweety Audacity]# /mnt/WinD/Eclipse_WorkSpace/PerlExamples/audio_chromaprint_diff.pl hello_10.wav hello_10.mp3

 File1: hello_10.wav Tot FPs 65
 File2: hello_10.mp3 Tot FPs 65

 Goodnees of fit R2 for File1 & File2 = 0.92182237
[root@bakkisweety Audacity]# /mnt/WinD/Eclipse_WorkSpace/PerlExamples/audio_chromaprint_diff.pl hello_10.wav hello_10_less_noise.wav

 File1: hello_10.wav Tot FPs 65
 File2: hello_10_less_noise.wav Tot FPs 65

 Goodnees of fit R2 for File1 & File2 = 0.50358946
[root@bakkisweety Audacity]# /mnt/WinD/Eclipse_WorkSpace/PerlExamples/audio_chromaprint_diff.pl hello_10.wav hello_10_noise1.wav

 File1: hello_10.wav Tot FPs 65
 File2: hello_10_noise1.wav Tot FPs 65

 Goodnees of fit R2 for File1 & File2 = 0.02005060
```
For these test files, the output shows that my method of comparing audio files is not greatly affected by the audio file format (MP3 / WAV).
When I calculate the goodness of fit for hello_10.wav & hello_10.mp3, I get an R squared of 0.92 (92%), which shows that these are almost the same recording.
When noise is added, the goodness of fit drops: 0.50 for hello_10_less_noise.wav and 0.02 for hello_10_noise1.wav.
I have also tested with real songs / music files. The outcome is convincing.
Can we automate the audio listening test with this approach?
Yes, but we need to set the goodness-of-fit tolerance level before we start the audio listening test.
The goodness-of-fit tolerance level depends on the source song, the compression technique, the transmission medium (AM / FM), etc., which I still need to work out.
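To turn the score into an automated listening-test gate, the tolerance can live in a small per-profile table. The sketch below is hypothetical: the profile names and threshold values are placeholders I made up, not calibrated numbers. It routes anything below the threshold to a human listener instead of marking it as failed, because ears still have the final say.

```perl
use strict;
use warnings;

# Hypothetical per-profile tolerance levels (minimum acceptable R^2).
# These are placeholders; real values must be calibrated per source
# material, compression technique and medium (AM / FM).
my %MIN_R_SQUARED = (
    lab_reference => 0.90,
    fm_broadcast  => 0.60,
);

# Return PASS when the score meets the profile's tolerance; otherwise
# (including a missing score) flag the recording for a human listener.
sub listening_test_verdict {
    my ($profile, $r_squared) = @_;
    my $min = $MIN_R_SQUARED{$profile};
    die "unknown tolerance profile '$profile'\n" unless defined $min;
    return 'NEEDS_HUMAN_CHECK' unless defined $r_squared;
    return $r_squared >= $min ? 'PASS' : 'NEEDS_HUMAN_CHECK';
}

printf "%s\n", listening_test_verdict('lab_reference', 0.92182237);   # prints "PASS"
```

With these placeholder values, the hello_10.wav vs hello_10.mp3 score (0.92182237) passes the lab_reference profile, while the less-noise score (0.50358946) would be sent for a human check.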
I hope that with this method we can reduce the time humans spend listening during testing.
Nevertheless, nothing can replace human ears when it comes to music / audio testing. Methods like the one above can only assist by pointing out likely problematic test vectors.
I will post an update once I have worked with real captured data in the lab.
If this method works well, I plan to create a Perl module that wraps the libchromaprint functionality, so we can call it as a Perl function instead of running fpcalc as a subprocess.
Bye Bye ...