Friday, August 29, 2008

This program shows that the probability of inaccuracy in the "SigmaDelta" strategy is more than 20%. Run this program in the "eightbits" folder after running "getfiles".
getfiles.m
autostat.m
As I observed, the SigmaDelta strategy claims that when we embed random data in a non-embedded file, SigmaDelta increases in most cases, and the strategy is mainly based on this idea. But what happens if this idea is misleading in 20% of cases?
This program takes an original file, calculates its SigmaDelta, then embeds data and calculates SigmaDelta again. If the second SigmaDelta is lower than or equal to the first one, the program detects an error and shows it on the screen. At the end, it counts the errors and calculates the percentage of errors over all the examined files. For me, it found 23% errors in 94 files. Is that acceptable?
Maybe my point of view is not correct, or I have misunderstood something?
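The bookkeeping described above can be sketched as follows. This is only an illustration: the vectors "before" and "after" hold made-up statistic values (they are not results from the 94 files), and the computation of SigmaDelta itself is deliberately left out.

```matlab
% Hypothetical statistic values for five files (illustrative numbers only)
before = [10 12 9 15 11];   % statistic of each original file
after  = [13 11 12 15 14];  % statistic after embedding random data
% An "error" is a file whose statistic did not increase after embedding
isError  = after <= before;
numError = sum(isError);
pctError = 100 * numError / numel(before);
fprintf('%d errors in %d files (%.0f%%)\n', numError, numel(before), pctError);
```

With real data, "before" and "after" would be filled inside the loop over the files.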

Thursday, August 28, 2008

Summary

We first wrote a program that finds all the 8-bit *.wav files, and then another program that calculates the sub (the absolute difference between the two counts) for each pair in each file and saves the result. After that we calculated the average of the subs for each file. As expected, the average for modified data was lower than for the original data, so we decided to use a threshold: if the average of a file is below this threshold, we assume that the file is embedded. We also classify files by their average, because some original files have a small sub and average, and comparing them with files that have a large sub and average may cause mistakes.
You can see some averages in the Excel file uploaded in the previous post. Now we should find a suitable threshold for each pair in each class and then decide whether the file is really embedded or not.
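A minimal sketch of the threshold idea, assuming one average per file. Both the averages and the threshold value here are made up for illustration; a real threshold still has to be found for each class.

```matlab
% Hypothetical per-file averages of the pair differences (made-up values)
avgSub     = [42.5 3.1 18.0 0.9];
threshold  = 5;                     % assumed value, to be tuned per class
isEmbedded = avgSub < threshold;    % small average -> pairing -> flagged
disp(isEmbedded)
```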

Wednesday, August 27, 2008

Some changes should be made to "getfiles". The final version is below:
clear
% list every wav file in the folder
files=dir(fullfile(matlabroot,'work\Data hide\eightbits\*.wav'));
sizef=size(files);
for k=1:sizef(1)
name{k}=getfield(files,{k,1},'name');
end
% keep only the names of the 8-bit files
i=1;
for k=1:sizef(1)
fname=char(name(k));
[data fs b]=wavread(fname);
if(b==8)
eightbit{i}=fname;
i=i+1;
end
end


Put "getfiles.m" and "autostat.m" (which is explained in the previous post) in the same folder as your wav files. You can put all your wav files there, not only the 8-bit ones; "getfiles" will exclude the other files itself.
So first run getfiles and then autostat. All the .mat files are then saved in that folder. You can import them one by one and copy and paste them into Excel.
Remember that you should start from the last .mat file! :-D
But wait! I think it would be better to build one multidimensional matrix in a single pass and then copy it to Excel. That should be simple!
I wrote a new program which saves the data of all the audio files into a folder. We can simply import the data and use it later, or do an automatic statistical analysis in MATLAB. (The program is written below; pay attention to the red colored lines.)

I added only one more loop to our previous program.
Notice that "eightbit" is a cell array which holds the names of all the files (the same 94 8-bit files we found!), and "eightname" is the name of the file currently being processed. The cell array is made by "getfiles", which I explained a few posts ago.


%determining input and output
%importing input and recognizing fs and bits/sample
sizef=sizef(1);
for k=1:sizef
eightname=char(eightbit(k));
[data fs b]=wavread(eightname);
sizem=size(data);
%double->integer data
data=data.*2^(b-1)+2^(b-1);
data=uint8(data);
%count
countero=zeros(256,1);
for j=0:255
for i=1:sizem(1)
if data(i)==j
countero(j+1)=countero(j+1)+1;
end
end
end
%producing random data
r=randint(sizem(1),1);
%embedding random data to original data
for i=1:sizem(1)
data(i)=bitset(data(i),1,r(i));
end
%count
counterm=zeros(256,1);
for j=0:255
for i=1:sizem(1)
if data(i)==j
counterm(j+1)=counterm(j+1)+1;
end
end
end
%getting subtract
j=0;
for i=1:2:255
j=j+1;
km(j,1)=abs(counterm(i)-counterm(i+1));
end
j=0;
for i=1:2:255
j=j+1;
ko(j,1)=abs(countero(i)-countero(i+1));
end
matname=[eightname '.mat']
save(matname)
end
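The two counting loops above can also be written without loops. The following is a sketch only, not the program we ran: the sample values are made up, and "counts" and "sub" are names chosen for this example.

```matlab
% Made-up 8-bit sample values standing in for "data"
data = uint8([0 0 1 2 2 2 3 255 254 254]');
% counts(v+1) = number of samples equal to v, for v = 0..255
counts = accumarray(double(data) + 1, 1, [256 1]);
% absolute difference inside each pair (0,1), (2,3), ..., (254,255)
sub = abs(counts(1:2:255) - counts(2:2:256));
avgSub = mean(sub);
```

The conversion to double before adding 1 matters, because adding 1 to a uint8 value of 255 would saturate at 255.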


I also made an Excel file of about 10 columns. I calculated the average of the subs and the average of every row in Excel. I think we can find some useful points in the average of every column. Please take a look at the file.
Note that the average has decreased noticeably in the modified files.
And maybe, as Parisa said before, we can use every row's average as the threshold to discriminate pairing rows from non-pairing rows. Then we will count the number of positive answers. We expect to have considerably more positive answers in modified audio files.

Thursday, August 21, 2008

I wrote the following program and used it to find all the 8-bit wav files on my computer.

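clear
files=dir(fullfile(matlabroot,'work\Data hide\audio\*.wav'));
% "sizef" instead of "size", so the built-in size function is not shadowed
sizef=size(files);
for k=1:sizef(1)
name{k}=getfield(files,{k,1},'name');
end
i=1;
for k=1:sizef(1)
% wavread expects a string, so convert the cell to char first
fname=char(name(k));
[data fs b]=wavread(fname);
if(b==8)
eightbit{i}=fname;
i=i+1;
end
end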

Notice that you should put all the wav files in one folder, give its path to this program (line 2) and then run it. You can see the names of the 8-bit files in the cell array "eightbit". If it causes any errors, first make sure that none of your files are compressed.
With this program I found 93 8-bit wav files. Now I'm going to download the final program of the last session, run it on all these files and look for the pairing phenomenon. But since that program doesn't care whether the files are mono or stereo, I wonder whether the files should be mono or not...
I'm really eager to see pairing!!

Wednesday, August 20, 2008

Here is a way to record audio in MATLAB and use it for our statistical research:

micrecorder = audiorecorder(11025,8,1);
record(micrecorder,2);
% Now, speak into microphone
stop(micrecorder);
speechplayer = play(micrecorder);
% Now, listen to the recording
stop(speechplayer);
data = getaudiodata(micrecorder);

This program records mono (1 channel) 8-bit audio at 11025 samples per second, then plays it back and saves it into "data" to be used in MATLAB. I set the program to record only 2 seconds, since this function is not intended for long recordings; otherwise it can run out of memory and MATLAB performance may degrade.

Another way is to use "wavrecord":

y= wavrecord(2*11025,11025,'int8');
wavplay(y,11025);

Both.Session8

We corrected our program and finally could see pairing in the histogram.
The changes in the program are as follows:

  1. multiply the data by 2^(b-1), not 2^b; because the data is between -1 and 1, not between -0.5 and 0.5, multiplying by 2^b gives only even numbers,

  2. to have positive data, we add a constant (2^(b-1)) and, after embedding, subtract that constant from the data again,

  3. use hist() for plotting the histogram,

  4. correct "bitset(data(i),1,r(i));" to "data(i)=bitset(data(i),1,r(i)); "!!!

  5. correct "eval('wavwrite(data,size(1),o)');" to "eval('wavwrite(data,fs,b,o)');" (if we don't give b as NBITS, wavwrite takes NBITS=16 by default, which can cause problems in the *.wav file).
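Change 1 can be checked with a small sketch. It assumes that an 8-bit sample stored as raw value v (0..255) is read as the double (v-128)/128, which matches the range between -1 and 1 mentioned above; the variable names are chosen only for this example.

```matlab
b   = 8;
raw = (0:255)';                     % every possible stored 8-bit value
d   = (raw - 2^(b-1)) / 2^(b-1);    % assumed normalization to [-1, 1)
wrong = d * 2^b;                    % old scaling: values move in steps of 2
right = d * 2^(b-1) + 2^(b-1);      % new scaling plus offset: 0..255
disp(all(mod(wrong, 2) == 0))       % every value of the old scaling is even
```

With the old scaling the LSB of every sample is always 0 before embedding, so no pairing can show up in the histogram of the original file.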

You can download the program here.

You can also see the histograms below:



To do list for next session:

  1. Write a program for extracting data.

Thursday, August 14, 2008

Both.Session7

Finally, we used the MATLAB "input" function to choose the input file and the name of the new output file.
So with this program it's really easy to hide random data in a large number of audio files and investigate the changes and the "pairing" phenomenon.

Our new program can also detect the number of samples, the bits per sample and the other information it needs. First it finds the "bits per sample" of the audio file, then it hides our data in the LSB of every sample, and at last it plots the histogram of the modified data.

As we mentioned in our previous post, "bitset" cannot be used on negative numbers, so we have to change negatives to positives and then hide the data. To solve this problem we add 2^bits to all negative numbers (bits is the number of bits per sample).
With this strategy all negative numbers change to their two's complements and become positive. After hiding the data, the program changes all positive numbers greater than 2^(bits-1)-1 back to their negative form.
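The round trip through two's complement can be sketched like this. The sample values and the payload bits are made up, and the sketch works on signed integers directly instead of reading a wav file.

```matlab
b = 8;
x = [-128 -3 -1 0 5 127]';          % made-up signed sample values
u = x;
u(u < 0) = u(u < 0) + 2^b;          % negatives -> two's complement (128..255)
bits = [1 0 1 1 0 0]';              % example payload bits
for i = 1:numel(u)
    u(i) = bitset(u(i), 1, bits(i));   % write the LSB
end
y = u;
big = y > 2^(b-1) - 1;
y(big) = y(big) - 2^b;              % back to the signed form
disp([x y])
```

Every sample changes by at most 1, and the LSB of each result carries one payload bit.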

So far we have tested only a few files, and the program seems to work correctly. The audio file with hidden data is not noticeably different from the original file. (I believe that a file named "ringin" has a noticeable change, but Parisa doesn't agree with me!)

you can see the program below or download it here.

clear
in=input('type name of the wav file:','s');
o=input('type an output name:','s');
[data fs b]=eval('wavread (in)');
size=size(data);
data=data*2^b;
for i=1:size(1)
if data(i)<0
data(i)=data(i)+2^b;
end
end
r=randint(size(1),1);
for i=1:size(1)
bitset(data(i),1,r(i));
end
for i=1:size(1)
if data(i)>(2^(b-1)-1)
data(i)=data(i)-2^b;
end
end
data=data*(1/2^b);
eval('wavwrite(data,size(1),o)');

To do list for next session:

  • testing lots of audio files and finding out if they make a recognizable difference when they are modified or not.
  • find out if there is pairing in histogram of modified audio files
  • paying attention to stereo files with 2column matrices

Sunday, August 10, 2008

Both.Session6

We decided to make the modified file ourselves, because our goal is to put data in every 8 bits (and we didn't have enough information about the program explained in the last session).
To hide data we wrote this program (in an M-file):
data=wavread ('1');
data=data*256*256;
data=uint16(data);
r=randint(22046*2,1);
for i=1:22046
bitset(data(i),1,r(i));
bitset(data(i),9,r(22046+i));
end
data=double(data)
data=data*(1/(256*256));
wavwrite(data,22046,'2');
Certainly this program has some problems that we should solve.
To hide data, we should first read a *.wav file and view it 8 bits at a time. For now we use wavread(). This function returns the vector of data, the sampling frequency and the number of bits used to represent each sample. (We use only the data because we already know the other values, but for an input with unknown parameters we would have to use them too.)
We have a problem in this step, because the data vector holds 16-bit samples but we need 8-bit data. For now we continue without paying attention to this problem.
Then we multiply the data vector by 256*256 to get integers instead of fractional doubles. After that, we make a random vector, in which 0 and 1 have the same probability, as our bit string, using randint(m,n), where m depends on the sampling frequency and n=1. In the next step, we hide the bit string in the original data vector using a loop. In this loop we use bitset() to change the least significant bit of every 8 bits to a bit of our string. We should pay attention to 2 points:
  1. Because we have 16 bits in each row of the data vector, we need 2*Fs random bits,
  2. bitset() cannot be used on negative numbers, so we have to change our data to an unsigned format.
Now we have the modified data, but to get a vector like the original one we do as follows:
First change the format to double, then multiply the data by 1/(256*256). After that, make a *.wav file with wavwrite() (to use wavwrite() you should give the correct frequency). We listened to this file. It is similar to the original, but because the negative numbers were deleted its quality wasn't very good.
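The loss of the negative numbers can be reproduced with a few made-up samples. The sketch also shows one possible alternative, adding an offset before the conversion. The offset 2^15 is an assumption for 16-bit data and is not part of the program above.

```matlab
d = [-0.5 -0.25 0 0.25 0.5];        % made-up normalized samples
clipped = uint16(d * 256 * 256);    % negative values saturate to 0
% Possible alternative: shift into the unsigned range before converting
shifted = uint16(d * 2^15 + 2^15);
back    = (double(shifted) - 2^15) / 2^15;   % undo the shift and scaling
disp(clipped)
disp(back)
```

In "clipped" the first two samples become 0, while "back" recovers all five original values.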

To do list for next session :

  1. Find out how we can read the data 8 bits at a time,
  2. Edit our program to take the input file from the user,
  3. Solve the bitset problem (there are two suggestions: first, add a constant number to the data and subtract it again after the change; second, use two's complement),
  4. Plot the histogram and find the possible changes.

Friday, August 8, 2008

F.Session5

Today I found the "C# data hide" program which I explained last session. It can produce a modified file from an original *.wav file. It uses a key file to hide the text embedded to the file. Also it can extract the cryptic text and the original file from a modified one. Of course the key file is needed to do so.
A view of this software is provided below:

We may use a large number of someone else's modified files for our statistical research, or maybe we'll use the above program to create lots of audio files with hidden data.
I also studied 2 sections of MATLAB-jahad:
Section 5. Characters and strings in MATLAB
Section 6. Loops and if-else instructions

To do list for the next session:
1.studying MATLAB-jahad from section7
2.don't forget about thinking if making all the numbers positive is correct or not.
3.do the statistical research

Monday, August 4, 2008

both.Session4

In this session we found a useful kind of software called a "hex editor". You can easily find one by searching the internet; there are lots of free downloads available. In a hex editor we can open "wav" files as hex, view the file's bytes and also modify them.

But we should notice that it's not possible to modify every byte we like. Some special bytes are essential for a wav file.

We've studied wav files more carefully. Below is the main explanation for the format of a wav file:( source:http://www.codeproject.com/KB/security/steganodotnet8.aspx )

------------------------------------------------------------------------------------------------------

The Wave File Format

Have you ever looked at a Wave file in a HEX editor? It starts like that, and continues with unreadable binary data:

Every RIFF file starts with the text "RIFF", followed by the Int32 length of the entire file:

The next fields say that this RIFF file contains Wave data and open the format chunk:

The length of the following format chunk must be 16 for PCM files:

Now the format is being specified by a WAVEFORMATEX structure:

The format chunk can be followed by some extra information. Then the interesting parts begin with the data chunk.

The data chunk contains all the Wave samples. That means the rest of the file is pure audio data. Little changes might be hearable, but won't destroy the file.

------------------------------------------------------------------------------------------------
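The header layout quoted above can be tried out in MATLAB with fwrite and fread. The sketch below is our own illustration and is not taken from the article: it writes a tiny 8-bit mono PCM file by hand (4 made-up samples, 11025 Hz assumed) and reads the fields back in the same order. All variable names are chosen for this example.

```matlab
fname   = [tempname '.wav'];
samples = [128 130 126 128];                 % made-up 8-bit samples
fid = fopen(fname, 'w', 'ieee-le');          % WAV fields are little-endian
fwrite(fid, double('RIFF'), 'uint8');
fwrite(fid, 36 + numel(samples), 'uint32');  % bytes that follow this field
fwrite(fid, double('WAVEfmt '), 'uint8');
fwrite(fid, 16, 'uint32');                   % format chunk length (PCM)
fwrite(fid, [1 1], 'uint16');                % format tag 1 = PCM, 1 channel
fwrite(fid, [11025 11025], 'uint32');        % sample rate, bytes per second
fwrite(fid, [1 8], 'uint16');                % block align, bits per sample
fwrite(fid, double('data'), 'uint8');
fwrite(fid, numel(samples), 'uint32');       % length of the data chunk
fwrite(fid, samples, 'uint8');
fclose(fid);

% Read the header fields back in the same order
fid = fopen(fname, 'r', 'ieee-le');
riffTag  = char(fread(fid, 4, 'uint8')');
riffLen  = fread(fid, 1, 'uint32');
waveFmt  = char(fread(fid, 8, 'uint8')');
fmtLen   = fread(fid, 1, 'uint32');
tagChan  = fread(fid, 2, 'uint16');
rates    = fread(fid, 2, 'uint32');
alignBit = fread(fid, 2, 'uint16');
dataTag  = char(fread(fid, 4, 'uint8')');
dataLen  = fread(fid, 1, 'uint32');
audio    = fread(fid, dataLen, 'uint8')';
fclose(fid);
delete(fname);
fprintf('%s, %d bits, %d Hz, %d samples\n', riffTag, alignBit(2), rates(1), dataLen);
```

Only the bytes after the "data" tag and its length are audio samples, so these are the bytes that can be modified without damaging the file structure.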

The article also includes a C# program which can embed secret data in a wav file, but the program's download link didn't work.

Another way to create a *.wav file from variables is this MATLAB instruction:

wavwrite(u,Fs,bits,'filename')

But when using this instruction we should take care to give a correct input variable. For example, its numbers should be between -1 and 1…

We began to use "stem" instead of "hist(data)" because "hist" groups the values into only a few bins (10 by default), while "stem" can show all the numbers between 0 and 256 one by one.

Also, since MATLAB appears to divide the numbers by 256, we decided to multiply the floating-point numbers by 256 instead of 1000. Then we add 100 to the numbers to make them positive. After that we count the numbers using two nested loops. We did all this in an m-file program which you can see below:


counter=zeros(256,1);
new=pahang*256;
new=round(new);
new=new+100;
for i=0:256
for j=1:22046
if (new(j)==i)
counter(i+1)=counter(i+1)+1;
end
end
end
i=[0:255];
figure;stem(counter)


On the pictures below you can see how "stem" can plot a better histogram of our data.

In the next step, we modified a *.wav file with the hex editor, imported it into MATLAB and ran the above m-file program on it. Since our changes were small, we couldn't find any significant difference between the histograms. When we looked for changes in the vectors, we observed that only the first elements had changed a little. You can see the numbers in the pictures below. The first column shows the modified audio file's vector and the second column shows the original one. The third column shows whether they are different or not (0 if different and 1 if equal).

To do list for next session:

  1. Make a large number of modified files, do a statistical study on them and find the difference between their histograms and the original file's.
  2. Continue studying MATLAB-jahad.
  3. Find out whether making all the numbers positive is correct or not.

Sunday, August 3, 2008

F.Session3

In this session I've studied sections 3 and 4 of MATLAB-jahad. The main ideas in these sections are explained below:
1. Solving a system of equations with matrices in MATLAB: we can solve equations easily with two matrices, one holding the coefficients and the other holding the known numbers (the numbers on the right side of every equation). We can even solve systems in which the number of unknowns is greater than the number of equations.
2. Introducing matrix functions: det (determinant), inverse, pseudo-inverse, ...
3. Logic operations and their related functions: and, or, xor, not. You will also learn how to create matrices which are the results of comparing other matrices.
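A short sketch of the three points above. The numbers are made up and are not taken from MATLAB-jahad.

```matlab
% Square system: 2x + y = 3, x + 3y = 5
A = [2 1; 1 3];
b = [3; 5];
x = A \ b;                 % solves A*x = b
% Fewer equations than unknowns: x1 + x2 + x3 = 3
A2 = [1 1 1];
b2 = 3;
x2 = pinv(A2) * b2;        % pseudo-inverse gives the minimum-norm solution
% Comparing a matrix with a number gives a logical matrix
m = [1 6; 7 2] > 5;
disp(x')
disp(x2')
disp(m)
```

For the underdetermined system there are infinitely many solutions; the pseudo-inverse returns the one with the smallest norm.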

Also in this session I tried to follow Parisa's suggestion. She suggested finding a way to export modified variables to audio files. She believes that there should be something like this in MATLAB, but so far I haven't found anything. I found "export" in the "Signal processing tool" and also "save as" in both the "Signal processing tool" and the "File" menu, but they didn't work.
While I was searching for a way to export data I found the "Signal processing tool", which is accessible from the "Start" menu. This tool seems really powerful. We can import MATLAB variables into it. Then it plots the variable and also its spectrum. I think another useful option is the ability to play the variable as sound.
As I said in the previous session (my previous post), I modified the original data and converted it to a vector of 0-800. Now I imported the original data and the modified data into the "Signal processing tool" and compared them. I noticed that there's no significant difference between them. The only difference is that the modified data has more energy than the original. In fact you'll hear it at a higher volume.
So at this moment I'm thinking about 2 main ideas:
1. Finding a way (e.g. a C program) to separate the bytes of an audio file, modify them easily and then save the file.
2. Modifying the data in MATLAB, changing every number of the data vector based on the secret data, then bringing it back to its own energy (by subtracting 412 and dividing by 1000) and comparing it with the original data by means of the "Signal processing tool" in MATLAB.
this picture shows this powerful tool:


To do list for next session:
1. thinking about 2 statements above
2. continue studying MATLAB-jahad from section5

Friday, August 1, 2008

F.Session2

In my opinion the m-file which Sahar sent to me was not useful. You can download the file.
Section 2 of MATLAB-Jahad was all about matrices: how to create them in a short way, algebraic and mathematical operations with matrices, deleting or selecting a particular part of a matrix, and finding elements of a matrix. For example, find(a>5) returns the indices of all the elements of matrix a that are greater than 5.
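A small example of "find" with made-up numbers:

```matlab
a    = [3 8 5 9];
idx  = find(a > 5);   % positions of the elements greater than 5
vals = a(idx);        % the elements themselves
disp(idx)
disp(vals)
```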
I've designed an m-file (see it below) which converts double floating-point numbers to integers between 0 and 800, but when I plotted the histogram it wasn't very good, I think!

m-file:
data*1000;
ans+412;
round(ans);
hist(ans);

You can see the results in pictures below:(1.plot(ans) 2.hist(ans))

1.


2.



To do list for next session:
Continue studying MATLAB from section3.
Think on writing an m-file which can plot a better histogram.

Recognizing audio files containing hidden data, using FPGAs

Project outline: It might sound strange that the music file you enjoy listening to may contain secret data! As an example, we believe that if someone substitutes the LSB of every byte in an audio file with the bits of his secret data, the file changes but the human ear cannot recognize the difference made by this substitution.
In this way it's possible to hide secret data in an ordinary music file and exchange it over an insecure channel. An eavesdropper on the channel usually cannot even guess that there is secret data in the file.
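The substitution idea can be sketched in a few lines of MATLAB. The byte values and the secret bits are made up, and the names "cover", "stego" and "recovered" are chosen only for this example.

```matlab
cover = uint8([200 13 54 255 0 97]');   % made-up 8-bit samples
msg   = [1 0 1 1 0 0]';                 % made-up secret bits
stego = cover;
for i = 1:numel(msg)
    stego(i) = bitset(cover(i), 1, msg(i));   % overwrite the LSB
end
recovered = double(bitget(stego, 1));   % read the LSB back
maxChange = max(abs(double(stego) - double(cover)));
disp([cover stego])
```

Each byte changes by at most 1, which is why the difference is so hard to hear, and the receiver gets the bits back simply by reading the LSBs.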

History of this strategy: previous similar works are mostly in the field of image processing. In fact this investigation is based on similar works with images.
At this point we hope that by plotting the histogram of an audio file we'll be able to distinguish a modified file from an unmodified original. We believe that there's a correlation between columns in the histogram of an original file which does not hold for a modified one.
Our final goal: in the end we hope to design an FPGA-based system which can recognize audio files containing hidden data.

Project explanation:
At first we tried to use MATLAB and its signal processing tools to analyze modified and unmodified files. We should do a statistical study to find the main point of difference and apply it to our system.
The first step is to find the bytes which make up the original audio file and plot their histogram.
One way is to use "import data" in MATLAB. But we should notice that the default number format in MATLAB is double-precision floating point, and it imports an audio file into a matrix of 1 column (if mono) or 2 columns (if stereo). The numbers are between -1 and 1 and in double format. But these kinds of numbers aren't suitable for plotting a histogram, so we need another way to create a vector of suitable numbers. Maybe it will be suitable to multiply all the numbers by 1000 and then round them simply by applying "data=round(data)" in MATLAB. Then we can plot the histogram with this instruction:
hist(data)
At this point it became clear that we need more specialized knowledge of MATLAB and its "signal processing tools", so in the next section we will make you more familiar with signal processing in MATLAB.

Main primary points about MATLAB:
You can save all the variables currently in the workspace and load them the next time you need them by "save" and "load" commands.
Notice that "format" command only affects output display formats of numbers and does not affect how MATLAB computations are done.
You can change format of numbers using this command:
>> a=uint8(a);


To do list for next session:

  • Continue studying MATLAB-jahad section2
  • The file sent by Sahar which shows bytes of audio files

Wednesday 2008/31/7