OT, but some of you might find this, or at least the approach, interesting...
European researchers at a security conference in Switzerland last week
demonstrated computer-based techniques that can identify blacked-out words
and phrases in confidential documents.
The researchers showed their software at the conference, called Eurocrypt,
by analyzing a presidential briefing memorandum released in April to the
commission investigating the Sept. 11 attacks. After analyzing the document,
they said they had high confidence the word "Egyptian" had been blacked out
in a passage describing the source of an intelligence report stating that
Osama Bin Ladin was planning an attack in the United States.
The researchers, David Naccache, the director of an information security lab
for Gemplus, a Luxembourg-based maker of banking and security cards, and
Claire Whelan, a computer science graduate student at Dublin City University
in Ireland, also applied the technique to a confidential Defense Department
memorandum on Iraqi military use of Hughes helicopters.
They said that although the name of a country had been blacked out in that
memorandum, their software showed that it was highly likely the document
named South Korea as having helped the Iraqis.
The challenge of identifying blacked-out words came to Naccache as he
watched television news on Easter weekend, he said in a telephone interview.
"The pictures of the blacked-out words appeared on my screen, and it piqued
my interest as a cryptographer," he said. He then discussed possible
solutions to the problem with Whelan, whom he is supervising as a graduate
adviser, and she quickly designed a series of software programs to use in
analyzing the documents.
Although Naccache is the director of a large information security
laboratory at Gemplus, he said that the research was done independently of his work there.
The technique he and Whelan developed first uses a program to
realign the document, which had been placed on a copying machine at a slight
angle. They determined that the document had been tilted by about half a degree.
Once the document was realigned, another program Whelan
had written could determine that it had been formatted in the Arial font. Next,
they measured the width, in pixels, of the blacked-out area in the sentence:
"An Egyptian Islamic Jihad (EIJ) operative told an xxxxxxxx service at the
same time that Bin Ladin was planning to exploit the operative's access to
the U.S. to mount a terrorist strike." They then used a computer to
determine the pixel length of words in the dictionary when written in the Arial font.
The program rejected every word whose length was not within three pixels of
the width of the blacked-out area.
The software then reduced the number of possible words to just seven from
1,530 by using semantic guidelines, including the grammatical context. The
researchers selected the word "Egyptian" from the seven possible words,
rejecting "Ukrainian" and "Ugandan," because those countries would be less
likely to have such information.
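The width-matching and grammar steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the researchers' software: the character widths are approximate Helvetica/Arial-style metrics in thousandths of an em, kerning is ignored, and the 8 px font size, the 31.0 px measured width, the short word list, and the helper names `text_width_px`, `width_candidates`, and `grammatical_filter` are all hypothetical. The grammatical check assumes the blank follows "an", as in the quoted sentence, and approximates that rule as "starts with a vowel letter".

```python
# Approximate per-character advance widths in 1/1000 em (Helvetica/Arial-style).
# ASSUMPTION: illustrative values, not metrics measured from the memo.
UPPER = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                 [667, 667, 722, 722, 667, 611, 778, 722, 278, 500, 667, 556, 833,
                  722, 778, 667, 778, 722, 667, 611, 722, 667, 944, 667, 667, 611]))
LOWER = dict(zip("abcdefghijklmnopqrstuvwxyz",
                 [556, 556, 500, 556, 556, 278, 556, 556, 222, 222, 500, 222, 833,
                  556, 556, 556, 556, 333, 500, 278, 556, 500, 722, 500, 500, 500]))
WIDTHS = {**UPPER, **LOWER}


def text_width_px(word: str, font_size_px: float) -> float:
    """Rendered width of `word` in pixels, summing advance widths (no kerning)."""
    return sum(WIDTHS[ch] for ch in word) * font_size_px / 1000.0


def width_candidates(target_px, dictionary, font_size_px, tolerance_px=3.0):
    """Keep words whose rendered width is within `tolerance_px` of the blacked-out width."""
    return [w for w in dictionary
            if abs(text_width_px(w, font_size_px) - target_px) <= tolerance_px]


def grammatical_filter(words, preceding_word):
    """Crude grammatical check: after "an", keep words starting with a vowel letter.

    This only approximates "an before a vowel sound"; for example it keeps
    "Ukrainian", which in speech takes "a".
    """
    if preceding_word.lower() == "an":
        return [w for w in words if w[0].lower() in "aeiou"]
    return words


# Hypothetical inputs: an 8 px font and a measured blacked-out width of 31.0 px.
DICTIONARY = ["Algerian", "Egyptian", "Iraqi", "Jordanian",
              "Pakistani", "Saudi", "Ugandan", "Ukrainian"]
by_width = width_candidates(31.0, DICTIONARY, font_size_px=8)
print(by_width)                            # ['Algerian', 'Egyptian', 'Pakistani', 'Ugandan', 'Ukrainian']
print(grammatical_filter(by_width, "an"))  # ['Algerian', 'Egyptian', 'Ugandan', 'Ukrainian']
```

The two filters narrow the list without settling it; choosing "Egyptian" over the other survivors still depends on judgment about context, as it did for the researchers.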
After the presentation at Eurocrypt, the researchers discussed possible
measures that government agencies could take to make identifying blacked-out
words more difficult, Naccache said in the phone interview. One possibility,
he said, would be for agencies to use optical character-recognition
technology to rescan documents and alter fonts.
In January, the State Department required that its documents use a more
modern font, Times New Roman, instead of Courier, Naccache said. Because
Courier is a monospace font, in which all letters are of the same width, the
length of a blacked-out word reveals only how many letters it contains, which
makes it harder to decipher with the computer technique. There is no indication
that the State Department knew that the switch made its redactions easier to
analyze.
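A short Python sketch shows why Courier resists the width test: every Courier glyph advances 600/1000 em, so a black bar's width maps to a letter count and nothing more. The 8 px font size, the 38.0 px measured width, the word list, and the helper names `mono_width_px`, `implied_letter_count`, and `mono_candidates` are hypothetical, chosen only for illustration.

```python
# In Courier every glyph advances 600/1000 em, regardless of the letter.
COURIER_ADVANCE = 600


def mono_width_px(word: str, font_size_px: float) -> float:
    """Width in pixels of `word` in a monospace font: letter count times one advance."""
    return len(word) * COURIER_ADVANCE * font_size_px / 1000.0


def implied_letter_count(target_px: float, font_size_px: float) -> int:
    """All a monospace redaction width reveals: how many characters it hides."""
    return round(target_px / (COURIER_ADVANCE * font_size_px / 1000.0))


def mono_candidates(target_px, dictionary, font_size_px, tolerance_px=3.0):
    """The article's three-pixel width test, applied to monospace widths."""
    return [w for w in dictionary
            if abs(mono_width_px(w, font_size_px) - target_px) <= tolerance_px]


# Hypothetical inputs: an 8 px font (4.8 px per character) and a 38.0 px black bar.
DICTIONARY = ["Algerian", "Austrian", "Canadian", "Egyptian", "Iraqi",
              "Nigerian", "Pakistani", "Tunisian", "Ugandan"]
print(implied_letter_count(38.0, 8))  # 8
print(mono_candidates(38.0, DICTIONARY, 8))
# ['Algerian', 'Austrian', 'Canadian', 'Egyptian', 'Nigerian', 'Tunisian']
```

Every eight-letter word survives with an identical width, so the measurement cannot rank them against each other; in a proportional font such as Arial or Times New Roman, the same words would generally render at different widths.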
Experts on the Freedom of Information Act said they feared the computer
technique might be used as an excuse by government agencies to release even
more restricted versions of documents.
"They have exposed a technique that may now become less and less useful as a
result," said Steven Aftergood, a senior research analyst at the Federation
of American Scientists, of the research project. "We care because there are
all kinds of things withheld by government agencies improperly."