2011-03-04: Personal Digital Archiving Conference 2011

Last week, along with Dr. Nelson, I attended the second annual Personal Digital Archiving conference, held at the Internet Archive in the heart of the foggy city, San Francisco. The weather was not on our side: the sunny state was facing its worst weather in quite a while. This didn't dampen my spirits, as I was excited to be in a room with experts and passionate geniuses whose collective IQ could cause an integer overflow!



The general atmosphere was really nice; participants were very friendly and eager to introduce themselves and get to know you. I was exposed to a ton of ideas, projects and insights, sometimes over coffee and other times just going up and down the stairs. My only regret is that I didn't have business cards of my own, even though I collected a bunch from others; I have got to get me some of these!

So that readers can relive this experience with me, I have divided the conference into its two days, each of which is in turn divided into sessions. I will try to highlight a thing or two from each session, and I will try to find the videos for the entire conference.

Day 1:

At 9 am the conference started with Brewster Kahle and Jeff Ubois introducing Cathy Marshall from Microsoft Research, who gave an amazing keynote entitled "People are People and things change". It was a really insightful talk summarizing the problems we face in dealing with data and backups. She gave examples from her own experience: her computer and the process of "not" backing it up, and her tweets, which needed to be backed up as well. She noted that even when we back up our stuff, we tend to replicate entire folders and make copies rather than maintain an organized list of the resources, and that people always think archiving should be done by someone else.

Gary Wright from FamilySearch shared insights from his paper "Preserving your family history records digitally" (Legacy Dox) on best archiving practices. He also introduced the Millenniata Disc for data preservation. Jeremy Leighton John from the British Library came up next and discussed ways of processing digital manuscripts. Evan Carroll, author and founder of TheDigitalBeyond, followed. He discussed a very interesting question: what happens to your digital assets when you die? Who gains access to them? Do you want them to be destroyed? He also discussed why certain assets grow in importance because of the sentimental value behind them.

After the break, Ellysa Cahoy and Scott McDonald from Penn State University presented their ideas and a project through which faculty members can take part in the archiving process at their own level. Judith Zissman used her software design expertise to discuss a very interesting idea: Agile Archiving. She discussed how to apply the Agile Manifesto to personal archiving. I wish she had given more examples, though. After that, Stan James, who later became my friend, introduced “The Smallest Day”, a project Stan and his father set up together to collect, archive, arrange, tag and connect all their family photos, documents, letters, postcards and more. It is an awesome project that uses lots of interesting technologies, such as Dragon voice-recognition software, Mechanical Turk, Ancestry.com, etc. Lori Kendall followed, discussing and defining the concept of “personal” with regard to archiving, using her ancestors' photos as an example. Jason Scott followed with a wonderful speech, first presenting himself as a collector and then arguing that it is not enough to keep things safe; we also need to make those collections accessible and available online. He discussed in agony the catastrophe of GeoCities being shut down and its consequences. On a side note, Jason Scott’s cat has 1.4 million followers on Twitter and ranks among the top 200 most-followed accounts!


Lunch was next; it was quite refreshing to discuss ideas and thoughts with other people. We took a tour of the Internet Archive and I took some photos. We saw microfilm readers and book scanners (not the first time I had seen one; I saw a whole bunch of them at the Alexandria Library). After lunch, Birkin Diana from Brown University talked about metadata for archived items and ways to enhance it. Kate Legg from NCAR (National Center for Atmospheric Research) presented the center’s project to make archived content easier to access, citing their Warren Washington collection as a model that could be adopted for subsequent collections. Jay Datema from Bookism discussed compatibility and the creation of archiving standards (he suggested an archive.txt file along the lines of robots.txt and humans.txt). He also talked about Jeremy John's paper “The future of saving our past”. The next session was by Ben Gross and Evan Prodromou, who started by discussing the social data aspect of archiving. They discussed FOAF, SIOC, ATOM, OpenDD, POCO and Activity Streams. Marc A. Smith gave a very interesting talk on where an individual is located in the social graph. He used MapXL and NodeXL to visualize the US Senate, and to my surprise (and given my lack of political knowledge) clusters appeared that showed the right and left wings. Ray Larson from Berkeley came up next and introduced the SNAC project, also discussing authority control and its two problem cases: several names for one person, and several people with the same name.

Finance and economics were the theme of the next panel, where Jeff Ubois, Brewster Kahle, Steve Griffin and David Rosenthal discussed the costs of archiving and showed that the bottleneck is the scanning process, as 80% of the cost goes to the human part of the process. The figure quoted was 10 cents per page, and they noted that archiving a box of paper can cost from $200 to $680. There are also pay-once models like Presto Prime, which charges $2000 per terabyte to preserve data forever. Notice that hardware is the smallest cost, at merely $50/TB. Rosenthal also introduced the LOCKSS system. The closing keynote was given by Brian Fitzpatrick from Google and DataLiberation.org. He showed statistics on last month's Internet cut-off in Egypt as an example of control over data. He argued that data must be freed from the framework beneath it and that every product should offer an Import/Export button.

After dinner and the reception, the demo session began with Joanne Lang and her project “About One”, an amazing tool for gathering a family's information, data, documents and content to help organize and manage life. Its slogan was that small pieces of information can build a connected life. Michael Ashenfelder from the Library of Congress talked next. Debbie Weissman discussed the possibility of claiming ownership of preserved content. Laura Welcher from the Long Now Foundation came up next and introduced the Rosetta Project, which aims to archive 7000 languages out of concern that some of them may go extinct. She also introduced a hierarchy of languages, language commons and sources in a wiki-like format. Susan Kostal from San Francisco magazine discussed the concept of digital hoarding and its relation to physical hoarding. Then Jonathan Good (with whom I had a very nice chat earlier about the Egyptian revolution) demoed his project 1000memories.com, which lets friends and loved ones remember a person who has passed away by collectively gathering his or her photos and testimonials, or even starting a grant in his or her name. He also showed a page dedicated to the 384 martyrs who died in the Egyptian revolution, with each photo linked to a profile so that people don’t forget who they were and can get to know what their lives were like. The day concluded with Denim Smith’s presentation on his project “My Internet Cooperation”.

Day 2:

The second day was also really interesting but definitely shorter. It started with a keynote by Clifford Lynch, who gave a very insightful talk about the different ways personal documents become exposed to the public. He discussed how the concept of personal archiving evolved from private shards accessed individually, to content shared through social media, until it finally reached the public domain. He also argued that many digital media need an archive “button”.

Daniel Reetz of the DIY Book Scanner gave a very interesting talk with a different focus: cameras and their technologies. It all started when he made “an instructable” on how to build a cheap book scanner. He discussed how cameras vary in capability and how their production is shaped by users' demands for enhancements. He argued that users often want the best-looking modified photo rather than the best “real” one: slimming cameras, face-enhancement lenses, etc. He wondered whether it would have been better to invest in adding “document capturing” capabilities to cameras, perhaps OCR too. Dwight Swanson followed, discussing home movies, their evolution and their archiving. Rich Gibson from the Gigapan Project showed how extreme close-up images can tell more stories (as in an Italian fashion runway shot where you can zoom in on the designer's name on a tag).

After the break, two poets and writers, Devin Becker and Collier Nogues, presented their survey of a broad group of writers on how they save, archive and organize their documents and writings. Hong Zhang, a PhD student from the University of Illinois, followed, and then Jason Zalinger from Rensselaer Polytechnic Institute in New York. Jason presented his study of possible enhancements to Google’s Gmail built on concepts such as a Forget label for unwanted emails, Digital Regret for undoing a send, Sleep on It for postponing confirmation of a send, a Word Cloud, etc. Aiden Doherty and Cathal Gurrin from Dublin City University presented a very intriguing concept, LifeLog: a small wearable device that logs data such as snapshots, GPS coordinates and temperature sensor readings and stores them in a searchable memory platform. They have been wearing these devices for the last 4.5 years!

Ted Nelson gave an awesome speech arguing that things would have been better had they been designed differently from the beginning, with documents on the computer as the biggest example. His slides didn’t work initially, but later that day he showed an amazing demo of Xanadu, a project he has been working on for a long time, which introduces a new data structure, multidimensional cells, that I found fascinating! Ed Feigenbaum from Stanford introduced SALT (Self Archiving Legacy Toolkit) and talked about the initiative they started at Stanford. Then Christina Engelbart, daughter of Douglas Engelbart (the inventor of the mouse), gave a presentation about her institute's work on collecting the digital artifacts of her father’s legacy.

As on the day before, lunch was a good opportunity to mix, mingle and exchange ideas. It helped a lot that it was sunny, so most people had lunch outside. When we came back, Cal Lee from the University of North Carolina introduced the forensics aspect of digital preservation. Richard Cox from the University of Pittsburgh followed, then Mark Matienzo from Yale University Library and Amelia Abreu from the University of Washington.

Gordon Bell from Microsoft Research came up next with a talk about his own experience with health records. He illustrated how, after he had a heart attack, he gathered all his records, going back decades to the very first one, to get a better picture of his health. His MyLifeBits project embodies this initiative. Khaled Hassounah came up next and introduced a very successful PHR (Personal Health Record) service named MedHelp. Then Linda Branagan from Medweb discussed the difference between an EMR (Electronic Medical Record) and a PHR.



After the final break, Elizabeth Churchill from Yahoo! Research led a panel discussing forensics in the digital world. Kam Woods from the University of North Carolina presented the Forensic Toolkit Imager, and Sam Miester from the University of Maryland discussed data from failed businesses, such as the Sherwood case.

As a grand finale, the author Rudy Rucker gave a very interesting talk, full of insight, humor and sarcasm, on digital immortality through a digital replica of one's thoughts and memories, which he calls the LifeBox. The LifeBox acts as a bot that imitates your responses, answering questions and giving opinions based on your thoughts and memories; it can live on forever, even after your death, for your great-grandchildren.

In summary, it was an amazing conference, not just because of the 48 sessions I attended but because it gave me a priceless opportunity to meet these bright individuals and broaden my horizons. In fact, it inspired several ideas for my thesis proposal!

On another note, I found out that the size of the internet is 20x8x8 ft and that it is located in a parking lot in Santa Clara, California.

For more about the conference from other perspectives, please check out Collin Thorman's blog posts, the Library of Congress's news page, Dick Eastman's blog, Christina Engelbart's Collective IQ post, Ellysa Cahoy's blog, Don Hawkins's article, The Waki Librarian's posts for day 1 and day 2, and #PDA2011 on Twitter.

(2011-03-20 Update:) I have added links to the video recording of each session. Photos from the conference can be seen here, courtesy of The Internet Archive and Jeff Ubois.

-- Hany SalahEldeen
