You have probably heard the old idiom that anything posted on the internet is there forever. Unfortunately, it is more than a turn of phrase. The internet makes it easy to save and redistribute anything that has been published, so even after you delete information from a webpage, copies often survive in caches, backups, or third-party archives and can be recovered with enough effort.
This is rarely an issue if the information only ever lived on a domain strictly under your control, but that is not always the case. A third party can place information about you or your company on a domain you have no control over. Worse, even if you post information to a domain that you control and later delete the domain, you still risk losing control of the data.
This is because there are websites dedicated to archiving other websites so people can continue browsing their content. The main culprit of this archiving phenomenon is a website known as archive.org, a website whose name explains it all.
Archive.org specializes in saving copies of websites for users who want to browse them even after they are no longer maintained. Unfortunately, a saved site might be one you wanted deleted for good. If archive.org has kept it available anyway, you are probably looking for a way to finish the job. This article explains the steps that give you the best chance of purging pages from archive.org and its Wayback Machine.
What is the Wayback Machine?
If you are not familiar with the deepest recesses of the internet, you might not be aware of what the Wayback Machine is or what it does. The answer is simpler than you might imagine. The Wayback Machine is the name archive.org gives to its archive of saved web pages, including pages from long-defunct domains. The service captures copies of a site's pages while the site is online and lets users browse those copies by typing the domain name into the Wayback Machine’s search bar.
Once you have typed the name of the website you are looking for in the Wayback Machine’s search bar, it will pull up the corresponding results. There are limitations to what the Wayback Machine can store. For example, downloads once hosted on a site usually do not work in the archived copy. However, most of the information that appeared on an archived domain is readily accessible once you have looked it up, which means anything that was once public on the domain is available again to any curious user today.
The Wayback Machine was built on good intentions: ensuring that no information is lost. However, some information can do damage if it stays available. Generally, the worst effect of the Wayback Machine is that it gives users access to outdated information that could hurt your reputation and cost you personally or professionally. This raises the question of whether it is possible to remove content from archive.org.
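If you would rather check for archived copies from a script than from the search bar, the lookup can be sketched in Python. This is only an illustration: it assumes the public availability endpoint at archive.org/wayback/available and its JSON response shape, and the sample response below is made up for demonstration rather than fetched live.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoint for checking whether a capture exists.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(domain: str) -> str:
    """Build the lookup URL for a domain (no network access happens here)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': domain})}"


def closest_snapshot(response_text: str):
    """Return (timestamp, snapshot_url) for the closest capture, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


# Hypothetical response body, shaped like the endpoint's JSON output.
SAMPLE_RESPONSE = """
{"url": "example.com",
 "archived_snapshots": {"closest": {
   "status": "200", "available": true,
   "url": "http://web.archive.org/web/20130919044612/http://example.com/",
   "timestamp": "20130919044612"}}}
"""

if __name__ == "__main__":
    print(availability_url("example.com"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

To run it against your own domain, you would fetch the URL returned by `availability_url` with any HTTP client and pass the response body to `closest_snapshot`. A result of `None` suggests no capture was found for that address.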
Step #1: Block Future Access from the Internet Archive
It is possible to remove information from archive.org so your website’s information is not used against you. However, it is a complicated and time-consuming process requiring you to adjust your website’s code.
The first step to neutralizing the Wayback Machine’s access to your domain is to update the robots.txt file at the root of your domain. The robots.txt file tells automated programs, such as crawlers, which parts of your website they may access. Archive.org has a less than favorable opinion of robots.txt files and compliance is voluntary rather than guaranteed, but an exclusion rule is still the standard way to signal that you do not want your site crawled.
Your website may already have a robots.txt file. If it does not, you can create one. The file itself is a plain text document, as you likely gathered from the file extension, and can be edited with virtually any text editor. If you use a word processor, make sure you save the file as plain text. This includes applications such as:
- Microsoft Word
Using one of these text editors, you can open the robots.txt file and edit the contents. While making the edits, take care not to delete any entries in the text file. Doing so could compromise the effectiveness of your robots.txt file and open your domain up to automated traffic. Instead, you should add two new lines to the end of your text file. These two lines should be written as:
User-agent: ia_archiver
Disallow: /
Follow that exact format and save the edit to your robots.txt file. Once that is complete, archive.org and its Wayback Machine have been told not to crawl your site going forward. This is more of a preventative measure, since any versions saved before you added these lines to your robots.txt file are still fair game. Fortunately, there are other methods you can use to have information taken down instead.
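If you prefer to script the change, the edit can be sketched in Python. This is a minimal illustration, not an official procedure: it assumes the two lines are the `User-agent: ia_archiver` and `Disallow: /` pair historically associated with the Internet Archive's crawler, and the existing file contents shown are hypothetical. The helper names are ours. The check at the end uses Python's standard robots.txt parser to show how a standards-following crawler would read the result.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule pair; "ia_archiver" is the user-agent name historically
# associated with the Internet Archive's crawler.
ARCHIVE_RULES = ["User-agent: ia_archiver", "Disallow: /"]


def add_archive_block(robots_text: str) -> str:
    """Append the rule pair, keeping every existing entry untouched."""
    if "user-agent: ia_archiver" in robots_text.lower():
        return robots_text  # already present, nothing to add
    body = robots_text.rstrip("\n")
    # A blank line separates the new record from the existing ones.
    separator = "\n\n" if body else ""
    return body + separator + "\n".join(ARCHIVE_RULES) + "\n"


def is_allowed(robots_text: str, agent: str, url: str) -> bool:
    """Check how a standards-following parser reads the file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical existing file, for illustration only.
existing = "User-agent: *\nDisallow: /private/\n"
updated = add_archive_block(existing)

if __name__ == "__main__":
    print(updated)
    print(is_allowed(updated, "ia_archiver", "https://example.com/about"))
```

The function only appends, which matches the advice above about never deleting existing entries, and it leaves the file alone if the rule is already there.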
Step #2: Issue a DMCA Removal Request, if Possible
Fortunately, no matter what your website contains, archive.org is still bound by federal law that lets you protect your intellectual property. One of the tools the United States government created for this purpose is the Digital Millennium Copyright Act (DMCA).
The DMCA gives you a way to request that content you hold the copyright to be taken down from third-party websites, including archive.org. However, taking advantage of DMCA protections properly usually calls for more detailed advice from a legal professional. The DMCA is a legal matter, and handling it alone is difficult and risky.
At this point, we feel it is important to clarify that we are not attorneys and cannot provide legal advice. We can only offer the use of DMCA takedowns as an option if you have intellectual property stored on your domain. You must consult a local attorney first to determine if you have a DMCA claim. Once you have consulted with your attorney, you can fill out a DMCA takedown letter that you will send to archive.org’s owners.
If the DMCA claim is valid, archive.org will have little choice but to comply with the takedown request or risk legal liability. That said, the DMCA will not always be applicable, and whether it is depends on your circumstances. Fortunately, the DMCA is not the only tool available for taking the information down.
Step #3: Ask to Have the Content Removed
Even without a DMCA claim, archive.org is generally expected to honor the wishes of the domain’s original owner. If you are adamant about taking down an archived version of your website, you can put in a request for them to take the page down. However, this does mean you will have to contact archive.org directly to request the page’s removal. Your best bet is to send an e-mail to archive.org after you have completed the previous step. Your odds of having archive.org act on your request increase if the e-mail comes from an address linked to the domain you are inquiring about.
For example, removing a Google page would be best accomplished by sending an e-mail from an address linked to Google. A free e-mail service like Gmail or Outlook could slow your request considerably, whereas a business e-mail would be more effective. You will also want to write the e-mail as professionally as possible.
For example, consider drafting it as such:
Subject: Request to Remove <Domain Name> From Internet Archive Wayback Machine
To Whom It May Concern,
My name is <Your Name>, the owner of <Domain Name>. I am hereby requesting the removal of <Domain Name> content from <Date of Oldest Archive> up to the present date and forward.
Please find attached a formal DMCA notice and evidence of my ownership over <Domain Name>.
Thank you for your prompt response.
Remember that the above draft is a recommendation rather than a strict guide on composing your e-mail. If you consult with an attorney, they might be able to draft the message for you, as can any third-party agency you contract. Being polite and well-spoken will help improve your chances of cooperation from archive.org. You might also have noted the line about evidence of ownership over the domain. That will factor into the next step you will need to take.
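If you send requests for more than one domain, filling in the draft by hand gets tedious. The sketch below shows one way to populate the placeholders with Python's standard `email` library. The function name, the sample values, and the recipient address are all hypothetical; use the contact address archive.org currently publishes.

```python
from email.message import EmailMessage

# Body text taken from the draft above, with placeholders as format fields.
TEMPLATE = (
    "To Whom It May Concern,\n\n"
    "My name is {name}, the owner of {domain}. I am hereby requesting the "
    "removal of {domain} content from {oldest} up to the present date and "
    "forward.\n\n"
    "Please find attached a formal DMCA notice and evidence of my ownership "
    "over {domain}.\n\n"
    "Thank you for your prompt response.\n"
)


def build_request(name, domain, oldest, sender, recipient):
    """Fill in the draft and return it as an unsent e-mail message."""
    msg = EmailMessage()
    msg["Subject"] = (
        f"Request to Remove {domain} From Internet Archive Wayback Machine"
    )
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(TEMPLATE.format(name=name, domain=domain, oldest=oldest))
    return msg


# Hypothetical values for illustration only.
request = build_request(
    name="Jane Doe",
    domain="example.com",
    oldest="January 1, 2015",
    sender="jane@example.com",        # an address on the domain itself
    recipient="removals@example.org",  # placeholder, not a real contact
)

if __name__ == "__main__":
    print(request["Subject"])
    print(request.get_content())
```

The message is only built, not sent. Attachments such as your DMCA notice or an invoice could be added with `EmailMessage.add_attachment` before handing the message to your mail client or an SMTP library.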
Step #4: Prove Ownership of the Domain
To give your takedown request authority, you must prove that you are the owner of the domain you want removed from the Wayback Machine. Unfortunately, archive.org does not maintain records of who owns each website and will not know that you own the domain unless informed. This means you will need to locate documentary evidence that you own and are responsible for the domain in question. Fortunately, there should be no shortage of evidence if you have a working knowledge of your domain’s hosting service.
The best evidence will be a receipt or invoice from the host for your domain that lists you as their client and owner of the domain. Fortunately, most providers offer a comprehensive history of these invoices that you can access at any time and attach to your correspondence with archive.org as proof of ownership. In a worst-case scenario, you might have to send an e-mail to the hosting provider to get a copy of the last invoice, but this should not take long.
The evidence you attach to the e-mail to archive.org will be compared to public domain registration records, meaning any discrepancies will result in a denied takedown request. If the records and your evidence agree, your chances of an approved request increase. It is possible to send in the request without evidence, but archive.org will almost certainly respond by asking for proof of ownership. Ultimately, it is easier and quicker to attach the proof from the beginning.
Once you have completed your request with evidence, there is only one thing left to do. You will have to play the waiting game.
Step #5: Wait for a Response, and Follow Up
Sending in a request is simpler than you might have expected, but the time it takes to get an answer is less straightforward. Takedown requests are not instantaneous, and the time it takes for a request to be processed and the page removed can vary wildly. You can generally expect archive.org to reply to your request, but it might take a few days to hear back as they field your request and any others they receive.
If you are concerned about your request, we recommend that you send a follow-up e-mail to archive.org if you do not hear back within 72 hours. However, remember that archive.org is based in California and operates on U.S. Pacific Time, so you must account for the different time zones. They also do not work through weekends or on U.S. holidays. Keeping these details in mind might explain why a response is not as forthcoming as you hoped. That said, it is important to remember that if this seems daunting, it is possible to retain professional assistance.
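To decide when a follow-up is reasonable, you can count working days rather than calendar days. The sketch below is one simple interpretation of the advice above: it treats the 72-hour window as three business days and skips Saturdays and Sundays. It does not know about U.S. holidays or time-of-day differences, so treat the result as a rough guide. The function name and default are our own.

```python
from datetime import date, timedelta


def follow_up_date(sent: date, business_days: int = 3) -> date:
    """Return the date `business_days` weekdays after `sent`.

    Weekends are skipped; U.S. holidays are NOT accounted for.
    """
    current = sent
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


if __name__ == "__main__":
    # A request sent on a Thursday is not due for follow-up until
    # the following week, because the weekend does not count.
    print(follow_up_date(date(2024, 5, 9)))
```

If a U.S. holiday falls inside the window, push the date out by one more working day.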
Take Back Your Reputation!
While archive.org is not a malicious website, its Wayback Machine can complicate your attempts to cultivate a specific reputation. Fortunately, despite the internet’s power to preserve the information you attempt to delete, archive.org provides ways to have pages taken down. The burden of proof lies with you, so you must confirm that you are the domain’s original owner. Once you have accomplished this, sending a takedown request is comparatively simple. However, you have options if you feel overwhelmed and want support in sending the takedown request.
We at Reputation, for example, can assist in taking down pages on archive.org’s website. We offer a wide array of reputation marketing services to protect you from harmful press. As your representatives, we could process the request on your behalf so you can focus on your main work. Managing a reputation is difficult enough without battling websites like archive.org over archived content. So, if you need help protecting your domain and image, visit our website and take your reputation back!