A full snapshot (11 GB) of the Danish Government's company database, covering approximately 1,000,000 companies, has been leaked via the torrent site thepiratebay. The release appears to be motivated by support for Aaron Swartz and by opposition to the Danish government requiring citizens to provide data for government databases.
The press release is as follows:
The files in this torrent contain a snapshot of the Danish Government database of companies. "CVR, Det Centrale Virksomhedsregister" translates directly to "The Central Company Register". The contents of the database are currently browsable on the cvr.dk website, but the database is not available in bulk unless you purchase a license.
The snapshot was obtained during the summer of 2011 by systematically harvesting data from the public parts of the cvr.dk website.
CVRfull.zip: Archive containing xml files with company information, including html from cvr.dk
CVRCompact: As above, but without html
The included fields are as follows:
cvr: CVR-number (8-digit unique id, last digit is a checksum)
corporationtype: Integer denoting type of company, e.g. "10 Enkeltmandsvirksomhed" (Sole Proprietorship)
incorporated: Date of registration
dissolved: Date of dissolution, if dissolved
industry: Code of the company's main areas of business, e.g. "494100 Vejgodstransport" (Transport of goods by road)
documentcontent: HTML of company page from cvr.dk (minus header and footer), only available in the "full" version
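The cvr field above is described as ending in a checksum digit. The following is a minimal sketch of how such a number could be validated, assuming the commonly documented modulus-11 scheme with weights 2, 7, 6, 5, 4, 3, 2, 1. That scheme is not stated in the press release, and the function name is_valid_cvr is hypothetical.

```python
# Assumed modulus-11 weights for the 8 digits; not taken from the press release.
CVR_WEIGHTS = (2, 7, 6, 5, 4, 3, 2, 1)


def is_valid_cvr(cvr: str) -> bool:
    """Return True if cvr is 8 ASCII digits whose weighted sum is divisible by 11."""
    if len(cvr) != 8 or not (cvr.isascii() and cvr.isdigit()):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(cvr, CVR_WEIGHTS))
    return total % 11 == 0
```

A check like this only confirms that a number is well-formed; it says nothing about whether a company with that number exists in the dataset.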
The other fields are name, address, phone, fax and email; they should be self-explanatory. If you're only interested in the information in these fields you should just get the compact file. If you want to parse more info out of the page you should get the full version, which includes HTML.
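As a rough illustration of reading the fields listed above, here is a sketch that parses one record with Python's standard library. The press release does not describe the XML layout, so the root element name, the one-child-element-per-field structure, and every value in the sample are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Field names as listed in the press release (compact version, no documentcontent).
COMPACT_FIELDS = (
    "cvr", "name", "address", "phone", "fax", "email",
    "corporationtype", "incorporated", "dissolved", "industry",
)


def parse_company(xml_text: str) -> dict:
    """Map each known field to its text, or None if missing or empty.

    Assumes each field is a direct child element of the root (hypothetical layout).
    """
    root = ET.fromstring(xml_text)
    record = {}
    for field in COMPACT_FIELDS:
        node = root.find(field)
        has_text = node is not None and node.text and node.text.strip()
        record[field] = node.text.strip() if has_text else None
    return record


# Invented sample record; the values do not come from the dataset.
SAMPLE = """
<company>
  <cvr>12345678</cvr>
  <name>Example Transport</name>
  <corporationtype>10 Enkeltmandsvirksomhed</corporationtype>
  <incorporated>2001-01-01</incorporated>
  <dissolved/>
  <industry>494100 Vejgodstransport</industry>
</company>
"""

record = parse_company(SAMPLE)
```

If the real files use attributes, namespaces, or a wrapper element around many companies, the lookup in parse_company would need to be adjusted accordingly.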
There are approximately 1,000,000 companies in the dataset. CVR reports 550,000 companies in existence, but that figure likely excludes the dissolved ones.
This data is made freely available because it is wrong for the Danish government to require citizens to provide data for government databases, then use taxpayer money to gather, collate and store that data, only to ask citizens to pay if they want access to that same information from the government.
Free Aaron Swartz
The complete leak can be downloaded from https://thepiratebay.org/torrent/6619217/