Petter Reinholdtsen

Legal to share more than 11,000 movies listed on IMDB?
7th January 2018

I've continued to track down list of movies that are legal to distribute on the Internet, and identified more than 11,000 title IDs in The Internet Movie Database (IMDB) so far. Most of them (57%) are feature films from USA published before 1923. I've also tracked down more than 24,000 movies I have not yet been able to map to IMDB title ID, so the real number could be a lot higher. According to the front web page for Retro Film Vault, there are 44,000 public domain films, so I guess there are still some left to identify.

The complete data set is available from a public git repository, including the scripts used to create it. Most of the data is collected using web scraping, for example from the "product catalog" of companies selling copies of public domain movies, but any source I find believable is used. I've so far had to throw out three sources because I did not trust the public domain status of the movies listed.

Anyway, this is the summary of the 28 collected data sources so far:

 2352 entries (   66 unique) with and 15983 without IMDB title ID in free-movies-archive-org-search.json
 2302 entries (  120 unique) with and     0 without IMDB title ID in free-movies-archive-org-wikidata.json
  195 entries (   63 unique) with and   200 without IMDB title ID in free-movies-cinemovies.json
   89 entries (   52 unique) with and    38 without IMDB title ID in free-movies-creative-commons.json
  344 entries (   28 unique) with and   655 without IMDB title ID in free-movies-fesfilm.json
  668 entries (  209 unique) with and  1064 without IMDB title ID in free-movies-filmchest-com.json
  830 entries (   21 unique) with and     0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
   19 entries (   19 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-gb.json
 6822 entries ( 6669 unique) with and     0 without IMDB title ID in free-movies-imdb-c-expired-us.json
  137 entries (    0 unique) with and     0 without IMDB title ID in free-movies-imdb-externlist.json
 1205 entries (   57 unique) with and     0 without IMDB title ID in free-movies-imdb-pd.json
   84 entries (   20 unique) with and   167 without IMDB title ID in free-movies-infodigi-pd.json
  158 entries (  135 unique) with and     0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json
  113 entries (    4 unique) with and     0 without IMDB title ID in free-movies-letterboxd-pd.json
  182 entries (  100 unique) with and     0 without IMDB title ID in free-movies-letterboxd-silent.json
  229 entries (   87 unique) with and     1 without IMDB title ID in free-movies-manual.json
   44 entries (    2 unique) with and    64 without IMDB title ID in free-movies-openflix.json
  291 entries (   33 unique) with and   474 without IMDB title ID in free-movies-profilms-pd.json
  211 entries (    7 unique) with and     0 without IMDB title ID in free-movies-publicdomainmovies-info.json
 1232 entries (   57 unique) with and  1875 without IMDB title ID in free-movies-publicdomainmovies-net.json
   46 entries (   13 unique) with and    81 without IMDB title ID in free-movies-publicdomainreview.json
  698 entries (   64 unique) with and   118 without IMDB title ID in free-movies-publicdomaintorrents.json
 1758 entries (  882 unique) with and  3786 without IMDB title ID in free-movies-retrofilmvault.json
   16 entries (    0 unique) with and     0 without IMDB title ID in free-movies-thehillproductions.json
   63 entries (   16 unique) with and   141 without IMDB title ID in free-movies-vodo.json
11583 unique IMDB title IDs in total, 8724 only in one list, 24647 without IMDB title ID

I keep finding more data sources. I found the cinemovies source just a few days ago, and as you can see from the summary, it extended my list with 63 movies. Check out the mklist-* scripts in the git repository if you are curious how the lists are created. Many of the titles are extracted using searches on IMDB, where I look for the title and year, and accept search results with only one movie listed if the year matches. This allow me to automatically use many lists of movies without IMDB title ID references at the cost of increasing the risk of wrongly identify a IMDB title ID as public domain. So far my random manual checks have indicated that the method is solid, but I really wish all lists of public domain movies would include unique movie identifier like the IMDB title ID. It would make the job of counting movies in the public domain a lot easier.

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Tags: english, opphavsrett, verkidetfri.

Created by Chronicle v4.6