Changes: Building Community around Open Source vs. Open Data

Revision as of 19:09, 15 July 2012

host: Michael Burnstein

notes: Andrew Davis

Similarities and differences between open data communities and open source communities

open data communities have more internal debate over licensing
the field is new and there has not been enough litigation for businesses to gauge risk
copyright restrictions are weightier for data vs. software

- in the US a database of (uncopyrightable) facts cannot itself be copyrighted -- aggregation alone isn't "creative"
- in Europe aggregation is considered to be a creative task (this has hindered open data startups from starting in Europe)
- in the US you can extract the facts from a dataset and not be liable for copyright infringement -- the copyright situation in Europe is more tenuous
- open data communities are difficult to start because rules are not the same all over the world
- restrictions require that those joining or using data startups be known personally, to ensure that they are not litigious
- OpenStreetMap had to abandon a viral CC license model

How do we incentivize contribution to Open Data?

people will put their data into the public domain to contribute to a specific cause

You don't have to appeal to "what's right" or "social good"

- bringing business to open data requires showing business benefits
- Opening research data

  - peer review in the public space is one way of convincing academia
  - getting rid of database access fees is something academics are highly interested in
  - scandals over falsified data have raised public awareness of open data in the Netherlands (stoppel, stannel (sp?))

"don't try to evangelize non-geeks about the benefit of open data because they don't care" http://xkcd.com/743/

Read vs. Write Access in Open Data (access to data vs. contributing to data)

how do you verify public contribution?

- community verification (10 people agree, so it's probably right)
- trusted users
- community users can crowdsource coverage of data verification
- don't allow public access
- multiple repositories can be used to verify each other
- closed source code can restrict duplication and ensure quality demands are met
- humans must be involved, you can't automate all verification
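
The "10 people agree" and "trusted users" ideas above could be combined roughly as follows. This is only a sketch under assumptions not in the notes: the function name `verify_by_agreement`, the rule that each contributor gets one vote (their latest submission), and the policy that a trusted user's vote alone meets the threshold are all hypothetical. A `None` result stands for "not verified yet", which is where a human reviewer would step in.

```python
from collections import Counter


def verify_by_agreement(submissions, min_agreement=10, trusted=frozenset()):
    """Return the accepted value, or None if no value has enough support.

    submissions: iterable of (contributor, value) pairs.
    Assumption: one vote per contributor; a later submission replaces
    an earlier one from the same contributor.
    Assumption: a trusted contributor's vote counts as min_agreement votes.
    """
    latest = {}
    for contributor, value in submissions:
        latest[contributor] = value

    weights = Counter()
    for contributor, value in latest.items():
        weights[value] += min_agreement if contributor in trusted else 1

    if not weights:
        return None  # nothing submitted
    value, weight = weights.most_common(1)[0]
    return value if weight >= min_agreement else None  # None: needs human review
```

Counting each contributor once keeps a single account from reaching the threshold by resubmitting, though it does nothing against many colluding accounts.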
What happens when bots target a dataset for corruption?

- time thresholds are often used to prevent bot corruption
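
One possible reading of "time thresholds" is a per-contributor limit on edits within a sliding time window. The notes do not say which mechanism was meant, so the class below is a sketch of that one interpretation; the name `EditThrottle` and the default limits are hypothetical.

```python
from collections import defaultdict, deque


class EditThrottle:
    """Allow at most max_edits per contributor within window_seconds."""

    def __init__(self, max_edits=5, window_seconds=60.0):
        self.max_edits = max_edits
        self.window = window_seconds
        self._recent = defaultdict(deque)  # contributor -> accepted edit times

    def allow(self, contributor, now):
        """Return True and record the edit if the contributor is under the limit.

        now: timestamp in seconds, passed in by the caller so the
        behaviour is easy to test (e.g. time.monotonic() in practice).
        """
        recent = self._recent[contributor]
        # Forget edits that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_edits:
            return False
        recent.append(now)
        return True
```

Rejected edits are not recorded, so a blocked contributor regains access as soon as older edits fall out of the window.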

Public vs. Private Data

you must track the source of all data

medical data has certain fields that can never be shared
lawyer wiki is completely closed to allow for open discussion
is it enough to close or denormalize some data to the public to maintain privacy?
allow users to express their level of consent with clear wording
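
A rough sketch of how "fields that can never be shared" and a user's "level of consent" might interact when producing a public view of a record. The field names, consent levels, and the function `shareable_view` are invented for illustration and do not come from the session. Removing fields alone does not guarantee privacy, as the AOL example in these notes shows.

```python
# Hypothetical field names and consent levels, for illustration only.
NEVER_SHARED = frozenset({"name", "patient_id"})

CONSENT_FIELDS = {
    "none": frozenset(),
    "aggregate": frozenset({"age_band", "region"}),
    "research": frozenset({"age_band", "region", "diagnosis"}),
}


def shareable_view(record, consent):
    """Return only the fields the given consent level permits.

    Unknown consent levels share nothing (fail closed), and fields in
    NEVER_SHARED are dropped regardless of consent.
    """
    allowed = CONSENT_FIELDS.get(consent, frozenset()) - NEVER_SHARED
    return {key: value for key, value in record.items() if key in allowed}
```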

Examples

TriMet routing built upon OpenStreetMap

- TriMet is responsible for route and timetable accuracy
- TriMet is not responsible for map accuracy
- TriMet verifies the OpenStreetMap data every night and submits corrections back to the community
First Monday
AOL anonymized search query data was quickly de-anonymized

@@ Line 1: / Line 1: @@
-live notes: http://piratepad.net/XtHgXNz0bu