Community Leadership Summit Wiki
live notes: http://piratepad.net/XtHgXNz0bu

Revision as of 19:09, 15 July 2012

host: Michael Burnstein

notes: Andrew Davis

Similarities and differences between open data communities and open source communities
  • open data communities have more internal debate over licensing
  • the field is new and there has not been enough litigation for businesses to gauge risk
  • copyright restrictions weigh more heavily on data than on software
    • in the US a database of facts cannot itself be copyrighted, because facts are not copyrightable and aggregation isn't "creative"
    • in Europe aggregation is considered to be a creative task (this has hindered open data startups from starting in Europe)
    • in the US you can extract the facts from a dataset and not be liable for copyright infringement -- the copyright situation in Europe is less certain
    • open data communities are difficult to start because rules are not the same all over the world
    • these restrictions mean that those joining or using data startups must be known personally, to ensure that they are not litigious
    • OpenStreetMap had to abandon a CC viral license model
How do we incentivize contribution to Open Data?
  • people will put their data into the public domain to contribute to a specific cause
  • You don't have to appeal to "what's right" or "social good"
    • bringing business to open data requires showing business benefits
    • Opening research data
      • peer review in the public space is one way of convincing academia
      • getting rid of database access fees is something academics are highly interested in
      • scandals over falsified data have brought public awareness of open data to the Netherlands (stoppel, stannel (sp?))
"don't try to evangelize non-geeks about the benefit of open data because they don't care" http://xkcd.com/743/
Read vs. Write Access in Open Data (access to data vs. contributing to data)
  • how do you verify public contribution?
    • community verification (10 people agree, so it's probably right)
    • trusted users
    • community users can crowdsource coverage of data verification
    • don't allow public access
    • multiple repositories can be used to verify each other
    • closed source code can restrict duplication and ensure quality demands are met
    • humans must be involved; you can't automate all verification
  • What happens when bots target a dataset for corruption?
    • time thresholds are often used to prevent bot corruption
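
The community-verification and time-threshold ideas above can be sketched roughly as follows. This is only an illustration in Python: the function names, the quorum of 10 (borrowed from the "10 people agree" note), and the 60-second minimum gap between edits are assumptions, not something specified in the session.

```python
from collections import Counter

QUORUM = 10  # assumed threshold, echoing "10 people agree, so it's probably right"


def accepted_value(submissions, quorum=QUORUM):
    """Return the value that at least `quorum` distinct users agree on, else None.

    `submissions` is an iterable of (user_id, value) pairs (hypothetical shape).
    """
    latest = {}
    for user_id, value in submissions:
        # One vote per user: a later submission replaces an earlier one.
        latest[user_id] = value
    if not latest:
        return None
    value, votes = Counter(latest.values()).most_common(1)[0]
    return value if votes >= quorum else None


def allow_edit(last_edit_time, now, min_interval_seconds=60.0):
    """Time threshold: reject edits from an account that arrive too quickly.

    Times are seconds as floats; `last_edit_time` is None for a first edit.
    """
    return last_edit_time is None or (now - last_edit_time) >= min_interval_seconds
```

A quorum like this only counts agreement; it does not replace the human review or trusted-user checks listed above, and a rate limit slows bots rather than stopping them.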
Public vs. Private Data
  • you must track the source of all data
  • medical data has certain fields that can never be shared
  • lawyer wiki is completely closed to allow for open discussion
  • is it enough to close or denormalize some data to the public to maintain privacy?
  • allow users to express their level of consent with clear wording
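
One way to picture the source-tracking, closed-field, and consent points above is the minimal Python sketch below. Everything named here is hypothetical: the field names, the `NEVER_SHARED` set, and the `publishable` helper are assumptions made for illustration, not a real schema or policy.

```python
# Hypothetical fields that are never released, whatever the user consents to.
NEVER_SHARED = frozenset({"patient_name", "date_of_birth"})


def publishable(record, consented_fields):
    """Return the subset of `record` that may be shown to the public.

    A field is released only if the user consented to it and it is not
    in NEVER_SHARED. Records without a "source" field are rejected,
    since the source of all data must be tracked.
    """
    if "source" not in record:
        raise ValueError("record has no tracked source")
    return {
        key: value
        for key, value in record.items()
        if key in consented_fields and key not in NEVER_SHARED
    }


record = {
    "patient_name": "A. Example",
    "zip": "97201",
    "diagnosis": "flu",
    "source": "clinic-7",
}
public_view = publishable(record, {"patient_name", "zip", "source"})
```

Here `public_view` keeps only `zip` and `source`. Closing fields this way does not by itself guarantee privacy, since the remaining fields can still be combined to identify someone.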
Examples
  • TriMet routing built upon OpenStreetMap
    • TriMet is responsible for route and timetable accuracy
    • TriMet is not responsible for map accuracy
    • TriMet verifies the OpenStreetMap data every night and submits corrections back to the community
  • First Monday
  • AOL anonymized search query data was quickly de-anonymized