caching/vendor/github.com/google/licenseclassifier
Matt Moore 08192258fa
[master] Auto-update dependencies (#265)
Produced via:
  `./hack/update-deps.sh --upgrade && ./hack/update-codegen.sh`
/assign n3wscott vagababov
/cc n3wscott vagababov
2020-05-05 08:53:44 -07:00
..
internal/sets Update test-infra (#21) 2019-01-17 17:08:31 -08:00
stringclassifier Migrate caching to go mod (#258) 2020-04-27 14:35:51 -07:00
.travis.yml [master] Auto-update dependencies (#265) 2020-05-05 08:53:44 -07:00
CHANGELOG Migrate caching to go mod (#258) 2020-04-27 14:35:51 -07:00
CONTRIBUTING.md Migrate caching to go mod (#258) 2020-04-27 14:35:51 -07:00
LICENSE Update test-infra (#21) 2019-01-17 17:08:31 -08:00
README.md Migrate caching to go mod (#258) 2020-04-27 14:35:51 -07:00
classifier.go [master] Auto-update dependencies (#265) 2020-05-05 08:53:44 -07:00
file_system_resources.go [master] Auto-update dependencies (#265) 2020-05-05 08:53:44 -07:00
forbidden.go Update test-infra (#21) 2019-01-17 17:08:31 -08:00
go.mod [master] Auto-update dependencies (#265) 2020-05-05 08:53:44 -07:00
go.sum [master] Auto-update dependencies (#265) 2020-05-05 08:53:44 -07:00
license_type.go [master] Auto-update dependencies (#265) 2020-05-05 08:53:44 -07:00

README.md

License Classifier

Build status

Introduction

The license classifier is a library and set of tools that can analyze text to determine what type of license it contains. It searches for license texts in a file and compares them to an archive of known licenses. These files could be, e.g., LICENSE files with a single or multiple licenses in it, or source code files with the license text in a comment.

A "confidence level" is associated with each result indicating how close the match was. A confidence level of 1.0 indicates an exact match, while a confidence level of 0.0 indicates that no license was able to match the text.

Adding a new license

Adding a new license is straight-forward:

  1. Create a file in licenses/.

    • The filename should be the name of the license or its abbreviation. If the license is an Open Source license, use the appropriate identifier specified at https://spdx.org/licenses/.
    • If the license is the "header" version of the license, append the suffix ".header" to it. See licenses/README.md for more details.
  2. Add the license name to the list in license_type.go.

  3. Regenerate the licenses.db file by running the license serializer:

    $ license_serializer -output licenseclassifier/licenses
    
  4. Create and run appropriate tests to verify that the license is indeed present.

Tools

Identify license

identify_license is a command line tool that can identify the license(s) within a file.

$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)

License serializer

The license_serializer tool regenerates the licenses.db archive. The archive contains preprocessed license texts for quicker comparisons against unknown texts.

$ license_serializer -output licenseclassifier/licenses

This is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.