Re: ML-Policy and tesseract-ocr
Hi Marvin,
On 2019-08-12 18:35, Marvin Renich wrote:
> * Mo Zhou <[email protected]> [190812 10:31]:
>> To this end, I wrote the policy #5 [3]:
>>
>> A package that includes a machine learning model, must also include
>> the corresponding training program, or depend on the package that
>> provides the corresponding training program.
>>
>> Does that make sense? If it looks good, then the solution
>> for this bug is already obvious enough.
>
> Perhaps I am not interpreting what you are saying correctly, but I would
> say it is wrong. The corresponding training program must be packaged in
> Debian, but it seems unlikely that there would be a binary package
> dependency from the model to the training program
The original "policy" was based on a rather strong restriction: the
training script must be present whenever an ML model is installed.
I meant "Depends" in the original text, but perhaps "Suggests" is better,
since "Depends" may introduce a circular dependency or the
arch-all-dep-on-arch-any problem.
That means "depend on ..." could be revised to "`Suggests:`".
> (result of running the training program with
> specific input data, if I understand correctly?)
Yes, correct.
> The source package would need to Build-Depend on the training
> program and its inputs, but in general there would not need to be a
> normal Depends.
I see. The idea is that an ELF binary (ML model) doesn't have to
Depend on its compiler (training program) and source (input data).
This makes sense to me, and the "Suggests:" relationship may be better.
The "Suggests:" relationship implicitly gives the user a hint about the
following questions:
1. what is the binary blob /usr/.../foobar.ml-model installed by the
package foobar?
2. where did these digits come from?
3. how can I understand how the original author created this
model?
4. how do I obtain a similar model with my own dataset?
etc.
Most users, I think, will not actually dig into the details
of the model, or even try to understand what it is. So
changing the model -> training script relationship from
"Depends" to "Suggests" could also avoid pulling in the
whole stack of training software when installing the model.
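As a rough sketch of what this could look like in debian/control (the
package names foobar-model and foobar-train are made up for illustration,
and mandatory fields such as Maintainer and Standards-Version are left out):

```
# Hypothetical fragment, not taken from any real package.
Source: foobar
# The training program is needed at build time to produce the model.
Build-Depends: debhelper-compat (= 12), foobar-train

Package: foobar-model
Architecture: all
Depends: ${misc:Depends}
# Points the user to the training program without forcing its installation.
Suggests: foobar-train
Description: pre-trained model for foobar (illustrative example)
 Model file produced by running the training program on the
 upstream input data at build time.
```

With this layout, installing foobar-model alone would not pull in the
training stack, while the build still requires it.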
> Perhaps you were just being sloppy about Build-Depends vs Depends, but
> when writing policy it is important to be very specific about that.
Thanks, I'll keep that in mind.