> AI companies can also not just 'hide' false information from users while they internally still process false information
To my understanding of this:
1. The false claims were hallucinations, i.e. do not exist in training data
2. OpenAI have filtered the output to prevent it generating these particular false claims
Seems a tricky situation if just the false claim being represented ephemerally in internal model activations, or the model having weights that can lead it to that claim, is defamation.
> Seems a tricky situation if just the false claim being represented ephemerally in internal model activations, or the model having weights that can lead it to that claim, is defamation.
It's not so much defamation as compliance with the GDPR that's the issue. A key tenet of the GDPR is that people have the right to see what personal identifying information an organisation is keeping about them, and to have it deleted or corrected if they are unhappy with it (there are exceptions, such as keeping data for law enforcement/detection etc).
Just because the PII is encoded into model weights doesn't mean that an organisation can then deny that it exists, especially when the outputs show some of that data.
For those exercising their GDPR right to rectification, my general feeling is that a filter applied onto the model to prevent the invalid output should be considered sufficient here. Judging a system by an incomplete internal representation part way through the computation feels like a bar that essentially all systems would fail - like if someone's record initially has them as 0 years old for a few microseconds before fields are initialized, or if a data structure on disk is made up of a base checkpoint then sequentially-applied diffs.
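To make that concrete, here's a minimal sketch of the checkpoint-plus-diffs case (the record fields and values are invented for illustration): a read taken partway through reconstruction would look wrong, but the only state anyone is ever shown is the fully applied one.

```python
# Invented example: a record stored as a base checkpoint plus diffs.
# Partway through reconstruction the record holds a "wrong" age of 0,
# but only the fully reconstructed view is ever presented to anyone.
base_checkpoint = {"name": "John Smith", "age": 0}   # age not yet populated
diffs = [
    {"age": 42},                  # later correction
    {"occupation": "teacher"},
]

def reconstruct(base, diffs):
    record = dict(base)
    for diff in diffs:
        record.update(diff)       # intermediate states are incomplete by design
    return record

print(reconstruct(base_checkpoint, diffs))
# {'name': 'John Smith', 'age': 42, 'occupation': 'teacher'}
```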
In a practical sense, for cases like the article's, where the error seemingly arises out of random noise (due to the relative obscurity of information about the subject), it's unclear to me to what extent the false claim is really "in the model weights" in any way at all.
In a potentially more philosophical sense, even for cases where it's not down to noise, I'd question whether the weights/activations have real-world meaning beyond how they are presented to the user. If some bool in a user's record is shown on the GUI as "Married" then that's the meaning of that bool. Equally, if some activations are mapped to "Insufficient information about John Smith" instead of "John Smith is a murderer", then I feel that becomes the meaning of those activations and weights.
I think the issue is whether an individual's rights under the GDPR are being honoured or not. If the applied filter doesn't allow for relatively easy bypasses that would demonstrate the data wasn't sufficiently corrected or deleted, then I think that would work. However, if an organisation claims to have removed that PII but can subsequently still leak it, then the filter is simply a band-aid and the organisation isn't GDPR compliant.
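As a rough illustration of the band-aid concern, here's a hypothetical output-side filter; the blocked claim, the matching logic, and the fallback message are all invented. A literal match like this is exactly the kind of filter that's trivially bypassed by rephrasing:

```python
# Hypothetical sketch of an output-side rectification filter. Substring
# matching like this is easily defeated by paraphrasing, which is the
# difference between a genuine correction and a band-aid.
BLOCKED_CLAIMS = [
    "john smith is a murderer",   # invented false claim to suppress
]

def filter_output(generated_text: str) -> str:
    lowered = generated_text.lower()
    if any(claim in lowered for claim in BLOCKED_CLAIMS):
        return "I don't have reliable information about that person."
    return generated_text

print(filter_output("Reports say John Smith is a murderer."))
# I don't have reliable information about that person.
```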
There's also a major problem with LLMs if the training data includes PII that the individual never gave consent for it to be used for that purpose.
If the EU applies the regulations as the group mentioned in the article alleges, it would mean no LLM-based tools can be legal in the EU. And then the EU will wonder why they're lacking entrepreneurs or whatever, without connecting the dots. I hope they instead revise the GDPR.
It seems completely reasonable that if an organisation is going to use personal identifying information, they are responsible for correcting it when necessary. It's not good enough to just throw your hands up and claim that you can't alter it.
Maybe LLMs need to not be fed tons of real-life information if they are then going to produce non-correctable lies about real people. Something like an anonymising filter may be needed.
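The simplest version of such a filter would just scrub recognisable PII patterns from training text before the model ever sees it. This is only a sketch with invented patterns: emails and phone numbers are the easy part, whereas names would need something like named-entity recognition.

```python
import re

# Hypothetical anonymising pass over training text. Only catches emails and
# phone-like numbers; personal names would need a named-entity recogniser,
# which is far harder to get right.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def anonymise(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(anonymise("Contact John at john.smith@example.com or +44 20 7946 0958."))
# Contact John at [EMAIL] or [PHONE].
```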