Is it OK to show my database schema to ChatGPT and GitHub Copilot?

on June 23, 2023

I’m answering two questions from Brent Ozar’s list of user questions open for answers.

Q: What's your opinion of entering confidential info in ChatGPT? Will we see AI therapist chatbots?

Q: In terms of security, is it OK to expose your database to tools like GitHub Copilot in Azure Data Studio? Someone will know that your email address column is not encrypted or a stored procedure is not parsing its input parameters when dynamic T-SQL is built.
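A quick aside on that second question: the dynamic T-SQL worry is classic SQL injection territory. Here's a minimal sketch of the difference between concatenating input into a statement and passing it as a typed parameter with sp_executesql (the table, column, and procedure names are made up for illustration):

```sql
-- Risky: the user's input is concatenated straight into the statement,
-- so input like  %' OR 1=1 --  changes the shape of the query
-- and returns every row.
CREATE OR ALTER PROCEDURE dbo.FindUsers_Unsafe
    @SearchTerm nvarchar(100)
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT UserId, Email FROM dbo.Users WHERE Email LIKE N''%'
        + @SearchTerm + N'%'';';
    EXEC (@sql);
END;
GO

-- Safer: the statement text is fixed, and the value travels separately
-- as a typed parameter, so it can't rewrite the query.
CREATE OR ALTER PROCEDURE dbo.FindUsers_Safe
    @SearchTerm nvarchar(100)
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT UserId, Email FROM dbo.Users WHERE Email LIKE N''%'' + @term + N''%'';';
    EXEC sp_executesql @sql, N'@term nvarchar(100)', @term = @SearchTerm;
END;
GO
```

The point of sp_executesql is that the statement text never changes; only the parameter value does, so user input can't alter what the query does. Anyway, on to the main question.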

In general, it's best to assume that any data you enter into an online website, extension, or tool may be stored, used, shared, or sold until you find documentation that explicitly promises otherwise. This is true for generative AI tools and for any other tooling.

It’s also best to chat with your leadership about this. Many companies have very specific policies about what you can and cannot do with company data.

ChatGPT and OpenAI

Don't enter someone else's confidential info into the ChatGPT website. Don't enter your own confidential info into the ChatGPT website unless you want it to no longer be confidential. The privacy policy and the related article, How ChatGPT and Our Language Models Are Developed, go into more detail about how data shared with the ChatGPT website is used.

Note that the ChatGPT website is not the same as the Azure OpenAI Service. These are different offerings with different privacy policies. For the Azure OpenAI Service, the article Data, privacy, and security for Azure OpenAI service covers what data is processed and how it is used.

It is possible to use the Azure OpenAI Service under a data agreement that opts you out of the logging and human review process, but you need to specifically arrange that. In other words, applications can be built on these services that handle even very sensitive, highly confidential, or tightly regulated data safely.

GitHub Copilot

If you're using GitHub Copilot for Business, check out the GitHub Copilot for Business Privacy Statement.

The statement explains that the snippets of your code used to generate suggestions are discarded once the suggestions are returned, not retained. The service does collect some telemetry about how you use your IDE or editor, along with general usage data.

GitHub Copilot has been around long enough that many software development companies have already reviewed it and set a policy on whether the organization may use it, so in this case especially, checking with your management can save you time.

What about all these other "copilots"?

Lots of companies, including Microsoft, are introducing assistants and copilots throughout their platforms.

I’m sorry, but we’re gonna get to check the data privacy statements for each one, friends. Here’s an example for the Microsoft Power Platform.

AI therapists have been around for a while

AI therapist chatbots already exist, and have for a while, actually!

One that I really like is Woebot (https://woebothealth.com/). Woebot is a free app (for now, anyway, and it has been for a while) that specializes in cognitive behavioral therapy style conversations.

The Woebot FAQ talks a bit about how your data is stored and your options for deleting it, and the privacy policy specifies under what conditions they will share data with third parties.