@gendercensus Echoing a lot of the replies here. 50,000 unique identifiers and 5M data points are trivial for most decent computers if managed correctly. I recommend not using Excel or LibreOffice, as it'll be frustrating. You will want to use PostgreSQL, R, or Julia, as others have recommended. If you have zero experience with these tools, I recommend taking up people's offers to help.
My experience is in R, and it takes just a few lines of code to load the data in a format that allows exploration through many different statistical lenses. R was designed specifically for statistical analysis, and for slicing and dicing large data sets (orders of magnitude larger than what you are working with). There are lots of packages that enable pretty sophisticated analyses with a single line of code.
I am also willing to assist. You've done the hard part of creating the survey, marketing it, and getting responses. Let others help <3
@gendercensus A spreadsheet will be fine for simple distributions of each individual question, as 50,000 rows per sheet is manageable. But when you want to create statistics across multiple questions to tease out potentially significant and actionable nuances (e.g., age group y accounts for x% of answer a in question 3, making it the majority), the spreadsheet format and interface will start to be a barrier and increase the likelihood of errors.
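To make that cross-question example concrete, here is a minimal sketch of a cross-tabulation. It is in plain Python only so that it runs anywhere without installing anything; in R the same idea is roughly `table()` followed by `prop.table()`. The rows, the column names `age_group` and `q3`, and the helper `row_percentages` are all made up for illustration and are not taken from the actual survey.

```python
from collections import Counter, defaultdict

# Hypothetical responses: one dict per respondent (made-up data, not the real survey).
responses = [
    {"age_group": "18-24", "q3": "a"},
    {"age_group": "18-24", "q3": "a"},
    {"age_group": "18-24", "q3": "b"},
    {"age_group": "25-34", "q3": "a"},
    {"age_group": "25-34", "q3": "b"},
    {"age_group": "25-34", "q3": "b"},
    {"age_group": "25-34", "q3": "b"},
]


def row_percentages(rows, group_key, answer_key):
    """For each group, return the percentage of its respondents giving each answer."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][row[answer_key]] += 1

    result = {}
    for group, answers in counts.items():
        total = sum(answers.values())
        result[group] = {a: 100.0 * n / total for a, n in answers.items()}
    return result


table = row_percentages(responses, "age_group", "q3")
for group in sorted(table):
    for answer in sorted(table[group]):
        print(f"{group}  q3={answer}: {table[group][answer]:.1f}%")
```

The point is that the whole calculation is one small function you can read and rerun, rather than a chain of formulas spread across sheets.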
I have experience working with large, complex spreadsheets, and I find them very challenging to debug (I'm currently doing this for a spreadsheet with 96 sheets, 127,141 cells, 56,917 formulas, and 864 charts).
R allows analysis of statistically significant correlations across any number of questions with 1 to 5 lines of code. It's much easier to debug. I will also claim that, for statistics, you'll have many more options and much more confidence in the results using R.
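As a rough illustration of what such a test does under the hood, here is a sketch of Pearson's chi-squared statistic for a table of counts, again in plain Python so it runs without any packages. In R, `chisq.test()` covers this and also reports a p-value. The counts and the function name `chi_square_statistic` are hypothetical, chosen only for the example.

```python
def chi_square_statistic(observed):
    """Pearson's chi-squared statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two questions were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


# Made-up counts: rows are two age groups, columns are answers a and b.
observed = [
    [30, 10],
    [20, 40],
]

stat = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-squared = {stat:.3f}, df = {degrees_of_freedom}")
```

For one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05, so these made-up counts would suggest the answer depends on the age group.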