Free / Cost Effective Database for 100+GB Data

Topic Author
DarkHelmetII
Posts: 426
Joined: Mon Jul 24, 2017 12:25 pm

Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Sun Apr 21, 2019 11:24 am

I want to store 100+ GB of data in the cloud or on my computer in a database. Six tables, simple structured data; just a lot of it.

Any ideas on cost effective services? More specifics:

1) Security is not a significant issue, as the raw data is in the public domain
2) Must be able to connect to the database with open source packages such as R

curiouskitty
Posts: 236
Joined: Sun Oct 18, 2009 1:40 am

Re: Free / Cost Effective Database for 100+GB Data

Post by curiouskitty » Sun Apr 21, 2019 11:34 am

I would use PostgreSQL (or MySQL) for sure. Then you can query the data in any major language like R.

If you have time columns or something else to filter by, you can add a quick index or two to make querying more efficient.
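
For example, something like this in Postgres (the table and column names here are made up for illustration):

-- Single-column index on a timestamp you filter by
CREATE INDEX idx_readings_date ON readings (reading_date);

-- Multi-column index for a common filter combination
CREATE INDEX idx_readings_source_date ON readings (source, reading_date);

Queries that filter on those columns can then use an index scan instead of reading the whole table.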

You can probably just use your computer, but if it's too slow or you want better persistence across devices, you can set up a database on Amazon EC2, or pay a little more to use Amazon RDS and not have to configure it yourself.

Perhaps you could consider simple text searching or MapReduce as other alternatives, but SQL is likely to get you the furthest with minimal infrastructure setup.

nordsteve
Posts: 732
Joined: Sun Oct 05, 2008 9:23 am

Re: Free / Cost Effective Database for 100+GB Data

Post by nordsteve » Sun Apr 21, 2019 12:54 pm

A local database with storage on an SSD is going to be your best bet. Cloud SQL databases are probably going to be more expensive than you need for your application.

A local pgsql or MS SQL Server instance will handle this just fine.

Topic Author
DarkHelmetII
Posts: 426
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Fri Apr 26, 2019 3:49 pm

nordsteve wrote:
Sun Apr 21, 2019 12:54 pm
A local database with storage on an SSD is going to be your best bet. Cloud SQL databases are probably going to be more expensive than you need for your application.

A local pgsql or MS SQL Server instance will handle this just fine.
I've now got 33 million rows x 125 columns in a single PostgreSQL table. It takes forever to run a query (e.g., minutes simply to return the number of records). What are my options for optimizing this thing?

Also, realizing this is the Bogleheads forum ... what are the relevant "techheads" forums out there for such conversations?

elcadarj
Posts: 173
Joined: Tue Aug 30, 2016 8:21 am

Re: Free / Cost Effective Database for 100+GB Data

Post by elcadarj » Fri Apr 26, 2019 4:06 pm

Try Stack Overflow. I've had good luck with advanced Excel questions.
https://stackoverflow.com/questions/tagged/mysql

mrmass
Posts: 275
Joined: Thu Jul 26, 2018 6:35 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by mrmass » Fri Apr 26, 2019 4:14 pm

DarkHelmetII wrote:
Fri Apr 26, 2019 3:49 pm
nordsteve wrote:
Sun Apr 21, 2019 12:54 pm
A local database with storage on an SSD is going to be your best bet. Cloud SQL databases are probably going to be more expensive than you need for your application.

A local pgsql or MS SQL Server instance will handle this just fine.
I've now got 33 million rows x 125 columns in a single PostgreSQL table. It takes forever to run a query (e.g., minutes simply to return the number of records). What are my options for optimizing this thing?

Also, realizing this is the Bogleheads forum ... what are the relevant "techheads" forums out there for such conversations?
Post over in https://www.reddit.com/r/PostgreSQL/ or somewhere else database-related for more help.

inbox788
Posts: 6662
Joined: Thu Mar 15, 2012 5:24 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by inbox788 » Fri Apr 26, 2019 4:20 pm

DarkHelmetII wrote:
Sun Apr 21, 2019 11:24 am
I want to store 100+ GB of data in the cloud or on my computer in a database. Six tables, simple structured data; just a lot of it.

Any ideas on cost effective services? More specifics:

1) Security is not a significant issue, as the raw data is in the public domain
2) Must be able to connect to the database with open source packages such as R
Well, Google Drive and OneDrive are at 15 GB free, so 7+ accounts? Or you might go for one of the promotions, like 100 GB free for 2 years.

https://www.makeuseof.com/tag/free-clou ... gle-drive/

Here's 100 GB free, but I think it's only for photos. https://degoo.com/

As far as the database goes, it all depends on what you're doing with all that data. If you're just storing it, it can go anywhere, but if you're doing some crunching, decide how that's going to be done. You're not going to want to transfer 100 GB back and forth a lot. What type of connection are you using, and what are the download and upload speeds? Do a simple calculation: how long will it take you to upload the 100 GB of data and download it again?
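
For example, assuming a 10 Mbps residential upload speed (an assumption; plug in your own): 100 GB is about 800,000 megabits, and 800,000 megabits / 10 Mbps = 80,000 seconds, or roughly 22 hours for one full upload. Even pulling it back down at 100 Mbps would take over 2 hours.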

michaeljc70
Posts: 5815
Joined: Thu Oct 15, 2015 3:53 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by michaeljc70 » Fri Apr 26, 2019 7:15 pm

We need more info. 33 million rows? Are there indexes on the table covering the columns you are filtering (searching) on? That is important. One client of mine had a database that inserted 12 million rows a minute (internet ads).

Jenkins
Posts: 1
Joined: Thu Nov 15, 2018 11:06 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by Jenkins » Fri Apr 26, 2019 8:14 pm

Postgres is all I use for databases; it's great.

But count() in Postgres is notoriously slow. It should be avoided when possible.

I have a table that gains 125M records per month, and it's fairly responsive as long as I use indexes carefully.
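
If an exact count isn't required, Postgres keeps a running row estimate in its catalog that you can read instantly. A minimal sketch (the table name is hypothetical, and the estimate is only as fresh as the last ANALYZE or autovacuum run):

-- Exact but slow: has to scan the whole table (or a full index)
SELECT count(*) FROM big_table;

-- Approximate but instant: the planner's row estimate from the system catalog
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'big_table';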

Sourc3
Posts: 110
Joined: Wed Aug 05, 2015 4:45 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by Sourc3 » Fri Apr 26, 2019 9:07 pm

Jenkins wrote:
Fri Apr 26, 2019 8:14 pm
Postgres is all I use for databases; it's great.

But count() in Postgres is notoriously slow. It should be avoided when possible.

I have a table that gains 125M records per month, and it's fairly responsive as long as I use indexes carefully.

Launch a Presto cluster on AWS, use that to ingest the data to S3. Shut it off when not in use. It will handle billions of records easily.

Postgres shouldn't choke on 125M. Are you partitioning your table?

Topic Author
DarkHelmetII
Posts: 426
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Sun Dec 01, 2019 8:14 am

Sourc3 wrote:
Fri Apr 26, 2019 9:07 pm
Postgres is all I use for databases; it's great.

Launch a Presto cluster on AWS, use that to ingest the data to S3. Shut it off when not in use. It will handle billions of records easily.

Postgres shouldn't choke on 125M. Are you partitioning your table?
1) How / why would I partition? Will this expedite queries? Perhaps a newbie question, but I am just not that familiar with the technical aspects.
2) When I run large / complex queries on my laptop, my "Disk" goes to 100% (CPU and Memory don't go anywhere near this level). Will upgrading to a solid-state drive help?
3) How easy / difficult is it to migrate from a local PostgreSQL database to an AWS Presto cluster?

Nummerkins
Posts: 363
Joined: Tue Jun 01, 2010 4:41 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by Nummerkins » Sun Dec 01, 2019 8:48 am

DarkHelmetII wrote:
Sun Dec 01, 2019 8:14 am
Sourc3 wrote:
Fri Apr 26, 2019 9:07 pm
Postgres is all I use for databases; it's great.

Launch a Presto cluster on AWS, use that to ingest the data to S3. Shut it off when not in use. It will handle billions of records easily.

Postgres shouldn't choke on 125M. Are you partitioning your table?
1) How / why would I partition? Will this expedite queries? Perhaps a newbie question, but I am just not that familiar with the technical aspects.
2) When I run large / complex queries on my laptop, my "Disk" goes to 100% (CPU and Memory don't go anywhere near this level). Will upgrading to a solid-state drive help?
3) How easy / difficult is it to migrate from a local PostgreSQL database to an AWS Presto cluster?
2. Yes, a spinning disk is killing your performance. A single disk can do about a hundred input/output operations per second; a baseline SSD will easily do tens of thousands. Once you fix this, you will need to add indexes and make sure you have enough RAM to hold the DB in memory as it is queried.

regularguy455
Posts: 134
Joined: Tue Jul 18, 2017 2:08 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by regularguy455 » Sun Dec 01, 2019 9:39 am

What's the acceptable level of performance? This will drive the solution you pick. If you're willing to wait, Postgres is the cheapest solution. If not, there are many cloud/high-performance technologies you could use (e.g., Spark).

HawkeyePierce
Posts: 740
Joined: Tue Mar 05, 2019 10:29 pm
Location: Colorado

Re: Free / Cost Effective Database for 100+GB Data

Post by HawkeyePierce » Sun Dec 01, 2019 11:35 am

Sounds like you just need a basic primer on RDBMS operations and optimizations. Your local library probably has one.

I would caution against any NoSQL databases. Your use case is small enough that their complexity will not be worth the trouble.

Topic Author
DarkHelmetII
Posts: 426
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Sun Dec 01, 2019 2:40 pm

regularguy455 wrote:
Sun Dec 01, 2019 9:39 am
What's the acceptable level of performance? This will drive the solution you pick. If you're willing to wait, Postgres is the cheapest solution. If not, there are many cloud/high-performance technologies you could use (e.g., Spark).
Acceptable level of performance is a good question. Waiting five to ten seconds is okay, but not five to ten minutes. Obviously this hinges on the scope and complexity of the query; some currently take seconds and some upwards of 10 minutes. Already upgraded RAM, so an SSD might be next. And/or just get a new machine altogether.

regularguy455
Posts: 134
Joined: Tue Jul 18, 2017 2:08 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by regularguy455 » Mon Dec 02, 2019 11:15 am

DarkHelmetII wrote:
Sun Dec 01, 2019 2:40 pm
regularguy455 wrote:
Sun Dec 01, 2019 9:39 am
What's the acceptable level of performance? This will drive the solution you pick. If you're willing to wait, Postgres is the cheapest solution. If not, there are many cloud/high-performance technologies you could use (e.g., Spark).
Acceptable level of performance is a good question. Waiting five to ten seconds is okay, but not five to ten minutes. Obviously this hinges on the scope and complexity of the query; some currently take seconds and some upwards of 10 minutes. Already upgraded RAM, so an SSD might be next. And/or just get a new machine altogether.
If it were me, I'd benchmark a subset of the data on a couple of potential solutions. Try it on different cloud technologies and in a Postgres database and see the performance difference. You might be surprised how inexpensive a cloud solution is.

Ultimately the cloud is easier to set up and scale up. You also have access to technical support if you get stuck. So really it's a matter of how you value your time and how long you're willing to wait.

Topic Author
DarkHelmetII
Posts: 426
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Wed Dec 11, 2019 12:43 pm

nordsteve wrote:
Sun Apr 21, 2019 12:54 pm
A local database with storage on an SSD is going to be your best bet. Cloud SQL databases are probably going to be more expensive than you need for your application.

A local pgsql or MS SQL Server instance will handle this just fine.
How do I migrate to an SSD? What are the tactical steps? For example:
1) Buy an SSD online. Is this any good: https://www.amazon.com/Samsung-Inch-Int ... 148&sr=8-3?
2) Migrate Windows with something like https://www.partitionwizard.com while the SSD is externally connected?
3) Physically install the SSD?

Thank you.

tedgeorge
Posts: 68
Joined: Thu Oct 03, 2013 8:29 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by tedgeorge » Wed Dec 11, 2019 2:06 pm

The SSD probably comes with software to clone your current hard drive, so just use that. You'll need a USB external hard-drive enclosure to put the new drive in so your computer will recognize it and can do the clone.

When it's done, you swap them out and you're hopefully good to go! Then you can put your old hard disk in the enclosure and use it as a spare drive.

Topic Author
DarkHelmetII
Posts: 426
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Wed Dec 11, 2019 3:35 pm

tedgeorge wrote:
Wed Dec 11, 2019 2:06 pm
The SSD probably comes with software to clone your current hard drive, so just use that. You'll need a USB external hard-drive enclosure to put the new drive in so your computer will recognize it and can do the clone.

When it's done, you swap them out and you're hopefully good to go! Then you can put your old hard disk in the enclosure and use it as a spare drive.
Is it really that easy? If so, great. I'm looking for the "catch," but maybe I'm just behind the times. Maybe it really is this easy, outside of the inherent risk of something going wrong, for which I'm backed up anyway.

tedgeorge
Posts: 68
Joined: Thu Oct 03, 2013 8:29 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by tedgeorge » Wed Dec 11, 2019 4:05 pm

Yes. It's much easier these days than in the past, when you had to deal with partitions and several separate steps.

michaeljc70
Posts: 5815
Joined: Thu Oct 15, 2015 3:53 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by michaeljc70 » Wed Dec 11, 2019 7:14 pm

I would probably use MySQL. For most of my career in software development we used SQL Server or Oracle, which are expensive. I am not sure what the current limitations are, but for some time you could get free versions of SQL Server for use in a limited environment (developer, one PC, or whatever). That is a lot of data, and it's not going to be fast on a slow computer and/or without proper indexing.

Dottie57
Posts: 7176
Joined: Thu May 19, 2016 5:43 pm
Location: Earth Northern Hemisphere

Re: Free / Cost Effective Database for 100+GB Data

Post by Dottie57 » Wed Dec 11, 2019 7:44 pm

HawkeyePierce wrote:
Sun Dec 01, 2019 11:35 am
Sounds like you just need a basic primer on RDBMS operations and optimizations. Your local library probably has one.

I would caution against any NoSQL databases. Your use case is small enough that their complexity will not be worth the trouble.
This.

I worked on large-scale databases with many hundreds of terabytes of information. There were some very definite ways to make working with the data faster and more efficient. Most efficiency comes from DB design.

Sourc3
Posts: 110
Joined: Wed Aug 05, 2015 4:45 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by Sourc3 » Wed Dec 11, 2019 8:26 pm

Are you using this for analysis only, or are you going to insert new rows? If it's just for analysis, you're not going to beat the performance of a columnar database (Presto, which was suggested before, is one of them). You can look at options by googling.

If you will insert data into this DB, update it, etc., PostgreSQL will be your best bet. To be honest, you don't have enough data to need anything fancy.

You just need to partition your table, put it on an SSD, and increase the memory on your machine, followed by adding one or two well-chosen indexes.
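
For reference, declarative partitioning in PostgreSQL 10+ looks roughly like this (the table, column, and date ranges are invented for illustration):

-- Parent table, split by year on a date column
CREATE TABLE readings (
    reading_date date NOT NULL,
    value        numeric
) PARTITION BY RANGE (reading_date);

CREATE TABLE readings_2018 PARTITION OF readings
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE readings_2019 PARTITION OF readings
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

-- Queries that filter on reading_date only touch the matching partitions
SELECT avg(value) FROM readings
WHERE reading_date BETWEEN '2019-06-01' AND '2019-06-30';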

Your best bet is to understand the execution plan of the query (the DB first works out what it needs to do to run your query and then runs it; you can look at this plan for ideas about what is taking so long) and then tune it.
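
In Postgres that means EXPLAIN, or EXPLAIN ANALYZE to actually run the query and report real timings; a quick sketch against the hypothetical table above:

-- Look for "Seq Scan" on big tables, bad row estimates, and expensive sorts
EXPLAIN ANALYZE
SELECT avg(value)
FROM readings
WHERE reading_date >= '2019-01-01';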

Pick your OLTP or OLAP database (the first question I posed above) and then just download a simple book/guide on best practices.

Good luck. 🍀

ClevrChico
Posts: 1559
Joined: Tue Apr 03, 2012 8:24 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by ClevrChico » Wed Dec 11, 2019 10:01 pm

You could export to CSV and store it in AWS S3 for about $10/month. Use AWS Athena and Glue to turn the S3 data into a pretend database. Queries are pretty cheap to run. Maybe $15-$20/month all in for some smallish workloads and 100 GB of storage. I do something similar for generating reports out of large datasets. Obviously not free, and it does require tech skills, but that's probably as cheap as you'll get using the cloud to process/query 100 GB of data.

(I wouldn't use said solution for a back-end database powering an app, but for report generation that would melt Excel, it's great and super fast.)
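
For a rough idea of the Athena side (the bucket, table, and columns are made up; Athena uses Hive-style DDL over files already sitting in S3):

-- Define a table over CSV files in S3; nothing is loaded or copied
CREATE EXTERNAL TABLE readings (
    reading_date string,
    value        double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/readings-csv/';

-- Athena bills by data scanned, so compressing or converting to Parquet cuts the cost
SELECT count(*) FROM readings;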

ARoseByAnyOtherName
Posts: 353
Joined: Wed Apr 26, 2017 12:03 am

Re: Free / Cost Effective Database for 100+GB Data

Post by ARoseByAnyOtherName » Wed Dec 11, 2019 10:26 pm

Without understanding your usage pattern, nobody will be able to give you good advice. All the suggestions in this thread are useless without knowing this.

Is this a static dataset that you are only querying against? Or are you doing a lot of adding and/or deleting of rows and tables? What are you trying to do with this data?

Here's another useless suggestion: forget about all this AWS etc. business and just store all your data in a SQLite database on your local drive. As long as you aren't trying to do multiple concurrent writes, it's as good as any.
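
A minimal sketch of that route using the sqlite3 command-line shell (the file and table names are invented):

-- In a terminal: sqlite3 mydata.db
.mode csv
.import readings.csv readings
-- .import creates the table from the CSV header row if it doesn't already exist
CREATE INDEX idx_readings_date ON readings (reading_date);
SELECT count(*) FROM readings;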
