Free / Cost Effective Database for 100+GB Data

Topic Author
DarkHelmetII
Posts: 421
Joined: Mon Jul 24, 2017 12:25 pm

Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Sun Apr 21, 2019 11:24 am

I want to store 100+ GB of data in the cloud or on my computer in a database. Six tables, simple structured data; just a lot of it.

Any ideas on cost effective services? More specifics:

1) Security is not a significant issue as raw data is from the public domain
2) Must be able to connect to the database with open source packages such as R
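
For requirement 2, here is a minimal sketch of connecting from R with the open source DBI and RPostgres packages; the database name, user, and table are hypothetical placeholders, not details from the thread:

# install.packages(c("DBI", "RPostgres"))  # one-time setup
library(DBI)

# Connect to a local PostgreSQL instance (dbname and user are placeholders).
con <- dbConnect(
  RPostgres::Postgres(),
  dbname = "publicdata",
  host   = "localhost",
  port   = 5432,
  user   = "myuser"
)

# Pull a small sample into a data frame to confirm the connection works.
head_rows <- dbGetQuery(con, "SELECT * FROM my_table LIMIT 10")

dbDisconnect(con)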

curiouskitty
Posts: 236
Joined: Sun Oct 18, 2009 1:40 am

Re: Free / Cost Effective Database for 100+GB Data

Post by curiouskitty » Sun Apr 21, 2019 11:34 am

I would use PostgreSQL (or MySQL) for sure. Then you can query the data in any major language like R.

If you have time columns or something else to filter by, you can add a quick index or two to make querying more efficient.
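
As a sketch of that, assuming a hypothetical trades table with a trade_time column (run from R here, or equivalently in psql):

library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "publicdata")  # placeholder dbname

# Index the column you filter by; table and column names are hypothetical.
dbExecute(con, "CREATE INDEX idx_trades_time ON trades (trade_time)")

# Range filters on trade_time can now use the index instead of a full table scan.
monthly <- dbGetQuery(con, "
  SELECT count(*) AS n FROM trades
  WHERE trade_time >= '2019-01-01' AND trade_time < '2019-02-01'
")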

You can probably just use your computer, but if it's too slow or you want better persistence across devices, you can set up a database on Amazon EC2, or pay a little more to use Amazon RDS and not need to configure it.

Perhaps you could consider simple text searching or MapReduce as other alternatives, but SQL is likely to get you the furthest with minimal infrastructure setup.

nordsteve
Posts: 729
Joined: Sun Oct 05, 2008 9:23 am

Re: Free / Cost Effective Database for 100+GB Data

Post by nordsteve » Sun Apr 21, 2019 12:54 pm

A local database with storage on an SSD is going to be your best bet. Cloud SQL databases are probably going to be more expensive than you need for your application.

A local pgsql or MS SQL Server instance will handle this just fine.

Topic Author
DarkHelmetII
Posts: 421
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Fri Apr 26, 2019 3:49 pm

nordsteve wrote:
Sun Apr 21, 2019 12:54 pm
A local database with storage on an SSD is going to be your best bet. Cloud SQL databases are probably going to be more expensive than you need for your application.

A local pgsql or MS SQL Server instance will handle this just fine.
I've now got 33 million rows x 125 columns in a single postgresql table. Takes forever to run a query (e.g. minutes to simply return the # of records). What are my options for optimizing this thing?

Also, realizing this is the bogleheads forum ... what are the relevant "techheads" forums out there to have such conversations?

elcadarj
Posts: 173
Joined: Tue Aug 30, 2016 8:21 am

Re: Free / Cost Effective Database for 100+GB Data

Post by elcadarj » Fri Apr 26, 2019 4:06 pm

Try StackOverflow. I’ve had good luck with advanced Excel questions.
https://stackoverflow.com/questions/tagged/mysql

mrmass
Posts: 275
Joined: Thu Jul 26, 2018 6:35 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by mrmass » Fri Apr 26, 2019 4:14 pm

DarkHelmetII wrote:
Fri Apr 26, 2019 3:49 pm
nordsteve wrote:
Sun Apr 21, 2019 12:54 pm
A local database with storage on an SSD is going to be your best bet. Cloud SQL databases are probably going to be more expensive than you need for your application.

A local pgsql or MS SQL Server instance will handle this just fine.
I've now got 33 million rows x 125 columns in a single postgresql table. Takes forever to run a query (e.g. minutes to simply return the # of records). What are my options for optimizing this thing?

Also, realizing this is the bogleheads forum ... what are the relevant "techheads" forums out there to have such conversations?
Post over in https://www.reddit.com/r/PostgreSQL/ or another database-related forum for more help.

inbox788
Posts: 6647
Joined: Thu Mar 15, 2012 5:24 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by inbox788 » Fri Apr 26, 2019 4:20 pm

DarkHelmetII wrote:
Sun Apr 21, 2019 11:24 am
I want to store 100+ GB of data in the cloud or on my computer in a database. Six tables, simple structured data; just a lot of it.

Any ideas on cost effective services? More specifics:

1) Security is not a significant issue as raw data is from the public domain
2) Must be able to connect to the database with open source packages such as R
Well, Google Drive and OneDrive are at 15 GB free, so 7+ accounts? Or you might go for one of the promotions like 100GB free for 2 years.

https://www.makeuseof.com/tag/free-clou ... gle-drive/

Here's 100GB free, but I think it's only for photos. https://degoo.com/

As far as the database goes, it all depends on what you're doing with all that data. If you're just storing it, it can go anywhere, but if you're doing some crunching, decide how that crunching is going to be done. You're not going to want to transfer 100GB back and forth a lot. What type of connection are you using, and what are the download and upload speeds? Do a simple calculation: how long will it take you to upload the 100GB of data and download it again?
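
As a rough worked example of that calculation, assuming a 10 Mbps upload link (a placeholder figure; substitute your own measured speed):

# Transfer-time estimate: 100 GB over an assumed 10 Mbps upload link.
gb   <- 100
mbps <- 10
seconds <- (gb * 8 * 1000) / mbps  # GB to megabits, divided by link rate
seconds / 3600                     # about 22 hours per one-way transfer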

michaeljc70
Posts: 5782
Joined: Thu Oct 15, 2015 3:53 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by michaeljc70 » Fri Apr 26, 2019 7:15 pm

We need more info. 33 million rows? Are there indexes on the table covering the columns you are filtering (searching) on? That is important. One client of mine had a database that inserted 12 million rows a minute (internet ads).
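
A quick way to check whether a query actually uses an index is EXPLAIN ANALYZE; a sketch from R, with the table and column names as placeholders:

library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "publicdata")  # placeholder dbname

# The plan shows an Index Scan if an index is used, or a Seq Scan (full read) if not.
plan <- dbGetQuery(con, "
  EXPLAIN ANALYZE
  SELECT * FROM trades WHERE trade_time >= '2019-01-01'
")
print(plan)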

Jenkins
Posts: 1
Joined: Thu Nov 15, 2018 11:06 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by Jenkins » Fri Apr 26, 2019 8:14 pm

Postgres is all I use for databases; it's great.

But count() in Postgres is notoriously slow: because of its MVCC design, an exact count(*) has to scan the whole table. It should be avoided when possible.
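
If an approximate count is acceptable, the planner's own row estimate comes back almost instantly regardless of table size. A sketch, with the table name as a placeholder:

library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "publicdata")  # placeholder dbname

# reltuples is the planner's row estimate, refreshed by ANALYZE and autovacuum;
# it avoids the full scan that an exact count(*) requires.
dbGetQuery(con, "
  SELECT reltuples::bigint AS approx_rows
  FROM pg_class
  WHERE relname = 'trades'
")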

I have a table to which I add 125M records per month, and it's fairly responsive as long as I use indexes carefully.

Sourc3
Posts: 108
Joined: Wed Aug 05, 2015 4:45 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by Sourc3 » Fri Apr 26, 2019 9:07 pm

Jenkins wrote:
Fri Apr 26, 2019 8:14 pm
Postgres is all I use for databases; it's great.

But count() in Postgres is notoriously slow. It should be avoided when possible.

I have a table to which I add 125M records per month, and it's fairly responsive as long as I use indexes carefully.

Launch a Presto cluster on AWS, use that to ingest the data to S3. Shut it off when not in use. It will handle billions of records easily.

Postgres shouldn't choke on 125M. Are you partitioning your table?
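
For context, declarative range partitioning (built into PostgreSQL 10 and later) splits one big table into smaller children, and queries that filter on the partition key skip the partitions they cannot match. A sketch with hypothetical names:

library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "publicdata")  # placeholder dbname

# Parent table partitioned by a date column (all names are hypothetical).
dbExecute(con, "
  CREATE TABLE trades (
    trade_time date NOT NULL,
    symbol     text,
    price      numeric
  ) PARTITION BY RANGE (trade_time)
")

# One child per month; a WHERE clause on trade_time touches only matching partitions.
dbExecute(con, "
  CREATE TABLE trades_2019_04 PARTITION OF trades
  FOR VALUES FROM ('2019-04-01') TO ('2019-05-01')
")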

Topic Author
DarkHelmetII
Posts: 421
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Sun Dec 01, 2019 8:14 am

Sourc3 wrote:
Fri Apr 26, 2019 9:07 pm
Launch a Presto cluster on AWS, use that to ingest the data to S3. Shut it off when not in use. It will handle billions of records easily.

Postgres shouldn't choke on 125M. Are you partitioning your table?
1) How / why would I partition? Will this speed up queries? Perhaps a newbie question, but I am just not that familiar with the technical aspects.
2) When I run large / complex queries on my laptop, my "Disk" sits at 100% (CPU and memory don't go anywhere near that level). Will upgrading to a solid-state drive help?
3) How easy / difficult is it to migrate from a local PostgreSQL database to an AWS Presto cluster?

Nummerkins
Posts: 358
Joined: Tue Jun 01, 2010 4:41 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by Nummerkins » Sun Dec 01, 2019 8:48 am

DarkHelmetII wrote:
Sun Dec 01, 2019 8:14 am
Sourc3 wrote:
Fri Apr 26, 2019 9:07 pm
Launch a Presto cluster on AWS, use that to ingest the data to S3. Shut it off when not in use. It will handle billions of records easily.

Postgres shouldn't choke on 125M. Are you partitioning your table?
1) How / why would I partition? Will this speed up queries? Perhaps a newbie question, but I am just not that familiar with the technical aspects.
2) When I run large / complex queries on my laptop, my "Disk" sits at 100% (CPU and memory don't go anywhere near that level). Will upgrading to a solid-state drive help?
3) How easy / difficult is it to migrate from a local PostgreSQL database to an AWS Presto cluster?
2. Yes, a spinning disk is killing your performance. A single spinning disk can do about a hundred input/output operations per second; a baseline SSD will easily do tens of thousands. Once you fix this, you will need to add indexes and make sure you have enough RAM to hold the db in memory as it is queried.
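
Two quick checks along those lines, runnable from R (the dbname is a placeholder): how big the database actually is on disk, and how much memory Postgres is currently allowed to use as cache (the shared_buffers default is only 128MB, far below a 100+ GB working set):

library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "publicdata")  # placeholder dbname

# Total on-disk size of the current database, human-readable.
dbGetQuery(con, "SELECT pg_size_pretty(pg_database_size(current_database()))")

# Postgres's own cache allowance; raising it means changing server configuration.
dbGetQuery(con, "SHOW shared_buffers")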

regularguy455
Posts: 134
Joined: Tue Jul 18, 2017 2:08 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by regularguy455 » Sun Dec 01, 2019 9:39 am

What’s the acceptable level of performance? This will drive the solution you pick. If you’re willing to wait, Postgres is the cheapest solution. If not, there are many cloud/high-performance technologies you could use (e.g., Spark).

HawkeyePierce
Posts: 729
Joined: Tue Mar 05, 2019 10:29 pm
Location: Colorado

Re: Free / Cost Effective Database for 100+GB Data

Post by HawkeyePierce » Sun Dec 01, 2019 11:35 am

Sounds like you just need a basic primer on RDBMS operations and optimizations. Your local library probably has one.

I would caution against any NoSQL databases. Your use case is small enough that their complexity will not be worth the trouble.

Topic Author
DarkHelmetII
Posts: 421
Joined: Mon Jul 24, 2017 12:25 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by DarkHelmetII » Sun Dec 01, 2019 2:40 pm

regularguy455 wrote:
Sun Dec 01, 2019 9:39 am
What’s the acceptable level of performance? This will drive the solution you pick. If you’re willing to wait, Postgres is the cheapest solution. If not, there are many cloud/high-performance technologies you could use (e.g., Spark).
Acceptable level of performance is a good question. Waiting five to ten seconds is okay, but not five to ten minutes. Obviously this hinges on the scope and complexity of the query; some currently take seconds and some upwards of 10 minutes or more. I've already upgraded the RAM, so a solid-state drive might be next, and/or just getting a new machine altogether.

regularguy455
Posts: 134
Joined: Tue Jul 18, 2017 2:08 pm

Re: Free / Cost Effective Database for 100+GB Data

Post by regularguy455 » Mon Dec 02, 2019 11:15 am

DarkHelmetII wrote:
Sun Dec 01, 2019 2:40 pm
regularguy455 wrote:
Sun Dec 01, 2019 9:39 am
What’s the acceptable level of performance? This will drive the solution you pick. If you’re willing to wait, Postgres is the cheapest solution. If not, there are many cloud/high-performance technologies you could use (e.g., Spark).
Acceptable level of performance is a good question. Waiting five to ten seconds is okay, but not five to ten minutes. Obviously this hinges on the scope and complexity of the query; some currently take seconds and some upwards of 10 minutes or more. I've already upgraded the RAM, so a solid-state drive might be next, and/or just getting a new machine altogether.
If it were me, I’d benchmark a subset of the data on a couple of potential solutions. Try it on different cloud technologies and in a Postgres database and see the performance difference. You might be surprised how inexpensive a cloud solution is.
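
A minimal way to time a candidate from R, pointing the same query at each backend being compared (connection details and the query are placeholders):

library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "publicdata")  # placeholder dbname

# Time one representative query; repeat several runs and compare medians,
# since the first run pays for a cold disk cache.
timing <- system.time(
  res <- dbGetQuery(con, "SELECT count(*) AS n FROM trades WHERE price > 100")
)
print(timing["elapsed"])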

Ultimately the cloud is easier to set up and scale up. You also have access to technical support if you get stuck. So really it’s a matter of how you value your time and how long you’re willing to wait.
